Along with the much more important decisions concerning what sequences should actually go into a next-generation database, @bussec has suggested that we should start considering what such a database would look like.
He has also suggested that our organizing principle should be to give people what they want as quickly as possible. For most users, this is a FASTA file with the germline sequences for their species of interest, along with data about the sequences (see this topic for a discussion of what this might look like). I really like this idea, and would extend it in the following ways:
- Defaults should be sane. The easiest thing to get to should be a high confidence set of germline sequences.
- It should be clear what people are downloading.
- It should be easy to change to other levels of evidence for the sequences (e.g. direct germline sequencing only or a more inclusive set).
- It should be easy to access previous releases of the database.
- There should be some standardized means by which computers (not humans clicking) can access the latest version of the database. This could be as simple as fixing a URL like
http://new-database.org/human/high-confidence/latest
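To make the fixed-URL idea a bit more concrete, here is a rough Python sketch of how a script might build such addresses and read the result. Everything in it is an assumption for illustration: the `species/evidence/release` path layout is just extrapolated from the example URL, the names `germline_url` and `parse_fasta` are made up, and labels like `genomic-only` or `2015-01` are placeholders rather than proposed values. Nothing is actually downloaded, since the host does not exist.

```python
from urllib.parse import quote

# Placeholder host taken from the example URL; not a real service.
BASE_URL = "http://new-database.org"


def germline_url(species, evidence="high-confidence", release="latest"):
    """Build a URL of the assumed form BASE_URL/species/evidence/release.

    The defaults mirror the "sane defaults" idea: with no extra arguments
    you get the latest high-confidence set.
    """
    parts = [species, evidence, release]
    # Percent-encode each path segment so odd characters cannot break the URL.
    return "/".join([BASE_URL] + [quote(part, safe="") for part in parts])


def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header line to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New record: header is everything after the ">".
            name = line[1:]
            records[name] = ""
        elif name is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[name] += line
    return records
```

With this layout, `germline_url("human")` gives the address from the bullet above, while something like `germline_url("mouse", evidence="genomic-only", release="2015-01")` would cover both the "other levels of evidence" and "previous releases" wishes with the same scheme.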
Please contribute what you would like to see! Then we can think about how to make those operations easy.
[Note: @bgaeta and @a.collins have stated that they don’t want to lead this effort by extending IgPdb, though I note that their site does very quickly shepherd you to a FASTA download link.]