Here’s how I see this working. Apologies for spelling this out at some length.
Within OGRDB we will hold the data as set out on slide 1. In other words, there will be a table for each of the 5 record types on slide 2, and the data will be normalised. In particular, per Christian’s point, the source records will be held in separate tables from the gene descriptions, which means we hold only one record for each source item, even if it is referred to from several gene descriptions (for example as a possible paralog).
People will generally engage with the data by downloading files from OGRDB. If they want to make changes to the data, they will modify the files and upload them. Consider, for example, a curator working on a gene description. It seems sensible to provide them with a gene description file which contains the gene description record itself, plus the source records that it refers to. That will be much easier to work with than a list of every source record in the system, which they would then have to cross-reference against the gene description.
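To make the download side concrete, here is a minimal Python sketch of how a gene description file could be assembled from normalised tables. Everything in it is an assumption made for illustration: the two dictionaries standing in for tables, the field names, the ids and the function name `export_gene_description` are invented, and this is not the OGRDB schema or code.

```python
# Hypothetical normalised store: one dict per table, keyed by record id.
# All ids, labels and field names are placeholders, not real OGRDB data.
sources = {
    "S1": {"id": "S1", "accession": "ACC0001", "confidence": 2},
    "S2": {"id": "S2", "accession": "ACC0002", "confidence": 3},
    "S3": {"id": "S3", "accession": "ACC0003", "confidence": 1},
}
gene_descriptions = {
    "G1": {"id": "G1", "label": "GENE-A", "source_ids": ["S1", "S2"]},
    "G2": {"id": "G2", "label": "GENE-B", "source_ids": ["S2"]},
}


def export_gene_description(gene_id):
    """Build the downloadable file: the gene description record plus
    copies of only those source records that it refers to."""
    gene = gene_descriptions[gene_id]
    return {
        "gene_description": {"id": gene["id"], "label": gene["label"]},
        "sources": [dict(sources[sid]) for sid in gene["source_ids"]],
    }


file_g1 = export_gene_description("G1")
exported_source_ids = [s["id"] for s in file_g1["sources"]]
```

The file for G1 carries S1 and S2 but not S3, so the curator sees only the sources relevant to their gene description. S2 appears in the files for both G1 and G2, yet it exists once in the store.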
How would this work for edit and upload? Subject to the checking and review discussed earlier on this thread, people would be able to make changes to that gene description file and push it back to OGRDB. OGRDB would reflect those changes in the underlying tables. Suppose, for example, that they change the confidence rating associated with a source. OGRDB will modify the rating in the source record table, and from that point any gene description referencing that source will ‘see’ the updated rating, because all references to the source are linked through that single record. The data, as held by OGRDB, is always in normal form. The point I was trying to make with the note on the last slide was that you might not realise that if you look only at the file formats. It’s an issue I haven’t addressed sufficiently clearly in the past: this exercise started off as an attempt to define a file format for parsers, and morphed into something much more focussed on the underlying data structure.
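The round trip can be sketched in a few lines of Python. Again, this is only an illustration under stated assumptions: the dictionaries standing in for tables, the field names, the ids and both function names are invented, and the checking and review step is deliberately left out.

```python
# Hypothetical normalised store. S2 is held once and referenced by two
# gene descriptions. All names and values are placeholders.
sources = {"S2": {"id": "S2", "confidence": 3}}
gene_descriptions = {
    "G1": {"id": "G1", "source_ids": ["S2"]},
    "G2": {"id": "G2", "source_ids": ["S2"]},
}


def export_gene_description(gene_id):
    """Return the gene description with copies of its source records."""
    gene = gene_descriptions[gene_id]
    return {
        "gene_description": {"id": gene["id"]},
        "sources": [dict(sources[sid]) for sid in gene["source_ids"]],
    }


def upload_gene_description(uploaded):
    """Write edited source fields back to the single stored record.
    Checking and review are omitted from this sketch."""
    for src in uploaded["sources"]:
        sources[src["id"]].update(src)


# A curator downloads G1, changes the confidence rating of S2, and uploads.
edited = export_gene_description("G1")
edited["sources"][0]["confidence"] = 5
upload_gene_description(edited)

# G2 was never touched, but it references the same source record.
seen_by_g2 = export_gene_description("G2")["sources"][0]["confidence"]
```

After the upload, `seen_by_g2` holds the new rating even though only the G1 file was edited, because there is still just one S2 record behind both gene descriptions.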
I hope this is clear now, and if it would help the explanation to modify the slides please let me know.
By the way, I am wondering whether I should create a more formal entity-relationship diagram to go alongside the diagram in slide 1. If people would like to see that, please let me know and I will put it on the list.