Sponsored by the AIRR Community

What are the copyright issues for IMGT germline sequences?

In the past I’ve heard vague references to copyright issues with IMGT. Indeed, if I go to http://www.imgt.org/Warranty.php I see:

Any other use of IMGT® material need prior written permission of the IMGT director and of the legal institutions (CNRS and Université de Montpellier).

Has anyone run into issues with this?

What would one need to do to have a copyright-free germline set?

I wonder what counts as “IMGT material” - surely at least part of the database is drawn from previous sources that are in the public domain???

I agree, it’s not clear what the position is, and I think it would be helpful to the community if we could work towards clearly open germline sets.

The IgBLAST website used to say that IMGT germline libraries were used with permission, but I can’t find this statement on the site any longer.

We published a web-based demonstrator of a small utility a while back, and it would have worked a lot better if we’d incorporated a germline library, but we were put off by the copyright notice and ended up requiring the user to upload one to run the demo.

Interesting observation, Erick. I think it would be relatively straightforward, though time-consuming, to recurate the human genes from original sources, and that having a truly open dataset would be worthwhile.

I agree that this is an interesting issue, but I don’t think such restrictions can be accepted. I am sure the lawyers would have a lot to say on this, but genes can’t be patented, so why would a germline set be under copyright? And if a different set would free us of the challenges, then let’s produce a new and better set. Since almost half of the human functional IGHV genes in the IMGT repertoire are there in error, what is the point of this ‘copyright’ set anyway? The special contribution that IMGT has made is its nomenclature, but isn’t that special because it is sanctioned by HUGO and WHO? I am sure they would not have put their authority behind the nomenclature if they thought it was going to be locked up with copyright issues.


On the legal issues in general: Databases are typically not covered by copyright itself. However, EU legislation contains a special database protection directive that grants a right similar to copyright. I am only familiar with the situation in Europe, but according to Wikipedia there is no similar clause in the US. However, as with copyright, the author(s) can still publish the work under an open license (e.g. ODbL).

The reason that IgBLAST does not mention any IMGT permission is that they stopped shipping the IMGT databases in 2014 (instead referring you to a download script). When I asked the developers about this change I got the reply that “[…] there were issues with IMGT; we would love to supply [the databases], but IMGT objects.”

I can’t imagine IMGT has copyrights on IG genes. All of these gene/allele entries and associated sequences should be linked to public repositories.

I’ve also heard there were issues regarding permission.
I am in favor of fully open and usable databases. You can get hold of the IMGT databases from their site but the process is a little cumbersome and I would be in favor of an easier process.
Those who work with non-human or non-mouse species often find the IMGT databases very restrictive in terms of their requirements for genomic validation of each sequence. Having only ten or twenty alleles for a species results in a database that is virtually unusable for assignment purposes, even if each of those alleles is 100% correct.

I wonder what the terms say about derivatives of the database. I mean, for example, mapping those sequences to the hg38 genome build and then assigning alleles using available HTS data. The ENSEMBL database has a list of V/J segments, yet there are fewer of them than in IMGT, if I recall correctly.

And we definitely need such a database with a strong basis for segments and alleles, without cases where an allele is represented by a single GenBank entry that doesn’t even cover the full variable segment. I don’t know whether backward compatibility with IMGT is possible, as the nomenclature is copyrighted. But we can basically integrate RNA-seq data and GenBank records, as well as RACE-based RepSeq samples, and re-map V/J segments (this can be done for any species). This will require some strict guidelines and a community effort though, especially from those familiar with de novo TCR/IG segment allele prediction. So we come back to http://b-t.cr/t/how-best-to-collect-and-share-germline-gene-repertoires/

I had a longer discussion with our legal department concerning this issue and by and large they confirmed what we already discussed here:

  1. There is no copyright on the nucleotide and amino acid sequences themselves.
  2. There is no copyright on gene/segment names. However, they theoretically could be protected by a trademark. Such a protection would most likely be void, since a trademark must not be descriptive (e.g. a preexisting gene name) and must be linked to a product or service of the trademark owner.
  3. Under EU law the collection and curation of a non-trivial database creates a protected work, even if it contains only free material.
  4. Recreating such a database starting from free material does not violate the protection [as granted in (3)] even if it finally contains similar content. However, material from the protected database must not be copied in the process.
  5. In case IMGT should decide to take legal action against a competing database (claiming that protected material was copied and used without permission), the owner/operator should be able to show how the competing database was derived from the free material without resorting to protected data from IMGT.
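
Point 5 suggests keeping an audit trail while curating. As a minimal sketch of what that could look like (not an established AIRR or IMGT format), each sequence in a new set could carry a record of the public entry it was taken from, plus a checksum of the sequence. All field names, the label, and the accession below are hypothetical placeholders.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ProvenanceRecord:
    """Where one germline sequence came from (field names are hypothetical)."""
    label: str      # working name for the sequence in the new set
    source_db: str  # public repository the sequence was taken from
    accession: str  # accession of the source entry
    retrieved: str  # ISO date on which the entry was downloaded
    sha256: str     # checksum of the normalized sequence


def normalize(seq: str) -> str:
    """Remove whitespace and uppercase, so the checksum is stable."""
    return "".join(seq.split()).upper()


def checksum(seq: str) -> str:
    return hashlib.sha256(normalize(seq).encode("ascii")).hexdigest()


def make_record(label, source_db, accession, retrieved, seq):
    """Build a provenance record for one curated sequence."""
    return ProvenanceRecord(label, source_db, accession, retrieved, checksum(seq))


def matches(record, seq):
    """True if seq is the sequence the record was created from."""
    return checksum(seq) == record.sha256


# Placeholder values only; this is not a real database entry.
record = make_record("example_V_1", "GenBank", "XX000000", "2017-01-01",
                     "caggtgcagc tggtgcag")
print(json.dumps(asdict(record), indent=2))
```

Storing such records next to the sequences would let a curator check later that a released sequence still matches what was downloaded, and point to the public entry it came from.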

Please note that the context of my discussion was limited to the “possible legal risks/IP violations when setting up a public database for Ig/TCR segment information under a free/open license”, since I considered this to be the most likely scenario. Furthermore, I assumed that IMGT’s current claims are actually valid (although it seems like this is not necessarily the usual thing to do).

For what it’s worth, I invited Marie-Paule Lefranc to this forum, to which she responded:

IMGT does not participate to online discussion groups.

I was part of the initial group that started the IMGT database and was involved in the construction of the database.
Most of the IMGT database was initially annotated manually and a lot of effort went into the annotation process.
This annotation process is what is protected as it added additional useful information to the submitted sequences.

After a few years, we split up and I continued to work on automated sequence annotation programs and automated database generation programs which resulted in the VBASE2 database.

The VQUEST program was initially written by me and was first used for the analysis of human V genes for Ian Tomlinson’s VBASE database. It later moved into the IMGT database, and after the split I had no control over its development.

The VBASE2 database set can be downloaded from my server. It was also used for the construction of the IgBlast program.


To follow up on point 2 of my previous post, I had another chat with our legal department on the protection of gene names and naming schemes. Their final and unanimous conclusion was that

Gene names and gene naming systematics are neither eligible for protection by copyright nor by trademark.

So there should be one thing less to worry about.

From a different thread (I’m probably linking it in incorrectly):

Does this usage of database labels from IMGT implicate any copyright concerns?

@caschramm I am sorry that this is going to sound a little picky, but given the context I think it’s important to make a distinction. I am suggesting that we follow the naming convention of the IMGT ontology. IMGT V-quest or other IMGT systems report their findings in terms of the ontology. Perhaps they also use the terms internally, as database labels, we don’t really know. But we are not aiming to copy their database or its labels, we would just be following the ontology.

I think it’s relevant that “The WHO-IUIS Nomenclature SubCommittee for Immunoglobulins and T cell receptors follows the rules for the nomenclatures, as described in the IMGT Scientific chart, http://imgt.cines.fr [1]. These rules are based on the concepts of IMGT-ONTOLOGY [2] and [3], the first ontology in immunogenetics and immunoinformatics.” (ref). In other words, the naming and definition of terms in the IMGT Ontology has been adopted as a world standard.

From Wikipedia (sorry): “In computer science and information science, an ontology is a formal naming and definition of the types, properties, and interrelationships of the entities that really or fundamentally exist for a particular domain of discourse.”

From these points of view it seems clear to me personally that the ontology falls in the category of ‘gene names and gene naming systematics’ as discussed just now by @bussec - but I think the way to resolve the copyright issue definitively would be through discussion with the WHO-IUIS Nomenclature SubCommittee. But I don’t see how the use of a WHO standard name could break copyright.


Great, not picky at all! Thanks for clearing that up.

I am new to immunosequencing. I am working on a workflow for TCRs. Do you know of an alternative to VBASE2?
Thank you!

I can set up a search page for TCR sequences if you like. Just let me know.
Werner