Locus tag question · Issue #101 · enasequence/webin-cli · GitHub

Locus tag question #101


Open
ValWood opened this issue May 3, 2023 · 5 comments

Comments

ValWood commented May 3, 2023

Previous submissions of the fission yeast genome have used "locus_tag" for the systematic identifier (since before 2002).

We are now being asked to use "old_locus_tag" because the PomBase locus_tag includes a "." (period), e.g. SPCC18B5.03

This change shouldn't be forced on existing IDs. We should not change systematic identifiers unnecessarily (this is contrary to FAIR data principles). According to the documentation, we can only use "old_locus_tag" if we also provide a current "locus_tag", so we are a bit stuck.
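To make the bind concrete, here is a minimal Python sketch of the two constraints as described above. It is not the Webin-CLI validator: the function name `check_qualifiers` and the plain-dict representation of a feature's qualifiers are hypothetical, and the only rules it encodes are the two mentioned in this comment (no period in locus_tag, and old_locus_tag requiring a current locus_tag).

```python
def check_qualifiers(qualifiers):
    """Return a list of problems for one gene feature.

    `qualifiers` is a hypothetical plain dict such as
    {"locus_tag": "SPCC18B5.03"}; this is an illustration only,
    not how Webin-CLI represents or validates features.
    """
    problems = []
    locus_tag = qualifiers.get("locus_tag")
    old_locus_tag = qualifiers.get("old_locus_tag")

    # Constraint 1 (from this issue): a period in locus_tag is rejected.
    if locus_tag is not None and "." in locus_tag:
        problems.append(f"locus_tag {locus_tag!r} contains a period")

    # Constraint 2 (from the documentation, as we read it):
    # old_locus_tag is only allowed alongside a current locus_tag.
    if old_locus_tag is not None and locus_tag is None:
        problems.append("old_locus_tag given without a current locus_tag")

    return problems


# Keeping the existing identifier in locus_tag fails constraint 1 ...
print(check_qualifiers({"locus_tag": "SPCC18B5.03"}))
# ... and moving it to old_locus_tag alone fails constraint 2.
print(check_qualifiers({"old_locus_tag": "SPCC18B5.03"}))
```

Under these assumptions, the only way to pass both checks is to invent a new locus_tag for every gene, which is exactly what we want to avoid.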

The fission yeast systematic identifiers stored in locus_tag are probably used in 4-6 thousand publications and in many thousands of genome-wide functional genomics datasets. This label is also used by downstream databases and pipelines (e.g. UniProtKB). The S. pombe systematic identifiers currently in "locus_tag" will never be deprecated because they are used in every large dataset for fission yeast and provide our only unique and constant identifier for every gene. Studied genes and most conserved genes are given a "standard name", but completely unstudied genes often have no "standard name" assigned. Standard names may change in exceptional circumstances (i.e. to resolve conflicts or to adopt universal nomenclature).

Systematic identifiers (i.e. the current locus_tag in INSDC) will therefore continue to provide the only unique identifier for functional genomics datasets, because this label will never change (unless a gene merges or splits, in which case one or both IDs will become synonyms and are therefore still trackable through the gene history in a model organism database). I assume this is the case for most Model Organism Databases (as far as I'm aware, there is no other mechanism to provide a recognisable unique ID for a locus genome-wide).

Is there another label that can be used to uniquely identify the "orf name"?

Thanks

ValWood commented May 4, 2023

We see that the original locus_tag is in the existing entries, so it should be possible to update like this?

https://www.ebi.ac.uk/ena/browser/api/embl/CAC21482.1?lineLimit=1000

ValWood commented May 11, 2023

We also notice that NCBI has an entry whose locus tag contains a period:
Primary source
Locus tag CELE_Y74C9A.2

and this entry was updated:
Gene ID: 171591, updated on 11-Apr-2023

CC @kimrutherford

ValWood commented May 11, 2023

I edited the first comment to provide some context.

ValWood commented May 11, 2023

locus_tag is even defined as a stable identifier:
https://www.ebi.ac.uk/ena/WebFeat/qualifiers/locus_tag.html
We need to keep it that way.

Qualifier: locus_tag
Definition: a submitter-supplied, systematic, stable identifier for a gene and its associated features, used for tracking purposes

ValWood commented May 11, 2023

Please advise how to continue here.
