Keys and Qualifiers in the GenBank Feature Table
The DDBJ/ENA/GenBank Feature Table Definition
Version 10.7 December 2017
1 Introduction
2 Overview of the Feature Table format
  2.1 Format Design
  2.2 Key aspects of this feature table design
  2.3 Feature Table Terminology
3 Feature table components and format
  3.1 Naming conventions
  3.2 Feature keys
    3.2.1 Purpose
    3.2.2 Format and conventions
    3.2.3 Key groups and hierarchy
    3.2.4 Feature key examples
  3.3 Qualifiers
    3.3.1 Purpose
    3.3.2 Format and conventions
    3.3.3 Qualifier values
    3.3.4 Qualifier examples
  3.4 Location
    3.4.1 Purpose
    3.4.2 Format and conventions
    3.4.3 Location examples
4 Feature table Format
  4.1 Format examples
  4.2 Definition of line types
  4.3 Data item positions
  4.4 Use of blanks
5 Examples of sequence annotation
  5.1 Eukaryotic gene
  5.2 Bacterial operon
  5.3 Artificial cloning vector (circular)
  5.4 Plasmid
  5.5 Repeat element
  5.6 Immunoglobulin heavy chain
  5.7 T-cell receptor
  5.8 Transfer RNA
6 Limitations of this feature table design
7 Appendices
  7.1 Appendix I: EMBL, GenBank and DDBJ entries
    7.1.1 EMBL Format
    7.1.2 GenBank Format
    7.1.3 DDBJ Format
  7.2 Appendix II: Feature keys reference
  7.3 Appendix III: Summary of qualifiers for feature keys
    7.3.1 Qualifier List
  7.4 Appendix IV: Controlled vocabularies
    7.4.1 Nucleotide base codes (IUPAC)
    7.4.2 Modified base abbreviations
    7.4.3 Amino acid abbreviations
    7.4.4 Modified and unusual Amino Acids
    7.4.5 Genetic Code Tables
    7.4.6 Country Names
    7.4.7 Announces
1 Introduction

Nucleic acid sequences provide the fundamental starting point for describing
and understanding the structure, function, and development of genetically
diverse organisms. The GenBank, EMBL, and DDBJ nucleic acid sequence data
banks have from their inception used tables of sites and features to describe
the roles and locations of higher order sequence domains and elements within
the genome of an organism.

In February, 1986, GenBank and EMBL began a collaborative effort (joined by
DDBJ in 1987) to devise a common feature table format and common standards for
annotation practice.
2 Overview of the Feature Table format

The overall goal of the feature table design is to provide an extensive
vocabulary for describing features in a flexible framework for manipulating
them. The Feature Table documentation represents the shared rules that allow
the three databases to exchange data on a daily basis.

The range of features to be represented is diverse, including regions which:
* perform a biological function,
* affect or are the result of the expression of a biological function,
* interact with other molecules,
* affect replication of a sequence,
* affect or are the result of recombination of different sequences,
* are a recognizable repeated unit,
* have secondary or tertiary structure,
* exhibit variation, or have been revised or corrected.
2.1 Format Design

The format design is based on a tabular approach and consists of the following
items:
   Feature key - a single word or abbreviation indicating functional group
   Location    - instructions for finding the feature
   Qualifiers  - auxiliary information about a feature
2.2 Key aspects of this feature table design

* Feature keys allow specific annotation of important sequence features.

* Related features can be easily specified and retrieved.
Feature keys are arranged hierarchically, allowing complex and compound
features to be expressed. Both location operators and the feature keys show
feature relationships even when the features are not contiguous. The hierarchy
of feature keys allows broad categories of biological functionality, such as
rRNAs, to be easily retrieved.

* Generic feature keys provide a means for entering new or undefined features.
A number of "generic" or miscellaneous feature keys have been added to permit
annotation of features that cannot be adequately described by existing feature
keys. These generic feature keys will serve as an intermediate step in the
identification and addition of new feature keys. The syntax has been designed
to allow the addition of new feature keys as they are required.

* More complex locations (fuzzy and alternate ends, for example) can be
specified.
Each end point of a feature may be specified as a single point, an alternate
set of possible end points, a base number beyond which the end point lies, or
a region which contains the end point.

* Features can be combined and manipulated in many different ways.
The location field can contain operators or functional descriptors specifying
what must be done to the sequence to reproduce the feature. For example, a
series of exons may be "join"ed into a full coding sequence.

* Standardized qualifiers provide precision and parsability of descriptive
details.
A combination of standardized qualifiers and their controlled-vocabulary
values enables free-text descriptions to be avoided.

* The nature of supporting evidence for a feature can be explicitly indicated.
Features, such as open reading frames or sequences showing sequence similarity
to consensus sequences, for which there is no direct experimental evidence can
be annotated. Therefore, the feature table can incorporate contributions from
researchers doing computational analysis of the sequence databases. However,
all features that are supported by experimental data will be clearly marked as
such.

* The table syntax has been designed to be machine parsable.
A consistent syntax allows machine extraction and manipulation of sequences
coding for all features in the table.
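
As an illustration of the location handling described in this list, the
following Python sketch interprets a small subset of location strings: a
single span such as 23..400, or a "join" of such spans, with optional "<" and
">" markers on the end points. The function names parse_location and extract
are hypothetical and are not part of this specification. The sketch assumes
1-based, inclusive base numbering and deliberately ignores every other
location form.

```python
import re

# Matches one span such as "544..589", "<1..20" or "688..>1032".
_SPAN = re.compile(r"(<?)(\d+)\.\.(>?)(\d+)")


def parse_location(location):
    """Return (spans, partial) for 'a..b' or 'join(a..b,c..d,...)'.

    spans is a list of (start, end) pairs, 1-based and inclusive.
    partial is True if any end point carries a '<' or '>' marker.
    Other location forms are not supported by this sketch.
    """
    text = location.strip()
    if text.startswith("join(") and text.endswith(")"):
        parts = text[len("join("):-1].split(",")
    else:
        parts = [text]
    spans = []
    partial = False
    for part in parts:
        match = _SPAN.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"unsupported location: {part!r}")
        lt, start, gt, end = match.groups()
        spans.append((int(start), int(end)))
        partial = partial or bool(lt) or bool(gt)
    return spans, partial


def extract(sequence, spans):
    """Concatenate the given spans of a sequence, in the order listed."""
    return "".join(sequence[start - 1:end] for start, end in spans)
```

For instance, parse_location("join(544..589,688..>1032)") yields two spans and
reports the feature as partial, because the second end point lies beyond base
1032.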
2.3 Feature Table Terminology

The format and wording in the feature table use common biological research
terminology whenever possible. For example, an item in the feature table such
as:

   Key             Location/Qualifiers
   CDS             23..400
                   /product="alcohol dehydrogenase"
                   /gene="adhI"

might be read as:
The feature CDS is a coding sequence beginning at base 23 and ending at base
400, has a product called "alcohol dehydrogenase" and is coded for by a gene
called "adhI".

A more complex description:

   Key             Location/Qualifiers
   CDS             join(544..589,688..>1032)
                   /product="T-cell receptor beta-chain"

which might be read as:
This feature, which is a partial coding sequence, is formed by joining
elements indicated to form one contiguous sequence encoding a product called
T-cell receptor beta-chain.
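
The reading of the two items above can be mimicked by a short program. The
Python sketch below is only an illustration: the Feature class and the
parse_features function are hypothetical names, and the parser splits lines on
whitespace instead of using the exact column positions defined later in this
document. It also assumes that each qualifier fits on one line and appears at
most once per feature.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    key: str
    location: str
    qualifiers: dict = field(default_factory=dict)


def parse_features(text):
    """Parse simple feature table items into Feature objects.

    Assumption: a line starting with '/' is a qualifier of the most
    recent feature; any other non-blank line is 'key location'.
    """
    features = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.split() == ["Key", "Location/Qualifiers"]:
            continue  # skip blank lines and the column header
        if line.startswith("/"):
            if not features:
                raise ValueError("qualifier found before any feature key")
            name, _, value = line[1:].partition("=")
            # Strip the surrounding double quotes of a quoted value.
            features[-1].qualifiers[name] = value.strip('"')
        else:
            key, location = line.split(None, 1)
            features.append(Feature(key, location.strip()))
    return features


sample = """
Key             Location/Qualifiers
CDS             23..400
                /product="alcohol dehydrogenase"
                /gene="adhI"
"""
for feature in parse_features(sample):
    print(feature.key, feature.location, feature.qualifiers)
```

Keeping the location as an uninterpreted string is a deliberate choice in this
sketch, since the location syntax is rich enough to deserve its own parser.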

The following sections contain detailed explanations of the feature table
design showing conventions for each component of the feature table, examples
of how the format might be implemented, a description of the exact column
placement of all the data items and examples of complete sequence entries that
have been annotated using the new format. The last section of this document
describes known limitations of the current feature table design.

Appendix I gives an example database entry for the DDBJ, GenBank and EMBL
formats.
Appendices II and III provide reference manuals for the feature table keys and
qualifiers, respectively.
Appendix IV includes controlled vocabularies such as nucleotide base codes,
modified base abbreviations, genetic code tables etc.

This document defines the syntax and vocabulary of the feature table. The
syntax is sufficiently flexible to allow expression of a single biological
entity in numerous ways. In such cases, the annotation staffs at the databases
will propose conventions for standard means of denoting the entities.
This feature table format is shared by GenBank, EMBL and DDBJ. Comments,
corrections, and suggestions may be submitted to any of the database staffs.
New format specifications will be added as needed.
3 Feature table components and format

3.1 Naming conventions

Feature table components, including feature keys, qualifiers, accession
numbers, database name abbreviations, and location operators, are all named
following the same conventions. Component names may be no more than 20
characters long (Feature keys 15, Feature qualifiers 20) and must
contain at least one letter. The following characters are permitted to
occur in feature table component names:
* Uppercase letters (A-Z)
* Lowercase letters (a-z)
* Numbers (0-9)
* Underscore (_)
* Hyphen (-)
* Single quotation mark or apostrophe (')
* Asterisk (*)
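
The naming rules above can be checked mechanically. The following Python
sketch is one possible reading of them; the function name is_valid_name and
its "kind" parameter are hypothetical, and any component type other than a
feature key is assumed here to fall under the general 20-character limit.

```python
import re

# Characters permitted in component names: letters, digits,
# underscore, hyphen, apostrophe and asterisk.
_ALLOWED = re.compile(r"[A-Za-z0-9_\-'*]+")

# Length limits stated in the text; other kinds default to 20.
MAX_LENGTH = {"key": 15, "qualifier": 20}


def is_valid_name(name, kind="qualifier"):
    """Return True if name satisfies the naming conventions."""
    limit = MAX_LENGTH.get(kind, 20)
    return (
        len(name) <= limit
        and _ALLOWED.fullmatch(name) is not None
        and any(ch.isalpha() for ch in name)  # at least one letter
    )
```

Under these assumptions, is_valid_name("misc_feature", "key") is accepted,
while a purely numeric name such as "12345" is rejected because it contains no
letter.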
3.2 Feature keys

3.2.1 Purpose

Feature keys indicate
(1) the biological nature of the annotated feature or
(2) information about changes to or other versions of the sequence.
The feature key permits a user to quickly find or retrieve similar features or
features with related functions.

3.2.2 Format and conventions

There is a defined list of allowable feature keys, which is shown in Appendix
II. Each feature must contain a feature key.

3.2.3 Key groups and hierarchy

The feature keys fall into families which are in some sense similar in
function and which are annotated in a similar manner. A functional family may
have a "generic" or miscellaneous key, which can be recognized by the 'misc_'
prefix, that can be used for instances not covered by the other defined keys
of that group.

The feature key groups are listed below with a short definition and an
annotation example:

1. Difference and change features
Indicate ways in which a sequence should be changed to produce a different
"version":
   misc_difference   location
                     /replace="change_location"

2. Transcript features
Indicate products made by a region:
   misc_RNA          location

3. Binding features
Indicate that a sequence or nucleotide is covalently, non-covalently, or
otherwise bound to something else:
   misc_binding      location
                     /bound_moiety="bound molecule"

4. Repeat features
Indicate repetitive sequence elements:
   repeat_region     location

5. Recombination features
Indicate regions that have been either inserted or deleted by recombination:
   misc_recomb       location

6. Structur