QIIME User Guide
QIIME (pronounced "chime") stands for Quantitative Insights Into Microbial Ecology. QIIME is an open source software package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA, the small subunit ribosomal RNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data; metagenomics is the analysis of the collective genomes of the microbial community in an environmental sample).
QIIME takes users from their raw sequencing output through initial analyses such as OTU picking (clustering), taxonomic assignment, and construction of phylogenetic trees from representative sequences of OTUs, and through downstream statistical analysis, visualization, and production of publication-quality graphics. QIIME has been applied to single studies based on billions of sequences from thousands of samples.
This tutorial explains how to use the QIIME (Quantitative Insights Into Microbial Ecology) pipeline to process data from high-throughput 16S rRNA sequencing studies. If you have not already installed QIIME, please see the section Installing Qiime first. The purpose of this pipeline is to provide a start-to-finish workflow, beginning with multiplexed sequence reads and finishing with taxonomic and phylogenetic profiles and comparisons of the samples in the study. With this information in hand, it is possible to determine biological and environmental factors that alter microbial community ecology in your experiment.
As an example, we will use data from a study of the response of mouse gut microbial communities to fasting (Crawford et al., 2009). To make this tutorial run quickly on a personal computer, we will use a subset of the data generated from 5 animals kept on the control ad libitum fed diet, and 4 animals fasted for 24 hours before sacrifice. At the end of the tutorial, we will be able to compare the community structure of control vs. fasted animals. In particular, we will be able to compare taxonomic profiles for each sample type, examine differences in diversity metrics within the samples and between the groups, and perform comparative clustering analysis to look for overall differences in the samples.
In this walkthrough, text like the following:
denotes the command-line invocation of scripts. You can find full usage information for each script by passing the -h option (help) and/or by reading the full description in the Documentation. Execute all tutorial commands from within the qiime_tutorial directory, which can be downloaded from here: QIIME Tutorial files.
To process our data, we will perform the following analyses, each of which is described in more detail below:
Filter the DNA sequence reads for quality and assign multiplexed reads to starting samples by nucleotide barcode.
Pick Operational Taxonomic Units (OTUs) based on sequence similarity within the reads, and pick a representative sequence from each OTU.
Assign each OTU to a taxonomic identity using reference databases.
Align the OTU sequences and create a phylogenetic tree.
Calculate diversity metrics for each sample and compare the types of communities, using the taxonomic and phylogenetic assignments.
Generate UPGMA and PCoA plots to visually depict the differences between the samples, and dynamically work with these graphs to generate publication-quality figures.
Sequences (.fna)
This is the FASTA file generated by the 454 machine. Using the Amplicon processing software on the 454 FLX standard, each region of the PTP plate will yield a fasta file of form where "1" is replaced with the appropriate region number. For the purposes of this tutorial, we will use the fasta file.
Quality Scores (.qual)
This is the quality score file generated by the 454 machine, which contains a score for each base in each sequence included in the FASTA file. Like the fasta file mentioned above, the Amplicon processing software will generate one of these files for each region of the PTP plate, named etc. For the purposes of this tutorial, we will use the quality scores file.
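To make the relationship between the two files concrete, here is a minimal Python sketch that pairs each FASTA record with its quality scores. It is not QIIME code: the function names are hypothetical, and it assumes the usual layout in which a `>` header line is followed by sequence lines in the .fna file and by whitespace-separated integer scores in the .qual file.

```python
def parse_fasta(lines):
    """Yield (read_id, sequence) pairs from FASTA-formatted lines."""
    read_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, "".join(chunks)
            # The read ID is the first whitespace-separated token of the header.
            read_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if read_id is not None:
        yield read_id, "".join(chunks)


def parse_qual(lines):
    """Yield (read_id, list_of_scores) pairs from .qual-formatted lines."""
    read_id, scores = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, scores
            read_id, scores = line[1:].split()[0], []
        else:
            scores.extend(int(token) for token in line.split())
    if read_id is not None:
        yield read_id, scores


def pair_reads(fasta_lines, qual_lines):
    """Return {read_id: (sequence, scores)}, checking one score per base."""
    quals = dict(parse_qual(qual_lines))
    paired = {}
    for read_id, seq in parse_fasta(fasta_lines):
        scores = quals.get(read_id)
        if scores is None or len(scores) != len(seq):
            raise ValueError(f"quality scores do not match read {read_id}")
        paired[read_id] = (seq, scores)
    return paired
```

Both parsers accept any iterable of lines, so an open file handle can be passed directly.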
Mapping File (Tab-delimited .txt)
The mapping file is generated by the user. This file contains all of the information about the samples necessary to perform the data analysis. At a minimum, the mapping file should contain the name of each sample, the barcode sequence used for each sample, the linker/primer sequence used to amplify the sample, and a Description column. In general, you should also include in the mapping file any metadata that relates to the samples (for instance, health status or sampling site) and any additional information relating to specific samples that may be useful to have at hand when considering outliers (for example, what medications a patient was taking at the time of sampling). Of note: the sample names may only contain alphanumeric characters (A-z) and the dot (.). Full format specifications can be found in the Documentation (File Formats).
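The minimum requirements above can be illustrated with a short Python sketch. This is not the validation script that ships with QIIME; `check_mapping_text` is a hypothetical helper that checks only what this section states: the four required columns are present, every data row has as many fields as the header, and sample names contain only letters, digits, and dots.

```python
import re

# Column names taken from the example mapping file header in this tutorial.
REQUIRED_COLUMNS = (
    "#SampleID",
    "BarcodeSequence",
    "LinkerPrimerSequence",
    "Description",
)
# Sample names: alphanumeric characters and the dot only.
VALID_SAMPLE_ID = re.compile(r"[A-Za-z0-9.]+")


def check_mapping_text(text):
    """Return a list of problems found in tab-delimited mapping-file text."""
    problems = []
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["mapping file is empty"]
    header = lines[0].split("\t")
    for column in REQUIRED_COLUMNS:
        if column not in header:
            problems.append(f"missing required column: {column}")
    # Line numbers count non-blank lines, with the header as line 1.
    for line_number, line in enumerate(lines[1:], start=2):
        if line.startswith("#"):
            continue  # comment line, such as the run description
        fields = line.split("\t")
        if len(fields) != len(header):
            problems.append(
                f"line {line_number}: expected {len(header)} fields, "
                f"found {len(fields)}"
            )
        if not VALID_SAMPLE_ID.fullmatch(fields[0]):
            problems.append(
                f"line {line_number}: invalid sample name {fields[0]!r}"
            )
    return problems
```

An empty list means that none of these particular checks failed; it does not mean the file would pass QIIME's own, more thorough validation.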
For the purposes of this tutorial, we will use the mapping file. The contents of the mapping file are shown here; as you can see, a nucleotide barcode sequence is provided for each of the 9 samples, as well as metadata related to treatment group and date of birth, and general run descriptions about the project. File contents:
#SampleID	BarcodeSequence	LinkerPrimerSequence	Treatment	DOB	Description
#Example mapping file for the QIIME analysis package. These 9 samples are from a study of the effects of
#exercise and diet on mouse cardiac physiology (Crawford, et al, PNAS, 2009).
AGCACGAGCCTA	YATGCTGCCTCCCGTAGGAGT	Control
AACTCGTCGATG	YATGCTGCCTCCCGTAGGAGT	Control
ACAGACCACTCA	YATGCTGCCTCCCGTAGGAGT	Control
ACCAGCGACTAG	YATGCTGCCTCCCGTAGGAGT	Control
AGCAGCACTTGT	YATGCTGCCTCCCGTAGGAGT	Control
AACTGTGCGTAC	YATGCTGCCTCCCGTAGGAGT	Fast
ACAGAGTCGGCT	YATGCTGCCTCCCGTAGGAGT	Fast
ACCGCAGAGTCA	YATGCTGCCTCCCGTAGGAGT	Fast
ACGGTGAGTGTC	YATGCTGCCTCCCGTAGGAGT	Fast
Mapping File
Before beginning with QIIME, you should ensure that your mapping file is formatted correctly with the script. Type:
-o mapping_output
This utility will display a message indicating whether or not problems were found in the mapping file. An HTML file showing the location of errors and warnings will be generated in the output directory, and the errors and warnings will also be written to a log file. Errors will cause fatal problems with subsequent scripts and must be corrected before moving forward. Warnings will not cause fatal problems, but you are encouraged to fix them, as they often indicate typos in your mapping file, invalid characters, or other unintended errors that will affect downstream analysis. A file will also be created in the output directory, which will contain a copy of the mapping file with invalid characters replaced by underscores.
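The idea of a corrected copy can be sketched in a few lines of Python. This is only an illustration, not the utility's implementation: the text above does not say which characters count as invalid, so the allowed set below (letters, digits, dot, underscore) is an assumption, and both function names are hypothetical.

```python
import re

# Assumption: any character outside this set is treated as invalid.
INVALID_CHARS = re.compile(r"[^A-Za-z0-9._]")


def replace_invalid(field):
    """Replace each invalid character in one field with an underscore."""
    return INVALID_CHARS.sub("_", field)


def corrected_row(line):
    """Sanitize every tab-separated field of one mapping-file data row."""
    return "\t".join(replace_invalid(field) for field in line.split("\t"))
```

Because tabs are the field delimiter, each field is cleaned separately so that the column structure survives.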
Reverse primers can be specified in the mapping file, for removal during the demultiplexing step. This is not required, but it is STRONGLY recommended, as leaving in sequences following primers, such as sequencing adapters, can interfere with OTU picking and taxonomic assignments with RDP (the Ribosomal Database Project classifier).
An example mapping file with faux reverse primers specified, using the ReversePrimer field, is available here: reverse primer mapping file.
#SampleID	BarcodeSequence	LinkerPrimerSequence	Treatment	ReversePrimer	Description
#Example mapping file for the QIIME analysis package. These 9 samples are from a study of the effects of
#exercise and diet on mouse cardiac physiology (Crawford, et al, PNAS, 2009).
AGCACGAGCCTA	YATGCTGCCTCCCGTAGGAGT	Control	GCGCACGGGTGAGTA
AACTCGTCGATG	YATGCTGCCTCCCGTAGGAGT	Control	GCGCACGGGTGAGTA
ACAGACCACTCA	YATGCTGCCTCCCGTAGGAGT	Control	GCGCACGGGTGAGTA
ACCAGCGACTAG	YATGCTGCCTCCCGTAGGAGT	Control	GCGCACGGGTGAGTA
AGCAGCACTTGT	YATGCTGCCTCCCGTAGGAGT	Control	GCGCACGGGTGAGTA
AACTGTGCGTAC	YATGCTGCCTCCCGTAGGAGT	Fast	GCGCACGGGTGAGTA
ACAGAGTCGGCT	YATGCTGCCTCCCGTAGGAGT	Fast	GCGCACGGGTGAGTA
ACCGCAGAGTCA	YATGCTGCCTCCCGTAGGAGT	Fast	GCGCACGGGTGAGTA
ACGGTGAGTGTC	YATGCTGCCTCCCGTAGGAGT	Fast	GCGCACGGGTGAGTA
Reverse primers, like the forward primers, are written in the 5'->3' direction. In this case, these are not the true reverse primers used, but rather just a somewhat conserved site in the sequences used for this example.
An example image of the entire primer construct and amplicon is shown below, using QIIME nomenclature:
454 sequencing, in most cases, generates sequences that begin at the BarcodeSequence, which is followed by the LinkerPrimerSequence, both of which are automatically removed during the demultiplexing step described below. However, the ReversePrimer (i.e., the primer at the end of the read) is not removed by default, and needs to be specified. The adapter sequence (Adapter B) does not match genomic data, such as 16S sequences, and as such it can disrupt analyses.
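The read layout just described (barcode, then linker/primer, then amplicon, then reverse primer and adapter) can be sketched in Python as follows. This is an illustration of the layout, not QIIME's demultiplexing code. The function names and the sample label used in the usage note are hypothetical, the 12-base barcode length is taken from the example mapping file, and the sketch assumes that a reverse primer written 5'->3' appears in the read as its reverse complement.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_regex(primer):
    """Turn a primer with degenerate bases (e.g. Y) into a regex pattern."""
    return "".join(IUPAC[base] for base in primer)


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_read(read, barcodes, linker_primer, reverse_primer=None,
              barcode_length=12):
    """Return (sample_id, trimmed_sequence), or None if unassignable.

    `barcodes` maps barcode sequence -> sample ID.
    """
    sample_id = barcodes.get(read[:barcode_length])
    if sample_id is None:
        return None  # unknown barcode
    rest = read[barcode_length:]
    match = re.match(primer_regex(linker_primer), rest)
    if match is None:
        return None  # linker/primer does not follow the barcode exactly
    rest = rest[match.end():]
    if reverse_primer is not None:
        # Assumption: the read carries the reverse complement of the primer.
        pattern = primer_regex(reverse_complement(reverse_primer))
        hit = re.search(pattern, rest)
        if hit is not None:
            rest = rest[:hit.start()]  # drop the primer and everything after
    return sample_id, rest
```

For example, `trim_read(read, {"AGCACGAGCCTA": "sampleA"}, "YATGCTGCCTCCCGTAGGAGT", "GCGCACGGGTGAGTA")` would strip the barcode and forward primer and truncate the read where the reverse primer begins, so that trailing adapter sequence does not reach OTU picking.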
Assign Samples to Multiplex Reads
The next task is to assign the multiplexed reads to samples based on their nucleotide barcode. This step also performs quality filtering based on the characteristics of each sequence, removing any low-quality or ambiguous reads. The script for this step is . A full description of the parameters for this script is given in the Documentation. For this tutorial, we will use default parameters (minimum quality score = 25, minimum/maximum length = 200/1000, no ambiguous bases allowed and no mismatches allowed in the primer sequence). Type:
-m -f -q -o split_library_output
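As a rough illustration of the default filters listed above, the following Python sketch applies the length, ambiguity, and quality thresholds to a single read. It is not the script's implementation: `passes_filters` is a hypothetical name, the minimum quality score is interpreted here as a mean over the whole read (an assumption), and the primer-mismatch check is left out.

```python
def passes_filters(seq, quals, min_mean_quality=25,
                   min_length=200, max_length=1000):
    """Return True if a read passes the tutorial's default thresholds."""
    if not (min_length <= len(seq) <= max_length):
        return False  # too short or too long
    if any(base not in "ACGT" for base in seq.upper()):
        return False  # contains an ambiguous base such as N
    if not quals:
        return False  # no quality information available
    # Assumption: the quality threshold applies to the mean score.
    return sum(quals) / len(quals) >= min_mean_quality
```

A read that fails any one of these checks would simply be left out of the demultiplexed output.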
This invocation will create three files in the new directory split_library_output/:
:
This file contains the summary of splitting, including the number of reads detected for each sample and a brief