PLINK 1.9 GWAS data processing workflow.docx
- Document ID: 10518673
- Uploaded: 2023-02-17
- Format: DOCX
- Pages: 20
- Size: 28.11KB
PLINK 1.9 GWAS data processing workflow
Data management
Generate binary fileset
--make-bed
--make-bed creates a new PLINK 1 binary fileset, after applying sample/variant filters and other operations below. For example,
plink --file text_fileset --maf 0.05 --make-bed --out binary_fileset
does the following:
1. Autogenerate binary_fileset-temporary.bed + .bim + .fam. (The MAF filter has not yet been applied at this stage. See the Order of operations page for more details.)
2. Read binary_fileset-temporary.bed + .bim + .fam. Calculate MAFs. Remove all variants with MAF < 0.05 from the current analysis.
3. Generate binary_fileset.bed + .bim + .fam. Any samples/variants removed from the current analysis are also not present in this fileset. (This is the --make-bed step.)
4. Delete binary_fileset-temporary.bed + .bim + .fam.
In contrast, the fileset left behind by --keep-autoconv is just the result of step 1.
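The MAF filter in step 2 can be pictured with a short Python sketch. This is not PLINK's implementation: the genotype encoding (number of A1 alleles per diploid call, None for a missing call) and the function names are assumptions made purely for illustration.

```python
def minor_allele_freq(a1_counts):
    """Return the MAF for one diploid variant, ignoring missing calls (None)."""
    called = [g for g in a1_counts if g is not None]
    if not called:
        return 0.0
    a1_freq = sum(called) / (2 * len(called))
    return min(a1_freq, 1 - a1_freq)


def filter_by_maf(variants, threshold=0.05):
    """Keep variants whose MAF is at least `threshold` (the effect of --maf 0.05)."""
    return {vid: calls for vid, calls in variants.items()
            if minor_allele_freq(calls) >= threshold}


# Hypothetical toy data: variant ID -> A1 allele counts per sample.
variants = {
    "rs1": [0, 1, 2, 1, None],  # 4 A1 alleles out of 8 called alleles
    "rs2": [0] * 19 + [1],      # 1 A1 allele out of 40 called alleles
}
kept = filter_by_maf(variants)
print(sorted(kept))
```

Only the variants that survive this step would be written out by the --make-bed step that follows.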
--make-just-bim
--make-just-fam
--make-just-bim is a variant of --make-bed which only generates a .bim file, and --make-just-fam plays the same role for .fam files. Unlike most other PLINK commands, these do not require the main input to include a .bed file (though you won't have access to many filtering flags when using these in no-.bed mode).
Use these cautiously. It is very easy to desynchronize your binary genotype data and your .bim/.fam indexes if you use these commands improperly. If you have any doubt, stick with --make-bed.
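One cheap sanity check against the desynchronization described above is to compare the .bed file size with the size implied by the .bim and .fam line counts. The sketch below assumes the standard variant-major PLINK 1 .bed layout (3 magic bytes, then 2 bits per sample for each variant, padded to a whole byte); the helper names are hypothetical.

```python
import math
import os


def expected_bed_size(n_samples, n_variants):
    """Size in bytes of a variant-major PLINK 1 .bed file.

    Three magic bytes, then each variant stores 2 bits per sample,
    padded up to a whole number of bytes.
    """
    return 3 + n_variants * math.ceil(n_samples / 4)


def count_records(path):
    """Count non-empty lines in a .bim or .fam file."""
    with open(path, "rb") as handle:
        return sum(1 for line in handle if line.strip())


def fileset_in_sync(prefix):
    """Return True if prefix.bed has the size implied by prefix.bim/.fam."""
    n_variants = count_records(prefix + ".bim")
    n_samples = count_records(prefix + ".fam")
    return os.path.getsize(prefix + ".bed") == expected_bed_size(n_samples, n_variants)


# 5 samples need 2 bytes per variant, so 3 variants take 3 + 3 * 2 bytes.
print(expected_bed_size(n_samples=5, n_variants=3))
```

A matching size is necessary but not sufficient: reordered rows in a .bim or .fam file leave the size unchanged, so this catches only count mismatches.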
Generate text fileset
--recode <01|12> <23|A|A-transpose|AD|beagle|beagle-nomap|bimbam|bimbam-1chr|compound-genotypes|fastphase|fastphase-1chr|HV|HV-1chr|lgen|lgen-ref|list|oxford|rlist|structure|transpose|vcf|vcf-fid|vcf-iid>
--recode-allele [filename]
--recode creates a new text fileset, after applying sample/variant filters and other operations. By default, the fileset includes a .ped and a .map file, readable with --file.
∙ The '12' modifier causes A1 (usually minor) alleles to be coded as '1' and A2 alleles to be coded as '2', while '01' maps A1 → 0 and A2 → 1. (PLINK forces you to combine '01' with --{output-}missing-genotype when this is necessary to prevent missing genotypes from becoming indistinguishable from A1 calls.)
∙ The '23' modifier causes a 23andMe-formatted file to be generated. This can only be used on a single sample's data (a one-line --keep file may come in handy here). There is currently no special handling of the XY pseudo-autosomal region.
∙ The 'AD' modifier causes an additive (0/1/2) + dominant (het = 1, otherwise 0) component file, suitable for loading from R, to be generated. 'A' is the same, except without the dominance component.
  o By default, A1 alleles are counted; this can be customized with --recode-allele. --recode-allele's input file should have variant IDs in the first column and allele IDs in the second.
  o By default, the header line for .raw files only names the counted alleles. To include the alternate allele codes as well, add the 'include-alt' modifier.
  o Haploid additive components are 0/2-valued instead of 0/1-valued, to maintain a consistent scale on the X chromosome.
  See also --R.
∙ The 'A-transpose' modifier causes a variant-major additive component file to be generated. This can also be used with --recode-allele.
∙ The 'beagle' modifier causes unphased per-autosome .dat and .map files, readable by BEAGLE 3.3 and earlier, to be generated, while 'beagle-nomap' generates a single .dat file (no chromosome splitting occurs in this case).
∙ The 'bimbam' modifier causes a BIMBAM-formatted fileset to be generated. If your input data only contains one chromosome, you can use 'bimbam-1chr' instead to write a two-column .pos.txt file.
∙ If all allele codes are single-character, you can use the 'compound-genotypes' modifier to omit the space between each pair of allele codes in a single genotype call when generating a .ped + .map fileset. You will need to use the --compound-genotypes flag to load this data in PLINK 1.07, but it's not needed for PLINK 1.9.
∙ The 'fastphase' modifier causes per-chromosome fastPHASE files to be generated. If your input data only contains one chromosome, you can use 'fastphase-1chr' instead to exclude the chromosome number from the file extension.
∙ The 'HV' modifier causes a Haploview-format .ped + .info fileset to be generated per chromosome. 'HV-1chr' is analogous to 'fastphase-1chr'.
∙ The 'lgen' modifier causes a long-format fileset, loadable with --lfile, to be generated. 'lgen-ref' is equivalent to PLINK 1.07 --recode-lgen --with-reference.
∙ The 'list' modifier causes a genotype-based list to be generated. This does not produce a .fam or .map file.
∙ The 'oxford' modifier causes an Oxford-format .gen + .sample fileset to be generated. If you also include the 'gen-gz' modifier, the .gen file is gzipped.
∙ The 'rlist' modifier causes a rare-genotype fileset to be generated (similar to the 'list' format's output, but with .fam and .map files, and without homozygous major genotypes).
∙ With the 'list' and 'rlist' formats, the 'omit-nonmale-y' modifier causes nonmale genotypes to be omitted on the Y chromosome.
∙ The 'structure' modifier causes a Structure-format file to be generated.
∙ The 'transpose' modifier causes a transposed text fileset, loadable with --tfile, to be generated.
∙ The 'vcf', 'vcf-fid', and 'vcf-iid' modifiers result in production of a VCFv4.2 file. 'vcf-fid' and 'vcf-iid' cause family IDs and within-family IDs respectively to be used for the sample IDs in the last header row, while 'vcf' merges both IDs and puts an underscore between them (in this case, a warning will be given if an ID already contains an underscore).
  If the 'bgz' modifier is added, the VCF file is block-gzipped. (Gzipping of other --recode output files is not currently supported.)
  The A2 allele is saved as the reference and normally flagged as not based on a real reference genome ('PR' INFO field value). When it is important for reference alleles to be correct, you'll usually also want to include --a2-allele and --real-ref-alleles in your command.
∙ The 'tab' modifier makes the output mostly tab-delimited instead of mostly space-delimited when the format permits both delimiters. 'tabx' and 'spacex' force all tabs and all spaces, respectively. (See this page for guidelines on swapping tabs/spaces in other contexts.)
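The '12' and '01' modifiers from the list above, and the reason '01' may need a different missing-genotype code, can be sketched in Python. This is an illustration only, not PLINK code; the function name, the tuple representation of a call, and the `missing_out` parameter are assumptions.

```python
def recode_call(call, a1, a2, mode="12", missing_out="0"):
    """Recode one genotype call (a pair of allele codes) in '12' or '01' style.

    Any allele that is neither A1 nor A2 is treated as missing and written
    as `missing_out`.
    """
    if mode == "12":
        codes = {a1: "1", a2: "2"}
    elif mode == "01":
        codes = {a1: "0", a2: "1"}
    else:
        raise ValueError("mode must be '12' or '01'")
    # With '01', a missing code of '0' would look exactly like an A1 allele.
    if missing_out in codes.values():
        raise ValueError("missing-genotype code collides with an allele code")
    return tuple(codes.get(allele, missing_out) for allele in call)


print(recode_call(("A", "G"), a1="A", a2="G"))
print(recode_call(("0", "0"), a1="A", a2="G", mode="01", missing_out="N"))
```

Calling the function with mode="01" and the default missing code raises an error, which mirrors the requirement to combine '01' with a missing-genotype override when a collision would occur.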
For example,
plink --bfile binary_fileset --recode --out new_text_fileset
generates new_text_fileset.ped and new_text_fileset.map from the data in binary_fileset.bed + .bim + .fam, while
plink --bfile binary_fileset --recode vcf-iid --out new_vcf
generates new_vcf.vcf from the same data, removing family IDs in the process.
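The additive and dominant components written by the 'A' and 'AD' modifiers can be sketched as follows. This is a minimal illustration of the coding rules stated above (additive 0/1/2, dominant het = 1, haploid additive 0/2); the function name and the tuple representation of a call are assumptions, and `counted` stands for the A1 allele or whatever --recode-allele selects.

```python
def additive_dominant(call, counted, haploid=False):
    """Return (additive, dominant) for one call; None marks a missing call.

    `call` is a tuple of allele codes: two for a diploid call, one for a
    haploid call.
    """
    if call is None:
        return None, None
    if haploid:
        # Haploid calls are scaled to 0/2 so they match the diploid scale.
        return (2 if call[0] == counted else 0), 0
    additive = sum(1 for allele in call if allele == counted)
    dominant = 1 if additive == 1 else 0  # heterozygotes only
    return additive, dominant


for call in [("A", "A"), ("A", "G"), ("G", "G")]:
    print(call, additive_dominant(call, counted="A"))
print(additive_dominant(("A",), counted="A", haploid=True))
```

With 'A', only the first element of each pair would be written; 'AD' writes both.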
Irregular output coding
--output-chr [MT code]
Normally, autosomal/sex/mitochondrial chromosome codes in PLINK output files are numeric, e.g. '23' for human X. --output-chr lets you specify a different coding scheme by providing the desired human mitochondrial code; supported options are '26' (default), 'M', 'MT', '0M', 'chr26', 'chrM', and 'chrMT'. (PLINK 1.9 correctly interprets all of these encodings in input files.)
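The idea that a single mitochondrial code selects a whole naming scheme can be sketched as below. This reflects one reading of the schemes, not PLINK's source: it assumes the human codes 23 = X, 24 = Y, 25 = XY, 26 = MT, and it deliberately leaves out '0M', whose padding rules are not described in this section. The function name is hypothetical.

```python
# Assumed human numeric codes for the non-autosomal chromosomes.
HUMAN_SPECIAL = {23: "X", 24: "Y", 25: "XY", 26: "MT"}


def format_chr(code, mt_code="26"):
    """Format a numeric human chromosome code under a given MT-code scheme."""
    prefix = "chr" if mt_code.startswith("chr") else ""
    style = mt_code[len(prefix):]
    if style not in ("26", "M", "MT"):
        raise ValueError("scheme not covered by this sketch: " + mt_code)
    if style == "26" or code not in HUMAN_SPECIAL:
        return prefix + str(code)   # purely numeric scheme, or an autosome
    if code == 26:
        return prefix + style       # 'M' or 'MT', as requested
    return prefix + HUMAN_SPECIAL[code]


for scheme in ("26", "MT", "chrM"):
    print(scheme, [format_chr(c, scheme) for c in (1, 23, 25, 26)])
```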
--output-missing-genotype [char]
--output-missing-phenotype [string]
--output-missing-genotype allows you to change the character (normally the --missing-genotype value) used to represent missing genotypes in PLINK output files, while --output-missing-phenotype changes the string (normally the --missing-phenotype value) representing missing phenotypes.
Note that these flags do not affect --{b}merge/--merge-list or the autoconverters, since they generate files that may be reloaded during the same run. Add --make-bed if you want to change missing genotype/phenotype coding when performing those operations.
Set blocks of genotype calls to missing
--zero-cluster [filename]
If clusters have been defined, --zero-cluster takes a file with variant IDs in the first column and cluster IDs in the second, and sets all the corresponding genotype calls to missing. See the PLINK 1.07 documentation for an example.
This flag must now be used with --make-bed and no other output commands (since PLINK no longer keeps the entire genotype matrix in memory).
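The effect of --zero-cluster can be pictured with a small Python sketch. The in-memory layout (nested dictionaries, None for a missing call) and all names are assumptions chosen for illustration; PLINK itself streams the data rather than holding the matrix in memory.

```python
def zero_cluster(genotypes, clusters, zero_pairs):
    """Set calls to missing for every listed (variant ID, cluster ID) pair.

    genotypes:  {variant_id: {sample_id: call}}
    clusters:   {sample_id: cluster_id}
    zero_pairs: iterable of (variant_id, cluster_id), as in the input file
    """
    targets = set(zero_pairs)
    return {
        vid: {
            sid: (None if (vid, clusters.get(sid)) in targets else call)
            for sid, call in calls.items()
        }
        for vid, calls in genotypes.items()
    }


# Hypothetical toy data: two variants, three samples in two clusters.
genotypes = {
    "rs1": {"s1": "AA", "s2": "AG", "s3": "GG"},
    "rs2": {"s1": "CC", "s2": "CT", "s3": "TT"},
}
clusters = {"s1": "batch1", "s2": "batch1", "s3": "batch2"}
zeroed = zero_cluster(genotypes, clusters, [("rs1", "batch1")])
print(zeroed["rs1"])
print(zeroed["rs2"])
```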
Heterozygous haploid errors
--set-hh-missing
Normally, heterozygous haploid and nonmale Y chromosome genotype calls are logged to plink.hh and treated as missing by all analysis commands, but left undisturbed by --make-bed and --recode (since, once gender and/or chromosome code errors have been fixed, the calls are often valid). If you actually want --make-bed/--recode to erase this information, use --set-hh-missing. (The scope of this flag is a bit wider than for PLINK 1.07, since commands like --list and --recode-rlist which previously did not respect --set-hh-missing have been consolidated under --recode.)
Note that the most common source of heterozygous haploid errors is imported data which doesn't follow PLINK's convention for representing the X chromosome pseudo-autosomal region. This should be addressed with --split-x below, not --set-hh-missing.
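A rough Python sketch of which calls fall into the heterozygous haploid / nonmale Y category, and what erasing them means, is shown below. It is a simplification under stated assumptions: sex uses the .fam coding (1 = male, 2 = female, 0 = unknown), chromosomes are given as 'X' and 'Y' labels, pseudo-autosomal variants are assumed to be already on a separate 'XY' chromosome, and the function names are hypothetical.

```python
def is_hh_error(chrom, sex, call):
    """Return True for a heterozygous haploid or nonmale Y genotype call.

    call is a pair of allele codes, or None if already missing.
    """
    if call is None:
        return False
    het = call[0] != call[1]
    if chrom == "X":
        return sex == 1 and het   # males carry a single X
    if chrom == "Y":
        return sex != 1 or het    # any nonmale Y call, or a het male Y call
    return False


def erase_hh(chrom, sex, call):
    """Mimic the effect of --set-hh-missing on a single call."""
    return None if is_hh_error(chrom, sex, call) else call


print(erase_hh("X", 1, ("A", "G")))   # male X heterozygote
print(erase_hh("X", 2, ("A", "G")))   # female X heterozygote
print(erase_hh("Y", 2, ("C", "C")))   # nonmale Y call
```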
--set-mixed-mt-missing
Mitochondrial DNA is subject to heteroplasmy, so PLINK 1.9 permits 'heterozygous' genotypes and treats MT more like a diploid than a haploid chromosome. However, some analytical methods don't use mixed MT genotype calls, and instead assume that no 'heterozygous' MT calls exist. The --set-mixed-mt-missing flag can be used with --make-bed/--recode to export a dataset with mixed MT calls erased.
X chromosome pseudo-autosomal region
--split-x [last bp position of head] [first bp position of tail]
--split-x [build code]
--merge-x
PLINK prefers to represent the X chromosome's pseudo-autosomal region as a separate 'XY' chromosome (numeric code 25 in humans); this removes the need for special handling of male X heterozygous calls. However, this convention has not been widely adopted, and as a consequence, heterozygous haploid 'errors' a