Uploaded: 2023-07-20
Gene Family Analysis Workflow (Part 1)

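In recent years, falling sequencing costs have led to more and more genomes being sequenced, creating a large pool of usable resources in public databases.

How can these resources be put to use?

Today we introduce an approach that lets you publish without doing any sequencing yourself: genome-wide identification and analysis of gene family members (currently a very popular area).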

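I. Basic analysis content

⏹ Database retrieval and member identification

⏹ Phylogenetic tree construction

⏹ Conserved domain and motif analysis

⏹ Gene structure analysis

⏹ Expression analysis by transcriptome or quantitative real-time PCR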

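II. Database retrieval and member identification

1. Database retrieval

1) First learn how the databases work, and learn how to download the genome data for the species you want to analyze.

The databases commonly used are:

⏹ Brachypodium db:

⏹ Rice Genome Annotation Project: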

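2) Obtaining already identified family members.

How do you obtain all published members of a gene family in another species? The simplest way is to download that species' protein sequence file (available from the databases above) and then pick out the corresponding members using the IDs given in the paper.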
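Picking members out of a downloaded protein file by ID can be scripted. The following is a minimal sketch, assuming the file is in FASTA format and that the ID is the first word after ">" on each header line; the gene IDs and sequences are made-up placeholders, not real database entries.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict of {sequence ID: sequence}."""
    records = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the ID is the first whitespace-delimited word of the header.
            current_id = line[1:].split()[0]
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def select_members(records, wanted_ids):
    """Return the requested sequences plus the IDs that were not found."""
    found = {i: records[i] for i in wanted_ids if i in records}
    missing = [i for i in wanted_ids if i not in records]
    return found, missing


# Hypothetical protein file content and ID list taken from a paper.
fasta_text = """>Gene001.1 protein kinase
MKKLV
AAGT
>Gene002.1 hypothetical protein
MSTNP
"""
records = read_fasta(fasta_text.splitlines())
found, missing = select_members(records, ["Gene001.1", "Gene999.1"])
print(found)
print("not found:", missing)
```

Reporting the IDs that were not found is useful because gene IDs often change between annotation releases.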

For species without a genome-wide identification, search the following databases:

a. NCBI: nucleotide and protein db.

2. Alignment tools.

BLAST and HMMER are generally used. The commands are as follows:

⏹ Local BLAST

formatdb -i db.fas -p F/T;

blastall -p blastp (or else) -i known.fas -d db.fas -m 8 -b 2 (or else) -e 1e-5 -o alignresult.txt.

-p (formatdb): T for a protein database, F for a nucleotide database.

-b: report at most two different subject sequences (db) per query.

⏹ HMMER (hidden Markov model) search. It serves the same function as PSI-BLAST. It has higher sensitivity, but it is slower.

Command:

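3. Filtering.

⏹ Identity: at least 50%.

⏹ Coverage: also above 50%, or at least the length of the protein domain.

⏹ EST support.

⏹ Detected by both BLAST and HMMER.

4. The steps above yield all members of the family.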
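The identity, coverage and "found by both tools" filters can be applied with a short script. This is only a sketch, assuming the BLAST result is in the 12-column tabular format produced by `-m 8`, that coverage is measured on the query, and that the HMMER hits are already available as a set of IDs; the hit table and gene names below are invented for illustration.

```python
import csv
import io


def filter_blast_hits(m8_text, query_lengths, min_identity=50.0, min_coverage=0.5):
    """Return subject IDs with a hit passing the identity and coverage cutoffs."""
    kept = set()
    for row in csv.reader(io.StringIO(m8_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        identity = float(row[2])                 # column 3: percent identity
        q_start, q_end = int(row[6]), int(row[7])  # columns 7-8: query start/end
        coverage = (abs(q_end - q_start) + 1) / query_lengths[query]
        if identity >= min_identity and coverage >= min_coverage:
            kept.add(subject)
    return kept


# Invented example: one known member searched against three candidate genes.
m8 = (
    "known1\tgeneA\t62.5\t200\t75\t0\t1\t200\t5\t204\t1e-60\t250\n"
    "known1\tgeneB\t45.0\t180\t99\t0\t1\t180\t1\t180\t1e-20\t120\n"
    "known1\tgeneC\t80.0\t50\t10\t0\t1\t50\t1\t50\t1e-10\t80\n"
)
query_lengths = {"known1": 220}
blast_hits = filter_blast_hits(m8, query_lengths)

hmmer_hits = {"geneA", "geneD"}        # assumed to come from a separate HMMER search
candidates = blast_hits & hmmer_hits   # keep only genes detected by both tools
print(sorted(candidates))
```

EST support is not covered here because it needs a separate data source.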

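Gene Family Analysis Workflow (Part 2)

This part covers the evolutionary analyses found in gene family papers, mainly the construction and analysis of phylogenetic trees.

I. Basic steps for building a phylogenetic tree

1. Multiple sequence alignment. MUSCLE program.

3. Algorithm selection. Three options: NJ, ML and BI.

4. Software selection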

II. Detailed steps

2.1 Multiple sequence alignment. MUSCLE is generally used, because MUSCLE is one of the best-performing multiple alignment programs according to published benchmark tests, with accuracy and speed that are consistently better than CLUSTALW.

2.2 Model selection. When building a tree from protein sequences, the following command can be used:

java -Xmx250m -classpath path/ProtTest.jar prottest.ProtTest -i alignmfile.phy

The output is shown in the figure below.

Notes:

1) ".phy" format: names may be at most ten characters long, and no two names may be identical.

2) AIC: Akaike Information Criterion framework.

3) Gamma distribution parameter (G): gamma shape.

4) Proportion of invariable sites: I.
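The ten-character, no-duplicates rule for ".phy" names is easy to violate when gene IDs are long. The sketch below shows one way to shorten names while keeping them unique and to write a sequential PHYLIP text; the numbering scheme for clashes and the single space between name and sequence are assumptions of this sketch, so check what your downstream program expects.

```python
def make_phylip_names(names, width=10):
    """Map each original name to a unique name of at most `width` characters."""
    mapping = {}
    used = set()
    for name in names:
        candidate = name[:width]
        counter = 1
        while candidate in used:
            # On a clash, replace the tail of the name with a running number.
            suffix = str(counter)
            candidate = name[: width - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        mapping[name] = candidate
    return mapping


def write_phylip(alignment, width=10):
    """Format an {name: aligned sequence} dict as sequential PHYLIP text."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    names = make_phylip_names(list(alignment), width)
    lines = [f" {len(alignment)} {lengths.pop()}"]
    for original, seq in alignment.items():
        lines.append(f"{names[original]:<{width}} {seq}")
    return "\n".join(lines) + "\n", names


# Hypothetical aligned sequences with names longer than ten characters.
alignment = {
    "BdMAPKKK_long_name_1": "MK-LV",
    "BdMAPKKK_long_name_2": "MKALV",
}
phylip_text, name_map = write_phylip(alignment)
print(phylip_text)
print(name_map)
```

Keeping the returned name map makes it possible to restore the original IDs on the finished tree.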

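2.3 Building the phylogenetic tree

2.3.1 Purpose:

a. Cluster analysis, such as subfamily classification. For example, a phylogenetic tree clearly divides the MAPKKK gene family into three subfamilies: MEKK, Raf and ZIK.

b. Identifying evolutionary relationships. Members on the same branch of the tree are usually closely related.

c. Gene family duplication analysis. When studying duplication events in a gene family, the commonly used criteria for the two types of duplication event are:

Tandem duplication: identity and coverage both above 70%, and tightly linked (Holub, 2001).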
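The tandem duplication criterion can be checked programmatically once pairwise identity and coverage are known. In this sketch, "tightly linked" is interpreted as two genes on the same chromosome separated by at most `max_gap` positions in gene order; that interpretation, the `Gene` record and all values are assumptions for illustration, not part of the cited criterion.

```python
from collections import namedtuple

# order = rank of the gene along its chromosome (1st gene, 2nd gene, ...)
Gene = namedtuple("Gene", "gene_id chrom order")


def find_tandem_pairs(genes, pair_stats, min_identity=70.0, min_coverage=70.0, max_gap=1):
    """Return gene pairs that pass the identity, coverage and linkage tests."""
    by_id = {g.gene_id: g for g in genes}
    pairs = []
    for (a, b), (identity, coverage) in pair_stats.items():
        ga, gb = by_id[a], by_id[b]
        linked = ga.chrom == gb.chrom and abs(ga.order - gb.order) <= max_gap
        if linked and identity > min_identity and coverage > min_coverage:
            pairs.append((a, b))
    return sorted(pairs)


# Invented family members and pairwise (identity %, coverage %) values.
genes = [
    Gene("g1", "chr1", 10),
    Gene("g2", "chr1", 11),
    Gene("g3", "chr1", 40),
    Gene("g4", "chr2", 5),
]
pair_stats = {
    ("g1", "g2"): (85.0, 92.0),   # similar and adjacent
    ("g1", "g3"): (90.0, 95.0),   # similar but far apart
    ("g2", "g4"): (88.0, 90.0),   # similar but on different chromosomes
    ("g3", "g4"): (60.0, 80.0),   # identity too low
}
tandem = find_tandem_pairs(genes, pair_stats)
print(tandem)
```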

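2.3.2 Phylogenetic tree. ML trees are generally more accurate, but they should be cross-validated against other methods, such as an NJ tree.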

2.3.3 Evolutionary analysis: Ka/Ks calculation

a. ParaAT: ParaAT.pl -h test.homologs -n test.cds -a test.pep -p proc -f axt -k -o output

c. Divergence time (T) calculation: T = Ks / (2λ). λ: mean 5.1-7.1 × 10^-9.
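As a worked example of T = Ks / (2λ), the sketch below evaluates the formula at both ends of the quoted λ range. The Ks value is invented, and reading λ as substitutions per synonymous site per year (so that T comes out in years) is an assumption, since the source does not state the unit.

```python
def divergence_time(ks, rate):
    """Return T = Ks / (2 * rate); in years if rate is per site per year."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate)


ks = 0.61  # hypothetical Ks of one duplicated gene pair
youngest = divergence_time(ks, 7.1e-9)  # faster rate gives a younger estimate
oldest = divergence_time(ks, 5.1e-9)    # slower rate gives an older estimate
print(f"T between {youngest / 1e6:.1f} and {oldest / 1e6:.1f} million years")
```

Reporting the range rather than a single value makes the dependence on the chosen rate visible.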

d. Meaning of Ka/Ks:

Ka/Ks = 1: neutral evolution.

Ka/Ks < 1: purifying (negative) selection.

Ka/Ks > 1: positive selection. Positively selected genes produce mutations with a fitness advantage and evolve new functions.
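The interpretation of Ka/Ks can be turned into a small helper for annotating a table of gene pairs. This is a sketch only: the `tolerance` parameter for calling a ratio "equal to 1" and the handling of Ks = 0 are choices made here, not rules from the text.

```python
def classify_selection(ka, ks, tolerance=0.0):
    """Return (Ka/Ks ratio, selection label) for one gene pair."""
    if ks <= 0:
        return None, "undefined"  # the ratio cannot be computed without Ks > 0
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return ratio, "neutral evolution"
    if ratio < 1.0:
        return ratio, "purifying selection"
    return ratio, "positive selection"


# Invented Ka and Ks values for three gene pairs.
for ka, ks in [(0.05, 0.5), (0.6, 0.3), (0.2, 0.2)]:
    ratio, label = classify_selection(ka, ks)
    print(f"Ka={ka} Ks={ks} Ka/Ks={ratio:.2f} {label}")
```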

Gene Family Analysis Workflow (Part 3)

This part covers gene structure analysis.

1. Motif analysis

Use the MEME software with the following command:

meme sample.fa -dna -revcomp -nmotifs 10 -mod zoops -minw 6 -maxw 50 > meme_htmlFormat.html

2. Gene structure diagram

Usage:

Results:

3. Common gene structure statistics:

Compile these yourself in Excel or with a script.

a. The number of introns and exons.

b. The intron splicing pattern, including phases 0, 1 and 2.

c. The marked region, for example the kinase domain.

d. Sequence length.

e. UTR.
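Counting exons and introns and deriving intron phases can be done with a short script instead of Excel. The sketch below assumes the input is the list of CDS segment coordinates of one transcript (1-based, inclusive, as in GFF files) and defines the phase of an intron as the number of coding bases before it, modulo 3; it counts coding exons only, so UTR-only exons are not included. The coordinates are invented.

```python
def gene_structure_stats(cds_segments, strand="+"):
    """Summarize one transcript from its CDS segments [(start, end), ...]."""
    # Order segments in the direction of transcription.
    ordered = sorted(cds_segments, reverse=(strand == "-"))
    lengths = [end - start + 1 for start, end in ordered]
    phases = []
    coding_bases = 0
    for length in lengths[:-1]:
        coding_bases += length
        # Phase 0: intron falls between codons; 1 or 2: intron splits a codon.
        phases.append(coding_bases % 3)
    return {
        "n_cds_exons": len(ordered),
        "n_introns": max(len(ordered) - 1, 0),
        "intron_phases": phases,
        "cds_length": sum(lengths),
    }


# Hypothetical transcript with three coding exons.
segments = [(101, 190), (301, 400), (501, 610)]
print(gene_structure_stats(segments, strand="+"))
print(gene_structure_stats(segments, strand="-"))
```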

4. Promoter analysis.

Website:

Mainly for plants:

Notes:

a. IE browser.

b. Only one sequence can be submitted per search, and its length is limited to 1000 bp.

c. DNA sequence origin: the 1000 or 1500 bp upstream of the ATG of a gene.
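Extracting the region upstream of the ATG has to take the strand into account. The following sketch assumes the chromosome sequence is held in memory as a string and that `atg_pos` is the 1-based forward-strand coordinate of the A of the start codon (for a minus-strand gene this is the highest coordinate of the codon); the function name and the short test sequences are made up for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_of_atg(chrom_seq, atg_pos, strand, length=1000):
    """Return up to `length` bp upstream of the start codon, 5'->3' on the gene strand."""
    if strand == "+":
        start = max(0, atg_pos - 1 - length)
        return chrom_seq[start:atg_pos - 1]
    if strand == "-":
        end = min(len(chrom_seq), atg_pos + length)
        return reverse_complement(chrom_seq[atg_pos:end])
    raise ValueError("strand must be '+' or '-'")


# Toy example: the same gene placed on the plus and on the minus strand.
plus_chrom = "TTGACAATGGCC"                      # ATG starts at position 7
minus_chrom = reverse_complement(plus_chrom)     # GGCCATTGTCAA, A of ATG at position 6
print(upstream_of_atg(plus_chrom, 7, "+", length=4))
print(upstream_of_atg(minus_chrom, 6, "-", length=4))
```

With `length=1000` the output fits the 1000 bp limit per search mentioned in the notes above.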

Analysis results:

Gene Family Analysis Workflow (Part 4)

I. Websites for downloading raw transcriptome and microarray data

1. GEO.

Usage is shown in the figure below.

GEO data ID naming rules: GPL -> GSE -> GSM.

GPL: platform.

GSE: series; one platform can have multiple series.

GSM: sample; one series can have multiple samples.

GDS ≈ GSE. The difference is that data labeled GDS can be analyzed online for a single gene, which is simple and easy.

Data generated on the same GPL can be compared across experiments.
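The accession prefixes and the "same GPL" rule can be checked in code before samples are combined. This is a minimal sketch; the accession numbers are placeholders rather than real GEO records, and the sample-to-platform table is assumed to have been compiled by hand from the GEO pages.

```python
import re
from collections import defaultdict

GEO_TYPES = {"GPL": "platform", "GSE": "series", "GSM": "sample", "GDS": "dataset"}
_ACCESSION = re.compile(r"^(GPL|GSE|GSM|GDS)(\d+)$")


def classify_accession(accession):
    """Return the GEO record type for an accession such as 'GSM123'."""
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO accession: {accession!r}")
    return GEO_TYPES[match.group(1)]


def group_samples_by_platform(sample_to_platform):
    """Group sample accessions by platform; only samples within a group are compared."""
    groups = defaultdict(list)
    for gsm, gpl in sample_to_platform.items():
        if classify_accession(gsm) != "sample" or classify_accession(gpl) != "platform":
            raise ValueError(f"expected GSM -> GPL, got {gsm} -> {gpl}")
        groups[gpl].append(gsm)
    return {gpl: sorted(gsms) for gpl, gsms in groups.items()}


# Placeholder accessions, not real records.
samples = {"GSM101": "GPL10", "GSM102": "GPL10", "GSM201": "GPL20"}
print(classify_accession("GSE5"))
print(group_samples_by_platform(samples))
```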

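The following shows how to analyze transcriptome data online:

2.

Data are downloaded from this database as follows:

3.

Data are downloaded from this database as follows; note the user name and password!

4.

5. DRA db ()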

II. Data processing

Raw data must be processed before any downstream analysis.

1. Microarray data.

The raw data are in ".cel" format. Taking Affymetrix microarray data as an example, the main commands are:

> library(affy); library(makecdfenv)

> library(limma)  # Provides lmFit(), eBayes(), topTable() and the other functions used below.

> # library……

> mydata <- ReadAffy()  # Reads the ".cel" files to be analyzed (by default, all CEL files in the working directory).

> eset <- rma(mydata)

> write.exprs(eset, file="mydata.txt")

> design <- model.matrix(~ -1 + factor(c(1,1,2,2,3,3)))  # Creates appropriate design matrix.

> colnames(design) <- c("group1", "group2", "group3")  # Assigns column names.

> fit <- lmFit(eset, design)  # Fits a linear model for each gene based on the given series of arrays.

> contrast.matrix <- makeContrasts(group2-group1, group3-group2, group3-group1, levels=design)  # Creates appropriate contrast matrix to perform all pairwise comparisons.

> fit2 <- contrasts.fit(fit, contrast.matrix)  # Computes estimated coefficients and standard errors for a given set of contrasts.

> fit2 <- eBayes(fit2)  # Computes moderated t-statistics and log-odds of differential expression by empirical Bayes.

> topTable(fit2, coef=1, adjust="fdr", sort.by="B", number=10)  # Generates list of top 10 ('number=10') differentially expressed genes sorted by B-values ('sort.by=B') for first comparison group.

> write.table(topTable(fit2, coef=1, adjust="fdr", sort.by="B", number=500), file="limma_complete.xls", row.names=F, sep="\t")  # Exports complete limma statistics table for first comparison group.

> results <- decideTests(fit2, p.value=0.05); vennDiagram(results)

2. Transcriptome data processing.

The raw data are in sra or fastq format. SRA files can be converted to fastq and then processed with the commands below.

1) Obtain clean data:

fastx_clipper: clip adapters.

fastq_quality_filter: base quality control.

fastq_quality_trimmer: trim low-quality bases from the 3' end of reads.

2) Calculate RPKM.

bowtie2-build path/db.seq path/db

tophat db read.fastq

bam_filter path/accepted_hits.bam

samtools view -h -o output_uniq.sam output_uniq.bam

Use Excel for the calculation (genes with low read counts, ≤5 reads, were omitted).
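The RPKM calculation done in Excel can also be scripted. The sketch below uses the standard definition RPKM = reads × 10^9 / (gene length in bp × total mapped reads) and drops genes with five or fewer reads, as described above; the function name, gene names, counts and lengths are invented for illustration.

```python
def rpkm_table(read_counts, gene_lengths, total_mapped, min_reads=6):
    """Return {gene: RPKM} for genes with at least `min_reads` mapped reads."""
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    table = {}
    for gene, count in read_counts.items():
        if count < min_reads:
            continue  # low-frequency genes (<= 5 reads) are omitted
        table[gene] = count * 1e9 / (gene_lengths[gene] * total_mapped)
    return table


# Invented uniquely mapped read counts and gene lengths (bp).
read_counts = {"g1": 500, "g2": 1500, "g3": 3}
gene_lengths = {"g1": 1000, "g2": 3000, "g3": 2000}
values = rpkm_table(read_counts, gene_lengths, total_mapped=2_000_000)
print(values)
```

Here g1 and g2 end up with the same RPKM even though g2 has three times as many reads, because g2 is three times as long.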

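3) Differentially expressed genes.

Find the family members that are differentially expressed and infer their possible functions. Either of the following two strategies can be used.

a. Fold-change method. For gene family analysis, the fold-change method with a two-fold cutoff can be used to obtain up-regulated and down-regulated genes.

b. CV value. This measures how much the expression of a member changes across treatments. CV = SD/mean. It is used when analyzing different tissues or organs.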
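Both strategies are simple enough to compute directly. The sketch below applies a two-fold cutoff in both directions and computes CV = SD/mean; using the sample standard deviation and treating non-positive expression values as "undefined" are choices made for this sketch, and the expression values are invented.

```python
import statistics


def fold_change_call(control, treated, threshold=2.0):
    """Label a gene as up, down or unchanged using a fold-change cutoff."""
    if control <= 0 or treated <= 0:
        return "undefined"  # a ratio is not meaningful without positive values
    ratio = treated / control
    if ratio >= threshold:
        return "up"
    if ratio <= 1.0 / threshold:
        return "down"
    return "unchanged"


def coefficient_of_variation(values):
    """Return CV = SD / mean, using the sample standard deviation."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean expression is zero")
    return statistics.stdev(values) / mean


# Invented RPKM values: control vs. treatment, and one gene across three tissues.
print(fold_change_call(10.0, 25.0))
print(fold_change_call(10.0, 4.0))
print(coefficient_of_variation([10.0, 20.0, 30.0]))
```

A high CV across tissues points to tissue-specific expression, while a low CV suggests a broadly and evenly expressed member.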
