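2007 Beijing University of Technology Mathematical Modeling Contest, Preliminary Round, Problem B:
Processing Laboratory Test Results: A Solution
Abstract:
This paper applies two methods, distance discrimination and Fisher discrimination, to decide with reasonable accuracy whether an individual has nephritis from the element content measured in the body.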
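1. Statement of the problem
When people visit a hospital, a number of indicators are usually tested to support the doctor's diagnosis.
To diagnose nephritis, the content of various elements in the body is usually measured.
Table B.1 lists test results for confirmed cases: cases 1-30 are confirmed nephritis patients, and cases 31-60 are confirmed healthy people.
Table B.2 lists the test results of the people seeking a diagnosis.
The questions are:
1) Using the data in Table B.1, propose one or more simple methods for deciding whether a person is a patient or healthy, and check the correctness of the proposed method(s).
2) Apply the method(s) from 1) to the test results of the 30 people in Table B.2 and decide whether each is a nephritis patient or healthy.
3) From the characteristics of the data in Table B.1, can the key or main indicators affecting nephritis be identified, so that fewer indicators need to be tested?
4) Repeat the work of 2) using the result of 3).
5) Analyze the results of 2) and 4) further.
(The tables are in the appendix.)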
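2. Problem analysis
1) Table B.1 gives 30 data sets each for confirmed nephritis patients and confirmed healthy people;
2) each data set contains seven numbers, the amounts of Zn, Cu, Fe, Ca, Mg, K and Na in that person's body;
3) Question 1 asks for a method of deciding whether a person is a patient or healthy. This requires analyzing the 60 data sets to find how the element amounts differ between healthy people and nephritis patients; the size of these differences is also the main input to Question 3;
4) to look for differences in the data we used the conventional approach of computing means and variances, tabulating them in Excel and plotting histograms in MATLAB;
5) the most reliable approach to Question 2 is discriminant analysis, which requires some programming and data handling in R;
6) Question 4 builds on Question 3: once the elements that matter most for nephritis are identified, classification can be restricted to those elements, which avoids some tedious steps;
7) with the above solved, we use the same method as in step 5), again in R, to re-analyze the reduced data efficiently, and then compare the results of Questions 2 and 4 in detail;
8) to further check that the approach is reasonable, we also wrote a C program that compares the data in Table B.2 with the element means obtained in step 4), as a direct visual check.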
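3. Notation
Suffix 1: element content in a patient (for example, Zn1 is the Zn content in a patient);
Suffix 2: element content in a healthy person (for example, Zn2 is the Zn content in a healthy person);
1: patient;
2: healthy person.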
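4. Model assumptions
1) The information and data given in the problem are true and reliable;
2) elements other than those listed in the tables have little effect on whether a person develops nephritis;
3) the effect of external conditions on nephritis patients is ignored;
4) every individual without the disease is healthy.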
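5. Model construction
The key question is how to decide whether a person is ill or healthy. This is a classification problem and can be handled by discriminant analysis.
There are only two classes, patients and healthy people, so a two-population discriminant method applies.
We first consider a simple and intuitive method, Mahalanobis distance discrimination.
From the two training samples we compute the mean vectors and covariance matrices, find the Mahalanobis distance from a test sample x to each population, and take the difference of the two distances to decide which population x is closer to.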
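Let x and y be samples drawn from a population A with mean $\mu$ and covariance matrix $\Sigma$. The Mahalanobis distance between two points x and y of A is defined as
$$d(x,y)=\sqrt{(x-y)^{T}\Sigma^{-1}(x-y)},$$
and the Mahalanobis distance from a sample x to A as
$$d(x,A)=\sqrt{(x-\mu)^{T}\Sigma^{-1}(x-\mu)}.$$
In practice, the population mean vectors and covariance matrices are replaced by the sample mean vectors and covariance matrices.
Let $x_1^{(1)},x_2^{(1)},\dots,x_{n_1}^{(1)}$ be $n_1$ samples from population A and $x_1^{(2)},x_2^{(2)},\dots,x_{n_2}^{(2)}$ be $n_2$ samples from population B. The sample means and covariances are
$$\bar{x}^{(i)}=\frac{1}{n_i}\sum_{j=1}^{n_i}x_j^{(i)},\qquad S_i=\frac{1}{n_i-1}\sum_{j=1}^{n_i}\left(x_j^{(i)}-\bar{x}^{(i)}\right)\left(x_j^{(i)}-\bar{x}^{(i)}\right)^{T},\qquad i=1,2.$$
Write $\mu_1,\Sigma_1$ for the mean and covariance matrix of A and $\mu_2,\Sigma_2$ for those of B. For a test sample x, if the two populations have the same covariance matrix $\Sigma_1=\Sigma_2=\Sigma$, then from
$$d^{2}(x,B)-d^{2}(x,A)=2\,(x-\bar{\mu})^{T}\Sigma^{-1}(\mu_1-\mu_2)$$
we obtain the discriminant function
$$w(x)=(x-\bar{\mu})^{T}\Sigma^{-1}(\mu_1-\mu_2),\qquad\text{where }\bar{\mu}=\frac{\mu_1+\mu_2}{2},$$
with the decision rule
$$x\in A\ \text{if } w(x)>0,\qquad x\in B\ \text{if } w(x)\le 0.$$
If the two populations have different covariance matrices, that is $\Sigma_1\neq\Sigma_2$, the discriminant function for a sample x is defined as
$$w(x)=(x-\mu_2)^{T}\Sigma_2^{-1}(x-\mu_2)-(x-\mu_1)^{T}\Sigma_1^{-1}(x-\mu_1),$$
with the same decision rule: $x\in A$ if $w(x)>0$, and $x\in B$ otherwise.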
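To make the unequal-covariance rule concrete, here is a minimal R sketch on toy two-dimensional data. The matrices `grp1` and `grp2` and the point `x` are hypothetical values chosen for illustration, not data from Table B.1. The sketch relies on `mahalanobis()` from R's stats package, which returns the squared distance, and checks that it agrees with the same quantity written out with `solve()`.

```r
# Toy 2-D training samples (hypothetical values, not taken from Table B.1)
grp1 <- matrix(c(1.0, 2.0,
                 2.0, 1.0,
                 3.0, 3.5,
                 2.5, 2.0), ncol = 2, byrow = TRUE)
grp2 <- matrix(c(6.0, 7.0,
                 7.0, 5.5,
                 8.0, 8.0,
                 6.5, 6.0), ncol = 2, byrow = TRUE)
x <- c(3.0, 3.0)   # sample to classify

# Sample mean vectors and covariance matrices of the two groups
mu1 <- colMeans(grp1); mu2 <- colMeans(grp2)
S1 <- var(grp1); S2 <- var(grp2)

# Squared Mahalanobis distances from x to each group
d2_1 <- mahalanobis(x, center = mu1, cov = S1)
d2_2 <- mahalanobis(x, center = mu2, cov = S2)

# Discriminant value: squared distance to group 2 minus squared distance to group 1
w <- d2_2 - d2_1
label <- if (w > 0) 1 else 2   # 1 = closer to group 1, 2 = closer to group 2

# The same quantity written out with solve(), without forming an explicit inverse
w_manual <- drop((x - mu2) %*% solve(S2, x - mu2) -
                 (x - mu1) %*% solve(S1, x - mu1))
print(c(w = w, w_manual = w_manual, label = label))
```

The point `x` lies much nearer the first group, so `w` is positive and the sample is assigned label 1.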
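We next consider a second method, Fisher discrimination, which looks for a discriminant function under the criterion that the within-class variance be as small as possible and the between-class variance as large as possible.
Let the two populations A and B have means and covariance matrices $\mu_1$, $\mu_2$ and $\Sigma_1$, $\Sigma_2$. For any test sample x, let its discriminant function be $u(x)$, and write
$$u_1=E[u(x)\mid x\in A],\quad u_2=E[u(x)\mid x\in B],\quad \sigma_1^{2}=\mathrm{Var}[u(x)\mid x\in A],\quad \sigma_2^{2}=\mathrm{Var}[u(x)\mid x\in B].$$
We want $u(x)$ to make the within-class sum of squared deviations
$$W_0=\sigma_1^{2}+\sigma_2^{2}$$
as small as possible and the between-class sum of squared deviations
$$B_0=(u_1-\bar{u})^{2}+(u_2-\bar{u})^{2}$$
as large as possible, where $\bar{u}=(u_1+u_2)/2$.
That is, $u(x)$ should maximize
$$I=\frac{B_0}{W_0}.$$
If $|u(x)-u_1|\le|u(x)-u_2|$ then $x\in A$; otherwise $x\in B$.
The derivation yields the discriminant function
$$W(x)=(x-\bar{x})^{T}S^{-1}\left(\bar{x}^{(2)}-\bar{x}^{(1)}\right),$$
where, with $\bar{x}^{(1)},\bar{x}^{(2)}$ the sample means, $S_1,S_2$ the sample covariance matrices and $n_1,n_2$ the sizes of the two training samples,
$$\bar{x}=\frac{n_1}{n_1+n_2}\bar{x}^{(1)}+\frac{n_2}{n_1+n_2}\bar{x}^{(2)},\qquad S=(n_1-1)S_1+(n_2-1)S_2.$$
When $W(x)\le 0$, $x\in A$; otherwise $x\in B$.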
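The step from the Fisher criterion to a linear discriminant direction can be sketched as follows. This is an illustrative derivation, not the authors' own working. It assumes a linear discriminant $u(x)=a^{T}x$, takes $\bar{u}=(u_1+u_2)/2$, and introduces the shorthand $d=\mu_2-\mu_1$ and $\Sigma_w=\Sigma_1+\Sigma_2$; none of these symbols appear in the original text.

```latex
% Assumption: linear discriminant u(x) = a^T x, so that
%   u_i = a^T \mu_i  and  \sigma_i^2 = a^T \Sigma_i a .
\begin{aligned}
B_0 &= (u_1-\bar{u})^2 + (u_2-\bar{u})^2
     = \tfrac{1}{2}(u_2-u_1)^2
     = \tfrac{1}{2}\,(a^{T}d)^{2}, \\
W_0 &= \sigma_1^{2}+\sigma_2^{2} = a^{T}\Sigma_w a, \\
I(a) &= \frac{B_0}{W_0} = \frac{(a^{T}d)^{2}}{2\,a^{T}\Sigma_w a}.
\end{aligned}

% Cauchy--Schwarz applied to \Sigma_w^{1/2} a and \Sigma_w^{-1/2} d:
(a^{T}d)^{2} \le \left(a^{T}\Sigma_w a\right)\left(d^{T}\Sigma_w^{-1}d\right),
\qquad \text{with equality iff } a \propto \Sigma_w^{-1} d .

% Hence the maximizing direction is
a^{*} = c\,\Sigma_w^{-1}(\mu_2-\mu_1), \qquad c \neq 0 .
```

With equal sample sizes, as in Table B.1 (30 patients and 30 healthy people), the matrix $(n_1-1)S_1+(n_2-1)S_2$ is proportional to $S_1+S_2$, so replacing $\Sigma_w$ by it changes only the scale of $a^{*}$ and leaves the sign of the discriminant value unchanged.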
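6. Model solution
To solve the model, both algorithms were coded in R; the samples were entered by hand and the computation was done by computer. The program listings follow:
Mahalanobis distance discrimination: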
A<-matrix(c(166,15.8,24.5,700,112,179,513,185,15.7,31.5,701,125,184,427,193,9.80,25.9,541,163,128,642,159,14.2,39.7,896,99.2,239,726,226,16.2,23.8,606,152,70.3,218,171,9.29,9.29,307,187,45.5,257,201,13.3,26.6,551,101,49.4,141,147,14.5,30.0,659,102,154,680,172,8.85,7.86,551,75.7,98.4,318,156,11.5,32.5,639,107,103,552,132,15.9,17.7,578,92.4,1314,1372,182,11.3,11.3,767,111,264,672,186,9.26,37.1,958,233,73.0,347,162,8.23,27.1,625,108,62.4,465,150,6.63,21.0,627,140,179,639,159,10.7,11.7,612,190,98.5,390,117,16.1,7.04,988,95.5,136,572,181,10.1,4.04,1437,184,101,542,146,20.7,23.8,1232,128,150,1092,42.3,10.3,9.70,629,93.7,439,888,28.2,12.4,53.1,370,44.1,454,852,154,13.8,53.3,621,105,160,723,179,12.2,17.9,1139,150,45.2,218,13.5,3.36,16.8,135,32.6,51.6,182,175,5.84,24.9,807,123,55.6,126,113,15.8,47.3,626,53.6,168,627,50.5,11.6,6.30,608,58.9,58.9,139,78.6,14.6,9.70,421,70.8,133,464,90.0,3.27,8.17,622,52.3,770,852,178,28.8,32.4,992,112,70.2,169),ncol=7,byrow=T)
B<-matrix(c(213,19.1,36.2,2220,249,40.0,168,170,13.9,29.8,1285,226,47.9,330,162,13.2,19.8,1521,166,36.2,133,203,13.0,90.8,1544,162,98.90,394,167,13.1,14.1,2278,212,46.3,134,164,12.9,18.6,2993,197,36.3,94.5,167,15.0,27.0,2056,260,64.6,237,158,14.4,37.0,1025,101,44.6,72.5,133,22.8,31.0,1633,401,180,899,156,135,322,6747,1090,228,810,169,8.00,308,1068,99.1,53.0,289,247,17.3,8.65,2554,241,77.9,373,166,8.10,62.8,1233,252,134,649,209,6.43,86.9,2157,288,74.0,219,182,6.49,61.7,3870,432,143,367,235,15.6,23.4,1806,166,68.8,188,173,19.1,17.0,2497,295,65.8,287,151,19.7,64.2,2031,403,182,874,191,65.4,35.0,5361,392,137,688,223,24.4,86.0,3603,353,97.7,479,221,20.1,155,3172,368,150,739,217,25.0,28.2,2343,373,110,494,164,22.2,35.5,2212,281,153,549,173,8.99,36.0,1624,216,103,257,202,18.6,17.7,3785,225,31.0,67.3,182,17.3,24.8,3073,246,50.7,109,211,24.0,17.0,3836,428,73.5,351,246,21.5,93.2,2112,354,71.7,195,164,16.1,38.0,2135,152,64.3,240,179,21.0,35.0,1560,226,47.9,330),ncol=7,byrow=T)
X<-matrix(c(58.2,5.42,29.7,323,138,179,513,106,1.87,40.5,542,177,184,427,152,0.80,12.5,1332,176,128,646,85.5,1.70,3.99,503,62.3,238,762.6,144,0.70,15.1,547,79.7,71.0,218.5,85.7,1.09,4.2,790,170,45.8,257.9,144,0.30,9.11,417,552,49.5,141.5,170,4.16,9.32,943,260,155,680.8,176,0.57,27.3,318,133,99.4,318.8,192,7.06,32.9,1969,343,103,553,188,8.28,22.6,1208,231,1314,1372,153,5.87,34.8,328,163,264,672.5,143,2.84,15.7,265,123,73.0,347.5,213,19.1,36.2,2220,249,62.0,465.8,192,20.1,23.8,1606,156,40.0,168,171,10.5,30.5,672,145,47.0,330.5,162,13.2,19.8,1521,166,36.2,133,203,13.0,90.8,1544,162,98.9,394.5,164,20.1,28.9,1062,161,47.3,134.5,167,13.1,14.1,2278,212,36.5,96.5,164,12.9,18.6,2993,197,65.5,237.8,167,15.0,27.0,2056,260,44.8,72.0,158,14.4,37.0,1025,101,180,899.5,133,22.8,31.3,1633,401,228,289,169,8.0,30.8,1068,99.1,53.0,817,247,17.3,8.65,2554,241,77.5,373.5,185,3.90,31.3,1211,190,134,649.8,209,6.43,86.9,2157,288,74.0,219.8,182,6.49,61.7,3870,432,143,367.5,235,15.6,23.4,1806,166,68.9,188),ncol=7,byrow=T)
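# TrnA, TrnB, TstX: patient, healthy and to-be-classified samples (one row per person).
# var.equal defaults to FALSE, i.e. the two covariance matrices are not assumed equal.
discri1 <- function(TrnA, TrnB, TstX = NULL, var.equal = FALSE) {
  if (is.null(TstX))
    TstX <- rbind(TrnA, TrnB)   # no test set given: re-classify the training samples
  # nx: number of samples to classify (30 for X); blong: one label per sample, initially 0
  nx <- nrow(TstX); blong <- array(0, c(nx))
  # Ab, Bb: length-7 vectors holding the column means of TrnA and TrnB
  Ab <- apply(TrnA, 2, mean); Bb <- apply(TrnB, 2, mean)
  if (isTRUE(var.equal)) {   # the two covariance matrices are taken as equal
    # S: covariance of the combined sample; Xb: midpoint of the two group means
    S <- var(rbind(TrnA, TrnB)); Xb <- (Ab + Bb) / 2
    for (i in 1:nx) {
      w <- (TstX[i, ] - Xb) %*% solve(S, Ab - Bb)   # discriminant value
      if (w > 0)
        blong[i] <- 1   # classified as patient
      else
        blong[i] <- 2   # classified as healthy
    }
  } else {   # the two covariance matrices differ
    Sa <- var(TrnA); Sb <- var(TrnB)
    for (i in 1:nx) {
      y <- TstX[i, ] - Ab; z <- TstX[i, ] - Bb
      # squared distance to the healthy group minus squared distance to the patient group
      w <- z %*% solve(Sb, z) - y %*% solve(Sa, y)
      if (w > 0)
        blong[i] <- 1
      else
        blong[i] <- 2
    }
  }
  blong
}
discri1(A, B, X)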
Fisher discrimination:
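# A, B and X are the matrices defined in the Mahalanobis listing.
# TrnA, TrnB, TstX: patient, healthy and to-be-classified samples (one row per person).
discri2 <- function(TrnA, TrnB, TstX = NULL) {
  if (is.null(TstX))
    TstX <- rbind(TrnA, TrnB)   # no test set given: re-classify the training samples
  nx <- nrow(TstX); blong <- array(0, c(nx))
  na <- nrow(TrnA); nb <- nrow(TrnB)
  Ab <- apply(TrnA, 2, mean); Bb <- apply(TrnB, 2, mean)
  # S: within-group scatter matrix; xb: mean of the two group means weighted by sample size
  S <- (na - 1) * var(TrnA) + (nb - 1) * var(TrnB)
  xb <- na / (na + nb) * Ab + nb / (na + nb) * Bb
  for (i in 1:nx) {
    w <- (TstX[i, ] - xb) %*% solve(S, Bb - Ab)   # Fisher discriminant value
    if (w <= 0)
      blong[i] <- 1   # classified as patient
    else
      blong[i] <- 2   # classified as healthy
  }
  blong
}
discri2(A, B, X)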
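Question 1 also asks for a check of the method's correctness. One common check, offered here as an assumption and not as something the paper reports, is resubstitution: classify the training samples themselves and count the errors. The sketch below restates the linear rule compactly as a hypothetical helper `fisher_rule` so that it runs on its own, and uses hypothetical toy data in place of Table B.1.

```r
# Hypothetical helper: the linear (Fisher-type) rule in vectorized form
fisher_rule <- function(TrnA, TrnB, TstX) {
  na <- nrow(TrnA); nb <- nrow(TrnB)
  Ab <- colMeans(TrnA); Bb <- colMeans(TrnB)
  S  <- (na - 1) * var(TrnA) + (nb - 1) * var(TrnB)   # within-group scatter
  xb <- (na * Ab + nb * Bb) / (na + nb)               # weighted overall mean
  a  <- solve(S, Bb - Ab)                             # discriminant direction
  w  <- sweep(TstX, 2, xb) %*% a                      # one value per row of TstX
  ifelse(drop(w) <= 0, 1, 2)                          # 1 = group A, 2 = group B
}

# Toy training data (hypothetical values, not taken from Table B.1)
TrnA <- matrix(c(1.0, 2.0,
                 2.0, 1.0,
                 3.0, 3.5,
                 2.5, 2.0), ncol = 2, byrow = TRUE)
TrnB <- matrix(c(6.0, 7.0,
                 7.0, 5.5,
                 8.0, 8.0,
                 6.5, 6.0), ncol = 2, byrow = TRUE)

# Resubstitution: predict the labels of the training samples and compare with the truth
truth <- c(rep(1, nrow(TrnA)), rep(2, nrow(TrnB)))
pred  <- fisher_rule(TrnA, TrnB, rbind(TrnA, TrnB))
confusion  <- table(truth = truth, predicted = pred)
error_rate <- mean(pred != truth)
print(confusion)
print(error_rate)
```

Resubstitution tends to understate the true error rate because the same data are used to fit and to test, so a leave-one-out variant would be a more conservative check. With the paper's own functions, calling `discri1(A, B)` or `discri2(A, B)` without a test set performs the same re-classification of the 60 training cases.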
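7. Model results and analysis
Result of Mahalanobis distance discrimination:
112121222211222122222212122222
Result of Fisher discrimination:
112111211211121122122212122222
The two methods agree on 24 of the 30 subjects. The six subjects on which they differ (subjects 5, 8, 9, 13, 15 and 19, counting digits from left to right) should be examined further in order to reach a firm diagnosis.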
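The comparison of the two label strings can be reproduced with a few lines of R. The two strings are copied from the results above; the variable and function names are illustrative only, and the i-th digit is assumed to be the label of the i-th subject.

```r
# Label strings as printed in the results (1 = patient, 2 = healthy)
maha   <- "112121222211222122222212122222"
fisher <- "112111211211121122122212122222"

# Split a string of digits into an integer vector of labels
to_labels <- function(s) as.integer(strsplit(s, "")[[1]])
m <- to_labels(maha)
f <- to_labels(fisher)

disagree  <- which(m != f)   # positions where the two methods differ
agreement <- mean(m == f)    # share of subjects with the same label
print(disagree)
print(agreement)
print(rbind(mahalanobis = m[disagree], fisher = f[disagree]))
```

On every subject where the methods disagree, Mahalanobis discrimination gives label 2 (healthy) and Fisher discrimination gives label 1 (patient), so in this data set the Fisher rule is the more inclined of the two to flag a subject as a patient.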
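Taking the expected value of each element to be its sample mean, we measure an element's influence on the overall result by the absolute difference between the healthy and patient means divided by the healthy mean. The larger this value, the more decisive the element is for the diagnosis.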
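The ratio just described can be computed as in the sketch below. To keep the block short it uses only the first three rows of each group from the program listings, so the ranking it prints illustrates the calculation and should not be read as the paper's result for all 60 cases. The names `patients`, `healthy` and `rel_diff` are illustrative.

```r
elements <- c("Zn", "Cu", "Fe", "Ca", "Mg", "K", "Na")

# First three patient rows and first three healthy rows from the listings (subset only)
patients <- matrix(c(166, 15.8, 24.5, 700, 112, 179, 513,
                     185, 15.7, 31.5, 701, 125, 184, 427,
                     193, 9.80, 25.9, 541, 163, 128, 642),
                   ncol = 7, byrow = TRUE, dimnames = list(NULL, elements))
healthy  <- matrix(c(213, 19.1, 36.2, 2220, 249, 40.0, 168,
                     170, 13.9, 29.8, 1285, 226, 47.9, 330,
                     162, 13.2, 19.8, 1521, 166, 36.2, 133),
                   ncol = 7, byrow = TRUE, dimnames = list(NULL, elements))

# |mean(healthy) - mean(patient)| / mean(healthy), element by element
rel_diff <- abs(colMeans(healthy) - colMeans(patients)) / colMeans(healthy)

# Elements ordered from most to least separated by this measure
ranking <- sort(rel_diff, decreasing = TRUE)
print(round(ranking, 3))
```

Dividing by the healthy mean makes the measure unit-free, which matters here because the seven elements are recorded on very different scales. It does not account for the spread within each group, so an element with a large ratio but very noisy values may still discriminate poorly.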
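We first took Fe, Ca, Mg and K as the main elements and re-ran the programs on the test samples, then ran them with Ca and K only, and finally with K alone. The results are collected below:
Mahalanobis:
All 7 elements  discri1  [1] 112121222211222122222212122222
4 elements (Fe, Ca, Mg, K)  discri1(A,B,X)  [1] 112111221211222122222212222222
2 elements (Ca, K)  discri1(A,B,X)  [1] 112111111211122122222212222222
1 element (K)  discri1(A,B,X)  [1] 111121112211211112112111121212
Fisher:
All 7 elements  discri2  [1] 112111211211121122122212122222
4 elements (Fe, Ca, Mg, K)  discri2(A,B,X)  [1] 11111111121112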