UCI Database User Guide
Instructions for Using the UCI Datasets for Machine Learning
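This directory contains datasets and domain theories (the latter are annotated as such in the brief listing that follows) that have been or can be used to evaluate learning algorithms.
Each data file (*.data) contains records of many individual instances, described as attribute-value pairs.
The corresponding *.info file contains extensive documentation.
(Some files _generate_ databases; they do not have *.data files.)
In addition to the datasets and domain theories, the utilities directory contains material that may be useful when working with these datasets.
The contents of the UCI repository can be viewed and remotely copied over the web at http://www.ics.uci.edu/~mlearn/MLRepository.html.
Alternatively, the data can be obtained via FTP from ftp://ftp.ics.uci.edu using anonymous login.
The databases are in the pub/machine-learning-databases directory.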
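The anonymous FTP access described above could be scripted roughly as follows. This is a minimal sketch using Python's standard `ftplib`, not an official client. The host and directory come from the text; the `subdir` and `filename` arguments, the placeholder e-mail address, and the `ftp_factory` hook (a hypothetical parameter added only so the function can be exercised without a network) are assumptions. The server may no longer be reachable at this address.

```python
from ftplib import FTP
from io import BytesIO

HOST = "ftp.ics.uci.edu"                      # host named in the text
DIRECTORY = "pub/machine-learning-databases"  # directory named in the text


def fetch_file(subdir, filename, email="user@host", ftp_factory=FTP):
    """Download one file over anonymous FTP and return its bytes.

    subdir and filename are supplied by the caller (assumed layout);
    ftp_factory is a hypothetical hook for substituting a fake connection.
    """
    buffer = BytesIO()
    ftp = ftp_factory(HOST)
    try:
        # Anonymous login: user id "anonymous", e-mail address as password.
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd(f"{DIRECTORY}/{subdir}")
        # Stream the file into the in-memory buffer.
        ftp.retrbinary(f"RETR {filename}", buffer.write)
    finally:
        ftp.quit()
    return buffer.getvalue()
```

A call such as `fetch_file("iris", "iris.data")` assumes that the server keeps the Iris files in a subdirectory named `iris`, which the text does not confirm.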
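Notes:
UCI is always looking for additional databases, which can be written to the incoming subdirectory.
Please contribute yours, with documentation.
Thanks. See the DOC-REQUIREMENTS file for the suggested documentation procedure.
At present, most databases use the following format:
one instance per line, no spaces, attribute values separated by commas (","), and missing values denoted by a question mark ("?").
Also, please notify the site librarian after making a donation:
ml-repository@ics.uci.edu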
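The file format described in the note above (one instance per line, comma-separated values, "?" for a missing value) can be read with a few lines of Python. This is a minimal sketch; the function name `parse_instances` is hypothetical, and the second sample row is an invented variation of an Iris row, used only to show how a missing value is handled.

```python
def parse_instances(text):
    """Parse comma-separated instances, one per line; "?" becomes None."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Replace the missing-value marker "?" with None; keep other values as strings.
        rows.append([None if value == "?" else value for value in line.split(",")])
    return rows


# The second row is hypothetical: a "?" was substituted to illustrate a missing value.
sample = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,?,1.4,0.2,Iris-setosa\n"
rows = parse_instances(sample)
```

Values are left as strings here because attribute types differ between databases; converting them is a per-dataset decision.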
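The following uses IRIS from UCI as an example to introduce a dataset:
ucidata\iris contains three files:
Index
iris.data
iris.names
Index is the folder listing; it names every file in the folder. The Index for iris reads:
Index of iris
18 Mar 1996      105 Index
08 Mar 1993     4551 iris.data
30 May 1989     2604 iris.names
iris.data is the iris data file; its contents are as follows: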
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
……
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
……
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
……
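As shown above, attribute values are separated by commas with no spaces in between (5.1,3.5,1.4,0.2,), and the last column is the class that the row belongs to, that is, the decision attribute (for example, Iris-setosa).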
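Separating the four numeric attributes from the class label in the last column might look like the sketch below. The function name `split_instance` is hypothetical, and the code assumes every attribute before the last column is numeric, which holds for the Iris rows shown but not for every database.

```python
def split_instance(line):
    """Split one iris.data line into (numeric features, class label)."""
    # Everything before the last comma is an attribute; the last field is the class.
    *features, label = line.strip().split(",")
    return [float(value) for value in features], label


features, label = split_instance("5.1,3.5,1.4,0.2,Iris-setosa")
```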
iris.names describes the iris data: its title, sources, past usage, relevant information, number of instances, attributes of the instances, and so on. An excerpt:
……
7. Attribute Information:
   1. sepal length in cm
   2. sepal width in cm
   3. petal length in cm
   4. petal width in cm
   5. class:
      -- Iris Setosa
      -- Iris Versicolour
      -- Iris Virginica
……
9. Class Distribution: 33.3% for each of 3 classes.
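The class distribution reported in iris.names can be checked by counting labels. The sketch below is illustrative only: the function name `class_distribution` is hypothetical, and the label list covers just the nine rows quoted in the iris.data excerpt earlier, not the full file.

```python
from collections import Counter


def class_distribution(labels):
    """Return the percentage of instances per class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # no instances, no distribution
    return {label: 100.0 * n / total for label, n in counts.items()}


# Labels of the nine excerpt rows only (three per class).
labels = ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3 + ["Iris-virginica"] * 3
distribution = class_distribution(labels)
```

Running the same function over the labels of the complete iris.data file is the way to compare against the stated 33.3% per class.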
For examples of how this dataset is used, refer to other papers or to later content on this site.
This is the UCI Repository Of Machine Learning Databases and Domain Theories
============================================================================
4 December 1995
ftp.ics.uci.edu: pub/machine-learning-databases
http://www.ics.uci.edu/~mlearn/MLRepository.html
Librarian: Patrick M. Murphy (ml-repository@ics.uci.edu)
111 databases and domain theories (36MB)
============================================================================
This directory contains data sets and domain theories (the latter have been
annotated as such in the following brief listing) that have been or can be
used to evaluate learning algorithms. Each data file (*.data) contains
individual records described in terms of attribute-value pairs. The
corresponding *.info file contains voluminous documentation. (Some files
_generate_ databases; they do not have *.data files.)
In addition to data sets and domain theories, the "utilities/" directory
contains utilities that you may find useful when using data sets in this
repository.
The contents of this repository can be viewed and remotely copied over
the web. The address is http://www.ics.uci.edu/~mlearn/MLRepository.html.
Alternatively, the contents of this repository can be remotely copied via
ftp to ftp.ics.uci.edu. Enter "anonymous" for user id, and e-mail address
(user@host) for password. These databases can be found by executing
"cd pub/machine-learning-databases".
Notes:
1. We're always looking for additional databases, which can be
written to the sub-directory named "/incoming". Please send yours, with
documentation. Thanks -- See DOC-REQUIREMENTS for suggested documentation
procedures. Presently, most databases have the following format: 1
instance per line, no spaces, commas separate attribute values, and
missing values are denoted by "?". Also, please notify the site librarian
(ml-repository@ics.uci.edu) after making a donation.
2. Ivan Bratko requested that the databases he donated from the Ljubljana
Oncology Institute (e.g., breast-cancer, lymphography, and primary-tumor)
have restricted access. We are allowed to share them with academic
institutions upon request. These databases (like several others) require
that proper citations be made in published articles that use them.
Citation requirements are in each database's corresponding *.doc file.
To access any of these databases, send email to ml-repository@ics.uci.edu.
To aid you in deciding if you want any of these databases, the
documentation files are available.
3. An archive server may now be used to receive via e-mail files in this
repository. Installed on ics, it provides email access to files in
our anonymous ftp/uucp area (~ftp). If people have no other access to
our archives, then they can send mail to:
archive-server@ics.uci.edu
Commands to the server may be given in the body. Some commands are:
help
send
find
The help command replies with a useful help message.
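A request to the archive server described in note 3 could be composed as sketched below, using Python's standard `email` package. Only the server address and the "help" command come from the text; the function name `build_request`, the sender address, and the subject line are assumptions, and the arguments accepted by "send" and "find" are not documented here, so only "help" is shown. The message is built but not sent, and given the README's 1995 date the address may no longer be served.

```python
from email.message import EmailMessage

ARCHIVE_SERVER = "archive-server@ics.uci.edu"  # address given in the text


def build_request(sender, command="help"):
    """Build (but do not send) a mail whose body carries one server command."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ARCHIVE_SERVER
    message["Subject"] = "archive request"  # assumed; the text only mentions the body
    message.set_content(command)            # commands go in the message body
    return message


request = build_request("user@host")  # placeholder sender address
```

Delivering the message would require an SMTP server of your own, for example through `smtplib`.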
If you publish material based on databases obtained from this repository,
then, in your acknowledgements, please note the assistance you received by
using this repository. Thanks -- this will help others to obtain the same
data sets and replicate your experiments. We suggest the following pseudo-APA
reference format for referring to this repository (LaTeX'd):
Murphy,~P.~M., \& Aha,~D.~W. (1994). {\it UCI Repository of machine
learning databases} [http://www.ics.uci.edu/~mlearn/MLRepository.html].
Irvine, CA: University of California, Department of Information and Computer
Science.
Patrick M. Murphy (Repository Librarian)
----------------------------------------------------------------------
Brief Overview of Databases and Domain Theories:
Quick Listing:
1. annealing (David Sterling and Wray Buntine)
2. Artificial Characters Database & DT (donated by Attilio Giordana)
3-4. audiology (Ray Bareiss and Bruce Porter, used in Protos)
   1. Original Version
   2. Standardized-Attribute Version of the Original.
5. auto-mpg (from CMU StatLib library)
6. autos (Jeff Schlimmer)
7. badges (Haym Hirsh)
8. balance-scale (Tim Hume)
9. balloons (Michael Pazzani)
10. breast-cancer (Ljubljana Institute of Oncology, restricted access)
11. breast-cancer-wisconsin (Wisconsin Breast Cancer D'base, Olvi Mangasarian)
   1. Original version
   2. Diagnostic data set
   3. Prognostic data set
12. bridges (Yoram Reich)
13-21. chess
   1. Partial generator of Quinlan's chess-end-game data (kr-vs-kn) (Schlimmer)
   2. Shapiro's endgame database (kr-vs-kp) (Rob Holte)
   3. king-rook-vs-king (Michael Bain, Arthur van Hoff)
   4-9. Six domain theories (Nick Flann)
22. Bach Chorales (time-series) database (Darrell Conklin)
23. Connect-4 Database (John Tromp)
24-25. Credit Screening Database
   1. Japanese Credit Screening Data and domain theory (Chiharu Sano)
   2. Credit Card Application Approval Database (Ross Quinlan)
26. Ein-Dor and Feldmesser's cpu-performance database (David Aha)
27. Diabetes Data (Serdar Uckun, AIM-94)
28. dgp-2 data generation program (Powell Benedict)
29. Document Understanding (Donato Malerba)
30. Nine small EBL domain theories and examples in sub-directory ebl
31. Evlin Kinney's echocardiogram database (Steven Salzberg)
32. flags (Richard Forsyth)
33. function-finding (Cullen Schafer's 352 case studies)
34. glass (Vina Spiehler)
35. hayes-roth (from Hayes-Roth^2's paper)
36-39. heart-disease (Robert Detrano)
40. hepatitis (G. Gong)
41. horse colic database (Mary McLeish & Matt Cecile)
42. (Boston) Housing database (from CMU StatLib library)
43. ICU data (Serdar Uckun, AIM-94)
44. Image segmentation database (Carla Brodley)
45. ionosphere information (Vince Sigillito)
46. iris (R.A. Fisher, 1936)
47. isolet (Ron Cole and Mark Fanty's database donated by Tom Dietterich)
48. kinship (J. Ross Quinlan)
49. labor-negotiations (Stan Matwin)
50-51. led-display-creator (from the CART book)
52. lenses (Cendrowska's database donated by Benoit Julien)
53. letter-recognition database (created and donated by David Slate)
54. liver-disorders (BUPA Medical's database donated by Richard Forsyth)
55. logic-theorist (Paul O'Rorke)
56. lung cancer (Stefan Aeberhard)
57. lymphography (Ljubljana Institute of Oncology, restricted access)
58-59. mechanical-analysis (Francesco Bergadano)
   1. Original Mechanical Analysis Data Set
   2. PUMPS DATA SET
60. mobile robots (donated by Klingspor, Morik and Rieger)
61-64. molecular-biology
   1. promoter sequences (Towell, Shavlik, & Noordewier, domain theory also)
   2. splice-junction sequences (Towell, Noordewier, & Shavlik, domain theory also)
   3. protein secondary structure database (Qian and Sejnowski)
   4. protein secondary structure domain theory (Jude Shavlik & Rich Maclin)
65. MONK's Problems (donated by Sebastian Thrun)
66. Moral Reasoner Database (donated by James Wogulis)
67. mushroom (Jeff Schlimmer)
68. MUSK databases (2) (donated by Tom Dietterich)
69. othello domain theory (Tom Fawcett)
70. Page Blocks Classification (Donato Malerba)
71. Pima Indians diabetes diagnoses (Vince Sigillito)
72. Postoperative Patient data (Jerzy W. Grzymala-Busse)
73. Primary Tumor (Ljubljana Institute of Oncology, restricted access)
74. Qualitative Structure Activity Relationships (QSARs) (Ross King)
75. Quadraped Animals (John H. Gennari)
76. Servo data (Ross Quinlan)
77. shuttle-landing-control (Bojan Cestnik)
78. solar flare (Gary Bradshaw)
79-80. soybean (from Ryszard Michalski's groups)
81. space shuttle databases (David Draper)
82. spectrometer (Infra-Red Astronomy Satellite Project Database, John Stutz)
83. Sponge Database (Iosune Uriz and Marta Domingo)
84. Statlog Project databases (7) (from Ross King, ...)
85. Student Loan relational database (from Michael Pazzani)
86. tic-tac-toe endgame database (Turing Institute, David W. Aha)
87-97. thyroid-disease (Garavan Institute, J. Ross Quinlan; Stefan Aeberhard)
98. trains database (David Aha & Eri