Using the NCBI eUtilities via CGI
DRAFT
The Entrez query system at NCBI allows users to query the various (over 30) primary NCBI databases through a single interface, the Entrez Core. The Core provides support for both the NCBI web interface and various program interfaces, especially the eUtilities that are available through the Web's Common Gateway Interface (CGI). Here the focus is on using the eUtilities from programs written in Perl, Java, etc., but Perl will be used for all examples.
The basic operation of the Core is fairly straightforward. For example, records in a particular database can be easily retrieved via searches based on the contents of specified fields.
However, searching for records in one database as a function of the contents of records in a second database is not as simple. Such a search requires the use of "link" information that connects records in the same or different primary data resources. (This is the same information that underlies the familiar "Links" pull-down options on NCBI Web pages.)
This presentation will describe basic use of the eUtilities through practical examples, covering the following topics:
∙ Entrez manipulates sets of UIDs
∙ The Entrez "query result database"
∙ NCBI program interfaces to the Entrez Core
∙ Using ePost in a Perl program
∙ Using eSearch to create new query result database entries
∙ Links, linksets, and using eLink to retrieve linksets
∙ Using eLink to put UIDs into the query result database
∙ Using eLink to get information about SNPs related to a specific gene
∙ Using LWP to post large UID lists and retrieve results in batches
∙ Retrieving eLinked data in batches using "index lists"
∙ Additional information
Entrez manipulates sets of UIDs
Every database in the Entrez domain assigns unique IDs (UIDs) to its major record types. These IDs are integer values unique within the database, but the same integer may be used to identify records in multiple databases. (Thus, to identify a particular record, one must specify both the database and the record's UID.)
The primary function of the Entrez programmatic interface is to help users manipulate sets of UIDs and fetch the data records identified by those UIDs. Entrez itself must also format data for Web-based users, but the programmatic interface leaves data display to the client program.
The interface allows programs to:
∙ define a set of UIDs,
∙ display the contents of records identified by a set of UIDs,
∙ create a new UID set from an existing set by choosing members of the existing set whose data records satisfy specified criteria, and
∙ create a new set of UIDs representing records that are in some way related to the records identified by an existing set of UIDs.
These primitive capabilities can be combined into powerful sequences that can integrate data from among most of the Entrez data resources. In fact, they make it possible to (partially) mimic relational database operations such as selects and joins on data in separate data resources.
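As a rough, purely local illustration of the select-like and join-like operations described above, the sketch below works on in-memory Perl data rather than on Entrez. The UIDs, the %record table, and the %link table are invented toy values, not real NCBI records or links, and select_uids and join_uids are hypothetical helper names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy data (hypothetical, NOT real NCBI content): records keyed by UID.
my %record = (
    101 => { organism => 'human' },
    102 => { organism => 'mouse' },
    103 => { organism => 'human' },
);

# Toy link table (hypothetical): UID in one database => related UIDs in another.
my %link = (
    101 => [ 9001, 9002 ],
    103 => [ 9002, 9003 ],
);

# "Select": keep the members of a UID set whose records satisfy a test.
sub select_uids {
    my ($test, @uids) = @_;
    return grep { exists $record{$_} && $test->($record{$_}) } @uids;
}

# "Join": follow links from a UID set to a de-duplicated, sorted set of related UIDs.
sub join_uids {
    my (@uids) = @_;
    my %seen;
    my @related = grep { !$seen{$_}++ } map { @{ $link{$_} || [] } } @uids;
    return sort { $a <=> $b } @related;
}

my @human  = select_uids(sub { $_[0]{organism} eq 'human' }, 101, 102, 103);
my @linked = join_uids(@human);
print "selected: @human\n";
print "linked:   @linked\n";
```

In Entrez itself, the selection step is performed by a search restricted to an existing UID set and the join step by following link information, so a client program only ever handles the resulting UID sets.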
Note, however, that record content may be retrieved in a limited number of report formats, where a report type contains a fixed subset of elements taken from the raw data record. As a result, additional processing may be required to prune report data for subsequent display or use, and/or multiple requests may be required to retrieve data in multiple report formats to obtain all desired data fields.
The Entrez "query result database"
The Entrez Core can keep a record of each query it processes, including the UID set resulting from each query. The database holding these records will be referred to as "the query result database" within this presentation, although it is described as "the History" or "the History server" in some NCBI documentation.
The query result database can be used by (most of the) programs that implement the Entrez set manipulation functions listed above, and is so important for efficient use of Entrez that this presentation is almost entirely oriented around it. "Efficient" use of the query result database allows users to download large numbers of records without violating the access rate limits that NCBI imposes upon remote queries.
Each UID set in the Core database is identified by 3 pieces of information:
∙ a query identifier, known as the "query key",
∙ the name of the database used to generate the associated UID set, and
∙ an identifier for the state of the database at the time of the query, known as the "web environment".
Query keys are integers, but are often displayed as a pound sign (#) followed by an integer. Entrez database names are strings like "snp", "nuc", "nucest", "gene", etc. Web environment identifiers are long (around 60 character) strings.
Here is a schematic query result database entry:

| Database | Query Key | WebEnv (edited) | UID set |
|----------|-----------|-----------------|---------|
| snp | 2 | A3zq156CDS_p1DdWz...AU6u3yb5D3B634BAF50 | 242, 28853987 |
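One way to picture such an entry inside a client program is as a small Perl hash holding the three identifying values. The sketch below is an illustration only: the hash layout and the helper names display_key and missing_fields are assumptions made for this example, and the WebEnv value is the edited (truncated) one from the schematic entry, so it would not work in a real request.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A query result database entry as a client might store it (layout is an assumption).
my %entry = (
    db        => 'snp',
    query_key => 2,
    WebEnv    => 'A3zq156CDS_p1DdWz...AU6u3yb5D3B634BAF50',  # edited value, not usable as-is
);

# Query keys are integers but are often displayed with a leading pound sign.
sub display_key {
    my ($e) = @_;
    return '#' . $e->{query_key};
}

# A UID set can only be referenced when all three identifying pieces are present;
# return the names of any that are missing or empty.
sub missing_fields {
    my ($e) = @_;
    return grep { !defined $e->{$_} || $e->{$_} eq '' } qw(db query_key WebEnv);
}

my @missing = missing_fields(\%entry);
if (@missing) {
    print "missing: @missing\n";
} else {
    print "complete: ", display_key(\%entry), " in $entry{db}\n";
}
```

Keeping the three values together in one structure makes it harder to pair a query key with the wrong database or web environment by accident.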
NCBI program interfaces to the Entrez Core
There exist several "technologies" for accessing remote data and computing resources programmatically.
The two most popular approaches are:
∙ the Web Common Gateway Interface (CGI), and
∙ Remote Procedure Calls (RPC) over SOAP, sometimes known as JAX-RPC or "Web Services".
NCBI supports both of these interfaces to the Entrez Core. In addition, NCBI provides an educational Perl module (NCBI_PowerScripting.pm) that defines a set of objects that call the CGI services behind the scenes.
The CGI and Web Services routines are known as the "eUtilities" or "eUtils", and may be categorized with respect to the UID manipulation functions listed above as:

| Function | Generic name | CGI routine |
|----------|--------------|-------------|
| define a set of UIDs | ePost (and sometimes eSearch) | epost.fcgi, esearch.fcgi |
| display the contents of records identified by UIDs | eSummary and eFetch | esummary.fcgi, efetch.fcgi |
| create a UID set from a previously defined set | eSearch | esearch.fcgi |
| create a UID set by finding links from an existing set | eLink | elink.fcgi |

This presentation will deal only with the CGI functions, but the Web Services provide identical functionality within the JAX-RPC framework. (Note that the Web Services are not currently, circa 2007, available via Perl.)
Here is a URL that uses the epost.fcgi script to insert (or "post") 2 UIDs (242 and 28853987) into the query result database:

    http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=snp&id=242,28853987
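
A URL of this shape can also be assembled by a program from a database name and a list of UIDs. The following is a minimal sketch under stated assumptions: epost_url is a hypothetical helper name, the base address is the http:// one used in this text (the current NCBI service may require a different scheme, such as https), and the function only builds the string without contacting NCBI.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base address as written in the text (assumption: unchanged since the text was written).
my $EUTILS = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Build an epost.fcgi URL that posts a list of UIDs to one database.
sub epost_url {
    my ($db, @uids) = @_;
    die "no UIDs given\n" unless @uids;
    for my $uid (@uids) {
        # UIDs are integers, so reject anything else before it reaches the URL.
        die "UID '$uid' is not an integer\n" unless $uid =~ /^\d+$/;
    }
    return "$EUTILS/epost.fcgi?db=$db&id=" . join(',', @uids);
}

print epost_url('snp', 242, 28853987), "\n";
```

Checking that every UID is an integer before building the URL catches malformed input locally instead of spending a remote request on it.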
If you enter this URL into a Web browser you will get a response like:

    <?xml version="1.0"?>
    <!DOCTYPE ePostResult PUBLIC "-//NLM//DTD ePostResult, 11 May 2002//EN"
     "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ePost_020511.dtd">
    <ePostResult>
    <QueryKey>1</QueryKey>
    <WebEnv>01yWrS_p1DdWzAUPU6eOwxX2...s@1FBE5D3B634BAF50_0012SID</WebEnv>
    </ePostResult>

and the query result database will then include a new record containing the 2 UIDs specified by using the "id" parameter:

| Database | Query Key | WebEnv (edited) | UID set |
|----------|-----------|-----------------|---------|
| snp | 1 | 01yWrS_p1DdWzAUPU6e...E5D3B634BAF50_0012SID | 242, 28853987 |
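
A program that posts UIDs needs to pull the query key and web environment out of the ePost reply before it can refer to the new entry. The sketch below does this with two regular expressions, assuming the reply carries the values in QueryKey and WebEnv elements. The helper name parse_epost is hypothetical, the sample reply uses the edited WebEnv shown in this text, and a full XML parser would be a more robust choice for anything beyond a short example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract the query key and web environment from an ePost reply.
# Assumes the reply contains <QueryKey> and <WebEnv> elements.
sub parse_epost {
    my ($xml) = @_;
    my ($key)    = $xml =~ m{<QueryKey>\s*(\d+)\s*</QueryKey>};
    my ($webenv) = $xml =~ m{<WebEnv>\s*([^<\s]+)\s*</WebEnv>};
    return unless defined $key && defined $webenv;
    return { query_key => $key, WebEnv => $webenv };
}

# Sample reply using the edited (truncated) WebEnv from the text.
my $sample = <<'XML';
<?xml version="1.0"?>
<ePostResult>
  <QueryKey>1</QueryKey>
  <WebEnv>01yWrS_p1DdWzAUPU6eOwxX2...s@1FBE5D3B634BAF50_0012SID</WebEnv>
</ePostResult>
XML

my $entry = parse_epost($sample) or die "could not parse ePost reply\n";
print "query_key=$entry->{query_key}\n";
print "WebEnv=$entry->{WebEnv}\n";
```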
If you then specify the "db", "query_key", and "WebEnv" parameters in a URL like:

    http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=snp&query_key=1&\
    WebEnv=01yWrS_p1DdWzAUPU6eOwxX2...s@1FBE5D3B634BAF50_0012SID

where the "\" at the end of the line signifies that the line actually continues onto the next line (but does NOT get typed in), esummary.fcgi will return a document like this (with many lines removed):

    <?xml version="1.0"?>
    <!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD eSummaryResult, 29 October 2004//EN"
     "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSummary_041029.dtd">
    20742047
    800

The full result is shown in first-query-xml.html.
Note that summary records were retrieved for both of the SNP UIDs that had been placed in the query result database PRIOR to this request for a summary. esummary.fcgi used the database name, the query key, and the web environment parameters to find the UID list, and then retrieved a record from the specified database for each UID on the list.
The following URL shows how to use efetch.fcgi to get a full XML record for these two SNP UIDs:

    http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&query_key=2&\
    WebEnv=01yWrS_p1DdWzAUPU6eOwxX2...s@1FBE5D3B634BAF50_0012SID&\
    report=sgml&mode=xml

The result may be examined in fetch-example-xml.html. Note that the "report" and "mode" options were used to specify the report contents and format. Selection of values for these options seems rather unusual.
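
Because esummary.fcgi and efetch.fcgi both locate their UID list through the same three parameters, a single helper can build either kind of URL. The sketch below is one possible way to do that, not NCBI's own code: history_url and url_escape are hypothetical helper names, the base address is the one used in this text, and the WebEnv is the edited value shown above, so the printed URLs are illustrative rather than usable. Unlike the hand-typed URLs, the helper percent-encodes parameter values, so the "@" in the WebEnv appears as "%40".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base address as written in the text.
my $EUTILS = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Percent-encode anything outside the URL "unreserved" character set.
sub url_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Build a URL for a script (e.g. 'esummary' or 'efetch') that reads its UID list
# from the query result database. Extra options are name/value pairs, kept in order.
sub history_url {
    my ($script, $db, $query_key, $webenv, @extra) = @_;
    my @pairs = (db => $db, query_key => $query_key, WebEnv => $webenv, @extra);
    my @parts;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        push @parts, "$name=" . url_escape($value);
    }
    return "$EUTILS/$script.fcgi?" . join('&', @parts);
}

my $webenv = '01yWrS_p1DdWzAUPU6eOwxX2...s@1FBE5D3B634BAF50_0012SID';  # edited value from the text

print history_url('esummary', 'snp', 1, $webenv), "\n";
print history_url('efetch',   'snp', 2, $webenv, report => 'sgml', mode => 'xml'), "\n";
```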
Using ePost in a Perl program
The hand-entered queries shown above can all be sent to Entrez via programs. A Perl program to post 2 UIDs (242 and 28853987) to the query result database is shown below:
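
    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;

    # Post two SNP UIDs to the query result database.
    my $url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=snp&id=242,28853987";

    # get() returns the whole XML reply as a single string, or undef on failure.
    my $result = get($url);
    die "epost request failed\n" unless defined $result;
    print $result;
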
Note that the query is identical to the one issued in the first example above, and the results will be identical, except for changes in the Web