Computer Science Literature Translation: A Study on Architecture of Massive Information Management for Digital Library
A Study on Architecture of Massive Information Management for Digital Library
(Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)
XING Chun-Xiao+, ZENG Chun, LI Chao, ZHOU Li-Zhu
Abstract
This paper investigates the challenging issues and technologies in managing very large digital contents and collections, and gives an overview of the works and enabling technologies in the related areas. Based on the analysis and comparison of the related work, a novel architecture of massive information management for digital library is designed. The key components and core services are described in detail. Finally, a case study THADL (Tsinghua University architecture digital library) that complies with the architectural framework is presented.
Keywords:
digital library; architecture; massive information management; interoperability; metadata
1 Introduction
In recorded human history, printed materials long played a dominant role in preserving and disseminating human information and knowledge. However, with the rapid development of computer, communication, multimedia, and storage technologies, this role is giving way to digital resources. The explosive growth of information in digital form poses challenges not only to traditional archives and their information providers, but also to organizations in the government, commercial, and non-profit sectors. According to the latest report by Lyman and Varian, the world's total yearly production of print, film, optical, and magnetic content would require roughly 1.5 billion gigabytes of storage, which is roughly 250 megabytes for every person on earth. Printed documents of all kinds comprise only 0.003% of the total. Magnetic storage is by far the largest medium for storing information and is the most rapidly growing segment, with shipped hard drive capacity doubling every year.
The types of digital resources are diverse: digital texts, documents, scientific data, images, animation, video, audio, and so on. Their applications are also broad, including digital libraries (DL), movie and video centers, other public media (television, broadcast, newspapers, etc.), museums, and national or cooperative information centers. At the same time, the information highway, represented by the Internet, has become an important tool for disseminating digital resources. Governments, companies, groups, research institutes, non-governmental organizations, and educational institutions all over the world put massive amounts of information on the Web.
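The per-person figure attributed to Lyman and Varian in the introduction can be sanity-checked with simple arithmetic. The sketch below assumes a world population of about 6 billion and decimal units (1 GB = 1000 MB); neither assumption is stated in the text, and the function names are illustrative only.

```python
# Rough sanity check of the storage figures quoted in the introduction.
# ASSUMPTION: world population of about 6 billion (not stated in the text).
# ASSUMPTION: decimal units, 1 GB = 1000 MB.

TOTAL_GB_PER_YEAR = 1.5e9   # from the text: roughly 1.5 billion gigabytes
WORLD_POPULATION = 6.0e9    # assumed
MB_PER_GB = 1000            # assumed decimal convention


def per_person_mb(total_gb: float, population: float) -> float:
    """Yearly production per person, in megabytes."""
    return total_gb * MB_PER_GB / population


def capacity_after(initial: float, years: int) -> float:
    """Capacity after `years` of doubling once per year."""
    return initial * 2 ** years


if __name__ == "__main__":
    print(per_person_mb(TOTAL_GB_PER_YEAR, WORLD_POPULATION))  # 250.0
    print(capacity_after(1.0, 3))                              # 8.0
```

Under these assumptions the quotient matches the quoted 250 megabytes per person, and annual doubling implies an eightfold increase in shipped capacity over three years.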
Technology challenges and key issues
These massive digital resources present many challenging issues for data management technology. The following are some examples.
(1) Data model.
Traditional data model theories apply only to structured data, not to massive digital resources of various types, which are mostly semi-structured or unstructured. New data models are therefore needed.
(2) System architecture.
Traditional database management systems are designed for business data processing, which is characterized by concurrent, short, update-oriented transactions. Transaction management and concurrency control therefore sit at the center of their system architecture. This architecture is not suitable for managing digital resources, because the classical transaction concept is less important for these resources. We need to pursue novel and universal frameworks for massive digital resource management.
(3) Massive information storage.
The volume of digital data resources is measured in terabytes or petabytes. Traditional storage devices using SCSI cannot support efficient storage, online migration, and persistent archiving of such massive digital resources. Research on multi-level storage systems, SAN (Storage Area Networks), and other technologies is therefore inevitable.
(4) Query processing.
In traditional database systems, queries are expressed in a query language such as SQL. Querying and searching massive digital resources, however, calls for many new mechanisms, such as keyword search, full-text search, similarity query, and content-based multimedia retrieval. How to integrate these query methods (including SQL, OQL, and different XML query languages, e.g., XQL, XML-QL, XML-GL) into an efficient and flexible query processing method has not yet been satisfactorily solved.
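To make the integration problem in item (4) concrete, the following minimal sketch routes one request to several query mechanisms and merges their ranked results. It is an illustration under stated assumptions, not the method proposed in the paper: the in-memory corpus, the two toy backends, and every name (`DOCS`, `keyword_search`, `similarity_search`, `unified_query`) are hypothetical.

```python
# Hypothetical sketch: one entry point that fans a request out to several
# query mechanisms and merges the results. No name here comes from the paper.
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, float]  # (document id, score in [0, 1])

# ASSUMPTION: a tiny in-memory corpus stands in for the digital collection.
DOCS: Dict[str, str] = {
    "d1": "digital library architecture and metadata",
    "d2": "storage area networks for massive data",
    "d3": "metadata interoperability in digital library systems",
}


def keyword_search(query: str) -> List[Hit]:
    """Score 1.0 for every document that contains all query terms."""
    terms = query.lower().split()
    return [(doc_id, 1.0) for doc_id, text in DOCS.items()
            if all(t in text.split() for t in terms)]


def similarity_search(query: str) -> List[Hit]:
    """Jaccard similarity between the query term set and each document term set."""
    q = set(query.lower().split())
    hits = []
    for doc_id, text in DOCS.items():
        d = set(text.split())
        score = len(q & d) / len(q | d)
        if score > 0:
            hits.append((doc_id, score))
    return hits


BACKENDS: Dict[str, Callable[[str], List[Hit]]] = {
    "keyword": keyword_search,
    "similarity": similarity_search,
}


def unified_query(query: str, methods: List[str]) -> List[Hit]:
    """Run the query on each requested backend; keep the best score per document."""
    best: Dict[str, float] = {}
    for name in methods:
        for doc_id, score in BACKENDS[name](query):
            best[doc_id] = max(score, best.get(doc_id, 0.0))
    # Highest score first; ties broken by document id for a stable order.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
```

Taking the maximum score per document is only one possible merge rule. A real system would also need to normalize scores across mechanisms, which is part of why the integration problem remains hard.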
Solving the problems mentioned above will remain a major goal for researchers in the next few years. To this end, we present in this paper a novel architecture for massive information management of digital resources. The architecture is intended to meet the requirements of managing digital resources that are distributed, dynamic, massive, and heterogeneous.
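The multi-level storage and online migration mentioned in item (3) can be illustrated with a simple demotion policy. This is a sketch under assumptions, not a design taken from the paper: the three tier names, the 90-day idle threshold, and the `StoredObject` and `migrate` names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# ASSUMPTION: three tiers ordered from fastest to slowest; names are illustrative.
TIERS = ["online_disk", "nearline_tape", "offline_archive"]


@dataclass
class StoredObject:
    object_id: str
    tier: str
    days_since_access: int


def migrate(objects: List[StoredObject], idle_days: int = 90) -> List[str]:
    """Demote each object idle for at least `idle_days` by one tier.

    Objects already on the slowest tier stay where they are.
    Returns the ids of the objects that were moved.
    """
    moved = []
    for obj in objects:
        level = TIERS.index(obj.tier)
        if obj.days_since_access >= idle_days and level < len(TIERS) - 1:
            obj.tier = TIERS[level + 1]
            moved.append(obj.object_id)
    return moved
```

For example, an object on `online_disk` that has been idle for 120 days moves to `nearline_tape`, while a recently accessed object stays on disk.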
2 Overview of the Related Work
The IEEE STD 610.12 [2] defines architecture as the structure of components, their relat