Reaching over the gap:
A review of efforts to link human and automatic speech recognition research
1. Introduction
Both the research fields of human speech recognition (HSR) and automatic speech recognition (ASR) investigate (parts of) the speech recognition process. The two research areas are closely related since they both study the speech recognition process and the central issue of both is word recognition. However, their research objectives, their research approaches, and the way HSR and ASR systems deal with different aspects of the word recognition process differ considerably (see Section 2). In short, in HSR research, the goal is to understand how we, as listeners, recognise spoken utterances. This is often done by building computational models of HSR, which can be used for the simulation and explanation of behavioural data related to the human speech recognition process. The aim of ASR research is to build algorithms that are able to recognise the words in a speech utterance automatically, under a variety of conditions, with the least possible number of recognition errors. Much research effort in ASR has therefore been put into the improvement of, amongst others, signal representations, search methods, and the robustness of ASR systems in adverse conditions.
One might expect that in the past this common goal would have resulted in close collaborations between the two disciplines, but in reality, the opposite is true. This lack of communication is most likely to be attributed to another difference between ASR and HSR. Although both ASR and HSR claim to investigate the whole recognition process from the acoustic signal to the recognised units, an automatic speech recogniser necessarily is an end-to-end system – it must be able to recognise words from the acoustic signal – while most models of HSR only cover parts of the human speech recognition process (Nearey, 2001; Moore and Cutler, 2001). Furthermore, in ASR, the algorithms and the way to train the ASR systems are completely understood from a mathematical point of view, but in practice it has so far proved impossible to get the details sufficiently right to achieve a recognition performance that is even close to human performance. Human listeners, on the other hand, achieve superior performance, but many of the details of the internal processes are unknown.
Despite this gap that separates the two research fields, there is a growing interest in possible cross-fertilisation (Furui, 2001; Hermansky, 2001; Huckvale, 1998; Kirchhoff and Schimmel, 2005; Moore, 1995; Moore and Cutler, 2001; Pols, 1999; Scharenborg, 2005a,b; Scharenborg et al., 2005; ten Bosch, 2001). (For a historical background on the emergence of this gap, see Huckvale (1998).) This is, for instance, clearly illustrated by the organisation of the workshop on Speech recognition as pattern classification (July 11–13, 2001, Nijmegen, The Netherlands), the organisation of the special session Bridging the gap between human and automatic speech processing at Interspeech 2005 (September 6, 2005, Lisbon, Portugal) and of course with the coming about of this ‘Speech Communication’ special issue on Bridging the gap between human and automatic speech processing.
It is generally acknowledged within the ASR community that the improvement in ASR performance observed in the last few years can to a large extent be attributed to an increase in computing power and the availability of more speech material to train the ASR systems (e.g., Bourlard et al., 1996; Moore and Cutler, 2001). However, the incremental performance is asymptoting to a level that falls short of human performance. It is to be expected that further increasing the amount of training data for ASR systems will not result in recognition performances that are even approaching the level of human performance (Moore, 2001, 2003; Moore and Cutler, 2001). What seems to be needed is a change in approach – even if this means an initial worsening of the recognition performance (Bourlard et al., 1996). As pointed out by Moore and Cutler (2001): “true ASR progress is not only dependent on the analysis of ever more data, but on the development of more structured models which better exploit the information available in existing data”. ASR engineers hope to get clues about those “structured models” from the results of research in HSR. Thus, from the point of view of ASR, there is hope of improving ASR performance by incorporating essential knowledge about HSR into current ASR systems (Carpenter, 1999; Dusan and Rabiner, 2005; Furui, 2001; Hermansky, 1998; Maier and Moore, 2005; Moore, 2003; Moore and Cutler, 2001; Pols, 1999; Scharenborg et al., 2007; Strik, 2003, 2006; Wright, 2006).
With respect to the field of HSR, specific strands in HSR research hope to deploy ASR approaches to integrate partial modules into a convincing end-to-end model (Nearey, 2001). As pointed out above, computational models of HSR only model parts of the human speech recognition process; an integral model covering all stages of the human speech recognition process does not yet exist. The most conspicuous part of the recognition process that, until recently, virtually all models of human speech recognition took for granted is a module that converts the acoustic signal into some kind of symbolic segmental representation. So, unlike ASR systems, most existing HSR models cannot recognise real speech, because they do not take the acoustic signal as their starting point. In 2001, Nearey pointed out that the only working models of lexical access that take an acoustic signal as input were ASR systems. Mainstream ASR systems however are usually implementations of a specific computational paradigm and their representations and processes need not be psychologically plausible. In their computational analysis of the HSR and ASR speech recognition process, Scharenborg et al. (2005) showed that some ASR algorithms serve the same functions as analogous HSR mechanisms. Thus despite first appearances, this makes it possible to use certain ASR algorithms and techniques in order to build and test more complete computational models of HSR (Roy and Pentland, 2002; Scharenborg, 2005b; Scharenborg et al., 2003, 2005; Wade et al., 2002; Yu et al., 2005).
Furthermore, within the field of ASR there are many (automatically or hand-annotated) speech and language corpora. These corpora can easily and quickly be analysed using ASR systems, since ASR systems and tools are able to process large amounts of data in a (relatively) short time. This makes ASR techniques valuable tools for the analysis and selection of speech samples for behavioural experiments and the modelling of human speech recognition (de Boer and Kuhl, 2003; Kirchhoff and Schimmel, 2005; Pols, 1999).
Researchers from both ASR and HSR are thus realising the potential benefit of looking at the research field on the other side of the gap. This paper intends to give a comprehensive overview of past and present efforts to link human and automatic speech recognition research. The focus of the paper is on the mutual benefits to be derived from establishing closer collaborations and knowledge interchange between ASR and HSR. First a brief overview of the goals and research approaches in HSR and ASR is given in Section 2. In Section 3, we discuss the performance difference between machines and human listeners for various recognition tasks and what can be learned from comparing those recognition performances. Section 4 discusses approaches for improving the computational modelling of HSR by using ASR techniques. Section 5 presents an overview of the research efforts aimed at using knowledge from HSR to improve ASR recognition performance. The paper ends with a discussion and concluding remarks.
2. Human and automatic speech recognition
In this section, the research fields and approaches of human speech recognition (Section 2.1) and automatic speech recognition (Section 2.2) will be discussed briefly. A more detailed explanation of the two research fields would be beyond the scope of this article; the reader is referred to textbook accounts such as Harley (2001) for an in-depth coverage of the research field of human speech recognition, and to textbook accounts such as Rabiner and Juang (1993) or Holmes and Holmes (2002), for an explanation of the principles of automatic speech recognition. Comprehensive comparisons of the research goals and approaches of the two fields (Huckvale, 1998; Moore and Cutler, 2001; Scharenborg et al., 2005), as well as comparisons of the computational functioning and architectures (Scharenborg et al., 2005) of the speech recognition process in automatic recognition systems and human listeners can also be found in the literature. Furthermore, Dusan and Rabiner (2005) provide an extensive comparison between human and automatic speech recognition along six key dimensions (including the architecture of the speech recognition system) of ASR.
2.1. Human speech recognition
To investigate the properties underlying the human speech recognition process, HSR experiments with human subjects are usually carried out in a laboratory environment. Subjects are asked to carry out various tasks, such as:
• Auditory lexical decision:
Spoken words and non-words are presented in random order to a listener, who is asked to identify the presented items as a word or a non-word.
• Phonetic categorisation:
Identification of unambiguous and ambiguous speech sounds on a continuum between two phonemes.
• Sequence monitoring:
Detection of a target sequence (larger than a phoneme, smaller than a word), which may be embedded in a sentence or list of words/non-words, or in a single word or non-word.
• Gating:
A word is presented in segments of increasing duration and subjects are asked to identify the word being presented and to give a confidence rating after each segment.
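The gating task in the list above can be made concrete with a small sketch of how the stimuli might be prepared. This is only an illustration under stated assumptions: the function name `make_gates`, the 50 ms gate step, and the all-zero placeholder "word" are hypothetical and do not come from the studies reviewed here; the text specifies only that segments increase in duration.

```python
from typing import List, Sequence


def make_gates(samples: Sequence[float], sample_rate: int,
               gate_ms: int = 50) -> List[List[float]]:
    """Cut one word into fragments that all start at word onset.

    Each fragment is `gate_ms` longer than the previous one; the last
    fragment is the complete word. The 50 ms default is an assumption.
    """
    if sample_rate <= 0 or gate_ms <= 0:
        raise ValueError("sample_rate and gate_ms must be positive")
    step = max(1, sample_rate * gate_ms // 1000)  # samples added per gate
    gates: List[List[float]] = []
    end = step
    while end < len(samples):
        gates.append(list(samples[:end]))  # onset up to the current gate
        end += step
    if len(samples) > 0:
        gates.append(list(samples))  # final gate: the whole word
    return gates


# Hypothetical 125 ms "word" sampled at 8 kHz (1000 samples of silence).
word = [0.0] * 1000
gates = make_gates(word, sample_rate=8000, gate_ms=50)
gate_lengths = [len(g) for g in gates]  # lengths of successive fragments
```

In an actual experiment each fragment would be played to the subject in order, with an identification response and a confidence rating collected after every gate.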
In these experiments, va