English Literature
Improved speech recognition method for intelligent robot
1 Overview of speech recognition
Speech recognition has received more and more attention recently because of its theoretical significance and practical value [5]. Up to now, most speech recognition has been based on conventional linear system theory, such as the Hidden Markov Model (HMM) and Dynamic Time Warping (DTW). As the study of speech recognition has deepened, it has been found that the speech signal is a complex nonlinear process. If research on speech recognition is to achieve a breakthrough, nonlinear-system theory must be introduced into it. Recently, with the development of nonlinear-system theories such as artificial neural networks (ANN), chaos and fractals, it has become possible to apply these theories to speech recognition. Therefore, this paper is based on ANN, and chaos and fractal theories are introduced into the speech recognition process.
Speech recognition is divided into two types: speaker dependent and speaker independent. Speaker dependent refers to a pronunciation model trained by a single person: the recognition rate for the training person's commands is high, while other people's commands are recognized at a low rate or cannot be recognized at all. Speaker independent refers to a pronunciation model trained by persons of different ages, sexes and regions, so it can recognize the commands of a group of people. Generally, speaker-independent systems are more widely used, since the user is not required to conduct the training. So extraction of speaker-independent features from the speech signal is the fundamental problem of a speaker recognition system.
Speech recognition can be viewed as a pattern recognition task that includes training and recognition. Generally, the speech signal can be viewed as a time sequence and characterized by the powerful hidden Markov model (HMM). Through feature extraction, the speech signal is converted into feature vectors that act as observations. In the training procedure, these observations are used to estimate the model parameters of the HMM. These parameters include the probability density functions of the observations for their corresponding states, the transition probabilities between the states, etc. After parameter estimation, the trained models can be used for the recognition task. The input observations are recognized as the resulting words, and the accuracy can be evaluated. The whole process is illustrated in Fig. 1.
Fig. 1 Block diagram of speech recognition system
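The recognition step described above can be sketched as follows. This is only a minimal illustration under stated assumptions: it uses a discrete-observation HMM (for example, vector-quantized feature vectors) rather than the continuous probability density functions mentioned in the text, and the function names and the two toy word models are hypothetical. Each word has its own trained model, and the input is assigned to the word whose model gives the highest likelihood, computed with the scaled forward algorithm.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM (scaled forward algorithm)."""
    n_states = len(start)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        # Rescale at every step to avoid underflow; the logs of the scales sum to log P.
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob


def recognize(obs, models):
    """Pick the word whose HMM gives the observation sequence the highest likelihood."""
    return max(models, key=lambda word: forward_log_likelihood(obs, *models[word]))


# Hypothetical toy models: (start, transition, emission) with 2 states and 2 symbols.
TOY_MODELS = {
    "forward": ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]),
    "back": ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.1, 0.9], [0.8, 0.2]]),
}
```

Calling `recognize([0, 0, 1, 1], TOY_MODELS)` scores the sequence against both toy models and returns the better-matching word. Training (estimating the three parameter sets from data) is not shown.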
2 Theory and method
Extraction of speaker-independent features from the speech signal is the fundamental problem of a speaker recognition system. The standard methodology for solving this problem uses Linear Predictive Cepstral Coefficients (LPCC) and Mel-Frequency Cepstral Coefficients (MFCC). Both of these methods are linear procedures based on the assumption that speaker features have properties caused by the vocal tract resonances. These features form the basic spectral structure of the speech signal. However, the non-linear information in speech signals is not easily extracted by the present feature extraction methodologies. So we use the fractal dimension to measure non-linear speech turbulence.
This paper investigates and implements a speaker identification system using both traditional LPCC and non-linear multiscale fractal dimension feature extraction.
2.1 Linear Predictive Cepstral Coefficients
Linear prediction coefficients (LPC) are a parameter set obtained from linear prediction analysis of speech. They capture correlation characteristics between adjacent speech samples. Linear prediction analysis is based on the following basic concept: a speech sample can be approximated by a linear combination of some past speech samples. A unique set of prediction coefficients can be determined by minimizing the sum of squared differences between the real speech samples in a given short-time analysis frame and the predicted samples.
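The least-squares idea above can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This is one common way of solving for the prediction coefficients, not necessarily the procedure used in this paper. The function names are illustrative, and the sign convention assumed here is s(n) ≈ a_1 s(n-1) + … + a_p s(n-p).

```python
def autocorrelation(frame, max_lag):
    """Short-time autocorrelation r(0..max_lag) of one analysis frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(frame, order):
    """Return (a_1..a_p, residual energy) minimising the squared prediction error."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[k] holds a_k; a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep the remaining coefficients at zero
        # Reflection coefficient for model order m
        k_m = (r[m] - sum(a[k] * r[m - k] for k in range(1, m))) / err
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:], err
```

For a decaying exponential such as `[0.9 ** n for n in range(200)]`, each sample is 0.9 times the previous one, so a second-order analysis should return coefficients close to 0.9 and 0.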
LPC coefficients can be used to estimate the cepstrum of the speech signal. This is a special processing method in short-time cepstrum analysis of speech. The system function of the channel model is obtained by linear prediction analysis as follows.
where p represents the linear prediction order, a_k (k = 1, 2, …, p) represents the prediction coefficients, and the impulse response is represented by h(n). Suppose the cepstrum of h(n) is represented by ĥ(n); then (1) can be expanded as (2). The cepstrum coefficients calculated by (5) are called LPCC, where n represents the LPCC order.
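As a hedged illustration of how cepstral coefficients follow from prediction coefficients, the sketch below uses the standard textbook recursion for an all-pole model 1 / (1 - a_1 z^-1 - … - a_p z^-p) with unit gain. The sign and gain conventions are assumptions and may differ from the formulas intended in this section. The function name is illustrative.

```python
def lpc_to_lpcc(a, n_ceps):
    """Convert prediction coefficients a_1..a_p into cepstral coefficients c_1..c_n."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (the gain term) is ignored in this sketch
    for n in range(1, n_ceps + 1):
        # Direct term a_n exists only while n <= p
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k / n) * c_k * a_{n-k}, with 1 <= n - k <= p
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

For a single-pole model with a_1 = a, the cepstrum is known in closed form, c_n = a^n / n, which gives a quick sanity check of the recursion. A 14-dimensional LPCC frame, as used later in the paper, would correspond to `n_ceps=14`.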
Before extracting the LPCC parameters, we should carry out pre-emphasis, framing, windowing and endpoint detection on the speech signal. The endpoint detection of the Chinese command word “Forward” is shown in Fig. 2; the speech waveform of the Chinese command word “Forward” and the LPCC parameter waveform after endpoint detection are shown in Fig. 3.
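The preprocessing chain named above can be sketched as below. The pre-emphasis factor 0.97, the Hamming window, and the simple short-time-energy threshold for endpoint detection are common choices assumed here for illustration. The text does not state which parameters or which endpoint detection rule were actually used.

```python
import math


def pre_emphasize(signal, alpha=0.97):
    """First-order high-pass filter: y(n) = x(n) - alpha * x(n-1)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def frame_signal(signal, frame_len, hop):
    """Split the signal into frames of frame_len samples, advancing by hop samples."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def hamming(frame):
    """Apply a Hamming window to one frame."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]


def detect_endpoints(frames, threshold_ratio=0.1):
    """Return (first, last) index of frames whose energy exceeds a fraction of the peak energy."""
    if not frames:
        return None
    energies = [sum(x * x for x in f) for f in frames]
    threshold = threshold_ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return (active[0], active[-1]) if active else None
```

Only the frames between the detected endpoints would then be passed on to LPCC and fractal dimension extraction.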
2.2 Speech Fractal Dimension Computation
The fractal dimension is a quantitative value derived from the scaling relation that defines a fractal, and is also a measure of the self-similarity of its structure. The measure of a fractal is its fractal dimension [6-7]. From the viewpoint of measurement, the fractal dimension extends dimension from integers to fractions, breaking the limit that the dimension of a general topological set must be an integer. The fractal dimension, mostly a fraction, is an extension of dimension in Euclidean geometry.
There are many definitions of fractal dimension, e.g., similarity dimension, Hausdorff dimension, information dimension, correlation dimension, capacity dimension, box-counting dimension, etc. Of these, the Hausdorff dimension is the oldest and also the most important; for any set, it is defined as [3].
where M_ε(F) denotes how many units of size ε are needed to cover the subset F. In this paper, the box-counting dimension (D_B) of F is obtained by partitioning the plane with a square grid of side ε and counting the number of squares N(ε) that intersect F, and is defined as [8].
The speech waveform of the Chinese command word “Forward” and the fractal dimension waveform after endpoint detection are shown in Fig. 4.
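A minimal sketch of a box-counting estimate for a sampled waveform is given below. It assumes the usual textbook definition D_B = lim (ε→0) ln N(ε) / ln(1/ε), rescales the waveform into the unit square, and estimates D_B as the least-squares slope of ln N(ε) against ln(1/ε) over a few grid sizes. The grid sizes, the normalization, and the function names are illustrative assumptions rather than the paper's settings.

```python
import math


def count_boxes(points, n):
    """Count squares of an n-by-n grid on the unit square touched by the polyline."""
    boxes = set()
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Sample each segment more finely than the grid spacing (side 1/n).
        steps = max(1, math.ceil(max(abs(x1 - x0), abs(y1 - y0)) * n * 4))
        for s in range(steps + 1):
            t = s / steps
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            # Clamp so that coordinates equal to 1.0 fall into the last row or column.
            boxes.add((min(int(x * n), n - 1), min(int(y * n), n - 1)))
    return len(boxes)


def box_counting_dimension(samples, grid_sizes=(4, 8, 16, 32, 64)):
    """Estimate D_B as the slope of ln N(eps) versus ln(1/eps), with eps = 1/n."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0  # a flat waveform maps to a horizontal line
    m = len(samples)
    points = [(i / (m - 1), (v - lo) / span) for i, v in enumerate(samples)]
    xs = [math.log(n) for n in grid_sizes]
    ys = [math.log(count_boxes(points, n)) for n in grid_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return numerator / sum((x - mean_x) ** 2 for x in xs)
```

A smooth ramp gives an estimate close to 1, while a rapidly alternating signal that fills the square gives an estimate close to 2. Applying the function to each analysis frame would yield a fractal dimension sequence of the kind described above. The estimate depends on the chosen range of grid sizes.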
2.3 Improved feature extraction method
Considering the respective advantages of LPCC and fractal dimension in expressing the speech signal, we mix both to form the feature signal: the fractal dimension denotes the self-similarity, periodicity and randomness of the speech time waveform, while the LPCC feature is good for speech quality and gives a high identification rate.
Owing to the obvious advantages of ANN, such as nonlinearity, self-adaptability, robustness and self-learning, its good classification and input-output mapping abilities are suitable for solving the speech recognition problem.
Because the number of ANN input nodes is fixed, time regularization is applied to the feature parameters before they are input to the neural network [9]. In our experiments, the LPCC and fractal dimension of each sample pass through the time regularization network separately: LPCC is regularized to 4 frames of data (LPCC1, LPCC2, LPCC3, LPCC4, each frame parameter being 14-D), and fractal dimension is regularized to 12 frames of data (FD1, FD2, …, FD12, each frame parameter being 1-D), so that the feature vector of each sample has 4*14+1*12=68 dimensions. The order is: the first 56 dimensions are LPCC, and the remaining 12 dimensions are fractal dimensions. Thus, such a mixed feature parameter can show both the linear and nonlinear characteristics of speech.
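The layout of the 68-dimensional mixed feature vector can be sketched as follows. The time regularization step here is a stand-in: it simply averages contiguous segments of frames, which is an assumption for illustration and not the time regularization network cited in [9]. The function names are hypothetical.

```python
def time_regularize(frames, n_out):
    """Reduce a variable-length list of feature vectors to n_out vectors by segment averaging."""
    t = len(frames)
    if t < n_out:
        raise ValueError("need at least n_out input frames")
    dim = len(frames[0])
    out = []
    for j in range(n_out):
        # Integer boundaries split the t frames into n_out non-empty contiguous segments.
        segment = frames[j * t // n_out:(j + 1) * t // n_out]
        out.append([sum(f[d] for f in segment) / len(segment) for d in range(dim)])
    return out


def build_feature_vector(lpcc_frames, fd_values):
    """Concatenate 4 x 14-D LPCC frames and 12 x 1-D fractal dimensions into a 68-D vector."""
    lpcc_fixed = time_regularize(lpcc_frames, 4)
    fd_fixed = time_regularize([[v] for v in fd_values], 12)
    # Order follows the text: the first 56 values are LPCC, the last 12 are fractal dimensions.
    return [x for frame in lpcc_fixed for x in frame] + [f[0] for f in fd_fixed]
```

Whatever the number of frames in an utterance, the output always has 4*14 + 12 = 68 entries, which matches the fixed number of ANN input nodes.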
Architectures and Features of ASR
ASR is a cutting-edge technology that allows a computer or even a hand-held PDA (Myers, 2000) to identify words that are read aloud or spoken into any sound-recording device. The ultimate purpose of ASR technology is to allow 100% accuracy with all words that are intelligibly spoken by any person regardless of vocabulary size, background noise, or speaker variables (CSLU, 2002). However, most ASR engineers admit that the current accuracy level for a large vocabulary unit of speech (e.g., the sentence) remains less than 90%. Dragon's Naturally Speaking or IBM's ViaVoice, for example, show a baseline recognition accuracy of only 60% to 80%, depending upon accent, background noise, type of utterance, etc. (Ehsani & Knodt, 1998). More expensive systems that are reported to outperform these two are Subarashii (Bernstein, et al., 1999), EduSpeak (Franco, et al., 2001), Phonepass (Hinks, 2001), ISLE Project (Menzel, et al., 2001) and RAD (CSLU, 2003). ASR accuracy is expected to improve.
Among several types of speech recognizers used in ASR products, both implemented and proposed, the Hidden Markov Model (HMM) is one of the most dominant algorithms and has proven to be an effective method of dealing with large units of speech (Ehsani & Knodt, 1998). Detailed descriptions of how the HMM model works go beyond the scope of this paper and can be found in any text concerned with language processing; among the best are Jurafsky & Martin (2000) and Hosom, Cole, and Fanty (2003). Put simply, HMM computes the probable match between the input it receives and phonemes contained in a database of hundreds of native speaker recordings (Hinks, 2003, p. 5). That is, a speech recognizer based on HMM computes how close the phonemes of a spoken input are to a corresponding model, based on probability theory. High likelihood represents good pronunciation; low likelihood represents poor pronunciation (Larocca, et al., 1991).
While ASR has been commonly used for such purposes as business dictation and special needs accessibility, its market presence for language learning has increased dramatically in recent years (Aist, 1999; Eskenazi, 1999; Hinks, 2003). Early ASR-based software programs adopted template-based recognition systems which perform pattern matching using dynamic programming or other time normalization techniques (Dalby & Kewley-Port, 1999). These programs include Talk to Me (Auralog, 1995), the Tell Me More Series (Auralog, 2000), Triple-Play Plus (Mackey & Choi, 1998), New Dynamic English (DynEd, 1997), English Discoveries (Edusoft, 1998), and