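Chinese-English Parallel Translation of Foreign Literature on Audio and Speech Signal Analysis.docx
- Document ID: 28816068
- Upload date: 2023-07-19
- Format: DOCX
- Pages: 17
- Size: 345.98 KB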
Chinese-English parallel translation of foreign literature
Original text:
Time-Varying Autoregressive Modeling of Audio and Speech Signals
Conventional linear predictive techniques for modeling of speech and audio signals are based on an assumption that a signal is stationary within each analysis frame. However, natural signals are often continuously time-varying, i.e., nonstationary. Therefore this assumption might not be well justified. In this paper, we study a time-varying autoregressive (TVAR) modeling technique in which this restriction is relaxed. A frequency-warped formulation of the Subba Rao-Liporace TVAR algorithm is introduced in this article. The applicability of the presented methodology to various speech and audio signal processing tasks is illustrated and discussed. It is also shown that the TVAR scheme yields an efficient parametrization for time-varying sounds.
1 Introduction
Linear prediction, LP, is a standard technique in speech and audio coding [1] and in several other applications for analysis and synthesis of audio and speech signals. Conventional LP techniques utilize frame-based autoregressive spectral modeling, e.g., the autocorrelation or autocovariance method of LP. In conventional LP, it is assumed that a signal $x(t)$ can be expressed as a linear combination of its previous samples by

$$x(t) = \sum_{k=1}^{p} a_k\, x(t-k) + e(t). \tag{1}$$

Here, $a_k$ are a fixed set of $p$ coefficients which can be estimated from a signal frame of $T$ samples, and $e(t)$ is a prediction error, or residual, signal.
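As a minimal sketch of the autocorrelation method for model (1), assuming NumPy: the frame is Hamming-windowed (the window used for the LPC comparison later in this paper), the autocorrelation values $r(0), \dots, r(p)$ are computed, and the Toeplitz normal equations are solved directly with `np.linalg.solve` rather than with the Levinson-Durbin recursion that is normally used. The function names `lp_autocorrelation` and `lp_residual` are hypothetical.

```python
import numpy as np

def lp_autocorrelation(frame, p, window=True):
    """Estimate a_1..a_p of x(t) = sum_k a_k x(t-k) + e(t) by the autocorrelation method."""
    x = np.asarray(frame, dtype=float)
    if window:
        x = x * np.hamming(len(x))  # Hamming window (an assumption of this sketch)
    # Autocorrelation r(0..p) of the windowed frame
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(p + 1)])
    # Toeplitz normal equations: sum_j a_j r(|i - j|) = r(i), i = 1..p
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])

def lp_residual(x, a):
    """Inverse filter: e(t) = x(t) - sum_k a_k x(t-k), with x(t) = 0 before the first sample."""
    x = np.asarray(x, dtype=float)
    e = x.copy()
    for k, ak in enumerate(a, start=1):
        e[k:] -= ak * x[:-k]
    return e
```

For a synthetic AR(2) signal generated with $a_1 = 1.3$ and $a_2 = -0.6$, the estimates come out close to those values, and `lp_residual` with the true coefficients recovers the driving noise $e(t)$.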
In warped LP [2], this is modified by replacing $x(t-k)$ in (1) by $d_k[x(t)]$, which is produced by filtering $x(t)$ with a chain of $k$ first-order allpass filters. The transfer function of the allpass filter is of the form

$$D(z) = \frac{z^{-1} - \lambda}{1 - \lambda z^{-1}}.$$

Here, $\lambda$ is called a warping parameter. With $\lambda = 0$ the system reduces to conventional linear prediction, and other choices yield a modified, warped, frequency representation of the system. In speech and audio applications it is beneficial to choose $\lambda$ such that the new frequency representation approximates that of human hearing [3,4]. The signal model for warped linear prediction, WLP, is given by

$$x(t) = \sum_{k=1}^{p} a_k\, d_k[x(t)] + e(t). \tag{2}$$

This is clearly a generalization of conventional LP (which is obtained if $\lambda = 0$). In practice, the only difference is that all unit delays $z^{-1}$ in the computation of the correlation values and in the implementation of the inverse and synthesis filters are replaced by first-order allpass elements $D(z)$ [5,4].

The estimation of the coefficients in both conventional and warped linear prediction relies on an inherent assumption that the signal is stationary within the analysis frame. In many cases this assumption is reasonable. Nevertheless, speech and audio signals are often nonstationary. Onsets and offsets of musical sounds, brief transients, transitions, and chirps are typical examples of nonstationarities which may occur on a time scale significantly shorter than the typical length of the analysis frame in LP analysis. In fact, these are all typical instances in which conventional LP techniques, e.g., LP-based coders, usually fail. A partial solution to this is to perform the analysis using overlapping frames and to interpolate the coefficients, usually linearly, between adjacent signal frames. However, this approach is more or less arbitrary and is not related to the actual fluctuations in the signal. Time-varying autoregressive, TVAR, models were first introduced in [6] and have thereafter been partially reformulated and applied to speech [7,8]. In this article, a frequency-warped version of the conventional algorithm for direct-form time-varying filters is presented.
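The allpass chain behind (2) can be sketched as follows, assuming NumPy and zero initial filter states. Each call to `allpass` applies $D(z)$ once through the difference equation $y(t) = -\lambda x(t) + x(t-1) + \lambda y(t-1)$, so the $k$-th sequence returned by `warped_sequences` is $d_k[x(t)]$. `warped_lp` follows the warped autocorrelation approach, $r(k) = \sum_t x(t)\, d_k[x(t)]$, i.e., unit delays replaced by $D(z)$ as described above; windowing is omitted for brevity, and all names are hypothetical.

```python
import numpy as np

def allpass(x, lam):
    """One first-order allpass D(z) = (z^-1 - lam) / (1 - lam z^-1), zero initial state."""
    y = np.zeros(len(x))
    x_prev = y_prev = 0.0
    for t, xt in enumerate(x):
        y[t] = -lam * xt + x_prev + lam * y_prev
        x_prev, y_prev = xt, y[t]
    return y

def warped_sequences(x, p, lam):
    """Return [d_0[x], d_1[x], ..., d_p[x]] with d_0[x] = x."""
    d = [np.asarray(x, dtype=float)]
    for _ in range(p):
        d.append(allpass(d[-1], lam))  # one more allpass stage per order
    return d

def warped_lp(frame, p, lam):
    """Warped autocorrelation r(k) = sum_t x(t) d_k[x(t)], then Toeplitz normal equations."""
    d = warped_sequences(frame, p, lam)
    r = np.array([np.dot(d[0], d[k]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])
```

With $\lambda = 0$, `warped_sequences` returns plain delayed copies of the input and `warped_lp` reduces to unwindowed conventional LP.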
2 Warped TVAR Models
In the time-varying case the prediction coefficients $a_k$ in (1) are functions of time, and the model of a signal $x(t)$ is of the form

$$x(t) = \sum_{k=1}^{p} a_k(t)\, x(t-k) + e(t). \tag{3}$$
In (3) there are more parameters than data. Thus there are infinitely many exact solutions of (3), most of which are totally meaningless. The determination of time-varying prediction coefficients is an ill-posed problem, characteristic of the class of inverse problems. Without any further prior information or feasible assumptions the problem is almost impossible to solve. It is possible to use, e.g., adaptive filtering [9] or smoothness priors TVAR techniques [10] to estimate the time-varying coefficients. However, these do not immediately lead to an efficient parametrization of the process. The technique presented in this article is based on the assumption that the coefficient evolutions $a_k(t)$ can be expressed as linear combinations of predefined basis functions $u_m(t)$, $m = 1, \dots, M$, i.e.,

$$a_k(t) = \sum_{m=1}^{M} c_{km}\, u_m(t), \tag{4}$$

where $c_{km}$ are basis coefficients. Substituting (4) into (3) yields

$$x(t) = \sum_{k=1}^{p} \sum_{m=1}^{M} c_{km}\, u_m(t)\, x(t-k) + e(t). \tag{5}$$
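The basis coefficients are obtained by minimizing the residual in the least-squares sense, i.e., by solving

$$\min_{\theta}\ \lVert \mathbf{x} - H\theta \rVert^2, \qquad \mathbf{x} = [x(1), \dots, x(T)]^T. \tag{6}$$

Here the parameter vector is of the form

$$\theta = [c_{11}, \dots, c_{1M}, c_{21}, \dots, c_{2M}, \dots, c_{p1}, \dots, c_{pM}]^T, \tag{7}$$

and the regressor matrix is

$$H = [\varphi(1), \varphi(2), \dots, \varphi(T)]^T, \tag{8}$$

where

$$\varphi(t) = [u_1(t)x(t-1), \dots, u_M(t)x(t-1), \dots, u_1(t)x(t-p), \dots, u_M(t)x(t-p)]^T. \tag{9}$$

In Eqs. (6)-(9), $^T$ denotes transpose. The solution of (6) is formally obtained from

$$\hat{\theta} = (H^T H)^{-1} H^T \mathbf{x},$$

although the use of an orthogonalization algorithm might be appropriate. The time-varying prediction coefficients are then assembled via (4). For the warped TVAR model, $x(t-k)$ in (3), (5) and (9) is replaced by $d_k[x(t)]$, as in (2).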
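A minimal NumPy sketch of the least-squares TVAR estimate (4)-(9) for the unwarped case. Assumptions: the basis functions are supplied as an $M \times T$ array `U`, samples before the frame are taken as zero, and `np.linalg.lstsq` (an SVD-based solver) stands in for the orthogonalization algorithm mentioned above instead of forming $(H^T H)^{-1}$ explicitly. The name `tvar_fit` is hypothetical.

```python
import numpy as np

def tvar_fit(x, p, U):
    """Least-squares TVAR fit of x(t) = sum_k a_k(t) x(t-k) + e(t),
    with a_k(t) = sum_m c_km u_m(t) and U[m, t] = u_{m+1}(t) (shape M x T)."""
    x = np.asarray(x, dtype=float)
    U = np.asarray(U, dtype=float)
    M, T = U.shape
    # Delayed copies x(t-k), k = 1..p; samples before the frame assumed zero
    lags = np.zeros((p, T))
    for k in range(1, p + 1):
        lags[k - 1, k:] = x[: T - k]
    # Row t of H is phi(t) of eq. (9): entry (k, m) is u_m(t) x(t-k), k-major order
    H = (lags[:, None, :] * U[None, :, :]).reshape(p * M, T).T
    # Solve eq. (6) with an SVD-based least-squares solver
    theta, *_ = np.linalg.lstsq(H, x, rcond=None)
    c = theta.reshape(p, M)  # c[k-1, m-1] = c_km, ordering of eq. (7)
    a = c @ U                # a[k-1, t] = a_k(t), eq. (4)
    return c, a
```

With `U` holding only a constant row, the fit reduces to ordinary least-squares LP with constant coefficients; adding rows such as a sigmoid lets $a_k(t)$ change within the frame.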
The computational complexity of the algorithm depends on the number of basis functions and the order of the model. In principle, the computational burden is $M$ times higher than in the usual time-invariant case. However, the modeling capabilities of the time-varying scheme may be better than those of the time-invariant one, so the extra computational burden may be acceptable.
3 Examples
The following highly simplified example illustrates the applicability of TVAR techniques to LPC-based audio coding. In this case, there are only two basis functions and the system is unwarped, i.e., $\lambda = 0$. The signal in Fig. 1 (top) is a segment of an audio signal that begins with the decaying sound of a vibraphone, followed in the middle of the frame by the onset of an acoustic guitar sound.
Figure 1: (Top) An excerpt from a musical signal. (Bottom) Basis functions of the TVAR model.
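The basis functions $u_1(t)$ and $u_2(t)$ are shown in Fig. 1 (bottom). The first basis function is constant and the second one is a sigmoidal function. If only the first basis function were used, the technique would be analogous to conventional frame-based LPC. In this example, the center and the steepness of the second basis function have been chosen using an iterative algorithm that tries to maximize the segmental prediction gain, given by

$$\mathrm{SPG_{dB}} = \frac{1}{S} \sum_{s=1}^{S} 10 \log_{10} \frac{\sigma_{x,s}^2}{\sigma_{e,s}^2}. \tag{10}$$

Here, the original signal and the residual are divided into $S$ segments, and the variances of the $s$-th signal and residual segments are denoted by $\sigma_{x,s}^2$ and $\sigma_{e,s}^2$, respectively.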
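Eq. (10) can be evaluated with a short NumPy sketch. It assumes the $S$ segments are consecutive and of (nearly) equal length, which `np.array_split` provides. It also computes a whole-excerpt prediction gain $10\log_{10}(\sigma_x^2/\sigma_e^2)$ for comparison; that definition of $\mathrm{PG_{dB}}$ is an assumption, since the text does not spell it out. Function names are hypothetical.

```python
import numpy as np

def segmental_prediction_gain_db(x, e, S):
    """Eq. (10): mean over S consecutive segments of 10*log10(var(x_s) / var(e_s))."""
    xs = np.array_split(np.asarray(x, dtype=float), S)
    es = np.array_split(np.asarray(e, dtype=float), S)
    gains = [10.0 * np.log10(np.var(xi) / np.var(ei)) for xi, ei in zip(xs, es)]
    return float(np.mean(gains))

def prediction_gain_db(x, e):
    """Whole-excerpt prediction gain 10*log10(var(x) / var(e)) (assumed definition)."""
    return float(10.0 * np.log10(np.var(x) / np.var(e)))
```

Because SPG averages decibel values segment by segment, a poorly modeled low-energy segment lowers it as much as any other segment, whereas in the whole-excerpt gain such a segment is swamped by high-energy ones.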
The final result of the TVAR analysis, that is, the coefficient evolution sequences over the signal frame obtained with the basis functions of Fig. 1, is shown in Fig. 2. In this example, the filters corresponding to the TVAR models are stable at each time instant. The potential instabilities of TVAR models [7] can be handled, for example, with methods discussed in [11,12].
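One simple way to check that the TVAR filters are stable at each time instant is to freeze the coefficients at every $t$ and test whether all roots of $z^p - a_1(t) z^{p-1} - \dots - a_p(t)$ lie inside the unit circle. The sketch below assumes NumPy and the unwarped model (3); frozen-time stability is only a heuristic check and does not by itself guarantee stability of the time-varying recursion. The function name is hypothetical.

```python
import numpy as np

def is_stable_at_each_instant(a):
    """a: p x T array of a_k(t) for x(t) = sum_k a_k(t) x(t-k) + e(t).
    Returns a length-T boolean array, True where the frozen-time filter
    1 / (1 - sum_k a_k(t) z^-k) has all poles strictly inside the unit circle."""
    a = np.asarray(a, dtype=float)
    p, T = a.shape
    stable = np.empty(T, dtype=bool)
    for t in range(T):
        # Characteristic polynomial z^p - a_1(t) z^(p-1) - ... - a_p(t)
        poles = np.roots(np.concatenate(([1.0], -a[:, t])))
        stable[t] = np.all(np.abs(poles) < 1.0)
    return stable
```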
Figure 2: Time-varying coefficient evolutions $a_k(t)$.
The prediction error signal, or residual, of a 10th-order TVAR model is shown in Fig. 3 (top). The residual of a conventional 20th-order LPC model is shown in Fig. 3 (bottom). In the latter case the LPC model has been estimated using a Hamming window and the autocorrelation method of LP. Since there are only two basis functions in the TVAR model, the number of parameters is the same in both cases (10 × 2 = 20). However, in the TVAR scheme it is necessary to transmit two additional parameters which describe the center and steepness of the basis function. The prediction gain, $\mathrm{PG_{dB}}$, over the whole signal excerpt is approximately the same in the two cases. $\mathrm{SPG_{dB}}$ is higher for the TVAR model: 43 dB for the TVAR model and 38 dB for the LPC model. The difference between the two techniques is even larger in the vibraphone part of the signal, where the $\mathrm{PG_{dB}}$ of the TVAR model exceeds that of the LPC model by more than 6 dB. This means that the LPC model for the beginning of the excerpt is inaccurate. In a coding application this would probably produce an artifact which is sometimes called the pre-echo effect.
Figure 3: Residual signal in the case of a 10th-order TVAR model (top) and a 20th-order conventional LPC model (bottom).
The top panel of Fig. 4 shows the waveform of a male speech utterance /ma/ at a sampling rate of 10 kHz. The middle panel shows a set of seven prolate spheroidal basis functions and a constant function. The bottom panel shows the coefficient evolutions of a 12th-order warped TVAR model of the signal. At this sampling rate, $\lambda = 0.46$ yields a frequency representation which is very close to the frequency resolution of human hearing [3]. Computing the frequency response of the time-varying filter at each time instant produces a representation which is here called an all-pole spectrogram.
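The value $\lambda = 0.46$ at 10 kHz agrees with a widely used closed-form approximation by Smith and Abel for the allpass warping parameter that best matches the Bark scale. The sketch below evaluates that approximation; it is not taken from this paper, and the name `bark_warping_lambda` is hypothetical.

```python
import math

def bark_warping_lambda(fs_hz):
    """Smith-Abel approximation of the Bark-matching allpass warping parameter."""
    return 1.0674 * math.sqrt((2.0 / math.pi) * math.atan(0.06583 * fs_hz / 1000.0)) - 0.1916
```

`bark_warping_lambda(10000)` evaluates to about 0.458, which rounds to the 0.46 quoted above; higher sampling rates give larger values of $\lambda$.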
Figure 4: (Top) Original male speech utterance /ma/. (Middle) A set of prolate spheroidal basis functions. (Bottom) Coefficient evolutions.
For the coefficient evolutions of Fig. 4, the all-pole spectrogram is shown in Fig. 5d. Fig. 5b shows the all-pole spectrogram corresponding to the unwarped case, i.e., $\lambda = 0$.
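An all-pole spectrogram as described above can be sketched by evaluating the frozen-time frequency response $G / A_t(e^{j\omega})$ with $A_t(z) = 1 - \sum_k a_k(t) D(z)^k$, so that $\lambda = 0$ gives the unwarped case and a nonzero $\lambda$ the warped one. Assumptions: NumPy, a constant gain `gain`, a uniform grid of `n_freq` frequencies from 0 to the Nyquist frequency, and hypothetical names throughout.

```python
import numpy as np

def allpole_spectrogram_db(a, lam=0.0, n_freq=256, gain=1.0):
    """a: p x T coefficient evolutions a_k(t). Returns an n_freq x T array of
    20*log10(gain / |A_t(e^{jw})|), with A_t(z) = 1 - sum_k a_k(t) D(z)^k."""
    a = np.asarray(a, dtype=float)
    p, T = a.shape
    w = np.linspace(0.0, np.pi, n_freq)             # 0 .. Nyquist (rad/sample)
    z_inv = np.exp(-1j * w)
    D = (z_inv - lam) / (1.0 - lam * z_inv)         # allpass frequency response
    Dk = D[:, None] ** np.arange(1, p + 1)[None, :]  # n_freq x p, D(e^{jw})^k
    A = 1.0 - Dk @ a                                # n_freq x T
    return 20.0 * np.log10(gain / np.abs(A))
```

Because $D(e^{j0}) = 1$ and $D(e^{j\pi}) = -1$, warping leaves the values at 0 Hz and at the Nyquist frequency unchanged and only redistributes frequency resolution in between.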
Figure 5: All-pole spectrograms corresponding to the estimated parametric models.
Panels a and c in Fig. 5 show all-pole spectrograms estimated using conventional and warped LP, respectively, with a 40 ms Hanning window and the model estimated at intervals of 1 ms. The total number of parameters for panels a and c is 200 × 12 = 2400, while in panels b and d it is only 8 × 12 = 96. As expected, the use of warping in panels c and d enhances frequency resolution at low frequencies; e.g., the second formant and the nasal formants at low frequencies are clearly visible in the two bottom panels. According to psychoacoustic theories and listening test results [4], this is advantageous in many speech and audio applications.
4 Discussion
The representation of time-varying coefficients as a linear combination of predefined basis func