Audio and Speech Signal Analysis: Foreign Literature Translation with Chinese-English Parallel Text

Original text:

Time-Varying Autoregressive Modeling of Audio and Speech Signals

Conventional linear predictive techniques for modeling of speech and audio signals are based on an assumption that a signal is stationary within each analysis frame. However, natural signals are often continuously time-varying, i.e., nonstationary. Therefore this assumption might not be well justified. In this paper, we study a time-varying autoregressive (TVAR) modeling technique in which this restriction is relaxed. A frequency-warped formulation of the Subba Rao-Liporace TVAR algorithm is introduced in the article. The applicability of the presented methodology to various speech and audio signal processing tasks is illustrated and discussed. It is also shown that the TVAR scheme yields an efficient parametrization for time-varying sounds.

1 Introduction

Linear prediction, LP, is a standard technique in speech and audio coding [1] and in several other applications for analysis and synthesis of audio and speech signals. Conventional LP techniques utilize frame-based autoregressive spectral modeling, e.g., the autocorrelation or autocovariance method of LP. In conventional LP, it is assumed that a signal $x(t)$ can be expressed as a linear combination of the previous samples by

$$x(t) = \sum_{k=1}^{p} a_k\, x(t-k) + e(t) \qquad (1)$$

Here, $a_k$ are a fixed set of $p$ coefficients which can be estimated from a signal frame of $T$ samples, and $e(t)$ is a prediction error, or residual, signal.
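As a small illustration of the model in (1), the following Python sketch computes the residual $e(t)$ for a given coefficient set. The function name `lp_residual`, the zero initial conditions, and the toy signal are assumptions made for this example; the sketch does not estimate the coefficients.

```python
def lp_residual(x, a):
    """Return e(t) = x(t) - sum_k a[k-1] * x(t-k), taking x(t) = 0 for t < 0."""
    p = len(a)
    residual = []
    for t in range(len(x)):
        # Prediction from the p previous samples (fewer at the start of the frame).
        prediction = sum(a[k - 1] * x[t - k] for k in range(1, p + 1) if t - k >= 0)
        residual.append(x[t] - prediction)
    return residual


# Illustrative first-order signal x(t) = 0.5 * x(t-1), started from x(0) = 1.
x = [0.5 ** t for t in range(8)]
e = lp_residual(x, [0.5])
```

For this toy signal the predictor matches the generating recursion, so the residual is nonzero only at the first sample.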

In warped LP [2], this is modified by replacing $x(t-k)$ in (1) by $d_k[x(t)]$, which is produced by filtering $x(t)$ with a chain of $k$ first-order allpass filters. The transfer function of the allpass filter is of the form $D(z) = (z^{-1} - \lambda)/(1 - \lambda z^{-1})$. Here, $\lambda$ is called a warping parameter. With $\lambda = 0$ the system reduces to conventional linear prediction, and other choices yield a modified, warped, frequency representation of the system. In speech and audio applications it is beneficial to choose $\lambda$ such that the new frequency representation approximates that of human hearing [3,4]. The signal model for warped linear prediction, WLP, is given by

$$x(t) = \sum_{k=1}^{p} a_k\, d_k[x(t)] + e(t) \qquad (2)$$

This is clearly a generalization of conventional LP (which is obtained if $\lambda = 0$). In practice, the only difference is that all unit delays $z^{-1}$ in the computation of the correlation values and in the implementation of the inverse and synthesis filters are replaced by first-order allpass elements $D(z)$ [5,4].

The estimation of the coefficients in both conventional and warped linear prediction relies on an inherent assumption that the signal is stationary within the analysis frame. In many cases this assumption is reasonable. Nevertheless, speech and audio signals are often nonstationary. Onsets and offsets of musical sounds, brief transients, transitions, and chirps are typical examples of nonstationarities which may occur within a time scale that is significantly shorter than the typical length of the analysis frame in LP analysis. In fact, those are all typical instances in which conventional LP techniques, e.g., LP-based coders, usually fail. A partial solution to this is to perform the analysis using overlapping frames and to interpolate the coefficients, usually linearly, between adjacent signal frames. However, this approach is more or less arbitrary and is not related to the actual fluctuations in the signal. Time-varying autoregressive, TVAR, models were first introduced in [6] and have thereafter been partially reformulated and applied to speech [7,8]. In this article, a frequency-warped version of the conventional algorithm for direct-form time-varying filters is presented.
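The allpass chain that produces the warped signals $d_k[x(t)]$ can be sketched in a few lines of Python. This is a minimal illustration assuming the allpass form $D(z) = (z^{-1} - \lambda)/(1 - \lambda z^{-1})$ and zero initial filter state; the names `allpass` and `warped_delays` are hypothetical and not taken from the paper.

```python
def allpass(u, lam):
    """First-order allpass D(z) = (z^-1 - lam) / (1 - lam * z^-1), zero initial state."""
    y = []
    u_prev = 0.0
    y_prev = 0.0
    for u_t in u:
        # Difference equation: y(t) = -lam * u(t) + u(t-1) + lam * y(t-1)
        y_t = -lam * u_t + u_prev + lam * y_prev
        y.append(y_t)
        u_prev, y_prev = u_t, y_t
    return y


def warped_delays(x, p, lam):
    """Return [d_1[x], ..., d_p[x]], where d_k is x passed through k allpass sections."""
    outputs = []
    signal = list(x)
    for _ in range(p):
        signal = allpass(signal, lam)
        outputs.append(signal)
    return outputs
```

With `lam = 0.0` each section is a plain unit delay, so `warped_delays(x, p, 0.0)` returns the delayed copies $x(t-1), \ldots, x(t-p)$ used in conventional LP.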

2 Warped TVAR Models

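In the time-varying case the prediction coefficients $a_k$ in (1) are functions of time and the model of a signal $x(t)$ is of the form

$$x(t) = \sum_{k=1}^{p} a_k(t)\, x(t-k) + e(t) \qquad (3)$$

In (3) there are more parameters than data. Thus there are infinitely many exact solutions that fulfill (3), most of which are totally meaningless. The determination of time-varying prediction coefficients is an ill-posed problem, which is characteristic of the class of inverse problems. Without any further prior information or feasible assumptions the problem is almost impossible to solve. It is possible to use, e.g., adaptive filtering [9] or smoothness-priors TVAR techniques [10] to estimate the time-varying coefficients. However, these do not immediately lead to an efficient parametrization of the process. The technique presented in this article is based on an assumption that the coefficient evolutions $a_k(t)$ can be expressed as linear combinations of predefined basis functions $f_m(t)$, i.e.,

$$a_k(t) = \sum_{m=1}^{M} c_{km}\, f_m(t) \qquad (4)$$

where $c_{km}$ are basis coefficients. Substitution of (4) into (3) yields

$$x(t) = \sum_{k=1}^{p} \sum_{m=1}^{M} c_{km}\, f_m(t)\, x(t-k) + e(t) \qquad (5)$$

For the warped TVAR model, $x(t-k)$ in (3) and (5) is replaced by $d_k[x(t)]$, as in (2).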

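The basis coefficients are obtained by minimizing the residual in the least-squares sense, i.e., by solving

$$\min_{c}\ \lVert x - Hc \rVert^{2} \qquad (6)$$

where $x = (x(p+1), \ldots, x(T))^{T}$ collects the predicted samples. Here the parameter vector is of the form

$$c = (c_{11}, \ldots, c_{1M}, c_{21}, \ldots, c_{pM})^{T} \qquad (7)$$

and the regressor matrix is

$$H = (h(p+1), h(p+2), \ldots, h(T))^{T} \qquad (8)$$

where

$$h(t) = (f_1(t)\,x(t-1), \ldots, f_M(t)\,x(t-1), f_1(t)\,x(t-2), \ldots, f_M(t)\,x(t-p))^{T} \qquad (9)$$

In Eqs. (7)-(9), $(\cdot)^{T}$ denotes transpose. The solution of (6) is formally obtained from $\hat{c} = (H^{T}H)^{-1}H^{T}x$, although the utilization of an orthogonalization algorithm might be appropriate. The time-varying prediction coefficients are assembled via (4).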
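The least-squares estimation of the basis coefficients can be sketched as follows. This is a dependency-free Python illustration for the unwarped case, not the authors' implementation: it forms the normal equations directly and solves them by Gaussian elimination (rather than an orthogonalization algorithm), uses 0-based time indices, and all names (`solve_linear`, `fit_tvar`, `coefficient_evolutions`) and the synthetic test signal are assumptions made for the example.

```python
def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def fit_tvar(x, p, basis):
    """Least-squares c[k][m] for x(t) ~ sum_k sum_m c[k][m] * f_m(t) * x(t-k).

    basis is a list of M sequences of len(x); basis[m][t] holds f_m(t).
    """
    M = len(basis)
    n = p * M
    HtH = [[0.0] * n for _ in range(n)]
    Htx = [0.0] * n
    for t in range(p, len(x)):
        # Regressor row: f_m(t) * x(t-k), ordered coefficient by coefficient.
        h = [basis[m][t] * x[t - k] for k in range(1, p + 1) for m in range(M)]
        for i in range(n):
            Htx[i] += h[i] * x[t]
            for j in range(n):
                HtH[i][j] += h[i] * h[j]
    c = solve_linear(HtH, Htx)
    return [c[k * M:(k + 1) * M] for k in range(p)]


def coefficient_evolutions(c, basis):
    """Assemble a_k(t) = sum_m c[k][m] * f_m(t) for every k and t."""
    T = len(basis[0])
    return [[sum(ck[m] * basis[m][t] for m in range(len(basis))) for t in range(T)]
            for ck in c]


# Synthetic, noise-free check signal (illustrative values only).
T = 200
basis = [[1.0] * T, [t / T for t in range(T)]]  # constant and linear ramp
true_c = [[1.2, 0.4], [-0.98, 0.0]]
x = [1.0, 1.0]
for t in range(2, T):
    a1 = true_c[0][0] + true_c[0][1] * basis[1][t]
    a2 = true_c[1][0] + true_c[1][1] * basis[1][t]
    x.append(a1 * x[t - 1] + a2 * x[t - 2])
c_hat = fit_tvar(x, 2, basis)
a_hat = coefficient_evolutions(c_hat, basis)
```

Because the synthetic signal is generated without excitation from the same basis expansion, the fit should recover `true_c` up to rounding error. For real signals, a QR-based or otherwise orthogonalized least-squares solver from a numerical library would usually be preferable to the explicit normal equations used here.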

It is observed that the computational complexity of the algorithm depends on the number of basis functions and the order of the model. In principle, the computational burden is M times higher than in the usual time-invariant case. However, the modeling capabilities of the time-varying scheme could be better than those of the time-invariant one. Thus, the extra computational burden may be acceptable.

3 Examples

The following highly simplified example illustrates the applicability of TVAR techniques to LPC-based audio coding. In this case, there are only two basis functions and the system is unwarped, i.e., $\lambda = 0$. The signal in Fig. 1 (top) is a segment of an audio signal in which the beginning is an attenuating vibraphone sound and in the middle of the frame there is an onset of an acoustic guitar sound.

Figure 1: Top: An excerpt from a musical signal. Bottom: Basis functions of the TVAR model.

The basis functions $f_m(t)$ are shown in Fig. 1. The first basis function is constant and the second one is a sigmoidal function. If only the first basis function were used, the technique would be analogous to conventional frame-based LPC. In this example, the center and the steepness of the second basis function have been chosen using an iterative algorithm that tries to maximize the Segmental Prediction Gain, given by

$$\mathrm{SPG}_{dB} = \frac{1}{S} \sum_{s=1}^{S} 10 \log_{10} \frac{\sigma_x^2(s)}{\sigma_e^2(s)} \qquad (10)$$

Here, the original signal and the residual are divided into $S$ segments, and the variances of each signal and residual segment are denoted by $\sigma_x^2(s)$ and $\sigma_e^2(s)$, respectively.
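A sigmoidal basis function and the segmental prediction gain might be computed as in the sketch below. The logistic parametrization by `center` and `steepness`, the equal-length segmentation, and both function names are assumptions made for illustration; the source only states that the second basis function is sigmoidal and that the gain is evaluated over segment variances.

```python
import math
from statistics import pvariance


def sigmoid_basis(T, center, steepness):
    """Sigmoidal basis f(t) = 1 / (1 + exp(-steepness * (t - center))), t = 0..T-1."""
    return [1.0 / (1.0 + math.exp(-steepness * (t - center))) for t in range(T)]


def segmental_prediction_gain_db(x, e, S):
    """Mean over S equal-length segments of 10 * log10(var(x_s) / var(e_s))."""
    seg_len = len(x) // S
    gains = []
    for s in range(S):
        xs = x[s * seg_len:(s + 1) * seg_len]
        es = e[s * seg_len:(s + 1) * seg_len]
        gains.append(10.0 * math.log10(pvariance(xs) / pvariance(es)))
    return sum(gains) / S


# Toy data: the residual has one tenth of the signal amplitude in every segment.
x = [10.0 if t % 2 == 0 else -10.0 for t in range(40)]
e = [0.1 * v for v in x]
spg = segmental_prediction_gain_db(x, e, S=4)
f2 = sigmoid_basis(101, center=50, steepness=0.2)
```

In the toy data every segment has a variance ratio of 100, so the gain comes out at about 20 dB. An iterative search such as the one described above could call `segmental_prediction_gain_db` repeatedly while varying `center` and `steepness`.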

The final result of the TVAR analysis, that is, the coefficient evolution sequences over the signal frame, using the basis functions of Fig. 1, is shown in Fig. 2. In this example, the filters corresponding to the TVAR model are stable at each time instant. The potential instabilities of TVAR models [7] can be handled, for example, with methods discussed in [11,12].

Figure 2: Time-varying coefficient evolutions $a_k(t)$.

The prediction error signal, or residual, of a 10th-order TVAR model is shown in Fig. 3 (top). The residual of a conventional 20th-order LPC is also shown in Fig. 3 (bottom). In the latter case the LPC has been estimated using a Hamming window and the autocorrelation method of LP. Since there are only two basis functions in the TVAR model, the number of parameters in both cases is the same (2 × 10 = 20). However, in the TVAR scheme it is necessary to transmit two additional parameters which describe the center and steepness of the basis function. The prediction gain, PG_dB, over the whole signal excerpt is approximately the same in the two cases. SPG_dB is higher in the case of the TVAR model, i.e., 43 dB for the TVAR model and 38 dB for the LPC model. The difference between the two techniques is even greater in the vibraphone part of the signal, i.e., the PG_dB of the TVAR model exceeds that of the LPC model by more than 6 dB. This means that the LPC model for the beginning of the excerpt is inaccurate. It is probable that in a coding application this would produce an artifact which is sometimes called the pre-echo effect.

Figure 3: Residual signal in the case of a 10th-order TVAR model (top) and a 20th-order conventional LPC model (bottom).

The top panel of Fig. 4 shows a waveform of a male speech utterance /ma/ at the sampling rate of 10 kHz. The middle figure shows a set of seven prolate spheroidal basis functions and a constant function. The bottom figure shows coefficient evolutions of a 12th-order warped TVAR model of the signal. At this sampling rate, $\lambda = 0.46$ yields a frequency representation which is very close to the frequency resolution of human hearing [3]. By computing the frequency response of the time-varying filter at each time instant, one may produce a representation which is here called an all-pole spectrogram.
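One way to compute such an all-pole spectrogram is sketched below: at each time instant the coefficients are frozen and the magnitude response of the synthesis filter $1/(1 - \sum_k a_k(t) D^k)$ is evaluated on a frequency grid. The function names, the frequency grid, and the dB scaling are assumptions for this illustration; the allpass form of $D$ is the standard first-order section, and `lam = 0.0` gives the unwarped case.

```python
import cmath
import math


def allpole_response_db(a, omega, lam=0.0):
    """Magnitude in dB of 1 / (1 - sum_k a[k-1] * D^k) at frequency omega (rad/sample).

    D is the allpass response at omega; lam = 0 gives the unit delay exp(-j * omega).
    """
    z_inv = cmath.exp(-1j * omega)
    D = (z_inv - lam) / (1.0 - lam * z_inv)
    denom = 1.0 - sum(a_k * D ** k for k, a_k in enumerate(a, start=1))
    return -20.0 * math.log10(abs(denom))


def allpole_spectrogram(coeff_evolutions, n_freqs, lam=0.0):
    """Return S[t][i]: response in dB at time t and frequency pi * i / (n_freqs - 1).

    coeff_evolutions[k][t] holds a_{k+1}(t), one sequence per coefficient.
    """
    T = len(coeff_evolutions[0])
    omegas = [math.pi * i / (n_freqs - 1) for i in range(n_freqs)]
    return [[allpole_response_db([ak[t] for ak in coeff_evolutions], w, lam)
             for w in omegas]
            for t in range(T)]
```

The sketch assumes the frozen filter has no pole exactly on the evaluated frequency grid; otherwise the logarithm is undefined.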

Figure 4: (Top) Original male speech utterance /ma/. (Middle) A set of prolate spheroidal basis functions. (Bottom) Coefficient evolutions.

For this coefficient evolution, the all-pole spectrogram is shown in Fig. 5d. Fig. 5b shows the all-pole spectrogram corresponding to the unwarped case, i.e., $\lambda = 0$.

Figure 5: All-pole spectrograms corresponding to the estimated parametric models.

Panels a and c in Fig. 5 show all-pole spectrograms estimated using conventional and warped LP, respectively, such that the length of the Hanning window was 40 ms and the model was estimated at intervals of 1 ms. The total number of parameters for panels a and c is 200 × 12 = 2400, while in panels b and d it is only 8 × 12 = 96. As expected, the use of warping in panels c and d enhances frequency resolution at low frequencies, e.g., the second formant and the nasal formants at low frequencies are clearly visible in the two bottom panels. According to psychoacoustic theories and listening test results [4], this is advantageous in many speech and audio applications.
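The parameter counts quoted above follow from simple arithmetic, reproduced here as a short Python check. The helper name `parameter_count` is hypothetical; the numbers are the ones stated in the text.

```python
def parameter_count(order, sets_per_signal):
    """Number of stored coefficients: model order times the number of coefficient sets."""
    return order * sets_per_signal


frame_based = parameter_count(order=12, sets_per_signal=200)  # one LP model per 1 ms step
tvar = parameter_count(order=12, sets_per_signal=8)           # 7 prolate functions + 1 constant
ratio = frame_based / tvar
```

Under these figures the TVAR representation uses 25 times fewer parameters than the frame-based one.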

4 Discussion

The representation of time-varying coefficients as a linear combination of predefined basis func