
Application of Matlab in Speech Recognition


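The parts of this feature that duplicate the code above are omitted; only the newly added code is given here:

2.5 Speech Recognition

The opened speech sample is matched against a library of pre-recorded speech templates using the DTW (dynamic time warping) algorithm.

Once recognition finishes, the recognized text is displayed in the corresponding text box.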
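As a rough illustration of the DTW matching step, the sketch below computes a DTW distance between two feature sequences. It is not the report's own code: the function name dtw_distance, the squared Euclidean local cost, and the three-way step pattern are all assumptions made for this example.

```matlab
function d = dtw_distance(A, B)
% DTW_DISTANCE  Illustrative dynamic time warping distance (hypothetical helper).
% A and B are feature matrices with one frame per row, e.g. MFCC vectors.
n = size(A, 1);
m = size(B, 1);
% Local cost: squared Euclidean distance between every pair of frames.
cost = zeros(n, m);
for i = 1:n
    for j = 1:m
        delta = A(i, :) - B(j, :);
        cost(i, j) = sum(delta .^ 2);
    end
end
% Accumulated cost, padded with a border of Inf so the recursion
% needs no special cases on the first row and column.
D = inf(n + 1, m + 1);
D(1, 1) = 0;
for i = 1:n
    for j = 1:m
        D(i + 1, j + 1) = cost(i, j) + min([D(i, j + 1), D(i + 1, j), D(i, j)]);
    end
end
d = D(n + 1, m + 1);
end
```

Under these assumptions, recognition would compute this distance between the test utterance and every template and report the text of the template with the smallest distance.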

The code is as follows:

Comparison before and after running the program:

Overall view of the GUI:

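Summary

The experiment recognizes the characters “东、北、大、学、中、荷、学、院”. When the template recordings themselves are used as test samples against the speech library, the recognition rate is 100%, which shows that the algorithm is correct, although it still needs optimization.

When live recordings are matched against the templates, however, a high recognition rate cannot be guaranteed, which indicates that the feature extraction is not yet good enough.

The guiding principle of feature extraction is to make the within-class distance as small as possible and the between-class distance as large as possible; this is an area for future improvement.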
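One possible way to put a number on the within-class versus between-class principle is a scatter ratio, sketched below. The function name separability, the use of class centroids, and the assumption that every utterance has already been reduced to a fixed-length feature vector are choices made for this illustration and are not part of the report.

```matlab
function r = separability(X, labels)
% SEPARABILITY  Ratio of between-class to within-class scatter (hypothetical helper).
% X: one fixed-length feature vector per row.
% labels: column vector giving the class index of each row of X.
classes = unique(labels);
K = numel(classes);
centroids = zeros(K, size(X, 2));
within = 0;
for c = 1:K
    Xc = X(labels == classes(c), :);
    centroids(c, :) = mean(Xc, 1);
    % Squared distances of the class members to their own centroid.
    dev = Xc - repmat(centroids(c, :), size(Xc, 1), 1);
    within = within + sum(dev(:) .^ 2);
end
within = within / size(X, 1);
% Mean squared distance of the class centroids to the overall mean.
dev = centroids - repmat(mean(X, 1), K, 1);
between = sum(dev(:) .^ 2) / K;
r = between / within;
end
```

A larger ratio means the classes are better separated, so comparing the ratio for different feature sets gives a rough indication of which one is more discriminative.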

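The GUI also needs optimization: the template library should be generated once and kept as a separate, stand-alone resource against which each test utterance is matched, so that it does not have to be regenerated for every test; this would speed up processing.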
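The build-once idea could be sketched as a small caching helper like the one below. The names load_or_build_templates, cacheFile and buildFcn are hypothetical, and the sketch assumes cacheFile ends in .mat and that buildFcn returns the whole template library as one variable.

```matlab
function templates = load_or_build_templates(cacheFile, buildFcn)
% LOAD_OR_BUILD_TEMPLATES  Reuse a saved template library when one exists.
% cacheFile: path of a MAT-file ending in .mat (assumed).
% buildFcn:  function handle that builds the library from the recordings.
if exist(cacheFile, 'file') == 2
    data = load(cacheFile);        % struct with a field named 'templates'
    templates = data.templates;
else
    templates = buildFcn();        % slow path: runs only on the first call
    save(cacheFile, 'templates');
end
end
```

With a helper of this kind the GUI would pay the cost of feature extraction for the templates only on the first run; later runs would read the saved file.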

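Given the opportunity, continuous speech recognition could be implemented in future work.

Appendix

These are all of the code files:

The file mfcc.mat is generated while the program runs;

the test folder holds the recorded templates:

There are 6 .m files, as follows: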

1 WienerScalart96.m

function output=WienerScalart96(signal,fs,IS)
% output=WIENERSCALART96(signal,fs,IS)
% Wiener filter based on tracking a priori SNR using the Decision-Directed
% method, proposed by Scalart et al 96. In this method it is assumed that
% SNRpost=SNRprior+1. Based on this the Wiener filter can be adapted to a
% model like Ephraim's model in which we have a gain function which is a
% function of a priori SNR, and a priori SNR is being tracked using the
% Decision-Directed method.
% Author: Esfandiar Zavarehei
% Created: MAR-05

if (nargin<3 || isstruct(IS))
    IS=.25; % Initial silence or noise-only part in seconds
end
W=fix(.025*fs); % Window length is 25 ms
SP=.4; % Shift percentage is 40% (10 ms); overlap-add works well with this value (.4)
wnd=hamming(W);

% IGNORE FROM HERE ...............................
if (nargin>=3 && isstruct(IS)) % This option is for compatibility with another programme
    W=IS.windowsize;
    SP=IS.shiftsize/W;
    %nfft=IS.nfft;
    wnd=IS.window;
    if isfield(IS,'IS')
        IS=IS.IS;
    else
        IS=.25;
    end
end
% ...................................... UP TO HERE

pre_emph=0;
signal=filter([1 -pre_emph],1,signal);

NIS=fix((IS*fs-W)/(SP*W)+1); % Number of initial silence segments

y=segment(signal,W,SP,wnd); % This function chops the signal into frames
Y=fft(y);
YPhase=angle(Y(1:fix(end/2)+1,:)); % Noisy speech phase
Y=abs(Y(1:fix(end/2)+1,:)); % Spectrogram (magnitude)
numberOfFrames=size(Y,2);
FreqResol=size(Y,1);

N=mean(Y(:,1:NIS)')'; % Initial noise magnitude spectrum (mean)
LambdaD=mean((Y(:,1:NIS)').^2)'; % Initial noise power spectrum (mean of squared magnitude)
alpha=.99; % Used in smoothing xi (Decision-Directed method for estimation of a priori SNR)
NoiseCounter=0;
NoiseLength=9; % This is a smoothing factor for the noise updating

G=ones(size(N)); % Initial gain used in calculation of the new xi
Gamma=G;

X=zeros(size(Y)); % Initialize X (memory allocation)

h=waitbar(0,'Wait...');

for i=1:numberOfFrames
    %%%%%%%%%%%%%%%% VAD and Noise Estimation START
    if i<=NIS % If initial silence, ignore VAD
        SpeechFlag=0;
        NoiseCounter=100;
    else % Else do VAD
        [NoiseFlag,SpeechFlag,NoiseCounter,Dist]=vad(Y(:,i),N,NoiseCounter); % Magnitude spectrum distance VAD
    end

    if SpeechFlag==0 % If not speech, update noise parameters
        N=(NoiseLength*N+Y(:,i))/(NoiseLength+1); % Update and smooth noise mean
        LambdaD=(NoiseLength*LambdaD+(Y(:,i).^2))./(1+NoiseLength); % Update and smooth noise variance
    end
    %%%%%%%%%%%%%%%% VAD and Noise Estimation END

    gammaNew=(Y(:,i).^2)./LambdaD; % A posteriori SNR
    xi=alpha*(G.^2).*Gamma+(1-alpha).*max(gammaNew-1,0); % Decision-Directed method for a priori SNR

    Gamma=gammaNew;
    G=(xi./(xi+1));

    X(:,i)=G.*Y(:,i); % Obtain the new cleaned value

    waitbar(i/numberOfFrames,h,num2str(fix(100*i/numberOfFrames)));
end
close(h);

output=OverlapAdd2(X,YPhase,W,SP*W); % Overlap-add synthesis of speech
output=filter(1,[1 -pre_emph],output); % Undo the effect of pre-emphasis

function ReconstructedSignal=OverlapAdd2(XNEW,yphase,windowLen,ShiftLen)
% Y=OverlapAdd(X,A,W,S);
% Y is the signal reconstructed from its spectrogram. X is a matrix
% with each column being the fft of a segment of signal. A is the phase
% angle of the spectrum, which should have the same dimension as X. If it is
% not given, the phase angle of X is used, which in the case of real values is
% zero (assuming that it is the magnitude). W is the window length of the time
% domain segments; if not given, the length is assumed to be twice as long as
% the fft window length. S is the shift length of the segmentation process (for
% example, in the case of non-overlapping signals it is equal to W and in the
% case of 50% overlap it is equal to W/2). If not given, W/2 is used. Y is the
% reconstructed time domain signal.
% Sep-04
% Esfandiar Zavarehei

if nargin<2
    yphase=angle(XNEW);
end
if nargin<3
    windowLen=size(XNEW,1)*2;
end
if nargin<4
    ShiftLen=windowLen/2;
end
if fix(ShiftLen)~=ShiftLen
    ShiftLen=fix(ShiftLen);
    disp('The shift length has to be an integer as it is the number of samples.')
    disp(['shift length is fixed to ' num2str(ShiftLen)])
end

[FreqRes FrameNum]=size(XNEW);

Spec=XNEW.*exp(1j*yphase);

% Rebuild the full (conjugate-symmetric) spectrum from the half spectrum
if mod(windowLen,2) % if windowLen is odd
    Spec=[Spec;flipud(conj(Spec(2:end,:)))];
else
    Spec=[Spec;flipud(conj(Spec(2:end-1,:)))];
end

sig=zeros((FrameNum-1)*ShiftLen+windowLen,1);
weight=sig;
for i=1:FrameNum
    start=(i-1)*ShiftLen+1;
    spec=Spec(:,i);
    sig(start:start+windowLen-1)=sig(start:start+windowLen-1)+real(ifft(spec,windowLen));
end
ReconstructedSignal=sig;

function Seg=segment(signal,W,SP,Window)
% SEGMENT chops a signal into overlapping windowed segments
% A=SEGMENT(X,W,SP,WIN) returns a matrix whose columns are segmented
% and windowed frames of the input one-dimensional signal, X. W is the
% number of samples per window, default value W=256. SP is the shift
% percentage, default value SP=0.4. WIN is the window that is multiplied by
% each segment and its length should be W. The default window is the Hamming
% window.
% 06-Sep-04
% Esfandiar Zavarehei

if nargin<3
    SP=.4;
end
if nargin<2
    W=256;
end
if nargin<4
    Window=hamming(W);
end
Window=Window(:); % make it a column vector

L=length(signal);
SP=fix(W.*SP);
N=fix((L-W)/SP+1); % number of segments

Index=(repmat(1:W,N,1)+repmat((0:(N-1))'*SP,1,W))';
hw=repmat(Window,1,N);
Seg=signal(Index).*hw;

function [NoiseFlag,SpeechFlag,NoiseCounter,Dist]=vad(signal,noise,NoiseCounter,NoiseMargin,Hangover)
% [NOISEFLAG,SPEECHFLAG,NOISECOUNTER,DIST]=vad(SIGNAL,NOISE,NOISECOUNTER,NOISEMARGIN,HANGOVER)
% Spectral Distance Voice Activity Detector
% SIGNAL is the current frame's magnitude spectrum, which is to be labelled as
% noise or speech. NOISE is the noise magnitude spectrum template (estimation).
% NOISECOUNTER is the number of immediately preceding noise frames. NOISEMARGIN
% (default 3) is the spectral distance threshold. HANGOVER (default 8) is
% the number of noise segments after which the SPEECHFLAG is reset (goes to
% zero). NOISEFLAG is set to one if the segment is labelled as noise.
% NOISECOUNTER returns the number of previous noise segments; this value is
% reset (to zero) whenever a speech segment is detected. DIST is the
% spectral distance.
% Saeed Vaseghi
% edited by Esfandiar Zavarehei

if nargin<4
    NoiseMargin=3;
end
if nargin<5
    Hangover=8;
end
if nargin<3
    NoiseCounter=0;
end

FreqResol=length(signal);

SpectralDist=20*(log10(signal)-log10(noise));
SpectralDist(find(SpectralDist<0))=0;

Dist=mean(SpectralDist);
if (Dist<NoiseMargin)
    NoiseFlag=1;
    NoiseCounter=NoiseCounter+1;
else
    NoiseFlag=0;
    NoiseCounter=0;
end

% Detect noise-only periods and attenuate the signal
if (NoiseCounter>Hangover)
    SpeechFlag=0;
else
    SpeechFlag=1;
end
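The per-frame gain update at the heart of WienerScalart96 can be isolated as in the sketch below, which may make the decision-directed recursion easier to follow. The wrapper function dd_wiener_gain and its argument names are hypothetical; the three formulas are the ones used inside the loop of the listing above.

```matlab
function [G, xi, gammaNew] = dd_wiener_gain(Ymag, lambdaD, Gprev, gammaPrev, alpha)
% DD_WIENER_GAIN  One decision-directed Wiener gain update (hypothetical wrapper).
% Ymag:      magnitude spectrum of the current noisy frame
% lambdaD:   current noise power estimate per frequency bin
% Gprev:     gain applied to the previous frame
% gammaPrev: a posteriori SNR of the previous frame
% alpha:     smoothing factor (the listing uses 0.99)
gammaNew = (Ymag .^ 2) ./ lambdaD;                     % a posteriori SNR
xi = alpha * (Gprev .^ 2) .* gammaPrev ...
     + (1 - alpha) .* max(gammaNew - 1, 0);            % a priori SNR estimate
G = xi ./ (xi + 1);                                    % Wiener gain, between 0 and 1
end
```

Because alpha is close to 1, the a priori SNR changes slowly from frame to frame, which is what keeps the gain from fluctuating rapidly.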

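2 mfcc.m

function cc=mfcc(k)
%------------------------------
% cc=mfcc(k) computes the MFCC coefficients of the speech signal k
%------------------------------
% M is the number of filters, N is the number of samples per frame
M=24;
N=256;

% Normalized mel filter bank coefficients
bank=melbankm(M,N,22050,0,0.5,'m');
figure;
plot(linspace(0,N/2,129),bank);
title('Mel-Spaced Filterbank');
xlabel('Frequency [Hz]');
bank=full(bank);
bank=bank/max(bank(:));

% DCT coefficients, 12*24
for i=1:12
    j=0:23;
    dctcoef(i,:)=cos((2*j+1)*i*pi/(2*24));
end

% Normalized cepstral liftering window
w=1+6*sin(pi*[1:12]./12);
w=w/max(w);

% Pre-emphasis
AggrK=double(k);
AggrK=filter([1,-0.9375],1,AggrK);

% Framing
FrameK=enframe(AggrK,N,80);

% Windowing
for i=1:size(FrameK,1)
    FrameK(i,:)=(FrameK(i,:))'.*hamming(N);
end
FrameK=FrameK';

% Compute the power spectrum
S=(abs(fft(FrameK))).^2;
disp('Displaying power spectrum...')
figure;
plot(S);
axis([1,size(S,1),0,2]);
title('Power Spectrum (M=24,N=256)');
xlabel('Frame');
ylabel('Frequency [Hz]');
colorbar;

% Pass the power spectrum through the filter bank
P=bank*S(1:129,:);

% Take the logarithm, then apply the discrete cosine transform
D=dctcoef*log(P);

% Cepstral liftering
for i=1:size(D,2)
    m(i,:)=(D(:,i).*w')';
end

% Delta (difference) coefficients
dtm=zeros(size(m));
for i=3:size(m,1)-2
    dtm(i,:)=-2*m(i-2,:)-m(i-1,:)+m(i+1,:)+2*m(i+2,:);
end
dtm=dtm/3;

% Combine the MFCC parameters and the first-order delta MFCC parameters
cc=[m,dtm];

% Drop the first two and last two frames, whose first-order delta parameters are 0
cc=cc(3:size(m,1)-2,:);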
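The delta step in mfcc.m can be checked in isolation with a sketch such as the one below. The function name delta_features is hypothetical; the weights (-2, -1, +1, +2) and the division by 3 are copied from the listing, and the loop is replaced by equivalent indexing.

```matlab
function dtm = delta_features(m)
% DELTA_FEATURES  First-order difference coefficients (hypothetical helper).
% m holds one frame per row. The first two and last two rows of the result
% stay zero because they lack two neighbours on one side.
dtm = zeros(size(m));
n = size(m, 1);
if n >= 5
    idx = 3:n-2;
    dtm(idx, :) = (-2 * m(idx - 2, :) - m(idx - 1, :) ...
                   + m(idx + 1, :) + 2 * m(idx + 2, :)) / 3;
end
end
```

For a feature that rises by exactly 1 per frame this gives 10/3 rather than 1, so with this normalization the delta values are a scaled slope; a common alternative is to divide by 10 instead of 3, which would return the slope itself.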

3 getpoint.m

function [StartPoint,EndPoint]=getpoint(k,fs)
%UNTITLED Summary of this function goes here
% Detailed explanation goes here

signal=WienerScalart96(k,fs);
sigLength=length(signal); % Signal length
t=(0:sigLength-1)/fs; % Time axis corresponding to the signal
FrameLen=round((0.012/max(t))*sigLength); % Length of each frame
FrameInc=round(FrameLen/3); % Frame increment (sets the overlap), chosen as 1/3 to 1/2 of the frame length
tmp=enframe(signal(1:end),FrameLen,FrameInc);
signal=signal/max(abs(signal));
signal=double(signal);
signal=filter([1,-0.9735],1,signal);
tmp1=enframe(signal(1:end-1),FrameLen,FrameInc);

tmp2=enf
