Application of Matlab in Speech Recognition
- Uploaded: 2023-01-25
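The parts of this feature that duplicate the code above are omitted; only the newly added code is given here:
2.5 Speech Recognition
The opened speech file is matched against a pre-recorded speech library using the DTW (dynamic time warping) algorithm.
Once recognition finishes, the recognized text is displayed in the corresponding text box.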
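As a rough illustration of the DTW matching described above, the following sketch computes the accumulated DTW distance between a test sequence and each template and picks the closest one. It is not the author's code. The names `test`, `templates` and `labels`, the Euclidean frame distance, the basic step pattern and the toy data are all assumptions made for illustration.

```matlab
% Minimal DTW sketch (assumption: rows are frames, columns are feature dimensions).
% Toy data stands in for MFCC matrices; all names and values are hypothetical.
templates = { [0 0; 1 1; 2 2; 3 3], [3 3; 2 2; 1 1; 0 0] };
labels    = { 'A', 'B' };
test      = [0 0; 0.1 0; 1 1.1; 2 2; 2.9 3; 3 3];   % time-stretched variant of template 1

dists = zeros(1, numel(templates));
for t = 1:numel(templates)
    ref = templates{t};
    n = size(test, 1);
    m = size(ref, 1);
    % Local cost: Euclidean distance between every test frame and every reference frame
    d = zeros(n, m);
    for a = 1:n
        for b = 1:m
            d(a, b) = sqrt(sum((test(a, :) - ref(b, :)).^2));
        end
    end
    % Accumulated cost with the basic step pattern (diagonal, vertical, horizontal)
    D = inf(n + 1, m + 1);
    D(1, 1) = 0;
    for a = 1:n
        for b = 1:m
            D(a + 1, b + 1) = d(a, b) + min([D(a, b), D(a, b + 1), D(a + 1, b)]);
        end
    end
    dists(t) = D(n + 1, m + 1);
end

[~, best] = min(dists);          % the template with the smallest DTW distance wins
recognized = labels{best};
fprintf('DTW distances: %s -> recognized "%s"\n', mat2str(dists, 4), recognized);
```

In a GUI like the one described, `recognized` would be the string written to the text box. Practical systems often add a path constraint or length normalization, which this sketch leaves out.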
The code is as follows:
Comparison before and after running the program:
Overall appearance of the GUI:
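Summary
The experiment recognizes the characters "东, 北, 大, 学, 中, 荷, 学, 院", provided that the template recordings themselves are used as test samples against the speech library. In that case the accuracy is 100%, which indicates that the algorithm is correct and only needs optimization.
When a live recording is matched against the templates, however, a high accuracy cannot be guaranteed, which indicates that the feature-parameter extraction is not yet good enough.
The principle of feature extraction is to make the within-class distance as small as possible and the between-class distance as large as possible; this is what needs to be improved in future work.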
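One way to put a number on the within-class versus between-class criterion is to compare the average distance of samples to their own class centroid with the average distance between class centroids. The sketch below does this on made-up two-dimensional features. The variable names, the Euclidean distance and the data are assumptions; for real variable-length MFCC sequences a DTW distance would have to replace the Euclidean one.

```matlab
% Toy check of the "small within-class, large between-class distance" criterion.
% Assumption: each row is one fixed-length feature vector; classId is its word label.
features = [1.0 1.1; 0.9 1.0; 1.1 0.9;    % class 1
            3.0 3.2; 3.1 2.9; 2.9 3.0];   % class 2
classId  = [1; 1; 1; 2; 2; 2];

classes   = unique(classId);
centroids = zeros(numel(classes), size(features, 2));
within    = 0;
for c = 1:numel(classes)
    members = features(classId == classes(c), :);
    centroids(c, :) = mean(members, 1);
    % Mean Euclidean distance of the members to their own centroid
    diffs  = members - repmat(centroids(c, :), size(members, 1), 1);
    within = within + mean(sqrt(sum(diffs.^2, 2)));
end
within = within / numel(classes);

% Mean Euclidean distance between all pairs of class centroids
between = 0;
pairs   = 0;
for a = 1:numel(classes) - 1
    for b = a + 1:numel(classes)
        between = between + sqrt(sum((centroids(a, :) - centroids(b, :)).^2));
        pairs   = pairs + 1;
    end
end
between = between / pairs;

separability = between / within;   % larger means better separated classes
fprintf('within = %.3f, between = %.3f, ratio = %.1f\n', within, between, separability);
```

Comparing this ratio before and after a change to the feature extraction gives a quick indication of whether the change helps.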
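The GUI also needs optimization: the template library should be generated once, and the speech under test then matched against it. Keeping the template library separate means it does not have to be regenerated for every test, which speeds up computation.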
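A minimal sketch of that idea, assuming the library is stored in a .mat file and rebuilt only when the file is missing. The file name, the variable `templates` and the random placeholder data are hypothetical; a real version would compute the features of each template recording in the build step.

```matlab
% Sketch of caching the template library on disk (file and variable names are hypothetical).
libFile = fullfile(tempdir, 'template_library_demo.mat');
if exist(libFile, 'file')
    delete(libFile);            % start clean so the demo is repeatable
end

buildCount = 0;
for run = 1:3                   % three simulated recognition runs
    if exist(libFile, 'file')
        loaded    = load(libFile);      % reuse the stored library
        templates = loaded.templates;
    else
        % Placeholder for the expensive step (e.g. feature extraction for every template)
        templates  = { rand(20, 24), rand(25, 24) };
        buildCount = buildCount + 1;
        save(libFile, 'templates');
    end
    % ... match the speech under test against "templates" here ...
end

fprintf('Library built %d time(s) over 3 runs\n', buildCount);
delete(libFile);                % remove the demo file again
```

The trade-off is that the cached file must be deleted or rebuilt whenever the template recordings or the feature settings change.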
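Given the opportunity, continuous speech recognition could be implemented in the future.
Appendix
These are all the code files
mfcc.mat is generated while the program runs;
the test folder contains the recorded templates:
There are six .m files, as follows: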
1 WienerScalart96.m
function output=WienerScalart96(signal,fs,IS)
% output=WIENERSCALART96(signal,fs,IS)
% Wiener filter based on tracking a priori SNR using the Decision-Directed
% method, proposed by Scalart et al 96. In this method it is assumed that
% SNRpost=SNRprior+1. Based on this the Wiener filter can be adapted to a
% model like Ephraim's model in which we have a gain function which is a
% function of a priori SNR, and a priori SNR is being tracked using the
% Decision-Directed method.
% Author: Esfandiar Zavarehei
% Created: MAR-05
if (nargin<3 || isstruct(IS))
    IS=.25; % Initial silence or noise-only part in seconds
end
W=fix(.025*fs); % Window length is 25 ms
SP=.4; % Shift percentage is 40% (10 ms); Overlap-Add works well with this value (.4)
wnd=hamming(W);
% IGNORE FROM HERE ...............................
if (nargin>=3 && isstruct(IS)) % This option is for compatibility with another programme
    W=IS.windowsize;
    SP=IS.shiftsize/W;
    %nfft=IS.nfft;
    wnd=IS.window;
    if isfield(IS,'IS')
        IS=IS.IS;
    else
        IS=.25;
    end
end
% ...................................... UP TO HERE
pre_emph=0;
signal=filter([1 -pre_emph],1,signal);
NIS=fix((IS*fs-W)/(SP*W)+1); % number of initial silence segments
y=segment(signal,W,SP,wnd); % This function chops the signal into frames
Y=fft(y);
YPhase=angle(Y(1:fix(end/2)+1,:)); % Noisy speech phase
Y=abs(Y(1:fix(end/2)+1,:)); % Spectrogram
numberOfFrames=size(Y,2);
FreqResol=size(Y,1);
N=mean(Y(:,1:NIS)')'; % initial noise power spectrum mean
LambdaD=mean((Y(:,1:NIS)').^2)'; % initial noise power spectrum variance
alpha=.99; % used in smoothing xi (Decision-Directed method for estimation of a priori SNR)
NoiseCounter=0;
NoiseLength=9; % This is a smoothing factor for the noise updating
G=ones(size(N)); % Initial gain used in calculation of the new xi
Gamma=G;
X=zeros(size(Y)); % Initialize X (memory allocation)
h=waitbar(0,'Wait...');
for i=1:numberOfFrames
    %%%%%%%%%%%%%%%% VAD and Noise Estimation START
    if i<=NIS % If initial silence, ignore VAD
        SpeechFlag=0;
        NoiseCounter=100;
    else % Else do VAD
        [NoiseFlag,SpeechFlag,NoiseCounter,Dist]=vad(Y(:,i),N,NoiseCounter); % Magnitude spectrum distance VAD
    end
    if SpeechFlag==0 % If not speech, update noise parameters
        N=(NoiseLength*N+Y(:,i))/(NoiseLength+1); % Update and smooth noise mean
        LambdaD=(NoiseLength*LambdaD+(Y(:,i).^2))./(1+NoiseLength); % Update and smooth noise variance
    end
    %%%%%%%%%%%%%%%% VAD and Noise Estimation END
    gammaNew=(Y(:,i).^2)./LambdaD; % A posteriori SNR
    xi=alpha*(G.^2).*Gamma+(1-alpha).*max(gammaNew-1,0); % Decision-Directed method for a priori SNR
    Gamma=gammaNew;
    G=(xi./(xi+1));
    X(:,i)=G.*Y(:,i); % Obtain the new cleaned value
    waitbar(i/numberOfFrames,h,num2str(fix(100*i/numberOfFrames)));
end
close(h);
output=OverlapAdd2(X,YPhase,W,SP*W); % Overlap-add synthesis of speech
output=filter(1,[1 -pre_emph],output); % Undo the effect of pre-emphasis
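To see how the decision-directed update in the main loop behaves, the following sketch tracks a single frequency bin over a few frames using the same three update lines as the listing. The magnitudes in `Ymag` and the fixed noise power `LambdaD` are made-up numbers, and holding the noise estimate constant is a simplification for illustration.

```matlab
% One frequency bin tracked over a few frames with the decision-directed rule from the listing.
% The magnitudes and the noise power are made-up numbers for illustration only.
alpha   = 0.99;                 % smoothing factor, as in the listing
LambdaD = 1;                    % assumed (fixed) noise power in this bin
Ymag    = [1 1 4 4 4 1 1];      % noisy magnitude: noise, a louder burst, then noise again

G     = 1;                      % initial gain
Gamma = 1;                      % initial a posteriori SNR
gains = zeros(size(Ymag));
for i = 1:numel(Ymag)
    gammaNew = Ymag(i)^2 / LambdaD;                                 % a posteriori SNR
    xi = alpha * G^2 * Gamma + (1 - alpha) * max(gammaNew - 1, 0);  % a priori SNR estimate
    Gamma = gammaNew;
    G = xi / (xi + 1);                                              % Wiener gain
    gains(i) = G;
end
disp(gains);
```

Because `alpha` is close to 1, the a priori SNR depends mostly on the previous frame, so the gain rises over several frames of the burst rather than jumping at its first frame.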
function ReconstructedSignal=OverlapAdd2(XNEW,yphase,windowLen,ShiftLen)
% Y=OverlapAdd(X,A,W,S);
% Y is the signal reconstructed from its spectrogram. X is a matrix
% with each column being the fft of a segment of signal. A is the phase
% angle of the spectrum, which should have the same dimension as X. If it is
% not given, the phase angle of X is used, which in the case of real values is
% zero (assuming that it is the magnitude). W is the window length of the time
% domain segments; if not given, the length is assumed to be twice as long as
% the fft window length. S is the shift length of the segmentation process (for
% example, in the case of non-overlapping signals it is equal to W and in the
% case of 50% overlap it is equal to W/2). If not given, W/2 is used. Y is the
% reconstructed time domain signal.
% Sep-04
% Esfandiar Zavarehei
if nargin<2
    yphase=angle(XNEW);
end
if nargin<3
    windowLen=size(XNEW,1)*2;
end
if nargin<4
    ShiftLen=windowLen/2;
end
if fix(ShiftLen)~=ShiftLen
    ShiftLen=fix(ShiftLen);
    disp('The shift length has to be an integer as it is the number of samples.')
    disp(['shift length is fixed to ' num2str(ShiftLen)])
end
[FreqRes,FrameNum]=size(XNEW);
Spec=XNEW.*exp(1j*yphase);
if mod(windowLen,2) % if windowLen is odd
    Spec=[Spec;flipud(conj(Spec(2:end,:)))];
else
    Spec=[Spec;flipud(conj(Spec(2:end-1,:)))];
end
sig=zeros((FrameNum-1)*ShiftLen+windowLen,1);
weight=sig;
for i=1:FrameNum
    start=(i-1)*ShiftLen+1;
    spec=Spec(:,i);
    sig(start:start+windowLen-1)=sig(start:start+windowLen-1)+real(ifft(spec,windowLen));
end
ReconstructedSignal=sig;
function Seg=segment(signal,W,SP,Window)
% SEGMENT chops a signal into overlapping windowed segments
% A=SEGMENT(X,W,SP,WIN) returns a matrix whose columns are segmented
% and windowed frames of the input one-dimensional signal, X. W is the
% number of samples per window, default value W=256. SP is the shift
% percentage, default value SP=0.4. WIN is the window that is multiplied by
% each segment and its length should be W. The default window is the hamming
% window.
% 06-Sep-04
% Esfandiar Zavarehei
if nargin<3
    SP=.4;
end
if nargin<2
    W=256;
end
if nargin<4
    Window=hamming(W);
end
Window=Window(:); % make it a column vector
L=length(signal);
SP=fix(W.*SP);
N=fix((L-W)/SP+1); % number of segments
Index=(repmat(1:W,N,1)+repmat((0:(N-1))'*SP,1,W))';
hw=repmat(Window,1,N);
Seg=signal(Index).*hw;
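The index construction in segment() is easier to follow on a signal short enough to check by hand. The sketch below repeats the same `repmat` expression with a toy signal, W=4 and a 50% shift; these values are chosen only for illustration (the listing uses a 40% shift) and the window multiplication is left out.

```matlab
% Frame-index construction as in segment(), on a signal short enough to inspect by hand.
signal = (1:10)';          % toy signal; each sample value equals its sample number
W  = 4;                    % window length in samples
SP = 0.5;                  % shift as a fraction of the window (the listing uses 0.4)

shift = fix(W * SP);                              % shift in samples
N     = fix((length(signal) - W) / shift + 1);    % number of complete frames
Index = (repmat(1:W, N, 1) + repmat((0:(N-1))' * shift, 1, W))';
frames = signal(Index);    % each column is one frame (no window applied here)
disp(frames);
```

Each column of `frames` starts `shift` samples after the previous one, and any samples left over after the last complete frame are dropped.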
function [NoiseFlag,SpeechFlag,NoiseCounter,Dist]=vad(signal,noise,NoiseCounter,NoiseMargin,Hangover)
% [NOISEFLAG,SPEECHFLAG,NOISECOUNTER,DIST]=vad(SIGNAL,NOISE,NOISECOUNTER,NOISEMARGIN,HANGOVER)
% Spectral Distance Voice Activity Detector
% SIGNAL is the current frame's magnitude spectrum which is to be labelled as
% noise or speech, NOISE is the noise magnitude spectrum template (estimation),
% NOISECOUNTER is the number of immediate previous noise frames, NOISEMARGIN
% (default 3) is the spectral distance threshold. HANGOVER (default 8) is
% the number of noise segments after which the SPEECHFLAG is reset (goes to
% zero). NOISEFLAG is set to one if the segment is labelled as noise.
% NOISECOUNTER returns the number of previous noise segments; this value is
% reset (to zero) whenever a speech segment is detected. DIST is the
% spectral distance.
% Saeed Vaseghi
% edited by Esfandiar Zavarehei
if nargin<4
    NoiseMargin=3;
end
if nargin<5
    Hangover=8;
end
if nargin<3
    NoiseCounter=0;
end
FreqResol=length(signal);
SpectralDist=20*(log10(signal)-log10(noise));
SpectralDist(SpectralDist<0)=0;
Dist=mean(SpectralDist);
if (Dist<NoiseMargin)
    NoiseFlag=1;
    NoiseCounter=NoiseCounter+1;
else
    NoiseFlag=0;
    NoiseCounter=0;
end
% Detect noise-only periods and attenuate the signal
if (NoiseCounter>Hangover)
    SpeechFlag=0;
else
    SpeechFlag=1;
end
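The spectral-distance test at the core of vad() can be tried on hand-made spectra. The sketch below applies the same distance formula and the default threshold to one frame near the noise template and one well above it. The spectra are invented for illustration, and the hangover logic that sets SpeechFlag is left out.

```matlab
% Spectral-distance decision as in vad(), for two hand-made magnitude spectra.
noise       = [1 1 1 1]';          % assumed noise magnitude template
quietFrame  = [1.1 0.9 1.0 1.2]';  % close to the noise template
loudFrame   = [4 6 5 3]';          % well above the noise template
NoiseMargin = 3;                   % distance threshold in dB (default in the listing)

frames  = [quietFrame, loudFrame];
isNoise = false(1, size(frames, 2));
dist    = zeros(1, size(frames, 2));
for c = 1:size(frames, 2)
    d = 20 * (log10(frames(:, c)) - log10(noise));  % per-bin level difference in dB
    d(d < 0) = 0;                                   % bins below the noise template count as zero
    dist(c) = mean(d);
    isNoise(c) = dist(c) < NoiseMargin;
end
fprintf('distances (dB): %s, labelled noise: %s\n', mat2str(dist, 4), mat2str(isNoise));
```

Clamping negative differences to zero means that only bins louder than the noise template contribute to the distance.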
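2 mfcc.m
function cc=mfcc(k)
%------------------------------
% cc=mfcc(k) computes the MFCC coefficients of speech signal k
%------------------------------
% M is the number of filters, N is the number of samples per frame
M=24;
N=256;
% Normalized mel filter bank coefficients
bank=melbankm(M,N,22050,0,0.5,'m');
figure;
plot(linspace(0,N/2,129),bank);
title('Mel-Spaced Filterbank');
xlabel('Frequency [Hz]');
bank=full(bank);
bank=bank/max(bank(:));
% DCT coefficients, 12x24
for i=1:12
    j=0:23;
    dctcoef(i,:)=cos((2*j+1)*i*pi/(2*24));
end
% Normalized cepstral liftering window
w=1+6*sin(pi*[1:12]./12);
w=w/max(w);
% Pre-emphasis
AggrK=double(k);
AggrK=filter([1,-0.9375],1,AggrK);
% Framing
FrameK=enframe(AggrK,N,80);
% Windowing
for i=1:size(FrameK,1)
    FrameK(i,:)=(FrameK(i,:))'.*hamming(N);
end
FrameK=FrameK';
% Compute the power spectrum
S=(abs(fft(FrameK))).^2;
disp('Displaying power spectrum...')
figure;
plot(S);
axis([1,size(S,1),0,2]);
title('Power Spectrum (M=24,N=256)');
xlabel('Frame');
ylabel('Frequency [Hz]');
colorbar;
% Pass the power spectrum through the filter bank
P=bank*S(1:129,:);
% Take the logarithm, then apply the discrete cosine transform
D=dctcoef*log(P);
% Apply the cepstral liftering window
for i=1:size(D,2)
    m(i,:)=(D(:,i).*w')';
end
% Delta (difference) coefficients
dtm=zeros(size(m));
for i=3:size(m,1)-2
    dtm(i,:)=-2*m(i-2,:)-m(i-1,:)+m(i+1,:)+2*m(i+2,:);
end
dtm=dtm/3;
% Combine the MFCC parameters and the first-order delta MFCC parameters
cc=[m,dtm];
% Drop the first two and last two frames, because their first-order delta parameters are 0
cc=cc(3:size(m,1)-2,:);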
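The delta step at the end of mfcc() can be checked on a toy coefficient track. The sketch below runs the same difference formula and edge trimming on two linearly rising coefficients. The data are made up; the first part of the function (filter bank, DCT, liftering) is not reproduced.

```matlab
% First-order difference (delta) as in mfcc(), applied to a toy coefficient track.
m = [ (1:8)', 2 * (1:8)' ];      % 8 frames, 2 coefficients rising linearly (made-up data)

dtm = zeros(size(m));
for i = 3:size(m, 1) - 2
    dtm(i, :) = -2 * m(i-2, :) - m(i-1, :) + m(i+1, :) + 2 * m(i+2, :);
end
dtm = dtm / 3;

cc = [m, dtm];
cc = cc(3:size(m, 1) - 2, :);    % drop two frames at each end, whose deltas were left at 0
disp(cc);
```

For a track rising by s per frame the formula evaluates to 10*s/3, so with the division by 3 used in the listing the delta is proportional to the slope rather than equal to it.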
3 getpoint.m
function [StartPoint,EndPoint]=getpoint(k,fs)
%UNTITLED Summary of this function goes here
% Detailed explanation goes here
signal=WienerScalart96(k,fs);
sigLength=length(signal); % signal length in samples
t=(0:sigLength-1)/fs; % time axis of the signal
FrameLen=round((0.012/max(t))*sigLength); % frame length in samples (about 12 ms)
FrameInc=round(FrameLen/3); % frame increment passed to enframe (the original comment calls it the overlap region), chosen as 1/3 to 1/2 of the frame length
tmp=enframe(signal(1:end),FrameLen,FrameInc);
signal=signal/max(abs(signal));
signal=double(signal);
signal=filter([1,-0.9735],1,signal);
tmp1=enframe(signal(1:end-1),FrameLen,FrameInc);
tmp2=enf