Opinion Orientation Analysis Evaluation Paper
Analysis of the Evaluation Results for our Tasks in COAE2009
CHEN Mosha1, WANG Rui2, ZHANG Xiaojun1, QIU Wei1, ZHANG Yi3, LI Tingyu1, ZHANG Wenbo1, and YAO Tianfang1
1 UdS-SJTU Joint Research Lab for Language Technology
Dept. of Computer Science and Engineering
Shanghai Jiao Tong University, Shanghai 200240, China.
2 Computational Linguistics Department, Saarland University, Germany
3 DFKI GmbH, Saarbruecken, Germany
1{mosha,jiuren,hh190,hellowenniu}@;
1Xiaojun.zhang.iiken@;
1yao-tf@;
2rwang@coli.uni-sb.de;
3yzhang@coli.uni-sb.de;
Abstract
COAE2009 has five tasks, and we take part in Tasks 3, 4 and 5. Task 3 is designed for the identification of opinionated sentences; Task 4 is designed for topic identification based on the sentences from Task 3, together with polarity classification; Task 5 concerns opinion retrieval plus sentiment polarity analysis. This paper presents our methods for the three tasks, draws our conclusions, and outlines our future work.
1 Introduction
Text opinion analysis has attracted growing interest in recent years. Much research on this issue has appeared at top conferences such as ACL, SIGIR and WI. International evaluation contests such as the TREC Blog Track and NTCIR have also begun addressing this issue in recent years. It is a relatively new topic in Chinese language processing. Following last year's evaluation contest (COAE2008), the Chinese Information Processing Society of China holds the second evaluation contest (COAE2009). COAE2009 has five tasks, focusing on sentiment classification, opinion sentence selection, topic extraction and topic retrieval. These tasks range from the basic word level to the complex document level. In COAE2008, we took part in the first four tasks and obtained good results; this time we take part in Tasks 3, 4 and 5. Task 3 is to recognize opinionated sentences; Task 4 is to identify the topic and then classify its polarity; Task 5 is mainly about opinion retrieval, followed by analysis of the sentiment polarity on that basis.
The rest of this paper is organized as follows:
Section 2 describes Task 3; Section 3 describes Task 4; Section 5 describes Task 5; Section 6 gives the conclusion and future work.
2 Task 3
Task 3 requires automatically identifying 1000 opinionated sentences in the test set Dataset1, that is, extracting 1000 sentences that express an explicit sentiment polarity towards some point of view. The output must be sorted by confidence. The output format is additionally constrained: it must include the participant's information and the number of the article in Dataset1.
2.1 Problem Analysis
According to Bing Liu's tutorial overview of opinion mining (Bing Liu, 2005), the main tasks in opinionated text analysis are the following:
(1) Detect the sentiment elements in documents.
(2) Identify the polarity and the strength of each sentiment element.
(3) Illustrate the relation between the opinion object and the sentiment element.
Engstrom (Engstrom, 2004) studied how topic dependence affects the accuracy of sentiment classification. Nasukawa and Yi (Nasukawa and Yi, 2003) extracted positive or negative expressions about a given product name using handmade lexicons.
These opinion-mining tasks all build on the extraction of opinionated sentences from large-scale texts. Identifying the sentences that carry an opinion therefore plays a key role in sentiment analysis. In this task, we design two different algorithms to extract three types of sentences, as follows:
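1) Sentences with explicit sentiment elements, which are mainly sentiment adjectives and adverbs.
“炫目的色彩,动听的音乐,逼真的音效……这些都是张艺谋的长处。” ("Dazzling colors, pleasant music, realistic sound effects... these are all Zhang Yimou's strengths.")
2) Comparatives (Doran et al., 1994)
Meta-linguistic Comparatives:
Those which compare the extent to which an entity has one property to a greater or lesser extent than another property.
“与其说生气,罗纳尔多更多的是沮丧.” ("Ronaldo is more frustrated than angry.")
Propositional Comparatives:
Those that make a comparison between two propositions. This category has the following subcategories:
Nominal Comparatives:
They compare the cardinality of two sets of entities denoted by nominal phrases.
“保尔吃的香蕉比苹果多.” ("Paul ate more bananas than apples.")
Adjectival Comparatives:
In general, these comparatives appear with comparative adverbs such as “更”, “更加” or “最”.
“首先,它是目前备有3倍光学变焦200万像素数码相机中最薄最扁以及最轻的一款.” ("First of all, it is the thinnest, flattest and lightest of the current 2-megapixel digital cameras with 3x optical zoom.")
Adverbial Comparatives:
They are similar to nominal and adjectival ones, but instead of comparative adjectives they use adverbs to describe certain properties:
“宝马Z4跑车比其他系列启动更迅速” ("The BMW Z4 roadster starts up faster than the other series.")
3) Sentences with explicit words or phrases that are followed by sentiment/opinionated clauses. These words/phrases can be “认为” ("think"), “觉得” ("feel"), “指出” ("point out"), etc.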
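The three sentence types above can be told apart, very roughly, by surface cues. The following is a minimal sketch of that idea, not the method used in our runs: the toy sentiment word list and the name `sentence_types` are assumptions made for illustration, while the comparative adverbs and opinion-introducing verbs are the ones quoted in the text.

```python
# SENTIMENT_WORDS is an invented toy sample (assumption); the other two
# sets are the cue words quoted in the text above.
SENTIMENT_WORDS = {"炫目", "动听", "逼真"}
COMPARATIVE_ADVERBS = {"更加", "更", "最"}
OPINION_VERBS = {"认为", "觉得", "指出"}


def sentence_types(sentence):
    """Return the set of cue types found in a sentence by substring match."""
    found = set()
    if any(w in sentence for w in SENTIMENT_WORDS):
        found.add("explicit_sentiment")
    if any(w in sentence for w in COMPARATIVE_ADVERBS):
        found.add("comparative")
    if any(w in sentence for w in OPINION_VERBS):
        found.add("opinion_clause")
    return found


print(sentence_types("宝马Z4跑车比其他系列启动更迅速"))
```

Substring matching is deliberately crude: it would also fire on words that merely contain “最” or “更”, so a real system would match on segmented, POS-tagged words instead.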
2.2 Solution
2.2.1 Linear combination based on sentiment element extraction
In this approach, we extract the adjectives and adverbs of each sentence. These words are strong indicators of sentiment polarity, and each of them is assigned a polarity value that describes its sentiment strength. The sentiment strength of a sentence is a linear combination of all its opinionated elements. Since a longer sentence may contain more sentiment elements, we normalize it by dividing. The following formula describes the strength value of sentence S:
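As an illustration of the length-normalized linear combination described above, here is a minimal sketch. It rests on assumptions that the text does not confirm: the divisor is taken to be the number of words in the sentence, all elements get equal weight, and the lexicon entries and the name `sentence_strength` are hypothetical.

```python
def sentence_strength(words, polarity_lexicon):
    """Sum the polarity values of the words found in the lexicon and
    divide by the sentence length (assumed normalizer)."""
    if not words:
        return 0.0
    total = sum(polarity_lexicon.get(w, 0.0) for w in words)
    return total / len(words)


# Hypothetical lexicon: word -> polarity value
# (the sign gives the polarity, the magnitude gives the strength).
lexicon = {"动听": 0.8, "逼真": 0.6, "糟糕": -0.9}

words = ["音乐", "很", "动听"]  # an already segmented sentence
print(sentence_strength(words, lexicon))
```

With signed values, positive and negative elements cancel each other. If the goal is only to rank sentences by how opinionated they are, summing absolute values would be a natural variant.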
2.2.2 Classification based on word and phrase level
First, we collect a corpus that is disjoint from the test set Dataset1. We annotate it with sentiment polarity tags and train a two-class classifier. Using this classifier, we classify the test dataset and collect the results by sorting on the confidence value. In our approach, we use a Support Vector Machine for training and testing. In the training step, three levels of feature sets are extracted, automatically and manually, as follows:
a. Word level: a sentiment dictionary is used to match the adjectives and adverbs that appear in the target sentence.
b. Phrase level: we use the Stanford Log-linear Part-Of-Speech Tagger (Toutanova, 2003) to annotate the corpus, and manually filter the POS templates that indicate sentiment polarity.
c. Furthermore, we combine the word and the phrase template into a more specific feature.
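The three feature levels above can be sketched as follows. This is an illustrative assumption about how such features might be encoded, not our actual feature extractor: the feature-name prefixes (`W=`, `T=`, `WT=`), the representation of a POS template as a tuple of tags, and the choice of Penn Chinese Treebank tags `VA`, `JJ` and `AD` for adjectives and adverbs are all hypothetical. Training the SVM on these features is not shown.

```python
ADJ_ADV_TAGS = ("VA", "JJ", "AD")  # assumed tags for adjectives and adverbs


def extract_features(tagged, sentiment_dict, pos_templates):
    """tagged: list of (word, pos) pairs for one sentence.
    Returns a set of string-valued binary features."""
    features = set()
    words = [w for w, _ in tagged]
    tags = [t for _, t in tagged]

    # a. Word level: adjectives/adverbs found in the sentiment dictionary.
    for word, tag in tagged:
        if word in sentiment_dict and tag in ADJ_ADV_TAGS:
            features.add("W=" + word)

    for template in pos_templates:
        n = len(template)
        name = "_".join(template)
        for i in range(len(tags) - n + 1):
            if tuple(tags[i:i + n]) != template:
                continue
            # b. Phrase level: the POS template occurs in the sentence.
            features.add("T=" + name)
            # c. Combined: a dictionary word inside the matched span.
            for word in words[i:i + n]:
                if word in sentiment_dict:
                    features.add("WT=" + word + "|" + name)
    return features


tagged = [("音乐", "NN"), ("很", "AD"), ("动听", "VA")]
print(extract_features(tagged, {"动听"}, [("AD", "VA")]))
```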
2.3 Experiment and Results
In both of the above solutions, we use the sentiment dictionary (Fang and Yao, 2008). Only the 500 adjectives and adverbs with the highest confidence are selected.
In solution 2, besides the features from the sentiment dictionary, we manually add comparative adverbs and other verbs that introduce an opinionated clause. In total, the feature set contains 512 features.
Table 1 compares our results with those of the other competitors. In the first run, we simply use solution 1. Its result is similar to that of run 2, which uses the feature set of words and phrase templates. There is, however, a clear improvement when the two are combined: P@1000 rises from 0.402 and 0.418 to 0.461. This indicates that constraining both a word and its corresponding phrase template helps to identify sentiment sentences in text.
Table 1 Results of Task 3
Run-tag   P@1000   Precision   Recall      F1         R-accuracy
Run1      0.402    0.40321     0.0603604   0.105002   0.0603604
Run2      0.418    0.419258    0.0627628   0.109181   0.0627628
Run3      0.461    0.462387    0.0692192   0.120413   0.0692192
MEDIAN    0.45     0.45        0.0675676   0.117493   0.0675676
MAX       0.625    0.625       0.0938438   0.163185   0.0938438
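The F1 values in Table 1 are consistent with the usual harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes F1 from the reported precision and recall of each row; the function name `f1_score` is ours, and the numbers are copied from the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for each row of Table 1
rows = {
    "Run1": (0.40321, 0.0603604, 0.105002),
    "Run2": (0.419258, 0.0627628, 0.109181),
    "Run3": (0.462387, 0.0692192, 0.120413),
    "MEDIAN": (0.45, 0.0675676, 0.117493),
    "MAX": (0.625, 0.0938438, 0.163185),
}

for tag, (p, r, reported) in rows.items():
    print(tag, round(f1_score(p, r), 6), reported)
```

The recomputed values agree with the reported ones to about five decimal places. Because only 1000 sentences are returned per run, recall is low for every participant, which is why F1 stays far below precision.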
3 Task 4
For evaluation Task 4, we identify the opinion objects in the subjective sentences and classify the opinion polarities. In brief, we first take emotional words as cues for selecting subjective sentences, then apply a log-linear model to rank all the candidate targets (i.e. the objects of the opinion) together with their polarities, and finally pick the best target-polarity pair as the output.
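To make the ranking step concrete, the sketch below scores candidate target-polarity pairs with a generic log-linear model: each pair receives a weighted sum of feature values, and the scores are normalized with a softmax. The feature names, weights and candidates are invented for illustration and are not the features used in our system.

```python
import math


def rank_candidates(candidates, weights):
    """candidates: list of (target, polarity, feature_dict).
    Returns (target, polarity, probability) triples, best first."""
    if not candidates:
        return []
    scores = [
        sum(weights.get(name, 0.0) * value for name, value in feats.items())
        for _, _, feats in candidates
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    ranked = [
        (target, polarity, e / z)
        for (target, polarity, _), e in zip(candidates, exps)
    ]
    return sorted(ranked, key=lambda item: item[2], reverse=True)


# Hypothetical weights and candidates.
weights = {"dep_distance": -0.5, "is_noun_head": 1.0, "polarity_matches_cue": 0.8}
candidates = [
    ("相机", "+", {"dep_distance": 1, "is_noun_head": 1, "polarity_matches_cue": 1}),
    ("相机", "-", {"dep_distance": 1, "is_noun_head": 1}),
    ("价格", "+", {"dep_distance": 3, "is_noun_head": 1, "polarity_matches_cue": 1}),
]
best = rank_candidates(candidates, weights)[0]
print(best[0], best[1])
```

In a trained model the weights would be estimated from the annotated sentences rather than set by hand.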
Before extracting the features for the log-linear ranking model, we preprocess the corpus using a pipeline system that includes the following modules, in order:
Sentence boundary detection (with our own script based on regular expressions)
Word segmentation (Stanford Chinese Word Segmenter; Tseng et al., 2005)
Part-of-Speech tagging (Stanford Log-linear Part-Of-Speech Tagger; Toutanova et al., 2003)
Dependency parsing (MSTParser; McDonald et al., 2005)
Semantic role labeling (our own system; Zhang et al., 2009)
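The first module is described only as a script based on regular expressions. A minimal sketch of such a splitter is given below, assuming that a sentence ends at one or more of the marks 。!? (full-width or ASCII) and that closing quotation marks stay attached to the sentence they close. The pattern and the name `split_sentences` are illustrative, not our actual script.

```python
import re

# One or more sentence-final marks, optionally followed by closing quotes.
_SENTENCE_END = re.compile(r"([。!?!?]+[”’」』]*)")


def split_sentences(text):
    """Split text into sentences, keeping the final punctuation."""
    # With a capturing group, re.split returns
    # [body, terminator, body, terminator, ..., tail].
    parts = _SENTENCE_END.split(text)
    sentences = []
    for i in range(0, len(parts) - 1, 2):
        sentence = (parts[i] + parts[i + 1]).strip()
        if sentence:
            sentences.append(sentence)
    tail = parts[-1].strip()
    if tail:  # trailing text without a final mark
        sentences.append(tail)
    return sentences


print(split_sentences("画面很美。音乐也不错!你觉得呢?"))
```

The ASCII period and the ellipsis “……” are left out of the pattern on purpose, so that decimals such as 75.70 and mid-sentence ellipses do not trigger a split.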
The last module, the semantic role labeler, is a shallow semantic processing technique that normally reveals the predicate-argument relations between words or constituents in a sentence. For this task, we use the Chinese semantic role labeler described in (Zhang et al., 2009) to process all the documents provided by the evaluation task. The SRL system was trained on the Chinese PropBank and successfully participated in the CoNLL Shared Task 2009 (Hajic et al., 2009). Annotations in the Chinese PropBank use role names like “A0, A1” to denote arguments, and “TMP, LOC, ADV” to identify temporal, location, and adverbial modification relations.
The main system starts from an existing emotional word dictionary (Liu et al., 2008) and uses the words with strong polarity (3, the strongest) as cues for selecting sentences from the whole corpus. In practice, we choose 7485 sentences as our subjective sentences, which also serve as the candidates for opinion object identification.
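The cue-based selection can be sketched as a simple filter over segmented sentences. The dictionary format (word mapped to a strength from 1 to 3), the sample entries and the name `select_subjective` are assumptions for illustration; only the threshold of 3 comes from the text.

```python
def select_subjective(sentences, emotion_dict, min_strength=3):
    """Keep the sentences that contain at least one emotional word whose
    strength reaches min_strength.
    sentences: list of word lists (already segmented)."""
    cues = {w for w, strength in emotion_dict.items() if strength >= min_strength}
    return [sent for sent in sentences if any(w in cues for w in sent)]


# Hypothetical dictionary entries: word -> polarity strength (1 to 3).
emotion_dict = {"出色": 3, "不错": 1}
sentences = [
    ["这", "款", "相机", "非常", "出色"],
    ["今天", "下雨"],
    ["画面", "不错"],
]
print(select_subjective(sentences, emotion_dict))
```

Restricting the cues to the strongest words trades recall for precision: weakly opinionated sentences such as the third one are dropped.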
In order to build a supervised learning model, we manually go through about 1000 sentences and annotate 294 positive instances and 244 negative ones. The annotation labels we use are +*/-* for emotional words and +#/-# for opinion objects. Due to the limited size of the training set, we prune the search space by considering only noun phrases as targets, and the annotation is restricted to the head word instead of the whole noun phrase.
With the manually annotated opinion objects and polarities, we develop two statistical classifiers to i) identify the opinion objects in a given subjective sentence, and ii) classify the polarity of the opinion. Both classifiers are trained on the manually annotated dataset.
For the object identifier, the system starts from the emotion word and searches the syntactic dependency graph for a poten