W4: Who? When? Where? What?

A Real Time System for Detecting and Tracking People

Ismail Haritaoglu, David Harwood and Larry S. Davis

Computer Vision Laboratory

University of Maryland

College Park, MD 20742

Abstract

W4 is a real time visual surveillance system for detecting and tracking people and monitoring their activities in an outdoor environment. It operates on monocular grayscale video imagery, or on video imagery from an infrared camera. Unlike many systems for tracking people, W4 makes no use of color cues. Instead, W4 employs a combination of shape analysis and tracking to locate people and their parts (head, hands, feet, torso) and to create models of people's appearance so that they can be tracked through interactions such as occlusions. W4 is capable of simultaneously tracking multiple people even with occlusion. It runs at 25 Hz for 320x240 resolution images on a dual-Pentium PC.

1. Introduction

W4 is a real time system for tracking people and their body parts in monochromatic imagery. It constructs dynamic models of people's movements to answer questions about what they are doing, and where and when they act. It constructs appearance models of the people it tracks so that it can track people through occlusion events in the imagery. In this paper we describe the computational models employed by W4 to detect and track people and their parts. These models are designed to allow W4 to determine types of interactions between people and objects, and to overcome the inevitable errors and ambiguities that arise in dynamic image analysis, such as instability in segmentation processes over time, splitting of objects due to coincidental alignment of object parts with similarly colored background regions, etc. W4 employs a combination of shape analysis and robust techniques for tracking to detect people, and to locate and track their body parts. It builds "appearance" models of people so that they can be identified after occlusions or after other interactions during which W4 cannot track them individually.

W4 has been designed to work with only monochromatic video sources, either visible or infrared. While most previous work on detection and tracking of people has relied heavily on color cues, W4 is designed for outdoor surveillance tasks, and particularly for night time or other low light level situations. In such cases, color will not be available, and people need to be detected and tracked based on weaker appearance and motion cues. W4 is a real time system. It is currently implemented on a dual processor Pentium PC and can process between 20 and 30 frames per second, depending on the image resolution (typically lower for IR sensors than video sensors) and the number of people in its field of view. In the long run, W4 will be extended with models to recognize the actions of the people it tracks. Specifically, we are interested in interactions between people and objects, e.g., people exchanging objects, leaving objects in the scene, taking objects from the scene. The descriptions of people (their global motions and the motions of their parts) developed by W4 are designed to support such activity recognition.

W4 currently operates on video taken from a stationary camera, and many of its image analysis algorithms would not generalize easily to images taken from a moving camera. Other ongoing research in our laboratory attempts to develop both appearance and motion cues from a moving sensor that might alert a system to the presence of people in its field of regard [9]. At this point, the surveillance system might stop and invoke a system like W4 to verify the presence of people and recognize their actions. More generally, however, one would be interested in detecting and tracking people from a moving surveillance platform, and this is a topic currently being investigated in our laboratory also.

In W4, foreground regions are detected in every frame by a combination of background analysis and simple low level processing of the resulting binary image. The background scene is statistically modeled by the minimum and maximum intensity values and maximal temporal derivative for each pixel recorded over some period, and is updated periodically. These algorithms are described in Section 3. Each foreground region is matched to the current set of objects using a combination of shape analysis and tracking. These include simple spatial occupancy overlap tests between the predicted locations of objects and the locations of detected foreground regions, and "dynamic" template matching algorithms that correlate evolving appearance models of objects with foreground regions. Second-order motion models, which combine robust techniques for region tracking and matching of silhouette edges with recursive least squares estimation, are used to predict the locations of objects in future frames. These algorithms are described in Section 4. A cardboard human model of a person in a standard upright pose is used to model the human body and to predict the location of human body parts (head, torso, hands, legs and feet). The locations of these parts are verified and refined using dynamic template matching methods. W4 can detect and track multiple people in complicated scenes at 25 Hz for 320x240 resolution on a 300 MHz dual-Pentium PC. W4 has also been applied to infrared video imagery at 30 Hz for 160x120 resolution on the same PC.
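The spatial occupancy overlap test and the second-order prediction mentioned above can be illustrated with a minimal sketch. It assumes axis-aligned bounding boxes in inclusive pixel coordinates and a constant-acceleration update with hand-supplied velocity and acceleration. All function names are hypothetical, and the recursive least squares fitting of the motion parameters is not shown.

```python
def predict_position(pos, vel, acc, dt=1.0):
    """Constant-acceleration (second-order) prediction of a 2-D position."""
    return tuple(p + v * dt + 0.5 * a * dt * dt for p, v, a in zip(pos, vel, acc))


def shift_box(box, dx, dy):
    """Translate a (x_min, y_min, x_max, y_max) box by (dx, dy)."""
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


def boxes_overlap(a, b):
    """True if two inclusive (x_min, y_min, x_max, y_max) boxes share a pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Illustrative values only: a tracked object and one detected foreground region.
tracked_box = (8, 10, 18, 40)
predicted_box = shift_box(tracked_box, 3, 0)   # displacement from the motion model
detected_box = (12, 11, 22, 41)
is_candidate_match = boxes_overlap(predicted_box, detected_box)
```

A region that passes this cheap test would then be a candidate for the more expensive template matching step.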

2. Previous Tracking Systems

Pfinder [1] is a real-time system for tracking a person which uses a multi-class statistical model of color and shape to segment a person from a background. It finds and tracks people's head and hands under a wide range of viewing conditions.

Reference [5] describes a general purpose system for moving object detection and event recognition in which moving objects are detected using change detection and tracked using first-order prediction and nearest neighbor matching. Events are recognized by applying predicates to a graph formed by linking corresponding objects in successive frames.

KidRooms [2,8] is a tracking system based on "closed-world regions". These are regions of space and time in which the specific context of what is in the regions is assumed to be known. These regions are tracked in real-time domains where object motions are not smooth or rigid, and where multiple objects are interacting. Bregler uses many levels of representation based on mixture models, EM, and recursive Kalman and Markov estimation to learn and recognize human dynamics [4]. Deformable trackers that track small images of people are described in [6].

3. Background Scene Modeling and Foreground Region Detection

Frame differencing in W4 is based on a model of background variation obtained while the scene contains no people. The background scene is modeled by representing each pixel by three values: its minimum and maximum intensity values and the maximum intensity difference between consecutive frames observed during this training period. These values are estimated over several seconds of video and are updated periodically for those parts of the scene that W4 determines to contain no foreground objects.
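As a minimal sketch of this three-value model, the following computes the per-pixel minimum, maximum, and largest consecutive-frame difference from a short training sequence. It assumes frames are given as nested lists of grayscale intensities. The function name is hypothetical and the periodic update step is omitted.

```python
def build_background_model(frames):
    """Per-pixel (minimum, maximum, max inter-frame difference) images.

    frames: list of 2-D lists of grayscale intensities, recorded while
    the scene contains no people.
    """
    height, width = len(frames[0]), len(frames[0][0])
    minimum = [[min(f[y][x] for f in frames) for x in range(width)]
               for y in range(height)]
    maximum = [[max(f[y][x] for f in frames) for x in range(width)]
               for y in range(height)]
    # Largest absolute change between consecutive frames at each pixel
    # (0 when only a single training frame is supplied).
    max_diff = [[max((abs(b[y][x] - a[y][x])
                      for a, b in zip(frames, frames[1:])), default=0)
                 for x in range(width)]
                for y in range(height)]
    return minimum, maximum, max_diff


# Illustrative 1x2 training sequence of three frames.
training = [[[10, 50]], [[12, 47]], [[11, 52]]]
minimum, maximum, max_diff = build_background_model(training)
```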

Foreground objects are segmented from the background in each frame of the video sequence by a four stage process: thresholding, noise cleaning, morphological filtering and object detection.

Figure 1: Motion estimation of the body using silhouette edge matching between two successive frames. a: input image; b: detected foreground regions; c: alignment of silhouette edges based on difference in median; d: final alignment after silhouette correlation.

Each pixel is first classified as either a background or a foreground pixel using the background model. Given the minimum (M), maximum (N), and largest interframe absolute difference (D) images that represent the background scene model, pixel x from image I is a foreground pixel if:

|M(x) - I(x)| > D(x) or |N(x) - I(x)| > D(x)    (1)
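The per-pixel test of Equation 1 can be sketched as follows. This is a literal transcription under the assumption that M, N, D and the input image I are nested lists of equal size. The function name is hypothetical.

```python
def classify_foreground(image, minimum, maximum, max_diff):
    """Binary mask: 1 where Equation 1 marks a pixel as foreground."""
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            i, d = image[y][x], max_diff[y][x]
            # |M(x) - I(x)| > D(x)  or  |N(x) - I(x)| > D(x)
            if abs(minimum[y][x] - i) > d or abs(maximum[y][x] - i) > d:
                mask[y][x] = 1
    return mask


# Illustrative 1x2 example: the first pixel stays within the model,
# the second is far brighter than anything seen during training.
mask = classify_foreground([[11, 60]], [[10, 47]], [[12, 52]], [[2, 5]])
```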

Thresholding alone, however, is not sufficient to obtain clear foreground regions; it results in a significant level of noise, for example, due to illumination changes. W4 uses region-based noise cleaning to eliminate noise regions. After thresholding, one iteration of erosion is applied to foreground pixels to eliminate one-pixel thick noise. Then, a fast binary connected-component operator is applied to find the foreground regions, and small regions are eliminated. Since the remaining regions are smaller than the original ones, they should be restored to their original sizes by processes such as erosion and dilation.
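A minimal sketch of the two noise-cleaning operations described above is given below: one erosion pass followed by removal of small connected regions. The 3x3 structuring element, 8-connectivity, the treatment of image borders as background, and the function names are all assumptions made for illustration. The text does not specify them.

```python
from collections import deque


def erode(mask):
    """One pass of 3x3 binary erosion; image borders count as background."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            if all(mask[y + dy][x + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out


def remove_small_regions(mask, min_size):
    """Keep only 8-connected regions containing at least min_size pixels."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill collects one connected region.
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(region) >= min_size:
                for ry, rx in region:
                    out[ry][rx] = 1
    return out
```

In a pipeline following the text, `remove_small_regions(erode(mask), min_size)` would be applied to the thresholded mask, with `min_size` chosen for the scene.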

Generally, finding a satisfactory combination of erosion and dilation steps is quite difficult, and no fixed combination works well in general on our outdoor images. Instead, W4 applies morphological operators to foreground pixels only after noise pixels are eliminated. So, W4 reapplies background subtraction, followed by one iteration each of dilation and erosion, but only to those pixels inside the bounding boxes of the foreground regions that survived the size thresholding operation.
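The box-restricted dilation and erosion can be sketched as below. This is an illustration under several assumptions not stated in the text: a 3x3 structuring element, pixels outside the image treated as background, and a one-pixel margin (`pad`) around each bounding box so that the dilation is not clipped at the box edge. The input mask is taken to be the result of the repeated background subtraction, and the function names are hypothetical.

```python
def _apply_3x3(mask, box, combine):
    """Apply a 3x3 neighbourhood rule to pixels inside an inclusive box."""
    height, width = len(mask), len(mask[0])
    x0, y0, x1, y1 = box
    out = [row[:] for row in mask]
    for y in range(max(y0, 0), min(y1, height - 1) + 1):
        for x in range(max(x0, 0), min(x1, width - 1) + 1):
            neighbours = [
                mask[ny][nx] if 0 <= ny < height and 0 <= nx < width else 0
                for ny in (y - 1, y, y + 1)
                for nx in (x - 1, x, x + 1)
            ]
            out[y][x] = 1 if combine(neighbours) else 0
    return out


def close_in_box(mask, box, pad=1):
    """One dilation then one erosion, limited to a padded bounding box.

    box is (x_min, y_min, x_max, y_max) in inclusive pixel coordinates.
    """
    x0, y0, x1, y1 = box
    padded = (x0 - pad, y0 - pad, x1 + pad, y1 + pad)
    dilated = _apply_3x3(mask, padded, any)   # dilation: any neighbour set
    return _apply_3x3(dilated, padded, all)   # erosion: all neighbours set
```

Restricting the work to surviving bounding boxes keeps the cost proportional to the area of the detected regions and not the whole frame.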

As the final step of foreground region detection, a binary connected component analysis is applied to the foreground pixels to assign a unique label to each foreground object. W4 generates a set of features for each detected foreground object, including its local label, centroid, median, and bounding box.
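Given a label image from the connected component step, the listed features might be computed as in the sketch below. It assumes that label 0 denotes background and that "median" means the coordinate-wise median of a region's pixel coordinates, which is an interpretation and not something the text defines. The function name and dictionary keys are hypothetical.

```python
from statistics import median


def region_features(labels):
    """Map each nonzero label to its centroid, median and bounding box."""
    pixels = {}
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            if label:
                pixels.setdefault(label, []).append((x, y))
    features = {}
    for label, points in pixels.items():
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        features[label] = {
            "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
            "median": (median(xs), median(ys)),
            # Inclusive (x_min, y_min, x_max, y_max).
            "bbox": (min(xs), min(ys), max(xs), max(ys)),
        }
    return features


# Illustrative label image with two objects.
labels = [
    [1, 1, 0, 0],
    [1, 0, 0, 2],
    [0, 0, 0, 2],
]
features = region_features(labels)
```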

4. Object Tracking

The goals of the object tracking stage are:

To determine when a new object enters the system's field of view, and initialize motion models for tracking that object.

To compute the correspondence between the foreground regions detected by the background subtraction and the objects currently being tracked by W4.

To employ tracking algorithms to estimate the position of the torso of each object, and update the motion model used for tracking. W4 employs second order motion models (including a velocity and, possibly zero, acceleration terms) to model both the overall m