
Translated by: Zhu Wenjin

Student ID: 1212401007

July 2013

2012 11th International Conference on Machine Learning and Applications

Enhanced Multiagent Multi-Objective Reinforcement Learning for Urban Traffic Light Control

Mohamed A. Khamis∗, Student Member, IEEE, and Walid Gomaa∗,†

∗Department of Computer Science and Engineering
Egypt-Japan University of Science and Technology (E-JUST)
Alexandria, Egypt

Email: {mohamed.khamis, walid.gomaa}@ejust.edu.eg

Abstract: Traffic light control is one of the major problems in urban areas. This is due to the increasing number of vehicles and the high dynamics of the traffic network. Ordinary methods for traffic light control cause a high rate of accidents, waste time, and affect the environment negatively due to high rates of fuel consumption. In this paper, we develop an enhanced version of our multiagent multi-objective traffic light control system that is based on a Reinforcement Learning (RL) approach. As a testbed framework for our traffic light controller, we use the open source Green Light District (GLD) vehicle traffic simulator. We analyze and fix some implementation problems in GLD that emerged when applying a more realistic continuous-time acceleration model. We propose a new cooperation method between the neighboring traffic light agent controllers using specific learning and exploration rates. Our enhanced traffic light controller minimizes the trip time in major arteries and increases safety in residential areas. In addition, our traffic light controller satisfies green waves for platoons traveling in major arteries and also considers the traffic environmental impact by keeping vehicle speeds within the desirable thresholds for lowest fuel consumption. In order to evaluate the enhancements and new methods proposed in this paper, we have added new performance indices to GLD.

Keywords: multi-objective traffic light controller; reinforcement learning; multiagent cooperation; environmental impact; traffic green waves

I. INTRODUCTION

Urban traffic light control is one of the major problems countries face. Poor traffic light control causes considerable waiting times and a high rate of accidents, and has a negative impact on the environment due to the huge amount of fuel consumed, especially in highly congested traffic areas.

In this paper, we develop an enhanced version of our multiagent multi-objective traffic light control system presented in [1], [2], based on an RL traffic light control approach [3].

In this work, we use the GLD vehicle traffic simulator [4] as a testbed framework. The contributions of this paper are:

(1) fixing some implementation problems in GLD that emerged when applying a more realistic acceleration model that is time-continuous,

(2) using a new cooperation method between the traffic light agent controllers that is based on propagating the learnt knowledge from the highly learnt agents to their neighboring agents of less knowledge,

(3) minimizing the Average Trip Time (ATT) in major arteries,

(4) increasing safety in residential areas,

(5) satisfying green waves for platoons traveling in major arteries,

(6) considering the traffic environmental impact by keeping vehicle speeds within the thresholds of lowest fuel consumption, and

(7) adding new performance indices to the GLD traffic simulator to evaluate the system performance.

The remainder of this paper is organized as follows. Related work is discussed in Section II. A background on the GLD traffic simulation and control is presented in Section III. Our enhanced multi-objective traffic light control system is described in Section IV. System performance evaluation is presented in Section V. Finally, Section VI concludes the paper and gives directions for future work.

II. RELATED WORK

Different machine learning approaches have recently been used for urban traffic light control, including reinforcement learning, fuzzy logic, evolutionary algorithms, and artificial neural networks.

Traffic light control methods based on fuzzy logic, e.g., [5], are more suitable for controlling traffic at an isolated intersection. Evolutionary algorithms such as genetic algorithms and ant algorithms, e.g., [6], [7], cannot be easily applied to online optimization of large-scale coordinated traffic control due to their characteristics of random search and implicit parallel computing. As mentioned in [8], these methods take a very long time to converge to the optimal traffic light decision for large-scale problems.

In RL methods, e.g., [2], [3], [9], each traffic light controller agent learns how to control the traffic light through its interaction with the environment, from which it gains some feedback (a reward signal). Through a trial-and-error process, the agent learns a policy that optimizes the cumulative reward it gains over time. Most RL approaches that have a traffic-light-based state space, e.g., [10], [11], [12], suffer from growth in the number of states when scaling to larger networks; thus they are applied only to relatively small-scale traffic networks.

We adopt a vehicle-based state-space RL approach [3] in which the controller predicts, for each vehicle, the estimated remaining waiting time until it arrives at its destination, both for the case where the traffic light is red and for the case where it is green. Those predictions are then combined for all vehicles at the controlled traffic junction, and the traffic light decision is taken accordingly.

In this representation, the number of states grows linearly in the number of lanes and vehicle positions and thus scales well to large networks.
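As a rough illustration of how per-vehicle predictions might be combined into a junction-level decision, the Python sketch below sums, for each candidate light configuration, the estimated waiting-time gain of the vehicles that configuration would release. The gain rule (red estimate minus green estimate), the function name `choose_configuration`, and the sample numbers are assumptions made for illustration; the text above does not specify the exact combination rule.

```python
from typing import Dict, List, Tuple

# Hypothetical per-vehicle estimate:
# (remaining wait if the light is red, remaining wait if the light is green).
Estimate = Tuple[float, float]


def choose_configuration(
    configs: Dict[str, List[str]],
    vehicles_by_lane: Dict[str, List[Estimate]],
) -> str:
    """Return the configuration whose green lanes give the largest summed gain."""

    def total_gain(green_lanes: List[str]) -> float:
        # Assumed gain per vehicle: time saved by showing green instead of red.
        return sum(
            red - green
            for lane in green_lanes
            for red, green in vehicles_by_lane.get(lane, [])
        )

    return max(configs, key=lambda name: total_gain(configs[name]))


# Made-up example: two configurations, three waiting vehicles.
configs = {"north_south": ["N", "S"], "east_west": ["E", "W"]}
vehicles_by_lane = {
    "N": [(12.0, 4.0), (10.0, 5.0)],  # gains 8 and 5
    "E": [(9.0, 3.0)],                # gain 6
}
best = choose_configuration(configs, vehicles_by_lane)
```

The work per decision is one pass over the waiting vehicles, which matches the point that a vehicle-based representation avoids enumerating joint junction states.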

III. BACKGROUND

A. Traffic Simulation Model

The traffic simulation infrastructure consists of roads and nodes. Every road connects two nodes, each of which can be either an edge node (a start or end point of the generated vehicles) or a traffic light junction. Every road can consist of several lanes in each direction. There exist two types of agents: vehicles and traffic light controllers. Traffic light agents are updated every time step with the new positions of the vehicles, and each takes its traffic light decision autonomously.

Each edge node has a probability in [0, 1] of generating a new vehicle at every time step (i.e., 1 means a vehicle is generated every time step from the edge node; 0 means no vehicle will be generated). Every time step, a vehicle can stay at the same position, move ahead in the same lane, or cross the current intersection and join the next lane towards its final destination. Each traffic light controller can take decisions that represent the consistent traffic light configurations, i.e., those that do not lead to an accident between crossing vehicles at the controlled junction (e.g., setting the traffic lights at two lanes in opposite directions to green).
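The vehicle generation rule can be sketched in a few lines of Python. This is only an illustrative model of an edge node that spawns a vehicle with a fixed probability per time step; the function name `count_spawned` and the seeded generator are assumptions and are not part of the GLD code base.

```python
import random


def count_spawned(spawn_probability: float, steps: int, rng: random.Random) -> int:
    """Count vehicles generated by one edge node over a number of time steps."""
    if not 0.0 <= spawn_probability <= 1.0:
        raise ValueError("spawn_probability must lie in [0, 1]")
    # rng.random() returns a float in [0.0, 1.0), so a probability of 1
    # spawns at every step and a probability of 0 never spawns.
    return sum(1 for _ in range(steps) if rng.random() < spawn_probability)


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
always = count_spawned(1.0, 100, rng)
never = count_spawned(0.0, 100, rng)
sometimes = count_spawned(0.3, 100, rng)
```

With a probability of 1 the node emits a vehicle at each of the 100 steps, with 0 it emits none, and intermediate values give a random count in between.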

B. RL for Urban Traffic Light Control

In the adopted RL model [3], the vehicle state means that the vehicle is at a lane controlled by a specific traffic light, denoted by tl, is at a specific position in this lane, denoted by pos, and has a specific destination edge node, denoted by des.

Thus, the vehicle's current and next states can be denoted by s = [tl, pos, des] and s' = [tl', pos'], respectively, where the vehicle's final destination does not change with the state transition. The state transition probability is given by P(s, a, s'), where a (red or green) represents the action of the traffic light tl. P(a|s) is the probability that the action of the traffic light tl is a, given that a vehicle is at state s.
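To make the notation concrete, the following Python sketch represents a vehicle state as a (tl, pos, des) tuple and estimates both probabilities as relative frequencies of observed events. The class and method names are hypothetical, and reading P(s, a, s') as "the probability of reaching s' from s under action a" is an assumption of this sketch.

```python
from collections import Counter
from typing import NamedTuple


class VehicleState(NamedTuple):
    tl: str   # traffic light controlling the vehicle's lane
    pos: int  # position within that lane
    des: str  # destination edge node (unchanged by transitions)


class TransitionCounts:
    """Relative-frequency estimates of P(a|s) and P(s, a, s') from counts."""

    def __init__(self) -> None:
        self.state = Counter()         # visits to s
        self.state_action = Counter()  # visits to s with light action a
        self.transition = Counter()    # observed moves s -> s' under a

    def observe(self, s: VehicleState, a: str, s_next: VehicleState) -> None:
        self.state[s] += 1
        self.state_action[(s, a)] += 1
        self.transition[(s, a, s_next)] += 1

    def p_action(self, s: VehicleState, a: str) -> float:
        n = self.state[s]
        return self.state_action[(s, a)] / n if n else 0.0

    def p_next(self, s: VehicleState, a: str, s_next: VehicleState) -> float:
        n = self.state_action[(s, a)]
        return self.transition[(s, a, s_next)] / n if n else 0.0


counts = TransitionCounts()
here = VehicleState("tl1", 3, "edgeA")
ahead = VehicleState("tl1", 4, "edgeA")
counts.observe(here, "green", ahead)
counts.observe(here, "green", ahead)
counts.observe(here, "green", here)
counts.observe(here, "red", here)
```

In this made-up trace the light was green in three of four visits to the state, so `counts.p_action(here, "green")` is 0.75, and the vehicle advanced in two of those three cases, so `counts.p_next(here, "green", ahead)` is 2/3.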

In order to calculate the state probabilities P(s, a, s') and P(a|s), some counters are updated every time step. The original model [3] depends on the frequentist probability interpretation, while we proved