Dynamic Vehicle Routing for Robotic Systems

Francesco Bullo, Emilio Frazzoli, Marco Pavone, Ketan Savla, Stephen L. Smith

Submitted to the Proceedings of the IEEE in May 2010, revised in February 2011. This research was partially supported by AFOSR award FA 8650-07-2-3744, ARO MURI award W911NF-05-1-0219, NSF awards ECCS-0705451 and CMMI-0705453, and ARO award W911NF-11-1-0092. F. Bullo is with the Center for Control, Dynamical Systems and Computation and with the Department of Mechanical Engineering, University of California, Santa Barbara, CA 93106 (bullo@engineering.ucsb.edu). E. Frazzoli is with the Laboratory for Information and Decision Systems, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA 02139 (frazzoli@mit.edu). M. Pavone is with the Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109 (marco.pavone@jpl.nasa.gov). K. Savla is with the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139 (ksavla@mit.edu). S. L. Smith is with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo ON, N2L 3G1 Canada (stephen.smith@uwaterloo.ca). The authors are listed in alphabetical order.

Abstract: Recent years have witnessed great advancements in the science and technology of autonomy, robotics and networking. This paper surveys recent concepts and algorithms for dynamic vehicle routing (DVR), that is, for the automatic planning of optimal multi-vehicle routes to perform tasks that are generated over time by an exogenous process. We consider a rich variety of scenarios relevant for robotic applications. We begin by reviewing the basic DVR problem: demands for service arrive at random locations at random times and a vehicle travels to provide on-site service while minimizing the expected wait time of the demands. Next, we treat different multi-vehicle scenarios based on different models for demands (e.g., demands with different priority levels and impatient demands), vehicles (e.g., motion constraints, communication and sensing capabilities), and tasks. The performance criterion used in these scenarios is either the expected wait time of the demands or the fraction of demands serviced successfully. In each specific DVR scenario, we adopt a rigorous technical approach that relies upon methods from queueing theory, combinatorial optimization and stochastic geometry. First, we establish fundamental limits on the achievable performance, including limits on stability and quality of service. Second, we design algorithms and provide provable guarantees on their performance with respect to the fundamental limits.

I. INTRODUCTION

This survey presents a joint algorithmic and queueing approach to the design of cooperative control and task allocation strategies for networks of uninhabited vehicles and robots. The approach enables groups of robots to complete tasks in uncertain and dynamically changing environments, where new task requests are generated in real time. Applications include surveillance and monitoring missions, as well as transportation networks and automated material handling. As a motivating example, consider the following scenario: a sensor network is deployed in order to detect suspicious activity in a region of interest. (Alternatively, the sensor network is replaced by a high-altitude, sensor-rich aircraft loitering over the region.) In addition to the sensor network, a team of unmanned aerial vehicles (UAVs) is available and each UAV is equipped with close-range high-resolution on-board sensors. Whenever a sensor detects a potential event, a request for close-range observation by one of the UAVs is generated. In response to this request, a UAV visits the location to gather close-range information and investigates the cause of the alarm. Each request for close-range observation might include priority levels or time windows during which the inspection must occur, and it might require an on-site service time. In summary, from a control algorithmic viewpoint, each time a new
request arises, the UAVs need to decide which vehicle will inspect that location and along which route. Thus, the problem is to design algorithms that enable real-time task allocation and vehicle routing.

Accordingly, this paper surveys allocation and routing algorithms that typically blend ideas from receding-horizon resource allocation, distributed optimization, combinatorics and control. The key novelty in our approach is the simultaneous introduction of stochastic, combinatorial and queueing aspects in the distributed coordination of robotic networks.

Static vehicle routing: In the recent past, considerable effort has been devoted to the problem of how to cooperatively assign and schedule demands for service that are defined over an extended geographical area [1], [2], [3], [4], [5]. In these papers, the main focus is on developing distributed algorithms that operate with knowledge of the demand locations and with limited communication between robots. However, the underlying mathematical model is static, in that no new demands arrive over time. Thus, the centralized version of the problem fits within the framework of the static vehicle routing problem (see [6] for a thorough introduction to this problem, known in the operations research literature as the Vehicle Routing Problem (VRP)), whereby: (i) a team of $m$ vehicles is required to service a set of $n$ demands in a 2-dimensional space; (ii) each demand requires a certain amount of on-site service; (iii) the goal is to compute a set of routes that optimizes the cost of servicing the demands (according to some quality-of-service metric). In general, most of the available literature on routing for robotic networks focuses on static environments and does not properly account for scenarios in which dynamic, stochastic and adversarial events take place.

Dynamic vehicle routing: The problem of planning routes through service demands that arrive during a mission execution is known as the “dynamic vehicle routing problem” (abbreviated as the DVR problem). See Figure 1
for an illustration of DVR. There are two key differences between static and dynamic vehicle routing problems. First, planning algorithms should actually provide policies (in contrast to pre-planned routes) that prescribe how the routes should evolve as a function of those inputs that evolve in real time. Second, dynamic demands (i.e., demands that vary over time) add queueing phenomena to the combinatorial nature of vehicle routing. In such a dynamic setting, it is natural to focus on steady-state performance instead of optimizing the performance for a single demand. Additionally, system stability in terms of the number of waiting demands is an issue to be addressed.

Fig. 1. An illustration of dynamic vehicle routing for a robotic system. From panel #1 to #2: vehicles are assigned to customers and select routes. Panel #3: the DVR problem is how to re-allocate and re-plan routes when new customers appear.

Algorithmic queueing theory for DVR: The objective of this work is to present a joint algorithmic and queueing approach to the design of cooperative control and task allocation strategies for networks of uninhabited vehicles required to operate in dynamic and uncertain environments. This approach is based upon the pioneering work of Bertsimas and Van Ryzin [7], [8], [9], who introduced queueing methods to solve the simplest DVR problem (a vehicle moves along straight lines and visits demands whose time of arrival, location and on-site service are stochastic; information about demand location is communicated to the vehicle upon demand arrival); see also the earlier related work [10]. Starting with these works [7], [8], [9] and integrating ideas from dynamics, combinatorial optimization, teaming, and distributed algorithms, we have recently developed a systematic approach to tackle complex dynamic routing problems for robotic networks. We refer to this approach as “algorithmic queueing theory” for dynamic vehicle routing. The power of algorithmic queueing theory stems from the wide spectrum of aspects, critical to the routing
of robotic networks, for which it enables a rigorous study; specific examples taken from our work in the past few years include complex models for the demands such as time constraints [11], [12], service priorities [13], and translating demands [14], problems concerning robotic implementation such as adaptive and decentralized algorithms [15], [16], complex vehicle dynamics [17], [18], limited sensing range [19], and team forming [20], and even integration of humans in the design space [21].

Survey content: In this work we provide a detailed account of algorithmic queueing theory for DVR, with an emphasis on robotic applications. We start in Section II by reviewing the possible approaches to dynamic vehicle routing problems. Then, in Section III, we describe the foundations of algorithmic queueing theory, which lie in the aforementioned works of Bertsimas and Van Ryzin. In the following four sections we discuss some of our recent efforts in applying algorithmic queueing theory to realistic dynamic routing problems for robotic systems.

Specifically, in Section IV we present routing policies for DVR problems that (i) are spatially distributed, scalable to large networks, and adaptive to network changes, and (ii) have remarkably good performance guarantees in both the light-load regime (i.e., when the arrival rate for the demands is small) and the heavy-load regime (i.e., when the arrival rate for the demands is large). Here, by network changes we mean changes in the number of vehicles, the arrival rate of demands, and the characterization of the on-site service requirement.

In Section V we discuss time-constrained and prioritized service. For time-constrained DVR problems, we establish upper and lower bounds on the optimal number of vehicles for a given level of service quality (defined as the desired fraction of demands that must receive service within the deadlines). Additionally, we rigorously characterize two service policies: in light load the DVR problem with time constraints is closely related to a particular facility
location problem, and in moderate and heavy load, static vehicle routing methods, such as solutions of traveling salesman problems, can provide good performance. We then study DVR problems in which demands have an associated level of priority (or importance). The problem is characterized by the number of different priority classes $n$ and their relative levels of importance. We provide lower bounds on the optimal performance and a service policy which is guaranteed to perform within a factor $2n^2$ of the optimal in heavy load.

We then study the implications of vehicle motion constraints in Section VI. We focus on the Dubins vehicle, namely, a nonholonomic vehicle that is constrained to move along paths of bounded curvature without reversing direction. For $m$ Dubins vehicles, the DVR problem with arrival rate $\lambda$ and with uniform spatial distribution has the following properties: the system time is (i) of the order $\lambda^2/m^3$ in heavy load, (ii) of the order $1/\sqrt{m}$ in light load if the vehicle density is small, and (iii) of the order $1/\sqrt[3]{m}$ in light load if the vehicle density is high.

In Section VII we discuss the case when vehicles are heterogeneous, each capable of providing a specific type of service. Each demand may require several different services, implying that collaborative teams of vehicles must be formed to service a demand. We present three simple policies for this problem. For each policy we show that there is a broad class of system parameters for which the policy's performance is within a constant factor of the optimal.

Finally, in Section VIII we summarize other recent results in DVR and draw our conclusions.

II. ALGORITHMIC APPROACHES TO DVR PROBLEMS

In this section we review possible approaches to DVR problems and motivate our proposed algorithmic queueing theory approach.

A. On the Adaptation of Queueing Policies and Static Methods

A naive, yet reasonable, approach to DVR problems would be to adapt classic queueing policies to spatial queueing systems. However, perhaps surprisingly, this adaptation is not at all
straightforward. For example, policies based on a First-Come First-Served discipline, whereby tasks are fulfilled in the order in which they arrive, are unable to stabilize the system for all possible task arrival rates, in the sense that under such policies the average number of tasks grows over time without bound [7, page 608].

A second possibility is to combine static routing methods (e.g., nearest neighbor or VRP-like methods) and sequential re-optimization algorithms. This approach, indeed, will be at the core of most of the policies we present in this work. However, the joint selection of a static routing method and of the re-optimization horizon in the presence of robot and task constraints (e.g., differential motion constraints or task priorities) makes the application of this approach far from trivial. For example, one can show that an erroneous selection of the re-optimization horizon can lead to pathological scenarios where no task receives service [22]. Likewise, direct application of VRP-like methods might lead to infeasible paths for vehicles with differential motion constraints. Finally, performance criteria in dynamic settings commonly differ from those of the corresponding static problems (e.g., in a dynamic setting, the wait for service delivery might be a more important factor than the total vehicle travel cost).

The general conclusion is that DVR problems require ad hoc routing algorithms together with tailored performance analyses. There are currently two main algorithmic approaches that allow both a rigorous synthesis and a performance analysis of routing algorithms for DVR problems; we review these two approaches next.

B. Online Algorithms

An online algorithm is one that operates based on input information given up to the current time. Thus, these algorithms are designed to operate in scenarios where the entire input is not known at the outset, and new pieces of the input should be incorporated as they become available. The distinctive feature of the online algorithm approach is the method that
is used to evaluate the performance of online algorithms, which is called competitive analysis [23]. In competitive analysis, the performance of an online algorithm is compared to the performance of a corresponding offline algorithm (i.e., an algorithm that has a priori knowledge of the entire input) in the worst-case scenario. Specifically, an online algorithm is $c$-competitive if its cost on any problem instance is at most $c$ times the cost of an optimal offline algorithm:

$$\mathrm{Cost}_{\text{online}}(I) \le c \, \mathrm{Cost}_{\text{optimal offline}}(I), \quad \forall \text{ problem instances } I.$$

In the recent past, dynamic vehicle routing problems have been studied in this framework, under the name of the online traveling repairman problem [24], [25], [26].

While the online algorithm approach applied to DVR has led to numerous results and interesting insights, it leaves some questions unanswered, especially in the context of robotic networks. First, competitive analysis is a worst-case analysis; hence, the results are often overly pessimistic for normal problem instances. Moreover, in many applications there is some probabilistic problem structure (e.g., distribution of the inter-arrival times, spatial distribution of future demands, distribution of on-site service times, etc.) that can be advantageously exploited by the vehicles. In online algorithms, this additional information is not taken into account. Second, competitive analysis is used to bound the performance relative to the optimal offline algorithm, and thus it does not give an absolute measure of performance. In other words, an optimal online algorithm is an algorithm with minimum “cost of causality” in the worst-case scenario, but not necessarily with the minimum worst-case cost. Finally, many important real-world constraints for DVR, such as time windows, priorities, differential constraints on vehicle's motion and the requirement of teams to fulfill a demand “have so far proved to be too complex to be considered in the online framework” [27, page 206]. Some of these drawbacks have been recently addressed by [28], where a combined
stochastic and online approach is proposed for a general class of combinatorial optimization problems and is analyzed under some technical assumptions.

This discussion motivates an alternative approach for DVR in the context of robotic networks, based on probabilistic modeling and average-case analysis.

C. Algorithmic Queueing Theory

Algorithmic queueing theory embeds the dynamic vehicle routing problem within the framework of queueing theory and overcomes most of the limitations of the online algorithm approach; in particular, it allows one to take into account several real-world constraints, such as time constraints and differential constraints on vehicles' dynamics. We call this approach algorithmic queueing theory since its objective is to synthesize an efficient control policy, whereas in traditional queueing theory the objective is usually to analyze the performance of a specific policy. Here, an efficient policy is one whose expected performance is either optimal or within a constant factor of the optimal.¹ Algorithmic queueing theory basically consists of the following steps:
(i) queueing model of the robotic system and analysis of its structure;
(ii) establishment of fundamental limitations on performance, independent of algorithms; and
(iii) design of algorithms that are either optimal or a constant factor away from optimal, possibly in specific asymptotic regimes.
Finally, the proposed algorithms are evaluated via numerical, statistical and experimental studies, including Monte Carlo comparisons with alternative approaches.

In order to make the model tractable, customers are usually considered “statistically independent” and their arrival process is assumed stationary (with possibly unknown parameters). Because these assumptions can be unrealistic in some scenarios, this approach has its own limitations. The aim of this paper is to show that algorithmic queueing theory, despite these disadvantages, is a very useful framework for the design of routing algorithms for robotic networks and a valuable complement to the online
algorithm approach.

¹ The expected performance of a policy is the expected value of the performance over all possible inputs (i.e., demand arrival sequences). A policy performs within a constant factor $\kappa$ of the optimal if the ratio between the policy's expected performance and the optimal expected performance is upper bounded by $\kappa$.

III. ALGORITHMIC QUEUEING THEORY FOR DVR

In this section we describe algorithmic queueing theory. We start with a short review of some fundamental concepts from the locational optimization literature, and then we introduce the general approach.

A. Preliminary Tools

The Euclidean Traveling Salesman Problem (in short, TSP) is formulated as follows: given a set $D$ of $n$ points in $\mathbb{R}^d$, find a minimum-length tour (i.e., a closed path that visits all points exactly once) of $D$. More properties of the TSP tour can be found in Section A of the Appendix. In this paper, we will present policies that require real-time solutions of TSPs over possibly large point sets; this can indeed be achieved by using efficient approximation algorithms presented in Section B of the Appendix.

Let $Q \subset \mathbb{R}^2$ be a bounded, convex set (the following concepts can be similarly defined in higher dimensions). Let $P = (p_1, \dots, p_m)$ be an array of $m$ distinct points in $Q$. The Voronoi diagram of $Q$ generated by $P$ is an array of sets, denoted by $\mathcal{V}(P) = (V_1(P), \dots, V_m(P))$, defined by

$$V_i(P) = \{x \in Q \mid \|x - p_i\| \le \|x - p_j\|, \ \forall j \in \{1, \dots, m\}\},$$

where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^2$. We refer to $P$ as the set of generators of $\mathcal{V}(P)$, and to $V_i(P)$ as the Voronoi cell or the region of dominance of the $i$th generator.

The expected distance between a random point $q$, generated according to a probability density function $\varphi: Q \to \mathbb{R}_{\ge 0}$, and the closest point in $P$ is given by

$$H_m(P, Q) := \mathbb{E}\Big[\min_{k \in \{1, \dots, m\}} \|p_k - q\|\Big].$$

The function $H_m$ is known in the locational optimization literature as the continuous Weber function or the continuous multi-median function; see [29], [30] and the references therein. The $m$-median of the set $Q$ with density $\varphi$ is the global minimizer

$$P^*_m(Q) = \arg\min_{P \in Q^m} H_m(P, Q).$$

We let $H^*_m(Q) = H_m(P^*_m(Q), Q)$ be the global minimum of $H_m$. The set of critical points of $H_m$ contains all arrays $(p_1, \dots, p_m)$ with distinct entries and with the property that each point $p_k$ is simultaneously the generator of the Voronoi cell $V_k(P)$ and the median of $V_k(P)$. We refer to such Voronoi diagrams as median Voronoi diagrams. It is possible to show that a median Voronoi diagram always exists for any bounded convex domain $Q$ and density $\varphi$. More properties of the multi-median function are discussed in Section C of the Appendix.

B. Queueing Model for DVR

Here we review the model known in the literature as the $m$-vehicle Dynamic Traveling Repairman Problem ($m$-DTRP), introduced in [7], [8].

Consider $m$ vehicles free to move, at a constant speed $v$, within $\mathbb{R}^2$. The extension to higher-dimensional setups is straightforward unless otherwise noted. On the other hand, constraints on the motion of the vehicles (e.g., obstacles) in general require non-trivial extensions of our approach, and our results do not necessarily hold.

Demands are generated in a bounded and convex set $Q$, called the environment, according to a homogeneous (i.e., time-invariant) spatio-temporal Poisson process, with time intensity $\lambda \in \mathbb{R}_{>0}$ and spatial density $\varphi: Q \to \mathbb{R}_{>0}$. In other words, demands arrive to $Q$ according to a Poisson process with intensity $\lambda$, and their locations $\{X_j;\ j \ge 1\}$ are i.i.d. (i.e., independent and identically distributed) and distributed according to a density $\varphi$ whose support is $Q$. Many results in this paper extend to the case in which $Q$ is not convex, and we refer the interested reader to our full-length papers for more details. A demand's location becomes known (is realized) at its arrival epoch; thus, at time $t$ we know with certainty the locations of demands that arrived prior to time $t$, but future demand locations form an i.i.d. sequence. The density $\varphi$ satisfies

$$\mathbb{P}[X_j \in S] = \int_S \varphi(x)\, dx \quad \forall S \subseteq Q, \qquad \text{and} \qquad \int_Q \varphi(x)\, dx = 1.$$

At each demand location, vehicles spend some time $s \ge 0$ in on-site service that is i.i.d. and generally distributed with finite first and
second moments denoted by $\bar{s} > 0$ and $\overline{s^2}$. A realized demand is removed from the system after one of the vehicles has completed its on-site service. We define the load factor $\varrho := \lambda \bar{s}/m$.

The system time of demand $j$, denoted by $T_j$, is defined as the elapsed time between the arrival of demand $j$ and the time one of the vehicles completes its service. The waiting time of demand $j$, $W_j$, is defined by $W_j = T_j - s_j$. The steady-state system time is defined by $\bar{T} := \limsup_{j \to \infty} \mathbb{E}[T_j]$. A policy for routing the vehicles is said to be stable if the expected number of demands in the system is uniformly bounded at all times. A necessary condition for the existence of a stable policy is that $\varrho < 1$; we shall assume $\varrho < 1$ throughout the paper. When we refer to light-load conditions, we consider the case $\varrho \to 0^+$, in the sense that $\lambda \to 0^+$; when we refer to heavy-load conditions, we consider the case $\varrho \to 1^-$, in the sense that $\lambda \to [m/\bar{s}]^-$.

Let $\mathcal{P}$ be the set of all causal, stable, and time-invariant routing policies and $\bar{T}_\pi$ be the system time of a particular policy $\pi \in \mathcal{P}$. The $m$-DTRP is then defined as the problem of finding a policy $\pi^* \in \mathcal{P}$ (if one exists) such that

$$\bar{T}^* := \bar{T}_{\pi^*} = \inf_{\pi \in \mathcal{P}} \bar{T}_\pi.$$

In general, it is difficult to characterize the optimal achievable performance $\bar{T}^*$ and to compute the optimal policy $\pi^*$ for arbitrary values of the problem parameters $\lambda$, $m$, etc. It is instead possible and useful to consider particular ranges of parameter values and, specifically, asymptotic regimes such as the light-load and the heavy-load regimes. For the purpose of characterizing asymptotic performance, we briefly review some useful notation. For $f, g: \mathbb{N} \to \mathbb{R}$, $f \in O(g)$ (respectively, $f \in \Omega(g)$) if there exist $N_0 \in \mathbb{N}$ and $k \in \mathbb{R}_{>0}$ such that $|f(N)| \le k|g(N)|$ for all $N \ge N_0$ (respectively, $|f(N)| \ge k|g(N)|$ for all $N \ge N_0$). If $f \in O(g)$ and $f \in \Omega(g)$, then the notation $f \in \Theta(g)$ is used.

C. Lower Bounds on the System Time

As in many queueing problems, the analysis of the DTRP for all values of the load factor $\varrho$ in $(0, 1)$ is difficult. In [7], [8], [9], [31], lower bounds for the optimal steady-state system time are derived for
the light-load case (i.e., $\varrho \to 0^+$) and for the heavy-load case (i.e., $\varrho \to 1^-$). Subsequently, policies are designed for these two limiting regimes, and their performance is compared to the lower bounds.

For the light-load case, a tight lower bound on the system time is derived in [8]. In the light-load case, the lower bound on the system time is strongly related to the solution of the $m$-median problem:

$$\bar{T}^* \ge \frac{1}{v} H^*_m(Q) + \bar{s}, \quad \text{as } \varrho \to 0^+. \qquad (1)$$

The bound is tight: there exist policies whose system times, in the limit $\varrho \to 0^+$, attain this bound; we present such asymptotically optimal policies for the light-load case below.

Two lower bounds exist for the heavy-load case [9], [31], depending on whether one is interested in biased or unbiased policies.

Definition III.1 (Spatially biased and unbiased policies). Let $X$ be the location of a randomly chosen demand and $W$ be its wait time. A policy $\pi$ is said to be
(i) spatially unbiased if for every pair of sets $S_1, S_2 \subseteq Q$
$$\mathbb{E}[W \mid X \in S_1] = \mathbb{E}[W \mid X \in S_2];$$
and
(ii) spatially biased if there exist sets $S_1, S_2 \subseteq Q$ such that
$$\mathbb{E}[W \mid X \in S_1] > \mathbb{E}[W \mid X \in S_2].$$

Within the class of spatially unbiased policies in $\mathcal{P}$, the optimal system time is lower bounded by

$$\bar{T}^*_U \ge \frac{\beta_{\mathrm{TSP},2}^2}{2} \, \frac{\lambda \left[\int_Q \varphi^{1/2}(x)\,dx\right]^2}{m^2 v^2 (1 - \varrho)^2} \quad \text{as } \varrho \to 1^-, \qquad (2)$$

where $\beta_{\mathrm{TSP},2} \simeq 0.7120 \pm 0.0002$ (for more detail on the constant $\beta_{\mathrm{TSP},2}$, we refer the reader to Appendix A).

Within the class of spatially biased policies in $\mathcal{P}$, the optimal system time is lower bounded by

$$\bar{T}^*_B \ge \frac{\beta_{\mathrm{TSP},2}^2}{2} \, \frac{\lambda \left[\int_Q \varphi^{2/3}(x)\,dx\right]^3}{m^2 v^2 (1 - \varrho)^2} \quad \text{as } \varrho \to 1^-. \qquad (3)$$

Both bounds (2) and (3) are tight: there exist policies whose system times, in the limit $\varrho \to 1^-$, attain these bounds; therefore the inequalities in (2) and (3) could indeed be replaced by equalities. We present asymptotically optimal policies for the heavy-load case below. It is shown in [9] that the lower bound in equation (3) is always less than or equal to the lower bound in equation (2) for all densities $\varphi$.

We conclude with some remarks. First, it is possible to show (see [9], Proposition 1) that a uniform spatial density function leads to the worst possible performance and
that any deviation from uniformity in the demand distribution will strictly lower the optimal mean system time in both the unbiased and biased case. Additionally, allowing biased service results in a strict reduction of the optimal expected system time for any non-uniform density $\varphi$. Finally, when the density is uniform there is nothing to be gained by providing biased service.

D. Centralized and Ad-Hoc Policies

In this section we present centralized, ad-hoc policies that are either optimal in light-load or optimal in heavy-load. Here, we say that a policy is ad-hoc if it performs “well” only for a limited range of values of $\varrho$. In light-load, the SQM policy provides optimal performance (i.e., $\lim_{\varrho \to 0^+} \bar{T}_{\mathrm{SQM}}/\bar{T}^* = 1$):

The $m$ Stochastic Queue Median (SQM) Policy [8]: Locate one vehicle at each of the $m$ median locations for the environment $Q$. When demands arrive, assign them to the vehicle corresponding to the nearest median location. Have each vehicle service its respective demands in First-Come, First-Served (FCFS) order, returning to its median after each service is completed.

This policy, although optimal in light-load, has two characteristics that limit its application to robotic networks: First, it quickly becomes unstable as the load increases, i.e., there exists $\varrho_c < 1$ such that for all $\varrho > \varrho_c$ the system time $\bar{T}_{\mathrm{SQM}}$ is infinite (hence, this policy is ad-hoc). Second, a central entity needs to compute the $m$-median locations and assign them to the vehicles (hence, from this viewpoint the policy is centralized).

In heavy-load, the UTSP policy provides optimal unbiased performance (i.e., $\lim_{\varrho \to 1^-} \bar{T}_{\mathrm{UTSP}}/\bar{T}^*_U = 1$):

The Unbiased TSP (UTSP) Policy [9]: Let $r$ be a fixed, positive, large integer. From a central point in the interior of $Q$, subdivide the service region into $r$ wedges $Q_1, \dots, Q_r$ such that $\int_{Q_k} \varphi(x)\,dx = 1/r$, $k \in \{1, \dots, r\}$. Within each subregion, form sets of demands of size $n/r$ ($n$ is a design parameter). As sets are formed, deposit them in a queue and service them FCFS with the first available vehicle by forming a TSP on the set and following
it in an arbitrary direction. Optimize over $n$ (see [9] for details).

It is possible to show that, as $\varrho \to 1^-$,

$$\bar{T}_{\mathrm{UTSP}} \le \left(1 + \frac{m}{r}\right) \frac{\beta_{\mathrm{TSP},2}^2}{2} \, \frac{\lambda \left[\int_Q \varphi^{1/2}(x)\,dx\right]^2}{m^2 v^2 (1 - \varrho)^2}; \qquad (4)$$

thus, letting $r \to +\infty$, the lower bound in (2) is achieved.

The same paper [9] presents an optimal biased policy. This policy, called the Biased TSP (BTSP) Policy, relies on an even finer partition of the environment and requires $\varphi$ to be piecewise constant.

Although both the UTSP and the BTSP policies are optimal within their respective classes, they have two characteristics that limit their application to robotic networks: First, in the UTSP policy, to ensure stability, $n$ should be chosen so that (see [9], page 961)

$$n > \frac{\lambda^2 \beta_{\mathrm{TSP},2}^2 \left[\int_Q \varphi^{1/2}(x)\,dx\right]^2}{m^2 v^2 (1 - \varrho)^2};$$

therefore, to ensure stability over a wide range of values of $\varrho$, the system designer is forced to select a large value for $n$. However, if during the execution of the policy the load factor turns out to be only moderate, demands have to wait for an excessively large set to be formed, and the overall system performance deteriorates significantly. Similar considerations hold for the BTSP policy. Hence, these two policies are ad-hoc. Second, both policies require a centralized data structure (the demands' queue is shared by the vehicles); hence, both policies are centralized.

Remark III.2 (System time bounds in heavy-load with zero service time). If $\bar{s} = 0$, then the heavy-load regime is defined as $\lambda/m \to +\infty$, and all the performance bounds we provide in this and in the next two sections hold by simply substituting $\varrho = 0$. For example, equation (2) reads

$$\bar{T}^*_U \ge \frac{\beta_{\mathrm{TSP},2}^2}{2} \, \frac{\lambda \left[\int_Q \varphi^{1/2}(x)\,dx\right]^2}{m^2 v^2} \quad \text{as } \lambda/m \to +\infty.$$

IV. ROUTING FOR ROBOTIC NETWORKS: DECENTRALIZED AND ADAPTIVE POLICIES

In this section we first discuss routing algorithms that are both adaptive and amenable to decentralized implementation; then, we present a decentralized and adaptive routing algorithm that does not require any explicit communication between the vehicles while still being optimal in the light-load case.

A. Decentralized and Adaptive Policies

Here, we say that a policy is adaptive if it
performs “well” for every value of $\varrho$ in the range $[0, 1)$. A candidate decentralized and adaptive control policy is the simple Nearest Neighbor (NN) policy: at each service completion epoch, each vehicle chooses to visit next the closest unserviced demand, if any; otherwise it stops at its current position. Because of the dependencies among the inter-demand travel distances, the analysis of the NN policy is difficult and no rigorous results have been obtained so far [7]; in particular, there are no rigorous results about its stability properties. Simulation experiments show that the NN policy performs like a biased policy and is optimal neither in the light-load case nor in the heavy-load case [7], [9]. Therefore, the NN policy lacks provable performance guarantees (in particular about stability), and does not seem to achieve optimal performance in light-load or in heavy-load.

In [15], we study decentralized and adaptive routing policies that are optimal in light-load and that are optimal unbiased algorithms in heavy-load. The key idea we pursue is that of partitioning policies:

Definition IV.1 (Partitioning policies). Given a policy $\pi$ for the 1-DTRP and $m$ vehicles, a $\pi$-partitioning policy is a family of multi-vehicle policies such that
(i) the environment $Q$ is partitioned into $m$ openly disjoint subregions $Q_k$, $k \in \{1, \dots, m\}$, whose union is $Q$,
(ii) one vehicle is assigned to each subregion (thus, there is a one-to-one correspondence between vehicles and subregions),
(iii) each vehicle executes the single-vehicle policy $\pi$ in order to service demands that fall within its own subregion.

Because Definition IV.1 does not specify how the environment is actually partitioned, it describes a family of policies (one for each partitioning strategy) for the $m$-DTRP. The SQM policy, which is optimal in light-load, is indeed a partitioning policy whereby $Q$ is partitioned according to a median Voronoi diagram and each vehicle executes inside its own Voronoi region the policy “service FCFS and return to the median after each service completion.”
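The SQM policy relies on a median Voronoi diagram, i.e., generators that are simultaneously the Voronoi generators and the medians of their own cells. As an illustration only, the Python sketch below approximates such a configuration from samples of the demand density by alternating two steps: assign each sample to its nearest generator, then move each generator to the geometric median of its cell using Weiszfeld's iteration. This Lloyd-type alternation is a common heuristic assumed here; it is not claimed to be the procedure used in [8] or [15], it only decreases a sample estimate of $H_m$, and it may stop at a critical point rather than at the global $m$-median $P^*_m(Q)$. The function names (`nearest`, `weiszfeld`, `median_voronoi`, `weber_estimate`), the uniform density on the unit square, and the sample sizes are hypothetical choices.

```python
import math
import random


def nearest(q, gens):
    """Index of the generator closest to q, i.e. the Voronoi cell containing q."""
    return min(range(len(gens)), key=lambda k: math.dist(q, gens[k]))


def weiszfeld(points, start, iters=50, eps=1e-12):
    """Approximate the geometric median (point minimizing the sum of distances)."""
    x, y = start
    for _ in range(iters):
        num_x = num_y = den = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < eps:  # simple heuristic: skip a sample that coincides with the iterate
                continue
            num_x += px / d
            num_y += py / d
            den += 1.0 / d
        if den == 0.0:
            break
        x, y = num_x / den, num_y / den
    return (x, y)


def median_voronoi(samples, m, rounds=20, seed=0):
    """Alternate Voronoi assignment and per-cell median updates (Lloyd-type heuristic)."""
    gens = random.Random(seed).sample(samples, m)  # m distinct initial generators
    for _ in range(rounds):
        cells = [[] for _ in range(m)]
        for q in samples:
            cells[nearest(q, gens)].append(q)
        # Move each generator to the median of its own cell; keep it if the cell is empty.
        gens = [weiszfeld(cell, g) if cell else g for cell, g in zip(cells, gens)]
    return gens


def weber_estimate(samples, gens):
    """Monte Carlo estimate of H_m(P, Q): mean distance to the closest generator."""
    return sum(min(math.dist(q, p) for p in gens) for q in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical setup: uniform density on the unit square Q = [0, 1]^2.
    samples = [(rng.random(), rng.random()) for _ in range(1000)]
    for m in (1, 4):
        gens = median_voronoi(samples, m)
        print(m, [tuple(round(c, 3) for c in g) for g in gens],
              round(weber_estimate(samples, gens), 4))
```

Under these assumptions, vehicle $k$ would serve FCFS the demands $x$ with `nearest(x, gens) == k`, returning to `gens[k]` between services; the sketch covers only this light-load-oriented partition and does not enforce the equitability conditions discussed next. For $m = 1$ the estimated median should lie near the center $(0.5, 0.5)$ of the unit square, where the exact value of $H^*_1$ is $(\sqrt{2} + \ln(1 + \sqrt{2}))/6 \approx 0.383$.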
Moreover, specific partitioning policies, which will be characterized in Theorem IV.2, are optimal or within a constant factor of the optimal in heavy-load. In the following, given two functions $\varphi_j : Q \to \mathbb{R}_{>0}$, $j \in \{1,2\}$, with $\int_Q \varphi_j(x)\,dx = c_j$, an $m$-partition (i.e., a partition into $m$ subregions) is simultaneously equitable with respect to $\varphi_1$ and $\varphi_2$ if $\int_{Q_i} \varphi_j(x)\,dx = c_j/m$ for all $i \in \{1,\dots,m\}$ and $j \in \{1,2\}$. Theorem 12 in [32] shows that, given two such functions $\varphi_j$, $j \in \{1,2\}$, there always exists an $m$-partition that is simultaneously equitable with respect to $\varphi_1$ and $\varphi_2$ and whose subregions $Q_i$ are convex. The following results then characterize the optimality of two classes of partitioning policies [15].

Theorem IV.2 (Optimality of partitioning policies). Assume $\pi$ is a single-vehicle, unbiased optimal policy in the heavy-load regime (i.e., $\varrho \to 1^-$). For $m$ vehicles,
(i) a $\pi$-partitioning policy based on an $m$-partition which is simultaneously equitable with respect to $\varphi$ and $\varphi^{1/2}$ is an optimal unbiased policy in heavy-load;
(ii) a $\pi$-partitioning policy based on an $m$-partition which is equitable with respect to $\varphi$ does not, in general, achieve the optimal unbiased performance; however, it is always within a factor $m$ of it in heavy-load.

The above results lead to the following strategy. First, for the 1-DTRP, one designs an adaptive and unbiased (in heavy-load) control policy with provable performance guarantees. Then, by using decentralized algorithms for environment partitioning, such as those recently developed in [33], one extends such a single-vehicle policy to a decentralized and adaptive multi-vehicle policy.

Consider, first, the single-vehicle case.

The single-vehicle Divide & Conquer (DC) Policy: Compute an $r$-partition $\{Q_k\}_{k=1}^{r}$ of $Q$ that is simultaneously equitable with respect to $\varphi$ and $\varphi^{1/2}$. Let $\tilde{P}_1$ be the point minimizing the sum of distances to demands serviced in the past (if no points have been visited in the past, $\tilde{P}_1$ is set to be a random point in $Q$), and let $D$ be the set of outstanding demands waiting for service. If $D = \emptyset$, move to $\tilde{P}_1$. If, instead, $D \neq \emptyset$, randomly choose a $k \in \{1,\dots,r\}$ and move to subregion $Q_k$; compute the TSP tour through all demands in subregion $Q_k$ and service all demands in $Q_k$ by following this TSP tour. If $D \neq \emptyset$, repeat the service process in subregion $k+1$ (modulo $r$).

This policy is unbiased in heavy-load. In particular, if $r \to +\infty$, the policy (i) is optimal in light-load and achieves optimal unbiased performance in heavy-load, and (ii) is stable in every load condition. It is possible to show that with $r = 10$ the DC policy is already guaranteed to be within 10% of the optimal (for unbiased policies) performance in heavy-load. If, instead, $r = 1$, the policy (i) is optimal in light-load and within a factor 2 of the optimal unbiased performance in heavy-load, (ii) is stable in every load condition, and (iii) its implementation does not require knowledge of $\varphi$. This last property implies that, remarkably, when $r = 1$, the DC policy adapts to all problem data (both $\varrho$ and $\varphi$). It is worth noting that when $r = 1$ and $\varphi$ is constant over $Q$, the DC policy is similar to the generation policy presented in [34].

The optimality of the SQM policy and Theorem IV.2(i) suggest the following decentralized and adaptive multi-vehicle version of the DC policy:
(i) compute an $m$-median of $Q$ that induces a Voronoi partition that is equitable with respect to $\varphi$ and $\varphi^{1/2}$;
(ii) assign one vehicle to each Voronoi region;
(iii) each vehicle executes the single-vehicle DC policy in order to service demands that fall within its own subregion, using the median of the subregion instead of $\tilde{P}_1$.

For a given $Q$ and $\varphi$, if there exists an $m$-median of $Q$ that induces a Voronoi partition that is equitable with respect to $\varphi$ and $\varphi^{1/2}$, then the above policy is optimal in light-load and arbitrarily close to optimal in heavy-load, and it stabilizes the system in every load condition. There are two main issues with the above policy, namely (i) the existence of an $m$-median of $Q$ that induces a Voronoi partition that is equitable with respect to $\varphi$ and $\varphi^{1/2}$, and (ii) how to compute it. In [33], we showed that for
some choices of $Q$ and $\varphi$ a median Voronoi diagram that is equitable with respect to $\varphi$ and $\varphi^{1/2}$ fails to exist. Additionally, in [33], we presented a decentralized partitioning algorithm that, for any possible choice of $Q$ and $\varphi$, provides a partition that is equitable with respect to $\varphi$ and represents a “good” approximation of a median Voronoi diagram (see [33] for details on the metrics that we use to judge “closeness” to median Voronoi diagrams). Moreover, if an $m$-median of $Q$ that induces a Voronoi partition that is equitable with respect to $\varphi$ exists, the algorithm will locally converge to it. This partitioning algorithm is related to the classic Lloyd algorithm from vector quantization theory, and exploits the unique features of power diagrams, a generalization of Voronoi diagrams. Accordingly, we define the multi-vehicle Divide & Conquer policy as follows.

The multi-vehicle Divide & Conquer (m-DC) Policy: The vehicles run the decentralized partitioning algorithm discussed above (see [33] for more details) and assign themselves to the subregions (this assignment is a by-product of the algorithm in [33]). Simultaneously, each vehicle executes the single-vehicle DC policy inside its own subregion.

The m-DC policy is within a factor $m$ of the optimal unbiased performance in heavy-load (since the algorithm in [33] always provides a partition that is equitable with respect to $\varphi$), and it stabilizes the system in every load condition. In general, the m-DC policy is suboptimal in light-load; note, however, that computing the global minimum of the Weber function $H_m$ (which is non-convex for $m > 1$) is difficult for $m > 1$ (it is NP-hard for the discrete version of the problem); therefore, for $m > 1$, suboptimality is also to be expected from any practical implementation of the SQM policy. If an $m$-median of $Q$ that induces a Voronoi partition that is equitable with respect to $\varphi$ exists, the m-DC policy will locally converge to it; thus we say that the m-DC policy is “locally” optimal in light-load. Note that, when the density is uniform, a partition
that is equitable with respect to $\varphi$ is also equitable with respect to $\varphi^{1/2}$; therefore, when the density is uniform, the m-DC policy is arbitrarily close to optimal in heavy-load (see Theorem IV.2(i)). The m-DC policy adapts to the arrival rate $\lambda$, the expected on-site service time $\bar{s}$, and the vehicles' velocity $v$; however, it requires knowledge of $\varphi$.

Tables I and II provide a synoptic view of the results available so far; in particular, our policies are compared with the best unbiased policy available in the literature, i.e., the UTSP policy with $r \to +\infty$. In Table I, an asterisk * signals that the result is heuristic. Note that there are currently no results about decentralized and adaptive routing policies that are optimal in light-load and that are optimal biased algorithms in heavy-load.

B. A Policy with No Explicit Inter-vehicle Communication

A common theme in cooperative control is the investigation of the effects of different communication and information-sharing protocols on the system performance. Clearly, the ability to access more information at each single vehicle cannot decrease the performance level; hence, it is commonly believed that providing better communication among vehicles will improve the system's performance. In [16], we propose a policy for the DVR that does not rely on dedicated communication links between vehicles, but only on the vehicles' knowledge of outstanding demands. An example is when outstanding demands broadcast their location, but vehicles are not aware of one another. We show that, under light-load conditions, the inability of vehicles to communicate explicitly does not limit the steady-state performance. In other words, the information contained in the outstanding demands (and hence the effects of others on them) is sufficient to provide, in light-load conditions, the same convergence properties attained when vehicles are able to communicate explicitly.

The No (Explicit) Communication (NC) Policy: Let $D$ be the set of outstanding demands waiting for service. If $D = \emptyset$, move to the point minimizing the average
distance to demands serviced in the past by each vehicle. If there is no unique minimizer, then move to the nearest one. If, instead, $D \neq \emptyset$, move towards the nearest outstanding demand location.

In the NC policy, whenever one or more service requests are outstanding, all vehicles will be pursuing a demand; in particular, when only one service request is outstanding, all vehicles will move towards it. When the demand queue is empty, vehicles will either (i) stop at the current location, if they have visited no demands yet, or (ii) move to their reference point, as determined by the set of demands previously visited.

TABLE I
POLICIES FOR THE 1-DTRP

| Properties | DC Policy, $r \to +\infty$ | DC Policy, $r = 1$ | RH Policy [15] | UTSP Policy, $r \to +\infty$ |
|---|---|---|---|---|
| Light-load performance | optimal | optimal | optimal | not optimal |
| Heavy-load performance | optimal | within 100% of the optimal | within 100% of the optimal* | optimal |
| Adaptive to $\lambda$, $\bar{s}$, and $v$ | yes | yes | yes | no |
| Adaptive to $\varphi$ | no | yes | yes | no |

TABLE II
POLICIES FOR THE m-DTRP

| Properties | m-DC Policy, $r \to +\infty$ | UTSP Policy, $r \to +\infty$ |
|---|---|---|
| Light-load performance | “locally” optimal | not optimal |
| Heavy-load performance | optimal for uniform $\varphi$; within $m$ of optimal unbiased in general | optimal |
| Adaptive to $\lambda$, $\bar{s}$, and $v$ | yes | no |
| Adaptive to $\varphi$ | no | no |
| Distributed | yes | no |

In [16], we prove that the system time provided by the NC policy converges to a critical point (either a saddle point or a local minimum) of $H_m(Q)$ with high probability as $\lambda \to 0^+$. Let us underline that, in general, the critical point achieved strictly depends on the initial positions of the vehicles. We cannot exclude that the algorithm so designed will converge to a saddle point instead of a local minimum. This is because the algorithm does not follow the full gradient of the function $H_m$, but only its gradient with respect to one of the variables. On the other hand, since the algorithm is based on a sequence of demands and at each phase we are trying to minimize a different cost function, it can be proved that the critical points reached by this algorithm are no worse than the critical points reached
knowing a priori the distribution $\varphi$. In [16], we also report results from illustrative numerical experiments that compare the performance of the NC policy with a sensor-based policy according to which a demand is considered only by the vehicle whose reference point is closest to the demand location at the time of its generation. We observe that, as $\lambda$ increases, the performance of the NC policy degrades significantly, almost approaching the performance of a single-vehicle system over an intermediate range of values of $\lambda$. Interestingly, this efficiency loss seems to decrease for large $\lambda$, and the numerical results suggest that the NC policy recovers a performance similar to that of the sensor-based policy in the heavy-load limit.

The NC policy can also be regarded as a learning algorithm in the context of the following game [16]. The service requests are considered as resources and the vehicles as selfish entities. The resources offer rewards in a continuous fashion, and the vehicles can collect these rewards by traveling to the resource locations. Every resource offers reward at a unit rate when there is at most one vehicle present at its location, and the life of the resource ends as soon as more than one vehicle is present at its location. This setup can be understood as an extreme form of congestion game, where the resource cannot be shared between vehicles and where the resource expires at the first attempt to share it. The total reward for vehicle $i$ from a particular resource is the time difference between its arrival and the arrival of the next vehicle, if $i$ is the first vehicle to reach the location of the resource, and zero otherwise. The utility function of vehicle $i$ is then defined to be the expected value of reward, where the expectation is taken over the location of the next resource. Hence, the goal of every vehicle is to select its reference location to maximize the expected value of the reward from the next resource. In [16], we prove that the median locations, as a choice for reference positions, are an
efficient pure Nash equilibrium for this game. Moreover, we prove that by maximizing their own utility functions, the vehicles also maximize the common global utility function, which is the negative of the average wait time for service requests.

V. ROUTING FOR ROBOTIC NETWORKS: TIME CONSTRAINTS AND PRIORITIES

In many vehicle routing applications, there are strict service requirements for demands. These can be modeled in two ways. In the first case, demands have (possibly stochastic) deadlines on their waiting times. In the second case, demands have different urgency or “threat” levels, which capture the relative importance of each demand. In this section we study these two related problems and provide routing policies for both scenarios. We discuss hard time constraints in Section V-A and priorities in Section V-B.

In this section we focus only on the case of a uniform spatial density $\varphi$. However, the algorithms we present below extend directly to non-uniform densities: one simply replaces the equal-area partitions with simultaneously equitable (with respect to $\varphi$ and $\varphi^{1/2}$) partitions, as described for the DC policy in Section IV. The presentation, on the other hand, would become more involved, and thus we restrict our attention to uniform densities.

A. Time Constraints

In [11], [12] we introduced and analyzed DVR with time constraints. Specifically, the setup is the same as that of the m-DTRP, but now each demand $j$ waits for the beginning of its service no longer than a stochastic patience time $G_j$, which is generally distributed according to a distribution function $F_G$. A vehicle can start the on-site service for the $j$th demand only within the stochastic time window $[A_j, A_j + G_j)$, where $A_j$ is the arrival time of the $j$th demand. If the on-site service for the $j$th demand is not started before the time instant $A_j + G_j$, then the $j$th demand is considered lost; in other words, such a demand leaves the system and never returns. If, instead, the on-site service for the $j$th demand is started before the time instant $A_j + G_j$, then the demand is considered
successfully serviced. The waiting time of demand $j$, denoted again by $W_j$, is the elapsed time between the arrival of demand $j$ and the time either one of the vehicles starts its service or such demand departs from the system due to impatience, whichever happens first. Hence, the $j$th demand is considered serviced if and only if $W_j < G_j$. Accordingly, we denote by $\mathbb{P}[W_j^{\pi} < G_j]$ the probability that the $j$th demand is serviced under a routing policy $\pi$. The aim is to find the minimum number of vehicles needed to ensure that the steady-state probability that a demand is successfully serviced is at least a desired value $\phi_d \in (0,1)$, and to determine the policy the vehicles should execute to ensure that such an objective is attained. Formally, define the success factor of a policy $\pi$ as $\phi_\pi := \lim_{j\to+\infty}\mathbb{P}[W_j^{\pi} < G_j]$.

We identify four types of information on which a control policy can rely:
1) Arrival time and location: we assume that the information on arrivals and locations of demands is immediately available to control policies;
2) On-site service: the on-site service requirement of demands may either (i) be available, (ii) be available only through prior statistics, or (iii) not be available to control policies;
3) Patience time: the patience time of demands may either (i) be available, or (ii) be available only through prior statistics;
4) Departure notification: the information that a demand leaves the system due to impatience may or may not be available to control policies (if the patience time is available, such information is clearly available).

Hence, several information structures are relevant. The least informative case is when on-site service requirements and departure notifications are not available, and patience times are available only through prior statistics; the most informative case is when on-site service requirements and patience times are available.

Given an information structure, we then study the following optimization problem OPT:
$$\text{OPT}: \quad \min_{\pi} |\pi| \quad \text{subject to} \quad \lim_{j\to+\infty}\mathbb{P}[W_j^{\pi} < G_j] \ge \phi_d,$$
where $|\pi|$ is the number of vehicles used by $\pi$ (the
existence of the limit $\lim_{j\to+\infty}\mathbb{P}[W_j^{\pi} < G_j]$ and equivalent formulations in terms of time averages are discussed in [12], [22]). Let $m^*$ denote the optimal cost for the problem OPT (for a given information structure).

In principle, one should study the problem OPT for each of the possible information structures. In [12], instead, we considered the following strategy: first, we derived a lower bound that is valid under the most informative information structure (this implies validity under any information structure); then we presented and analyzed two service policies that are amenable to implementation under the least informative information structure (this implies implementability under any information structure). Such an approach gives general insight into the problem OPT.

1) Lower Bound: We next present a lower bound for the optimization problem OPT that holds under any information structure. Let $P = (p_1,\dots,p_m)$ and define
$$L_m(P,Q) := 1 - \frac{1}{|Q|}\int_Q F_G\!\left(\min_{k\in\{1,\dots,m\}}\frac{\|x - p_k\|}{v}\right)dx.$$

Theorem V.1 (Lower bound on OPT). Under any information structure, the optimal cost for the minimization problem OPT is lower bounded by the optimal cost for the minimization problem
$$\min_{m\in\mathbb{N}_{>0}} m \quad \text{subject to} \quad \sup_{P\in Q^m} L_m(P,Q) \ge \phi_d. \qquad (5)$$

The proof of this lower bound relies on nearest-neighbor arguments. Algorithms to find the solution to the minimization problem in equation (5) have been presented in [12].

2) The Nearest-Depot Assignment (NDA) Policy: We next present the Nearest-Depot Assignment (NDA) policy, which requires the least amount of information and is optimal in light-load.

The Nearest-Depot Assignment (NDA) Policy: Let $\tilde{P}_m(Q) := \arg\max_{P\in Q^m} L_m(P,Q)$ (if there are multiple maxima, pick one arbitrarily), and let $\tilde{p}_k$ be the location of the depot for the $k$th vehicle, $k \in \{1,\dots,m\}$. Assign a newly arrived demand to the vehicle whose depot is the nearest to that demand's location, and let $D_k$ be the set of outstanding demands assigned to vehicle $k$. If the set $D_k$ is empty, move to $\tilde{p}_k$; otherwise, visit demands in $D_k$ in first-come, first-served order, by taking
the shortest path to each demand location. Repeat.

In [12] we prove that the NDA policy is optimal in light-load under any information structure. Note that the NDA policy is very similar to the SQM policy described in Section III-D; the only difference is that the depot locations are now the maximizers of $L_m$ instead of the minimizers of $H_m$.

3) The Batch (B) Policy: Finally, we present the Batch (B) policy, which is well-defined for any information structure; however, it is particularly tailored for the least informative case and is most effective in moderate and heavy loads.

The Batch (B) Policy: Partition $Q$ into $m$ equal-area regions $Q_k$, $k \in \{1,\dots,m\}$, and assign one vehicle to each region. Assign a newly arrived demand that falls in $Q_k$ to the vehicle responsible for region $k$, and let $D_k$ be the set of locations of outstanding demands assigned to vehicle $k$. For each vehicle-region pair $k$: if the set $D_k$ is empty, move to the median (the “depot”) of $Q_k$; otherwise, compute a TSP tour through all demands in $D_k$ and the vehicle's current position, and service demands by following the TSP tour, skipping demands that are no longer outstanding. Repeat.

Note that this policy is basically a simplified version of the m-DC policy (with $r = 1$). The following theorem characterizes the batch policy under the assumption of zero on-site service and the least informative information structure.

Theorem V.2 (Vehicles required by batch policy). Assuming zero on-site service, the batch policy guarantees a success factor at least as large as $\phi_d$ if the number of vehicles is equal to or larger than
$$\min\left\{ m \;\middle|\; \sup_{\theta\in\mathbb{R}_{>0}} \big(1-F_G(\theta)\big)\left(1-\frac{2g(m)}{\theta}\right) \ge \phi_d \right\},$$
where
$$g(m) := \frac{1}{2}\left(\frac{\beta^2}{v^2}\frac{\lambda|Q|}{m^2} + \sqrt{\frac{\beta^4}{v^4}\frac{\lambda^2|Q|^2}{m^4} + \frac{8\beta^2}{v^2}\frac{|Q|}{m}}\right),$$
and where $\beta$ is a constant that depends on the shape of the service regions.

Furthermore, in [11] we show that when (i) the system is in heavy-load, (ii) $\phi_d$ tends to one, and (iii) the deadlines are deterministic, the batch policy requires a number of vehicles that is within a factor 3.78 of the optimal.

B. Priorities

In this section we look at a DVR problem in
which demands for service have different levels of importance. The service vehicles must then prioritize, providing a quality of service which is proportional to each demand's importance. We introduced this problem in [13]. Formally, we assume an environment $Q \subset \mathbb{R}^2$, with area $|Q|$, and $m$ vehicles traveling in $\mathbb{R}^2$, each with maximum speed $v$. Demands of type $\alpha \in \{1,\dots,n\}$, called $\alpha$-demands, arrive in the environment according to a Poisson process with rate $\lambda_\alpha$. Upon arrival, demands assume an independently and uniformly distributed location in $Q$. An $\alpha$-demand requires on-site service with finite mean $\bar{s}_\alpha$. For this problem the load factor can be written as
$$\varrho := \frac{1}{m}\sum_{\alpha=1}^{n}\lambda_\alpha\bar{s}_\alpha. \qquad (6)$$
The condition $\varrho < 1$ is necessary for the existence of a stable policy. For a stable policy $\pi$, the average system time per demand is
$$\overline{T}_\pi = \frac{1}{\lambda}\sum_{\alpha=1}^{n}\lambda_\alpha\overline{T}_{\pi,\alpha},$$
where $\lambda := \sum_{\alpha=1}^{n}\lambda_\alpha$, and $\overline{T}_{\pi,\alpha}$ is the expected system time of $\alpha$-demands (under routing policy $\pi$). The average system time per demand is the standard cost functional for queueing systems with multiple classes of demands. Notice that we can write $\overline{T}_\pi = \sum_{\alpha=1}^{n} c_\alpha\overline{T}_{\pi,\alpha}$ with $c_\alpha = \lambda_\alpha/\lambda$. Thus, if we aim to assign distinct importance levels, we can model priority among classes by allowing any convex combination of $\overline{T}_{\pi,1},\dots,\overline{T}_{\pi,n}$. If $c_\alpha > \lambda_\alpha/\lambda$, then the system time of $\alpha$-demands is being weighted more heavily than in the average case. In other words, the quantity $c_\alpha\lambda/\lambda_\alpha$ gives the priority of $\alpha$-demands compared to that given in the average system time case. Without loss of generality we can assume that priority classes are labeled so that
$$\frac{c_1}{\lambda_1} \ge \frac{c_2}{\lambda_2} \ge \cdots \ge \frac{c_n}{\lambda_n}, \qquad (7)$$
implying that if $\alpha < \alpha'$ for some $\alpha, \alpha' \in \{1,\dots,n\}$, then the priority of $\alpha$-demands is at least as high as that of $\alpha'$-demands.

The problem is as follows. Consider a set of coefficients $c_\alpha > 0$, $\alpha \in \{1,\dots,n\}$, with $\sum_{\alpha=1}^{n}c_\alpha = 1$ and satisfying expression (7). Determine the policy $\pi$ (if it exists) which minimizes the cost
$$\overline{T}_{\pi,c} := \sum_{\alpha=1}^{n} c_\alpha\overline{T}_{\pi,\alpha}.$$
In the light-load case, where $\varrho \to 0^+$, we can use existing policies to solve the problem. This is summarized in the following remark.

Remark V.3 (Light-load regime). In
light-load, it can be verified that the Stochastic Queue Median policy (see Section III-D) provides optimal performance. That is, the vehicles can simply ignore the priorities and service the demands in FCFS order, returning to their median locations between services.

1) Lower Bound in Heavy-Load: In this section we present a lower bound on the weighted system time $\overline{T}_{\pi,c}$ for every policy $\pi$.

Theorem V.4 (Heavy-load lower bound). The system time of any policy $\pi$ is lower bounded by
$$\overline{T}_{\pi,c} \ge \frac{\beta_{\mathrm{TSP},2}^2|Q|}{2m^2v^2(1-\varrho)^2}\sum_{\alpha=1}^{n}\left(c_\alpha + 2\sum_{j=\alpha+1}^{n}c_j\right)\lambda_\alpha \qquad (8)$$
as $\varrho \to 1^-$, where $c_1,\dots,c_n$ satisfy expression (7).

Remark V.5 (Lower bound for all $\varrho \in [0,1)$). Lower bound (8) holds only in heavy-load. We can also obtain a lower bound that is valid for all values of $\varrho$; however, in the heavy-load limit it is less tight than bound (8). Under the labeling in expression (7), this general bound for any policy $\pi$ is
$$\overline{T}_{\pi,c} \ge \frac{\gamma^2|Q|}{m^2v^2(1-\varrho)^2}\sum_{\alpha=1}^{n}\left(c_\alpha + 2\sum_{j=\alpha+1}^{n}c_j\right)\lambda_\alpha - \frac{mc_1}{2\lambda_1} + \sum_{\alpha=1}^{n}c_\alpha\bar{s}_\alpha, \qquad (9)$$
where $\varrho \in [0,1)$ and $\gamma = 2/(3\sqrt{2\pi}) \approx 0.266$.

2) The Separate Queues Policy: In this section we present the Separate Queues (SQ) policy. This policy utilizes a probability distribution $p = [p_1,\dots,p_n]$, where $p_\alpha > 0$ for each $\alpha \in \{1,\dots,n\}$, defined over the priority classes. The distribution $p$ is a set of parameters to be used to optimize performance.

Separate Queues (SQ) Policy: Partition $Q$ into $m$ equal-area regions and assign one vehicle to each region. For each vehicle, if its region contains no demands, then move to the median location of the region until a demand arrives. Otherwise, select a class $\alpha$ according to the distribution $p$. Compute a TSP tour through all demands of the selected class in the region and service all of these demands by following the TSP tour. When the tour is completed, repeat by selecting a new class.

Fig. 2. A representative simulation of the SQ policy for one vehicle and two priority classes. Circle-shaped demands are high priority, and diamond-shaped demands are low priority. The vehicle is marked by a chevron-shaped object, and the TSP tour is shown as a solid line. The left figure shows the vehicle computing a tour through class 2 demands. The right figure shows the vehicle after completing the class 2 tour and computing a new tour through all outstanding class 1 demands.

Figure 2 shows an illustrative example of the SQ policy. In the first frame the vehicle is servicing only class 2 (diamond-shaped) demands, whereas in the second frame the vehicle is servicing class 1 (circle-shaped) demands.

3) Performance of the SQ Policy: By upper bounding the expected steady-state number of demands in each class, we are able to obtain the following expression for the system time of the SQ policy in heavy-load:
$$\overline{T}_{\mathrm{SQ},c} \le \frac{\beta_{\mathrm{TSP},2}^2|Q|}{m^2v^2(1-\varrho)^2}\left(\sum_{\alpha=1}^{n}\frac{c_\alpha}{p_\alpha}\right)\left(\sum_{i=1}^{n}\sqrt{p_i\lambda_i}\right)^2. \qquad (10)$$
Thus, we can minimize this upper bound by appropriately selecting the probability distribution $p = [p_1,\dots,p_n]$. With the selection $p_\alpha := c_\alpha$ for each $\alpha \in \{1,\dots,n\}$, we obtain the following result.

Theorem V.6 (SQ policy performance). As $\varrho \to 1^-$, the system time of the SQ policy is within a factor $2n^2$ of the optimal system time. This factor is independent of the arrival rates $\lambda_1,\dots,\lambda_n$, coefficients $c_1,\dots,c_n$, service times $\bar{s}_1,\dots,\bar{s}_n$, and the number of vehicles $m$.

In [13], numerical experiments are used to verify the tightness of the upper bound (10) and to compare methods for optimizing the distribution $p$.

4) Heuristic Improvements: We now present two heuristic improvements on the SQ policy. The first improvement, called the queue merging heuristic, is guaranteed never to increase the upper bound on the expected system time, and in certain instances it significantly decreases the upper bound. To motivate the modification, consider the case when all classes have equal priority (i.e., $c_1/\lambda_1 = \cdots = c_n/\lambda_n$), and we use the probability assignment $p_\alpha = c_\alpha$ for each class $\alpha$. Then, the upper bound for the Separate Queues policy is $n$ times larger than if we (i) ignore priorities, (ii) merge the $n$ classes into a single class, and (iii) run the SQ policy on the merged class (i.e., at each iteration, service all outstanding demands in $Q$ via the TSP tour).

Motivated by this discussion, we define a merge configuration to be a partition of the $n$ classes $\{1,\dots,n\}$ into $\ell$ sets $C_1,\dots,C_\ell$, where $\ell \in \{1,\dots,n\}$. The idea is to run the Separate Queues policy on the $\ell$ classes, where class $i \in \{1,\dots,\ell\}$ has arrival rate $\sum_{\alpha\in C_i}\lambda_\alpha$ and convex combination coefficient $\sum_{\alpha\in C_i}c_\alpha$. Given a merge configuration $\{C_1,\dots,C_\ell\}$, and using the probability assignment $p_i = \sum_{\alpha\in C_i}c_\alpha$ for each class $i \in \{1,\dots,\ell\}$, the analysis leading to (10) can easily be modified to yield an upper bound of
$$\frac{\beta_{\mathrm{TSP},2}^2|Q|\,\ell}{m^2v^2(1-\varrho)^2}\left(\sum_{i=1}^{\ell}\sqrt{\sum_{\alpha\in C_i}c_\alpha\sum_{\alpha\in C_i}\lambda_\alpha}\right)^2. \qquad (11)$$
The SQ policy with merging can be summarized as follows:

Separate Queues (SQ) with Merging Policy: Find the merge configuration $\{C_1,\dots,C_\ell\}$
which minimizes equation (11). Run the Separate Queues policy on the $\ell$ classes, where class $i$ has arrival rate $\sum_{\alpha\in C_i}\lambda_\alpha$ and convex combination coefficient $\sum_{\alpha\in C_i}c_\alpha$.

To minimize equation (11) in the SQ with Merging policy, one must search over all possible partitions of a set of $n$ elements. The number of partitions is given by the Bell number, and thus the search becomes infeasible for more than approximately 10 classes. However, one can also limit the search space in order to increase the number of classes that can be considered, as in [13].

The second heuristic improvement for the SQ policy, which can be used in implementation, is called the tube heuristic. The heuristic improvement is as follows:

The Tube Heuristic: When following a tour, service all newly arrived demands that lie within distance $\delta > 0$ of the tour.

The idea behind the heuristic is to utilize the fact that some newly arrived demands will be “close” to the demands in the current service batch, and thus can be serviced with minimal additional travel cost. Analysis of the tube heuristic is complicated by the fact that it introduces correlation between demand locations.

The parameter $\delta$ should be chosen such that the total tour length is not increased by more than, say, 10%. A rough calculation shows that $\delta$ should scale as
$$\delta \sim \epsilon\sqrt{\frac{|Q|}{\text{total expected number of demands}}},$$
where $\epsilon$ is the fractional increase in tour length (e.g., 10%). Numerical simulations presented in [13] show that this heuristic, with an appropriately chosen value of $\delta$, improves the SQ performance by a factor of approximately 2. In a more sophisticated implementation we define a $\delta_\alpha$ for each $\alpha \in \{1,\dots,n\}$, where the magnitude of $\delta_\alpha$ is proportional to the probability $p_\alpha$.

VI. ROUTING FOR ROBOTIC NETWORKS: CONSTRAINTS ON VEHICLE MOTION

In this section, we consider the m-DTRP described in the earlier sections with the addition of differential constraints on the vehicle's motion [18]. In particular, we concentrate on vehicles that are constrained to move on the plane at constant speed $v > 0$ along paths with a minimum
radius of curvature $\rho > 0$. Such vehicles, often referred to as Dubins vehicles, have been extensively studied in the robotics and control literature [35], [36], [37]. Moreover, the Dubins vehicle model is widely accepted as a reasonably accurate model to represent aircraft kinematics, e.g., for air traffic control [38], [39] and UAV mission planning purposes [40], [4], [41]. Accordingly, the DVR problem studied in this section will be referred to as the m-Dubins DTRP. In this section we focus only on the case of a uniform spatial density $\varphi$.

A feasible path for the Dubins vehicle (called a Dubins path) is defined as a path that is twice differentiable almost everywhere and whose radius of curvature is bounded below by $\rho$. Since a Dubins vehicle cannot stop, we only consider zero on-site service time. Hence, the generic load factor $\varrho = \lambda\bar{s}/m$, as defined in Subsection III-B, becomes inappropriate for this setup and, in accordance with Remark III.2, the heavy-load regime is defined as $\lambda/m \to +\infty$. We correspondingly define the light-load regime as $\lambda/m \to 0^+$.

A. Lower Bounds

In this section we provide lower bounds on the system time for the m-Dubins DTRP.

Theorem VI.1 (System time lower bounds). The optimal system time for the m-Dubins DTRP satisfies the following lower bounds:
$$\overline{T}^* \ge \frac{H_m^*(Q)}{v}, \qquad (12)$$
$$\liminf_{d\to+\infty} \overline{T}^*\left(\frac{m}{\rho|Q|}\right)^{1/3} \ge \frac{3\sqrt[3]{3}}{4v}, \qquad (13)$$
$$\liminf_{\lambda/m\to+\infty} \overline{T}^*\,\frac{m^3}{\lambda^2} \ge \frac{81}{64}\,\frac{\rho|Q|}{v^3}, \qquad (14)$$
where $|Q|$ is the area of $Q$ and $d := \rho^2 m/|Q|$ is the nonholonomic vehicle density.

The lower bound (12) follows from equation (1); however, this bound is obtained by approximating the Dubins distance (i.e., the length of the shortest feasible path for a Dubins vehicle) with the Euclidean distance. The lower bound (13) is obtained by explicitly taking into account the Dubins turning cost. Although the first two lower bounds of Theorem VI.1 are valid for any $\lambda$, they are particularly useful in the light-load regime. The lower bound (14) is valid and useful in the heavy-load regime.

Fig. 3. Illustration of the Median Circling policy. The squares represent $P^*_m(Q)$, the $m$-median of $Q$. Each vehicle
loiters about its respective generator at a radius $\rho$. The regions of dominance are the Voronoi partition generated by $P^*_m(Q)$. In this figure, a demand has appeared in the subregion roughly in the upper-right quarter of the domain. The vehicle responsible for this subregion has left its loitering orbit and is en route to service the demand.

B. Routing Policies for the m-Dubins DTRP

We start by considering two policies that are particularly efficient in light load. The first light-load policy, called the Median Circling policy, imitates the optimal policy for Euclidean vehicles, assigning static regions of responsibility. As usual, let $P^*_m(Q)$ be the $m$-median of $Q$. The policy is formally described as follows.

The Median Circling (MC) Policy: Let the loitering orbits for the vehicles be circular trajectories of radius $\rho$ centered at the entries of $P^*_m(Q)$, with each vehicle allotted one trajectory. Each vehicle visits the demands in the Voronoi region $V_i(P^*_m(Q))$ in the order in which they arrive. When no demands are available, the vehicle returns to its loitering orbit; the direction in which the orbit is followed is not important, and can be chosen in such a way that the orbit is reached in minimum time.

An illustration of the MC policy is shown in Figure 3.

We next introduce a second light-load policy, namely the Strip Loitering policy, which is more efficient than the MC policy when the nonholonomic vehicle density is large and relies on dynamic regions of responsibility for the vehicles. An illustration of the Strip Loitering policy is shown in Figure 4.

The Strip Loitering (SL) Policy: Bound the environment $Q$ with a rectangle of minimum height, where height denotes the smaller of the two side lengths of a rectangle. Let $R$ and $S$ be the width and height of this bounding rectangle, respectively. Divide $Q$ into strips of width $r$, where
$$r = \min\left\{\left(\frac{4}{3\sqrt{\rho}}\,\frac{RS + 10.38\,\rho S}{m}\right)^{2/3},\ 2\rho\right\}.$$
Orient the strips along the side of length $R$. Construct a closed Dubins path, henceforth referred to as the loitering path, which runs along the
longitudinal bisector of each strip, visiting all strips in top-to-bottom sequence, making U-turns between strips at the edges of $Q$, and finally returning to the initial configuration. The $m$ vehicles are allotted loitering positions on this path, equally spaced in terms of path length.

Fig. 4. Illustration of the Strip Loitering policy. The trajectory providing closure of the loitering path (along which the vehicles travel from the end of the last strip to the beginning of the first strip) is not shown here for clarity of the drawing.

Fig. 5. Close-up of the Strip Loitering policy with construction of the point of departure and the distances $d_1$ and $d_2$ for a given demand, at the instant of appearance.

When a demand arrives, it is allocated to the closest vehicle among those that lie within the same strip as the demand and that have the demand in front of them. When a vehicle has no outstanding demands, it returns to its loitering position as follows. (We restrict the exposition to the case when a vehicle has only one outstanding demand when it leaves its loitering position and no more demands are allotted to it before it returns to its loitering position; other cases can be handled similarly.)
After making a left turn of length $d_2$ (as shown in Figure 5) to service the demand, the vehicle makes a right turn of length $2d_2$ followed by another left turn of length $d_2$, and then returns to the loitering path. However, the vehicle has fallen behind in the loitering path by a distance $4\left(d_2 - \rho\sin(d_2/\rho)\right)$. To rectify this, as it nears the end of the current strip, it takes its U-turn a distance $2\left(d_2 - \rho\sin(d_2/\rho)\right)$ early.

Note that the loitering path must cover $Q$, but it need not cover the entire bounding box of $Q$. The bounding box is merely a construction used to place an upper bound on the total path length.

The MC and SL policies will be proven to be efficient in light load. We now propose the Bead Tiling policy, which will be proven to be efficient in heavy load. A key component of the algorithm is the construction of a novel geometric set, tuned to the kinematic constraints of the Dubins vehicle, called the bead [17]. The construction of a bead $B_\rho(\ell)$ for a given $\rho$ and an additional parameter $\ell > 0$ is illustrated in Figure 6.
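Returning to the Strip Loitering policy, the following sketch evaluates the strip width $r$ and numerically checks the catch-up rule. It is a minimal illustration under stated assumptions and not code from [18]. The function names (`sl_strip_width`, `arc_displacement`, `maneuver_lag`, `closed_form_lag`) are hypothetical. The sketch assumes that every turn in the service maneuver is flown at the minimum turning radius $\rho$, and that the vehicle starts and ends the maneuver heading along the strip bisector. Under these assumptions the three arcs (left $d_2$, right $2d_2$, left $d_2$) have a total length of $4d_2$ but advance the vehicle only $4\rho\sin(d_2/\rho)$ along the bisector. That difference is the lag.

```python
import math


def sl_strip_width(R, S, rho, m):
    """Strip width r of the Strip Loitering (SL) policy.

    r = min{ (4 / (3 sqrt(rho)) * (R*S + 10.38*rho*S) / m) ** (2/3), 2*rho },
    where R x S is the minimum-height bounding rectangle of Q, rho is the
    minimum turning radius and m is the number of vehicles.
    """
    r_small = (4.0 / (3.0 * math.sqrt(rho)) * (R * S + 10.38 * rho * S) / m) ** (2.0 / 3.0)
    return min(r_small, 2.0 * rho)


def arc_displacement(heading, turn, rho):
    """Displacement (dx, dy) and final heading of a circular arc of radius rho.

    `turn` is the signed turning angle in radians: positive for a left turn,
    negative for a right turn. The arc length is rho * abs(turn).
    """
    end = heading + turn
    if turn >= 0.0:  # left turn: counter-clockwise about the turning center
        dx = rho * (math.sin(end) - math.sin(heading))
        dy = rho * (math.cos(heading) - math.cos(end))
    else:  # right turn: clockwise about the turning center
        dx = rho * (math.sin(heading) - math.sin(end))
        dy = rho * (math.cos(end) - math.cos(heading))
    return dx, dy, end


def maneuver_lag(d2, rho):
    """Lag of the left(d2) / right(2*d2) / left(d2) service maneuver.

    Chains the three arcs, starting with heading 0 along the strip bisector,
    and returns (path length - advance along the bisector, lateral offset).
    """
    theta = d2 / rho  # turning angle of an arc of length d2
    x = y = 0.0
    heading = 0.0
    for turn in (theta, -2.0 * theta, theta):
        dx, dy, heading = arc_displacement(heading, turn, rho)
        x += dx
        y += dy
    return 4.0 * d2 - x, y


def closed_form_lag(d2, rho):
    """Closed-form lag 4 (d2 - rho sin(d2 / rho))."""
    return 4.0 * (d2 - rho * math.sin(d2 / rho))


if __name__ == "__main__":
    rho, d2 = 1.0, 0.8
    lag, offset = maneuver_lag(d2, rho)
    print(f"numerical lag {lag:.6f}, closed form {closed_form_lag(d2, rho):.6f}")
    print(f"lateral offset after maneuver: {offset:.2e}")
    print(f"early U-turn distance: {lag / 2:.6f}")
    print(f"strip width, m = 50: {sl_strip_width(R=10.0, S=5.0, rho=rho, m=50):.4f}")
```

The early U-turn distance is half the lag. Taking a U-turn a distance $\delta$ early skips $\delta$ at the end of the current strip and another $\delta$ at the start of the next strip, which runs in the opposite direction. Choosing $\delta = 2(d_2 - \rho\sin(d_2/\rho))$ therefore recovers the full lag. This reading of the rule is an interpretation under the same assumptions.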

Fig. 6. An illustration of the construction of the bead for a given $\rho$ and $\ell$.

Fig. 7. An illustration of the mBT policy.

We start with the policy for a single vehicle.

The Bead Tiling (BT) Policy: Bound the environment $Q$ with a rectangle of minimum height, where height denotes the smaller of the two side lengths of a rectangle. Let $R$ and $S$ be the width and height of this bounding rectangle, respectively. Tile the plane with identical beads $B_\rho(\ell)$ with $\ell = \min\{C_{\mathrm{BTA}}\,v/\lambda,\; 4\rho\}$, where
$$C_{\mathrm{BTA}} = \frac{7-\sqrt{17}}{4}\left(1 + \frac{7\pi\rho S}{3|Q|}\right)^{-1}.$$
The beads are oriented along the width of the bounding rectangle. The Dubins vehicle visits all beads intersecting $Q$ row by row in top-to-bottom sequence, servicing at least one demand in every nonempty bead. This process is repeated indefinitely.

The BT policy is extended to the m-vehicle Bead Tiling (mBT) policy in the following way (see Figure 7).

The m-vehicle Bead Tiling (mBT) Policy: Divide the environment into regions of dominance with lines parallel
to the bead rows. Let the area and height of the $i$-th vehicle's region be denoted by $|Q|_i$ and $S_i$. Place the subregion dividers in such a way that
$$|Q|_i + \frac{7\pi\rho}{3}S_i = \frac{1}{m}\left(|Q| + \frac{7\pi\rho}{3}S\right)$$
for all $i \in \{1,\dots,m\}$. Allocate one subregion to every vehicle and let each vehicle execute the BT policy in its own region.

C. Analysis of Routing Policies

We now present the performance analysis of the routing policies introduced in the previous section.

Theorem VI.2 (MC policy performance in light load). The MC policy is a stabilizing policy in light load, i.e., as $\lambda/m \to 0^+$. The system time of the Median Circling policy in light load satisfies, as $\lambda/m \to 0^+$,
$$\limsup_{\lambda/m\to 0^+} \frac{\overline{T}_{\mathrm{MC}}}{\overline{T}^*} \leq 1 + 25\sqrt{d},$$
and, in particular,
$$\lim_{d\to 0^+}\,\limsup_{\lambda/m\to 0^+} \frac{\overline{T}_{\mathrm{MC}}}{\overline{T}^*} = 1.$$

Theorem VI.2 implies that the MC policy is optimal in the asymptotic regime where $\lambda/m \to 0^+$ and $d \to 0^+$. Hence, the MC policy is particularly efficient in light load for low values of the nonholonomic vehicle density. Moreover, Theorem VI.2, together with Theorem VI.1 and Equation (21) (provided in the Appendix), implies that the optimal system time in this asymptotic regime belongs to $\Theta(1/(v\sqrt{m}))$.

We now characterize the performance of the SL policy.

Theorem VI.3 (SL policy performance in light load). The SL policy is a stabilizing policy in light load, i.e., when $\lambda/m \to 0^+$. Moreover, the system time of the SL policy satisfies, as $\lambda/m \to 0^+$,
$$\overline{T}_{\mathrm{SL}} \leq \begin{cases} \dfrac{1.238}{v}\left(\dfrac{\rho RS + 10.38\,\rho^2 S}{m}\right)^{1/3} + \dfrac{R+S+6.19\,\rho}{mv}, & \text{for } m \geq 0.471\left(\dfrac{RS}{\rho^2} + 10.38\,\dfrac{S}{\rho}\right),\\[2ex] \dfrac{RS + 10.38\,\rho S}{4m\rho v} + \dfrac{R+S+6.19\,\rho}{mv} + \dfrac{1.06\,\rho}{v}, & \text{otherwise}. \end{cases}$$

Theorem VI.3, together with Theorem VI.1, implies that the SL policy is within a constant factor of the optimal in the asymptotic regime where $\lambda/m \to 0^+$ and $d \to +\infty$. Moreover, in this asymptotic regime the optimal system time belongs to $\Theta(1/(v\sqrt[3]{m}))$.

Finally, we characterize the performance of the mBT policy.

Theorem VI.4 (mBT policy performance in heavy load). The mBT policy is a stabilizing policy. Moreover, the system time for the mBT policy satisfies
$$\lim_{\lambda/m\to+\infty} \overline{T}_{\mathrm{mBT}}\,\frac{m^3}{\lambda^2} \leq 71\,\frac{\rho|Q|}{v^3}\left(1 + \frac{7\pi\rho S}{3|Q|}\right)^3.$$

Theorem VI.4, together with Theorem VI.1, implies that the mBT policy is
within a constant factor of the optimal in heavy load, and that the optimal system time in this case belongs to $\Theta(\lambda^2/(mv)^3)$.

It is instructive to compare the scaling of the optimal system time with respect to $\lambda$, $m$ and $v$ for the m-DTRP and for the m-Dubins DTRP. Such a comparison is shown in Table III. One can observe that in heavy load the optimal system time for the m-Dubins DTRP is of the order $\lambda^2/(mv)^3$, whereas for the m-DTRP it is of the order $\lambda/(mv)^2$. Therefore, our analysis rigorously establishes the following intuitive fact: bounded-curvature constraints make the optimal system time much more sensitive to increases in the demand generation rate. Perhaps less intuitive is the fact that the optimal system time is also more sensitive to the number of vehicles and to the vehicle speed in the m-Dubins DTRP than in the m-DTRP.

TABLE III
A COMPARISON BETWEEN THE SCALING OF THE OPTIMAL SYSTEM TIME FOR THE m-DTRP AND FOR THE m-DUBINS DTRP.

| Regime | m-DTRP | m-Dubins DTRP |
| Heavy load ($\lambda/m \to +\infty$) | $\overline{T}^* \in \Theta(\lambda/(mv)^2)$ [8] | $\overline{T}^* \in \Theta(\lambda^2/(mv)^3)$ [18] |
| Light load ($\lambda/m \to 0^+$) | $\overline{T}^* \in \Theta(1/(v\sqrt{m}))$ [42] | $\overline{T}^* \in \Theta(1/(v\sqrt{m}))$ for $d \to 0^+$; $\overline{T}^* \in \Theta(1/(v\sqrt[3]{m}))$ for $d \to +\infty$ [18] |

Extending the results for the Dubins DTRP in the light-load case to dimensions higher than 2 is still an open problem. We have extended the results for the Dubins DTRP in the heavy-load case to the three-dimensional case [43]. However, the extension to dimensions higher than 3 is still an open problem.

In [18], we report results from illustrative numerical experiments on the performance of the MC, SL and mBT policies. A close observation of the system time in the light-load case shows that the territorial MC policy is optimal as $d \to 0^+$ and the gregarious SL policy is constant-factor optimal as $d \to +\infty$. This suggests the existence of a phase transition in the optimal policy for the light-load scenario as one increases the number of vehicles for a fixed $\rho$ and $Q$ (recall that $d = \rho^2 m/|Q|$). It is desirable to study the fundamental factors driving this transition, ignoring its dependence on the shape of $Q$. Towards this end, envision an infinite number of vehicles
operating on the unbounded plane. In this case, the configuration $P_m(Q)$ yielding the minimum of the function $H_m$ is the one in which the Voronoi partition induced by $P_m(Q)$ is a network of regular hexagons [42]. Moreover, in this scenario the SL policy reduces to vehicles moving straight on infinite strips. In this setup, it is observed that the phase transition can be characterized by a critical value of the dimensionless nonholonomic vehicle density $d$ [18], estimated to be $d_{\mathrm{unbd}} \approx 0.0587$. An alternate interpretation is that the transition occurs when each vehicle is responsible for a region of area about 5.42 times that of a minimum-turning-radius disk, since $(|Q|/m)/(\pi\rho^2) = 1/(\pi d_{\mathrm{unbd}}) \approx 5.42$. This critical value of $d$ obtained for the unbounded domain has been found to be very close to the values obtained, via numerical experiments, for bounded domains [18]. This result provides a system architect with valuable information for deciding upon the optimal strategy in the light-load scenario for given problem parameters. Similar phase transition phenomena for other vehicles have been studied in [44].

VII. ROUTING FOR ROBOTIC NETWORKS: TEAM FORMING FOR COOPERATIVE TASKS

Here we study demands (or tasks) that require the simultaneous services of several vehicles [20]. In particular, consider $m$ vehicles, each capable of providing one of $k$ services. We assume that there are $m_j > 0$ vehicles capable of providing service $j$ (called vehicles of service-type $j$), for each $j \in \{1,\dots,k\}$, and thus $m := \sum_{j=1}^{k} m_j$. In addition, we assume there are $K$ different types of tasks. Tasks of type $\alpha \in \{1,\dots,K\}$ arrive according to a Poisson process with rate $\lambda_\alpha$ and assume a location i.i.d. uniformly in $Q$.² The total arrival rate is $\lambda := \sum_{\alpha=1}^{K} \lambda_\alpha$. Each task-type requires a subset of the $k$ services. We record the required services in a zero-one (column) vector $R_\alpha \in \{0,1\}^k$. The $j$th entry of $R_\alpha$ is 1 if service $j$ is required for task-type $\alpha$, and 0 otherwise. The on-site service time for each task-type $\alpha$ has mean $\bar{s}_\alpha$. To complete a task of type $\alpha$, a team of vehicles capable of providing the required services must
travel to the task location and remain there simultaneously for the on-site service time. We refer to this problem as the dynamic team forming problem [20].

As a motivating example, consider the scenario given in Section I, where each demand (or task) corresponds to an event that requires close-range observation. The sensors required to properly assess each event will depend on that event's properties. In particular, an event may require several different sensing modalities, such as electro-optical, infra-red, synthetic aperture radar, foliage penetrating radar, and moving target indication radar [45]. One solution would be to equip each UAV with all sensing modalities (or services). However, in many cases most events will require only a few sensing modalities. Thus, we might increase our efficiency by having a larger number of UAVs, each equipped with a single modality, and then forming the appropriate sensing team to observe each event.

We restrict our attention to task-type unbiased policies, i.e., policies $P$ for which the expected system time of each task-type (denoted by $\overline{T}_{P,\alpha}$) is the same, and thus
$$\overline{T}_{P,1} = \overline{T}_{P,2} = \cdots = \overline{T}_{P,K} =: \overline{T}_P.$$
We seek policies $P$ that minimize the expected system time of tasks $\overline{T}_P$. Policies of this type are amenable to analysis because the task-type unbiased constraint collapses the feasible set of system times from a subset of $\mathbb{R}^K$ to a subset of $\mathbb{R}$. Defining the matrix
$$R := [R_1 \ \cdots \ R_K] \in \{0,1\}^{k\times K}, \tag{15}$$
a necessary condition for stability is
$$R\,[\lambda_1\bar{s}_1 \ \cdots \ \lambda_K\bar{s}_K]^T < [m_1 \ \cdots \ m_k]^T \tag{16}$$
component-wise. Note that this condition is akin to the "load factor" in Subsection III-B. However, the space of load factors is much richer, and thus light and heavy load are no longer simply defined. To simplify the problem we take an alternative approach. We study the performance as the number of vehicles becomes very large, i.e., $m \to +\infty$. In addition, we look only at the order of the expected delay, without concern for the constant factors. It turns out that this analysis provides substantial insight into the performance of different team forming policies. This
type of asymptotic analysis is frequently performed in computational complexity [46] and ad-hoc networking [47].

² As in Section V, the algorithms in this section extend directly to a non-uniform spatial density by utilizing simultaneously equitable partitions.

A. Three Team Forming Policies

We now present three team forming policies.

The Complete Team (CT) Policy: Form $m/k$ teams of $k$ vehicles, where each team contains one vehicle of each service-type. Each team meets and moves as a single entity. As tasks arrive, service them by one of the $m/k$ teams according to the UTSP policy.

For the second policy, recall that the vector $R\mathbf{1}_K$ records in its $j$th entry the number of task-types that require service $j$, where $\mathbf{1}_K$ is a $K\times 1$ vector of ones. Thus, if
$$R\mathbf{1}_K \leq [m_1, \dots, m_k]^T \tag{17}$$
component-wise, then there are enough vehicles of each service-type to create $\lfloor m_{\mathrm{TST}} \rfloor$ dedicated teams for each task-type, where
$$m_{\mathrm{TST}} := \min\left\{\frac{m_j}{e_j^T R\mathbf{1}_K} \;\middle|\; j \in \{1,\dots,k\}\right\}$$
and $e_j$ is the $j$th vector of the standard basis of $\mathbb{R}^k$. Thus, when equation (17) is satisfied, we have the following policy.

The Task-Specific Team (TT) Policy: For each of the $K$ task-types, create $\lfloor m_{\mathrm{TST}} \rfloor$ teams of vehicles, where there is one vehicle in the team for each service required by the task-type. Service each task by one of its $\lfloor m_{\mathrm{TST}} \rfloor$ corresponding teams, according to the UTSP policy.

The task-specific team policy can be applied only when there is a sufficient number of vehicles of each service-type. The following policy requires only a single vehicle of each service type. The policy partitions the task-types into groups, where each group is chosen such that there is a sufficient number of vehicles to create a dedicated team for each task-type in the group. The task-specific team policy is then run on each group sequentially. The groups are defined via a service schedule, which is a partition of the $K$ task-types into $L \leq K$ time slots such that each task-type appears in precisely one time slot and the task-types in each time slot are pairwise disjoint (i.e., in a given time slot, each service
appears in at most one task-type).³ We now formally present the third policy.

The Scheduled Task-Specific Team (STT) Policy: Partition $Q$ into $m_{\mathrm{CT}} := \min\{m_1,\dots,m_k\}$ equal-area regions and assign one vehicle of each service-type to each region. In each region, form a queue for each of the $K$ task-types. For each time slot in the schedule and each task-type in the time slot, create a team containing one vehicle for each required service. For each team, service the first $n$ tasks ($n$ is a design parameter) in the corresponding queue by following an optimal TSP tour. When the end of the service schedule is reached, repeat. Optimize over $n$ (see [20] for details).

³ Computing an optimal schedule is equivalent to solving a vertex coloring problem, which is NP-hard. However, an approximate schedule can be computed via known vertex coloring heuristics; see [20] for more details.

B. Performance of Policies

To analyze the performance of the policies we make the following simplifying assumptions: (A1) There are $m/k$ vehicles of each service-type. (A2) The arrival rate is $\lambda/K$ for each task-type. (A3) The on-site service time has mean $\bar{s}$ and is upper bounded by $s_{\max}$ for all task-types. (A4) There exists $p \in [1/k, 1]$ such that, for each $j \in \{1,\dots,k\}$, service $j$ appears in $pK$ of the $K$ task-types. Thus, each task will require service $j$ with probability $p$.

With these assumptions, each row of $R$ contains $pK$ ones, so the $j$th component of (16) reads $pK\cdot(\lambda/K)\,\bar{s} < m/k$, and the stability condition simplifies to
$$\frac{\lambda}{m} < \frac{1}{pk\bar{s}}. \tag{18}$$
We say that $\lambda$ is the total throughput of the system (i.e., the total number of tasks served per unit time), and $B_m := \lambda/m$ is the per-vehicle throughput.

Finally, we study the system time as the number of vehicles $m$ becomes large. As $m$ increases, if the density of vehicles is to remain constant, then the environment must grow. In fact, the ratio $\sqrt{|Q|}/v$ must scale as $\sqrt{m}$ [48]. In [2] this scaling is referred to as a critical environment. Thus we will study the performance in the asymptotic regime where (i) the number of vehicles $m \to +\infty$; (ii) on-site service times are independent of $m$; (iii) $|Q(m)|/(m\,v^2(m)) \to \text{constant} > 0$. To
characterize the system time as a function of the per-vehicle throughput $B_m$, we introduce the canonical throughput vs. system time profile $f_{T_{\min},T_{\mathrm{ord}},B_{\mathrm{crit}}} : \mathbb{R}_{>0} \to \mathbb{R}_{>0} \cup \{+\infty\}$, which has the form
$$B_m \mapsto \begin{cases} \max\left\{T_{\min},\; T_{\mathrm{ord}}\,\dfrac{B_m/B_{\mathrm{crit}}}{(1 - B_m/B_{\mathrm{crit}})^2}\right\}, & \text{if } B_m < B_{\mathrm{crit}},\\[2ex] +\infty, & \text{if } B_m \geq B_{\mathrm{crit}}. \end{cases} \tag{19}$$
This profile (see Figure 8) is described by the three positive parameters $T_{\min}$, $T_{\mathrm{ord}}$ and $B_{\mathrm{crit}}$, where $T_{\mathrm{ord}} \geq T_{\min}$. These parameters have the following interpretation:
• $T_{\min}$ is the minimum achievable system time for any positive throughput.
• $B_{\mathrm{crit}}$ is the maximum achievable throughput (or capacity).
• $T_{\mathrm{ord}}$ is the system time when operating at $(3-\sqrt{5})/2 \approx 38\%$ of capacity $B_{\mathrm{crit}}$, i.e., at the throughput for which $x/(1-x)^2 = 1$ with $x = B_m/B_{\mathrm{crit}}$. Additionally, $T_{\mathrm{ord}}$ captures the order of the system time when operating at a constant fraction of capacity.

For each of the three policies $P$, we can write the system time as $\overline{T}_P \in O\big(f_{T_{\min},T_{\mathrm{ord}},B_{\mathrm{crit}}}(B_m)\big)$ for some values of $T_{\min}$, $T_{\mathrm{ord}}$ and $B_{\mathrm{crit}}$. In addition, we can write the lower bound in the form $\overline{T}^* \in \Omega\big(f_{T_{\min},T_{\mathrm{ord}},B_{\mathrm{crit}}}(B_m)\big)$. We summarize the corresponding parameter values for the policies and the lower bound in Table IV. We refer the reader to [20] for the proof of each upper and lower bound. In this table, $k$ is the number of services, $K$ is the number of task-types, $L$ is the length of the service schedule, $C := m_{\mathrm{TST}}/\lfloor m_{\mathrm{TST}} \rfloor$, and $p$ is the probability that a task-type requires service $j$, for each $j \in \{1,\dots,k\}$.

[Figure 8 plot: semi-log plot of system time $\overline{T}$ versus throughput $B_m$, with $T_{\min}$, $T_{\mathrm{ord}}$ and $B_{\mathrm{crit}}$ marked.]
Fig. 8. The canonical throughput vs. system time profile for the dynamic team forming problem. The semi-log plot is for parameter values $T_{\min} = 1$, $T_{\mathrm{ord}} = 10$ and $B_{\mathrm{crit}} = 1$. If $B_m \geq B_{\mathrm{crit}}$, then the system time is $+\infty$.

TABLE IV
A COMPARISON OF THE CANONICAL THROUGHPUT VS. SYSTEM TIME PARAMETERS FOR THE THREE POLICIES. HERE $pK \leq L \leq K$ IS THE SCHEDULE LENGTH.

| | $T_{\min}$ | $T_{\mathrm{ord}}$ | $B_{\mathrm{crit}}$ |
| Lower bound | $\sqrt{k}$ | $k$ | $1/(pk\bar{s})$ |
| CT Policy | $\sqrt{k}$ | $k$ | $1/(k\bar{s})$ |
| TT Policy | $\sqrt{pkK}$ | $pkK$ | $1/(Cpk\bar{s})$ |
| STT Policy | $L\sqrt{k}$ | $Lk$ | $K/(s_{\max}Lk)$ |

From these results we draw several conclusions. First, if the throughput is very low, then the CT policy has an expected system time of $\Theta(\sqrt{k})$, which is within a constant factor of the optimal. In addition, if $p$ is close to one and each task requires nearly every service, then CT is within a constant factor of the optimal in terms of both capacity and system time. Second, if $p \approx 1/k$ and each task requires few services, then the capacity of CT is sub-optimal, and the capacities of both TT and STT are within a constant factor of optimal. However, the system times of the TT and STT policies may be much higher than the lower bound when the number of task-types $K$ is very large. Third, the TT policy performs at least as well as the STT policy, both in terms of capacity and system time. Thus, one should use the TT policy if there is a sufficient number of vehicles of each service-type. However, if $p \approx 1/k$ and resources are so limited that the TT policy cannot be used, then the STT policy should be used to maximize capacity.

VIII. SUMMARY AND FUTURE DIRECTIONS

In this paper we presented a joint algorithmic and
queueing approach to the design of cooperative control, task allocation and dynamic routing strategies for networks of uninhabited vehicles required to operate in dynamic and uncertain environments. The approach integrates ideas from dynamics, combinatorial optimization, teaming, and distributed algorithms. We have presented dynamic vehicle routing algorithms with provable performance guarantees for several important problems. These include adaptive and decentralized implementations, demands with time constraints and priority levels, vehicles with motion constraints, and team forming. These results complement those from the online algorithms literature, in that they characterize average-case performance (rather than worst-case performance) and exploit probabilistic knowledge about future demands.

Dynamic vehicle routing is an active area of research and, in recent years, several directions have been pursued that were not covered in this paper. In [14], [49], we consider moving demands. The work focuses on demands that arrive on a line segment and move in a perpendicular direction at fixed speed. The problem has applications in perimeter defense as well as robotic pick-and-place operations. In [19], a setup is considered where the information on outstanding demands is provided to the vehicles through limited-range on-board sensors, thus adding a search component to the DVR problem with full information. The work in [50] and [51] considers the dynamic pickup and delivery problem, where each demand consists of a source-destination pair, and the vehicles are responsible for picking up a message at the source and delivering it to the destination. In [8], the authors consider the case in which each vehicle can serve at most a finite number of demands before returning to a depot for refilling. In [52], a DVR problem is considered involving vehicles whose dynamics can be modeled by state space models that are affine in control and have an output in $\mathbb{R}^2$. Finally, in [21] we consider a setup where the servicing of a demand needs to be done by
a vehicle under the supervision of a remotely located human operator.

The dynamic vehicle routing approach presented in this paper provides a new way of studying robotic systems in dynamically changing environments. We have presented results for a wide variety of problems. However, this is by no means a closed book. There is great potential for obtaining more general performance guarantees by developing methods to deal with correlation between demand positions. In addition, there are many other key problems in robotic systems that could benefit from being studied from the perspective presented in this paper. Some examples include search and rescue missions, force protection, map maintenance, and pursuit-evasion.

ACKNOWLEDGMENTS

The authors wish to thank Alessandro Arsie, Shaunak D. Bopardikar, and John J. Enright for numerous helpful discussions about topics related to this paper.

REFERENCES

[1] B. J. Moore and K. M. Passino, “Distributed task assignment for mobile agents,” IEEE Transactions on Automatic Control, vol. 52, no. 4, pp. 749–753, 2007.
[2] S. L. Smith and F. Bullo, “Monotonic target assignment for robotic networks,” IEEE Transactions on Automatic Control, vol. 54, no. 9, pp. 2042–2057, 2009.
[3] M. Alighanbari and J. P. How, “A robust approach to the UAV task assignment problem,” International Journal on Robust and Nonlinear Control, vol. 18, no. 2, pp. 118–134, 2008.
[4] R. W. Beard, T. W. McLain, M. A. Goodrich, and E. P. Anderson, “Coordinated target assignment and intercept for unmanned air vehicles,” IEEE Transactions on Robotics and Automation, vol. 18, no. 6, pp. 911–922, 2002.
[5] G. Arslan, J. R. Marden, and J. S. Shamma, “Autonomous vehicle-target assignment: A game theoretic formulation,” ASME Journal on Dynamic Systems, Measurement, and Control, vol. 129, no. 5, pp. 584–596, 2007.
[6] P. Toth and D. Vigo, eds., The Vehicle Routing Problem. Monographs on Discrete Mathematics and Applications, SIAM, 2001.
[7] D. J. Bertsimas and G. J. van Ryzin, “A stochastic and dynamic vehicle routing problem in the Euclidean plane,” Operations Research, vol. 39, pp. 601–615, 1991.
[8]
D. J. Bertsimas and G. J. van Ryzin, “Stochastic and dynamic vehicle routing in the Euclidean plane with multiple capacitated vehicles,” Operations Research, vol. 41, no. 1, pp. 60–76, 1993.
[9] D. J. Bertsimas and G. J. van Ryzin, “Stochastic and dynamic vehicle routing with general interarrival and service time distributions,” Advances in Applied Probability, vol. 25, pp. 947–978, 1993.
[10] H. N. Psaraftis, “Dynamic vehicle routing problems,” in Vehicle Routing: Methods and Studies (B. Golden and A. Assad, eds.), pp. 223–248, Elsevier (North-Holland), 1988.
[11] M. Pavone, N. Bisnik, E. Frazzoli, and V. Isler, “A stochastic and dynamic vehicle routing problem with time windows and customer impatience,” ACM/Springer Journal of Mobile Networks and Applications, vol. 14, no. 3, pp. 350–364, 2009.
[12] M. Pavone and E. Frazzoli, “Dynamic vehicle routing with stochastic time constraints,” in IEEE Int. Conf. on Robotics and Automation, (Anchorage, AK), pp. 1460–1467, May 2010.
[13] S. L. Smith, M. Pavone, F. Bullo, and E. Frazzoli, “Dynamic vehicle routing with priority classes of stochastic demands,” SIAM Journal on Control and Optimization, vol. 48, no. 5, pp. 3224–3245, 2010.
[14] S. D. Bopardikar, S. L. Smith, F. Bullo, and J. P. Hespanha, “Dynamic vehicle routing for translating demands: Stability analysis and receding-horizon policies,” IEEE Transactions on Automatic Control, vol. 55, no. 11, pp. 2554–2569, 2010.
[15] M. Pavone, E. Frazzoli, and F. Bullo, “Adaptive and distributed algorithms for vehicle routing in a stochastic and dynamic environment,” IEEE Transactions on Automatic Control, vol. 56, no. 6, 2011. To appear.
[16] A. Arsie, K. Savla, and E. Frazzoli, “Efficient routing algorithms for multiple vehicles with no explicit communications,” IEEE Transactions on Automatic Control, vol. 54, no. 10, pp. 2302–2317, 2009.
[17] K. Savla, E. Frazzoli, and F. Bullo, “Traveling Salesperson Problems for the Dubins vehicle,” IEEE Transactions on Automatic Control, vol. 53, no. 6, pp. 1378–1391, 2008.
[18] J. J. Enright, K. Savla, E. Frazzoli, and F. Bullo, “Stochastic and dynamic routing problems for multiple UAVs,” AIAA
Journal of Guidance, Control, and Dynamics, vol. 34, no. 4, pp. 1152–1166, 2009.
[19] J. J. Enright and E. Frazzoli, “Cooperative UAV routing with limited sensor range,” in AIAA Conf. on Guidance, Navigation and Control, (Keystone, CO), Aug. 2006.
[20] S. L. Smith and F. Bullo, “The dynamic team forming problem: Throughput and delay for unbiased policies,” Systems & Control Letters, vol. 58, no. 10-11, pp. 709–715, 2009.
[21] K. Savla, T. Temple, and E. Frazzoli, “Human-in-the-loop vehicle routing policies for dynamic environments,” in IEEE Conf. on Decision and Control, (Cancún, México), pp. 1145–1150, Dec. 2008.
[22] M. Pavone, Dynamic Vehicle Routing for Robotic Networks. PhD thesis, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, June 2010.
[23] D. D. Sleator and R. E. Tarjan, “Amortized efficiency of list update and paging rules,” Communications of the ACM, vol. 28, no. 2, pp. 202–208, 1985.
[24] S. O. Krumke, W. E. de Paepe, D. Poensgen, and L. Stougie, “News from the online traveling repairman,” Theoretical Computer Science, vol. 295, no. 1-3, pp. 279–294, 2003.
[25] S. Irani, X. Lu, and A. Regan, “On-line algorithms for the dynamic traveling repair problem,” Journal of Scheduling, vol. 7, no. 3, pp. 243–258, 2004.
[26] P. Jaillet and M. R. Wagner, “Online routing problems: Value of advanced information and improved competitive ratios,” Transportation Science, vol. 40, no. 2, pp. 200–210, 2006.
[27] B. Golden, S. Raghavan, and E. Wasil, The Vehicle Routing Problem: Latest Advances and New Challenges, vol. 43 of Operations Research/Computer Science Interfaces. Springer, 2008.
[28] P. Van Hentenryck, R. Bent, and E. Upfal, “Online stochastic optimization under time constraints,” Annals of Operations Research, vol. 177, no. 1, pp. 151–183, 2009.
[29] P. K. Agarwal and M. Sharir, “Efficient algorithms for geometric optimization,” ACM Computing Surveys, vol. 30, no. 4, pp. 412–458, 1998.
[30] Z. Drezner, ed., Facility Location: A Survey of Applications and Methods. Series in Operations Research, Springer, 1995.
[31] H. Xu, Optimal Policies for Stochastic and Dynamic Vehicle Routing Problems. PhD
thesis, Massachusetts Institute of Technology, Cambridge, MA, 1995.
[32] S. Bespamyatnikh, D. Kirkpatrick, and J. Snoeyink, “Generalizing ham sandwich cuts to equitable subdivisions,” Discrete & Computational Geometry, vol. 24, pp. 605–622, 2000.
[33] M. Pavone, A. Arsie, E. Frazzoli, and F. Bullo, “Distributed algorithms for environment partitioning in mobile robotic networks,” IEEE Transactions on Automatic Control, vol. 56, no. 9, 2011. To appear.
[34] J. D. Papastavrou, “A stochastic and dynamic routing policy using branching processes with state dependent immigration,” European Journal of Operational Research, vol. 95, pp. 167–177, 1996.
[35] L. E. Dubins, “On curves of minimal length with a constraint on average curvature and with prescribed initial and terminal positions and tangents,” American Journal of Mathematics, vol. 79, pp. 497–516, 1957.
[36] S. M. LaValle, Planning Algorithms. Cambridge University Press, 2006. Available at http://planning.cs.uiuc.edu.
[37] U. Boscain and B. Piccoli, Optimal Syntheses for Control Systems on 2-D Manifolds. Mathématiques et Applications, Springer, 2004.
[38] L. Pallottino and A. Bicchi, “On optimal cooperative conflict resolution for air traffic management systems,” IEEE Transactions on Intelligent Transportation Systems, vol. 1, no. 4, pp. 221–231, 2000.
[39] C. Tomlin, I. Mitchell, and R. Ghosh, “Safety verification of conflict resolution manoeuvres,” IEEE Transactions on Intelligent Transportation Systems, vol. 2, no. 2, pp. 110–120, 2001.
[40] P. Chandler, S. Rasmussen, and M. Pachter, “UAV cooperative path planning,” in AIAA Conf. on Guidance, Navigation and Control, (Denver, CO), Aug. 2000.
[41] C. Schumacher, P. R. Chandler, S. J. Rasmussen, and D. Walker, “Task allocation for wide area search munitions with variable path length,” in American Control Conference, (Denver, CO), pp. 3472–3477, 2003.
[42] E. Zemel, “Probabilistic analysis of geometric location problems,” Annals of Operations Research, vol. 1, no. 3, pp. 215–238, 1984.
[43] K. Savla, F. Bullo, and E. Frazzoli, “Traveling Salesperson Problems for a double integrator,” IEEE Transactions on
Automatic Control, vol. 54, no. 4, pp. 788–793, 2009.
[44] K. Savla and E. Frazzoli, “On endogenous reconfiguration for mobile robotic networks,” in Workshop on Algorithmic Foundations of Robotics, (Guanajuato, Mexico), Dec. 2008.
[45] E. K. P. Chong, C. M. Kreucher, and A. O. Hero III, “Monte-Carlo-based partially observable Markov decision process approximations for adaptive sensing,” in Int. Workshop on Discrete Event Systems, (Göteborg, Sweden), pp. 173–180, May 2008.
[46] B. Korte and J. Vygen, Combinatorial Optimization: Theory and Algorithms, vol. 21 of Algorithmics and Combinatorics. Springer, 4th ed., 2007.
[47] P. Gupta and P. R. Kumar, “The capacity of wireless networks,” IEEE Transactions on Information Theory, vol. 46, no. 2, pp. 388–404, 2000.
[48] V. Sharma, M. Savchenko, E. Frazzoli, and P. Voulgaris, “Transfer time complexity of conflict-free vehicle routing with no communications,” International Journal of Robotics Research, vol. 26, no. 3, pp. 255–272, 2007.
[49] S. L. Smith, S. D. Bopardikar, and F. Bullo, “A dynamic boundary guarding problem with translating demands,” in IEEE Conf. on Decision and Control and Chinese Control Conference, (Shanghai, China), pp. 8543–8548, Dec. 2009.
[50] H. A. Waisanen, D. Shah, and M. A. Dahleh, “A dynamic pickup and delivery problem in mobile networks under information constraints,” IEEE Transactions on Automatic Control, vol. 53, no. 6, pp. 1419–1433, 2008.
[51] M. R. Swihart and J. D. Papastavrou, “A stochastic and dynamic model for the single-vehicle pick-up delivery problem,” European Journal of Operational Research, vol. 114, pp. 447–464, 1999.
[52] S. Itani, E. Frazzoli, and M. A. Dahleh, “Dynamic travelling repairperson problem for dynamic systems,” in IEEE Conf. on Decision and Control, (Cancún, México), pp. 465–470, Dec. 2008.
[53] J. Beardwood, J. Halton, and J. Hammersley, “The shortest path through many points,” in Proceedings of the Cambridge Philosophical Society, vol. 55, pp. 299–327, 1959.
[54] J. M. Steele, “Probabilistic and worst case analyses of classical problems of combinatorial optimization in Euclidean space,” Mathematics of Operations
Research, vol. 15, no. 4, p. 749, 1990.
[55] A. G. Percus and O. C. Martin, “Finite size and dimensional dependence of the Euclidean traveling salesman problem,” Physical Review Letters, vol. 76, no. 8, pp. 1188–1191, 1996.
[56] R. C. Larson and A. R. Odoni, Urban Operations Research. Prentice Hall, 1981.
[57] D. Applegate, R. Bixby, V. Chvátal, and W. Cook, “On the solution of traveling salesman problems,” in Documenta Mathematica, Journal der Deutschen Mathematiker-Vereinigung, (Berlin, Germany), pp. 645–656, Aug. 1998. Proceedings of the International Congress of Mathematicians, Extra Volume ICM III.
[58] N. Christofides, “Worst-case analysis of a new heuristic for the traveling salesman problem,” Tech. Rep. 388, Carnegie Mellon University, Pittsburgh, PA, Apr. 1976.
[59] S. Arora, “Nearly linear time approximation scheme for Euclidean TSP and other geometric problems,” in IEEE Symposium on Foundations of Computer Science, (Miami Beach, FL), pp. 554–563, Oct. 1997.
[60] S. Lin and B. W. Kernighan, “An effective heuristic algorithm for the traveling-salesman problem,” Operations Research, vol. 21, pp. 498–516, 1973.
[61] N. Megiddo and K. J. Supowit, “On the complexity of some common geometric location problems,” SIAM Journal on Computing, vol. 13, no. 1, pp. 182–196, 1984.
[62] C. H. Papadimitriou, “Worst-case and probabilistic analysis of a geometric location problem,” SIAM Journal on Computing, vol. 10, no. 3, 1981.

APPENDIX

A. Asymptotic Properties of the Traveling Salesman Problem in the Euclidean Plane

Let $D$ be a set of $n$ points in $\mathbb{R}^d$ and let $\mathrm{TSP}(D)$ denote the minimum length of a tour through all the points in $D$; by convention, $\mathrm{TSP}(\emptyset) = 0$. Assume that the locations of the $n$ points are random variables independently and identically distributed in a compact set $Q$; in [53], [54] it is shown that there exists a constant $\beta_{\mathrm{TSP},d}$ such that, almost surely,
$$\lim_{n\to+\infty} \frac{\mathrm{TSP}(D)}{n^{1-1/d}} = \beta_{\mathrm{TSP},d}\int_Q \bar{\varphi}(q)^{1-1/d}\,dq, \tag{20}$$
where $\bar{\varphi}$ is the density of the absolutely continuous part of the point distribution. For the case $d = 2$, the constant $\beta_{\mathrm{TSP},2}$ has been estimated numerically as $\beta_{\mathrm{TSP},2} \approx 0.7120 \pm 0.0002$
[55]. Notice that the limit (20) holds for all compact sets: the shape of the set only affects the rate of convergence to the limit. According to [56], if Q is a “fairly compact and fairly convex” set in the plane, then equation (20) provides a “good” estimate of the optimal TSP tour length for values of n as low as 15.

B. Tools for Solving TSPs

The TSP is known to be NP-hard (its decision version is NP-complete), so unless P = NP there is no general algorithm capable of finding the optimal tour in an amount of time polynomial in the size of the input. Even though the exact optimal solution of a large TSP can be very hard to compute, several exact and heuristic algorithms and software tools are available for the numerical solution of TSPs. The most advanced TSP solver to date is arguably concorde [57]. Polynomial-time algorithms are available for constant-factor approximations of TSP solutions, among which we mention Christofides' algorithm [58], which returns a tour at most 3/2 times the optimal length whenever the distances satisfy the triangle inequality. On a more theoretical side, Arora proved the existence of polynomial-time approximation schemes for the Euclidean TSP, providing a (1 + ε)-approximation for any ε > 0 [59].

A modified version of the Lin-Kernighan heuristic [60] is implemented in linkern; for many instances, this powerful solver very quickly yields tours within about 5% of the optimal tour cost. For example, in our numerical experiments on a machine with a 2 GHz Intel Core Duo processor, approximations of random TSPs with 10,000 points typically required about twenty seconds of CPU time.⁴

⁴The concorde and linkern solvers are freely available for academic research use at www.tsp.gatech.edu/concorde/index.html.

We presented algorithms that require online solutions of possibly large TSPs. Practical implementations of these algorithms would rely on heuristics, such as Lin-Kernighan's, or on polynomial-time approximation algorithms, such as Christofides'. If a constant-factor approximation algorithm is used, the effect on the asymptotic performance guarantees of our algorithms can be modeled simply as a scaling of the constant $\beta_{\mathrm{TSP},d}$.

C. Properties
of the Multi-Median Function

In this subsection, we state certain useful properties of the multi-median function, introduced in Section III-A. The multi-median function can be written as

$$H_m(P, Q) := \mathbb{E}\Big[\min_{k \in \{1,\dots,m\}} \|p_k - q\|\Big] = \sum_{k=1}^{m} \int_{V_k(P)} \|p_k - q\|\, \varphi(q)\, dq,$$

where $V(P) = (V_1(P), \dots, V_m(P))$ is the Voronoi partition of the set Q generated by the points $P = (p_1, \dots, p_m)$. It is straightforward to show that the map $P \mapsto H_1(P, Q)$ is differentiable and strictly convex on Q. Therefore, it is a simple computational task to compute its minimizer $P_1^*(Q)$, the 1-median of Q. On the other hand, when m > 1, the map $P \mapsto H_m(P, Q)$ is differentiable (whenever $p_1, \dots, p_m$ are distinct) but not convex, which makes the continuous m-median problem hard to solve in the general case. It is known [29], [61] that the discrete version of the m-median problem is NP-hard for d ≥ 2. However, one can provide tight bounds on the map $m \mapsto H_m^*(Q)$, where $H_m^*(Q) = \min_P H_m(P, Q)$ is the optimal value. This problem is studied thoroughly in [62] for square regions and in [42] for more general compact regions. It is shown that, in the limit m → +∞, $H_m^*(Q)\sqrt{m} \to c_{\mathrm{hex}}\sqrt{|Q|}$ almost surely [42], where $c_{\mathrm{hex}} \approx 0.377$ is the first moment of a regular hexagon of unit area about its center. This optimal asymptotic value is achieved by placing the m points at the centers of the hexagons in a regular hexagonal lattice within Q (the honeycomb heuristic). Working towards the above result, it is also shown [42], [18] that, for any $m \in \mathbb{N}$,

$$0.3761\sqrt{\frac{|Q|}{m}} \;\le\; H_m^*(Q) \;\le\; c(Q)\sqrt{\frac{|Q|}{m}}, \qquad (21)$$

where c(Q) is a constant depending on the shape of Q.
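The asymptotic estimate (20) from Appendix A and the heuristic solvers discussed in Appendix B can be compared numerically. The Python sketch below is only an illustration and is not the implementation used in our experiments. It assumes points uniformly distributed on the unit square, so that $\bar{\varphi} \equiv 1/|Q|$ and (20) reduces to TSP(D) ≈ $\beta_{\mathrm{TSP},2}\sqrt{n|Q|}$ with the value 0.7120 quoted above. In place of linkern or concorde it uses a nearest-neighbor construction followed by 2-opt local search, a much simpler heuristic than Lin-Kernighan. The helper names (`nearest_neighbor_tour`, `two_opt`, `bhh_estimate`) are hypothetical.

```python
import math
import random

# Value of beta_TSP,2 quoted in Appendix A (numerical estimate from [55]).
BETA_TSP_2 = 0.7120


def tour_length(points, tour):
    """Length of the closed tour that visits `points` in the order `tour`."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def nearest_neighbor_tour(points):
    """Greedy construction: repeatedly move to the closest unvisited point."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def two_opt(points, tour):
    """2-opt local search: reverse a segment whenever that shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # edges (0,1) and (n-1,0) share a node
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Change in length if edges (a,b),(c,d) become (a,c),(b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


def bhh_estimate(n, area, beta=BETA_TSP_2):
    """Estimate from (20) for n uniform points in a planar region of given area."""
    return beta * math.sqrt(n * area)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    pts = [(rng.random(), rng.random()) for _ in range(n)]  # uniform on [0,1]^2
    nn = nearest_neighbor_tour(pts)
    local_opt = two_opt(pts, nn)
    print(f"nearest neighbor: {tour_length(pts, nn):.3f}")
    print(f"after 2-opt:      {tour_length(pts, local_opt):.3f}")
    print(f"estimate (20):    {bhh_estimate(n, 1.0):.3f}")
```

At moderate n, the 2-opt tour is typically somewhat longer than the estimate. This is expected for two reasons: 2-opt stops at a local optimum, and boundary effects have not yet vanished. A constant-factor approximation algorithm would, per the remark at the end of Appendix B, only rescale $\beta_{\mathrm{TSP},2}$.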
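The constants appearing in the multi-median bounds can be checked with a short computation. The Python sketch below is an illustration under stated assumptions, not code from [42] or [62]. It does two things:

- It evaluates, in closed form, the first moment about its center of a regular k-gon of unit area. With k = 6 this gives $c_{\mathrm{hex}}$. As k grows, the value approaches the first moment of a unit-area disk, 2/(3√π) ≈ 0.3761, which coincides numerically with the lower-bound constant in (21).
- It estimates $H_m(P, Q)$ by Monte Carlo, assuming Q is the unit square with uniform density φ ≡ 1.

The function and constant names are hypothetical.

```python
import math
import random


def regular_polygon_first_moment(k):
    """Mean distance from the center of a unit-area regular k-gon to a uniform point."""
    t = math.pi / k
    apothem = 1.0 / math.sqrt(k * math.tan(t))  # area k * apothem^2 * tan(pi/k) = 1
    sec, tan = 1.0 / math.cos(t), math.tan(t)
    # Split into k isosceles triangles; in polar coordinates each one contributes
    # (apothem^3 / 3) * int_{-t}^{t} sec^3 = (apothem^3 / 3) * (sec*tan + ln(sec + tan)).
    return k * apothem**3 / 3.0 * (sec * tan + math.log(sec + tan))


C_DISK = 2.0 / (3.0 * math.sqrt(math.pi))  # unit-area disk: 2R/3 with R = 1/sqrt(pi)
C_HEX = regular_polygon_first_moment(6)
C_SQUARE = regular_polygon_first_moment(4)


def multi_median_mc(P, samples=100_000, seed=0):
    """Monte Carlo estimate of H_m(P, Q) = E[min_k ||p_k - q||], assuming q is
    uniform on the unit square Q = [0, 1]^2 (density phi = 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        q = (rng.random(), rng.random())
        # The minimum picks the generator whose Voronoi cell V_k(P) contains q.
        total += min(math.dist(p, q) for p in P)
    return total / samples


if __name__ == "__main__":
    print(f"c_disk = {C_DISK:.4f}, c_hex = {C_HEX:.4f}, c_square = {C_SQUARE:.4f}")
    # c_disk = 0.3761, c_hex = 0.3772, c_square = 0.3826
    for k in (1, 2, 4):
        # k x k square grid of generators, m = k^2 (feasible, not necessarily optimal)
        P = [((i + 0.5) / k, (j + 0.5) / k) for i in range(k) for j in range(k)]
        m = len(P)
        h = multi_median_mc(P)
        print(f"m={m:2d}  H_m(P,Q)={h:.4f}  sqrt(m)*H_m={math.sqrt(m) * h:.4f}"
              f"  lower bound (21)={0.3761 * math.sqrt(1.0 / m):.4f}")
```

For a k × k square grid of generators, each Voronoi cell is a square of side 1/k centered at its generator. As a result, √m · H_m(P, Q) equals the square constant ≈ 0.3826 for every m. The honeycomb arrangement does better asymptotically, approaching $c_{\mathrm{hex}} \approx 0.3772$, and both values stay above the lower bound in (21).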