The Annals of Applied Statistics

TRAVEL TIME ESTIMATION FOR AMBULANCES USING BAYESIAN DATA AUGMENTATION*

By Bradford S. Westgate, Dawn B. Woodard, David S. Matteson and Shane G. Henderson, Cornell University

We introduce a Bayesian model for estimating the distribution of ambulance travel times on each road segment in a city, using Global Positioning System (GPS) data. Due to sparseness and error in the GPS data, the exact ambulance paths and travel times on each road segment are unknown. We simultaneously estimate the paths, travel times, and parameters of each road segment travel time distribution using Bayesian data augmentation. To draw ambulance path samples, we use a novel reversible jump Metropolis-Hastings step. We also introduce two simpler estimation methods based on GPS speed data.

We compare these methods to a recently-published travel time estimation method, using simulated data and data from Toronto EMS. In both cases, out-of-sample point and interval estimates of ambulance trip times from the Bayesian method outperform estimates from the alternative methods. We also construct probability-of-coverage maps for ambulances. The Bayesian method gives more realistic maps than the recently-published method. Finally, path estimates from the Bayesian method interpolate well between sparsely recorded GPS readings and are robust to GPS location errors.

*Supported in part by NSF Grant CMMI-0926814 and NSF Grant DMS-1209103.
Keywords and phrases: Reversible jump, Markov chain Monte Carlo, map-matching, Global Positioning System, emergency medical services.

1. Introduction. Emergency medical service (EMS) providers prefer to assign the closest available ambulance to respond to a new emergency [Dean (2008)]. Thus, it is vital to have accurate estimates of the travel time of each ambulance to the emergency location. An ambulance is often assigned to a new emergency while away from its base [Dean (2008)], so the problem is more difficult than estimating response times from several fixed bases. Travel times also play a central role in positioning bases and parking locations [Brotcorne, Laporte and Semet (2003); Goldberg (2004); Henderson (2010)]. Accounting for variability in travel times can lead to considerable improvements in EMS management [Erkut, Ingolfsson and Erdoğan (2008); Ingolfsson, Budge and Erkut (2008)]. We introduce methods for estimating the distribution of travel times for arbitrary routes on a municipal road network, using historical trip durations and vehicle Global Positioning System (GPS) readings. This enables estimation of fastest paths in expectation between any two locations, as well as estimation of the probability an ambulance will reach its destination within a given time threshold.

Most EMS providers record ambulance GPS information; we use data from Toronto EMS from 2007-2008. The GPS data include locations, timestamps, speeds, and vehicle and emergency incident identifiers. Readings are stored every 200 meters (m) or 240 seconds (s), whichever comes first. The true sampling rate is higher, but this scheme minimizes data transmission and storage. This is standard practice across EMS providers, though the storage rates vary [Mason (2005)]. In related applications the GPS readings can be even sparser; Lou et al. (2009) analyzed data from taxis in Tokyo in which GPS readings are separated by 1-2 km or more. The GPS location and speed data are also subject to error. Location accuracy degrades in urban canyons, where GPS satellites may be obscured and signals reflected [Chen et al. (2005); Mason (2005); Syed (2005)].
Chen et al. (2005) observed average location errors of 27 m in parts of Hong Kong with narrow streets and tall buildings, with some errors over 100 m. Location error is also present in the Toronto data; see Figure 1. Witte and Wilson (2004) found GPS speed errors of roughly 5% on average, with the largest errors at high speeds and when few GPS satellites were visible.

Recent work on estimating ambulance travel time distributions has been done by Budge, Ingolfsson and Zerom (2010) and Aladdini (2010), using estimates based on total trip distance and time, not GPS data. Budge et al. proposed modeling the log travel times using a t-distribution, where the median and coefficient of variation are functions of the trip distance (see Section 4.2). Aladdini found that the lognormal distribution provided a good fit for ambulance travel times between specific start and end locations. Budge et al. found heavier tails than Aladdini, in part because they did not condition on the trip location. Neither of these papers considered travel times on individual road segments. For this reason they cannot capture some desired features, such as faster response times to locations near major roads.

We first introduce two local methods using only the GPS locations and speeds (Section 4.1). Each GPS reading is mapped to the nearest road segment (the section of road between neighboring intersections), and the mapped speeds are used to estimate the travel time for each segment. In the first method, we use the harmonic mean of the mapped GPS speeds to create a point estimator of the travel time. We are the first to propose this estimator for mapped GPS data, though it is commonly used for estimating travel times via speed data recorded by loop detectors [Rakha and Zhang (2005); Soriguera and Robuste (2011); Wardrop (1952)]. We give theoretical results supporting this approach in the Supplement [Westgate et al. (2012)]. This method also yields interval and distribution estimates of the travel time.
[Fig 1. Left: A subregion of Toronto, with primary roads (black), secondary roads (gray) and tertiary roads (light gray). Right: GPS data on this region from the Toronto EMS lights-and-sirens dataset. Both panels plot x and y coordinates in meters.]

In our second local method, we assume a parametric distribution for the GPS speeds on each segment, and calculate maximum likelihood estimates of the parameters of this distribution. These can be used to obtain point, interval, or distribution estimates of the travel time.

In Sections 2 and 3, we propose a more sophisticated method, modeling the data at the trip level. Whereas the local methods use only GPS data and the method of Budge et al. uses only the trip start and end locations and times, this method combines the two sources of information. We simultaneously estimate the path driven for each ambulance trip and the distribution of travel times on each road segment, using Bayesian data augmentation [Tanner and Wong (1987)]. For computation, we introduce a reversible jump Markov chain Monte Carlo method [Green (1995)]. Although parameter estimation is more computationally intensive than for the other methods, prediction is very fast. Also, the parameter estimates are updated offline, so the increased computation time is not an operational handicap.

We compare the predictive accuracy on out-of-sample trips for the Bayesian method, the local methods, and the method of Budge et al. on a subregion of Toronto, using simulated data and real data (Sections 6 and 7). Since the methods have some bias due in part to the GPS sampling scheme, we first use a correction factor to make each method approximately unbiased (Section 5). On simulated data, point estimates from the Bayesian method outperform the alternative methods by over 50% in root mean squared error, relative to an Oracle method with the lowest possible error. On real data, point estimates from the Bayesian method again outperform the alternative methods. Interval estimates from the Bayesian method have dramatically better coverage than intervals from the local methods.

We also produce probability-of-coverage maps [Budge, Ingolfsson and Zerom (2010)], showing the probability of traveling from a given intersection to any other intersection within a time threshold (Section 7.4). This is the performance standard in many EMS contracts; an EMS organization attempts to respond to, e.g., 90% of all emergencies within 9 minutes [Fitch (1995)]. The estimates from the Bayesian method are more realistic than those of Budge et al., because they differentiate between equidistant locations based on whether or not they can be reached by fast roads.

Finally, we assess the ambulance path estimates from the Bayesian method (Sections 6.3 and 7.5). Estimating the path driven from a discrete set of GPS readings is called the map-matching problem [Mason (2005)]. Most map-matching algorithms return a single path estimate [Krumm, Letchner and Horvitz (2007); Lou et al. (2009); Marchal, Hackney and Axhausen (2005); Mason (2005)]. However, our posterior distribution can capture multiple high-probability paths when the true path is unclear from the GPS data. Our path estimates interpolate accurately between widely-separated GPS locations and are robust to GPS error.
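To make the coverage-map idea concrete, here is a minimal Monte Carlo sketch (Python; function and variable names are illustrative, not the authors' code) of a single map entry. It assumes the per-arc lognormal travel time model introduced in Section 2.1 and a fixed path between the two intersections.

```python
import numpy as np

rng = np.random.default_rng(1)

def coverage_probability(mu, sigma, threshold_s=9 * 60, n_samples=10_000):
    """Estimate P(total travel time along a fixed path <= threshold),
    with independent lognormal arc travel times LN(mu_j, sigma_j^2).

    mu, sigma: per-arc lognormal parameters for the arcs on the path.
    threshold_s: response-time standard in seconds (e.g., 9 minutes).
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    arc_times = rng.lognormal(mean=mu, sigma=sigma, size=(n_samples, len(mu)))
    return float(np.mean(arc_times.sum(axis=1) <= threshold_s))
```

In the paper the path itself is the estimated fastest path in expectation; this sketch takes it as given.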
2. Bayesian Formulation.

2.1. Model. Consider a network of $J$ directed road segments, called arcs, and a set of $I$ ambulance trips on this network. Assume that each trip $i$ starts and finishes on known nodes (intersections) $d_i^s$ and $d_i^f$ in the network, at known times $t_i^s$ and $t_i^f$. Therefore the total travel time $t_i^f - t_i^s$ is known. In practice, trips sometimes begin or end in the interior of a road segment; however, road segments are short enough that this is a minor issue. The median road segment length in the full Toronto network is 111 m, the mean is 162 m, and the maximum is 4613 m. Each trip $i$ has observed GPS readings, indexed by $\ell \in \{1,\dots,r_i\}$, and gathered at known times $t_i^\ell$. GPS reading $\ell$ is the triplet $(X_i^\ell, Y_i^\ell, V_i^\ell)$, where $X_i^\ell$ and $Y_i^\ell$ are the measured geographic coordinates and $V_i^\ell$ is the measured speed. Denote $G_i = \{(X_i^\ell, Y_i^\ell, V_i^\ell)\}_{\ell=1}^{r_i}$.

The relevant unobserved variables for each trip $i$ are the following:

1. The unknown path (sequence of arcs) $A_i = \{A_{i,1},\dots,A_{i,N_i}\}$ traveled by the ambulance from $d_i^s$ to $d_i^f$. The path length $N_i$ is also unknown.
2. The unknown travel times $T_i = (T_{i,1},\dots,T_{i,N_i})$ on the arcs in the path. We use the notation $T_i(j)$ to refer to the travel time in trip $i$ on arc $j$.

We model the observed and unobserved variables $\{A_i, T_i, G_i\}_{i=1}^I$ as follows. Conditional on $A_i$, each element $T_{i,k}$ of the vector $T_i$ follows a lognormal distribution with parameters $\mu_{A_{i,k}}, \sigma^2_{A_{i,k}}$, independently across $i$ and $k$. We use the notation $T_{i,k} \mid A_i \sim \mathrm{LN}(\mu_{A_{i,k}}, \sigma^2_{A_{i,k}})$. In the literature, ambulance travel times between specific locations have been observed and modeled to be lognormal [Aladdini (2010); Alanis, Ingolfsson and Kolfal (2012)]. Denote the expected travel time on each arc $j \in \{1,\dots,J\}$ by $\theta(j) = \exp(\mu_j + \sigma_j^2/2)$. We use a multinomial logit choice model [McFadden (1973)] for the path $A_i$, with likelihood

(2.1)   $f(A_i) = \frac{\exp\left(-C \sum_{k=1}^{N_i} \theta(A_{i,k})\right)}{\sum_{a_i \in P_i} \exp\left(-C \sum_{k=1}^{n_i} \theta(a_{i,k})\right)},$

where $C > 0$ is a fixed constant, $P_i$ is the set of possible paths with no repeated nodes from $d_i^s$ to $d_i^f$ in the network, and $a_i = \{a_{i,1},\dots,a_{i,n_i}\}$ indexes the paths in $P_i$. In this model, the fastest routes in expectation have the highest probability.
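As an illustration of Equation 2.1, the sketch below computes the multinomial logit probability of one path given per-arc lognormal parameters. Function names are hypothetical, and the list `candidate_paths` stands in for the path set $P_i$, which in practice must be enumerated or restricted.

```python
import numpy as np

def arc_expected_time(mu, sigma2):
    """theta(j) = exp(mu_j + sigma_j^2 / 2), the expected lognormal arc travel time."""
    return np.exp(np.asarray(mu) + np.asarray(sigma2) / 2.0)

def path_choice_probability(path, candidate_paths, mu, sigma2, C=1.0):
    """Multinomial logit probability of `path` among `candidate_paths` (Eq. 2.1).

    Each path is a list of arc indices; mu and sigma2 hold the per-arc
    lognormal parameters. `path` should itself belong to `candidate_paths`.
    """
    theta = arc_expected_time(mu, sigma2)
    scores = np.array([-C * theta[np.asarray(p)].sum() for p in candidate_paths])
    target = -C * theta[np.asarray(path)].sum()
    # log-sum-exp over the candidate set for numerical stability
    m = scores.max()
    log_denominator = m + np.log(np.exp(scores - m).sum())
    return float(np.exp(target - log_denominator))
```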
We assume that ambulances travel at constant speed on a single arc in a given trip. This approximation is necessary since there is typically at most one GPS reading on any arc in a given trip, and thus little information in the data regarding changes in speed on individual arcs. Therefore, the true location and speed of the ambulance at time $t_i^\ell$ are deterministic functions $\mathrm{loc}(A_i, T_i, t_i^\ell)$ and $\mathrm{sp}(A_i, T_i, t_i^\ell)$ of $A_i$ and $T_i$. Conditional on $A_i, T_i$, the measured location $(X_i^\ell, Y_i^\ell)$ is assumed to have a bivariate normal distribution (a standard assumption; see Krumm, Letchner and Horvitz (2007); Mason (2005)) centered at $\mathrm{loc}(A_i, T_i, t_i^\ell)$, with known covariance matrix $\Sigma$. Similarly, the measured speed $V_i^\ell$ is assumed to have a lognormal distribution with expectation equal to $\mathrm{sp}(A_i, T_i, t_i^\ell)$ and variance parameter $\zeta^2$:

(2.2)   $(X_i^\ell, Y_i^\ell) \mid A_i, T_i \sim N_2\left(\mathrm{loc}(A_i, T_i, t_i^\ell),\ \Sigma\right),$

(2.3)   $\log V_i^\ell \mid A_i, T_i \sim N\left(\log \mathrm{sp}(A_i, T_i, t_i^\ell) - \frac{\zeta^2}{2},\ \zeta^2\right).$

We assume independence between all the GPS speed and location errors. Combining Equations 2.1-2.3, we obtain the likelihood

(2.4)   $f\left(\{A_i, T_i, G_i\}_{i=1}^I \,\middle|\, \{\mu_j, \sigma_j^2\}_{j=1}^J, \zeta^2\right) = \prod_{i=1}^I \left[ f(A_i) \prod_{k=1}^{N_i} \mathrm{LN}\left(T_{i,k}; \mu_{A_{i,k}}, \sigma^2_{A_{i,k}}\right) \prod_{\ell=1}^{r_i} \left[ N\left((X_i^\ell, Y_i^\ell); \mathrm{loc}(A_i, T_i, t_i^\ell), \Sigma\right) \times \mathrm{LN}\left(V_i^\ell; \log \mathrm{sp}(A_i, T_i, t_i^\ell) - \frac{\zeta^2}{2}, \zeta^2\right) \right] \right].$

In practice we use data-based choices for the constants $\Sigma$ and $C$ (see the Supplement). The unknown parameters in the model are the arc travel time parameters $\{\mu_j, \sigma_j^2\}_{j=1}^J$ and the GPS speed error parameter $\zeta^2$.

2.2. Prior Distributions. To complete the model, we specify independent prior distributions for the unknown parameters. We use $\mu_j \sim N(m_j, s^2)$, $\sigma_j \sim \mathrm{Unif}(b_1, b_2)$, and $\zeta \sim \mathrm{Unif}(b_3, b_4)$, where $m_j$, $s^2$, $b_1$, $b_2$, $b_3$, $b_4$ are fixed hyperparameters. A normal prior is a standard choice for the location parameter of a lognormal distribution. We use uniform priors on the standard deviations $\sigma_j$ and $\zeta$ [Gelman (2006)]. The prior ranges $[b_1, b_2]$ and $[b_3, b_4]$ are made wide enough to capture all plausible parameter values. The prior mean for $\mu_j$ depends on $j$, because there are often existing road speed estimates that can be used to specify $m_j$. Prior information regarding the values $s^2$, $b_1$, $b_2$, $b_3$, $b_4$ is more limited. We use a combination of prior information and the data to specify all hyperparameters, as described in the Supplement.

3. Bayesian Computational Method. We use a Markov chain Monte Carlo method to obtain samples $\left(\zeta^{2(\ell)}, \{\mu_j^{(\ell)}, \sigma_j^{(\ell)}\}_{j=1}^J, \{A_i^{(\ell)}, T_i^{(\ell)}\}_{i=1}^I\right)$ from the joint posterior distribution of all unknowns [Robert and Casella (2004); Tierney (1994)]. Each unknown quantity is updated in turn, conditional on the other unknowns, via either a draw from the closed-form conditional posterior distribution or a Metropolis-Hastings (M-H) move. Estimation of any desired function $g\left(\zeta^2, \{\mu_j, \sigma_j^2\}_{j=1}^J\right)$ of the unknown parameters is done via Monte Carlo, taking $\hat{g} = \frac{1}{M} \sum_{\ell=1}^M g\left(\zeta^{2(\ell)}, \{\mu_j^{(\ell)}, \sigma_j^{2(\ell)}\}_{j=1}^J\right)$.

3.1. Markov Chain Initial Conditions. To initialize each path $A_i$, select the middle GPS reading, reading number $\lfloor r_i/2 \rfloor + 1$. Find the nearest node in the road network to this GPS location, and route the initial path $A_i$ through this node, taking the shortest-distance path to and from the middle node. To initialize the travel time vector $T_i$, distribute the known trip time across the arcs in the path $A_i$, weighted by arc length. Finally, to initialize $\zeta^2$ and each $\mu_j$ and $\sigma_j^2$, draw from their priors.
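A minimal sketch of this initialization for one trip, assuming the road network is stored as a networkx graph with a 'length' edge attribute and node coordinates in a dictionary; the names and data structures here are illustrative, not the authors' implementation.

```python
import numpy as np
import networkx as nx

def initialize_trip(G, d_start, d_finish, gps_xy, node_xy, trip_duration):
    """Build an initial node path A_i and travel times T_i (Section 3.1).

    G: networkx DiGraph with 'length' edge attributes (meters).
    gps_xy: (r_i, 2) array of GPS coordinates for the trip.
    node_xy: dict mapping node id -> (x, y) coordinates.
    trip_duration: known total trip time t_i^f - t_i^s (seconds).
    """
    # middle GPS reading, number floor(r_i / 2) + 1 (0-based index r_i // 2)
    mid = np.asarray(gps_xy)[len(gps_xy) // 2]
    # nearest road-network node to that reading
    nearest = min(node_xy, key=lambda v: np.hypot(*(np.asarray(node_xy[v]) - mid)))
    # shortest-distance path routed through the middle node
    path = (nx.shortest_path(G, d_start, nearest, weight="length")
            + nx.shortest_path(G, nearest, d_finish, weight="length")[1:])
    # distribute the known trip time across arcs, weighted by arc length
    lengths = np.array([G[u][v]["length"] for u, v in zip(path[:-1], path[1:])])
    T_init = trip_duration * lengths / lengths.sum()
    return path, T_init
```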
3.2. Updating the Paths. Updating the path $A_i$ may also require updating the travel times $T_i$, since the number of arcs in the path may change. Since this changes the dimension of the vector $T_i$, we update $(A_i, T_i)$ using a reversible jump M-H move [Green (1995)]. Calling the current values $(A_i^{(1)}, T_i^{(1)})$, we propose new values $(A_i^{(2)}, T_i^{(2)})$ and accept them with the appropriate probability, detailed below.

The proposal changes a contiguous subset of the path. The length (number of arcs) of this subpath is limited to some maximum value $K$; we specify $K$ in Section 3.5. Precisely:

1. With equal probability, choose a node $d'$ from the path $A_i^{(1)}$, excluding the final node.
2. Let $a^{(1)}$ be the number of nodes that follow $d'$ in the path. With equal probability, choose an integer $w \in \{1,\dots,\min(a^{(1)}, K)\}$. Denote the $w$th node following $d'$ as $d''$. The subpath from $d'$ to $d''$ is the section to be updated (the "current update section").
3. Consider all possible routes of length up to $K$ from $d'$ to $d''$. With equal probability, propose one of these routes as a change to the path (the "proposed update section"), giving the proposed path $A_i^{(2)}$.

Next we propose travel times $T_i^{(2)}$ that are compatible with $A_i^{(2)}$. Let $\{c_1,\dots,c_m\} \subset A_i^{(1)}$ and $\{p_1,\dots,p_n\} \subset A_i^{(2)}$ denote the arcs in the current and proposed update sections, noting that $m$ and $n$ may be different. Recall that $T_i(j)$ denotes the travel time of trip $i$ on arc $j$. For each arc $j \in A_i^{(2)} \setminus \{p_1,\dots,p_n\}$, set $T_i^{(2)}(j) = T_i^{(1)}(j)$. Let $S_i = \sum_{\ell=1}^m T_i^{(1)}(c_\ell)$ be the total travel time of the current update section. Since the total travel time of the entire trip is known (see Section 2.1), $S_i$ is fixed and known as well, conditional on the travel times for the arcs that are unchanged by this update. Therefore we must have $\sum_{\ell=1}^n T_i^{(2)}(p_\ell) = S_i$. The travel times $T_i^{(2)}(p_1),\dots,T_i^{(2)}(p_n)$ are proposed by drawing $(r_1,\dots,r_n) \sim \mathrm{Dirichlet}(\alpha\theta(p_1),\dots,\alpha\theta(p_n))$ for a constant $\alpha > 0$ (specified below), and setting $T_i^{(2)}(p_\ell) = r_\ell S_i$ for $\ell \in \{1,\dots,n\}$. The expected value of the proposed travel time on arc $p_\ell$ is $E\left(T_i^{(2)}(p_\ell)\right) = S_i \, \theta(p_\ell) / \sum_{k=1}^n \theta(p_k)$. Therefore, the expected values of the proposed times are weighted by the arc travel time expected values [Gelman et al. (2004)]. The constant $\alpha$ controls the variances and covariances of the components $T_i^{(2)}(p_\ell)$. In our experience $\alpha = 1$ works well; one can also tune $\alpha$ to obtain a desired acceptance rate for a particular dataset [Robert and Casella (2004); Roberts and Rosenthal (2001)].
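The travel-time half of this proposal is easy to sketch in code (illustrative names; numpy's Dirichlet sampler stands in for the draw of $(r_1,\dots,r_n)$):

```python
import numpy as np

rng = np.random.default_rng()

def propose_subpath_times(theta_proposed, S_i, alpha=1.0):
    """Propose travel times for the arcs p_1, ..., p_n of the proposed update
    section, conditional on their fixed total S_i (Section 3.2).

    theta_proposed: expected arc travel times theta(p_1), ..., theta(p_n).
    Returns times summing to S_i whose expectations are proportional to theta.
    """
    weights = rng.dirichlet(alpha * np.asarray(theta_proposed, dtype=float))
    return S_i * weights
```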
Let $N_i^{(j)}$ be the number of edges in the path $A_i^{(j)}$ for $j \in \{1,2\}$, and let $a^{(2)}$ be the number of nodes that follow $d'$ in the path $A_i^{(2)}$. We accept the proposal $(A_i^{(2)}, T_i^{(2)})$ with probability equal to the minimum of one and

(3.1)   $\frac{f_i\left(A_i^{(2)}, T_i^{(2)}, G_i \,\middle|\, \{\mu_j, \sigma_j^2\}_{j=1}^J, \zeta^2\right)}{f_i\left(A_i^{(1)}, T_i^{(1)}, G_i \,\middle|\, \{\mu_j, \sigma_j^2\}_{j=1}^J, \zeta^2\right)} \times \frac{N_i^{(1)} \min(a^{(1)}, K)}{N_i^{(2)} \min(a^{(2)}, K)} \times \frac{\mathrm{Dir}\left(\frac{T_i^{(1)}(c_1)}{S_i},\dots,\frac{T_i^{(1)}(c_m)}{S_i};\ \alpha\theta(c_1),\dots,\alpha\theta(c_m)\right)}{\mathrm{Dir}\left(\frac{T_i^{(2)}(p_1)}{S_i},\dots,\frac{T_i^{(2)}(p_n)}{S_i};\ \alpha\theta(p_1),\dots,\alpha\theta(p_n)\right)} \, S_i^{\,n-m},$

where $f_i$ is the contribution of trip $i$ to Equation 2.4 and $\mathrm{Dir}(x; y)$ denotes the Dirichlet density with parameter vector $y$, evaluated at $x$. The proposal density for the travel times $T_i^{(2)}(p_1),\dots,T_i^{(2)}(p_n)$ requires a change of variables from the Dirichlet density. This leads to the factor $S_i^{\,n-m}$ in the ratio of proposal densities. In the Supplement, we show that this move is valid since it is reversible with respect to the conditional posterior distribution of $(A_i, T_i)$.

3.3. Updating the Trip Travel Times. To update the realized travel time vector $T_i$, we use the following M-H move. Given current travel times $T_i^{(1)}$, we propose travel times $T_i^{(2)}$.

1. With equal probability, choose a pair of distinct arcs $j_1$ and $j_2$ in the path $A_i$. Let $S_i = T_i^{(1)}(j_1) + T_i^{(1)}(j_2)$.
2. Draw $(r_1, r_2) \sim \mathrm{Dirichlet}(\alpha'\theta(j_1), \alpha'\theta(j_2))$. Set $T_i^{(2)}(j_1) = r_1 S_i$ and $T_i^{(2)}(j_2) = r_2 S_i$.

Similarly to the path proposal above, this proposal randomly distributes the travel time over the two arcs, weighted by the expected travel times $\theta(j_1)$ and $\theta(j_2)$, with variances controlled by the constant $\alpha'$ [Gelman et al. (2004)]. In our experience $\alpha' = 0.5$ is effective for our application. It is straightforward to calculate the M-H acceptance probability.

3.4. Updating the Parameters $\mu_j$, $\sigma_j^2$, and $\zeta^2$. To update each $\mu_j$, we sample from the full conditional posterior distribution, which is available in closed form. We have $\mu_j \mid \sigma_j^2, \{A_i, T_i\}_{i=1}^I \sim N(\hat{\mu}_j, \hat{s}_j^2)$, where

$\hat{s}_j^2 = \left[\frac{1}{s^2} + \frac{n_j}{\sigma_j^2}\right]^{-1}, \qquad \hat{\mu}_j = \hat{s}_j^2 \left[\frac{m_j}{s^2} + \frac{1}{\sigma_j^2} \sum_{i \in I_j} \log T_i(j)\right],$

the set $I_j \subset \{1,\dots,I\}$ indicates the subset of trips using arc $j$, and $n_j = |I_j|$.

To update each $\sigma_j^2$, we use a local M-H step [Tierney (1994)]. We propose $\sigma_j^{2*} \sim \mathrm{LN}(\log \sigma_j^2, \eta^2)$, having fixed variance $\eta^2$. The M-H acceptance probability $p$ is the minimum of 1 and

$\frac{\sigma_j^*}{\sigma_j} \, 1\{\sigma_j^* \in [b_1, b_2]\} \, \frac{\prod_{i \in I_j} \mathrm{LN}\left(T_i(j); \mu_j, \sigma_j^{2*}\right)}{\prod_{i \in I_j} \mathrm{LN}\left(T_i(j); \mu_j, \sigma_j^{2}\right)} \, \frac{\mathrm{LN}\left(\sigma_j^{2}; \log(\sigma_j^{2*}), \eta^2\right)}{\mathrm{LN}\left(\sigma_j^{2*}; \log(\sigma_j^{2}), \eta^2\right)}.$

To update $\zeta^2$, we use another M-H step with a lognormal proposal, with variance $\nu^2$. The proposal variances $\eta^2$, $\nu^2$ are tuned to achieve an acceptance rate of approximately 23% [Roberts and Rosenthal (2001)].
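For instance, the closed-form Gibbs draw of $\mu_j$ translates directly into code (a sketch with illustrative names, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng()

def draw_mu_j(log_times_on_arc, sigma2_j, m_j, s2):
    """Draw mu_j from its closed-form conditional posterior (Section 3.4).

    log_times_on_arc: values log T_i(j) for the trips i in I_j that use arc j.
    sigma2_j: current value of sigma_j^2.
    m_j, s2: prior mean and variance of mu_j.
    """
    n_j = len(log_times_on_arc)
    post_var = 1.0 / (1.0 / s2 + n_j / sigma2_j)
    post_mean = post_var * (m_j / s2 + np.sum(log_times_on_arc) / sigma2_j)
    return rng.normal(post_mean, np.sqrt(post_var))
```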
3.5. Markov Chain Convergence. The transition kernel for updating the path $A_i$ is irreducible, and hence valid [Tierney (1994)], if it is possible to move between any two paths in $P_i$ in a finite number of iterations, for all $i$. For a given road network, the maximum update section length $K$ can be set high enough to meet this criterion. However, the value of $K$ should be set as low as possible, because increasing $K$ tends to lower the acceptance rate. If there is a region of the city with sparse connectivity, the required value of $K$ may be impractically large. For example, there could be a single arc of a highway alongside many arcs of a parallel minor road. Then, a large $K$ would be needed to allow transitions between the highway and the minor road. If $K$ is kept smaller, the Markov chain is reducible. In this case, the chain converges to the posterior distribution restricted to the closed communicating class in which the chain is absorbed. If this class contains much of the posterior mass, as might arise if the initial path follows the GPS data reasonably closely, then this should be a good approximation.

In Sections 6 and 7, we apply the Bayesian method to simulated data and data from Toronto EMS, on a subregion of Toronto with 623 arcs. Each Markov chain was run for 50,000 iterations (where each iteration updates all parameters), after a burn-in period of 25,000 iterations. We calculated Gelman-Rubin diagnostics [Gelman and Rubin (1992)], using two chains, for the parameters $\zeta^2$, $\mu_j$, and $\sigma_j^2$. Results from a typical simulation study were: potential scale reduction factor of 1.06 for $\zeta^2$, of less than 1.1 for $\mu_j$ for 549 arcs (88.1%), between 1.1-1.2 for 43 arcs (6.9%), between 1.2-1.5 for 30 arcs (4.8%), and less than 2 for the remaining 1 arc, with similar results for the parameters $\sigma_j^2$. These results indicate no lack of convergence.

Each Markov chain run for these experiments takes roughly 2 hours on a 3.2 GHz workstation. Each iteration of the Markov chain scales linearly in time with the number of arcs and the number of ambulance trips, $O(J+I)$, assuming the lengths of the ambulance paths do not grow as well. This assumption is reasonable, since long ambulance paths are undesirable for an EMS provider. It is much more difficult to assess how the number of iterations required for convergence changes with $J$ and $I$, since this would require bounding the spectral gap of the Markov chain. The full Toronto road network has roughly 110 times as many arcs as the test region, and the full Toronto EMS dataset has roughly 80 times as many ambulance trips.

In practice, parameter estimates are updated infrequently and offline. Once parameter estimation is done, prediction for new routes and generation of our figures is very fast. If parameter estimation for the Bayesian method is computationally impractical for the entire city, it can be divided into multiple regions and estimated in parallel. We envision creating overlapping regions and discarding estimates on the boundary, to eliminate edge effects (see Section 7.1). During parameter estimation, trips traveling through multiple regions would be divided into portions for each region, as we have done in our Toronto EMS experiments. However, prediction for such a trip can be handled directly, given the parameter estimates for all arcs in the city. The fastest path in expectation may be calculated using a shortest path algorithm over the entire road network, which gives a point estimate of the trip travel time. A distribution estimate of the travel time can be obtained by sampling travel times on the arcs in this fastest path (see Section 7.3).

4. Comparison Methods.

4.1. Local Methods. Here we detail the two local methods outlined in Section 1. Each GPS reading is mapped to the nearest arc (both directions of travel are treated together). Let $n_j$ be the number of GPS points mapped to arc $j$, $L_j$ the length of arc $j$, and $\{V_j^k\}_{k=1}^{n_j}$ the mapped speed observations. We assume constant speed on each arc, as in the Bayesian method. Thus, let $T_j^k = L_j / V_j^k$ be the travel time associated with observed speed $V_j^k$.

In the first local method, we calculate the harmonic mean of the speeds $\{V_j^k\}_{k=1}^{n_j}$, and convert to a travel time point estimate

$\hat{T}_j^H = \frac{L_j}{n_j} \sum_{k=1}^{n_j} \frac{1}{V_j^k}.$
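This estimator is a one-liner in code: the arc length divided by the harmonic mean of the mapped speeds (a sketch with illustrative names).

```python
import numpy as np

def harmonic_mean_travel_time(arc_length, mapped_speeds):
    """First local method: T_hat_j^H = (L_j / n_j) * sum_k 1 / V_j^k,
    i.e. arc length divided by the harmonic mean of the mapped GPS speeds."""
    speeds = np.asarray(mapped_speeds, dtype=float)
    return arc_length * np.mean(1.0 / speeds)
```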
