Our Data

Where does our traffic data come from?

Measuring traffic is no mean feat. Our data comes primarily from two sources. The first is the Archived Data Management System, or ADMS, which stores millions of signals every hour recording the flow of traffic across the county. If you’ve driven in Los Angeles, you’ve likely contributed to this database. Most of the inputs come from inductive-loop traffic detectors, those circles you see cut into the pavement of freeways and streets.

Every time a vehicle passes over one, two pieces of data are recorded: the time and place, and the amount of time the sensor was occupied. The first tells us when and how many vehicles are traveling. The second tells us the speed at which a vehicle is moving (the longer the sensor is occupied, the slower the traffic). Because these sensors are arrayed along freeways, we can use them to calculate how quickly traffic is moving at different times and places. There are 14,000 sensors deployed across freeways and main streets in Los Angeles County, covering 5,400 miles of roadway in all.
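The speed estimate described above can be sketched with the classic single-loop calculation: divide an assumed effective length (average vehicle length plus loop length) by the occupancy time. The lengths below are illustrative assumptions, not values from ADMS.

```python
# Sketch: estimating vehicle speed from a single inductive-loop detector.
# The detector reports how long it was occupied; dividing an assumed
# effective length by that time gives a rough speed. The lengths are
# illustrative assumptions, not values used by the ADMS system.

AVG_VEHICLE_LENGTH_FT = 15.0   # assumed average vehicle length
LOOP_LENGTH_FT = 6.0           # assumed loop length

def estimate_speed_mph(occupancy_seconds: float) -> float:
    """Rough speed estimate: the longer the loop is occupied, the slower the vehicle."""
    effective_length_ft = AVG_VEHICLE_LENGTH_FT + LOOP_LENGTH_FT
    feet_per_second = effective_length_ft / occupancy_seconds
    return feet_per_second * 3600 / 5280  # convert ft/s to mph
```

Under these assumptions, a vehicle that occupies the loop for about a quarter of a second is traveling at roughly freeway speed, while one that takes a full second is crawling.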

In addition, we can chart the progress of public transportation. Every bus and train is equipped with a sensor that reports its location every 30 seconds. We can use that to see whether public transit is arriving on schedule, whether buses are getting bunched up, and how many are operating at any given time.
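As a rough illustration of the bunching check, here is a minimal sketch that assumes each bus's pings have already been reduced to the minute it passed a common stop; the threshold and scheduled headway are hypothetical, not values from our pipeline.

```python
# Sketch: flagging "bunched" buses from arrival times at a common stop.
# Two consecutive buses whose gap is much smaller than the scheduled
# headway are considered bunched. The 25% threshold is an illustrative
# assumption.

def find_bunched_pairs(arrival_minutes: list[float],
                       scheduled_headway_min: float,
                       threshold: float = 0.25) -> list[tuple[float, float]]:
    """Return consecutive arrivals whose gap is under threshold * scheduled headway."""
    arrivals = sorted(arrival_minutes)
    limit = scheduled_headway_min * threshold
    return [(a, b) for a, b in zip(arrivals, arrivals[1:]) if b - a < limit]

# Buses scheduled every 10 minutes; two arrive 1 minute apart -> bunched.
pairs = find_bunched_pairs([0.0, 9.0, 10.0, 21.0], scheduled_headway_min=10)
```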

But with so much data coming in, we need a way to process it all. For this, we use a data-management system called TransDec, which houses ADMS. TransDec lets us query the ADMS data and understand it in terms of space and time; it is essentially the engine and the interface we use to make sense of it all. Both platforms were built by our partners at USC Viterbi’s Integrated Media Systems Center.

We also receive data about road accidents collected by the California Highway Patrol, the Los Angeles County Sheriff’s Department and other agencies.

Taken together, all of this information can paint a nuanced picture of what’s happening on roadways all across Los Angeles County. We can look at trends that happen over years, or even in the course of an hour.

Our research is supported in part by the METRANS grant from LA Metro to USC’s Integrated Media Systems Center. The principal investigator for the grant is Prof. Cyrus Shahabi.

How do we get our crime data?

Our crime data is official. But that doesn’t mean it’s complete or even consistent. We collect the crime data directly from law enforcement agencies, or through open data portals such as Socrata. There are 46 agencies in the county, however, and many don’t make their data easily accessible to the public. Right now, we have something of a patchwork. We have data from the two largest agencies, the Los Angeles Police Department and the Los Angeles County Sheriff’s Department, as well as smaller agencies like the Pasadena and Santa Monica police departments. But we’re missing data from others, such as Inglewood and Glendale. In time, we hope to fill in those gaps. If you would like to look at the raw data from the LAPD, the largest agency in the county, you can find much of it here.

Though there is a national standard for how agencies report their data, the Uniform Crime Reporting (UCR) program, there are often inconsistencies from agency to agency. Some departments use different nomenclature for the same crime (larceny instead of petty theft, for example). Some include location-specific data with each report, while others make that information difficult to come by. In some jurisdictions, we have crime data that goes back to 2005; in others, it covers just a few years. Our team has cleaned and standardized the data wherever possible. This is an ongoing process. We regularly find inconsistencies or outliers in the data. Sometimes there are outright mistakes (listing a year as 0017 instead of 2017). When we come across these issues, we make efforts to correct them.
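The cleanup described above can be sketched in a few lines: map agency-specific labels onto a shared vocabulary, and repair obviously malformed years. The label map is a made-up illustration, not our actual crosswalk.

```python
# Sketch: standardizing crime records across agencies. The label map is a
# hypothetical example of collapsing different names for the same crime;
# repair_year fixes clearly malformed entries like "0017" for 2017.

LABEL_MAP = {
    "larceny": "petty theft",            # hypothetical: two names, one crime
    "theft-petty": "petty theft",
    "burglary from vehicle": "vehicle break-in",
}

def normalize_label(raw: str) -> str:
    """Map an agency's crime label onto the shared vocabulary, if known."""
    key = raw.strip().lower()
    return LABEL_MAP.get(key, key)

def repair_year(raw: str) -> int:
    """Interpret two-digit-looking years like '0017' as 2000s dates."""
    year = int(raw)
    if year < 100:   # e.g. 17 or 0017 -> 2017
        year += 2000
    return year
```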

Another issue to be aware of when looking at crime data is that it’s only as good as what’s being reported. Some communities work well with law enforcement; others are more reticent. That can impact the number of crimes that are reported. In areas that have a history of tensions with the police, some crimes may never be called in. That’s why we have paid particular attention to the baseline of reported crimes. When we notice a sharp increase or decrease, it usually tells us something important.

If you have questions about the data, our process or just want to learn more, contact us at askus@xtown.la

What are we measuring for air quality?

Our air quality data is derived from 12 monitoring stations operated by the Environmental Protection Agency and the South Coast Air Quality Management District. We collect hourly PM 2.5 AQI (Air Quality Index) data through the AirNow API. PM 2.5 refers to the mixture of fine solid particles and liquid droplets found in the air, specifically those smaller than 2.5 µm in aerodynamic diameter. How small is that? The EPA compares it to a single hair from your head: the average human hair is about 70 micrometers in diameter, as much as 30 times larger than the largest fine particle.
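As a sketch of how an hourly pull might look, the request below follows the public AirNow observation endpoint as we understand it; treat the endpoint path and parameter names as assumptions and check the AirNow API documentation before relying on them. `YOUR_KEY` is a placeholder.

```python
# Sketch: building a request URL for current observations near a point.
# The endpoint and parameters follow AirNow's published observation API
# as we understand it -- treat them as assumptions, not a spec.

from urllib.parse import urlencode

BASE = "https://www.airnowapi.org/aq/observation/latLong/current/"

def build_airnow_url(lat: float, lon: float, api_key: str) -> str:
    params = {
        "format": "application/json",
        "latitude": lat,
        "longitude": lon,
        "distance": 25,        # search radius in miles (assumed default)
        "API_KEY": api_key,
    }
    return BASE + "?" + urlencode(params)

url = build_airnow_url(34.05, -118.24, "YOUR_KEY")  # downtown LA, placeholder key
```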

What scale do we use to measure air quality?

The air quality data is updated every hour. The EPA receives the raw PM 2.5 measurements and converts them to the Air Quality Index scale. AQI generally rises with PM 2.5 levels, although the actual AQI number is calculated using several variables and is based on a 24-hour average. So, for example, if the level of PM 2.5 is 31 µg/m3 at the East San Fernando Valley station, it converts to an AQI of 82, which is considered a moderate level.
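The EPA conversion is a piecewise-linear interpolation between breakpoint concentrations. The sketch below uses the PM 2.5 breakpoints that reproduce the 31 µg/m3 → 82 example above; the EPA revises these tables periodically, so check the current federal breakpoints before reusing them.

```python
# Sketch of the EPA AQI formula: for a concentration C in the bracket
# [C_lo, C_hi] mapped to AQI range [A_lo, A_hi]:
#   AQI = (A_hi - A_lo) / (C_hi - C_lo) * (C - C_lo) + A_lo
# Breakpoints below are the ones consistent with the 31 -> 82 example;
# they are assumptions about which table vintage is in use.

PM25_BREAKPOINTS = [
    # (conc_lo, conc_hi, aqi_lo, aqi_hi)
    (0.0,   15.4,  0,   50),   # Good
    (15.5,  40.4,  51,  100),  # Moderate
    (40.5,  65.4,  101, 150),  # Unhealthy for Sensitive Groups
    (65.5,  150.4, 151, 200),  # Unhealthy
]

def pm25_to_aqi(conc: float) -> int:
    """Convert a PM 2.5 concentration (ug/m3) to an AQI value."""
    for c_lo, c_hi, a_lo, a_hi in PM25_BREAKPOINTS:
        if c_lo <= conc <= c_hi:
            return round((a_hi - a_lo) / (c_hi - c_lo) * (conc - c_lo) + a_lo)
    raise ValueError("concentration outside breakpoint table")
```

With these breakpoints, 31 µg/m3 lands in the 15.5–40.4 "Moderate" bracket and interpolates to an AQI of 82, matching the example in the text.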

How do we get air quality data at the neighborhood level?

This is a unique process that our partners at the USC Spatial Sciences Institute have developed. Their research indicates that PM 2.5 concentrations are influenced by the surrounding geographic context. We use geographic data from OpenStreetMap (OSM) to automatically generate a “geographic abstraction” for each monitoring station. The geographic abstraction describes the neighborhood environment for a given location: for example, we compute features of the built environment, including the length of various road types, the number of points of interest of various types, the area of open spaces, and so on. Then we build a geo-context by selecting the features that matter most for the air quality data.

We create a fishnet of grid points across Los Angeles County. With the geo-context, we are able to compute PM 2.5 values for locations in the fishnet that do not have monitoring stations. We have around 3,000 points in total, at a density of four per square mile. For each point, we predict an hourly PM 2.5 AQI.
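Four points per square mile works out to roughly a half-mile spacing. Here is a minimal sketch of generating such a fishnet over a bounding box; the box, the degrees-per-mile conversion, and the use of a single step for both latitude and longitude are rough illustrative assumptions, not our actual grid.

```python
# Sketch: a fishnet of grid points at roughly half-mile spacing
# (four points per square mile). One degree of latitude is ~69 miles;
# using the same step for longitude is a simplification, since a degree
# of longitude spans fewer miles at LA's latitude.

MILES_PER_DEG_LAT = 69.0
HALF_MILE_DEG = 0.5 / MILES_PER_DEG_LAT

def fishnet(lat_min, lat_max, lon_min, lon_max, step=HALF_MILE_DEG):
    """Return (lat, lon) points on a regular grid covering the box."""
    points = []
    lat = lat_min
    while lat <= lat_max:
        lon = lon_min
        while lon <= lon_max:
            points.append((round(lat, 5), round(lon, 5)))
            lon += step
        lat += step
    return points

# A small hypothetical box near downtown LA.
grid = fishnet(34.0, 34.05, -118.30, -118.25)
```

Each grid point then gets its own geo-context features, and the model predicts an hourly PM 2.5 AQI for it.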

Our research is supported in part by the NIH PRISMS grant to USC’s Information Sciences Institute and Keck School of Medicine. The principal investigators on the grant are Profs. Jose Luis Ambite and Frank D. Gilliland. The grant is supported in part by Microsoft Corp. and in part by NVIDIA Corp.