Given a graph produced by an equation, we can meaningfully calculate any points that solve the equation and, consequently, a line that goes through those points. At any of its points, the line is the answer.

But what about measures like "number of cars per hour per parking lot"? Does it make sense to connect the dots of the hourly measurements? Wouldn't this lead to false, or at least misleading, results, since we do not know how many cars were there in the meantime?


There are a variety of reasons to connect the points in a graph. If you're only showing one category of values (i.e., there would only be one line), then the continuous-versus-discrete rule is generally a good one to follow. However, even discrete or categorical values can be connected when multiple lines are needed to make it easy to follow how the pattern varies across the x-axis. The point is to tell a coherent story: if a line makes the story more sensible or easier to follow, add it; if it detracts, remove it.

In your case, with a point for each lot at each hour and hours on the x-axis, I would be very much inclined to draw lines connecting the hours for each lot. And while you have means at each hour, the x-axis values are interval-scaled and theoretically continuous (one might argue that all continuous measures are interval-scaled), so there is further justification there.

As for bars, which other posters mentioned, I almost always avoid them. A point is usually better, even for kinds of data typically displayed with bars.

Also consider that, unless the parking lots are the same size, the number of cars is misleading. A graph with a fixed area and typical bars implies that each bar represents the same amount of filling in the same space. Using the proportion of each lot that is filled only partially solves that problem. An alternative, when there is only one time period, would be to draw empty bars indicating the sizes of the lots and then fill them up with the number of cars. But this would be overly complex when showing multiple lots. Line graphs of the proportion filled, connected over hours with a line for each lot, are the best way to go here.
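As a minimal sketch of the proportion-of-fill idea, assuming you know each lot's capacity: the lot names, capacities, and counts below are made up for illustration, and `occupancy` is a hypothetical helper, not part of any library.

```python
# Hypothetical capacities and hourly car counts for three lots (made-up numbers).
capacity = {"A": 50, "B": 200, "C": 120}
cars_by_hour = {
    "A": [10, 25, 40, 45],
    "B": [30, 90, 150, 160],
    "C": [12, 60, 96, 84],
}


def occupancy(cars, cap):
    """Proportion of spaces filled at each hourly observation of one lot."""
    return [n / cap for n in cars]


# One series per lot: these are the values you would connect with one line per lot.
proportions = {lot: occupancy(counts, capacity[lot]) for lot, counts in cars_by_hour.items()}

for lot, props in proportions.items():
    print(lot, [round(p, 2) for p in props])
```

On these made-up numbers, lot B holds the most cars at every hour, yet lot A is the fullest by the last hour, which is exactly the comparison raw counts would hide.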

Agreed, particularly if you are trying to compare two or more sets of data (say, three parking lots) over time. Keeping the different colored dots and swatches straight without lines is hard. It's usually fairly clear that if a line graph has a line connecting points, the points are discrete, and that if it's only a line, it reflects continuous data.
Wayne, Feb 28 '14 at 21:05

A continuous line indicates a continuum. If averages are to be plotted, I would consider either a bar chart or a stair-step chart. Plotting individual points is also possible, and since these are averages, you can probably add standard deviation information as needed.
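If you go the averaging route, a minimal sketch of the two ingredients could look like this: a per-hour mean with its standard deviation, and the vertices of a stair-step line that holds each mean flat until the next hour. The counts are made up, and `summarize` and `step_vertices` are hypothetical helpers; only `statistics.mean` and `statistics.stdev` come from Python's standard library.

```python
import statistics

# Hypothetical counts for one lot, observed on several days at each hour (made-up).
counts_by_hour = {
    8: [12, 15, 11, 14],
    9: [30, 34, 29, 35],
    10: [41, 39, 44, 40],
}


def summarize(samples):
    """Mean and sample standard deviation of the counts observed at one hour."""
    return statistics.mean(samples), statistics.stdev(samples)


def step_vertices(hours, values, width=1):
    """Vertices of a stair-step line: each value is held flat from its hour
    until `width` hours later, then the line jumps to the next value."""
    vertices = []
    for hour, value in zip(hours, values):
        vertices.append((hour, value))
        vertices.append((hour + width, value))
    return vertices


hours = sorted(counts_by_hour)
means = []
for hour in hours:
    mean, sd = summarize(counts_by_hour[hour])
    means.append(mean)
    print(f"{hour:02d}:00  mean={mean:.2f}  sd={sd:.2f}")

print(step_vertices(hours, means))
```

Unlike a straight line between means, the stair-step line makes no claim about how the count moved within each hour; it only shows the hourly summary.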

IMHO, whoever first omitted the precise timing of changes in the number of cars bears first responsibility for any misleading results. If you had this information (even if measured with error), time would be a proper continuous variable, not necessarily a grouped continuous variable (see Anderson, 1984). You'd be free to group observations into hour-based bins if you really wanted to, at which point you'd take responsibility for any misleading results derived from them. Otherwise, by preserving the precise times of arrival, you could accurately graph your number-of-cars time series over continuous time.
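To illustrate the point about precise timing, here is a minimal sketch assuming you had exact arrival and departure times (made up here, in hours since midnight). The number of cars present is then a step function of continuous time, and hourly values are just samples or bins of it; the sketch shows both readings of "cars per hour". `cars_at` and `arrivals_per_hour` are hypothetical names; `bisect_right` is from Python's standard `bisect` module.

```python
from bisect import bisect_right

# Hypothetical precise event times, in hours since midnight (made-up data).
arrivals = sorted([8.05, 8.40, 8.75, 9.10, 9.20, 9.95])
departures = sorted([8.90, 9.50])


def cars_at(t):
    """Cars present at continuous time t: arrivals so far minus departures so far.
    Assumes the lot starts empty and both event lists are sorted."""
    return bisect_right(arrivals, t) - bisect_right(departures, t)


def arrivals_per_hour(start, end):
    """Group the precise arrival times into hour-long bins [h, h + 1)."""
    counts = {h: 0 for h in range(start, end)}
    for t in arrivals:
        h = int(t)  # floor, since times are non-negative
        if start <= h < end:
            counts[h] += 1
    return counts


# Sampling the step function on the hour discards when the changes happened.
print([cars_at(h) for h in (8, 9, 10)])
print(arrivals_per_hour(8, 10))
```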

Anyway, assuming you're stuck with number of cars per hour, I agree with @John, you should draw a line connecting your hourly observations. If you lack information about when each incremental change occurred, it's rather hard to say you're misleading anyone unless you fail to describe the limits of the information graphed. Similarly, if you graph your hourly data with a simple bar chart without a line connecting the bins, you're not really guilty of misleading anyone if you don't claim that the changes between hourly observations occur precisely as depicted, on the hour, all at once. If someone misunderstands (as will probably occur with any sufficiently publicized statistic or data), it won't be the case that you misled them, especially if you describe your data and collection procedure in sufficient detail. This much should not be hard to do.

Given basic clarity and thoroughness in your data and graph descriptions, there should be no disadvantage to drawing a line to connect your bins. The advantage of connecting your bins is in fact what you seem to think is the disadvantage: drawing those lines mimics a halfway decent equation for the number of cars as a function of continuous time, even though it's based on discrete, hourly observations. You can use a straight line between observations to represent a fairly reasonable assumption that change occurs linearly over each hour rather than all at once. Based on that assumption, any reader can make a decent guess at which minute after a given hour's measurement the next car will arrive or leave, using this fairly common-sense four-step procedure:

1. Find the point on the line where the number of cars $= 1 +$ the previous hour's observation.
2. Draw a line straight down from this point to find where it intersects the hour axis.
3. Measure the distance of this point on the hour axis from the point of the previous observation.
4. $\text{distance} \div \text{distance between observations} \times 60 =$ minute after the hour of the next car's arrival.
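The four steps amount to linear interpolation between two observations. Here is a minimal sketch, assuming observations at hours `h0` and `h1` with counts `y0` and `y1`; the function name is hypothetical, and handling a falling count by targeting one car fewer (a departure) is an added assumption, not part of the procedure as stated.

```python
def minute_of_next_change(h0, y0, h1, y1):
    """Estimate the minute after hour h0 at which the next car arrives (or leaves),
    assuming the count changes linearly between (h0, y0) and (h1, y1)."""
    if y1 == y0:
        raise ValueError("no change between observations, so no crossing to find")
    # Step 1: the target is one car more than the previous observation
    # (one car fewer if the count is falling; this extension is an assumption).
    target = y0 + 1 if y1 > y0 else y0 - 1
    # Step 2: solve y0 + (y1 - y0) * (t - h0) / (h1 - h0) = target for the hour t.
    t = h0 + (target - y0) * (h1 - h0) / (y1 - y0)
    # Step 3: distance on the hour axis from the previous observation.
    distance = t - h0
    # Step 4: distance / distance between observations * 60 = minute after the hour.
    return distance / (h1 - h0) * 60


# 20 cars at 9:00 and 32 at 10:00: the line gains one car every 5 minutes.
print(minute_of_next_change(9, 20, 10, 32))
```

With hourly observations this reduces to $60 / |y_1 - y_0|$ minutes, which makes the straight-line assumption explicit: changes are spread evenly across the hour.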

Of course, one can estimate the next car's arrival down to the precise second too, and you can't stop readers from doing this by not providing the line – drawing the line just becomes the first of five steps. Thus if someone actually wants to know how many cars were there in the meantime...well, they can't, because the info isn't available, but they can estimate. If you knock a step off the process for them, I imagine they'll be grateful.

Doing this for your readers with simple, straight lines only implies that you are comfortable with the assumption that change occurs linearly between hourly observations, or, stated more pejoratively, that you are indifferent to any inaccuracies in this assumption. Inaccuracies aren't hard to imagine. First, change necessarily occurs as a nonlinear, zero-inflated function of time. It's nonlinear because the change event is ternary: a car arrives, a car leaves, or neither happens; cars don't arrive or leave in fractional increments. It's zero-inflated because most moments in time won't see a car arrive or leave. You can get around this by treating the line as describing the probability that cars will arrive or leave at any given moment to reach the nearest whole number.

Yet another inaccuracy remains in the assumption behind straight lines between hourly observations. You might expect the rate of change (in terms of probability, as above) to change more smoothly over time than straight lines drawn separately between points imply. In more mathematical terms, you might want the derivative of your $\text{cars}(\text{hour})$ function to be continuous across hours. You might be able to do this by fitting a polynomial function to your data, but if your purpose is predictive, beware of overfitting.
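Here is a minimal sketch of the polynomial option, written in plain Python so it needs no libraries: an ordinary least-squares polynomial fit via the normal equations, plus its derivative, which (unlike the slope of connected straight segments) is continuous everywhere. The hourly counts are made up and the helper names are hypothetical; in practice a library routine such as NumPy's `numpy.polyfit` would be the more robust choice.

```python
def polyfit(xs, ys, degree):
    """Least-squares coefficients c with y ~ c[0] + c[1]*x + ... + c[degree]*x**degree."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x**(i+j)) and b[i] = sum(y * x**i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, n))) / A[i][i]
    return coef


def evaluate(coef, x):
    """Value of the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coef))


def derivative(coef, x):
    """Rate of change (cars per hour) of the fitted polynomial at x; continuous in x."""
    return sum(k * c * x ** (k - 1) for k, c in enumerate(coef) if k > 0)


# Hypothetical counts at hours 0..6 after opening (made-up data).
hours = [0, 1, 2, 3, 4, 5, 6]
cars = [2, 14, 30, 41, 45, 44, 38]

# Keep the degree low: a degree-6 fit would pass through every point (overfitting).
coef = polyfit(hours, cars, degree=2)
for h in (2.5, 3.0, 3.5):
    print(h, round(evaluate(coef, h), 1), round(derivative(coef, h), 1))
```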

Another advantage of lines over histogram-style bars (i.e., with no intermediate spacing for adjacent values of hour...let alone charts with bars that don't "touch" each other) arises from your polytomous lot variable. You can superimpose your separate time series for each lot on the same graph to facilitate comparisons, which will help you see whether your lot variable is interesting. Here's a demonstration with some made-up data:

I'm not even going to try to figure out how to do that coherently with bars; I'll leave that to @ChristianStade-Schuldt ;) To be fair, it's even easier not to connect these points, as he suggested, but adding the lines helps distinguish the points of each time series from one another. In the end, it's still going to be a little subjective, so judge for yourself:

I, for one, find myself drawing the lines in my mind anyway. BTW, if you feel the lines in the first figure detract from the visual impact of the exact points, remember that you can always increase the size of the points, change their shape, or present their values numerically in a separate table.