Does Your Download Progress Bar Lie to You?

Different browsers display download progress differently. Some show a little bar indicating how much of the file you have downloaded, along with an estimate of how much longer you can expect to wait. Well, the time has come: I am going to check these download progress bars. Why? I have no idea.

Is the Progress Bar Accurate?
-----------------------------

Let me start with the download progress bar from the Safari browser. Why Safari? Well, I usually use Google Chrome, but it doesn't show a nice visual bar like Safari does.

Perhaps you will notice that I picked a nice large file to download. The next step was to load a screen-capture video of the download progress into Tracker for video analysis. I set the full length of the download bar to 1.0, so the bar's length at any given time gives the fraction downloaded. Besides bar length and time, I also recorded the reported amount of the file downloaded so far, the download rate, and the projected time remaining.

Here is a plot of the length of the download bar and the reported amount downloaded (as a fraction of the total download size) vs. time.

The two lines are right on top of each other, so the progress bar accurately represents how much of the file has been downloaded.
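To repeat this check on your own Tracker export, one simple approach (a sketch, not how I made the plot) is to convert both quantities to fractions and find the largest gap between them. The function name `max_fraction_gap` and the sample numbers are made up for illustration; they are not my measured data.

```python
def max_fraction_gap(bar_lengths, downloaded_mb, total_mb, full_bar_length=1.0):
    """Return the largest |bar fraction - downloaded fraction| over all samples."""
    gaps = []
    for bar, mb in zip(bar_lengths, downloaded_mb):
        bar_fraction = bar / full_bar_length  # bar length scaled so a full bar is 1.0
        data_fraction = mb / total_mb         # reported size as a fraction of the total
        gaps.append(abs(bar_fraction - data_fraction))
    return max(gaps)


# Made-up samples: bar length (full bar = 1.0) and reported MB for a 1000 MB file.
bars = [0.10, 0.25, 0.50, 0.75]
sizes = [100.0, 251.0, 498.0, 750.0]
print(max_fraction_gap(bars, sizes, total_mb=1000.0))  # a gap near zero means the curves overlap
```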

Estimated Time Remaining
------------------------

I understand the browser can't see the future; it can only estimate how long the download will take. But since I have already downloaded the file, I know the actual time left at every moment. Here is a plot of the estimated time left and the actual time left as a function of time.

The blue line represents the actual time remaining. Of course, this is a straight line: the actual time left is just the finish time minus the current time, so it drops by one second every second. The green line looks like a staircase because Safari reports the time remaining in whole minutes (unless there is less than a minute left).

Since the estimate only comes in whole minutes, it doesn't seem fair to judge Safari's error at every data point. Instead, I will look only at the points where the estimated time changed. If the display goes from 5 minutes to 4 minutes, I suspect that right at that moment Safari's estimate is exactly 4 minutes.
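Here is a minimal sketch of those two steps: computing the actual time left from the finish time, and keeping only the samples where the displayed estimate changes. It assumes the estimates are recorded in whole minutes, and the function names and sample values are hypothetical.

```python
def actual_time_left(times_s, t_end_s):
    """Actual seconds remaining at each sample time, given when the download finished."""
    return [t_end_s - t for t in times_s]


def change_points(times_s, shown_minutes):
    """Return (time, estimate in seconds) at samples where the displayed minutes change.

    At such a sample the display has just ticked down, so the underlying
    estimate is assumed to equal the new displayed value exactly.
    """
    points = []
    for i in range(1, len(times_s)):
        if shown_minutes[i] != shown_minutes[i - 1]:
            points.append((times_s[i], shown_minutes[i] * 60.0))
    return points


# Made-up samples: time in seconds and the displayed estimate in minutes.
times = [0.0, 10.0, 20.0, 30.0, 40.0]
shown = [5, 5, 4, 4, 3]
print(change_points(times, shown))  # [(20.0, 240.0), (40.0, 180.0)]
print(actual_time_left([20.0, 40.0], t_end_s=200.0))  # [180.0, 160.0]
```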

Now let me plot the estimate error (how far off the time-remaining estimate is) as a function of the amount of data downloaded.

The first thing I noticed was that the Safari estimate was always too high. Perhaps Safari adopts the philosophy "estimate high and then finish early; that way everyone will be pleasantly surprised." Just imagine what would happen if it said "12 seconds left in the download" but it really took a minute. The other thing to notice is that the error gets smaller with time. Why? Well, if there is only 4 MB of data left to download, it is much easier to predict how long that will take than if there is 1 GB left.

In this next plot, I weighted the estimate error by how much data was left to download, so that a 1-minute error at the beginning of the download doesn't count as much as a 1-minute error at the end.

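It seems the big spike comes from the roughly constant overestimate of about 2 minutes: as the remaining data shrinks, that same 2-minute error gets weighted more and more heavily.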
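I didn't spell out the exact weighting, so here is one plausible version as a sketch: it assumes the weighted error is the raw error divided by the MB still left to download. The function name and numbers are made up; the example uses a steady 120 s overestimate to show how a constant error turns into a spike at the end.

```python
def estimate_errors(estimates_s, actual_s, remaining_mb):
    """Raw and weighted errors of a time-left estimate.

    raw: estimate minus actual, in seconds (positive means the guess was too high).
    weighted: raw error divided by the MB still to download (an assumed weighting,
    so a fixed error counts for more as the remaining data shrinks).
    Samples with 0 MB remaining must be left out to avoid dividing by zero.
    """
    raw = [est - act for est, act in zip(estimates_s, actual_s)]
    weighted = [err / mb for err, mb in zip(raw, remaining_mb)]
    return raw, weighted


# Made-up example: a constant 120 s overestimate while the remaining data shrinks.
raw, weighted = estimate_errors(
    estimates_s=[720.0, 420.0, 180.0],
    actual_s=[600.0, 300.0, 60.0],
    remaining_mb=[900.0, 450.0, 90.0],
)
print(raw)       # [120.0, 120.0, 120.0]
print(weighted)  # grows toward the end of the download
```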

Checking the Download Rate
--------------------------

Although the browser reports the download rate (I will use units of MB/s), I can also check this value from my own data. Here are the first four data points of downloaded size vs. time.

This diagram also shows the simplest way of finding the download rate (which I call r). For the fourth data point, the download rate is the change in file size since the previous data point divided by the time interval. There are other methods that might give a smoother plot of the download rate, but this should work fairly well since the downloaded size increases almost linearly with time. Using this method, I can plot the reported download rate along with this calculated rate.

The green line is the reported download rate, and it is much smoother than the calculated rate. Why? Two reasons. First, this rate calculation method isn't the best. (Technically, it might be the worst way to calculate the rate.) Second, Safari may calculate its rate differently. If it uses the file size to calculate the download rate, it has many more data points to work with than I do. I recorded the screen capture at 15 frames per second but only analyzed one frame out of every 100 (a video analysis step size of 100). You really didn't think I would look at 20 minutes' worth of video frame by frame, did you?
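Here is a sketch of that simple finite-difference rate, plus the arithmetic behind the coarse sampling. Only the 15 fps frame rate and the step size of 100 come from my setup; the function name, the `lag` parameter, and the sample data are hypothetical. Setting `lag=2` reaches back two data points instead of one.

```python
FPS = 15    # screen-capture frame rate (from my setup)
STEP = 100  # Tracker step size: analyze one frame out of every 100

sample_interval_s = STEP / FPS            # about 6.67 s between analyzed frames
samples_in_20_min = 20 * 60 * FPS / STEP  # 180 analyzed frames instead of 18,000


def finite_difference_rate(times_s, sizes_mb, lag=1):
    """Backward-difference rate r_i = (d_i - d_{i-lag}) / (t_i - t_{i-lag}) in MB/s.

    lag=1 compares each point with the one just before it (the simplest method);
    lag=2 reaches back two points, averaging over a longer interval.
    The first `lag` entries are None because there is no earlier point to use.
    """
    rates = [None] * min(lag, len(times_s))
    for i in range(lag, len(times_s)):
        delta_d = sizes_mb[i] - sizes_mb[i - lag]
        delta_t = times_s[i] - times_s[i - lag]
        rates.append(delta_d / delta_t)
    return rates


# Made-up example data: time in seconds, downloaded size in MB.
t = [0.0, 5.0, 10.0, 15.0]
d = [10.0, 20.0, 32.0, 40.0]
print(finite_difference_rate(t, d))         # [None, 2.0, 2.4, 1.6]
print(finite_difference_rate(t, d, lag=2))  # [None, None, 2.2, 2.0]
```

Any single noisy reading of the downloaded size changes a one-interval rate a lot, which is part of why the calculated curve jumps around.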

Even if I use the two previous data points to calculate the download rate, it still looks pretty jumpy. But there is another problem. Let me zoom in on the end of this data rate plot.

Even a smoothed version of my calculated rate would still be higher than the reported rate. Is it possible that Safari reports the overall (average) rate up to that point instead of the instantaneous rate? Just to be clear, here is the calculation for the average rate and the instantaneous rate:
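For the i-th data point, with d_i the amount downloaded at time t_i (my notation for this sketch), the two calculations are:

$$
r_{\text{avg}} = \frac{d_i}{t_i}
\qquad\qquad
r_{\text{inst}} \approx \frac{\Delta d}{\Delta t} = \frac{d_i - d_{i-1}}{t_i - t_{i-1}}
$$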

There is one small problem. My data has a nonzero downloaded size at t = 0 s, since the download was already underway when my recording starts. If I just divide data size by time, I get something crazy, especially near t = 0. Since the downloaded size increases at a fairly linear rate at the start, I can extrapolate back to the time when it would have been 0 MB, which happens to be t = -11.64 s. Measuring time from that point instead, I get the following plot for the overall average data rate.
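One way to do that extrapolation is a straight-line (least-squares) fit to the early, nearly linear data, then solving for where the line crosses 0 MB. This is a sketch: I'm not claiming these exact points produced my -11.64 s value, so the choice of fit window is an assumption, and the example data below is made up to follow d = 2(t + 10).

```python
def zero_crossing_time(times_s, sizes_mb):
    """Fit a least-squares line d = m*t + b and return the time where d = 0."""
    n = len(times_s)
    t_mean = sum(times_s) / n
    d_mean = sum(sizes_mb) / n
    num = sum((t - t_mean) * (d - d_mean) for t, d in zip(times_s, sizes_mb))
    den = sum((t - t_mean) ** 2 for t in times_s)
    slope = num / den                    # MB/s
    intercept = d_mean - slope * t_mean  # MB at t = 0
    return -intercept / slope


def average_rates(times_s, sizes_mb, t0_s):
    """Overall average rate d / (t - t0) in MB/s, timing from when d was 0 MB."""
    return [d / (t - t0_s) for t, d in zip(times_s, sizes_mb)]


# Made-up early data that follows d = 2 * (t + 10), so it extrapolates to t0 = -10 s.
early_t = [0.0, 5.0, 10.0, 15.0]
early_d = [20.0, 30.0, 40.0, 50.0]
t0 = zero_crossing_time(early_t, early_d)
print(t0)                                  # -10.0
print(average_rates(early_t, early_d, t0))  # [2.0, 2.0, 2.0, 2.0]
```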

The blue line is the download rate as reported by Safari. It seems that Safari reports the overall average download rate, not the instantaneous rate. Oh, the two lines aren't exactly the same? I suspect that is because Safari also rounds to the nearest 0.1 MB/s.

How Do You Estimate the Time Remaining?
---------------------------------------

If it were up to me, I would use the instantaneous download rate to estimate the time left. I suspect that Safari uses the overall average data rate to get this estimate. Let's find out. With either rate, I think you would use the following formula to find the time left.
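As a sketch, assuming the rate r stays constant for the rest of the download:

$$
t_{\text{left}} = \frac{d - d_i}{r}
$$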

Here d is the total file size and d_i is the amount downloaded so far. The download rate r can be either the instantaneous or the average rate. This first plot shows the time-left calculation using the instantaneous rate along with the prediction from Safari.

And here is the plot using the overall average download rate to calculate the time:

It seems clear that Safari uses the average download rate to estimate the time remaining. Really, the only difference between the blue line (Safari) and the green line (my calculation) is that Safari rounds the time up to the next whole minute.

I guess using the average rate is the most appropriate choice. With the instantaneous download rate, the time remaining would jump all over the place, which would make some people quite unhappy.
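Putting the pieces together, here is a sketch of the estimate I think Safari produces: the overall average rate, the time-left formula, and rounding up to whole minutes. This is my reconstruction from the data, not WebKit's actual code; the function names are hypothetical, and the behavior under one minute (rounding seconds up) is a guess.

```python
import math


def average_rate(downloaded_mb, t_s, t0_s):
    """Overall average rate in MB/s, timing from t0 (when the download was at 0 MB)."""
    return downloaded_mb / (t_s - t0_s)


def time_left(total_mb, downloaded_mb, rate_mb_s):
    """Seconds remaining, assuming the rate stays constant: (d - d_i) / r."""
    return (total_mb - downloaded_mb) / rate_mb_s


def displayed_estimate(seconds_left):
    """Round the way the display appeared to: up to whole minutes, or seconds under a minute.

    Returns (value, unit). Rounding the seconds up is an assumption.
    """
    if seconds_left < 60:
        return math.ceil(seconds_left), "s"
    return math.ceil(seconds_left / 60), "min"


# Made-up example: 1000 MB file, 400 MB done, 200 s since the extrapolated start.
r = average_rate(400.0, t_s=188.0, t0_s=-12.0)
print(displayed_estimate(time_left(1000.0, 400.0, r)))  # prints (5, 'min')
```

Because of the rounding up, the displayed minute count changes exactly when the underlying estimate crosses a whole minute, which is the assumption I used when picking the change points.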

Conclusion
----------

Back to the question: Did the browser lie? I guess that depends on your definition of "lie." The time remaining was clearly wrong, but you can't blame the browser for not being able to see into the future. (That will be included in a future software update, though.) The other issue is the "download rate." I expected this to be the instantaneous rate (for no particular reason), but Safari was in fact reporting the average download rate.

What about other browsers? I have some data from Chrome's download progress (it doesn't show a bar), so I guess I can look at that too.

Actually, this is a nice example of a problem that students have with introductory physics. In lab, students will often collect position and time data. The goal will be to use this data to find the velocity of an object. There are two common ways students do this:
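Judging from the discussion that follows, the two methods are presumably position divided by time and change in position divided by change in time:

$$
\text{(1)}\quad v = \frac{x}{t}
\qquad\qquad
\text{(2)}\quad v = \frac{\Delta x}{\Delta t} = \frac{x_2 - x_1}{t_2 - t_1}
$$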

The first one is surprisingly common among students. Sometimes it works, but often it doesn't: position divided by time only equals the velocity if the object moves at constant velocity and is at position zero when t = 0. For some reason, students are strangely attracted to the idea that velocity is just distance over time. (I blame middle school math textbooks.) Of course, in the case of downloads, data divided by time does have a real meaning (the average download rate), provided that zero MB has been downloaded at t = 0 s.

Let me just give one preemptive comment (since I can see the future and know that someone will say it):

"Don't you know that Safari is based on WebKit? You can just look at the source code and see how it calculates time remaining. Do they actually pay you to write this stuff?"

My response is the usual one. What if I gave you a jigsaw puzzle? That would be nice, right? Who doesn't love a nice puzzle? Well, for this puzzle, you wouldn't even have to put it together. Why? Because the picture of the final result is right there on the front of the box.