“In 1845, the United States was largely an agrarian society. Farmers often needed a full day to travel by horse-drawn vehicles to the county seat to vote. Tuesday was established as election day because it did not interfere with the Biblical Sabbath or with market day, which was on Wednesday in many towns.”

When you evaluate projects to work on, particularly if it is not a personal project, it helps to give a lot of weight to how ‘Mission critical’ the project is. You might be developing in the same technology stack, but there is usually an order-of-magnitude difference between what you learn while working on a Mission critical project and what you learn in a normal project. Everything that goes into developing a system that just cannot go down will prove crucial for you in the long run.

File this in the same folder as A New Metric For Startup Job Postings. The parameters that developers look for when choosing an assignment are changing fast. Number of users is just one – Mission criticality is another – we will see more!

For Interesting Statistics Everyday, Find Statspotting on Facebook and Follow Statspotting on Twitter

“Now let me answer your question about why songs cost 99 cents (or 88 cents or 79 cents, but not usually 49 cents). Selling songs legitimately consists of 3 components: the cost of the recording, which we usually pay to the record company (who then pays the artist); “publishing” cost which goes to the company that owns the rights to the musical composition (who pays the song writer); and other costs such as credit card fees, bandwidth, and technical support.

While wholesale prices vary depending on the label, today most labels charge approximately 65-70 cents per song. Publishing costs a fixed rate of about 9 cents per song. And the other costs average a few pennies per song. Thus, as we have made clear, selling every song in our store for 49 cents a song is not sustainable unless/until the labels change their pricing philosophy.

Based on the data we’ve seen, we think, long-term, the pricing that will result in the biggest overall market for music will involve some kind of tiered pricing: new mainstream songs at 99 cents retail, and up-and-coming artists and back-catalog artists at a lower price.

We are working with the labels to prove this to them. We think over time we will succeed, but it will take time. The more that customers support our efforts both directly (by voting with your wallets) and by communicating directly to the music industry, the better.”
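The per-song arithmetic in the quoted letter is easy to check. Here is a minimal sketch; the label and publishing figures come from the quote, while the "other costs" value is an assumption standing in for the quoted "few pennies":

```python
# Per-song economics from the quote above. The "other costs" figure
# (credit card fees, bandwidth, support) is an assumed value for the
# quoted "a few pennies"; the rest are taken from the letter.
LABEL_WHOLESALE = 0.70   # upper end of the quoted 65-70 cent range
PUBLISHING = 0.09        # fixed rate per the quote
OTHER_COSTS = 0.04       # assumption: "a few pennies"

def margin(retail_price):
    """Retailer's gross margin per song at a given retail price."""
    return round(retail_price - (LABEL_WHOLESALE + PUBLISHING + OTHER_COSTS), 2)

print(margin(0.99))  # 0.16 -> thin, but positive
print(margin(0.49))  # -0.34 -> a loss on every single song sold
```

Which is exactly the point of the letter: at 49 cents retail, every sale loses money until the labels change their wholesale pricing.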

Most of our decisions these days are data-driven. Or at least, we think they are. The real issue is with data-capture. Let me give an example.

When I drive back from work, I have a choice between taking the freeway and avoiding it altogether. Sometimes I get back faster when I take the freeway, but not always. Every day, I would like to make this decision based on probabilities. But I have no recorded data, and hence no probabilities.

Now you can extend this pain point – as we move towards a data-driven future, hundreds of decisions we take in our everyday lives would be driven by data such as this. The data-capture is a pain.
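The freeway decision above is a good illustration of how little capture is actually needed. A minimal sketch, with hypothetical route names and commute times: log each trip, then estimate the probability that the freeway beats the surface route's average.

```python
# A minimal sketch of the data-capture the post asks for: log every
# commute, then estimate how often the freeway beats the surface route.
# Route names and trip times below are hypothetical.
trips = []  # each entry: (route, minutes)

def log_trip(route, minutes):
    trips.append((route, minutes))

def avg(route):
    times = [m for r, m in trips if r == route]
    return sum(times) / len(times)

def prob_faster(route_a, route_b):
    """Fraction of logged trips on route_a that beat route_b's average time."""
    baseline = avg(route_b)
    times_a = [m for r, m in trips if r == route_a]
    return sum(1 for m in times_a if m < baseline) / len(times_a)

# A hypothetical week of commutes
for m in (25, 40, 22, 35, 28):
    log_trip("freeway", m)
for m in (30, 32, 31, 29, 33):
    log_trip("surface", m)

print(prob_faster("freeway", "surface"))  # 0.6 on this sample
```

A few weeks of entries like this would turn the daily guess into an actual probability.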

More examples:
Should You Take This Rest Area or the Next? What is the probability of the flight getting delayed?
What is the probability of that Craigslist item getting sold, in what timeframe?
Based on past history, what would the hotel room availability be – better still, what is the probability that this person would check out by 10 AM, etc.?

(You: “I need to check in early”. Hotel Receptionist: “Well, we can try. Based on past data, there is an 80 percent chance that the person there would be out by 10 AM.”)

The summary is this: why do we never capture data that can be captured, and use it to make better decisions?
(On air ticket fares, this is already being done.)

Someday, Google Glass could show us, like the on-screen odds in televised poker tournaments, the appropriate percentages for different actions everywhere.

But for now – How do you solve this data-capture pain point?


We have definitely been hearing the term ‘Big Data’ quite frequently these days – but what scale are we really talking about? We had written some posts on how big cloud computing really is – and we spotted some interesting stats on how big Big Data really is, not in terms of specifics, but in terms of scale.

“Metric prefixes rule the day when it comes to defining Big Data volume. In order of ascending magnitude: kilobyte, megabyte, gigabyte, terabyte, petabyte, exabyte, zettabyte, and yottabyte. A yottabyte is 1,000,000,000,000,000,000,000,000 bytes = 10 to the 24th power bytes.

Big data can come fast. Imagine dealing with 5TB per second as Akamai does on its content delivery and acceleration network. Or, algorithmic trading engines that must detect trade buy/sell patterns in which complex event processing platforms such as Progress Apama have 100 microseconds to detect trades coming in at 5,000 orders per second.

Flavors of data can be just as shocking, because combinations of relational data, unstructured data such as text, images, and video, and every other variation can cause complexity in storing, processing, and querying that data.”
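The figures in the quote are worth sanity-checking with some back-of-the-envelope arithmetic (the per-day and inter-arrival numbers below are derived from the quoted rates, not from the quote itself):

```python
# Back-of-the-envelope checks on the scale figures quoted above.

# Decimal (SI) prefixes, as the quote uses them: each step is 10^3.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]
for i, name in enumerate(PREFIXES, start=1):
    print(f"1 {name}byte = 10^{3 * i} bytes")

yottabyte = 10 ** 24
print(yottabyte == 1_000_000_000_000_000_000_000_000)  # True

# Akamai's quoted 5 TB per second, expressed per day:
tb_per_day = 5 * 60 * 60 * 24
print(f"5 TB/s is {tb_per_day:,} TB/day")  # 432,000 TB/day, i.e. 432 PB/day

# At 5,000 orders/second, orders arrive every 200 microseconds on average,
# so the quoted 100-microsecond budget is half the inter-arrival gap.
gap_us = 1_000_000 / 5_000
print(gap_us)  # 200.0
```

Five terabytes a second works out to nearly half an exabyte every three days – which is the kind of scale the quote is gesturing at.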

“If Standard & Poor’s downgrades America’s debt, the other two big credit-raters are likely to follow. The result: You’ll be paying higher interest on your variable-rate mortgage, your auto loan, your credit card loans, and every other penny you borrow. And many of the securities you own that you consider especially safe – Treasury bills and other highly-rated bonds – will be worth less.

In other words, Standard & Poor’s is threatening that if the ten-year budget deficit isn’t cut by $4 trillion in a credible and bipartisan way, you’ll pay more – even if the debt ceiling is lifted next week.”


We have written about how the government measures unemployment before – but the jobs report, the one that quotes the number of jobs gained or lost in a particular month, gets revised so many times afterwards that a lot of people are questioning whether the report can ever be close to accurate. A Time article on this more or less summarizes the issue:

“Revisions are of course the norm with government reports, but most of those reports don’t get the attention that the jobs report gets. Worse, the size of the revision seems to matter more at times when the economy is weak. A difference of 50,000 jobs, which is well within what the government deems an acceptable error, can really change the employment picture at a time when the economy is producing fewer than 100,000 jobs a month.”

“Making matters worse, the monthly numbers we get on the jobs market come from not one, but two surveys. The number of jobs comes from something called the establishment survey, which is done by asking 140,000 businesses and 440,000 workplaces to fill out a form in the middle of the month detailing how their payrolls have changed. The government takes that data and makes an estimate of how many people are employed in the entire country. The unemployment number, which is also released on the first Friday of the month, comes from a survey of 60,000 households. Despite the smaller sample, the unemployment rate often gets more attention than the jobs tally. Worse, at times the two surveys can say opposite things, which is exactly what happened last month when the unemployment rate rose, but so did the number of people with a job. The differing data can leave economists scratching their heads.”

“Despite the warm and fuzzy reaction to the surprisingly good jobs number, there was another very important number that seemed to suggest the economy is far from healed: The unemployment rate. The jobless figure rose to 9.0% in April, from 8.8% the month before. And while some economists were expecting the number to jump even while the economy picked up, April’s jump was worse than expected. Here’s why:

Month after month economists have been expecting the unemployment rate to rise, even at the same time predicting the economy would improve. How is that? Well, the unemployment rate tracks the number of people who are looking for work, not the number of people who are out of work. If you are out of work and not looking, well then you are basically invisible to the government (that’s certainly a problem, but another story). So as the economy improves, more people get encouraged they will find a job and more people look for work. Presto: The unemployment rate jumps. Good sign, right?

It would be, if that was what happened in April – which it was not. In fact, the number of people in the labor force not looking for a job actually rose. At the same time, the number of people who want a job and can’t get one rose by more than 200,000. So there is no easy way to explain away the unemployment rate, other than to say that, at least by this measure, the jobs market was worse in April.”

For many new services on the web, audience measurement is getting tougher since most of the usage is on mobile devices like smartphones and tablets. comScore seems to have solved this problem with its latest solution.

[From comScore Press Release]

” comScore, Inc. (NASDAQ: SCOR), a leader in measuring the digital world, today announced the beta release of the comScore Media Metrix Total Universe report, which provides audience measurement for 100 percent of a site’s traffic, including usage via mobile phones, apps, tablets and shared computers such as Internet cafes. This never-before-available report, which will be available to comScore Media Metrix subscribers, will be released with April data in the U.S. and U.K. (with other global markets being released in subsequent months) for all publishers currently leveraging the comScore Unified Digital Measurement™ (UDM) tag. The initial report features standard comScore Media Metrix key measures, such as unique visitors, reach, and page views, providing an unduplicated view of site audiences across multiple media platforms.”

Did you know that the US Government measures unemployment using a survey, and not by counting the number of people claiming unemployment benefits?

“Because unemployment insurance records relate only to persons who have applied for such benefits, and since it is impractical to actually count every unemployed person each month, the Government conducts a monthly sample survey called the Current Population Survey (CPS) to measure the extent of unemployment in the country. The CPS has been conducted in the United States every month since 1940, when it began as a Work Projects Administration project. It has been expanded and modified several times since then. For instance, beginning in 1994, the CPS estimates reflect the results of a major redesign of the survey. (For more information on the CPS redesign, see Chapter 1, “Labor Force Data Derived from the Current Population Survey,” in the BLS Handbook of Methods.)

There are about 60,000 households in the sample for this survey. This translates into approximately 110,000 individuals, a large sample compared to public opinion surveys which usually cover fewer than 2,000 people. The CPS sample is selected so as to be representative of the entire population of the United States. In order to select the sample, all of the counties and county-equivalent cities in the country first are grouped into 2,025 geographic areas (sampling units). The Census Bureau then designs and selects a sample consisting of 824 of these geographic areas to represent each State and the District of Columbia. The sample is a State-based design and reflects urban and rural areas, different types of industrial and farming areas, and the major geographic divisions of each State. (For a detailed explanation of CPS sampling methodology, see Chapter 1, of the BLS Handbook of Methods.)”
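The quote's point about sample size – 110,000 individuals versus the fewer-than-2,000 typical of opinion polls – can be made concrete with the standard error of a proportion. This sketch assumes simple random sampling for illustration; the CPS's actual state-based, stratified design is more complex:

```python
# A rough sense of why the CPS sample size matters: the standard error
# of an estimated proportion under simple random sampling. The real CPS
# uses a complex stratified design, so this is an illustration only.
import math

def standard_error(p, n):
    """Standard error of a sample proportion p with sample size n (SRS)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.09  # an illustrative 9% unemployment rate
for n in (2_000, 110_000):
    se = standard_error(p, n)
    print(f"n={n:>7,}: SE ~ {100 * se:.2f} percentage points")
```

At n = 2,000 the estimate is uncertain by roughly two-thirds of a percentage point; at n = 110,000 it shrinks to under a tenth of a point – the difference between a headline-moving swing and noise.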