Sports stars, musicians, actors—their salaries are often discussed as a matter of course. This is less true for authors, and it creates unrealistic expectations for those who pursue writing as a career. Now with every writer needing to choose between self-publishing and submitting to traditional publishers, the decision gets even more difficult. We don’t want to screw up before we even get started.
When I faced these decisions, I had to rely on my own sales data and nothing more. Luckily, I had charted my daily sales reports as my works marched from outside the top one million right up to #1 on Amazon. Using these snapshots, I could plot the correlation between rankings and sales. It wasn’t long before dozens of self-published authors were sharing their sales rates at various positions along the lists in order to make author earnings more transparent to others [link] [link]. Gradually, it became possible to closely estimate how much an author was earning simply by looking at where their works ranked on public lists [link].
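
As a rough illustration of how rank-to-sales snapshots like these could be turned into an estimate, here is a minimal sketch in Python. It assumes the relationship is close to a straight line on a log-log plot between observed points, which is my simplification and not a method stated here. The snapshot values and the name `estimate_daily_sales` are hypothetical placeholders, not figures from anyone's real sales reports.

```python
import math

# Hypothetical (rank, daily_sales) snapshots. Illustrative values only;
# they are NOT taken from any real sales report.
SNAPSHOTS = [(1, 5000.0), (100, 500.0), (10_000, 10.0), (1_000_000, 0.1)]


def estimate_daily_sales(rank, snapshots=SNAPSHOTS):
    """Estimate daily sales at a rank by log-log interpolation
    between the two nearest observed snapshots."""
    points = sorted(snapshots)
    # Outside the observed range, clamp to the nearest observation.
    if rank <= points[0][0]:
        return points[0][1]
    if rank >= points[-1][0]:
        return points[-1][1]
    for (r0, s0), (r1, s1) in zip(points, points[1:]):
        if r0 <= rank <= r1:
            # Fraction of the way from r0 to r1, measured in log space.
            t = (math.log(rank) - math.log(r0)) / (math.log(r1) - math.log(r0))
            return math.exp(math.log(s0) + t * (math.log(s1) - math.log(s0)))


if __name__ == "__main__":
    for rank in (1, 10, 100, 50_000):
        print(rank, round(estimate_daily_sales(rank), 2))
```

Multiplying an estimate like this by a book's price and royalty rate would give a ballpark for daily author earnings, with the accuracy depending entirely on how many real snapshots feed the table.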
This data provided one piece of a complex puzzle. The rest of the puzzle hit my inbox with a mighty thud last week. I received an email from an author with advanced coding skills who had created a software program that can crawl online bestseller lists and grab mountains of data. All of this data is public—it’s online for anyone to see—but until now it’s been extremely difficult to gather, aggregate, and organize. This program, however, is able to do in a day what would take hundreds of volunteers with web browsers and pencils a week to accomplish. The first run grabbed data on nearly 7,000 e-books from several bestselling genre categories on Amazon. Subsequent runs have looked at data for 50,000 titles across all genres. You can ask this data some pretty amazing questions, questions I’ve been asking for well over a year [link]. And now we finally have some answers.
When Amazon reports that self-published books make up 25% of the top 100 list, the reaction from many is that these are merely the outliers. We hear that authors stand no chance if they self-publish and that most won’t sell more than a dozen copies in their lifetime if they do. (The same people rarely point out that all bestsellers are outliers and that the vast majority of those who go the traditional route are never published at all.) Well, now we have a large enough sample of data to help glimpse the truth. What emerges is, to my knowledge, the clearest public picture to date of what’s happening in this publishing revolution. It’s a lot to absorb, but I believe there’s much here to learn.

I wonder why the $500,000 bracket is more prevalent among BPHs than among indie authors. It seems an odd anomaly.

Seems odd, no?

My take is that it is one of two things. Either it is a "small sample" artifact: so few titles generate that much author revenue that trends can't assert themselves (in which case future reports with more data will clarify the outlier trends). Or, possibly, the effect is real and due to the BPH bestseller-driven model and low royalties.

As in: BPHs only do marketing and promotion for select titles they give high advances to acquire so those titles actually get a unit sales boost from the trad publisher promotions. Some of the very highest selling titles make it to the million dollar mark but many get "stalled" by the low royalties. (As Konrath snarkily points out, the promo funding effectively comes out of the author's share rather than the publisher's.) In other words, the revenue generation distribution is bi-modal: it sees a step function increase in promotion efforts that boosts unit sales significantly (somewhere near the $500k levels) but once sales reach a given level any added promotion has diminishing returns and the lower royalty rate prevails.

The effect may be an artifact, but I'm inclined to believe it is real because trad publishers actively seek and promote likely outlier titles from name-brand authors; they want to control high-revenue books. So it makes sense to find a bigger fraction of high-income titles in their offerings. (They really don't do squat for midlisters.) But the books that can generate sales high enough to hit a million in income are few and far between, and fewer still at BPH royalty levels, whereas an outlier indie title captures more of its revenue as author income. In effect, many tradpub $500k titles would likely be $1m titles as indies, with likely lower unit sales but higher author net.

We'll have to see how things break down with more data, and whether the effect repeats in the next (quarterly?) report.

The one caveat to bear in mind is that this data is Amazon-specific, and the Amazon customer base sees more indie titles than other customer bases. The splits are going to be different elsewhere, especially when it comes to audiobooks outselling hardcovers.

As a reader, my main takes are that bestselling indie titles review better than equivalent-selling (or equally-priced) trad-pub titles so there really is no significant quality difference there, and that paying more for a tradpub title doesn't buy me anything.

The gatekeepers are bringing nothing significant to the table; a good story is a good story regardless of how the author chooses to bring it to market.

As a reader, my main takes are that bestselling indie titles review better than equivalent-selling (or equally-priced) trad-pub titles so there really is no significant quality difference there, and that paying more for a tradpub title doesn't buy me anything.

You can't have it both ways: either the books are "equally-priced" or you're "paying more". And if the prices differ, the reviews lose a lot of their validity. Compare consumer goods: cheap goods almost always get the same or better reviews, even if the higher price is more than justified (and even if the cheap ones are overpriced for what they offer).

So BPHs are aiming for that million-dollar pie in the sky and promote the hell out of the book when it is released, just to find out a month or two later that people are less enamored than publishers and editors. Pull the promotion to save some bucks, and the book lingers in the top tier without really making it sky high? Is that your take on it? While the same phenomenon never occurs among indie publishers because they don't have that kind of marketing budget available?

You can't have it both ways: either the books are "equally-priced" or you're "paying more". And if the prices differ, the reviews lose a lot of their validity. Compare consumer goods: cheap goods almost always get the same or better reviews, even if the higher price is more than justified (and even if the cheap ones are overpriced for what they offer).

But in general this is very interesting, thanks for posting!

It's not the same books in the comparisons.
Rather it is that it works out the same either way: if you look at equally priced books, the indies have better reviews; if you look at equal sales volumes, independent of cost, you get the same, confirming result, which takes price (which tends to be lower for indies) out of the discussion of ebook quality.

This report is popping up all over the author websites because it sheds some light on Amazon's closemouthed ways. It also explains why Amazon doesn't brag. They're fast approaching danger territory with ebook sales.

Perhaps because some independent authors are notorious for getting all their friends and family to post fake 5-star reviews?

No, I don't buy that. First, all authors are known to do that, not just self-published ones. Second, if you run the numbers and include only books with more than 100 reviews, so that a few friends are lost in the average, the relationship still holds.
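
For anyone who wants to rerun that kind of check on the raw data, here is a minimal sketch in Python. The field names (`publisher_type`, `rating`, `reviews`) and the sample rows are hypothetical placeholders, not the report's actual column names or figures; only the "more than 100 reviews" cutoff comes from the comment above.

```python
from collections import defaultdict

# Hypothetical rows. Field names and values are illustrative only
# and are NOT taken from the report's spreadsheet.
SAMPLE_BOOKS = [
    {"publisher_type": "indie", "rating": 4.5, "reviews": 250},
    {"publisher_type": "indie", "rating": 5.0, "reviews": 12},
    {"publisher_type": "big5", "rating": 4.0, "reviews": 400},
    {"publisher_type": "big5", "rating": 4.2, "reviews": 150},
]


def mean_rating_by_type(books, min_reviews=100):
    """Average star rating per publisher type, counting only books
    with more than `min_reviews` reviews."""
    totals = defaultdict(lambda: [0.0, 0])  # type -> [rating sum, book count]
    for book in books:
        if book["reviews"] > min_reviews:
            entry = totals[book["publisher_type"]]
            entry[0] += book["rating"]
            entry[1] += 1
    return {kind: total / count for kind, (total, count) in totals.items()}


if __name__ == "__main__":
    print(mean_rating_by_type(SAMPLE_BOOKS))
```

Raising or lowering `min_reviews` shows how sensitive the comparison is to books with only a handful of reviews.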

So BPHs are aiming for that million-dollar pie in the sky and promote the hell out of the book when it is released, just to find out a month or two later that people are less enamored than publishers and editors. Pull the promotion to save some bucks, and the book lingers in the top tier without really making it sky high? Is that your take on it? While the same phenomenon never occurs among indie publishers because they don't have that kind of marketing budget available?

Yes.
But also that there is a limit to how much you achieve even with active promotion.
Diminishing returns and all that. So they tend to cluster at the great but not exceptional level.
The top revenue generators--the jackpot! books--seem to get there independent of path or promotion. Once you pass a certain threshold, people buy the book simply because people are buying the book. (Kinda like 50 Shades or Jonathan Livingston Seagull, back in the day.)

Perhaps because some independent authors are notorious for getting all their friends and family to post fake 5-star reviews?

Shrug.
Nobody forces you to believe anything.
But the study includes the raw data and it will be analyzed to death in the days to come.
Those open to numerical analysis will consider the data; those that aren't will keep on denying. Doesn't matter.
In the end, we as readers aren't the target audience.
It is midlist authors and newcomers.
It will only affect us over time as authors start going indie more often or use the data to get the publishers to actually negotiate living wage royalties so they can keep on writing.

My remaining question for the article is that most of the data comes from the top 7,000 best sellers. How hard is it for an author to get up in that bracket (for each category)? How much noise gets introduced into the data if you start increasing that number? Is it even possible?

Actually, on any given day some subcategories can be headed by a title with single-digit sales. Others might need thousands. Amazon slices and dices its categories to help buyers find books, so there are lots of categories to slot books into.

This data set is a time slice snapshot.
Further slices will provide further snapshots to enable time-based analysis.