The Business Rusch: Not A Real Survey

In the traditional news cycle, politicians, major organizations, and anyone with a PR brain dump stories into the week before Memorial Day (and sometimes the week after) in the hopes that no one will notice. Generally speaking, no one does notice because most of America is focused on the three-day weekend and the unofficial start of summer. Since this is a US-only holiday, the rest of the world goes on and wonders why the US drops out for a few weeks. Europe, for example, seems to take its summer vacation in August (all of August, in some countries), rather than piecemeal across a three-month period.

Which is why most of the interesting discussions in the past week have taken place in Europe or on the blogosphere. (Although, even on the blogosphere, it seems, Americans vanish. My readership is always down around Memorial Day, and it creeps back up slowly as the summer months progress. People look at more pages during that time too, apparently catching up on all that they missed.)

I’ll get to the European discussion in a moment. First, though, a story that got dumped into the silence that is the week before Memorial Day.

Remember last February everyone involved in traditional publishing pointed to the big flap between the distributor Independent Publishers Group (IPG) and Amazon? IPG pulled all of its titles from Amazon because of a contract negotiation in which Amazon demanded something IPG considered unreasonable.

When IPG pulled the plug on Amazon, it left countless authors who were published through small and regional presses without any ability to sell on Amazon. The blogosphere went nuts, fueled partly by traditional publishing and partly by these writers, stuck in a situation outside their control.

Through it all, Amazon was portrayed as the Big Bad, asking for something terribly unreasonable, and the only choice IPG had was to pull books. I blogged then about IPG doing what any good business does in a tough negotiation. (If you follow the link, note how dated that post is: all of the flaps have ended and have mostly been forgotten.)

In that negotiation, IPG upped the ante. IPG said that if Amazon wanted to play hardball, IPG would play too. IPG didn’t cave; it continued to negotiate.

Three months passed. Occasionally, IPG’s CEO would blog about how difficult the negotiations were, but those negotiations never ended. And, magically, last week, the two companies came together, signed a new contract, and voila! IPG’s books are for sale on Amazon again. Just like I told you they would be. Everybody screamed about Amazon, when Amazon was doing what businesses do—negotiating terms. IPG never quit negotiating either.

IPG’s president refused to discuss the terms with the media, and told his clients (who I assume are the small publishers, not the writers), “I feel that the experience has clarified some things for us and our clients, and that now we are all even better equipped to navigate through this rapidly changing industry.”

The story was dumped onto Friday afternoon so that no one in the industry would notice. Writers will discover their titles are back up during the next few weeks, and everyone will forget that IPG was part of the big Amazon-Is-Evil flap.

The point of this week’s blog isn’t the IPG flap, but the way that the media can be manipulated. It happened last week with The Taleist’s survey of indie-published writers.

On Thursday, The Guardian in the UK published an article with this headline: Stop the press: half of self-published authors earn less than $500. Initially, I blamed The Guardian for that headline, not because the information was incorrect or even because it’s inflammatory. (Which it isn’t, given what the Taleist wrote—but I’ll get to that.)

It was because I had taken that survey. And while I knew that the questions were incomplete (and somewhat biased), I also knew that the survey writers were trying to be comprehensive. I think it took me 15 minutes to fill out the damn thing.

I figured the Guardian’s reporter, Alison Flood, didn’t drill deeply into the numbers. Maybe she did, maybe she didn’t. But I must say that the bias in the article isn’t hers. If anything, she toned down the results of the survey.

Let me explain something, ancient recovering reporter that I am. When Reporter A (who used to be me) is working under deadline, she doesn’t have time to read 60+ pages of a newly published survey. She must rely on the conclusions published with the survey. She will dig into the numbers that fascinate her, or might seem newsworthy, but she doesn’t read the entire survey.

For the past two days, I’ve been trying to read the entire survey. I bought the e-book (yes, they released it as an e-book) so you don’t have to. I’m not going to link to this because of the commentary and the bias. If you want to spend your own money reading the numbers, go ahead. But I’m not going to encourage it.

Not because I’m mad at the survey. The guys at the Taleist did something that needs to be done. Someone needs to survey writers who are publishing outside of traditional publishing, and get some hard facts and figures.

Unfortunately, the Taleist’s survey is not that survey.

First, problems with methodology, which the Taleist guys freely admit.

1. The survey is self-selecting. Any time you get a self-selecting survey you immediately run into the problem of bias. Bias can cut two ways. You’ll get the people who love, love, love whatever it is, and you’ll get the people who tried it, hated it, and want to tell everyone about whatever it is. The Taleist guys know that’s a problem, and they tried to compensate for it. Unfortunately, they compensated through their bias, and that caused additional problems, which I will get to.

2. The survey is anonymous. Well, hell, I could have participated five times from five different computers if I wanted to. You know the old Chicago joke: Vote Early and Vote Often. There’s no accountability in an anonymous survey. The survey writers don’t even know who participated. And they didn’t try to verify any information they got.

This is not unusual in such surveys, by the way. And yet those of us who love numbers will often take this stuff as gospel. It’s not. Because I could, in my five different responses, give five different numbers, none of them accurate, and no one would know.

3. The sample size is too small. The sample size is 1,007 people. The Taleist guys tried. They really did. I saw notifications about this all over the web, and I came over to take the survey. I did not, however, tell a listserve I’m on with dozens of indie writers (all of whom are making money), nor did I tell Dean or anyone else about it. In other words, by myself with some prodding, I could have added 10% more respondents to the list without even trying.

1,007 is a ridiculously small percentage of indie writers. Let me show you why. Mark Coker, in his year-end blog post on Smashwords, a distribution service that many indie writers do not use (preferring to publish only on Kindle), says that 38,000 writers and small publishers around the globe used his service in 2011. That was up from 12,000 in 2010. I’m sure more use it now.

Look at those numbers: 38,000 writers and small publishers who actually know about Smashwords. Most indie writers use Kindle Select or other exclusive services, which means that they do not use Smashwords because of the exclusivity. I’m not even going to hazard a guess as to how many writers and small publishers there are publishing outside of traditional publishing right now because I will be wrong. But 1,007 is a terrible sample size. [Later: see the comments for notes from statistics folks on sample size.]
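For what it’s worth, the standard margin-of-error arithmetic the statistics folks in the comments are referring to can be sketched in a few lines of Python. The formula assumes a simple random sample, which a self-selected survey is not, so the numbers below are illustrative only:

```python
import math

# 95% margin of error for a proportion estimated from a simple random
# sample: MoE = z * sqrt(p * (1 - p) / n), with z = 1.96 at 95% confidence.
# Caveat: this assumes random sampling. A self-selected survey gets no
# such guarantee, no matter how large n is.

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    return z * math.sqrt(p * (1 - p) / n)

n = 1007
worst_case = margin_of_error(0.5, n)  # p = 0.5 maximizes the error
print(f"n = {n}: margin of error about +/- {worst_case:.1%}")
```

In other words, for a genuinely random sample, 1,007 respondents would pin down a percentage to within about three points; the real weakness of this survey is the self-selection, not the raw headcount.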

4. The questions were bad. Many, many times as I took this survey, I had to choose between set answers that did not apply to me at all. I had to pick the least offensive one.

Questions always show a survey writer’s bias. And ironically, the Taleist guys had some serious pro-traditional publishing biases from the start. I say ironically because they indie-published the results, and if they sell about 150 copies of their survey (which I’m not helping with), they will easily make that $500 that they’re talking about as the median income for writers.

(And, speaking of bias, these guys titled the survey results Not a Gold Rush. Apparently, they thought indie-publishing was a gold rush. Why else would they have chosen that title considering all the other things this survey asked about?)

The Taleist guys have no idea what real writers are like. How much they write, how much they publish (traditionally), and how long their careers last. These guys are shocked that most of the respondents were women of a certain age, who had been in the business a long time. (Maybe longer than the survey writers had been alive.) The survey writers postulated it was because a lot of romance writers responded.

Nope, guys, it’s because your survey attracted real writers. The folks who make a living at it.

Here’s a sign of bias: When the survey writers asked what people did for a living, they did not allow the respondent to answer that they were self-employed. I remember that.

They write, “We focused on hours of employment and did not offer an option for self-employment. We both know people for whom self-employment might mean 100 or more hours of work per week and for others it might mean writing website copy.”

Okay, fine. But taking self-employed off the list of choices was just silly. Because they still could have looked at hours worked and measured it, then pointed it out in the responses. (Half of the self-employed people only spent 100 hours at their jobs—and oh, look, they were the ones who earned less than $500 that year. D’oh.)

I remember staring at that for a while, wondering what the hell these guys meant. I couldn’t answer the employment question honestly, and then the survey asked me how much I worked at whatever it was I did? Confusing. It pissed me off. I’ve been self-employed my entire life. I would have quit responding right there, but I had already invested too much time into it, so I finished.

I had these kinds of problems over and over and over again with this survey. The questions were awful.

5. The survey writers used their bias to tally the results. From the overview, “We believe there are also mistakes in the answers. Question 17 asked, ‘How many books have you self-published by year?’ One respondent wrote ‘16,000’ which was enough to set the average number of books self-published by author in 2011 as 112 with the help of a few similar responses. While we speculate later in this report that authors have responded to the new self-publishing channels by pulling manuscripts from the bottom drawer to publish, it would be a hell of a career (or bottom drawer) that held 16,000 manuscripts…Even publishing 112 manuscripts would require publishing more than two books a week.”

Clearly that 16,000 was a mistake. But they’re doubting 112. Dean and I have published more than 270 e-books since 2010, and we still have a lot of backlist to go. For those of you who are math-challenged, that’s 135 books per year. (Although it didn’t work out that way. It was closer to 100 in year one and 170 in year two.)
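To illustrate the arithmetic (with invented numbers, not the survey’s actual data): a single mis-typed entry like that 16,000 can drag an average anywhere, which is exactly why analysts usually report a median, or publish the raw data, alongside any trimmed average:

```python
from statistics import mean, median

# Hypothetical data: 1,006 respondents each reporting 3 titles published
# in 2011, plus one respondent who mistakenly typed 16,000.
responses = [3] * 1006 + [16_000]

print(f"mean   = {mean(responses):.1f}")  # one outlier drags it to ~18.9
print(f"median = {median(responses)}")    # stays at 3, untouched
```

The median sidesteps the outlier without anyone having to decide which responses to throw away.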

Here’s the kicker:

“We assume that the respondents misunderstood the question as ‘How many books have you sold by year?’ Where we found individual responses that significantly affected the results and that we did not believe to be accurate, we have corrected the data by filtering out these responses.” [Emphasis mine]

In other words, they probably believed that our 270 manuscripts were impossible to do in a year, so they filtered it out. And God knows what else they filtered out.

So not only is the data gathered suspect, but the report is as well. They tampered with the evidence they gathered.

Bad questions, bad methodology, bad analysis.

If The Guardian reporter, and all of the others who called this survey, in the words of The Guardian, “one of the most comprehensive insights into the growing market to date,” had actually read the report, they would have realized this survey is invalid.

Which is a shame. Because initially, when I planned to write this blog, I was going to discuss the hopeful stuff in the survey that you can find without the bias. I mean, if the information were accurate—and it’s clearly not—then average earnings of $10,000 for a self-published author, with a median of $500, would be incredibly good.

Because if, ten years ago, you had done the same kind of survey of writers who had published at least one thing over three years, the average earnings might have been higher (depending on who responded), but the median would have been lower. Most writers who published one or more things in three years earned nothing in two of those years. Nothing.
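A quick sketch, with numbers I made up, shows how an average near $10,000 and a median of $500 can coexist: most respondents sit near the bottom, and a small top tier pulls the mean way up:

```python
from statistics import mean, median

# Invented earnings for 1,000 writers, shaped like the survey's claim:
# most earn very little, a small top tier earns a lot.
earnings = ([100] * 450        # 450 writers earning $100
            + [500] * 100      # 100 earning $500
            + [2_000] * 350    # 350 earning $2,000
            + [90_000] * 100)  # the top tier carries the mean

print(f"mean   = ${mean(earnings):,.0f}")
print(f"median = ${median(earnings):,.0f}")
```

With this made-up distribution the mean lands around $9,800 while the median stays at $500, so a headline quoting only one of the two numbers tells you very little.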

And as to some of the other conclusions, that clearly the traditional publishing gatekeepers work, because the writers who are doing the best were traditionally published—hogwash.

What that means is that we traditionally published writers have a built-in backlist and at least 1,000 true fans—the folks who buy our work repeatedly. It takes more than three years to build up that kind of audience, and it takes publishing more than one thing.

It has nothing to do with gatekeeping. It has to do with readership. Ask the same question of indie writers five years from now (in a legitimate survey) and watch that number go up, as indie writers build an audience by publishing a lot.

We need a good comprehensive survey of writers who are stepping outside of traditional publishing. Mark Coker can’t crunch his numbers and reveal this information because Smashwords doesn’t have access to the numbers for writers who go direct to Amazon or Barnes & Noble or the iBookstore.

But his survey would be a hell of a lot more accurate than this Taleist one is.

I would love to know how well indie writers and small publishers are doing. Unfortunately, it’ll take years to find out. The normal ways of learning this information are no longer available. In the past, writers organizations used to survey their members, but most writers organizations do not accept indie published writers even if their work sells tens of thousands of copies per month. Surveying writers with careers longer than five years doesn’t work because the indie publishing revolution isn’t that old yet.

So we’re going to have to wait. But when someone quotes this survey and makes it sound like indie writers aren’t doing well, tell that someone about the methodology. This survey isn’t valid, no matter what the press says.

This blog wouldn’t exist without new ways to crowdsource a project. I write every week because you guys return and because you donate to keep me writing nonfiction. I make my living off my fiction writing—and have for decades—so writing nonfiction is always a risk.

You guys have kept the blog alive for more than three years with your comments, e-mails, links, and donations. So if you found something useful in this post, please leave a tip on the way out. That’ll guarantee there will be more posts in the future.

83 Comments

The biggest problem I have with this survey is it doesn’t compare / contrast with traditionally published authors. To that end…I’ve created my own survey…designed for ALL Authors: Self, traditional big-press, traditional small-press, hybrid. Even for those who have not published yet – as I want to know what “route” they are anticipating going.

If you are an author – please help contribute to the dataset by taking the survey at:

I hate seeing all the blog posts quoting information from this extremely flawed survey. Does anyone know if the “raw data” for it is published anywhere? It would be interesting to look at that and try to do some other analysis of the data – like filtering out people who are “professionally” self-publishing versus hobbyists.

Kris,
Excellent post as usual. It reminded me of the stats class I took some time ago. On the first day the instructor said “Statistics are what you make of them.” He said it with a smile.

The last two days of the class consisted of lessons on how to count cards and how to skew survey numbers to your advantage. It was at that point I understood the smile. Those last two days were worth the whole semester.

By the way, your post (to me) is black font on a light grey background. The comments are black on a darker grey background and use a smaller font. A little hard on the eyes. I’m currently rebuilding my wordpress site also so I feel your pain. I would suggest that you make your comment box taller too. But I like what I see so far.

Looks like I’ll have to get a new theme, because this one just doesn’t work. Thanks, Randall. Apparently some browsers get it completely differently. {sigh} And I love your recounting of the stats class.

Fascinating. And thank you for taking the time to write this up and for folks to comment who are far more knowledgeable on statistics than I am. I would love to see a survey that has these issues fixed and has information that is, you know, worth my time.

Sadly about the only thing I know how to do is set up a livejournal poll but I don’t think that quite counts. 🙂 (I could try though if you want me to :P)

@Dave, there’s a lot of folks here with good suggestions and information backed by years of experience, I would highly recommend you pay attention.

As always, Kris, thank you for taking the time to put your blog together, and thank you fellow commentors for your input. I learn from you guys/gals as well as Kris.

Okay, I haven’t read all the comments, but I’ve been reading about this all over the Internet lately. My opinion of this survey can be summed up this way: Of the 1,007 people who participated, these are the results. Maybe.

Using statistics to stretch this to include all people who self publish is ludicrous, especially as you point out, since it’s a bad survey.

Are some people making money self-publishing? Sure. Are some people not making much, if any, money? Sure. But there are a lot of factors involved that apparently did not make it into this survey, so the results are pretty much meaningless, and certainly not worth $4.99.

The issue of not accepting ‘self employed’ as an answer will have made many professional writers give up the survey. It’s therefore not surprising that few of the respondents make their living writing – those who do were effectively excluded.

You’ll be able to compare the results of your survey with the Taleist results, from Chapter 3. As at January 2012, when asked when they first self-published, the respondents were split as follows:
2011 53%
2010 20%
2009 7%
2008 or earlier 20%

Well, not usefully. “What year did you begin self-publishing?” is a different question to “How long have you been self-publishing?”. What your breakdown there tells me is that 50% of your respondents had been self-publishing anything from one month to one year. Not being able to separate out a self-publisher who has been published one month when calculating the average earnings of self-publishers does make the data far less useful.

I earned around $20 my first month of self-publishing. I earned about $6,000 in my first year – most of that toward the end of that year.

As one of the authors of the survey report, I thought it would be worthwhile to make a few comments.

Firstly, there are two fundamental errors in your analysis of the survey that I’d like to correct.

1. Bias, tampering, excluding data “willy-nilly”, or standard data checking techniques?
We saw inconsistencies in responses – for example, someone who “published” 16,000 books in 2011 had a very small number of “total books self published in career” at another question – so an adjustment was pretty easy to make for a small number of authors who misread that question. Of course, we could have just used the raw data and said that the average respondent published 112 new titles in 2011, even though the average total number published in their whole career was single digit. Now that would have been bad analysis.

Well done you to have that much in your backlist, but assuming that is what everyone has is exposing your own bias, isn’t it?

It’s also worth noting that the person who answered “16,000” has identified himself publicly — in Twitter and on his blog — to confirm his error. It certainly seemed unlikely, as one commentator suggested, that someone who scrapes Wikipedia articles for publication would bother to fill in our survey.

2. Taking the survey many times
The survey tool, which is widely used for all sorts of research, doesn’t allow people to respond twice from the same computer (ie IP address). So, I guess if you took the survey from your phone, iPad and PC, you might be able to do it three times – but why would you want to? So, it’s not completely watertight, but the survey tool is widely used and I can tell you that there were no duplicate IP addresses in the responses.

“Tampering with data” and “multiple responses” seem to be prominent in your critique, and the following comments, which is unfortunate. I’ll pick up on some other things from the discussion as follows:

Income
We asked authors about their total income in calendar 2011. Given that over half of them self-published for the first time in 2011, of course the figures are “early”, as discussed in the report – it’s new, but some of those who just started are still doing really well.

We also asked them about the costs, date of publication (inc in last month if that was how long since it was published) and royalties of their most recent books, and did some projections on those to estimate break even.

A future survey based on 2012 income and comparing to 2011 income will be interesting, although I suspect that there will be a large number of 2012 first timers.

Statistics
Lies, damn lies, and statistics? The survey is what it is – a summary of the responses of self-selected self-publishers from around the world. Best we’ve got at the moment. The cost of randomly selecting and recruiting 1,000 self-publishers for a survey, given how small the population of that group is out of the total population, would be prohibitive without access to corporate information or corporate resources – like from amazon – but you’d probably say they were biased?
The survey doesn’t pretend to present a statistically representative sample of the self-publishing population, and quoting a “level of confidence” would have created the impression that it did. I guess if you don’t find a window into the lives of 1,000 self-publishers interesting, that’s fine.

Meanwhile, everyone wants more data, but some respondents complained the survey was too long or didn’t finish it – it’s a balance. It’s sure not “Well, I’ve done this, and my two friends have done that . . .”

As for publishing “raw data”, the responses and percentages to each question only tell you so much – it’s the combinations of responses that are interesting. For many things that we analysed, the data didn’t show connections we thought might be there. So we didn’t make them up or fabricate them. Or change the data to show things that weren’t there. Or have any bias against Romance writers – where did that come from?

As to the questions, we concede in the report that a few of the 61 questions didn’t work. Others didn’t give any differentiating responses so we wouldn’t use them again. The survey was beta tested with self-publishing authors before it was released, and given how many questions there were, the percentage that made it all the way to the end was higher than I’ve seen in other studies.

The millionaire next door?
The reason we were somewhat circumspect about the portrait of Top Earners was both because of how many (ie few) there were, but also as differences between them and “the rest” weren’t as marked as we’d expected in the data. How many of them are there to randomly sample and give statistically significant results? Not many!

“Not a Gold Rush”
The title was as much about the fact that the responding self publishers aren’t newbies – they’re experienced writers in the main – as it was about the reported sales figures. We didn’t think self-publishing was a gold rush, but it is possible to get the idea that it is from all the headlines about indie success stories. We chose the title only near the end of the process – in response to what we found in the data, and publishing what we hoped was an accessible piece of work for authors – not a statistical dissertation.

I wondered if one of you would show up, Dave. I do hope that you do a better survey in the future, with accepted analysis techniques, including publishing the raw data.

There’s a lot of bias in your survey. It’s strongly biased against those of us who write for a living. You have no idea what kind of inventory we have or how long we’ve been doing it. I have more than 500 short pieces (most of which aren’t up) in fiction alone, not counting all the nonfiction I’ve been writing throughout my career. I have more than 30 backlist novels which I have the rights to, not all of which are up yet. None of this counts the nonfiction I’m writing now, like this blog, nor does it count the novels I’ve written that I did not sell to traditional publishers or the fiction I’m currently writing.

If you look at the people who’ve commented here, from Rebecca Shelley to Steve Mohan, you’ll see people with more than 100 items they can post as books. Then add Bob Vardeman, who has been publishing longer than I have, and Gerald Weinberg who has been a New York Times bestseller longer than I’ve been alive, and you’ll see that there are serious, serious problems with that one bias alone. By the way, you should pay attention to what Gerald says, since statistics, surveys, and data analysis are his area of expertise.

As for tampering with the survey, sorry, but it’s really easy. In my home alone, I have two laptops, two iPads, two iPhones, and four up-to-date Macs on which I could have taken this survey. And that doesn’t count the three computers in the office that I have access to. If I felt so inclined, I could have given you 13 different responses, and your system wouldn’t have caught it. Me, alone. That’s why I mention tampering. There are other methods of avoiding this, but you can ask folks like Gerald who responded here, not me. And people do such things. That’s why surveys and statistical models have ways of preventing that which a Luddite like me can’t figure out.

Also just because you were right about that 16,000 mistake doesn’t mean you should have left it out of the raw data. And include the raw data. Some of us only read the raw data. We’re smart enough to figure out what belongs and what doesn’t.

Finally, if you counted responses from people who didn’t finish, well, that invalidates the survey even more.

There are too many errors here for me and most folks to take the survey seriously. Use this as someone else suggested, as a beta test, and then do it down the road using proper methodology. This is a good idea: the execution just didn’t work.

Dave, a few suggestions on future surveys:
I’m pretty sure I took the survey, but it was a while ago so I don’t really remember. I do remember being annoyed at a number of the questions on the survey I did take because I had a hard time finding an appropriate answer. (I haven’t received a copy of the results, or I’d give an example.)
I don’t recall if there was a comments section at the end of the survey. If not, I highly recommend including one in future surveys so you can capture the feedback on the questions while it is fresh in the respondent’s mind. This helps a lot in creating future surveys.
Also, please include the raw data–all of the raw data. Put it in an appendix at the end so those of us who are interested and like to make our own conclusions based on the data can do so. I wouldn’t call myself a numbers geek but I am statistics-informed, and I often find bias even in planned experiments. Some sorts of bias can be really hard to catch yourself, because you don’t always realize what base assumptions you’re starting with that are not valid assumptions. I remember an experiment we critiqued for one of my classes. The experiment was a survey on racial bias, and one section had a list of racial slurs which was supposed to be multi-ethnic, but there wasn’t a single one for white people. Boy did that throw a wrench in her results…

Okaaaay. I should probably apologize to my technical writing instructor now for rolling my eyes all through the segment on writing proper survey questions. Having come from a science background I thought all the points on having beta-testers and eliminating bias fell under common sense … but I guess not. Le sigh.

Kris, I took the survey and got the book, which I didn’t read because, as you so rightly point out, it has so many serious flaws.

Perhaps the most serious flaw was not mentioned in your post. This is supposed to be a survey of “indie writers,” but there is no such thing. In Vonnegut’s words, it’s a granfaloon, a meaningless grouping with a name. When I attend meetings of Southwest Writers, I see perhaps 200 people, all of whom claim to be writers, and most, “indie writers.” Here are some of the descriptions of the people there:

– a woman who writes letters to the editor (only)
– a man who has published a travel book–a journal of his RV roaming
– a woman who has published at least 40 romance novels
– a woman who is “working on my novel” (actually, at least 30 of these)
– a man who has published a scholarly book
– a woman who writes recipes for the newspaper (when she feels like it)
– a couple of poets, one of whom announces that his great breakthrough was selling a four-line poem to a kids’ magazine, for $10 and 3 copies

– And on and on. In the entire group, I doubt there are 10 people who are even pretending to make a living by their writing. But they are, in fact, all justified in calling themselves “writers.” I have no problem with that. But any sort of average taken over this group has to be meaningless.

Thanks, Jerry. That’s why surveys of “writers” don’t work either, unless the writers are defined in some way. I always find those surveys encouraging, because they do include your Great-Aunt Martha who wrote a poem in 1942 and my tenth cousin Felix the cab driver who is thinking about a screenplay. And even then, the surveys always come out with a real number, usually in 5-figures, as to how much “writers” earn. That means there’s some serious, serious money on the top end to cover all of those wannabes on the bottom who never earned a dime and never will.

LOL. I’m right in the middle of a statistics class right now, and this survey seems to be the best example of every possible thing that could make it meaningless. Sad. If they were going to take the time to do a survey, you’d think they’d take the time to do it right.

I haven’t had an instant to go over this in detail, but skimming through seemed to show one conclusion: you and Dean are right when it comes to marketing. Less is better if you have a high output of new titles.

I checked out the survey authors’ ebook on what to include in a Kindle ebook and found it pretty much worthless. Maybe for the experience-challenged, but buy one successful ebook and copy its intro. These are nonfiction writers (travelogue writers, it would seem) so it is possible their biases (esp against romance writers) enter into their basic worldview.

Another brilliant post, Kris! I originally started this post by talking about the study’s sample size (which is just fine, btw), but I see many others have covered this point so I won’t belabor the point with more stats talk.

All your other critiques of the study are spot on, IMHO. I’m particularly concerned with the non-random nature of the study and the obvious bias. What =I= would really like to see is a survey of “serious” writers, by which I mean people who’ve been writing 50-100k words/year for 5-10 years in an effort to hone their craft and for whom trad publishing would be a viable alternative. I bet if you looked at THAT population of indie writers, you’d find a huge majority making much, MUCH more than $500/year.

If the survey is meant to compare the methods of publishing to see how well they serve writers, then somehow you have to control for the skill level that the writer enters with. I don’t think this survey does that at all. My purely anecdotal observation, from knowing many fabulous writers, is that indie publishing is a very good tool, indeed!

Thanks for the vote of confidence, Kris. : ) It occurred to me that there is another subtle problem with this survey. Many statistical analyses require that a process be stable before you can analyze it, but epub is so new and is growing so fast that most of the data are NOT stable, not in a statistical sense. When sales levels increase by hundreds of percent in a MONTH, you can’t accurately assess yearly income by looking at the past twelve months . . .
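A quick back-of-the-envelope sketch of that instability point (Python, with made-up growth numbers): when monthly sales are growing fast, trailing twelve-month income badly understates what a writer is earning right now.

```python
# Hypothetical: monthly sales income growing 50% month over month.
monthly = [10 * 1.5 ** m for m in range(12)]   # made-up dollar figures

trailing_year = sum(monthly)    # what a "past twelve months" survey question sees
run_rate = monthly[-1] * 12     # what the writer is currently on pace to earn

print(round(trailing_year))  # ~2575
print(round(run_rate))       # ~10380
```

With these invented numbers, the survey answer would be roughly a quarter of the writer’s actual annualized income, which is the statistical sense in which the process isn’t stable.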

Honestly this is sad, because it’s something I know everyone would like a realistic answer to. That said, I’m not sure the answer wouldn’t be “it depends” whether you’re publishing electronically or in print. There are just so many variables, including dumb luck, that I’m not sure it can be quantified that accurately to begin with. What are the average and median incomes for writers publishing traditionally? IS there a stat for that?

Personally, I’d like to see solid info on both. I’ve always been upfront with anyone who asks as to how much money I made publishing in print, but none of my indie friends are as forthcoming. They all tell me how great the indie publishing experience is (wave of the future, yadi, yadi), but nobody wants to back that claim with numbers. That does not diminish the value of indie publishing for me, any more than it diminishes the value of print publishing. They are different games, with different pros and cons, but they are also the same, with many of the same principles. I suspect that success at any level in both indie and print publishing depends on a kaleidoscope of constantly changing variables, but that the concept of “produce, produce, produce” is the only constant that will always apply. The more you do, the more you can make. How much you make with each effort, though, in either arena, is never going to be guaranteed.

As for the press–well, I haven’t trusted a thing they’ve reported for decades. This is just another (sad) example of why.

Ginny, exactly on the survey and the way folks aren’t always up-front.

As for the press, they’re not experts. They’re generalists. They have to be–writing about different things every day. They do their best. I’m a Supreme Court geek, so I love the days when a major ruling comes out. Some reporter stands on the steps of the Supreme Court and tries to talk about this 100-page legalese tome, while reporterly minions read fast and try to understand it. They have to do an adequate analysis, knowing that the next day, everyone will have forgotten this story.

I was as guilty as any other reporter. We tried our best, but we only had so much time.

So I don’t begrudge the lack of ability to go deep. That’s why I was complaining about the conclusions. If you want to see what a study/survey/investigation really says, you have to read it yourself. And reading this thing was a revelation–in a way I didn’t want. 🙁

A sample size of 1000 is certainly statistically relevant. That would give you a margin of error of approximately 3% with a 95% confidence level on a total population well into the millions. (Or with a 4% margin of error and a 99% confidence level.) I don’t know how they presented that in their results (I have no intention of supporting them by purchasing the report, it sounded bogus the first time I read the article about it), but it’s not an unreasonable sample size provided you frame it with the margin of error and confidence level. Now, if they are trying to pass it off as a margin of error of 1% and 99% confidence level….yeah, you’d need a substantially larger sample size.
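For the curious, Ed’s figures check out against the standard worst-case formula for a simple random sample. A minimal sketch (Python; the z-scores are the usual normal-distribution values for 95% and 99% confidence):

```python
import math

def margin_of_error(n, z):
    # Worst-case margin of error (assumes p = 0.5, the most conservative case)
    # for a simple random sample of size n from a large population.
    return z * math.sqrt(0.25 / n)

print(round(margin_of_error(1000, 1.960), 3))  # ~0.031 -> about 3% at 95% confidence
print(round(margin_of_error(1000, 2.576), 3))  # ~0.041 -> about 4% at 99% confidence
```

Note this formula only applies to a properly random sample; with a self-selected sample, as discussed below, no margin-of-error figure is meaningful.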

Of course, it’s hard to prove their sample size since it sounds like they didn’t do a good job of preventing people from taking the survey as often as they liked. And of course, it sounds like they readily ignored data when it suited them. There are already ways to handle statistical anomalies with proper analysis, so I’m not sure why they felt the need to monkey with the data.

Oh, Ed. It would have been nice if they had listed a margin of error and a statistical confidence level. Alas, they did not. They didn’t use proper techniques at all. I wouldn’t have complained otherwise. Thanks for the comment, though. As with B.’s comment further down the thread, I pointed out in an added note in the post that your comments exist for the statistics-minded, so they can understand what’s truly wrong.

Statistics nerd here. There are many many problems with the Taleist survey, but its sample size is not one of them. Many of the landmark studies in medicine have had far fewer than 1,000 participants, yet we still make life and death decisions based on them every single day. Even the famed Framingham study’s original cohort only had around 5,000 enrolled. To gain insight into what sample size is appropriate, a priori calculations called power analyses can be conducted.

There are a number of factors to consider, including what types of statistical tests you wish to conduct, what sensitivity you desire, and how many different groups you wish to compare at once, but 1,000 folks was sufficient for the portions of the Taleist study that I have (so far) reviewed.

The true (and fatal) issue with the survey is that when you make the leap from census to sample, you must ensure that you are obtaining a representative sample of the larger study population. You can do that in one of two general ways. Make the selection process totally random, or somehow control for any known biases in the sampling process by using post hoc statistical correction techniques. The former is the gold standard when very little is known about a population or the stakes are high. The latter is ‘acceptable’ when precise results don’t matter or when baseline knowledge about a population is quite great. (One such correction is done to those weekly political spot polls often featured on cable news. Spot polls rely on landline phone calls, and it is known that many folks under 30 do not have a landline telephone. Because this error is well known and has been quantified, statistical corrections have been developed to adjust for this discrepancy, and the pollsters claim that this known sampling bias has been corrected for.)
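A minimal sketch of that kind of post hoc correction, using made-up numbers for the landline example: post-stratification reweights each respondent group so the sample’s mix matches known population proportions.

```python
# Hypothetical post-stratification: a landline poll under-samples people
# under 30. Population shares and sample counts are invented for illustration.
population_share = {"under_30": 0.25, "30_plus": 0.75}
sample_counts    = {"under_30": 50,   "30_plus": 950}
n = sum(sample_counts.values())

# Weight = (share in population) / (share in sample).
weights = {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}

print(weights["under_30"])  # 5.0 -- each young respondent counts five times over
print(weights["30_plus"])   # ~0.79 -- older respondents are slightly down-weighted
```

The catch, as the commenter notes, is that this only works when you actually know the true population shares; for self-published writers, nobody does, so no such correction is possible.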

But in regards to this survey, we have no way of knowing whether this sample was representative, because the selection process was certainly not random, and we have no way of correcting for any errors, because our knowledge of this population is quite poor.

Thanks, B. I pointed out your comment in an addition to the post. That way folks can understand how this didn’t work. I love your conclusion: “The study is the statistical equivalent of a Ouija board.” Niiiice.

I borrowed it via Prime, and glanced over it, and my main objection was that they did not include the raw survey data — the actual questions and answers and percentages, and maybe some of the correlations. They only provided the major ones they based their conclusions on.

It seemed to me that not including the raw data makes their survey utterly useless, in and of itself.

But they pulled data they didn’t think fit? Wow. I mean, that’s one of the points of a baseline study, to test your expectations so you can ask better questions next time.

The 16,000 books published is actually not an unbelievable number. There have been people producing junk books with automated systems since before Kindle, and they are quite proud of their output.

On a more positive note about the survey….

One thing that struck me (though it also frustrated me because the survey was cloudy on this) was the picture they almost painted of the most successful writers. Underneath their broad brush strokes and conclusions I saw hints of a pattern: the most successful writers seemed to be those who were most serious about writing.

What I’d really like to see come next (perhaps from someone else, though) is a “Millionaire Next Door” type demographic study of different groups of self-publishers — including the ones who do the 16,000 junk books, and others outside the mainstream.

Camille, yeah, the raw data thing. [sigh] But I hadn’t even thought of the 16,000 being one of those thieves. You’re probably right.

The picture they almost painted, as you said, of successful writers was what attracted me to the survey. I expected a lot of good material in the raw data. The numbers released so early ($500 and $10,000) are great, and I wanted to show why. But I couldn’t, given the way the information was gathered. I love your idea for a survey. Someone…?

I noticed that article on the Guardian (as well as the survey itself) and, as someone about to take the self-publishing leap, I can’t say whether or not the majority of self-pubbers go through Kindle Select only. All I know is that my two friends who self-publish did that, but then put their stuff back on sale elsewhere after the allotted time was up and they withdrew their work from KDP Select. Neither of them sell frequently or well, and they’re more like side self-pubbers who want a majorly traditional publishing career — which is fine! It works for them, but I can’t help but wonder how many other of these $500 earners are the same way.

Elisa, sometimes I’m not as clear as I want to be. I meant that many indie writers are only familiar with Amazon’s program. Writers aren’t as familiar with Smashwords. I personally know many writers (particularly romance writers) who are using a different distribution service and not using Smashwords at all. And then there’s the exclusivity of Select to add into the problem of making Smashwords’ numbers the authority. I wish they were, since they’re gathering a lot of good data right now. But I’ll take that data any day over the stuff in the survey. 🙂

Interesting that participants in the survey were supposed to receive a free copy. I did . . . and didn’t. However, now that I’ve read about the flawed methodology (thank you, Kris!), I won’t be following up!

I had my doubts, also, when I was answering all those questions. One of my concerns: the survey takers urged everyone to participate, no matter how new. But I indie released my first short story on Dec 12, 2011. And my first novel on Dec 28, 2011. And I didn’t see enough questions that would clarify that my responses were based upon 19 days of sales for the one story, and 4 days for the other. With data like mine, how could they draw any valid conclusions at all?

Exactly, J.M. After Michael’s post last night, I checked my spam filter to see if I got the free copy. I signed up for it, but never did get it. I did get some follow-ups, however. So my rather dicey e-mail/spam program probably ate it.

You’re right, though. The data wasn’t controlled enough to make the survey work, either.

Good analysis. I also participated, and after reading my free copy I also concluded that the report is far too biased to be useful. The kicker for me was in the chapter “reviews matter,” where they pointed to the correlation between reviews and income as evidence that reviews drive sales. They completely missed the more obvious explanation–that the best way to get more reviews is to sell more books!

Joe, exactly. Even if the data had been good, the analysis wasn’t. That’s pretty typical of surveys, which is why I look at the raw numbers. Which, as someone pointed out upstream, aren’t there. I was quite disappointed in that.

Say it with me: correlation does not equal causation!
You can’t get out of a basic statistics class without learning that. I had it drilled into my head during the course of my psych degree. And yet I constantly see people ignoring that very basic rule of research. It’s very frustrating.

I took the survey, but I’m sure they threw out my answers as well, because at the time I’d only been self-published for 4 months. And it wasn’t going well, but if we look at months 4-7, we find I’m definitely above the $500 median, but still below the $10k mark.

But those numbers are growing steadily with 3 novels (all rejected by agents; I call it my “backlist”) and 4 short stories out. I don’t think one can get a true grasp of the results of a survey like this, for the reasons you point out. Some people can pound out several hundred books a year (as you and Dean do), while others (like me) may only get out one novel and one short story. So the “average” doesn’t work.

But still, in all, I commend the Taleist for sinking his teeth into something like this. I think people needed to see it, for the wannabes to understand that self-publishing is a real and true “occupation” for those who can’t find an inroad into New York. There is money to be made.

Good point, Anne, about showing some of the wannabes that this is a profession. But y’know, those folks aren’t going to look at it. Although the headlines might scare some get-rich-quick people away. 🙂 Congrats on your numbers, btw. Extrapolate over a few years and more books, and you’ll be surprised at how good the numbers will be.

Well said, and I agree. I also love the irony that they self-published the results.

I would honestly rather see a survey that limited respondents to those who had had books out for a minimum amount of time, say 6 months or a year. It would also be more telling to only survey those who had been out for a length of time and had made a minimum of $500.

Why $500? Because it weeds out those souls, who bless their hearts, cannot spell or use sentences or be bothered with something resembling a cover. Let’s face it: while there are many wonderful and decent self-published authors, there are also delusional people lacking skill. People who will never sell, no matter what, but who could easily skew numbers in such a survey.

David, I think there are a lot of ways to do this survey. I like your idea. Books out a minimum of a year. What is the author’s career plan? How many books? and on and on. I think there could be a lot of very valuable information in that. But this one didn’t do it. I would really love to see one that correlates the number of books published, years of writing practice (“how long have you been writing seriously?”), time spent writing, and earnings/sales/etc. There’s a lot of information now, a lot we could learn, and no one has done a good survey yet. Someone…?

“Why $500? Because it weeds out those souls, who bless their hearts, cannot spell or use sentences or be bothered with something resembling a cover.”

It would also weed out those who simply don’t sell well, despite a good plot, good writing, competent spelling and a decent cover. Who are those folks? Lots of us, who for one reason or another aren’t marketing whizzes and don’t do much more in the way of advertising than just putting the books out there. Please don’t assume that just because a book is unknown, obscure, or does not sell well that it is a bad book. Going by your criterion — all good books sell, all bad books do not — the bestseller lists would look a lot different today.

Sarah, I think it’s also about time. These numbers will look very different a year or so from now, as books that aren’t marketed get discovered, and then their writers sell more and more. There are a lot of reasons books aren’t selling. Mostly, I believe, not enough time has passed in the marketplace.

Is it $500 overall self-pubbing, or per title? Because I’ve been self-pubbing for seven and a half months, and I’ve made that much overall. Not from any one title though. But I started out with a dozen titles and have added another six since then. (Well, the sixth goes up when I get home, because I was brainless this morning and forgot to hit the buttons on my June release. D’oh!)

I believe it was overall, Mercy. I am not checking my copy of the survey this morning to look. That makes a difference too. Although honestly, if they had wanted what I earned by title, I have so many that I wouldn’t have taken the survey.

Less than $500 over what period of time? A year?
I find that hard to believe.
I’ve already done much better than that (most months) after just a year of self-publishing. With another three books coming out over the next three months, I look forward to what the future brings.

Lisa, since they started publishing, I think. I would have to refer to the survey again, and I’m not doing that. Here’s what I wanted: I wanted that survey to have great methodology because the numbers are wonderful, particularly extrapolated over years that a book will be in print. Even the $500 number. But I just couldn’t rely on it, not even to attempt a point. [sigh]

I like the new blog look, though I had to do a double-take when I first called up your site. I thought my bookmark had been corrupted or something, it’s so different. 🙂

Good analysis, as usual, but if I may be so bold, I have a few nits to pick.

1) You say the survey is too small at only 1,007, that it can’t give a true accounting of the indie community at that level. You then go on to say that most indies go with KDP-S or other exclusives. Really? Most do? Is there data on that? I know a bunch of people pulled down titles from Smashwords et al when KDP-S came around (silly, that), but most? I have no idea, really, but that seems a big assumption. But let’s say you’re right, and far more indies are out there than Mark Coker can account for, since you bring him up. Say it’s 100,000, or 150,000 writers total. 1,007 is actually not a bad sample size for a statistical analysis of that large a population. If you look at political polling, as an example, they typically only take 1,000 or 1,500ish opinions for populations of several to hundreds of millions, depending on what exactly they’re polling. Using proper statistical techniques, you really don’t need that large a sample size to get a high probability of accuracy. Of course, in a self-selected survey, proper techniques become moot, because the sample is skewed from the beginning. So in that sense, I suppose we are in agreement.

2) You mentioned the average is 10,000 and the mean is 500. Pretty sure you meant median there, not mean. Just a little backup. 🙂

3) If you participated in the survey, you were supposed to get a copy for free. I know I did. Might want to talk to the authors about that.

Nitpicks aside, I concur with your general assessment. While the survey is interesting, it is far from a rigorous statistical analysis of the indie phenomenon and should not be taken as the end-all, be-all of indie-dom, or whatever. That said, I thought there were a few good nuggets to be had there. For myself, I found that $500 median comforting, actually. It means I’m not doing all that badly after all. 🙂

Right. I think I’m actually going to go write something. It’s been several weeks since I did that, but we’re pretty much all settled in our new abode in San Diego, so I guess it’s time to get back on the horse.

It’s always a pleasure to read you, Kris. Keep up the great work, and I hope to see you and Dean at another workshop again soon. 🙂

But Michael, there’s no way to know if that 1007 number includes people who participated once or twenty times. It’s anonymous. And since they threw out results willy-nilly, it’s not at all a good survey. I had hoped for good news from it as well, but there really wasn’t any because you can’t trust the data.

The key to that statement about sample size is the “properly selected” portion. As Kris noted, the sample showed a serious problem with self-selection bias. While a good sample of self-publishers might be as small as 1,007, that can’t hold for a self-selected poll like this one.
