Archive for January, 2016

Here’s the real tl;dr: I could only find any discussion at all in Beall’s blog for 230 of the 1,834 journals and publishers in his 2016 lists—and those cases don’t include even 2% of the journals in DOAJ.

Now for the shorter version…

As long-time readers will know, I don’t much like blacklists. I admit to that prejudice (or, rather, belief): I don’t think blacklists are good ways to solve problems.

And yet, when I first took a hard look at Jeffrey Beall’s lists in 2014, I was mostly assessing whether the lists represented as massive a problem as Beall seemed to assert. As you may know, I concluded that they did not.

But there’s a deeper problem—one that I believe applies whether you dislike blacklists or mourn the passing of the Index Librorum Prohibitorum. To wit, Beall’s lists don’t meet what I would regard as minimal standards for a blacklist even if you agree with all of his judgments.

Why not? Because, in seven cases out of eight (on the 2016 lists), Beall provides no case whatsoever in his blog: the journal or publisher is in the lists Just Because. (Or, in some but not most cases, Beall provided a case on his earlier blog but failed to copy those posts.)

Seven cases out of eight: 87.5%. 1,604 journals and publishers of the 1,834 (excluding duplicates) on the 2016 versions have no more than an unstated “Trust me” as the reason for avoiding them.

I believe that’s inexcusable, and makes the strongest possible case that nobody should treat Beall’s lists as being significant. (It also, of course, means that research based on the assumption that the lists are meaningful is fatally flawed.)

The Short Version

Since key numbers will appear first as a blog post on Walt at Random and much later in Cites & Insights, I’ll lead with the short version.

I converted the two lists into an Excel spreadsheet (trivially easy to do), adding columns for “Type” (Pub or Jrn), Case (no, weak, maybe or strong), Beall (URL for Beall’s commentary on this journal or publisher—the most recent or strongest when there’s more than one), and—after completing the hard work—six additional columns. We’ll get to those.

Then I went through Beall’s blog, month by month, post by post. Whenever a post mentioned one or more publishers or independent journals, I pasted the post’s URL into the “Beall” column for the appropriate row, read the post carefully, and filled in the “Case” column based on the most generous reading I could make of Beall’s discussion. (More on this later in the full article, maybe.)

I did that for all four years, 2012 through 2015, and even January 2016.

The results? In 1,604 cases, I was unable to find any discussion whatsoever. (No, I didn’t read all of the comments on the posts. Surely if you’re going to condemn a publisher or journal, you would at least mention your reasons in the body of a post, right?)

If you discard those on the basis that it’s grotesquely unfair to blacklist a journal or publisher without giving any reason why, you’re left with a list of 53 journals and 177 publishers. Giving Beall the benefit of the doubt, I judged that he made no case at all in five cases (the fact that you think a publisher has a “funny name” is no case at all, for example). I think he made a very weak case (e.g., one questionable article in one journal from a multijournal publisher) in 69 cases. I came down on the side of “maybe” 43 times and “strong” 113 times, although it’s important to note that “strong” means that at some point for some journal there were significant issues raised, not that a publisher is forever doomed to be garbage.

Call it 156 reasonable cases—now we’re down to less than 10% of the lists.
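For anyone who wants to check the tallies above, here is a minimal Python sketch that uses only the counts given in this post. The variable names and the `pct` helper are illustrative and are not part of the actual spreadsheet.

```python
# Counts taken from this post; all names here are illustrative only.
TOTAL_LISTED = 1834    # journals and publishers on the 2016 lists, duplicates excluded
NO_DISCUSSION = 1604   # entries with no discussion found in the body of any blog post

# Case-strength ratings for the entries that do have some discussion.
cases = {"no": 5, "weak": 69, "maybe": 43, "strong": 113}

discussed = sum(cases.values())                 # entries with any discussion at all
reasonable = cases["maybe"] + cases["strong"]   # "maybe" plus "strong" cases


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print("Discussed entries:", discussed)
print("Share with no stated case (%):", pct(NO_DISCUSSION, TOTAL_LISTED))
print("Reasonable cases:", reasonable, "or", pct(reasonable, TOTAL_LISTED), "% of the lists")
```

The four ratings add up to the 230 discussed entries, and the 230 plus the 1,604 undiscussed entries account for all 1,834 listings.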

Then I looked at the spreadsheets I’m working on for the 2015 project (note here that SPARC has nothing at all to do with this little essay!). I say “spreadsheets” because I did this when I was about 35% of the way through the first-pass data gathering. I could certainly identify which publishers had journals in DOAJ, but could only provide article counts for those in the first 35% or so. (In the end, I just looked up the 53 journals directly in DOAJ.)

Here’s what I found.

Ignoring the strength of case, Beall’s lists include 209 DOAJ journals—or 1.9% of the total. But of those 209, 85 are from Bentham Open (which, in my opinion, has cleaned up its act considerably) and 49 are from Frontiers Media (which Beall never actually made a case to include in his list, but somehow it’s there). If you eliminate those, you’re down to 75 journals, or 0.7%: Less than one out of every hundred DOAJ journals.

For that matter, if you limit the results to strong and maybe cases, the number drops to 37 journals: 0.33%, roughly one in every three hundred DOAJ journals.
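The DOAJ shares can be roughly reproduced as follows. This is only a sketch: it assumes a DOAJ total of 10,948 journals, the starting count given in the SPARC project announcement elsewhere in this archive, which may not be exactly the total used for the percentages above. The names are illustrative.

```python
# Assumption: DOAJ total as of December 31, 2015, from the SPARC project post.
DOAJ_TOTAL = 10948

listed_in_doaj = 209    # DOAJ journals covered by Beall's lists
bentham_open = 85       # of those, from Bentham Open
frontiers = 49          # of those, from Frontiers Media
strong_or_maybe = 37    # DOAJ journals where the case is "strong" or "maybe"

# Journals left after setting aside the two large publishers.
remaining = listed_in_doaj - bentham_open - frontiers


def share(count, total=DOAJ_TOTAL):
    """Return count as a percentage of total, rounded to two decimal places."""
    return round(100 * count / total, 2)


for label, count in [
    ("All listed DOAJ journals", listed_in_doaj),
    ("Without Bentham Open and Frontiers", remaining),
    ("Strong or maybe cases only", strong_or_maybe),
]:
    print(f"{label}: {count} journals, {share(count)}% of DOAJ")
```

Under that assumed total, the results agree with the percentages quoted above to within rounding.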

For journals I’ve already analyzed (and since I’m working by publisher name, that includes most of these—at this writing, January 29, I just finished Hindawi), total articles were just over 16,000 (with more to come on a second pass) in 2015, just under 14,000 in 2014, just over 10,000 in 2013, around 8,500 in 2012, and around 4,500 in 2011.

But most of those articles are from Frontiers Media. Eliminating them and Bentham brings article counts down to the 1,700-2,500 range. That’s considerably less than one half of one percent of total serious OA articles.

The most realistic counts (those where Beall’s made more than a weak case) show around 150 articles for 2015, around 200-250 for 2013 and 2014, around 1,000 for 2012 and around 780 for 2011. (Those numbers will go up, but probably not by much. There was one active journal that’s mostly fallen by the wayside since 2012.)

The conclusion to this too-long short version: Beall’s lists are mostly the worst possible kind of blacklist: one where there’s no stated reason for things to be included. If you’re comfortable using “trust me” as the basis for a tool, that’s your business. My comment might echo those of Joseph Welch, but that would be mean.

Oh, by the way: you can download the trimmed version of Beall’s lists (with partial article counts for journals in DOAJ, admittedly lacking some of them). It’s available in .csv form for minimum size and maximum flexibility. Don’t use it as a blacklist, though: it’s still far too inclusive, as far as I’m concerned.

Modified 1/30: Apparently the original filename yields a 404 error; I’ve renamed the file, and it should now be available. (Thanks, Marika!)

I’m delighted to announce that SPARC (the Scholarly Publishing and Academic Resources Coalition) is supporting the update of Gold Open Access Journals 2011-2015 to provide an empirical basis for evaluating Open Access sustainability models. I am carrying out this project with SPARC’s sponsorship, building from and expanding on The Gold OA Landscape 2011-2014.

The immediate effect of this project is that the dataset for the earlier project is publicly available for use on zenodo.org and on my personal website. The data is public domain, but attribution and feedback are both appreciated.

Here’s what the rest of the project means:

I am basing the study on the Directory of Open Access Journals as of December 31, 2015. With eleven duplicates (same URL, different journal names, typically in two languages) removed and reported back to DOAJ, that means a starting point of 10,948 journals. All journals will be accounted for, and as many as feasible will be fully analyzed.

The grades and subgrades have been simplified and clarified, and two categories of journal excluded from the 2014 study will now be included (but tagged so that they can be counted separately if desired): journals consisting primarily of conference reports peer-reviewed at the conference level, and journals that require free registration to read articles.

I’m visiting all journal sites (and using DOAJ as an additional source) to determine current article processing charges (if any), add 2015 article counts to data carried over from the 2014 project, clean up article counts as feasible, and add 2011-2014 article counts for journals not in the earlier report.

Since some journals (typically smaller ones) take some time to post articles, and since some journals will not be analyzed for various reasons (malware, inability to access, difficulty in translating site or counting articles), I’ll be doing a second pass for all those requiring such a pass, starting in April 2016 or after the first pass is complete. My intent is to include as many journals as possible (existence of malware is an automatic stopping point), although that doesn’t extend to (for example) going through each issue of a weekly journal only available in PDF form.

The results will be written up in a form somewhat similar to The Gold OA Landscape 2011-2014, refined based on feedback and discussion.

Once the analysis and preparation are complete, the dataset (in anonymized form) will be made freely available at appropriate sites and publicized as available.

The PDF version of the final report will be freely available and carry an appropriate Creative Commons license.

A paperback version of the final report will be available; details will be announced closer to publication.

A shorter version of the final report will appear in Cites & Insights, and it’s likely that notes along the way will also appear there.

The dataset, an Excel .xlsx workbook with two worksheets, includes 9,824 rows of data, one for each journal graded A through C (and, thus, fully analyzed) in the project. Each row has a dozen columns. The columns are described on the second “data_key” worksheet.

I would love to be able to say that this dataset was now on figshare, but after spending (wasting) far too much time attempting to complete the required fields and publish the dataset, it appears that the figshare mechanisms are at least partly broken. When (if) I receive assurances that the scripts (which fail in current versions of Chrome, Firefox and Internet Explorer) have been fixed, I’ll add the dataset there, although I’d be happy to hear about other no-fee dataset sharing sites that actually work. (It’s possible that figshare just doesn’t much care for free personal accounts any more: I also note that the counts of dataset usage that were previously available have disappeared.)

Update January 22, 2016: This dataset is now available on zenodo.org. (Hat-tip to Thomas Munro.)

Note: This isn’t quite the “Watch This Space” announcement foreshadowed in Cites & Insights 16:2, and it doesn’t mean that sales of the book have suddenly mushroomed. That announcement–which is related to this one–should come in a few days.

By the way, while the dataset consists of facts and is therefore in the public domain, I’d appreciate being told about uses of the spreadsheet and certainly appreciate proper attribution. Send me a note at waltcrawford@gmail.com.

I’d also love your suggestions as to ways the presentation in the book could be improved if or when there’s a newer version. Leave a comment or, again, send email to waltcrawford@gmail.com.

I’ve tried to stay away from Beall and his Lists, but sometimes it’s not easy.

The final section of the Intersections essay in the January 2016 Cites & Insights recounts a quick “investigation” into the rationales Beall provided for placing 223 publishers on his 2014 list. Go to page 8: it’s the section titled “Lagniappe: The Rationales, Once Over Easy.” I could find a rationale for condemning the publishers in only 35% of cases.

Perhaps too charitably, I assumed that it was because Beall’s blog changed platforms and he didn’t take the time to restore older posts to the new blog.

Then I noted his 2016 lists–which add 230 (or more) publishers and 375 (or more) independent journals to the 2015 lists. I say “or more” because at least one major publisher has been removed via the Star Chamber Appeal Process, even though Beall continues to attack the publisher as unworthy.

In any case: 605 new listings. My recollection is that there haven’t even been close to 605 posts on Beall’s blog in the past year… but I thought I’d check it out.

The results: As far as I can tell, posts during 2015 include around 60 new publishers and journals. (I may have missed a couple of “copycat” journals, so let’s call it 65).

Sixty or 65. Out of 605.

In other words: for roughly 90% of publishers (most of them really “publishers,” I suspect) and journals added to the list, there is no published rationale whatsoever for Beall’s condemnation.

None.
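The arithmetic behind that “roughly 90%” is simple enough to sketch in Python, using only the counts given in this post (the names are illustrative):

```python
# Additions to the 2016 lists, as counted in this post ("or more" in both cases).
new_publishers = 230
new_journals = 375
new_listings = new_publishers + new_journals

# Additions that were actually discussed in 2015 blog posts:
# about 60 found, rounded up to 65 to allow for missed "copycat" journals.
discussed_low, discussed_high = 60, 65


def undiscussed_share(discussed, total):
    """Return the percentage of new listings with no published rationale."""
    return round(100 * (total - discussed) / total, 1)


print("New listings:", new_listings)
print("No rationale, if 60 were discussed (%):", undiscussed_share(discussed_low, new_listings))
print("No rationale, if 65 were discussed (%):", undiscussed_share(discussed_high, new_listings))
```

Either way the share comes out within a point of 90%.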

So if you’re wondering why I regard Beall as irrelevant to the reality of open access publishing (which isn’t all sweetness & light, any more than the reality of subscription publishing), there’s one answer.

*All Cites & Insights PDF ebooks are explicitly site-licensed for mounting on a library's server and providing to authenticated users. That includes The Gold OA Landscape 2011-2014, A Library Is..., Beyond the Damage and any others.