Earlier this week, I released version 1.0 of my Oenology Theme. As I tend to do occasionally, after the release I decided to browse the Google search results for "Oenology WordPress", just to keep track of any mentions of the Theme.

I was somewhat surprised to discover, among the first-page search results, a site that was re-distributing a modified version of the Theme. Now, there is nothing inherently wrong with re-distribution of the Theme, in either its original or a modified form; I released it under the GPL, which explicitly permits modification and redistribution. However, these Theme modifications turned out to be insidiously malicious. As Otto explains, Themes distributed by the site in question had been hacked to (among other things) include a well-hidden PHP shell that gives the hacker backdoor access to any site on which the Theme is installed.

It is generally accepted knowledge within the WordPress developer community that such malicious sites have overrun the search engines, and dominate the search results for WordPress Theme-related search queries. Thus, following the above discovery, I decided to audit those search results, to get an idea of just how bad the current situation is. I performed a Google search for "Free WordPress Themes". The results are sobering.

For each of the first 30 results (first three pages), I evaluated the domain name against the WordPress Domain Name Trademark Policy, determined whether the Themes were original, or taken from elsewhere, and whether the Themes had SEO/Spam links, encoding, and/or other forms of malware.
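The domain-name part of that evaluation can be approximated in a few lines of Python. This is only a rough sketch, not necessarily the procedure used for the audit: it assumes the relevant test is whether the string "wordpress" appears anywhere in the host name, and it treats wordpress.org and wordpress.com as official. The function name, the allow-list, and the example URLs are all hypothetical.

```python
from urllib.parse import urlparse

# Assumption: these are treated as official domains and never flagged.
OFFICIAL_DOMAINS = ("wordpress.org", "wordpress.com")


def uses_wordpress_in_domain(url):
    """Return True if the host name contains 'wordpress' and is not official.

    Simplification: any occurrence in the host name is flagged, including
    subdomains, which is stricter than a human reviewer would likely be.
    """
    # urlparse only recognizes the host when a scheme or '//' prefix is present.
    host = urlparse(url if "://" in url else "//" + url).hostname or ""
    for official in OFFICIAL_DOMAINS:
        if host == official or host.endswith("." + official):
            return False
    return "wordpress" in host


if __name__ == "__main__":
    for candidate in ("http://free-wordpress-themes.example/",
                      "wpthemes.example",
                      "https://wordpress.org/themes/"):
        print(candidate, uses_wordpress_in_domain(candidate))
```

A check like this only flags candidates; whether a given domain actually violates the policy still needs a human reading of the policy itself.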

A note on the original source of the Themes: while some sites simply modified existing Themes, and some developed "original" Themes (derivatives of Kubrick, or Artisteer-generated, etc.), a great many of the audited sites distributed many of the same Themes. There apparently exists a rather incestuous relationship among these Theme-distribution sites - and especially among the sites distributing the worst-offending Themes.

76% (19/25) had Themes with base64 or other encoding (note: 1/26 not verified)

12% (3/26) had Themes with other malware (note: 17/26 sites not verified)

Of the sites that added base64 or other encoding, many of them (at least 5/19) included encoded "kill codes" that prevented the Theme from loading if the footer code was modified. All of them included SEO/Spam footer links.

Of the sites that added other malware, one added the well-known "Verify Widgets" worm to the functions.php file, and another added a sort of backdoor that allows for remote management of spam links, and phones home data back to the hacker.
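For anyone who wants to check a downloaded Theme for this kind of encoding, a first pass can be automated. The following Python sketch walks a Theme directory and reports PHP files that call functions commonly used to hide code. The pattern list is an illustrative assumption, not the criteria used in this audit, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: an illustrative list of PHP calls often used to obfuscate code.
# Some of these also have legitimate uses, so a hit is a lead, not a verdict.
SUSPICIOUS_PATTERNS = {
    "base64_decode": re.compile(r"\bbase64_decode\s*\(", re.IGNORECASE),
    "eval": re.compile(r"\beval\s*\(", re.IGNORECASE),
    "gzinflate": re.compile(r"\bgzinflate\s*\(", re.IGNORECASE),
    "str_rot13": re.compile(r"\bstr_rot13\s*\(", re.IGNORECASE),
}


def find_suspicious_calls(php_source):
    """Return the sorted names of suspicious patterns found in PHP source text."""
    return sorted(name for name, pattern in SUSPICIOUS_PATTERNS.items()
                  if pattern.search(php_source))


def scan_theme(theme_dir):
    """Map each .php file under theme_dir to the suspicious patterns it contains."""
    findings = {}
    for path in sorted(Path(theme_dir).rglob("*.php")):
        hits = find_suspicious_calls(path.read_text(errors="replace"))
        if hits:
            findings[str(path)] = hits
    return findings


if __name__ == "__main__":
    import sys
    for filename, hits in scan_theme(sys.argv[1]).items():
        print(filename, ", ".join(hits))
```

A clean result does not prove a Theme is safe, since encoded content can be hidden in other ways; any file that is flagged should be read by hand, starting with footer.php and functions.php.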

WordPress Trademark-Violating Sites

I noticed what appeared to be a correlation between WordPress trademark-violating domains and the presence of SEO/Spam links, encoding, and malware. That appearance seems to be valid, though I make no claim on the statistical significance of the correlation. One of the 9 WordPress trademark-violating sites was the aforementioned affiliate-link aggregator; of the remaining 8 WordPress trademark-violating sites:

100% (8/8) had Themes with Spam/SEO links

100% (8/8) had Themes with base64 or other encoding

13% (1/8) had Themes with other malware (note: 4/8 sites not verified)

To check whether my sample size was too small, I did a quick audit of the next two pages' worth of results (thus including the top 50 results). I found an additional 4 sites that violated the WordPress trademark, for a total of 13 out of 48 sites (27%).

Including those 4 sites with the previous results:

100% (12/12) had Themes with Spam/SEO links

100% (12/12) had Themes with base64 or other encoding

17% (2/12) had Themes with other malware (note: 4/9 sites not verified)

Are you beginning to notice a trend?

Restrictive Licensing

But mere inclusion of malware isn't the extent of the problem. Themes are also encumbered either with restrictive licensing, or else mal-applied licensing, in order to prevent the user from removing the crap that the sites added to the Themes.

Of the 27 above-referenced sites, all 27 incorporated some form of license encumbering. Most sites simply distributed the Themes with some sort of "Terms of Use" statement that (among other things) prohibited modifying the footer. The next most-popular licensing strategy was to distribute the Themes under some variation of a Creative Commons Attribution license, with the claim that the "footer links" were the license-conforming attribution. Quite a few Themes even attempted to claim that they were licensed under GPL, and that the GPL allowed the restriction against removal of footer credit links.

Of the 2 legitimate Theme sites, only one - ThemeLab, run by well-known WordPress community member Leland Fiegel - distributed unencumbered, GPL-licensed Themes. The other distributed Themes encumbered with a mal-applied CC-Attribution license.

Mal-Applied Creative Commons Attribution Clause

Of course, the requirement to keep a public-facing SEO/Spam link has nothing to do with the Attribution clause of the CC-Attribution license [emphasis added]:

If You Distribute, or Publicly Perform the Work or any Adaptations or Collections, You must, unless a request has been made pursuant to Section 4(a), keep intact all copyright notices for the Work and provide, reasonable to the medium or means You are utilizing: (i) the name of the Original Author (or pseudonym, if applicable) if supplied, and/or if the Original Author and/or Licensor designate another party or parties (e.g., a sponsor institute, publishing entity, journal) for attribution ("Attribution Parties") in Licensor's copyright notice, terms of service or by other reasonable means, the name of such party or parties; (ii) the title of the Work if supplied; (iii) to the extent reasonably practicable, the URI, if any, that Licensor specifies to be associated with the Work, unless such URI does not refer to the copyright notice or licensing information for the Work; and (iv) , consistent with Section 3(b), in the case of an Adaptation, a credit identifying the use of the Work in the Adaptation (e.g., "French translation of the Work by Original Author," or "Screenplay based on original Work by Original Author"). The credit required by this Section 4 (b) may be implemented in any reasonable manner; provided, however, that in the case of a Adaptation or Collection, at a minimum such credit will appear, if a credit for all contributing authors of the Adaptation or Collection appears, then as part of these credits and in a manner at least as prominent as the credits for the other contributing authors. 
For the avoidance of doubt, You may only use the credit required by this Section for the purpose of attribution in the manner set out above and, by exercising Your rights under this License, You may not implicitly or explicitly assert or imply any connection with, sponsorship or endorsement by the Original Author, Licensor and/or Attribution Parties, as appropriate, of You or Your use of the Work, without the separate, express prior written permission of the Original Author, Licensor and/or Attribution Parties.

The license requires that attribution of the original author, the original author's copyright, and a link to the same, must be retained within the work. It does not require that such attribution be public-facing, nor does it allow the author to require any link other than one directly related to attribution of the author and his copyright in the original work. In fact (as noted above), the attribution clause specifically excludes URIs unrelated to the copyright notice or licensing information of the original work.

Mal-Applied GPL Attribution

Similarly, the requirement to keep public-facing attribution links is not only absent from the GPL (in any version), but such a use restriction directly contradicts the wording of the license itself. Some confusion may arise regarding the requirement to retain appropriate copyright and licensing information in conveyed works (whether original or derivative), but under no circumstance does the GPL require that such information be portrayed in a public-facing manner. (The accepted standard for PHP code is to add this copyright and licensing information as PHP comments in the header of one or more PHP files.) And in fact, the GPL even states that if further restrictions are imposed upon a GPL-licensed work, such restrictions may be ignored.

But Wait, There's More!

Theme Quality

As if SEO/Spam links, encoding, malware, and encumbered licensing weren't enough, I cannot fail to discuss the quality of the Themes distributed by the audited sites.

Again, with the lone exception being the one legitimate Theme site that distributes malware-free, GPL-licensed Themes, the remainder of the sites distributed Themes of - there's no better way to say it - utterly horrible quality. The Themes are, generally speaking, woefully obsolete. Trudging through this morass renewed my appreciation for the current effort to improve the quality standard of Themes submitted to the official WordPress Theme Repository.

Conclusion

If you are looking for free WordPress Themes, avoid the search engines; stick to known, trusted sources. Otherwise, you will end up with a poor-quality, license-encumbered Theme full of SEO/spam links, encoded content, and other malware. If your primary objective is to find a free Theme, I strongly suggest that your search start - if not end - with the Official WordPress Theme Repository. (Although, I fully endorse ThemeLab as well.)

As a corollary: sites whose domain names violate the WordPress trademark policy cannot be trusted. It is safe to assume that a site that violates the trademark policy is not directly involved with the WordPress community. Such a site is either intentionally flouting the trademark policy - likely in order to gain SEO from the domain name, and likely to be doing so for nefarious purposes - or else is not closely enough involved with the WordPress project or community to have the necessary experience and expertise.

We already know to be careful, so I really don’t see the point of this article if you’re not going to tell us who these culprits are.

As far as the official WordPress theme repository goes, I usually try to avoid it. It’s a mess! Outdated and incompatible themes abound, and the themes aren’t organized by version compatibility. That place is an exercise in frustration that I’d rather just avoid.

Given the percentages, would it really be helpful for me to publish the exact list of sites audited? Unless and until Google actually decides to de-list some of these sites, the first four pages of search results (I posted the exact search query) won’t have changed much.

As for the WordPress Theme Repository: the issue of incompatible Themes is becoming a thing of the past – and if all goes well, so will the issue of outdated Themes. The idea to sort by version compatibility is a good one; I’ll see if there is something that can be done to that end.

Thanks for this post. While one of the above posters already knows to be safe, and I do as well, it can be overwhelming for a newbie to try and find some sample themes to play with. This confirms my suspicion that there are plenty of bad actors, even in the theme business. Amazing. Thanks again.