Wednesday, November 25, 2009

I've been intending to write this post for months, but various things got in the way. Well, it's finally ready! Some of my ideas for Greasemonkey 1.0 would involve major changes to the way that Greasemonkey runs user scripts. The goal would be to make user script authoring easier by removing some of the quirks, limitations, and problems that Greasemonkey's current security architecture imposes.

To begin, an aside: why does Greasemonkey have a security architecture that imposes limitations and problems on script authors? It's basically history now, but in short: Greasemonkey provides the powerful-but-dangerous capability for user scripts to break the same-origin policy for AJAX requests. Lots of useful scripts have been created that hinge on this capability. Unfortunately, it is indeed powerful, and Greasemonkey by nature mashes itself and the user scripts up with any old web page that you might visit. If Greasemonkey and/or a script it is running presents a vulnerability that the content page can leverage, all sorts of nasty things could result: stealing from your bank account, making false e-commerce purchases, stealing the contents of your private files or site data, and so on.

The point of this post, then, is to examine the landscape for user scripts today: to discover what scripts are out there, what they are like, and how they operate. What kinds of changes to Greasemonkey would make these scripts stop working? What kinds of changes could we make with minimal impact? Toward that end, I've got three graphs to show you (with the raw data below).

To perform this analysis, I downloaded over thirty-six thousand scripts from userscripts.org. This is by no means the entire population of user scripts out there, but I believe it is a good representative sample. I wrote a Python script to read their source and (a bit crudely, but well enough) parse their contents and metadata. The first thing I was interested in seeing was how common each of the various GM_ APIs is.

The first thing that we can quickly see is that well over half the scripts, 58.87%, use no special API calls at all. No matter what happens to the GM_ APIs, they'll keep working just fine. The most common API call is the get/set value call, at 16.50%. The cross-domain AJAX call is a close second at 15.51%, with GM_addStyle next at 12.95%. From here things trail off rapidly, but we see how common unsafeWindow and eval are, both with the potential to be very dangerous.
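As a rough sketch of this kind of counting (not the actual script used for this post), something like the following Python would do. The pattern table, the labels, and the function names are assumptions made for illustration, and plain pattern matching like this will also count API names that only appear in comments or strings.

```python
import re
from collections import Counter

# Hypothetical pattern table: labels and regexes are illustrative only.
API_PATTERNS = {
    "GM_getValue/GM_setValue": re.compile(r"\bGM_(?:get|set)Value\b"),
    "GM_xmlhttpRequest": re.compile(r"\bGM_xmlhttpRequest\b"),
    "GM_addStyle": re.compile(r"\bGM_addStyle\b"),
    "unsafeWindow": re.compile(r"\bunsafeWindow\b"),
    "eval": re.compile(r"\beval\s*\("),
}


def apis_used(source):
    """Return the set of API labels whose pattern appears in one script."""
    return {label for label, pattern in API_PATTERNS.items()
            if pattern.search(source)}


def usage_percentages(sources):
    """Map each label (plus 'none') to the percentage of scripts using it."""
    total = len(sources)
    if total == 0:
        return {}
    counts = Counter()
    for source in sources:
        used = apis_used(source)
        if used:
            counts.update(used)  # each script counts once per API
        else:
            counts["none"] += 1
    return {label: 100.0 * n / total for label, n in counts.items()}


# Tiny made-up sample, standing in for the downloaded scripts.
sample = [
    "document.title = 'hello';",
    "GM_setValue('seen', true); GM_addStyle('a { color: red }');",
    "GM_xmlhttpRequest({url: 'http://example.com/'});",
    "alert(1);",
]
percentages = usage_percentages(sample)
```

Because one script can use several APIs, the percentages are per API and do not need to add up to 100, which is also true of the numbers above.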

Browsers are progressing rapidly, however. Instead of get/set value, one could use DOM Storage, and HTTP Access Control, for making cross-domain requests, is being standardized. What's important to know is whether the extra power provided by these APIs is actually being used, or whether these sorts of stand-ins could be viable replacements. To investigate that, I examined how many different domains scripts are @included into when making these calls, and which URLs the AJAX calls are being made to.

The vast majority of get/set value calls (76.33%) are made by scripts that are only ever @include'd into a single domain. For these scripts, DOM Storage would work perfectly. Some execute on two, and almost none on more than two. Some also execute on every page, and this starts to be a problem. The AJAX patterns are very different.

An important note: my script was a bit naive with AJAX domain gathering. It used simple string manipulation to find URLs inside GM_xmlhttpRequest calls. If the URL was set in a variable elsewhere, then the script did not find it. So of 5600 scripts that call GM_xmlhttpRequest, only 2693 were "understood" by my script -- and this may be a biased sample. Scripts that exclusively set their URLs in variables/constants may be more likely to make cross-domain requests, or perhaps even less likely.
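To make that limitation concrete, here is a minimal sketch of this style of extraction. It is an illustration under assumptions, not the script used for the numbers here: the 300-character window, the regex, and the function name are all made up for the example.

```python
import re
from urllib.parse import urlparse

# Matches a literal  url: '...'  or  url: "..."  property (a simplification).
URL_LITERAL = re.compile(r"""url\s*:\s*(['"])(https?://[^'"]+)\1""")

CALL_NAME = "GM_xmlhttpRequest"
WINDOW = 300  # arbitrary number of characters to inspect after each call


def ajax_hosts(source):
    """Return hosts found as literal URLs near GM_xmlhttpRequest calls."""
    hosts = set()
    start = 0
    while True:
        index = source.find(CALL_NAME, start)
        if index == -1:
            break
        match = URL_LITERAL.search(source[index:index + WINDOW])
        if match:
            hosts.add(urlparse(match.group(2)).hostname)
        start = index + len(CALL_NAME)
    return hosts


# URL written inline: the host is found.
literal = ("GM_xmlhttpRequest({method: 'GET', "
           "url: 'http://www.imdb.com/find?q=x', onload: show});")

# URL held in a variable: nothing is found, so the script is not "understood".
indirect = ("var target = 'http://www.imdb.com/find?q=x'; "
            "GM_xmlhttpRequest({method: 'GET', url: target, onload: show});")

found_literal = ajax_hosts(literal)
found_indirect = ajax_hosts(indirect)
```

The second case is exactly the kind of script that falls out of the sample: the request is there, but the destination is invisible to simple string matching.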

That said, an obvious pattern emerges: plenty of scripts do "@include *" then AJAX off, likely to some other, fixed, site (20.16%). (Note: lots of these appear to be update checkers, which should hopefully be unnecessary before 1.0.) Plenty also seem to operate fully within one site (20.87%). By far the most, however, operate on one site and call another (46.79%, or 1260 distinct scripts). Larger combinations of sites are minimal. Part of this group is due to oversimplification in my script: an @include of "*flickr.com" and an AJAX call to "flickr.com" are counted separately. Most, though, are the especially useful scripts that, for example, include IMDB data on Netflix, or vice versa. So, this is far too large a use case to break. Whatever we do, it seems cross-domain AJAX is going to have to remain.
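The three buckets above, and the "*flickr.com" versus "flickr.com" mismatch, can be illustrated with a small sketch. This is a hypothetical classifier, not the analysis script: the function names and bucket labels are invented for the example, and reducing a host to its last two labels is a crude assumption that breaks on domains like "example.co.uk".

```python
def base_domain(pattern_or_host):
    """Crudely reduce an @include pattern or a host to 'name.tld'."""
    text = pattern_or_host.strip()
    if "://" in text:
        text = text.split("://", 1)[1]  # drop the scheme
    text = text.split("/", 1)[0]        # drop the path
    labels = [label for label in text.replace("*", "").split(".") if label]
    # A pattern that is nothing but wildcards matches every site.
    return ".".join(labels[-2:]).lower() or "*"


def classify(includes, ajax_hosts):
    """Put a script into one of three illustrative buckets."""
    include_domains = {base_domain(pattern) for pattern in includes}
    ajax_domains = {base_domain(host) for host in ajax_hosts}
    if "*" in include_domains:
        return "everywhere, calls fixed site"
    if ajax_domains <= include_domains:
        return "same site"
    return "one site calls another"


netflix_imdb = classify(["http://www.netflix.com/*"], ["www.imdb.com"])
flickr_only = classify(["*flickr.com*"], ["flickr.com"])
update_checker = classify(["*"], ["userscripts.org"])
```

With the normalization in `base_domain`, the flickr case lands in "same site" instead of being counted as two different domains.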

Finally, I also took a look at the usage of metadata imperatives: both the "official" ones that actually affect how Greasemonkey works, and the others that are used in other tools, or added for the author's own purposes. That looks like:

Generally what I expected. Most everyone has an @name and an @include, and nearly as many include an @description and @namespace. Things fall off rapidly from there, but the unofficial @version is next, and an unusual (to me) @author. From there we fall toward the single-digit range, finding that @require and @resource are still very rarely used.
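For reference, counting imperatives like this mostly comes down to reading the lines between the "// ==UserScript==" and "// ==/UserScript==" markers. The sketch below is one way to do it in Python, written for illustration rather than taken from the analysis script; the regex and function names are assumptions.

```python
import re
from collections import Counter

# One metadata line:  // @key   optional value
META_LINE = re.compile(r"^\s*//\s*@(\S+)(?:\s+(.*?))?\s*$")


def parse_metadata(source):
    """Return (key, value) pairs from the first metadata block, in order."""
    pairs = []
    in_block = False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("// ==UserScript=="):
            in_block = True
            continue
        if stripped.startswith("// ==/UserScript=="):
            break
        if in_block:
            match = META_LINE.match(line)
            if match:
                pairs.append((match.group(1), match.group(2) or ""))
    return pairs


def imperative_counts(sources):
    """Count how many scripts use each imperative (once per script)."""
    counts = Counter()
    for source in sources:
        counts.update({key for key, _ in parse_metadata(source)})
    return counts


example = """\
// ==UserScript==
// @name        Example Script
// @namespace   http://example.com/
// @include     http://example.com/*
// @include     http://example.org/*
// @version     0.1
// ==/UserScript==
alert('hi');
"""

pairs = parse_metadata(example)
counts = imperative_counts([example, "alert('no metadata');"])
```

Counting once per script matters here: a script with ten @include lines should still count as a single user of @include.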

Conclusions: Over half of user scripts use no privileged APIs. All of Greasemonkey's security mechanisms are a pure hindrance to all these scripts. If those mechanisms went away, these scripts would benefit greatly. It may be possible to remove get/set value in favor of DOM Storage, but the potential damage of these APIs is so small that the cost likely outweighs the benefit. Although only a minority (15.51%) of scripts use GM_xhr, it's still too many to consider removal.

Edit: Fixed the GM_getResourceURL count. I first searched for "Url" and not "URL", which explains the zero found before.

To those that are interested: the script that I used to generate these numbers is available for inspection, in case it perhaps contains a serious bug. The data that I generated with it, and the charts above, are also available to check.

This is the same file posted as RC2 about a week ago to the -users mailing list. As the version number (and the release notes) indicate, this is a maintenance release, fixing bugs and adding minor features to the previous release.

Friday, September 18, 2009

I'd like to follow up my earlier post, analyzing which browsers Greasemonkey is used in. This time, a view on the operating systems where Greasemonkey is used. This is generally less interesting information -- it closely mirrors the market share of the OSes. But it's one more bit of detail we can derive from the AMO stats.

This graph probably isn't very surprising. It shows that Mac has become more popular in the last year. Some detail on the last four weeks:

Another view on the same detail. The exact underlying numbers involved:

| OS | Users | Percent |
|---|---|---|
| Windows | 2531829 | 91.50% |
| Mac | 164282 | 5.94% |
| Linux | 69186 | 2.50% |
| Other | 1690 | 0.06% |

Like I said at the beginning, this generally matches the breakdown of operating systems in general. Even so, the eight or nine percent of users on Mac or Linux make up nearly a quarter of a million users.

Like before, the numbers and charts are visible on Google Docs. No script this time; the work was easy enough to do by hand.

Saturday, September 05, 2009

As Johan and I begin to take over development of Greasemonkey, one of the important questions we need to answer is: which platforms should we support? We can inform this decision with some of the usage statistics that Mozilla Add-Ons gathers.

The statistics page for Greasemonkey is visible to everyone. The raw data is even available for download. But it can be hard to read, due to the level of detail and formatting that is applied. So, I've taken the time to analyze it carefully. The first interesting thing that we can see is the usage trends over time:

(Looks like Mozilla had a reporting issue around May of 2009.)

I've also made a pie-graph of app usage, for the average values of the past 4 weeks:

That pie chart represents these numbers:

| App | Users | Percent |
|---|---|---|
| Firefox/<=1.0 | 598 | 0.02% |
| Firefox/1.5 | 5502 | 0.21% |
| Firefox/2.0 | 113921 | 4.31% |
| Firefox/3.0 | 1470584 | 55.70% |
| Firefox/>=3.5 | 1049092 | 39.74% |
| Other | 446 | 0.02% |

So, let's say first off: we know this is a bad measurement. There's (almost) no "other" because there's no official support for other platforms, so only third party alterations make this usage possible. Thus, this data doesn't help us answer questions like "Should we support Flock?" or "Should we support SongBird?".

It does let us know a little bit about what versions of Firefox we should support. All of 1.0 and 1.5 make up only 0.23% of the user base. Firefox 3.0 and 3.5 make up 95.44% of the user base. Firefox 2, however, makes up 4.31% of the user base. That's a much harder call.
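For anyone who wants to check the arithmetic, the grouped shares can be recomputed from the user counts listed in the table. The snippet below is just a convenience for doing that; the dictionary and the `share` helper are names made up for this example.

```python
# Average users over the past four weeks, as listed in the table.
firefox_users = {
    "Firefox/<=1.0": 598,
    "Firefox/1.5": 5502,
    "Firefox/2.0": 113921,
    "Firefox/3.0": 1470584,
    "Firefox/>=3.5": 1049092,
    "Other": 446,
}

total_users = sum(firefox_users.values())


def share(*apps):
    """Combined percentage of the user base for the given apps."""
    users = sum(firefox_users[app] for app in apps)
    return round(100.0 * users / total_users, 2)


old_versions = share("Firefox/<=1.0", "Firefox/1.5")
firefox_2 = share("Firefox/2.0")
current_versions = share("Firefox/3.0", "Firefox/>=3.5")

print(old_versions, firefox_2, current_versions)
```

Summing the raw counts before rounding avoids the small drift that can creep in when already-rounded percentages are added together.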

Wednesday, August 26, 2009

Months ago, the people running DevjaVu let us know that they were shutting the service down. It's unfortunate, but they cannot be faulted. They're still running now, but there's no saying how long that will remain the case. As of today, the ability to change tickets (both creating and commenting, for all but pre-existing project members) has been revoked.

Greasemonkey now lives at GitHub, both for source code hosting and for issue tracking. We expect that the distributed nature of git should allow freer forking and branching, and easier collaboration with anyone, rather than only the limited set of people who were granted SVN commit access in earlier days.

If you notice any existing links outside of DevjaVu pointing into it, please let us know at the greasemonkey-dev mailing list. If you see someone suggesting the DevjaVu site, please correct them and let everyone involved know that GitHub is now the official code and issue host for Greasemonkey.

Sunday, August 23, 2009

It's been a long time since I've been working on Greasemonkey actively. During the time I was away Johan Sundström and Anthony Lieuallen picked up the slack and did the last few releases without my help.

So I've decided to officially hand over the reins to them. What this means practically is that they will be the ones accepting patches, doing releases, and tending the bugs. I also hope that this change will reinvigorate the project, as it has been moving slowly for some time.

I'll still be lurking of course, but Johan and Anthony will be responsible for day-to-day administration now. I know they'll do a great job.