Building a botnet on PyPi

An update — September 2017

A week or so ago, some students applied this concept to the idea of typosquatting (registering malicious packages with names similar to popular libraries). By getting a university to issue a security notice, they generated some interest, which finally resulted in some changes to pypi/warehouse to address these issues.

I decided to take another look at the download figures for my packages, and see what damage my malicious alter-ego could have wreaked.

Across the 12 system module packages I’m hosting, I’m getting on average 1.5 thousand downloads per day, via pip. This adds up to 491,292 downloads so far this year. I’m hoping to hit 500k downloads before my packages are deleted!

By package, the download ratios pretty much match the numbers from May:

There’s a plan to delete my fake packages now that restrictions have been added to prevent this sort of attack, but it was fun while it lasted!

Intro

At a London Python Dojo in October last year, we discovered that PyPI allows packages to be registered with builtin module names.
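To get a feel for which names are exposed, here is a minimal sketch that checks whether a candidate package name collides with a module shipped with the interpreter. It assumes Python 3.10+ for sys.stdlib_module_names and falls back to the smaller sys.builtin_module_names on older versions; is_system_module_name is a hypothetical helper for illustration, not part of pip or PyPI.

```python
import sys

# Names of modules shipped with the interpreter. sys.stdlib_module_names
# only exists on Python 3.10+, so fall back to the compiled-in builtins.
SYSTEM_MODULES = frozenset(
    getattr(sys, "stdlib_module_names", sys.builtin_module_names)
)


def is_system_module_name(package_name):
    """Return True if package_name matches a system module name."""
    return package_name.strip().lower() in SYSTEM_MODULES


if __name__ == "__main__":
    for name in ("sys", "requests"):
        print(name, is_system_module_name(name))
```

Any name this flags is one that somebody could plausibly try to pip install by mistake.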

So what? you might ask. Who would pip install a system package? Well the story goes something like this:

How effective is this attack vector?

Some of the downloads will be people using custom scrapers, and others may be automated build jobs running over and over, so I used some tactics to gauge the quality of this data:

The PyPI download logs include a column, installer.name, which seems equivalent to an HTTP user agent string. By only selecting rows where installer.name is pip, we’re more likely to be counting actual installs rather than scrapers or other bots.

Another column, system.release, tracks very high-level system version information (for example 4.1.13-18.26.amzn1.x86_64). By including this in the counts, we can see that lots of different types of setups are downloading these packages, suggesting it’s not just a few bots scraping the site. 3.1k different system versions have downloaded my packages this year, compared with 33k total unique versions across the whole of PyPI.
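The two filters above can be sketched in Python over a few made-up rows. This is only an illustration of the counting logic, not the actual query: the rows, the flattened key names (installer_name, system_release), the example-pkg project name and the summarise helper are all assumptions.

```python
# Hypothetical rows shaped like the download-log columns described above.
rows = [
    {"project": "example-pkg", "installer_name": "pip",
     "system_release": "4.1.13-18.26.amzn1.x86_64"},
    {"project": "example-pkg", "installer_name": "pip",
     "system_release": "4.1.13-18.26.amzn1.x86_64"},
    {"project": "example-pkg", "installer_name": "pip",
     "system_release": "4.4.0-93-generic"},
    {"project": "example-pkg", "installer_name": "some-scraper",
     "system_release": ""},
    {"project": "example-pkg", "installer_name": None,
     "system_release": None},
]


def summarise(rows, installer="pip"):
    """Count downloads made by `installer` and the distinct system releases seen."""
    # Keep only rows that look like real installs.
    installs = [r for r in rows if r["installer_name"] == installer]
    # Distinct, non-empty system.release values among those installs.
    releases = {r["system_release"] for r in installs if r["system_release"]}
    return len(installs), len(releases)


downloads, distinct_releases = summarise(rows)
print(downloads, distinct_releases)
```

A high number of distinct releases relative to downloads points towards many different machines rather than one bot hammering the index.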

The query I used is here:

What now?

I never actually received a reply to my email, so in January I raised an issue on the official PyPI GitHub issue tracker. This also got no reply.

I’m currently squatting all the system package names that seem most at risk, and doing so with benign packages, so I don’t see much risk in disclosing this now.