I probably just spewed out a bunch of garbage you don’t understand, so let me fill in the gray areas.

PyPI – This is the package repository for the Python programming language. For example, suppose I came up with a useful library for listing the contents of a web page and called it webDir. Suppose I wanted to share it. PyPI is where I would upload it, and other developers who had a reason to view the contents of a web page would be able to download it and use it as part of their own programs.

Typosquatting – This is a popular way to get someone to trust something untrustworthy. It relies on the fact that people are imperfect and sometimes type in words incorrectly. Predict how people might do so, and you can create a web domain to intercept those typos and do … things. It’s been used for web pages quite frequently, but in the case of PyPI it can also be used for software libraries. Suppose you created a library called webdir (lower case D) that did the same thing but ALSO installed a virus. All you would need would be for a few unsuspecting developers to request the wrong package the right way a few times to get entrée to some interesting stuff.
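To make the "predict how people might do so" part concrete, here is a rough sketch of how someone might enumerate likely mistypings of a package name. The function name typo_variants and the three typo rules (swapped neighbors, a dropped letter, a case slip like webDir/webdir) are my own assumptions for illustration, not a description of what any real attacker ran.

```python
def typo_variants(name):
    """Return plausible mistypings of a package name (illustrative helper)."""
    variants = set()

    # Swap each pair of adjacent characters, e.g. lxml -> lmxl
    for i in range(len(name) - 1):
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])

    # Drop each character in turn, e.g. lxml -> lxl
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])

    # Case slip, mirroring the webDir / webdir example above
    variants.add(name.lower())

    # The correctly spelled name is not a typo
    variants.discard(name)
    return sorted(variants)


if __name__ == "__main__":
    print(typo_variants("lxml"))
    print(typo_variants("webDir"))
```

Even this crude version turns one four-letter name into a handful of candidates, which is why squatting a few of them is cheap.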

So what happened is that over half a dozen packages similar to webdir got uploaded to PyPI and downloaded a few thousand times. They did exactly what the right packages do, but also added some code that – fortunately, this time – didn’t do anything malevolent, but could have without anyone’s knowledge.

Here are the interesting parts.

I read so many Python-oriented blogs. And yet only one has mentioned this.

At least five non-programmer blogs have mentioned this.

What the actual hell?

The Python “foundation” that maintains this repository says that it can’t help it – it has only two volunteers who support this repository, and they aren’t gatekeepers; they just take the crap that gets uploaded at face value and move it on, provided it has all the right tick marks filled out. I could upload Professor Zola to PyPI and it would be okay as long as I filled out all the right forms.

They had no recommendations, no apologies, no plans. All they had were excuses.

I want to mention once again that a software language that forms a major part of the backbone of the greater internet has been infiltrated with trojans, twice, and that the people that maintain it have no plans for preventing it again, nor do the people that use it feel the least bit concerned.

One group of people, by the way, suggested that maybe keys would be good. You know, checksums by another name. The problems with this are:

All packages are already uploaded with properly formed MD5 checksums.

The checksum verifies only that the file has not been modified after it was uploaded by the creator.

So, basically, “This virus is 100% authentic”.
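If that sounds too glib, here is a small sketch of the point using Python's standard hashlib module. The payload bytes and the helper names md5_hex and verify are made up for illustration; the only thing this shows is that a checksum compares a download against whatever the uploader published, nasty bits included.

```python
import hashlib


def md5_hex(data):
    """Return the MD5 digest of some bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def verify(data, published_digest):
    """True if the downloaded bytes match the digest published with them."""
    return md5_hex(data) == published_digest


# Hypothetical malicious upload: the attacker computes the digest
# over their own file, so the digest is perfectly "valid".
malicious_upload = b"def list_dir(url): ...\n# plus something nasty\n"
published_digest = md5_hex(malicious_upload)

# The victim downloads the file unchanged: the check passes.
downloaded = malicious_upload
print(verify(downloaded, published_digest))

# The check only fails if the bytes changed AFTER upload.
tampered = downloaded + b"# altered in transit\n"
print(verify(tampered, published_digest))
```

The first check passes and the second fails, which is exactly backwards from what you would want a malware defense to tell you.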

PyPI can only be trusted if:

All uploads are scanned and validated by people with domain knowledge.

A separate authenticity repository (or something like it) is maintained to track the fingerprints of the legitimate, vetted packages on PyPI.

Somebody works there who knows enough about the ecosystem to get alarmed when seeing ‘lmxl’ uploaded and claiming to be ‘lxml’.
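That last check does not even need a person for the easy cases. Below is a minimal sketch, assuming a hand-maintained list of well-known package names (KNOWN_PACKAGES is a placeholder I made up, as is the 0.7 cutoff), that uses the standard library's difflib to flag a new upload whose name is suspiciously close to an existing one. It is not anything PyPI actually runs, as far as I know.

```python
import difflib

# Placeholder list; a real check would use the repository's own index.
KNOWN_PACKAGES = ["lxml", "requests", "numpy", "urllib3"]


def looks_like_squat(candidate, known=KNOWN_PACKAGES, cutoff=0.7):
    """Return the known package a new name resembles, or None.

    An exact match is the real package, so it is not flagged.
    """
    name = candidate.lower()
    if name in known:
        return None
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for upload in ["lmxl", "lxml", "flask"]:
        print(upload, "->", looks_like_squat(upload))
```

A hit would not prove malice, only that a human should take a look before the package goes live.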

The good news for now is that this time around, the compromise was limited to Python 2 users. The next one will probably not be so limited.

By the way, if you are a sysadmin of a system that uses Python (for example, a RHEL system that uses Yum to manage its own packages), here’s a program that will let you know if you have been compromised.