Reddit programmer charged with massive data theft

A former employee of Reddit has been accused of hacking into the computer systems of the Massachusetts Institute of Technology and downloading almost 5 million scholarly documents from a nonprofit archive service.

Aaron Swartz, a 24-year-old researcher in Harvard University's Center for Ethics, broke into a locked computer-wiring closet in an MIT basement and used a switch there to gain unauthorized access the college's network, federal prosecutors alleged Tuesday. He then downloaded 4.8 million articles from JSTOR, an online archive of more than 1,000 academic journals, according to an indictment filed in US District Court in Boston.

“As JSTOR, and then MIT, became aware of these efforts to steal a vast proportion of JSTOR's archive, each took steps to block the flow of articles to Swartz's computer and thus to prevent him from redistributing them,” the court document stated. “Swartz, in turn, repeatedly altered the appearance of his Acer laptop and the apparent source of his automated demands to get around JSTOR's and MIT's blocks against his computer.”

Attempts to reach Swartz for comment weren't immediately successful. According to his personal website, he is a cofounder of the social news website Reddit, although many people dispute this characterization. He is also the author of numerous articles on a variety of topics including “the corrupting influence of big money on institutions.”

Members of Demand Progress, a nonprofit political action group Swartz founded, criticized the indictment.

“This makes no sense,” the group's executive director, David Segal, said in a statement. “It’s like trying to put someone in jail for allegedly checking too many books out of the library.”

When JSTOR blocked the MIT IP address Swartz used in September, for example, the Harvard fellow allegedly incremented a single digit and resumed his wholesale downloading binge, which was streamlined with a custom Python script. JSTOR at times responded by blocking huge ranges of IP addresses, causing legitimate JSTOR users at MIT to be denied access.

Eventually, MIT responded by blocking the MAC address of his Acer laptop, so Swartz allegedly spoofed the digital serial number, again by incrementing a single character of the address.

According to authorities, Swartz hid the laptop and a battery of external hard drives under a box in the wiring closet so they wouldn't be noticed by anyone who entered the enclosure. He then periodically swapped out the drives, taking pains to mask his face with a bicycle helmet to evade identification.

Of the 4.8 million documents allegedly downloaded, about 1.7 million of them were made available for purchase by independent publishers. Prosecutors said Swartz planned to dump the huge stash on one or more file-sharing sites.

Swartz was charged with computer intrusion, fraud, and data theft. If convicted, he faces a maximum of 35 years in prison, restitution and forfeiture, and a fine of $1 million. A PDF of the indictment is here. ®

This post was updated to clarify Swartz's position at Reddit and to add comments from Demand Progress.