Saturday, October 29, 2016

Slow transfer times when creating a local npm package mirror

On my current project, the client wants developers to install packages from local mirrors of Ubuntu, PyPI, and npm. Mirroring the default Ubuntu repositories is easy with apt-mirror, and PyPI is just as easy to mirror with Bandersnatch. Mirroring the npm registry, however, is not so easy. I followed guides that recommend building a mirror with CouchDB, but I couldn't get that method to work. Instead, I set up Sinopia as an npm cache server: every time I install an npm package locally, it is also saved in Sinopia, and the next time I install the same package, it comes from the Sinopia cache instead of from the npm site.
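To make the caching behaviour concrete, here is a toy Python model of a pull-through cache. It is a conceptual sketch under stated assumptions, not Sinopia's actual code: the class name `PullThroughCache` is hypothetical, and the `upstream` callback merely stands in for a request to the public npm registry.

```python
from typing import Callable, Dict


class PullThroughCache:
    """Toy model of a caching registry such as Sinopia (hypothetical, not its real code).

    The first request for a package goes upstream and the result is stored;
    later requests for the same package are served from the local store.
    """

    def __init__(self, upstream: Callable[[str], bytes]) -> None:
        self._upstream = upstream          # stand-in for the public npm registry
        self._store: Dict[str, bytes] = {}
        self.upstream_calls = 0

    def get(self, name: str) -> bytes:
        if name not in self._store:
            # Cache miss: fetch from upstream once and keep a local copy.
            self.upstream_calls += 1
            self._store[name] = self._upstream(name)
        # Cache hit (or freshly stored copy): serve locally.
        return self._store[name]

    def __contains__(self, name: str) -> bool:
        return name in self._store


if __name__ == "__main__":
    cache = PullThroughCache(lambda name: f"tarball for {name}".encode())
    cache.get("express")          # miss: goes upstream
    cache.get("express")          # hit: served from the cache
    print(cache.upstream_calls)   # 1
    print("lodash" in cache)      # False: never requested, so never cached
```

The last line shows the limitation described in the next paragraph: a package that nobody has requested is never fetched, so the cache only fills as packages are installed.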

The problem with using Sinopia as a full npm mirror is that it does not download packages on its own: a package is cached only after someone installs it with npm install pkgname. I got a list of all npm packages from http://skimdb.npmjs.com/registry/_all_docs and parsed the file to extract just the package names. I then wrote a bash script to download those packages from npm (so that they are stored in Sinopia).
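A minimal Python sketch of this list-and-install approach follows. It is not the author's bash script. It assumes the `_all_docs` response has been saved locally and that Sinopia is listening on its default address, `http://localhost:4873/`. The function names `package_names`, `install_command`, and `mirror`, and the sample data, are hypothetical. CouchDB's `_all_docs` response holds a `rows` array whose entries carry the document `id` (here, the package name); documents whose ids begin with `_design/` are CouchDB design documents, not packages, so they are skipped.

```python
import json
import subprocess
from typing import List

# Assumption: Sinopia runs locally on its default port.
SINOPIA_REGISTRY = "http://localhost:4873/"


def package_names(all_docs: dict) -> List[str]:
    """Extract package names from a CouchDB _all_docs response."""
    names = []
    for row in all_docs.get("rows", []):
        doc_id = row.get("id", "")
        # Skip CouchDB design documents; every other id is a package name.
        if doc_id and not doc_id.startswith("_design/"):
            names.append(doc_id)
    return names


def install_command(name: str, registry: str = SINOPIA_REGISTRY) -> List[str]:
    # Installing through Sinopia makes it fetch the package upstream and cache it.
    return ["npm", "install", "--registry", registry, name]


def mirror(names: List[str], dry_run: bool = True) -> None:
    """Request each package through Sinopia, one npm process at a time."""
    for name in names:
        cmd = install_command(name)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=False)  # keep going if one package fails


if __name__ == "__main__":
    # Hypothetical sample in the _all_docs shape; a real run would instead use
    # json.load() on the saved response file.
    sample = json.loads(
        '{"total_rows": 3, "offset": 0, "rows": ['
        '{"id": "_design/app", "key": "_design/app", "value": {"rev": "1-a"}},'
        '{"id": "express", "key": "express", "value": {"rev": "2-b"}},'
        '{"id": "lodash", "key": "lodash", "value": {"rev": "3-c"}}]}'
    )
    mirror(package_names(sample))  # dry run: prints the two npm commands
```

With `dry_run=False`, each package is installed sequentially through Sinopia; the dry-run default only prints the commands so the list can be checked first.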

When I mirrored PyPI with Bandersnatch in October 2016, downloading 380 GB took about 1.5 days on a 500 Mbit/s connection. Mirroring the Ubuntu 14.04 and 16.04 repositories takes about 300 GB and almost one day each. With my bash script, however, I am getting only about 100 MB per hour. Since npm is currently 1.2 TB in size, a complete npm mirror would take roughly 500 days (about 12,000 hours) at this rate. Why is npm so much slower to mirror than the Ubuntu repositories and PyPI?
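The gap can be made explicit with a little arithmetic. This Python sketch uses only the figures stated in the paragraph above and assumes decimal units (1 GB = 1000³ bytes); the helper names are hypothetical.

```python
MB = 1000 ** 2  # decimal units, as bandwidth and disk sizes are usually quoted
GB = 1000 ** 3


def link_bytes_per_hour(mbit_per_s: float) -> float:
    """Theoretical ceiling of a link: megabits per second -> bytes per hour."""
    return mbit_per_s * 1_000_000 / 8 * 3600


def observed_bytes_per_hour(size_bytes: float, hours: float) -> float:
    """Average throughput actually achieved by a completed sync."""
    return size_bytes / hours


def hours_to_transfer(size_bytes: float, bytes_per_hour: float) -> float:
    return size_bytes / bytes_per_hour


if __name__ == "__main__":
    link = link_bytes_per_hour(500)                     # 500 Mbit/s connection
    pypi = observed_bytes_per_hour(380 * GB, 1.5 * 24)  # 380 GB in about 1.5 days
    ubuntu = observed_bytes_per_hour(300 * GB, 24)      # 300 GB in about 1 day
    npm = 100 * MB                                      # about 100 MB per hour

    for label, rate in [("link ceiling", link), ("PyPI", pypi),
                        ("Ubuntu", ubuntu), ("npm", npm)]:
        print(f"{label:<12} {rate / GB:6.1f} GB/h")

    npm_days = hours_to_transfer(1200 * GB, npm) / 24   # npm at about 1.2 TB
    print(f"full npm mirror at this rate: {npm_days:.0f} days")  # 500 days
    print(f"PyPI sync was {pypi / npm:.0f}x faster")             # 106x
```

The numbers show that the npm transfer is nowhere near the 500 Mbit/s ceiling (about 225 GB/h): the PyPI sync averaged roughly 10.6 GB/h, while the npm script manages about 0.1 GB/h.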