How to use the Virtual Machine for Researchers

Some researchers that are working with the Internet Archive, such as those at University of Massachusetts, have wanted closer access to some of our collections. We are learning how to support this type of “on-campus” use of the collections. This post is to document how to use these machines.

Who can have access?

This is for joint projects with the archive, usually some academic program often funded by NSF. So this is not a general offering, but more of a special case thing. Most use the collections by downloading materials to their home machines. We have tools to help with this, and use “GNU Parallel” to make it go fast.

How to get an account?

Is there an agreement? Yes, there usually is. This is usually administered by Alexis Rossi. All in all, these are shared machines, so please be respectful of others data and use of the machines.

How do I get access to the VM? To get an account you will need to forward a public SSH key to Jake Johnson. Please follow the steps below for more details.

You will be prompted to enter a filename which your private SSH key will be saved to. Use something like id_rsa.{username}@researcher0.fnf.archive.org, again replacing {username} with your username that you will be using to login to the VM):

Enter file in which to save the key (~/.ssh/id_rsa): id_rsa.{username}@researcher0.fnf.archive.org

You will be prompted again to enter a passphrase. Enter a passphrase, and continue.

Adding your public key to the VM

Forward your public key to Jake Johnson. He will create a user for you, and add your public key to the VM. Once you receive notification that your user has been created and your key successfully added to the VM, proceed to the next step.

Logging into the VM via SSH

You can now use your private key to login into the VM with the following command:

(Note: replace {email%40example.com} with the email address associated with your archive.org account (encoding @ as %40), and {private} with the value of your logged-in-sig cookie.)

You can retrieve your logged-in-sig cookie using the following steps:

In Firefox , go to archive.org and log in with your account

Go to Firefox > Preferences

Click on the Privacy tab

Select “Use custom settings for History” in drop down menu in the history section

Click the “Show cookies” button

Find archive.org in the list of cookies and expand to show options

Select the logged-in-sig cookie. The long string in the “Content:” field is the value of your logged-in-sig cookie. This is the value that you will need for your wget command (specifically, replacing {private} in the --header flag mentioned above).

2 Responses to How to use the Virtual Machine for Researchers

It is not quite clear if the researchers will be using GNU Parallel for their research, but if they are could you help them learn ‘parallel –bibtex’? Unfortunately it seems researchers often forget the citation of GNU Parallel.