Project PRJEB99111 has 147 samples. I want to download the metadata (age, sex, disease status, etc.) of each sample, not the FASTQ files. The only way I can download the metadata is by downloading the XML file of each sample accession one by one. Is there a way to bulk download all 147 metadata files? I can work with XML files if I have to.

I had the same question and came up with a solution using datamash (note that you may have to install it first, for example with Homebrew if you are on a Mac). Try out this code, which builds on the original solution above.

I wanted to ask about the last part of the script you have written, "xsltproc transform.xsl".

If I run your whole script, I get an error that transform.xsl is not found: "warning: failed to load external entity "transform.xsl"
cannot parse transform.xsl". If I run it step by step, results are produced and I seem to get the correct XML, but the transformer is not working.

I am new to Unix, but I understand the script up to the xsltproc part. When I run "xsltproc -h", there is no option called transform.xsl. How does the module work? Does it need to be installed separately?

This is not true. You can easily download an XML file containing all of the attributes of all the biosamples from NCBI. Since the procedure may also be useful in other contexts, I will describe it step by step.

First go to the page of the project (the BioProject database in NCBI parlance):

Next, get a list of all biosamples which are linked to this project. There is a section entitled "Related information" on the right side of the page. To get the list of biosamples, click on the hyperlink "Biosample".

This will open a new page which lists the first 20 biosamples in the project. The URL of that page is:

At the top of this page (on the right side) is a pull-down menu entitled "Send to:". Click on this menu, then select "File", then select the format "Full XML (text)", and finally click on the button "Create File". Store the XML file on your local disk and parse it with your favorite XML tool.
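
As a minimal sketch of the parsing step, the following Python uses only the standard library to flatten the downloaded file into one row per sample. It assumes the file has the usual BioSample layout (a BioSampleSet root with BioSample elements, each holding Attribute elements that carry an attribute_name and sometimes a harmonized_name). The helper names biosamples_to_rows and write_tsv are made up for this illustration.

```python
import csv
import xml.etree.ElementTree as ET


def biosamples_to_rows(xml_text):
    """Return one dict per <BioSample>, keyed by attribute name."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() also matches the root, so a file holding a single BioSample works too.
    for sample in root.iter("BioSample"):
        row = {"accession": sample.get("accession", "")}
        for attr in sample.iter("Attribute"):
            # Prefer the harmonized name when present; fall back to the submitter's name.
            name = attr.get("harmonized_name") or attr.get("attribute_name")
            if name:
                row[name] = (attr.text or "").strip()
        rows.append(row)
    return rows


def write_tsv(rows, out):
    """Write rows as a tab-separated table; missing values are left empty."""
    columns = ["accession"]
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t", restval="")
    writer.writeheader()
    writer.writerows(rows)
```

To use it, read the saved file into a string, pass it to biosamples_to_rows, and hand the result to write_tsv together with an open output file (or sys.stdout). Samples that lack an attribute simply get an empty cell in that column.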

That is what I was looking for. Usually bioprojects in NCBI contain a file with all metadata. This file is available in other bioprojects but I couldn't find it in this project. I didn't know about the option you described. Very simple yet useful. Many thanks.

In my opinion, NCBI Entrez/Eutils is more versatile than EBI for downloads like this. If you want to stick with EBI, you can run the loop over all entries of the project on your local computer. There are only 147 samples. Since tasks like this are usually run only once, do not worry too much about computational efficiency.
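
A local loop of that kind could look like the sketch below. It assumes the sample accessions are already in a list and that each record can be fetched from the ENA Browser API at https://www.ebi.ac.uk/ena/browser/api/xml/ followed by the accession. The function names, the pause length, and the retry count are illustrative choices, not anything EBI prescribes. Fetching one record at a time with a short pause and a few retries is slow but gentle on the server, which is acceptable for 147 samples.

```python
import time
import urllib.request

# Assumed endpoint: the ENA Browser API returns the XML record for an accession.
ENA_XML_URL = "https://www.ebi.ac.uk/ena/browser/api/xml/{accession}"


def fetch_xml(accession, timeout=30):
    """Download the XML record of one accession and return it as text."""
    url = ENA_XML_URL.format(accession=accession)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def download_all(accessions, fetch=fetch_xml, pause=1.0, retries=3):
    """Fetch every accession in turn; return (records, failed_accessions)."""
    records, failed = {}, []
    for accession in accessions:
        for attempt in range(1, retries + 1):
            try:
                records[accession] = fetch(accession)
                break
            except OSError as error:
                # URLError, SSL errors and socket timeouts are all OSError subclasses.
                print(f"{accession}: attempt {attempt} failed ({error})")
                time.sleep(pause * attempt)  # wait a little longer each time
        else:
            # The for-else branch runs only when no attempt succeeded.
            failed.append(accession)
        time.sleep(pause)  # small delay between requests
    return records, failed
```

With a list of the 147 sample accessions, download_all returns the XML text per sample plus a list of any accessions that still failed and are worth retrying later.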

Unfortunately, NCBI does not contain metadata for this project. I get the error "Unable to establish SSL connection" using your code. I have tried Python's requests library, but after one successful XML read the connection fails when I try to read again. You can see my sample code here: python stopped opening xml url, connection closed.