Oncotator Download

Default Datasource Corpus Download (April 5, 2016)

Please note that this corpus should be used with Oncotator 1.4.x.x and above. Uniprot AA Pos annotations will not function properly with Oncotator 1.3.x.x and below.

Transcript override lists

We highly recommend that you download and use one of the transcript override lists below, especially for clinical applications of Oncotator. When running Oncotator, provide one of the files below with the -c parameter.

Download UniProt Exact Match: for GENCODE v19. This list gives selection priority to transcripts with protein sequences that match the UniProt protein sequence exactly. This file can also be found in the Oncotator download at test/testdata/tx_exact_uniprot_matches.txt.

Download UniProt Exact Match + Clinical: for GENCODE v19. This list gives priority to known clinical protein changes. It is a modification of the UniProt Exact Match list (above). For more information about how this list was generated, please see the PowerPoint presentation here.

The Oncotator and default datasource corpus packages are simple tar files that can be expanded using the following commands:

This will produce two directories called oncotator-1.5.1.0 and oncotator_v1_ds_Dec112014, respectively. Move to the oncotator-1.5.1.0 directory by doing:

$ cd oncotator-1.5.1.0
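As a rough illustration of the unpacking step, here is a minimal sketch of a helper that expands a gzipped tar file and reports the top-level directory it produced. The function name expand_archive is hypothetical and not part of the Oncotator distribution, and the archive file names in the usage comment are placeholders for whatever you actually downloaded.

```shell
#!/usr/bin/env bash
# expand_archive: unpack a gzipped tar file into a destination directory
# and print the unique top-level entries stored in the archive.
# Hypothetical helper for illustration; not shipped with Oncotator.
expand_archive() {
    local archive="$1"
    local dest="${2:-.}"
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    mkdir -p "$dest"
    # -x extract, -z gunzip, -f archive file, -C target directory
    tar -xzf "$archive" -C "$dest"
    # Show which top-level directories the archive contains
    tar -tzf "$archive" | cut -d/ -f1 | sort -u
}

# Example (placeholder file names, adjust to your downloads):
#   expand_archive path/to/oncotator-package.tar.gz
#   expand_archive path/to/datasource-corpus.tar.gz
```

Printing the top-level names is handy because the directory name inside the corpus archive does not always match the archive file name.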

2. Set up your Python environment and install dependencies

See the article on platform requirements for a full list of dependencies. This tutorial will show you how to use the virtual environment script we provide to set everything up automagically, and this tutorial will show you how to install dependencies manually if needed (or preferred).

3. Install Oncotator

Once you have installed all the necessary dependencies listed above, simply run the standard Python install script which is included with the Oncotator distribution.

$ python setup.py install

Two binaries (executable program files) named oncotator and initializeDatasource respectively will be installed into your Python's bin/ directory. You can test that they were installed by running e.g.:

$ oncotator -h

to invoke the help / usage instructions. You can also do a test run of Oncotator on the Patient0.snp.maf.txt file provided with the Oncotator distribution (in the test/testdata/maflite/ directory) with the following command:

I installed Oncotator and got an error: ERROR: Could not load pysam. Some features will be disabled (e.g. COSMIC annotations) and may cause Oncotator to fail. No handlers could be found for logger "root". I used the virtual environment script and the pysam-0.7.5 package was installed successfully. I am confused about why I get this error when I test Oncotator. Can you give me some suggestions?

We love having the local installation of Oncotator because of the improved ease of use and convenience. We do trip ourselves up from time to time when we forget whether the dozens of output files contain canonical or best effect annotations, that is, whether the --tx-mode flag was set to "EFFECT" or not. Can you add as a future enhancement a text string in the output file to label the output as either Canonical or Best Effect? The text string does not have to be in the column headers but wherever it is sensible for you to put it.

Thank you @Geraldine_VdAuwera. While the run logs are excellent records for when, where, and how the --tx-mode flag was set, all is lost when the Oncotator output files get passed around to other people in our group or to other organizations, because we would rather not send the run logs to those people.
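Until something like this exists in Oncotator itself, one possible workaround is to stamp the output file after the run. The sketch below is only an illustration: label_tx_mode is a hypothetical helper, the "## tx-mode:" comment format is made up for this example, and it assumes that whatever reads the file downstream tolerates an extra leading comment line.

```shell
#!/usr/bin/env bash
# label_tx_mode: write a copy of an annotated output file with a leading
# comment line that records the transcript mode used for the run.
# Hypothetical helper; the comment format is an assumption, not an
# Oncotator convention.
label_tx_mode() {
    local input="$1"
    local mode="$2"      # e.g. CANONICAL or EFFECT
    local output="$3"
    {
        printf '## tx-mode: %s\n' "$mode"
        cat "$input"
    } > "$output"
}

# Example (placeholder file names):
#   label_tx_mode sample.annotated.maf EFFECT sample.annotated.labeled.maf
```

Writing to a new file rather than editing in place keeps the original output untouched in case a downstream tool rejects the extra line.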

Hi, when I run oncotator, I keep getting the error "pkg_resources.DistributionNotFound: natsort"
I am new to the command line, but I have checked that the natsort package is present. Any help would be greatly appreciated. Thanks.

@np3 Though I have newer packages, that is very similar to my configuration. Usually the issue here is a version of distribute or setuptools that is incompatible with the version of Python. Are you using a virtual environment? In other words, did you run scripts/create_oncotator_venv.sh? Apologies for the basic questions. Also, use @LeeTL1220 to get a faster response.

@LeeTL1220 Thanks for the info. I previously used "pip install virtualenv". I now tried your suggestion for installing the VE. I also uninstalled distribute before and after doing so, and also reinstalled it at the end. I could not get oncotator to work at any of these steps. I should say I am very new to Linux, so any basic advice is appreciated.

Btw, I used the command "./oncotator -h" from the "bin" directory. Is this correct?

The next step would be to try to use the new Corpus. To do so, do I need to uninstall/remove the old one?

Having spent a few days getting Oncotator to work, I want to share my solution. I think the main problem stems from the fact that Oncotator uses distribute-0.6.15, which does not understand the wheel format (the new Python packaging). Therefore, if you happen to have some of the required packages installed as dist-info rather than egg-info, you are likely to run into trouble (check your lib/python2.7/site-packages/).
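To make the site-packages check concrete, here is a minimal sketch that lists which metadata directories are dist-info and which are egg-info. The function name list_install_formats is hypothetical, and the path in the usage comment assumes an activated virtualenv with Python 2.7; adjust it to your own layout.

```shell
#!/usr/bin/env bash
# list_install_formats: for a given site-packages directory, print one line
# per package metadata directory, tagged as dist-info or egg-info.
# Hypothetical helper for illustration only.
list_install_formats() {
    local site="$1"
    local entry name
    for entry in "$site"/*.dist-info "$site"/*.egg-info; do
        [ -e "$entry" ] || continue   # skip patterns that matched nothing
        name=$(basename "$entry")
        case "$name" in
            *.dist-info) printf '%s\t%s\n' "dist-info" "$name" ;;
            *.egg-info)  printf '%s\t%s\n' "egg-info" "$name" ;;
        esac
    done
}

# Example (assumes an activated virtualenv on Python 2.7):
#   list_install_formats "$VIRTUAL_ENV/lib/python2.7/site-packages"
```

Any required package that shows up under dist-info would be a candidate for the trouble described above.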

We are running SL 6.4 on our cluster so no Python 2.7 in the distro and we are using modules.
1/ installed python/2.7.9; created a module for python/2.7.9
2/ installed numpy and scipy with Intel MKL support; after that

I unzipped oncotator_v1_ds_Jan262015.tar.gz and found that the folder name was oncotator_v1_ds_Jan262014 instead of oncotator_v1_ds_Jan262015. Was it simply a typo?

I found the same thing but was able to run Oncotator with the new corpus to get the upgraded 1000 Genomes and dbSNP data. Most likely the folder name had a typo, as is usually the case at the beginning of any year.

I recently downloaded Oncotator in order to use it on open-access TCGA MAF files. I kept only the first 34 columns of the MAF file, in order to remove any additional columns that may have been added by one of the TCGA data centers, and tried to input it to Oncotator using the --input_format=MAFLITE option. It seems like the two "dbSNP" columns in the TCGA MAF format caused a problem, and I got this error:

Removing the dbSNP_RS and dbSNP_Val_Status columns eliminated this problem.
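For anyone who wants to script this workaround, here is a minimal sketch that drops columns by header name from a tab-delimited MAF. The function name drop_maf_columns is hypothetical, and the sketch assumes the first non-comment line is the header row.

```shell
#!/usr/bin/env bash
# drop_maf_columns: remove named columns from a tab-delimited file whose
# first non-comment line is a header row. Lines starting with "#" are
# passed through unchanged. Hypothetical helper for illustration.
drop_maf_columns() {
    local input="$1"
    local names="$2"     # comma-separated header names to drop
    awk -F'\t' -v OFS='\t' -v names="$names" '
        BEGIN {
            n = split(names, list, ",")
            for (i = 1; i <= n; i++) drop[list[i]] = 1
        }
        /^#/ { print; next }
        !header_seen {
            # Record the positions of the columns we want to keep
            header_seen = 1
            for (i = 1; i <= NF; i++) if (!($i in drop)) keep[++k] = i
        }
        {
            line = ""
            for (j = 1; j <= k; j++) line = line (j > 1 ? OFS : "") $(keep[j])
            print line
        }
    ' "$input"
}

# Example (placeholder file names):
#   cut -f1-34 tcga.maf > tcga.first34.maf
#   drop_maf_columns tcga.first34.maf "dbSNP_RS,dbSNP_Val_Status" > tcga.clean.maf
```

Matching on header names rather than fixed column numbers means the same command still works if a data center has reordered or inserted columns.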

Another issue I noticed is that column 12 in the annotated output ("Tumor_Seq_Allele1") is actually a copy of the preceding column ("Reference_Allele"), while column 13 ("Tumor_Seq_Allele2") is almost always the same as the input column 13, but sometimes different. Also the NCBI_Build in column #4 gets reset to "UNKNOWN" (although column #196 has the correct build information).

I was expecting that columns 1-34 in the original MAF would be the same in the output annotated MAF -- is there a reason why this is not the case?

Hello!
I recently downloaded Oncotator v1.5.3.0 and the newest database bundle (Jan262014) and I cannot annotate my samples. Oncotator runs and parses through my input file, but there are no annotations at the end. Apparently the datasources are not recognized:

I was wondering if I have to initialize one by one the databases from the bundle, or if there is a way of getting oncotator to recognize them since they are already in the right format? I am not sure what I missed in the installation process.
Thank you in advance and I'm sorry for the newbie question!

Hi guys, I downloaded the virtual environment successfully and was able to run the create_oncotator_venv.sh script but I am not able to run - python setup.py install. I tried this inside the virtual environment as well but it gave me the error -

** File "numpy/core/setup.py", line 42, in check_types
File "numpy/core/setup.py", line 293, in check_types
SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel.**

I also tried apt-get install python-dev. I verified that NumPy is installed in Python. Please let me know how to fix this and run Oncotator. I am not sure what I am missing here. Thank you!

@daih You should see the datasource directories in $HOME/GATKOncotatorDatasource/oncotator_v1_ds_Jan262014. For example, the directory $HOME/GATKOncotatorDatasource/oncotator_v1_ds_Jan262014/hgnc/hg19 should exist. Perhaps, you have an extra subdirectory?
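A quick way to test for this is sketched below. It is only an illustration: check_datasource_dir is a hypothetical helper, and it uses hgnc/hg19 as the probe directory simply because that is the example given above.

```shell
#!/usr/bin/env bash
# check_datasource_dir: confirm that a datasource corpus directory contains
# the expected <datasource>/<genome_build> layout, and look one level down
# for an accidentally nested copy. Hypothetical helper for illustration.
# Return codes: 0 = layout found, 2 = found one level too deep, 1 = not found.
check_datasource_dir() {
    local db_dir="$1"
    local probe="${2:-hgnc/hg19}"
    local sub
    if [ -d "$db_dir/$probe" ]; then
        echo "OK: $db_dir"
        return 0
    fi
    for sub in "$db_dir"/*/; do
        if [ -d "${sub}${probe}" ]; then
            echo "NESTED: use ${sub%/} instead"
            return 2
        fi
    done
    echo "MISSING: $probe not found under $db_dir" >&2
    return 1
}

# Example:
#   check_datasource_dir "$HOME/GATKOncotatorDatasource/oncotator_v1_ds_Jan262014"
```

If it reports NESTED, pointing Oncotator at the inner directory (or moving its contents up one level) should make the datasources visible.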

Unfortunately there are still issues installing Oncotator. The principal issue with v1.8.0.0 is that it requires pysam 0.7.5, which is not available from PyPI. As a result, the Oncotator setup gives an error about missing pysam. The dependencies script installs everything except the required pysam and in fact needlessly upgrades to the latest pysam version, even if you happened to have the right one in the first place.
Here is my recipe:
1/ download and install pysam 0.7.5: https://pysam.googlecode.com/files/pysam-0.7.5.tar.gz
2/ install dependencies as listed in scripts/create_oncotator_venv.sh BUT change ngslib to "ngslib==1.1.9" where necessary, because installing a newer version will upgrade your pysam, which you want to avoid
3/ install oncotator itself (python setup.py install)
Hope this helps

@LeeTL1220 I am not using any BigWig datasources like you mentioned so I will move forward and ignore the message. I did a little more testing today and ngslib 1.1.18 and 1.1.19 both give the same error. ngslib 1.1.20 imports with no error. Thanks for the help!

I noticed the output from oncotator removes the chr prefix from the chromosome location when outputting to VCF. I understand that the default output is a MAF format that wouldn't have the chr prefix. Would it be possible to have an option in oncotator that would keep that CHROM field intact?
Thanks!
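In the meantime, the prefix can be put back with a small post-processing step. This is a sketch of a workaround, not an Oncotator option: add_chr_prefix is a hypothetical helper that reads a VCF on standard input and rewrites only the CHROM column of the data lines.

```shell
#!/usr/bin/env bash
# add_chr_prefix: prepend "chr" to the CHROM column of a VCF stream.
# Header lines (starting with "#") and names that already begin with
# "chr" are left untouched. Hypothetical helper for illustration.
add_chr_prefix() {
    awk -F'\t' -v OFS='\t' '
        /^#/ { print; next }
        $1 !~ /^chr/ { $1 = "chr" $1 }
        { print }
    '
}

# Example (placeholder file names):
#   add_chr_prefix < annotated.vcf > annotated.chr.vcf
```

Note that this does not touch ##contig header lines or translate mitochondrial naming (MT versus chrM), so those would need separate handling if your downstream tools check them.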

Hi, I have followed your instructions and successfully created the virtualenv for oncotator-1.9.8.0 on Mac OS X, but I am having trouble at the $ python setup.py install step. It doesn't look like a dependency error, but I am not sure. I have searched a lot and still have no solution. The error lines are:
ld: library not found for -lcrypto
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: Setup script exited with error: command 'cc' failed with exit status 1