To cite a GenePattern analysis or visualization module, cite the GenePattern software and the original paper or other source for the module as specified in the module documentation. Documentation for each module is available on the Modules page and in GenePattern (click Help when prompted to enter the module's parameters).

If I am a member of the press, how can I get more information?

What can I do if my job keeps running out of memory or fails with no error on the public server?

Jobs with large datasets, or with parameter settings that increase the computational load on the system, can fail because they run out of memory. Usually you will get an error message stating that the module ran out of memory, but sometimes the failure is "silent": you see that your job failed, but there are no error files. In either case, GenePattern administrators can track down and resolve the problem. They can assign memory settings on a per-user basis, so that your jobs run with the amount of memory your analysis requires.

To have an administrator look into your errors and adjust the memory settings for your jobs, please contact us at gp-help(at)broadinstitute.org, making sure to provide your username and job ID.

Are there other public GenePattern servers?

The public server maintained by the GenePattern development team can be found at http://genepattern.broadinstitute.org. There are also other GenePattern servers maintained by other organizations that have been made publicly available. These organizations include:

SMD: Stanford Microarray Database
The SMD provides an extensive microarray database and has integrated with GenePattern to provide tools for the analysis of the data. More information about SMD can be found at http://smd.stanford.edu and the paper that describes the integration of GenePattern in their environment can be found here. Please send questions and comments regarding SMD to array(at)genome.stanford.edu.

NuGO: NBX
NuGO has developed a Black Box environment that utilizes GenePattern as its preferred analysis tool and has deployed NuGO-modified versions of some GenePattern modules on the GenePattern servers installed on the Black Boxes. For more information about the NuGO NBX please visit http://www.nugo.org/NBX/. The paper that describes the modules NuGO has contributed to GenePattern can be found here. Any comments or questions regarding the NuGO NBX and the NuGO GenePattern modules should be directed to sian.astley(at)bbsrc.ac.uk.

Garvan Institute
The Peter Wills Bioinformatics Centre at the Garvan Institute of Medical Research in Sydney, Australia, has set up a public GenePattern server here.

Is there a version of GenePattern that can run in the cloud?

How do I install (uninstall) GenePattern?

To install GenePattern, go to GenePattern Download and follow the instructions for your operating system.

To uninstall GenePattern, use the utility provided as part of the GenePattern installation. If the GenePattern uninstall utility is unavailable, deleting the GenePattern installation folder removes all GenePattern files other than the desktop icons.

Mac users: If R2.5 is not already installed, GenePattern installs it in the /Library/Frameworks/R.framework/Versions/2.5 folder. Uninstalling GenePattern does not uninstall R. To uninstall R, use the utility provided by R.

How do I upgrade to the latest version of GenePattern without losing my data (such as jobs, uploaded files and modules)?

Install the new version of GenePattern into the same directory as your previous version. Do not uninstall first: uninstalling is unnecessary and will delete your existing modules, pipelines and suites. When you overwrite the previous version:

Existing modules, pipelines and suites are preserved.

The following settings are read from your existing genepattern.properties file and displayed as default values for your new installation: settings for R, Java, Perl, LSID Authority, proxy settings, HSQL database URL and port, and file purge frequency and time.

The default value for the webserver (Tomcat) port used by the GenePattern server is always 8080. If your existing installation uses a different port number, you can specify that port number during the installation.

The require.password setting from your existing genepattern.properties file is preserved in your new installation.

Backup copies of the following configuration files are created: genepattern.properties[.backup (before GenePattern 3.4) or .save (3.4 and up)], permissionMap.xml[.backup], and userGroups.xml[.backup]. To recreate your previous settings after installing GenePattern, compare the saved files with the newly installed files and modify the new files as necessary. Do not replace the newly installed configuration files with the saved copies.

User groups: The userGroups.xml file for GenePattern 3.2 omits the group named Public. In GenePattern 3.2, all users are now in a predefined group named Public. To avoid confusion, do not recreate the group named Public.

R versions: Installing GenePattern 3.1 (or later) installs R2.5 and sets the full path to R2.5. See Using Different Versions of R for information on how to create and/or use GenePattern modules written for other versions of R.

What is the recommended method of upgrading my GenePattern server to 3.3.3 or higher?

Large uploaded data files or output files will significantly slow down your GenePattern upgrade installation. If you have less than approximately 10 GB of data files either uploaded (via the Upload tab) or output by GenePattern jobs, you can just follow the GenePattern server installation instructions. However, if you have more than 10 GB of uploaded data files or output files, we suggest that you:

Move your uploaded files to a non-default location, for instance:

mv <GenePatternServer>/Tomcat/temp <GenePatternServer>_data/temp

Edit the following property in both StartGenePatternServer.lax and genepattern.properties:

I already have R/Perl/Java on my machine. Will the versions of R/Perl/Java that GenePattern installs interfere with these?

Can I configure GenePattern to work with versions of R/Perl/Java other than those installed by GenePattern?

You can configure GenePattern to work with other versions of R/Perl/Java; however, the versions of R, Perl, and Java bundled with GenePattern are the ones that have been fully tested. We cannot guarantee that other versions will work.

Java VM: If you install a GenePattern server without the Java VM, choosing instead to use a Java VM that you have already installed, ensure that the file tools.jar (provided by Sun separately from the JRE and JDK) is on your classpath. When you install a GenePattern server with an included VM, the GenePattern installation does this for you. If this file is not on your classpath, the MatlabComponentRuntime (MCR) Installer fails when you attempt to install a module that requires it.

R versions: GenePattern modules can be written for any version of R. For details on how to specify which version to use, see Using Different Versions of R.

Does GenePattern support the international settings on my computer?

GenePattern supports the Basic Latin character set. Characters other than those in the Basic Latin character set may not be displayed correctly. Asian character sets are not currently supported.

All analysis and visualization modules support the decimal point (.) as the separator between the integral and fractional parts of a decimal number. Using a decimal comma (,) may cause unexpected behavior in some modules.

On the Proxy Settings page, enter the hostname and port of your web proxy server. If you do not know them, contact your IT help desk to get the values. If you need to log into the proxy server, also enter your username and password (these will NOT be saved to a file and will need to be reentered following a server restart next time you want to connect).

Click Save to update the proxy settings.

Click Modules & Pipelines>Install from Repository to install the modules.

When should I choose to install the GenePattern server on a different port than the default 8080?

How do I install GenePattern on a 64-bit Windows machine if I want to use a version before 3.2.2?

When GenePattern is installed on 64-bit Windows systems in the default C:\Program Files (x86) directory, modules fail because some code expects only "C:\Program Files" and truncates that location to "C:\Progra~1". There is a similar bug in ComparativeMarkerSelection. These errors are corrected in the 3.2.2 release of GenePattern. If you do not upgrade to release 3.2.2 or later, the workaround is to reinstall GenePattern in a directory that has no spaces in its name.

Installing MATLAB to "Program Files" or "Program Files (x86)" may cause the MATLAB installation to be incomplete. The installation will not fail, but modules will throw an exception, reporting a missing file and then fail. To fix this manually, re-run the MATLAB installer and choose a new installation directory that contains no spaces.

To do this, you must first uninstall MATLAB; otherwise the installer will only let you "repair" the installation in the current directory. The steps are as follows:

1. Uninstall MATLAB (via Add/Remove Programs).
2. Rerun the .msi, found in GP/patches/...../*.msi, and choose a new install directory with no spaces, such as the default GenePattern install directory on Windows, C:\GenePatternServer.

If you've uninstalled GP and now need to uninstall the MATLAB that was installed via the now deleted GenePattern:

1. Reinstall GenePattern to a directory with no spaces
2. Install MATLAB module
3. Repair installation
4. Run the module to be sure MATLAB installed correctly
5. Use Add/Remove programs to uninstall MATLAB

You might need to reboot to get the system to let you uninstall the MCR files.

Or you can email us at gp-help(at)broadinstitute.org and we can send you the .msi which you can then point the uninstaller to, through Add/Remove Programs.

Why can't I connect to my GenePattern server on Windows 7 or Vista?

On Windows 7 and Vista, the StartGenePatternServer and StopGenePatternServer applications must be run as an administrator. To start or stop the GenePattern server, right-click on StartGenePatternServer.exe or StopGenePatternServer.exe and select Run as administrator.

To launch GenePattern in your browser, you can double-click the GenePatternHome.html icon located with the StartGenePatternServer and StopGenePatternServer icons.

The downloadable installer assumes that GenePattern will run on an individual's PC or laptop and configures it accordingly. A remote client computer cannot reach the necessary files because they are served on URLs that use the server's loopback network interface (127.0.0.1). For shared server use, one setting needs to be changed. Go to the location where GenePattern has been installed on your server and look for the genepattern.properties file in the "resources" directory. Open it with a text editor such as emacs or vi. Look for a property named GenePatternURL and replace the 127.0.0.1 part with the network name of that server. Leave the ':8080/gp' part in place unless you have changed the Tomcat port configuration. Stop and restart the GenePattern server and the URLs should now point to the right location.
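The edit described above can also be scripted. The following is a minimal sketch, not part of GenePattern: the function name and the host name gpserver.example.org are hypothetical, and the sample property value is only an assumption about how the line looks in your file. It changes the loopback address on the GenePatternURL line and leaves every other line untouched.

```python
def point_url_at_host(properties_text, new_host):
    """Replace 127.0.0.1 with new_host, but only on the GenePatternURL line."""
    out = []
    for line in properties_text.splitlines(keepends=True):
        key = line.split("=", 1)[0].strip()
        if key == "GenePatternURL":
            # The port and the /gp path after the address are left as they are.
            line = line.replace("127.0.0.1", new_host)
        out.append(line)
    return "".join(out)


# Hypothetical file contents, for illustration only.
sample = "some.other.setting=127.0.0.1\nGenePatternURL=http://127.0.0.1:8080/gp/\n"
print(point_url_at_host(sample, "gpserver.example.org"), end="")
```

Keep a backup copy of genepattern.properties before writing the result back, and restart the server afterwards as described above.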

Why doesn't clicking StartGenePatternServer launch GenePattern?

The StartGenePatternServer application only starts the server. To access the web client interface for your GenePattern server, click the GenePatternHome.html shortcut icon, or, if you did not install icons in your task bar or on your desktop, GenePatternHome.html can be found at the top level of your GenePattern install directory.

How can I get R to install correctly on my Mac?

Some Mac users have found that the R library is not installing correctly when they try to install GenePattern. Even after making sure that the folder into which GenePattern is installing R has write permissions, upon running a module, they receive the following error message:

java.io.IOException: Cannot run program "/Library/Frameworks/R.framework/Versions/2.5/Resources/bin/R": error=2,
No such file or directory while running R command [/Library/Frameworks/R.framework/Versions/2.5/Resources/bin/R,
--no--save, --quiet, --slave, --no-restore]

This may be a simple GenePattern server configuration problem. First, check that something is installed at that path. Open the Terminal.app and run the following commands:

If there is something installed at this path, then check that the path to R is correctly configured in your GenePattern server. Go to the Administration>Server Settings>Programming Languages GenePattern page and verify that:

R 2.5 Home: /Library/Frameworks/R.framework/Versions/2.5/Resources

If this is configured correctly, you may be able to correct the problem by manually downloading and installing R 2.5.

Yes. If you are running more than one installation of GenePattern on the same machine, you must make sure that the port numbers for the GenePattern server and the HSQL server are unique to each installation. The Tomcat server listens on two ports, 8080 (requests) and 8005 (shutdown) by default, and the HSQL server listens on port 9001. All 3 ports need to be modified on the second copy of Tomcat. For example, you can set the GenePattern server port to 8080 and 8005 on one install and 8081 and 8086 on the other, and set the HSQL port to 9001 on one and 9002 on the other. You can configure these port numbers when you are installing the server.
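If you maintain several installations, a quick way to check the rule above is to list the ports each one uses and look for duplicates. This is only an illustrative sketch: the installation names and the dictionary layout are made up for the example, and the port numbers must be the ones you actually chose during installation.

```python
from collections import defaultdict


def find_port_conflicts(installs):
    """Return {port: [users of that port]} for every port claimed more than once."""
    owners = defaultdict(list)
    for name, ports in installs.items():
        for role, port in ports.items():
            owners[port].append(f"{name}:{role}")
    return {port: who for port, who in owners.items() if len(who) > 1}


# Hypothetical installations; the second one forgot to change the shutdown port.
installs = {
    "gp_a": {"http": 8080, "shutdown": 8005, "hsql": 9001},
    "gp_b": {"http": 8081, "shutdown": 8005, "hsql": 9002},
}
print(find_port_conflicts(installs))
```

An empty result means all three ports are unique across the installations you listed.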

How do I configure GenePattern to work with a queuing system (or grid engine)?

Queuing systems such as the Load Sharing Facility (LSF) and the Sun Grid Engine (SGE) allow computational resources to be used effectively. If you have such a queuing system installed at your site and you have installed a local GenePattern server, you can configure the GenePattern server to work with the queuing system. For instructions on how to do so, see Using a Queuing System.

I'm getting 'error "connection refused"': what is the problem?

A refused connection is most likely due to a proxy issue. If you are behind a proxy or firewall, verify that you have correctly configured GenePattern and/or asked your local system administrator to allow GenePattern access to your machine.

To configure a proxy connection in GenePattern please do the following:

In the GenePattern Web Client, click Administration>Server Settings to display the server settings.

Click Proxy to display the proxy settings.

On the Proxy Settings page, enter the hostname and port of your web proxy server. If you do not know them, contact your IT help desk to get the values. If you need to log into the proxy server, also enter your username and password (these will NOT be saved to a file and will need to be reentered following a server restart next time you want to connect).

Can I use a file path as input for a GenePattern module?

If you install your own GenePattern server, the default setting is not to allow input file paths. To change this, if you have administrator privileges on the server, add or edit the following in your genepattern.properties file:

allow.input.file.paths=true

Then restart your server. This will allow users to input an arbitrary network file path (such as file:///server/directory/file.gct) as the value for an input file parameter. When input file paths are allowed, you can use the server.browse.file.system.root property to set a root directory where the GenePattern server begins browsing for the specified network file path.

Note: On the Broad public server, we prevent users from entering an input file path (file:// URLs) as an input file for a module in order to better secure the machine running the public server.
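For administrators who prefer to script the change to genepattern.properties, here is a minimal sketch of an "add or edit" helper. It is not a GenePattern tool; the function name is hypothetical, and it assumes simple key=value lines (it does not handle the other separators or line continuations that Java properties files allow).

```python
def set_property(properties_text, key, value):
    """Set key=value in properties text, appending the line if the key is absent."""
    lines = properties_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # leave comments alone, even if they mention the key
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


print(set_property("require.password=false\n", "allow.input.file.paths", "true"), end="")
```

Remember that the server must be restarted before the new value takes effect.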

How can I work around a LaunchAnywhere error?

If you tried to install GenePattern on Ubuntu, you may have received an installation error: "An internal LaunchAnywhere application error has occurred and this application cannot proceed. (LAX)" with "java.lang.IllegalArgumentException: Malformed \uxxxx encoding." in the stack trace.

LaunchAnywhere can interfere with the prompt string formatter PS1. In order to work around this problem, you need to use the following command:

$ export PS1=">"
>sudo sh ./GPServer.bin

This is important not only for installing GenePattern on Ubuntu, but also for launching GenePatternServer. Use the command before the GenePatternServer startup command, like so:

32-bit Windows machines only allow you to allocate up to 1.2GB of RAM to processes. 64-bit Windows will allow for more (depending on how much you have installed), but to run memory-intensive Java modules, you must install 64-bit Java and update your GenePattern server's maximum memory allocation (Refer to this information on increasing memory allocation).

Edit genepattern.properties (located in the resources subdirectory of the GenePattern server directory) so that the 64-bit Java installation location is now the Java parameter value: java=C\:/Program Files/Java/jre6/bin/java

Look for the entries noted below in this file and increase these values (for example, double the value) up to the maximum memory size of the machine you are using. (Note: Windows limits the total space available to a process to 2 GB. Some of that is used for overhead, so slightly less is really available to the JRE.)

Why does the Specify File Path or URL option not work in Internet Explorer?

This is a known issue: when users click the Browse Server File System button, the Internet Explorer web browser window (instead of a pop-up window) becomes the file system browser.

If you want to continue using Internet Explorer, you can copy and paste or manually enter the server file path rather than clicking the Browse Server File System button. We recommend using another browser for full functionality.

Yes. Most GenePattern analyses can run on 2-channel or ratio-based data as easily as on single channel or absolute value data. To run 2-channel data in GenePattern, do the following:

Convert your ratio-based data to a GenePattern GCT file. This tab-delimited text file format contains features (genes or probes), samples, and a computed ratio value for each feature in each sample.

GenePattern modules cannot analyze files with missing values. If your data has missing values, one way to address the issue is to use the ImputeMissingValues.KNN module to impute the missing values.

Your data is now in a GCT file that can be analyzed by most GenePattern modules. (If you want to use non-negative matrix factorization (NMF) and your data contains negative values, see the NMF note in the Modules & Pipelines section below.)
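As an illustration of the conversion step, the sketch below writes ratio values in the GCT layout as we understand it (a "#1.2" version line, a line with the row and column counts, a header line beginning with Name and Description, then one tab-delimited row per feature). Treat the layout as an assumption and confirm it against File Formats. The function name and the sample data are hypothetical. The function also refuses missing values, for the reason given above.

```python
import math


def to_gct(sample_names, rows):
    """rows: list of (name, description, [values]). Returns GCT-style text."""
    for name, _desc, values in rows:
        if len(values) != len(sample_names):
            raise ValueError(f"{name}: expected {len(sample_names)} values")
        if any(v is None or math.isnan(v) for v in values):
            raise ValueError(f"{name}: missing value; impute it before writing")
    lines = [
        "#1.2",
        f"{len(rows)}\t{len(sample_names)}",
        "\t".join(["Name", "Description"] + list(sample_names)),
    ]
    for name, desc, values in rows:
        lines.append("\t".join([name, desc] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"


# Hypothetical log ratios for two features in two samples.
print(to_gct(["s1", "s2"], [("g1", "gene one", [0.5, -1.2]),
                            ("g2", "gene two", [0.0, 2.1])]), end="")
```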

Ratio values for cDNA data can be computed using a variety of methods. How the ratios are computed determines whether it is possible to create a class (CLS) file for the cDNA ratio data. For example:

If ratios for all samples are computed against a common reference, as shown below, each sample can be assigned a distinct class and it is possible to create a class (CLS) file.

If ratios are computed by comparing conditions, as shown below, it may not be possible to create a CLS file.

normal sample (Cy3) / treated sample (Cy5) = phenotype

If you cannot create a CLS file, you can analyze your data using modules that do not require class files (such as ConsensusClustering), but will not be able to use modules that require the CLS file (such as ComparativeMarkerSelection).
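When each sample can be assigned a class, the CLS file itself is short enough to generate by hand or with a few lines of code. The sketch below assumes the categorical CLS layout as we understand it: a first line with the number of samples, the number of classes and the constant 1; a second line starting with "#" that names the classes; and a third line with one zero-based class label per sample, in the same order as the GCT columns. The function name is hypothetical, and class names are assumed to contain no spaces.

```python
def to_cls(sample_classes):
    """sample_classes: one class name per sample, in GCT column order."""
    names = []
    for c in sample_classes:
        if c not in names:
            names.append(c)  # classes are numbered in order of first appearance
    labels = [str(names.index(c)) for c in sample_classes]
    return (
        f"{len(sample_classes)} {len(names)} 1\n"
        f"# {' '.join(names)}\n"
        f"{' '.join(labels)}\n"
    )


print(to_cls(["normal", "normal", "treated"]), end="")
```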

Where can I find information about file formats used by GenePattern?

How can I convert between RES, GCT, and ODF formats?

Run your file through PreprocessDataset. Select the desired output format for your file. If you only want to convert the file type without filtering, select "no filter" as the choice for the "filter flag" parameter.

How do I convert a file to GenePattern format?

File Formats describes the file formats used in GenePattern and, where applicable, suggests methods for converting files to these formats.

How can I use CEL, MAGE-ML, and MAGE-TAB files in GenePattern?

The ExpressionFileCreator module converts a set of individual CEL files into an expression data set that is usable by GenePattern modules. The MAGEMLImportViewer module imports data in MAGE-ML format into GenePattern, and similarly, the MAGETABImportViewer module imports data in MAGE-TAB format into GenePattern.

I have installed a module/pipeline/suite, but I do not see it. What's wrong?

This generally occurs for one of two reasons:

If the same zip file is installed twice, by two users, the second installation overwrites the first. The contents are identical (including the LSID), but the ownership and privacy settings follow the second installation, so the module may be hidden from its original installer if the second user installs it as private.

The same suite cannot be installed as a "private" suite for more than one user. If you install a private suite and do not see it, it may already be installed as a private suite by another user.

My pipeline requires an input file, but displays a file-not-found error when I enter a file name. What's wrong?

Pipeline input files with spaces in their names may give file-not-found errors. If this happens, use DOS' "dir /x" command to get the 8.3 version of the directory and filename and use that instead of the long filename. If you are using a Unix-based platform, you may need to quote the filename parameters on the command line definition.
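On Unix-based platforms, the general idea of quoting a file name that contains spaces can be illustrated with Python's standard shlex.quote. This is only a sketch of shell quoting in general; the executable name run_module is hypothetical, and how quoting is written in a GenePattern command line definition is not shown here.

```python
import shlex


def build_command(executable, *args):
    """Join an executable and its arguments into one shell-safe command line."""
    return " ".join(shlex.quote(a) for a in (executable, *args))


print(build_command("run_module", "my input file.gct"))
```

The file name is wrapped in single quotes so that the shell passes it as one argument instead of three.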

When I do a Hierarchical Clustering analysis, two files are produced, but the Hierarchical Cluster Viewer (JavaTreeView) looks like it needs three files. Do I need another one?

No, you can use the two files that are created and leave the remaining input box blank. HierarchicalClustering creates a cdt file and one or two additional files: an atr file if you clustered by samples (columns), a gtr file if you clustered by genes (rows), or both atr and gtr files if you clustered by both samples and genes (columns and rows). The JavaTreeView module accepts the two or three files created by HierarchicalClustering.

How can I export a Heat Map image with gene annotations?

Why do the scores from ComparativeMarkerSelection and ClassNeighbors differ?

When computing the t-test or signal to noise ratio, ClassNeighbors thresholds the standard deviation to ensure that it is at least twenty percent of the mean. Additionally, if the standard deviation is zero, ClassNeighbors sets it to 0.1.
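The effect of this thresholding is easiest to see on a small example. The sketch below is not the ClassNeighbors source code. It assumes the common signal-to-noise definition (difference of class means divided by the sum of class standard deviations), uses the sample standard deviation, takes the absolute value of the mean, and applies the 0.1 floor after the twenty percent rule; all of those details are assumptions.

```python
from statistics import mean, stdev


def adjusted_std(values, min_fraction=0.2, zero_floor=0.1):
    """Standard deviation raised to at least min_fraction of |mean|, never zero."""
    sigma = max(stdev(values), min_fraction * abs(mean(values)))
    return sigma if sigma > 0 else zero_floor


def signal_to_noise(class_a, class_b):
    """Signal-to-noise ratio using the adjusted standard deviations."""
    return (mean(class_a) - mean(class_b)) / (
        adjusted_std(class_a) + adjusted_std(class_b)
    )


# With no variation inside either class, an unadjusted ratio would divide by zero.
print(signal_to_noise([10, 10, 10], [5, 5, 5]))
```

A score computed without the adjustment would differ for any gene whose standard deviation falls below twenty percent of its mean, which is one way the two modules can disagree.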

I have used ComparativeMarkerSelection to construct gene lists representing different experimental conditions. Is there a GenePattern module that can determine if there are upstream non-coding motifs over represented in those gene lists?

Yes. You can use the GSEA module with the c3 (motif) gene sets. The GSEA module is documented on the Modules page.

How do I view the 3D visualization in the PCAViewer or FLAMEViewer?

How do I resolve GISTIC errors?

Most errors reported by users running the GISTIC module are caused by a mismatch between the segmentation and markers files. If an error occurs, verify that all markers indicated in the segmentation file appear in the markers file and only those markers indicated by the segmentation file appear in the markers file.

The CBS and GLAD segmentation methods produce GISTIC-friendly marker positions. Partek's latest beta version also produces GISTIC-friendly marker positions. However, if you used an earlier version of the Partek algorithm to create the segmentation file, the algorithm did not report the exact physical position of the first and last markers of the segments. If you run GISTIC on a segmentation file generated using the earlier version of the algorithm, the physical positions of the marker file will not agree with the start or stop positions of the segmentation file. Note that Partek also uses the control probes in the generation of the CN/segmentation.
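One way to look for such a mismatch before running GISTIC is to confirm that every segment start and stop position is also a marker position. The sketch below is a hypothetical helper, not part of GISTIC. It works on tuples you have already parsed from your files, makes no assumption about column layout, and checks only segment endpoints, so it cannot detect extra markers in the markers file.

```python
def unmatched_positions(segments, markers):
    """segments: (chrom, start, end) tuples; markers: (chrom, position) tuples.

    Returns the sorted segment endpoints that are not marker positions.
    """
    known = set(markers)
    missing = set()
    for chrom, start, end in segments:
        for pos in (start, end):
            if (chrom, pos) not in known:
                missing.add((chrom, pos))
    return sorted(missing)


# Hypothetical data: the second segment starts at a position with no marker.
segments = [("1", 100, 500), ("1", 501, 900)]
markers = [("1", 100), ("1", 500), ("1", 900)]
print(unmatched_positions(segments, markers))
```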

What does the GISTIC MATLAB error "Matrix dimensions must agree." mean?

If you are running GISTIC and get the error above in your stderr.txt file, you should verify that your segmentation file and markers file are exactly matched. Only the markers from the markers file should be indicated in the segmentation file and only those markers indicated by the segments should be in the markers file.

How can I see the Color Scheme Legend in HeatMapViewer or HierarchicalClusteringViewer?

To see the Color Scheme Legend in either HeatMapViewer or HierarchicalClusteringViewer, select View>Color Scheme Legend. This legend also coordinates with HeatMapImage if you use the same parameters for HeatMapViewer and HeatMapImage.

Note that the issue of legends for row-normalized heat maps will be addressed in upcoming releases of HeatMapViewer and HierarchicalClusteringViewer.

ComparativeMarkerSelectionViewer is not launching, but HeatMapViewer is. What is wrong?

You may have a corrupted copy of the ComparativeMarkerSelectionViewer directory. To fix this, do the following:

Make sure you have your Java Preferences set to display console. To do so open your Java Preferences (Mac) or Java Control Panel (Windows), go to the Advanced Tab and expand the Java console. Select Show Console, if it is not already selected. You will need to restart your browser for this setting to take effect.

Attempt to launch ComparativeMarkerSelectionViewer again.

Look in the console for the directory where your viewer is being downloaded and executed. Look for a line like:

I get "??? Attempted to access rl(:,2); index out of bounds because size(rl)=[0,1]" in my stderr file when running GISTIC, what does this mean?

If your run of GISTIC fails with the error below in the stderr.txt file, check your segmentation file format. Please see the sections on the segmentation file format in the GISTIC documentation for more details and examples.

??? Attempted to access rl(:,2); index out of bounds because size(rl)=[0,1].
Error in ==> derunlength at 25
Error in ==> smooth_cbs at 148
Error in ==> run_gistic_from_seg at 125
Error in ==> gp_gistic_from_seg at 177
MATLAB:badsubscript

I get "??? Index exceeds matrix dimensions." in my stderr file when running GISTIC, what does this mean?

If your run of GISTIC fails with the error below in the stderr.txt file, check your markers file format. Please see the sections on the markers file format in the GISTIC documentation for more details and examples.

Given these issues with Java, we are investigating new methods of providing interactive visualizers in the browser. Follow us on Twitter, Facebook or Google+ for updates, as they are available.

What version of Java do you have installed? If you're not sure how to find this on your computer, try Java Tester to find out.
The visualizers require Java 1.5 or later. We have tested them with Java 1.5 and Java 1.6. If you have an earlier version, you will need to update it. (Note that there is a bug in Java 1.6.9_16 build on Macintosh, so we recommend updating to _17 or later.)

Was there a pop-up window as you started the visualizer? The pop-up would have asked if you wanted to allow the applet to access your computer (on Macintosh) or if you wanted to run the application (Windows). Did you click Deny or Cancel?
If so, restart your browser, relaunch the viewer, and when the pop-up appears, click Allow (Macintosh) or Run (Windows).

Delete the "to" directory. In this example we would delete /var/folders/+8/+8VwuO5ZH1S3hKW-uC7XEk0cTCY/-Tmp-ComparativeMarkerSelectionViewer2436495742394955925.tmp

Relaunch the visualizer; this should download a new copy of the executable and the viewer should display correctly. (Note that bash shells do not display "-" files and directories, such as "-Tmp-", so you may need to use a different shell to find the file path.)

Delete the folder from that location (in this example, we would delete the HierarchicalClusteringViewer folder).

Why is my module taking so long to run?

Some computationally-intense modules can take a day or more to run. Some examples are FLAMEMetacluster, NMFConsensusClustering, GISTIC, and GLAD. In addition, server load can affect queuing times on the Broad public server, and this can affect the length of time a module can take to complete.

If your job does not use a computationally-intense module or a large data set, and it takes longer than about 4 hours to complete, please contact us at gp-help(at)broadinstitute.org.

Why does Safari crash whenever I run a Java applet?

A recent update in Safari 5 (we observed the problem with 5.0.1 [5533.17.8]) has caused this problem for some users. The affected Java applets include the GenePattern visualizers and the GenePattern installer.

What does a missing value error for ComparativeMarkerSelection mean?

If ComparativeMarkerSelection reports the following error:

ERROR: The estimated pi0 <=0. Check that you have valid p-values or use another lambda method.

then a gene in your data has insufficient variation in its expression values. Use the PreprocessDataset module with a filter that is more stringent than you have previously used on your data set before running ComparativeMarkerSelection.
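To see which genes might be responsible before choosing filter settings, you can look for rows whose values barely change across samples. This is only a rough sketch: the source does not say how much variation counts as insufficient, so the min_range parameter is an assumption, and the default catches only rows that are completely constant. It is not a replacement for PreprocessDataset.

```python
def low_variation_rows(rows, min_range=0.0):
    """rows: list of (gene_name, [values]). Return names whose max - min <= min_range."""
    return [name for name, values in rows if max(values) - min(values) <= min_range]


# Hypothetical expression values.
rows = [("g1", [20, 20, 20, 20]), ("g2", [15, 80, 42, 7])]
print(low_variation_rows(rows))
```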

Why does ExpressionFileCreator fail?

If ExpressionFileCreator fails on your local server but works on the public server, you need a more recent version (version 8 or 9) of ExpressionFileCreator. Only version 7 and earlier versions are available in the public repository (via Modules & Pipelines>Install from Repository) because versions 8 and 9, which support the updated CEL file formats, require R 2.8, and GenePattern installs with R 2.5.

You can either use ExpressionFileCreator on the GenePattern public server or install a more recent version of .

Then the CDF for your array was not found in the Broad-hosted CDF library. You need to use a custom CDF to support the conversion. CDF files are available here. For instance, if you were analyzing the Mouse Gene 1.0 ST Array, you could type in that search term on the Affymetrix page. The result page opens, where you could find your CDF file under the Library Files section.

Provide this CDF file as the input for the cdf file parameter in ExpressionFileCreator.

Please note that a number of newer Affymetrix array types are not currently supported by ExpressionFileCreator, including the 1.1, 2.0, 2.1 ST arrays, Exon arrays, and HTA 2.0 arrays. This is the case even if a CDF file is provided. Please see the ExpressionFileCreator documentation for details and future plans.

What does the GISTIC error, "Invalid file identifier" mean?

Why does the "no such module" error occur for a module on the server?

If you run an imported pipeline on your own GenePattern server, and you get the error, "No such module [module name]", when you know you have that module on your server, then the pipeline requires a version of the module that is not on your server. If you return to the pipeline page and click Properties, you can view the modules that are required but not installed. If you install these module versions from the repository, the pipeline will run.

Why is nothing happening when I try to upload my large file?

There are limitations on file upload size. Files uploaded via the Browse button on the module input page must be under 1.2 GB. To use larger files, there are a few options:

Select Upload files from the blue arrow in the Uploads tab to upload your file. This invokes the large file upload system. Add the files you want to upload to the Java applet and click the upload arrow. This makes your files available in the Uploads tab, and you can set them as input for appropriate GenePattern modules. More information about the Uploads tab can be found in our User Guide.

Download and install GenePattern on a local machine. Put your files on a server that is accessible to your GenePattern server – that is, on the same file system or via a network share – and use the file path as input for the GenePattern modules. (Note: you will have to enable file paths on your server.)

Put your files on a web-accessible machine or FTP site and specify a URL or FTP address for the input file. Make sure that the machine you use is accessible to the GenePattern server.

What does "Error in subfiles: subscript out of bounds" mean?

This error can be produced if there are hidden files or directories in the ZIP archive. This usually occurs on a Mac when using the "Compress" option from the right-click pop-up menu. If this is the case, you may want to use the zip command from the terminal window to zip files instead. If you didn't Compress on a Mac, then you should check that there are no hidden files in the ZIP archive.

How can I pre-process my RNA-seq data for IGV?

The recommended format for RNA-seq data in IGV is the BAM file. The SortSam module accepts a SAM or BAM file as its input file; it sorts and indexes the file and, if the input is a SAM file, can convert it to BAM.

In addition, the IGVTools.sort and IGVTools.index modules can sort and index a SAM file. These modules are currently in beta. If you would like to use them, please contact us at gp-help(at)broadinstitute.org.

Does GenePattern support SNP 6.0?

No. The GenePattern team is presently working to support SNP 6.0.

Why did the module I tried fail to run with my ZIP file as input?

If your ZIP file has a directory in it, GenePattern cannot resolve it. Unfortunately, if you generated your ZIP archive using the Finder on the Macintosh OS, the Mac builds a directory structure into your ZIP archive and GenePattern cannot resolve it. To zip on a Mac, use the zip command from a terminal window; for example, if you wanted to create a ZIP archive called "all_foo" that contains the files all_foo.cls and all_foo.gct, you could use the following command:

zip all_foo all_foo.cls all_foo.gct

Some other reasons that your ZIP file may fail include spaces in the names of the files or hidden files. If you cannot locate the issue with your ZIP file, please contact us at gp-help(at)broadinstitute.org.
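
The checks described above (directories, hidden files, and spaces in names) can be sketched in a few lines of Python. This is only an illustration, not a GenePattern tool: the function name find_zip_problems is hypothetical, and the treatment of "__MACOSX" entries as Finder metadata is an assumption about how Finder-made archives typically look.

```python
import zipfile


def find_zip_problems(zip_path):
    """Return a list of human-readable problems found in a ZIP archive."""
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Last path component, ignoring a trailing slash on directory entries.
            base = name.rstrip("/").split("/")[-1]
            if "/" in name:
                problems.append(f"inside a directory: {name}")
            if base.startswith(".") or name.startswith("__MACOSX"):
                problems.append(f"hidden file or Finder metadata: {name}")
            if " " in name:
                problems.append(f"space in name: {name}")
    return problems
```

An empty list means none of these three issues was detected; it does not guarantee that a module will accept the archive.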

Why did my GenePattern job fail?

The first place to look for the reason is the stderr.txt file, which should be available in the job summary or job status page. This file often contains plain text indicating what went wrong with a job, such as formatting or filtering errors. If you find that this file does not help you resolve the error, please contact us at gp-help(at)broadinstitute.org.

How can I use the RNA-seq modules available in GenePattern?

If You Choose to Run RNA-seq Modules on Your Own GenePattern Server

If you have not installed GenePattern on your local machine, instructions for installing a local GenePattern server are provided on the Download GenePattern page.

If you have already installed a GenePattern server, select Modules & Pipelines>Install from repository. The page will present all available modules. You only need to select the checkboxes for the modules you want and click Install Checked.

Note: The main analysis RNA-seq modules (Bowtie, BWA, Cufflinks, TopHat, and Scripture) currently only run on Macintosh and Linux. If you do not have access to machines with these operating systems, you can use the modules on the Broad public server. The conversion/utility modules that are related to the RNA-seq modules are available for Macintosh, Linux, and Windows.

You may find it helpful to enable your GenePattern server to accept file paths in order to handle large input files that are already present on the system where your local server is installed. To do this, edit genepattern.properties (located in the resources directory under your GenePattern server directory) and set allow.input.file.paths=true. This allows users to input a network file path (such as file:///server/directory/file.gct) as the value for an input file parameter. When this value is set to true, you can define a root directory where the GenePattern server begins browsing for network files by setting server.browse.file.system.root to the root directory you want to specify.

Example: In genepattern.properties, setting server.browse.file.system.root=/Users/mydata/ngs will cause the browser window to open to /Users/mydata/ngs when a user chooses Specify File Path or URL.

Example: In the config_default.yaml file, setting server.browse.file.system.root: [ "/Users/mydata/ngs", "/Users/shared" ] will add two folders to the browser window.
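
If you want to confirm what a properties file currently contains before restarting the server, a small script can read the two settings discussed above. The following is a minimal sketch, not part of GenePattern: it handles only simple key=value lines (not every feature of the Java properties format), the example text is a made-up excerpt, and the function names are hypothetical.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def file_paths_enabled(settings):
    """File path input is treated as off unless the property is set to true."""
    return settings.get("allow.input.file.paths", "false").lower() == "true"


# Hypothetical excerpt of a genepattern.properties file.
example = """
# file path settings
allow.input.file.paths=true
server.browse.file.system.root=/Users/mydata/ngs
"""

settings = read_properties(example)
print(file_paths_enabled(settings))
print(settings.get("server.browse.file.system.root"))
```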

Why is my GenePattern job stuck in the PENDING state?

There are a few reasons why this might occur. Jobs are often PENDING because GenePattern is a shared resource. When your job is in the PENDING state, it means that it is waiting in the queue behind other jobs for the GenePattern server to submit the job to the server farm. Jobs that use large files and access them via an external URL may hold up the line while those files are transferred to the GenePattern server, even keeping jobs that normally take a few seconds in PENDING.

The job will run when the queue clears up.

If this is a common issue on your GenePattern server, it is possible to configure it to help reduce the wait. If you want to reconfigure your GenePattern server in this way, please contact us at gp-help(at)broadinstitute.org.

Why did I get a warning stating that my index is older than my BAM file?

If you try to run an indexed BAM file through a module and receive a warning that your index file (BAI) is older than your BAM file, it means that the timestamps for these files are out of sync. If you receive this warning, you should index your BAM file by using the SortSam module.
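
To check in advance whether a BAM/BAI pair is likely to trigger this warning, you can compare the modification times of the two files yourself. This is a minimal sketch that assumes the warning is based on file modification times; the function name index_is_stale is hypothetical.

```python
import os


def index_is_stale(bam_path, bai_path):
    """Return True when the index was last modified before the BAM file."""
    return os.path.getmtime(bai_path) < os.path.getmtime(bam_path)
```

If the function returns True, re-indexing the BAM file as described above should bring the timestamps back in sync.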

GenePattern also has a feature (disabled by default) that allows you to access input files on the server's file system. With this feature turned on, you don't need to directly upload your input files via the job input form. See Using File Paths for details.

GenePattern has a programming interface (with versions for Java, R, and MATLAB) that allows you to submit your jobs in parallel. See the Programmers Guide for more details.

How do I zip my files for use in GenePattern?

On Windows, you need to select the files to be added to the ZIP archive (hold down the Control or Shift key while selecting to select a group). Then right-click on the group and select WinZip (or whichever zip application you have on your machine). Do not select a folder and zip it – that will create a directory inside the ZIP archive; if your ZIP archive has a directory in it, GenePattern cannot resolve it.

On Macintosh, if you generate your ZIP archive using the Finder, the Mac builds a directory structure into your ZIP archive and GenePattern cannot resolve it. To zip files on a Mac, use the zip command from a terminal window (launched from Applications/Utilities); for example, if you wanted to create a ZIP archive called "all_foo" that contains the files all_foo.cls and all_foo.gct, you could use the following command:

zip all_foo all_foo.cls all_foo.gct

If you follow these instructions and find that GenePattern does not accept your ZIP file, check for spaces in the names of the files or hidden files in the ZIP archive. If you cannot locate the issue with your ZIP file, please contact us. The GenePattern team plans to develop a ZIP module to help users with creating ZIP archives.
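
As an alternative to the zip command, a flat archive (no directories, no hidden files) can also be built with a short Python script on any platform. This is a sketch only, not a GenePattern utility; the function name make_flat_zip is hypothetical.

```python
import os
import zipfile


def make_flat_zip(zip_path, file_paths):
    """Write the given files at the top level of a new ZIP archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in file_paths:
            name = os.path.basename(path)
            if name.startswith("."):
                continue  # skip hidden files such as .DS_Store
            # arcname drops the directory part so no folder is stored.
            archive.write(path, arcname=name)
    return zip_path
```

For example, make_flat_zip("all_foo.zip", ["data/all_foo.cls", "data/all_foo.gct"]) stores both files at the top level of the archive. The sketch does not rename files, so spaces in file names still need to be removed beforehand.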

How can I easily run the same analysis on many different data files?

As of GenePattern 3.3.3, GenePattern supports batch jobs. To use this feature:

Select the Uploads tab.

Click the arrow next to the uploads directory, name a subdirectory, and click Create.

Click the arrow next to your subdirectory and select Upload. This launches the GenePattern file uploader.

Click Add in the top of the uploader window and select all the files you want to run as a batch.

Click the upload arrow. This will upload all your files into the subdirectory you just made on the GenePattern server. Do not close the uploader window while the file upload is in progress.

Once the files are uploaded, click the blue arrow next to the directory containing the files, and select the module or pipeline you want to use for your analysis.

If there is more than one input file field you need to populate, you can select "send to as batch" for those parameters that accept batch inputs. Make sure that the files that need to be paired for a given analysis have the same base name (the file name without its extension); for instance, "file1.gct" would be processed with "file1.cls".

Run the module.

The module will be run once for each file selected. All the job results for the batch will be listed under a single batch ID.
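
Before uploading a batch, it can help to confirm that every file has its partner with the same base name. The sketch below groups file names by base name and reports any that lack one of the expected extensions. The function names are hypothetical, and the default .gct/.cls pairing is taken from the example above; adjust it for the module you are running.

```python
import os
from collections import defaultdict


def group_by_base_name(file_names):
    """Map each base name (file name without extension) to its files."""
    groups = defaultdict(list)
    for name in file_names:
        base, _ext = os.path.splitext(os.path.basename(name))
        groups[base].append(name)
    return dict(groups)


def unpaired(file_names, extensions=(".gct", ".cls")):
    """Return base names that are missing one of the expected extensions."""
    missing = []
    for base, names in sorted(group_by_base_name(file_names).items()):
        found = {os.path.splitext(n)[1].lower() for n in names}
        if not set(extensions) <= found:
            missing.append(base)
    return missing
```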

Why can't I use a directory as input for all modules?

As of GenePattern 3.3.3, GenePattern supports the use of directories as input for modules. However, not all modules support this function.

A few quick ways to tell if a module does accept directories are:

Click on the arrow next to a directory in the Uploads tab; the modules listed in that drop-down will accept directories as input.

Check the caption under the input parameter; if the module accepts directories as input, it will indicate that here.

Check the module documentation (available from the help link in the upper right-hand corner of the module's page); the input parameters section will make it clear if a directory is accepted as input for the module.

How do I format my GenePattern output for submission to GEO?

How do I get a heat map with a high enough resolution for publication?

To generate a new heat map image at a resolution near 300 dpi, you can:

Select the HeatMapImage module in GenePattern.

Change both column size and row size to 33 pixels.

For best results, change show grid to "no". (The grid does not scale as much as the column/row size does, and so may look suboptimal for print publication.)

Generate your heat map image. Open your heat map image file in an image manipulation application that can scale images (like Adobe Photoshop or GIMP) and increase the image resolution to 300 dpi. This will reduce the printed size of the image by a factor of about 4 (which is why you enlarged the columns and rows above) and leave it at a resolution of 300 dpi, which is optimal for print publication.

If you already have a 72 dpi heat map image that you cannot recreate, you can use an image manipulation application that can scale images (like Adobe Photoshop or GIMP) to increase the resolution to 180 dpi. This will shrink the printed image to about 40% of its original dimensions (72/180), but 180 dpi is usually the minimum resolution necessary for print publication.
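
The arithmetic behind these numbers is simple: the pixel count stays the same, so the printed size is the pixel count divided by the resolution. The following sketch works through it, assuming the image starts at 72 dpi as described above; the function names are hypothetical.

```python
def print_size_inches(pixels, dpi):
    """Printed length of an image edge with the given pixel count."""
    return pixels / dpi


def shrink_factor(old_dpi, new_dpi):
    """How many times smaller the print becomes when only the dpi changes."""
    return new_dpi / old_dpi


# Going from 72 dpi to 300 dpi shrinks the print by a factor of about 4.
print(round(shrink_factor(72, 300), 2))
# Going from 72 dpi to 180 dpi shrinks it by a factor of 2.5.
print(shrink_factor(72, 180))
```

For example, an edge of 720 pixels prints at 10 inches at 72 dpi and at 4 inches at 180 dpi.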

I am running a large number of RNA sequencing jobs, and I'd like to be able to look at the quality of the data. Is there a tool I could use for this?

Yes: the RNAseQC module in GenePattern calculates standard RNA-seq related metrics, including depth of coverage, ribosomal RNA contamination, continuity of coverage and GC bias. See the module documentation for the recommended data processing workflow for optimal use of this QC analysis.

How can I retrieve external database information from GenePattern?

The GenePattern server itself does not connect to any database, but modules can be written to connect to databases and retrieve data from them. Existing examples include modules for caArray (caArrayImportViewer) and Gene Expression Omnibus (GEOImporter). To connect to a database of your choice, write a simple command-line program that connects to the database and retrieves the data into a file, then install this program as a module in GenePattern (see Creating Modules).

My MATLAB figures are not appearing in the MATLAB visualizer I created. Why?

When creating a MATLAB visualizer using MATLAB 7.0 compiled m-code (any release before 7.4), any figures that you create in MATLAB must have the Visible property set to on or they will not be drawn to the screen.

Why can't I specify a 64-bit platform for a module?

GenePattern does not have a valid CPU Type for 64-bit platforms. If you try to specify a 64-bit CPU Type, the module will fail on 64-bit platforms, whether or not they are running in compatibility mode. You will have to set the CPU Type to 'any' and describe the appropriate platforms in your documentation. If this does not stop the module from failing on appropriate platforms, contact us at gp-help(at)broadinstitute.org.

Why can't I call my pipeline/module from MATLAB?

What is a CSV file?

CSV stands for "comma-separated values". While CSV files will open in Excel or similar spreadsheet applications, it is important to remember that the values in these files are comma-delimited, not space- or tab-delimited.
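
If a module expects tab-delimited input and you have a CSV file, the conversion can be done with Python's standard csv module instead of a spreadsheet application. This is a minimal sketch; the function name csv_to_tab_delimited is hypothetical, and it only changes the delimiter without checking that the result matches any particular GenePattern file format.

```python
import csv
import io


def csv_to_tab_delimited(csv_text):
    """Re-write comma-separated text as tab-delimited text."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()
```

Using a CSV parser rather than a plain search-and-replace matters when a quoted value contains a comma, since such a value must stay in a single column.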

Can I process raw Illumina BeadChip data in GenePattern?

There are several modules available in updated form on the Broad GenePattern server for processing raw Illumina scan data into GCT files that are usable by GenePattern: IlluminaScanExtractor, IlluminaNormalizer, and IlluminaConcatenator. These modules currently support only the 6k Transcriptionally Informative Gene (TIG) panel (GEO accession: GPL5474), not other DASL gene panels. The IlluminaDASLPipeline is a workflow that chains together these 3 modules so that it is easy to process zipped Illumina scan data files produced by a DNA-mediated Annealing, Selection, extension and Ligation (DASL) assay.

IlluminaExpressionFileCreator extracts the mean value for each probe from a set of Illumina expression IDAT files and puts them into GCT format.

What versions of genomic databases is GeneCruiser currently using?

UniGene and SwissProt are at the current versions listed on their websites and are updated regularly. We are working on restoring regular updates for Entrez Gene. If you are interested in knowing the version of another of the databases accessed by GeneCruiser, please contact us.

Can you send me the source code for GISTIC?

We do not currently distribute the source code for GISTIC. The executable is available and can be found on the GISTIC page. You can also export the GISTIC module from the Broad's public GenePattern server. Note that the GISTIC module and executable are currently compiled only for 64-bit Linux.

The GISTIC developers are working on a version that will allow us to distribute the source code, but it is still in development.

How can I make my GenePattern module available on the GenePattern public server?

First, please look at GenePattern Archive (GParc) to see if this will satisfy your requirements. If it seems that GParc is not the right answer for you, please contact the GenePattern team at gp-help(at)broadinstitute.org to begin discussing the possibility of releasing your module on the GenePattern public server. When you contact us, please provide your code and any documentation you currently have.

Why am I getting the "none of the gene sets passed the size thresholds" error in GSEA?

There are several points you need to check in your gene sets. If you are not using the collapse to gene symbols option, check that your gene identifiers are all uppercase. For the full list of points to check, please see the error 1001 FAQ for GSEA.
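
The uppercase check mentioned above is easy to automate. The sketch below lists identifiers that are not fully uppercase; the function name and the sample identifiers are hypothetical, and it covers only this one check, not the others in the GSEA FAQ.

```python
def non_uppercase_identifiers(gene_ids):
    """Return the identifiers that contain lowercase characters."""
    return [gene for gene in gene_ids if gene != gene.upper()]


# Hypothetical gene set with two identifiers that need fixing.
print(non_uppercase_identifiers(["TP53", "Brca1", "MYC", "egfr"]))
```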

I keep getting file errors when I run a module. What are common reasons for file errors that I can check?

There are several things you can check in your files that commonly cause file errors:

Do your files/directories have spaces in their names?

Remove them or replace them with _ (underscore) or periods.

Are there characters such as parentheses or pound signs (#) in your file names?

Remove them or replace them with _ (underscore) or periods.

What type of file is the module expecting? Is your file the correct type?

Check the module documentation for more information.

Does the file have the correct extension for the file type the module is expecting?

Sometimes Excel or similar programs can add a ".txt" or other extension to the file name. Remove it (rename the file on your desktop and delete the .txt extension) and make sure the file name ends with the correct extension.

Is your data delimited in the way the module expects it to be?

Check the module documentation to see if it expects tab-, comma-, or space-delimited data (or something else), then make sure your file is formatted appropriately.

Did you edit your file in Excel or similar program?

Such applications can sometimes add extra spaces or tabs. Open your file in a text-editing application and look for these extra invisible characters that can cause errors.

Do the contents of your file match the expected file format?

Check that your file contains all the expected columns and header information in the expected order for the given format. See File Formats.
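
The file name checks in this list (spaces, parentheses or pound signs, and the extension) can be run in one pass with a short script. This is an illustrative sketch only; the function names are hypothetical, and it inspects the name, not the file contents.

```python
import os
import re


def file_name_problems(file_name, expected_extension):
    """Return a list of naming problems for one input file."""
    problems = []
    base = os.path.basename(file_name)
    if " " in base:
        problems.append("contains spaces")
    if re.search(r"[()#]", base):
        problems.append("contains parentheses or pound signs")
    if not base.lower().endswith(expected_extension.lower()):
        problems.append(f"does not end with {expected_extension}")
    return problems


def suggest_name(file_name):
    """Replace spaces, parentheses, and pound signs with underscores."""
    return re.sub(r"[ ()#]", "_", file_name)
```

For example, a file saved from a spreadsheet as "my data (v2).gct.txt" fails all three checks when a .gct file is expected.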

When you install GenePattern 3.4.0 or earlier and select the option to install icons in the dock during install, the icons will not appear in the dock. They only appear there when the server is running. You can, however, manually place them there.

Why am I seeing a "Process exited with status code: 138" error when I try to run ConsensusClustering?

The ConsensusClustering module does not work with Java 1.6.0_33 on Macintosh. As a workaround, you can run ConsensusClustering on the GenePattern public server, or on a server that is on a Windows machine or a Macintosh with a Java version other than 1.6.0_33.

What does the GISTIC_2.0 error "All input data were removed after NaN processing" mean?

GISTIC expects that the segments for a sample should cover almost all of its genome, even the regions where the copy number is normal. Any gaps in coverage for any sample are removed from the GISTIC analysis.

How do I find reference genomes to use in TopHat, Bowtie, or BWA?

The TopHat, Bowtie, and BWA GenePattern modules provide easy access to the reference genome index bundles for a number of species. If we aren't yet hosting the index for the species you need, you can email us at gp-help@broadinstitute.org and we will add your species to the available indexes. Additional reference genome bundles for other species are also available from the Illumina iGenomes website. Note that the GenePattern modules cannot use the iGenomes bundles directly as packaged there. You will need to unpack the bundle and repackage the pertinent files (for example, the Bowtie2 index files) as a ZIP archive. Remember that there are some special considerations for creating ZIP archives for use in GenePattern.

The modules can usually accept an FTP URL directly wherever a file input is allowed, so there is no need for you to download the reference file; instead, just copy and paste the file's FTP URL into the file input parameter.

How can I create a GISTIC markers file for my segmented data file?

The best way to create a markers file for your data (so that it matches correctly) is to take the first 3 columns from the copy number file you used as input to the segmentation method that created the seg file for GISTIC.
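
Extracting the first 3 columns can be done in a spreadsheet, but a script avoids accidental reformatting. The following is a minimal sketch that assumes the copy number file is tab-delimited; the function name is hypothetical, and whether a header row should be kept in the markers file is something to confirm in the GISTIC documentation (the skip_header option is there for that reason).

```python
import csv
import io


def markers_from_copy_number(copy_number_text, skip_header=False):
    """Keep only the first 3 tab-delimited columns of each row."""
    reader = csv.reader(io.StringIO(copy_number_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for index, row in enumerate(reader):
        if not row or (skip_header and index == 0):
            continue  # ignore blank lines and, optionally, the header row
        writer.writerow(row[:3])
    return out.getvalue()
```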

It is likely that you installed the Java 8 JRE via your browser, which allows you to run Java apps, but is not sufficient for running GenePattern. You need the Java JDK: https://java.com/en/download/manual.jsp. An easy way to see which Java version you have installed is to bring up a terminal window and type "java -version" (without the quotes).