GenePattern 3.3.3 introduces batch processing, providing an automated method for running
a module or pipeline over a set of input files. Any number of input file parameters can be
specified as batch inputs. In this tutorial we will walk through the steps to submit a
batch job in GenePattern. There are two ways to specify a batch input
for a module or pipeline. One way is to specify a server file path to a directory in the input
file field. To use this method, server file paths must be enabled on your server, and
the files you want to use must be directly accessible by the server. More information
about server file paths can be found at GenePattern.org, in the GenePattern Integration Guide. Note
that server file paths are not enabled on the public server for security reasons. We
will discuss batch processing in that context shortly. For now, however, let’s get back
to server file paths. To specify a path to a directory, you must
first click the "Specify URL" radio button. You can then specify the path to the directory
which contains the files you wish to use for the batch process. If you are doing batch
execution over multiple parameters, specify a directory for each parameter, as appropriate.
To indicate that a job should be run for each file in the input directory, make sure to check
“Batch Parameter” next to the input field. For each file in the input folder which matches
the input type for the module, a new GenePattern job will be submitted. If you have submitted
multiple batch parameters which require pairing, jobs will only be submitted for valid pairs
of files. Pairing is determined by naming convention: files that are to be
paired must have the same name, excluding file extension. For this tutorial we will
submit one directory of files. After we click Run, our batch job results
in one analysis job for each file of a valid input type in the specified directory. All other
parameter values will remain the same for each job in the batch.
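
The matching and pairing rules described above can be sketched in a few lines of Python. This is only an illustration of the naming convention, not GenePattern's actual implementation: the function name pair_batch_inputs, the directory names, and the choice of gct and cls as the accepted input types are all assumptions made for the example.

```python
from pathlib import PurePath


def pair_batch_inputs(files_a, files_b, exts_a, exts_b):
    """Hypothetical helper: pair files from two batch directories.

    A file is kept only if its extension is an accepted input type for
    its parameter. Two files are paired when their names match once the
    extension is removed. Unpaired files produce no job.
    """
    def by_stem(files, exts):
        # Map base name (without extension) -> original path,
        # skipping files whose extension is not an accepted type.
        kept = {}
        for f in files:
            p = PurePath(f)
            if p.suffix.lstrip(".").lower() in exts:
                kept[p.stem] = f
        return kept

    a = by_stem(files_a, exts_a)
    b = by_stem(files_b, exts_b)
    # One job per base name that appears in both directories.
    return [(a[stem], b[stem]) for stem in sorted(a.keys() & b.keys())]


# Illustrative listings for two batch parameters (assumed file names).
pairs = pair_batch_inputs(
    ["d1/all_aml.gct", "d1/brain.gct", "d1/notes.txt"],
    ["d2/all_aml.cls", "d2/other.cls"],
    exts_a={"gct"},
    exts_b={"cls"},
)
print(pairs)
```

In this sketch only all_aml has a file in both directories, so only one job would be submitted; brain.gct and other.cls have no partner, and notes.txt is not an accepted input type.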
Note that this job results view is a filtered version of the Job Status page and therefore
does not automatically refresh. To see the current status of your jobs, refresh your
browser window. The second method for submitting batch jobs
is to use a directory in the Uploads tab. Note that this is the method to use when submitting
batch jobs on the GenePattern public server. To do so, we must first upload a set of files
to a subdirectory in the tab. Click on the blue arrow next to the folder
in which to create a subdirectory. For this tutorial, we’ll create a subdirectory in
the top uploads directory. Next we’ll upload files to our directory
from our local machine or a networked drive. Once the files have completed uploading we
can select a module or pipeline to run in batch processing.
When the module run page is displayed, we’ll use our directory in place of a file for the
module's input file parameter. In the Uploads tab on the right side of the
page, locate the directory containing the files to use as batch input for that module.
Then, click the blue arrow button to the right of the directory name, and select Send Batch
to… for the appropriate parameter. At this point the process is the same as it
was with server file paths; after clicking Run, a job is launched for each file of a valid
input type and the Batch Job Status page is displayed.
Output from a batch job is the same as output from any other GenePattern job and can be
used as any other output in GenePattern. More information about Batch Parameters can
be found in the in-depth article “Batch Processing in GenePattern 3.3.3” on the National Cancer
Institute’s wiki: wiki.nci.nih.gov. More information about GenePattern can be
found at genepattern.org.