Batch Execution in GenePattern 3.3.3

GenePattern 3.3.3 introduces batch processing, an automated method for running a module or pipeline over a set of input files. Any number of input file parameters can be specified as batch inputs. In this tutorial we will walk through the steps to submit a batch job in GenePattern.

There are two ways to specify a batch input for a module or pipeline. One way is to specify a server file path to a directory in the input file field. To use this method, server file paths must be enabled on your server, and the files you want to use must be directly accessible by the server. More information about server file paths can be found at GenePattern.org, in the GenePattern Integration Guide. Note that server file paths are not enabled on the public server for security reasons. We will discuss batch processing in that context shortly. For now, however, let’s get back to server file paths.

To specify a path to a directory, you must first click the “Specify URL” radio button. You can then enter the path to the directory that contains the files you wish to use for the batch process. If you are doing batch execution over multiple parameters, specify a directory for each parameter, as appropriate. To indicate that a job should be run for each file in the input directory, make sure to check “Batch Parameter” next to the input field.

For each file in the input folder that matches the input type for the module, a new GenePattern job will be submitted. If you have submitted multiple batch parameters which require pairing, jobs will only be submitted for valid pairs of files. Pairing is determined by naming convention: files that are to be paired must have the same name, excluding the file extension.

For this tutorial we will submit one directory of files. After we click Run, our batch job results in one analysis job for each file of a valid input type in the specified directory. All other parameter values will remain the same for each job in the batch.
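The pairing rule described above (files are paired when their names match once the extension is removed) can be illustrated with a short Python sketch. This is not GenePattern's own code: the function name `pair_batch_files` and the sample file names are hypothetical, and the server's actual matching logic may differ in details such as case sensitivity or multi-part extensions.

```python
from pathlib import PurePath


def pair_batch_files(first_dir_files, second_dir_files):
    """Pair files from two batch directories by base name (hypothetical helper).

    Two files are considered a valid pair when their names are identical
    after the final file extension is removed. Unpaired files are skipped.
    """
    # Index the second directory's files by name without extension.
    second_by_stem = {PurePath(name).stem: name for name in second_dir_files}

    pairs = []
    for name in sorted(first_dir_files):
        stem = PurePath(name).stem
        if stem in second_by_stem:
            pairs.append((name, second_by_stem[stem]))
    return pairs


# Illustrative file names only; one job would be submitted per pair.
expression_files = ["sample1.gct", "sample2.gct", "sample3.gct"]
class_files = ["sample1.cls", "sample3.cls", "extra.cls"]

pairs = pair_batch_files(expression_files, class_files)
for first, second in pairs:
    print(first, "<->", second)
```

In this sketch, `sample2.gct` and `extra.cls` have no partner with the same base name, so no job would be submitted for them.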
Note that the job results view displayed after submission is a filtered version of the Job Status page and therefore does not automatically refresh. To see the current status of your jobs, refresh your browser window.

The second method for submitting batch jobs is to use a directory in the Uploads tab. Note that this is the method to use when submitting batch jobs on the GenePattern public server. To do so, we must first upload a set of files to a subdirectory in the tab. Click on the blue arrow next to the folder in which to create a subdirectory. For this tutorial, we’ll create a subdirectory in the top uploads directory. Next we’ll upload files to our directory from our local machine or a networked drive.

Once the files have finished uploading, we can select a module or pipeline to run in batch processing. When the module run page is displayed, we’ll use our directory in place of a file for the module's input file parameter. In the Uploads tab on the right side of the page, locate the directory containing the files to use as batch input for that module. Then click the blue arrow button to the right of the directory name, and select Send Batch to… for the appropriate parameter.

At this point the process is the same as it was with server file paths: after clicking Run, a job is launched for each file of a valid input type and the Batch Job Status page is displayed. Output from a batch job is the same as output from any other GenePattern job and can be used like any other output in GenePattern.

More information about Batch Parameters can be found in the in-depth article “Batch Processing in GenePattern 3.3.3” on the National Cancer Institute’s Wiki: wiki.nci.nih.gov. More information about GenePattern can be found at genepattern.org.