Analyse Your Sequence Variants with the VEP: Web Interface

Hi, I'm Denise from the Ensembl team, and I will show you how to analyse sequence variants using VEP, the Variant Effect Predictor. I've got a list of six SNPs that vary between cases and controls in my sequencing study. Note that although my example contains only six variants, VEP can handle thousands of variants. I will use VEP to find out the functional consequences of my six variants on the genes and transcripts annotated in Ensembl. Go to the Ensembl home page at www.ensembl.org. To get to VEP, I can click either on Tools or on the VEP icon over here. This page describes VEP and gives a few examples of what you can use it for. On the right-hand side we list the two different ways to access the Variant Effect Predictor: through our web interface and through our stand-alone Perl script. Click on the documentation links for more information. From this page you can also launch the web interface and download the latest version of our VEP Perl script. Let's now have a closer look at the menu on the left-hand side of this page. There are four links there: Web interface, VEP script, Data formats and FAQ. Let's click on Data formats to explore the different input formats that VEP accepts, such as a list of variant identifiers, VCF files and HGVS notations. You can also view examples of these formats for both SNPs and CNVs.
Let's now launch the web interface. You can run VEP for any species in Ensembl; my variants were described in humans, so I will choose human as my species. VEP accepts several input formats, and when I choose one of them, a matching example appears in my VEP form. Let's choose the Ensembl default format for VEP. This format is simply a list of genomic coordinates, variant alleles and strand information. In this default example we have one SNP, one indel and a larger structural variant.
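To make the Ensembl default format more concrete, here is a minimal Python sketch that reads one input line and roughly classifies it. It assumes whitespace-separated columns in the order chromosome, start, end, allele, strand, followed by an optional identifier; `REF/ALT` alleles, with `-` marking the empty side of an insertion or deletion; and a type keyword such as `DUP` in place of alleles for structural variants. The class and function names are hypothetical, the classification is a simple heuristic rather than VEP's own logic, and the coordinates are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DefaultFormatVariant:
    """One line of VEP's Ensembl default input format (hypothetical helper class)."""
    chrom: str
    start: int
    end: int
    allele: str  # e.g. "T/C", "-/C", "G/-", or an SV type keyword such as "DUP"
    strand: str  # e.g. "+"
    identifier: Optional[str] = None

    def kind(self) -> str:
        """Heuristic classification from the allele column (not VEP's own logic)."""
        if "/" not in self.allele:
            # Structural variants give a type keyword instead of REF/ALT alleles.
            return "structural"
        ref, alt = self.allele.split("/", 1)
        if ref == "-":
            return "insertion"
        if alt == "-":
            return "deletion"
        if len(ref) == 1 and len(alt) == 1:
            return "SNP"
        return "other"


def parse_default_line(line: str) -> DefaultFormatVariant:
    """Split a whitespace-separated line into the five required columns plus an optional ID."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}: {line!r}")
    chrom, start, end, allele, strand = fields[:5]
    identifier = fields[5] if len(fields) > 5 else None
    return DefaultFormatVariant(chrom, int(start), int(end), allele, strand, identifier)


if __name__ == "__main__":
    # Illustrative lines: one SNP, one insertion and one structural variant.
    for text in ["5 140532 140532 T/C +", "1 881907 881906 -/C +", "1 160283 471362 DUP + sv1"]:
        v = parse_default_line(text)
        print(v.chrom, v.start, v.end, v.kind(), v.identifier)
```

Note that for the insertion line the start coordinate is one greater than the end, which is how this format marks an insertion point between two bases.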
Let's paste in the data from my sequencing project, which contains six variants, all of them SNPs. Note that I can also upload a file; hover over the question mark to find out the size limit for file uploads, which is 50 MB. Alternatively, I can provide a URL pointing to my variants. Let's now look at the output options for VEP before running the job. There are three sections, each with a brief description next to it: Identifiers and frequency data, Extra options and Filtering options. Let's expand the first one and hover over the question mark for a brief definition. VEP can tell me whether my variants are already known and, if so, provide their dbSNP IDs. This option is on by default. From the drop-down menu I can choose to view the known variants and compare their alleles. If you don't want this information, simply select No. Let's now move on to the extra options. Expand the section to see the different data available. Some of them are already selected by default, such as transcript biotype, SIFT and PolyPhen predictions, and information on regulatory regions. Use the drop-down menus if you want to change the options for SIFT, PolyPhen and Get regulatory region consequences. Finally, there are several filtering options. Let's expand this section. If my variants are co-located with known variants from dbSNP, for example, I can choose to see the allele frequencies obtained from the 1000 Genomes Project. Let's do that. Click on Advanced filtering and, from the drop-down menu, select Include only to keep only variants with a minor allele frequency greater than, for example, 1%. You can change this threshold; I will change it to 0.1%. The last thing to do is to choose which 1000 Genomes Project population I want to get data for. This can be done via the drop-down menu. I will leave it as the combined 1000 Genomes population. For more details on the 1000 Genomes Project samples, have a look at their website. I'm now ready to run the job.
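The Include only frequency filter set up above boils down to a threshold comparison. The minimal Python sketch below illustrates it; the function names and the `keep_missing` switch are hypothetical, and because this walkthrough does not say how variants with no 1000 Genomes frequency are treated, that behaviour is left as an explicit assumption. The main point it shows is that a threshold typed as a percentage (0.1%) corresponds to an allele-frequency fraction of 0.001.

```python
from typing import Iterable, List, Optional, Tuple


def percent_to_fraction(percent: float) -> float:
    """Convert a percentage such as 0.1 (meaning 0.1%) to a frequency fraction (0.001)."""
    return percent / 100.0


def include_only_above(
    variants: Iterable[Tuple[str, Optional[float]]],
    threshold_percent: float,
    keep_missing: bool = False,
) -> List[str]:
    """Keep variant IDs whose minor allele frequency (a fraction) exceeds the threshold.

    Variants with no frequency (None) are dropped unless keep_missing is True;
    this handling is an assumption, not documented VEP behaviour.
    """
    threshold = percent_to_fraction(threshold_percent)
    kept = []
    for variant_id, maf in variants:
        if maf is None:
            if keep_missing:
                kept.append(variant_id)
        elif maf > threshold:
            kept.append(variant_id)
    return kept


if __name__ == "__main__":
    # Hypothetical IDs and frequencies, for illustration only.
    data = [("rsA", 0.25), ("rsB", 0.0005), ("novel1", None), ("rsC", 0.0015)]
    print(include_only_above(data, 1))    # ['rsA']
    print(include_only_above(data, 0.1))  # ['rsA', 'rsC']
```

Lowering the threshold from 1% to 0.1% lets rarer variants such as the hypothetical `rsC` through, which is the effect of the change made in the form.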
Let's click on View results and look at the summary statistics first. These show, among other things, the number of overlapped genes and transcripts, how many of my variants are novel or known, and whether my variants overlap regulatory features annotated by Ensembl. Let's have a look at the pie charts now. The first one shows the breakdown of all consequence terms assigned to my variants; these terms come from the Sequence Ontology project. Have a look at our documentation page for a list of all consequence terms. The consequence terms are colour-coded, and the colours are consistent across all variation pages on the Ensembl browser website. The second pie chart shows a breakdown of the consequence terms in the coding category only. For my variants, the only coding consequence type is missense_variant. Please refer to the Sequence Ontology website for a complete list of other coding consequence terms. Let's now look at the results table. You can collapse the summary statistics to view this table, or simply scroll down the page. The results are returned as a table, which can be downloaded in VCF, VEP or TXT format. Hover over the (i) for more information on how to navigate the results table. I can choose to view all variants at once by clicking on All. If I click, for example, on the number 1, I will view only one variant position at a time, and I can navigate to the next position by clicking on the arrow. Let's go back to All. For each position, VEP reports the genes and features that my variants overlap. Click on the Ensembl gene ID or Ensembl transcript ID for a pop-up window with extra information. Please note that many different consequence terms can be assigned to a single variant. This is because a gene can have several alternatively spliced transcripts annotated, and the variant may affect each of them differently. Now scroll across the page to see additional columns in the results table. You can see, for example, the predictions of the SIFT and PolyPhen algorithms, which estimate how deleterious a missense variant might be.
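If you download the results in VEP format, the point that one variant can receive several consequence terms (one set per overlapping transcript) is easy to check offline. The Python sketch below assumes the tab-delimited layout of that file: `##` meta-information lines, a `#`-prefixed header line naming columns such as `Uploaded_variation`, `Consequence` and `Extra`, comma-separated terms in `Consequence`, and semicolon-separated `KEY=VALUE` pairs (for example SIFT and PolyPhen) in `Extra`. Check these assumptions against the header of your own download; the function names and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def read_vep_tab(text: str) -> List[Dict[str, str]]:
    """Parse tab-delimited VEP output into one dict per row.

    '##' meta-information lines are skipped and the '#'-prefixed line is used as the header.
    """
    header = None
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#"):
            header = [fields[0].lstrip("#")] + fields[1:]
            continue
        if header is None:
            raise ValueError("data line found before the '#' header line")
        rows.append(dict(zip(header, fields)))
    return rows


def consequences_by_variant(rows: List[Dict[str, str]]) -> Dict[str, Set[str]]:
    """Collect every consequence term reported for each input variant, across all features."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        for term in row["Consequence"].split(","):
            grouped[row["Uploaded_variation"]].add(term)
    return dict(grouped)


def parse_extra(extra: str) -> Dict[str, str]:
    """Split the Extra column ('KEY=VALUE;KEY=VALUE', or '-' if empty) into a dict."""
    if extra in ("", "-"):
        return {}
    return dict(item.split("=", 1) for item in extra.split(";") if "=" in item)


if __name__ == "__main__":
    # Hypothetical rows: one variant overlapping two transcripts of the same gene.
    cols = ["#Uploaded_variation", "Location", "Allele", "Gene", "Feature",
            "Feature_type", "Consequence", "Extra"]
    sample = "\n".join([
        "## illustrative meta line",
        "\t".join(cols),
        "\t".join(["var1", "1:100", "T", "GENE1", "TX1", "Transcript", "missense_variant",
                   "SIFT=deleterious(0.01);PolyPhen=probably_damaging(0.99)"]),
        "\t".join(["var1", "1:100", "T", "GENE1", "TX2", "Transcript",
                   "intron_variant,NMD_transcript_variant", "-"]),
    ])
    rows = read_vep_tab(sample)
    print(sorted(consequences_by_variant(rows)["var1"]))
    print(parse_extra(rows[0]["Extra"]).get("SIFT"))
```

Grouping by `Uploaded_variation` rather than by row mirrors the All view: each row is one variant/feature pair, so a single variant spans as many rows as the transcripts it overlaps.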
Click on the location link to jump to the Region in detail view of the Ensembl browser. You can also show or hide columns in the table. The Existing variation column shows the rs IDs of variants already in dbSNP. Of my six variants, only two are known; that information is also available from the summary statistics. I can also filter my results from this table; just hover over the (i) for more information. From the drop-down menu I can choose a filter, such as Consequence, and a value, for example 3' UTR variant. Click on Add. The filtered data can be downloaded in VCF, VEP or TXT format. I can edit the filters or remove them. Click on the spreadsheet icon to download either the entire table or only the rows currently shown. I can share these results by clicking here, and I can also download this view as CSV. To run another job, click on Variant Effect Predictor and then on New VEP job. Previous jobs are kept, and from this table you can save, edit or delete them. That's all, folks! Have a look at our VEP documentation, and if you have any questions, just drop us a line. Thank you very much for watching, and happy VEPing!
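The Consequence filter and the known/novel count described here can also be reproduced on downloaded rows. This self-contained Python sketch takes rows as dicts keyed by VEP column names; it assumes the Sequence Ontology term for a 3' UTR variant, `3_prime_UTR_variant`, appears in a comma-separated `Consequence` column, and it treats a variant as known when its `Existing_variation` column contains an ID starting with `rs`. The function names, variant names and rs IDs are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, Set

Row = Dict[str, str]


def filter_by_consequence(rows: Iterable[Row], term: str) -> List[Row]:
    """Keep rows whose comma-separated Consequence column contains the exact SO term."""
    return [row for row in rows if term in row["Consequence"].split(",")]


def known_variants(rows: Iterable[Row]) -> Set[str]:
    """Return input variants whose Existing_variation column lists at least one rs ID."""
    known = set()
    for row in rows:
        existing = row.get("Existing_variation", "-").split(",")
        if any(identifier.startswith("rs") for identifier in existing):
            known.add(row["Uploaded_variation"])
    return known


def to_tsv(rows: List[Row], columns: List[str]) -> str:
    """Write selected columns of the rows as tab-separated text, '-' for missing values."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row.get(col, "-") for col in columns])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    rows = [
        {"Uploaded_variation": "var1", "Consequence": "missense_variant", "Existing_variation": "rs0001"},
        {"Uploaded_variation": "var2", "Consequence": "3_prime_UTR_variant", "Existing_variation": "-"},
    ]
    utr_rows = filter_by_consequence(rows, "3_prime_UTR_variant")
    print(to_tsv(utr_rows, ["Uploaded_variation", "Existing_variation"]))
    print(sorted(known_variants(rows)))
```

Matching the exact term after splitting on commas, rather than a substring search, avoids accidentally matching longer terms that merely contain the filter text.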