Using IGV Browser for Variants and Next-Generation Sequencing - Part 1

Hello. My name is Mariam. And today, I will be showing you how to add a custom genome to the IGV Browser. So IGV is the Integrative Genomics Viewer, and it is a high-performance visualization tool for interactive exploration of large data sets. It supports a wide variety of data types, including array-based or next-generation sequence data, as well as genomic annotations. You can download the IGV Browser to your Mac or Windows computer from the website broadinstitute.org. Because it is a Java tool, it installs very easily on your computer. And it creates an icon on your desktop that you can later use to launch the application. I'm using IGV Version 2.2 for this video. Let's go to the application.

So IGV allows you to display data from various file types, including mapping files in SAM and BAM format, bigWig, annotation files in BED and GFF, [INAUDIBLE] data from GWAS studies after processing with PLINK, variant data in VCF format, and more. Now, the developers of IGV have done a great job of adding a good number of genomes to their server for you to use. Those who work on human are in luck, because when you download IGV, by default, you will find the human genome in the menu. Now, if you would like to bring in a different genome from their server, you can go to the Genome menu and select Load Genome from Server. There will be a list of genomes, those that are perhaps used the most, such as the various human and mouse and dog and cow, et cetera. But you will also notice that if you're working with a bacterial genome or a viral genome, you may not find it on their server.

This is why, instead of picking the genome from here, I will show you how to bring in your own custom genome sequence and annotation file to create your own genome to use in IGV with your own sequence data. So let's start by showing you the genome that I will bring in to IGV. I decided to show you a genome I was working with, which is the HCV, the hepatitis C virus, genome.
The sequence is [INAUDIBLE] under the accession number NC_004102. If you go to the NCBI and select to download a FASTA file, you end up with this file type that has one line at the top with the genome name, the accession number in this case, which you can rename if you want, and then the lines below are all the nucleotide sequence available for that genome. It could be a multi-FASTA. If there are multiple chromosomes, for example, the program handles that well and interprets each sequence as a separate chromosome, as it would for human, which has different chromosomes. But in the case of a genome like HCV, there's only one sequence. OK. So here, we have the chromosome, or the HCV, in this FASTA file.

Now, I also would like to show you the annotations for this genome. From the GenBank file, I'm able to see that there are several features [? plotting ?] [? products ?] and the coordinates for each of these. For example, there's a core protein, E1 protein, E2 protein. The NS5B, the last one annotated here, starts after base pair 7,601 and ends at base pair 9,274.

Notice that there are various columns. The first column is the genome name, the second column is the start coordinate for the feature, the third column is the ending coordinate, then the name, then a score, which, in this case, I set to 0, but if you had any other values, you could use them here. And the last column is a plus for a feature on the plus strand, or a minus if it were on the minus strand. This is called a BED format file, and again, it is [INAUDIBLE]-- pretty simple. The minimum information you need to have is the chromosome name, the same chromosome name that you have in the FASTA file, and the starting and ending coordinates.

So now that I have these two files, I can go to IGV and I can import the FASTA as well as the annotation file and create my new genome. Let's do that.
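
Before importing the two files, it can help to check that the name in the first BED column really matches the name in the FASTA header, since that is the one thing the two files have to agree on. Here is a minimal Python sketch of such a check. It is not part of IGV. The function names `read_fasta_lengths` and `check_bed` are made up for this illustration, the sequence is a toy placeholder rather than the real HCV sequence, and the sketch assumes the sequence name is the first word after the `>` in the header. The only feature shown is the NS5B line, using the coordinates mentioned above.

```python
import io


def read_fasta_lengths(handle):
    """Return {sequence_name: length} for every record in a FASTA stream."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the sequence name is the first word after '>'.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def check_bed(handle, lengths):
    """Return a list of human-readable problems found in a BED stream."""
    problems = []
    for number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        # Skip blank lines, comments, and optional track/browser lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            problems.append(f"line {number}: fewer than 3 columns")
            continue
        chrom = fields[0]
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            problems.append(f"line {number}: start/end are not integers")
            continue
        if chrom not in lengths:
            problems.append(
                f"line {number}: sequence name {chrom!r} is not in the FASTA file"
            )
            continue
        # BED starts are 0-based and ends are exclusive.
        if start < 0 or start >= end or end > lengths[chrom]:
            problems.append(f"line {number}: coordinates {start}-{end} out of range")
    return problems


# Toy inputs (hypothetical): a 10,000-base placeholder sequence, one good BED
# line, and one line whose name deliberately does not match the FASTA header.
fasta_text = ">NC_004102 toy placeholder, not the real sequence\n" + "ACGT" * 2500 + "\n"
bed_text = (
    "NC_004102\t7601\t9274\tNS5B\t0\t+\n"
    "HCV\t7601\t9274\tNS5B\t0\t+\n"
)

lengths = read_fasta_lengths(io.StringIO(fasta_text))
problems = check_bed(io.StringIO(bed_text), lengths)
for problem in problems:
    print(problem)
```

With real files, you would open the FASTA and BED files with `open(...)` instead of `io.StringIO`. If you rename the sequence in the FASTA header, the first BED column needs to be renamed the same way.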