Hello.
My name is Mariam.
And today, I will be showing you how to add a custom genome
to the IGV Browser.
So IGV is the Integrative Genomics Viewer, and it is a
high-performance visualization tool for interactive exploration
of large data sets.
It supports a wide variety of data types, including
array-based or next-generation sequence data, as well as
genomic annotations.
You can download the IGV Browser to your Mac or Windows
computer from the website broadinstitute.org.
Because it is a Java tool, it installs very
easily on your computer.
And it creates an icon on your desktop that you can later use
to launch the application.
I'm using IGV Version 2.2 for this video.
Let's go to the application.
So IGV allows you to display data from various file types,
including mapping files in SAM and BAM format, bigWig,
annotation files, BED,
GFF, [INAUDIBLE] data from GWAS studies after processing
with PLINK, variant data in VCF format, and more.
Now, the developers of IGV have done a great job of
adding a good number of genomes to their server for
you to use.
Those who work on human are in luck, because when
you download IGV, you will find the
human genome in the menu by default.
Now, if you would like to bring in a different genome
from their server, you can go to the Genome menu and select
Load Genome from Server.
There will be a list of genomes, those that are
perhaps used the most, such as the various human, mouse,
dog, and cow genomes, et cetera.
But you will also notice that if you're working with a
bacterial genome or a viral genome, you may not find it on
their server.
This is why, instead of picking the genome from here, I will
show you how to bring in your own custom genome sequence and
annotation file to create your own genome to use in IGV
with your own sequence data.
So let's start by showing you the genome that I
will bring in to IGV.
I decided to show you a genome I was working with, which is
the HCV, or hepatitis C virus, genome.
The sequence is [INAUDIBLE]
under the accession number NC_004102.
If you go to the NCBI and select to download a
FASTA file, you end up with this file type that has
one line at the top with the genome name (the accession
number in this case,
which you can rename if you want), and then the lines
below are all the nucleotide sequence
available for that genome.
It could be a multi-FASTA
if there are multiple chromosomes, for example, and that will be
handled well by the program and interpreted as separate
chromosomes, as in the case of human, which has
different chromosomes.
But in the case of a genome like HCV,
there's only one sequence.
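The FASTA layout just described can be checked with a short script. This is only a minimal sketch in Python, not part of IGV: the function name `fasta_lengths` and the example names `chrA` and `chrB` are made up for illustration, and it assumes the sequence name is the first word after the `>` on each header line.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each sequence name to the number of bases listed under it."""
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: assume the name is the first word after '>'.
            words = line[1:].split()
            name = words[0] if words else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence line: add its bases to the current record.
            lengths[name] += len(line)
    return lengths


# A made-up multi-FASTA with two records (not real genome data).
example = [
    ">chrA made-up description",
    "ACGTACGT",
    "ACG",
    ">chrB",
    "GGCC",
]
print(fasta_lengths(example))
```

A single-sequence file such as the HCV one would give a dictionary with one entry, while a multi-FASTA gives one entry per chromosome.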
OK.
So here, we have the chromosome, or the HCV, in
this FASTA file.
Now, I also would like to show you the
annotations for this genome.
From the GenBank
file, I'm able to see that there are several features,
the protein products,
and the coordinates for each of these.
For example, there's a core protein,
E1 protein, E2 protein.
The NS5B, the last one annotated here, starts after
base pair 7,601 and ends at base pair 9,274.
Notice that there are various columns.
The first column is the genome name, the second column is the
start coordinate for the feature, the third column is the
ending coordinate, then the name, and a score, which, in this
case, I labeled as 0, but if you had any other values, you could
use them here.
And the last column is a plus for a feature on
the plus strand, or a minus if it were on the minus strand.
This is called a BED format file, and again, it is
[INAUDIBLE]--
pretty simple.
And the minimum information you need to have is the
chromosome name,
the same chromosome name that you have in the FASTA
file, and the starting and the ending coordinates.
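Before importing, the BED file can be checked against the rules just listed. This is a minimal sketch in Python, not an IGV feature: the function name `check_bed` is made up, the chromosome name `HCV` assumes the FASTA header line was renamed to that, and the tab-separated layout follows the six columns described above.

```python
from typing import Iterable, List, Set


def check_bed(lines: Iterable[str], fasta_names: Set[str]) -> List[str]:
    """Return a list of problems found in tab-separated BED lines."""
    problems: List[str] = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip blank lines, comments, and track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            problems.append(f"line {number}: fewer than 3 columns")
            continue
        chrom, start, end = fields[0], fields[1], fields[2]
        # The chromosome name must match a name in the FASTA file.
        if chrom not in fasta_names:
            problems.append(f"line {number}: chromosome {chrom!r} not in FASTA")
        if not (start.isdigit() and end.isdigit()):
            problems.append(f"line {number}: coordinates must be whole numbers")
        elif int(start) >= int(end):
            problems.append(f"line {number}: start {start} is not below end {end}")
        # Column 6, when present, is the strand.
        if len(fields) >= 6 and fields[5] not in {"+", "-", "."}:
            problems.append(f"line {number}: strand {fields[5]!r} is not +, - or .")
    return problems


# The NS5B line uses the coordinates quoted above; the second line is
# deliberately wrong to show what gets reported.
bed_lines = [
    "HCV\t7601\t9274\tNS5B\t0\t+",
    "hcv\t10\t5\tbad_feature\t0\t+",
]
for problem in check_bed(bed_lines, {"HCV"}):
    print(problem)
```

The second line is reported twice, once because `hcv` does not match the FASTA name exactly and once because its start is not below its end.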
So now that I have these two files, I can go to IGV and I
can import the FASTA as well as the annotation file
and create my new genome.
Let's do that.