I'M ALSO GOING TO BE PRESENTING THE WORK OF MY COLLABORATORS AND
TALKING ABOUT PLATFORMS AND THE USUAL DISCLAIMERS.
WHEN I SAY VARIANT, THAT MEANS ANY DIFFERENCE --
--
MIGHT END UP BEING A PATHOGENIC VARIANT BUT YOU HAVEN'T
IDENTIFIED THAT YET. I'M GOING TO START MENTIONING A
FEW THINGS ABOUT PROJECT DESIGN, HOPEFULLY NOT OVERLAPPING TOO
MUCH WITH THE GREAT TALKS THIS MORNING.
THE CORE OF THE TALK WILL BE DISCUSSIONS ON HOW TO INTEGRATE
OTHER TECHNOLOGIES WITH EXOME ANALYSIS TO IMPROVE CHANCES OF
SUCCESS, IN PARTICULAR SNP ARRAYS AND ALSO PHENOTYPE AND
FAMILY HISTORY DATA AND AT THE END I'LL TALK ABOUT VALIDATION
AND ADD COMMENTS TO WHAT HAS BEEN TALKED ABOUT BEFORE AND THE
SAME FOR REANALYSIS FOR UNINFORMATIVE DATASETS.
BECAUSE OF MY BACKGROUND, THE EXAMPLES I'M GOING TO USE ARE GOING
TO BE FOR MENDELIAN INHERITED DISEASES WITH HIGH PENETRANCE AND A
SMALL NUMBER OF GENES IN HUMANS. HOPEFULLY THERE WILL BE SOME
OVERLAP IF YOU WORK WITH DIFFERENT TYPES OF PROJECTS AND
SOMEBODY'S PHONE IS UP HERE -- THAT'S OKAY.
I DON'T THINK IT'S MY FINANCIAL ADVISOR.
AND THE BASIC PROBLEM IS THAT USUALLY WHEN YOU GO TO LOOK FOR
THE NEEDLE IN THE HAYSTACK, YOU END UP WITH A BIG PILE OF NEEDLES
AND YOU NEED TO PICK WHICH ONE IS YOUR DISEASE-CAUSING VARIANT.
SO HOW CAN YOU IMPROVE YOUR CHANCES OF SUCCESS?
THERE HAVE BEEN A LOT OF EXAMPLES THIS MORNING OF HOW
CAREFUL PROJECT SELECTION CAN IMPROVE THE OUTCOME OF AN EXOME
PROJECT AND I'LL GIVE AN EXAMPLE.
YOU CAN USE PARALLEL ANALYSIS. I'M GOING TO USE THE SNP
CHIP ARRAY AS AN EXAMPLE.
BUT THERE ARE LOTS OF THINGS YOU CAN CONSIDER INCLUDING
PHENOTYPING AND EXPRESSION ANALYSIS.
SO THERE ARE LOTS OF VARIABLES TO CONSIDER IN EXPERIMENTAL DESIGN.
LISTENING TO THE QUESTIONS BEFORE THE BREAK, IT WAS CLEAR THAT
PEOPLE ARE, SOME PEOPLE ARE STARTING EARLY IN THE PROCESS OF
SEQUENCING DATA COLLECTION AND OTHER PEOPLE ARE ANALYZING DATA
THAT HAS BEEN OBTAINED THROUGH A COLLABORATOR AND EVEN ANALYSIS
CAN BE BROKEN INTO A BUNCH OF DIFFERENT PIECES.
BEFORE YOU START THE PROJECT IT'S WORTH THINKING ABOUT HOW TO
CARRY OUT ALL THESE STEPS AND WHICH THINGS YOU'RE GOING TO DO
YOURSELF AND WHICH COLLABORATIVELY.
WE SEE MANY FAMILIES AND ONLY HAVE THE RESOURCES TO APPLY
EXOME SEQUENCING TO A FEW OF THEM.
WE HAVE A TOOL THAT WE USE TO PRIORITIZE THE FAMILIES.
IS THE PHENOTYPE MORE MULTIFACTORIAL OR IS IT GENETIC,
ONSET, SEVERE? IS THE MATERIAL AVAILABLE FROM A SINGLE
INDIVIDUAL OR A FAMILY? IS IT A PHENOTYPE THAT OVERLAPS
WITH COMMON CONDITIONS, OR IS IT SEVERE AND COMPELLING?
DOES THE FAMILY SHOW MULTIPLE AFFECTED INDIVIDUALS
OR ONLY ONE? WE NEED TO USE THIS KIND OF
RUBRIC BECAUSE, FOR OUR FAMILIES, WE DON'T HAVE A LARGE PEDIGREE WITH
LOTS OF PEOPLE AND A CLEAR PATTERN OF INHERITANCE.
SO FOR ALL THE TYPES OF ANALYSIS THAT YOU MAY ADD TO AN EXOME
PROJECT, INCLUDING THE FILTERING THINGS WE HEARD ABOUT
EARLIER AND ALSO THE THINGS I WILL TALK ABOUT TODAY, IT IS WORTH
ASKING A FEW QUESTIONS. I'M GOING TO USE THE SNP ARRAY
DATA TO EXPLORE SOME OF THOSE AND, AS I MENTIONED, WE'LL
INCLUDE SOME OTHER EXAMPLES, AND I WOULD LIKE TO TOUCH ON AT THE
END USING ACCUMULATED DATA FROM MULTIPLE EXOME PROJECTS,
SOMETHING THAT CAME UP DURING THE QUESTION SESSION.
SO IF YOU THINK ABOUT THE CRITERIA YOU MIGHT USE TO DECIDE
WHETHER A FILTER, OR AN ADDITIONAL ANALYSIS OF THE
DATA OR DNA, IS WORTH DOING, YOU FIRST WANT TO ASK HOW MUCH
THE VARIANT LIST WILL BE REDUCED.
IS IT WORTH THE TROUBLE OF DOING THAT STEP?
AND THE SECOND THING YOU WANT TO ASK IS HOW ERROR PRONE IS IT?
THERE WAS A NICE PRESENTATION ABOUT TRYING TO GET A SENSE FOR
WHETHER TRUE VARIANTS WERE THROWN OUT OR FALSE VARIANTS
THAT YOUR ANALYSIS IS DESIGNED TO EXCLUDE WERE IN FACT INCLUDED.
AND WE HEARD ABOUT EXAMPLES OF THAT. FOR INSTANCE, DBSNP CAN
BE A POWERFUL TOOL USED IN A LOT OF STUDIES BUT IT CAN FALL INTO
BOTH OF THOSE ERROR CATEGORIES PRETTY EASILY WHEREAS
SEGREGATION FILTERING, IF YOU HAVE HIGH QUALITY DATA AND
CORRECT GENETIC MODEL HAS VERY FAVORABLE CHARACTERISTICS.
SO LET'S DIVE INTO TALKING ABOUT SNPS SINCE WE'LL TALK ABOUT SNP
ARRAYS. A SNP IS A SINGLE BASE IN A DEFINED
GENOMIC POSITION. THE EXACT NUCLEOTIDE VARIES IN
THE POPULATION AND THE LOCATION ITSELF IS DEFINED BY CONSERVED
SEQUENCES NEARBY. AND YOU MAY BE FAMILIAR WITH
THIS LOGO TYPE OF DISPLAY WHERE THE HEIGHT OF EACH ONE OF THESE
POSITIONS IS PROPORTIONAL TO THE CONSERVATION.
THIS IS ONE END OF THE MAMMALIAN SPLICE SITE, AND THE G
AND T ARE HIGHLY CONSERVED WHEREAS YOU CAN SEE THERE ARE
TWO NUCLEOTIDES THAT ARE COMMONLY FOUND AT THIS SITE.
AND THE MOST COMMON ALLELE IS CALLED THE A ALLELE OR MAJOR
ALLELE AND THE LESS COMMON IS THE MINOR OR B ALLELE AND A
PERFECT SITE WOULD HAVE LOTS OF HIGHLY CONSERVED SEQUENCE TO
DEFINE THE POSITION OF THE SNP AND A SNP WITH TWO NUCLEOTIDES.
SO THE WAY THAT ASSAYS FOR SNPS ARE CARRIED OUT IS THAT THE
TECHNIQUE, WHICH I'M NOT GOING TO EXPLORE IN DETAIL, USES
A DIFFERENT FLUOR FOR EACH OF THE TWO NUCLEOTIDES.
THIS IS A GRAPH FROM GENOME STUDIO, WHICH IS A TOOL WE USE TO
ANALYZE THIS TYPE OF DATA. ALONG THE LEFT IS THE INTENSITY
FOR THE B ALLELE AND ALONG THE BOTTOM IS THE A ALLELE.
EACH ONE OF THESE SPOTS REPRESENTS AN INDIVIDUAL PERSON.
SO THIS GRAPH IS FOR ONE SNP WITH MULTIPLE INDIVIDUALS.
AND BY QUANTITATING THE FLUORESCENCE, YOU CAN CREATE A
GENOTYPE. SO THIS IS ALL B, THIS IS ALL A
AND THIS IS ABOUT HALF INTENSITY OF EACH OF THOSE.
SO THOSE ARE CALLED HETEROZYGOTES.
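As a rough illustration of turning two fluorescence intensities into a genotype, here is a minimal Python sketch. It assumes a simple cutoff on the fraction of signal coming from the B channel; the function name and the 0.2 / 0.8 thresholds are hypothetical, and real array software clusters many samples per SNP rather than applying fixed cutoffs.

```python
def call_genotype(a_intensity, b_intensity, low=0.2, high=0.8):
    """Call AA / AB / BB from A-channel and B-channel intensities.

    The thresholds are illustrative assumptions, not vendor defaults.
    """
    total = a_intensity + b_intensity
    if total <= 0:
        return "NC"  # no call: no usable signal in either channel
    b_fraction = b_intensity / total
    if b_fraction < low:
        return "AA"  # almost all signal from the A allele
    if b_fraction > high:
        return "BB"  # almost all signal from the B allele
    return "AB"      # roughly half intensity of each: heterozygote


# One SNP, several individuals, as in the scatter plot described above.
samples = {"p1": (1.00, 0.02), "p2": (0.50, 0.55), "p3": (0.03, 1.10)}
calls = {name: call_genotype(a, b) for name, (a, b) in samples.items()}
```

With these made-up intensities, `calls` maps `p1` to `AA`, `p2` to `AB`, and `p3` to `BB`.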
AND THOSE SAME PRINCIPLES CAN BE USED TO DEFINE HEMIZYGOUS SPOTS,
DUPLICATIONS OF VARIOUS COMBINATIONS AND PLACES WHERE
THERE IS TOTAL DELETION. SO THIS IS PROBABLY THE MOST
COMPLICATED SLIDE, SO BEAR WITH ME.
THIS IS A DISPLAY FROM GENOME STUDIO AND THERE ARE TWO THINGS TO POINT
OUT. BEFORE, EACH DOT WAS ONE PERSON AT A SINGLE SNP. IN THIS CASE, THE DOTS ARE A
SERIES OF SNPS FROM LEFT TO RIGHT.
THERE ARE ONLY TWO PATIENTS HERE, THE ONE WITH THE BLUE AND THE
ONE WITH THE YELLOW. THERE ARE TWO IMPORTANT PLOTS TO
KNOW ABOUT. THE TOP
IS THE -- IF THE REFERENCE INTENSITY AND
THE TOTAL INTENSITY ARE THE SAME IT'S A LOG OF ONE OVER ONE, OR
ZERO, AND THEY CLUSTER ALONG THE ZERO LINE HERE, AND
IF THE RATIO IS .5 OVER 1, THEN YOU GET A DECREASE AND YOU CAN
SEE ALONG THE BOTTOM HERE THAT THESE ARE SLIGHTLY SHIFTED DOWN.
AND IF WE GO TO THE NEXT SLIDE, THAT IS BLOWN UP A LITTLE BIT SO
THE B ALLELE PLOT SHOWS NO HETEROZYGOTES AND THERE IS A
SHIFT HERE FOR SINGLE COPY DELETIONS, AND THIS SAME
TECHNIQUE CAN BE USED FOR DOUBLE COPY DELETIONS AND FOR
DUPLICATIONS. SO, WHY SHOULD YOU INCLUDE SNP
CHIP ANALYSIS ALONG WITH YOUR EXOME ANALYSIS?
AS YOU SAW THIS MORNING, THE SHORT READS FROM
EXOME ANALYSIS TEND TO PILE UP OVER REGIONS OF INTEREST, OVER
THE EXON HERE, AND THE SNPS ARE MUCH LESS DENSE; THEY
ARE SPREAD OUT ACROSS THE GENOME.
THIS IS EASIER TO SEE IF YOU LOOK AT CHROMOSOMES NOW, AND AT
EVERY PLACE EXCEPT FOR SOME CENTROMERIC REGIONS, THERE IS A DENSE
COVERAGE OF SNPS THAT COVERS ALMOST ALL OF THE GENOME.
AND SO THIS REALLY PROVIDES A SURVEY OF THE STRUCTURE.
THIS IS ONE WAY TO GET DOSAGE ABNORMALITIES.
IT'S CHEAPER THAN DOING GENOMES. SO FOR 200-300 DOLLARS A POP,
YOU CAN DO SNP ANALYSIS ON FAMILIES, WHEREAS IT COSTS MAYBE 5000
DOLLARS TO DO A GENOME. SO YOU CAN DO THIS ANALYSIS AND
COMBINE IT WITH YOUR EXOME ANALYSIS FOR LESS MONEY THAN IT
WOULD COST TO DO EVERYTHING WITH GENOMES.
YOU CAN DETECT THINGS LIKE DOSAGE CHANGES.
YOU CAN DETECT CHROMOSOMAL MOSAICISM AND --
[READING] THERE WAS A QUESTION BEFORE THE
BREAK ABOUT A CASE OF DOING AN ANALYSIS WITH SUCH ANOMALOUS
CONTIGUOUS HOMOZYGOSITY. AND WE'LL TALK ABOUT EACH ONE OF
THESE. SO AS FAR AS DETECTING DOSAGE
CHANGES, YOU HAVE A COUPLE OF OPTIONS.
YOU CAN USE THE MANUFACTURER PROVIDED SOFTWARE AND LOOK AT
THE B ALLELE FREQUENCY. ANOTHER THING IS THERE IS A
PIECE OF SOFTWARE CALLED PENNCNV WHICH WILL AUTOMATICALLY DETECT
REGIONS OF DOSAGE CHANGES AND WILL PRINT OUT A LIST, AND THAT
CAN BE INTEGRATED USING JAMIE'S VARSIFTER TOOL INTO A SET OF
INCLUDE AND EXCLUDE REGIONS TO COMBINE WITH YOUR EXOME DATA.
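To make the idea of a list of dosage-change regions concrete, the following Python sketch computes the log ratio of observed to expected intensity and reports runs of consecutive SNPs whose ratio is shifted down. It is only a toy under stated assumptions: the threshold, the minimum run length, and both function names are hypothetical, and it is not how dedicated CNV callers work (those use statistical models rather than a fixed cutoff).

```python
import math


def log_r_ratio(observed, expected):
    """log2 of observed over expected intensity: 1/1 gives 0, 0.5/1 gives -1."""
    return math.log2(observed / expected)


def call_low_segments(positions, lrr_values, threshold=-0.5, min_snps=3):
    """Return (start, end) for runs of at least min_snps consecutive SNPs
    whose log ratio is below threshold (candidate single-copy deletions)."""
    segments, run = [], []
    for pos, lrr in zip(positions, lrr_values):
        if lrr < threshold:
            run.append(pos)
        else:
            if len(run) >= min_snps:
                segments.append((run[0], run[-1]))
            run = []
    if len(run) >= min_snps:  # close a run that reaches the last SNP
        segments.append((run[0], run[-1]))
    return segments


positions = [100, 200, 300, 400, 500, 600, 700]
lrr = [0.02, -0.9, -1.1, -1.0, 0.05, -0.8, 0.0]
deletions = call_low_segments(positions, lrr)  # [(200, 400)]
```

The isolated low value at position 600 is ignored because a single SNP is too likely to be noise; the resulting intervals could then be written out as an include or exclude region list.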
SO HERE IS AN EXAMPLE OF HOW THIS WAS USEFUL FOR ONE OF OUR
UNIQUE CASES. THIS IS A 10-YEAR-OLD MALE WITH A
COMPLEX NEUROLOGIC PHENOTYPE, AND WE GUESSED THAT THIS WAS GOING
TO BE AUTOSOMAL RECESSIVE. IT COULD HAVE BEEN DOMINANT AS
WELL, AND WE APPLIED MULTIPLE FILTERS AS WE DISCUSSED THIS
MORNING AND DIDN'T FIND ANYTHING.
SO WE REANALYZED THE DATA WITH NEW FILTERING TOOLS,
USING SOMETHING DEVELOPED BY SOMEONE IN OUR GROUP, VARMD,
WHICH AUTOMATES THE FILTERING STEPS AND ALLOWS YOU TO RELAX
SOME OF THE FILTERING CONSTRAINTS. WE TALKED ABOUT THE
ROLE OF DOING THAT IN AN ITERATIVE ANALYSIS PROCESS.
AND WE FOUND A CANDIDATE. NOW THIS CANDIDATE HAD
ORIGINALLY BEEN THROWN OUT BECAUSE IT DIDN'T FOLLOW THE
RULES OF MENDELIAN SEGREGATION SO THE CHILD THAT HAS AA, LITTLE
A LITTLE A, COULD NOT HAVE GOTTEN THAT LITTLE A ALLELE FROM
THE MOTHER BECAUSE SHE DIDN'T CARRY IT.
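The segregation rule that discarded this candidate can be sketched in a few lines of Python. Genotypes are written as two-character strings, and the `-` symbol for a deleted copy is a notation assumed here purely for illustration; the function names are hypothetical and this is not the filtering code used in the project.

```python
from itertools import product


def possible_children(mother, father):
    """All child genotypes formed by one allele from each parent."""
    return {tuple(sorted(pair)) for pair in product(mother, father)}


def is_mendelian_consistent(child, mother, father):
    """True if the child genotype can arise from the parental genotypes."""
    return tuple(sorted(child)) in possible_children(mother, father)


# As called from sequence data: child aa, mother AA, father Aa.
as_called = is_mendelian_consistent("aa", "AA", "Aa")      # False

# If the mother actually carries one A and one deleted copy ("A-"),
# a child with "a-" looks like aa on sequencing but is consistent.
with_deletion = is_mendelian_consistent("a-", "A-", "Aa")  # True
```

The first call shows why a strict Mendelian filter removes the variant; the second shows how the same rule passes once a parental deletion is allowed for, which is the kind of information a SNP array can supply.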
AND THE GENETICISTS IN THE CROWD WILL GUESS THAT IN FACT THE
MOTHER WAS NOT HOMOZYGOUS BUT HEMIZYGOUS, AND THE SNP
CHIP DATA CONFIRMED THERE WAS A SMALL DELETION THAT THE
MOTHER HAD AND PASSED ON TO THE CHILD, AND THIS WAS THE CAUSE OF
HIS DISEASE. SO DUPLICATIONS CAN RESULT IN
SUBTLE BUT IMPORTANT CHANGES IN GENE DOSAGE AND YOU CAN CREATE A
BED FILE. SO, THIS TYPE OF --
YOU CAN SEE THERE IS MORE THAN THE NORMAL 3 POPULATIONS WE SEE
FOR THIS B ALLELE PLOT AND THIS SEPARATION OF THESE TWO
POPULATIONS OF SNPS IS BECAUSE IN FACT YOU HAVE ONE THAT IS
TRIPLE B, ONE IS AAB, ONE IS -- I'M SORRY, AAB, AAB AND AAA AND
THIS TURNS OUT TO BE A CASE OF MOSAICISM, AND THIS TYPE OF
MOSAICISM CAN BE QUANTITATED FAIRLY ACCURATELY,
EVEN MORE SO THAN YOU CAN DO WITH KARYOTYPES.
SO CONSIDER THE EFFECT OF MOSAICISM ON SEQUENCING
QUALITY. HOMOZYGOUS AND HETEROZYGOUS BASE
CALLING USES THE RELATIVE PROPORTIONS OF SHORT SEQUENCE
READS WITH DIFFERENT GENOTYPES. SO IF YOU HAVE MOSAICISM,
IT WILL CHANGE THE PROPORTIONS AND WILL AFFECT THE QUALITY OF BASE
CALLING. THESE MAY INDICATE REGIONS OF
INTEREST IN THE GENOME AND MAY BE IMPORTANT IN SOMATICALLY
EVOLVING CELLS LIKE THE CANCER EXAMPLES WE SAW EARLIER.
A THIRD THING IS HOMOZYGOSITY MAPPING.
SO HERE IS A NORMAL B ALLELE PLOT.
YOU CAN SEE THAT THERE ARE A FEW GAPS WHERE THEY ARE AT THE
CENTROMERES, BUT HERE IS A FAMILY THAT HAS MANY LARGE GAPS AND
THESE ARE REGIONS OF HOMOZYGOSITY AND I CAN HIGHLIGHT
THOSE BY PUTTING ARROWS ON THEM. SO THERE ARE MANY MORE THAN YOU
WOULD EXPECT TO SEE IN A FAMILY THAT WAS COMPLETELY OUTBRED. IN
WORK BY TOM IN OUR GROUP, HE TOOK A GROUP OF
FAMILIES WITH A KNOWN AMOUNT OF CONSANGUINITY AND COMPARED THAT
TO THE TOTAL SUM OF THE LINEAR LENGTH OF THE HOMOZYGOUS
REGIONS. WITH THIS TYPE OF CALIBRATION
YOU CAN DETERMINE THE AMOUNT OF CONSANGUINITY USING THE CHIP.
IF YOU WANT TO INCORPORATE THIS INTO YOUR DATA, YOU CAN USE THE
ILLUMINA SOFTWARE TO LOOK AT PLOTS AND THERE IS ALSO A TOOL CALLED
PLINK. AND AMONG ITS MANY CAPABILITIES IS THE
ABILITY TO AUTO DETECT REGIONS OF HOMOZYGOSITY WHICH YOU CAN
INCORPORATE. HERE IS AN EXAMPLE OF A CASE, ONE OF THE
EARLY UDP SUCCESSES, WHERE WE FOUND A NEW DISEASE AND IN THIS
CASE, IT WAS THE PRESENCE OF A REGION OF HOMOZYGOSITY THAT
ALLOWED US TO FIND THE GENE. SO YOU CAN IDENTIFY THESE USING
B ALLELE PLOTS. YOU CAN LOOK AT JUST THE
VARIANTS THAT FALL WITHIN HOMOZYGOUS REGION AND IN FACT,
THAT PROBABLY MAKES UP THE BULK OF THE SUCCESSFUL EXOME
PAPERS THAT ARE AVAILABLE: EXOME PROJECTS DONE WITH
HOMOZYGOSITY MAPPING. AND IT MAY ALTER THE PLANNING OF NEXT GEN
EXPERIMENTS. TO BE FULLY FORTHCOMING, WE NEVER DID EXOME
SEQUENCING FOR THE STUDY ON THE PREVIOUS SLIDE BECAUSE WE FOUND A
REGION WHERE WE COULD JUST GO AND LOOK AT THE GENE WITHOUT DOING
EXOME SEQUENCING. BUT IT CAN BE INCORPORATED INTO
WHICH VARIANTS YOU'RE LOOKING AT.
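A minimal Python sketch of the two steps just described, finding stretches with no heterozygous SNP calls and then keeping only the exome variants that fall inside them, might look like the following. The run-length cutoff of four SNPs is a toy value chosen so the example is readable (real analyses use far longer runs), and the function names are hypothetical.

```python
def homozygous_runs(snps, min_snps=4):
    """snps: (position, genotype) pairs sorted by position on one chromosome.
    Return (start, end) for runs of at least min_snps homozygous calls."""
    runs, current = [], []
    for pos, genotype in snps:
        if genotype in ("AA", "BB"):
            current.append(pos)
        else:  # a heterozygous (or missing) call ends the current run
            if len(current) >= min_snps:
                runs.append((current[0], current[-1]))
            current = []
    if len(current) >= min_snps:
        runs.append((current[0], current[-1]))
    return runs


def variants_in_regions(variant_positions, regions):
    """Keep only variants located inside one of the (start, end) regions."""
    return [p for p in variant_positions
            if any(start <= p <= end for start, end in regions)]


snps = [(10, "AB"), (20, "AA"), (30, "BB"), (40, "AA"),
        (50, "BB"), (60, "AB"), (70, "AA")]
regions = homozygous_runs(snps)                           # [(20, 50)]
kept = variants_in_regions([15, 25, 45, 65], regions)     # [25, 45]
```

In practice the regions would come from the array software or a dedicated tool, and the same interval list can be applied as a filter on the exome variant table.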
ALSO, IN THINKING ABOUT THE CONSANGUINITY, THE OPTIMUM IS
PROBABLY SECOND OR THIRD COUSINS OR FURTHER OUT.
IF THE FAMILIES ARE TOO RELATED, YOU END UP WITH TOO MANY REGIONS
AND THAT'S NOT A VERY EFFECTIVE FILTER FOR YOUR VARIANTS.
SO FAR I HAVE BEEN TALKING ABOUT ANALYSES A SNP CHIP HELPS YOU DO USING
INTENSITY MEASUREMENTS, BUT YOU CAN ALSO USE BOOLEAN TOOLS, SUCH
AS THE ONE THAT JAMIE SHOWED YOU, AND GENOME STUDIO HAS A SET OF
TOOLS FOR DOING THESE INQUIRIES WHERE YOU ASK IF A CERTAIN SET
OF SNPS FOLLOWS RULES YOU IMPOSE BASED ON CRITERIA.
AND THIS IS USUALLY BASED ON FAIRLY STRAIGHTFORWARD GENETICS.
SO IF THE MOTHER IS AB AND THE FATHER IS AA, THEN A CHILD WHO
IS AB HAD TO GET THE B ALLELE FROM THE MOTHER BECAUSE SHE WAS
THE ONLY ONE THAT HAD A B ALLELE TO GIVE.
AND IF AT AN ADJACENT LOCUS THE SAME IS TRUE, THEN IF SOME
CHILDREN ARE AB AT BOTH LOCI AND SOME ARE AA AT THE FIRST
AND AB AT THE SECOND, THEN A RECOMBINATION IS SUGGESTED AND
YOU CAN SEE THAT HERE. SO THE PARENTAL GENOTYPE IS AB/AB
AND THE RECOMBINATION GENOTYPE IS AA/AB.
SO TO SET UP A FACILITY TO CHECK FOR THESE IS FAIRLY
STRAIGHTFORWARD WITH A SMALL FAMILY.
AS YOU GET TO LARGER FAMILIES, YOU NEED MORE RULES TO
INCORPORATE ALL THE POSSIBILITIES.
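The Boolean rule for the simplest case, mother AB and father AA at two adjacent SNPs, can be sketched in Python as below. This is an illustrative toy with hypothetical function names: it compares two children, handles only this one parental configuration, and ignores genotyping error, which is why larger families need many more rules.

```python
def maternal_allele(child_gt, mother_gt="AB", father_gt="AA"):
    """Allele transmitted by the mother when she is AB and the father is AA.
    Returns None if the marker is uninformative under this simple rule."""
    if mother_gt != "AB" or father_gt != "AA":
        return None
    if child_gt == "AB":
        return "B"  # the B allele can only have come from the mother
    if child_gt == "AA":
        return "A"
    return None     # e.g. BB is inconsistent with these parents


def recombination_suggested(child1, child2):
    """child1, child2: genotypes at two adjacent informative loci.
    A recombination is suggested when the two children share the maternal
    allele at one locus but not at the other."""
    m1 = [maternal_allele(g) for g in child1]
    m2 = [maternal_allele(g) for g in child2]
    if None in m1 or None in m2:
        return None
    return (m1[0] == m2[0]) != (m1[1] == m2[1])


same_haplotype = recombination_suggested(("AB", "AB"), ("AB", "AB"))  # False
crossover = recombination_suggested(("AB", "AB"), ("AA", "AB"))       # True
```

The rule cannot say which of the two children carries the recombinant chromosome; that takes additional siblings or additional markers.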
SO I'D LIKE TO CONTRAST THIS A LITTLE BIT WITH FORMAL LINKAGE
ANALYSIS AND POSITIONAL MAPPING SUCH AS THAT, THAT WAS TALKED
ABOUT EARLIER. CLASSIC LINKAGE ANALYSIS USUALLY
USES FAIRLY ROBUST MARKERS, TANDEM REPEATS AND THOSE SORTS OF
THINGS. THEY TEND TO BE FEWER AND MORE
WIDELY SPACED. THERE ARE 440 IN THE MOST
COMMON ABI SETS FOR INSTANCE. AND THE ANALYSIS MUST TAKE INTO
ACCOUNT THE CHANCE OF A DOUBLE RECOMBINATION BETWEEN THE
MARKERS OR OTHER RECOMBINATION EVENTS.
WITH SNP-BASED LINKAGE MAPPING, THE MARKERS ARE LESS ROBUST.
THEY CAN BE UNINFORMATIVE AND THE SNP GENOTYPE CAN BE WRONG,
HOWEVER, YOU HAVE A MUCH HIGHER DENSITY OF MARKERS AND SO YOU
HAVE MANY ASSAYS TO TEST FOR RECOMBINATIONS AND AT THE END OF
THE DAY, THEY ARE DENSE ENOUGH THAT IT MAKES THE CHANCE OF A DOUBLE
RECOMBINATION BETWEEN THE INFORMATIVE MARKERS UNLIKELY.
SO YOU GET OUT A SLIGHTLY DIFFERENT KIND OF DATA.
THIS IS A GRAPH THAT SHOWS THE LOG OF THE ODDS RATIO ON
THE LEFT-HAND SIDE. BECAUSE THIS WAS DONE IN A SMALL FAMILY,
THE BLACK PLOT HERE, WHICH IS THE LOG OF THE ODDS RATIO,
DOESN'T GO UP TO THE HIGH SIGNIFICANCE LEVEL OF 3 YOU
WOULD USE TO CONFIRM A PLACE IN THE WHOLE GENOME, BUT REALLY WHAT
I WANT TO DO IS CONTRAST THIS CONTINUOUS PLOT
WITH THESE DISCRETE INTERVALS THAT YOU GET FROM RECOMBINATION
MAPPING USING A CHIP. AND WHEN YOU LOOK AT THESE CLOSE
UP, ONCE AGAIN, YOU CAN SEE THERE ARE THESE DISCRETE
INTERVALS THAT ARE DEFINED BY THE SNP-BASED RECOMBINATION MAP.
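To see why a small family cannot reach a LOD of 3, here is a short Python sketch of the textbook two-point LOD score for phase-known meioses. It is an assumption that this simple form is adequate for illustration; the plot described above comes from full linkage software, not from this formula, and the function name is hypothetical.

```python
import math


def lod_score(recombinants, non_recombinants, theta):
    """log10 of the likelihood at recombination fraction theta divided by
    the likelihood under no linkage (theta = 0.5), phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    n = recombinants + non_recombinants
    linked = (theta ** recombinants) * ((1 - theta) ** non_recombinants)
    if linked == 0:
        return float("-inf")  # theta = 0 is impossible once a recombinant is seen
    return math.log10(linked / (0.5 ** n))


small_family = lod_score(0, 4, 0.0)    # about 1.2: four informative meioses
large_family = lod_score(0, 10, 0.0)   # about 3.0: ten informative meioses
```

Each fully informative, non-recombinant meiosis adds log10(2), roughly 0.3, so about ten are needed to reach the conventional threshold of 3.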
SO HERE IS AN EXAMPLE TO SHOW HOW WE USED.
... IF YOU DO YOUR PHENOTYPING, YOU
CAN MAKE A LIST OF DEVELOPMENTAL GENES AND LOOK AT THOSE.
SO HERE IS AN EXAMPLE OF WHERE PHENOTYPING HELPED US TO MAKE A
DIAGNOSIS. THIS IS A 19-YEAR-OLD FEMALE
WITH SLOWLY PROGRESSIVE
NEUROLOGIC DISEASE, AND HER COURSE SUGGESTED A LYSOSOMAL
STORAGE DISEASE. HOWEVER, THAT HAD BEEN EXCLUDED
BY THE GOLD STANDARD OF ENZYMATIC TESTING.
EXOME SEQUENCING DETECTED CANDIDATE VARIANTS IN THAT GENE
SO THE COMBINATION OF THOSE MOLECULAR RESULTS PLUS THE FACT
THAT WE HAD A CLINICAL SUSPICION FOR LYSOSOMAL STORAGE DISEASE
PROMPTED US TO GO AND REDO THE GOLD STANDARD TESTING.
IN FACT, THAT HAD BEEN AN INCORRECT CLINICAL RESULT AND
THIS PATIENT HAD ENZYMATIC ACTIVITY CONSISTENT WITH GM1
GANGLIOSIDOSIS. IN THIS CASE, IT WAS THE
COMBINATION OF THE CAREFUL PHENOTYPING AND THE
SUSPICIONS THAT GENERATED, PLUS THE MOLECULAR DATA, THAT HELPED
US. ANOTHER VERY HEALTHY AND ONGOING
DEBATE WE HAVE IN OUR LABORATORY IS WHETHER TO USE SINGLE EXOMES OR
SMALL PEDIGREES FOR OUR FAMILIES. CLEARLY SINGLE EXOMES ARE LESS EXPENSIVE AND
FEWER TOOLS ARE REQUIRED, BUT YOU WILL GENERATE MORE VARIANTS AND
I HOPE TO SHOW YOU THAT. A SMALL PEDIGREE IS MORE
EXPENSIVE AND ADDITIONAL TOOLS AND EXPERTISE ARE REQUIRED BUT
YOU GET FEWER CANDIDATE VARIANTS.
AND THE FILTRATION THAT YOU CAN DO USING THAT PEDIGREE
INFORMATION HAS A LOW ERROR RATE IF YOU HAVE A CORRECT GENETIC
MODEL AND HIGH QUALITY DATA. YOU CAN THINK OF TWO WAYS OF
USING THAT FAMILY DATA. YOU COULD ASK ME, WHY NOT JUST
DO A SNP CHIP AND USE THE RECOMBINATION DATA?
IT'S NOT THE SAME SET OF THE VARIANTS YOU EXCLUDE BY FORCING
INDIVIDUAL VARIANTS TO FOLLOW SEGREGATION THAT HAS MENDELIAN
CONSISTENCY. YOU GET A LARGER SET EXCLUDED BY
DOING BOTH OF THESE THINGS THAN BY JUST DOING ONE OF THEM.
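As a hedged illustration of per-variant segregation filtering, the Python sketch below tests whether one variant fits a simple homozygous recessive model in a family. Genotypes are coded as the number of alternate alleles; the member labels and the function name are hypothetical, and compound heterozygous inheritance is deliberately left out.

```python
def fits_recessive_model(alt_counts, affected, parents=("mother", "father")):
    """alt_counts: member name -> number of alternate alleles (0, 1 or 2).
    affected: set of affected member names.
    Affected members must be homozygous for the variant, unaffected members
    must not be, and any genotyped parent must be a heterozygous carrier."""
    for member, count in alt_counts.items():
        if member in affected and count != 2:
            return False
        if member not in affected and count == 2:
            return False
    return all(alt_counts[p] == 1 for p in parents if p in alt_counts)


trio = {"proband": 2, "mother": 1, "father": 1}
quartet = dict(trio, sibling=2)  # unaffected sibling is also homozygous

passes_in_trio = fits_recessive_model(trio, {"proband"})        # True
passes_in_quartet = fits_recessive_model(quartet, {"proband"})  # False
```

The same variant survives in the trio and is excluded once the unaffected sibling is added, which is the effect of including extra family members on the candidate list.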
THIS GRAPH HAS PAIRED RESULTS. SO FOR EVERY LINE UP IN THIS RED
SECTION, THERE IS ANOTHER LINE DOWN IN THE BLACK SECTION.
ALL OF THE RED TRACES ARE THE PATH OF FILTERING FROM THE
BEGINNING TO THE END OF THE PROJECT USING ONLY THE PROBAND,
WHEREAS THE BLACK TRACES ARE USING A PROBAND PLUS THE FAMILY
MEMBERS. ON THE LEFT-HAND SIDE IS THE LOG CUMULATIVE NUMBER
OF POST-FILTRATION VARIANTS AND ALONG THE BOTTOM ARE THE
FILTERING STEPS. IMPORTANTLY, THESE LAST TWO ARE
CONTINUOUS COLUMNS. THESE ARE THE HETEROZYGOTES AND
THESE ARE THE HOMOZYGOTES. WHEN WE ONLY USED THE PROBAND
FOR THESE FAMILIES, WE HAD BETWEEN 100 AND 1000 CANDIDATE
VARIANTS. WHEN WE INCLUDED THE FAMILY INFORMATION, WE GOT
AROUND 10, SOMETIMES A LITTLE BIT MORE AND SOMETIMES LESS.
THIS IS GOING TO VARY WITH THE PARTICULAR PROJECT AND CHEMISTRY
YOU'RE USING. I WANTED TO GIVE YOU EVIDENCE
THAT YOU GET IMPROVED FILTRATION BY INCLUDING EXTRA FAMILY
MEMBERS. AND HERE IS ANOTHER WAY TO LOOK
AT THE SAME THING. SO HERE IS LOG OF NUMBER OF
VARIANTS. EACH ONE OF THESE IS A DIFFERENT
EXOME PROJECT. AND I'D LIKE YOU TO JUST FOCUS
ON THE RED COLUMNS. AND THE GENERAL PATTERN I'D LIKE
TO POINT OUT IS THE SMALLER SIMPLER FAMILIES WITH TRIOS
ENDED UP GENERATING BETWEEN 100 AND 1000 VARIANTS ON AVERAGE AND
THE ONES THAT HAD MORE FAMILY MEMBERS INCLUDED GENERATED FEWER
VARIANTS. SO, OVERALL, THIS TECHNIQUE CAN
BE A VERY POWERFUL WAY OF FILTERING YOUR DATA.
SO AT THE END, WE WOULD SAY, USE A SINGLE EXOME WHEN YOU HAVE ANY
OTHER CLUES AVAILABLE. SO IF YOU HAVE A PATHWAY,
IF YOU HAVE GOT A HOMOZYGOUS REGION YOU MAPPED, IF YOU HAVE A
GENE LIST, BY ALL MEANS, CONSIDER DOING A SINGLE EXOME
BECAUSE YOU HAVE GOT OTHER MEANS OF DOING FILTERING.
IF YOU HAVE NO CLUE GOING INTO THE FAMILY AND IF IT'S A SINGLE
FAMILY, YOU MAY CONSIDER USING ADDITIONAL FAMILY MEMBERS
ASSUMING GOOD PHENOTYPING IS AVAILABLE.
AND FOR THIS TYPE OF MAPPING, ESPECIALLY FOR RECESSIVE
CONDITIONS, IT HELPS TO HAVE BOTH PARENTS AND ONE SIBLING IN
ADDITION TO THE PROBAND. TRIOS ARE LESS USEFUL FOR
RECESSIVE MODELS IN PARTICULAR. SO WE TALKED ABOUT A COUPLE OF
DIFFERENT TYPES OF DATA INTEGRATION.
USE ALL AVAILABLE RESOURCES THAT YOU HAVE TO HELP FILTER YOUR
LIST. FOR EXOME SEQUENCING, CONSIDER
USING SNP ARRAYS FOR ALL THE REASONS WE DISCUSSED.
THE STUDY DESIGN SHOULD INCLUDE AS MUCH INFORMATION AS YOU CAN
PUT IN FROM CAREFUL PHENOTYPING AND FAMILY HISTORY AND NEW
APPROACHES ARE COMING OUT ON A MONTHLY BASIS.
SO YOU SHOULD DO A LITERATURE REVIEW IF YOU'RE STARTING A BIG
NEW PROJECT. SO, JUST A FEW WORDS ON SEQUENCE
VALIDATION AND REANALYZING PROJECTS.
THERE ARE A COUPLE OF WAYS TO THINK ABOUT SEQUENCE VALIDATION
AND I WAS VERY INTERESTED TO HEAR THAT SOME OF THIS WAS
EXPLORED IN THE CANCER TALK. SO, YOU CAN THINK ABOUT ONE TYPE
OF VALIDATION IN THE SEQUENCING TO MAKE SURE WHATEVER YOU FOUND
IN THE EXOME IS REAL. PICK A SUBSET OF THINGS YOU'RE
INTERESTED IN. YOU MAY HAVE TO DO SEQUENCING IF
YOU ARE RETURNING RESULTS TO FAMILIES.
THE POINT I'D LIKE TO MAKE IS THAT THE LIKELIHOOD OF VERIFICATION
IS BASED ON YOUR FILTERING TECHNIQUES. THAT MAY HAVE COME
OUT IN EARLIER TALKS AS WELL. FOR AN AUTOSOMAL RECESSIVE MODEL,
WE CAN HAVE 90% OR MORE OF THE VARIANTS DETECTED BY EXOME
ANALYSIS VERIFIED, WHEREAS FOR AN AUTOSOMAL DOMINANT MODEL WITH
NEW DOMINANTS AND LESS ABILITY TO FILTER THE VARIANTS, IN SOME
CASES WE VERIFIED 30% OR LESS. SO YOU MAY SEE VARIATION IN HOW
ACCURATE THE GENOTYPING IS BASED UPON WHAT TYPE OF FILTERING YOU
USED. AND THE POINT FOR FUNCTIONAL
VALIDATION THAT I WOULD LIKE TO MAKE IS THAT FUNCTIONAL
VALIDATION IS DETERMINING THE BIOLOGICAL EFFECT OF A VARIANT
AND THERE IS NO METHOD THAT CAN REPLACE FUNCTIONAL ANALYSIS
FROM THE LABORATORY FOR PREVIOUSLY UNCHARACTERIZED
VARIANTS. DURING THE BREAK, I TALKED WITH
A COUPLE OF PEOPLE ABOUT PATHOGENESIS PREDICTION
SOFTWARE. AND THIS IS A STUDY FROM 2012.
THE POINT IS THIS CONFIRMED DATA FROM EARLIER STUDIES THAT ALL OF
THESE METHODS BASICALLY HAVE 10-20% FALSE NEGATIVE AND 20-20%
FALSE POSITIVE RATE. THEY ARE NOT SO GOOD IN
DETERMINING WHETHER AN INDIVIDUAL VARIANT IS PATHOGENIC
OR DISEASE-CAUSING AND SHOULDN'T BE RELIED ON FOR SUCH.
EDITORS WILL ASK FOR EVIDENCE OF FUNCTIONAL CONSEQUENCES.
THERE ARE PAPERS OUT THERE WHERE THEY DON'T, BUT FOR JUST
ABOUT EVERY PAPER WE PUT OUT WITH EXOME DATA, THEY WANTED
TO HAVE PROTEIN OR RNA MEASUREMENTS OR ENZYME ACTIVITY OR RESCUE
EXPERIMENTS OR MODEL ORGANISMS, SOMETHING TO SHOW YOU FOUND THE
RIGHT THING. THE EXCEPTIONS ARE PROBABLY
PREVIOUSLY WELL CHARACTERIZED VARIANTS AND MAYBE SEVERE
VARIANTS IN WELL CHARACTERIZED GENES BUT EVEN FOR THE LATTER,
YOU MAY HAVE TO HAVE SOME EXPERIMENTAL EVIDENCE.
SO WHAT HAPPENS WHEN YOU COME UP EMPTY-HANDED?
WE ALREADY HEARD ABOUT THE ITERATIVE APPROACH WHICH I WOULD
WHOLEHEARTEDLY ENDORSE. YOU REALLY NEED TO REVISIT ALL
OF THE ASSUMPTIONS YOU MADE ALONG THE LINE ABOUT WHO HAS THE
DISEASE AND WHO DIDN'T HAVE THE DISEASE, WHAT THE GENETIC MODEL
WAS, WHAT THE FREQUENCY WAS FOR SOME OF YOUR FILTERS AND YOU
NEED TO KNOW WHAT YOUR TECHNIQUE MEASURES AND DOESN'T.
WE HEARD ALL THROUGH THE MORNING ABOUT THE FACT THAT TARGETING,
CAPTURING, BASE CALLING, ALL OF THESE THINGS COLLECT A PORTION
OF THE TOTAL POSSIBLE WORLD OF TRUE GENOTYPES IN THE FAMILY
YOU'RE STUDYING. EXPLORE THE SOURCES OF FALSE
NEGATIVE RESULTS AND STUDY DATA QUALITY AND COVERAGE.
SO HERE IS ONE EXON AND PLOTTED ABOVE IT IS THE COVERAGE.
THIS IS USING A FAIRLY OLD KIT, AND
COVERAGE CONSISTENCY IS GETTING BETTER. BUT YOU CAN SEE THAT THE
COVERAGE VARIES A LOT EVEN ACROSS THIS EXON.
AND THERE ARE SOME KNOWN DETERMINANTS OF THIS PHENOMENON:
GC CONTENT, SEQUENCE COMPLEXITY AND NEAR IDENTICAL REPEATS AND
CHANGES IN REPRESENTATION DUE TO UNEQUAL AMPLIFICATION, BUT I ASK
YOU TO CONSIDER WHAT AN AVERAGE COVERAGE MEANS IN TERMS OF THIS
TYPE OF GRANULARITY. THAT MAY NOT PROVIDE YOU THE
TOTAL QUALITY STORY YOU NEED TO GO FORWARD.
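The gap between an average and the per-base picture is easy to show with a few lines of Python. The depths below are made-up numbers for a single short region, and the minimum depth of 10 is a hypothetical cutoff chosen for illustration, not a recommended threshold.

```python
def coverage_summary(depths, min_depth=10):
    """Return (mean depth, fraction of bases with depth below min_depth)."""
    mean_depth = sum(depths) / len(depths)
    below = sum(1 for d in depths if d < min_depth)
    return mean_depth, below / len(depths)


# Per-base read depth across a toy eight-base stretch of an exon.
depths = [120, 110, 95, 80, 3, 0, 60, 132]
mean_depth, fraction_low = coverage_summary(depths)  # (75.0, 0.25)
```

A mean of 75x looks comfortable, yet a quarter of the bases in this toy region could not support a confident genotype call.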
GENOTYPING QUALITY AND COMPLETENESS IN EXOME SEQUENCING
IS COMPLEX AND IT CAN FAIL DIFFERENTLY THAN SANGER
SEQUENCING. SO, YOU NEED TO THINK ABOUT A
NUMBER OF DIFFERENT ASPECTS OF WHAT WENT INTO YOUR EXPERIMENT
AND FOR MANY OF THEM THERE ARE SPECIFIC THINGS YOU CAN LOOK AT,
FOR INSTANCE FOR TARGETING OR WHAT WAS THE STUDY DESIGNED TO
CAPTURE, YOU CAN CREATE A BED FILE TO SHOW WHERE ALL THE BAITS
WERE. SO YOU CAN SEE WHAT WAS SUPPOSED
TO BE TARGETED AND WHAT WAS NOT TARGETED.
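Once bait positions are available as intervals, the fraction of a gene list that was targeted at all can be estimated with a short Python sketch like this one. It assumes BED-style half-open coordinates on a single chromosome, and the function names and toy intervals are hypothetical; a real analysis would read the kit's BED file and handle multiple chromosomes.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_targeted(gene_intervals, bait_intervals):
    """Fraction of gene bases that overlap at least one bait."""
    genes = merge_intervals(gene_intervals)
    baits = merge_intervals(bait_intervals)
    total = sum(end - start for start, end in genes)
    covered = 0
    for g_start, g_end in genes:
        for b_start, b_end in baits:
            covered += max(0, min(g_end, b_end) - max(g_start, b_start))
    return covered / total if total else 0.0


exons = [(100, 200), (300, 400)]
baits = [(150, 250), (300, 350), (340, 360)]
targeted = fraction_targeted(exons, baits)  # 0.55
```

Merging the baits first prevents overlapping baits from being counted twice; bases that were never targeted cannot yield a variant call, however deep the average coverage.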
FOR CAPTURE AND COMPLEXITY, THIS IS A SOMEWHAT INVOLVED TOPIC BUT
HISTORICAL DATA CAN BE USED. FOR SEQUENCING AND ALIGNMENT,
YOU CAN USE COVERAGE AND OTHER METRICS AND ALSO HISTORICAL DATA
AND FOR BASE CALLING, YOU CAN USE THE MPG SCORES WE TALKED
ABOUT AND HISTORICAL DATA AS WELL.
SO CLEARLY I MENTIONED HISTORICAL DATA THREE TIMES AND
THE REASON IS BECAUSE AN ACCUMULATED SET OF DATA USING
THE SAME TECHNIQUES IS AN INVALUABLE RESOURCE.
THEY DO EXPIRE AS CAME UP IN THE TALK BEFORE THE BREAK.
SO, IF YOU HAVE GOT SOMETHING WHICH WAS CAPTURED USING MUCH,
MUCH DIFFERENT CHEMISTRY THAN WHAT YOU'RE CURRENTLY USING, IT
MAY BE LESS USEFUL, BUT YOU SHOULD ACCUMULATE AND STUDY
DATASETS UNTIL THEY DIVERGE AS FAR AS TECHNIQUE GOES BECAUSE THEY
CAN BE VERY INSTRUCTIVE. AN EXAMPLE OF THAT IS WE USED
SOME OF OUR PREVIOUSLY COLLECTED DATA FROM THE UDP PLUS DATA
SHARED WITH US FROM CLINSEQ AND WE LOOKED AT SEVERAL HUNDRED
EXOMES. WE USED FISHER'S EXACT TEST
AND THE BONFERRONI CORRECTION FOR MULTIPLE HYPOTHESIS
TESTING, LOOKING AT THESE SITES.
AND THERE WERE TWO ERROR TYPES THAT JUMPED OUT AT US.
THERE ARE MORE COMPLEX, INTERESTING THINGS TOO.
ONE OF THESE WAS WHEN ALL OF THE GENOTYPE CALLS WERE HOMOZYGOUS
NONREFERENCE. THAT WOULD SUGGEST THE REFERENCE
SEQUENCE IS EITHER WRONG OR HAS A MINOR ALLELE IN IT.
A SECOND TYPE OF ERROR THAT WE SAW IS WHEN ALL OF THE GENOTYPES
WERE HETEROZYGOUS AND THAT SUGGESTS THAT TWO SIMILAR
REGIONS WERE ALIGNED TO FORM A COMPRESSION AND THE FEW SPOTS
THEY DIFFER SHOW UP AS HETEROZYGOTES AND WE COULD USE
THIS DATA FROM OUR HISTORICAL DATASET TO MAKE EXCLUSION LISTS
FOR FURTHER FILTERING. WE DID ANOTHER EXPERIMENT WHERE
WE ASKED THE QUESTION, GIVEN A SET OF GENES ASSOCIATED WITH A
KNOWN DISORDER, SO THIS IS THE GENETIC HETEROGENEITY QUESTION,
HOW WELL ARE THEY COVERED? WE TOOK 114 EXOMES FROM 27
FAMILIES AND USED TWO GENE LISTS. ONE WAS FOR A VARIETY OF MUSCLE
DISORDERS AND THE SECOND WAS FOR HEREDITARY SPASTIC PARAPARESIS.
YOU SHOULD KNOW THAT EVEN THOUGH THIS ISN'T ALWAYS TRUE,
SOMETIMES CLINICIANS WILL ASSUME IF A CLINICAL SEQUENCING TEST
COMES BACK NEGATIVE, THEN ALL OF THE SEQUENCED GENE REGIONS WERE
SEQUENCED WITH SUFFICIENT QUALITY TO DETECT ALL VARIANTS
IN THOSE REGIONS. OR PUT IN SIMPLER TERMS, I CAN
TAKE THAT DIAGNOSIS AND SET IT ASIDE AND MOVE ON TO ANOTHER
DIAGNOSTIC CONSIDERATION. FOR THESE TWO GENE LISTS, MIND
YOU THIS IS USING SOMEWHAT OLDER TECHNOLOGY, THE TARGETED CAPTURE
KITS INCLUDED 47-73% OF NUCLEOTIDES WITHIN THE GENE LISTS,
AND THIS IS PROBABLY LOWER THAN AVERAGE,
BUT IT IS TRUE FOR THESE LISTS. AND WHILE THE AVERAGE COVERAGE
WAS HIGH, 40X TO 100X, 2-3% OF THE NUCLEOTIDES HAD LESS THAN
FULL COVERAGE. MY POINT IS NOT THAT THESE
TECHNIQUES ARE VERY, VERY ERROR PRONE AND YOU SHOULDN'T USE THEM.
YOU NEED TO UNDERSTAND THE ASSAY CHARACTERISTICS AND KNOW WHAT IS
MISSED. AND HERE IS A GREAT EXAMPLE TO
ILLUSTRATE THAT. SO THIS IS WORK DONE BY ONE OF
MY COLLABORATORS AND THIS WAS A FULL-ON CHARIOT RACE WHERE THE FIRST
ONE TO IDENTIFY THE GENE GETS THE GOOD PAPER.
AND THIS WAS A LARGE REGION THAT WAS IDENTIFIED BY LINKAGE
MAPPING AND MANY, MANY GENES WERE SEQUENCED OVER A YEAR OR
SEVERAL YEARS AND NOTHING WAS FOUND.
EXOME SEQUENCING CAME ALONG. THEY TOOK THIS PROJECT OUT OF THE
FRIDGE, LIKE WAS TALKED ABOUT, AND TRIED EXOME SEQUENCING.
THEY DIDN'T FIND ANYTHING. SO THEN, ANOTHER MEMBER OF THE
LAB WENT BACK AND LOOKED FOR SPECIFIC REGIONS THAT HAD BEEN
MISSED BY THE EXOME SEQUENCING AND IT TURNED OUT THAT THE
ANSWER, OR THE CAUSE OF THE DISEASE, WAS IN ONE OF THOSE SECTIONS
MISSED BY THE EXOME SEQUENCING AND YOU MAY ASK WHO WON THE
RACE? THIS AND TWO OTHER PAPERS WERE
SIMULTANEOUSLY PUBLISHED IN NATURE GENETICS.
SO I GUESS IT WAS A TIE. SO FOR VALIDATION AND REANALYSIS,
IN SUMMARY, FUNCTIONAL VALIDATION IS REQUIRED TO PROVE
THAT A CANDIDATE YOU HAVE IS THE PATHOGENIC VARIANT.
AND IF THERE ARE NO GOOD CANDIDATES AT THE END OF THE
ANALYSIS, USE THE ITERATIVE APPROACH TALKED ABOUT TODAY.
REVISIT ASSUMPTIONS AND ANALYSIS PARAMETERS.
STUDY THE QUALITY AND COVERAGE ISSUES
OF YOUR PARTICULAR PROJECT, USING HISTORICAL DATA IF POSSIBLE.
AND DATA QUALITY IS CONSTANTLY IMPROVING BUT THAT DOESN'T MEAN
THAT EACH NEW TECHNIQUE WON'T HAVE SOME SORT OF FAILURE MODES
THAT NEED TO BE STUDIED. SO IN CONCLUSION, BE SURE TO
GIVE PLENTY OF TIME TO EXPERIMENTAL DESIGN BEFORE YOU
START YOUR PROJECT. CONSIDER USING ADJUNCT
TECHNOLOGIES TO COMPLEMENT EXOME ANALYSIS.
PHENOTYPING IS CRITICAL, ESPECIALLY WHEN USING FAMILY
DATA. CONSIDER USING ADDITIONAL FAMILY
MEMBERS IN CERTAIN CASES. FUNCTIONAL PROOF OF
PATHOGENICITY IS REQUIRED. ANALYZE THE DATA IN AN
ITERATIVE MANNER, ALTERING ASSUMPTIONS AND FILTERING
CONSTRAINTS AS YOU GO. SO I HAVE MANY PEOPLE TO THANK.
I'M NOT GOING TO THANK THEM ALL INDIVIDUALLY.
THESE ARE FROM SEVERAL DIFFERENT GROUPS, INCLUDING OUR OWN.
I WOULD SAY TOM, NEIL, MEROT AND CARRY ARE OUR CORE
BIOINFORMATICS GROUP FOR THE UDP NOW, BUT WE BENEFIT FROM THE
COLLABORATION WITH OTHER MEMBERS OF THE NHGRI AND OTHER INTRAMURAL
COMMUNITIES. THANK YOU VERY MUCH.