Wakefield's Smoking Gun

Some of you are aware of the story of Andrew Wakefield and the role he played in the modern anti-vaccine movement, but for those who aren't, here's the one-minute version. Wakefield was a physician, and I emphasize WAS: he's been struck off for ethics violations and misconduct in medical practice and research. In 1998, he and 12 co-authors published a study in the Lancet linking autism spectrum disorders to enterocolitis, or inflammation in the lining of the digestive tract. He speculated that this was because of the MMR vaccine, which may have caused an immune reaction to the viral antigens that promoted the rapid development of autism-related disorders in children. He followed this with two publications. In 2000, he characterized this new condition as autism enterocolitis, and he went about finding it in affected children but not neurotypical controls. In 2002, he claimed to have found active measles replication in the lymph nodes along the guts of autistic children. Just to make sure I'm clear on this: his model was that autism was caused by measles virus replication in the gut, which the children got from their MMR jab.
In 2010, the UK's General Medical Council determined that data used in his 1998-2002 publications were dishonestly manipulated to support a conclusion. They also found that the samples he tested were obtained unethically, outside of the purview of a human research ethics board. That's a very serious violation, especially since it involved special needs children, and the parents were neither informed of the risks of an invasive medical procedure nor asked for proper consent. A journalist named Brian Deer also uncovered in 2004 that Wakefield had undeclared conflicts of interest, including owning a patent on a new measles vaccine to be given as a single jab, development of a new diagnostic test for autism enterocolitis, and, of course, direct funding by lawyers who made a business of suing vaccine manufacturers.
What I'm about to present is some of the data that was used for, but not shown in, the 2002 Molecular Pathology paper, one that was never retracted. To me, it's a key figure in determining that the person producing this data was aware that they were fabricating it to meet a conclusion. To the general public, this might not look like much, but to a scientist trained in this field, it's a smoking gun of intentional fraud.
First, a brief introduction. We're going to be looking at real-time PCR data. PCR is the polymerase chain reaction, a method of copying and amplifying DNA until it can be detected. In a small tube are the patient sample, a dye that detects DNA fluorescently, and all the components necessary to amplify the patient's DNA. The reaction mixture is heated and cooled in such a way that the DNA doubles in concentration each round, or cycle. In 4 cycles of PCR, the DNA will have increased by 2 to the 4th power, or 16 times. In 20 cycles, we'll have increased the DNA by more than a million times its starting copy number, and by cycle 30, by more than a billion times. A general rule for real-time PCR is that if the sample still hasn't been detected by the dye by cycle 30, there was nothing there to start with. Eventually, contamination by DNA from sources other than the patient can begin to amplify, so any result past cycle 30 is suspect.
The shape of these curves has three phases: a pre-detection phase, where the amount of DNA is too small to detect with the dye; the log phase, where each cycle produces a doubling in the dye's fluorescence; and the post-log phase, where something in the reaction mix has become depleted and no more product can be made.
I also have to explain that there's a serial dilution of a reference sample of measles virus run at the same time. The researchers copied the DNA of measles, grew it up in bacteria, then diluted it several times, with each dilution being 1/10th the concentration of the previous one.
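For anyone who wants to check the doubling arithmetic, here is a minimal Python sketch. It assumes ideal PCR, meaning a perfect doubling every cycle; the function name `fold_amplification` and the `efficiency` parameter are made up for this illustration and don't come from the paper or from any instrument software. It also works out how far apart the ten-fold reference dilutions should land if the doubling is perfect.

```python
import math


def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Fold increase in DNA after `cycles` rounds of PCR.

    efficiency=1.0 is the idealized case of a perfect doubling per cycle
    (an assumption; real reactions are usually a little less efficient).
    """
    return (1.0 + efficiency) ** cycles


# The three figures quoted in the talk: 4, 20 and 30 cycles.
for n in (4, 20, 30):
    print(f"{n:2d} cycles -> {fold_amplification(n):,.0f}-fold")

# With perfect doubling, a sample diluted ten-fold needs log2(10) extra
# cycles to catch up, so the reference dilutions should be evenly spaced.
cycles_per_tenfold = math.log2(10)
print(f"ten-fold dilution -> about {cycles_per_tenfold:.2f} cycles later")
```

Under that idealized assumption, 4 cycles gives 16-fold, 20 cycles gives a bit over a million-fold, and 30 cycles gives a bit over a billion-fold, with each ten-fold dilution showing up a little more than three cycles after the previous one.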
Since we know the concentration of each dilution of this reference sample, we can compare it to our unknown patient samples to determine the concentration of virus there.
First, here's that reference standard curve. The Y, or vertical, axis here is the amount of fluorescence produced by the dye interacting with the reference sample's DNA as it is amplified. It's on a log scale, meaning each division represents ten times the one below it. Our scale runs from 1 at the top down to 0.001 at the bottom. The first dozen cycles produce very small fluctuations around background, a few thousandths up or down. Beyond cycle 12, though, we see sharp increases as the dye detects a high concentration of DNA, roughly doubling in every cycle. Our fluorescence climbs almost all the way up to 1, a 1,000-fold increase vs. the baseline. The actual readout for each of these samples is its position on the X-axis, that is, the cycle during which it crosses the horizontal line drawn at 10 to the minus 2. This value is called the threshold cycle, or Ct value. We'll come back to this in a moment.
Now, here are the patient samples. Notice the scale, because it no longer goes up to 1; it only reaches 0.1. The patient samples only go up as high as 0.01, and they only do so after cycle 30. They also don't have the distinctive sharp curves that we saw in our reference measles dilutions. These samples look more like the background has gently crept up over time. As someone who has used real-time PCR in research for 15 years, I can explain exactly what happened here. The dye, or in this case it's actually a probe made of DNA labeled with dyes, has begun to decay. In decaying, it produces slightly more fluorescence in each cycle. This is baseline drift, a common issue in real-time PCR. To compensate for this drift, we adjust the threshold line to a point that crosses the very middle of the sharp increase in the curve, the log phase of the data.
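To make the threshold point concrete, here is a rough Python sketch using two synthetic curves. Every number in it is invented for illustration and none of it is the data from the paper: `amplified` is an idealized doubling curve sitting on a small background, `drifting` is a flat background that creeps upward a little each cycle, and the helper `threshold_cycle` is a hypothetical stand-in, not the instrument software's actual algorithm. The value 0.1 is simply assumed to sit in the log phase of the synthetic amplified curve.

```python
import math


def threshold_cycle(fluorescence, threshold):
    """Fractional cycle at which a curve first crosses `threshold`.

    fluorescence[i] is the reading at cycle i + 1. Returns None if the
    curve never reaches the threshold. The crossing point is interpolated
    on a log scale, matching the log-scaled Y axis described in the talk.
    """
    for i in range(1, len(fluorescence)):
        lo, hi = fluorescence[i - 1], fluorescence[i]
        if lo < threshold <= hi:
            frac = (math.log10(threshold) - math.log10(lo)) / (
                math.log10(hi) - math.log10(lo)
            )
            return i + frac  # `lo` was read at cycle i
    return None


CYCLES = range(1, 41)
BACKGROUND = 0.001  # assumed baseline fluorescence

# Synthetic true positive: doubles every cycle, plateaus at 1.
amplified = [BACKGROUND + min(1.0, 1e-6 * 2**c) for c in CYCLES]

# Synthetic negative: no amplification, only slow baseline drift.
drifting = [BACKGROUND + 0.00028 * c for c in CYCLES]

for name, curve in (("amplified", amplified), ("drifting", drifting)):
    for threshold in (0.01, 0.1):
        ct = threshold_cycle(curve, threshold)
        result = "never crosses" if ct is None else f"Ct = {ct:.1f}"
        print(f"{name:9s} threshold {threshold}: {result}")
```

In this toy setup the amplified curve crosses either threshold well before cycle 30. The drifting curve only reaches the low 0.01 line after cycle 30 and never reaches the 0.1 line at all, so where the threshold sits decides whether a sample with nothing in it gets a Ct value.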
In fact, the software used on this instrument would have placed the threshold in the log phase automatically. This is not where the software placed the threshold for this data. This is an inappropriate, overridden threshold. Someone had a goal in mind in setting such a low threshold. The outcome was that patients who were negative for measles virus appear to be positive. This is not an accidental false positive. It was not unintended contamination. It wasn't an artifact of how the technique was used. This data shows a deliberate and deceptive act, designed for one purpose: to make Andrew Wakefield a little richer, and perhaps more famous.
Why didn't we catch this when it was published? The short answer is that the system was broken. Real-time PCR had become such a common technique that the raw data was considered extraneous. It was common practice to publish the subsequent analysis, but not the original data on which that analysis was based. In short, the published papers lacked transparency. In 2009, the standards for publication of real-time PCR data were revised to a much, much higher level of stringency and transparency. To prevent just this kind of fraud in the future, researchers must now submit raw data and extensive documentation of how the tests were conducted, by whom, and on what material. This includes a comprehensive history of each sample, how long it's been frozen, and the reliability of the test. It's a four-page form that must be completed before a manuscript can be accepted to a top-tier journal.
So, I suppose we have Andrew Wakefield to thank for one thing. He helped to educate the scientific community, yet again, on how easy it is to perpetrate a fraud. That's his only contribution to the scientific body of knowledge. Thanks, Andy! And thanks for watching.