Art Reingold: Morning. Good morning. Good
morning. So, first of all a quick reminder that the midterm is
a week from Wednesday. Um, and you will need a calculator.
So, please bring a calculator, but you do not need to memorize
the formulas for various important things such as attributable
risk and things like that. We will provide you a list of all
the relevant formulas. That's one of the differences between
250 A and 250 B. In 250 B we expect you to derive them. Here
we give you the formulas but you need to know the right one to
use in order to answer a question. At the moment I only know
of one student who has asked to take the exam
outside of the regular time. One last warning, if you can't
take the exam a week from Wednesday at 9 o'clock you need to
let your GSI know right away. Today and Wednesday we're going
to finish our discussion about case control studies. Again,
just to reiterate that case control studies should be seen as
efficient sampling within cohorts or within populations.
So, the cases in case control studies however you find
them and enroll them are meant to be representative of all of
the cases of that condition occurring in the population. So
whether you try and enroll all the cases or some sample of the
cases they need to be representative of the cases that are
occurring.
Similarly, the controls selected are intended to be a
representative sample of the source population from which the
cases came. So those of you who have somewhere read or gotten
in your mind that controls should be just like the cases
without the disease, that's not correct and you should erase
that from your memory banks. Controls are intended to be
representative of the source population from which the cases
come. Okay? And then there is the issue of matching, for example on
age, when it's useful and when it isn't. Toxic shock syndrome
and tampons. This was a study in which the investigators were
not sure what the best approach to sampling controls was. Ideally
controls would be a random sample of the population. You can't
take a random sample unless you have a list of the entire
population, and we in the United States virtually never have a
list of the whole population. We'll come back to different
approaches. In this study people decided on two types of controls:
neighborhood matched controls selected by random digit dialing based on
telephone numbers, and friends of the cases. I'm not showing
you the details. In point of fact, when this case control study was
analyzed using each of the control groups, the odds ratio
measured for the association between tampon use and toxic shock was
quite different. It was different because the prevalence of
the exposure in the two different control groups was quite
different. Then you could ask the question: which is in fact the
more representative group, friends of cases or controls selected
by random digit dialing? Unfortunately, of course, we never
know the answer to that question.
But this is simply to point out that you might get a very different
answer depending on who the control group is. We'll come back
to that in a couple weeks when we talk about biases,
particularly selection bias.
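To make the point about control groups concrete, here is a minimal Python sketch. It is not from the lecture: the exposure prevalences are made-up numbers and the function names are hypothetical. It only illustrates that, with the same cases, a control group with a higher prevalence of the exposure gives a lower odds ratio.

```python
def exposure_odds(prevalence):
    """Convert a proportion exposed (strictly between 0 and 1) into odds of exposure."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    return prevalence / (1.0 - prevalence)


def odds_ratio_from_prevalence(p_cases, p_controls):
    """Odds ratio = odds of exposure in cases / odds of exposure in controls."""
    return exposure_odds(p_cases) / exposure_odds(p_controls)


# Hypothetical numbers for illustration only (not the toxic shock study data).
P_EXPOSED_CASES = 0.80
P_EXPOSED_CONTROL_GROUP_A = 0.60  # e.g. one way of choosing controls
P_EXPOSED_CONTROL_GROUP_B = 0.75  # e.g. another way of choosing controls

or_group_a = odds_ratio_from_prevalence(P_EXPOSED_CASES, P_EXPOSED_CONTROL_GROUP_A)
or_group_b = odds_ratio_from_prevalence(P_EXPOSED_CASES, P_EXPOSED_CONTROL_GROUP_B)

print(f"Odds ratio with control group A: {or_group_a:.2f}")
print(f"Odds ratio with control group B: {or_group_b:.2f}")
```

The cases are identical in both comparisons; only the exposure prevalence among the controls changes the answer.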
This is to point out that in all the various case control
studies of this disease people have used a variety of
approaches to choosing controls. In some studies friends were
chosen. In some, controls were neighborhood matched. In some, they were
matched on having gone to a clinic of some kind. In some
cases it was based on a nationally representative sample of
households.
Again, each of these is a flawed second best to a
random sample. But each of them has its pluses and minuses.
And frequently we have to choose the least bad method
of choosing the controls.
Okay. Now another setting in which we frequently do
case control studies is in outbreaks. One of the reasons for
that of course is because we want to get an answer quickly in
order to make sure that if people are still at risk of getting
sick we identify the source of the illness and remove it
quickly. Another is that we frequently have many, many
hypotheses to test. Not just one hypothesis, but many
hypotheses and case control studies are very good for that.
This would just be one example of a study done a number of
years ago. This was an outbreak in Sierra Leone in west
Africa. A number of people came into a clinic with sudden
onset of weakness, dizziness, vomiting and diarrhea, frothing
at the mouth, shortness of breath and loss of consciousness, and many
died within a few hours. First, it's fairly
obvious that because this occurred so soon after eating, it was
likely related to something that people ate.
Any suggestions about what type of illness or
food borne illness this was, based on these signs and symptoms
and the rapidity of the onset? Yes.
>>>: Something toxic in the food.
Art Reingold: That's exactly right. This could
not be an infection because infections, like those caused by bacteria and
viruses, have incubation periods, and none make you sick within a
matter of minutes. It has to be some preformed toxin in the
food. It could be manmade or naturally occurring but it has to
be a toxin of some kind. Here you can see people were
interested in figuring out what the source of this illness was,
a very severe illness with a lot of deaths. So they had to
come up with a case definition, who was a case, based on
various signs and symptoms. And then they had to make a
selection of controls.
In this case they chose controls among people living in
the same household as the case. All of the hypotheses
basically related to what was eaten at a particular meal. The
meal obviously made the cases ill. So how would you ascertain
what people ate at a meal?
The easiest way of course is to ask them. Of course if they are dead or
incapacitated you can't ask them. So you need to ask some surrogate
what that person ate and there may be errors injected in that process,
but nevertheless that's typically the only approach you have.
So, they collected information about what foods and
beverages had been consumed during the four hours prior to
onset of illness for the cases, and around the day the patient
became ill for the controls. They had about 50 different foods
and beverages on that list of hypotheses.
And so here is a simple two by two table from this case
control study. So cases and controls. Ate bread or didn't eat
bread. Here you can see for consumption of bread the odds
ratio was elevated with confidence intervals that exclude one
and that was not true for any of the other food items.
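As a rough illustration of how an odds ratio and its confidence interval come out of a two by two table like this one, here is a short Python sketch. The case counts (14 of 21 cases ate bread, 7 did not) follow from the numbers mentioned in the lecture, but the control counts are invented for illustration, since the transcript does not give them. The lecture also does not say how the interval was computed, so the log-based (Woolf) interval used here is an assumption.

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Return (odds ratio, lower limit, upper limit) using the Woolf log method.

    All four cells must be positive for this method to work.
    """
    cells = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR): square root of the sum of reciprocal cell counts.
    se_log_or = math.sqrt(sum(1.0 / n for n in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Cases: 14 ate bread, 7 did not (21 cases, as in the lecture).
# Controls: 5 ate bread, 17 did not (HYPOTHETICAL, for illustration only).
bread_or, bread_lower, bread_upper = odds_ratio_with_ci(14, 7, 5, 17)
print(f"OR = {bread_or:.1f}, 95% CI {bread_lower:.2f} to {bread_upper:.2f}")
```

With these made-up control counts the interval excludes one, which is the pattern described for bread; a food item with an interval spanning one would not stand out.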
Okay? So this is a very simple example of a case
control study identifying the source of an outbreak. In this
case it was the bread and it was possible to trace back what it
was about the bread that made it so toxic. They were able to
figure out the bread was made from flour. The flour had been
in sacks transported by a truck. On that truck the sacks of
flour were sitting next to containers of pesticide. One of the
containers of pesticide leaked into the flour and the bread was
baked contaminated with a particular pesticide. It was
possible to work this out because the case control study pointed to
the bread as the problem.
Now, if you look at this, you might ask: if it's really
the bread, how come 7 of the 21 cases didn't eat the
bread? In theory it should have been all of them. I don't
want to get into details here, but there are a number of
reasons why there might be cases classified as not having eaten it. First of
all, they might have actually eaten the bread and been
misclassified in terms of their exposure. That's the most
likely explanation but there are a number of others. This is just a
simple example: in outbreaks we frequently resort to case
control studies to test many hypotheses and figure out the
cause of the outbreak.
This is one more cute example of an outbreak in which a
case control study was very effective. C. jejuni is a cause of
food borne illness. Here you can see an outbreak was detected
in London. So here a case was someone with campylobacter in their stool.
Controls were individuals living in the same neighborhood. You go
next door, knock on the door and ask if someone will be a
control in your study: a neighborhood matched control. Then
you simply ask questions about various exposures, and here was
the relevant exposure. This was in the old
days. Anyone here live in a place where they deliver milk to
your doorstep? When I was growing up that was common in the
United States. We don't do that so much anymore. In the UK
they do deliver milk to the doorstep. Here you can see the
relevant exposures all relate to milk. Having milk delivered to your
door was a risk factor, and so was having your milk bottle attacked
by crows. The milk is sitting there, and the crows would
basically remove the top of the bottle and drink the cream from
the top of the milk and in the process introduce campylobacter
into the milk. So basically you can see that having
your milk attacked by crows in the previous week was very
strongly associated with getting campylobacter.
It's an interesting exposure, not something you might have
thought about: having crows attack your
milk bottles on the front porch. This is a simple example
of a case control study helping figure out the cause of an
outbreak.
Now, I just want to point out, and I think I've already said
this: people frequently think of case control studies in terms
of dichotomous outcomes and dichotomous exposures. Dichotomous
outcomes means ill or well, alive or dead. Dichotomous
exposures would be ate the food or didn't eat the food, exposed or
not exposed. In point of fact they don't
have to be dichotomous. You can look at categorical exposures
or continuous exposures.
And you can actually do case control studies that look
at dose response effects. Okay?
Going back to that campylobacter outbreak. Here is the
number of days per week the milk bottles on your porch were
attacked by birds. Here you can see cases and controls. You
can basically see a substantial increase in the odds ratio for
getting campylobacter with a higher frequency of having your
milk bottles attacked by crows. The more times your milk
bottles were attacked by crows, the more likely you were to end up
consuming contaminated milk. This is evidence of a dose response
effect, like cigarettes and lung cancer. Case control studies
are not limited to dichotomous outcomes and dichotomous
exposures.
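A dose response analysis of this kind can be sketched in Python by computing an odds ratio for each exposure level against the unexposed reference level. The counts and category labels below are invented for illustration (the transcript does not give the table from the campylobacter study), and the function name is hypothetical.

```python
def odds_ratios_by_level(table, reference):
    """Odds ratio for each exposure level relative to the reference level.

    `table` maps an exposure level to a (cases, controls) pair.
    """
    ref_cases, ref_controls = table[reference]
    return {
        level: (cases * ref_controls) / (controls * ref_cases)
        for level, (cases, controls) in table.items()
    }


# HYPOTHETICAL counts: days per week the milk bottles were attacked by birds.
BIRD_ATTACK_TABLE = {
    "0 days": (10, 40),
    "1-3 days": (10, 8),
    "4-7 days": (12, 3),
}

dose_response = odds_ratios_by_level(BIRD_ATTACK_TABLE, reference="0 days")
for level, value in dose_response.items():
    print(f"{level}: OR = {value:.1f}")
```

An odds ratio that climbs steadily across the ordered categories is what a dose response effect looks like in a case control study.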
Okay. I do a lot of work on vaccines. And we
frequently are interested in knowing how effective vaccines are
in the real world after they've been licensed and are in
common use and it's no longer ethical to do randomized controlled
trials. How well do vaccines work? A very common approach is
to use a case control study.
So in a case control study of vaccine effectiveness
what do we basically do? We find cases of the disease. We
choose appropriate controls and what's the exposure of interest
in that study that we have to ascertain?
>>>: Whether or not the person has been
vaccinated.
Art Reingold: Whether or not the person has been
vaccinated. The exposure of interest is a potentially
protective effect of the vaccine. Have you been vaccinated or
not? If it's a vaccine where you get multiple doses you can
look at the actual number of doses.
Again, and we're going to come back
to this, the relative risk is approximately equal to the odds ratio.
So we define vaccine efficacy as one minus the relative risk,
times a hundred. What's the most effective a vaccine can be?
What's the maximal value of vaccine effectiveness?
>>>: A hundred percent.
Art Reingold: Louder.
>>>: A hundred percent.
Art Reingold: A hundred percent. If there are no
cases in the vaccinated then this will be 0, this will be a 0
in the numerator. One minus 0 is one times a hundred percent
is a hundred percent. If there are no cases in the vaccinated
the vaccine is 100 percent effective. What's the minimum
value?
>>>: 0.
Art Reingold: That's the trick question. That's
incorrect. What if the rate of disease is higher in the
vaccinated than the unvaccinated? What if we make a mistake
and give you a vaccine that increases
your risk of disease? In that case the relative risk or the
odds ratio will be greater than one. One minus anything
greater than one is a negative number. So vaccines can have
negative efficacy. We try and avoid that. It's not good for
public health. Technically the lower limit is negative
infinity. If the vaccine actually causes disease, if it
increases the risk of disease, the vaccine will have a negative
efficacy. This is an example of a new vaccine rolled out
in 2000, the pneumococcal vaccine. How well is it working?
We have population based surveillance. We find all the cases.
How do we choose controls? The best way to decide this would be
based on birth certificates. If the case was born in a
particular zip code in a particular week, we found all the other
children born in that zip code in that week and took a random
sample of those children. The exposure of interest is
vaccinated or not and how many doses of the vaccine. This is a
national study at ten sites. Number of doses ranging from 0 to
4. We classify the cases and controls based on their medical
records and how many doses of vaccine they received.
>>>: Why did you choose that method as opposed to
trying to find individuals who were perhaps matched to the
cases on variables other than the zip code?
Art Reingold: We're going to come back to why we
chose that in a couple of minutes. If you don't have to match
you are better off not matching. Despite the fact most
epidemiologists want to match on everything all the time,
matching in fact is not a good idea if you don't have to do it.
Okay? We'll come back to that.
Here you can see we can calculate the odds ratio and the
vaccine effectiveness for one dose versus no doses, two doses, etc.
We can show the vaccine is working quite well. This is an
underestimate of the efficacy, which is more in the range of
95 percent. This is an example of a case control study to look
at the protective effect of a vaccine. When we talk about
screening such as breast cancer screening and cervical cancer
screening, we can use case control studies to look at the
effectiveness of screening. In that case the outcome of
interest would be died of breast cancer, yes or no, and the
exposure of interest would be screened or not. Right? Very
similar approach.
We'll come back to that when we talk about screening.
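The vaccine efficacy formula from this part of the lecture, one minus the relative risk times a hundred, with the odds ratio standing in for the relative risk in a case control study, can be sketched in Python as follows. The function name and the example odds ratios are assumptions for illustration, not values from the pneumococcal vaccine study.

```python
def vaccine_effectiveness(relative_risk):
    """Vaccine effectiveness in percent: (1 - RR) * 100.

    In a case control study the odds ratio is used as the estimate of RR.
    """
    if relative_risk < 0:
        raise ValueError("a relative risk or odds ratio cannot be negative")
    return (1.0 - relative_risk) * 100.0


# Illustrative odds ratios only (HYPOTHETICAL, not study results).
for label, ratio in [("no cases among the vaccinated", 0.0),
                     ("strongly protective", 0.05),
                     ("no effect", 1.0),
                     ("vaccine doubles the risk", 2.0)]:
    print(f"{label}: OR = {ratio}, VE = {vaccine_effectiveness(ratio):.0f}%")
```

The upper bound is 100 percent when the ratio is 0, and the value goes negative without limit as the ratio rises above one, which is the point of the trick question about the minimum value.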
I think I'm going to skip over this for the sake of
time. It's really not that interesting as a case control
study.
Um, perhaps move on to this. Some of you may know
there's a lot of interest in the question of whether use of
cell phones increases your risk of brain cancer or not. Brain
cancers are very rare. And so again, really the most plausible
approach to studying this relationship is to do a case control
study.
And then obviously once you find the cases the first
questions are how do you choose controls and how do you measure the
exposure of interest, which is cell phone use, particularly
amount of cell phone use.
Okay. So this is just one example of such a study.
These are different types of brain cancers. The cases were basically
identified by city and hospital, finding all the cases. This is
a hospital based case control study. You sit in the hospital.
You find all the cases of brain cancer. They chose controls
admitted to the same hospital for nonmalignant conditions.
People admitted to the hospital for nonmalignant conditions
would be representative of the source population with regard to
cell phone use, which is the exposure of interest. Right?
So, how do you ascertain someone's cell phone use as
the exposure of interest?
If it's just do you use a cell phone or not you could
ask them. But if you actually want amount of time used how
else might you get that?
>>>: Cell phone provider.
Art Reingold: Cell phone records. Billing
records. Right? The other thing that might be of interest is
which side of your head do you put your cell phone on? Do you
use your left ear or right ear or ambi ear? I certainly only
use my left ear for my telephone. You might want to look at
the relationship between cell phone use and tumors just on that
side of the head. If the hypothesis is that radiation from the cell
phone causes tumors, presumably you would see a
relationship on the side of the head where you hold your phone
and not on the side of the head where you don't hold your
phone. Right? At least that's plausible.
Here's just one, you know, very early study. Here you
can see cumulative use over 100 hours of cell phone use versus
never. Probably you all rack up a hundred hours a week now.
This was in the early days of cell phones. Here you can see
this particular study didn't find any relationship, but this was
11 years ago. And this was when cell phones were really only
becoming more and more common.
So cell phone use, A, wasn't that common. B, people
hadn't racked up large numbers of hours of cell phone use. C,
there hadn't been much of a latency period between when cell phone
use began and when people might have had the opportunity to
develop their cancer. And cancers might have latency periods
of 10 or 15 or 20 years. Certainly smoking related cancers
take 15, 20, 25 years to develop. This might not be a very
accurate reflection of what's going on. This is a very
controversial issue.
Here you can see cell phone use, number of subscribers.
And of course a particularly interesting question is what about
when children use cell phones from a very early age? Might the
brain of a child be more susceptible to the radiation from cell
phones? And might 20 or 30 years of cell phone use have a
different effect? People are still doing these studies.
Here's a more recent study. Acoustic neuromas, near the ear.
This is in Sweden. They find all the patients with this cancer.
Population based controls from the registry of the Swedish
population. If you want to do good epidemiology you move to
northern Europe. In Finland, Denmark and Sweden they have
registers of the entire population. They can take random
samples of the population.
Every person in the population has a unique identifying
number. And they can take random samples of the population.
We in the United States seem to be paranoid about the
government knowing who we are and where we are. This would not
go over well in the United States. Here you can see in this
study regular mobile phone use versus never: an odds ratio of
one. After ten years of use, an odds ratio of almost four.
These studies are continuing. Some are finding relationships,
some are not. This is an open question of whether cell phones
increase the risk of different types of brain cancer or not.
Stay tuned and decide for yourself. Here's a famous and more
recent study. Here you can see different types of brain
cancer. For these types no increase in the odds ratio.
So, again, a very controversial area, whether this is a
risk factor or not. I'm going to skip over this also for the
sake of moving on.
So, I want to get now into some of the details about this
process, because it's important to think about them. As I've
alluded to, how you choose controls is a very important issue in
case control studies. If you can take a random sample of the
population that's the ideal approach. When you can't do that,
various approaches give you at best an approximation of a random
sample. We've already said random digit dialing doesn't give
you a random sample of the population. Going door to door,
knocking on doors, basically matches on neighborhood. We
frequently take controls based on going to the same hospital or
clinic where the cases are found. There are lots of issues about how to
do that and whether that works. Friends and relatives are
frequently chosen as controls.
And sometimes the question arises should we enroll dead
people as controls? Why might we want to enroll dead people as
controls? When might that be reasonable to do? Seems a
little, I don't know, odd. Go out and find dead people to be
your controls.
>>>: If the exposure of interest is lethal and
you didn't include dead people you might be introducing biases
to the study.
Art Reingold: Right. If the illness kills some
people and some of the cases are dead and you want to include
dead cases you might also want to include dead controls as a
way of minimizing biases that come into play when you enroll
dead cases. Because when a case is dead you have to limit your
collection of information about the exposures either to written
records or to surrogates such as spouses or children or family
members. And one feeling is you can at least equalize the bias
if you enroll dead controls for dead cases. It gets a little
tricky going out and finding dead controls. That's an issue
that people have to deal with when they think that's
appropriate.
Okay. So, um, when we think about choosing the cases
for case control studies, as I've said, typically case control
studies are done for rare outcomes and we want every single
case we can find. Right? We can't afford to sample the cases.
We want all of the cases, but there are instances where we have
so many cases we can take a sample. And if we take a sample of
cases, the source population from which the cases came
should be definable. The case definition should provide
accurate classification of those with and without the disease.
Selection of the cases has to be independent of their exposure
status.
So what do I mean by that? If we're talking about
smoking and lung cancer, we don't want to bias the cases
in our study toward primarily having smoking related cancers and
missing the cancers in nonsmokers. Right? If we did that we'd
come up with a biased estimate of the odds ratio.
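Here is a small Python sketch of why selecting cases in a way that depends on exposure biases the odds ratio. The counts and selection probabilities are made up for illustration, and the function name is hypothetical. The sketch assumes controls are sampled independently of exposure, so only the case row of the table is distorted.

```python
def expected_observed_odds_ratio(exposed_cases, unexposed_cases,
                                 exposed_controls, unexposed_controls,
                                 p_select_exposed_case=1.0,
                                 p_select_unexposed_case=1.0):
    """Odds ratio expected when cases are enrolled with the given probabilities.

    Controls are assumed to be sampled independently of exposure status.
    """
    enrolled_exposed = exposed_cases * p_select_exposed_case
    enrolled_unexposed = unexposed_cases * p_select_unexposed_case
    return (enrolled_exposed * unexposed_controls) / (enrolled_unexposed * exposed_controls)


# HYPOTHETICAL source population counts: smokers are the exposed group.
TABLE = dict(exposed_cases=80, unexposed_cases=20,
             exposed_controls=50, unexposed_controls=50)

unbiased = expected_observed_odds_ratio(**TABLE)
# Suppose only half of the cancers in nonsmokers make it into the study.
biased = expected_observed_odds_ratio(**TABLE, p_select_unexposed_case=0.5)

print(f"Odds ratio with exposure-independent selection: {unbiased:.1f}")
print(f"Odds ratio when nonsmoking cases are missed:    {biased:.1f}")
```

Under these assumptions the observed odds ratio is the true one multiplied by the ratio of the two selection probabilities, so equal probabilities leave it unchanged even if many cases are missed.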
So, the cases have to be selected independent of their
exposure status. Exposed and unexposed cases should have the
same probability of being selected. And in general we prefer
to do studies with incident, new cases rather than prevalent
cases. And why is that? Why is there a preference to only use
incident cases instead of prevalent cases?
>>>: They know the disease happened after the
exposure?
Art Reingold: Pardon?
>>>: You can ensure the disease happened after
the exposure.
Art Reingold: Making sure the disease
occurred after the exposure. I think you can generally do an
equally good job of that with prevalent or incident cases.
So what does prevalence depend on? What are the two factors
that determine prevalence of the disease?
>>>: Duration and incidence.
Art Reingold: Incidence of the condition and
duration of survival with the condition. The problem is there
may be factors related to likelihood of surviving with a
condition that come into play and therefore make the prevalent
cases a biased sample compared to the incident cases. Right?
By choosing prevalent cases we might come up with a biased
estimate of the odds ratio, compared with what we would get if we
only included incident cases,
if the exposures we're interested in not only relate to your risk
of getting the disease but also relate to your length of
survival.
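The following Python sketch illustrates the incident versus prevalent case problem using the standard steady state approximation that prevalence is roughly incidence times average duration. All numbers are invented for illustration, and the function name is hypothetical. It assumes an exposure that both raises the risk of disease and shortens survival with it.

```python
def steady_state_prevalent_cases(new_cases_per_year, mean_duration_years):
    """Approximate number of prevalent cases as incidence times duration."""
    return new_cases_per_year * mean_duration_years


# HYPOTHETICAL inputs: exposed cases are more common but survive a shorter time.
EXPOSED_NEW_CASES_PER_YEAR = 60
UNEXPOSED_NEW_CASES_PER_YEAR = 40
EXPOSED_MEAN_DURATION_YEARS = 1.0
UNEXPOSED_MEAN_DURATION_YEARS = 3.0

exposed_prevalent = steady_state_prevalent_cases(
    EXPOSED_NEW_CASES_PER_YEAR, EXPOSED_MEAN_DURATION_YEARS)
unexposed_prevalent = steady_state_prevalent_cases(
    UNEXPOSED_NEW_CASES_PER_YEAR, UNEXPOSED_MEAN_DURATION_YEARS)

share_exposed_incident = EXPOSED_NEW_CASES_PER_YEAR / (
    EXPOSED_NEW_CASES_PER_YEAR + UNEXPOSED_NEW_CASES_PER_YEAR)
share_exposed_prevalent = exposed_prevalent / (exposed_prevalent + unexposed_prevalent)

print(f"Exposed share of incident cases:  {share_exposed_incident:.2f}")
print(f"Exposed share of prevalent cases: {share_exposed_prevalent:.2f}")
```

In this made-up scenario exposed people are a majority of new cases but a minority of the cases alive at any one time, so a study of prevalent cases would understate the exposure among cases.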
Okay. Now in terms of controls as I've said these are
the three basic approaches, either some sort of population
based approach, particularly random selection, hospital and
clinic based or friends and neighbors. And I said the other
day all of these are problematic. They just have different
problems associated with them.
So, population based controls are the best for
generalizability of results. There's the least bias. They
tend to be the most expensive and the most difficult to get.
And we have real problems with declining rates of
participation. I don't know about you, but when my phone rings
at dinner time I can see who is calling. If it's not somebody
I know I just don't answer the phone. Right? First of all
getting people to answer the phone is difficult. And once you
get them on the phone getting them to participate is difficult.
These declining rates of participation mean even if you can
identify a representative sample of the population you may not
be able to enroll a representative sample of the population.
This is the ideal type of control, but very, very
difficult to get. Hospital and clinic based controls are
generally fairly easy to find. And they are thought to reduce
something called recall bias. We'll talk about recall bias in
a couple of weeks. But it can be difficult to determine who
can be included and who can't be included. For example, if you
are looking at the relationship between smoking and lung
cancer, who is eligible to be a control based on people coming
to a hospital? Should you include or exclude people with
chronic bronchitis, a smoking related health outcome? Should
you include or exclude people with heart disease, a smoking
related health outcome? So decisions about who can and can't be
included can lead to enormous amounts of bias. Friends and
neighbors are frequently used. They are really, really easy.
Okay.
Um, and of course choosing friends and neighbors tends
to control for lots of potential confounders. We'll talk about
confounders after the exam. Fundamentally, we
tend to live next to and be friends
with people just like us. So choosing friends and neighbors
tends to control for socioeconomic status and race and a number
of other potential confounders. But it may lead to this
problem of over-matching. All of these are flawed. All of
them are difficult in terms of choosing controls.
Again, I want to point out controls should be
representative of the source population from which the cases
came. They are not supposed to be just like the cases but
without the disease. They are supposed to be representative of
the source population.
Another way of thinking about it is that controls would, if
they developed the outcome under study, be in the study as
cases. Okay. That's another way of thinking about whether the
control group is actually achieving what it's meant to achieve
in terms of representativeness. If that control developed the
disease they would end up in the study as a case.
And again it's important to point out the controls, like
cases, should be sampled independent of their exposure status.
We shouldn't be more likely to enroll smokers than nonsmokers, or
oral contraceptive pill takers as opposed to
non-takers, or whatever it is. We need to sample
controls independent of their exposure status.
This is an example just to show you how difficult this
process can be. There are a lot of studies showing the
selection of controls in case control studies ends up
introducing bias. This is a simple one on childhood leukemia.
She's doing case control studies in leukemia. The question is
how do you choose controls for cases? The cases are easy to
identify. How do you choose controls? For each case they
selected one control among the children identified using
computerized birth records and located successfully. This
would be a common approach to find controls. You take all the
children born. You take a sample of those children but you
only enroll the ones you can actually track down. That's a
very common approach. And then they also took friend controls.
And then they took a
third group of what they called ideal controls, selected from
birth records without tracing, eliminating the whole problem of
being unable to find the control. This is the ideal control
group. Then they asked the question how different are the
people they actually managed to enroll either as friend
controls or as this successfully located population? And the
bad news is they found they're different. Okay?
So, the reality is that for all the variables except birth
weight, they basically found that, if you
consider the ideal controls to be the perfect control group, friend
controls are in general a more biased sample, but even
the birth certificate controls,
the children they can track down based on their
birth certificates, were also a biased sample compared to the
ideal control group. This is one example of the problems we
have with selection bias when we choose controls for case
control studies. We know what the ideal is and we can
virtually never achieve the ideal. The groups we can actually
enroll tend to be a biased sample.
Okay. Okay. This is a study I've been trying to do.
Many of you probably had the meningitis vaccine. We've been
trying to do a study to see how effective it is. If the cases
are in young people 12 to 19 years of age, how should we
choose controls for this study? The cases are people 12 to 19
with meningitis. How should we choose the controls? This is a
real study we've been doing for several years. We find the
cases through active laboratory based surveillance. We had a big
discussion about how to choose controls. Random digit dialing was one
approach. Incredibly labor intensive. Thousands of phone calls.
Cell phones and caller ID are big problems. We decided we
couldn't find enough controls. School matched controls: we
choose controls from your classmates in school X. Schools have
to give permission. Getting permission from every school
district, getting schools to participate, is incredibly
cumbersome and difficult.
Clinic and provider based controls: if the case was
seen by a particular doctor we'd find controls seen by that
particular doctor. Very likely to overmatch on vaccination.
The same doctor probably vaccinates or doesn't vaccinate all
the kids in his or her practice.
Hospital matched: real concerns about who is in a
hospital between the ages of 12 and 19. In that age group who
gets hospitalized, and are they representative of people in this
age group? The answer is they clearly are not. Right? And if
you say maybe I'll take people 12 to 19 in automobile
accidents, for example, there are few, unfortunately, and it's
very hard to find them. Friends and acquaintances: we
basically couldn't get permission to ask people for their
friends' names in order to do this study. That didn't work.
Neighborhood matched: too labor intensive. Sending people out
to knock on people's doors was considered potentially a safety
risk to the interviewers. All of these were considered
difficult to impossible.
We ended up doing a mix of all of them. And it's still
not clear whether this is an effective strategy or not and how
biased these controls are. This is just to show you that choosing controls
and deciding what the strategy is can in fact be quite difficult.
Okay. So, we frequently will see in case control
studies there are many more controls than cases. Right? Why is
that? Why are there typically more controls than cases in a
case control study?
>>>: Because people drop out.
Art Reingold: No. It's not because people drop
out. It's not a question of a cohort being followed over time.
So we don't worry about people dropping out of case control
studies.
So the ideal approach in terms of the amount of
statistical power you get per participant is equal numbers of
cases and controls, 1 to 1. One control for
each case is the most efficient study design in
terms of power per person enrolled in the study. Why might
people choose more controls than cases?
>>>: Increase the power.
Art Reingold: To increase the power of the study, when
that's really the only option for increasing the power of the
study. Because? Because cases are hard to find. When we are
studying rare diseases, cases are hard to find. Right? If
it's a rare disease and you can't find enough cases, one
alternative is to increase the number of controls as a way of
increasing the power of the study. Okay?
The most efficient study design in terms of statistical
power per person investigated is a 1 to 1 ratio. Cases are
harder to find than controls. So you can increase the power of
a study by increasing the number in one group compared to the
other group. Because cases are typically rare. That usually
means more controls than cases, usually. But the bigger the
ratio between the numbers in the two groups, the less
statistical power is gained by further increasing the larger of
the two groups. In other words, going beyond a ratio of about
4 to 1 or 5 to 1 doesn't add very much power with each
additional control. Okay? It does add power, but less and
less. That's illustrated here. This shows the amount of power
added as the ratio of controls to cases goes up, and you can see it
levels off at about 5 to 1. So making a ratio of 6 to 1, 7 to
1, 10 to 1 will increase the power of the study, but you get
less and less bang for your buck. Okay? It may be your only
option for increasing the power of a study, but you get
less and less out of it. And each person included in the study
costs money. To go out and find them and interview them and
collect the information, there's a cost associated.
So for efficiency's sake 1 to 1 is the ideal ratio. But
you typically see ratios more like 2 to 1, 3 to 1, 4 to 1
because cases are hard to find.
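The diminishing return described above can be sketched numerically. This is a rough illustration, not the slide shown in lecture: it assumes the common approximation that, with a fixed number of cases and an exposure effect near the null, the efficiency of r controls per case relative to an unlimited number of controls is r / (r + 1). The function names are made up for the example.

```python
def relative_efficiency(ratio):
    """Approximate efficiency of `ratio` controls per case, relative to
    having unlimited controls, for a fixed number of cases.

    Assumption (not from the lecture): the standard r / (r + 1)
    approximation, which holds near the null hypothesis.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ratio / (ratio + 1.0)


def marginal_gain(ratio):
    """Efficiency added by going from (ratio - 1) to `ratio` controls per case."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    previous = relative_efficiency(ratio - 1) if ratio > 1 else 0.0
    return relative_efficiency(ratio) - previous


if __name__ == "__main__":
    # Print the efficiency and the gain from each additional control per case.
    for r in range(1, 11):
        print(f"{r:2d} to 1: efficiency {relative_efficiency(r):.3f}, "
              f"gain from last control {marginal_gain(r):.3f}")
```

Under this approximation, 4 to 1 already reaches 80 percent of what unlimited controls would give, and each further control per case adds only a few percentage points.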
Matching. People tend to think about matching as a way
of improving the validity of a study, but that's really not
correct. Matching is primarily done to improve the efficiency
of a study. Okay.
That's an issue we're going to come back to, but
obviously if you are worried about things like age and sex and
race being confounders, something we'll talk about in a couple
of weeks, one way to deal with confounding is to match on
confounding factors. So, if you have a case of a certain age
you choose a control of the same age. If you have a case
that's a male you choose a control who is a male. If you have
a case who is poor, you choose a control who is poor. That's
what we mean by matching.
You can do individual matching or frequency
matching. Again, people tend to think about this as a way of
enhancing the validity of the studies by making the controls
look just like the cases except not having the disease. But in
fact what it really does is enhance the efficiency of a study.
What I mean by that is the statisticians would tell you the ideal
thing to do is to not match on anything and then control
for all those variables in the analysis, using
logistic regression or other techniques to control for
confounding factors. They match on nothing and
then control for those factors in the analysis.
The problem with that is you may have a lot of small or
0 cells, and therefore a very inefficient study. By matching
on various factors you increase the efficiency of the study,
which is why people match.
At least that's the reason they should be matching.
So, matching does prevent or minimize confounding, an issue
we'll come back to. It increases the efficiency of a study.
But the reason people often match is to give them a framework for
choosing controls. It's easier to choose controls if they
match on various things. Choosing neighborhood controls is
frequently a matter of convenience but in the process it
matches on things like socioeconomic status. Okay.
So, it's frequently done for logistical reasons. There are
downsides to matching, major disadvantages, which is why
statisticians tell you generally don't
match. First of all you may not be able to match controls to
some of your cases. If you can't match, the cases can't be in
your study. In a matched study you simply throw them away. If
cases are rare you really don't like to throw away cases. So
you may have difficulty matching controls to some of the cases
resulting in the loss of cases from the study. The association
between the matching factors and the outcome of interest cannot
be assessed once you match on something. If you match on
gender or age or socioeconomic status you can no longer analyze
the relationship between that variable and the outcome. So
matching prevents you from studying these relationships that
might be of interest.
Certainly if you do match you have to use techniques to
take matching into account. These are conditional methods.
But there are such methods. Matching can sometimes produce
this problem of overmatching. This is an issue I don't want to
get into. If you match on weak confounders you can actually
introduce bias into the study and cause problems. So matching
is not always a good thing despite the fact that people seem to
always want to match on lots and lots of things.
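As one concrete instance of the conditional methods mentioned above, here is a minimal sketch for the simplest case: 1 to 1 individually matched pairs with a yes/no exposure. For that design the conditional estimate of the odds ratio uses only the discordant pairs. The function name and the pair data are hypothetical, invented for illustration, and this is not necessarily the method used in any study discussed in lecture.

```python
def matched_pair_odds_ratio(pairs):
    """Conditional odds ratio estimate for 1:1 matched case control pairs.

    `pairs` is a sequence of (case_exposed, control_exposed) booleans,
    one tuple per matched pair. Pairs where the case and control agree
    on exposure (concordant pairs) carry no information and are ignored.
    """
    pairs = list(pairs)
    # Discordant pairs: exactly one member of the pair is exposed.
    case_only = sum(1 for case, control in pairs if case and not control)
    control_only = sum(1 for case, control in pairs if control and not case)
    if control_only == 0:
        raise ValueError("no pairs with only the control exposed; "
                         "the ratio is undefined")
    return case_only / control_only


if __name__ == "__main__":
    # Hypothetical data: 3 pairs with only the case exposed, 1 pair with
    # only the control exposed, and 2 concordant pairs.
    hypothetical_pairs = [
        (True, False), (True, False), (True, False),
        (False, True),
        (True, True), (False, False),
    ]
    print(matched_pair_odds_ratio(hypothetical_pairs))
```

Treating the same data as if it were unmatched would use all four cells of the ordinary 2 by 2 table, which is the mistake the conditional approach is meant to avoid.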
Okay. So I'm going to skip over that now. This is an
issue we won't finish today but I'll pick up on Wednesday.
It has to do with the fact that another way of thinking
about choosing controls is in terms of when we
choose controls in relation to their risk of becoming ill.
So, this is somewhat more of a 250 B issue. You need
to be familiar with this. There are basically three ways of
sampling controls. The first is what's called survivor
sampling. In an outbreak investigation that's typically what's
going on. In other words you ate at a meal where there was
salmonella in the food items. Within 72 hours you either got
salmonella or you didn't. Your risk of getting salmonella from
that outbreak is over within a few days. In other words, if we
choose controls from people that didn't get sick in the
outbreak they are survivors. Their risk period is over in
terms of that outbreak. They could get it two years from now.
In terms of this outbreak their incubation period is over.
They didn't get sick. They survived. When you choose
survivors, when you choose controls from among survivors that's
what we mean by survivor sampling. But in many case control
studies we are not choosing from among survivors. Say we're
doing a case control study of smoking and lung cancer. We
choose everybody with lung cancer. We choose controls and we ask
them about their smoking status. Right?
Those controls could still get lung cancer next week or
next month. Or next year. Right? They are still at risk of
getting lung cancer. Their risk period is not over. Right?
So, in a study like that, with risk set or density based
sampling, controls are selected from among individuals in the at risk
population who have not yet experienced the outcome by the time
a given case is diagnosed but who remain at risk. They could
still get the disease in the future. Right? It's very
different from survivor sampling. And there is a third approach.
We'll talk about all of this more on Wednesday. Case base
or case cohort sampling is when we choose controls from among
everyone who is in the cohort at the baseline. Okay?
These are the three, and we're going to come back to this on
Wednesday. Another way of looking at this: with survivor sampling
we only choose controls from among people who make it to the very
end of the risk period without getting sick. With incidence density
sampling we choose controls based on their not being sick at the
time the case is diagnosed. They are still at risk later. And
case base sampling is when we choose controls from among everybody
in the cohort at the beginning. This is an issue I'm going to pick up
and talk about more on Wednesday because it's a really important
issue. Okay?
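The three sampling schemes differ only in who is eligible to be picked as a control, which a toy cohort makes easy to see. This is a sketch under simplifying assumptions: the cohort, the person labels, and the function names are all hypothetical, everyone is followed for the whole period (no loss to follow-up), and the functions return the eligible pool and do not draw a random sample from it.

```python
# Hypothetical cohort: person -> time of diagnosis, or None if the person
# never became a case during follow-up.
HYPOTHETICAL_COHORT = {"A": 2, "B": 5, "C": None, "D": None, "E": 8}


def survivor_pool(cohort):
    """Survivor sampling: only people who reached the end of the risk
    period without becoming a case are eligible as controls."""
    return sorted(p for p, t in cohort.items() if t is None)


def risk_set_pool(cohort, case_time):
    """Risk set (incidence density) sampling: anyone still disease free
    when a given case is diagnosed is eligible, including people who go
    on to become cases later."""
    return sorted(p for p, t in cohort.items() if t is None or t > case_time)


def case_cohort_pool(cohort):
    """Case base (case cohort) sampling: everyone in the cohort at
    baseline is eligible, whether or not they later become a case."""
    return sorted(cohort)


if __name__ == "__main__":
    print("survivor:   ", survivor_pool(HYPOTHETICAL_COHORT))
    print("risk set t=2:", risk_set_pool(HYPOTHETICAL_COHORT, 2))
    print("risk set t=5:", risk_set_pool(HYPOTHETICAL_COHORT, 5))
    print("case cohort:", case_cohort_pool(HYPOTHETICAL_COHORT))
```

In this toy cohort, person E is eligible as a control for the cases diagnosed at times 2 and 5 under risk set sampling even though E is diagnosed at time 8, while under survivor sampling E is never eligible.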