Public Health 250a - Lecture 22

Art Reingold: Morning. Good morning. Good morning. So, first of all, a quick reminder that the midterm is a week from Wednesday. Um, and you will need a calculator. So, please bring a calculator, but you do not need to memorize the formulas for various important things such as attributable risk and things like that. We will provide you a list of all the relevant formulas. That's one of the differences between 250 A and 250 B. In 250 B we expect you to derive them. Here we give you the formulas, but you need to know the right one to use in order to answer a question. At the moment I only know of one student who has asked to take the exam outside of the regular time. One last warning: if you can't take the exam a week from Wednesday at 9 o'clock, you need to let your GSI know right away. Today and Wednesday we're going to finish our discussion about case control studies. Again, just to reiterate, case control studies should be seen as efficient sampling within cohorts or within populations. So, the cases in case control studies, however you find them and enroll them, are meant to be representative of all of the cases of that condition occurring in the population. So whether you try and enroll all the cases or some sample of the cases, they need to be representative of the cases that are occurring. Similarly, the controls selected are intended to be a representative sample of the source population from which the cases came. So those of you who have somewhere read or gotten in your mind that controls should be just like the cases without the disease, that's not correct and you should erase that from your memory banks. Controls are intended to be representative of the source population from which the cases come. Okay? And then there's the issue of matching, for example on age, when it's useful and when it isn't. Toxic shock syndrome and tampons. This was a study in which the investigators were not sure what the best approach to sampling controls was.
Ideally controls would be a random sample of the population. You can't take a random sample unless you have a list of the entire population, and we in the United States virtually never have a list of the whole population. We'll come back to different approaches. In this study people decided on two types of controls: neighborhood matched controls selected by random digit dialling based on telephone numbers, and friends of the cases. I'm not showing you the details. In point of fact, when this case control study was analyzed using each of the control groups, the odds ratio measured for the association between tampon use and toxic shock was quite different. It was different because the prevalence of the exposure in the two different control groups was quite different. Then you could ask the question, which is in fact the more representative group: friends of cases or controls selected by random digit dialling? Of course, unfortunately, we never know the answer to that question. But simply to point out, you might get a very different answer depending on who the control group is. We'll come back to that in a couple of weeks when we talk about biases, particularly selection bias. This is to point out that in all the various case control studies of this disease people have used a variety of approaches to choosing controls. In some studies friends were chosen. In some, neighborhood matched controls. In some of them controls were matched on having gone to a clinic of some kind. In some cases it was based on a nationally representative sample of households. Again, each of these is a flawed second best to a random sample. But each of them has its pluses and minuses. And frequently we have to choose the least bad method of choosing the controls. Okay. Now another setting in which we frequently do case control studies is in outbreaks.
One of the reasons for that, of course, is that we want to get an answer quickly in order to make sure that, if people are still at risk of getting sick, we identify the source of the illness and remove it quickly. Another is that we frequently have many, many hypotheses to test. Not just one hypothesis, but many hypotheses, and case control studies are very good for that. This would just be one example of a study done a number of years ago. This was an outbreak in Sierra Leone in West Africa. A number of people came into a clinic with sudden onset of weakness, dizziness, vomiting and diarrhea, frothing at the mouth, shortness of breath and loss of consciousness, and many died within a few hours. First let me ask you: it's fairly obvious, because this occurred so soon after eating, that this was likely related to something that people ate. Any suggestions about what type of food borne illness this was, based on these signs and symptoms and the rapidity of the onset? Yes. >>>: Something toxic in the food. Art Reingold: That's exactly right. This could not be an infection, because infections with bacteria and viruses have incubation periods, and none make you sick within a matter of minutes. It has to be some preformed toxin in the food. It could be manmade or naturally occurring, but it has to be a toxin of some kind. Here you can see people were interested in figuring out what the source of this illness was. A very severe illness with a lot of deaths. So they had to come up with a case definition, who was a case, based on various signs and symptoms. And then they had to make a selection of controls. In this case they chose controls among people living in the same household as the case. All of the hypotheses basically related to what was eaten at a particular meal. The meal obviously made the cases ill. So how would you ascertain what people ate at a meal? The easiest way of course is to ask them. Of course if they are dead or incapacitated you can't ask them.
So you need to ask some surrogate what that person ate, and there may be errors injected in that process, but nevertheless that's typically the only approach you have. So, they collected information about what foods and beverages had been consumed during the four hours prior to onset of illness in the case and, for the controls, around the day the patient became ill. They had about 50 different foods and beverages on that list of hypotheses. And so here is a simple two by two table from this case control study. So cases and controls. Ate bread or didn't eat bread. Here you can see for consumption of bread the odds ratio was elevated, with confidence intervals that exclude one, and that was not true for any of the other food items. Okay? So this is a very simple example of a case control study identifying the source of an outbreak. In this case it was the bread, and it was possible to trace back what it was about the bread that made it so toxic. They were able to figure out the bread was made from flour. The flour had been in sacks transported by a truck. On that truck the sacks of flour were sitting next to containers of pesticide. One of the containers of pesticide leaked into the flour, and the bread was baked contaminated with a particular pesticide. It was possible to tell this because this case control study pointed to the bread as the problem. Now you might ask, if you look at this, if it's really the bread, how come out of the 21 cases 7 of them didn't eat the bread? In theory it should have been all of them. I don't want to get into details here, but there are a number of reasons why there might be people classified this way. First of all, they might have actually eaten the bread and be misclassified in terms of their exposure. That's the most likely explanation, but there are a number of others. Just a simple example: in outbreaks we frequently resort to case control studies to test many hypotheses and figure out the cause of the outbreak.
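The bread two by two table discussed above can be sketched in a few lines of Python. This is only an illustration, not the investigators' analysis: the split of 14 exposed and 7 unexposed cases comes from the lecture, but the control counts are hypothetical because the slide is not reproduced in this transcript. The sketch also uses the simple unmatched odds ratio with a Woolf (log based) confidence interval, whereas the actual study drew controls from the cases' households, so a matched analysis may have been used. The function name `odds_ratio_with_ci` is an assumption made for this example.

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Unmatched odds ratio with a Woolf (log based) confidence interval.

    Raises ZeroDivisionError if any cell is zero; a real analysis would
    need a continuity correction or an exact method in that situation.
    """
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR): square root of the sum of reciprocal cells.
    se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                          + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Cases: 14 of 21 ate bread (from the lecture).
# Controls: 7 ate bread, 35 did not (HYPOTHETICAL counts for illustration).
or_, lo, hi = odds_ratio_with_ci(14, 7, 7, 35)
print(f"OR = {or_:.1f}, 95% CI {lo:.1f} to {hi:.1f}")
```

With these made-up control counts the interval excludes one, which is the pattern the lecture describes for bread and for none of the other food items.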
This is one more cute example of an outbreak in which a case control study was very effective. C. jejuni is a cause of food borne illness. Here you can see an outbreak was detected in London. So here a case was someone with Campylobacter in their stool. Controls were individuals living in the same neighborhood. You go next door, knock on the door and ask if someone will be a control in your study. A neighborhood matched control. Then you simply ask questions about various exposures, and here was the relevant exposure. This was in the old days. Anyone here live in a place where they deliver milk to your doorstep? When I was growing up that was common in the United States. We don't do that so much anymore. In the UK they do deliver milk to the doorstep. Here you can see the relevant exposures all relate to milk: having milk delivered to your door was a risk factor. And having your milk bottle attacked by crows, birds, while the milk is sitting there. The crows would basically remove the top of the bottle and drink the cream from the top of the milk and in the process introduce Campylobacter into the milk. So basically you can see that in fact having your milk attacked by crows the previous week had a very strong relationship with getting Campylobacter. Sort of an interesting exposure, not something you might have thought about, having crows attack your milk bottles on the front porch. In this case a simple example of a case control study helping figure out the cause of an outbreak. Now, I just want to point out, I think I've already said this, people frequently think of case control studies in terms of dichotomous outcomes and dichotomous exposures. Dichotomous outcomes means ill or well, alive or dead. Dichotomous exposures would be ate the food or didn't eat the food, exposed or not exposed. In point of fact they don't have to be dichotomous. You can look at categorical exposures or continuous exposures.
And you can actually do case control studies that look at dose response effects. Okay? Going back to that Campylobacter outbreak. Here is the number of days per week the milk bottles on your porch were attacked by birds. Here you can see cases and controls. You can basically see a substantial increase in the odds ratio of getting Campylobacter with a higher frequency of having your milk bottles attacked by crows. The more times your milk bottles were attacked by crows, the more likely you were to end up consuming contaminated milk. Evidence of a dose response effect, like cigarettes and lung cancer. Case control studies are not limited to dichotomous outcomes and dichotomous exposures. Okay. I do a lot of work on vaccines. And we frequently are interested in knowing how effective vaccines are in the real world after they've been licensed and are in common use and it's no longer ethical to do randomized controlled trials. How well do vaccines work? A very common approach is to use a case control study. So in a case control study of vaccine effectiveness what do we basically do? We find cases of the disease. We choose appropriate controls, and what's the exposure of interest in that study that we have to ascertain? >>>: Whether or not the person has been vaccinated. Art Reingold: Whether or not the person has been vaccinated. The exposure of interest is the potentially protective effect of the vaccine. Have you been vaccinated or not? If it's a vaccine where you get multiple doses you can look at the actual number of doses. Again, we're going to come back to this: the relative risk is approximately equal to the odds ratio. So we define vaccine efficacy as one minus the relative risk, times a hundred. What's the most effective a vaccine can be? What's the maximal value of vaccine effectiveness? >>>: A hundred percent. Art Reingold: Louder. >>>: A hundred percent. Art Reingold: A hundred percent.
If there are no cases in the vaccinated then this will be 0, this will be a 0 in the numerator. One minus 0 is one, times a hundred percent is a hundred percent. If there are no cases in the vaccinated the vaccine is 100 percent effective. What's the minimum value? >>>: 0. Art Reingold: That's the trick question. That's incorrect. What if the rate of disease is higher in the vaccinated than the unvaccinated? What if we make a mistake and give you a vaccine that increases your risk of disease? In which case the relative risk or the odds ratio will be greater than one. One minus anything greater than one is a negative number. And vaccines can have negative efficacy. We try and avoid that. It's not good for public health. Technically the lower limit is negative infinity. If the vaccine actually causes disease, if it increases the risk of disease, the vaccine will have a negative efficacy. This is an example of a new vaccine rolled out in 2000. The pneumococcal vaccine. How well is it working? We have population based surveillance. We find all the cases. How do we choose controls? The best way to decide this would be based on birth certificates. If the case was born in a particular zip code in a particular week, we found every other child born in that zip code in that week and took a random sample of those children. The exposure of interest is vaccinated or not and how many doses of the vaccine. This is a national study at ten sites. Number of doses ranging from 0 to 4. We classify the cases and controls based on their medical records and how many doses of vaccine they received. >>>: Why did you choose that method as opposed to trying to find individuals who were perhaps matched to the cases on variables other than the zip code? Art Reingold: We're going to come back to why we chose that in a couple of minutes. If you don't have to match you are better off not matching.
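The vaccine efficacy formula from a moment ago, one minus the relative risk (approximated by the odds ratio in a case control study) times a hundred, can be sketched as follows. The function name `vaccine_effectiveness` and the example odds ratios are assumptions for illustration only; they are not results from the pneumococcal study.

```python
def vaccine_effectiveness(ratio):
    """Vaccine effectiveness in percent: (1 - ratio) * 100.

    `ratio` is the relative risk, or the odds ratio used to approximate it,
    comparing vaccinated with unvaccinated people. It cannot be negative.
    """
    if ratio < 0:
        raise ValueError("a relative risk or odds ratio cannot be negative")
    return (1.0 - ratio) * 100.0


# Illustrative (hypothetical) ratios, not study results.
for ratio in (0.0, 0.25, 1.0, 2.0, 5.0):
    print(f"ratio {ratio}: effectiveness {vaccine_effectiveness(ratio)} percent")
```

A ratio of 0 gives the maximum of 100 percent, a ratio of 1 gives 0 percent, and any ratio above 1 gives a negative value with no lower bound, which is the point of the trick question.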
Despite the fact most epidemiologists want to match on everything all the time, matching in fact is not a good idea if you don't have to do it. Okay? We'll come back to that. Here you can see we can calculate the odds ratio, and the vaccine effectiveness, for one dose versus no doses, two doses versus no doses, etc. We can show the vaccine is working quite well. This is an underestimate of the efficacy, which is more in the range of 95 percent. This is an example of a case control study to look at the protective effect of a vaccine. When we talk about screening, such as breast cancer screening and cervical cancer screening, we can use case control studies to look at the effectiveness of screening. In that case the outcome of interest would be died of breast cancer, yes or no, and the exposure of interest would be screened, yes or no. Right? Very similar approach. We'll come back to that when we talk about screening. I think I'm going to skip over this for the sake of time. It's really not that interesting as a case control study. Um, perhaps move on to this. Some of you may know there's a lot of interest in the question of whether use of cell phones increases your risk of brain cancer or not. Brain cancers are very rare. And so again, really the most plausible approach to studying this relationship is to do a case control study. And then obviously once you find the cases the first questions are how do you choose controls and how do you measure the exposure of interest, which is cell phone use, particularly amount of cell phone use. Okay. So this is just one example of such a study. These are different types of brain cancers. The cases were basically identified by city and hospital, finding all the cases. This is a hospital based case control study. You sit in the hospital. You find all the cases of brain cancer. They chose controls admitted to the same hospital for nonmalignant conditions.
The idea is that people admitted to the hospital for nonmalignant conditions would be representative of the source population with regard to cell phone use, which is the exposure of interest. Right? So, how do you ascertain someone's cell phone use as the exposure of interest? If it's just do you use a cell phone or not, you could ask them. But if you actually want the amount of time used, how else might you get that? >>>: Cell phone provider. Art Reingold: Cell phone records. Billing records. Right? The other thing that might be of interest is which side of your head do you put your cell phone on? Do you use your left ear or right ear or ambi-ear? I certainly only use my left ear for my telephone. You might want to look at the relationship between cell phone use and tumors just on that side of the head. If the hypothesis is that radiation from the cell phone is the cause, presumably you would see a relationship on the side of the head where you hold your phone and not on the side of the head where you don't hold your phone. Right? At least that's plausible. Here's just one, you know, very early study. Here you can see cumulative use over 100 hours of cell phone use versus never. Probably you all rack up a hundred hours a week now. This was in the early days of cell phones. Here you can see this particular study didn't find any relationship, but this was 11 years ago. And this was when cell phones were really only becoming more and more common. So cell phone use, A, wasn't that common. B, people hadn't racked up large numbers of hours of cell phone use. C, there hadn't been much of a latency period between when cell phone use began and when people might have had the opportunity to develop their cancer. And cancers might have latency periods of 10 or 15 or 20 years. Certainly smoking related cancers take 15, 20, 25 years to develop. This might not be a very accurate reflection of what's going on. This is a very controversial issue.
Here you can see cell phone use, number of subscribers, and of course a particularly interesting question is what about when children use cell phones from a very early age? Might the brain of a child be more susceptible to the radiation from cell phones? And might 20 or 30 years of cell phone use have a different effect? People are still doing these studies. Here's a more recent study. Acoustic neuromas near the ear. This is in Sweden: find all the patients with this cancer. Population based controls from the registry of the Swedish population. If you want to do good epidemiology you move to northern Europe. In Finland, Denmark and Sweden they have registers of the entire population. Every person in the population has a unique identifying number. And they can take random samples of the population. We in the United States seem to be paranoid about the government knowing who we are and where we are. This would not go over well in the United States. Here you can see in this study regular mobile phone use versus never: an odds ratio of one. After ten years of use, an odds ratio of almost four. These studies are continuing. Some are finding relationships, some are not. This is an open question, whether cell phones increase the risk of different types of brain cancer or not. Stay tuned and decide for yourself. Here's a famous and more recent study. Here you can see different types of brain cancer. For these types no increase in the odds ratio. So, again, a very controversial area, whether this is a risk factor or not. I'm going to skip over this also for the sake of moving on. So, I want to get now into some of the details about this process, because it's important to think about them. As I've alluded to, how you choose controls is a very important issue in case control studies. If you can take a random sample of the population that's the ideal approach.
When you can't do that, various approaches give you an approximation of a random sample. We've already said random digit dialling doesn't give you a truly random sample of the population. Door to door, knocking on doors, basically matches on neighborhood. We frequently take controls based on going to the same hospital or clinic where the cases are found. Lots of issues about how to do that and whether that works. Friends and relatives are frequently chosen as controls. And sometimes the question arises, should we enroll dead people as controls? Why might we want to enroll dead people as controls? When might that be reasonable to do? Seems a little, I don't know, odd. Go out and find dead people to be your controls. >>>: If the exposure of interest is lethal and you didn't include dead people you might be introducing biases to the study. Art Reingold: Right. If the illness kills some people and some of the cases are dead and you want to include dead cases, you might also want to include dead controls as a way of minimizing biases that come into play when you enroll dead cases. Because when a case is dead you have to limit your collection of information about the exposures either to written records or to surrogates such as spouses or children or family members. And one feeling is you can at least equalize the bias if you enroll dead controls for dead cases. It gets a little tricky going out and finding dead controls. That's an issue that people have to deal with when they think that's appropriate. Okay. So, um, when we think about choosing the cases for case control studies, as I've said, typically case control studies are done for rare outcomes and we want every single case we can find. Right? We can't afford to sample the cases. We want all of the cases, but there are instances where we have so many cases we can take a sample. And if we take a sample of cases, the source population from which the cases came should be definable.
The case definition should provide accurate classification of those with and without the disease. Selection of the cases has to be independent of their exposure status. So what do I mean by that? If we're talking about smoking and lung cancer, we don't want to bias the cases that are in our study toward primarily having smoking related cancers and missing the cancers in nonsmokers. Right? If we did that we'd come up with a biased estimate of the odds ratio. So, the cases have to be selected independent of their exposure status. Exposed and unexposed cases should have the same probability of being selected. And in general we prefer to do studies with incident, new cases rather than prevalent cases. And why is that? Why is there a preference to only use incident cases instead of prevalent cases? >>>: They know the disease happened after the exposure? Art Reingold: Pardon? >>>: You can ensure the disease happened after the exposure. Art Reingold: Making sure the disease occurred after the exposure. I think you can generally do an equally good job of that with prevalent or incident cases. So what does prevalence depend on? What are the two factors that determine prevalence of the disease? >>>: Duration and incidence. Art Reingold: Incidence of the condition and duration of survival with the condition. The problem is there may be factors related to the likelihood of surviving with a condition that come into play and that therefore make the prevalent cases a biased sample compared to the incident cases. Right? By choosing prevalent cases we might come up with a biased estimate of the odds ratio, compared to what we'd get if we only included incident cases, if the exposures we're interested in not only relate to your risk of getting the disease but also relate to your length of survival. Okay. Now in terms of controls, as I've said, these are the three basic approaches: either some sort of population based approach, particularly random selection, hospital and clinic based, or friends and neighbors.
And I said the other day all of these are problematic. They just have different problems associated with them. So, population based controls are the best for generalizability of results. There's the least bias. They tend to be the most expensive and the most difficult to get. And we have real problems with declining rates of participation. I don't know about you, but when my phone rings at dinner time I can see who is calling. If it's not somebody I know I just don't answer the phone. Right? First of all, getting people to answer the phone is difficult. And once you get them on the phone, getting them to participate is difficult. These declining rates of participation mean even if you can identify a representative sample of the population you may not be able to enroll a representative sample of the population. This is the ideal type of control, but very, very difficult to get. Hospital and clinic based controls are generally fairly easy to find. And they are thought to reduce something called recall bias. We'll talk about recall bias in a couple of weeks. But it can be difficult to determine who can be included and who can't be included. For example, if you are looking at the relationship between smoking and lung cancer, who is eligible to be a control among people coming to a hospital? Should you include or exclude people with chronic bronchitis? A smoking related health outcome. Should you include or exclude people with heart disease? A smoking related health outcome. So who can be included and who can't be included can lead to enormous amounts of bias. Friends and neighbors are frequently used. They are really, really easy. Okay. Um, and of course choosing friends and neighbors tends to control for lots of potential confounders. We'll talk about confounders after the exam. Fundamentally, we tend to live next to and be friends with people just like us.
So choosing friends and neighbors tends to control for socioeconomic status and race and a number of other potential confounders. But it may lead to this problem of over-matching. All of these are flawed. All of them are difficult in terms of choosing controls. Again, I want to point out controls should be representative of the source population from which the cases came. They are not supposed to be just like the cases but without the disease. They are supposed to be representative of the source population. Another way of thinking about it is that controls would, if they developed the outcome under study, be in the study as cases. Okay. That's another way of thinking about whether the control group is actually achieving what it's meant to achieve in terms of representativeness. If that control developed the disease they would end up in the study as a case. And again it's important to point out the controls, like cases, should be sampled independent of their exposure status. We shouldn't be more likely to enroll smokers than nonsmokers, or oral contraceptive pill takers as opposed to non-takers, or whatever it is. We need to sample controls independent of their exposure status. This is an example just to show you how difficult this process can be. There are a lot of studies showing the selection of controls in case control studies ends up introducing bias. This is a simple one on childhood leukemia. She's doing case control studies of leukemia. The question is how do you choose controls for cases? The cases are easy to identify. How do you choose controls? For each case they selected one control among the children identified using computerized birth records and located successfully. This would be a common approach to finding controls. You take all the children born. You take a sample of those children but you only enroll the ones you can actually track down. That's a very common approach. And then they also took friend controls.
And then they took a third group of what they called ideal controls, selected from birth records without tracing, eliminating the whole problem of being unable to find the control. This is the ideal control group. Then they asked the question, how different are the people they actually managed to enroll, either as friend controls or as this successfully located population? And the bad news is they found they're different. Okay? So, the reality is, for all the variables except birth weight, they basically found that, if you consider the ideal controls to be the perfect control group, friend controls are in general a more biased sample, but even the birth certificate controls, the children they could track down based on their birth certificates, were also a biased sample compared to an ideal control group. This is one example of the problems we have with selection bias when we choose controls for case control studies. We know what the ideal is and we can virtually never achieve the ideal. The groups we can actually enroll tend to be a biased sample. Okay. Okay. This is a study I've been trying to do. Many of you probably had the meningitis vaccine. We've been trying to do a study to see how effective it was. If the cases were in young people 12 to 19 years of age, how should we choose controls for this study? The cases are people 12 to 19 with meningitis. How should we choose the controls? This is a real study we've been doing for several years. We find the cases through active laboratory based surveillance. We had a big discussion about how to choose controls. Random digit dialling was one approach. Incredibly labor intensive. Thousands of phone calls. Cell phones and caller ID are big problems. We decided we couldn't find enough controls that way. School matched controls. If you're a case, we choose controls from your classmates in school X. Schools have to give permission.
Getting permission from every school district, getting schools to participate, is incredibly cumbersome and difficult. Clinic and provider based controls. If the case was seen by a particular doctor we'd find controls seen by the same doctor. Very likely to overmatch on vaccination. The same doctor probably vaccinates or doesn't vaccinate all the kids in his or her practice. Hospital matched: real concerns about who is in a hospital between the ages of 12 and 19. In that age group who gets hospitalized, and are they representative of people in this age group? The answer is they clearly are not. Right? And if you say maybe I'll take people 12 to 19 in automobile accidents, for example, there are few, unfortunately, and it's very hard to find them. Friends and acquaintances. We basically couldn't get permission to ask people for their friends' names in order to do this study. That didn't work. Neighborhood matched: too labor intensive. Sending people out to knock on people's doors was considered potentially a safety risk to the interviewers. All of these were considered difficult to impossible. We ended up doing a mix of all of them. And it's still not clear whether this is an effective strategy or not and how biased these controls are. Just to show you, choosing controls and deciding what the strategy is can in fact be quite difficult. Okay. So, we frequently will see in case control studies there are more controls than cases. Right? Why is that? Why are there typically more controls than cases in a case control study? >>>: Because people drop out. Art Reingold: No. It's not because people drop out. It's not a question of a cohort being followed over time. So we don't worry about people dropping out of case control studies. So the ideal approach in terms of the amount of statistical power you get per participant is equal numbers of cases and controls, 1 to 1.
One control for each case gives the most power, the most efficient study design in terms of power per person enrolled in the study. Why might people choose more controls than cases? >>>: Increase the power. Art Reingold: Increase the power of the study, when that's really the only option for increasing the power of the study. Because? Because cases are hard to find. When we are studying rarer diseases cases are hard to find. Right? If it's a rare disease and you can't find enough cases, one alternative is to increase the number of controls as a way of increasing the power of the study. Okay? The most efficient study design in terms of statistical power per person investigated is a 1 to 1 ratio. Cases are harder to find than controls. So you can increase the power of a study by increasing the number in one group compared to the other group. Because cases are typically rare, that usually means more controls than cases, usually. But the bigger the ratio between the numbers in the two groups, the less statistical power is gained by further increasing the larger of the two groups. In other words, going beyond a ratio of about 4 to 1 or 5 to 1 doesn't add very much power with each additional control. Okay? It does add power, but less and less. That's illustrated here. This is the amount of power added against the ratio of controls to cases, and you can see it levels off at about 5 to 1. So making a ratio of 6 to 1, 7 to 1, 10 to 1 will increase the power of the study, but you get less and less bang for your buck. Okay? It may be your only alternative for increasing the power of a study, but you get less and less out of it. And each person included in the study costs money: to go out and find them and interview them and collect the information, there's a cost associated. So for efficiency's sake 1 to 1 is the ideal ratio. But you typically see ratios more like 2 to 1, 3 to 1, 4 to 1 because cases are hard to find. Matching.
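The diminishing return from extra controls described above can be sketched with a standard rule of thumb that is not taken from the lecture slide: with a fixed number of cases and r controls per case, and assuming no true association, the variance of the log odds ratio is proportional to (1 + 1/r), so the efficiency relative to an unlimited supply of controls is roughly r / (r + 1). The function name `relative_efficiency` is an assumption made for this example.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency of r controls per case relative to unlimited
    controls, for a fixed number of cases: r / (r + 1).

    This is a rule of thumb that assumes no true association (null
    hypothesis); it is not an exact power calculation.
    """
    if controls_per_case <= 0:
        raise ValueError("controls_per_case must be positive")
    return controls_per_case / (controls_per_case + 1)


previous = None
for r in (1, 2, 3, 4, 5, 6, 10):
    eff = relative_efficiency(r)
    gain = "" if previous is None else f" (gain {eff - previous:.3f})"
    print(f"{r} to 1: relative efficiency {eff:.3f}{gain}")
    previous = eff
```

Under this approximation, going from 1 to 1 up to 2 to 1 adds far more than going from 4 to 1 up to 5 to 1, which matches the leveling off at about 4 or 5 controls per case described in the lecture.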
People tend to think about matching as a way of improving the validity of a study, but that's really not correct. Matching is primarily done to improve the efficiency of a study. Okay. That's an issue we're going to come back to, but obviously if you are worried about things like age and sex and race being confounders, something we'll talk about in a couple of weeks, one way to deal with confounding is to match on confounding factors. So, if you have a case of a certain age you choose a control of the same age. If you have a case that's a male you choose a control who is a male. If you have a case who is poor, you choose a control who is poor. That's what we mean by matching. You can do individual matching or frequency matching. Again, people tend to think about this as a way of enhancing the validity of the studies by making the controls look just like the cases except not having the disease. But in fact what it really does is enhance the efficiency of a study. What I mean by that is the statisticians would tell you the ideal thing to do is to not match on anything and then control for all those variables in the analysis. To use logistic regression or other techniques to control for confounding factors in the analysis. They would say match on nothing and then analyze and control for those factors in the analysis. The problem with that is you may have a lot of small or 0 cells. And therefore a very inefficient study. By matching on various factors you increase the efficiency of the study. Which is why people match. At least that's the reason they should be matching. So, matching does prevent or minimize confounding. An issue we'll come back to. It increases the efficiency of a study. But the reason people match is often to give them a framework for choosing controls. It's easier to choose controls if they match on various things. Choosing neighborhood controls is frequently a matter of convenience but in the process it matches on things like socioeconomic status. 
Okay. So, it's frequently done for logistical reasons. The downsides of matching, and there are major disadvantages to matching, which is why statisticians tell you generally not to match. First of all you may not be able to match controls to some of your cases. If you can't match, the cases can't be in your study. In a matched study you simply throw them away. If cases are rare you really don't like to throw away cases. So you may have difficulty matching controls to some of the cases, resulting in the loss of cases from the study. The association between the matching factors and the outcome of interest cannot be assessed once you match on something. If you match on gender or age or socioeconomic status you can no longer analyze the relationship between that variable and the outcome. So matching prevents you from studying these relationships that might be of interest. Certainly if you do match you have to use techniques to take matching into account. These are conditional methods. But there are such methods. Matching can sometimes produce this problem of overmatching. This is an issue I don't want to get into. If you match on weak confounders you can actually introduce bias into the study and cause problems. So matching is not always a good thing despite the fact that people seem to always want to match on lots and lots of things. Okay. So I'm going to skip over that now. This is an issue we won't finish today but I'll pick up on Wednesday. Which has to do with the fact that another way of thinking about choosing controls is this approach in terms of when do we choose controls in relation to their risk of becoming ill? So, this is somewhat more of a 250 B issue. You need to be familiar with this. There are basically three ways of sampling controls. The first is what's called survivor sampling. In an outbreak investigation that's typically what's going on. In other words you ate at a meal where there was salmonella in the food items. 
Within 72 hours you either got salmonella or you didn't. Your risk of getting salmonella from that outbreak is over within a few days. In other words, if we choose controls from people that didn't get sick in the outbreak they are survivors. Their risk period is over in terms of that outbreak. They could get it two years from now. In terms of this outbreak their incubation period is over. They didn't get sick. They survived. When you choose controls from among survivors that's what we mean by survivor sampling. But in many case control studies we are not choosing from among survivors. If we're doing a case control study of smoking and lung cancer, we choose everybody with lung cancer. We choose controls and we ask them about their smoking status. Right? Those controls could still get lung cancer next week or next month. Or next year. Right? They are still at risk of getting lung cancer. Their risk period is not over. Right? So, in a study like that, with risk set or density based sampling, controls are selected from among individuals in the at risk population who have not yet experienced the outcome by the time a given case is diagnosed, but they remain at risk. They could still get the disease in the future. Right? It's very different from survivor sampling. And the third thing, and we'll talk about all of this more on Wednesday: case base or case cohort sampling is when we choose controls from among everyone who is in the cohort at the baseline. Okay? These are the three, and we're going to come back to this on Wednesday. Another way of looking at this is with survivor sampling we only choose controls among people who make it to the very end of the risk period and didn't get sick. With incidence density sampling we choose controls based on them not being sick at the time the case is diagnosed. They are still at risk later. And case base sampling is when we choose controls among everybody in the cohort at the beginning. 
This is an issue I'm going to pick up and talk about more on Wednesday because it's a really important issue. Okay?
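A small sketch of how the pool of eligible controls differs under the three sampling approaches just described. The data layout is an assumption made for illustration: each cohort member is represented by the time they became a case, or None if they never did during the risk period. The function names are hypothetical, not from the lecture.

```python
from typing import List, Optional


def survivor_controls(event_times: List[Optional[float]]) -> List[int]:
    """Survivor sampling: only people still disease-free at the end of the risk period."""
    return [i for i, t in enumerate(event_times) if t is None]


def risk_set_controls(event_times: List[Optional[float]], case_time: float) -> List[int]:
    """Risk set (incidence density) sampling: people who have not yet become
    a case when a given case is diagnosed; they may still become cases later."""
    return [i for i, t in enumerate(event_times) if t is None or t > case_time]


def case_base_controls(event_times: List[Optional[float]]) -> List[int]:
    """Case base (case cohort) sampling: everyone in the cohort at baseline,
    whether or not they later become cases."""
    return list(range(len(event_times)))


if __name__ == "__main__":
    # Person 1 becomes a case at time 2.0, person 2 at time 5.0; persons 0 and 3 never do.
    cohort = [None, 2.0, 5.0, None]
    print(survivor_controls(cohort))       # [0, 3]
    print(risk_set_controls(cohort, 2.0))  # [0, 2, 3]
    print(case_base_controls(cohort))      # [0, 1, 2, 3]
```

In this toy cohort, person 2 is eligible as a control for the case diagnosed at time 2.0 under risk set sampling even though that person later becomes a case, which is the key difference from survivor sampling.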