Good afternoon, it is Tapan Bagchi again. I have this series of lectures prepared for you on Six Sigma. Now, one of the procedures used in Six Sigma is the improvement process. This step comes along in the DMAIC procedure that we had mentioned in the previous lectures. In this improvement step, what is really required is to be able to find the causes of losses or the causes of high variance, and to try to do something about them, so that you end up with smaller variance or lower losses. In order to do this, we have to discover some knowledge about the process, and most of this work is empirical. Very rarely will you be able to find a theoretical model which, if you optimize it, will give you the optimum result, the best result. So, most of the time we are guided by data collection, we are guided by experiments, we are guided by data analysis and so on and so forth. This is something that we cannot escape. It is actually a key step in the DMAIC process, the Six Sigma approach to improving quality.
Now, today what we have for you are the basic tools that have been picked up from statistics and probability. And in order to do that, what I have to do is give you some basic ideas about the concepts in probability and also the concepts in statistics. This is obviously not a complete lecture on probability or statistics, but I will touch upon all the important concepts there, to make sure you get some practice, and also you get to see how we utilize these principles to tackle real data analysis problems.

Now, there are many books available, and a book that is fairly recent, and that I have found to be quite good, is the one that is mentioned right here; you can see the display, it is called Complete Business Statistics. It is written by Aczel and Sounderpandian, and it is published by Tata McGraw-Hill; the one that I am using is the sixth edition of the book. You could of course use any other book, but this one I found to be quite readable, and it has many examples which actually come from real life. So, if you feel like buying a book, buy this one or some other book; it would be really handy to have one of these books around to refer to from time to time, not only for this lecture, but also when you are actually tackling a Six Sigma project.

Let us begin by going to slide number one. The very first thing we have is the title slide: Probability and Statistics in Six Sigma, a Review.
The objective of this lecture is to lay the groundwork for statistical methods in quality assurance; that I will try to do in a very broad way. And then, of course, we would like to become familiar with the tools used in Six Sigma, which include SPC, reliability, design of experiments, regression model building and process optimization. These I will be touching upon, and just to remind you, if you aspire to get a black belt in Six Sigma, you are required to be familiar with these techniques. These are techniques that are actually part and parcel of the black belt training procedure. They have not been put here only because they make the theory complete; these become the tools, these become the devices that you utilize when you approach a project that is to be tackled by Six Sigma.
Now, why study probability and statistics? The problem is that most things around us have some uncertainty in them. For example, here I have got a system, some kind of a process, which is shown by the green box. And notice here that this green box is affected by environmental factors, it is affected by the materials coming in, by the measurement devices and instruments, by the methods and technologies that are used, by the machinery and also, obviously, by people. These are all the inputs, and each of these would have an impact on the process.

Therefore, the output itself will not really be a stationary or steady output; it will have some variation, and it is this variation, when it gets beyond what is acceptable to a customer, that gives us a quality problem: the customer is not satisfied. In this case what we have to do is see if we can understand the cause and effect relationship, which means starting with these parameters, starting with these factors, and seeing which of these factors really is an important one, really an influential one in impacting the output. If that happens, of course, you have got at least one handle on the process. It is very possible, of course, that more than one factor would be simultaneously affecting the process; in that case you have to study that effect also.

How do we study these things? We study them by looking at the output and making some deductions based on the principles of statistics and probability. And that is really the objective of today's lecture.
I have an example here, and I am going to take a couple of minutes to explain what this example is: optimization of a process using experiments. Now, look again at the example of my digital watch. The theory has progressed to the point, as far as digital devices are concerned, where we can write down the exact equation that says what this timing device will show at a particular moment in time. We can relate that to the crystal vibrations, the battery cell, all the little tiny transistors that are inside; we can relate the output to the characteristics of those devices, and we can write down the exact equation for it.

Now, that is a pretty high level of knowledge. Generally this is not true when you go and operate a plant, for example a chemical plant, a steel plant, a mechanical plant producing widgets, or a metallurgical plant. Many times we are really not that sure about the running of that process, when I compare it to the state of knowledge that we have for my digital watch. Therefore, many times we have to do experiments: we have to run that process under different conditions, and these conditions are changed by changing the input parameters. Then, of course, you look at the output and you try to see which effect turns out to be most prominent and which factor is causing it.
If you can get that cause and effect relationship, then you know that this is causing that, and therefore, by controlling the input I will be able to control the output to the desired value. This is really our goal. How do we do that? Let us take a look at this screen again. I have a scheme here, a matrix scheme, that I am going to explain to you. In this matrix scheme I have got three factors involved; this is some process that has got three factors affecting it. And this is, of course, at this stage speculative: we do not know exactly how factor A impacts the output, how factor B impacts the output, or how factor C impacts the output. What we do know is that A is a factor that can be set at two levels, low and high; B is a factor that again can be set at two levels, low and high; and C is a factor that can be set at two levels, low and high. So, in fact, now I have a means to set A either at low or at high, and I can do the same thing for factor B and for factor C. In other words, two times two times two, I can produce eight combinations, eight different ways I can run this process and observe the output. The result is this: A will have its own impact, B will also have its own impact, and simultaneously C will also have its own impact. This is going to show up in the experimental data.
to show up with the experimental data. So, what we have here, we have experiment
number one, which is like when A is set at low level, B is set at low level and C is
also set at low level. So, this is the point where I have got factor A set at low level,
factor B set at low level and factor C at low level. And I observe the output I do the
same thing by changing A from low setting to high setting. And again I run a trial which
will be done at A at high level, B at low level and C at low level. So, I have got high,
low, low, that is the point, this point and again I run the process and observe the data.
In this way I will I am able to produce eight different pieces of data. Eight different
conditions under which I have run the process using three factors and at two levels each.
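To make this concrete, here is a minimal sketch, in Python, of how such a two-level, three-factor design could be enumerated. The factor names are just placeholders taken from the example; nothing here is specific to any real process.

```python
from itertools import product

# Three hypothetical factors, each at two levels (low and high).
factors = ["A", "B", "C"]
levels = ["low", "high"]

# Enumerate all 2 x 2 x 2 = 8 treatment combinations.
design = list(product(levels, repeat=len(factors)))

for run, setting in enumerate(design, start=1):
    print(f"Run {run}: " + ", ".join(f"{f}={s}" for f, s in zip(factors, setting)))
```

The first run printed is A=low, B=low, C=low, exactly the "experiment number one" described above, and the last is all three factors at their high settings.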
So, I produce this data, and this data is highly valuable for us; this is the data that now contains the process information. What we then do is subject this data to some statistical analysis. Statistics actually is the science for analysing data; statistics is the only science with the help of which you can analyse data. You cannot do that with geometry, you cannot do that with geography, you cannot do that with physics, you cannot do that with English or any other subject. Statistics is actually the subject with the help of which you can analyse the data that you produce as the result of some experiments.

So, I have got this data, I do the analysis, and the object of my analysis is to find the factor effects: does factor A have a significant effect as far as the output is concerned, does factor B have a significant effect on the output, does factor C have a significant effect on the output? And is there any interaction between A, B and C? If that is also there, we have to find that too. So, we try to do the data analysis in such a way that it exploits the structure of the matrix experiment and produces these pieces of information for me.
What are these little graphs here? On the Y axis I have got the response plotted; on the X axis, the first block is for factor A, factor A at the low level and then at the high level. And notice here that when factor A is moved from its low setting to its high setting, the response goes up, which gives me a clue: to control the response I can move A from the low to the high setting. When I look at B, I find that when I move B from its low setting to its high setting, the response comes down; when I look at C, again I find that when I move C from low to high, the response comes down.
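As a rough illustration of how these effect plots come out of the eight runs, here is a small sketch with made-up response numbers. Each main effect is simply the average response at the factor's high setting minus the average response at its low setting; the numbers are chosen so that A raises the response while B and C lower it, matching the plots described above.

```python
from itertools import product

factors = ["A", "B", "C"]
design = list(product([-1, +1], repeat=3))   # the 8 runs; -1 = low, +1 = high

# Hypothetical responses for the 8 runs, in the same order as `design`.
y = [63, 56, 54, 47, 72, 67, 63, 58]

for i, name in enumerate(factors):
    high = [r for setting, r in zip(design, y) if setting[i] == +1]
    low = [r for setting, r in zip(design, y) if setting[i] == -1]
    effect = sum(high) / len(high) - sum(low) / len(low)
    print(f"Main effect of {name}: {effect:+.2f}")
```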
Now, if my goal was to maximize the output, I would be picking the high setting for A plus the low setting for B plus the low setting for C. So, high for A, low for B and low for C: this combination is going to give me the maximum output, the highest value of the response Y. That is something I did pretty simply, just by running some experiments, doing some data analysis and plotting these little plots here. Similarly, I can also find the interactions among these three factor effects. And I can also find, by doing this, the significance of these effects. By significance I mean the following: suppose I run a trial, I run an experiment.
For example, I will be doing an experiment right now. You know this room is relatively quiet, so there is a low level of noise, and therefore you can hear my voice, because my voice is significant when you compare it to the background that is there in this room. But suppose I start lowering my voice, which I am going to do just now; you can hardly notice whether I am saying anything, because I have brought my signal down to the point where it is at the same level as the background noise here. What I have really done is reduce the significance of my effect to the level where it is not distinguishable from the environmental noise.

The effect of a factor, which is shown by these plots here, is only meaningful if it is significant when you compare it to the background noise. Unless it is significant compared to the background noise, we will probably just say that the factor really does not have an effect. It is just like my trying to say something: if I have very little voice, you will hardly notice anything, because my voice level, the signal, is at the same level as the background noise. So we have got to make sure, when we plot these plots and find these effects, that they are at a level that is significant when we compare them to the noise.
Unless you do that, what is the point in controlling that factor? Because the factor has the same effect as the background noise, it is not really going to be an effective control factor. Now, suppose I have got these plots done. I can probably empirically optimize the process by picking the high setting for A, the low setting for B and the low setting for C; I can probably maximize Y that way. But is that really the mathematical optimum, the best possible optimum? Best possible optima can be produced only when you have got a mathematical equation that can be optimized, so that you can find the maxima or the minima; it is only under those conditions that you can really optimize the process. To do that, of course, it will not be good enough for me to just play with these charts, these graphs. I will have to build what you call an equation, an empirical equation, and in the language of statistics this is called a regression model. Notice here I have got the parameters a, b, c, d; the factor effects are there, and I have got the multipliers that go with them. So I have got really a mathematical equation here, which is probably not as good as, you know, the equation that is used to design my digital watch. It is not as good as that because there I have got a lot of good theory: I have got Ohm's law, Kirchhoff's law.
All those things went into this little device here, into the design of this device. That will not be possible when I am working with this little equation here. This empirical equation that I construct, which I call the prediction model, is really not going to be as good as the mathematical model that I have for my electrical device, but it is still pretty good. And with the help of this I can move to the next step, which is optimization. And in optimization we do a couple of things: we try to reduce variation, we try to reduce defects, we try to cut losses.

What has been my approach here? It has been empirical. It has not been a matter of working out the theory and then finding the optimum points; it has not been that way. We run some experiments with the real process; this is really the approach that is utilized in Six Sigma. Six Sigma is very empirical: you work with the real process, you change the settings of the different parameters, the different factors, and then you try to find out which factor is most active and which factor is not. Then you try to find the optimum settings for each of those factors, and then you get either lower losses or reduced variation.
Concepts in probability: there are a variety of things here, and I have listed out a few of them. For example, there is the concept of a population, which is any large body that consists of products, or of people, or of books, or of any objects that are each slightly different from the others. So, there may be a particular population from production: for example, one day's full production could be a population of widgets. If I am producing pens, then a full day's production might be 5000 pens, and those 5000 pens would then constitute the population.

Now, if I want to do quality testing, I would not be able to test all of them. So what I do is pick a handful: I will pick a handful of those pens as they are coming out of production, I will just grab a handful of them, and then I will start looking at my sample and examine each of them one by one. And then I will try to assess their quality, basically to make a statement about the quality of the full day's production; that is what I will try to do. So, I have a population, which is the full day's production, and from that I draw out something that I call a sample.
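A quick sketch of this idea in Python, purely illustrative: the "pens", their measured characteristic and the sample size below are all made up, but the sketch shows the population-versus-sample distinction.

```python
import random

# Hypothetical population: one day's production of 5000 pens,
# each with a measured quality characteristic (say, ink flow rate).
random.seed(1)
population = [random.gauss(mu=5.0, sigma=0.2) for _ in range(5000)]

# Grab a handful: a simple random sample of 50 pens for inspection.
sample = random.sample(population, k=50)

# We use the sample to say something about the whole day's production.
print("sample mean:", sum(sample) / len(sample))
print("population mean (normally unknown):", sum(population) / len(population))
```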
Most of the time in statistics we work with the sample, and in doing that, of course, chance and probability come in. So we study a little bit of probability, we study something called the frequency of the various characteristics, and that gives the relative-frequency definition of probability. Then I have got simple events, and in real life we have compound events; I am going to be describing all of them as we go along. So that is the list of things that we will be looking at; this is our first slide.
Then I have got some details on the way we collect data. Data are basically needed to detect two things: is the process delivering accurate performance, and is the precision of the process good? That is something we will have to find out: accuracy and precision. And, you know, in order to do that I have to collect data, and the data will have to be measured by some instrument. It might produce results like categories of data, for example excellent, good, fair, bad; or I could rank items from the best one to the lowest one, so ranking would be there. I could also use interval scales, where I really have slots into which the data is put here and there, and I end up with what we call the interval scale. And of course the most popular one is the ratio scale, which produces real numbers.

Then, of course, once we have got data collected, we would like to know where most of the data stands, and this is something we will do by slowly getting into the nature of the data that I have collected. So we would like to know, for example, what the central tendency of the data is: where is most of the data concentrated? It is usually concentrated around the mean. Now, because I have drawn a sample from my population, there will be something called the sample mean, and there will be something called the population mean; both are there. Generally speaking the population mean is unknown, and we will try to estimate it by calculating what the sample mean is. So the sample mean becomes the estimator, the estimate, for the population mean. There is another way to say where most of the data is concentrated.
And that is to take a look at the median of the data. The median is basically a quantity that has, in terms of frequency or probability, an equal number of items below it and above it. Again, I am going to be giving you details on this; I just want you all to know that besides the mean there are other measures that indicate where most of the data is concentrated, and this tendency is called the central tendency of the data. Then, of course, we have got something called the mode, which is the point where the data seem to have the highest probability; again I am going to be showing this to you as we get into this.

Then sometimes, of course, we trim the tail ends of the distribution that we end up producing as a result of some data collection, and we end up with what we call the trimmed mean, the trimmed sample mean of the data. That is again something that we will do once we get into this analysis.
Now, that was the central tendency of the data; then there is something called the dispersion of the data. How is the data distributed? How is it spread around? Is it really concentrated together, in which case I have got good precision, or is it spread around a lot? The measure for this is a measure of dispersion. How do we find that out? The very first one is called the range. The range is the difference between the maximum value of the observations and, within the same sample, the minimum value of the observations. So, if you collected five pieces of data x1, x2, x3, x4, x5 and you find that x2 is the highest number, put that aside, that is the max; and if you find x3 is the smallest number, then that becomes the min. The difference between x2 and x3, which is the difference between the max and the min, becomes the range for that little sample. The range is a very good indicator of the dispersion of the data. And of course the measure that is theoretically most useful is the standard deviation; it is the square root of the variance of the data that you have collected. That also is something that we will be looking at. Then, of course, we have got a population standard deviation, which is sigma, and we have got something called the sample standard deviation. This sample standard deviation is denoted by S, and it actually is an estimator for sigma, the population standard deviation. Then we have got a quantity called the inter-quartile range, which is the range between the third quartile and the first quartile; the difference between the first quartile and the third quartile, that space, is also an indication of how widely the data is distributed. All of these are basically indications of dispersion.
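Here is a small, purely illustrative sketch in Python of these measures of central tendency and dispersion, computed on a made-up sample; the numbers themselves have no significance.

```python
import statistics as st

# A made-up sample of ten measurements.
x = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 6.0]

# Central tendency
mean = st.mean(x)
median = st.median(x)
mode = st.mode(x)                      # most frequent value

# A trimmed mean: drop the smallest and largest value, then average.
trimmed = st.mean(sorted(x)[1:-1])

# Dispersion
rng = max(x) - min(x)                  # range = max - min
s = st.stdev(x)                        # sample standard deviation S (estimates sigma)
q1, q2, q3 = st.quantiles(x, n=4)      # quartiles
iqr = q3 - q1                          # inter-quartile range

print(mean, median, mode, trimmed, rng, s, iqr)
```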
Then I may have a situation where, for example, if you look at my chart here, the normal distribution is symmetric like this, but sometimes the data tends to get skewed. So the bulk of the data may shift to the left, with a long tail out to the right; that is actually called a positive skew. It could also be that the bulk of the data shifts to the right, with a long tail out to the left; that is called a negative skew. There are formulas available that tell you how the distribution is shaped, and this aspect of the shape comes from the skewness of the data. So the word here is skewness.

There is another way the shape of the data can depart from the standard. Again I start by comparing with the normal plot, which is symmetric like this; that is the standard one. It is very possible that my data is flatter than this, or more peaked than this; both of these are different from the standard normal distribution, and they differ in what we call kurtosis. Kurtosis is the property that will be there in a peaked set of data like this or a flat set of data like this. And this is something we have got to keep in mind: when we are looking at data like that, we should not be using normal distribution theory, we should be using some other theory that is appropriate for that situation. So, just to give you an idea, I plotted these two little graphs there, to give you some indication of skewness and of kurtosis.
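As a rough sketch, skewness and excess kurtosis can be estimated from a sample using the third and fourth central moments; the data below is made up purely for illustration.

```python
def shape_measures(x):
    """Return (skewness, excess kurtosis) estimated from the sample x."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0                  # 0 for a normal distribution
    return skew, kurt

# A made-up sample with a long right tail: expect positive skewness.
data = [1, 1, 2, 2, 2, 3, 3, 4, 6, 12]
print(shape_measures(data))
```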
Now, it is very possible, and it happens many times, that you have data on two variables that you have collected together.
My data can be put on a scatter diagram, and the scatter diagram goes like this: this is x and this is y, two different observations. This could be, for example, the sale of a drink at a shop, and this could be the ambient temperature. You might find, when I do this, that I tend to get a scatter plot which looks like this. Actually, with temperature high this way and low this way, let this be the sale of cold drinks: as the temperature becomes low, the sale of cold drinks also becomes low, and as the temperature goes up, the sale of cold drinks becomes high. So these two variables rise together and they fall together.

This is a scatter diagram; to make sure you can see the pattern, I am just going to circle a few of the points, and you will be able to see the scatter a bit more clearly. These points tend to rise together or fall together; you see that scatter there. Now, it is very possible that underneath there is some relationship, a relationship between x and y, and that could probably be represented by a little model like this. So first we draw a scatter diagram: we take any pair of data, say the height and weight of people, or anything else, any two quantities that might have some relationship, and in order to discover that, what do you do? You draw a scatter diagram and you end up with this little picture.
If it shows this sort of tendency, then you go for regression: you try to fit y as a function of x, and it could just turn out to be y = a + b x. This then becomes your predictor equation: given some value of x, you are able to predict y; a and b are the parameters of the model. This is also used a lot when you are trying to optimize something, and it is actually indicating association. But notice here that not all the points lie directly on that straight line; there is some variation around it, and that variation actually reflects the effect of not only x, but perhaps other factors which also might be impacting y.
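Here is a minimal sketch of fitting such a line by least squares, in Python with numpy; the temperature and sales figures are invented for illustration. It also computes the correlation coefficient, which is what the lecture turns to next.

```python
import numpy as np

# Made-up data: ambient temperature (deg C) and cold-drink sales (units/day).
temp = np.array([18, 22, 25, 28, 30, 33, 35, 38])
sales = np.array([40, 55, 62, 75, 80, 95, 98, 110])

# Least-squares fit of the simple regression model  y = a + b*x.
b, a = np.polyfit(temp, sales, deg=1)
print(f"fitted model: sales = {a:.1f} + {b:.1f} * temp")

# Pearson correlation coefficient rho = cov(x, y) / (sigma_x * sigma_y).
rho = np.corrcoef(temp, sales)[0, 1]
print("correlation coefficient:", round(rho, 3))

# Use the fitted line as a predictor for a new temperature.
print("predicted sales at 32 deg C:", round(a + b * 32, 1))
```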
Therefore, what we have to do is work out something called the correlation coefficient, or first the covariation, the covariance of x and y. If the correlation is 1, or really high, then of course you could say that I can predict y essentially perfectly given x. But if it is less than that, then I need more terms in this model in order to be able to predict y as precisely as I would like. If that is the situation, we will have to collect more data, and perhaps collect data on a third variable and a fourth variable also, and our regression is then going to be more complicated; but it can be done, it is done all the time. This is the empirical way to build a model. This is like using experiments to construct a model which eventually would look like, you know, that fine equation the electrical engineer worked out for my digital watch.

The correlation coefficient, of course, is a term that is used a lot, and behind it we have what we call the covariance. The correlation coefficient goes like this: it is the covariance of x and y divided by sigma x times sigma y. What is this term? If it is close to 1, then you will say x and y have a strong positive correlation. If it is minus 1, which will be the case when x and y go opposite ways, then you will say that x and y move opposite to each other. And in many cases, of course, x and y are not related, in which case you will find that rho turns out to be pretty much close to 0, which means there is no dependency between x and y. This rho, if it is close to 0, means there is no relationship between x and y. So this again is something that is useful if I am using one variable to try to control the other, and this is what I mean when I say that the correlation can be positive or it can be negative.

And I have already discussed regression, which is really a model like this; this is a simple regression model. More complicated models will have more variables. The variables on the right hand side are called independent variables, and the one on the left is called the dependent variable; a and b are the parameters of the model. Then, of course, we have got the idea of probability and probability distributions, so let us quickly take a look at these things. We have something called random variables, and I am going to give you some examples here. Random variables are variables that depend on events.
So, you could probably say: if a girl enters the room, I will mark the value of the random variable as 1, and if a boy enters the room I will mark the value of the random variable as 0. Then, by counting the number of 1s I can find out how many girls have entered, and by counting how many 0s there are I can find out how many boys, and so on. This can be done quite easily. If this is done, I have a measure now, and if the events, the arrivals of the girls and boys, are random, then of course the output is also going to be random, and the random variable is indicating the kind of people who are walking in, 1 being a girl and 0 being a boy.
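A tiny sketch of this indicator idea, with a made-up arrival sequence:

```python
# Made-up arrival sequence: 1 = a girl entered, 0 = a boy entered.
arrivals = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]

girls = sum(arrivals)           # counting the 1s
boys = len(arrivals) - girls    # the rest are 0s
print("girls:", girls, "boys:", boys)
```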
So I end up with a random variable that is governed by a probability; it is influenced by some randomness, and this randomness depends on the kind of people who are walking in. I could have continuous random variables, like height, weight and so on; those are continuous. And I could also have discrete random variables, in which case I am basically counting events and looking at the count as the value of the random variable. In that case the random variable takes discrete values; it is called a discrete random variable. If you look at the tossing of a coin, that is an event that has got two outcomes, head and tail.
When I toss it a large number of times, I end up with an estimate of the probability of finding a head when a random toss is made. This is good because I am collecting a lot of data, so I can rely on it. If I toss it, say, 500 times, I might find a number pretty close to 0.5, maybe 0.501 or 0.497, something pretty close to 0.5, and I have got an estimate there. Tossing a coin can lead to more complex situations also. For example, if I toss three times: what is the chance of finding no heads? What is the chance of finding one head? What is the chance of finding two heads in the last two tosses, the first one being a tail? These are compound events, and their probabilities can also be found.

So, in fact, what I have to do here is find a method. Just as we have plus and minus and multiply and divide to play with numbers, I must have a system to play with probabilities also: we can add probabilities, we can multiply probabilities, and I need a system for that. For that I need some ground rules. These ground rules are called the postulates of probability, and they were given more than a hundred years ago. They were worked out just as the number system came more than a thousand years ago; probabilities also were looked at a hundred or two hundred years ago, when people worked on the algebra of probabilities, adding and subtracting probabilities.
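For reference, the postulates being referred to are usually stated, for a sample space S and events A and B in S, roughly as follows:

$$
P(A) \ge 0, \qquad P(S) = 1, \qquad P(A \cup B) = P(A) + P(B) \quad \text{whenever } A \cap B = \varnothing .
$$

The addition rule for mutually exclusive events used later in this lecture follows directly from the third postulate, while the multiplication rule P(A and B) = P(A) P(B) is the standard definition of independent events.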
Tossing a die: now a die, as you know, is a thing that has got six faces. So if I draw a little die here, it would look like this. The distinctive thing about a die is that it has six faces, with a certain number of dots on each side; so I could have a 3 there, a 5 here and a 2 here. These are the different faces, with different numbers on them. When I throw a die, the chance of the number read being 2 or 4 or 6 or 5 is going to be one-sixth, because the die is symmetric and, when it is tossed, it is thrown randomly. So there is no likelihood of a 3 coming up all the time; it could be any number between 1 and 6. What is the chance of my finding a 0 here? The probability of finding a 0 reading there is 0, because there is no face with no dots on it. Similarly, I cannot find the number 9 there, because it is just not in the sample space. What we have to do, then, is really define what we call the sample space, in order that we can go out and start defining probabilities.
So, we have something that we call the sample space, and again I am going to give you some examples there. Then we will discuss something called the probability distribution function: we will have a probability density function, we will have a cumulative distribution function, and we will have something called the expected value, which is really the average. We will also have something called the variance, and these are values that are realized by looking at the data that I collected when I was collecting the sample. If you collect a lot of data, it turns out that the data itself begins to take some shape. Let me give you some examples of the kind of shape data might like to take.
In some cases the data might take a shape like this, which is the normal distribution. And what is this data really? Usually you will not find a continuous curve; you will find what we call a histogram. This is what you find in reality: these bars are the histogram. If you take lots and lots of data, eventually it will begin to look like a continuous curve, and that will be the distribution that we represent. So this histogram is an approximate representation of the real distribution that is there, and this will become apparent as you collect more and more data.

So, some processes lead to this sort of symmetric output. There are other processes that lead to a different sort of output; let me just give you a picture here. The shape here would look something like this: this is the exponential distribution, a different distribution. There are many other distributions: some look like this, others have some other shape, they could be like this and so on and so forth. So there are a variety of different distributions, and these are all found in nature; that is why people came up with different types of probability distribution models, like, for example, the hypergeometric distribution, the binomial distribution, the Poisson distribution, the normal distribution and so on and so forth.
These are all useful in studying random phenomena. Whenever you collect data, it will probably come from one of these, and there is a lot of very rich theory available for each of these distributions. By exploiting that theory we can make statements about a process, whether it is described by, let us say, the binomial distribution or the exponential distribution or the normal distribution. In quality assurance, and in particular when you are going to be working in Six Sigma, you will probably have to know some of these distributions in order to make statements which others can also come back and reproduce. That is something that we would like to be able to do.
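As a small, purely illustrative sketch, here is how samples from a few of these distributions could be drawn and summarized with numpy; all the parameters are made up.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

samples = {
    "normal(mu=10, sigma=2)": rng.normal(loc=10, scale=2, size=1000),
    "exponential(mean=5)": rng.exponential(scale=5, size=1000),
    "binomial(n=20, p=0.1)": rng.binomial(n=20, p=0.1, size=1000),
    "poisson(lam=3)": rng.poisson(lam=3, size=1000),
}

# Compare each sample's mean and standard deviation with what theory predicts.
for name, x in samples.items():
    print(f"{name}: mean={x.mean():.2f}, std={x.std(ddof=1):.2f}")
```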
Then, of course, many times we have collected some data, but we do not know what the true mean of the population is. In that case we will have to do something called inferencing. Inferencing basically says: I have collected a certain amount of data, and I am going to use that data to come up with some estimate of that true mean, which is mu. I could do the same thing for the variance, for the difference between two means, or for the proportion of defective parts in production. In fact, we have many different measures that can give us pretty decent ideas about what we call defects, or the kind of problems that we are having in a particular situation.
There are some special situations where we will need to do some testing. Tests are basically mathematical procedures, methods which are utilized to make sure you really have a decent idea of what that true mean is. So I may have a quantity, for example x bar, calculated from a sample that I collected from this normal distribution. How close is this x bar to the true value mu? That statement I can make only when I apply what we call inferential statistics. There is some theory there that we will be utilizing, to be able to say with some stated confidence, for example: around x bar I can put a band between a and b, and there is a 90 percent chance that the unknown mu resides in that space. That means that if I repeat this process of calculating x bar and its band a hundred times, then in roughly 90 of those hundred repetitions the true mu is going to lie within the band. That is the idea. This is one statement of inferencing, and we do the same thing for the variance and for other parameters that are also unknown.
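Here is a minimal sketch of such an interval for the mean, a t-based 90 percent confidence interval computed on made-up data; the sample values are arbitrary.

```python
import numpy as np
from scipy import stats

# Made-up sample drawn from a process whose true mean mu is unknown to us.
x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3, 9.7, 10.6])

n = len(x)
x_bar = x.mean()
s = x.std(ddof=1)                        # sample standard deviation
t_crit = stats.t.ppf(0.95, df=n - 1)     # 90% two-sided interval uses the 0.95 quantile

half_width = t_crit * s / np.sqrt(n)
a, b = x_bar - half_width, x_bar + half_width
print(f"x_bar = {x_bar:.3f}, 90% CI for mu: ({a:.3f}, {b:.3f})")
```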
If we move along, we also have the idea of a test of hypothesis. For example, go back to the first example that we had; I am just going to bring up that diagram. Does factor A have an impact on the process? This could be posed as a hypothesis, a guess. There is a way to collect data, there is a way to change the settings of A and look at the output, and there is a way to test the data; this testing will tell us whether it is reasonable to assume that A has an impact on the process or does not have an impact on the process. The statistical procedure here is called a test of hypothesis, and that is what I have in this slide. There I set up certain conditions, and I call them the null hypothesis and the alternate hypothesis. Then I carry through a data analysis process, and that process ultimately comes back and tells me: yes, indeed, there is reason to believe that A has an impact; or no, I do not see the evidence that A has an impact, because whatever A is doing, noise is doing the same thing.
It is like Doctor Bagchi trying to say something, but perhaps not really saying anything: if there is no sound, if there is no signal that rises above the background noise, you will probably say Doctor Bagchi is quiet, he is not saying anything, there is no signal; factor A has no impact on the process. These different tests have different applications; they come as one-tailed tests or two-tailed tests, and they involve something called the type one error and the type two error and so on and so forth. As we get deeper into this subject you will find me utilizing these ideas, these concepts: I will be doing the t test sometimes, the F test sometimes, the chi-square test sometimes, and sometimes a plain and simple z test. Those we will be doing as the time comes along.
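As one hedged illustration of what such a test looks like in practice, here is a two-sample t test comparing responses at the low and high settings of a factor; the two groups of numbers are invented.

```python
from scipy import stats

# Made-up responses observed with factor A at its low and high settings.
a_low = [55, 54, 57, 53, 56, 55]
a_high = [62, 60, 64, 61, 63, 62]

# Null hypothesis: the two settings give the same mean response.
t_stat, p_value = stats.ttest_ind(a_low, a_high)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# A small p-value (say below 0.05) suggests the effect of A stands out
# above the background noise; a large one suggests it does not.
```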
Now, what is probability, and why do we study it? You see that word there, stochastic; it is a very, very important word. Stochastic means random; it is a fancy word that learned people use. Random is the word that you and I would use, but if I am speaking as a professor I might not call it random, I would call it stochastic. So a random variable is also a stochastic variable; please do not be flustered by this fancy term, it means the same thing as random. Now, we have many, many decisions, and these decisions are generally based on events that cannot be predicted exactly.
Under such conditions we still cannot escape decision making, but we are doing it in an environment that has a lot of uncertainty in it. In that case we really have to rely on the theory of probability, the theory of odds: not just the simple coin toss, but also some rather complicated situations where the events are interdependent, the events have conditionality, and so on and so forth. Many other complications come in with random events, and that is what we will have to handle in order to make sound decisions. And of course, I am basing all of this on data analysis: I will observe something or conduct some experiments, I will collect some data, and that data will be subjected to what we call data analysis. Once we have done that, we will have some idea about the randomness, about the stochasticity of the process, and then obviously our results are going to be far more reliable.
Where do you apply probability theory? Let us take a look. Well, many of you have probably done forecasting of some sort; forecasting is one place where you apply both statistics and probability. Inventory management: when you are trying to decide on the safety stock, for example, you will be using probability, and also you might be using some statistics based on past history, fluctuations of past demand and that kind of thing. Quality assurance is one area that cannot escape probability and statistics, because in real life, when you go to a real factory producing things, there are many factors that are not in control; therefore the output is variable. Once the output is variable, to try to keep it confined to values which the customer is going to accept, I have to understand this randomness, this stochasticity. For this I may use a control chart, I may use what we call sampling, I may use designed experiments, I may use mathematical modelling and so on and so forth. So a variety of techniques which are borrowed from statistics are utilized by people who are doing quality assurance, and of course Six Sigma: Six Sigma uses DOE, design of experiments, in a very big way.
Project risk management is one large area where things can foul up, and this is by Murphy's law. Whenever you have got a project, it is very possible that certain assumptions will not hold true. For example, you are going to be building a factory and you are counting on the market demand to keep rising and rising. If that does not happen, then of course your project is going to be a failure. But demand is not always a linear function; demand will have some fluctuations, it will probably begin somewhere and then begin to fluctuate and fluctuate and so on. To somehow capture this, to somehow understand this fluctuation, and on the basis of that to come up with a risk management plan: whenever I have a project going, I must do risk analysis, and risk analysis depends strongly on probability theory. There are some special methods, and we may get a glimpse of them toward the end of this course when we discuss Six Sigma projects; we might come back and discuss a little bit about project risk management then.

Investment portfolio design: just this afternoon I was speaking with a senior professor, and he said he has got about 50,000 rupees in his pocket and would like to design a stock portfolio that protects his investment and at the same time generates some good returns. Now, his task is this: the market place consists of all kinds of stocks; they have different types of returns, different types of fluctuations, variations and so on. What you would like to be able to do is somehow balance this risk and return, and for that you would have to optimize the composition of the portfolio. So that is another area where we would be using probability and also statistics; those are utilized there as well. Business simulation is one large area, and so is market research; these are large areas where we use a lot of statistics. Game theory is a big area where strategies are found, and it is actually an area that uses probability in a very big way.
uses probability in a very big way. And I have already mentioned six sigma uses
in it is one of it is steps and I am going to write that again it is called DMAIC. DMAIC
define the problem the issue that you want to tackle, measure, analyse, improve and control.
It is the improvement step where I begin statistics and probability. I actually begin what we
call design of experiments? So, D O E is used right here and D O E comes on statistics,
statistics gives us D O E. So, if I am trying to use the DMAIC framework, I will not be
able to escape statistics. I will have to use a weinmann form or the other.
I will be discussing some of these things: I will be discussing outcomes, I will be discussing random events and so on, and I will be elaborating on them as we go deeper into this in the afternoon. Well, let us try to get an idea of the various things we will be discussing. I will be discussing what exactly probability is; what some of the basic rules for combining probabilities are, for adding them, for multiplying them and so on; what conditional events are, what compound events are, and how I work out the probabilities for compound or conditional events. We will also get a glimpse of the distributions, we will try to get an introduction there, and we might also get a glimpse of what we call hypothesis testing; that also we will try to attempt. That is a lot of things, and we will see what we can get to.
To begin with, what exactly is probability? We are reaching the end of this hour, and I would like to make sure you have some idea of what probability is before we end here. It is the study of the chance associated with the occurrence of events, and these events are random events. It is the study of chance; that is what probability theory is. What kind of probabilities are we talking about? We may talk about probability as the chance of rain when it is cloudy: people who have some experience with monsoon clouds may look at the colour of the cloud and, based on that, say there is a 50 percent chance it is going to rain, or an 80 percent chance, or a 5 percent chance. This sort of statement is not based on really hard data; it is based on subjectivity, and many times this is the only basis we have for working out probabilities. Sometimes, instead, we construct a theoretical model, and on the basis of the theoretical model we make predictions about the probability of complex events. If I am not able to do that, I will probably have to sit down and collect a lot of data, or run some experiments, and from that figure out the probability of certain events.
So, we try to estimate probabilities. First of all, what are probabilities? Probabilities are basically a measure of the chance of any event that can occur, and we are interested in the outcome of that experiment. There are two ways I can measure probability: I can measure probability by theoretical consideration, and I will show you how to do that, or I can measure probability by relative frequency, which means calculating the odds based on hard data; that also I can do.
Rolling dice: now here I do not really need to toss a lot of coins or a lot of dice. Here I can directly say, based on my experience and on some basic principles of probability theory, something about this question: if I roll a die two times, what is the chance that the sum of the two numbers I see, the first number and the second number, is going to be equal to 4? That I can work out using classical theory. So, how could I get the number 4? I could have 1 plus 3, or 2 plus 2, or 3 plus 1. Of course, I cannot have any other pair of numbers that adds to a sum of 4, because neither die will give me a 0; if I had 0 plus 4, of course, we would have another way to make 4, but that is not possible here. I have got 1 to 6 on the first die and 1 to 6 on the second die. So, to get a sum of 4, I can have 1 plus 3, or 2 plus 2, or 3 plus 1; these are the three different ways. Now, the chance of finding a 3 on the first die is 1 by 6, and the chance of finding a 1 on the second die is also 1 by 6. Therefore, the chance of the two of them occurring together, giving me a sum of 4 through 3 plus 1, is 1 by 6 multiplied by 1 by 6. That is a simple calculation I can do quite easily, and it turns out to be 1 by 36.
So, one way to find a sum of 4 is the event 3 plus 1 on the two different dice, and its probability is the probability of 3 and 1 occurring together. Notice here I have used a little bit of algebra, and this algebra comes from this notation: what I have really calculated is the probability of the event "3 and 1". I wrote it loosely at first, but really it should be written like this, and the probability of this event, 3 and 1, is 1 by 6 times 1 by 6, which is 1 divided by 36. That is the probability of my finding the sum 4 through 3 plus 1. I can do the same thing for 1 plus 3, and the same thing for 2 plus 2; these are the different ways by which I can construct my event, the sum of the numbers on the two dice being equal to 4. Adding them up, the probability of the sum being 4 is 3 out of 36, which is 1 by 12. So here I am using some classical theory. I could do the same thing for a coin toss: I could define an event in terms of a certain number of heads in a number of trials. For example, if I toss the coin ten times, what is the chance of my finding exactly three heads and seven tails? I could work this out using classical theory.
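Both of these classical calculations can be checked by brute-force enumeration; here is a small sketch doing exactly that.

```python
from itertools import product
from math import comb

# Two dice: enumerate all 36 equally likely outcomes and count sums of 4.
outcomes = list(product(range(1, 7), repeat=2))
favourable = [o for o in outcomes if sum(o) == 4]        # (1,3), (2,2), (3,1)
print("P(sum = 4) =", len(favourable), "/", len(outcomes))   # 3 / 36 = 1/12

# Ten coin tosses: probability of exactly three heads, by classical theory.
p_three_heads = comb(10, 3) * (0.5 ** 10)
print("P(exactly 3 heads in 10 tosses) =", round(p_three_heads, 4))   # about 0.1172
```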
Let us now very quickly define what we call the sample space, events and random variables. The sample space is the set that consists of all the outcomes. For example, when I am tossing a coin, there are only two outcomes possible, head and tail, so the sample space here consists of head and tail only. These, by the way, are events: observing a head or observing a tail. The coin coming up heads is one event, and coming up tails is another event. And these two events, as I will explain later, are disjoint, mutually exclusive; there is nothing common between them. If I have an event which is random, I can talk about the probability of that event: for example, in a coin toss, the probability of finding a head, or the probability of finding a tail, is going to be 0.5. Now notice here I have got this sample space, which in the diagram I have marked as S. The sample space consists of all the outcomes, so in the head and tail case I will have head sitting there and tail sitting there inside the sample space, and nothing else. Now, I could define a random variable based on that: I could call it X, and I could map head to 1 and tail to 0.
So I will have a little mapping done here: head will be mapped to 1, and tail will be mapped to 0, and these are the values of the random variable. These are events, and from the events I ended up defining this random variable. And this is what I will be doing as we move along: we will be defining many different events, and we will be defining the associated random variables, and we will end up with real numbers, 1 or 0 or whatever other value it turns out to be.
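As a closing sketch, here is that mapping in code, with a simulated sequence of tosses; it also shows the relative-frequency estimate of P(head) coming out close to 0.5, as discussed earlier. Everything here is illustrative.

```python
import random

random.seed(7)

# Sample space S = {"head", "tail"}; random variable X maps head -> 1, tail -> 0.
def X(outcome):
    return 1 if outcome == "head" else 0

tosses = [random.choice(["head", "tail"]) for _ in range(500)]
values = [X(o) for o in tosses]

# Relative-frequency estimate of P(head); should be close to 0.5.
print("estimated P(head) =", sum(values) / len(values))
```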
be. I am going to continue this talk as we go into the next session. Thank you very much.