[ Music ]
>> Our next speaker is Tatiana Trejos who is
from the International Forensic Research Institute
at Florida International University in Miami.
And she is going to talk about Evaluation of the Performance
of Different Match Criteria for the Comparison
of Elemental Composition of Glass by micro-XRF, ICP-MS,
Laser Ablation ICP-MS, and Laser Induced Breakdown Spectroscopy.
[ Pause ]
>> Good morning everyone.
First of all, I would like to thank all the organizers
for such a wonderful workshop and for the opportunity
to come here and present some of this data.
I'm gonna be talking about the evaluation of the performance
of match criteria for the comparison
of the elemental composition of glass by different techniques.
So in order to evaluate the effect of the match criteria
on glass comparisons, the Elemental Analysis Working Group
decided to conduct four inter-laboratory studies,
or Round Robins. And even though we met in advance to look
at the overall design of the Round Robins and the aims
for each of these tests, all the samples were submitted
to the participants as blind tests to avoid any bias
in the results and the reporting of the results.
So for the first Round Robin we focused on the evaluation
of the analytical performance of the methods
to see how we compared to the others and also to evaluate
and assess the match criteria that each laboratory was using
at that moment in their agencies.
For the second Round Robin and the succeeding Round Robins,
we designed them based on the discussions that we had
in the group and the experience that we gained
from the previous Round Robin studies.
So for the second round we decided to make a larger set
of standard reference materials
to further evaluate the analytical performance
of the methods and also include some samples for comparisons
to evaluate type 1 and type 2 errors.
The third Round Robin was more focused on the evaluation
of false inclusions while the fourth Round Robin was more
focused on the evaluation of false exclusions.
So there were two main questions that we wanted to answer
with these Round Robin studies: one dealing
with the analytical performance of the methods,
and the second one related to the match criteria.
So in terms of the analytical performance, we wanted
to know how each technique performs versus the others
in terms of precision, accuracy, sensitivity, limitations,
interferences, and discrimination capability,
and how consistently we can get results among the participants.
Also something of interest in our group was to work
towards the standardization of the methods,
and we are very close to submitting two methods
for consideration to ASTM as a product
of this group, one related to micro-XRF and the other one
for Laser Ablation ICP-MS analysis of glass.
In terms of match criteria, what we wanted to do is
to evaluate the effect of sampling strategies as well
as the selection of the match criteria on the error rates
for elemental comparison of glass.
And of course, one of the interests of the group was
to take a look at the interpretation
of the significance of an association when one is found.
So these graphs here represent the number
of participant laboratories that we got
for each of the Round Robins.
We got about 14 to 18 different laboratories
that participated in each of the Round Robins.
And as you can see, the majority of the data came
from ICP users and XRF analysts.
ICP included data from digestion followed by ICP-MS analysis,
Laser Ablation ICP-MS,
as well as Laser Ablation coupled to ICP-OES.
And one of the important things about the number of participants
that we got in these Round Robins is that we gathered enough data
from different techniques and different methods,
taken by different analysts at different locations,
with different instruments, brands, and configurations.
So we got enough data to do inter-
and intra-laboratory variation studies.
So this is an example of the results
for the second Round Robin,
where we were comparing the analytical performance.
This is an example of lithium present at about 5 ppm
in the standard reference material 1831.
And this is the comparison of the results obtained
by the participants among the ICP users.
As you can see, each laboratory was able
to compare their individual precision and accuracy
versus the study mean and the certified value.
We got excellent agreement between the participants
in terms of precision and accuracy,
better than 10 percent for the majority of the elements.
And one important thing is that these studies led
to the standardization of the methods, tweaking the methods,
improving them, finding some outliers, and that was important
as a validation process also for the members of the group.
The second Round Robin also included three samples
that we submitted for a comparison
to simulate casework.
Those samples were architectural float glass
that was manufactured in the Cardinal plant.
K1 and Q1 share a common origin and were manufactured
in 2001, while Q2 originated
from a different source, manufactured
in the same plant but years apart.
And before I present the results for the different match criteria
that we evaluated, I would
like to present really briefly a description of how
the data are displayed and how we call an association.
So for XRF or micro-XRF, usually the first thing
the participants or the examiners do is to look
at the spectra of the Ks and the Qs and compare to see
if they can find any significant differences in the spectra.
Then, once they have done the comparison
of the spectral overlay, they can also take the intensities
and compute ratios of the intensities to look
at the data in a numerical way.
In the protocol of the analysis that we submitted
with the inter-laboratory test, we asked
that they report at least 6 to 8 ratios for the samples,
and we requested that they take at least 9 to 10 measurements
of the K before comparing with the measurements of the Q samples.
The LIBS data look very similar to the micro-XRF,
so I didn't have a slide specifically for LIBS,
but they also have spectra and they are also gonna have ratios
of intensities of the elements that they can be looking at.
In the case of ICP-MS data we get quantitative information.
So what we have is a concentration
of elements present in very low concentrations, in ppm.
We look at about 16 to 18 different elements present
in the low ppm range for trace and minor elements.
And same thing, we requested at least 10 measurements
of the known samples and at least 3 measurements of each
of the questioned fragments.
So once we have the numerical data
and we have selected a match criterion, if the K
and the Q are significantly different in at least one
out of those 18 elements, or one
out of those 8 ratios, then we can exclude the samples
from having come from the same source.
If we fail to find any differences in any
of those elements, then we call that an association.
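The decision rule just described could be sketched in code as follows. This is a hypothetical illustration with invented concentrations, not any participant laboratory's actual software; the ±4 standard deviation interval used here is one of the criteria discussed later in the talk.

```python
import statistics

def compare_k_vs_q(k_measurements, q_measurements, n_sd=4):
    """Compare known (K) and questioned (Q) glass measurements element
    by element. k_measurements maps element -> list of replicate values
    (e.g. >= 9 replicates); q_measurements maps element -> replicates.
    Returns 'exclusion' if at least one element mean falls outside the
    K mean +/- n_sd * SD interval, otherwise 'association'."""
    for element, k_values in k_measurements.items():
        k_mean = statistics.mean(k_values)
        k_sd = statistics.stdev(k_values)
        lo, hi = k_mean - n_sd * k_sd, k_mean + n_sd * k_sd
        q_mean = statistics.mean(q_measurements[element])
        if not (lo <= q_mean <= hi):
            return "exclusion"   # one differing element is enough
    return "association"         # no significant differences found

# Toy data (invented concentrations in ppm, for illustration only)
k = {"Li": [5.1, 5.0, 5.2, 4.9, 5.1, 5.0, 5.2, 5.1, 5.0],
     "Sr": [45.0, 44.8, 45.3, 45.1, 44.9, 45.2, 45.0, 45.1, 44.9]}
q_same = {"Li": [5.0, 5.1, 5.1], "Sr": [45.1, 44.9, 45.0]}
q_diff = {"Li": [7.9, 8.0, 8.1], "Sr": [45.0, 45.1, 44.9]}

print(compare_k_vs_q(k, q_same))  # association
print(compare_k_vs_q(k, q_diff))  # exclusion
```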
So with that said, I'm gonna present here a table
of representative results for the comparison of those samples
for the second Round Robin, as reported
by each laboratory using their own match criteria.
Something that I want you to note here is
that the match criteria that were reported
for the second Round Robin varied a lot
between participants.
Everybody was using their own match criteria,
the different match criteria in use in their agency.
So we have t-tests at 95 percent confidence, range overlap,
2, 3, or 4 standard deviations, or a spectral overlay
as the match criterion of choice.
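Range overlap, one of the criteria just listed, simply asks whether the spread of the K replicates overlaps the spread of the Q replicates for every element or ratio. A minimal sketch with invented intensity ratios (a hypothetical helper, not any participant's code):

```python
def ranges_overlap(k_values, q_values):
    """True if the [min, max] range of the K replicates overlaps the
    [min, max] range of the Q replicates for one element or ratio."""
    return min(k_values) <= max(q_values) and min(q_values) <= max(k_values)

# Toy intensity ratios (invented numbers, illustration only)
k_ratio = [1.20, 1.24, 1.22, 1.19, 1.23, 1.21, 1.22, 1.20, 1.23]
q_ratio_same = [1.21, 1.22, 1.20]
q_ratio_diff = [1.40, 1.42, 1.41]

print(ranges_overlap(k_ratio, q_ratio_same))  # True  -> no exclusion here
print(ranges_overlap(k_ratio, q_ratio_diff))  # False -> exclusion on this ratio
```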
And as you can see here,
in the second Round Robin we got 100 percent correct association
of the samples that originated from the same source,
and as well, all the participants were able
to correctly discriminate the samples that came
from different sources.
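For the laboratories that used a t-test criterion, the per-element comparison could look roughly like this. The numbers are invented, and the critical value is passed in rather than computed, since it depends on the chosen confidence level and the degrees of freedom; 2.2 is used here as an approximate two-tailed 95 percent value for these sample sizes.

```python
import statistics

def welch_t(k_values, q_values):
    """Welch's t statistic comparing the K and Q means of one element,
    allowing unequal variances and unequal replicate counts."""
    mk, mq = statistics.mean(k_values), statistics.mean(q_values)
    vk, vq = statistics.variance(k_values), statistics.variance(q_values)
    se = (vk / len(k_values) + vq / len(q_values)) ** 0.5
    return (mk - mq) / se

# Invented concentrations (ppm); t_crit ~ 2.2 assumed for ~95% two-tailed
k = [5.1, 5.0, 5.2, 4.9, 5.1, 5.0, 5.2, 5.1, 5.0]
q = [5.0, 5.1, 5.1]
t_crit = 2.2
print(abs(welch_t(k, q)) > t_crit)  # False -> this element does not exclude
```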
>> So due to the encouraging results that we got
in the second Round Robin, we tried
to make the test more challenging to the participants each time.
So for the third Round Robin, what we did is
that we asked each participant
to compare the elemental analysis of the sample
that we submitted as K1 versus all the questioned items,
and this particular test had three questioned items,
and also a second K with all the questioned items.
And the samples that we submitted
for the analysis were manufactured
in the same plant, the Cardinal plant.
They were manufactured years apart,
months apart, and weeks apart.
So we wanted to know what the discrimination capability
of the methods was when those time intervals
in the same plant come closer.
So this is an example
of the data from the pre-distribution analysis.
This was taken with the Laser Ablation ICP-MS.
And as you can see, for example here,
highlighted in yellow, as the samples get closer
in time the elemental composition
looks very, very similar.
And I highlighted in red the elements that were responsible
for the differences, or the major differences, between the samples.
So you can see that some of them are present
in very low concentrations,
so not all the techniques may be able
to detect those differences.
That's what we expected in advance.
So this is a summary of the comparison results
for those samples that were manufactured
at least 2 years apart, as reported by each
of the participants, again using their own match criteria.
As you can see, all the participants were able
to discriminate the samples that originated
from the different sources, regardless of the method
that they were applying and the match criteria
that they selected.
The only two exceptions are here: one of the LIBS laboratories
that was using their own mathematical algorithm,
and after this they fine-tuned the method
and found some errors in the code.
And we also have an inconclusive result for one
of the acid digestion ICP-MS participants,
because they had a problem with one of the samples
and they didn't have enough sample to repeat the analysis,
so they called this inconclusive.
But other than that, we were able
to discriminate all the samples correctly.
When the samples were closer in time of manufacturing,
only the more sensitive methods
like ICP-MS and LIBS were able to discriminate the samples
that were manufactured a few weeks or months apart.
So the summary for this Round Robin
is that it allowed the study of type 2 errors
in samples that were very similar in composition.
We took this as the worst-case scenario:
we took from our database the samples that were closest
in composition and in time.
However, all the techniques were able to differentiate samples
that were manufactured in the same plant
2 or 3 months apart, regardless of the match criteria
that they selected for the analysis.
The samples that were very, very similar in composition
and were manufactured only two weeks apart were only
differentiated by the methods that were more sensitive,
like the ICP and some of the LIBS laboratories.
So for the fourth Round Robin,
we decided to study the type 1 errors a little bit more.
We collected samples from our database from the Pilkington plant.
Q1 was manufactured in February of 2010,
and all the other samples that we submitted
for comparison originated from the same source, manufactured
in the same plant just two weeks apart.
This is an example of the pre-distribution analysis
by Laser Ablation ICP-MS, and something that I want you
to note here is that in this particular plant,
for these particular samples,
even though these samples were manufactured only two weeks
apart, you can note that there are significant differences
in the elemental composition.
So in comparison to the previous one, we were expecting most
of the laboratories to be able to find these differences
in the samples even though they were manufactured only two
weeks apart.
And this is the summary of the results as reported
by the XRF participants using their own match criteria.
Something that I want you to note here is
that by this fourth Round Robin you can see how the participants
were having much better agreement in the match criteria
that they were selecting for doing their comparisons.
Using these match criteria, all the participants were able
to correctly discriminate the samples
that were manufactured two weeks apart, and all
of them were able to correctly associate the samples
that originated from the same source.
When the measurements were taken
with more sensitive methods, still all the participants were
able to discriminate the samples
that were manufactured two weeks apart.
However, we started seeing some type 1 errors
in some of the fragments.
And something that I want you to note here is
that still the ICP users were using a large variety
of match criteria for the fourth Round Robin.
In this particular case, this raised a flag
to the participants that we may need
to use a wider match criterion for ICP-MS data, due
to the nature of the analytical technique:
it is very sensitive and the precision is very tight
between measurements, and that could contribute
to the increased rate of type 1 errors.
For the LIBS participants we got similar results,
where we have some type 1 errors.
However, we also have some type 2 errors reported,
which we attributed to the fact that, in comparison
to the older methods, LIBS still lacked standardization
among the participant laboratories, so there was a lot of variation
in the variables and even in the ratios
that were used for comparisons.
Because the rate of type 1 errors that we got
in the fourth Round Robin for the ICP data was very atypical
of what we have observed over the years, based on our database
and studies that have been conducted in the past,
we decided to take a closer look
to see what could have been the cause of those errors.
So here's a little bit of history
about the source of the samples.
They were taken
from a manufacturing plant, the Pilkington plant,
that was making a transition in the iron content
at the time, due to customer requirements.
You can see that over time they were going
from low concentration of iron to high iron and so on.
They reported an important transition in this area,
on these dates in 2010, and the samples that were taken
to evaluate type 1 errors were sampled just four days before
this big transition in the plant.
So the group was interested
in looking more closely at the homogeneity of those samples
to see how that compares to other samples
that we have in the laboratory.
So we conducted a homogeneity study
where we took all those Pilkington samples
that were included in the fourth Round Robin and also a pane
of glass from the Cardinal plant that was used
in the third Round Robin.
We conducted the homogeneity study to evaluate the variation
within a pane, so we took five to seven fragments per set
and did a comparison of the elemental composition.
But we also looked at the spatial variation within the fragment,
so we did comparisons of the elemental composition
on the float side versus the non-float side and also
in different areas of the cross section.
And this is an example of what we observed
for the Cardinal plant.
This is just the iron concentration here,
and you can see the mean value and the standard deviation
for each of the measurements.
We didn't find significant differences across the section
of the glass in the Cardinal samples.
However, you can note here a significant difference
in the concentration of iron across the glass, not only
for the float versus the non-float side but also
within the cross section.
For this particular Round Robin, the participants requested
that the samples be as small as possible,
to be representative of what they have in real cases.
So we submitted samples that were very small
and irregular in shape, so chances are that the samples
that were submitted as Q samples originated
from different areas of the cross section,
and that may explain why we got such a high rate
of type 1 errors in the fourth Round Robin.
So after that, we requested each participant
to take their own data and apply all these match criteria,
to compare the error rates that we can get
under different circumstances.
>> So we requested range overlap, t-tests at different p-values,
t-tests with Bonferroni correction, Hotelling's T-square
for some of the sets, and then plus or minus 2 all the way
to 6 standard deviations with
and without a minimum 3 percent RSD.
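The standard-deviation criteria with and without the minimum RSD floor could be sketched like this. This is a hypothetical illustration with invented numbers; the 3 percent floor replaces the measured standard deviation only when the measured precision is tighter than 3 percent RSD, which is the situation the talk describes for LA-ICP-MS.

```python
import statistics

def sd_interval_match(k_values, q_mean, n_sd=4, min_rsd=None):
    """Return True (no exclusion on this element) if q_mean lies inside
    mean(K) +/- n_sd * SD(K). If min_rsd is given (e.g. 0.03 for 3%),
    the SD is floored at min_rsd * mean(K), so that very tight measured
    precision does not shrink the interval unrealistically."""
    k_mean = statistics.mean(k_values)
    k_sd = statistics.stdev(k_values)
    if min_rsd is not None:
        k_sd = max(k_sd, min_rsd * k_mean)
    return abs(q_mean - k_mean) <= n_sd * k_sd

# Toy example: LA-ICP-MS-like precision of roughly 0.25% RSD (invented)
k = [100.0, 100.5, 99.8, 100.2, 99.9, 100.1, 100.3, 99.7, 100.0]
q_mean = 102.5  # about 2.5% away from the K mean

print(sd_interval_match(k, q_mean, n_sd=4))                # False: 4s interval is very narrow
print(sd_interval_match(k, q_mean, n_sd=4, min_rsd=0.03))  # True: floored interval covers it
```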
So this is a summary of the results for the micro-XRF
for the three Round Robin studies.
As you can see, this is for type 2 errors, and they were able
to correctly discriminate all the samples submitted
for Round Robin 2 and Round Robin 4 regardless
of the match criteria that were employed.
However, you can see that there are more type 2 errors
in Round Robin 3, due to the nature of the test.
Some techniques perform better than others.
However, you have to notice that the samples
that produced these errors were samples
that were manufactured only 2 weeks or 3 months apart,
so they have very similar elemental composition.
In terms of type 1 errors, in most cases the failures
to associate the samples were obtained for criteria
like the t-test; range overlap, three standard deviations,
spectral overlay, and Hotelling's T-square performed much better
than the other comparison methods
in terms of type 1 error.
For the ICP methods, in terms
of type 2 errors, again very good ability to discriminate samples
in Round Robin 2 and Round Robin 3.
We got some percentage of type 2 errors in Round Robin 3;
however, all these came from samples
that were manufactured only two weeks apart,
for some laboratories that were not able
to discriminate the samples.
In terms of type 1 errors, if you look here
at Round Robin 4, you can see
that there is a higher type 1 error rate, associated mainly
with the heterogeneity of that sample,
as I previously described, particularly
for this transition in the plant.
Nonetheless, you can see here that still,
for the second Round Robin, which used samples
from that Cardinal plant,
we got some type 1 errors depending
on the match criteria that were selected.
Four standard deviations, and four standard deviations
with a minimum 3 percent RSD, provided better rates
for type 1 error.
But I want you to take a closer look at Round Robin 2:
using four standard deviations, we still have
about a 26 percent rate for type 1 error.
Five of 19 of the comparisons were responsible
for that 26 percent type 1 error in Round Robin 2.
However, these errors came from 2 out of 7 laboratories, and only
in one element per laboratory.
So these are the examples.
This is one laboratory's comparison of the K
versus all the Q fragments, and another laboratory's comparison
of the K with the Q fragments; only magnesium in one
and only potassium in the other were discriminated,
or excluded, in the sample.
And I want you to note here, when we use four
standard deviations, that the samples are very close
in composition; however, because of the excellent precision
that is typically observed
in Laser Ablation ICP-MS measurements, these tiny
or small differences were responsible
for excluding those samples on only one out of 18 elements.
So one of the things that was studied and presented
in the group is that, due to the very tight precision that we had
in ICP measurements, we decided
that we can use four standard deviations,
4s, as the criterion.
But instead of using the relative standard deviation
of the measurements, we can fix that value at 3 percent
when the precision is smaller than that.
And this is a method that has been in use by the CFS
in Canada and the BKA.
They recently reported, in 2011 in the Journal
of Analytical Atomic Spectrometry, all the
fundamentals behind that.
They performed a very nice study to evaluate the different type 1
and type 2 errors under different circumstances.
And they found that this criterion provided the lowest
number of type 1 and type 2 errors.
So we also evaluated this match criterion for our Round Robins
in the case of ICP data.
Even though this match criterion may look a little bit wide
compared to what we have typically used in the past, we noticed
that when we used this match criterion we reduced the type 1
errors without really sacrificing the type 2 errors.
And the reason for that is that, if you look here
at this graph, it represents the different elements,
and each data point represents the concentration
and standard deviation for those measurements.
When the samples originate from different sources,
like in the case of the orange trend versus the blue
and the green ones, they differ not only by one element
but by many elements, and by many standard deviations.
So if those two samples came from different sources, even
if we widen the match criteria a little bit,
we'd still be able to find those differences.
So in terms of recommendations learned from this study:
in terms of sampling, what we recommend is to take as many
measurements as practical.
A minimum of nine measurements
for the K samples is recommended to really get a representation
of the variation of the elemental composition
of the sample, taking
into account the heterogeneity of the samples.
In the case of XRF data, appropriate sampling should also
account for differences in size of the fragments
and different geometries.
In terms of quality assurance, and Kristine gave a talk
about that yesterday as well in much more detail,
what we recommend is to use an evaluation or a control standard
to evaluate precision and accuracy on a daily basis
in the laboratory, and one easy way of doing
that is measuring a standard reference material like 1831.
Also, conduct a study in your laboratories
to evaluate the method detection limits,
method quantification limits, precision, and accuracy,
so that you can know when to call a peak a peak and when
to use those for comparisons,
and Troy did an excellent job yesterday describing all the
ideas behind that.
In terms of ICP, what we learned is that we need
to open up the match criteria a little bit,
make them a little bit wider.
Four standard deviations, or four standard deviations
with a minimum three percent RSD, produced the lowest amount
of type 1 and type 2 errors.
For XRF data, a spectral overlay seems to perform well,
and that's one of the techniques preferred
by XRF users; also, intensity ratios can be used
for the comparisons with the match criteria of range overlap
or three standard deviations, which have been shown
to perform well for XRF data.
Hotelling's T-square, which is a multivariate t-test, also
performs well for the XRF data, so it can be considered as well
as an alternative match criterion
for elemental composition comparisons.
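Hotelling's T-square compares the K and Q mean vectors across several elements at once, accounting for correlations between them. A minimal two-element sketch in pure Python (a hypothetical illustration with invented numbers; real implementations would use more elements and compare the statistic against an F-distribution critical value, which is omitted here):

```python
def hotelling_t2_2d(k_pairs, q_pairs):
    """Two-sample Hotelling's T^2 statistic for 2-element measurements.
    k_pairs/q_pairs are lists of (x, y) tuples. Uses the pooled
    covariance matrix and its explicit 2x2 inverse."""
    def mean_vec(pairs):
        n = len(pairs)
        return (sum(p[0] for p in pairs) / n, sum(p[1] for p in pairs) / n)

    def cov(pairs, m):
        n = len(pairs)
        sxx = sum((p[0] - m[0]) ** 2 for p in pairs) / (n - 1)
        syy = sum((p[1] - m[1]) ** 2 for p in pairs) / (n - 1)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in pairs) / (n - 1)
        return sxx, syy, sxy

    nk, nq = len(k_pairs), len(q_pairs)
    mk, mq = mean_vec(k_pairs), mean_vec(q_pairs)
    ck, cq = cov(k_pairs, mk), cov(q_pairs, mq)
    # Pooled covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = ((nk - 1) * ck[0] + (nq - 1) * cq[0]) / (nk + nq - 2)
    syy = ((nk - 1) * ck[1] + (nq - 1) * cq[1]) / (nk + nq - 2)
    sxy = ((nk - 1) * ck[2] + (nq - 1) * cq[2]) / (nk + nq - 2)
    det = sxx * syy - sxy * sxy
    dx, dy = mk[0] - mq[0], mk[1] - mq[1]
    # d' * S^-1 * d using the explicit 2x2 inverse of S
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return (nk * nq) / (nk + nq) * quad

# Toy 2-element data (e.g. two element concentrations), invented numbers
k = [(45.0, 30.1), (45.2, 30.0), (44.9, 29.9), (45.1, 30.2), (45.0, 30.0)]
q_close = [(45.1, 30.0), (44.9, 30.1), (45.0, 29.9)]
q_far = [(47.5, 32.4), (47.6, 32.5), (47.4, 32.3)]

# A distant source gives a much larger T^2 than a same-source fragment
print(hotelling_t2_2d(k, q_close) < hotelling_t2_2d(k, q_far))  # True
```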
So finally, in terms of interpretation,
the take-home message of this study is that glass samples
that are manufactured in different plants, or even
in the same plant at very short time intervals,
weeks or months depending on the technique and the variation
of the plant, are clearly differentiated
by the methods that were evaluated.
And therefore we think that we can use this statement
as a starting point in adding significance to the evaluation
of the elemental composition match criteria
for the comparison of glasses.
I would like to thank the NIJ for funding this grant
and of course all the members
of the Elemental Analysis Working Group,
and particularly Dr. Robert Koons, who helped a lot
with the data analysis
and discussions relating to match criteria.
Thank you.
[ Applause ]
>> I think we really had three very interesting talks there.
We've run a little bit late, but I thought the importance
of these talks was worth giving them their allotted time.
Let me start by asking: are there any questions
from the audience? There are microphones here,
and I guess two microphones out there;
if you have a question you might step up.
[ Pause ]
>> This question is for Megan.
What elemental analyses--
what elements were selected
for the copper wires for differentiation?
>> I believe there were multiple elements, I think about 10.
I can't remember all of them, but I know that molybdenum,
vanadium, titanium, and iron
were included in that study.
>> Other questions?
>> It seemed to me that homogeneity and heterogeneity
of the samples was an important consideration for all
of your, you know, studies.
And particularly when you're aiming a laser at a small spot
on a larger sample, the homogeneity of the analysis
across the surface is of importance and of consideration.
Would you like to start by saying something more about,
you know, homogeneity in your work, and homogeneity of samples
in your work with respect to SERS?
>> Yeah, I can-- I can say something about as far as--
the tattoo was a very good example of heterogeneity sample.
And the Raman microscope itself, that example was done
with normal disperse of Raman not with SERS itself.
And so in that case because it was a solid sample we're able
to focusing on an area and get scattering
from essentially the diameter of the laser which is very small.
When it comes to SERS work however if we were to then take
that solid sample and add silver anything that would be soluble
that would be able to go on to the particle would--
could be on the particles.
So you could get mixtures and get multiple components.
So separation methods are important.
However, with that said, with SERS there are some compounds
that are much more SERS active than others.
Things with long chains don't give very good SERS spectra
things with-- with more ring groups are gonna give you a much
stronger signal.
So there is that, that aspect that can help separate some sort
of signal and be able to identify what's there.
>> Megan, let me ask: you're using a laser to focus
and do laser ablation inductively coupled plasma analysis,
and again, you're focusing on a small part
of the sample, and you've looked at variations
across some of your materials.
What kind of percent variations do you see?
Do you ever see a dramatic change in the signal
at some point on the sample?
>> Actually, yes, we do see some dramatic changes, especially
around the edges of the sample or parts
that have been intentionally contaminated.
As far as the rest of the sample,
we don't see much range-- sorry, I just blanked.
We don't see much range,
which is nice; it's an advantage of ICP.
>> Obviously we're interested in reproducibility of results.
But, you know, sometimes with some of these probe methods,
the identification of an additional material
on the sample might be of probative value too, right?
>> Yes, that's correct.
>> To Tania, okay, so at the end you said nine measurements
or more if possible, and sample three fragments
of glass if possible?
Now, we had some discussions in our statistics workshop
on Monday, and my joke was--
this is the question that statisticians hate the worst:
how many measurements should I take?
And their usual answer is "take more,"
because that's always better.
But I think that your comment about taking samples
from multiple fragments also speaks
to heterogeneity across fragments.
And in your data sets,
how do the results compare taking measurements
across different fragments, or did you look
at that aspect of the data?
>> We also conducted some heterogeneity studies
that we didn't discuss due to time constraints.
But we did some comparison of the heterogeneity of containers
versus float glass, for example.
And containers were typically more heterogeneous,
so what we recommend in that case of containers is to take
more measurements and multiple fragments. It's always better
to take a single measurement from multiple fragments than to take,
like, three fragments and then three measurements
on each of the fragments.
If you have more than three fragments,
which is usually the case,
what we recommend is to take as many measurements
as possible, minimum nine,
and if possible one measurement per fragment.
So that we can have an idea of the variation, a representation
of the variation of the heterogeneity
in the sample, before we do any comparison with the Q samples,
which are usually more limited in size.
>> Are there other questions from the audience?
Yes. Please step up.
I believe this might be a statistician?
>> Yes, but it's not a statistical question.
So, I'm interested in your study of the heterogeneity
of the plate glass: did you also sample sideways?
You know, did you have a big plate
and take different areas on the big plate?
>> We have done heterogeneity studies
in different manufacturing plants at FIU.
For these particular sets we had a limited size of samples.
The panes were about this big, so the heterogeneity was
not on a big scale.
But we have sampled big panes
of architectural glass collected from the plants
at different time intervals, from different parts of the ribbon,
and that has been reported.
>> Wait! Stay right there George.
[ Laughter ]
>> I have a question for Tatiana, but
perhaps you can help me answer it.
You know, so you're using some statistical criteria
to set match criteria--
to make match judgments about samples.
And particularly for the hypothesis tests,
the hypothesis tests are about the means of the measurements.
And so it's possible to say there is a difference
in the means even though the measurements
in the two samples overlap
with one another to some extent, so that it would be possible
for a measurement from one chemical sample to actually be closer
to the measurements of another chemical sample despite the fact
that the hypothesis test says they're
at least statistically different--
statistically different at the means.
I don't know, does anybody have any experience with talking
to lawyers and testifying in court as to the results
of such hypothesis tests? That's why I wanted you to stay
up there, George, and, you know, mention something about that.
But do you have any response to that?
It's a statistical issue.
[ Laughter ]
>> I know that's one of the main problems that we all deal
with when we have to present the data
to a jury: how to explain that in lay terms.
And that's why, when we evaluate the match criteria, there is no
perfect match criterion; there is always going to be a compromise
between type 1 and type 2 errors.
So we have to try to make the best decision based
on the data, and then try, as a community,
to work a little bit more on the language: what do we say
about it, and how do we present that in court
to make it understandable.
So I think that we as a community can work on that.
First select the match criterion, no matter if it is difficult
to explain, and then worry about how we are gonna present
that in an easy way for the jury to understand.
>> That was a good answer.
Have you testified before?
[ Laughter ]
>> Anyway, I think I'd like to thank all the speakers this
morning; they were all great,
and they'll be available for questions.
We need to move on to our break or we won't have one.
Thanks very much.
[ Applause ]