RTOG - Radiation Therapy Oncology Group Bioinformatics - Dr. Ying Xiao

>> Good morning. I'd like first to thank the organizers for giving me this opportunity to share with all of you the bioinformatics challenges that we face at RTOG and the projects that we have undertaken to try to resolve those issues. So for those of you who are not familiar, RTOG, which stands for Radiation Therapy Oncology Group, is an institution that was founded in 1968 and is funded through NCI to conduct clinical trials for adult cancer, with the objectives to improve survival outcome and quality of life, to evaluate new forms of radiotherapy delivery techniques, to test new systemic therapies in conjunction with radiotherapy, and to employ translational research strategies. Now, the Bioinformatics Working Group was formed within RTOG to facilitate the development of personalized predictive models for radiation therapy guidance, using the specific characteristics of patients and treatments from integrated clinical trial databases, and to bridge clinical science, physics, biology, information technology and mathematics. Now, the two major components of the bioinformatics efforts at RTOG, as with all bioinformatics efforts, are databases and database integration, and data analysis. For the databases, we have rich collections of RT dose, RT images and clinical data, as well as genomic and proteomic information from biobanks and biomarker information. By mining these databases and performing data analysis, we can help protocol development and protocol operation, and facilitate trial outcome analysis, secondary analysis and other related research. So in the following slides I will now present a number of examples of projects that are ongoing or that we're trying to start in the RTOG Bioinformatics Group. The first two are for data and data integration. As you see from this table, we have vast clinical data from a number of clinical trials that cover multiple disease sites, which include head and neck, lung and prostate.
And you can see that this data can be used to model tumor control probability and to model toxicity, such as salivary function and late or acute GU/GI toxicities. A number of projects by Dr. Deasy and Dr. Tucker were quite successful and were funded through NCI. Now, this example is a project that RTOG has started in collaboration with MAASTRO, an institution in the Netherlands where advanced radiotherapy research has been going on. Dr. Andre Dekker, the head of MAASTRO Knowledge Engineering, spent half a year late last year with RTOG, and we have established this collaboration, setting up a system of rapid learning and computer assisted theragnostics between RTOG and MAASTRO. So the following slides were used by Dr. Andre Dekker at the end of his visit at RTOG to report the progress of this project. So why do we need rapid learning and computer assisted theragnostics? It's because we want to achieve personalized medicine that improves survival and quality of life. As you see from this graphic, there has been an explosion of data over the years: in addition to the general clinical information, we have structural genetics information, functional genetics information, proteomics and other molecular information, in addition to diagnostic imaging of functional and anatomical type. How we can use this explosion of data to make clinical decisions is essential for us to move forward with personalized medicine. One example was presented with that: an experiment was conducted in which eight radiation oncologists were presented with 30 patients with non-small-cell lung cancer and were asked to predict two-year survival from the information on these patients' characteristics. And you can see that the performance of these eight radiation oncologists has an area under the curve of 0.57.
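Editorial note: the area under the curve quoted here can be computed directly from predicted scores and observed outcomes. The following is a minimal sketch using the pairwise (Mann-Whitney) formulation. The function name and the toy scores are illustrative assumptions and are not data from the experiment described in the talk.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney).

    scores: predicted probability of the event (e.g. two-year survival)
    labels: 1 if the event occurred, 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for six patients (not from the study)
example_scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
example_labels = [1, 0, 1, 1, 0, 0]
print(round(auc(example_scores, example_labels), 3))
```

A value near 0.5 means the ranking of patients is no better than chance, which is the scale on which the 0.57 result should be read.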
We understand that 0.5 is pretty much random prediction, 1 is a perfect model, and 0.85 to 0.9 would be clinically acceptable. So this is not too far from a random prediction. So now, how do we get data for rapid learning? The problems are not just technical; they are also administrative, political and ethical: administrative in terms of the time that's required to get the data together, political in terms of who owns the data, and ethical in terms of how we maintain the privacy of the patients. Now, the CAT approach is that an IT infrastructure is being developed to make the radiotherapy centers semantically interoperable, which takes care of the administrative issue; the data actually stays within the institution, which takes care of the ethical issue, and under the full control of the institution, which resolves the political issue. So the components of this CAT system are as follows: the data is exported from the CTMS and PACS systems, goes through ETL, is de-identified, and is then filtered into an oncology database; the user then just queries and retrieves from such a database to obtain outcomes in a standard format, XML or DICOM. Applications can be shared to analyze this data, or distributed learning algorithms can be run on this data. Now, the key features of the system are that there is no sharing of data and it is truly federated. Both community data and clinical trial data can be connected together. We use the standard NCI ontology library and formal additions to this library, and the system has been tested across five languages, five countries and five legal systems. The major focus now is on radiotherapy, and we have a lot of help from industry involvement. This is the network as it stands so far, and we're actively talking to Chinese centers to see whether we can extend this CAT system to China, as well as India and other countries.
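Editorial note: the federated idea described above, where patient data stays at each institution and only model information travels, can be sketched as follows. This is a minimal illustration that assumes a logistic regression model trained by averaging gradients. The talk does not say which learning algorithm the CAT system uses, so the algorithm, the function names and the toy data here are all assumptions.

```python
import math


def local_gradient(weights, rows):
    """Run at one site: gradient of the logistic loss over that site's rows.

    rows is a list of (features, label) pairs that never leaves the site;
    only the summed gradient and the row count are returned.
    """
    grad = [0.0] * len(weights)
    for x, y in rows:
        z = sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, xi in enumerate(x):
            grad[j] += (p - y) * xi
    return grad, len(rows)


def federated_fit(sites, n_features, rounds=200, lr=0.5):
    """Run at the central server: combine site gradients, send weights back."""
    weights = [0.0] * n_features
    for _ in range(rounds):
        total = [0.0] * n_features
        count = 0
        for rows in sites:
            grad, n = local_gradient(weights, rows)  # computed at the site
            total = [t + g for t, g in zip(total, grad)]
            count += n
        # One gradient step on the average loss over all sites
        weights = [w - lr * t / count for w, t in zip(weights, total)]
    return weights


# Hypothetical toy data: features are [bias term, one covariate]
site_a = [([1.0, -2.0], 0), ([1.0, -1.0], 0), ([1.0, 1.0], 1), ([1.0, 0.2], 0)]
site_b = [([1.0, 2.0], 1), ([1.0, -1.5], 0), ([1.0, 0.5], 1), ([1.0, -0.2], 1)]
print(federated_fit([site_a, site_b], n_features=2))
```

In this particular scheme the result matches what a single pooled fit would give, because the summed gradients are identical; the difference is only in where the rows are read.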
So one example that's shown here to demonstrate how this system works is that we connected the database from RTOG 0522 to test a model for laryngeal carcinoma that was developed from the MAASTRO group of patients. These are the input parameters that went into the modeling, and the outcome we studied was overall survival. This just shows how we query the larynx oncology database. And this is the result that we obtained from our research together, showing the area under the curve plot as well as the stratified survival curves. Then Dr. Andre Dekker went ahead and tested this distributed learning architecture where, instead of sending the data out, the parameters from the model were sent to a centralized server and combined there, and the updated model was sent back to the individual model servers. This iterative process would produce a final optimized model. Here you can see that the performance from the distributed learning operation is somewhat better than that of the models that we obtained from the individual databases. There are a number of obstacles to moving this forward so that more institutions can adopt this system. One of them is the associated cost, and we're trying to obtain funding so that we can use all open source components, so that it can be more readily accessible for individual institutions. Now, the second project that we just started to explore: we are under guidance from NCI to explore the possibility of clinical trials comparing carbon, proton and photon radiotherapies. We invited Dr. Stephanie Combs to our RTOG Bioinformatics meeting in June, where she presented the database system that is used at the particle center in Germany, which is called ULICE. So, particle therapy is a very new and promising technique in radiation therapy for cancer treatment. Now we have carbon accelerators and protons.
The advantages of particle therapy are that there is more precise dose delivery to the target, thereby offering the advantage of sparing normal tissue and organs at risk, and also the enhanced radiobiological effectiveness of carbon ions, which has the potential to increase local tumor control. Now, how do we demonstrate in the clinical outcome the advantages that have been explored theoretically in radiobiological and physics research? We need to organize randomized trials to establish the clinical advantage of these particle therapies. Now, the Heidelberg Ion-Beam Therapy Center started to treat patients at the end of '09, and the main focus is to have clinical studies to evaluate the benefits of ion therapy for several indications. The ULICE project is the Union of Light Ion Centres in Europe; they got together to develop a database with transnational access to perform international multicenter clinical studies, which should be accessible to both external and internal oncology students and researchers. So they were trying to establish a common database for hadrontherapy, to exchange clinical experience, to set up standards, to harmonize study and treatment concepts, and to transfer know-how. Their paper has just been published in Radiation Oncology, in the July issue, which appeared a few days ago. Their approach is that they established a centralized web-based system with interfaces to the existing information systems of the hospital, so as to avoid redundant entry, and to offer study-specific modules, and they have implemented security and data protection measures to fulfill legal requirements. Their database is an SQL database with the capability to be dynamically extended, with interfaces to the standards DICOM and HL7, and with a Java applet for manual import of data or for receiving and sending data with DICOM. The underlying components are compliant with the IHE framework.
So they have the capability to exchange, store, process and visualize both text data and DICOM data. This is the diagram of their structure; you see that their documentation system can be interfaced with standard hospital or other information systems, with either the HL7 or DICOM standards, and is secured through gateways. The security concept that they adopted uses the HTTPS protocol, and there are tiers of user authority with account name and password. The patient data can actually be pseudonymized, and depending on the authority level, the viewer can either view the real name or the de-identified information. They have been using this system for a few months now and have documented 900 patients. They were able to exchange and store various DICOM RT data, to be viewed with a DICOM RT ion viewer. Their future effort includes extended automatic and electronic study analyses. So this is a very nice system that can perhaps resolve the issue for the European light ion centers. If we start to conduct clinical trials between the US and Europe, and perhaps Japan, what would be the optimal data integration method? Should it be centralized or federated? We still need to work on these issues. Now, moving forward, in addition to the clinical data that was used to check the modeling, what can we do to improve the area under the curve performance of this model? Obviously we need to have a larger database that contains more patient information and more diversified parameters. Now, there is the simple geometrical information from CT that one could incorporate into the modeling. Also, the biomarker information, radiomics and genetic information can be combined with the clinical data to hopefully improve the performance of the modeling to the clinically acceptable level. [ Pause ] So now we come to the second component of the bioinformatics effort, data analysis.
So in the following examples, I'll present the data mining that we have undertaken so that we can move towards evidence-based radiation therapy quality assurance. The first example is on clinical target definition. Now, why is it important to perform radiotherapy quality assurance? There are two examples that I have included here. One is from TROG trial 02.02, a head and neck trial, where the outcome was actually not governed by the techniques that the clinical trial set out to compare, but by the quality of the therapy that was given to the patient. As you can see from the separation of the survival curves, the compliant patients had a much better survival curve compared with patients who received therapy that was not compliant with the quality specifications of the protocol. One of the major violations of quality is actually target definition, that is, missing targets, either in the target definition or in the radiotherapy planning, or a wrong prescription, and in some cases the duration of the treatment was too extended also. Another similar example is from RTOG 9705, pancreatic cancer, and we can see a similar pattern: there's a separation of the survival curves, not from the techniques that the clinical trial set out to compare, but from the difference in the quality of the treatment that was given to the patient. Again, the evaluation of quality concerns target definition and missing part of the targets in the treatment. So, learning from past experience, we set out to perform this study before we activated RTOG 1106, the adaptive protocol for treatment of lung cancer. We collected three dry run cases, sent them to about 12 institutions, and asked experts to contour the targets as well as critical structures for these three lung cancer patients. You can see the distribution of these contours from the 12 experts. The mean sensitivity is actually 0.81, with a large standard deviation of 0.16.
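Editorial note: a sensitivity figure like the one quoted here can be computed by comparing each expert's contour with a reference contour. The following is a minimal sketch that assumes sensitivity means the fraction of reference voxels that the expert also included, and that contours are represented as sets of voxel coordinates. The talk does not state the exact definition or the reference used, and the toy contours are made up.

```python
from statistics import mean, stdev


def sensitivity(expert_voxels, reference_voxels):
    """Fraction of the reference contour that the expert contour covers."""
    if not reference_voxels:
        raise ValueError("reference contour is empty")
    return len(expert_voxels & reference_voxels) / len(reference_voxels)


# Hypothetical 2D example: the reference is a 4 x 4 block of voxels
reference = {(x, y) for x in range(4) for y in range(4)}
experts = [
    reference,                                         # full agreement
    {(x, y) for x in range(2) for y in range(4)},      # covers half
    {(x, y) for x in range(1, 5) for y in range(4)},   # shifted by one voxel
]

values = [sensitivity(e, reference) for e in experts]
print(values, round(mean(values), 3), round(stdev(values), 3))
```

Sensitivity alone does not penalize contours that are too large, so in practice it would be read together with a measure of over-contouring.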
And this is the variation for an OAR, and you can see the difference between the contours of this OAR. The consensus contour, in the thick line, is plotted against all the individual contours from around 12 experts, and you can see a pretty substantial spread. Now, what are the impacts of this variation in the contours? We evaluated the tumor control probability using the consensus contour and found that, by doing so, the tumor control probability can be reduced by up to 100 percent compared with what the institution submitted in terms of the dose matrix. So that is a substantial finding. Now, can we use that to maybe explore the unexpected result from RTOG 0617, where we were trying to compare the outcome of the standard RT dose of 60 Gy with 74 Gy? From the interim analysis that was presented at last year's ASTRO, the high dose arm had to be stopped because of the futility of continuing with the trial: we had not demonstrated an advantage with 74 Gy, and we would not be able to with the rest of the accrual. So could our prior investigations point to one of the possible reasons to explain this unexpected outcome? That is one of the projects currently being undertaken by the scientists at RTOG. Now, we have also used the data that we have collected for clinical trial quality assurance to establish evidence-based quality assurance criteria for image guided radiotherapy. For image guided radiation therapy credentialing, we asked institutions to submit DICOM data as well as the shift information along with this DICOM data. There are a number of steps by which we established the current quality assurance criteria. First, we set out to evaluate the different performance of multiple systems and obtain the uncertainty that is associated with different image registration systems, and that was incorporated in the passing criteria that we use to review the IGRT credentialing.
Then we set out to credential IGRT for a number of disease sites, such as head and neck, reported the outcome of this IGRT credentialing, and published the result. From this investigation, we found out what the most impactful item of the IGRT is, and we have adapted our credentialing process accordingly. [ Pause ] The following two examples are for evidence-based quality assurance of radiotherapy planning, especially for intensity modulated radiotherapy. One group is from Duke, Jackie Wu and Yaorong Ge; they were invited to present their research at the January Bioinformatics Working Group meeting. They took head and neck IMRT, used the anatomical and physiological factors, and quantified their individual influence with mathematical modeling and machine learning. They encoded treatment planning experience and guidelines using knowledge engineering, and they established a model that uses these factors to offer peer-review type guidance. Now, this is an example of their results. The red lines are their modeled upper and lower levels of the DVH, and the blue line is the actual DVH they obtained from the clinic. You can see that the performance is relatively good. Moving forward, we hope to strengthen the collaboration with them so that they can test their model on the bigger RTOG database. Also, they plan to incorporate all types of knowledge sources into their predictive models and to use the standard ontology framework to help with decision support. One example is that they want to incorporate the QUANTEC guidelines in their decision making support. Another similar project is by Kevin Moore, who was invited to present at the June RTOG Bioinformatics meeting. They take a similar approach: they identified the need, that IMRT plans are not always optimal, and asked how we can predict what kind of quality is achievable given a new patient's characteristics.
What they did is that they modeled the geometrical shape of the target and critical structures and modeled these parameters so that they can actually predict the DVH outcome. This graph shows the results: the red lines are the clinically approved DVHs, the blue lines are from the average *** model, and the black lines are from the refined *** model. You can see the close resemblance of the model performance to the clinically approved DVH. So this is the prediction for a new patient. Now, going forward again, we intend to establish a collaboration with this group of researchers to test the model with the RTOG database, and perhaps we could use the results of this investigation to help us with plan quality assurance for clinical trials in future endeavors. [ Pause ] So now, with the databases and database integration, we are in great need of analytical methods to extract information from this data. A group of researchers associated with RTOG has undertaken a project to study algorithms that can resolve some of the challenges that we face in the data analysis. One of the challenges is that there is tremendous uncertainty associated with all the data that we are analyzing. With the conventional frequentist inference methods, for example maximum likelihood estimation, confidence intervals and p-values, the uncertainty is not taken into account in a generic manner. However, with Bayesian inference methods, working together with mathematicians, we introduced the concepts of belief and plausibility, and we used Dempster-Shafer theory, with which we can actually take the uncertainty into account within the analysis process. This research has just been accepted by Physics in Medicine and Biology. So the belief and plausibility prediction is plotted against the uncertainty range of the data points for radiation pneumonitis data that we extracted from the QUANTEC publication.
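Editorial note: in Dempster-Shafer theory, belief and plausibility are lower and upper bounds on the support for a hypothesis, and the gap between them reflects the uncertainty in the evidence. The following is a minimal sketch of the two standard definitions. The mass values are made up for illustration, and how the accepted paper builds its mass functions from the pneumonitis data is not described in the talk.

```python
def belief(mass, hypothesis):
    """Sum of the masses of all non-empty focal sets contained in the hypothesis."""
    return sum(m for focal, m in mass.items() if focal and focal <= hypothesis)


def plausibility(mass, hypothesis):
    """Sum of the masses of all focal sets that overlap the hypothesis."""
    return sum(m for focal, m in mass.items() if focal & hypothesis)


# Frame of discernment: the patient does or does not develop pneumonitis.
RP = frozenset({"pneumonitis"})
NO_RP = frozenset({"no_pneumonitis"})
EITHER = RP | NO_RP

# Hypothetical mass assignment; the mass on EITHER is evidence that
# cannot be attributed to one outcome or the other.
mass = {RP: 0.2, NO_RP: 0.5, EITHER: 0.3}
assert abs(sum(mass.values()) - 1.0) < 1e-12

print(belief(mass, RP), plausibility(mass, RP))
```

With no mass on the combined set, belief and plausibility coincide and the result reduces to an ordinary probability.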
We tested this against the conventional NTCP model, and it shows that our result is very much in line with the conventional NTCP parameters such as TD50 and m. So we hope to apply this to more data and databases, to offer a new way of visualizing the data to make clinical decisions. Now, future directions. We just saw the new funding opportunity announcement early in the week: we are going to regroup into new cooperative groups, as well as consolidating the quality assurance for radiation therapy and imaging, which we now call the IROC group. How we consolidate and integrate all the data together, along with tissue bank and statistics data, is a challenge that all of us are facing, and it's going to be an exciting period for the next number of months and years. Thank you for your attention.