Mod 08, Lec 08: Proteomics and Systems Biology

Welcome to the proteomics course. Today's lecture is on proteomics and systems biology. The lecture outline is as follows: first I will start with proteomics, and then we will move on to systems biology. We will continue with proteomics first, and then discuss how different types of omics data can be applied for systems-level analysis.
In this slide I have shown how you can define the different omics platforms. "Omics" is a suffix derived from the Greek word "om", meaning all or every. Its use as a suffix has enabled an explosion of terms: genomics, transcriptomics, proteomics, metabolomics and so on. Omics also implies the integration of biology with information science and conveys large-scale biology using a systems approach. As you can see in the slide, the study of the totality of DNA is known as genomics, of RNA as transcriptomics, of proteins as proteomics, and of metabolites as metabolomics. If you are looking at all of the contents of a cell, that is known as cellular proteomics or cellular genomics; similarly, the study of the entire proteome of an organism is known as global proteomics or, at the gene level, global genomics.
So, let us first start with proteomics. If you remember from the previous lecture, the proteome is the set of all the proteins expressed by a genome, and proteomics is the study of proteins and their properties to provide an integrated view of cellular processes. What are these properties? They include the extent of protein expression, the post-translational and co-translational modifications that occur in the cell, enzyme regulation (whether activation or inactivation), and intermolecular protein-protein interactions.
The current goals of proteomics are very broad. They include the diverse properties of proteins that we discussed in earlier lectures when looking at the chemistry of amino acids and the different levels of protein structure: sequence, quantity, state of modification, activity, interaction with other proteins and other biomolecules, subcellular distribution, and structure. The central dogma was discussed earlier: an orderly and unidirectional flow of the information encoded in the base sequence, passed on from DNA to RNA and then to proteins. This is the simplest definition of the central dogma. The genome sequencing projects have provided researchers with unprecedented information on genome sequences; however, a genome can encode a very large number of proteins, so analysis of the static genome by sequencing alone is not sufficient. A single gene can code for several proteins because of alternative splicing and post-transcriptional and post-translational modifications. This suggests that studying proteins is more challenging than studying the genome or transcriptome, and it is why proteomics has great significance for understanding biological systems.
The completion of the genome sequencing projects of several organisms, including human, has been one of the most remarkable achievements of the century; however, these have not been sufficient to unravel the mystery of complex biological processes. The similar gene numbers found in many diverse groups of organisms fail to explain their varying biological complexity. A more meaningful understanding of biological function can be obtained through characterization of the products of gene expression, the proteins, which serve as the ultimate effector molecules of biological systems.
Proteomics refers to the study of the entire protein complement expressed by an organism at any given time. While the genome of an organism is mostly static, the proteome is dynamic and changes with environment and time, which raises its level of complexity. Gene expression is regulated by several post-transcriptional and post-translational modifications, due to which the number of proteins expressed in a cell is much greater than the number of genes. Proteomics aims to decipher the structure and function of all proteins in a given cell under specific conditions and to obtain a global view of cellular processes at the protein level. Studies can be done at the DNA level, known as genomics, at the RNA level, transcriptomics, and at the protein level, proteomics.
Analysis of the proteome involves protein extraction, separation, identification and, finally, characterization of the various proteins. Various proteomic techniques are currently employed in different applications. We will talk in detail about the different proteomic technologies in the subsequent modules of this course, but very broadly you can group them as gel-based methods, gel-free mass spectrometry-based methods, mass spectrometry techniques, techniques to study protein interactions, and structural proteomics. I have shown some abbreviations in this slide. Gel-based proteomics includes sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), two-dimensional electrophoresis (2-DE), difference gel electrophoresis (DIGE) and blue native PAGE, as well as staining methods such as Coomassie, silver, fluorescent dyes, phosphoprotein stains such as Pro-Q Diamond, and multiplex staining methods. All of these can be grouped under gel-based methods.
The gel-free methods, especially on the mass spectrometry side, include SILAC, which is stable isotope labeling by amino acids in cell culture; CDIT, culture-derived isotope tags; ICAT, isotope-coded affinity tagging; VICAT, visible isotope-coded affinity tagging; MCAT, mass-coded abundance tagging; QUEST, quantitation using enhanced signal tags; iTRAQ, isobaric tagging for relative and absolute quantitation; GIST, global internal standard technology; ICPL, isotope-coded protein labeling; AQUA, absolute quantitation; SISCAPA, stable isotope standards and capture by anti-peptide antibodies; COFRADIC, combined fractional diagonal chromatography; and MudPIT, multidimensional protein identification technology. All of these are advancements in gel-free methodologies.
Basic mass spectrometry, which is central to proteomics applications, includes different ionization sources such as matrix-assisted laser desorption/ionization (MALDI) and electrospray ionization (ESI), and different mass analyzers such as quadrupole, time of flight, ion trap and Fourier transform mass spectrometry. Different tandem MS-based systems are also used. Surface-enhanced laser desorption/ionization time of flight (SELDI-TOF) is also used for various clinical applications.
The protein interaction methodologies include immunoprecipitation, yeast two-hybrid methods, and different protein microarray platforms such as antibody arrays, nucleic acid programmable protein arrays, multiple spotting techniques, and various other cell-based and cell-free expression-based protein microarrays. The detection can be based on label-based methods using fluorescence, chemiluminescence or radioactivity, or on label-free methods such as surface plasmon resonance, interferometry-based methods, or conductance-based methods employing nanotubes and nanowires.
Structural proteomics involves X-ray crystallography, nuclear magnetic resonance (NMR), transverse relaxation optimized spectroscopy (TROSY), circular dichroism (CD), and microscopy methods including atomic force microscopy and electron microscopy.
So, as we have seen, a large number of proteomic technologies are currently available for various applications. Many times, to address one biological question, different methodologies come together to provide a solution. For example, when looking at clinical samples to identify biomarkers of a disease, one can take samples such as tissue, blood or other body fluids and then either extract the protein directly and subject it to mass spectrometry, or first resolve it by two-dimensional electrophoresis followed by identification by mass spectrometry, or apply the samples directly to microarray-based platforms and detect using label-based or label-free methodologies. Eventually these results will enhance knowledge for monitoring therapy response as well as for early disease diagnosis. This is just one example; similarly, multiple proteomic technologies can be combined for different applications. To summarize, several proteomic techniques are employed for studying proteins, such as two-dimensional gel electrophoresis, mass spectrometry, protein microarrays, and label-free detection techniques such as surface plasmon resonance (SPR).
After discussing proteomics, let us now talk about systems biology. So, what is systems biology?
Is it esoteric knowledge, a method to understand biological systems, or a tool to solve practical problems? Systems biology is the examination of a biological entity as an integrated system, rather than the study of its individual characteristic reactions and components. The study of all the mechanisms underlying complex biological processes, in the form of an integrated system of many interacting components, comes under systems biology. A system-level understanding of biological networks requires information from different levels, as you can see, from DNA to RNA to proteins, which together make up systems, and that information can then be applied to understand complex systems in different organisms. Biological information is represented by networks of interacting elements and their dynamic responses to perturbation. These networks provide insights that cannot be obtained by analyzing the isolated components of the system. The common elements of systems biology include networks, modeling, computation and dynamic properties. There are different types of biological networks, such as protein-protein interaction networks, gene regulatory networks, protein-DNA interaction networks, networks of proteins with lipids and other biomolecules, and metabolic networks.
Now the various ingredients of systems biology. For example, if you are studying a cell and its system behavior, you need to look at the genome, its transcriptome profile and proteome profile; how protein-DNA interactions and transcriptional networks are altered; protein-protein signaling networks; how multimeric complexes are formed; how the proteome is localized, that is, the intracellular dynamics; and metabolic networks. All of these are ingredients of studying a system.
A systems biology study can be done at different levels. For example, to study the complex physiology of the human, one can look at individual systems such as the respiratory system, the nervous system or other physiological systems. Studies can be done at the intercellular or intracellular level and, finally, at the molecular level, involving genomics, transcriptomics and proteomics.
So, why is there a need for systems biology? The study of biology at the system and subsystem level is very much required for understanding biological processes and networks. As you can see, to understand even the simplest system, a cell, one must consider how it is regulated together with its extracellular space, the cytoplasm and its other components. Understanding a system requires examining the structure and dynamics of cellular and organismal function, rather than the characteristics of isolated parts of the cell or organism. So, what is the aim of systems biology? It is to understand biology with a holistic approach rather than a reductionist approach. Systems biology aims to quantitate qualitative biological data and to provide some level of prediction by applying different types of computational modeling.
The systems biology approach involves, first of all, the collection of large experimental data sets and then mathematical models that provide insight into some significant aspect of those data sets. A simple systems biology approach would involve experiments that add new data sets, which are used for model construction and model analysis, and the biological insight derived from these models can be used to propose new hypotheses. The properties of a system are probably more than just the sum of the properties of its individual components; it is therefore possible that the system has properties of its own that emerge only when all the components are put together. So, what different approaches have been taken to study systems biology?
The distinct approaches of systems biology include model-based and data-based methods. The model-based approach involves some prior information which can be implemented in the models, whereas the objective of the data-based approach is to find new phenomena. The model-based approach relies on computational modeling and simulation tools, whereas the data-based approach relies on omics data sets. In the model-based approach it is difficult to build detailed kinetic models, but in the data-based approach the complex relationships among the various types of omics information, metabolic pathways and networks can be established. Studying system components is very challenging. Systems biology and biological network modeling aim to understand the structure and function of a system, for a better understanding of system properties such as its robustness, and to predict the system's behavior in response to perturbations. The reductionist approach involves disintegrating the system into its component parts and studying them, whereas the integrative approach involves integrating the study of individual components to form conclusions about the system.
What is the systems biology triangle? First of all, system information is generated at various levels, as we have discussed: from genes to mRNA to proteins to metabolites, or by identifying regulatory motifs, metabolic pathways, functional modules and larger-scale organization. This information has to be stored, processed and further analyzed to arrive at system-level information. Even the simplest system, such as a cell, can be linked with various properties: its genome, the sequences of different molecules, intracellular signals, transcription factors, different cis-binding activities, the expression profiles of RNA and proteins, and different cellular processes. So, what approach can one take to study a system?
The steps are: extraction and mining of complex, quantitative biological data; integration and analysis of these data sets for the development of mechanistic, mathematical and computational models; and validation of these models by proposing hypotheses, retesting and refining. Different online databases and repositories have now been developed for sharing large data sets and various systems models. A systematic approach to studying how molecules act together within the network of interactions that makes up life is definitely going to be useful for understanding systems biology.
The systems biology triangle, as you can see here, involves experimental data sets, which could be derived from different omics platforms; technologies, meaning how the computational analysis can be performed with different bioinformatics software and tools; and computational modeling, drawing on some theoretical concepts. This synergistic application of experiment, theory and technology with modeling, to enhance the understanding of biological processes as a whole system rather than as isolated parts, is termed the systems biology triangle. In the systems biology triangle, wet lab experiments or bioinformatics-based data analysis can be used to propose a model. The model building serves as an aid to understanding the complex system; hypotheses can be generated and used to propose more quantitative or predictive models, and independent techniques can be used for model validation.
So, what is a systems study? First of all, one needs to understand the difference between a systems study and a component study, and this is what we have tried to emphasize in the previous slides. After generating the data sets and building the systems biology triangle, this information can be used to understand systems at a more complex and mechanistic level.
Now, systems study and model building. Systems science contributes synthesis, modeling, concepts and analysis; the life sciences provide quantitative measurements, genetic modifications and hypotheses; and the information sciences enable visualization, modeling tools and databases. This model building, as an aid to understanding the complex system, is very useful for systems-level investigation. A system is an entity which maintains its existence through the mutual interaction of its constituent parts. Systems biology research consists of identifying the parts; characterizing the components and excluding the ones which are not part of the system; identifying the interactions of the components with each other; and identifying the interactions of the components with the environment, which modulates the parts either directly or indirectly through modulation of internal interactions. The systems biology concept can be understood with the help of two approaches, the reductionist approach and the integrative approach. The reductionist approach focuses on disintegrating the system into its component parts and studying them, whereas the integrative approach focuses on integrating the study of individual components to form conclusions about the system.
Consider a cell with its component molecules, and let us study a metabolic pathway as a biological system. When the environment of this cell is perturbed a little, the individual components undergo unique changes, such as an increase in production rate or a decrease in amount. At this stage, lacking knowledge of the nature of the interactions of the proteins, we cannot interpret how the system is affected. But when we study the interaction of one component with another, we can conclude that the increase in the rate of the DNA-binding protein leads to an increase in the amount of DNA synthesized, which further changes the final amount of lipoproteins produced. Thus, to study a system we need to analyze not just the components but also their interactions. These biological systems can be protein-protein interaction networks, gene regulatory networks, protein-DNA networks, protein-lipid networks and metabolic networks.
To study the system we need to know about the components and their interactions. The data about the components come from genomic and proteomic studies, and the information about the molecular interactions comes from interactomic studies. Shown here is the systems approach: experiment, technology and computational modeling. This triangle is very important, and it has to be linked with theory to form the systems triangle. As for the technologies employed to study systems biology, you obviously need high-throughput data sets, which could be derived from microarray platforms, RNA deep sequencing, different configurations of mass spectrometry, structural proteomics tools and protein interaction data sets.
Some of the technologies commonly employed in systems biology can be classified broadly as follows. For genomics: high-throughput DNA sequencing methodologies and mutation detection using SNP methods. For transcriptomics: transcript measurement by serial analysis of gene expression (SAGE), gene chips, microarrays and RNA sequencing. For proteomics: mass spectrometry, two-dimensional electrophoresis, protein chips, yeast two-hybrid and X-ray methods. For metabolomics: mainly NMR. As you can see here, generating systems-level information requires different technologies at each level of the biological system: for the genome, high-throughput sequencing and high-density arrays; for the transcriptome, RNA sequencing and microarrays; for the proteome, the many methodologies we discussed; for the metabolome, either NMR or mass spectrometry; and for the phenome, which is studied through images, imaging methods including NMR-based ones. Each of these omics technologies can be useful for studying systems biology.
Let us now talk about how to model biological networks. To build a model in systems biology, first of all a parts list is generated from the data sets derived from the systems biology approaches. Systems or subsystems models can then be generated and used for systems model analysis. These can be applied to the real system and, with bioinformatics tools, the knowledge can be applied back to the original components to derive hypotheses and validate the data sets, so it works like a closed loop.
To build models in systems biology, information is generated at different levels: level one, DNA and gene expression; level two, intracellular networks; level three, cell-cell and transmembrane signals; and level four, integrated organ-level information. What framework is required for the modeling schemes? Different deterministic or stochastic models have been proposed; compartmental variables or individual or functional variables have been studied; and spatially homogeneous or spatially explicit models have been generated, which can be applied on a uniform time scale or on separated time scales. This framework can involve single-scale entities or cross-scale entities. As you can see here, the framework requires different levels of information combined in a very complex manner: curation of the databases, and alignment of this information using bioinformatics tools to generate predictive models, which can be developed from literature-curated data sets or experimental data sets and finally used to study systems-level properties.
Let us discuss the workflow of mathematical modeling. A paradigm can be proposed based on modify, model, measure and mine. Systematic experiments using molecular genetics, chemical genetics and cell engineering approaches are used for modifying. Measurements are made at different levels by applying microarrays, spectroscopy, imaging and microfluidics-based approaches from proteomics and genomics. Mining involves bioinformatics, databases and data semantics. These data sets are then used to derive models, which could be reaction-mechanistic, statistical or stochastic models. This workflow takes you from systematic experiments to quantitative models.
Now, the modeling of probabilistic processes. Let us say you want to study a biological system, so some experiments have to be performed. The experimental data sets are generated, and from them some statistics can be derived for comparison. Different models are then used to run simulations, and the simulated data sets yield intermediate statistics. By comparing these two types of information and adjusting the parameters, one can study the system and derive the probabilistic processes.
So, what are ordinary differential equations and stoichiometric models? Quantitative analysis aims to make models with precise kinetic parameters for the components of a system's network; it also uses the properties of network connectivity. An ODE is a mathematical relation that can be used for modeling biological systems. Quantitative models mostly use ordinary differential equations (ODEs) to link the concentrations of reactants and products through the reaction rate constants. These ODE models can be used to develop computationally efficient and reliable models of the underlying gene regulatory networks. A stoichiometric model describes a biological network in terms of stoichiometric coefficients, reaction rates and metabolite concentrations.
It is my pleasure to introduce Dr. Sarath Chandra Janga from Indiana University-Purdue University Indianapolis, where he is in the School of Informatics and the School of Medicine. As we have been discussing the need for studying proteomics and systems biology, there is a lot of information available at the transcription and translation levels, and often there is not a good correlation between the RNA level and the protein level. So today it will be interesting to talk about systems approaches for studying biological networks, from post-transcriptional control towards drug discovery. I have invited Professor Sarath for a discussion and a short talk on this topic.
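To make the ODE and stoichiometric description from this part of the lecture a little more concrete, here is a minimal sketch. It is not from the lecture: the two-step pathway A -> B -> C, the rate constants k1 and k2, and the matrix name S are all hypothetical, and mass-action kinetics is assumed. The concentrations change as dx/dt = S · v, where S holds the stoichiometric coefficients and v the reaction rates. A simple forward Euler step is used only to keep the example dependency-free; a real study would more likely use a dedicated ODE solver.

```python
# Hypothetical pathway: A -> B (rate k1*[A]) and B -> C (rate k2*[B]).
# Stoichiometric matrix S: rows are species (A, B, C), columns are reactions.
S = [
    [-1, 0],   # A is consumed by reaction 1
    [1, -1],   # B is produced by reaction 1 and consumed by reaction 2
    [0, 1],    # C is produced by reaction 2
]


def reaction_rates(conc, k1, k2):
    """Mass-action rates (an assumption) for the two reactions."""
    a, b, _c = conc
    return [k1 * a, k2 * b]


def simulate(conc0, k1, k2, dt=0.01, steps=1000):
    """Integrate dx/dt = S . v with forward Euler steps of size dt."""
    conc = list(conc0)
    for _ in range(steps):
        v = reaction_rates(conc, k1, k2)
        dxdt = [sum(S[i][j] * v[j] for j in range(len(v)))
                for i in range(len(conc))]
        conc = [x + dt * dx for x, dx in zip(conc, dxdt)]
    return conc


# Start with all material in A and follow it for 10 time units.
final = simulate([1.0, 0.0, 0.0], k1=1.0, k2=0.5)
```

Because every column of S sums to zero in this toy pathway, the total amount of A, B and C stays constant during the simulation, which is a quick sanity check on the stoichiometry.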
Thank you, Dr. Srivastava. It is my pleasure to be here to talk about some of the work that we have been doing and, more generally, about the principles of regulation and how we can use systems approaches for understanding biological networks. As some of you might be familiar, the concept of networks is becoming increasingly prominent not just in proteomics but also in genomics and all kinds of high-throughput data. So, today we will have a basic introduction to the application of networks to biological systems and how it can be applied to understanding transcriptional regulation, post-transcriptional regulation and proteomics data, and, at large, how this can be applied to drug discovery.
According to the central dogma of molecular biology, DNA gives rise to RNA through the process of transcription. This process is facilitated by the binding of RNA polymerase as well as a number of transcription factors, which bind to the upstream regions of the DNA, as we can see, and control expression. RNA gives rise to protein through the process of translation, with the help of ribosomes. Of the proteins which are produced in this process, some can be classified as transcription factors, which bind to DNA, and others as RNA-binding proteins, which bind to RNA and control expression at the post-transcriptional level, as opposed to the transcriptional level, where transcription factors bind to the DNA.
Now, as an example, let us see the case of the AraC transcription factor in a bacterial genome such as that of E. coli. This transcription factor binds to the upstream region of the araBAD operon, which encodes the enzymes and the transporter responsible for the uptake of arabinose from the environment. The transcription factor AraC not only binds upstream of araBAD, it can also bind upstream of its own gene and control its own expression, as you can see from the small orange boxes which represent the binding sites. What this suggests is that a transcription factor can autoregulate, that is, bind and regulate its own expression, or it can bind to other genes and control their expression. There are also many cases where multiple transcription factors bind to the upstream region, as you can see in this case, represented by the orange box as well as the blue circle where another transcription factor binds. In addition to the binding of transcription factors, as I mentioned earlier, RNA polymerase also binds, shown with the green oval, so that together they control expression. Another example is also shown in this figure, with the MelR regulator doing something similar.
What we just discussed is an idea of how regulation happens from a biological viewpoint. An increasing amount of literature now supports the idea of networks in biology. So, what exactly are networks?
Networks are simply represented as nodes and links, or edges. The nodes can be biological entities, and the links or edges are the associations between these entities. There are a number of ways you can define the nodes or entities. One common form of representation is the protein interaction network, where the proteins form the nodes and the physical interactions between the proteins form the edges, as you can see in the figure below. An alternative kind of network, which has also been studied in the literature over the last 10 years or so, is the metabolic network. In metabolic networks the metabolites form the nodes and the conversion of one metabolite to another forms the edge. As you can imagine, the conversion of one metabolite to another is facilitated by an enzyme, so a particular enzyme converts metabolite A to B. When you look on a global scale at the conversion of a number of metabolites, one to the other, where sometimes one metabolite can give rise to more than one metabolite, such a complex set of associations can be called a metabolic network.
The third kind of network, which I will be elaborating in more detail in the next slides, is the transcriptional network. In transcriptional networks, transcription factors form one set of nodes and the target genes form the other set of nodes. As you can imagine, what we are looking at from a biological viewpoint is the interaction of the transcription factor with the DNA and its control of the expression of the downstream genes, but in the context of networks what we are showing is the transcription factor and the target gene or operon whose expression is controlled. Again, in this case you can see that protein A, which is a transcription factor, controls B, but it may or may not be that B is also a transcription factor and controls A. That is case specific, and there may or may not be a reciprocal interaction.
As we just discussed, the concept of a network has been borrowed from physics and computer science, where such networks are often referred to as graphs. Graphs are objects made up of a collection of nodes and links. The nodes represent the entities, which could be genes, proteins, small molecules, cells or organs; you can represent entities at any level. The links are the interactions or associations between them. As I just mentioned, there are different kinds of networks: protein-protein interaction networks, metabolic networks and transcriptional networks.
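The idea of a graph as a collection of nodes and links can be sketched in a few lines of code. This is only an illustration under stated assumptions: the protein names P1 to P3 are made up, the helper names build_undirected and build_directed are hypothetical, and the regulatory edges simply restate the AraC example from the talk (AraC controlling araBAD and its own gene). An undirected interaction is stored in both directions, while a regulatory edge is stored only from regulator to target.

```python
def build_undirected(edges):
    """Adjacency sets for an undirected network, e.g. protein interactions."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def build_directed(edges):
    """Adjacency sets for a directed network: regulator -> target."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())  # keep targets as nodes too
    return adjacency


# Made-up physical interactions between three proteins.
ppi_edges = [("P1", "P2"), ("P2", "P3")]

# Regulatory edges from the AraC example: AraC controls araBAD and itself.
regulatory_edges = [("araC", "araBAD"), ("araC", "araC")]

ppi = build_undirected(ppi_edges)
regulatory = build_directed(regulatory_edges)
```

The self-loop on araC is how autoregulation shows up in this representation, and the empty target set for araBAD reflects that it does not regulate anything in this small example.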
In the case of protein-protein interaction networks, there is often no directionality in the interactions, and these are called undirected networks. However, there are also directed networks, such as transcriptional networks or metabolic networks. In these cases there is a flow of information, that is, A controls B, which means A is regulating B, and therefore there is a directionality. These are commonly studied as regulatory networks, and we will be talking about them in more detail in the next slides. However, before we get into more specific observations about the properties of these networks, one set of common properties studied when you are looking at biological networks is degree, path length and clustering coefficient.
Often, when you look at a network as a whole, you do not have a clear understanding of the properties of the different nodes. But when you look at a specific aspect, such as degree in this case, it tells you how many connections a particular gene, protein or node has in your network. So, what we can say from the first example on the top is that the degree of the node is 8, which means it is connected to 8 other proteins. The second property is path length, which is the number of edges that you need to travel from one node to another. So if I ask you what the path length is between the first and the bottom node in this figure, you would say the path length is equal to two. The third property, which is often studied, is the clustering coefficient. The clustering coefficient tells you how often the neighbors of a given node are connected to each other, relative to what you would see in a completely connected graph.
Let us look at more detailed examples. For instance, if you are studying the degree of a node in the case of an undirected network, such as the example shown at the top, the fluorescent green node has a degree equal to two. On the other hand, in the directed network example shown at the bottom the highlighted node has a degree equal to four, because it is connected to four other nodes; however, here you can also distinguish in-degree and out-degree. The in-degree is the number of incoming connections of a particular node, so the green fluorescent node here has an in-degree of one; it also has an out-degree equal to three, because it is directing three other nodes, shown in red. Now, you can also extend this idea of undirected and directed graphs and ask what is the path length between nodes. As I mentioned, the path length is the number of edges that one needs to travel between the two nodes that you are interested in. In the top network that you are seeing, the path length between the two green fluorescent nodes can be equal to one or equal to two, because the path that you take can be different from the shortest path; however, almost always, unless otherwise specified, when you are talking about the path length between two nodes it is the shortest path length, so the two fluorescent nodes have a path length equal to one. If you are asked what are all the path lengths, you would say there are two different paths, one with a path length of one and the other with a path length of two. In directed networks your definition of path length essentially does not change, so in the example that you see at the bottom the path length between the two fluorescent nodes is equal to two. 
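The degree and path length ideas described above can be sketched in a few lines of Python. This is only an illustrative sketch: the node names (A, B, C, S, G, R1 to R3) and the two toy networks are assumptions chosen to mimic the figures described in the lecture, not data taken from the slides. Note that in the directed case the search only follows edges in their stated direction.

```python
from collections import deque

def degree(adj, node):
    """Degree of a node in an undirected network: number of neighbors."""
    return len(adj[node])

def in_out_degree(edges, node):
    """In-degree and out-degree of a node, given directed (source, target) edges."""
    indeg = sum(1 for _, dst in edges if dst == node)
    outdeg = sum(1 for src, _ in edges if src == node)
    return indeg, outdeg

def shortest_path_length(adj, start, goal):
    """Shortest path length (number of edges) by breadth-first search.

    Returns None if the goal cannot be reached from the start node.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Hypothetical undirected toy network: a triangle, so A and B are joined
# by a path of length one (A-B) and another of length two (A-C-B).
undirected = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}

# Hypothetical directed toy network: S regulates G, and G regulates three targets.
directed_edges = [("S", "G"), ("G", "R1"), ("G", "R2"), ("G", "R3")]
directed_adj = {}
for src, dst in directed_edges:
    directed_adj.setdefault(src, set()).add(dst)

print(degree(undirected, "A"))                       # degree of A
print(in_out_degree(directed_edges, "G"))            # (in-degree, out-degree) of G
print(shortest_path_length(undirected, "A", "B"))    # shortest path, undirected
print(shortest_path_length(directed_adj, "S", "R1")) # shortest path, directed
```

In this toy setting the node G has one incoming and three outgoing edges, so its total degree is four, which matches the directed example in the lecture.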
The other property that I was referring to previously is the clustering coefficient of a node. The clustering coefficient is the ratio of the number of connections between the neighbors of a given node of interest to what you would see if those neighbors were completely connected. Now, let us look at an example. In the first example in this figure, the fluorescent green node has three connections: three red dots are connected to it. However, if you ask for the number of connections between the red dots, it is 0; if they were fully connected you would see three edges between them, so the clustering coefficient of the fluorescent node right now is 0 upon 3. Let us look at the second toy network: here the fluorescent node has a clustering coefficient of 2 upon 3. In the third case the clustering coefficient of the fluorescent node is 3 upon 3, which is completely connected, so the clustering coefficient is equal to 1. Now, a more general formula can be written: if there are m interactions between the neighbors of a node of interest and there are n neighbors of that node, then the clustering coefficient of that particular node is m upon n into (n minus 1) by 2, that is, C = m / (n(n - 1)/2). The average clustering coefficient over all the nodes of a network gives you a sense of the modularity of the network: the higher the average clustering coefficient, the more likely it is that the network can be decomposed into specific modules. Another property that is of great interest in understanding biological networks is the scale-free structure; a lot of biological networks have been documented to be scale-free, and transcriptional networks are also documented to be scale-free structures. So, what exactly are scale-free networks? 
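Before moving on to scale-free networks, the clustering coefficient just described can be checked with a small sketch. The three toy networks below are assumptions that mirror the three cases in the figure (zero, two and three links among the three red neighbors); the node names G and R1 to R3 are hypothetical. Returning 0 for a node with fewer than two neighbors is a convention chosen here, since the formula is undefined in that case.

```python
from itertools import combinations

def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def clustering_coefficient(adj, node):
    """C = m / (n(n - 1)/2): m links among the n neighbors of the node."""
    neighbors = sorted(adj[node])
    n = len(neighbors)
    if n < 2:
        return 0.0  # convention assumed here; the ratio is undefined for n < 2
    m = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return m / (n * (n - 1) / 2)

# The green node G with its three red neighbors.
spokes = [("G", "R1"), ("G", "R2"), ("G", "R3")]
no_links = build_adjacency(spokes)
two_links = build_adjacency(spokes + [("R1", "R2"), ("R2", "R3")])
all_links = build_adjacency(spokes + [("R1", "R2"), ("R2", "R3"), ("R1", "R3")])

for name, adj in [("no links", no_links), ("two links", two_links), ("all links", all_links)]:
    print(name, clustering_coefficient(adj, "G"))
```

Averaging this value over all nodes of a network gives the average clustering coefficient mentioned above.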
Scale-free networks correspond to a network structure where there are a few nodes which are highly connected. For instance, in the network figure that you see to the left there is a big red node which is highly connected, but there are not many such highly connected nodes, and there are many nodes which are very poorly connected. So, in other words, a scale-free structure refers to a network structure where a few nodes are highly connected and most nodes are poorly connected. More mathematically, if you plot the connectivity of a node versus the number of nodes with that connectivity you should see a power-law distribution; in other words, if you make a log-log plot of the connectivity versus the number of nodes with a given connectivity you should see a straight line with a negative slope of gamma, as shown in this figure, where gamma lies between 2 and 3. That is when you can call the structure scale-free and the distribution a power-law distribution. Now, what is so special about this scale-free structure? Scale-free structures have been postulated to provide robustness to the biological system. Now, what exactly is robustness? 
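Before turning to robustness, the log-log criterion described above can be illustrated numerically. The sketch below counts how many nodes have each degree and fits a straight line to the log-log points by ordinary least squares; the negative of the slope is taken as gamma. This is a crude estimate chosen for illustration only, and the degree list is synthetic, constructed by assumption so that the number of nodes with degree k is exactly 512 times k to the power minus 3. Real network data would be noisy and would need a more careful fitting method.

```python
import math
from collections import Counter

def estimate_gamma(degrees):
    """Estimate the power-law exponent from a list of node degrees.

    Fits log(count) against log(degree) by least squares and returns
    the negative of the slope. Nodes with degree 0 are ignored.
    """
    counts = Counter(d for d in degrees if d > 0)
    points = [(math.log(k), math.log(c)) for k, c in counts.items()]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return -sxy / sxx

# Synthetic degree list (an assumption, not real data): many poorly
# connected nodes and a single highly connected hub of degree 8.
nodes_per_degree = {1: 512, 2: 64, 4: 8, 8: 1}
degrees = [k for k, count in nodes_per_degree.items() for _ in range(count)]

gamma = estimate_gamma(degrees)
print(round(gamma, 3))
```

For this constructed example the fitted slope recovers the exponent that was built into the data, which is the behaviour one would look for on the log-log plot.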
So, robustness is the ability of a complex system, such as a biological system, to maintain its function even when the structure of the system changes significantly. Let us look at an example: in the network figure that you see, if you randomly perturb any of these nodes you are likely to affect only a small fraction of the network; however, if you target the central node, which is highly connected, you are going to disrupt a major fraction of this network. This suggests that these highly connected nodes are vulnerable points and can serve as drug targets. So, if you are trying to inhibit the growth of a pathogen you are likely to target these highly connected nodes, because you are more likely to be able to crumble the biological system of the pathogen, and this has been increasingly gaining attention as a method of targeting drugs to this class of proteins. So, as mentioned earlier, we have been talking about regulation by a single transcription factor, but in the context of networks regulation is much more complex, and what we are referring to is combinatorial regulation by many different transcription factors. Let us look at a specific scenario. The slide shown here shows a typical regulatory system in a bacterial organism. What you usually have is a set of signals which are sensed by the cell, and these signals are sensed by sensor proteins; these sensor proteins could be transporters, or they could also be kinases. 
Once these sensor proteins sense the signals from the exterior, or sometimes even the interior, of the cell, they can cascade the information to transcription factors. The transcription factors, upon receiving the signals, can change from an active to an inactive state or from an inactive to an active state, and when this happens because of multiple sensor proteins these transcription factors can change their conformation and bind to the upstream regions. Shown at the bottom is a stretch of DNA where these transcription factors can bind, often in a combinatorial fashion, and control the expression of the target gene or operon. As a rule of thumb, if transcription factors bind upstream of the transcription start site, shown as plus 1, which is where transcription actually starts, they are often stimulating the polymerase and enhancing the expression. However, when they bind downstream of the transcription start site they typically repress the expression of the target gene by blocking transcription by the polymerase, shown as the oval-shaped symbol in green. So, based on these principles, and through the interplay between the transcription factors and the polymerase, your transcript is produced. Once the transcript is produced you can have regulation at the mRNA and protein levels, which is not what we will be talking about right now, but all these levels together contribute to provide feedback. This is a typical simple regulatory system that you encounter in bacterial organisms; eukaryotic gene regulation is much more complex and beyond the scope of our current discussion. 
As discussed in the previous slides, the basic unit of regulation is a transcription factor and a target gene whose expression is being controlled. Now, on a different scale, if you put together all the regulatory events between transcription factors and their target genes or operons you construct a global transcriptional regulatory network. As I mentioned earlier, this network is a scale-free network, but in addition to this it is also a hierarchical structure. What we are referring to in a hierarchical structure is that there is a set of transcription factors which are able to regulate a large number of genes, and there are other transcription factors which are themselves controlled by these global transcription factors shown at the top of the network structure; the top layer and the second layer together regulate the set of genes which do not necessarily encode transcription factors. So, in a way, there are transcription factors which are at the top of the system, there are transcription factors which are controlled by this top layer, and there are subsequent layers; the number of layers in such a hierarchical structure depends on the complexity of the system. Now, in between the leftmost figure of the basic unit and the rightmost figure, there is a set of substructures or subgraphs within the regulatory network which we call motifs. Motifs are subgraphs which occur more often than expected by chance, and three kinds of regulatory motifs have been identified in regulatory networks. One is the feed-forward loop, where there are two transcription factors: the first transcription factor regulates both the second transcription factor and the target gene, and the second transcription factor also regulates the target gene. 
The second kind of motif is the multiple input module, where there are two different transcription factors and both of them regulate two different target genes. The third is the single input module, where a single transcription factor regulates a set of target genes. Each of these regulatory motifs has been shown to have a specific function, which is beyond the scope of our current discussion. Now, although regulation of gene expression at the level of transcription has been documented for several years and we have an extensive understanding of it, very little is known about the regulation of gene expression beyond transcription, and the role of regulation at the post-transcriptional level has only recently been appreciated. Most of the evidence for why post-transcriptional regulation is becoming important comes from the lack of correlation between mRNA and protein pools in model systems. There is now enough evidence to suggest that these post-transcriptional processes are controlled by a class of proteins called RNA binding proteins, along with non-protein-coding components such as microRNAs and other non-coding RNAs. RNA binding proteins are now known to be involved in controlling RNA processing and RNA longevity as well as translation. 
Now, in particular, as soon as the gene is transcribed and the pre-mRNA is produced, splicing-associated RNA binding proteins bind to the pre-mRNA and convert it into mature mRNA by splicing out the introns. The RNA that is produced, not necessarily only mRNA, needs to be exported from the nucleus into the cytoplasm, and this is carried out by a class of RNA binding proteins which can be termed transport RNA binding proteins, shown with number two in the figure. RNA binding proteins have also been implicated in the specific subcellular localization of these transcripts. RNA binding proteins are also documented to control the stability of the transcripts, thereby stabilizing the transcripts or promoting their degradation, and, as expected, a number of RNA binding proteins are associated with the ribosomal proteins to regulate expression at the translational level. Now, another aspect of understanding regulation at the post-transcriptional level is that a number of RNA binding proteins are involved in major classes of human diseases such as cancer, muscular atrophies and neurological disorders. In the network diagram shown here the major classes of diseases are shown in orange, while the subtypes into which these diseases can be classified are shown in blue, and the specific RNA binding proteins which have been documented to be implicated in these disorders are shown in green. 
Now, let us take a specific example of a muscular atrophy called myotonic dystrophy. In this particular disorder a CUG repeat binding protein called CUGBP1 binds to the 3' untranslated region of the DM protein kinase transcript, and because of the sequestration of this CUG repeat binding protein onto the trinucleotide repeat expansion in the 3' untranslated region this particular disease phenotype is observed. Another example of misregulation of an RNA binding protein happens in OPMD, which is another kind of muscular atrophy; in this disease there is a GCG repeat expansion in exon 1 of an RNA binding protein, a poly(A) binding protein called PABPN1. Another example, which is heavily documented in the literature, is a brain-specific splicing factor called Nova, whose misexpression is known to cause a disease called POMA, which is a subtype of neurological disease. So, what I am trying to say here is that if there is a change in expression of either an RNA binding protein or any of its targets, it can be associated with a disease phenotype, and what all these studies basically suggest is that it is not just the effect of a single gene or protein; it is rather a combination of different sets of genes and proteins which contributes to a disease phenotype. Now, while this observation is not very new, and we have known that this is common for a number of complex diseases, what we have still not been able to achieve is a cure for these complex diseases. 
Now, let me introduce to you the traditional notion of how drug discovery usually happens in most places. Let us represent the healthy state of an individual with a network of interactions, shown in the figure on the left. A disease state can be studied as a perturbation of such a network, where some of the nodes are not properly connected compared to the healthy state. According to the idea of Paul Ehrlich and others, the magic bullet approach suggests that the conversion of the disease state to the healthy state should involve, most likely, one particular drug which is non-promiscuous and specific to a particular drug target, so that you have minimal off-target effects. Now, often such a magic bullet approach can yield only a semi-recovery from the disease state. What network pharmacology or network medicine approaches are trying to arrive at is to use a combination of perhaps promiscuous drugs, which do not cause lethal side effects and can still convert the disease state into a healthy state as close as possible to the original one. Now, how would you achieve such an approach? To understand this particular idea, let us look at a network representation of how the different entities in the cell are interacting. In the figure to the right you can see a number of drugs, each of which can perturb different nodes. Now, all of these nodes are interconnected, because we are looking at the cellular context: there are protein-protein interactions, there are metabolic interactions and there are also regulatory interactions. The perturbation of one node cannot be seen in isolation; it has to be seen in the context of the other perturbations. A combination of these perturbations is going to yield a phenotype which we hope can treat the complex disease; that is the concept behind this idea of network pharmacology. 
Now, how do we achieve such a bigger goal? Usually, when you have such a complex problem, a complex phenotype, you have to put together data such as knowledge of the current human metabolic network, knowledge of the transcriptional network, knowledge of the protein-protein interaction network and knowledge of the post-transcriptional network. Together with the current knowledge of the drugs, their targets and the target pathways, one can start looking at how these perturbations can be studied in the context of specific diseases, and which particular drugs can be used as potential new therapies for existing diseases. An alternative set of approaches being used in the context of drug discovery is that, if you have a drug-target network for all the approved drugs in the literature, one can start understanding which drugs share targets. Can we use the drugs which share targets as alternatives to existing drugs? If resistance is acquired to a particular drug, can you complement the current drug with another drug which has the same set of targets? Or one can start studying drug-drug relations: if two drugs share targets, what are the profiles of the two drugs which are linked? Are they similar in structure, are they similar in their final phenotypes, and what are the common principles of the drugs which are connected to each other? Likewise, one can also study disease-disease associations by linking any pair of diseases for which the same drugs are used. Likewise, one can study the target-target network to construct a disease-disease association network. So, these are some of the directions in which the field is moving to understand, or even to repurpose, existing drugs for novel therapies. 
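The idea of linking drugs that share targets can be sketched as follows. This is a minimal illustration under stated assumptions: the drug names (drug_A to drug_D) and target names (T1 to T4) are hypothetical placeholders, not entries from any real drug-target database, and the two functions are one simple way of building a drug-drug network and finding candidate alternatives with an identical target set.

```python
from itertools import combinations

# Hypothetical drug-target network: each drug maps to the set of its targets.
drug_targets = {
    "drug_A": {"T1", "T2"},
    "drug_B": {"T2", "T3"},
    "drug_C": {"T4"},
    "drug_D": {"T1", "T2"},
}

def drug_drug_links(drug_targets):
    """Link every pair of drugs that share at least one target.

    Returns a dict mapping (drug1, drug2) to the set of shared targets.
    """
    links = {}
    for d1, d2 in combinations(sorted(drug_targets), 2):
        shared = drug_targets[d1] & drug_targets[d2]
        if shared:
            links[(d1, d2)] = shared
    return links

def same_target_alternatives(drug_targets, drug):
    """Drugs with exactly the same target set as the given drug."""
    wanted = drug_targets[drug]
    return sorted(d for d, t in drug_targets.items() if d != drug and t == wanted)

links = drug_drug_links(drug_targets)
for pair, shared in links.items():
    print(pair, sorted(shared))
print(same_target_alternatives(drug_targets, "drug_A"))
```

The same projection could be applied to a drug-disease table to link diseases that are treated with the same drugs, which is the disease-disease association idea mentioned above.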
So, to conclude, what we have tried to cover in the past set of slides is that network-based approaches are an essential and powerful paradigm for dissecting the design principles of biological systems. They play an important role in biomarker identification and even in the elucidation of key players responsible for a disease phenotype. Systems medicine can lead to the development of personalized medical treatment options in years to come, with developments in high-throughput sequencing and other technologies which can yield a lot of data in a very short time, so that clinical relevance can be achieved by applying these network-based approaches in clinical settings. Thank you very much, Sarath, for giving a very nice talk, covering some of the basic concepts as well as illustrating how systems-level network studies can be employed for addressing various types of problems, including in drug discovery as well as in pharmacology, and it could extend even to biomarker discovery and many other applications. So thank you very much. Thank you. Now, let us try to integrate the omics approaches with systems biology. Genome sequencing projects in the genomics era, from the 1990s to 2000, accelerated omics research; then from 2000 onwards the proteomics field also accelerated, and new methodologies and new tools came into place for studying the proteome. The data derived from genomics, transcriptomics, proteomics, metabolomics and other omics approaches have now brought about the integration of data sets in the systems biology field. 
A systems study requires obtaining data sets from different approaches and analyzing them. For example, as shown in the slide, genome-wide data sets can be derived at the genome level, at the transcriptome level by looking at the expression of different transcripts, or at the proteome level by looking at different types of protein interactions. These data sets can be stored in clinical databases, and data can also be mined from the literature by manual curation. Integration of these orthogonal data sets can then be used for validating the networks and identifying therapeutic targets, which can be taken further for experimental validation. Studying systems cannot be done in isolation in individual labs; it requires different expertise and collaboration between scientists from different disciplines: biology, physics, engineering, chemistry, computer science, mathematics, medicine, statistics and many more. The eventual aim of the current systems biology field is to employ the omics-level information obtained from the genome, transcriptome and proteome, derive that information at the systems level, integrate it into quantitative models, and then use them for understanding physiology and apply that in medicine. So, this flow from omics to physiology can be well maintained by employing systems biology tools. What are the challenges of systems biology? 
Systems biology is extremely challenging. The emphasis is on understanding a system, and understanding the dynamics of even the simplest biological networks requires not only an understanding of biology but also modeling and simulation. A disintegrative study can be used for going from cells to proteins to genes, and an integrative study can be used for putting these pieces back together again, and then for understanding, predicting and controlling functional biological processes. All of these are very challenging, but they are currently being addressed by applying various systems-level tools. So, how are proteomics and systems biology integrated? Proteomics, as we have studied, is useful for understanding the complex signaling networks in biological systems, and it is an indispensable tool for systems biology. The global analysis of the proteome is important; however, there are many limitations: in each experiment only thousands of proteins can be studied. Therefore, new approaches and systems-level investigations and predictions are required. A systems-level investigation is required to study the complex, dynamic structures and interactions within biological systems, whether at the cellular level or at the organism level, which are ultimately responsible for their function and behavior. So, in summary, today we discussed how, in the omics era, technological advancements in genomics, proteomics and metabolomics have generated large-scale data sets in all aspects of biology. These large data sets have motivated computational biologists and systems approaches with the objective of understanding the biological system as a whole; while proteomics continues to generate quality data at the proteome level, the systems biology approach characterizes and predicts the dynamic properties of biological networks. Now, in the next module we will focus in more detail on different types of proteomic technologies. Thank you.