Center for High-Throughput Minimally-Invasive Radiation Biodosimetry
Bioinformatics Core
Core Leader: Michael Bittner, Translational Genomics Research Institute
Overview
The goal of analyzing the various RNA expression data sets (Core B and Project 2) and metabolomic data sets (Project 3) is to find expression changes that would ideally identify an irradiated individual, and the dose of radiation received, through analysis of a small amount of blood. To achieve these ends, methods of analysis are being developed that can identify genes with patterns that are particularly useful for these purposes.
The overall functions of this Core are to:
Provide general statistical support for Consortium members, in terms of experimental design and data analysis.
Apply data viewing and analytical distribution testing to identify univariate trends among potential genomic or metabolomic biomarkers in cells as they react to radiation, to find independently informative, dose-dependent biomarkers that may be used for biodosimetry.
Apply multivariate analysis to identify groups of genomic or metabolomic biomarkers that act collaboratively to carry out the cellular response to radiation.
Apply contextual analysis to develop genomic or metabolomic biomarker panels that are specific for particular cell types, dose scenarios, or population subgroups.
Use the products of the various analyses done across the Consortium to identify measurements that are directly coupled to the cell’s response to radiation and are reliably altered at the relevant doses, so that they can be used in the products to be implemented.
Provide a common, secure data hosting facility so that the considerable amount of data arising from the use of shared biological materials can be readily exchanged.
Contextual Analysis
We have developed the ExPattern (ExP) software to analyze genomic and metabolomic data in conjunction with clinical data such as radiation responsiveness, patient survival and patient reactivity to a stimulus. It is based on a novel algorithm, cellular context mining, which identifies molecularly homogeneous patterns with strong clinical association. The program is written in Java to support multiplatform use, including Windows, Unix/Linux and Mac OS. Currently, it can handle data sets with more than 20,000 molecular and clinical markers, sorting through complicated molecular patterns to identify those that are statistically significant in both their clinical association and their gene pattern. Analytical results are presented to users as a list of molecular patterns, the relations among those patterns in graphical format, and the regulatory relations among molecular and clinical markers.
This approach to complex data analysis is based on a model of the decision making process of cells. When biological regulation is altered either normally or in a way that results in pathology, cascades of consequences driven by the underlying regulatory system logic result. For example, exposure to ionizing radiation, the mutational activation of the ras oncogene, or reduction in p53 activity due to mdm2 gene amplification all produce or allow particular changes in the levels of transcripts, proteins and signaling-related protein modification in each of the samples where they are present. The set of alterations in genes and proteins influenced by the new regulatory setting provides a pattern that can be recognized and exploited using tools that discern these patterns.
The analysis proceeds in three steps, as illustrated in Figure 1. First, all data are translated into either discrete-valued or categorical variables, which allows all the variables to be analyzed in the same fashion. For simplicity, the figure shows all data as being in two states or categories; however, any small number of states would be suitable, for instance, a range of exposure doses of ionizing radiation. Next, the matrix of samples and feature (gene) behavior for those samples is examined to find sets of samples and features where each feature behaves in a very homogeneous fashion. The probability of a given feature displaying the observed homogeneity over a fixed set of samples of any size can be estimated as a hypergeometric distribution likelihood, and improbable distributions can be aligned to find sample/feature subsets where there are blocks of homogeneous behavior. The improbability of the observed block is then estimated by an ad hoc statistic: the observed frequency of homogeneous blocks of x genes and y samples in simulations that randomly produce the same distribution of feature behavior. Multiple features showing coherent behavior over multiple samples become improbable very quickly. Because it considers the overall improbability of many features behaving homogeneously over a number of samples, this method is quite sensitive to such indications of collaborative action. Behavior that is present in only a small fraction of the samples surveyed, and that is thus too dilute to be detected by more conventional approaches such as linear discriminant analysis, can be easily recognized.
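The steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ExPattern implementation: the function names, the two-state threshold, and the shuffle-based simulation are all hypothetical choices made for the example. It shows a two-state discretization, the hypergeometric tail probability of a feature being homogeneous over a chosen sample subset, and a simulated frequency with which every feature in a block is homogeneous when each feature's states are randomly reassigned to samples.

```python
import math
import random


def discretize(values, threshold):
    """Map continuous measurements to two states: 1 if above threshold, else 0.
    (A hypothetical two-state scheme; more states could be used.)"""
    return [1 if v > threshold else 0 for v in values]


def hypergeom_tail(N, K, n, k):
    """P(X >= k) when n samples are drawn without replacement from N samples,
    K of which are in the state of interest."""
    total = math.comb(N, n)
    upper = min(n, K)
    favorable = sum(
        math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favorable / total


def simulated_block_frequency(matrix, sample_idx, trials=2000, seed=0):
    """Fraction of random trials in which every feature (row) is homogeneous
    over the chosen samples, after shuffling each row independently.
    Shuffling keeps each feature's overall distribution of states fixed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        block_is_homogeneous = True
        for row in matrix:
            shuffled = row[:]
            rng.shuffle(shuffled)
            if len({shuffled[j] for j in sample_idx}) != 1:
                block_is_homogeneous = False
                break
        if block_is_homogeneous:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical case: 31 samples, a feature "on" in 8 of them,
    # and a subset of 6 samples in which the feature is "on" in all 6.
    print(hypergeom_tail(31, 8, 6, 6))
```

Multiplying such small per-feature probabilities across several features is what makes coherent blocks become improbable so quickly; the simulation offers a check that does not assume the features are independent of the way the block was found.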
As an example, in our analysis of a set of 31 melanomas, a relatively large block of samples showing differential behavior is discovered by univariate analysis (Figure 2), while a smaller block of samples with more genes involved is recognized by multivariate contextual analysis (Figure 3). This method is also able to detect the kinds of gene behavior common in many biological cascades. For instance, it is common for the same gene to be regulated by different upstream influences in different samples or different individuals. In the case of genotoxic signaling, there are many well-characterized examples of this tendency: over a thousand published articles examine responses that can be mediated either through p53 or, in the absence of functional p53, through other regulatory mechanisms.
While the mathematics behind the context analysis model are somewhat complex, and the computations required to fully explore the combinatorial space involved are quite large, the meaning of the observed blocks of coherent behavior in samples is extremely intuitive from both the biological and decision process perspective. Any set of normal, differentiated tissues exhibits many such blocks that partition the samples according to the consequences of the different regulatory settings in play in each tissue type, making this analysis approach well suited to identification of radiation dose-related gene expression signatures in the tissue of peripheral blood.
Over the past year, the program has been expanded in ways that improve presentation to the researcher, allow more explicit searches for specific types of patterns, and allow more direct connection to databases where functional characterization of the genes is available. To help the investigator interpret the results, the program can connect to various public biological data repositories such as NCBI/Entrez and PubMed. It also supports connections to a popular Gene Ontology mining tool, GoMiner, to help identify subsets of genes within a given pattern that can be associated with particular cellular functions or locations.
Research Progress
Work over the previous year has focused on examining the reliability of the fourteen genes selected from the first array experiments as likely to be useful reporters of radiation exposure in lymphocytes. The set of genes chosen is listed below. In addition to the performance characteristics (signal to noise, sufficient abundance to be easily detected) these genes were also chosen on the basis of their functional roles, attempting to identify genes that would not be very likely to show altered regulation in lymphocytes in the absence of radiation exposure. As neither widespread apoptosis nor widespread DNA repair activity in the blood is likely to be observed without other obvious symptoms except in the case of exposure, the list is heavily slanted toward indicators of apoptosis and DNA repair. A few genes with unclear functions that have not been seen to vary in a wide variety of other stress or disease states but that do show strong response to radiation were also chosen as candidates for dosimetry.
Table 1. Characteristics of candidate dosimeter genes
Gene Symbol | Gene Name | Role | Function
AXL | AXL receptor tyrosine kinase | proliferation & apoptosis | RTK signaling, can be protective against apoptosis
MYC | v-myc myelocytomatosis viral oncogene homolog | proliferation & apoptosis | transcription factor
ASCC3 | activating signal cointegrator 1 complex subunit 3 | proliferation & apoptosis | transcriptional coactivator of SRF, AP-1, and NF-KB
CDKN1A | cyclin-dependent kinase inhibitor 1A | cell cycle (replication control), response to DNA damage | cyclin-dependent kinase inhibitor, PCNA regulator
DDB2 | damage-specific DNA binding protein 2 | response to DNA damage | component of nucleotide excision repair complex
PCNA | DNA polymerase delta processivity factor | DNA replication & repair | RAD6-dependent DNA repair pathway
AEN | apoptosis enhancing nuclease | apoptosis | p53-induced nuclear nuclease
BAX | BCL2-associated X protein | apoptosis | apoptotic activator, mitochondrial rupture
CD70 | tumor necrosis factor (ligand) superfamily, member 7 | apoptosis | cytokine signaling through TNFRSF7/CD27 (receptor)
TNFRSF10B | tumor necrosis factor receptor superfamily, member 10b | apoptosis | TRAIL-sensitive receptor with death domain
FDXR | ferredoxin reductase | apoptosis via oxidative stress | mitochondrial electron transport initiator for P450s
TRIM22 | tripartite motif-containing 22 | interferon response | E3 ubiquitin ligase
ASTN2 | astrotactin 2 | cell guidance? | cell guidance receptor?
MAMDC4 | MAM domain containing 4 | unknown | component of apical endosomal compartment?
The response profiles of these genes have been examined in a number of ways. The first test was simply to determine how these genes behave in the set of 76 samples that have been microarray profiled to date. These comprise 31 samples from volunteers that were subjected to ex vivo irradiation, 24 samples from volunteers with a history of smoking that were subjected to ex vivo irradiation, and 21 samples from patients undergoing total body irradiation. In 75 of these samples, all the genes showed significant expression change upon irradiation with doses of 0.5 Gy or more. In one patient, there appeared to be very little signal produced in the assay, presumably due to anemia, yet even in that case three genes showed a clear response. Given that we see responses in 76 of 76 trials, we can use the binomial distribution to estimate the range of underlying incidences that accounts for 95% of such 76-of-76 outcomes. A graph showing the distribution of incidences and the lower incidence bound for 95% of the 76/76 events, 96.2%, is shown in Figure 1. This indicates that we can be 95% sure that these genes would respond to radiation in at least 96.2% of the population.
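The lower bound quoted for 76 responses in 76 trials can be checked with a short calculation. The sketch below is an illustration under assumptions, since the exact procedure used for the figure is not stated: the function name is hypothetical, and two common conventions are shown. Normalizing the binomial likelihood p^n over all incidences p (equivalent to a uniform prior) gives a cumulative distribution of p^(n+1), which yields about 96.2% for n = 76; the exact one-sided binomial (Clopper-Pearson) bound solves p^n = 0.05 and gives about 96.1%.

```python
def lower_bound_all_successes(n, confidence=0.95, uniform_prior=True):
    """Lower bound on the incidence p when all n trials were successes.

    uniform_prior=True : normalized likelihood p**n, whose cumulative
                         distribution is p**(n + 1); solve p**(n + 1) = alpha.
    uniform_prior=False: exact one-sided binomial bound; solve p**n = alpha.
    """
    alpha = 1.0 - confidence
    exponent = n + 1 if uniform_prior else n
    return alpha ** (1.0 / exponent)


if __name__ == "__main__":
    print(lower_bound_all_successes(76))                       # normalized likelihood
    print(lower_bound_all_successes(76, uniform_prior=False))  # exact binomial
```

Both conventions agree to within about 0.05 percentage points here, so the conclusion does not depend on which one is used.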
Figure 1. Histogram of incidence probabilities producing 76 events in 76 draws.
Another important consideration is whether the response of these genes can be detected using assays that are much more cost-effective than microarrays. As an example of the comparative levels of change and variance, two independent sets of samples gathered and irradiated at Dr. Amundson’s lab were tested by either qNPA or Agilent chip. The qNPA set had 6 samples, each tested in triplicate wells at 0, 2 and 8 Gy of irradiation. The Agilent set had 5 samples, each tested on one Agilent chip. The log2 ratios of the 2 Gy and 8 Gy intensities to the 0 Gy intensity were calculated for each measurement, and the average and standard deviation of the log2 ratio change were then calculated over all measurements on each platform; these are shown in Table 2. The levels of change and variance seen in the two assays are quite similar, even though the original intensity measurements are quite different. The largest difference between the sets is for AXL, for which the qNPA assay is at present significantly less sensitive. Further optimization of detectors for this gene is ongoing.
Table 2. Comparison of qNPA and Agilent microarray measurements in ex vivo irradiated samples (mean and standard deviation of log2 change relative to 0 Gy)
Gene | 2 Gy qNPA Mean | 2 Gy qNPA StDev | 2 Gy Agilent 44K Mean | 2 Gy Agilent 44K StDev | 8 Gy qNPA Mean | 8 Gy qNPA StDev | 8 Gy Agilent 44K Mean | 8 Gy Agilent 44K StDev
AEN | 3.08 | 0.38 | 3.89 | 0.22 | 3.63 | 0.81 | 4.01 | 0.28
FDXR | 3.03 | 0.57 | 3.73 | 0.23 | 4.30 | 0.73 | 4.46 | 0.11
DDB2 | 3.06 | 0.41 | 2.77 | 0.20 | 3.89 | 0.68 | 3.12 | 0.29
BAX | 2.75 | 0.57 | 2.20 | 0.20 | 2.96 | 0.81 | 2.02 | 0.11
TNFSF7 (CD70) | 2.16 | 1.43 | 2.95 | 0.74 | 3.03 | 1.33 | 3.41 | 0.69
AXL | 0.51 | 0.22 | 2.72 | 1.21 | 0.93 | 0.28 | 3.14 | 1.35
ASTN2 | 1.31 | 0.59 | 2.96 | 0.56 | 1.68 | 0.70 | 3.68 | 0.72
TRIM22 | 1.71 | 0.37 | 1.58 | 0.22 | 1.97 | 0.73 | 1.44 | 0.26
CDKN1A | 1.86 | 0.53 | 2.20 | 0.10 | 3.12 | 0.47 | 2.75 | 0.44
PCNA | 2.15 | 0.57 | 2.24 | 0.34 | 3.06 | 0.85 | 2.77 | 0.38
MAMDC4 | 1.17 | 0.48 | 1.90 | 0.36 | 1.85 | 0.66 | 2.37 | 0.58
TNFRSF10B | 2.37 | 0.45 | 1.67 | 0.32 | 2.76 | 0.74 | 1.84 | 0.25
ASCC3 | 0.84 | 0.27 | 2.06 | 0.09 | 1.09 | 0.43 | 2.48 | 0.15
MYC | -1.33 | 0.16 | -1.22 | 0.41 | -1.67 | 0.33 | -2.18 | 0.39
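The summary statistics in Table 2 follow a simple recipe that can be sketched as below. This is an illustrative sketch, not the Core's analysis code: the function names are hypothetical, and the intensities in the usage example are made up. The 2 Gy mean values, however, are copied from Table 2 (CD70 is the gene named "tumor necrosis factor (ligand) superfamily, member 7" in Table 1), and the second function confirms that AXL shows the largest gap between the two platforms at 2 Gy.

```python
import math
import statistics


def log2_ratio_summary(control, treated):
    """Mean and sample standard deviation of log2(treated / control),
    pairing each irradiated measurement with its 0 Gy measurement."""
    ratios = [math.log2(t / c) for c, t in zip(control, treated)]
    return statistics.mean(ratios), statistics.stdev(ratios)


# Mean log2 change at 2 Gy from Table 2: (qNPA, Agilent 44K).
TWO_GY_MEANS = {
    "AEN": (3.08, 3.89), "FDXR": (3.03, 3.73), "DDB2": (3.06, 2.77),
    "BAX": (2.75, 2.20), "CD70": (2.16, 2.95), "AXL": (0.51, 2.72),
    "ASTN2": (1.31, 2.96), "TRIM22": (1.71, 1.58), "CDKN1A": (1.86, 2.20),
    "PCNA": (2.15, 2.24), "MAMDC4": (1.17, 1.90), "TNFRSF10B": (2.37, 1.67),
    "ASCC3": (0.84, 2.06), "MYC": (-1.33, -1.22),
}


def largest_platform_gap(means):
    """Gene with the largest absolute difference between the two platforms."""
    return max(means, key=lambda gene: abs(means[gene][0] - means[gene][1]))


if __name__ == "__main__":
    # Made-up intensities for three samples at 0 Gy and at one dose.
    print(log2_ratio_summary([100.0, 100.0, 100.0], [400.0, 800.0, 1600.0]))
    print(largest_platform_gap(TWO_GY_MEANS))
```

Working in log2 ratios is what makes the two platforms comparable despite their very different raw intensity scales, since each measurement is expressed relative to its own 0 Gy baseline.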
At this point, we can confidently move ahead with efforts to implement the qNPA test in an automated, high-throughput mode, while improving the accuracy with which we can estimate dose from the magnitude of the expression changes. This will become easier with the introduction of the optimized normalizer genes for lymphocytes that have been chosen using similar methods.
References
Brengues, M., Paap, B., Bittner, M., Amundson, S., Seligmann, B., Korn, R., Lenigk, R. and Zenhausern F. (2009). Biodosimetry on small blood volume using gene expression assay. Health Physics (in press).
Amundson, S.A., K.T. Do, L.C. Vinikoor, R.A. Lee, C.A. Koch-Paiz, J. Ahn, M. Reimers, Y. Chen, D.A. Scudiero, J.N. Weinstein, J.M. Trent, M.L. Bittner, P.S. Meltzer and A.J. Fornace, Jr. Integrating global gene expression and radiation survival parameters across the 60 cell lines of the National Cancer Institute Anticancer Drug Screen. Cancer Res 68: 415-424, 2008.
Kim, S., Sen, I. and Bittner, M. Mining molecular contexts of cancer via in-silico conditioning. Proc LSS Comput Syst Bioinform Conf. 6:169-179, 2007.
Collaborating Institutions
Translational Genomics Research Institute, Phoenix, AZ
Department of Biostatistics, Columbia University
Website updated 09/18/2009