Center for High-Throughput
Minimally-Invasive Radiation Biodosimetry

Bioinformatics Core

Core Leader: Michael Bittner, Translational Genomics Research Institute

Overview

The goal for the analysis of the various RNA expression data sets (Core B and Project 2) or metabolomic data sets (Project 3) is to find expression changes that would ideally identify an irradiated individual, and the dose of radiation received, from analysis of a small amount of blood. To achieve these ends, methods of analysis are being developed that can identify genes with patterns that are particularly useful for these purposes.

The overall functions of this Core are to:

  1. Provide general statistical support for Consortium members, in terms of experimental design and data analysis.

  2. Apply data viewing and analytical distribution testing to identify univariate trends among potential genomic or metabolomic biomarkers in cells as they react to radiation, to find independently informative, dose-dependent biomarkers that may be used for biodosimetry.

  3. Apply multivariate analysis to identify groups of genomic or metabolomic biomarkers that act collaboratively to carry out the cellular response to radiation.

  4. Apply contextual analysis to develop genomic or metabolomic biomarker panels that are specific for particular cell types, dose scenarios, or population subgroups.

  5. Use the products of the various analyses done across the Consortium to identify measurements that are directly coupled to the cell’s response to radiation and are reliably altered at the relevant doses, so that they can be used in the products to be implemented.

  6. Provide a common secure data hosting facility so that the considerable amount of data to be shared due to the use of shared biological materials can be readily exchanged.


Contextual Analysis

We have developed the ExPattern (ExP) software to analyze genomic and metabolomic data in conjunction with clinical data such as radiation responsiveness, patient survival and patient reactivity to a stimulus. It is based on a novel algorithm, cellular context mining, to identify molecularly homogeneous patterns with strong clinical association. The program is written in Java to support multiplatform use including Windows, Unix/Linux and Mac OS. Currently, it can handle data sets with more than 20,000 molecular and clinical markers to sort through complicated molecular patterns and identify those with particular clinical and gene pattern statistical significance. Analytical results are presented to users as a list of molecular patterns, the relations among those patterns in graphical format, and the regulatory relations among molecular and clinical markers.

This approach to complex data analysis is based on a model of the decision making process of cells. When biological regulation is altered either normally or in a way that results in pathology, cascades of consequences driven by the underlying regulatory system logic result. For example, exposure to ionizing radiation, the mutational activation of the ras oncogene, or reduction in p53 activity due to mdm2 gene amplification all produce or allow particular changes in the levels of transcripts, proteins and signaling-related protein modification in each of the samples where they are present. The set of alterations in genes and proteins influenced by the new regulatory setting provides a pattern that can be recognized and exploited using tools that discern these patterns.

The analysis proceeds in three steps, as illustrated in Figure 1. First, all data are translated into either discrete-valued or categorical variables, which allows all the variables to be analyzed in the same fashion. For simplicity, the figure shows all data as being in two states or categories; however, any small number of states would be suitable, for instance a range of exposure doses of ionizing radiation. Next, the matrix of samples and feature (gene) behavior for those samples is examined to find sets of samples and features where each feature behaves in a very homogeneous fashion. The probability of a given feature displaying the observed homogeneity over a fixed set of samples of any size can be estimated as a hypergeometric distribution likelihood, and improbable distributions can be aligned to find sample/feature subsets where there are blocks of homogeneous behavior. Finally, the improbability of the observed block is estimated with an ad hoc statistic: the observed frequency of homogeneous blocks of x genes and y samples in simulations that randomly produce the same distribution of feature behavior. Multiple features showing coherent behavior over multiple samples become improbable very quickly. Considering the overall improbability of many features behaving homogeneously over a number of samples makes this method quite sensitive to such indications of collaborative action. Behavior that is present in only a small fraction of the samples surveyed, and that is thus too dilute to be detected by more conventional approaches such as linear discriminant analysis, can be easily recognized.
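The first two steps can be illustrated with a minimal Python sketch. It is not the ExPattern implementation: the function names, the two-state threshold rule, and the assumption that features are independent when their probabilities are multiplied are all illustrative choices, and the 31-sample example uses made-up counts.

```python
from math import comb

def discretize(values, threshold):
    """Map continuous measurements to two states: 1 if above threshold, else 0.
    (Assumed rule; any small number of states could be used instead.)"""
    return [1 if v > threshold else 0 for v in values]

def hypergeom_pmf(k, N, K, n):
    """Probability that exactly k of n samples drawn without replacement are in
    state 1, when K of the N samples overall are in state 1."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def prob_at_least(k, N, K, n):
    """Probability that at least k of the n drawn samples are in state 1."""
    return sum(hypergeom_pmf(j, N, K, n) for j in range(k, min(n, K) + 1))

def joint_improbability(feature_states, subset):
    """Product of per-feature tail probabilities over a fixed sample subset.
    Assumes features are independent, which is a simplification."""
    N = len(feature_states[0])
    n = len(subset)
    p = 1.0
    for states in feature_states:
        K = sum(states)                      # samples in state 1 overall
        k = sum(states[i] for i in subset)   # samples in state 1 within the subset
        p *= prob_at_least(k, N, K, n)
    return p

# Hypothetical example: 31 samples, a feature "on" in 10 of them, and a
# candidate block of 8 samples in which the feature is always "on".
states = [1] * 10 + [0] * 21
block = list(range(8))
p_one = prob_at_least(8, N=31, K=10, n=8)           # about 5.7e-06
p_two = joint_improbability([states, states], block)  # two such features
```

Under these assumptions a single feature that is homogeneous over 8 of 31 samples is already unlikely by chance, and each additional coherent feature multiplies the probability down further, which is why small blocks can still stand out.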

As an example, in our analysis of a set of 31 melanomas, a relatively large block of samples showing differential behavior is discovered by univariate analysis (Figure 2), while a smaller block of samples with more genes involved is recognized by multivariate contextual analysis (Figure 3). This method is also able to detect the kinds of gene behavior common in many biological cascades. For instance, it is common for the same gene to be regulated by different upstream influences in different samples or different individuals. In the case of genotoxic signaling, there are many well-characterized examples of this tendency: over a thousand published articles examine responses that can be mediated either through p53, or through other regulatory mechanisms in the absence of functional p53.

While the mathematics behind the context analysis model are somewhat complex, and the computations required to fully explore the combinatorial space involved are quite large, the meaning of the observed blocks of coherent behavior in samples is extremely intuitive from both the biological and decision process perspective. Any set of normal, differentiated tissues exhibits many such blocks that partition the samples according to the consequences of the different regulatory settings in play in each tissue type, making this analysis approach well suited to identification of radiation dose-related gene expression signatures in the tissue of peripheral blood.

Over the past year, the program has been expanded in ways that improve presentation to the researcher, allow more explicit searches for specific types of patterns, and allow more direct connection to databases where functional characterization of the genes is available. To help the investigator interpret the results, the program can connect to various public biological data repositories such as NCBI/Entrez and PubMed. It also supports connections to a popular Gene Ontology mining tool, GOMiner, to help identify subsets of genes within a given pattern that can be associated with particular cellular functions or locations.


Research Progress

Work over the previous year has focused on examining the reliability of the fourteen genes selected from the first array experiments as likely to be useful reporters of radiation exposure in lymphocytes. The set of genes chosen is listed in Table 1. In addition to their performance characteristics (signal-to-noise ratio, sufficient abundance to be easily detected), these genes were also chosen on the basis of their functional roles, in an attempt to identify genes that would be unlikely to show altered regulation in lymphocytes in the absence of radiation exposure. As neither widespread apoptosis nor widespread DNA repair activity in the blood is likely to be observed without other obvious symptoms except in the case of exposure, the list is heavily slanted toward indicators of apoptosis and DNA repair. A few genes with unclear functions that have not been seen to vary in a wide variety of other stress or disease states, but that do show strong response to radiation, were also chosen as candidates for dosimetry.

Table 1. Characteristics of candidate dosimeter genes

Gene Symbol | Gene Name | Role | Function
AXL | AXL receptor tyrosine kinase | proliferation & apoptosis | RTK signaling, can be protective against apoptosis
MYC | v-myc myelocytomatosis viral oncogene homolog | proliferation & apoptosis | transcription factor
ASCC3 | activating signal cointegrator 1 complex subunit 3 | proliferation & apoptosis | transcriptional coactivator of SRF, AP-1, and NF-κB
CDKN1A | cyclin-dependent kinase inhibitor 1A | cell cycle (replication control), response to DNA damage | cyclin-dependent kinase inhibitor, PCNA regulator
DDB2 | damage-specific DNA binding protein 2 | response to DNA damage | component of nucleotide excision repair complex
PCNA | DNA polymerase delta processivity factor | DNA replication & repair | RAD6-dependent DNA repair pathway
AEN | apoptosis enhancing nuclease | apoptosis | p53-induced nuclear nuclease
BAX | BCL2-associated X protein | apoptosis | apoptotic activator, mitochondrial rupture
CD70 | tumor necrosis factor (ligand) superfamily, member 7 | apoptosis | cytokine signaling through TNFRSF7/CD27 (receptor)
TNFRSF10B | tumor necrosis factor receptor superfamily, member 10b | apoptosis | TRAIL-sensitive receptor with death domain
FDXR | ferredoxin reductase | apoptosis via oxidative stress | mitochondrial electron transport initiator for P450s
TRIM22 | tripartite motif-containing 22 | interferon response | E3 ubiquitin ligase
ASTN2 | astrotactin 2 | cell guidance? | cell guidance receptor?
MAMDC4 | MAM domain containing 4 | unknown | component of apical endosomal compartment?



Figure 1. Histogram of incidence probabilities producing 76 events in 76 draws.

The response profiles of these genes have been examined in a number of ways. The first test was simply to determine how these genes behave in the set of 76 samples that have been microarray profiled to date. These comprise 31 samples from volunteers that were subjected to ex vivo irradiation, 24 samples from volunteers with a history of smoking that were subjected to ex vivo irradiation, and samples from 21 patients undergoing total body irradiation. In 75 of these samples, all the genes showed significant expression change upon irradiation with doses of 0.5 Gy or more. In one patient, very little signal was produced in the assay, presumably due to anemia, yet even in that case three genes showed a clear response. Given that we see responses in 76 of 76 trials, we can use the binomial distribution to estimate the range of incidences that would generate 95% of such outcomes. A graph showing the distribution of incidences and the lower incidence bound for 95% of the 76/76 events, 96.2%, is shown in Figure 1. This indicates that we can be 95% confident that these genes will respond to radiation in at least 96.2% of the population.
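The 96.2% bound can be reproduced with a short calculation, sketched below in Python. The sketch assumes the bound was obtained by normalizing the binomial likelihood p^n over all incidences p between 0 and 1 (equivalent to a uniform prior), so that the lower bound L satisfies 1 - L^(n+1) = 0.95; the text does not state the exact procedure, and the function name is illustrative. The exact one-sided bound without that normalization, L = 0.05^(1/n), is included for comparison.

```python
def lower_bound_all_successes(n, confidence=0.95, uniform_prior=True):
    """Lower bound on the incidence p when all n trials responded.

    uniform_prior=True : normalized likelihood (n + 1) * p**n on [0, 1];
                         solves 1 - L**(n + 1) = confidence.
    uniform_prior=False: exact one-sided binomial bound;
                         solves L**n = 1 - confidence.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of trials")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    alpha = 1.0 - confidence
    exponent = n + 1 if uniform_prior else n
    return alpha ** (1.0 / exponent)

# 76 responses in 76 samples, as in the text.
bound_normalized = lower_bound_all_successes(76)                   # about 0.962
bound_exact = lower_bound_all_successes(76, uniform_prior=False)   # about 0.961
```

Both variants give a lower bound just above 96%, so the conclusion does not depend much on which convention is assumed.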

Another important consideration is whether the response of these genes can be detected using assays that are much more cost-effective than microarrays. To compare levels of change and variance, two independent sets of samples gathered and irradiated at Dr. Amundson’s lab were tested by either qNPA or Agilent chip. The qNPA set had 6 samples, each tested in triplicate wells at 0, 2 and 8 Gy of irradiation. The Agilent set had 5 samples, each tested on one Agilent chip. The log2 ratios of the 2 Gy and 8 Gy intensities to the 0 Gy intensity were calculated for each measurement, and the mean and standard deviation of these log2 ratios were then calculated over all measurements on each platform; these are shown in Table 2. The levels of change and variance seen in the two assays are quite similar, even though the original intensity measurements are quite different. The largest difference between the sets is for AXL, for which the qNPA assay is at present significantly less sensitive. Further optimization of detectors for this gene is ongoing.

Table 2. Comparison of qNPA and Agilent microarray measurements in ex vivo irradiated samples (all values are log2 change relative to 0 Gy)

Gene          | 2 Gy qNPA Mean | 2 Gy qNPA StDev | 2 Gy Agilent 44K Mean | 2 Gy Agilent 44K StDev | 8 Gy qNPA Mean | 8 Gy qNPA StDev | 8 Gy Agilent 44K Mean | 8 Gy Agilent 44K StDev
AEN           |  3.08 | 0.38 |  3.89 | 0.22 |  3.63 | 0.81 |  4.01 | 0.28
FDXR          |  3.03 | 0.57 |  3.73 | 0.23 |  4.30 | 0.73 |  4.46 | 0.11
DDB2          |  3.06 | 0.41 |  2.77 | 0.20 |  3.89 | 0.68 |  3.12 | 0.29
BAX           |  2.75 | 0.57 |  2.20 | 0.20 |  2.96 | 0.81 |  2.02 | 0.11
TNFSF7 (CD70) |  2.16 | 1.43 |  2.95 | 0.74 |  3.03 | 1.33 |  3.41 | 0.69
AXL           |  0.51 | 0.22 |  2.72 | 1.21 |  0.93 | 0.28 |  3.14 | 1.35
ASTN2         |  1.31 | 0.59 |  2.96 | 0.56 |  1.68 | 0.70 |  3.68 | 0.72
TRIM22        |  1.71 | 0.37 |  1.58 | 0.22 |  1.97 | 0.73 |  1.44 | 0.26
CDKN1A        |  1.86 | 0.53 |  2.20 | 0.10 |  3.12 | 0.47 |  2.75 | 0.44
PCNA          |  2.15 | 0.57 |  2.24 | 0.34 |  3.06 | 0.85 |  2.77 | 0.38
MAMDC4        |  1.17 | 0.48 |  1.90 | 0.36 |  1.85 | 0.66 |  2.37 | 0.58
TNFRSF10B     |  2.37 | 0.45 |  1.67 | 0.32 |  2.76 | 0.74 |  1.84 | 0.25
ASCC3         |  0.84 | 0.27 |  2.06 | 0.09 |  1.09 | 0.43 |  2.48 | 0.15
MYC           | -1.33 | 0.16 | -1.22 | 0.41 | -1.67 | 0.33 | -2.18 | 0.39
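The per-gene summaries reported for each platform (mean and standard deviation of the log2 change relative to 0 Gy) can be computed along the following lines. This is a minimal Python sketch: the function names and intensity values are hypothetical, and it assumes each irradiated measurement is paired with its own 0 Gy measurement and that the sample (n - 1) standard deviation is used, neither of which is specified in the text.

```python
from math import log2
from statistics import mean, stdev

def log2_ratios(treated, control):
    """log2(treated / control) for each paired measurement."""
    if len(treated) != len(control):
        raise ValueError("treated and control must be paired one-to-one")
    return [log2(t / c) for t, c in zip(treated, control)]

def summarize_change(treated, control):
    """Mean and sample standard deviation of the log2 ratios."""
    ratios = log2_ratios(treated, control)
    return mean(ratios), stdev(ratios)

# Hypothetical intensities for one gene on one platform (not data from Table 2).
control_0gy = [100.0, 100.0, 100.0]
treated_2gy = [200.0, 400.0, 800.0]
mean_change, sd_change = summarize_change(treated_2gy, control_0gy)
# The log2 ratios here are 1, 2 and 3, so the mean is 2 and the sample SD is 1.
```

Because the summary is built from ratios, platforms with very different raw intensity scales can still be compared directly, which is the comparison the table makes.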


At this point, we can confidently move ahead with efforts to implement the qNPA test in an automated, high throughput mode, while improving the accuracy with which we can estimate dose using the magnitude of expression results. This will become easier with the introduction of the optimized normalizer genes for lymphocytes that have been chosen using similar methods.


References

  1. Brengues, M., Paap, B., Bittner, M., Amundson, S., Seligmann, B., Korn, R., Lenigk, R. and Zenhausern F. (2009). Biodosimetry on small blood volume using gene expression assay. Health Physics (in press).

  2. Amundson, S.A., Do, K.T., Vinikoor, L.C., Lee, R.A., Koch-Paiz, C.A., Ahn, J., Reimers, M., Chen, Y., Scudiero, D.A., Weinstein, J.N., Trent, J.M., Bittner, M.L., Meltzer, P.S. and Fornace, A.J., Jr. (2008). Integrating global gene expression and radiation survival parameters across the 60 cell lines of the National Cancer Institute Anticancer Drug Screen. Cancer Res 68: 415-424.

  3. Kim, S., Sen, I. and Bittner, M. (2007). Mining molecular contexts of cancer via in-silico conditioning. Proc LSS Comput Syst Bioinform Conf. 6: 169-179.


Collaborating Institutions

Translational Genomics Research Institute, Phoenix, AZ

Department of Biostatistics, Columbia University



website updated 09/18/2009
