The goal of the Bioinformatics and Pathways Core is to aid COBRE investigators in the experimental design, analysis, interpretation, and presentation of data from the systems they study. Jonathan Wren, Ph.D., serves as director and is assisted by Constantin Georgescu, Ph.D., a trained statistician.
The specific aims of the core are 1) to provide complete bioinformatics analysis and biological interpretation of high-throughput data, 2) to identify key genes and biomarkers involved in processes of interest and to predict gene functions, phenotypes, and disease relevance, and 3) to create a sustainable core facility that can be used institution-wide. The services of the core are available free of charge to COBRE investigators.
In addition to conventional statistical analyses, the Bioinformatics and Pathways Core offers novel software developed by Dr. Wren.
GAMMA
GAMMA (Global Microarray Meta-Analysis) predicts the functions of poorly annotated genes from co-expression data using mutual information metrics, helping investigators interpret the functional significance of these genes in biological studies. Conversely, it can identify candidate genes relevant to an experimental system under study (e.g., meiosis or hematopoiesis) and screen for those with no publications yet linking the gene to the system, enabling new investigators to discover novel associations.
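The general idea of predicting function from co-expression can be illustrated with a small guilt-by-association sketch. This is not GAMMA's actual implementation, whose scoring details are not described here; it assumes expression profiles are discretized into equal-width bins, scores gene pairs by mutual information, and lets the most strongly associated annotated genes "vote" on the query gene's function. All function names, the bin count, `top_k`, and the toy data are hypothetical.

```python
import math
from collections import Counter

def discretize(values, bins=3):
    """Map each expression value to an equal-width bin index 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a flat profile
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y, bins=3):
    """Mutual information (in bits) between two discretized expression profiles."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    n = len(dx)
    px, py, pxy = Counter(dx), Counter(dy), Counter(zip(dx, dy))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

def predict_functions(query, profiles, annotations, top_k=2):
    """Rank other genes by MI with the query, then pool the annotations of the top_k."""
    ranked = sorted(
        (g for g in profiles if g != query),
        key=lambda g: mutual_information(profiles[query], profiles[g]),
        reverse=True,
    )
    votes = Counter()
    for gene in ranked[:top_k]:
        for term in annotations.get(gene, []):
            votes[term] += 1
    return votes.most_common()

# Hypothetical toy data: geneX is the poorly annotated query gene.
profiles = {
    "geneX": [1, 2, 3, 4, 5, 6],
    "geneA": [2, 4, 6, 8, 10, 12],  # rises with geneX
    "geneB": [5, 1, 4, 2, 6, 3],    # weakly related
    "geneC": [6, 5, 4, 3, 2, 1],    # falls as geneX rises; MI still scores it highly
}
annotations = {"geneA": ["meiosis"], "geneB": ["hematopoiesis"], "geneC": ["meiosis"]}
print(predict_functions("geneX", profiles, annotations))
```

Mutual information is used here rather than Pearson correlation because it also captures inverse and non-linear dependencies, which is why the anti-correlated `geneC` ranks alongside `geneA`.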
IRIDESCENT
A second novel program, IRIDESCENT (Implicit Relationship IDEntification by Software Construction of an Entity-based Network from Text), is designed for large-scale analysis of PubMed abstracts. IRIDESCENT automates the identification and analysis of relationships in the published literature by detecting simple relationships between terms and estimating the relative strength of each association. The resulting large network of relationships between genes, diseases, phenotypes, chemical compounds, ontological categories, and FDA-approved drugs serves as a basis for analyzing lists (e.g., microarray data) and for identifying implied relationships. That is, given two entities that are not directly related, IRIDESCENT can identify what they have in common. Evaluating the statistical significance of these shared connections then yields a measure of how strongly the two entities are related.
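The implicit-relationship idea can be sketched as follows. This is a minimal illustration, not IRIDESCENT's actual method: it assumes each abstract has already been reduced to a set of recognized entity names, links entities that co-occur in the same abstract, and scores the overlap between two entities' neighbor sets with a one-sided hypergeometric test, a stand-in statistic chosen for the example. Function names and the toy abstracts are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def build_network(abstracts):
    """Link every pair of entities co-occurring in one abstract; weight = co-occurrence count."""
    graph = defaultdict(lambda: defaultdict(int))
    for terms in abstracts:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph

def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n marked items, N draws)."""
    total = math.comb(M, N)
    return sum(
        math.comb(n, i) * math.comb(M - n, N - i)
        for i in range(k, min(n, N) + 1)
    ) / total

def implicit_link(graph, a, c):
    """Shared neighbours of a and c, with a p-value for the size of that overlap."""
    na = set(graph[a]) - {c}
    nc = set(graph[c]) - {a}
    shared = na & nc
    universe = len(set(graph) - {a, c})  # every other entity in the network
    p = hypergeom_sf(len(shared), universe, len(na), len(nc))
    return shared, p

# Hypothetical entity sets extracted from ten abstracts.
abstracts = [
    {"geneA", "inflammation"}, {"geneA", "apoptosis"}, {"geneA", "TNF"},
    {"drugX", "inflammation"}, {"drugX", "apoptosis"}, {"drugX", "TNF"},
    {"geneB", "meiosis"}, {"geneC", "meiosis"}, {"geneB", "hematopoiesis"},
    {"geneD", "obesity"},
]
graph = build_network(abstracts)
shared, p = implicit_link(graph, "geneA", "drugX")
# geneA and drugX never co-occur, yet share 3 neighbours; p = 1/84 (about 0.012)
print(sorted(shared), p)
```

The edge weights kept in `graph` are one simple proxy for "relative strength of association"; a fuller treatment would normalize them by how often each entity appears overall.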
Genome Runner
Genome Runner is a tool for automating genome exploration. It performs annotation and enrichment analyses of user-provided genomic regions (SNPs, ChIP-seq binding sites, etc.) against more than 6,000 human epigenomic features available from the UCSC Genome Browser. It gives a detailed annotation of each genomic region in the input data and can prioritize individual regions by the total number of epigenomic features they co-localize with. It also provides p-values for statistically significant co-localizations of the input genome-wide data with the genome annotation features selected for the analysis. These p-values can be used to prioritize the epigenomic features associated with the user's data.
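Both outputs described above, per-region prioritization and per-feature enrichment p-values, can be sketched in a few lines. This is not Genome Runner's actual statistical procedure, which is not specified here; it assumes BED-style half-open intervals `(chrom, start, end)`, measures enrichment with a one-sided binomial test of user-region hits against the hit rate of a background region set, and ranks regions by how many feature sets they overlap. Function names, the feature labels `H3K27ac` and `CTCF`, and all coordinates are hypothetical.

```python
import math

def overlaps(region, feature):
    """True if two half-open (chrom, start, end) intervals overlap."""
    return region[0] == feature[0] and region[1] < feature[2] and feature[1] < region[2]

def count_hits(regions, features):
    """Number of regions overlapping at least one feature interval."""
    return sum(any(overlaps(r, f) for f in features) for r in regions)

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def enrichment(user, background, features):
    """Do user regions hit the feature more often than background regions would predict?"""
    bg_rate = count_hits(background, features) / len(background)
    k = count_hits(user, features)
    return k, binom_sf(k, len(user), bg_rate)

def prioritize(regions, feature_sets):
    """Rank regions by the number of feature sets they co-localize with (stable on ties)."""
    scores = {
        r: sum(any(overlaps(r, f) for f in fs) for fs in feature_sets.values())
        for r in regions
    }
    return sorted(regions, key=scores.get, reverse=True)

# Hypothetical feature tracks and input regions.
feature_sets = {
    "H3K27ac": [("chr1", 100, 200), ("chr1", 500, 600)],
    "CTCF": [("chr1", 140, 160)],
}
user = [("chr1", 150, 151), ("chr1", 550, 551), ("chr1", 120, 121), ("chr2", 150, 151)]
background = [("chr1", 150, 151)] + [("chr1", 1000 + 10 * i, 1001 + 10 * i) for i in range(9)]

k, p = enrichment(user, background, feature_sets["H3K27ac"])
print(k, p)  # 3 of 4 user regions hit H3K27ac vs. a 10% background rate
print(prioritize(user, feature_sets))
```

Repeating `enrichment` over each selected feature set and sorting by p-value gives the feature-level prioritization; with thousands of features, a multiple-testing correction would also be needed.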