Chapter Seven - Large-Scale Neuroinformatics for In Situ Hybridization Data in the Mouse Brain

https://doi.org/10.1016/B978-0-12-398323-7.00007-0

Abstract

Large-scale databases of the brain are providing content to the neuroscience community through molecular, cellular, functional, and connectomic data. Organization, presentation, and maintenance requirements are substantial given the complexity, diverse modalities, resolution, and scale. In addition to microarrays, magnetic resonance imaging, and RNA sequencing, several in situ hybridization databases have been constructed due to their value in spatially localizing cellular expression. Scalable techniques for processing and presenting these data for maximum utility in viewing and analysis are key for end user value. We describe methods and use cases for the Allen Brain Atlas resources of the adult and developing mouse.

Introduction

Large-scale databases of the brain are providing increasing content to the neuroscience community through molecular, cellular, functional, and connectomic data (Bloom and Young, 2005, Bohland et al., 2009). Requirements for organizing, presenting, and maintaining these databases are substantial given the volume, complexity, and diverse modalities of data, their resolution, and scale. Imaging databases based on gene expression (Baldock et al., 2003, Lein et al., 2007), high-resolution MRI (Ma et al., 2005), diffusion tensor MRI (Johnson et al., 2010), histology (Mikula, Trotts, Stone, & Jones, 2007), morphology, and electrophysiology (Gardner et al., 2008) provide a framework for understanding both large-scale and detailed cellular level structure. Gene expression databases have used microarray (Su et al., 2004, Zapala et al., 2005), in situ hybridization (ISH; Baldock et al., 2003, Lein et al., 2007), and recently RNA-sequencing (Belgard et al., 2011) to profile the transcriptome from the simplest laboratory model organisms to humans. Databases of connectivity in the brain are also beginning to be approached in a similar large-scale manner (Bohland et al., 2009).

Gene expression signatures in the mammalian brain may hold the key to understanding neural development and neurological diseases. High-throughput ISH (Lein et al., 2007), a technique that uses labeled nucleic acid probes to bind to specific mRNA transcripts in tissue sections, generates large volumes of cellular level gene expression data that can be compiled and examined effectively. A unique benefit of this approach is the ability to spatially map genomic scale data, and this is becoming more feasible at large scale in a variety of model organisms as well as the human (Sunkin & Hohmann, 2007). Projects creating gene expression atlases at large scales for the embryonic fruit fly (Tomancak et al., 2002) as well as the embryonic and adult mouse (Lein et al., 2007, Visel et al., 2004) already involve the analysis of hundreds of thousands of high-resolution experimental images mapping mRNA expression patterns. This enables collation, comparison, and query of complex spatial patterns with respect to each other and with respect to known or hypothesized structure. By including the developmental axis, more advanced tools can be developed such as time-dependent anatomical ontology and mapping between the ontology and the spatial models in the form of delineated anatomical regions or tissues (Baldock et al., 2003).

There are substantial challenges associated with the organization and presentation of large volumes of gene expression when striving to maximize utility and usability. Many important high-throughput projects using ISH may require the analysis of images of tissue cross-sections imaged with cellular level microscopic resolution. Characteristic problems include management of resolution, annotation, interaction, data integration, and data mining (Mikula et al., 2007). A key requirement for these efforts is the development of fast and robust algorithms for anatomically mapping and quantifying gene expression for ISH and imaging data in general. Typical problems encountered include large data management, segmentation and three-dimensional (3D) reconstruction, computing the anatomic geometry, and workflow management allowing for hybrid approaches combining manual and algorithmic processing (Helmstaedter & Mitra, 2012). There are specific challenges involved in the accurate registration of highly deformed tissues, associating cells with known anatomical regions, and identifying groups of genes with coordinated expression regulation with respect to both concentration and spatial location. Solutions to these and other challenges will lead to a richer understanding of the complex system aspects of gene regulation in heterogeneous tissue (Jagalur, Pal, Learned-Miller, Zoeller, & Kulp, 2007).

The digital atlas is central to the overall organization of neuroscientific images. Digital atlases provide a common semantic and spatial coordinate system that can be leveraged to compare and correlate data from disparate sources (Hawrylycz et al., 2011). They also aid in understanding the impact of genetic manipulations by providing a standard for comparison. To place data in a common coordinate system, the mapping process often takes the form of voxellation of a standardized space (Lein et al., 2007, Petyuk et al., 2010). As the quality and amount of biological data continue to grow, the ability to search, reference, and compare these data with a researcher's own data becomes essential (Lein et al., 2007, Petyuk et al., 2010). The integration process can be cumbersome and time-consuming due to misaligned data, implicitly defined associations, and incompatible data sources. Image registration into a common, defined coordinate system has been used to render data from various modalities, for example, magnetic resonance microscopy, classical histology, and immunohistochemistry, in a common anatomic framework (MacKenzie-Graham et al., 2003). The importance of these issues is recognized by the recent effort of the International Neuroinformatics Coordinating Facility (INCF) on methods for standardization in digital atlasing (Hawrylycz et al., 2011), and this work is already being leveraged in the mouse and macaque (Bowden et al., 2011, Lee et al., 2010).
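To make the idea of voxellation concrete, the following Python sketch bins point samples that are already registered to a shared coordinate space into cubic voxels and averages the values in each voxel. It is an illustration only, not the method of any atlas cited here: the function name voxelize, the sample coordinates, and the voxel size of 200 units are assumptions chosen for the example.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size):
    """Average point samples into cubic voxels of a shared coordinate space.

    points: iterable of (x, y, z, value) tuples, already registered to the
            reference space and expressed in the same units as voxel_size.
    Returns {(i, j, k): mean value of the samples falling in that voxel}.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, z, value in points:
        # Integer voxel index along each axis.
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical samples: (x, y, z, expression value).
samples = [
    (10.0, 20.0, 30.0, 0.2),
    (150.0, 20.0, 30.0, 0.4),
    (250.0, 20.0, 30.0, 1.0),
]
grid = voxelize(samples, voxel_size=200.0)
```

Once two data sets are reduced to the same voxel indices, they can be compared voxel by voxel regardless of how each was originally acquired.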

An important question common to molecular neurobiology and neuroanatomy is whether regionalized gene expression patterns correspond to or define the basic architecture of the brain (Bohland et al., 2010). High-throughput methods permit genome-wide searches to discover genes that are uniquely expressed in brain circuits and regions that control behavior, and availability of genome-scale ISH data allows systematic analysis of genetic neuroanatomical architecture. As an illustration, although cytoarchitectonic organization of the mammalian cortex into different laminae has been well studied, identifying the architectural differences that distinguish cortical areas from one another is more challenging (Ng et al., 2010). Localization of large anatomical structures is possible using magnetic resonance imaging or invasive techniques (such as anterograde or retrograde tracing), but identifying patterns in gene expression architecture is limited as gene products do not necessarily identify an immediate functional consequence of a specialized area. Expression of specific genes in the mouse and human cortex is most often identified across entire laminae, and areal patterning of expression (when it exists) is most easily differentiated on a layer-by-layer basis. Since cortical organization is defined by the expression of large sets of genes, the task of identifying individual structures (or groups of structures) cannot usually be accomplished using individual areal gene expression markers (Hawrylycz et al., 2010).

It is of increasing importance to compare gene expression patterns of neurodevelopment in the brain in relation to the adult, as often clues to adult gene expression and gene function can be determined by examining embryonic development. Once all genes are mapped in the brain from the embryo to adult, studies can be conducted based on information derived from such databases in conjunction with other bioinformatics sources (Brumwell & Curran, 2006). While gene expression is a dynamic process, when fixed at a specific time point the spatial expression patterns may approximately reflect the anatomy and function of a region. The challenges in extending this informatics strategy to development in order to produce useful research tools are complex (Baldock et al., 2003). To unravel regulatory networks of genes functioning during embryonic development, ISH data are useful. Similar to the adult, the best way to make these data accessible is via spatiotemporal gene expression atlases. Ideal atlases should be based on a spatial framework, that is, a series of 3D reference models, which are anatomically annotated using an ontology with sufficient resolution, both for relations as well as for anatomical terms (de Boer, Ruijter, Voorbraak, & Moorman, 2009).

We have previously described a neuroinformatics pipeline for automatically mapping expression profiles of ISH data and its use to produce the first genomic scale 3D mapping of gene expression in a mammalian brain (Lein et al., 2007, Ng et al., 2007). The pipeline is fully automated and adaptable to other organisms and tissues. In this chapter, we review the methodology and techniques behind the functionality and usage of the tools of the Allen Mouse Brain Atlas and the Allen Developing Mouse Brain Atlas. In addition to a full suite of data access, search and analysis tools, recent work includes a grid-level application programming interface (API) that enables users to programmatically access data and develop personalized tools suited to their own analyses. All high-resolution images and visualization tools for expression analysis are available through the Allen Brain Atlas data portal (http://www.brain-map.org).

Informatics Data Processing for the Allen Mouse Brain Atlas

The Allen Mouse Brain Atlas is a genome-wide, spatially registered collection of cellular resolution ISH gene expression image data of the C57BL/6J adult mouse brain with an accompanying anatomic reference atlas. Each ISH experiment is processed through an automated pipeline (Ng, Pathak, et al., 2007) that detects the location of expressing cells in the images.

Our informatics data processing pipeline (Dang et al., 2007, Ng et al., 2007) enables the navigation, analysis, and visualization of

Generalizing to the Allen Developing Mouse Brain Atlas

The Allen Developing Mouse Brain Atlas provides ISH data for approximately 2000 genes over embryonic and postnatal time points: embryonic day 11.5 (E11.5), E13.5, E15.5, E18.5, postnatal day 4 (P4), P14, and P28. The genes profiled in this developmental atlas include transcription factors; neuropeptides, neurotransmitters, and their receptors; neuroanatomical markers; signaling pathways relevant to brain development; and genes of general interest (such as ion channels, cell adhesion molecules,

Programmatic Data Access

Programmatic access of our published data is offered through an API. At a high level, the API is a set of internet-accessible uniform resource locators (URLs) through which the public can download Allen Brain Atlas public resources, which include high-resolution images, experimental metadata, and gene expression values common to many of these resources. In addition to raw data values, many of the tools developed to support data browsing and visualization in the public databases have also been
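As a minimal sketch of what a URL-based API can look like from the user's side, the Python snippet below only assembles a query URL and makes no network request. The endpoint path, the query syntax, the gene acronym Pdyn, and the helper name build_gene_query_url are assumptions for illustration; none of them is given in this chapter excerpt, so they should be checked against the current API documentation before use.

```python
from urllib.parse import quote

# Assumed query endpoint; the text above does not state it.
API_BASE = "http://api.brain-map.org/api/v2/data/query.json"


def build_gene_query_url(acronym, base=API_BASE):
    """Return a URL requesting the gene record with the given acronym.

    The criteria string uses an assumed query syntax, shown only to
    illustrate that a data request is expressed entirely in the URL.
    """
    criteria = f"model::Gene,rma::criteria,[acronym$eq'{acronym}']"
    # Percent-encode the criteria, leaving the assumed syntax characters intact.
    return f"{base}?criteria={quote(criteria, safe=':,$[]')}"


url = build_gene_query_url("Pdyn")
```

The resulting string could then be passed to any HTTP client, for example urllib.request.urlopen from the Python standard library, and the response parsed as JSON.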

Discussion

In this chapter, we have reviewed the methodology and techniques behind the functionality and usage of the tools of the Allen Mouse Brain Atlas and the Allen Developing Mouse Brain Atlas. Our informatics data processing pipeline consists of preprocessing, a 3D reference model, alignment, expression detection, expression gridding, and a structure unionizer. Several different types of search services, such as the differential search service, NeuroBlast, AGEA, and anatomic search and temporal
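The last two pipeline stages named above, expression gridding and the structure unionizer, can be pictured with a small Python sketch. It assumes that the unionizer summarizes gridded expression values per annotated anatomical structure; this excerpt does not describe the actual computation, so the function name unionize, the use of a per-structure mean, and the toy data are all hypothetical.

```python
from collections import defaultdict


def unionize(expression, annotation):
    """Summarize gridded expression per anatomical structure.

    expression: {voxel index: expression value}, e.g. from a gridding step.
    annotation: {voxel index: structure label}, e.g. from a reference model.
    Returns {structure label: mean expression over its annotated voxels}.
    Voxels without an annotation are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for voxel, value in expression.items():
        label = annotation.get(voxel)
        if label is None:
            continue
        totals[label] += value
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


# Hypothetical gridded expression and structure annotation.
expression = {(0, 0, 0): 0.5, (1, 0, 0): 1.5, (2, 0, 0): 4.0, (3, 0, 0): 9.0}
annotation = {(0, 0, 0): "cortex", (1, 0, 0): "cortex", (2, 0, 0): "thalamus"}
summary = unionize(expression, annotation)
```

A per-structure summary of this kind is what makes structure-level queries possible without revisiting every voxel.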

Acknowledgments

The authors wish to thank the Allen Institute for Brain Science founders, Paul G. Allen and Jody Allen, for their vision, encouragement, and support. We express our gratitude to Allen Institute for Brain Science staff that played a key role in the development of the image quantification methods (Leonard Kuan), the 3D reference models (Yang Li), the Allen Mouse Brain Atlas Web application (Tim Dolbeare), the Allen Developing Mouse Brain Atlas Web application (Rob Young), and the API (Andy Sodt

References (51)

  • L. Puelles et al. Forebrain gene expression domains and the evolving prosomeric model. Trends in Neurosciences (2003)
  • C.L. Thompson et al. Genomic anatomy of the hippocampus. Neuron (2008)
  • R.A. Baldock et al. EMAP and EMAGE: A framework for understanding spatially organized data. Neuroinformatics (2003)
  • M. Bello et al. Hybrid segmentation framework for tissue images containing gene expression data. International Conference on Medical Image Computing and Computer Assisted Intervention (2005)
  • F.E. Bloom et al. Database needs of neuroscience: Schema and design
  • J.W. Bohland et al. A proposal for a coordinated effort for the determination of brainwide neuroanatomical connectivity in model organisms at a mesoscopic scale. PLoS Computational Biology (2009)
  • C.L. Brumwell et al. Developmental mouse brain gene expression maps. The Journal of Physiology (2006)
  • J.P. Carson et al. A digital atlas to characterize the mouse brain transcriptome. PLoS Computational Biology (2005)
  • C. Dang et al. The Allen Brain Atlas: Delivering neuroscience to the web on a genome wide scale. Data Integration in the Life Sciences, Lecture Notes in Computer Science (2007)
  • B.A. de Boer et al. More than a decade of developmental gene expression atlases: Where are we now? Nucleic Acids Research (2009)
  • H.W. Dong. The Allen Reference Atlas: A Digital Color Brain Atlas of the C57BL/6J Male Mouse (2008)
  • H.W. Dong et al. Genomic-anatomic evidence for distinct functional domains in hippocampal field CA1. Proceedings of the National Academy of Sciences of the United States of America (2009)
  • D. Gardner et al. The neuroscience information framework: A data and knowledge environment for neuroscience. Neuroinformatics (2008)
  • R.C. Gonzalez et al. Digital Image Processing (2002)
  • R.D. Hawkins et al. Next-generation genomics: An integrative approach. Nature Reviews. Genetics (2010)