Automated Metadata Suggestion During Repository Submission

  • Software Original Article
  • Neuroinformatics

Abstract

Knowledge discovery via an informatics resource is constrained by the completeness of the resource, both in the amount of data it contains and in the metadata that describes the data. Increasing completeness in one of these categories risks reducing completeness in the other, because manually curating metadata is time-consuming and requires familiarity with both the data and the metadata annotation scheme. The diverse interests of a research community may drive a resource to have hundreds of metadata tags with few examples for each, making it challenging for humans or machine learning algorithms to learn how to assign metadata tags properly. We demonstrate with ModelDB, a computational neuroscience model discovery resource, that manually curated, regular-expression-based rules can overcome this challenge: during data entry, the rules parse existing texts from data providers to suggest metadata annotations and to prompt the providers to suggest other related annotations, rather than leaving the task to a curator. In the ModelDB implementation, analyzing the abstract identified 6.4 metadata tags per abstract at 79% precision. Using the full text produced higher recall but low precision (41%), and the title alone produced few (1.3) metadata annotations per entry; we thus recommend that data providers use their abstract during upload. Grouping the possible metadata annotations into categories (e.g., cell type, biological topic) revealed that precision and recall for the different text sources vary by category. Given this proof of concept, other bioinformatics resources can likewise improve the quality of their metadata by adopting our approach of prompting data uploaders with relevant metadata, at the minimal cost of formalizing rules for each potential metadata annotation.
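The rule-based suggestion step described in the abstract can be illustrated with a minimal Python sketch. It is not the ModelDB implementation: the tag names, the regular expressions, and the helper functions `suggest_tags` and `precision_recall` are hypothetical and chosen only to show the idea of one curated pattern per metadata tag, plus the usual definitions of precision and recall against curator-assigned tags.

```python
import re

# Hypothetical rules (NOT ModelDB's actual rule set): each metadata tag is
# paired with one manually written regular expression.
RULES = {
    "Hippocampus CA1 pyramidal cell": re.compile(
        r"\bCA1\b.*\bpyramidal\b|\bpyramidal\b.*\bCA1\b", re.I | re.S
    ),
    "Calcium dynamics": re.compile(r"\bcalcium\s+(dynamics|kinetics)\b", re.I),
    "Dendrite": re.compile(r"\bdendrit(e|es|ic)\b", re.I),
    # Case-sensitive on purpose, so the common noun "neuron" does not match.
    "NEURON": re.compile(r"\bNEURON\b"),
}


def suggest_tags(text):
    """Return the sorted list of tags whose rule matches the text."""
    return sorted(tag for tag, pattern in RULES.items() if pattern.search(text))


def precision_recall(suggested, curated):
    """Compare suggested tags with curator-assigned tags.

    precision = correct suggestions / all suggestions
    recall    = correct suggestions / all curated tags
    """
    suggested, curated = set(suggested), set(curated)
    true_positives = len(suggested & curated)
    precision = true_positives / len(suggested) if suggested else 0.0
    recall = true_positives / len(curated) if curated else 0.0
    return precision, recall
```

As a usage example, calling `suggest_tags` on the made-up sentence "We model calcium dynamics in dendritic spines of CA1 pyramidal neurons." matches three of the four hypothetical rules; the suggestions can then be scored with `precision_recall` against whatever tags a curator assigned to the same entry.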



Acknowledgments

This study was funded by NIH grant R01 DC009977. We thank N. Ted Carnevale for valuable feedback on the manuscript.

Author information

Corresponding author

Correspondence to Robert A. McDougal.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

McDougal, R.A., Dalal, I., Morse, T.M. et al. Automated Metadata Suggestion During Repository Submission. Neuroinform 17, 361–371 (2019). https://doi.org/10.1007/s12021-018-9403-z

  • DOI: https://doi.org/10.1007/s12021-018-9403-z
