An "omics" for databases in the life sciences
24 June 2010
A searchable directory of the countless biological databases on the web produced in the wake of genomic and post-genomic research could be generated automatically, according to research published in the International Journal of Metadata, Semantics and Ontologies.
CNRS researcher Marie-Dominique Devignes and colleagues at Nancy-Université in France explain how the BioRegistry repository aims to associate content metadata drawn from a biomedical thesaurus with biological databases, with a view to making discovery and retrieval relatively easy. The registry is aggregated from publicly available lists of biological databases and allows for semantic searching.
Finding a way to use the vast quantities of data in the genomics field, which are held on numerous, very disparate database systems, is one of the biggest challenges facing biology and biomedical research today. The concept of a resourceome, the data equivalent of a genome, introduced by Cannata, Merelli and Altman in 2005, would allow tools and databases to be mapped. As the number and type of available biological data sources (BDS) continue to grow, including so-called "invisible" and "private" web resources, the need to address the problem with resourceomics is becoming increasingly urgent.
"Organising the bioinformatics resourceome is a first step towards an ideal web in which intelligent middleware will be able to handle a user query, distribute it over relevant resources, collect partial answers, and merge them back to the user," the team explains.
Specialised search spiders have been designed to get around the problem that generic search engines cannot differentiate between databases, information within a database and independent web pages that simply refer to databases. However, even with these purpose-built spiders, the resulting collections are not well maintained and do not facilitate efficient querying.
The team has now developed an analytical tool for deciphering metadata from databases, and from sites listing descriptions of such databases, so that an annotated and validated registry of the myriad resources can be obtained. "A first implementation of the BioRegistry repository has been completed," the team says. "It corresponds to all BDSs listed in the 2009 release of the major database catalogue, the Nucleic Acids Research (NAR) catalogue. It is planned to update the repository with each new release of the NAR catalogue."
The team is now optimising keywords and indexing procedures to allow them to merge subject and keyword metadata into a single system that can be queried directly. "High-quality semantic annotation of resources currently remains a crucial issue for optimising the coverage of resources and enhancing their discovery," the team adds. "Our approach consists of automating the harvesting of such annotations, their encoding and their structuring to provide efficient discovery of BDS."
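To make the idea of merging subject and keyword metadata into one queryable system concrete, here is a minimal Python sketch. It is not the BioRegistry implementation: the registry entries, the field names `subjects` and `keywords`, and the functions `build_index` and `search` are all hypothetical, chosen only to illustrate a single inverted index built over both kinds of annotation.

```python
from collections import defaultdict

# Hypothetical registry entries; names and terms are illustrative only
# and do not come from the BioRegistry or the NAR catalogue.
REGISTRY = [
    {"name": "ExampleProteinDB", "subjects": ["Proteins"],
     "keywords": ["structure", "sequence"]},
    {"name": "ExampleGenomeDB", "subjects": ["Genome"],
     "keywords": ["sequence", "annotation"]},
]


def build_index(entries):
    """Merge subject and keyword terms into one inverted index."""
    index = defaultdict(set)
    for entry in entries:
        for term in entry["subjects"] + entry["keywords"]:
            # Lower-case terms so lookups are case-insensitive.
            index[term.lower()].add(entry["name"])
    return index


def search(index, *terms):
    """Return the names of resources annotated with every query term."""
    hits = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*hits)) if hits else []


INDEX = build_index(REGISTRY)
print(search(INDEX, "sequence"))
print(search(INDEX, "genome", "sequence"))
```

Because subjects and keywords share one index, a query can mix the two freely, for example a subject term such as "genome" with a keyword such as "sequence".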