A Project of the LMU Bioinformatics Group

XMLPipeDB

A Reusable, Open Source Tool Chain for Building Relational Databases from XML Sources

Documentation

XMLPipeDB is an open source suite of Java-based tools for automatically building relational databases from an XML schema (XSD). XMLPipeDB provides functionality for managing, querying, importing, and exporting XML data with minimal manual processing. While it is broadly applicable, XMLPipeDB was originally motivated by the need to manage biological data from the different sources used to create Gene Databases for GenMAPP (Gene Map Annotator and Pathway Profiler), software for viewing and analyzing DNA microarray and other genomic and proteomic data on biological pathways.
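To give a feel for the idea of deriving a relational schema from an XSD, here is a deliberately tiny sketch in Java that uses only the JDK's DOM parser. It is not XSD-to-DB's algorithm, which builds on Hyperjaxb2 and handles far more of XML Schema. The class name `XsdToDdlSketch`, the type mapping, and the surrogate `id` key are hypothetical, and the sketch assumes flat, named complex types whose schema binds the `xs` prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Toy illustration of deriving tables from an XSD; NOT XSD-to-DB's actual algorithm. */
class XsdToDdlSketch {
    private static final String XS_NS = "http://www.w3.org/2001/XMLSchema";

    /** Hypothetical mapping from a few XSD built-in types to SQL column types. */
    static String sqlType(String xsdType) {
        // Assumes the schema binds the XML Schema namespace to the "xs" prefix.
        switch (xsdType) {
            case "xs:int":
            case "xs:integer":
                return "INTEGER";
            case "xs:boolean":
                return "BOOLEAN";
            default:
                return "VARCHAR(255)";
        }
    }

    /** Emits one CREATE TABLE per named complexType; each nested xs:element becomes a column. */
    static List<String> toDdl(String xsd) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xsd.getBytes(StandardCharsets.UTF_8)));
            List<String> statements = new ArrayList<>();
            NodeList types = doc.getElementsByTagNameNS(XS_NS, "complexType");
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                String table = type.getAttribute("name");
                if (table.isEmpty()) {
                    continue; // anonymous types are ignored in this sketch
                }
                // Surrogate primary key column: an assumption of this sketch.
                StringBuilder sql = new StringBuilder("CREATE TABLE " + table + " (id BIGINT PRIMARY KEY");
                NodeList fields = type.getElementsByTagNameNS(XS_NS, "element");
                for (int j = 0; j < fields.getLength(); j++) {
                    Element field = (Element) fields.item(j);
                    sql.append(", ").append(field.getAttribute("name"))
                       .append(' ').append(sqlType(field.getAttribute("type")));
                }
                statements.add(sql.append(')').toString());
            }
            return statements;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XSD", e);
        }
    }
}
```

For a schema declaring a complex type `entry` with an `xs:string` element `accession` and an `xs:int` element `mass`, `toDdl` returns the single statement `CREATE TABLE entry (id BIGINT PRIMARY KEY, accession VARCHAR(255), mass INTEGER)`.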

Creating Gene Databases for GenMAPP has been difficult because several different gene ID systems are in common use, so each set of gene identifiers must be related to the others. Currently, the GenMAPP Gene Databases rely on the integrated data source from Ensembl for this task. However, this limits the species that can be represented in GenMAPP to those supported by Ensembl, which are mostly animals.
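Relating identifier systems amounts to a many-to-many mapping through some shared key. The following minimal Java sketch assumes a hub identifier (such as a UniProt accession) whose entry lists IDs in other systems; the class `IdCrossReference` and its methods are hypothetical and do not reflect how GenMAPP or Ensembl actually store these mappings.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical cross-reference index that relates gene IDs from different
 * systems through one shared "hub" identifier (for example, a UniProt accession).
 */
class IdCrossReference {
    // hub ID -> system name -> IDs that the hub entry lists in that system
    private final Map<String, Map<String, Set<String>>> hubToIds = new HashMap<>();
    // system name -> ID in that system -> hub IDs that reference it
    private final Map<String, Map<String, Set<String>>> idsToHub = new HashMap<>();

    /** Records that hub entry {@code hubId} is linked to {@code id} in {@code system}. */
    void add(String hubId, String system, String id) {
        hubToIds.computeIfAbsent(hubId, k -> new HashMap<>())
                .computeIfAbsent(system, k -> new HashSet<>())
                .add(id);
        idsToHub.computeIfAbsent(system, k -> new HashMap<>())
                .computeIfAbsent(id, k -> new HashSet<>())
                .add(hubId);
    }

    /** Translates an ID from one system to another via every hub entry that references it. */
    Set<String> translate(String fromSystem, String id, String toSystem) {
        Set<String> hubs = idsToHub.getOrDefault(fromSystem, Collections.emptyMap())
                .getOrDefault(id, Collections.emptySet());
        Set<String> result = new HashSet<>();
        for (String hub : hubs) {
            // Every hub in idsToHub was also added to hubToIds, so get() is non-null here.
            result.addAll(hubToIds.get(hub).getOrDefault(toSystem, Collections.emptySet()));
        }
        return result;
    }
}
```

An ID that no hub entry references translates to an empty set, which mirrors the limitation described above: coverage is bounded by whatever the hub data source contains.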

We have used the XMLPipeDB tool chain to create relational databases for UniProt and Gene Ontology. In turn, we have used these databases to generate UniProt-centric GenMAPP Gene Databases for Escherichia coli and other bacterial species, extending GenMAPP to species not currently supported by the GenMAPP.org project team. Moreover, because XMLPipeDB creates the relational databases solely from the XSD and XML files, it should be more robust to changes that data providers make to their source files: a revised schema can be handled by regenerating the code rather than rewriting a hand-built parser.

XMLPipeDB provides the following tools for developers and database designers. The XSD-to-DB application takes a well-formed XSD or DTD file and converts it into Java source code and Hibernate mapping files that allow XML files conforming to that definition to be read into a relational database. XSD-to-DB’s conversion functions are based on the open source Hyperjaxb2 project, which adds Hibernate functionality to Sun Microsystems’ JAXB library.

The XMLPipeDB Utilities library is a suite of Java classes that provide functions needed by many XMLPipeDB database applications. Specifically, the library includes reusable classes for importing XML files into Java objects, saving these XML-derived Java objects to a relational database, querying the relational database using either HQL (Hibernate Query Language) or SQL, and configuring a client application to communicate with a relational database.

Finally, GenMAPP Builder is an application for creating GenMAPP Gene Database files.
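The Utilities library's import step turns XML into Java objects. The classes XSD-to-DB generates are JAXB-based, and JAXB is no longer bundled with current JDKs, so this self-contained sketch swaps in the JDK's StAX streaming API instead. The `ProteinEntry` and `EntryImporter` classes and the flat `<entry>`/`<accession>`/`<organism>` layout are hypothetical simplifications; real UniProt XML is considerably richer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Plain Java object for one entry; a hypothetical stand-in for generated classes. */
class ProteinEntry {
    final String accession;
    final String organism;

    ProteinEntry(String accession, String organism) {
        this.accession = accession;
        this.organism = organism;
    }
}

class EntryImporter {
    /** Streams <entry> elements into ProteinEntry objects without building a DOM tree. */
    static List<ProteinEntry> importEntries(String xml) {
        List<ProteinEntry> entries = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            String accession = null;
            String organism = null;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    switch (reader.getLocalName()) {
                        case "entry":
                            accession = null; // start a fresh record
                            organism = null;
                            break;
                        case "accession":
                            if (accession == null) {
                                accession = reader.getElementText(); // keep only the first accession
                            }
                            break;
                        case "organism":
                            organism = reader.getElementText(); // assumes text-only content
                            break;
                        default:
                            break;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("entry")) {
                    entries.add(new ProteinEntry(accession, organism));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return entries;
    }
}
```

Streaming keeps memory use bounded for large source files at the cost of tracking parser state by hand, whereas unmarshalling a whole document through a generated binding builds the full object tree in memory.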

GenMAPP Builder’s UniProt and Gene Ontology database libraries were generated with XSD-to-DB, and the application itself uses the XMLPipeDB Utilities library. The application works by first importing UniProt and Gene Ontology XML files as well as a tab-delimited UniProt-to-GO associations file into a relational database. The database can then be queried by organism in order to produce a GenMAPP Gene Database.
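The import-then-query flow can be pictured with in-memory maps standing in for database tables. This Java sketch assumes a hypothetical two-column associations layout (accession, then GO ID, separated by a tab); the file GenMAPP Builder actually reads may have more columns, and the class and method names are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class GoAssociations {
    /**
     * Reads a tab-delimited associations file. Assumed (hypothetical) layout:
     * accession in column 0, GO ID in column 1, one association per line.
     */
    static Map<String, Set<String>> read(Reader in) {
        Map<String, Set<String>> goByAccession = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] cols = line.split("\t");
                if (cols.length < 2) {
                    continue; // skip malformed lines in this sketch
                }
                goByAccession.computeIfAbsent(cols[0], k -> new TreeSet<>()).add(cols[1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return goByAccession;
    }

    /** Keeps only associations whose accession belongs to the requested organism. */
    static Map<String, Set<String>> forOrganism(Map<String, Set<String>> goByAccession,
                                                Map<String, String> organismByAccession,
                                                String organism) {
        Map<String, Set<String>> result = new TreeMap<>();
        goByAccession.forEach((accession, goIds) -> {
            if (organism.equals(organismByAccession.get(accession))) {
                result.put(accession, goIds);
            }
        });
        return result;
    }
}
```

In GenMAPP Builder the organism filter runs as a query against the relational database rather than over maps; the sketch only illustrates the join logic.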

GenMAPP Builder has been tested with the open source PostgreSQL relational database but can be used with any other relational database management system for which a JDBC driver is available. A JDBC-to-ODBC bridge then transfers data from the relational database to a Microsoft Access MDB file, which is the format expected by the GenMAPP application.
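Database independence of this kind usually comes down to configuration. A minimal sketch, assuming Hibernate-style connection properties: the property keys, `org.postgresql.Driver`, and `org.hibernate.dialect.PostgreSQLDialect` are standard Hibernate and PostgreSQL JDBC names, but whether XMLPipeDB reads exactly these settings is an assumption, and `DatabaseConfig` is a hypothetical class.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

/** Hypothetical helper that keeps the DBMS choice in configuration rather than code. */
class DatabaseConfig {
    /** Builds standard Hibernate connection properties for any JDBC-accessible DBMS. */
    static Properties hibernateProperties(String driverClass, String jdbcUrl, String dialect,
                                          String user, String password) {
        Properties props = new Properties();
        props.setProperty("hibernate.connection.driver_class", driverClass);
        props.setProperty("hibernate.connection.url", jdbcUrl);
        props.setProperty("hibernate.connection.username", user);
        props.setProperty("hibernate.connection.password", password);
        props.setProperty("hibernate.dialect", dialect);
        return props;
    }

    /** PostgreSQL settings; host and port are placeholders for a local server. */
    static Properties postgres(String database, String user, String password) {
        return hibernateProperties("org.postgresql.Driver",
                "jdbc:postgresql://localhost:5432/" + database,
                "org.hibernate.dialect.PostgreSQLDialect", user, password);
    }

    /** Opens a plain JDBC connection; the matching driver JAR must be on the classpath. */
    static Connection open(Properties props) throws SQLException {
        return DriverManager.getConnection(props.getProperty("hibernate.connection.url"),
                props.getProperty("hibernate.connection.username"),
                props.getProperty("hibernate.connection.password"));
    }
}
```

Targeting another DBMS would mean passing its driver class, JDBC URL, and Hibernate dialect to `hibernateProperties`, with no other code changes.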

XMLPipeDB is available under the GNU Lesser General Public License (LGPL).