- XMLPipeDB Manual (417K PDF)
- UniProtDB relational database schema as generated by XSD-to-DB
- GODB relational database schema as generated by XSD-to-DB
- GenMAPP Builder relational database schema
- Read Me for Arabidopsis thaliana Gene Database (268K PDF, revision 20090610)
- Read Me for Escherichia coli K12 Gene Database (284K PDF, revision 20090529)
- Read Me for Helicobacter pylori str. 26695 Gene Database (501K PDF, revision 20101130)
- Read Me for Mycobacterium smegmatis ATCC 700084/mc(2)155 Gene Database (248K PDF, revision 20110128)
- Read Me for Mycobacterium tuberculosis ATCC 25618/H37Rv Gene Database (230K PDF, revision 20110303)
- Read Me for Plasmodium falciparum (isolate 3D7) Gene Database (744K PDF, revision 20090604)
- Read Me for Pseudomonas aeruginosa Gene Database (274K PDF, revision 20100212)
- Read Me for Salmonella typhimurium ATCC 700720/SGSC1412/LT2 Gene Database (250K PDF, revision 20101130)
- Read Me for Staphylococcus aureus Gene Database (274K PDF, revision 20100218)
- Read Me for Vibrio cholerae O1 biovar El Tor str. N16961 Gene Database (246K PDF, revision 20101022)
XMLPipeDB is an open source suite of Java-based tools for automatically building relational databases from an XML schema (XSD). XMLPipeDB provides functionality for managing, querying, importing, and exporting XML data with minimal manual processing. While its applicability is fairly general, XMLPipeDB was originally motivated by the need to manage the biological data, drawn from several different sources, that are used to create Gene Databases for GenMAPP (Gene Map Annotator and Pathway Profiler), software for viewing and analyzing DNA microarray and other genomic and proteomic data on biological pathways.
Creating Gene Databases for GenMAPP has been difficult because several different gene ID systems are in common use, so identifiers from one system must be related to those of another. Currently, the GenMAPP Gene Databases rely on the integrated data source from Ensembl for this task. However, this limits the species that can be represented in GenMAPP to those supported by Ensembl, which are mostly animals.
We have used the XMLPipeDB tool chain to create relational databases for UniProt and Gene Ontology. In turn, we have used these databases to generate UniProt-centric GenMAPP Gene Databases for Escherichia coli and other bacterial species, extending GenMAPP to species not currently supported by the GenMAPP.org project team. Moreover, because XMLPipeDB creates the relational databases solely from the XSD and XML files, it should be more robust to changes that the data providers make to those source files.
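The identifier problem can be pictured as a hub-and-spoke lookup: if every gene product carries a UniProt accession, two other ID systems can be related by joining on that accession. The Java sketch below is a minimal in-memory illustration under that assumption, not GenMAPP Builder's code (which works against the relational database); the class `HubCrossReference`, the method `relate`, and the sample IDs (`UP_1`, `LOCUS_1`, `GENE_A`) are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch: relate ID system A to ID system B through a shared hub
 * identifier (e.g. a UniProt accession). Not GenMAPP Builder's actual code.
 */
class HubCrossReference {

    /**
     * For every A identifier, collect all B identifiers reachable through a hub
     * identifier both are attached to. A identifiers with no B partner map to an
     * empty set so that missing cross-references stay visible.
     */
    static Map<String, Set<String>> relate(Map<String, List<String>> hubToA,
                                           Map<String, List<String>> hubToB) {
        Map<String, Set<String>> aToB = new TreeMap<>();
        for (Map.Entry<String, List<String>> hub : hubToA.entrySet()) {
            List<String> bIds = hubToB.getOrDefault(hub.getKey(), List.of());
            for (String a : hub.getValue()) {
                aToB.computeIfAbsent(a, k -> new TreeSet<>()).addAll(bIds);
            }
        }
        return aToB;
    }

    public static void main(String[] args) {
        // Made-up sample data: two hub IDs, only one of which has a B identifier.
        Map<String, List<String>> hubToA = Map.of(
                "UP_1", List.of("LOCUS_1"),
                "UP_2", List.of("LOCUS_2"));
        Map<String, List<String>> hubToB = Map.of(
                "UP_1", List.of("GENE_A"));
        System.out.println(relate(hubToA, hubToB)); // {LOCUS_1=[GENE_A], LOCUS_2=[]}
    }
}
```

Keeping unmatched identifiers with an empty set, rather than dropping them, makes gaps in the cross-references visible; in a relational database the same step would be a join on the accession column.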
XMLPipeDB includes the following tools for developers and database designers:
- XSD-to-DB takes a well-formed XSD or DTD file and converts it into Java source code and Hibernate mapping files that allow XML files conforming to that definition to be read into a relational database. XSD-to-DB’s conversion functions are based on the open source Hyperjaxb2 project, which adds Hibernate functionality to Sun Microsystems’ JAXB library.
- The XMLPipeDB Utilities library is a suite of Java classes that provide functions needed by many XMLPipeDB database applications. Specifically, it includes reusable classes for importing XML files into Java objects, saving these XML-derived Java objects to a relational database, querying the relational database using either HQL (Hibernate Query Language) or SQL, and configuring a client application to communicate with a relational database.
- GenMAPP Builder is an application for creating GenMAPP Gene Database files.
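The first step handled by the Utilities library, turning XML into Java objects, can be sketched with the JDK's built-in DOM parser. This is an illustration only: XMLPipeDB itself works with JAXB classes generated by XSD-to-DB rather than hand-written DOM code, and the element names below are a simplified, hypothetical stand-in for UniProt XML (real entries use a namespace and carry many more elements). The class `XmlToObjects` and its `Entry` type are invented for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical sketch: read simplified UniProt-like XML into plain Java objects. */
class XmlToObjects {

    /** Stand-in for a class that XSD-to-DB would generate from the schema. */
    static final class Entry {
        final String accession;
        final String organism;

        Entry(String accession, String organism) {
            this.accession = accession;
            this.organism = organism;
        }

        @Override
        public String toString() {
            return accession + " (" + organism + ")";
        }
    }

    /** Text of the first descendant element with the given tag, or null if absent. */
    static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    static List<Entry> parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList entries = doc.getElementsByTagName("entry");
            List<Entry> result = new ArrayList<>();
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                // Look up "name" only inside <organism>, not anywhere in the entry.
                Element organism = (Element) entry.getElementsByTagName("organism").item(0);
                result.add(new Entry(firstText(entry, "accession"),
                        organism == null ? null : firstText(organism, "name")));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<uniprot>"
                + "<entry><accession>ACC_1</accession><name>ENTRY_1</name>"
                + "<organism><name>Organism A</name></organism></entry>"
                + "<entry><accession>ACC_2</accession>"
                + "<organism><name>Organism B</name></organism></entry>"
                + "</uniprot>";
        System.out.println(parse(xml)); // [ACC_1 (Organism A), ACC_2 (Organism B)]
    }
}
```

Scoping the `name` lookup to the `organism` element matters because real UniProt entries also have an entry-level `name` element; generated binding classes avoid this kind of hand-written navigation, which is part of the appeal of deriving them from the XSD.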
GenMAPP Builder’s UniProt and Gene Ontology database libraries were generated with XSD-to-DB, and the application itself uses the XMLPipeDB Utilities library. The application first imports UniProt and Gene Ontology XML files, along with a tab-delimited UniProt-to-GO associations file, into a relational database. The database can then be queried by organism to produce a GenMAPP Gene Database.
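The tab-delimited UniProt-to-GO associations file can be read with standard Java I/O. The sketch below assumes the GO Annotation File (GAF) column layout (column 2 holds the UniProt accession, column 5 the GO ID, column 13 the taxon, and `!` marks comment lines), which the text above does not specify, and keeps only rows for one taxon as a rough in-memory analogue of querying by organism. `GoAssociations`, `load`, and the placeholder values are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch: load UniProt-to-GO associations for one organism,
 * ASSUMING a GAF-style tab-delimited layout (not confirmed by the text).
 */
class GoAssociations {

    // 0-based column positions under the GAF layout assumption.
    static final int OBJECT_ID = 1; // UniProt accession
    static final int GO_ID = 4;
    static final int TAXON = 12;    // e.g. "taxon:NNNN", possibly "taxon:A|taxon:B"

    /** Maps each accession to its GO IDs, keeping only rows whose first taxon matches. */
    static Map<String, Set<String>> load(Reader in, String taxon) {
        Map<String, Set<String>> byAccession = new TreeMap<>();
        BufferedReader reader = new BufferedReader(in); // the caller closes 'in'
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty() || line.startsWith("!")) {
                    continue; // blank line or GAF header/comment
                }
                String[] cols = line.split("\t", -1);
                if (cols.length <= TAXON || !cols[TAXON].split("\\|")[0].equals(taxon)) {
                    continue; // malformed row or another organism
                }
                byAccession.computeIfAbsent(cols[OBJECT_ID], k -> new TreeSet<>()).add(cols[GO_ID]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return byAccession;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java GoAssociations <associations-file> <taxon, e.g. taxon:NNNN>
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]))) {
            load(in, args[1]).forEach((acc, goIds) -> System.out.println(acc + "\t" + goIds));
        }
    }
}
```

GenMAPP Builder performs the organism selection as a database query after import; filtering while reading, as here, simply keeps the sketch self-contained.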
GenMAPP Builder has been tested with the open source PostgreSQL relational database, but it can be used with any other relational database management system for which a JDBC driver is available. JDBC-to-ODBC connectivity is used to transfer data from the relational database to a Microsoft Access MDB file, the format expected by the GenMAPP application.
XMLPipeDB is available under the GNU Lesser General Public License (LGPL).