gbdExample - copyright 2006 by P. Schattner
Release

This program and its use to illustrate the kent-source API to the UCSC
genome browser database are described in:

  P. Schattner, "Automated Querying of Genome Databases",
  PLoS Computational Biology, 2006

===================================================================
NOTE: This software uses Jim Kent's software library routines. This
library is freely available for academic, personal and nonprofit use.
Much of Jim Kent's software library is also freely available for
commercial use. However, some of Jim Kent's routines, as indicated in
the source code files, may be used commercially only by explicit
agreement with Jim Kent (jim_kent@pacbell.net). If in doubt, please
check with Jim Kent prior to commercial use.

Thank you Jim!
===================================================================

Instructions for running "gbdExample" on Linux/Unix/Mac OS X machines:

This program uses Jim Kent's software library and accesses data from
databases underlying the UCSC Genome Browser. The program can be run
in three separate ways:

  1. Using gene data previously downloaded from the UCSC ftp site
  2. Using gene data from the public UCSC MySQL mirror database
  3. Using gene data from a private UCSC MySQL mirror database

Input files necessary to run the program in the first two modes are
supplied in this package. However, unless you have installed a private
mirror of the relevant UCSC Genome Browser database(s), you will not be
able to demo the third. (Note: installing the relevant database(s) on a
Unix-type system is reasonably straightforward but _not_ trivial; see
the README files in http://www.soe.ucsc.edu/~kent/src/ and
http://genome.ucsc.edu/admin/cvs.html. Describing such an installation
is beyond the scope of this README.)

To run the demo program (indeed, even just to compile the underlying
libraries) you will need to have a MySQL client installed.
To install a MySQL client, see http://dev.mysql.com/downloads/.

Assuming that the MySQL libraries are properly installed and
configured, installing the demo program consists of the following
steps:

1) Uncompress and unpack the package:

   > tar -xvzf gbdExample-0.1.tar.gz
   > cd gbdExample-0.1/

2) Edit config.mk to set the make variables MYSQLINC and MYSQLLIBS to
   the location of the MySQL includes and libraries on your system.

3) Build the libraries and executables with the command:

   > make

Depending on your compiler, you may get warnings such as:

   warning: ISO C requires whitespace after the macro name
   warning: `rcsid' defined but not used
   warning: -Wuninitialized is not supported without -O

These can all be safely ignored.

===================================================================

You should now be ready to run the demo program.

The program finds the median length of the introns that overlap the
ranges in an input file and compares it with the lengths of the other
introns in the same genes. It reads a 'bed' file of genomic regions
and, for each region, extracts the longest gene overlapping it. For
each gene, the lengths of the introns that overlap the region and of
those that do not are computed, and the median of each set of intron
lengths is printed.

The program supports three methods for accessing the table of known
genes in the relevant UCSC database. The access method is specified by
the final argument to the program (either 'file', 'public' or
'localDb'); the three methods are illustrated by the demo commands in
the next section.

The format of the command is:

   > gbdExample db dbTable myBedFile method

where
   db         is the database name,
   dbTable    is the table file name in 'file' mode, or the name of
              the database table to use in 'public' or 'localDb' mode,
   myBedFile  is a bed file of genomic ranges, and
   method     is one of 'file', 'public' or 'localDb'.
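
The intron-length comparison described above can be sketched in a few
lines. This is only an illustrative Python stand-in, not the C code
shipped in this package. It assumes gene records shaped like rows of a
UCSC gene table (chrom, txStart, txEnd, exonStarts, exonEnds) with
0-based, half-open coordinates, and it assumes that "longest gene"
means the largest transcript span; the package itself may define it
differently. The function names and the sample data are hypothetical.

```python
from statistics import median


def introns(exon_starts, exon_ends):
    """Return (start, end) pairs for the gaps between consecutive exons."""
    return [(exon_ends[i], exon_starts[i + 1])
            for i in range(len(exon_starts) - 1)]


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def longest_overlapping_gene(genes, chrom, start, end):
    """Pick the gene with the largest transcript span that overlaps the region.

    Assumption: 'longest' is measured as txEnd - txStart.
    Returns None when no gene overlaps the region.
    """
    hits = [g for g in genes
            if g["chrom"] == chrom
            and overlaps(g["txStart"], g["txEnd"], start, end)]
    return max(hits, key=lambda g: g["txEnd"] - g["txStart"], default=None)


def intron_length_medians(gene, start, end):
    """Return (median length of overlapping introns, median of the rest).

    Either value is None when the corresponding set of introns is empty.
    """
    inside, outside = [], []
    for i_start, i_end in introns(gene["exonStarts"], gene["exonEnds"]):
        bucket = inside if overlaps(i_start, i_end, start, end) else outside
        bucket.append(i_end - i_start)
    return (median(inside) if inside else None,
            median(outside) if outside else None)


# Hypothetical sample data: two genes on chr1 and one query region.
genes = [
    {"name": "geneA", "chrom": "chr1", "txStart": 100, "txEnd": 1400,
     "exonStarts": [100, 500, 1000, 1300],
     "exonEnds": [200, 600, 1100, 1400]},
    {"name": "geneB", "chrom": "chr1", "txStart": 240, "txEnd": 400,
     "exonStarts": [240], "exonEnds": [400]},
]

gene = longest_overlapping_gene(genes, "chr1", 250, 700)
print(gene["name"], intron_length_medians(gene, 250, 700))
```

In this sample, geneA has introns of length 300, 400 and 200; the
region 250-700 overlaps the first two, so the two medians are taken
over the sets {300, 400} and {200}.
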
===================================================================

Demos:

To run the demos with the commands below, you need to change to the
'demoData' subdirectory from the main gbdExample directory:

   > cd demoData

1) The access method is 'file'. This demo uses a previously downloaded
   file (here called 'sgdGene.txt') containing the table of known
   genes from the UCSC yeast (sacCer1) database.

   > ../bin/gbdExample sacCer1 sgdGene.txt yeast2intron.bed file

2) The access method is 'public'. This demo uses the known-gene data
   stored in the publicly available MySQL mirror maintained by UCSC at
   genome-mysql.cse.ucsc.edu.

   > ../bin/gbdExample sacCer1 sgdGene yeast2intron.bed public
   > ../bin/gbdExample hg17 refGene mammalianSnos.hg17.refGene.bed public

3) The access method is 'localDb'. This demo uses the known-gene data
   stored in a locally installed private mirror of the UCSC (sacCer1)
   database. The program will fail (probably not gracefully) if a
   local mirror is not present.

   > ../bin/gbdExample sacCer1 sgdGene yeast2intron.bed localDb

At this point, you may want to explore developing your own programs
using the included kent-src library routines. Good luck!

Comments and suggestions regarding this demonstration software are
welcome.

Peter Schattner
schattner@soe.ucsc.edu

Last revised: 09/13/06