gbdExample - copyright 2006 by P. Schattner
Release
This program and its use to illustrate the kent-source API to the
UCSC genome browser database are described in:
P. Schattner, "Automated Querying of Genome Databases", PLoS Computational
Biology, 2006
===================================================================
NOTE: This software uses Jim Kent's software library routines. This library is freely
available for academic, personal and nonprofit use. Much of Jim Kent's software
library is also freely available for commercial use. However commercial use of
some of Jim Kent's routines, as indicated in the source code files, may be used
only by explicit agreement with Jim Kent (jim_kent@pacbell.net). If in doubt,
please check with Jim Kent prior to commercial use.
Thank you Jim!
===================================================================
Instructions for running "gbdExample" on Linux/Unix/Mac OS X machines:
This program uses Jim Kent's software library and accesses data from
databases underlying the UCSC Genome Browser. The program can be run
in three separate ways:
1. Using gene data previously downloaded from the UCSC ftp site
2. Using gene data from the public UCSC MySQL mirror database
or
3. Using gene data from a private UCSC MySQL mirror database
Input files necessary to run the program in the first two modes are
supplied in this package. However, unless you have installed a private
mirror of the relevant UCSC Genome Browser database(s), you will
not be able to run the third demo.
(Note: installing the relevant database(s) on a Unix-type system is
reasonably straightforward but _not_ trivial; see the README files in
http://www.soe.ucsc.edu/~kent/src/ and
http://genome.ucsc.edu/admin/cvs.html. Describing such an
installation is beyond the scope of this README.)
To run the demo program (indeed even just to compile the underlying
libraries) you will need to have a MySQL client installed.
To install a MySQL client see http://dev.mysql.com/downloads/.
Edit config.mk to set the make variables
MYSQLINC
MYSQLLIBS
to the location of MySQL on your system.
Assuming that the MySQL libraries are properly installed and configured,
installing the demo program consists of the following steps:
1) Uncompress and unpack the package
> tar -xvzf gbdExample-0.1.tar.gz
> cd gbdExample-0.1/
2) Edit config.mk to point to the location of MySQL includes and libraries
on your system
3) Build the libraries and executables with the command:
make
Depending on your compiler, you may get warnings from the compiler such as:
warning: ISO C requires whitespace after the macro name
warning: `rcsid' defined but not used
warning: -Wuninitialized is not supported without -O
These can all be safely ignored.
===================================================================
You should now be ready to run the demo program.
The program finds the median length of the introns that overlap the
ranges in the input file and compares it with the lengths of the other
introns in those genes.
The program reads a 'bed' file of genomic regions and
extracts the longest gene overlapping each region. For each
gene, the lengths of the introns overlapping the region, as well
as of those not overlapping the region, are computed. The median
of each set of intron lengths is printed out.
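The calculation described above can be sketched in a few lines of Python.
This is only an illustration, not the program's actual C code: the function
names, the half-open [start, end) coordinate convention and the example
gene are all assumptions made for this sketch.

```python
from statistics import median


def introns_from_exons(exon_starts, exon_ends):
    """Return the (start, end) introns lying between consecutive exons.

    Assumes the exons are sorted and use half-open [start, end) coordinates.
    """
    return [(exon_ends[i], exon_starts[i + 1])
            for i in range(len(exon_starts) - 1)]


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def split_intron_lengths(exon_starts, exon_ends, region_start, region_end):
    """Split intron lengths into those overlapping the region and the rest."""
    inside, outside = [], []
    for start, end in introns_from_exons(exon_starts, exon_ends):
        if overlaps(start, end, region_start, region_end):
            inside.append(end - start)
        else:
            outside.append(end - start)
    return inside, outside


# Hypothetical gene with four exons; the region falls in the second intron.
exon_starts = [100, 300, 900, 1200]
exon_ends = [200, 400, 1000, 1300]
inside, outside = split_intron_lengths(exon_starts, exon_ends, 500, 600)

# A set may be empty (e.g. a single-intron gene), so guard the median call.
median_inside = median(inside) if inside else None
median_outside = median(outside) if outside else None
print(median_inside, median_outside)
```

The empty-list guard matters because statistics.median raises an error
when given no data.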
The program supports three methods for accessing the table
of known genes in the relevant UCSC database. These access
methods are specified by the final argument to the program
(either 'file', 'public' or 'localDb') and are illustrated
by the three commands below. The format of the commands is:
> gbdExample db dbTable myBedFile method
where db is the database name; dbTable is the name of the table file in
'file' mode, or the name of the database table to use in 'public' or
'localDb' mode; myBedFile is a bed file of genomic ranges; and
method is 'file', 'public' or 'localDb'.
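For readers unfamiliar with the bed input, the Python sketch below shows
one way such a file of genomic ranges could be read. It is not the parser
used by gbdExample; the function name and the example coordinates are
made up for illustration. It relies only on the first three bed columns
(chromosome, 0-based start, end) and ignores any further columns.

```python
import io


def read_bed_regions(handle):
    """Yield (chrom, start, end) from the first three columns of a bed file."""
    for line in handle:
        line = line.strip()
        # Skip blank lines, comments, and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield fields[0], int(fields[1]), int(fields[2])


# Hypothetical bed content; a real run would open myBedFile instead.
example = io.StringIO(
    "track name=example\n"
    "chr1\t1000\t2000\tregionA\n"
    "chr2\t500\t750\tregionB\n"
)
regions = list(read_bed_regions(example))
print(regions)
```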
Demos:
To run the demos with the commands below, you need to change
to the 'demoData' subdirectory from the main gbdExample directory:
> cd demoData
1) For the first demo, the access method is 'file' and uses a
previously downloaded file (here called 'sgdGene.txt')
containing the table of known genes from the UCSC yeast (sacCer1)
database.
> ../bin/gbdExample sacCer1 sgdGene.txt yeast2intron.bed file
2) Access method is 'public' and uses the knownGene data stored
in the publicly available MySQL mirror maintained by UCSC at
genome-mysql.cse.ucsc.edu
> ../bin/gbdExample sacCer1 sgdGene yeast2intron.bed public
> ../bin/gbdExample hg17 refGene mammalianSnos.hg17.refGene.bed public
3) Access method is 'localDb' and uses the knownGene data stored
in a locally installed private mirror of the UCSC (sacCer1) Database.
The program will fail (probably not gracefully) if a local
mirror is not present.
> ../bin/gbdExample sacCer1 sgdGene yeast2intron.bed localDb
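As a rough picture of what the 'file' mode has to do with a downloaded
gene table, the Python sketch below parses genePred-style rows and picks
the longest gene overlapping a region. It is not taken from the program's
source. The column order (name, chrom, strand, txStart, txEnd, cdsStart,
cdsEnd, exonCount, exonStarts, exonEnds) is an assumption about the table
layout, as is the optional leading 'bin' column; check the table's schema
before relying on it. The gene names and coordinates are invented.

```python
def parse_genepred_line(line, has_bin=False):
    """Parse one tab-separated genePred-style row into a dict.

    Assumed column order: name, chrom, strand, txStart, txEnd,
    cdsStart, cdsEnd, exonCount, exonStarts, exonEnds.
    """
    fields = line.rstrip("\n").split("\t")
    if has_bin:
        fields = fields[1:]  # drop the assumed leading 'bin' index column
    return {
        "name": fields[0],
        "chrom": fields[1],
        "strand": fields[2],
        "tx_start": int(fields[3]),
        "tx_end": int(fields[4]),
        # Exon lists are comma-separated and typically end with a comma,
        # so empty pieces are filtered out.
        "exon_starts": [int(x) for x in fields[8].split(",") if x],
        "exon_ends": [int(x) for x in fields[9].split(",") if x],
    }


def longest_overlapping_gene(genes, chrom, start, end):
    """Return the longest gene overlapping [start, end) on chrom, or None."""
    hits = [g for g in genes
            if g["chrom"] == chrom
            and g["tx_start"] < end and start < g["tx_end"]]
    return max(hits, key=lambda g: g["tx_end"] - g["tx_start"], default=None)


# Hypothetical rows; a real run would read them from the table file.
rows = [
    "GENE_A\tchr1\t+\t100\t1300\t100\t1300\t4\t100,300,900,1200,\t200,400,1000,1300,",
    "GENE_B\tchr1\t-\t450\t700\t450\t700\t1\t450,\t700,",
]
genes = [parse_genepred_line(row) for row in rows]
best = longest_overlapping_gene(genes, "chr1", 500, 600)
print(best["name"])
```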
At this point, you may want to explore developing your own programs
using the included kent-src library routines. Good luck!
Comments and suggestions regarding this demonstration software are welcome.
Peter Schattner
schattner@soe.ucsc.edu
Last revised: 09/13/06