An Online Bioinformatics Curriculum

David B. Searls

doi:10.1371/journal.pcbi.1002632

Abstract

Online learning initiatives over the past decade have become increasingly comprehensive in their selection of courses and sophisticated in their presentation, culminating in the recent announcement of a number of consortium and startup activities that promise to make a university education on the internet, free of charge, a real possibility. At this pivotal moment it is appropriate to explore the potential for obtaining comprehensive bioinformatics training with currently existing free video resources. This article presents such a bioinformatics curriculum in the form of a virtual course catalog, together with editorial commentary, and an assessment of strengths, weaknesses, and likely future directions for open online learning in this field.


Citation: Searls DB (2012) An Online Bioinformatics Curriculum. PLoS Comput Biol 8(9): e1002632. https://doi.org/10.1371/journal.pcbi.1002632

Editor: Fran Lewitter, Whitehead Institute, United States of America

Published: September 13, 2012

Copyright: © David B. Searls. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The author received no specific funding for this article.

Competing interests: The author has declared that no competing interests exist.

Online Learning Comes of Age

Online academic “courseware” at the university level has now been available to the public for a decade, the earliest concerted effort having originated in 2002 with the Massachusetts Institute of Technology (MIT) and their OpenCourseWare initiative (http://ocw.mit.edu). This project offered up the syllabi, lecture notes, quizzes, exams, and/or other study materials for a very large number of courses, at the discretion of professors but with strong support and encouragement from the MIT administration. Only in a minority of cases were videos of lectures posted.

Even before this, the University of California, Berkeley, had started webcasting lectures, and eventually began posting both audio and video for public consumption at their Berkeley Webcast site (http://webcast.berkeley.edu), though without the ancillary materials of MIT's OpenCourseWare. A number of other universities followed suit, though seldom so extensively; among these was Stanford with its ClassX streaming service (http://classx.stanford.edu/ClassX) and an earlier effort called Stanford Engineering Everywhere (http://see.stanford.edu/see/courses.aspx). In many cases, individual faculty members took the initiative to post course materials, including video, in widely varying formats. Some adopted the use of “Khan-style videos” or tablet-based screencasts of the sort popularized by the Khan Academy with its vast library of instructional videos, which started as a viral YouTube sensation and has now become its own well-funded institution (http://www.khanacademy.org).

YouTube indeed became the destination of many academic videos, which are now aggregated by institution under YouTube EDU (http://www.youtube.com/education). Apple has also put its distinctive stamp on online learning with iTunes U (http://www.apple.com/education/itunes-u), also organized by institution but with integrated search capability and, of course, deployment to iPad and iPhone apps. Countless aggregators also assemble collections of video courses, but generally with little value added.

Yale University began in 2007 to release Open Yale Courses (http://oyc.yale.edu) in a more curated and consistent format than most other efforts, including high-quality video and extensive syllabi; courses appeared incrementally, with just under 50 available to date. Then, in 2011, MIT revamped several of its online courses into a much more structured instructional format, with learning modules in outline form containing videos interspersed with self-assessment and other activities. In a somewhat different vein, the non-profit Saylor Foundation compiled a comprehensive online university curriculum comprising courses that are essentially mashups of video and text resources from many existing sources, including a number of those described above (http://www.saylor.org).

In the fall of 2011, a highly publicized online course, “Introduction to Artificial Intelligence” (AI), was conducted by Stanford University Prof. Sebastian Thrun and Google's Director of Research, Peter Norvig, based on the Stanford AI course. It ran “live” in the sense that new videos were released and homework assignments collected on a weekly basis, and quizzes and exams were given at set times, while discussion logs allowed for some degree of interaction. The course attracted 160,000 students from 190 countries, 22,000 of whom finished successfully and were granted “certificates of completion” [1]. Shortly afterwards, MIT set up a similar approach on a new platform called MITx, offering a course in electronic circuits that attracted comparable numbers of students (https://6002x.mitx.mit.edu).

The trend to structured presentation and high production quality then accelerated remarkably, and took an entrepreneurial turn. The AI course was effectively spun off by Prof. Thrun into a Web startup called Udacity (http://www.udacity.com), which is currently live with six courses. In April of 2012, two other Stanford scientists, Profs. Andrew Ng and Daphne Koller, announced a similar newco called Coursera (https://www.coursera.org), with backing from major Silicon Valley venture capital firms. Coursera, also now live, is being stocked with courses from academic partners Stanford, Princeton University, the University of Pennsylvania, and the University of Michigan; this list was recently augmented with a tranche of a dozen more top-tier universities. And in May of 2012, barely six months after MIT had rolled out its new MITx platform, they and Harvard announced that the institutions were investing $30 million each in a joint online learning initiative called edX (http://www.edxonline.org).

All of these initiatives promise to offer undiluted, highly interactive university-level courses to the public, free of charge. Moreover, there is every indication that the instruction can be effective; the U.S. Department of Education, in an exhaustive meta-analysis of 51 published head-to-head trials, found that “on average, students in online learning conditions performed better than those receiving face-to-face instruction” [2].

An Online Bioinformatics Education

Clearly a revolution in open online learning is at hand. This is a welcome addition to a movement that also encompasses open online scientific publication, of which this journal is an example. As such, this is an appropriate forum to assess the current potential for a freely accessible online bioinformatics education.

Both the completeness and the quality of such an unconventional education should be evaluated. Such judgments cannot be entirely objective, and even curricula in conventional university settings vary widely. Thus, this must ultimately be considered an “opinion piece.” Even its purely factual content has to be viewed as evanescent, given the rate of change in online education, and the fact that newly announced initiatives may increase the selection and quality of courses available to a considerable extent even within the year.

Even so, the first opinion offered here is that it is probably already possible for a motivated student to become a competent, employable bioinformatics professional in the comfort of his or her own home—with certain important caveats to be elaborated in the discussion at the end. By way of evidence, a suggested curriculum will be laid out that is supported by existing online resources.

This central thesis, that online bioinformatics education has in some sense “arrived,” can certainly be challenged on a number of counts. The fundamental question of the optimal content for bioinformatics training would probably elude universal consensus in any case, and perhaps the most that can be hoped for is that what follows will contribute meaningfully to the dialogue. Even so, the reader has a right to question both the author's qualifications and methodology in offering these opinions.

The author has advanced degrees in both biology and computer science, has published original research in both fields, and has passing familiarity with but is by no means expert in all of the advanced course topics described below. He has helped design academic curricula as part of a major training grant and taught at both an undergraduate and graduate level, though not extensively, having spent most of his career in the computer and then the pharmaceutical industries. However, in the latter positions he was directly or indirectly responsible for hiring well over a hundred scientists and engineers for bioinformatics-related roles. Thus if any bias exists, it is probably in favor of the practical over the theoretical, though the author's own research is somewhat more in the latter category.

In terms of methodology, the author has personally sampled all of the main courses listed below that are currently available, as well as most of those offered as alternatives or suggested for advanced study. Of these, he has actually completed six of the main courses and seven in the latter categories (most recently, two of the inaugural offerings by Coursera), and has made significant progress in several more. In each case the main course offering for a given topic was adjudged superior to the alternatives based on a variety of criteria including coverage, production quality, availability of ancillary course material, and incorporation of the latest modular courseware technologies described above. Less tangible factors such as teaching style, clarity, and pace were also considered. Courses listed as alternatives to the main courses still met basic standards of quality, and in addition to offering redundancy often had other features that might appeal to specific students, for instance in terms of areas of emphasis. In several cases, courses were selected as main offerings despite being scheduled but not yet online; such judgments were made based on instructors' proven teaching backgrounds and in some instances after direct consultation with them on the syllabi.

Only courses offered without charge were considered. Online courses and entire degree programs for money are widely available, though troubling to some given issues of accreditation and mounting student debt. Course discussion logs on free resources like Coursera indicate a tremendous demand for online education in the developing world, and students anywhere may need to be thrifty, particularly if they are retraining or exploring career change. There are certainly extension programs of universities and other for-profit resources that offer good value-for-money in this arena, and those who can afford it should not be discouraged from taking advantage of such benefits as personalized instruction. Nevertheless, part of the challenge in the present instance is to see just how far the free resources have come. Moreover there is the practical issue that extending the analysis to paid courses would open up a much larger set of alternatives, most of which are inaccessible to evaluation without expenditure.

Only video courses are included, either showing the instructor with slides and/or blackboard, or in screencast format. Learning from course notes only, or even disembodied audio, simply doesn't have the immediacy of the visual experience of a lecture hall or even a tablet-based screencast. At the other extreme, one could maintain that reading textbooks at one's own speed is a more efficient and focused way to learn. That is certainly true for some, and perhaps more so for experienced and mature scholars, but it is probably also true that a lecture format offers much-needed structure to the learning process for others. Moreover, cognitive psychology offers both a theoretical basis and empirical evidence for the benefits of multimedia learning [3]. In any case, most of the courses below require reading at least selections from one or more textbooks in close coordination with the lectures (though in a surprising number of cases the textbooks are freely available online).

What follows, then, is a virtual catalog for a course of study in bioinformatics. It includes both core courses and electives, as will be evident in the commentaries included with each course. Even at that, different paths are possible depending on preparation (whether the student starts with a biology and/or computer science background already) and inclination (whether the student plans to focus on bioinformatics analysis and needs less programming experience, or hopes to develop algorithms and systems that require considerably more computational sophistication). Since this virtual program awards no degrees and makes no guarantees, it will not attempt to set absolute standards for numbers of credits and distribution of core and elective subjects, but will suggest possible study threads in the penultimate section of this article.

Going further.

One particular subfield of biology that constitutes an exceedingly complex system is immunology, which has even spawned its own discipline of immunoinformatics. There are several introductory immunology courses available, including a shorter one presented from a medical perspective by Dr. Harris Goldstein of Albert Einstein Medical College (http://www.youtube.com/playlist?list=PL5703ABB5D07584D7) and another from a molecular and evolutionary standpoint by Prof. Gregory Beck of the University of Massachusetts (http://itunes.apple.com/us/itunes-u/intro-to-immunology-biol-378/id476313031).

Eukaryotic Gene Expression

Source.

Indian Institute of Science (IISc), Bangalore, Prof. P.N. Rangarajan

Link.

http://nptel.iitm.ac.in/courses/104108056

Provider description.

“[Topics include] cis-acting elements and trans-acting factors … domain structure of eukaryotic transcription factors … role of chromatin … synthesis of mRNA, rRNA, and tRNA … cell surface receptors … intracellular receptors … regulation of gene expression during development … recombinant protein expression systems … gene therapy and transgenic technology …”

Commentary.

This NPTEL course offers a significantly more detailed view of gene regulation than the courses above, though it overlaps with them. It is not absolutely current but will still be useful to those interested in bioinformatics of signaling pathways and genetic networks. For the larger perspective students should also view a seminar by Dr. Robert Tjian on “The Molecular Biology of Gene Regulation” (http://www.ibioseminars.org/lectures/bio-mechanisms/robert-tjian.html) and, for more recent aspects of microRNA-based regulation, talks by Dr. Adrian Ferré-D'Amaré on “Catalytic and Gene Regulatory RNAs” (http://videocast.nih.gov/launch.asp?17170), by Dr. Victor Ambros on “MicroRNA Pathways in Animal Development” (http://videocast.nih.gov/launch.asp?14844), and by Dr. Witold Filipowicz on “Regulating the Regulators: Mechanisms Controlling Function and Metabolism of microRNAs” (http://videocast.nih.gov/launch.asp?17234).

Prerequisites.

Introduction to Biology and Biochemistry or equivalent.

Computational Molecular Biology

Source.

Stanford, Biochem 218, Prof. Doug Brutlag (Spring 2012)

Link.

Going further.

The “EMBO Practical Course on Analysis of High-Throughput Sequence Data” (http://www.ebi.ac.uk/training/online/course/embo-practical-course-analysis-high-throughput-seq) is highly recommended as a hands-on introduction to modern genomic analysis. It closely coordinates video lectures with detailed analysis exercises, with tutorial handouts and code supplied, using R and Bioconductor. Topics include short read analysis, ChIP-Seq data and analysis, statistical concepts, differential expression by RNA-Seq, and allele-specific expression and eQTL.

Biological Seminars

Source.

Howard Hughes Medical Institute, iBioSeminars

Link.

http://www.ibioseminars.org

Provider description.

“iBioSeminars is a freely available library of video seminars from outstanding scientists, including many HHMI investigators. These lectures, which describe on-going research in leading laboratories, feature an extensive introduction to the subject matter, making them accessible to advanced undergraduates or beginning graduate students and researchers outside of the specific field. The main subject areas are biological mechanisms, cell biology and medicine, developmental biology and evolution, chemical biology and biophysics, and global health and energy.”

Commentary.

Much of a biologist's advanced training comes from departmental seminars, invited speakers, conferences, etc. This star-studded collection amassed by the Howard Hughes Medical Institute now has some 80 extended seminars covering a wide range of topics, including some that are underrepresented in the available online courseware, such as neurosciences and developmental biology. An important side benefit of learning the scientific content itself is the educational experience of becoming familiar with the names, faces, and presentation techniques of many of the top scientists in the American biological community.

Alternatives.

A particularly rich lode of talks by distinguished scientists is the NIH Director's Wednesday Afternoon Lecture series (http://videocast.nih.gov/PastEvents.asp?c=3). While there are almost 15 years' worth of these videos available for mining, the online student might be well advised to make a habit of tuning in to the live streaming of these events, for more of a flavor of the campus experience.

Mathematics Department

Differential Equations

Source.

MIT, 18.03SC, Prof. Arthur Mattuck (Fall 2011)

Link.

http://ocw.mit.edu/courses/mathematics/18-03sc-differential-equations-fall-2011

Provider description.

Alternatives.

The charismatic Prof. N.J. Wildberger of the University of New South Wales offers a similar course (http://www.youtube.com/playlist?list=PL01A21B9E302D50C1). Prof. Jim Hefferon of Saint Michael's College has a nice introductory online textbook (http://joshua.smcvt.edu/linearalgebra).

Going further.

The Harvard Extension School has an advanced course in “Abstract Algebra” taught by Prof. Benedict Gross, starting from a linear algebra foundation to study group theory, vector spaces, fields, etc. (http://www.extension.harvard.edu/open-learning-initiative/abstract-algebra). Prof. Edwin Connell of the University of Miami has a free online textbook “Elements of Abstract and Linear Algebra” with a similar approach (http://www.math.miami.edu/~ec/book). While these may be overkill for bioinformatics, they might just inspire some to seek deeper insights into structures in large datasets. Prof. Strang himself teaches two follow-on video courses in applied mathematics, developing his linear algebra-oriented approach to networks, structures, estimation, Fourier analysis, convolution filtering, etc. (http://ocw.mit.edu/courses/mathematics/18-085-computational-science-and-engineering-i-fall-2008 and http://ocw.mit.edu/courses/mathematics/18-086-mathematical-methods-for-engineers-ii-spring-2006). His magisterial self-published textbook for these courses includes a treatment of microarray analysis to discover “eigengenes” [9].
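As a rough illustration of the general idea (not the textbook's own treatment), an eigengene can be taken as a dominant right singular vector of a genes × samples expression matrix A: a pattern across samples that many genes share, up to scale. The minimal sketch below approximates the first one by power iteration on A^T A in plain Python. The helper name first_eigengene and the toy data are hypothetical; in practice one would usually center the data first and call a library SVD.

```python
import math
import random


def first_eigengene(matrix, iterations=200, seed=0):
    """Approximate the leading right singular vector of a genes x samples
    matrix (list of rows) by power iteration on A^T A. Hypothetical helper."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    rng = random.Random(seed)
    # Start from a random positive vector over samples.
    v = [rng.random() + 0.1 for _ in range(n_samples)]
    for _ in range(iterations):
        # u = A v: one score per gene.
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        # w = A^T u: back to one value per sample.
        w = [sum(matrix[g][s] * u[g] for g in range(n_genes))
             for s in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # keep the vector at unit length
    return v


# Toy data: three genes that all follow one pattern across four samples,
# each with its own (possibly negative) scale.
pattern = [1.0, 2.0, 3.0, 4.0]
expression = [[2.0 * p for p in pattern],
              [-0.5 * p for p in pattern],
              [1.5 * p for p in pattern]]
print([round(x, 3) for x in first_eigengene(expression)])
```

Because the toy matrix has rank one, the result is just the shared pattern normalized to unit length. With real data the sign of a singular vector is arbitrary, so eigengenes are interpreted up to sign.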

Statistics

Source.

Princeton on Coursera, Prof. Andrew Conway (Fall 2012)

Link.

https://www.coursera.org/course/stats1

Provider description.

“Statistics One is designed to be a friendly introduction to very simple, very basic, fundamental concepts in statistics … Random sampling and assignment. Distributions … Descriptive statistics. Measurement … Correlation. Causality … Multiple regression. Ordinary least squares … Confidence intervals. Statistical power … t-tests, chi-square tests. Analysis of Variance.”

Commentary.

Only those with no exposure at all to statistics, or those who would benefit from a refresher, should feel the need to take this rather elementary introduction, but the skills are certainly essential to bioinformatics analysis. If necessary it can also provide a gentle lead-in to the Introduction to Probability course, which in turn will be required for more advanced work in statistics. The course makes use of the free statistical software package R (http://www.r-project.org), which bioinformatics practitioners should have in their toolbox not only for classical statistical tests taught here but for more advanced applications such as linear and nonlinear modeling, time-series analysis, classification, clustering, etc.
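The classical tests taught in such a course are one-liners in R, but the arithmetic behind one of them is short enough to show directly. This is a minimal sketch of Welch's two-sample t statistic (the unequal-variance form, which is also the default of R's t.test) using only Python's standard library. The function name welch_t and the toy measurements are hypothetical, and converting t and its degrees of freedom into a p-value would still require the t distribution.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's two-sample t statistic and its Welch-Satterthwaite
    degrees of freedom. Hypothetical helper for illustration."""
    # Squared standard error of each group mean (sample variance / n).
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Toy measurements for two hypothetical groups (e.g., control vs. treated).
control = [1, 2, 3, 4, 5]
treated = [2, 3, 4, 5, 6]
print(welch_t(control, treated))
```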

Alternatives.

Udacity is offering a similar introductory course by Stanford Prof. Sebastian Thrun (http://www.udacity.com/overview/Course/st101). Profs. Susan Dean and Barbara Illowsky of De Anza College offer an “Elementary Statistics” video course that also has a free online textbook and a full complement of quizzes, exams, and assignments (http://sofia.fhda.edu/gallery/statistics/index.html). For a stimulating change, one can consider learning or reviewing the basics of statistics from the perspectives of other disciplines. For instance, another way to pick up R while learning a little epidemiology is through Berkeley Prof. Tomas Aragon's course in “Applied Epidemiology using R” (http://www.youtube.com/view_play_list?p=1CBCB8C53D0CBE1F). A somewhat more detailed (but also considerably more protracted) treatment of basic research statistics is to be found in Berkeley Prof. Frederic Theunissen's “Research and Data Analysis in Psychology” (http://www.youtube.com/view_play_list?p=A07B0BAB1D82C53C). For those with more math and less time, an “Introduction to Statistical Methods for High-Energy Physics” by Prof. Glen Cowan (http://videolectures.net/cernstudentsummerschool09_cowan_is) is a four-lecture overview of material taught in a University of London course.

Going further.

Prof. Wim Krijnen of Hanze University in the Netherlands has a free online textbook “Applied Statistics for Bioinformatics using R” [10] that does a lovely job of combining a course in statistics with instruction in R and more advanced applications to bioinformatics such as microarray analysis. Further study of statistics should be undertaken only after completing the Introduction to Probability below.

Introduction to Probability

Source.

Alternatives.

Prof. Mark Wickert of the University of Colorado at Colorado Springs has put up a very nice screencast series with good notes (http://www.eas.uccs.edu/wickert/ece2610). Prof. Richard Baraniuk of Rice University, who has been a long-time advocate for open source learning (http://www.ted.com/talks/richard_baraniuk_on_open_source_learning.html), maintains a free online textbook (http://cnx.org/content/col10064). In a somewhat different vein, Stanford Engineering offers an excellent course by Prof. Brad Osgood on “The Fourier Transform and its Applications” that adopts more of a deep mathematical than an engineering approach to the subject, so for those who passionately prefer “i” to “j” (and you know who you are) this may be a better choice (http://see.stanford.edu/see/courseInfo.aspx?coll=84d174c2-d74f-493d-92ae-c3f45c0ee091).

Going further.

Prof. Wickert (see above) has also created an advanced video course on “Statistical Signal Processing” that again has good notes (http://www.eas.uccs.edu/wickert/ece5615). The book “Introduction to Statistical Signal Processing” by Stanford Prof. Robert Gray and University of Maryland Prof. L. D. Davisson is freely available online [31]. For a deeper dive into modern linear systems theory, Stanford Engineering has a wonderful course by Prof. Stephen Boyd called “Introduction to Linear Dynamical Systems” (http://see.stanford.edu/see/courseinfo.aspx?coll=17005383-19c6-49ed-9497-2ba8bfcfe5f6). Linear Algebra is an absolute prerequisite for both of these advanced courses, and the former would require Probability as well.

MIT's approach in their “Computer System Engineering” course tends to view software and hardware as a whole, focusing on controlling complexity, strong modularity, networks, parallelism, recovery, reliability, and security (http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-033-computer-system-engineering-spring-2009). A more traditional course, with greater emphasis on project management, is available from IIT Bombay Profs. N.L. Sarda, Umesh Bellur, and Rushikesh Joshi through NPTEL (http://nptel.iitm.ac.in/video.php?subjectId=106101061).

Going further.

MIT also offers a higher-level course called “Performance Engineering of Software Systems” that focuses on performance analysis, algorithmic techniques for high performance, instruction-level optimizations, cache and memory hierarchy optimization, parallel programming, and building scalable distributed systems (http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-172-performance-engineering-of-software-systems-fall-2010). A more elementary course but one that focuses on an important specific skill is Udacity's “Software Testing” by Prof. John Regehr of the University of Utah (http://www.udacity.com/overview/Course/cs258).

Introduction to Databases

Source.

Stanford, Prof. Jennifer Widom (Fall 2011)

Link.

http://www.db-class.org/course

Provider description.

“This course covers database design and the use of database management systems for applications. It includes extensive coverage of the relational model, relational algebra, and SQL. It also covers XML data including DTDs and XML Schema for validation, and the query and transformation languages XPath, XQuery, and XSLT. The course includes database design in UML, and relational design principles based on dependencies and normal forms. Many additional key database topics from the design and application-building perspective are also covered: indexes, views, transactions, authorization, integrity constraints, triggers, on-line analytical processing (OLAP), and emerging ‘NoSQL’ systems.”

Commentary.

This is a relatively short but well-constructed course that was yet another variation on Stanford Engineering's courseware initiatives. The quizzes and short segments, presaging the approach used by Coursera, seem particularly effective for learning efficiently. This material should be considered core to bioinformatics of any stripe.
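To make the relational model and SQL concrete, here is a minimal sketch using Python's built-in sqlite3 module: two tables linked by a foreign key, an index, and a join with aggregation. The gene/variant schema, the helper names, and the rows are hypothetical placeholders, not drawn from any real database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical gene/variant schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce REFERENCES constraints
    conn.executescript("""
        CREATE TABLE gene (
            gene_id    INTEGER PRIMARY KEY,
            symbol     TEXT NOT NULL UNIQUE,
            chromosome TEXT NOT NULL
        );
        CREATE TABLE variant (
            variant_id INTEGER PRIMARY KEY,
            gene_id    INTEGER NOT NULL REFERENCES gene(gene_id),
            position   INTEGER NOT NULL,
            ref        TEXT NOT NULL,
            alt        TEXT NOT NULL
        );
        -- Index to speed up joins and lookups by gene.
        CREATE INDEX variant_by_gene ON variant(gene_id);
    """)
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?)",
                     [(1, "GENE_A", "1"), (2, "GENE_B", "2"), (3, "GENE_C", "X")])
    conn.executemany("INSERT INTO variant VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 100, "A", "G"), (2, 1, 250, "C", "T"),
                      (3, 2, 75, "G", "A")])
    return conn


def variant_counts(conn):
    """Count variants per gene; LEFT JOIN keeps genes with no variants."""
    return conn.execute("""
        SELECT g.symbol, COUNT(v.variant_id) AS n_variants
        FROM gene AS g
        LEFT JOIN variant AS v ON v.gene_id = g.gene_id
        GROUP BY g.gene_id, g.symbol
        ORDER BY g.symbol
    """).fetchall()


print(variant_counts(build_demo_db()))
```

The LEFT JOIN is the design choice worth noticing: an inner join would silently drop GENE_C, which has no variants.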

Alternatives.

The University of Washington has an archived distance learning course by Prof. Alon Halevy (now at Google) that is titled “Introduction to Database Systems” but emphasizes data management (http://www.cs.washington.edu/education/courses/csep544/04sp). There is a more classical and in-depth database course by Profs. Dharanipragada Janakiram of IIT Madras and Srinath Srinivasa of IIT Bangalore via NPTEL (http://nptel.iitm.ac.in/video.php?subjectId=106106093).

Computer Graphics

Source.

UC Davis, ECS 175, Prof. Kenneth Joy (Fall 2009)

Link.

http://itunes.apple.com/us/itunes-u/computer-graphics-fall-2009/id457893733

Provider description.

“Principles of computer graphics. Current graphics hardware, elementary operations in two-and three-dimensional space, transformational geometry, clipping, graphics system design, standard graphics systems, individual projects.”

Commentary.

Given the importance of scientific visualization to bioinformatics, this should be a popular elective. This course goes straight to 3D graphics, using OpenGL and Qt for a considerable amount of high-level coding. It is also a good opportunity to get some exposure to graphics processing units (GPUs), which can also be used to greatly speed up non-graphical computations of relevance to bioinformatics.
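The “transformational geometry” in the course description covers operations such as rotation and translation. Graphics systems like OpenGL conventionally express these as 4 × 4 matrices acting on homogeneous coordinates, so that a chain of transformations composes into a single matrix product. This is a minimal pure-Python sketch under those conventions (column vectors, rightmost matrix applied first). All helper names are hypothetical, and none of this is OpenGL's own API.

```python
import math


def rotation_z(theta):
    """4x4 homogeneous matrix rotating by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def translation(tx, ty, tz):
    """4x4 homogeneous matrix translating by (tx, ty, tz)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 4x4 matrices; matmul(a, b) applies b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(m, point):
    """Transform a 3D point: append w = 1, multiply, divide back by w."""
    v = [point[0], point[1], point[2], 1.0]
    out = [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
    return tuple(c / out[3] for c in out[:3])


# Rotate (1, 0, 0) by 90 degrees about z, then shift it one unit along x.
m = matmul(translation(1.0, 0.0, 0.0), rotation_z(math.pi / 2))
print(apply(m, (1.0, 0.0, 0.0)))  # approximately (1.0, 1.0, 0.0)
```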

Prerequisites.

Linear Algebra, Data Structures, strong programming skills.

Alternatives.

The Harvard Extension School has a substantially similar offering entitled “Introduction to Computer Graphics and GPU Programming” by Prof. Hanspeter Pfister and Eric Chan (http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=429428034). A more exhaustive introduction to the algorithms (but with no coding) is provided by IIT Madras Prof. Sukhendu Das in “Computer Graphics” via NPTEL (http://nptel.iitm.ac.in/video.php?subjectId=106106090).

Going further.

UC Davis also offers advanced courses through their Institute for Data Analysis and Visualization, including “Graphics Architecture” (http://itunes.apple.com/us/itunes-u/graphics-architecture-winter/id404606990), which does GPUs in-depth; “Geometric Modeling” (http://itunes.apple.com/us/itunes-u/computer-science-introduction/id389259246); and “Advanced Visualization” (http://itunes.apple.com/us/itunes-u/advanced-visualization-ecs277/id389259186).

Digital Image Processing

Source.

Indian Institute of Technology (IIT) Kharagpur, EC61501, Prof. P.K. Biswas

Link.

http://nptel.iitm.ac.in/video.php?subjectId=117105079

Provider description.

“Digital image fundamentals … Image enhancement in spatial domain … Edge detection … Image filtering in frequency domain … Image restoration … Color image processing … Morphological Image Processing … Image segmentation … Texture Analysis …”

Commentary.

Image processing has long been important in biomedical imaging and in certain omic technologies such as microarrays. It also comes into play with next-generation sequencing platforms as well as high-content screening that involves image processing of cell-based assays. This is a rigorous engineering approach to the subject for hard-core pixel jockeys.
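Edge detection, one of the topics listed in the course description, is commonly introduced through small gradient filters. The sketch below applies the standard 3 × 3 Sobel kernels to a grayscale image stored as a list of rows and returns the gradient magnitude. It illustrates the technique in plain Python and is not taken from the course. For simplicity it leaves border pixels at zero, and sobel_magnitude is a hypothetical name.

```python
import math

# Standard Sobel kernels for horizontal (x) and vertical (y) intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude at interior pixels of a grayscale image.

    Computes a cross-correlation (kernels not flipped); flipping them would
    only change the sign of gx and gy, so the magnitude is unaffected.
    Border pixels are left at 0.0.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    p = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * p
                    gy += SOBEL_Y[i][j] * p
            out[r][c] = math.hypot(gx, gy)
    return out


# A vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 10, 10] for _ in range(3)]
print(sobel_magnitude(step)[1])  # [0.0, 40.0, 40.0, 0.0]
```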

Prerequisites.

Differential Equations, Linear Algebra, Signals and Systems.

Alternatives.

The UC Davis program described in the previous entry also offers an “Image Processing and Analysis” course (http://itunes.apple.com/us/itunes-u/image-processing-analysis/id458753849).

Going further.

Machine learning techniques for computer vision and image understanding are useful extensions of the basic techniques of image processing. Berkeley Prof. Jitendra Malik has a Coursera entry entitled “Computer Vision: The Fundamentals” that covers segmentation of biological images (https://www.coursera.org/course/vision). Short courses available on Videolectures.net (see Computational Seminars below) include, among others, “Learning in Computer Vision” by Prof. Simon Lucey of Carnegie Mellon University (http://videolectures.net/mlss08au_lucey_linv) and “Markov Random Fields for Vision and Graphics” by Prof. Richard Hartley of the Australian National University (http://videolectures.net/ssll09_hartley_covi). Students should first take Learning Systems or similar.

Massively Parallel Computing

Source.

Harvard Extension School, CSCI E-292, Profs. Hanspeter Pfister and Nicolas Pinto (Spring 2011)

Link.

http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=429428651

Provider description.

“In this course, students get hands-on experience in developing software for massively parallel computing resources. We cover parallel programming models, hardware architectures, multi-threaded programming, GPU programming, cluster computing, cloud computing, and MapReduce using Hadoop and Amazon's EC2.”

Commentary.

This course teaches another set of skills highly relevant to current bioinformatics practice, and is therefore an attractive elective. It focuses first on GPU programming with CUDA and then on MapReduce/Hadoop programming on the Amazon Cloud. For the former, a home computer with a high-end Nvidia GPU should be sufficient (the PyCUDA Python binding is used), though online students will of course not have access to the GPU cluster used in the course. For the Cloud, EC2 accounts are free but Amazon will charge a modest amount for cycles (http://aws.amazon.com/ec2).
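The MapReduce model that the course teaches with Hadoop can be illustrated in miniature. The sketch below runs the map, shuffle, and reduce phases one after another in a single Python process to count k-mers in sequencing reads. It mimics the programming model only and does not use Hadoop's API. k-mer counting is a hypothetical example task chosen here, and all function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map phase: emit a (k-mer, 1) pair for every k-mer in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs):
    """Shuffle phase: group emitted values by key, as the framework
    would do between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the partial counts for one k-mer."""
    return key, sum(values)


def count_kmers(reads, k):
    """Run map over every read, shuffle, then reduce each key group."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return dict(reduce_counts(key, values)
                for key, values in shuffle(mapped).items())


print(count_kmers(["ACGT", "CGTA"], 2))
```

On a real cluster, the framework would distribute the map calls across machines and perform the shuffle over the network. Because each reduce call only ever sees one key's values, the reduce work can be parallelized in the same way.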

Prerequisites.

Programming skills and some exposure to UNIX systems programming.

Alternatives.

Stanford offers a course more narrowly focused on GPUs (http://itunes.apple.com/itunes-u/programming-massively-parallel/id384233322?mt=2) as well as shorter practical courses in GPUs (http://classx.stanford.edu/ClassX/system/users/web/pg/view_subject.php?subject=NVIDIA_ICME_SPRING_2010_2011), Hadoop (http://classx.stanford.edu/ClassX/system/users/web/pg/view_subject.php?subject=HADOOP_WINTER_2010_2011), and the Amazon Cloud (http://classx.stanford.edu/ClassX/system/users/web/pg/view_subject.php?subject=AEC2_WINTER_2010_2011).

Introduction to Algorithms

Source.

MIT, 6.046J, Profs. Charles Leiserson and Erik Demaine (Fall 2005)

Link.

Prerequisites.

Data Structures or equivalent. Basic probability and propositional logic.

Going further.

This course provides modest coverage of the topics in the Commentary, which may well lead the interested student to pursue additional elective courses below. Students interested in robotic technologies, for instance in control of laboratory automation, should consider Stanford Prof. Oussama Khatib's course “Introduction to Robotics” (http://see.stanford.edu/see/courseinfo.aspx?coll=86cc8662-f6e4-43c3-a1be-b30d1d179743). For a look at the deepest philosophical foundations of ontologies, students may enjoy a short course by Prof. Barry Smith of the University at Buffalo entitled “An Introduction to Ontology” (http://ontology.buffalo.edu/smith/IntroOntology_Course.html). For a more computational approach, Prof. John Sowa has a well-organized but text-only “Guided Tour of Ontology” (http://www.jfsowa.com/ontology/guided.htm) that includes readings from his book “Knowledge Representation” [40]. Dr. Doug Lenat, another knowledge representation pioneer, gave an interesting seminar at NIH called “Computers versus Common Sense” (http://videocast.nih.gov/launch.asp?15085).

Learning Systems

Source.

California Institute of Technology, CS 156, Prof. Yaser Abu-Mostafa (Spring 2012)

Going further.

For an advanced, more purely mathematical approach to the subject matter, see “Non-Cooperative Game Theory” as taught by Prof. Tamer Basar of the University of Illinois at Urbana-Champaign (http://www.networkmaths.ie/videos/list_videos.php?course=game). Two worthwhile seminars relating game theory to neurosciences are “Neural Basis of Strategic Choice” by Dr. Giorgio Coricelli (http://videocast.nih.gov/launch.asp?17030) and “Neuroeconomic Approaches to Mental Disorders” by Dr. P. Read Montague (http://videocast.nih.gov/launch.asp?16632).

Entrepreneurship

Source.

Stanford Technology Ventures Program Entrepreneurship Corner

Link.

http://ecorner.stanford.edu

Provider description.

“The Stanford Technology Ventures Program (STVP) Entrepreneurship Corner is a free online archive of entrepreneurship resources for teaching and learning. The mission of the project is to support and encourage faculty around the world who teach entrepreneurship to future scientists and engineers, as well as those in management and other disciplines.”

Commentary.

Many students who learn bioinformatics will be exposed to the very latest advances in both biotechnology and computing, probably the two fields that result in the greatest rate of business startups, especially from academic spinoffs. Thus learning entrepreneurship skills is entirely appropriate as an elective in this curriculum. The STVP is housed in Stanford Engineering and hosted by the department of Management Science and Engineering. The web site has hundreds of videos, including seminars, case studies, and tutorials, many by Silicon Valley luminaries. As a way of organizing the student's approach to this cornucopia, two collections in particular are recommended: “Invitation to Venture” (http://ecorner.stanford.edu/collections.html?collectionId=1) as an introduction, and then “Technology Ventures” (http://ecorner.stanford.edu/collections.html?collectionId=2) as a more directed approach of relevance to bioinformatics.

Going further.

Students with strongly entrepreneurial tendencies might also wish to take a look at University of Michigan Prof. Gautam Kaul's “Introduction to Finance” on Coursera (https://www.coursera.org/course/introfinance). For the basics, there are countless economics courses online, but the Annenberg Center has a particularly nicely produced overview (http://www.learner.org/resources/series79.html).

Justice

Source.

Harvard, ER22, Prof. Michael Sandel (Fall 2008)

Link.

http://www.justiceharvard.org

Provider description.

“A critical analysis of classical and contemporary theories of justice, including discussion of present-day applications. The course examines debates about justice prominent in moral and political philosophy, and invites students to subject their own views on these controversies to critical examination.”

Commentary.

At the inception of the Human Genome Project, significant emphasis was placed on “ELSI,” the ethical, legal, and social implications of the research, and these issues are even more prominent today in areas such as personal data privacy and the bioethics of human and animal experimentation. Biologists nowadays often have some training in bioethics, but for computer scientists it may be more novel, yet increasingly important given new capacities for mining Big Data. This is a relatively short and very general introduction to ethics, but one that is highly intellectually stimulating, so much so that it fills a large theatre whenever Prof. Sandel presents it at Harvard, with production values worthy of a one-man show on Broadway. You are likely to discover useful things about yourself, for example, whether you are a deontologist or a consequentialist (which, for you computer types, has something to do with whether your moral judgments are determined at compile-time or run-time).

Alternatives.

Oxford has a course of similar (short) length called “A Romp through Ethics for Complete Beginners,” taught by Prof. Marianne Talbot with more focus on traditional moral philosophy (http://podcasts.ox.ac.uk/series/romp-through-ethics-complete-beginners).

Going further.

UCLA Prof. Bob Goldberg teaches an honors collegium entitled “Genetic Engineering in Medicine, Law, & Agriculture” that focuses on a range of legal and ethical issues in biotechnology (http://www.mcdb.ucla.edu/Research/Goldberg/HC70A_W12/videos.php). On Coursera from the University of Pennsylvania, Prof. Ezekiel Emanuel has a timely course on “Health Policy and the Affordable Care Act” (https://www.coursera.org/course/healthpolicy), while his colleague Prof. Jonathan Moreno will be covering the interaction of neurosciences with ethics for “Neuroethics” (https://www.coursera.org/course/neuroethics). The NIH offers a comprehensive short course on “Ethical and Regulatory Aspects of Clinical Research” (http://www.bioethics.nih.gov/hsrc and click on “Podcasts” for the videos).

Courses of Study

As noted at the outset, students will come to online learning from different backgrounds and with different goals in mind, and moreover will have different amounts of time to devote to the process. Therefore it is not helpful to be overly prescriptive about course selection. However, it is possible to identify some basic “types” of bioinformatics practitioners, and to suggest possible course selections best suited to those career paths. It should be emphasized that different institutions and individuals may have other views on bioinformatics curricula, disagreeing on appropriate electives and even on core courses. To this, the author can only plead editorial privilege, and remind the reader that these are opinions based on one person's experience in the field. It would be prudent for potential students to seek a variety of opinions.

Curriculum Tracks

We identify below a set of five possible tracks, noting two-letter abbreviations used in Tables 1–4 where the recommended distributions of courses for each track are indicated using symbols defined in the key at the bottom of each table. (See individual course descriptions above for explanations of source abbreviations and further elaboration of requirements.) There may well be other paths, and certainly a variety of more specialized ones, but these broad categories would seem to be a useful start.

Table 1. Biology Department curriculum with recommended tracks.

https://doi.org/10.1371/journal.pcbi.1002632.t001

Table 2. Mathematics Department curriculum with recommended tracks.

https://doi.org/10.1371/journal.pcbi.1002632.t002

Table 3. Computer Science Department curriculum with recommended tracks.

https://doi.org/10.1371/journal.pcbi.1002632.t003

Table 4. Other Departments curriculum with recommended tracks.

https://doi.org/10.1371/journal.pcbi.1002632.t004

In Tables 1–4, the courses in each virtual department indicated as prerequisites for a given track represent an assumed background for individuals entering the track, and should certainly be taken if the material is unfamiliar or needs refreshing. Core courses are those deemed central to the track, and should be taken if the material has not already been mastered elsewhere. Electives are at the option of the student, but certain of these are indicated as recommended, and at least several should be taken as time permits. Finally, for some tracks, additional study is recommended to extend certain course topics (denoted by plus signs), as discussed below under Independent Study.

Bioinformatics Analysis (BA).

This track prepares an individual to do biological data analysis with a view to interpretation or prediction. It involves such skills as sequence, expression, and functional analysis by means of a standard bioinformatics tool set, as well as an ability to write computational scripts, database queries, and simple programs.
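As a concrete, deliberately small illustration of the "computational scripts, database queries, and simple programs" expected of this track, the sketch below computes GC content for a few sequences, stores the results in a database, and retrieves them with an SQL query. It assumes Python's standard `sqlite3` module and an in-memory database; the table name `seqs`, the functions `gc_content` and `high_gc_ids`, and the sample records are hypothetical and chosen only for illustration.

```python
import sqlite3
from typing import Iterable, List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def high_gc_ids(records: Iterable[Tuple[str, str]], threshold: float) -> List[str]:
    """Store (id, GC fraction) rows in an in-memory SQLite table, then
    return the ids whose GC fraction is at least `threshold`, sorted by id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE seqs (id TEXT PRIMARY KEY, gc REAL)")
        conn.executemany(
            "INSERT INTO seqs (id, gc) VALUES (?, ?)",
            [(rid, gc_content(seq)) for rid, seq in records],
        )
        rows = conn.execute(
            "SELECT id FROM seqs WHERE gc >= ? ORDER BY id", (threshold,)
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    sample = [("s1", "GGCC"), ("s2", "ATAT"), ("s3", "acgt")]  # hypothetical
    print(high_gc_ids(sample, 0.5))
```

With the sample records, s1 (GC fraction 1.0) and s3 (0.5) pass the 0.5 threshold while s2 (0.0) does not. A real analysis would typically read sequences from FASTA files and persist the database to disk rather than memory.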

Data Mining (DM).

This track begins with the analyst skill set but goes further to enable more sophisticated analyses of datasets that are especially complex, for example, by virtue of being very large scale, noisy, high-dimensional, semantically rich, poorly organized or integrated, etc. It entails a greater depth of both mathematical knowledge and programming skills.

Bioinformatics Tools (BT).

This track is meant to afford the capability to develop standalone tools of significant sophistication for bioinformatics analysis, visualization, presentation, and local data management. It requires programming skills in a variety of languages and the ability to implement complex algorithms efficiently, based on solid biological domain knowledge.

Bioinformatics Systems (BS).

This track adds to the previous one the competency for software engineering in-the-large, at a level sufficient to participate in or lead the development of major bioinformatics systems and/or products, for instance supporting data management and analysis from novel technological platforms through complex downstream analysis pipelines.

Computational Biology (CB).

This track is intended to prepare individuals to do original research in biological modeling and analysis by way of advanced mathematical and computational techniques. It provides a deeper grounding in computer science and engineering disciplines relevant to the sciences of complexity, information, and systems.

Independent Study

Even in a university environment, it is not unusual for the classes that are necessary or desirable for a given course of study to be unavailable when needed. Certainly the curriculum above is constrained by the available online courses, as discussed below in the conclusion. In addition, the patchwork nature of the courses, arising as they do from many institutions, can be a strength but also a weakness, with less opportunity for coordination and seamless sequencing of course contents. As in academia, any gaps can be addressed, or special interests accommodated, by independent study. The major disadvantage is the lack of a faculty mentor, which requires students to be proactive, self-sufficient, and conscientious in discerning the needs and means for supplementing their coursework. Perhaps the best way to approach this is for students to make a habit of reading the key journals in their field so as to discover systematic gaps in their knowledge.

The type of independent study needed will depend on the background of the student and on the track they are following. Some suggestions for individual tracks are indicated by plus signs in Tables 1–4. A plus to the right of a course symbol (whether prerequisite or core) indicates that advanced work in the topic area of that course is recommended for students in that track. Often some specific suggestions for additional study are indicated in the “Going Further” sections of the course catalog, but where specialized courses are not to be found online (as is likely), one hopes that the basic course has provided sufficient background for the student to learn by self-study of more advanced texts and journals.

For Bioinformatics Analysis, additional biology coursework or other study would be required for the student to approach problems with the expected degree of domain sophistication, so that interpretations of data are placed in an appropriate biological context. Ideally this would include exposure to laboratory science, which of course is unlikely in the case of online learners. However, it is expected that many individuals embarking on this track would already be degreed biologists who are seeking additional training to do advanced analyses with their own data or that of others. To some degree the same may be true of the Data Mining track, though these individuals are probably more likely to be committed to a career in exclusively “dry” biology.

Students in the two software tracks, Bioinformatics Tools and Bioinformatics Systems, may wish to take additional courses in subjects such as machine architecture, operating systems, or theory of programming languages, but by far the most important requirement for independent study is actual programming experience. These individuals would be well advised to take on substantial projects in the biological domain that go beyond the requirements of the courses taken.

Finally, the Computational Biology track may call for independent study in a variety of topics in advanced mathematics and computer science as well as biological background necessary for a particular specialization. The curriculum offered here is slanted toward systems biology in this regard, but individuals may prefer to study topics such as evolutionary dynamics or mathematical genetics that would require additional study.

Conclusion

As noted at the outset, any proposed curriculum must be based on the shifting sands of available offerings, and moreover is necessarily a matter of opinion, both scientific and pedagogical. Without a doubt there are gaps, and quality is not uniform. For instance, there are few suitable resources in important areas such as neuroscience and structural biology, and several other areas are thin. But the offerings are only getting better and more numerous, and so any imperfections in the current collection should be increasingly easy to correct with the passage of time. A more pertinent question is whether an online education is an adequate substitute for what is termed a resident education, in general and in the particular case of bioinformatics.

One undeniable truism is that independent study requires motivation and discipline in the extreme. Students must be committed to doing assigned readings, exercises and assessments faithfully to achieve maximum uptake, the more so for being on their own. A companion article by the author, “Ten Simple Rules for Online Learning” [44], attempts to provide practical advice along these lines.

A particular piece of advice it offers is to pay special attention to doing programming projects in the biological domain. One great risk to the proposition of online bioinformatics education is that students never really get to grips with applying newfound computational or analytic skills to real biological data and actual problems in the full context of the scientific establishment. To be sure, biological databases are readily accessible and datasets may be found online that can serve as challenge problems for classification, and so forth. But that is not the same as the interactive process of designing a novel experimental program, acquiring data directly from instrumentation, cleaning and reducing it, and taking responsibility for storing it in both persistent and queryable form. Nor does classroom learning by itself, virtual or otherwise, fully prepare one for developing real-world error models, dealing with missing data, establishing a statistical case for some result, arguing and defending scientific positions, navigating the publication process, and sundry other practical skills.

Thus, a useful adjunct to online learning in bioinformatics might be a portfolio of suggested projects based on real-world datasets that would help exercise the skills of trainees, perhaps in the context of an online community of peers. One can even imagine a future in which the use of virtual laboratories makes it possible for students to undertake mixed wet/dry studies of their own. Just as the Amazon Cloud now makes large-scale computing accessible and economically feasible without the support of a large institutional data center, the decreasing cost of sequencing technology and the synthetic biology movement are both suggestive of the possibility of analogous sorts of remote biology. Educational grants for the creation of virtual laboratories to enrich the online learning experience might be public (or philanthropic) money well spent.

Any amount of study in any context cannot substitute for immersion in the social context of science. In an online learning environment, direct interaction with peers is certainly possible after a fashion, through discussion logs and the like, but to date such interaction has not addressed important educational elements such as the development of public speaking skills. Perhaps the last great barrier to self-learning is the absence of an advisor, with all that implies, and of membership in a working lab. Even the most imaginative web technology will only go so far in this regard, and probably not far enough in the case of wet biology. However, the field of bioinformatics by its nature may offer the best chance for finding ways to involve distance learners directly in ongoing scientific research, and that would seem to be a worthy goal for the burgeoning online education movement.

References

1. Markoff J (18 Apr 2012) Online education venture lures cash infusion and deals with 5 top universities. The New York Times. Available: http://www.nytimes.com/2012/04/18/technology/coursera-plans-to-announce-university-partners-for-online-classes.html. Accessed 16 August 2012.
2. Means B, Toyama Y, Murphy R, Bakia M, Jones K (SRI International) (2009) Evaluation of evidence-based practices in online learning: a meta-analysis and review of online learning studies. Final Report September 2010. Washington (D.C.): Department of Education. Contract number ED-04-CO-0040 Task 0006. 66 p. Available: http://www2.ed.gov/rschstat/eval/tech/evidence-based-practices/finalreport.pdf. Accessed 16 August 2012.
3. Mayer RE (2001) Multimedia learning. New York, NY: Cambridge University Press.
4. Garrett RH, Grisham CM (2004) Biochemistry. 3rd edition. St. Paul, MN: Brooks/Cole Publishing. Available: http://www.web.virginia.edu/Heidi/home.htm
5. Strachan T, Read A (2010) Human molecular genetics. 4th edition. New York: Garland Science. 807 p.
6. Strang G (1991) Calculus. Wellesley, MA: Wellesley-Cambridge Press. 615 p. Available: http://ocw.mit.edu/resources/res-18-001-calculus-online-textbook-spring-2005/textbook. Accessed 16 August 2012.
7. Williamson SG (1987) Top-down calculus. Rockville, MD: Computer Science Press. 429 p. Available: http://cseweb.ucsd.edu/~gill/TopDownCalcSite. Accessed 16 August 2012.
8. Kaw A, Kalu EE (2011) Numerical methods with applications. Raleigh, NC: Lulu. 740 p. Available: http://numericalmethods.eng.usf.edu/topics/textbook_index.html. Accessed 16 August 2012.
9. Strang G (2007) Computational science and engineering. Wellesley, MA: Wellesley-Cambridge Press. 713 p.
10. Krijnen WP (2009) Applied statistics for bioinformatics using R. Available: http://cran.r-project.org/doc/contrib/Krijnen-IntroBioInfStatistics.pdf. Accessed 16 August 2012.
11. Grinstead CM, Snell JL (1997) Introduction to probability. New York: American Mathematical Society. 510 p. Available: http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/book.html. Accessed 16 August 2012.
12. Wasserman L (2003) All of statistics. New York: Springer. 461 p.
13. Ewens WJ, Grant GR (2001) Statistical methods in bioinformatics. New York: Springer. 476 p.
14. Gray RM (2010) Probability, random processes, and ergodic properties. 2nd edition. New York: Springer. 357 p. Available: http://ee.stanford.edu/~gray/arp.html. Accessed 16 August 2012.
15. Aho AV, Ullman JD (1994) Foundations of computer science. San Francisco, CA: W.H. Freeman. 786 p. Available: http://i.Stanford.edu/~ullman/focs.html. Accessed 16 August 2012.
16. Sipser M (1997) Introduction to the theory of computation. Boston, MA: PWS Publishing. 396 p.
17. Gurari E (1989) An introduction to the theory of computation. New York, NY: Computer Science Press. 314 p. Available: http://www.cse.ohio-state.edu/~gurari/theory-bk/theory-bk.html. Accessed 16 August 2012.
18. Graham RL, Knuth DE, Patashnik O (1989) Concrete mathematics. Reading, MA: Addison-Wesley. 625 p.
19. Bender EA, Williamson SG (2004) A short course in discrete mathematics. New York: Dover. 256 p. Available: http://cseweb.ucsd.edu/~gill/BWLectSite. Accessed 16 August 2012.
20. Flajolet P, Sedgewick R (2012) Analytic combinatorics. Cambridge, UK: Cambridge University Press. 824 p. Available: http://ac.cs.princeton.edu/home. Accessed 16 August 2012.
21. Bender EA, Williamson SG (2006) Foundations of combinatorics with applications. New York: Dover. 480 p. Available: http://cseweb.ucsd.edu/~gill/FoundCombSite. Accessed 16 August 2012.
22. Wilf HS (2005) generatingfunctionology. 3rd edition. Natick, MA: A.K Peters/CRC Press. 245 p. Available: http://www.math.upenn.edu/~wilf/DownldGF.html. Accessed 16 August 2012.
23. Easley D, Kleinberg J (2010) Networks, crowds and markets: reasoning about a highly connected world. Cambridge, UK: Cambridge University Press. 744 p. Available: http://www.cs.cornell.edu/home/kleinber/networks%2Dbook. Accessed 16 August 2012.
24. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge, UK: Cambridge University Press. 730 p. Available: http://www.stanford.edu/~boyd/cvxbook. Accessed 16 August 2012.
25. Luke S (2009) Essentials of metaheuristics. Raleigh, NC: Lulu. 230 p. Available: http://cs.gmu.edu/~sean/book/metaheuristics. Accessed 16 August 2012.
26. Poli R, Langdon WB, McPhee NF (2008) A field guide to genetic programming. Raleigh, NC: Lulu. 252 p. Available: http://www.gp-field-guide.org.uk. Accessed 16 August 2012.
27. Cover TM, Thomas JA (1991) Elements of information theory. New York: Wiley. 748 p.
28. Gray RM (2011) Entropy and information. 2nd edition. New York: Springer. 436 p. Available: http://ee.stanford.edu/~gray/it.html. Accessed 16 August 2012.
29. MacKay D (2003) Information theory, inference, and learning algorithms. Cambridge, UK: Cambridge University Press. 640 p. Available: http://www.inference.phy.cam.ac.uk/mackay/itila. Accessed 16 August 2012.
30. Oppenheim AV, Willsky AS, Nawab SH (1996) Signals and systems. 2nd edition. Englewood Cliffs, NJ: Prentice Hall.
31. Gray RM, Davisson LD (2010) Introduction to statistical signal processing. Cambridge, UK: Cambridge University Press. 478 p. Available: http://ee.stanford.edu/~gray/sp.html. Accessed 16 August 2012.
32. Evans D (2011) Introduction to computing: explorations in language, logic, and machines. Charleston, SC: CreateSpace. 266 p. Available: http://www.computingbook.org. Accessed 16 August 2012.
33. Abelson H, Sussman GJ, Sussman J (1996) Structure and interpretation of computer programs. 2nd edition. Cambridge, MA: MIT Press. Available: http://mitpress.mit.edu/sicp/full-text/book/book.html. Accessed 16 August 2012.
34. Bates B, Sierra K (2003) Head first java: your brain on java - a learner's guide. Sebastopol, CA: O'Reilly Media.
35. Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms. 3rd edition. Cambridge, MA: MIT Press.
36. Gusfield D (1997) Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge, UK: Cambridge University Press. 556 p.
37. Jones NC, Pevzner PA (2004) An introduction to bioinformatics algorithms. Cambridge, MA: MIT Press.
38. Russell S, Norvig P (2009) Artificial intelligence: a modern approach. 3rd edition. Englewood Cliffs, NJ: Prentice Hall. 1152 p.
39. Rowe NC (1988) Artificial intelligence through prolog. 2nd edition. Englewood Cliffs, NJ: Prentice Hall. 481 p. Available: http://faculty.nps.edu/ncrowe/book/book.html. Accessed 16 August 2012.
40. Sowa JF (2000) Knowledge representation. Pacific Grove, CA: Brooks Cole Publishing. 594 p.
41. Abu-Mostafa YS, Magdon-Ismail M, Lin H-T (2012) Learning from data. Pasadena: AMLBook.
42. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. 2nd edition. New York: Springer. 768 p. Available: http://www-stat.stanford.edu/~tibs/ElemStatLearn. Accessed 16 August 2012.
43. Bird S, Klein E, Loper E (2009) Natural language processing with python. Sebastopol, CA: O'Reilly Media. Available: http://www.nltk.org/book. Accessed 16 August 2012.
44. Searls DB (2012) Ten simple rules for online learning. PLoS Comput Biol 8: e1002631.