TY  - JOUR
T1  - A Feature-Based Approach to Modeling Protein–DNA Interactions
A1  - Sharon, Eilon
A1  - Lubliner, Shai
A1  - Segal, Eran
Y1  - 2008/08/22
N2  - Author Summary Transcription factor (TF) protein binding to its DNA target sequences is a fundamental physical interaction underlying gene regulation. Characterizing the binding specificities of TFs is essential for deducing which genes are regulated by which TFs. Recently, several high-throughput methods that measure sequences enriched for TF targets genomewide were developed. Since TFs recognize relatively short sequences, much effort has been directed at developing computational methods that identify enriched subsequences (motifs) from these sequences. However, little effort has been directed towards improving the representation of motifs. Practically, available motif finding software use the position specific scoring matrix (PSSM) model, which assumes independence between different motif positions. We present an alternative, richer model, called the feature motif model (FMM), that enables the representation of a variety of sequence features and captures dependencies that exist between binding site positions. We show how FMMs explain TF binding data better than PSSMs on both synthetic and real data. We also present a motif finder algorithm that learns FMM motifs from unaligned promoter sequences and show how de novo FMMs, learned from binding data of the human TFs c-Myc and CTCF, reveal intriguing insights about their binding specificities.
JF  - PLOS Computational Biology
JA  - PLOS Computational Biology
VL  - 4
IS  - 8
UR  - https://doi.org/10.1371/journal.pcbi.1000154
SP  - e1000154
EP  - 
PB  - Public Library of Science
M3  - doi:10.1371/journal.pcbi.1000154
ER  - 