
Transcription Factor Binding Site-based Alignment of Conserved Non-coding Sequences

Abdollahyan, F. Smeraldi, B. Noyvert and G. Elgar, 15th European Conference on Computational Biology (ECCB), 2016


The identification and functional characterization of regulatory modules in the human genome is a challenging task. Regulatory modules act through the sequence-specific binding of transcription factors, and previous studies have demonstrated that the co-occurrence of transcription factor binding sites (TFBSs) in close proximity can be a good indicator of regulatory activity. In this study, we analysed the co-occurrence of TFBSs within a set of highly conserved non-coding elements (CNEs) that are associated with the regulation of early vertebrate development. From a computational point of view, analysing TFBS co-occurrence is complicated by the fact that TFBSs overlap. This rules out classic alignment algorithms, which cannot handle alternative motifs in a sequence, as well as k-mer-based approaches, which count motif occurrences and would enumerate all alternative motifs indiscriminately. Our approach is fundamentally different: we wrote each CNE as a sequence of symbols, each representing a TFBS identified within that element. We then constructed a graph representation of the CNEs that accounts for the ambiguity caused by overlapping TFBSs, and used a dynamic programming approach to find the optimal alignment between these graphs. Finally, we computed the relative enrichment of short sequences of TFBSs in the alignments of CNEs compared to a background distribution. Our results identify a number of enriched TFBS alignments within CNEs, including a regulatory signature that was previously validated functionally in this set of CNEs and is associated with hindbrain enhancer activity.
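The abstract does not specify how the graph is built or how alignments are scored, so the following is only an illustrative sketch under our own assumptions, not the authors' method. Each TFBS hit is modelled as a half-open interval with a motif label. Overlapping hits become alternative branches of a directed acyclic graph, and an edge is only drawn when no other hit fits entirely in between, so that every path is a maximal chain of non-overlapping TFBSs. Two such graphs are then compared with a Smith-Waterman-style local alignment run over pairs of graph nodes. The names `Hit`, `build_graph` and `align_graphs`, the example motif labels, and the match/mismatch/gap scores (+2, -1, -1) are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    """Hypothetical TFBS hit on a CNE: half-open interval [start, end) plus motif label."""
    start: int
    end: int
    label: str


SRC = -1  # virtual source node shared by both graphs
Cell = Tuple[int, int]
Graph = Tuple[List[Hit], Dict[int, List[int]]]


def build_graph(hits: List[Hit]) -> Graph:
    """Sort hits by position and list each hit's predecessors.

    Edge p -> v when p ends at or before v starts and no other hit fits
    entirely between them, so overlapping hits become alternative branches
    and every path is a maximal chain of non-overlapping TFBSs.
    Assumes end > start for every hit.
    """
    nodes = sorted(hits)  # (start, end) order is a topological order
    preds: Dict[int, List[int]] = {}
    for v, hv in enumerate(nodes):
        preds[v] = [SRC]  # a local alignment may start at any node
        for p, hp in enumerate(nodes):
            if hp.end > hv.start:
                continue  # p overlaps v or lies after it
            if not any(hw.start >= hp.end and hw.end <= hv.start for hw in nodes):
                preds[v].append(p)
    return nodes, preds


def align_graphs(g1: Graph, g2: Graph, match: int = 2, mismatch: int = -1,
                 gap: int = -1) -> Tuple[int, List[Tuple[str, str]]]:
    """Smith-Waterman-style local alignment over two TFBS graphs."""
    (n1, p1), (n2, p2) = g1, g2
    H: Dict[Cell, int] = {}
    back: Dict[Cell, Optional[Cell]] = {}
    best, best_cell = 0, (SRC, SRC)
    for u in [SRC] + list(range(len(n1))):
        for v in [SRC] + list(range(len(n2))):
            cell = (u, v)
            if SRC in cell:  # boundary row/column: score 0, no predecessor
                H[cell], back[cell] = 0, None
                continue
            sub = match if n1[u].label == n2[v].label else mismatch
            cands: List[Tuple[int, Optional[Cell]]] = [(0, None)]  # fresh start
            cands += [(H[(p, q)] + sub, (p, q)) for p in p1[u] for q in p2[v]]
            cands += [(H[(p, v)] + gap, (p, v)) for p in p1[u]]  # gap in g2
            cands += [(H[(u, q)] + gap, (u, q)) for q in p2[v]]  # gap in g1
            # max() keeps the first maximum, so ties prefer a fresh start.
            H[cell], back[cell] = max(cands, key=lambda c: c[0])
            if H[cell] > best:
                best, best_cell = H[cell], cell
    # Trace back from the best cell, emitting (g1 label, g2 label) columns.
    cols: List[Tuple[str, str]] = []
    cell = best_cell
    while back[cell] is not None:
        prev = back[cell]
        a = n1[cell[0]].label if prev[0] != cell[0] else "-"
        b = n2[cell[1]].label if prev[1] != cell[1] else "-"
        cols.append((a, b))
        cell = prev
    return best, cols[::-1]


if __name__ == "__main__":
    # Hypothetical CNEs: in cne1, PBX overlaps both SOX and HOX.
    cne1 = build_graph([Hit(0, 6, "SOX"), Hit(4, 10, "PBX"),
                        Hit(8, 14, "HOX"), Hit(16, 22, "MEIS")])
    cne2 = build_graph([Hit(0, 6, "SOX"), Hit(7, 13, "HOX"), Hit(15, 21, "MEIS")])
    print(align_graphs(cne1, cne2))
    # → (6, [('SOX', 'SOX'), ('HOX', 'HOX'), ('MEIS', 'MEIS')])
```

In this toy example, the best alignment follows the SOX, HOX, MEIS branch of the first CNE and bypasses the overlapping PBX alternative, which is the kind of ambiguity a plain sequence alignment cannot express. Only drawing edges between "adjacent" hits matters here: if a path could silently skip a hit, gaps would cost nothing and the gap penalty would lose its meaning. The quadratic-per-node edge construction is chosen for clarity, not speed. The enrichment step against a background distribution is not covered by this sketch.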
