Introduction:
Enhancers regulate gene expression through direct contact points. Although enhancer databases already exist for various candidate genes, the precise sequences involved in gene:enhancer contact points remain unclear. We have developed “BEES_KNEES”: Bioinformatic Exploration and Evaluation Suite of Known and New Extended Enhancer Sequences, an automated tool that evaluates the location, strength, and likelihood of potential enhancer interactions for a user-specified gene.
Methodology:
Using a scalable, Docker-based automation framework, we developed a custom modular pipeline housing algorithms written in Python and R. Modules already developed include: (i) a metadata module, collecting information required by downstream modules; (ii) a GeneHancer module, ranking all associated enhancers; (iii) a GC content profile module; (iv) a transient transcriptome sequencing (TT-Seq) module, mapping RNA transcriptional activity; (v) a primary sequence conservation module; (vi) a topologically associating domain (TAD) module; and (vii) an epigenetics module, profiling DNA methylation and chromatin accessibility. Modules under development include: (viii) a secondary sequence conservation module; and (ix) a “jury” module, to weigh evidence generated by individual modules. We have also developed the capacity to select and/or limit tissue and cell type, to determine the gene:enhancer pairs common and unique to specific environments.
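As an illustration of the kind of computation a GC content profile module might perform, the following is a minimal sketch of a sliding-window GC fraction. It is not the BEES_KNEES implementation: the function name `gc_profile` and the `window` and `step` parameters are assumptions introduced here for illustration only.

```python
def gc_profile(sequence, window=100, step=50):
    """Return (start, gc_fraction) pairs for sliding windows over a DNA sequence.

    Hypothetical helper: name, window size, and step are illustrative
    assumptions, not taken from the BEES_KNEES pipeline.
    """
    seq = sequence.upper()
    profile = []
    # Always evaluate at least one window, even if the sequence is shorter
    # than the requested window size.
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        if not chunk:
            break  # empty input: nothing to profile
        gc_count = chunk.count("G") + chunk.count("C")
        profile.append((start, gc_count / len(chunk)))
    return profile


if __name__ == "__main__":
    for start, fraction in gc_profile("GGCCAATT", window=4, step=4):
        print(start, fraction)
```

A per-window profile of this kind could be passed to downstream modules alongside the other evidence tracks; the window and step sizes would need tuning to the enhancer lengths of interest.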
Results:
In initial validation studies, BEES_KNEES identified and ranked enhancer regions, providing insights into their regulatory potential. BEES_KNEES is computationally parsimonious, offering immediate feedback on candidate enhancers. Validation using genes with established regulatory frameworks is underway to confirm broader applicability. Additional modules can be added to explore which sequences within enhancers are active and to evaluate collated evidence from each module.
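One simple way collated evidence from individual modules could be combined is a weighted average of per-module scores. The sketch below is purely hypothetical: the “jury” module is still under development, and the function name `jury_score`, the module labels, and the weights are assumptions chosen for illustration, not the method used by BEES_KNEES.

```python
def jury_score(module_scores, weights):
    """Combine per-module scores (assumed to lie in [0, 1]) into one weighted mean.

    Hypothetical sketch: modules without an assigned weight are ignored.
    """
    weighted = [
        (score, weights[name])
        for name, score in module_scores.items()
        if name in weights
    ]
    total_weight = sum(weight for _, weight in weighted)
    if total_weight == 0:
        return 0.0  # no weighted evidence available
    return sum(score * weight for score, weight in weighted) / total_weight


if __name__ == "__main__":
    # Illustrative values only; not results from the pipeline.
    scores = {"gc_content": 0.5, "conservation": 1.0}
    weights = {"gc_content": 1.0, "conservation": 3.0}
    print(jury_score(scores, weights))
```

A weighted mean keeps each module's contribution explicit, which makes it straightforward to adjust weights as validation data accumulate.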
Conclusion:
We have developed an automated computational tool to assist in defining key regulatory sequences for eukaryotic genes. Further work is underway to characterise its performance and potential for wider application.