Poster Presentation: Australasian RNA Biology and Biotechnology Association 2024 Conference

DEVELOPING AUTOMATED COMPUTATIONAL METHODS FOR INVESTIGATING KEY COMPONENTS OF VERTEBRATE ENHANCER SEQUENCES. (#130)

Mia Gruzin (1,2,3), Kavitha Krishna Sudhakar (1,3), Ted Wong (1,3), Gavin Sutton (3,4), Timothy Leong (1,3), Leslie Burnett (1,2,3,5)
  1. Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia
  2. School of Clinical Medicine, UNSW Medicine and Health, St Vincent’s Clinical Healthcare Campus, Darlinghurst, NSW 2010, Australia
  3. Genium Pty Ltd, Potts Point, NSW 2011, Australia
  4. School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Camperdown, NSW 2006, Australia
  5. Northern Clinical School, Faculty of Medicine and Health, University of Sydney, St Leonards, NSW 2065, Australia

Introduction:

Enhancers regulate gene expression through direct contact points with their target genes. Although enhancer databases already exist for many candidate genes, the precise sequences involved in gene:enhancer contact points remain unclear. We have developed “BEES_KNEES” (Bioinformatic Exploration and Evaluation Suite of Known and New Extended Enhancer Sequences), an automated tool that evaluates the location, strength, and likelihood of potential enhancer interactions for a user-specified gene.

Methodology:

Using a scalable, Docker-based automation framework, we developed a custom modular pipeline housing algorithms written in Python and R. Completed modules include: (i) a metadata module, collecting the information required by downstream modules; (ii) a GeneHancer module, ranking all associated enhancers; (iii) a GC content profile module; (iv) a transient transcriptome sequencing (TT-Seq) module, mapping RNA transcriptional activity; (v) a primary sequence conservation module; (vi) a topologically associating domain (TAD) module; and (vii) an epigenetics module, profiling DNA methylation and chromatin accessibility. Modules under development include: (viii) a secondary sequence conservation module; and (ix) a “jury” module, to weigh the evidence generated by the individual modules. We have also developed the capacity to select and/or limit tissue and cell type, to determine which gene:enhancer pairs are common to, or unique to, specific environments.
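As an illustration only, and not the authors' implementation, the following minimal Python sketch shows how a modular pipeline of this kind might pass a shared context through independent modules. It implements simplified versions of two of the listed modules (a sliding-window GC content profile and a check of whether an enhancer shares a TAD with the gene), plus a two-tissue comparison of enhancer sets. The `Enhancer` record, the context keys, the 50 bp window and 10 bp step, the `(chrom, start, end)` TAD format, and the use of enhancer midpoints for TAD assignment are all hypothetical choices.

```python
from dataclasses import dataclass, field


# Hypothetical record for one candidate enhancer of the user-specified gene
# (0-based, half-open coordinates). Not the BEES_KNEES data model.
@dataclass
class Enhancer:
    enhancer_id: str
    chrom: str
    start: int
    end: int
    sequence: str
    tissues: set[str] = field(default_factory=set)
    scores: dict[str, float] = field(default_factory=dict)


def gc_profile(seq: str, window: int = 50, step: int = 10) -> list[float]:
    """Fraction of G/C bases in each sliding window along seq."""
    seq = seq.upper()
    if not seq:
        return []
    window = min(window, len(seq))  # short sequences get a single window
    return [
        (seq[i:i + window].count("G") + seq[i:i + window].count("C")) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


def same_tad(enh, gene_chrom, gene_pos, tads):
    """True if the enhancer midpoint and the gene position share one TAD.

    tads is a list of (chrom, start, end) intervals (hypothetical format).
    """
    if enh.chrom != gene_chrom:
        return False
    mid = (enh.start + enh.end) // 2
    for chrom, start, end in tads:
        if chrom == enh.chrom and start <= mid < end:
            return start <= gene_pos < end
    return False


# Each module reads from and writes to a shared context dict, so modules can
# be added, removed or reordered without changing the others.
def gc_module(ctx):
    for enh in ctx["enhancers"]:
        prof = gc_profile(enh.sequence)
        enh.scores["gc_mean"] = sum(prof) / len(prof) if prof else 0.0


def tad_module(ctx):
    gene_chrom, gene_pos = ctx["gene"]
    for enh in ctx["enhancers"]:
        hit = same_tad(enh, gene_chrom, gene_pos, ctx["tads"])
        enh.scores["same_tad"] = 1.0 if hit else 0.0


def run_pipeline(ctx, modules):
    for module in modules:
        module(ctx)
    return ctx


def tissue_partition(enhancers, tissue_a, tissue_b):
    """Split enhancer IDs into (shared, only in tissue_a, only in tissue_b)."""
    a = {e.enhancer_id for e in enhancers if tissue_a in e.tissues}
    b = {e.enhancer_id for e in enhancers if tissue_b in e.tissues}
    return a & b, a - b, b - a


if __name__ == "__main__":
    enhancers = [
        Enhancer("E1", "chr1", 1000, 1100, "GC" * 50, {"liver", "brain"}),
        Enhancer("E2", "chr1", 5000, 5100, "AT" * 50, {"liver"}),
    ]
    ctx = {
        "gene": ("chr1", 1500),
        "tads": [("chr1", 0, 2000), ("chr1", 2000, 8000)],
        "enhancers": enhancers,
    }
    run_pipeline(ctx, [gc_module, tad_module])
    for enh in enhancers:
        print(enh.enhancer_id, enh.scores)
    print(tissue_partition(enhancers, "liver", "brain"))
```

Keeping each module as a plain function over a shared context is one simple way to get the "add a module without touching the others" property; in a Docker-based deployment each such function might instead run as its own container that reads and writes files.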

Results:

In initial validation studies, BEES_KNEES identified and ranked enhancer regions, providing insights into their regulatory potential. BEES_KNEES is computationally parsimonious, offering immediate feedback on candidate enhancers. Validation using genes with established regulatory frameworks is underway to confirm broader applicability. Additional modules can be added to explore which sequences within enhancers are active and to evaluate collated evidence from each module. 
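The “jury” module is described as still under development, so the following is only a hypothetical Python sketch of one way collated evidence from the individual modules could be combined to rank candidate enhancers: a weighted mean of per-module scores, each assumed to be pre-scaled to [0, 1]. The module names, weights, and scores are invented for illustration and are not BEES_KNEES results.

```python
def jury_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of the module scores available for one enhancer.

    Modules with no score for this enhancer are skipped and the remaining
    weights are renormalised, so missing evidence neither helps nor hurts.
    """
    used = {module: w for module, w in weights.items() if module in scores}
    total = sum(used.values())
    if total == 0:
        return 0.0
    return sum(scores[module] * w for module, w in used.items()) / total


def rank_enhancers(evidence: dict[str, dict[str, float]],
                   weights: dict[str, float]) -> list[tuple[str, float]]:
    """Return (enhancer_id, jury score) pairs, strongest first (ties by ID)."""
    ranked = [(eid, jury_score(s, weights)) for eid, s in evidence.items()]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


if __name__ == "__main__":
    # Hypothetical weights and per-module scores, each scaled to [0, 1].
    weights = {"genehancer": 0.4, "tt_seq": 0.3, "conservation": 0.2, "tad": 0.1}
    evidence = {
        "E1": {"genehancer": 0.9, "tt_seq": 0.6, "conservation": 0.8, "tad": 1.0},
        "E2": {"genehancer": 0.5, "tt_seq": 0.9, "tad": 0.0},
        "E3": {"conservation": 0.2},
    }
    for eid, score in rank_enhancers(evidence, weights):
        print(f"{eid}\t{score:.3f}")
```

Renormalising over the modules that actually returned evidence avoids penalising an enhancer simply because, for example, no TT-Seq data exist for the selected tissue. Whether that is the right behaviour, or whether missing evidence should count against a candidate, is a design decision a real jury module would need to make explicitly.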

Conclusion:

We have developed an automated computational tool to assist in defining key regulatory sequences for eukaryotic genes. Further work is underway to characterise its performance and potential for wider application.