Poster Presentation — ASN Events

Evaluating the reproducibility of m⁶A peak calls across public databases (#172)

Gavin J Sutton¹, Renhua Song¹, Fuyi Li²,³, Qian Liu⁴,⁵, Justin J-L Wong¹

¹ School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia
² College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
³ South Australian immunoGENomics Cancer Institute, The University of Adelaide, Adelaide, South Australia, Australia
⁴ Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, Nevada, USA
⁵ School of Life Sciences, University of Nevada, Las Vegas, Nevada, USA

N6-methyladenosine (m⁶A) is a widely studied modification of messenger RNAs that has been linked to diverse cellular processes and human diseases. Numerous databases have been developed to reprocess and collate m⁶A calls across tissues, cell types, and phenotypes, helping non-expert researchers mine m⁶A in their genes of interest. Here, we evaluate the reproducibility and accuracy of 9 such databases. Whilst recent work has highlighted low reproducibility across experiments within a cell type, we find that even a single experiment, when reprocessed by different databases, produces highly variable results, including a three-fold difference in the number of m⁶A peaks called, with <25% of peaks being reproduced by more than half of the databases. This variability is driven by parameter and algorithm choices across their pipelines. Further, many databases report peaks from less refined m⁶A-enrichment protocols, which may contribute to a higher false-positive rate. Ultimately, to ensure that time and resources are allocated to studying real m⁶A sites, we recommend that users confirm a putative site is reproduced 1) across databases with various processing pipelines, and 2) across studies within each database, with some of those studies using refined m⁶A-enrichment protocols. For the broader bioinformatics community, this study provides clear observational evidence that the same input data will, in the hands of different analysis teams, produce starkly divergent output; it suggests that greater efforts should be made to ensure the reproducibility of our analyses.
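
To make the "reproduced by more than half of the databases" figure concrete, the following is a minimal sketch of one way such a fraction could be computed. It is not the authors' pipeline: the function names (`merge_peaks`, `fraction_reproduced`), the overlap rule (any overlap of half-open intervals merges peaks into one region), and the toy coordinates are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_peaks(peaks_by_db):
    """Merge overlapping peaks from all databases into union regions.

    peaks_by_db maps a (hypothetical) database name to a list of
    (chrom, start, end) tuples, treated as half-open intervals.
    Returns a list of (chrom, start, end, supporting_databases).
    """
    by_chrom = defaultdict(list)
    for db, peaks in peaks_by_db.items():
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end, db))

    merged = []
    for chrom in sorted(by_chrom):
        cur_start = cur_end = None
        cur_dbs = set()
        for start, end, db in sorted(by_chrom[chrom]):
            if cur_start is not None and start < cur_end:
                # Overlaps the current region: extend it and record support.
                cur_end = max(cur_end, end)
                cur_dbs.add(db)
            else:
                # No overlap: close the current region and start a new one.
                if cur_start is not None:
                    merged.append((chrom, cur_start, cur_end, cur_dbs))
                cur_start, cur_end, cur_dbs = start, end, {db}
        if cur_start is not None:
            merged.append((chrom, cur_start, cur_end, cur_dbs))
    return merged


def fraction_reproduced(peaks_by_db):
    """Fraction of merged regions supported by more than half of the databases."""
    merged = merge_peaks(peaks_by_db)
    if not merged:
        return 0.0
    n_db = len(peaks_by_db)
    hits = sum(1 for *_, dbs in merged if len(dbs) > n_db / 2)
    return hits / len(merged)


# Toy data only: three made-up databases reprocessing the same experiment.
toy = {
    "db_a": [("chr1", 100, 200), ("chr1", 500, 600)],
    "db_b": [("chr1", 150, 250), ("chr2", 10, 60)],
    "db_c": [("chr1", 180, 220), ("chr1", 900, 950)],
}
merged_toy = merge_peaks(toy)
toy_fraction = fraction_reproduced(toy)
```

In this toy case only one of the four merged regions is supported by at least two of the three databases. A real comparison would also need to settle choices this sketch ignores, such as strand, a minimum overlap, and peak-width differences between callers.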

Poster Presentation Australasian RNA Biology and Biotechnology Association 2024 Conference
