N6-methyladenosine (m6A) is a widely studied modification of messenger RNA (mRNA) that has been linked to diverse cellular processes and human diseases. Numerous databases have been developed to reprocess and collate m6A calls across tissues, cell types, and phenotypes, enabling non-expert researchers to mine m6A sites in their genes of interest. Here, we evaluate the reproducibility and accuracy of nine such databases. Whilst recent work has highlighted low reproducibility across experiments within a cell type, we find that even a single experiment, when reprocessed by different databases, yields highly variable results. These include a three-fold difference in the number of m6A peaks called, with fewer than 25% of peaks reproduced by more than half of the databases. This variability is driven by differences in the parameters and algorithms used in their processing pipelines. Further, many databases report peaks from less refined m6A-enrichment protocols, which may carry a higher false-positive rate. Ultimately, to ensure that time and resources are allocated to studying real m6A sites, we recommend that users confirm that a putative site is reproduced (1) across databases with different processing pipelines, and (2) across studies within each database, with at least some of those studies using refined m6A-enrichment protocols. More broadly, for the bioinformatics community, this study provides clear observational evidence that the same input data, in the hands of different analysis teams, can produce starkly divergent outputs, and it suggests that greater effort should be made to ensure the reproducibility of our analyses.