Mycoplasma Contamination in Your Sequencing Data


Mycoplasma, the bane of any cell culture lab’s existence, is a genus of bacteria characterized by the lack of a cell wall. With a relatively small genome, mycoplasma have limited biosynthetic capabilities and require a host to replicate efficiently. Inspired by a bout of mycoplasma contamination in their own lab, Anthony O Olarerin-George and John B Hogenesch from the University of Pennsylvania recently set out to determine how widespread mycoplasma contamination is in other labs by screening RNA-seq data deposited in the NCBI Sequence Read Archive (1). Their study estimates that ~11% of NCBI’s Gene Expression Omnibus (GEO) projects between 2012 and 2013 contain at least 100 reads per million reads mapping to mycoplasma’s small 0.6 Mb genome. They also reference a recent study (2) which suggests that 7% of the samples from the 1,000 Genomes project are contaminated. This is bad news if you’ve recently completed a large study and are wondering why you have so many unmapped reads. While most unmapped reads are likely from regions of the genome that haven’t been sequenced, reads mapping to mycoplasma should be taken seriously, as contamination can affect the expression of thousands of genes and slow cellular growth.

Preventing contamination in the first place, along with routine monitoring, is essential. But if you’ve already completed the sequencing end of your project, you can check for contamination by aligning your data to several completed mycoplasma genomes.
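Once your reads are aligned, the check itself is simple arithmetic. The Python sketch below is only an illustration, not the method used in the study: it assumes you already have, for each sample, the number of reads that mapped to mycoplasma and the total number of reads, and it borrows the 100 reads per million cutoff mentioned above. The function names and sample counts are hypothetical.

```python
# Cutoff borrowed from the study discussed above: 100 reads per million.
RPM_THRESHOLD = 100.0


def mycoplasma_rpm(myco_reads: int, total_reads: int) -> float:
    """Return mycoplasma-mapped reads per million total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be a positive integer")
    return myco_reads * 1_000_000 / total_reads


def is_flagged(myco_reads: int, total_reads: int,
               threshold: float = RPM_THRESHOLD) -> bool:
    """True if the sample meets or exceeds the reads-per-million cutoff."""
    return mycoplasma_rpm(myco_reads, total_reads) >= threshold


if __name__ == "__main__":
    # Hypothetical counts: (reads mapped to mycoplasma, total reads).
    samples = {
        "sample_A": (4_200, 30_000_000),
        "sample_B": (1_500, 30_000_000),
    }
    for name, (myco, total) in samples.items():
        rpm = mycoplasma_rpm(myco, total)
        status = "possible contamination" if is_flagged(myco, total) else "below cutoff"
        print(f"{name}: {rpm:.1f} reads per million ({status})")
```

A sample that crosses the cutoff is a reason to look more closely, for example by testing the culture directly, rather than proof of contamination on its own.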

With recent drops in cost, routine sequencing of cell culture samples has become more prevalent. If you’re interested in testing your cultures, start by searching for sequencing services and providers on Genohub.

1) Assessing the prevalence of mycoplasma contamination in cell culture via a survey of NCBI’s RNA-seq archive. Anthony O Olarerin-George, John B Hogenesch doi:

2) Mycoplasma contamination in the 1000 Genomes Project. William B Langdon