Genohub Blog

How Many Replicates are Sufficient for Differential Gene Expression?

May 6, 2015

In a well-executed, convincing two-condition, 48-replicate RNA-Seq experiment, researchers from the University of Dundee set out to answer a question asked frequently in the field and on Genohub.com: ‘How many replicates are necessary for differential gene expression (DGE)?’ In their study they examined three statistical models to see which best described the per-gene read-count distribution that commonly used DGE tools assume.

Using the statistical power of 48 replicates, they determined that inter-lane variability does not play a large role in DGE results. Assuming even loading and amplification, their results showed a Poisson distribution of counts for individual genes. The authors also determined that the read-count distribution across replicates was consistent with a negative binomial model, an assumption in widely used tools such as edgeR, DESeq, cuffdiff and baySeq. Performing goodness-of-fit tests for log-normal, negative binomial and normal distributions, the authors demonstrated that including ‘bad replicates’ made the results inconsistent with the statistical models they tested, complicating the interpretation of differential expression results. A bad replicate was identified by three criteria: 1) poor correlation with other replicates, 2) atypical read counts, 3) non-uniform read depth profiles.
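To make the difference between the Poisson and negative binomial models concrete, here is a minimal simulation sketch in plain Python. It is not the authors' analysis: the mean count of 50, the dispersion of 0.1 and the helper names are all hypothetical values chosen for illustration. It draws counts for one gene under each model and compares the variance-to-mean ratio, which should sit near 1 for Poisson counts and well above 1 for negative binomial counts.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson count with Knuth's multiplication method.

    Adequate for moderate means (exp(-lam) must not underflow).
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_negative_binomial(mean, dispersion, rng):
    """Draw one negative binomial count as a gamma-Poisson mixture.

    The underlying rate varies between replicates (gamma distributed),
    giving variance = mean + dispersion * mean**2.
    """
    shape = 1.0 / dispersion
    scale = mean * dispersion
    rate = rng.gammavariate(shape, scale)
    return sample_poisson(rate, rng)


def variance_to_mean(counts):
    """Ratio close to 1 suggests Poisson; larger values mean overdispersion."""
    return statistics.variance(counts) / statistics.mean(counts)


# Hypothetical settings, chosen only for illustration.
rng = random.Random(0)
mean_count = 50
dispersion = 0.1
n_draws = 5000  # far more than 48, so the ratios are stable

poisson_counts = [sample_poisson(mean_count, rng) for _ in range(n_draws)]
nb_counts = [sample_negative_binomial(mean_count, dispersion, rng)
             for _ in range(n_draws)]

poisson_ratio = variance_to_mean(poisson_counts)
nb_ratio = variance_to_mean(nb_counts)
print(f"Poisson variance/mean:           {poisson_ratio:.2f}")
print(f"Negative binomial variance/mean: {nb_ratio:.2f}")
```

With these assumed settings the negative binomial variance is expected to be about 50 + 0.1 × 50² = 300, roughly six times the mean. That extra spread between replicates is what the dispersion parameter in tools such as edgeR and DESeq is there to capture.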

So how many replicates are sufficient for differential gene expression? The authors sequenced 96 mRNA samples in seven 1×50 HiSeq lanes. The cost for this on Genohub today is $28k USD. When the authors removed 6-8 bad replicates from their pool of 48 samples, their data became consistent with a negative binomial distribution. Assuming experimental variability similar to the authors’, this indicates that using at least 6 replicates in a DGE experiment is good practice. The cost of preparing and sequencing 6 RNA-seq libraries is $2,500 USD.

Based on a literature search and on client behavior on Genohub, we estimate that ~80% of those studying DGE use 3 replicates in their experiments, which, given dropouts and variation, is unsatisfactory. A final point we’d like to make is that the authors used ~11M 1×50 reads per sample, which goes to show that for DGE, replicates can be more important than read depth. This is further discussed in our Coverage and Read Depth Guide.
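For readers who want to sanity-check the numbers quoted in this post, the arithmetic can be sketched in a few lines of Python. Every input comes from the figures above; the variable names are ours, and treating the ~11M reads per sample as exact is a simplifying assumption.

```python
# Figures quoted in the post.
total_samples = 96            # mRNA samples sequenced by the authors
lanes = 7                     # 1x50 HiSeq lanes
reads_per_sample = 11e6       # approximate; treated as exact here
full_cost_usd = 28_000        # quoted cost of the full 96-sample design
six_library_cost_usd = 2_500  # quoted cost of 6 libraries plus sequencing

# Per-sample cost of each design.
cost_per_sample_full = full_cost_usd / total_samples
cost_per_sample_six = six_library_cost_usd / 6

# Implied lane output if all 96 samples share the 7 lanes evenly.
reads_per_lane = total_samples * reads_per_sample / lanes

print(f"Full design:   ${cost_per_sample_full:.2f} per sample")
print(f"Six libraries: ${cost_per_sample_six:.2f} per sample")
print(f"Implied reads per lane: {reads_per_lane / 1e6:.0f}M")
```

The per-sample cost stays in the low hundreds of dollars in both designs, so adding replicates at this modest read depth scales the budget roughly linearly.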

At Genohub, we consult with users to help design their sequencing experiments. How many replicates a study needs is a question we are asked often. Unfortunately, too few studies examine these fundamental elements of sequencing design. We hope this article gets the recognition it deserves.

Reference:

Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment. Marek Gierliński, Christian Cole, Pietà Schofield, Nicholas J. Schurch, Alexander Sherstnev, Vijender Singh, Nicola Wrobel, Karim Gharbi, Gordon Simpson, Tom Owen-Hughes, Mark Blaxter, Geoffrey J. Barton.

2 responses to “How Many Replicates are Sufficient for Differential Gene Expression?”

Alejandro Sanchez-Flores

May 7, 2015 at 11:41 am

The sequencing has never been the big problem and nowadays neither the cost. However, the price behind the sample is not even taken into account here. In the cited paper, they compared a mutant yeast against its wild type. The cost of culture media, the yeast strain or any other material used, can’t be compared to experiments using other organisms.

Suppose that you or your collaborator are working with mice. What would you say if the advice is to perform 6 replicates for 2 conditions? Despite the low cost of sequencing and library prep, the experimental design including 12 mice and maintaining them will be considerable. Now, add up more conditions…

I think the paper is interesting and I understand the need of guidelines for DGE analysis. Probably we can extend the question to “How many biological replicates I need for this model/organism”. Also, “For my organism, what can I consider a biological replicate?”. What about human samples? cancer samples?

Cheers.

How Many Replicates are Sufficient for Differential Gene Expression? | Genohub High Throughput Sequencing Blog | GenomicsNX – NGS Knowledge-Based

November 26, 2015 at 9:51 am

[…] Source: How Many Replicates are Sufficient for Differential Gene Expression? | Genohub High Throughput Seque… […]
