Sequencing Design Part I: Replication, Randomization and Multiplexing

Replicates

Replicates are essential in any biological experiment, and high-throughput sequencing is no exception. Samples are subject to variation, which makes biological replicates important for establishing statistical significance and identifying sources of variation. Despite the temptation to cut back on replicates to reduce cost, remember that many factors can cause a sequencing run or an individual sample to fail. If you don’t have sufficient replicates, you may have to repeat your sequencing run. In general, we recommend at least 4 biological replicates for every experiment.

Randomization

Randomization is the process of assigning biological samples at random to the different groups within an experiment. This reduces bias by balancing variables that have not been accounted for in the experimental design. Randomization reduces instrument effects, systematic bias, and the likelihood and impact of confounding factors (operational, procedural and person-related). The two main sources of variation that contribute to confounding are 1) library effects, which arise during reverse transcription and amplification, and 2) subunit effects, such as poor base calling or bad sequencing cycles, which are specific to a sequencing subunit (lanes [Illumina and SOLiD], chips [Ion], plates [Roche 454]). We recommend randomizing your samples so that each sequencing subunit contains samples from both the control and experimental groups. This can be done by barcoding or indexing your samples to allow for multiplexing.
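As a rough illustration of the recommendation above, the following Python sketch shuffles samples within each group and then deals them across lanes, so that every lane receives samples from every group. The function name, the sample names and the group labels are hypothetical, and the sketch assumes each group has at least as many samples as there are lanes. It is one simple way to randomize, not a prescribed protocol.

```python
import random
from collections import defaultdict


def assign_to_lanes(samples, n_lanes, seed=None):
    """Randomly assign samples to lanes while keeping groups balanced.

    samples: dict mapping sample name -> group label (hypothetical names).
    Returns a list of lanes, each a list of sample names.
    """
    rng = random.Random(seed)

    # Collect the sample names belonging to each group.
    by_group = defaultdict(list)
    for name, group in samples.items():
        by_group[group].append(name)

    lanes = [[] for _ in range(n_lanes)]
    for group in sorted(by_group):
        members = by_group[group]
        rng.shuffle(members)              # random order within the group
        offset = rng.randrange(n_lanes)   # random starting lane for this group
        for i, name in enumerate(members):
            lanes[(offset + i) % n_lanes].append(name)
    return lanes


# Hypothetical experiment: 4 control and 4 treated samples on 2 lanes.
samples = {f"ctrl_{i}": "control" for i in range(1, 5)}
samples.update({f"trt_{i}": "treated" for i in range(1, 5)})

lanes = assign_to_lanes(samples, n_lanes=2, seed=42)
for number, lane in enumerate(lanes, start=1):
    print(f"lane {number}: {sorted(lane)}")
```

Fixing the seed makes the assignment reproducible, which is useful when the layout has to be recorded alongside the experiment.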

Multiplexing

DNA (or cDNA fragments made from RNA) can be labelled with sample-specific sequences, or barcodes, that allow multiple samples to be included in the same sequencing reaction. The barcodes allow each read to be assigned to its sample after the sequencing run is complete. Multiplexing can be used to create balanced, pooled experimental designs. If you have 8 samples that require the sequencing output of 8 Illumina lanes, subunit effects can be eliminated by pooling all 8 barcoded samples and loading the same 8-sample pool into all 8 lanes. All subunit (lane) effects will then be the same for each sample. Higher levels of multiplexing also have the advantage of avoiding the index-read problems associated with low-plex pools. A low-plex pool can produce no signal in one of the color channels during an index read. Image registration may then fail, and no base will be called for that cycle. If a base isn’t called, the samples cannot be demultiplexed.
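The color channel problem can be checked before pooling. The sketch below is a minimal example that assumes a four-channel Illumina instrument on which A and C are read in one channel and G and T in the other. That mapping, the function name and the index sequences are assumptions for illustration, so check the pooling guidelines for your own instrument. The function reports the index-read cycles at which only one channel would carry signal.

```python
# Assumed channel mapping for a four-channel Illumina instrument.
CHANNEL = {"A": "red", "C": "red", "G": "green", "T": "green"}


def unbalanced_cycles(indexes):
    """Return the 1-based index-read cycles where one color channel is silent.

    indexes: list of index (barcode) sequences of equal length.
    """
    if len({len(ix) for ix in indexes}) != 1:
        raise ValueError("all index sequences must have the same length")

    bad = []
    # zip(*indexes) yields the bases seen across the pool at each cycle.
    for cycle, bases in enumerate(zip(*indexes), start=1):
        channels = {CHANNEL[base] for base in bases}
        if len(channels) < 2:   # only red or only green at this cycle
            bad.append(cycle)
    return bad


# Hypothetical index sequences, not taken from any real kit.
low_plex = ["ACACAC", "CACACA"]   # every cycle is A or C
balanced = ["ACGTAC", "GTACGT"]   # every cycle mixes both channels

print(unbalanced_cycles(low_plex))
print(unbalanced_cycles(balanced))
```

An empty result means every cycle of the index read has signal in both channels for the chosen pool.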

To conclude, the best way to ensure reproducibility is to include independent biological replicates that are randomly assigned to sequencing subunits (flow cell lanes, chips or plates). This can be done by multiplexing your samples using sample indices or barcodes. Multiplexing by adding a barcode during the ligation step of library prep removes two confounding factors in sequencing: 1) library (amplification) effects and 2) subunit effects.
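The balanced, pooled design described above can be laid out with a few lines of Python. This is only a sketch: the function names and the reads-per-lane figure are placeholders, and it assumes the samples are pooled in equal amounts and that every barcode performs equally well. It shows that loading the same 8-sample pool into all 8 lanes exposes every sample to every lane while still giving each sample one lane's worth of reads in total.

```python
def balanced_pool_design(samples, n_lanes, reads_per_lane):
    """Load the same equimolar pool into every lane.

    Returns {lane number: {sample: expected reads in that lane}}.
    """
    share = reads_per_lane / len(samples)   # equal share of each lane
    return {
        lane: {sample: share for sample in samples}
        for lane in range(1, n_lanes + 1)
    }


def reads_per_sample(design):
    """Sum the expected reads for each sample over all lanes."""
    totals = {}
    for lane in design.values():
        for sample, reads in lane.items():
            totals[sample] = totals.get(sample, 0) + reads
    return totals


# Hypothetical numbers: 8 samples, 8 lanes, placeholder yield per lane.
samples = [f"sample_{i}" for i in range(1, 9)]
design = balanced_pool_design(samples, n_lanes=8, reads_per_lane=160_000_000)

for sample, total in reads_per_sample(design).items():
    print(sample, int(total))
```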

If you’re new to high-throughput sequencing and have questions about how you should design your sequencing run, email us to take advantage of our free consultation. At Genohub, we’re always happy to discuss your sequencing project, regardless of whether you use our service.
