Single Cell Experiments: Spike-in
I shared this on my LinkedIn page first. There are many protocols and guidelines on sequencing, but if you are not the one preparing the samples and setting up the sequencing machine, you might not know what a spike-in is. No worries, I am here to explain. I had to learn these details (including troubleshooting the sequencing itself, and I am still learning) because my PhD experience ranges from molecular assay development and low-bulk optimization to sample preparation for sequencing (including maintaining an Illumina MiSeq instrument).
Spike-ins are external nucleic acids. Depending on the purpose, they can be RNA- or DNA-based. They are used mainly for normalization, quantification, and sequencing calibration/control.
Let’s learn the details of the commonly used spike-ins in transcriptomics sequencing: ERCC and PhiX.
ERCC
Since the expression of internal controls such as housekeeping genes can vary and primer affinities can differ, researchers came up with the idea of external controls. The ERCC spike-in, named after the External RNA Controls Consortium (ERCC) that developed it, is a set of synthetic RNA transcripts with known sequences (with minimal sequence homology to the endogenous transcripts of most popular organisms), known strand orientation, and known concentrations. It is added to samples as a reference while they are still RNA (not yet converted to cDNA, etc.) to track technical variation. The transcripts are poly-A-tailed so that they are captured just like mRNA.
Where to add ERCC spike-in
It is particularly useful for measuring sensitivity (e.g., for low-abundance transcripts), correcting for technical variability across samples and platforms due to initial sample input or sequencing depth, checking the linearity of amplification and the strandedness accuracy of alignments, and absolute quantification. It can be used not only in Next Generation Sequencing (NGS) but also in some microarray and RT-PCR assays.
You can use custom-designed RNA spike-ins as an alternative.
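To make the linearity and absolute quantification uses more concrete, here is a minimal sketch of a dose-response fit: the log2 of the observed spike-in read counts is regressed on the log2 of the known input concentrations. A slope near 1 and a high R² suggest a linear response, and the fitted line can then be inverted to estimate the concentration of an endogenous transcript. The function names, the pseudocount, and all numbers below are my own assumptions for illustration, not values from any kit manual.

```python
import math


def fit_dose_response(known_conc, observed_counts, pseudocount=1.0):
    """Least-squares fit of log2(count + pseudocount) on log2(concentration).

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log2(c) for c in known_conc]
    ys = [math.log2(n + pseudocount) for n in observed_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def estimate_concentration(count, slope, intercept, pseudocount=1.0):
    """Invert the fitted line to estimate a concentration from a read count."""
    return 2 ** ((math.log2(count + pseudocount) - intercept) / slope)


# Hypothetical spike-in data: known input concentrations (arbitrary units)
# and the read counts observed for the same spike-in transcripts.
known_conc = [0.5, 2.0, 8.0, 32.0, 128.0]
observed_counts = [4, 22, 75, 330, 1250]

slope, intercept, r_squared = fit_dose_response(known_conc, observed_counts)
print(f"slope={slope:.2f}, R^2={r_squared:.3f}")
print(f"estimated concentration for 500 reads: "
      f"{estimate_concentration(500, slope, intercept):.1f}")
```

In practice you would also account for transcript length and fit only the spike-ins above the detection limit; this sketch leaves both out for brevity.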
Do we always need ERCC as a control? Not exactly. In most differential gene expression analyses of RNA-seq data, we rely on relative gene abundance for the comparison and normalize for sequencing depth across samples. If you need absolute gene quantification, then yes. Moreover, you need to consider that the ERCC spike-in itself can introduce bias (e.g., due to differences in manual addition, low diversity, an additional layer of complexity, and the need for calibration across varying abundances).
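The difference between depth-based and spike-in-based normalization can be sketched with a toy example. Below, counts per million (CPM) rescales each sample by its own total, while the spike-in size factor rescales by the reads assigned to the spike-ins. I assume spike-in IDs start with the prefix "ERCC-" (as in the standard ERCC naming, e.g., ERCC-00002) and that the same amount of spike-in was added to every sample; the gene names and counts are made up.

```python
def split_spike_ins(counts, prefix="ERCC-"):
    """Separate endogenous gene counts from spike-in counts by ID prefix."""
    endogenous = {g: c for g, c in counts.items() if not g.startswith(prefix)}
    spike_ins = {g: c for g, c in counts.items() if g.startswith(prefix)}
    return endogenous, spike_ins


def counts_per_million(gene_counts):
    """Depth normalization: scale counts so that each sample sums to 1e6."""
    total = sum(gene_counts.values())
    return {g: c * 1e6 / total for g, c in gene_counts.items()}


def spike_in_size_factors(samples, prefix="ERCC-"):
    """Per-sample factor: spike-in reads relative to the mean across samples."""
    totals = {
        name: sum(split_spike_ins(counts, prefix)[1].values())
        for name, counts in samples.items()
    }
    mean_total = sum(totals.values()) / len(totals)
    return {name: total / mean_total for name, total in totals.items()}


# Hypothetical counts: sample_B has twice as much endogenous RNA per unit
# of spike-in, but the same relative composition as sample_A.
samples = {
    "sample_A": {"GeneX": 900, "GeneY": 100, "ERCC-00002": 100},
    "sample_B": {"GeneX": 1800, "GeneY": 200, "ERCC-00002": 100},
}

factors = spike_in_size_factors(samples)
for name, counts in samples.items():
    endogenous, _ = split_spike_ins(counts)
    cpm = counts_per_million(endogenous)
    scaled = {g: c / factors[name] for g, c in endogenous.items()}
    print(name, "CPM:", cpm, "spike-in scaled:", scaled)
```

With these toy numbers, CPM reports the two samples as identical, whereas the spike-in scaling keeps the global difference visible. That is the trade-off: the spike-in approach only helps if the spike-in was added consistently.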
PhiX
PhiX is a bacteriophage with a single-stranded DNA genome. It has a balanced base composition (~45% G/C and ~55% A/T). It is used to check the overall quality of the Illumina sequencing run itself (it has nothing to do with sample preparation or library normalization).
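To show what "balanced" means from the sequencer's point of view, here is a small sketch that tallies the fraction of each base at every cycle (read position) across a set of reads. A diverse library such as PhiX shows a mix of A, C, G, and T at each cycle, while a library in which every read starts with the same sequence shows one base dominating. The function name and the toy reads are my own, for illustration only.

```python
from collections import Counter


def per_cycle_base_fractions(reads):
    """Return one dict per cycle with the fraction of A, C, G, and T.

    Only the cycles covered by every read are reported; bases other than
    A/C/G/T (e.g., N) are ignored in the denominator.
    """
    n_cycles = min(len(read) for read in reads)
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle].upper() for read in reads)
        total = sum(counts[base] for base in "ACGT")
        fractions.append(
            {base: (counts[base] / total if total else 0.0) for base in "ACGT"}
        )
    return fractions


# Toy reads: the first three cycles are identical in every read
# (low diversity), the remaining cycles are mixed.
toy_reads = ["TTTACG", "TTTCGA", "TTTGAC", "TTTTCA"]
for cycle, fracs in enumerate(per_cycle_base_fractions(toy_reads), start=1):
    print(cycle, fracs)
```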
Where to add PhiX spike-in
Sometimes, the library you are preparing has low diversity. For example, I was trying out total RNA sequencing in various formats, one of which used a miRNA from another species (e.g., cel-mir-37) with an external poly-A extension. If you try to sequence a library built from a small number of distinct cDNAs, or one with an imbalanced base composition (A/T/C/G) due to a unique library preparation, Illumina runs can fail. You might need to add a higher proportion of PhiX, up to 50%, to increase the diversity and balance the nucleotide ratio. Even if you have enough diversity in the library, you might still want to monitor the quality of sequencing and base calling (e.g., Q30 scores) and estimate some sequencing by synthesis (SBS) metrics; in this case, as little as 1% would be enough.
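The spike-in percentages above translate into simple mixing arithmetic. As a rough sketch, assuming the library and the PhiX control are both quantified in nM and that the spike-in percentage is a molar fraction of the final pool, the PhiX volume to add can be computed as below. This is my own back-of-the-envelope helper, not Illumina's official dilution protocol, and the percentage of reads that actually align to PhiX can differ from the molar fraction you pipette.

```python
def phix_volume_for_fraction(lib_conc_nm, lib_volume_ul, phix_conc_nm,
                             target_fraction):
    """Volume of PhiX (in µL) so that PhiX makes up target_fraction of the pool.

    Solves n_phix / (n_phix + n_lib) = target_fraction, where n = conc * volume.
    """
    if not 0 <= target_fraction < 1:
        raise ValueError("target_fraction must be in [0, 1)")
    lib_amount = lib_conc_nm * lib_volume_ul
    phix_amount = target_fraction * lib_amount / (1 - target_fraction)
    return phix_amount / phix_conc_nm


# Hypothetical example: 10 µL of a 4 nM library and a 4 nM PhiX stock.
for fraction in (0.01, 0.20, 0.50):
    volume = phix_volume_for_fraction(4.0, 10.0, 4.0, fraction)
    print(f"{fraction:.0%} PhiX -> add {volume:.2f} µL")
```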
Fun fact: PhiX was the first DNA-based genome to be sequenced, by Fred Sanger's group in 1977.
In summary:
Further Reading/ References:
- ERCC, https://www.nist.gov/programs-projects/external-rna-controls-consortium and the initial release: https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-6-150
- ERCC Thermo Fisher detailed manual, https://assets.thermofisher.com/TFS-Assets/LSG/manuals/cms_086340.pdf , and Lexogen detailed sequencing manual with spike-in details: https://www.lexogen.com/wp-content/uploads/2020/01/113UG227V0100_QuantSeq-FWD_UDI_Kits_2019-12-27.pdf
- Where and how to use ERCC (briefly), https://genomics.rwth-aachen.de/site/ercc/, https://pmc.ncbi.nlm.nih.gov/articles/PMC4760223/#sec7
- PhiX, https://en.wikipedia.org/wiki/Phi_X_174, and more about its sequencing applications: https://knowledge.illumina.com/library-preparation/general/library-preparation-general-reference_material-list/000001545, https://www.illumina.com/products/by-type/sequencing-kits/cluster-gen-sequencing-reagents/phix-control-v3.html
- Where to be careful about PhiX, https://environmentalmicrobiome.biomedcentral.com/articles/10.1186/1944-3277-10-18#:~:text=Due%20to%20its%20small%2C%20well,40%25%20for%20low%20diversity%20samples.
- Bonus: Unique Molecular Identifiers (UMIs) as spike-ins: https://www.nature.com/articles/s41592-022-01446-x