ITS Sequencing Pipeline Comparison: Paired-End vs Forward-Only
Overview
Following the initial analysis of A Preliminary Look at Fungal Communities in Necrophagous Insects, I reprocessed the ITS1 sequencing data using forward reads only instead of paired-end merging. This comparison evaluates which approach is more appropriate for this dataset.
Why This Comparison?
ITS (Internal Transcribed Spacer) sequencing presents unique challenges for paired-end approaches:
- Variable ITS1 length: Ranges from ~50-500 bp, making overlap unpredictable
- Primer-to-primer distance: ITS1F to ITS2 can exceed 600 bp in some species
- Poor overlap quality: When reads don’t fully overlap, merging introduces errors
The original analysis used:
- Paired-end ITS1F/ITS2 sequencing (2×300 bp)
- Cutadapt primer trimming
- DADA2 paired-end merging
- ITSxpress attempted but failed
This reprocessing tests:
- Forward reads only (single-end)
- Same Cutadapt primer trimming
- DADA2 single-end denoising
- ITSxpress successful ITS1 extraction
Key Findings:
- Forward-only processing retains 90-96% of reads vs 50-75% for paired-end
- Forward-only approach detected 1,640 ASVs vs 397 for paired-end
- ITSxpress successfully extracted ITS1 regions from forward reads (failed on paired-end)
- Higher read retention may improve statistical power for downstream analyses
Table of Contents
Forward-Only Processing Quality
Before comparing the two approaches, let’s examine the quality metrics for the forward-only processing across both sequencing runs.
Figure 1: Read retention through forward-only pipeline stages. Each line represents one sample, colored by sequencing run. Stages: (1) Input, (2) Filtered, (3) Denoised, (4) Non-chimeric.
Figure 2: Distribution of final retention rates across samples. Both runs show consistently high retention (>90%).
Figure 4: Read count distribution at each processing stage (box plots with individual samples as dots). Y-axis labels show typical sequence lengths at each stage. Small losses from demux → primer trim → ITSxpress, with final variable ITS1 lengths of 100-350 bp.
Processing Pipeline Comparison
Original Approach (Paired-End)
Click to view paired-end workflow
# 1. Import paired-end sequences
qiime tools import \
--type EMPPairedEndSequences \
--input-path Run1 \
--output-path emp_paired_end_run1.qza
# 2. Demultiplex
qiime demux emp-paired \
--i-seqs emp_paired_end_run1.qza \
--m-barcodes-file demux.tsv \
--o-per-sample-sequences demux_paired_run1.qza
# 3. Trim primers (both F and R reads)
qiime cutadapt trim-paired \
--i-demultiplexed-sequences demux_paired_run1.qza \
--p-front-f CTTGGTCATTTAGAGGAAGTAA \
--p-front-r GCTGCGTTCTTCATCGATGC \
--p-adapter-f GCATCGATGAAGAACGCAGC \
--p-adapter-r TTACTTCCTCTAAATGACCAAG \
--o-trimmed-sequences trimmed_seqs_run1.qza
# 4. ITSxpress (FAILED - removed too much sequence)
# 5. DADA2 paired-end denoising WITH MERGING
qiime dada2 denoise-paired \
--i-demultiplexed-seqs trimmed_seqs_run1.qza \
--p-trunc-len-f 0 \
--p-trunc-len-r 0 \
--p-max-ee-f 2 \
--p-max-ee-r 2 \
--p-min-overlap 12 \
--p-chimera-method consensus \
--o-table table_run1.qza
Critical step: denoise-paired merges forward and reverse reads, which causes substantial read loss when overlap is poor.
New Approach (Forward-Only)
Click to view forward-only workflow
# 1. Import single-end sequences (forward reads only)
qiime tools import \
--type EMPSingleEndSequences \
--input-path Run1_forward \
--output-path emp_single_fwd_run1.qza
# 2. Demultiplex
qiime demux emp-single \
--i-seqs emp_single_fwd_run1.qza \
--m-barcodes-file demux.tsv \
--o-per-sample-sequences demux_single_fwd_run1.qza
# 3. Trim primers (forward read + adapter)
qiime cutadapt trim-single \
--i-demultiplexed-sequences demux_single_fwd_run1.qza \
--p-front CTTGGTCATTTAGAGGAAGTAA \
--p-adapter GCATCGATGAAGAACGCAGC \
--o-trimmed-sequences trimmed_fwd_run1.qza
# 4. ITSxpress ITS1 extraction (SUCCESSFUL)
qiime q2-itsxpress trim-single \
--i-per-sample-sequences trimmed_fwd_run1.qza \
--p-region ITS1 \
--p-taxa ALL \
--o-trimmed its1_fwd_run1.qza
# 5. DADA2 single-end denoising (NO MERGING)
qiime dada2 denoise-single \
--i-demultiplexed-seqs its1_fwd_run1.qza \
--p-trunc-len 0 \
--p-max-ee 2 \
--p-chimera-method consensus \
--o-table table_fwd_run1.qza
Critical difference: denoise-single processes forward reads only, avoiding the merging bottleneck entirely.
Read Retention Analysis
Per-Sample Read Retention
The most dramatic difference between approaches is read retention through the DADA2 pipeline.
Figure 5: Read retention comparison for Run 1. Each point represents one sample. Forward-only approach (blue) retains 90-96% of reads, while paired-end (orange) retains only 50-75% due to merging losses.
Example: Sample insect_100
| Step | Paired-End (Run1) | Forward-Only (Run1) |
|---|---|---|
| Input | 4,896 | 6,185 |
| Filtered | 4,320 (88.2%) | 6,053 (97.9%) |
| Denoised | 4,232 (86.4%) | 5,943 (96.1%) |
| Merged | 3,554 (72.6%) | N/A |
| Non-chimeric | 3,554 (72.6%) | 5,943 (96.1%) |
ITSxpress Performance
Paired-End Attempt (Failed)
The original analysis attempted ITSxpress on paired-end reads after primer trimming:
qiime q2-itsxpress trim-pair-output-unmerged \
--i-per-sample-sequences trimmed_seqs_run1.qza \
--p-region ITS1 \
--p-taxa ALL \
--o-trimmed its1_seqs_run1.qza
Result: Extracted sequences were too short (<50 bp in most cases), making them unusable for DADA2 and taxonomy assignment.
Likely cause: ITSxpress’s HMM models require well-aligned sequences. When forward and reverse reads don’t overlap well, ITSxpress may misidentify ITS1 boundaries.
Forward-Only Attempt (Success)
Using forward reads only, ITSxpress successfully extracted ITS1 regions:
qiime q2-itsxpress trim-single \
--i-per-sample-sequences trimmed_fwd_run1.qza \
--p-region ITS1 \
--p-taxa ALL \
--o-trimmed its1_fwd_run1.qza
Sequence length distribution after ITSxpress:
Figure 23: ITS1 sequence length distribution after ITSxpress extraction (forward-only). Lengths range from 100-350 bp, consistent with expected ITS1 variability.
ASV Detection and Diversity
Total ASV Counts
The higher read retention translates directly to increased ASV detection:
| Approach | Total ASVs | Fungal ASVs | Mean ASVs per Sample |
|---|---|---|---|
| Paired-End | 397 | ~350 | 45.2 ± 18.6 |
| Forward-Only | 1,640 | 1,640 (all fungi) | 74.7 ± 50.2 |
4.1× increase in total ASV detection with forward-only processing.
Alpha Diversity Comparison
Figure 7a: Alpha diversity metrics for paired-end approach (2x2 grid). Shannon diversity, observed richness, Pielou's evenness, and Chao1 richness.
Figure 7b: Alpha diversity metrics for forward-only approach (2x2 grid). Forward-only approach shows higher richness and diversity across all metrics.
Beta Diversity Comparison
Figure 8: Bray-Curtis PCoA ordination comparing approaches. Left: Paired-end. Right: Forward-only. Both show clear separation between beetle and fly samples, with similar clustering patterns.
PERMANOVA Results:
| Approach | Factor | Distance | R² | p-value |
|---|---|---|---|---|
| Paired-End | Insect Species | Bray-Curtis | 0.122 | < 0.001* |
| Forward-Only | Insect Species | Bray-Curtis | 0.107 | < 0.001* |
| Paired-End | Insect Species | Jaccard | 0.088 | < 0.001* |
| Forward-Only | Insect Species | Jaccard | 0.060 | < 0.001* |
Both approaches detect significant differences between insect species, with similar effect sizes.
Differential Abundance Analysis
Both approaches identified ASVs that are significantly more abundant in one insect species compared to the other. The volcano plots below show the results of DESeq2 differential abundance testing:
Figure 9: Differential abundance comparison (volcano plots). Left: Paired-end. Right: Forward-only. Each point represents an ASV, colored by significance (padj < 0.05). Positive log₂ fold change indicates higher abundance in C. macellaria (flies), negative indicates higher in N. surinamensis (beetles). Horizontal dashed line shows significance threshold (padj = 0.05).
Key observations:
- Forward-only detected many more differentially abundant ASVs due to higher overall ASV counts
- Both approaches identify the same major patterns (e.g., Cladosporium enriched in flies, Phallus in beetles)
- Forward-only provides greater power to detect subtle differences in rare/low-abundance taxa
- The core biological signal is preserved across both approaches
Figure 10: Differential abundance comparison excluding Cladosporium and Phallus to better visualize remaining taxa. Left: Paired-end. Right: Forward-only.
Taxonomy Comparison
Genus-Level Relative Abundance
Do the two approaches identify similar dominant genera despite different ASV counts?
Figure 11: Top 15 most abundant genera in each approach. Bar height shows mean relative abundance across all samples. Despite 4× difference in ASV counts, both approaches identify similar dominant taxa.
Sample-Level Genus Composition
Figure 14: Genus-level relative abundance for beetle abdomen samples. Left: Paired-end approach. Right: Forward-only approach. Both show similar composition dominated by Rhodotorula, Hasegawaea, and Basidiobolus.
Figure 15: Genus-level relative abundance for beetle head-thorax samples. Left: Paired-end approach. Right: Forward-only approach. Similar patterns with dominant Rhodotorula and Symmetrospora.
Figure 16: Genus-level relative abundance for fly samples. Left: Paired-end approach. Right: Forward-only approach. Consistent Cladosporium dominance across both approaches.
Mean Relative Abundance by Insect Species and Body Part
Figure 17: Mean relative abundance of top 10 genera (stacked bars) grouped by insect species and body part. Left: Paired-end approach. Right: Forward-only approach. Bars show only the top 10 most abundant genera; missing portions represent remaining genera not shown.
Heatmap: Top 30 Genera Across Samples
Figure 18: Heatmap showing log₁₀-transformed genus-level relative abundance (all ASVs aggregated per genus) for top 30 genera across all samples. Top row: Paired-end approach for beetles (left) and flies (right). Bottom row: Forward-only approach for beetles (left) and flies (right). Darker red indicates higher abundance. Each panel shows only genera detected in that dataset. Bold genus names indicate genera detected by both approaches.
Decomposer-Associated Fungi Prevalence
Figure 19: Prevalence of decomposer-associated fungal genera. Left: Beetles (N. surinamensis). Right: Flies (C. macellaria). Both approaches detect similar prevalence patterns, with beetles showing high prevalence of Clathrus, Phallus, and Rhodotorula; flies dominated by Cladosporium.
Prevalence of decomposer-associated fungi in beetles (N. surinamensis):
| Genus | Paired-End | Forward-Only | Change |
|---|---|---|---|
| Clathrus | 31% | 97% | +66% |
| Phallus | 35% | 95% | +60% |
| Cladosporium | 85% | 85% | 0% |
| Rhodotorula | 42% | 69% | +27% |
| Hasegawaea | 23% | 64% | +41% |
Forward-only processing detects substantially higher prevalence of stinkhorn fungi (Phallus, Clathrus) and yeasts (Rhodotorula, Hasegawaea) in beetles, while maintaining similar detection rates for ubiquitous molds like Cladosporium.
The stacked bar charts below show relative abundances of curated decomposer-associated fungal genera across samples, comparing both processing approaches:
Figure 20: Relative abundance of curated fungal genera in beetle abdomen samples. Left: Paired-end approach. Right: Forward-only approach.
Figure 21: Relative abundance of curated fungal genera in beetle head-thorax samples. Left: Paired-end approach. Right: Forward-only approach.
Figure 22: Relative abundance of curated fungal genera in C. macellaria (fly) samples. Top: Paired-end approach. Bottom: Forward-only approach.
Recommendations
When to Use Forward-Only ITS Processing
Based on this comparison, forward-only processing is preferable when:
- Poor read overlap: ITS1F to ITS2 distance exceeds 600 bp in target organisms
- Maximizing read retention: Studies with low biomass or limited sequencing depth
- ITSxpress extraction: HMM-based ITS region extraction works better on single-end reads
- Rare taxa detection: Higher ASV counts improve detection of low-abundance species
When to Use Paired-End ITS Processing
Paired-end merging may still be beneficial when:
- Short ITS1 regions: Target organisms have ITS1 <400 bp (good overlap expected)
- Error correction priority: Merged paired reads have lower error rates than single reads
- Reference database constraints: Databases contain merged ITS1-5.8S-ITS2 sequences