ITS Sequencing Pipeline Comparison: Paired-End vs Forward-Only

Overview

Following the initial analysis of A Preliminary Look at Fungal Communities in Necrophagous Insects, I reprocessed the ITS1 sequencing data using forward reads only instead of paired-end merging. This comparison evaluates which approach is more appropriate for this dataset.

Why This Comparison?

ITS (Internal Transcribed Spacer) sequencing presents unique challenges for paired-end approaches:

Variable ITS1 length: Ranges from ~50-500 bp, making overlap unpredictable
Primer-to-primer distance: ITS1F to ITS2 can exceed 600 bp in some species
Poor overlap quality: When reads don’t fully overlap, merging introduces errors

The original analysis used:

Paired-end ITS1F/ITS2 sequencing (2×300 bp)
Cutadapt primer trimming
DADA2 paired-end merging
ITSxpress attempted but failed

This reprocessing tests:

Forward reads only (single-end)
Same Cutadapt primer trimming
DADA2 single-end denoising
ITSxpress successful ITS1 extraction

Key Findings:

Forward-only processing retains 90-96% of reads vs 50-75% for paired-end
Forward-only approach detected 1,640 ASVs vs 397 for paired-end
ITSxpress successfully extracted ITS1 regions from forward reads (failed on paired-end)
Higher read retention may improve statistical power for downstream analyses

Forward-Only Quality Pipeline Comparison Read Retention ASV Detection & Diversity Taxonomy ITSxpress Recommendations

Forward-Only Processing Quality

Before comparing the two approaches, let’s examine the quality metrics for the forward-only processing across both sequencing runs.

Figure 1: Read retention through forward-only pipeline stages. Each line represents one sample, colored by sequencing run. Stages: (1) Input, (2) Filtered, (3) Denoised, (4) Non-chimeric.

Figure 2: Distribution of final retention rates across samples. Both runs show consistently high retention (>90%).

Figure 4: Read count distribution at each processing stage (box plots with individual samples as dots). Y-axis labels show typical sequence lengths at each stage. Small losses from demux → primer trim → ITSxpress, with final variable ITS1 lengths of 100-350 bp.

Processing Pipeline Comparison

Original Approach (Paired-End)

Click to view paired-end workflow

# 1. Import paired-end sequences
qiime tools import \
  --type EMPPairedEndSequences \
  --input-path Run1 \
  --output-path emp_paired_end_run1.qza

# 2. Demultiplex
qiime demux emp-paired \
  --i-seqs emp_paired_end_run1.qza \
  --m-barcodes-file demux.tsv \
  --o-per-sample-sequences demux_paired_run1.qza

# 3. Trim primers (both F and R reads)
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux_paired_run1.qza \
  --p-front-f CTTGGTCATTTAGAGGAAGTAA \
  --p-front-r GCTGCGTTCTTCATCGATGC \
  --p-adapter-f GCATCGATGAAGAACGCAGC \
  --p-adapter-r TTACTTCCTCTAAATGACCAAG \
  --o-trimmed-sequences trimmed_seqs_run1.qza

# 4. ITSxpress (FAILED - removed too much sequence)

# 5. DADA2 paired-end denoising WITH MERGING
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed_seqs_run1.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-max-ee-f 2 \
  --p-max-ee-r 2 \
  --p-min-overlap 12 \
  --p-chimera-method consensus \
  --o-table table_run1.qza

Critical step: denoise-paired merges forward and reverse reads, which causes substantial read loss when overlap is poor.

New Approach (Forward-Only)

Click to view forward-only workflow

# 1. Import single-end sequences (forward reads only)
qiime tools import \
  --type EMPSingleEndSequences \
  --input-path Run1_forward \
  --output-path emp_single_fwd_run1.qza

# 2. Demultiplex
qiime demux emp-single \
  --i-seqs emp_single_fwd_run1.qza \
  --m-barcodes-file demux.tsv \
  --o-per-sample-sequences demux_single_fwd_run1.qza

# 3. Trim primers (forward read + adapter)
qiime cutadapt trim-single \
  --i-demultiplexed-sequences demux_single_fwd_run1.qza \
  --p-front CTTGGTCATTTAGAGGAAGTAA \
  --p-adapter GCATCGATGAAGAACGCAGC \
  --o-trimmed-sequences trimmed_fwd_run1.qza

# 4. ITSxpress ITS1 extraction (SUCCESSFUL)
qiime q2-itsxpress trim-single \
  --i-per-sample-sequences trimmed_fwd_run1.qza \
  --p-region ITS1 \
  --p-taxa ALL \
  --o-trimmed its1_fwd_run1.qza

# 5. DADA2 single-end denoising (NO MERGING)
qiime dada2 denoise-single \
  --i-demultiplexed-seqs its1_fwd_run1.qza \
  --p-trunc-len 0 \
  --p-max-ee 2 \
  --p-chimera-method consensus \
  --o-table table_fwd_run1.qza

Critical difference: denoise-single processes forward reads only, avoiding the merging bottleneck entirely.

Read Retention Analysis

Per-Sample Read Retention

The most dramatic difference between approaches is read retention through the DADA2 pipeline.

Figure 5: Read retention comparison for Run 1. Each point represents one sample. Forward-only approach (blue) retains 90-96% of reads, while paired-end (orange) retains only 50-75% due to merging losses.

Example: Sample `insect_100`

Step	Paired-End (Run1)	Forward-Only (Run1)
Input	4,896	6,185
Filtered	4,320 (88.2%)	6,053 (97.9%)
Denoised	4,232 (86.4%)	5,943 (96.1%)
Merged	3,554 (72.6%)	N/A
Non-chimeric	3,554 (72.6%)	5,943 (96.1%)

ITSxpress Performance

Paired-End Attempt (Failed)

The original analysis attempted ITSxpress on paired-end reads after primer trimming:

qiime q2-itsxpress trim-pair-output-unmerged \
  --i-per-sample-sequences trimmed_seqs_run1.qza \
  --p-region ITS1 \
  --p-taxa ALL \
  --o-trimmed its1_seqs_run1.qza

Result: Extracted sequences were too short (<50 bp in most cases), making them unusable for DADA2 and taxonomy assignment.

Likely cause: ITSxpress’s HMM models require well-aligned sequences. When forward and reverse reads don’t overlap well, ITSxpress may misidentify ITS1 boundaries.

Forward-Only Attempt (Success)

Using forward reads only, ITSxpress successfully extracted ITS1 regions:

qiime q2-itsxpress trim-single \
  --i-per-sample-sequences trimmed_fwd_run1.qza \
  --p-region ITS1 \
  --p-taxa ALL \
  --o-trimmed its1_fwd_run1.qza

Sequence length distribution after ITSxpress:

Figure 23: ITS1 sequence length distribution after ITSxpress extraction (forward-only). Lengths range from 100-350 bp, consistent with expected ITS1 variability.

ASV Detection and Diversity

Total ASV Counts

The higher read retention translates directly to increased ASV detection:

Approach	Total ASVs	Fungal ASVs	Mean ASVs per Sample
Paired-End	397	~350	45.2 ± 18.6
Forward-Only	1,640	1,640 (all fungi)	74.7 ± 50.2

4.1× increase in total ASV detection with forward-only processing.

Alpha Diversity Comparison

Figure 7a: Alpha diversity metrics for paired-end approach (2x2 grid). Shannon diversity, observed richness, Pielou's evenness, and Chao1 richness.

Figure 7b: Alpha diversity metrics for forward-only approach (2x2 grid). Forward-only approach shows higher richness and diversity across all metrics.

Beta Diversity Comparison

Figure 8: Bray-Curtis PCoA ordination comparing approaches. Left: Paired-end. Right: Forward-only. Both show clear separation between beetle and fly samples, with similar clustering patterns.

PERMANOVA Results:

Approach	Factor	Distance	R²	p-value
Paired-End	Insect Species	Bray-Curtis	0.122	< 0.001*
Forward-Only	Insect Species	Bray-Curtis	0.107	< 0.001*
Paired-End	Insect Species	Jaccard	0.088	< 0.001*
Forward-Only	Insect Species	Jaccard	0.060	< 0.001*

Both approaches detect significant differences between insect species, with similar effect sizes.

Differential Abundance Analysis

Both approaches identified ASVs that are significantly more abundant in one insect species compared to the other. The volcano plots below show the results of DESeq2 differential abundance testing:

Figure 9: Differential abundance comparison (volcano plots). Left: Paired-end. Right: Forward-only. Each point represents an ASV, colored by significance (padj < 0.05). Positive log₂ fold change indicates higher abundance in C. macellaria (flies), negative indicates higher in N. surinamensis (beetles). Horizontal dashed line shows significance threshold (padj = 0.05).

Key observations:

Forward-only detected many more differentially abundant ASVs due to higher overall ASV counts
Both approaches identify the same major patterns (e.g., Cladosporium enriched in flies, Phallus in beetles)
Forward-only provides greater power to detect subtle differences in rare/low-abundance taxa
The core biological signal is preserved across both approaches

Figure 10: Differential abundance comparison excluding Cladosporium and Phallus to better visualize remaining taxa. Left: Paired-end. Right: Forward-only.

Taxonomy Comparison

Genus-Level Relative Abundance

Do the two approaches identify similar dominant genera despite different ASV counts?

Figure 11: Top 15 most abundant genera in each approach. Bar height shows mean relative abundance across all samples. Despite 4× difference in ASV counts, both approaches identify similar dominant taxa.

Sample-Level Genus Composition

Figure 14: Genus-level relative abundance for beetle abdomen samples. Left: Paired-end approach. Right: Forward-only approach. Both show similar composition dominated by Rhodotorula, Hasegawaea, and Basidiobolus.

Figure 15: Genus-level relative abundance for beetle head-thorax samples. Left: Paired-end approach. Right: Forward-only approach. Similar patterns with dominant Rhodotorula and Symmetrospora.

Figure 16: Genus-level relative abundance for fly samples. Left: Paired-end approach. Right: Forward-only approach. Consistent Cladosporium dominance across both approaches.

Mean Relative Abundance by Insect Species and Body Part

Figure 17: Mean relative abundance of top 10 genera (stacked bars) grouped by insect species and body part. Left: Paired-end approach. Right: Forward-only approach. Bars show only the top 10 most abundant genera; missing portions represent remaining genera not shown.

Heatmap: Top 30 Genera Across Samples

Figure 18: Heatmap showing log₁₀-transformed genus-level relative abundance (all ASVs aggregated per genus) for top 30 genera across all samples. Top row: Paired-end approach for beetles (left) and flies (right). Bottom row: Forward-only approach for beetles (left) and flies (right). Darker red indicates higher abundance. Each panel shows only genera detected in that dataset. Bold genus names indicate genera detected by both approaches.

Decomposer-Associated Fungi Prevalence

Figure 19: Prevalence of decomposer-associated fungal genera. Left: Beetles (N. surinamensis). Right: Flies (C. macellaria). Both approaches detect similar prevalence patterns, with beetles showing high prevalence of Clathrus, Phallus, and Rhodotorula; flies dominated by Cladosporium.

Prevalence of decomposer-associated fungi in beetles (N. surinamensis):

Genus	Paired-End	Forward-Only	Change
Clathrus	31%	97%	+66%
Phallus	35%	95%	+60%
Cladosporium	85%	85%	0%
Rhodotorula	42%	69%	+27%
Hasegawaea	23%	64%	+41%

Forward-only processing detects substantially higher prevalence of stinkhorn fungi (Phallus, Clathrus) and yeasts (Rhodotorula, Hasegawaea) in beetles, while maintaining similar detection rates for ubiquitous molds like Cladosporium.

The stacked bar charts below show relative abundances of curated decomposer-associated fungal genera across samples, comparing both processing approaches:

Figure 20: Relative abundance of curated fungal genera in beetle abdomen samples. Left: Paired-end approach. Right: Forward-only approach.

Figure 21: Relative abundance of curated fungal genera in beetle head-thorax samples. Left: Paired-end approach. Right: Forward-only approach.

Figure 22: Relative abundance of curated fungal genera in C. macellaria (fly) samples. Top: Paired-end approach. Bottom: Forward-only approach.

Recommendations

When to Use Forward-Only ITS Processing

Based on this comparison, forward-only processing is preferable when:

Poor read overlap: ITS1F to ITS2 distance exceeds 600 bp in target organisms
Maximizing read retention: Studies with low biomass or limited sequencing depth
ITSxpress extraction: HMM-based ITS region extraction works better on single-end reads
Rare taxa detection: Higher ASV counts improve detection of low-abundance species

When to Use Paired-End ITS Processing

Paired-end merging may still be beneficial when:

Short ITS1 regions: Target organisms have ITS1 <400 bp (good overlap expected)
Error correction priority: Merged paired reads have lower error rates than single reads
Reference database constraints: Databases contain merged ITS1-5.8S-ITS2 sequences