Nanopore CO1 Barcoding of Fly Larvae from Forensic Casework
Introduction
Fly larvae were collected from a forensic death investigation scene and submitted for molecular identification. Due to the sensitive nature of the case, specific details regarding the circumstances and location cannot be disclosed. We PCR-amplified the COI (cytochrome c oxidase subunit I) gene and sequenced the amplicons with Oxford Nanopore technology for molecular species identification. This mitochondrial barcoding region is widely used for insect identification and can provide rapid species-level resolution for forensic entomology, although some closely related species cannot be separated by COI alone.
Methods
Sample Collection: Fly larvae were collected from a human decomposition case and preserved for molecular analysis. Specific case details are confidential and cannot be disclosed.
DNA Extraction & Amplification: The COI gene region was PCR-amplified using universal insect primers (Wells & Sperling, 2001):
- C1-J-1751-F (forward): 5'-ACA CTG ACG ACA TGG TTC TAC AGG ATC ACC TGA TAT AGC ATT CCC-3'
- TL2-N-3014-R (reverse): 5'-TAC GGT AGC AGA GAC TTG GTC TCG AGG TAT TCC AGC AAG TCC-3'
Primer names follow the Drosophila yakuba mtDNA reference numbering (Simon et al., 1994). C1-J-1751 binds 212 bp from the start of COI; TL2-N-3014 binds 63 bp from the 3′ end. The resulting ~1.3 kb amplicon covers the 3′ portion of COI and overlaps the standard 658-bp barcode region used for Diptera species identification.
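Because the primers are written 5′→3′ and nanopore reads come from both strands, a read may contain a primer or its reverse complement. The sketch below (hypothetical helper names, not part of the pipeline described here) finds the best ungapped match of a primer or its reverse complement under a mismatch cap, which is one way to check read orientation or locate primer sites before alignment. It deliberately ignores indels, which nanopore reads often contain, so an edit-distance aligner would be more robust in practice.

```python
# Hypothetical helper for locating primers in reads; not the authors' code.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Primer sequences exactly as listed above (5'->3', spaces removed).
FWD_PRIMER = "ACACTGACGACATGGTTCTACAGGATCACCTGATATAGCATTCCC"  # C1-J-1751-F
REV_PRIMER = "TACGGTAGCAGAGACTTGGTCTCGAGGTATTCCAGCAAGTCC"     # TL2-N-3014-R


def revcomp(seq):
    """Reverse complement of a DNA string (uppercased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_primer(seq, primer, max_mismatches=3):
    """Best ungapped match of primer (strand "+") or its reverse complement ("-").

    Returns (start, strand) for the window with the fewest mismatches, or None
    if no window has <= max_mismatches. Indels are not tolerated.
    """
    seq = seq.upper()
    best = None  # (mismatches, start, strand)
    for strand, probe in (("+", primer.upper()), ("-", revcomp(primer))):
        k = len(probe)
        for start in range(len(seq) - k + 1):
            mismatches = sum(a != b for a, b in zip(seq[start:start + k], probe))
            if mismatches <= max_mismatches and (best is None or mismatches < best[0]):
                best = (mismatches, start, strand)
    return None if best is None else (best[1], best[2])
```

A read containing the reverse primer in its original orientation reports strand "+"; a read from the opposite strand reports "-", which is the situation MAFFT's direction adjustment handles during alignment.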
Nanopore Sequencing: Samples were prepared using the Rapid Barcoding Kit 96 V14 (Oxford Nanopore Technologies) and sequenced on an Oxford Nanopore MinION using a Flongle flow cell.
Bioinformatics Pipeline:
- Basecalling and demultiplexing: Raw fast5 files were basecalled and demultiplexed using the Oxford Nanopore Guppy basecaller
- Quality filtering: The top 20 reads (by quality score) per barcode were selected for consensus generation
- Consensus sequence generation: Reads aligned using MAFFT with majority-rule consensus calling
- BOLD identification: Consensus sequences queried against BOLD Systems using BOLDigger3 v4 (public database, comprehensive mode)
- BLAST search: Consensus sequences queried against NCBI nucleotide (nt) database using blastn
- Dual-database validation: BOLD and BLAST results cross-validated to detect conflicts and assess confidence
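The read-selection step keeps the top 20 reads per barcode by quality score, but the exact score is not stated. A minimal sketch, assuming Phred+33 FASTQ input with 4-line records and a per-read score computed from the mean per-base error probability (an assumption; averaging Phred values directly is a simpler alternative that gives equal or higher scores). The function names are hypothetical; the output can be written as `barcodeNN_top20.fasta`, the naming the consensus script expects.

```python
import math
from itertools import islice


def mean_qscore(qual, offset=33):
    """Read-level Phred score derived from the mean per-base error probability."""
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def read_fastq(handle):
    """Yield (read_id, seq, qual) from an unwrapped 4-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield record[0][1:].split()[0], record[1], record[3]


def top_reads(handle, n=20):
    """Return the n reads with the highest mean_qscore."""
    reads = sorted(read_fastq(handle), key=lambda r: mean_qscore(r[2]), reverse=True)
    return reads[:n]


def write_fasta(reads, handle):
    """Write (read_id, seq, qual) tuples as FASTA, dropping qualities."""
    for read_id, seq, _ in reads:
        handle.write(f">{read_id}\n{seq}\n")
```

Averaging error probabilities means a few very poor bases pull the read score down sharply, which favors uniformly good reads for consensus building.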
Consensus Sequence Generation:
After basecalling and demultiplexing, the top 20 reads (by quality score) from each barcode were used to build consensus sequences. Capping depth at 20 reads keeps the alignment step fast, while choosing the highest-quality reads limits how many sequencing errors enter the alignment; majority-rule calling then outvotes the random errors that remain.
The consensus building pipeline performs the following steps:
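- Multiple sequence alignment using MAFFT with automatic algorithm selection (--auto) and orientation correction (--adjustdirectionaccurately), which reverse-complements reads sequenced from the opposite strand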
- Majority-rule consensus calling where each position is assigned the most common base across aligned reads
- Gap removal to produce the final consensus sequence
- Single-read handling for barcodes with insufficient read depth
consensus_builder.py:
```python
import os
import subprocess
from collections import Counter


def read_fasta(path):
    """Return the sequences in a FASTA file as uppercase strings."""
    seqs, current = [], []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    seqs.append("".join(current).upper())
                current = []
            elif line:
                current.append(line)
    if current:
        seqs.append("".join(current).upper())
    return seqs


def mafft_align(fasta_path, out_path):
    """Align reads with MAFFT, reverse-complementing reads where needed."""
    with open(out_path, "w") as out:
        subprocess.run(
            ["mafft", "--auto", "--adjustdirectionaccurately", "--quiet", fasta_path],
            stdout=out,
            stderr=subprocess.DEVNULL,
            check=False,  # a failed run leaves an empty file, reported as "failed" below
        )


def majority_consensus(aligned_fasta):
    """Majority-rule consensus from a MAFFT alignment.

    Gaps take part in the vote: a column is kept only if a base (not "-") is
    its most common character, so an insertion carried by a single read does
    not enter the consensus. Gap-majority columns are thereby removed.
    """
    seqs = read_fasta(aligned_fasta)
    if not seqs:
        return None
    if len(seqs) == 1:
        return seqs[0].replace("-", "")
    consensus = []
    for col in zip(*seqs):  # aligned sequences all have the same length
        base, _ = Counter(col).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


def main(fasta_dir="filtered_reads", tmp_dir="tmp_alignments",
         output_fasta="consensus_sequences.fasta"):
    os.makedirs(tmp_dir, exist_ok=True)
    skipped = written = 0
    with open(output_fasta, "w") as out:
        for fname in sorted(os.listdir(fasta_dir)):
            if not fname.endswith(".fasta"):
                continue
            barcode = fname.replace("_top20.fasta", "")
            fpath = os.path.join(fasta_dir, fname)
            reads = read_fasta(fpath)
            if not reads:  # empty file or no sequences
                skipped += 1
                continue
            if len(reads) == 1:
                # Only one read: use it directly, no alignment needed
                out.write(f">{barcode}\n{reads[0]}\n")
                written += 1
                print(f"{barcode}: single read, used directly")
                continue
            aligned = os.path.join(tmp_dir, f"{barcode}_aligned.fasta")
            mafft_align(fpath, aligned)
            consensus = majority_consensus(aligned)
            if consensus:
                out.write(f">{barcode}\n{consensus}\n")
                written += 1
                print(f"{barcode}: consensus from {len(reads)} reads ({len(consensus)} bp)")
            else:
                skipped += 1
                print(f"{barcode}: failed")
    print(f"\nDone: {written} consensus sequences written, {skipped} skipped")
    print(f"Output: {output_fasta}")


if __name__ == "__main__":
    main()
```
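Gap handling is the main design choice in a majority-rule consensus. The toy alignment below is invented for illustration and contrasts two policies: ignoring gaps, so that any base seen in any read is emitted, and letting gaps vote, so that a column survives only if a base outnumbers the gaps. Because nanopore errors are dominated by insertions and deletions, ignoring gaps lets a single read's spurious insertion lengthen the consensus.

```python
from collections import Counter


def consensus_ignore_gaps(aligned):
    """Emit the most common base in every column where ANY read has a base."""
    out = []
    for col in zip(*aligned):
        counts = Counter(b for b in col if b != "-")
        if counts:
            out.append(counts.most_common(1)[0][0])
    return "".join(out)


def consensus_gaps_vote(aligned):
    """Gaps count as a vote; columns where "-" is most common are dropped.

    Ties resolve to the character seen first in the column.
    """
    out = []
    for col in zip(*aligned):
        base, _ = Counter(col).most_common(1)[0]
        if base != "-":
            out.append(base)
    return "".join(out)


# Toy alignment (invented): read 3 carries a spurious insertion,
# read 4 a deletion.
reads = [
    "ACG-TACGT",
    "ACG-TACGT",
    "ACGATACGT",
    "ACG-TAC-T",
]
print(consensus_ignore_gaps(reads))  # ACGATACGT: the lone insertion is kept
print(consensus_gaps_vote(reads))    # ACGTACGT: insertion voted out, deletion outvoted
```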
Results
Summary Statistics
Table 1: Sequencing run metrics for MinION nanopore sequencing using a Flongle flow cell with 96 barcoded samples.
| Metric | Value |
|---|---|
| Total Barcodes | 96 |
| Barcodes with Data | 87 (90.6%) |
| Sequencing Time | ~24 hours |
| Total Reads | >100,000 |
Table 2: Taxonomic identification performance comparing BOLD Systems and NCBI BLAST databases. Agreement between databases indicates reliable identification; conflicts highlight COI limitations and database gaps.
| Metric | BOLD | BLAST |
|---|---|---|
| Samples Analyzed | 87 | 77 (9 BOLD-only) |
| High Confidence | ~50 species-level | 23 species agree + 7 ambiguous |
| Moderate Confidence | ~15 genus-level | 9 genus agree |
| Low Confidence | ~15 family-level | 11 conflicts + unresolved |
| Failed IDs | ~7 samples | 12 samples |
| Agreement | — | 32/77 (42%) agree with BOLD |
| Most Common Species | Lucilia coeruleiviridis | Lucilia coeruleiviridis |
Sequencing Performance
Figure 1: Cumulative read count over the ~24 hour sequencing run. The MinION generated >100,000 quality-passed reads across the Flongle flow cell.
Figure 2: Read count distribution across 96 barcodes, ranked by abundance. The red dashed line indicates the minimum threshold (100 reads) for reliable consensus generation.
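Per-barcode read counts like those in Figure 2 can be recomputed from the demultiplexed output. A sketch under stated assumptions: the layout (one `barcodeNN` subfolder of FASTQ files per barcode inside a pass directory) is assumed rather than taken from this report, FASTQ records are unwrapped 4-line records, and the function names are hypothetical. The 100-read threshold is the one marked in Figure 2.

```python
import gzip
from pathlib import Path

MIN_READS = 100  # threshold marked in Figure 2


def count_fastq_reads(path):
    """Count records in a plain or gzipped 4-line FASTQ file."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def reads_per_barcode(pass_dir):
    """Map each barcode subfolder name to its total read count."""
    counts = {}
    for barcode_dir in sorted(Path(pass_dir).iterdir()):
        if barcode_dir.is_dir():
            counts[barcode_dir.name] = sum(
                count_fastq_reads(f)
                for f in barcode_dir.iterdir()
                if f.name.endswith((".fastq", ".fq", ".fastq.gz", ".fq.gz"))
            )
    return counts


def low_depth(counts, min_reads=MIN_READS):
    """Barcodes falling below the read-count threshold, sorted by name."""
    return sorted(b for b, n in counts.items() if n < min_reads)
```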
Sequence Quality Metrics
Consensus sequences averaged ~1,300 bp in length, consistent with the expected size of the C1-J-1751/TL2-N-3014 amplicon and longer than the standard 658-bp COI barcode fragment. Most samples showed >97% identity to reference sequences.
Figure 3: Consensus sequence length distribution. Most sequences exceeded 658 bp, the length of the standard COI barcode fragment, because the amplicon is longer than that fragment.
Figure 4: Percent identity to reference sequences from NCBI BLAST searches. The majority of samples exceeded 97% identity, indicating high-quality species-level matches.
Figure 5: Read depth used for consensus sequence generation. Most samples used the maximum of 20 reads.
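The length summary behind Figure 3 can be recomputed directly from consensus_sequences.fasta. A minimal sketch with hypothetical function names; it reports how many consensus sequences exceed the 658-bp barcode length.

```python
from statistics import mean, median

BARCODE_LEN = 658  # standard COI barcode fragment length (bp)


def fasta_lengths(path):
    """Return {record_id: sequence_length} for a (possibly wrapped) FASTA file."""
    lengths, current = {}, None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                current = (line[1:].split() or [""])[0]
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths


def summarize(lengths):
    """Count, mean, median, and number of sequences longer than the barcode."""
    values = list(lengths.values())
    if not values:
        return {"n": 0}
    return {
        "n": len(values),
        "mean_bp": round(mean(values), 1),
        "median_bp": median(values),
        "n_over_barcode": sum(v > BARCODE_LEN for v in values),
    }
```

Usage: `summarize(fasta_lengths("consensus_sequences.fasta"))`.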
Species Identification Overview
Consensus sequences were queried against two complementary reference databases for taxonomic identification: BOLD Systems (Barcode of Life Data System) and NCBI BLAST (Basic Local Alignment Search Tool). Each database has distinct strengths and limitations for insect identification.
BOLD Identifications
BOLD Systems is a specialized database for DNA barcoding with curated COI sequences and taxonomic assignments. The majority of samples (>60%) achieved species-level identification through BOLD, with varying confidence levels based on sequence quality and reference database matches.
Consensus sequences were queried against the BOLD Identification Engine v4 using the BOLDigger3 command-line tool:
```bash
boldigger3 identify consensus_sequences.fasta --db 1 --mode 3
```
This command queries the public BOLD database (--db 1) using the comprehensive identification mode (--mode 3), which returns top matches with similarity scores and taxonomic assignments.
Figure 6: BOLD identification success by taxonomic level. Most samples (>60%) achieved species-level identification, while others were resolved to genus or family level.
Figure 7: BOLD confidence levels for taxonomic identifications. High-confidence identifications (>97% sequence identity) were achieved for the majority of samples.
Figure 8: Family-level composition of identified larvae. Calliphoridae (blow flies) dominated the samples, consistent with their role as primary colonizers of carrion.
Figure 9: Species-level identifications among samples that achieved species-level resolution. Lucilia coeruleiviridis was the most abundant species detected.
BLAST Identifications
NCBI BLAST provides access to a broader genomic database including GenBank submissions. While less specialized for barcoding, BLAST can identify sequences missed by BOLD and provides alternative taxonomic perspectives. Consensus sequences were queried against the NCBI nucleotide (nt) database using the command-line blastn tool:
```bash
blastn -query consensus_sequences.fasta -db nt -remote \
  -outfmt "6 qseqid sseqid stitle pident length evalue bitscore" \
  -max_target_seqs 5 > blast_results.tsv
```
This command queries the remote NCBI nt database and writes tabular output for up to 5 subject sequences per query (a subject can contribute several HSP rows), including percent identity, alignment length, e-value, and bit score for downstream filtering. BLAST+ does not accept -num_threads together with -remote; local multithreading applies only to searches against a local database.
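Before comparison with BOLD, the tabular output can be reduced to one best hit per query. A sketch assuming the seven columns requested in the -outfmt string above; keeping the highest bit score and taking the first two words of `stitle` as the species name are simplifying assumptions (GenBank titles do not always start with a binomial), and the function names are hypothetical.

```python
import csv

# Column order from the -outfmt string in the blastn command.
COLUMNS = ["qseqid", "sseqid", "stitle", "pident", "length", "evalue", "bitscore"]


def best_hits(handle, min_length=0):
    """Return {qseqid: hit_dict}, keeping the highest-bitscore row per query."""
    best = {}
    for fields in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        hit = dict(zip(COLUMNS, fields))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["length"] < min_length:
            continue  # optional filter against short partial alignments
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


def species_from_title(stitle):
    """Crude heuristic: treat the first two words of the title as the species."""
    return " ".join(stitle.split()[:2])
```

Usage: `with open("blast_results.tsv") as fh: hits = best_hits(fh, min_length=500)`, where the 500-bp cutoff is only an example value.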
Figure 10: BLAST verdict categories for 77 samples with NCBI data. Agreement with BOLD at species or genus level was achieved in 42% of samples, while conflicts and ambiguous cases highlight COI limitations.
Figure 11: Final confidence distribution combining BOLD and BLAST results. High-confidence identifications (35%) required database agreement; low-confidence and failed cases reflect conflicts or poor sequence quality.
Figure 12: Database agreement breakdown showing the relationship between BOLD and BLAST results. Agreement (37%) validates identifications, while conflicts (13%) highlight database gaps and COI limitations.
Figure 13: Final species identifications after integrating BOLD and BLAST results. Lucilia coeruleiviridis dominated, with several ambiguous species pairs and genus-level assignments reflecting COI limitations.
Key Findings:
- BLAST confirmed the majority of BOLD identifications for Lucilia coeruleiviridis and Phormia regina
- Several samples showed species-level conflicts, particularly within the Lucilia genus where COI alone cannot reliably distinguish certain species pairs
- BLAST identified non-Dipteran contaminants in 12 samples (bacteria: Enterococcus, Ignatzschineria, Photobacterium, Pseudomonas), highlighting the importance of dual-database validation
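The verdict categories used in Table 2 and Figure 10 (species agreement, genus agreement, ambiguous pairs, conflicts) imply a comparison rule, but the exact rule is not stated. One way to encode it, as a hypothetical sketch: different genera are a conflict; a genus-level name on either side counts as genus agreement; same-genus but different species is a conflict unless the pair is listed as COI-indistinguishable. The pair list below holds only the pair documented by DeBry et al. (2012).

```python
# COI-indistinguishable species pairs (DeBry et al., 2012); illustrative list.
AMBIGUOUS_PAIRS = {frozenset({"Lucilia coeruleiviridis", "Lucilia mexicana"})}


def split_name(name):
    """Return (genus, epithet); epithet is None for genus-level names."""
    parts = name.split()
    epithet = parts[1] if len(parts) > 1 and parts[1] not in ("sp.", "sp") else None
    return parts[0], epithet


def verdict(bold_name, blast_name):
    """Classify a BOLD/BLAST identification pair (hypothetical rule)."""
    if not bold_name or not blast_name:
        return "unresolved"
    (g1, e1), (g2, e2) = split_name(bold_name), split_name(blast_name)
    if g1 != g2:
        return "conflict"
    if e1 is None or e2 is None:
        return "genus agree"
    if e1 == e2:
        return "species agree"
    if frozenset({f"{g1} {e1}", f"{g2} {e2}"}) in AMBIGUOUS_PAIRS:
        return "ambiguous"
    return "conflict"
```

Under this rule, barcode35 (BOLD L. retroversa, BLAST L. coeruleiviridis) is a conflict, matching its treatment in Table 4.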
Identification Reliability and Geographic Considerations
BOLD Database Issues
While most samples received species-level BOLD assignments, several of those assignments warrant caution because they are biogeographically incongruent with the sampling location (the northern United States):
Table 3: Biogeographically implausible BOLD identifications. The species identified are restricted to the Old World (Europe, Australia) or to the Neotropics, making their occurrence in the northern United States highly unlikely and suggesting gaps in the reference databases.
| Barcode | Species Identified | BOLD % Identity | Geographic Range | Issue |
|---|---|---|---|---|
| barcode74 | Lucilia pulverulenta | 98.19% | Europe, Australia | No established North American presence |
| barcode92 | Lucilia mexicana | — | Mexico, Central America | Only 1 BOLD record; very unlikely in the northern United States |
| barcode30 | Lucilia eximia | — | Neotropical | Likely misidentified L. coeruleiviridis |
| barcode34 | Lucilia eximia | — | Neotropical | Likely misidentified L. coeruleiviridis |
| barcode64 | Lucilia eximia | — | Neotropical | Likely misidentified L. coeruleiviridis |
| barcode69 | Lucilia eximia | — | Neotropical | Likely misidentified L. coeruleiviridis |
| barcode85 | Lucilia eximia | — | Neotropical | Likely misidentified L. coeruleiviridis |
Lucilia pulverulenta (barcode74): This is primarily an Old World species found in Europe and Australia, with essentially no established presence in North America. Despite the 98.19% identity, this match in the northern United States is almost certainly a misidentification, likely representing a Lucilia species whose COI sequence is poorly represented in BOLD.
Lucilia mexicana (barcode92): Primarily a Mexican and Central American species. Its occurrence in the northern United States is possible but very unlikely. The identification was supported by only one BOLD record, a red flag that the hit is spurious and reflects insufficient reference data.
Lucilia eximia (barcodes 30, 34, 64, 69, 85): Primarily Neotropical (South and Central America). Finding five individuals of this species in the northern United States would be remarkable; these assignments far more likely reflect a reference database gap than genuine occurrences. L. eximia and L. coeruleiviridis have similar COI sequences, and BOLD’s coverage of North American Lucilia is patchy enough that this kind of confusion is common.
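The two warning signs discussed above, a species not expected in the region and a match supported by very few reference records, can be checked automatically. A hypothetical sketch: the checklist contains only species this report treats as plausible in the northern United States and is a placeholder, not an authoritative regional list, and the `min_records=3` cutoff is an arbitrary example.

```python
from dataclasses import dataclass

# Placeholder built from species this report treats as plausible in the
# northern United States; NOT an authoritative regional checklist.
REGIONAL_CHECKLIST = {
    "Lucilia coeruleiviridis",
    "Lucilia illustris",
    "Lucilia retroversa",
    "Phormia regina",
}


@dataclass
class BoldHit:
    barcode: str
    species: str
    n_records: int  # BOLD records supporting the species-level match


def screen(hit, checklist=REGIONAL_CHECKLIST, min_records=3):
    """Return warning flags for a species-level BOLD hit (empty list = no flags)."""
    flags = []
    if hit.species not in checklist:
        flags.append("not on regional checklist")
    if hit.n_records < min_records:
        flags.append("sparse reference support")
    return flags
```

A high percent identity does not clear either flag, which mirrors the barcode74 case: 98.19% identity to a species not expected in the region is still suspect.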
BLAST Database Issues
While BLAST provided valuable cross-validation of BOLD results, several identification conflicts and database-specific issues emerged:
Table 4: BOLD and BLAST identification conflicts. Discordant species assignments highlight COI limitations for closely related species and database-specific biases in reference coverage.
| Barcode | BOLD ID | BLAST ID | BLAST % Identity | Issue |
|---|---|---|---|---|
| barcode31 | Lucilia coeruleiviridis | Lucilia pulverulenta | 94.68% | Conflict: BLAST suggests Old World species unlikely in the northern United States |
| barcode35 | Lucilia retroversa | Lucilia coeruleiviridis | 80.21% | Conflict: Low BLAST identity; species determination uncertain |
| barcode50 | Lucilia retroversa | Lucilia coeruleiviridis | 81.92% | Conflict: Both identifications plausible; COI insufficient for resolution |
| barcode74 | Lucilia pulverulenta | Lucilia coeruleiviridis | 84.73% | Conflict: BLAST favors L. coeruleiviridis (more likely geographically) |
| barcode30 | Lucilia eximia | Lucilia mexicana | 82.15% | Conflict: Both Neotropical; poor BLAST identity suggests neither correct |
| barcode34 | Lucilia eximia | Lucilia mexicana | 87.48% | Conflict: Neotropical species unlikely; low confidence overall |
| barcode69 | Lucilia eximia | Lucilia coeruleiviridis | 87.86% | Conflict: BLAST supports North American species |
| barcode85 | Lucilia eximia | Lucilia mexicana | 92.01% | Conflict: Neotropical assignments in the northern United States questionable |
Conflict Patterns:
1. Lucilia retroversa vs. L. coeruleiviridis conflicts (barcodes 35, 50): BOLD identified these as L. retroversa, but BLAST matched L. coeruleiviridis with low identity (80-82%). Both species occur in North America, but the low BLAST identities suggest that GenBank may lack L. retroversa reference sequences.
2. Lucilia pulverulenta mismatches (barcodes 31, 74): For barcode74, BOLD assigned L. pulverulenta (an Old World species) while BLAST matched L. coeruleiviridis (North American); for barcode31 the pattern is reversed, with BOLD assigning L. coeruleiviridis and BLAST matching L. pulverulenta (94.68%). In both cases L. coeruleiviridis is the more biogeographically plausible assignment, suggesting that L. pulverulenta reference sequences in both databases may be attracting hits from North American Lucilia.
3. Neotropical Lucilia conflicts (barcodes 30, 34, 69, 85): BOLD identified these samples as L. eximia (Neotropical), while BLAST returned L. mexicana or L. coeruleiviridis. Given the northern United States collection site, all of these identifications are suspect. The conflicts likely reflect incomplete COI reference coverage for Nearctic Lucilia species in both databases.
4. Low BLAST percent identities: Many samples showed 80-90% BLAST identity despite high BOLD matches (>97%). Possible explanations include:
- BOLD’s curated COI barcode database is more complete for North American blow flies
- NCBI GenBank contains many partial or lower-quality COI sequences
- Geographic sampling biases in GenBank favor European and Asian specimens
5. Non-Dipteran contaminants: BLAST identified bacterial sequences (Enterococcus, Ignatzschineria, Photobacterium, Pseudomonas) in 12 samples that BOLD could not classify, demonstrating BLAST’s utility for detecting non-target DNA.
The L. coeruleiviridis / mexicana Problem
Seven samples in our dataset were flagged as “ambiguous” due to indistinguishable COI sequences between Lucilia coeruleiviridis and L. mexicana (barcodes 03, 11, 18, 26, 59, 66, 94). This is not a database error but a well-documented biological limitation of COI barcoding for this species pair.
DeBry et al. (2012) Study:
DeBry et al. conducted a comprehensive COI barcoding study of continental U.S. Lucilia species, assembling ~1,100 bp COI sequences from 122 specimens representing 9 of the 10 U.S. species. Their key findings:
- Monophyly Test: They defined a species as “DNA-identifiable” if it formed an exclusively monophyletic clade in >95% of bootstrap pseudoreplicates in COI phylogenies.
- Seven Species Passed: Seven of the nine sampled species (including L. illustris, L. sericata, and L. cuprina) formed well-supported monophyletic groups separable by COI alone.
- L. coeruleiviridis and L. mexicana Failed: These two species share COI haplotypes and do not form exclusive, separable clades. As sampled in the continental U.S., they are indistinguishable using mitochondrial COI alone.
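The monophyly criterion described above can be expressed compactly. In this hypothetical sketch, each bootstrap replicate is reduced to the set of its clades (frozensets of specimen names), and a species counts as exclusively monophyletic in a replicate when its specimens form exactly one clade. It assumes rooted trees (for example, outgroup-rooted) and uses a deliberately minimal Newick parser that handles only unquoted labels. Note the strict inequality: a species that is monophyletic in exactly 95% of replicates does not pass.

```python
import re

# Tokens: punctuation, ":branch_length", or a label.
TOKEN = re.compile(r"[(),;]|:[^(),;]*|[^(),;:]+")


def clades_from_newick(newick):
    """Return every clade of a rooted Newick tree as a frozenset of tip names.

    Minimal parser: unquoted labels only; branch lengths and internal node
    labels (e.g. support values) are ignored.
    """
    clades = set()
    stack = [[]]  # stack[-1] collects tips of the innermost open group
    prev = None
    for tok in TOKEN.findall(newick):
        if not tok.strip():
            continue  # whitespace between tokens
        if tok == "(":
            stack.append([])
        elif tok == ")":
            tips = stack.pop()
            clades.add(frozenset(tips))
            stack[-1].extend(tips)
        elif tok in (",", ";") or tok.startswith(":"):
            pass
        elif prev != ")":  # a label after "(" or "," names a tip
            name = tok.strip()
            stack[-1].append(name)
            clades.add(frozenset([name]))
        if not tok.startswith(":"):
            prev = tok
    return clades


def support(species_tips, bootstrap_clades):
    """Fraction of replicates in which the species' specimens form one clade."""
    target = frozenset(species_tips)
    return sum(target in clades for clades in bootstrap_clades) / len(bootstrap_clades)


def dna_identifiable(species_tips, bootstrap_clades, threshold=0.95):
    """Exclusive monophyly in more than `threshold` of bootstrap replicates."""
    return support(species_tips, bootstrap_clades) > threshold
```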
Our seven ambiguous identifications are consistent with DeBry et al.’s findings. Where BOLD assigned L. coeruleiviridis and BLAST returned L. mexicana (or vice versa), we labeled the sample “Lucilia coeruleiviridis / mexicana”: high confidence for the species pair, but low confidence in distinguishing between the two. Given the northern United States collection site, L. coeruleiviridis is the more biogeographically likely species, but COI data alone cannot rule out L. mexicana.
Consensus Identification Table
The following table presents our best interpretation of each sample’s identity after integrating BOLD and BLAST results with biogeographic assessment. Identifications flagged as L. mexicana have been corrected to L. coeruleiviridis / mexicana (acknowledging COI indistinguishability) or reassigned to Lucilia sp. where conflicts render species-level assignment unreliable.
Table 5: Consensus species identifications for all samples after integrating BOLD, BLAST, and biogeographic assessment. Identifications apply corrections for COI limitations (L. coeruleiviridis/mexicana indistinguishability) and biogeographically implausible assignments.
| Barcode | Consensus Identification | Confidence | Notes |
|---|---|---|---|
| barcode01 | Lucilia sp. | Moderate | — |
| barcode02 | Lucilia coeruleiviridis | High | — |
| barcode03 | Lucilia coeruleiviridis / mexicana | High | COI cannot distinguish species pair |
| barcode04 | Lucilia coeruleiviridis | High | — |
| barcode05 | Failed | Failed | Non-Dipteran contaminant or poor quality |
| barcode06 | Lucilia coeruleiviridis | Low | — |
| barcode07 | Lucilia coeruleiviridis / mexicana | Low | COI indistinguishable; likely L. coeruleiviridis (biogeography) |
| barcode09 | Lucilia coeruleiviridis / mexicana | Low | COI indistinguishable; likely L. coeruleiviridis (biogeography) |
| barcode10 | Lucilia coeruleiviridis | High | — |
| barcode11 | Lucilia coeruleiviridis / mexicana | High | COI cannot distinguish species pair |
| barcode12 | Failed | Failed | Non-Dipteran contaminant or poor quality |
| barcode13 | Failed | Failed | Non-Dipteran contaminant or poor quality |
| barcode14 | Failed | Failed | Non-Dipteran contaminant or poor quality |
| barcode15 | Lucilia coeruleiviridis | High | — |
| barcode16 | Lucilia coeruleiviridis | High | — |
| barcode17 | Phormia regina | High | — |
| barcode18 | Lucilia coeruleiviridis / mexicana | High | COI cannot distinguish species pair |
| barcode19 | Lucilia coeruleiviridis | High | — |
| barcode20 | Failed | Failed | Non-Dipteran contaminant or poor quality |
| barcode21 | Lucilia coeruleiviridis | High | — |
| barcode22 | Lucilia coeruleiviridis | Low | BLAST conflict with unidentified specimen; BOLD ID retained |
| barcode23 | Lucilia coeruleiviridis | High | — |
| barcode25 | Failed | Failed | Non-Dipteran contaminant or poor quality |
| barcode26 | Lucilia coeruleiviridis / mexicana | High | COI cannot distinguish species pair |
| barcode27 | Lucilia coeruleiviridis | High | — |
| barcode28 | No match | Moderate | No reliable database hit |
| barcode29 | Lucilia sp. | Moderate | — |
| barcode30 | Lucilia sp. | Low | Conflict; biogeographically implausible IDs |
| barcode31 | Lucilia sp. | Low | Database conflict |
| barcode33 | Lucilia vulgata | Low | European species; ID uncertain |
| barcode34 | Lucilia sp. | Low | Conflict; biogeographically implausible IDs |
| barcode35 | Lucilia retroversa / coeruleiviridis | Low | Both Nearctic; COI insufficient for distinction |
| barcode36 | Lucilia sp. | Low | Neotropical ID implausible; likely misidentified L. coeruleiviridis |
| barcode37 | Failed | Failed | Non-Dipteran contaminant or poor quality |
| barcode38 | Lucilia coeruleiviridis | High | — |
| barcode39 | Lucilia coeruleiviridis | High | — |
| barcode41 | Lucilia coeruleiviridis | High | — |
| barcode42 | Lucilia sp. | Moderate | — |
| barcode43 | Lucilia coeruleiviridis | High | — |
| barcode44 | Lucilia sp. | Moderate | — |
| barcode45 | Phormia regina | High | — |
| barcode46 | Lucilia coeruleiviridis | Moderate | — |
| barcode47 | Lucilia coeruleiviridis | Moderate | — |
| barcode48 | Lucilia sp. | Moderate | — |
| barcode49 | Lucilia coeruleiviridis | Low | — |
| barcode50 | Lucilia retroversa / coeruleiviridis | Low | Both Nearctic; COI insufficient for distinction |
| barcode51 | Failed | Failed | Non-Dipteran contaminant or poor quality |
| barcode52 | Lucilia sp. | Low | Old World species implausible in the northern United States |
| barcode53 | Lucilia sp. | Moderate | — |
| barcode54 | No match | Moderate | No reliable database hit |
| barcode55 | Lucilia illustris | High | — |
| barcode57 | Lucilia coeruleiviridis | High | — |
| barcode58 | Lucilia sp. | Moderate | — |
| barcode59 | Lucilia coeruleiviridis / mexicana | High | COI cannot distinguish species pair |
| barcode60 | Lucilia coeruleiviridis | High | — |
| barcode61 | Lucilia coeruleiviridis | Moderate | — |
| barcode62 | Lucilia coeruleiviridis | Low | — |
| barcode63 | Lucilia coeruleiviridis | Moderate | — |
| barcode64 | Lucilia sp. | Moderate | Neotropical ID implausible; likely misidentified L. coeruleiviridis |
| barcode65 | Failed | Failed | Non-Dipteran contaminant or poor quality |
| barcode66 | Lucilia coeruleiviridis / mexicana | High | COI cannot distinguish species pair |
| barcode67 | Lucilia sp. | Moderate | — |
| barcode68 | No match | Moderate | No reliable database hit |
| barcode69 | Lucilia sp. | Low | Conflict; biogeographically implausible IDs |
| barcode70 | Failed | Failed | Non-Dipteran contaminant or poor quality |
| barcode72 | Lucilia sp. | Moderate | — |
| barcode73 | Lucilia coeruleiviridis | Low | — |
| barcode74 | Lucilia coeruleiviridis | Low | L. pulverulenta (Old World) implausible; BLAST ID retained |
| barcode75 | Lucilia coeruleiviridis | High | — |
| barcode76 | Failed | Failed | Non-Dipteran contaminant or poor quality |
| barcode77 | Lucilia coeruleiviridis | High | — |
| barcode78 | Lucilia coeruleiviridis | High | — |
| barcode79 | Phormia regina | Low | — |
| barcode80 | Lucilia coeruleiviridis | Low | BLAST conflict with unidentified specimen; BOLD ID retained |
| barcode81 | Lucilia sp. | Low | Insufficient data for species-level ID |
| barcode82 | Lucilia coeruleiviridis / mexicana | Low | COI indistinguishable; likely L. coeruleiviridis (biogeography) |
| barcode83 | Lucilia coeruleiviridis | Low | — |
| barcode84 | Lucilia coeruleiviridis | Low | BLAST conflict with unidentified specimen; BOLD ID retained |
| barcode85 | Lucilia sp. | Low | Conflict; biogeographically implausible IDs |
| barcode86 | Lucilia coeruleiviridis | High | — |
| barcode89 | Lucilia coeruleiviridis / mexicana | Low | COI indistinguishable; likely L. coeruleiviridis (biogeography) |
| barcode90 | Lucilia sp. | Low | Neotropical ID implausible; likely misidentified L. coeruleiviridis |
| barcode91 | Lucilia coeruleiviridis | Moderate | — |
| barcode92 | Lucilia coeruleiviridis / mexicana | High | COI indistinguishable; likely L. coeruleiviridis (biogeography) |
| barcode93 | Failed | Failed | Non-Dipteran contaminant or poor quality |
| barcode94 | Lucilia coeruleiviridis / mexicana | High | COI cannot distinguish species pair |
Summary:
- 33 samples identified as Lucilia coeruleiviridis (19 high, 5 moderate, 9 low confidence)
- 12 samples flagged as L. coeruleiviridis / mexicana (COI-indistinguishable species pair)
- 3 samples identified as Phormia regina
- 1 sample identified as Lucilia illustris
- 1 sample identified as Lucilia vulgata (low confidence; European species)
- 2 samples as L. retroversa / coeruleiviridis (uncertain)
- 19 samples assigned to Lucilia sp. (genus level only)
- 12 samples failed (non-Dipteran contaminants or poor quality)
- 3 samples with no reliable match
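Summary counts like these are easy to let drift out of step with the table they describe. A small sketch (hypothetical function names) that recomputes them from Table 5's markdown, counting rows per identification and per identification and confidence level:

```python
from collections import Counter

FIELDS = ("barcode", "identification", "confidence", "notes")


def parse_rows(markdown_table):
    """Parse the data rows of a 4-column markdown table such as Table 5."""
    rows = []
    for line in markdown_table.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != len(FIELDS):
            continue  # blank or unrelated line
        if cells[0] == "Barcode" or set(cells[0]) <= set("-:"):
            continue  # header or separator row
        rows.append(dict(zip(FIELDS, cells)))
    return rows


def tally(rows):
    """Counts per identification, and per (identification, confidence) pair."""
    by_id = Counter(r["identification"] for r in rows)
    by_id_conf = Counter((r["identification"], r["confidence"]) for r in rows)
    return by_id, by_id_conf
```

Pasting the full Table 5 into `parse_rows` and passing the result to `tally` gives the per-category totals and confidence breakdowns directly.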
Discussion
Our nanopore sequencing approach, combined with dual-database validation (BOLD and BLAST), successfully identified fly larvae with varying confidence levels. Of 87 samples, 30 (35%) achieved high-confidence identifications where both databases agreed, while 26 (30%) remained low-confidence due to conflicts or insufficient reference data. This dual-database strategy proved essential for:
- Cross-validation: BOLD and BLAST agreed for 32 of the 77 samples with BLAST results (42%), providing confidence in those species assignments
- Conflict detection: 11 samples revealed species-level disagreements, highlighting taxa for which COI alone is insufficient
- Contaminant identification: BLAST detected bacterial sequences in 12 samples, which BOLD’s COI reference library could not classify
- Database bias assessment: The comparison suggested that BOLD has better coverage of North American blow flies, while BLAST provided broader taxonomic scope
The blow fly family Calliphoridae dominated the samples. Lucilia coeruleiviridis was the most abundant species, assigned to 33 samples (38%) in Table 5, with a further 12 samples assigned to the L. coeruleiviridis / mexicana pair. This species is a primary colonizer of carrion commonly encountered in forensic investigations.
Several BOLD identifications of Neotropical species (L. eximia, L. mexicana) were contradicted or poorly supported by BLAST, highlighting database gaps for Nearctic Lucilia species. These conflicts likely represent misidentifications due to incomplete reference coverage rather than genuine biogeographic anomalies.
Technical Limitations
COI Gene Limitations: Seven samples showed L. coeruleiviridis / L. mexicana ambiguity. This is a known COI limitation, because the two species share COI haplotypes (DeBry et al., 2012). Additional genetic markers (e.g., CAD, ITS2) may be needed for definitive separation.
Database Completeness: Low-confidence identifications and conflicts between databases underscore the critical dependence on reference sequence availability. For North American forensic entomology applications, BOLD’s curated COI database appeared more complete than NCBI GenBank, which contains many partial or geographically biased sequences.
Failed Identifications: Twelve samples (14%) could not be identified in either database. Possible causes include:
- Bacterial DNA contamination from decomposition microbiome
- Low read counts producing poor-quality consensus sequences
- Non-target arthropod DNA (e.g., mites, parasitoids)
- Sequences from species absent from both reference databases
References
Abeynayake, S. W., Fiorito, S., Dinsdale, A., Whattam, M., Crowe, B., Sparks, K., Campbell, P. R., & Gambley, C. (2021). A Rapid and Cost-Effective Identification of Invertebrate Pests at the Borders Using MinION Sequencing of DNA Barcodes. Genes, 12(8), 1138. https://doi.org/10.3390/genes12081138
Boehme, P., Amendt, J., & Zehner, R. (2011). The use of COI barcodes for molecular identification of forensically important fly species in Germany. Parasitology Research, 110(6), 2325–2332. https://doi.org/10.1007/s00436-011-2767-8
DeBry, R. W., Timm, A., Wong, E. S., Stamper, T., Cookman, C., & Dahlem, G. A. (2012). DNA-Based Identification of Forensically Important Lucilia (Diptera: Calliphoridae) in the Continental United States. Journal of Forensic Sciences, 58(1), 73–78. https://doi.org/10.1111/j.1556-4029.2012.02176.x
Sandoval-Arias, S., Morales-Montero, R., Araya-Valcerde, E., & Hernández-Calvajal, E. (2020). Identificación molecular mediante código de barras de DNA de moscas Lucilia (Diptera: Calliphoridae) recolectadas en Costa Rica. Revista Tecnología En Marcha, 33(1). https://doi.org/10.18845/tm.v33i1.5025
Srivathsan, A., Baloğlu, B., Wang, W., Tan, W. X., Bertrand, D., Ng, A. H. Q., Boey, E. J. H., Koh, J. J. Y., Nagarajan, N., & Meier, R. (2018). A MinION™-based pipeline for fast and cost-effective DNA barcoding. Molecular Ecology Resources, 18(5), 1035–1049. https://doi.org/10.1111/1755-0998.12890
Wells, J. D., & Sperling, F. A. H. (2001). DNA-based identification of forensically important Chrysomyinae (Diptera: Calliphoridae). Forensic Science International, 120(1-2), 110–115. https://doi.org/10.1016/s0379-0738(01)00414-5
Yusseff-Vanegas, S. Z., & Agnarsson, I. (2017). DNA-barcoding of forensically important blow flies (Diptera: Calliphoridae) in the Caribbean Region. PeerJ, 5, e3516. https://doi.org/10.7717/peerj.3516