Nanopore CO1 Barcoding of Fly Larvae from Forensic Casework

Introduction

Fly larvae were collected from a forensic death investigation scene and submitted for molecular identification. Because of the sensitive nature of the case, details of the circumstances and location cannot be disclosed. We PCR-amplified the mitochondrial COI (cytochrome c oxidase subunit I) gene and sequenced the amplicons on the Oxford Nanopore platform for molecular species identification. COI is the standard barcoding marker for insects and can provide rapid species-level resolution for many forensically important flies.


Methods

Sample Collection: Fly larvae were collected from a human decomposition case and preserved for molecular analysis. Specific case details are confidential and cannot be disclosed.

DNA Extraction & Amplification: The COI gene region was PCR-amplified using universal insect primers (Wells & Sperling, 2001):

C1-J-1751-F (forward): 5'-ACA CTG ACG ACA TGG TTC TAC AGG ATC ACC TGA TAT AGC ATT CCC-3'
TL2-N-3014-R (reverse): 5'-TAC GGT AGC AGA GAC TTG GTC TCG AGG TAT TCC AGC AAG TCC-3'

Metric	Value
COI length (bp)	1,539
Amplicon (bp)	1,264
COI coverage	82%

Primer names and coordinates follow the Drosophila yakuba mtDNA reference numbering (Simon et al. 1994). C1-J-1751 binds 212 bp from the start of COI; TL2-N-3014 binds 63 bp from the 3′ end. The amplicon spans the full diagnostic barcode region used for Diptera species identification.
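The three figures above are mutually consistent: the amplicon length is the COI length minus both primer offsets, and coverage is that length divided by the COI length. A short Python check using only the numbers stated in the text, assuming each offset runs from the end of COI to the outer (5′) end of the corresponding primer:

```python
# Numbers taken from the text; offsets are assumed to run from each end of
# COI to the outer (5') end of the corresponding primer.
COI_LENGTH_BP = 1539
FWD_OFFSET_BP = 212  # C1-J-1751: distance from the 5' start of COI
REV_OFFSET_BP = 63   # TL2-N-3014: distance from the 3' end of COI


def amplicon_span(coi_len, fwd_offset, rev_offset):
    """Return (amplicon length in bp, fraction of COI covered by the amplicon)."""
    amplicon_len = coi_len - fwd_offset - rev_offset
    return amplicon_len, amplicon_len / coi_len


length_bp, coverage = amplicon_span(COI_LENGTH_BP, FWD_OFFSET_BP, REV_OFFSET_BP)
print(f"Amplicon: {length_bp} bp, COI coverage: {coverage:.0%}")  # Amplicon: 1264 bp, COI coverage: 82%
```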

Nanopore Sequencing: Samples were prepared using the Rapid Barcoding Kit 96 V14 (Oxford Nanopore Technologies) and sequenced on an Oxford Nanopore MinION using a Flongle flow cell.

Bioinformatics Pipeline:

Basecalling and demultiplexing: Raw fast5 files were basecalled and demultiplexed using Oxford Nanopore Guppy basecaller
Quality filtering: Top 20 reads (by quality score) per barcode were selected for consensus generation
Consensus sequence generation: Reads aligned using MAFFT with majority-rule consensus calling
BOLD identification: Consensus sequences queried against BOLD Systems using BOLDigger3 v4 (public database, comprehensive mode)
BLAST search: Consensus sequences queried against NCBI nucleotide (nt) database using blastn
Dual-database validation: BOLD and BLAST results cross-validated to detect conflicts and assess confidence
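The read-selection step (top 20 reads per barcode) is described but not shown. One way to implement it is sketched below, assuming one unwrapped FASTQ file per barcode and defining a read's quality as its mean Phred score averaged in error-probability space, a common convention for nanopore data; the original pipeline does not state which definition it used. The output naming matches the `<barcode>_top20.fasta` pattern that consensus_builder.py expects, while the function and directory names are hypothetical.

```python
import math
from pathlib import Path


def mean_read_quality(qual):
    """Mean Phred quality, averaged in error-probability space (assumed convention)."""
    if not qual:
        return 0.0
    # Phred+33 encoding: Q = ord(char) - 33; error probability = 10^(-Q/10)
    mean_err = sum(10 ** (-(ord(c) - 33) / 10) for c in qual) / len(qual)
    return -10 * math.log10(mean_err)


def read_fastq(path):
    """Yield (read_id, sequence, quality) from an unwrapped 4-line-per-record FASTQ."""
    with open(path) as fh:
        while True:
            header = fh.readline().strip()
            if not header:
                break
            seq = fh.readline().strip()
            fh.readline()  # '+' separator line
            qual = fh.readline().strip()
            yield header[1:].split()[0], seq, qual


def top_n_reads(records, n=20):
    """Return the n reads with the highest mean quality, best first."""
    return sorted(records, key=lambda r: mean_read_quality(r[2]), reverse=True)[:n]


def select_top_reads(fastq_dir="demultiplexed", out_dir="filtered_reads", n=20):
    """Hypothetical driver: <barcode>.fastq in, <barcode>_top<n>.fasta out."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for fastq in sorted(Path(fastq_dir).glob("*.fastq")):
        best = top_n_reads(read_fastq(fastq), n)
        with open(out / f"{fastq.stem}_top{n}.fasta", "w") as fa:
            for read_id, seq, _ in best:
                fa.write(f">{read_id}\n{seq}\n")
```

Averaging in probability space penalizes reads with a few very poor bases more than a plain average of Phred scores would: a two-base read with qualities Q40 and Q2 scores about Q5 rather than Q21.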

Consensus Sequence Generation:

After basecalling and demultiplexing, the top 20 reads (by quality score) from each barcode were used to build consensus sequences. This approach balances sequencing depth with computational efficiency while minimizing the impact of sequencing errors.

The consensus building pipeline performs the following steps:

Multiple sequence alignment using MAFFT with automatic algorithm selection and directional correction
Majority-rule consensus calling where each position is assigned the most common base across aligned reads
Gap removal to produce the final consensus sequence
Single-read handling for barcodes with insufficient read depth

consensus_builder.py:

import os
import subprocess
from collections import Counter


def mafft_align(fasta_path, out_path):
    """Align reads with MAFFT and write the aligned FASTA to out_path.

    --adjustdirectionaccurately lets MAFFT reverse-complement reads that are
    in the opposite orientation before aligning them.
    """
    with open(out_path, "w") as out:  # closes the handle once MAFFT exits
        subprocess.run(
            ["mafft", "--auto", "--adjustdirectionaccurately", "--quiet", fasta_path],
            stdout=out,
            stderr=subprocess.DEVNULL,
        )


def read_fasta(path):
    """Return the sequences in a FASTA file, upper-cased, in file order."""
    seqs, current = [], []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    seqs.append("".join(current).upper())
                current = []
            elif line:
                current.append(line)
    if current:
        seqs.append("".join(current).upper())
    return seqs


def majority_consensus(aligned_fasta):
    """Call a consensus from an aligned FASTA by per-column majority vote.

    Gaps are excluded from the vote, so a column contributes its most common
    base whenever at least one read has a base there; gap-only columns are
    skipped, which removes gaps from the final sequence.
    """
    seqs = read_fasta(aligned_fasta)
    if not seqs:
        return None
    if len(seqs) == 1:
        return seqs[0].replace("-", "")

    consensus = []
    for i in range(len(seqs[0])):
        col = [s[i] for s in seqs if i < len(s)]
        counts = Counter(b for b in col if b != "-")
        if counts:
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


def main(fasta_dir="filtered_reads", tmp_dir="tmp_alignments",
         output_fasta="consensus_sequences.fasta"):
    os.makedirs(tmp_dir, exist_ok=True)
    skipped = 0
    written = 0

    with open(output_fasta, "w") as out:
        for fname in sorted(os.listdir(fasta_dir)):
            if not fname.endswith(".fasta"):
                continue
            barcode = fname.replace("_top20.fasta", "")
            fpath = os.path.join(fasta_dir, fname)

            reads = read_fasta(fpath)
            if not reads:
                # Empty file or no sequence records
                skipped += 1
                print(f"{barcode}: no reads, skipped")
                continue

            if len(reads) == 1:
                # Only one read: use it directly, no alignment needed
                out.write(f">{barcode}\n{reads[0]}\n")
                written += 1
                print(f"{barcode}: single read, used directly")
                continue

            aligned = os.path.join(tmp_dir, f"{barcode}_aligned.fasta")
            mafft_align(fpath, aligned)
            consensus = majority_consensus(aligned)

            if consensus:
                out.write(f">{barcode}\n{consensus}\n")
                written += 1
                print(f"{barcode}: consensus from {len(reads)} reads ({len(consensus)} bp)")
            else:
                # Empty alignment output (e.g. MAFFT failed)
                skipped += 1
                print(f"{barcode}: failed")

    print(f"\nDone: {written} consensus sequences written, {skipped} skipped")
    print(f"Output: {output_fasta}")


if __name__ == "__main__":
    main()
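As written, majority_consensus excludes gaps from the vote, so any column in which even one read carries a base (for example, a nanopore insertion error) adds that base to the consensus. A gap-aware variant, in which gaps also vote and columns won by gaps are dropped, is sketched below as an alternative; it is not the code used to generate the results reported here, and the function name is hypothetical.

```python
from collections import Counter


def gap_aware_consensus(aligned_seqs):
    """Majority-rule consensus in which gaps compete with bases.

    aligned_seqs: equal-length aligned sequences, '-' marking gaps.
    A column is kept only if its most common base outnumbers the gaps;
    columns where gaps win or tie are treated as insertions and dropped.
    """
    consensus = []
    for column in zip(*(s.upper() for s in aligned_seqs)):
        counts = Counter(column)
        n_gaps = counts.pop("-", 0)
        if not counts:
            continue  # all-gap column
        base, n_base = counts.most_common(1)[0]
        if n_base > n_gaps:
            consensus.append(base)
    return "".join(consensus)


reads = ["AC-GT", "AC-GT", "ACTGT"]  # third read carries a one-base insertion
print(gap_aware_consensus(reads))  # ACGT
```

With the gap-excluding vote in majority_consensus, the same three reads would give ACTGT, carrying the single-read insertion into the consensus.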

Results

Summary Statistics

Table 1: Sequencing run metrics for MinION nanopore sequencing using a Flongle flow cell with 96 barcoded samples.

Metric	Value
Total Barcodes	96
Barcodes with Data	87 (90.6%)
Sequencing Time	~24 hours
Total Reads	>100,000

Table 2: Taxonomic identification performance comparing BOLD Systems and NCBI BLAST databases. Agreement between databases indicates reliable identification; conflicts highlight COI limitations and database gaps.

Metric	BOLD	BLAST
Samples Analyzed	87	77 (9 BOLD-only)
High Confidence	~50 species-level	23 species agree + 7 ambiguous
Moderate Confidence	~15 genus-level	9 genus agree
Low Confidence	~15 family-level	11 conflicts + unresolved
Failed IDs	~7 samples	12 samples
Agreement	—	32/77 (42%) agree with BOLD
Most Common Species	Lucilia coeruleiviridis	Lucilia coeruleiviridis

Sequencing Performance

Figure 1: Cumulative read count over the ~24-hour sequencing run. The MinION generated >100,000 quality-passed reads on the Flongle flow cell.

Figure 2: Read count distribution across 96 barcodes, ranked by abundance. The red dashed line indicates the minimum threshold (100 reads) for reliable consensus generation.

Sequence Quality Metrics

Consensus sequences averaged ~1,300 bp in length, close to the full 1,264-bp amplicon and roughly twice the length of the standard 658-bp COI barcode. Most samples showed >97% identity to reference sequences.

Figure 3: Consensus sequence length distribution. Most sequences exceeded 658 bp, the length of the standard COI barcode, reflecting the longer 1,264-bp amplicon.

Figure 4: Percent identity to reference sequences from NCBI BLAST searches. The majority of samples exceeded 97% identity, indicating high-quality species-level matches.

Figure 5: Read depth used for consensus sequence generation. Most samples utilized the maximum depth of 20 reads for optimal consensus quality.

Species Identification Overview

Consensus sequences were queried against two complementary reference databases for taxonomic identification: BOLD Systems (Barcode of Life Data System) and NCBI BLAST (Basic Local Alignment Search Tool). Each database has distinct strengths and limitations for insect identification.

BOLD Identifications

BOLD Systems is a specialized database for DNA barcoding with curated COI sequences and taxonomic assignments. The majority of samples (>60%) achieved species-level identification through BOLD, with varying confidence levels based on sequence quality and reference database matches.

Consensus sequences were queried using the BOLD Identification Engine v4 (BOLDigger3) command-line tool:

boldigger3 identify consensus_sequences.fasta --db 1 --mode 3

This command queries the public BOLD database (--db 1) using the comprehensive identification mode (--mode 3), which returns the top matches with similarity scores and taxonomic assignments.
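The text does not state how a BOLD similarity score is converted into the species-, genus-, or family-level calls summarized in Table 2 and Figure 6. A common approach is a fixed similarity cut-off per rank; the values in this sketch (97% species, 95% genus, 90% family, 85% order) are assumptions for illustration, not confirmed settings of this study or of BOLDigger3.

```python
# Assumed similarity cut-offs (percent identity to the top hit), finest rank first.
# Illustrative values only; not confirmed settings of this study or BOLDigger3.
LEVEL_CUTOFFS = [(97.0, "species"), (95.0, "genus"), (90.0, "family"), (85.0, "order")]


def resolution_level(similarity: float) -> str:
    """Return the finest taxonomic rank that a top-hit similarity supports."""
    for cutoff, level in LEVEL_CUTOFFS:
        if similarity >= cutoff:
            return level
    return "unresolved"
```

Under these assumed cut-offs, resolution_level(98.19) returns "species", so the 98.19% L. pulverulenta match for barcode74 (Table 3) would be reported at species level: a similarity threshold alone cannot detect a biogeographically implausible call.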

Figure 6: BOLD identification success by taxonomic level. Most samples (>60%) achieved species-level identification, while others were resolved to genus or family level.

Figure 7: BOLD confidence levels for taxonomic identifications. High-confidence identifications (>97% sequence identity) were achieved for the majority of samples.

Figure 8: Family-level composition of identified larvae. Calliphoridae (blow flies) dominated the samples, consistent with their role as primary colonizers of carrion.

Figure 9: Species-level identifications among samples that achieved species-level resolution. Lucilia coeruleiviridis was the most abundant species detected.

BLAST Identifications

NCBI BLAST provides access to a broader genomic database including GenBank submissions. While less specialized for barcoding, BLAST can identify sequences missed by BOLD and provides alternative taxonomic perspectives. Consensus sequences were queried against the NCBI nucleotide (nt) database using the command-line blastn tool:

blastn -query consensus_sequences.fasta -db nt -remote \
  -outfmt "6 qseqid sseqid stitle pident length evalue bitscore" \
  -max_target_seqs 5 > blast_results.tsv

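This command runs the search against the NCBI nt database on NCBI's servers (-remote) and writes tab-separated output with up to 5 target sequences per query (a target can appear on more than one row if it has several high-scoring segment pairs), reporting percent identity, alignment length, e-value, and bit score for downstream filtering.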
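The tabular output can be reduced to one best hit per consensus sequence before comparison with BOLD. A minimal sketch, assuming the column order of the -outfmt string above and choosing the hit with the highest bit score; species_from_title is a hypothetical helper that assumes GenBank titles begin with the organism name, which does not hold for every record (e.g. "Uncultured ..." or "PREDICTED: ..." titles).

```python
import csv

# Column order must match the -outfmt "6 ..." string in the blastn command.
FIELDS = ["qseqid", "sseqid", "stitle", "pident", "length", "evalue", "bitscore"]


def best_hits(lines):
    """Return {query_id: highest-bitscore hit} from outfmt-6 lines."""
    best = {}
    # QUOTE_NONE: GenBank titles may contain quote characters
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(FIELDS, row))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


def species_from_title(stitle):
    """Guess 'Genus species' from a title that starts with the organism name."""
    return " ".join(stitle.split()[:2])
```

Usage: open blast_results.tsv with newline="" and pass the file handle to best_hits; the result maps each barcode to a single hit dictionary.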

Figure 10: BLAST verdict categories for 77 samples with NCBI data. Agreement with BOLD at species or genus level was achieved in 42% of samples, while conflicts and ambiguous cases highlight COI limitations.

Figure 11: Final confidence distribution combining BOLD and BLAST results. High-confidence identifications (35%) required database agreement; low-confidence and failed cases reflect conflicts or poor sequence quality.

Figure 12: Database agreement breakdown showing the relationship between BOLD and BLAST results. Agreement (37%) validates identifications, while conflicts (13%) highlight database gaps and COI limitations.

Figure 13: Final species identifications after integrating BOLD and BLAST results. Lucilia coeruleiviridis dominated, with several ambiguous species pairs and genus-level assignments reflecting COI limitations.

Key Findings:

BLAST confirmed the majority of BOLD identifications for Lucilia coeruleiviridis and Phormia regina
Several samples showed species-level conflicts, particularly within the Lucilia genus where COI alone cannot reliably distinguish certain species pairs
BLAST identified non-Dipteran contaminants in 12 samples (bacteria: Enterococcus, Ignatzschineria, Photobacterium, Pseudomonas), highlighting the importance of dual-database validation
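The verdict categories behind Table 2 and Figure 10 (species agreement, genus agreement, conflicts, non-target hits) can be expressed as a small rule set. The sketch below is a simplified illustration, not the study's code: it assumes each database contributes one name of the form "Genus species" or "Genus sp.", treats the four bacterial genera named above as a hypothetical non-target list, and does not special-case species pairs that COI cannot separate.

```python
# Hypothetical non-target list: the bacterial genera reported in this study.
NON_TARGET_GENERA = {"Enterococcus", "Ignatzschineria", "Photobacterium", "Pseudomonas"}


def parse_name(name):
    """Split 'Genus species' or 'Genus sp.' into (genus, species-or-None)."""
    parts = name.split()
    species = parts[1] if len(parts) > 1 and parts[1] not in ("sp.", "sp") else None
    return parts[0], species


def compare_calls(bold_name, blast_name):
    """Classify a BOLD/BLAST pair of names (either may be None) into a verdict."""
    if blast_name is None:
        return "BOLD only"
    blast_genus, blast_sp = parse_name(blast_name)
    if blast_genus in NON_TARGET_GENERA:
        return "non-target"
    if bold_name is None:
        return "BLAST only"
    bold_genus, bold_sp = parse_name(bold_name)
    if bold_genus != blast_genus:
        return "genus conflict"
    if bold_sp and blast_sp:
        return "species agree" if bold_sp == blast_sp else "species conflict"
    return "genus agree"  # same genus, at least one call resolved only to genus
```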

Identification Reliability and Geographic Considerations

BOLD Database Issues

While the majority of samples achieved high-confidence species-level identifications, several BOLD assignments warrant caution due to their biogeographic incongruence with the sampling location (the northern United States):

Table 3: Biogeographically implausible BOLD identifications. The species identified are restricted either to the Old World (Europe, Australia) or to the Neotropics, making their occurrence in the northern United States highly unlikely and suggesting gaps in the reference database.

Barcode	Species Identified	% Identity	Geographic Range	Issue
barcode74	Lucilia pulverulenta	98.19%	Europe, Australia	No established North American presence
barcode92	Lucilia mexicana	—	Mexico, Central America	Only 1 BOLD record; very unlikely in the northern United States
barcode30	Lucilia eximia	—	Neotropical	Likely misidentified L. coeruleiviridis
barcode34	Lucilia eximia	—	Neotropical	Likely misidentified L. coeruleiviridis
barcode64	Lucilia eximia	—	Neotropical	Likely misidentified L. coeruleiviridis
barcode69	Lucilia eximia	—	Neotropical	Likely misidentified L. coeruleiviridis
barcode85	Lucilia eximia	—	Neotropical	Likely misidentified L. coeruleiviridis

Lucilia pulverulenta (barcode74) — This is primarily an Old World species found in Europe and Australia with essentially no established presence in North America. A 98.19% identity match in the northern United States is almost certainly a misidentification, likely representing a Lucilia species whose COI sequence isn’t well-represented in BOLD.

Lucilia mexicana (barcode92) — Primarily a Mexican and Central American species. Although its presence cannot be ruled out, it is very unlikely in the northern United States. The identification rested on a single BOLD record, a red flag that the hit is spurious and reflects insufficient reference data.

Lucilia eximia (barcodes 30, 34, 64, 69, 85) — Primarily Neotropical (South and Central America). Finding five individuals of this species in the northern United States would be remarkable; these calls almost certainly reflect a reference database gap rather than genuine occurrences. L. eximia and L. coeruleiviridis have notoriously similar COI sequences, and BOLD’s coverage of North American Lucilia is patchy enough that this kind of confusion is common.
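The biogeographic screen applied above can be made explicit as a lookup against the collection region. The range table in this sketch is hypothetical and hand-built from the ranges stated in this section, the Nearctic status the text gives for L. retroversa, and the species accepted in Table 5; a production workflow would draw on curated distribution records instead.

```python
# Hypothetical range table for a northern U.S. collection site.
# True = plausible; False = flagged as biogeographically implausible in the text.
NORTHERN_US_PLAUSIBLE = {
    "Lucilia coeruleiviridis": True,
    "Lucilia illustris": True,
    "Lucilia retroversa": True,
    "Phormia regina": True,
    "Lucilia pulverulenta": False,  # Europe, Australia
    "Lucilia eximia": False,        # Neotropical
    "Lucilia mexicana": False,      # Mexico, Central America
    "Lucilia vulgata": False,       # European
}


def review_calls(calls):
    """Split {barcode: species} into (implausible, unknown-range) barcode lists."""
    implausible, unknown = [], []
    for barcode, species in sorted(calls.items()):
        status = NORTHERN_US_PLAUSIBLE.get(species)
        if status is None:
            unknown.append(barcode)  # no range information: review manually
        elif not status:
            implausible.append(barcode)
    return implausible, unknown
```

Barcodes returned in the second list have no entry in the table and call for manual review rather than automatic acceptance.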

BLAST Database Issues

While BLAST provided valuable cross-validation of BOLD results, several identification conflicts and database-specific issues emerged:

Table 4: BOLD and BLAST identification conflicts. Discordant species assignments highlight COI limitations for closely related species and database-specific biases in reference coverage.

Barcode	BOLD ID	BLAST ID	% Identity	Issue
barcode31	Lucilia coeruleiviridis	Lucilia pulverulenta	94.68%	Conflict: BLAST suggests Old World species unlikely in the northern United States
barcode35	Lucilia retroversa	Lucilia coeruleiviridis	80.21%	Conflict: Low BLAST identity; species determination uncertain
barcode50	Lucilia retroversa	Lucilia coeruleiviridis	81.92%	Conflict: Both identifications plausible; COI insufficient for resolution
barcode74	Lucilia pulverulenta	Lucilia coeruleiviridis	84.73%	Conflict: BLAST favors L. coeruleiviridis (more likely geographically)
barcode30	Lucilia eximia	Lucilia mexicana	82.15%	Conflict: Both Neotropical; poor BLAST identity suggests neither correct
barcode34	Lucilia eximia	Lucilia mexicana	87.48%	Conflict: Neotropical species unlikely; low confidence overall
barcode69	Lucilia eximia	Lucilia coeruleiviridis	87.86%	Conflict: BLAST supports North American species
barcode85	Lucilia eximia	Lucilia mexicana	92.01%	Conflict: Neotropical assignments in the northern United States questionable

Conflict Patterns:

1. Lucilia retroversa vs. L. coeruleiviridis conflicts (barcodes 35, 50) — BOLD identified these as L. retroversa, but BLAST matched L. coeruleiviridis with low identity (80-82%). Both species occur in North America, but the low BLAST identities suggest possible database gaps for L. retroversa in GenBank.

2. Lucilia pulverulenta mismatches (barcodes 31, 74) — For barcode74, BOLD assigned L. pulverulenta (an Old World species) while BLAST matched L. coeruleiviridis (North American); for barcode31 the assignments were reversed, with BOLD returning L. coeruleiviridis and BLAST L. pulverulenta. In both cases the L. coeruleiviridis call is the biogeographically plausible one, suggesting that L. pulverulenta reference sequences may be attracting North American specimens.

3. Neotropical Lucilia conflicts (barcodes 30, 34, 69, 85) — BOLD identified these samples as L. eximia (Neotropical), while BLAST returned L. mexicana or L. coeruleiviridis. Given the collection site in the northern United States, all of these identifications are suspect. The conflicts likely reflect incomplete COI reference coverage for Nearctic Lucilia species in both databases.

4. Low BLAST percent identities — Many samples showed 80-90% BLAST identity despite high BOLD matches (>97%). This discrepancy suggests:

BOLD’s curated COI barcode database is more complete for North American blow flies
NCBI GenBank contains many partial or lower-quality COI sequences
Geographic sampling biases in GenBank favor European and Asian specimens

5. Non-Dipteran contaminants — BLAST successfully identified 12 bacterial contaminants (Enterococcus, Ignatzschineria, Photobacterium, Pseudomonas) that BOLD could not classify, demonstrating BLAST’s utility for detecting non-target DNA.

The L. coeruleiviridis / mexicana Problem

Seven samples in our dataset were flagged as “ambiguous” due to indistinguishable COI sequences between Lucilia coeruleiviridis and L. mexicana (barcodes 03, 11, 18, 26, 59, 66, 94). This is not a database error but a well-documented biological limitation of COI barcoding for this species pair.

DeBry et al. (2012) Study:

DeBry et al. conducted a comprehensive COI barcoding study of continental U.S. Lucilia species, assembling ~1,100 bp COI sequences from 122 specimens representing 9 of the 10 U.S. species. Their key findings:

Monophyly Test: They defined a species as “DNA-identifiable” if it formed an exclusively monophyletic clade in >95% of bootstrap pseudoreplicates in COI phylogenies.
Seven Species Passed: Most Lucilia species (including L. illustris, L. sericata, L. cuprina) formed well-supported monophyletic groups separable by COI alone.
L. coeruleiviridis and L. mexicana Failed: These two species share COI haplotypes and do not form exclusive, separable clades. As sampled in the continental U.S., they are indistinguishable using mitochondrial COI alone.

Our seven ambiguous identifications are consistent with DeBry et al.’s findings. Where BOLD assigned L. coeruleiviridis and BLAST returned L. mexicana (or vice versa), we labeled the sample “Lucilia coeruleiviridis / mexicana”, with high confidence for the species pair but low confidence for distinguishing between its members. Given the collection site in the northern United States, L. coeruleiviridis is the more biogeographically likely species, but COI data alone cannot definitively rule out L. mexicana.
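The labeling rule described above can be sketched as follows. This is a simplified illustration under stated assumptions: the inputs are the BOLD and BLAST species names, the only unresolvable pair is L. coeruleiviridis / L. mexicana (per DeBry et al., 2012), any call involving L. mexicana within that pair is reported as the pair, and other disagreements within a genus fall back to "Genus sp.". The biogeographic overrides and the L. retroversa / coeruleiviridis label used in Table 5 are not modeled, and the function name is hypothetical.

```python
PAIR = {"Lucilia coeruleiviridis", "Lucilia mexicana"}
PAIR_LABEL = "Lucilia coeruleiviridis / mexicana"


def consensus_label(bold_call, blast_call):
    """Return a consensus name from two species-level calls (either may be None)."""
    calls = {c for c in (bold_call, blast_call) if c}
    if not calls:
        return "No match"
    if calls <= PAIR and "Lucilia mexicana" in calls:
        # COI cannot separate the pair, so report both names
        return PAIR_LABEL
    if len(calls) == 1:
        return calls.pop()  # both databases agree, or only one returned a call
    genera = {c.split()[0] for c in calls}
    if len(genera) == 1:
        return f"{genera.pop()} sp."  # species-level conflict within one genus
    return "Unresolved"
```

For example, the barcode30 calls (L. eximia from BOLD, L. mexicana from BLAST) fall outside the pair and resolve to "Lucilia sp.", matching Table 5.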

Consensus Identification Table

The following table presents our best interpretation of each sample’s identity after integrating BOLD and BLAST results with biogeographic assessment. Identifications flagged as L. mexicana have been corrected to L. coeruleiviridis / mexicana (acknowledging COI indistinguishability) or reassigned to Lucilia sp. where conflicts render species-level assignment unreliable.

Table 5: Consensus species identifications for all samples after integrating BOLD, BLAST, and biogeographic assessment. Identifications apply corrections for COI limitations (L. coeruleiviridis/mexicana indistinguishability) and biogeographically implausible assignments.

Barcode	Consensus Identification	Confidence	Notes
barcode01	Lucilia sp.	Moderate	—
barcode02	Lucilia coeruleiviridis	High	—
barcode03	Lucilia coeruleiviridis / mexicana	High	COI cannot distinguish species pair
barcode04	Lucilia coeruleiviridis	High	—
barcode05	Failed	Failed	Non-Dipteran contaminant or poor quality
barcode06	Lucilia coeruleiviridis	Low	—
barcode07	Lucilia coeruleiviridis / mexicana	Low	COI indistinguishable; likely L. coeruleiviridis (biogeography)
barcode09	Lucilia coeruleiviridis / mexicana	Low	COI indistinguishable; likely L. coeruleiviridis (biogeography)
barcode10	Lucilia coeruleiviridis	High	—
barcode11	Lucilia coeruleiviridis / mexicana	High	COI cannot distinguish species pair
barcode12	Failed	Failed	Non-Dipteran contaminant or poor quality
barcode13	Failed	Failed	Non-Dipteran contaminant or poor quality
barcode14	Failed	Failed	Non-Dipteran contaminant or poor quality
barcode15	Lucilia coeruleiviridis	High	—
barcode16	Lucilia coeruleiviridis	High	—
barcode17	Phormia regina	High	—
barcode18	Lucilia coeruleiviridis / mexicana	High	COI cannot distinguish species pair
barcode19	Lucilia coeruleiviridis	High	—
barcode20	Failed	Failed	Non-Dipteran contaminant or poor quality
barcode21	Lucilia coeruleiviridis	High	—
barcode22	Lucilia coeruleiviridis	Low	BLAST conflict with unidentified specimen; BOLD ID retained
barcode23	Lucilia coeruleiviridis	High	—
barcode25	Failed	Failed	Non-Dipteran contaminant or poor quality
barcode26	Lucilia coeruleiviridis / mexicana	High	COI cannot distinguish species pair
barcode27	Lucilia coeruleiviridis	High	—
barcode28	No match	Moderate	No reliable database hit
barcode29	Lucilia sp.	Moderate	—
barcode30	Lucilia sp.	Low	Conflict; biogeographically implausible IDs
barcode31	Lucilia sp.	Low	Database conflict
barcode33	Lucilia vulgata	Low	European species; ID uncertain
barcode34	Lucilia sp.	Low	Conflict; biogeographically implausible IDs
barcode35	Lucilia retroversa / coeruleiviridis	Low	Both Nearctic; COI insufficient for distinction
barcode36	Lucilia sp.	Low	Neotropical ID implausible; likely misidentified L. coeruleiviridis
barcode37	Failed	Failed	Non-Dipteran contaminant or poor quality
barcode38	Lucilia coeruleiviridis	High	—
barcode39	Lucilia coeruleiviridis	High	—
barcode41	Lucilia coeruleiviridis	High	—
barcode42	Lucilia sp.	Moderate	—
barcode43	Lucilia coeruleiviridis	High	—
barcode44	Lucilia sp.	Moderate	—
barcode45	Phormia regina	High	—
barcode46	Lucilia coeruleiviridis	Moderate	—
barcode47	Lucilia coeruleiviridis	Moderate	—
barcode48	Lucilia sp.	Moderate	—
barcode49	Lucilia coeruleiviridis	Low	—
barcode50	Lucilia retroversa / coeruleiviridis	Low	Both Nearctic; COI insufficient for distinction
barcode51	Failed	Failed	Non-Dipteran contaminant or poor quality
barcode52	Lucilia sp.	Low	Old World species implausible in the northern United States
barcode53	Lucilia sp.	Moderate	—
barcode54	No match	Moderate	No reliable database hit
barcode55	Lucilia illustris	High	—
barcode57	Lucilia coeruleiviridis	High	—
barcode58	Lucilia sp.	Moderate	—
barcode59	Lucilia coeruleiviridis / mexicana	High	COI cannot distinguish species pair
barcode60	Lucilia coeruleiviridis	High	—
barcode61	Lucilia coeruleiviridis	Moderate	—
barcode62	Lucilia coeruleiviridis	Low	—
barcode63	Lucilia coeruleiviridis	Moderate	—
barcode64	Lucilia sp.	Moderate	Neotropical ID implausible; likely misidentified L. coeruleiviridis
barcode65	Failed	Failed	Non-Dipteran contaminant or poor quality
barcode66	Lucilia coeruleiviridis / mexicana	High	COI cannot distinguish species pair
barcode67	Lucilia sp.	Moderate	—
barcode68	No match	Moderate	No reliable database hit
barcode69	Lucilia sp.	Low	Conflict; biogeographically implausible IDs
barcode70	Failed	Failed	Non-Dipteran contaminant or poor quality
barcode72	Lucilia sp.	Moderate	—
barcode73	Lucilia coeruleiviridis	Low	—
barcode74	Lucilia coeruleiviridis	Low	L. pulverulenta (Old World) implausible; BLAST ID retained
barcode75	Lucilia coeruleiviridis	High	—
barcode76	Failed	Failed	Non-Dipteran contaminant or poor quality
barcode77	Lucilia coeruleiviridis	High	—
barcode78	Lucilia coeruleiviridis	High	—
barcode79	Phormia regina	Low	—
barcode80	Lucilia coeruleiviridis	Low	BLAST conflict with unidentified specimen; BOLD ID retained
barcode81	Lucilia sp.	Low	Insufficient data for species-level ID
barcode82	Lucilia coeruleiviridis / mexicana	Low	COI indistinguishable; likely L. coeruleiviridis (biogeography)
barcode83	Lucilia coeruleiviridis	Low	—
barcode84	Lucilia coeruleiviridis	Low	BLAST conflict with unidentified specimen; BOLD ID retained
barcode85	Lucilia sp.	Low	Conflict; biogeographically implausible IDs
barcode86	Lucilia coeruleiviridis	High	—
barcode89	Lucilia coeruleiviridis / mexicana	Low	COI indistinguishable; likely L. coeruleiviridis (biogeography)
barcode90	Lucilia sp.	Low	Neotropical ID implausible; likely misidentified L. coeruleiviridis
barcode91	Lucilia coeruleiviridis	Moderate	—
barcode92	Lucilia coeruleiviridis / mexicana	High	COI indistinguishable; likely L. coeruleiviridis (biogeography)
barcode93	Failed	Failed	Non-Dipteran contaminant or poor quality
barcode94	Lucilia coeruleiviridis / mexicana	High	COI cannot distinguish species pair

Summary (counts from Table 5):

33 samples identified as Lucilia coeruleiviridis (19 high, 5 moderate, 9 low confidence)
12 samples flagged as L. coeruleiviridis / mexicana (COI-indistinguishable species pair)
3 samples identified as Phormia regina
1 sample identified as Lucilia illustris
1 sample identified as Lucilia vulgata (low confidence)
2 samples as L. retroversa / coeruleiviridis (uncertain)
19 samples assigned to Lucilia sp. (genus level only)
12 samples failed (non-Dipteran contaminants or poor quality)
3 samples with no reliable match

Discussion

Our nanopore sequencing approach, combined with dual-database validation (BOLD and BLAST), successfully identified fly larvae with varying confidence levels. Of 87 samples, 30 (35%) achieved high-confidence identifications where both databases agreed, while 26 (30%) remained low-confidence due to conflicts or insufficient reference data. This dual-database strategy proved essential for:

Cross-validation: 42% of the 77 samples with BLAST results (32/77) showed agreement between BOLD and BLAST, supporting those species assignments
Conflict detection: 11 samples revealed species-level disagreements, highlighting regions where COI alone is insufficient
Contaminant identification: BLAST detected 12 bacterial contaminants missed by BOLD’s arthropod-focused database
Database bias assessment: Comparison revealed BOLD’s superior coverage for North American blow flies, while BLAST provided broader taxonomic scope

The blow fly family Calliphoridae dominated the samples, with Lucilia coeruleiviridis the most abundant species (33 samples in Table 5, 38%). This species is a primary colonizer of carrion and is commonly encountered in forensic investigations.

Several BOLD identifications of Neotropical species (L. eximia, L. mexicana) were contradicted or poorly supported by BLAST, highlighting database gaps for Nearctic Lucilia species. These conflicts likely represent misidentifications due to incomplete reference coverage rather than genuine biogeographic anomalies.

Technical Limitations

COI Gene Limitations: Seven samples showed L. coeruleiviridis / L. mexicana ambiguity, a known COI limitation: this species pair shares nearly identical barcode sequences. Additional genetic markers (e.g., CAD, ITS2) would be required for definitive separation.

Database Completeness: Low-confidence identifications and conflicts between databases underscore the critical dependence on reference sequence availability. For North American forensic entomology applications, BOLD’s curated COI database outperformed NCBI GenBank, which contains many partial or geographically biased sequences.

Failed Identifications: Twelve samples (14%) yielded no Dipteran identification from either database. Possible explanations include:

Bacterial DNA contamination from decomposition microbiome
Low read counts producing poor-quality consensus sequences
Non-target arthropod DNA (e.g., mites, parasitoids)
Sequences from species absent from both reference databases

References

Abeynayake, S. W., Fiorito, S., Dinsdale, A., Whattam, M., Crowe, B., Sparks, K., Campbell, P. R., & Gambley, C. (2021). A Rapid and Cost-Effective Identification of Invertebrate Pests at the Borders Using MinION Sequencing of DNA Barcodes. Genes, 12(8), 1138. https://doi.org/10.3390/genes12081138

Boehme, P., Amendt, J., & Zehner, R. (2011). The use of COI barcodes for molecular identification of forensically important fly species in Germany. Parasitology Research, 110(6), 2325–2332. https://doi.org/10.1007/s00436-011-2767-8

DeBry, R. W., Timm, A., Wong, E. S., Stamper, T., Cookman, C., & Dahlem, G. A. (2012). DNA-Based Identification of Forensically Important Lucilia (Diptera: Calliphoridae) in the Continental United States. Journal of Forensic Sciences, 58(1), 73–78. https://doi.org/10.1111/j.1556-4029.2012.02176.x

Sandoval-Arias, S., Morales-Montero, R., Araya-Valcerde, E., & Hernández-Calvajal, E. (2020). Identificación molecular mediante código de barras de DNA de moscas Lucilia (Diptera: Calliphoridae) recolectadas en Costa Rica. Revista Tecnología En Marcha, 33(1). https://doi.org/10.18845/tm.v33i1.5025

Srivathsan, A., Baloğlu, B., Wang, W., Tan, W. X., Bertrand, D., Ng, A. H. Q., Boey, E. J. H., Koh, J. J. Y., Nagarajan, N., & Meier, R. (2018). A MinION™-based pipeline for fast and cost-effective DNA barcoding. Molecular Ecology Resources, 18(5), 1035–1049. https://doi.org/10.1111/1755-0998.12890

Wells, J. D., & Sperling, F. A. H. (2001). DNA-based identification of forensically important Chrysomyinae (Diptera: Calliphoridae). Forensic Science International, 120(1-2), 110–115. https://doi.org/10.1016/s0379-0738(01)00414-5

Yusseff-Vanegas, S. Z., & Agnarsson, I. (2017). DNA-barcoding of forensically important blow flies (Diptera: Calliphoridae) in the Caribbean Region. PeerJ, 5, e3516. https://doi.org/10.7717/peerj.3516