Revised Analysis Workflow

ONT CRISPR amplicon sequencing of Pieris brassicae NSP gene — updated pipeline with read trimming (fastp) and post-trim QC

Oxford Nanopore CRISPR-Cas9 fastp minimap2 IGV
← Back to original overview

What changed?

The revised workflow adds a read trimming/cleaning step using fastp after initial QC, followed by a second round of QC on the cleaned reads. This ensures that adapter remnants, low-quality bases, and short fragments are removed before alignment, improving mapping rates and indel calling accuracy.

7
Pipeline steps (was 5)
+fastp
Read cleaning & trimming added
QC rounds (before & after trimming)

Analysis Workflow

1
MinION Sequencing

R10.4.1 flow cell, HAC basecalling

2
Raw QC

FastQC on raw reads

3
Trim (fastp)

Adapter & quality trimming

4
Post-trim QC

FastQC on cleaned reads

5
Align

minimap2 to reference

6
BAM Sort & Index

samtools sort + index

7
Indel Analysis

Visual inspection in IGV

Step-by-Step Commands

All commands assume the following directory structure and variables:

# --- Set paths ---
OUTDIR="/Volumes/KINGSTON/ONTsequencing/ONT_testing"
FASTQ_BASE="/Volumes/KINGSTON/ONTsequencing/NSP_500bp_trial/NSP_Pnapi_500bp/20260305_1613_MD-103872_FBF81064_b0e09933/fastq_pass"
REF="$OUTDIR/NSP_genomic_ref.fasta"
REF_MMI="$OUTDIR/NSP_genomic_ref.mmi"

1 MinION Sequencing

Data was generated on an Oxford Nanopore MinION with an R10.4.1 flow cell. Basecalling used the HAC model (dna_r10.4.1_e8.2_400bps_hac@v5.2.0) with built-in demultiplexing into 24 barcode directories.

Raw reads are located at:

$FASTQ_BASE/barcode01/*.fastq.gz
$FASTQ_BASE/barcode02/*.fastq.gz
...
$FASTQ_BASE/barcode24/*.fastq.gz

2 Raw Read QC (FastQC)

Run FastQC on the raw reads to assess base quality, read length distributions, and adapter content before any trimming.

# Run FastQC on raw reads for a single barcode
fastqc "$FASTQ_BASE/barcode01"/*.fastq.gz -o "$OUTDIR/fastqc_raw" --threads 2

# Loop over all barcodes
mkdir -p "$OUTDIR/fastqc_raw"
for i in $(seq -w 1 24); do
  fastqc "$FASTQ_BASE/barcode${i}"/*.fastq.gz -o "$OUTDIR/fastqc_raw" --threads 2
done

Check the HTML reports for per-base quality scores, read length distributions, and any adapter contamination flags.

3 Read Trimming & Cleaning (fastp) New

Use fastp to trim adapters, remove low-quality bases, and filter out short reads. fastp auto-detects adapters and provides its own HTML/JSON quality reports.

Install

# Install via Homebrew (macOS)
brew install fastp

# Or via conda
conda install -c bioconda fastp

Single barcode

# Concatenate multiple fastq files per barcode if needed, then run fastp
mkdir -p "$OUTDIR/trimmed"

# For a single barcode with one or more input files:
cat "$FASTQ_BASE/barcode01"/*.fastq.gz > "$OUTDIR/trimmed/barcode01_raw_merged.fastq.gz"

fastp \
  --in1 "$OUTDIR/trimmed/barcode01_raw_merged.fastq.gz" \
  --out1 "$OUTDIR/trimmed/barcode01_trimmed.fastq.gz" \
  --html "$OUTDIR/trimmed/barcode01_fastp.html" \
  --json "$OUTDIR/trimmed/barcode01_fastp.json" \
  --qualified_quality_phred 10 \
  --length_required 200 \
  --thread 4

All barcodes (loop)

mkdir -p "$OUTDIR/trimmed"

for i in $(seq -w 1 24); do
  BC="barcode${i}"

  # Merge raw fastqs per barcode
  cat "$FASTQ_BASE/$BC"/*.fastq.gz > "$OUTDIR/trimmed/${BC}_raw_merged.fastq.gz"

  # Run fastp
  fastp \
    --in1 "$OUTDIR/trimmed/${BC}_raw_merged.fastq.gz" \
    --out1 "$OUTDIR/trimmed/${BC}_trimmed.fastq.gz" \
    --html "$OUTDIR/trimmed/${BC}_fastp.html" \
    --json "$OUTDIR/trimmed/${BC}_fastp.json" \
    --qualified_quality_phred 10 \
    --length_required 200 \
    --thread 4

  # Clean up merged raw file to save space
  rm "$OUTDIR/trimmed/${BC}_raw_merged.fastq.gz"
done

Key fastp parameters

FlagValuePurpose
--qualified_quality_phred10Min base quality to keep (Q10; suitable for ONT data)
--length_required200Discard reads shorter than 200 bp (amplicon is ~500 bp)
--thread4Parallel processing threads
--html / --jsonPer-barcode QC reports from fastp

Adapter detection is automatic. For ONT data, consider also using --cut_front and --cut_tail to trim low-quality stretches at read ends, or adjust --qualified_quality_phred depending on your basecaller accuracy.

4 Post-Trim QC (FastQC) New

Run FastQC again on the trimmed reads to confirm that quality has improved and problematic reads were removed. Compare these reports to the raw QC from Step 2.

mkdir -p "$OUTDIR/fastqc_trimmed"

for i in $(seq -w 1 24); do
  fastqc "$OUTDIR/trimmed/barcode${i}_trimmed.fastq.gz" \
    -o "$OUTDIR/fastqc_trimmed" --threads 2
done

Things to check:

  • Per-base quality scores should be more uniform after trimming
  • Read length distribution should center around ~500 bp with short reads removed
  • Adapter content warnings should be resolved
  • Compare total read counts before/after to assess filtering loss

5 Alignment (minimap2)

Align the trimmed reads to the 699 bp genomic reference using minimap2 with the ONT preset. The --MD flag adds MD tags for accurate variant display in IGV.

# Build minimap2 index (if not already done)
minimap2 -d "$REF_MMI" "$REF"

# Align a single barcode
minimap2 -a -x map-ont --MD "$REF_MMI" \
  "$OUTDIR/trimmed/barcode01_trimmed.fastq.gz" \
  > "$OUTDIR/barcode01_aligned.sam"

We write SAM first here to keep alignment and BAM processing as separate steps, but they can be piped together (see Step 6).

6 BAM Sorting & Indexing (samtools)

Convert SAM to sorted BAM and create an index for fast random access in IGV.

# Sort and convert to BAM
samtools sort -o "$OUTDIR/barcode01_aligned.bam" "$OUTDIR/barcode01_aligned.sam"

# Index the BAM
samtools index "$OUTDIR/barcode01_aligned.bam"

# Clean up SAM
rm "$OUTDIR/barcode01_aligned.sam"

Combined pipeline (all barcodes)

For efficiency, pipe alignment directly into sorting:

for i in $(seq -w 1 24); do
  BC="barcode${i}"

  minimap2 -a -x map-ont --MD "$REF_MMI" \
    "$OUTDIR/trimmed/${BC}_trimmed.fastq.gz" 2>/dev/null | \
    samtools sort -o "$OUTDIR/${BC}_aligned.bam" -

  samtools index "$OUTDIR/${BC}_aligned.bam"
done

Quick alignment stats

# Check mapping rate and read counts per barcode
for i in $(seq -w 1 24); do
  echo -n "barcode${i}: "
  samtools flagstat "$OUTDIR/barcode${i}_aligned.bam" 2>/dev/null | head -1
done

7 Indel Analysis (IGV)

Open the aligned BAMs in IGV to visually inspect CRISPR-induced indels at the sgRNA cut site.

Loading files in IGV

  1. Genomes > Load Genome from File → select NSP_genomic_ref.fasta
  2. File > Load from File → select one or more barcodeXX_aligned.bam files
  3. Navigate to position ~230 in the reference — this is the sgRNA cut site

IGV settings for indel visualization

  • Color alignments by:insert size and pair orientation or read strand
  • Sort alignments by:base (at the cut site position) to group indel alleles
  • Show soft-clipped bases to see if reads extend beyond expected amplicon boundaries
  • Right-click track → Collapsed or Squished view for high-coverage samples
  • Look for consistent insertion/deletion patterns clustered at position ~230

What to look for

  • Deletions appear as horizontal black lines spanning the cut site
  • Insertions appear as purple marks (I) at the cut position
  • Wild-type reads align cleanly through the cut site with no gaps
  • Editing efficiency = proportion of reads with indels vs. wild-type at the sgRNA target

Reference Region

The reference is a 699 bp region from P. brassicae chromosome 5 (NC_059669.1:14882048-14885601) containing the first coding exon of the NSP gene.

5'─────────────────────────────────────────────────────────────────────────3' │ │ │ │ │ Fwd primer ──► │ │ ◄── Rev primer │ │ ╔══════════ Exon 1 (276 bp) ══════════╗ │ │ │ │ ║ ATG... ...║ │ │ │ │ ╚═════════════════════════════════════╝ │ │ │ │ ▼▼▼ sgRNA_2 cut site ▼▼▼ │ │ │ │ CTTCTTAGCCCTCGCGGCTTTGG │ │ 0 ~60 ~230 ~600 699

Software

ToolVersionPurpose
minimap22.30ONT read alignment (-x map-ont)
samtools1.23BAM sorting, indexing, stats
FastQC0.12.1Read quality assessment (pre- and post-trim)
fastp0.24+Read trimming, adapter removal, quality filtering
IGVVisual inspection of alignments and indels