2: Core genome phylogenetic tree

Author

Richard Goodman

Published

April 24, 2025

Introduction

This tutorial will take n genome sequences and construct a SNP based core genome phylogenetic tree.

Part 1 - Variant calling in bash

1.1: Variant calling with snippy

Create a list of fasta genomes to determine SNPs

ls fastas/*.fasta > genomes.txt

cat genomes.txt

Check the list

cat genomes.txt

Use snippy to generate all SNPs.

Make a new directory

mkdir cgtree

Set reference file for SNP calculations.

REF=FSFC1386.fasta

Use sed to remove .fastq from .txt file

sed 's|fastas/||g' genomes.txt > genome_names.txt
sed -e 's/.fasta//' genomes_names.txt > genome_names.txt

Check the new .txt file containing list of genome names

cat genome_names.txt

The we use parallel on our list of genomes to run snippy:

cat genome_names.txt | parallel snippy --report --outdir cgtree/{}_snps --ref $REF --ctgs fastas/{}.fasta

This produces several files:

3.1.3: Run Snippy-core

Use snippy-core from snippy to generate core SNPs

snippy-core --ref $REF --outdir cgtree --prefix core cgtree/*_snps

This produces several files:

  • core.full.aln: The full core genome alignment in FASTA format.
  • core.aln: The core SNP alignment in FASTA format (only variable sites).
  • core.tab: A table summarizing the SNP differences.

1.2: Recombination site exclusion with gubbins

run_gubbins.py -p gubbins cgtree/clean.full.aln

1.3: SNP extraction with snp-sites

snp-sites -c cgtree/gubbins.filtered_polymorphic_sites.fasta -o cgtree/filtered_snps.fasta

Part 2 - Core genome tree construction in bash

2.1: tree construction with IQTree

# Use modelfinder first
iqtree -s filtered_snps.fasta -m MF

# Then run the code using this specific model 
iqtree -s filtered_snps.fasta -m TVM+F+ASC+R2 -keep-ident -pre REFSEQ_HS11286_Klebsiella_TVM+F+ASC+R2 -bb 1000
# or run modelfinder plus
iqtree -s filtered_snps.fasta -m MFP -keep-ident -pre core_tree_REFSEQ_HS11286_Klebsiella_MF -bb 1000

Part 3 - Visualising core genome tree

3.1: Visualisation with iTOL

For usage with iTOL (https://itol.embl.de/), the “.contree” should be used.