
Here is my code for producing ANI and SNP distance matrices for four E. coli genomes. The same code can be applied to any number (n) of bacterial genomes.
This tutorial takes n genome sequences, calculates pairwise average nucleotide identity (ANI) and core genome single nucleotide polymorphism (SNP) distances, and visualises both as heatmaps. It uses fastANI for ANI, snippy and snp-dists for SNP distances, and the Python libraries seaborn and matplotlib for the heatmaps.
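Below is a minimal sketch of the ANI side of this workflow. It assumes fastANI was run in many-to-many mode (for example with `--ql` and `--rl` pointing at the same list of genome files), so that its tab-delimited output has one row per query/reference pair with the columns query, reference, ANI, mapped fragments and total query fragments. The function names (`read_fastani`, `ani_matrix`, `plot_heatmap`) and the output file name are hypothetical. fastANI can report slightly different values for A vs B and B vs A, so the sketch averages the two directions. That is one common choice, not something fastANI does itself. fastANI also does not report pairs whose ANI is well below 80%, so such pairs are left empty.

```python
import csv
import io
from pathlib import Path


def read_fastani(text):
    """Parse fastANI tab-delimited output into {(query, reference): ANI}.

    Columns: query, reference, ANI, mapped fragments, total query fragments.
    Genome paths are shortened to file stems, e.g. 'genomes/ec1.fasta' -> 'ec1'.
    """
    ani = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, reference = Path(row[0]).stem, Path(row[1]).stem
        ani[(query, reference)] = float(row[2])
    return ani


def ani_matrix(ani):
    """Return (names, matrix) with the A->B and B->A values averaged.

    Pairs missing from the fastANI output are stored as None.
    """
    names = sorted({name for pair in ani for name in pair})
    matrix = []
    for a in names:
        row = []
        for b in names:
            values = [ani[p] for p in ((a, b), (b, a)) if p in ani]
            row.append(sum(values) / len(values) if values else None)
        matrix.append(row)
    return names, matrix


def plot_heatmap(names, matrix, outfile="ani_heatmap.png"):
    """Draw an annotated ANI heatmap with seaborn and save it to outfile."""
    # Imported here so the parsing functions above work without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    # seaborn masks NaN cells, so missing pairs become NaN rather than None.
    data = [[float("nan") if v is None else v for v in row] for row in matrix]
    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(data, annot=True, fmt=".2f", cmap="viridis",
                xticklabels=names, yticklabels=names,
                cbar_kws={"label": "ANI (%)"}, ax=ax)
    fig.tight_layout()
    fig.savefig(outfile, dpi=300)
    plt.close(fig)
```

For example, `names, matrix = ani_matrix(read_fastani(Path("fastani_out.txt").read_text()))` followed by `plot_heatmap(names, matrix)` would write `ani_heatmap.png`. Here `fastani_out.txt` is a placeholder for whatever file was passed to fastANI's `-o`.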
ANI and SNP distance matrices are useful for comparing the relatedness of genomes. This is important when determining whether isolates from an outbreak are clonal, or how closely ancestors and progeny are related in evolution studies.
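One way to read the two matrices together is to tabulate the ANI and core SNP distance for each isolate pair and flag pairs where the two measures disagree. This is a sketch under stated assumptions. Both inputs are plain dictionaries keyed by `frozenset({isolate_a, isolate_b})`, and the function name `compare_pairs` and the note wording are hypothetical. The default cutoffs (ANI ≥ 99.99%, ≤ 100 SNPs) reuse figures mentioned in this section as illustrations, not validated thresholds.

```python
def compare_pairs(ani, snps, ani_cutoff=99.99, snp_cutoff=100):
    """Tabulate ANI and core SNP distance for every pair present in both inputs.

    ani, snps: dicts mapping frozenset({a, b}) -> ANI (%) / SNP count.
    Returns (a, b, ani, snps, note) tuples sorted by isolate names.
    """
    rows = []
    # Only pairs with both an ANI value and a SNP distance can be compared.
    for pair in sorted(set(ani) & set(snps), key=sorted):
        a, b = sorted(pair)
        close_ani = ani[pair] >= ani_cutoff
        close_snp = snps[pair] <= snp_cutoff
        if close_ani and close_snp:
            note = "concordant: closely related"
        elif close_snp:
            # e.g. identical chromosomes but different accessory content such as plasmids
            note = "few core SNPs but lower ANI: check accessory genome"
        elif close_ani:
            note = "discordant: high ANI but many core SNPs, inspect further"
        else:
            note = "concordant: distinct"
        rows.append((a, b, ani[pair], snps[pair], note))
    return rows
```

Pairs flagged as discordant are the ones where a single measure on its own would be most likely to mislead.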
Determining ANI thresholds whereby two or more isolates are clonal or not clonal is a difficult task.
Rodriguez et al. (2024) analysed 18,123 genomes to determine the ANI thresholds that distinguish certain taxonomic ranks:
Watt et al. (2025) analysed 5,471 Escherichia coli genome sequences from different One Health sectors in Australia and concluded that a threshold of ≤100 SNPs detected cross-source linkage of isolates.
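As a sketch of applying a ≤100 SNP threshold like the one from Watt et al. (2025), the code below reads a snp-dists matrix and lists linked pairs. It assumes snp-dists was run on the core alignment produced by snippy-core and wrote its default tab-separated square matrix: a header row whose first cell is a label followed by the sample names, then one row per sample. The function names are hypothetical. Grouping linked isolates into single-linkage clusters (any chain of pairs at or below the threshold joins a cluster) is an illustrative choice, not necessarily the method Watt et al. used.

```python
import csv
import io


def read_snp_dists(text):
    """Parse snp-dists' default tab-separated matrix into (names, matrix)."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    names = rows[0][1:]  # first header cell is a label, not a sample name
    matrix = [[int(v) for v in r[1:]] for r in rows[1:]]
    return names, matrix


def linked_pairs(names, matrix, threshold=100):
    """Return (a, b, snps) for every pair at or below the SNP threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only
            if matrix[i][j] <= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


def single_linkage_clusters(names, pairs):
    """Group isolates connected by any chain of linked pairs (union-find)."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for a, b, _ in pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())
```

Passing a different `threshold` to `linked_pairs` shows how sensitive the clusters are to the cutoff, which is useful given that such thresholds are not definitive.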
Yet these are not always definitive thresholds: the gain of a large plasmid could lower the ANI below 99.99% even though the chromosome is identical. Many biological phenomena can cause a clonal population to diverge, such as:
The combination of ANI % and SNP distances provides more information about the differences between isolates, which can help answer the question of clonality more fully. However, to determine clonality more confidently, I would suggest further analyses, including: