microGWAS
microGWAS is a Snakemake-powered pipeline for carrying out an end-to-end microbial GWAS analysis. Starting from genome assemblies and a phenotype file, microGWAS runs a number of association analyses using pyseer, annotates the association results, and generates a number of functional enrichment tests.
Description
microGWAS: Bacterial Genome-Wide Association Studies
A comprehensive Snakemake pipeline for conducting bacterial genome-wide association studies (GWAS) on assembled genomes. microGWAS supports multiple phenotypes and provides extensive downstream analyses including functional annotation, enrichment analysis, and visualization.
Key Features
For each phenotype, the pipeline runs associations using 6 complementary approaches:
- Individual unitigs - High-resolution genetic variants
- Gene presence/absence - Pangenome-based associations
- Rare variants - Gene burden testing for low-frequency variants
- Common variants - SNP-based associations called against a reference genome
- Gene cluster k-mers - Locus-specific associations
- Whole genome ML - Combined unitig modeling
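To make the idea of gene burden testing for rare variants more concrete, here is a minimal sketch of the collapsing step: rare variants are merged into one indicator per gene and sample. It is an illustration only, not the code microGWAS or pyseer runs. The input format, the function name `burden_matrix`, and the `max_af` threshold are all assumptions made for this example.

```python
from collections import defaultdict


def burden_matrix(variants, samples, max_af=0.01):
    """Collapse rare variants into a per-gene burden indicator.

    variants: iterable of (gene, carriers) pairs, where carriers is the set
              of sample names carrying that variant (hypothetical format).
    samples:  list of all sample names.
    max_af:   variants with a carrier frequency above this are ignored
              (assumed threshold, chosen for illustration).

    Returns {gene: {sample: 0 or 1}}, where 1 means the sample carries at
    least one rare variant in that gene.
    """
    n = len(samples)
    if n == 0:
        return {}

    carriers_by_gene = defaultdict(set)
    for gene, carriers in variants:
        frequency = len(carriers) / n
        # Keep only variants that are present but rare
        if 0 < frequency <= max_af:
            carriers_by_gene[gene] |= set(carriers)

    return {
        gene: {sample: int(sample in carriers) for sample in samples}
        for gene, carriers in carriers_by_gene.items()
    }
```

The resulting 0/1 vector per gene could then be tested against the phenotype like any other binary variant.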
Additional analyses include:
- Heritability estimation using lineage and unitig data
- Functional enrichment analysis
- Manhattan plots and QQ plots
- Optional phylogenetic tree construction
- AMR/virulence gene prediction
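For readers unfamiliar with QQ plots, the sketch below shows how the plotted points are usually derived from a list of association p-values: the sorted observed p-values are compared with the quantiles expected under a uniform null distribution. This is a generic illustration and may differ from how the pipeline's own plotting scripts work. The function name `qq_points` is hypothetical.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-value pairs for a QQ plot.

    Under the null hypothesis p-values are uniform on (0, 1], so the
    i-th smallest of n p-values is expected to be about i / (n + 1).
    """
    # Drop values that cannot be log-transformed or are out of range
    observed = sorted(p for p in pvalues if 0 < p <= 1)
    n = len(observed)

    points = []
    for rank, p in enumerate(observed, start=1):
        expected = rank / (n + 1)
        points.append((-math.log10(expected), -math.log10(p)))
    return points
```

Points lying well above the diagonal at the right-hand end of the plot suggest true associations, while a systematic departure along the whole line usually points to inflation, for example from uncorrected population structure.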
Getting Started
📖 For complete instructions, visit the documentation:
https://microgwas.readthedocs.io/
Documentation
- Installation Guide
- Usage Instructions
- Beginner's Tutorial
- Input Requirements
- Output Description
- Testing
TODO
- manhattan_plot.py : handle cases in which the reference genome has more than one chromosome (either because it has plasmids or because it is a draft genome) (enhancement issue)
- Easily switch to poppunk for lineage computation
- Combine all annotations in a series of webpages (enhancement issue)
- Use /tmp directories (as implemented by snakemake) to be efficient in I/O heavy rules
- Run QC on phenotypic data/genomes as part of bootstrapping
- Use snakemake resources system to budget memory requirements (enhancement issue)
- Swap unitig-counter for bifrost or cuttlefish (enhancement issue)
- Heritability estimates using different distributions (i.e. for binary phenotypes the normal distribution is likely not appropriate?)
- Add script to check for duplicated contigs during the bootstrap
- Swap panaroo for ggCaller, which would also allow for the use of raw reads
- ggCaller is now integrated for GFF annotation generation
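As an illustration of the duplicated-contig check listed among the TODO items, the following sketch groups contigs whose sequences are identical by hashing each sequence. It is one possible approach under stated assumptions, not an existing microGWAS script. The function names `parse_fasta` and `find_duplicated_contigs` are hypothetical, and the comparison is exact and case-insensitive only (reverse complements and near-duplicates are not detected).

```python
import hashlib
from collections import defaultdict


def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header as the contig name
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_duplicated_contigs(records):
    """Return lists of contig names that share an identical sequence."""
    names_by_digest = defaultdict(list)
    for name, sequence in records:
        digest = hashlib.sha256(sequence.upper().encode()).hexdigest()
        names_by_digest[digest].append(name)
    return [names for names in names_by_digest.values() if len(names) > 1]
```

An open file handle can be passed directly to `parse_fasta`, for example `find_duplicated_contigs(parse_fasta(open("assembly.fasta")))`, where `assembly.fasta` is a placeholder path.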
Reference
Citation
If you use microGWAS in your research, please cite the paper above and include the version DOI from Zenodo.
Authors
Marco Galardini, Judit Burgaya, Bamu F. Damaris, Jenny Fiebig
Mentions
- 1. Author(s): Kuangyi Charles Wei, Beth Blane, Jacqueline Toussaint, Sandra Reuter, Michelle S. Toleman, Estee Torok, Sharon J. Peacock, Ewan M Harrison, Dinesh Aggarwal, William Roberts-Sengier. Published in 2025. DOI: 10.1101/2025.02.28.640835
- 2. Author(s): Norelle L. Sherry, Jean Y. H. Lee, Stefano G. Giulieri, Christopher H. Connor, Kristy Horan, Jake A. Lacey, Courtney R. Lane, Glen P. Carter, Torsten Seemann, Adrian Egli, Timothy P. Stinear, Benjamin P. Howden. Published in 2025. DOI: 10.1128/aac.01082-24