
microGWAS

microGWAS is a Snakemake-powered pipeline to carry out an end-to-end microbial GWAS analysis. Starting from genome assemblies and a phenotype file, microGWAS runs a number of associations using pyseer, annotates the association results, and runs a number of functional enrichment tests.


Description

microGWAS: Bacterial Genome-Wide Association Studies


A comprehensive Snakemake pipeline for conducting bacterial genome-wide association studies (GWAS) on assembled genomes. microGWAS supports multiple phenotypes and provides extensive downstream analyses including functional annotation, enrichment analysis, and visualization.

Key Features

For each phenotype, the pipeline runs associations using 6 complementary approaches:

  • Individual unitigs - High-resolution genetic variants
  • Gene presence/absence - Pangenome-based associations
  • Rare variants - Gene burden testing for low-frequency variants
  • Common variants - SNP-based associations called against a reference genome
  • Gene cluster k-mers - Locus-specific associations
  • Whole genome ML - Combined unitig modeling

Additional analyses include:

  • Heritability estimation using lineage and unitig data
  • Functional enrichment analysis
  • Manhattan plots and QQ plots
  • Optional phylogenetic tree construction
  • AMR/virulence gene prediction

Getting Started

📖 For complete instructions, visit the documentation:

https://microgwas.readthedocs.io/

Documentation

TODO

  • manhattan_plot.py : handle cases in which the reference genome has more than one chromosome (either because it has plasmids or because it is a draft genome) (enhancement issue)
  • Easily switch to poppunk for lineage computation
  • Combine all annotations in a series of webpages (enhancement issue)
  • Use /tmp directories (as implemented by snakemake) to be efficient in I/O heavy rules
  • Run QC on phenotypic data/genomes as part of bootstrapping
  • Use snakemake resources system to budget memory requirements (enhancement issue)
  • Swap unitig-counter for bifrost or cuttlefish (enhancement issue)
  • Heritability estimates using different distributions (e.g. for binary phenotypes the normal distribution is likely not appropriate)
  • Add script to check for duplicated contigs during the bootstrap
  • Swap panaroo for ggCaller, which would also allow the use of raw reads (ggCaller is now integrated for GFF annotation generation)
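
As an illustration of the first TODO item, the snippet below is a minimal sketch of one common way to place hits from several contigs (a chromosome plus plasmids, or the contigs of a draft genome) on a single Manhattan plot x-axis: each contig is shifted by the total length of the contigs before it. This is not the code in manhattan_plot.py; the function names, contig names, and lengths are hypothetical and chosen only for the example.

```python
from itertools import accumulate


def cumulative_offsets(contig_lengths):
    """Map each contig name to the x-axis offset at which it starts.

    `contig_lengths` is a dict of contig name -> length in bp,
    in the order the contigs should appear on the plot.
    """
    names = list(contig_lengths)
    lengths = [contig_lengths[name] for name in names]
    # Each contig starts where the previous ones end (running total).
    starts = [0] + list(accumulate(lengths))[:-1]
    return dict(zip(names, starts))


def genome_wide_position(contig, position, offsets):
    """Convert a (contig, position) pair to a single x-axis coordinate."""
    return offsets[contig] + position


# Hypothetical reference: one chromosome and one plasmid.
contig_lengths = {"chromosome": 5_000_000, "plasmid_1": 100_000}
offsets = cumulative_offsets(contig_lengths)

# Hypothetical association hits as (contig, position) pairs.
hits = [("chromosome", 1_200_000), ("plasmid_1", 4_500)]
x_coords = [genome_wide_position(c, p, offsets) for c, p in hits]
```

The resulting coordinates can be passed to any plotting library as x values, with the offsets reused to draw contig boundaries or axis labels.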

Reference

Burgaya, J., Damaris, B. F., Fiebig, J., & Galardini, M. (2025). microGWAS: A computational pipeline to perform large-scale bacterial genome-wide association studies. Microbial Genomics, 11(2), 001349.

Citation

If you use microGWAS in your research, please cite the paper above and include the version DOI from Zenodo.

Authors

Marco Galardini, Judit Burgaya, Bamu F. Damaris, Jenny Fiebig

Programming languages
  • Python 89%
  • Shell 6%
  • R 5%

Participating organisations

Center for Experimental and Clinical Infection Research
Medizinische Hochschule Hannover
Helmholtz Centre for Infection Research

Contributors

  • Marco Galardini (Author/Developer/Maintainer, Twincore)
  • Judit Burgaya Ventura
  • Bamu F Damaris (Developer, Center for Experimental and Clinical Infection Research)
  • Jenny Fiebig
  • Alessio Masoni (Developer, Twincore)

Helmholtz Program-oriented Funding IV