QUAST

QUAST is a state-of-the-art tool for (meta)genome assembly evaluation, computing over 50 quality metrics and presenting results in plain text, static plots, and interactive HTML reports.

What QUAST can do for you

The current QUAST toolkit includes the general QUAST tool for genome assemblies; MetaQUAST, the extension for metagenomic datasets; QUAST-LG, the extension for large genomes (e.g., mammalian); and Icarus, the interactive visualizer for these tools.

The QUAST package works with and without reference genomes. However, its output is much more informative when a reference genome, or at least a closely related one, is provided along with the assemblies. The tool accepts multiple assemblies at once, which makes it suitable for comparing them.

This description gives a snapshot of the QUAST running instructions, output interpretation, and reported quality metrics. The online manual provides a much more detailed description of these and many other topics.

Usage

QUAST requires a 64-bit Linux or macOS machine with Python 3. The basic running command is below.

./quast.py test_data/contigs_1.fasta \
           test_data/contigs_2.fasta \
        -r test_data/reference.fasta.gz \
        -g test_data/genes.txt \
        -1 test_data/reads1.fastq.gz -2 test_data/reads2.fastq.gz \
        -o quast_test_output
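
When QUAST is driven from a script, the same command can be assembled programmatically. The following is a minimal sketch in Python, not part of QUAST itself: the helper name `build_quast_command` and its parameters are hypothetical, and only the flags shown in the command above (`-r`, `-g`, `-1`, `-2`, `-o`) are used. The reference, gene list, and reads are optional, reflecting that QUAST also runs without a reference.

```python
import shlex


def build_quast_command(assemblies, reference=None, genes=None,
                        reads1=None, reads2=None,
                        output_dir="quast_output", quast_path="./quast.py"):
    """Return the QUAST command as an argument list (hypothetical helper)."""
    if not assemblies:
        raise ValueError("at least one assembly FASTA file is required")
    cmd = [quast_path, *assemblies]
    if reference:
        cmd += ["-r", reference]            # reference genome
    if genes:
        cmd += ["-g", genes]                # gene positions in the reference
    if reads1 and reads2:
        cmd += ["-1", reads1, "-2", reads2]  # paired-end reads
    cmd += ["-o", output_dir]               # output directory
    return cmd


cmd = build_quast_command(
    ["test_data/contigs_1.fasta", "test_data/contigs_2.fasta"],
    reference="test_data/reference.fasta.gz",
    genes="test_data/genes.txt",
    reads1="test_data/reads1.fastq.gz",
    reads2="test_data/reads2.fastq.gz",
    output_dir="quast_test_output",
)
print(shlex.join(cmd))
# To actually launch QUAST, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell-quoting problems with unusual file names.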

Output

report.txt      summary table
report.tsv      tab-separated version, for parsing or for spreadsheets (Google Docs, Excel, etc.)
report.tex      LaTeX version
report.pdf      PDF version, includes all tables and plots for some statistics
report.html     everything in an interactive HTML file
icarus.html     Icarus main menu with links to interactive viewers
contigs_reports/        [only if a reference genome is provided]
  misassemblies_report  detailed report on misassemblies
  unaligned_report      detailed report on unaligned and partially unaligned contigs
k_mer_stats/            [only if --k-mer-stats is specified]
  kmers_report          detailed report on k-mer-based metrics
reads_stats/            [only if reads are provided]
  reads_report          detailed report on mapped reads statistics
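
Since report.tsv is intended for parsing, a short Python sketch of reading it may be useful. It assumes a layout in which the first row holds the assembly names and each following row holds one metric name followed by one value per assembly; check this against your own report. The function name `parse_quast_tsv` is hypothetical, and the sample values are made up for illustration.

```python
import csv
import io


def parse_quast_tsv(handle):
    """Map assembly name -> {metric name: value as string}.

    Assumes the first row lists assembly names (after a label cell) and
    every later row is: metric name, then one value per assembly.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    assemblies = header[1:]
    table = {name: {} for name in assemblies}
    for row in reader:
        if not row:
            continue  # skip blank lines
        metric, values = row[0], row[1:]
        for name, value in zip(assemblies, values):
            table[name][metric] = value
    return table


# Made-up sample content, used here instead of a real report file.
example = "Assembly\tcontigs_1\tcontigs_2\n# contigs\t3\t5\nN50\t1200\t800\n"
report = parse_quast_tsv(io.StringIO(example))
print(report["contigs_1"]["N50"])
```

For a real run, open the file instead, e.g. `with open("quast_test_output/report.tsv") as fh: report = parse_quast_tsv(fh)`. Values are kept as strings because not every cell in a report is guaranteed to be numeric.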

Reference-independent quality metrics

  • Number of large contigs (e.g., longer than 500 bp) and their total length.
  • Length of the largest contig.
  • N50 (length of a contig, such that all the contigs of at least the same length together cover at least 50% of the assembly).
  • Number of predicted genes, discovered either by GeneMark.hmm (for prokaryotes), GeneMark-ES or GlimmerHMM (for eukaryotes), or MetaGeneMark (for metagenomes).
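
The N50 definition above can be illustrated with a few lines of Python. This is a sketch of the definition as stated here, not QUAST's own implementation; the function name `n50` and the contig lengths are made up for the example.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least 50% of the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


# Total length is 290, so the threshold is 145.
# 80 + 70 = 150 >= 145, hence N50 is 70.
print(n50([80, 70, 50, 40, 30, 20]))  # prints 70
```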

Reference-based quality metrics

  • Numbers of misassemblies of different kinds: inversions, relocations, translocations, interspecies translocations (MetaQUAST only), and local misassemblies.
  • Number and total length of unaligned contigs.
  • Numbers of mismatches and indels, over the assembly and per 100 kb.
  • Genome fraction (%), the percentage of the reference covered by the assembly.
  • Duplication ratio, the total number of aligned bases in the assembly divided by the total number of those in the reference. If the assembly contains many contigs that cover the same regions, its duplication ratio will significantly exceed 1. This can happen for several reasons, including overestimated repeat multiplicities and overlaps between contigs.
  • Number of genes in the assembly, completely or partially covered, based on a user-provided list of gene positions in the reference.
  • NGA50, a reference-aware version of the N50 metric. It is calculated using aligned blocks instead of contigs.
    Such blocks are obtained after removing unaligned regions, and then splitting contigs at misassembly breakpoints.
    Thus, NGA50 is the length of a block, such that all the blocks of at least the same length together cover at least 50% of the reference.
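
The NGA50 and duplication ratio definitions above can be sketched in Python as follows. The sketch assumes the aligned block lengths and aligned base counts have already been obtained from the alignment step; the function names and all numbers are made up for illustration and do not reflect QUAST's internal code.

```python
def nga50(aligned_block_lengths, reference_length):
    """Length L such that aligned blocks of length >= L cover at least 50% of the reference.

    Returns None when the blocks together cover less than half of the reference.
    """
    target = reference_length / 2
    running = 0
    for length in sorted(aligned_block_lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


def duplication_ratio(aligned_assembly_bases, aligned_reference_bases):
    """Aligned bases in the assembly divided by aligned bases in the reference."""
    return aligned_assembly_bases / aligned_reference_bases


# Reference of 1200 bp: the threshold is 600, and 400 + 300 = 700 >= 600.
print(nga50([400, 300, 200, 100], reference_length=1200))  # prints 300

# Blocks total only 1000 bp, less than half of a 2500 bp reference.
print(nga50([400, 300, 200, 100], reference_length=2500))  # prints None

# 1100 aligned assembly bases over 1000 reference bases: some regions are covered twice.
print(duplication_ratio(1100, 1000))
```

Unlike N50, the threshold here depends on the reference length rather than on the assembly length, so a fragmented or incomplete assembly may not reach it at all.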

Participating organisations

Helmholtz Centre for Infection Research
Saarland University

Contributors

  • Alexey Gurevich, Developer and Principal Investigator, Helmholtz Institute for Pharmaceutical Research Saarland
  • Alla Mikheenko, Core developer, University College London
  • Pascal Hirsch

Helmholtz Program-oriented Funding IV

  • Research Field: 3 Health
    • Research Program: 3.4 Infection Research
      • PoF Topic: 3.4.1 Bacterial and Viral Pathogens
      • PoF Topic: 3.4.3 Anti-infectives