maftools is a comprehensive toolkit for processing somatic variants from cohort-based cancer genomic studies
Problem: Analyzing somatic variants from large cancer patient cohorts can be cumbersome due to the involvement of numerous file formats and software tools. This often results in a classic situation of “Too many cooks spoil the broth,” significantly affecting scientific reproducibility.
Solution: maftools offers an elegant solution by providing over 80 functions to perform the most commonly required tasks in cancer genomics, using MAF as the only input file type.
Additionally, maftools can handle sequencing alignment BAM files for copy-number analysis, sample mismatch/relatedness analysis, and rapid genotyping of known cancer hotspot variants. Moreover, the package is lightweight and requires approximately 15 core dependencies.
Complete documentation of maftools using the TCGA acute myeloid leukemia cohort as a case study can be found here.
maftools is easy to use: start by importing a MAF file along with its associated clinical data. Once the data is imported, the resulting MAF object can be passed to the package's functions for analysis and visualization.
Besides MAF files, maftools can handle sequencing alignment BAM files and copy-number output from GISTIC and mosdepth. Please refer to the package documentation sections below to learn more.
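The import step described above can be sketched as follows. This is a minimal example, not a full workflow. It assumes maftools is installed from Bioconductor and uses the TCGA acute myeloid leukemia example files shipped with the package; the variable names (laml.maf, laml.clin, laml) are illustrative only.

```r
library(maftools)

# Paths to the example TCGA LAML files bundled with the package
laml.maf  <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml.clin <- system.file("extdata", "tcga_laml_annot.tsv", package = "maftools")

# Import the MAF file together with its clinical annotation
laml <- read.maf(maf = laml.maf, clinicalData = laml.clin)

# The resulting MAF object is the input to downstream functions
getSampleSummary(laml)          # per-sample variant counts
getGeneSummary(laml)            # per-gene variant counts
plotmafSummary(maf = laml)      # cohort-level summary plot
oncoplot(maf = laml, top = 10)  # oncoplot of the 10 most mutated genes
```

For your own cohort, replace the two file paths with the paths to your MAF file and clinical table; the clinical data is optional but is what links samples to annotations in the plots.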
Moreover, analyzing any of the 33 TCGA cohorts along with their harmonized clinical data is straightforward. A single command, tcgaLoad, imports the desired TCGA cohort, avoiding time-consuming data mining from public databases.
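As a short sketch of the TCGA import, the snippet below lists the available cohorts and loads one of them. It assumes an internet connection, since the cohort data is fetched on demand unless it is already available locally. The study abbreviation "LAML" is chosen only as an example.

```r
library(maftools)

# List the TCGA cohorts that can be loaded
tcgaAvailable()

# Load one cohort as a MAF object; "LAML" is acute myeloid leukemia
laml <- tcgaLoad(study = "LAML")

# Printing the object shows a summary of the cohort
laml
```

The returned object is a regular MAF object, so it can be passed to the same functions as a cohort imported with read.maf.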