methylKit is an R package
for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The
package is designed to deal with sequencing data from
RRBS and its variants,
but also target-capture methods such as Agilent SureSelect
methyl-seq.
In addition, methylKit can
handle base-pair resolution data for 5hmC obtained from Tab-seq or oxBS-seq. It can also
handle whole-genome bisulfite sequencing data if the proper input format is provided.
You can subscribe to our googlegroups page to get the latest information about new releases and features (low-frequency; only updates are posted).
To ask questions, please use the methylKit_discussion forum.
You can also check out the blog posts we make on using methylKit.
To install the latest version from GitHub, run the following in the R console:
library(devtools)
install_github("al2na/methylKit", build_vignettes=FALSE, 
  repos=BiocManager::repositories(),
  dependencies=TRUE)
If this doesn't work, you might need to add the type="source" argument.
To install the development version instead, pass ref="development":
library(devtools)
install_github("al2na/methylKit", build_vignettes=FALSE, 
  repos=BiocManager::repositories(),ref="development",
  dependencies=TRUE)
If this doesn't work, you might need to add the type="source" argument.
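The fallback described above can be sketched as a small wrapper. `install_methylkit` is a hypothetical helper written for this README, not a function provided by devtools or methylKit; it reuses the install calls shown above and retries with type="source" only if the first attempt raises an error. Some install failures surface as warnings rather than errors, so treat this as a convenience sketch rather than a guaranteed fix.

```r
# Hypothetical helper (not part of devtools or methylKit): wraps the
# install calls shown above and retries with type = "source" on error.
install_methylkit <- function(development = FALSE) {
  args <- list(
    repo = "al2na/methylKit",
    build_vignettes = FALSE,
    repos = BiocManager::repositories(),
    dependencies = TRUE
  )
  # Same branch name as in the development install call above.
  if (development) args$ref <- "development"

  tryCatch(
    do.call(devtools::install_github, args),
    error = function(e) {
      message("Default install failed, retrying with type = \"source\"")
      do.call(devtools::install_github, c(args, list(type = "source")))
    }
  )
}

# Not run automatically; uncomment the call you need.
# install_methylkit()                    # default branch
# install_methylkit(development = TRUE)  # development branch
```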
Typically, bisulfite-converted reads are aligned to the genome, and the percent methylation value per base is calculated by processing the alignments. methylKit takes that per-base percent methylation information as input. Such an input file may be obtained from the AMP pipeline for aligning RRBS reads. A typical input file looks like this:
chrBase	chr	base	strand	coverage	freqC	freqT
chr21.9764539	chr21	9764539	R	12	25.00	75.00
chr21.9764513	chr21	9764513	R	12	0.00	100.00
chr21.9820622	chr21	9820622	F	13	0.00	100.00
chr21.9837545	chr21	9837545	F	11	0.00	100.00
chr21.9849022	chr21	9849022	F	124	72.58	27.42
chr21.9853326	chr21	9853326	F	17	70.59	29.41
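To make the column meanings concrete, here is a minimal base-R sketch (it does not use methylKit itself) that parses the rows above and converts the percentages back into read counts. The `numCs` and `numTs` column names are chosen for illustration, and the interpretation of freqC and freqT as percentages of coverage is inferred from the example values, where each pair sums to 100.

```r
# Example rows copied from the input file shown above (tab-separated).
input_text <- paste(
  "chrBase\tchr\tbase\tstrand\tcoverage\tfreqC\tfreqT",
  "chr21.9764539\tchr21\t9764539\tR\t12\t25.00\t75.00",
  "chr21.9764513\tchr21\t9764513\tR\t12\t0.00\t100.00",
  "chr21.9820622\tchr21\t9820622\tF\t13\t0.00\t100.00",
  "chr21.9837545\tchr21\t9837545\tF\t11\t0.00\t100.00",
  "chr21.9849022\tchr21\t9849022\tF\t124\t72.58\t27.42",
  "chr21.9853326\tchr21\t9853326\tF\t17\t70.59\t29.41",
  sep = "\n"
)

# Explicit column classes: without them, a strand column containing
# only "F" values would be parsed as logical FALSE.
cpg <- read.delim(
  text = input_text,
  colClasses = c("character", "character", "integer", "character",
                 "integer", "numeric", "numeric")
)

# freqC and freqT are percentages of the coverage at each base.
stopifnot(all(abs(cpg$freqC + cpg$freqT - 100) < 0.01))

# Recover read counts (illustrative column names, not a methylKit format).
cpg$numCs <- round(cpg$coverage * cpg$freqC / 100)
cpg$numTs <- round(cpg$coverage * cpg$freqT / 100)

print(cpg[, c("chr", "base", "strand", "coverage", "numCs", "numTs")])
```

For example, the base at chr21:9764539 has coverage 12 and freqC 25.00, which corresponds to 3 reads supporting methylation and 9 supporting no methylation.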
methylKit reads in those files and performs basic statistical analysis and annotation for differentially methylated regions/bases. A tab-separated text file with a generic format, such as a methylation ratio file from BSMAP, can also be read in; see here for an example. Alternatively, the read.bismark function can read SAM file(s) output by the Bismark aligner (using bowtie/bowtie2). The SAM file must be sorted by chromosome and read start. The sorting must be done with unix sort or samtools; sorting with other tools may change the column order of the SAM file, which will cause an error.
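As an illustration of reading a generic tab-separated file, the sketch below follows the pattern from the methylKit function documentation. It rests on several assumptions: that methylKit is installed, that the reader function is called `methRead` (its name in current releases), and that the package ships the example file `generic1.CpG.txt` in its `extdata` folder. The column indices in `pipeline` describe that example file and must be adapted to your own file layout.

```r
# Runs only if methylKit is available; otherwise prints a note.
if (requireNamespace("methylKit", quietly = TRUE)) {
  # Assumption: this example file ships with the installed methylKit version.
  generic_file <- system.file("extdata", "generic1.CpG.txt",
                              package = "methylKit")

  if (nzchar(generic_file)) {
    # Inspect the raw file first to see which column holds what.
    print(head(read.table(generic_file, header = TRUE)))

    # Describe the layout: which column holds chromosome, position,
    # strand, coverage and methylation level. fraction = FALSE means
    # methylation is given as a percentage (0-100), not a fraction (0-1).
    obj <- methylKit::methRead(
      generic_file,
      pipeline = list(fraction = FALSE, chr.col = 1, start.col = 2,
                      end.col = 2, coverage.col = 4, strand.col = 3,
                      freqC.col = 5),
      sample.id = "test1",
      assembly = "hg18"
    )
    print(obj)
  } else {
    message("Example file not found in this methylKit version.")
  }
} else {
  message("methylKit is not installed; skipping the example.")
}
```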
Below, there are several options showing how to do basic analysis with methylKit.
Annotation files in BED format are needed for annotating your differentially methylated regions. You can download annotation files from the UCSC table browser for your genome of interest. Go to http://genome.ucsc.edu/cgi-bin/hgGateway. On the top menu, click on "tools" and then "table browser". Select your "genome" and "assembly" of interest from the drop-down menus. Make sure you select the correct genome and assembly: selecting the wrong genome and/or assembly will return unintelligible results in downstream analysis.
From here on you can either download gene annotation or CpG island annotation.
In addition, you can check this tutorial to learn how to download any track from UCSC in BED format (http://www.openhelix.com/cgi/tutorialInfo.cgi?id=28).
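Before annotating, it can help to confirm that the downloaded BED file and your methylation data use the same chromosome names and coordinates. The base-R sketch below shows one such check. The two BED records (`geneA`, `geneB`) are made up for illustration and do not correspond to real genes; the CpG positions are taken from the example input file above. It relies on the BED convention that start coordinates are 0-based and end coordinates are exclusive.

```r
# Hypothetical BED records (made up for illustration, 6-column BED).
bed_text <- paste(
  "chr21\t9825000\t9840000\tgeneA\t0\t+",
  "chr21\t9850000\t9860000\tgeneB\t0\t-",
  sep = "\n"
)
genes <- read.delim(
  text = bed_text, header = FALSE,
  col.names = c("chrom", "start", "end", "name", "score", "strand"),
  colClasses = c("character", "integer", "integer",
                 "character", "numeric", "character")
)

# CpG positions (1-based) taken from the example input file above.
cpg <- data.frame(
  chr = c("chr21", "chr21", "chr21"),
  base = c(9820622L, 9837545L, 9853326L),
  stringsAsFactors = FALSE
)

# A mismatch in chromosome naming (e.g. "21" vs "chr21") is a common sign
# that the annotation does not match the data.
shared <- intersect(unique(cpg$chr), unique(genes$chrom))
if (length(shared) == 0) {
  stop("No shared chromosome names between annotation and methylation data")
}

# BED is 0-based, half-open: a 1-based position p lies inside a record
# when start < p and p <= end.
cpg$gene <- vapply(seq_len(nrow(cpg)), function(i) {
  hit <- genes$chrom == cpg$chr[i] &
    genes$start < cpg$base[i] &
    cpg$base[i] <= genes$end
  if (any(hit)) genes$name[which(hit)[1]] else NA_character_
}, character(1))

print(cpg)
```

Matching chromosome names alone do not prove that the assembly is correct, since different assemblies of the same genome share chromosome names, so this check complements rather than replaces selecting the right assembly in the table browser.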
The most recent version of the R script in the Genome Biology manuscript is here.
If you used methylKit please cite:
If you used flat-file objects or over-dispersion corrected tests please consider citing:
and also consider citing the following publication as a use-case with specific cutoffs: