DiMmer

Starting the Program

Once Java 8 is installed on your computer, double-clicking dimmer.jar is enough to start the application. However, the input files are normally too large for Java's default memory settings, so the amount of memory Java is allowed to consume has to be increased. Open the console of your operating system (Linux/Mac: bash or similar, Windows: cmd), navigate to the folder where you downloaded DiMmer, and type:
java -Xmx4096M -jar dimmer.jar
The parameter -Xmx sets the maximum amount of memory that DiMmer is allowed to use. If DiMmer still aborts with an out-of-memory message, try increasing this value. The amount of memory required depends on the size of your experiment.
Example: For the Infinium HumanMethylation450 BeadChip with 60 patients, a value of -Xmx3024M is sufficient.

Input files

In order to perform an analysis with DiMmer, you need a valid sample annotation file reflecting your experiment. The format of the annotation file is very simple: it is a comma-separated file with at least two columns linking each sample to its IDAT file:

Sentrix_ID
Sentrix_Position

These columns allow DiMmer to link each sample to the corresponding IDAT file (that is, the actual methylation data). Each Sentrix_ID corresponds to a directory, and the combination of Sentrix_ID and Sentrix_Position gives the IDAT filename.
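The lookup described above can be sketched in a few lines of Python. This is only an illustration, not DiMmer's own code: the function name idat_paths and the data_root argument are hypothetical, and joining the two columns with an underscore plus the _Grn.idat / _Red.idat suffixes follows the usual Illumina naming convention, which this manual does not spell out.

```python
import csv
import io
from pathlib import Path


def idat_paths(annotation_csv, data_root):
    """Yield (row, green_path, red_path) for every sample in the annotation file.

    Assumption: files are named <Sentrix_ID>_<Sentrix_Position>_Grn.idat and
    <Sentrix_ID>_<Sentrix_Position>_Red.idat inside a directory named after
    the Sentrix_ID (usual Illumina layout, not confirmed by this manual).
    """
    for row in csv.DictReader(annotation_csv):
        sentrix_id = row["Sentrix_ID"].strip()
        position = row["Sentrix_Position"].strip()
        base = Path(data_root) / sentrix_id / f"{sentrix_id}_{position}"
        yield row, Path(f"{base}_Grn.idat"), Path(f"{base}_Red.idat")


# Made-up annotation content, for illustration only.
example = io.StringIO(
    "Sentrix_ID,Sentrix_Position\n"
    "1234567890,R01C01\n"
    "1234567890,R02C01\n"
)
for _, green, red in idat_paths(example, "idat"):
    print(green, red)
```

Running a check like this before starting an analysis is a cheap way to spot annotation rows that do not match any file on disk.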

Optionally, you can provide three additional standard columns:

Group_ID
Pair_ID
Gender_ID

Group_ID splits your data into a test and a control group. The column Pair_ID identifies pairs of connected samples; e.g., in a twin study, both twins of a pair share the same number. The column Gender_ID encodes the sex of each sample.

If these columns are omitted, DiMmer assumes a specific order of the samples in the annotation file: without Group_ID, the test and control groups are separated in the middle of the dataset, and without Pair_ID in a paired study, each sample is expected to be followed by its paired sample. If the column Gender_ID is missing, DiMmer performs an automatic gender detection.
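The ordering assumptions can be illustrated with a short Python sketch. The function names are hypothetical, the manual does not state which half is treated as the test group, and rejecting an odd number of samples in the paired case is an assumption made here for clarity.

```python
def default_groups(samples):
    """Split the samples, in file order, into two halves at the middle.

    Which half is the test group and which the control group is not
    specified in the manual, so the halves are returned unlabeled.
    """
    middle = len(samples) // 2
    return samples[:middle], samples[middle:]


def default_pairs(samples):
    """Pair each sample with the one listed directly after it."""
    if len(samples) % 2 != 0:
        # Assumption: a paired design needs an even number of samples.
        raise ValueError("a paired design needs an even number of samples")
    return [(samples[i], samples[i + 1]) for i in range(0, len(samples), 2)]


samples = ["s1", "s2", "s3", "s4"]
print(default_groups(samples))
print(default_pairs(samples))
```

Because both conventions depend purely on row order, providing Group_ID and Pair_ID explicitly is the safer choice whenever the annotation file may be re-sorted.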

Additionally, you may add an arbitrary number of columns describing your experiment. You might define one column for each phenotype of interest and one for each known co-factor.

Data type

In this step, you select whether your data is paired (e.g. twin data) or unpaired. This decision directly influences the subsequent permutation tests. For paired data, the permutation randomly swaps samples with the same Pair_ID (if provided), whereas the unpaired permutation shuffles all patient/control labels without repetition.
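The difference between the two permutation schemes can be sketched as follows. This is a minimal illustration under stated assumptions, not DiMmer's implementation: the function names are hypothetical, and the paired variant assumes exactly two samples per pair ID and swaps their labels with probability 0.5.

```python
import random


def permute_unpaired(labels, rng):
    """Shuffle all case/control labels; every label is used exactly once."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def permute_paired(labels, pair_ids, rng):
    """Randomly swap the labels of the two samples sharing a pair ID."""
    members = {}
    for index, pair_id in enumerate(pair_ids):
        members.setdefault(pair_id, []).append(index)
    permuted = list(labels)
    for pair_id, indices in members.items():
        if len(indices) != 2:
            raise ValueError(f"pair {pair_id!r} does not have exactly two samples")
        first, second = indices
        if rng.random() < 0.5:  # assumption: each pair is swapped with probability 0.5
            permuted[first], permuted[second] = permuted[second], permuted[first]
    return permuted


rng = random.Random(42)
labels = ["case", "control", "case", "control"]
pair_ids = [1, 1, 2, 2]
print(permute_unpaired(labels, rng))
print(permute_paired(labels, pair_ids, rng))
```

In the paired scheme every pair still contains one case and one control after permutation, which preserves the study design; the unpaired scheme only preserves the overall number of cases and controls.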

Available models

Two different methods can be applied for computing the statistical significance of the CpGs. For simple case-control studies, the t-test is the preferred method, as it is considerably faster than the more complex regression. You can choose between a left-sided, a right-sided, or a two-sided t-test, depending on whether you want to investigate under-methylated regions, over-methylated regions, or both, respectively. Linear regression (LR) can also be applied to binary data; however, it is intended for computing the statistical significance of continuous data with multiple labels. Linear regression also allows for cell composition estimation, which can be used to correct the p-values of your phenotype of interest.

Permutation test

One cannot draw conclusions based solely on the p-value of a CpG derived by the aforementioned t-test or regression. For this reason, DiMmer performs permutation tests to estimate the statistical significance of the CpGs: the case-control labels are randomly reassigned to the samples and the same statistical test as above is performed. This results in a distribution of permuted p-values against which the significance of the original p-values can be judged. Again, permutations using the t-test are considerably faster than those using LR. A larger number of permutations leads to a more precise estimate of the significance. As this might take some time, you should allow DiMmer to use several CPUs at once by setting the number of threads to a value between 1 and the number of CPUs of your computer. Furthermore, DiMmer also allows correcting for multiple testing. In total, DiMmer automatically provides the user with four different p-value variants:

Empirical p-values
False Discovery Rate (FDR)
Family-Wise Error Rate (FWER)
Step-Down minP

These corrected p-values are all available after the permutation test. When you start searching for differentially methylated regions (DMRs), you can choose any one of these p-values, but we recommend using the least stringent one (the empirical p-value), as the DMR search does not require a correction for multiple testing.
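To make the idea of an empirical p-value concrete, here is a minimal Python sketch for a single CpG in an unpaired design. It is an illustration only, not DiMmer's implementation. Assumptions: the function names are hypothetical; the absolute pooled-variance t statistic is compared instead of the p-value itself (for fixed group sizes, a larger absolute t statistic corresponds to a smaller two-sided p-value); and the "+1" correction is one common convention that DiMmer may or may not use.

```python
import random
from statistics import mean, variance


def t_statistic(values, labels):
    """Pooled-variance (Student) t statistic, case mean minus control mean.

    Assumes at least two samples per group and non-constant values.
    """
    case = [v for v, label in zip(values, labels) if label == "case"]
    control = [v for v, label in zip(values, labels) if label == "control"]
    n1, n2 = len(case), len(control)
    pooled = ((n1 - 1) * variance(case) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(case) - mean(control)) / (pooled * (1 / n1 + 1 / n2)) ** 0.5


def empirical_p_value(values, labels, n_permutations=1000, seed=0):
    """Fraction of label permutations that look at least as extreme as the data."""
    rng = random.Random(seed)
    observed = abs(t_statistic(values, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(t_statistic(values, shuffled)) >= observed:
            hits += 1
    # Assumption: add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up methylation values for one CpG, for illustration only.
values = [0.81, 0.79, 0.85, 0.83, 0.42, 0.40, 0.47, 0.45]
labels = ["case"] * 4 + ["control"] * 4
print(empirical_p_value(values, labels))
```

The sketch also shows why more permutations give a more precise estimate: the smallest value the estimate can take is 1 / (n_permutations + 1).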

DMR search

A DMR (differentially methylated region) is a group of differentially methylated CpGs occurring in close proximity to each other. DiMmer defines a DMR as a sequence of consecutive differentially methylated CpGs whose genomic distance to each other is smaller than 1000 base pairs by default. You can change the maximal acceptable distance between CpGs in the field Max. CpG distance. A CpG is regarded as differentially methylated when the chosen p-value is below a given threshold, which can be set in the p-value cutoff field. You have to select which p-values should be used (original p-values, empirical p-values, FDR, FWER or step-down minP).

In the field window size, you can specify the minimal number of consecutive differentially methylated CpGs you require for a region. The default value is 5.

Furthermore, you can define how many exceptions (i.e., CpGs that are not differentially methylated) you allow within the search window defined above. The default value is 2; it can be changed in the field number of exceptions.
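How the four DMR search parameters interact can be sketched with a simple sliding window in Python. This is one possible reading of the description above, not DiMmer's actual algorithm. Assumptions: the function name find_dmrs is hypothetical; the input is a list of (position, p-value) tuples for a single chromosome, sorted by position; a distance exactly equal to the maximum is accepted; and overlapping qualifying windows are merged into one region.

```python
def find_dmrs(cpgs, p_cutoff=0.05, max_distance=1000, window_size=5, exceptions=2):
    """Return a list of (start_position, end_position, number_of_cpgs).

    cpgs: list of (position, p_value) on one chromosome, sorted by position.
    """
    significant = [p < p_cutoff for _, p in cpgs]
    regions = []  # [first_index, last_index] of merged qualifying windows
    for first in range(len(cpgs) - window_size + 1):
        last = first + window_size - 1
        # Every pair of neighbouring CpGs in the window must be close enough.
        gaps_ok = all(
            cpgs[i + 1][0] - cpgs[i][0] <= max_distance for i in range(first, last)
        )
        # Count the CpGs in the window that are not differentially methylated.
        misses = significant[first:last + 1].count(False)
        if not gaps_ok or misses > exceptions:
            continue
        if regions and first <= regions[-1][1]:
            regions[-1][1] = last  # overlaps the previous window: extend it
        else:
            regions.append([first, last])
    return [(cpgs[a][0], cpgs[b][0], b - a + 1) for a, b in regions]


# Made-up positions and p-values, for illustration only.
cpgs = [(100, 0.01), (300, 0.01), (500, 0.01), (700, 0.01), (900, 0.01),
        (5000, 0.01), (9000, 0.5)]
print(find_dmrs(cpgs))
```

In this reading, a region may start or end with a CpG that is not differentially methylated; trimming such CpGs from the region borders is left out to keep the sketch short.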