Getting started

Installation

IsoMut2py can be easily installed with pip, Python’s own package manager:

pip install isomut2py

This also installs all python dependencies, but not samtools. To install samtools on your computer, follow the instructions here.

Note: If you only wish to perform postprocessing and plotting steps on a preexisting set of mutations, you can skip the installation of samtools.

Basic mutation detection on a set of 10 samples

Preparations, downloading an example dataset

As a first step we import IsoMut2py, which also compiles the C scripts used for mutation detection.

[1]:
import isomut2py as im2
Compiling C scripts...
Done.

Next, we download an example dataset of 10 samples from the DT40 cell line. (The BAM files in this dataset only contain reads that have been aligned to the first chromosome of the reference genome for faster mutation detection.)

The variable exampleDataDir stores the path to the directory where you wish to download the example dataset. Make sure to sets its value appropriately.

Note: Downloading can take a while, if you have the example data already downloaded, skip this step.
[2]:
exampleDataDir = 'isomut2py_download_dir/'
[3]:
im2.examples.download_example_data(path=exampleDataDir)
Downloading file from URL "http://genomics.hu/tools/isomut2py/isomut2py_example_dataset.tar.gz" to isomut2py_download_dir/isomut2py_example_dataset.tar.gz
File size: 2.1GiB
Downloading might take a while...
Download completed in 0 day(s), 11 hour(s), 38 min(s), 32 sec(s).
----------------------------------------
Extracting downloaded file to isomut2py_download_dir
Extracting completed in 0 day(s), 0 hour(s), 0 min(s), 59 sec(s).

And now we load the basic settings for the example dataset into a directory with:

Note: Make sure to set the value of the exampleResultsDir to a path where fairly large temporary files can be stored.
[4]:
exampleResultsDir='isomut2py_results_dir'
[5]:
param = im2.examples.load_example_mutdet_settings(example_data_path = exampleDataDir,
                                                  output_dir = exampleResultsDir)

Mutation detection

Mutation detection can be performed with the commands below.

[6]:
mutDet = im2.mutationcalling.MutationCaller(**param)
mutDet.run_isomut2_mutdet()
File no_ploidy_info does not exist. Using constant default ploidy of 2.
2018-12-05 10:30:57 - Mutation detection with IsoMut2


        2018-12-05 10:30:57 - Running IsoMut2 without local realignment:


                2018-12-05 10:30:57 - Preparations for parallelization:

                        2018-12-05 10:30:57 - Defining parallel blocks ...
                        2018-12-05 10:30:57 - Done

                2018-12-05 10:30:57 - (All output files will be written to isomut2py_results_dir)

                2018-12-05 10:30:57 - Generating temporary files, total number of blocks to run: 101

                2018-12-05 10:30:57 - Currently running: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101
                2018-12-05 10:33:00 - Finished this round.

                2018-12-05 10:33:00 - Finalizing output...
                2018-12-05 10:33:00 - Cleaning up temporary files...

2018-12-05 10:33:01 - IsoMut2 finished. (0 day(s), 0 hour(s), 2 min(s), 4 sec(s).)

Further optimization

The results of the mutation detection are stored in files all_SNVs.isomut2 and all_indels.isomut2 in exampleResultsDir. These are essentially raw, unfiltered results of the mutation detection pipeline that can be refined with the optimize_results() command. For this, one has to supply a list of control samples where no unique mutations are expected to be found. (These can be starting clones of a sample group from which other samples are also available or normal samples of tumor-normal sample pairs or the same biological sample sequenced multiple times.)

In our case, file pairs (S03.bam, S05.bam) and (S08.bam, S10.bam) are two independent sequencing results of the same biological sample, thus no unique mutations should be present in any of these.

Setting these files as control samples and allowing no mutations in these samples filters other mutations as well, based on their score value:

[7]:
mutDet.optimize_results(control_samples=['S03.bam', 'S05.bam', 'S08.bam', 'S10.bam'], FPs_per_genome=0)

Interpreting the results, figure generation

Now the final set of mutations can be further investigated.

First it is generally a good idea to check the number of (unique) mutations in each investigated sample. Plotting only unique mutations allows us to check that indeed none of these were found in control samples. This can be achieved with the following command:

Note: Calling the magic command %matplotlib inline simply ensures that figures are visible in the jupyter notebook environment.
[8]:
%matplotlib inline
[10]:
f = mutDet.plot_mutation_counts(unique_only=True)
_images/getting_started_17_0.png

Given that the investigation was confined to the first chromosome, very meaningful conclusions should not be drawn from this example. The number of indels is extremely low, but in the case of SNVs it is apparent that control samples lack unique mutations, while other samples do not.

Further plotting options and additional tools for interpreting the results are discussed in Further analysis, visualization.

An already existing set of mutation caller results can also be analysed with isomut2py. On how to work with external files, see Importing external mutations.

On how to perform ploidy estimation and use the results for fine-tuning mutation detection, see Advanced ploidy estimation.