{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "\n", "## IsoMut2py: a Python module for the comprehensive detection and analysis of genomic mutations\n", "\n", "IsoMut2py is an easy-to-use tool for the detection and postprocessing of mutations from NGS data. It takes sets of aligned reads (BAM files) as its input. It can explore and compare the karyotypes of different samples, detect single nucleotide variations (SNVs) as well as insertions and deletions (indels) in single or multiple samples, and optimize the set of identified mutations when provided with a list of control samples. It can also plot mutation counts and spectra on readily interpretable charts and decompose the spectra into predefined reference signatures.\n", "\n", "IsoMut2py is an updated version of the original [IsoMut](https://github.com/genomicshu/isomut) \n", "software, mainly implemented in Python. The most time-consuming parts of the workflow are, \n", "however, written in C.\n", "\n", "\n", "## Features\n", "\n", "- easy installation, including dependencies, using ``pip``\n", "- **karyotype exploration** for a single sample, using a Bayesian approach\n", "- **karyotype comparison** between sample pairs, for a naive identification of CNVs\n", "- **karyotype plots**, **coverage histograms**\n", "- **SNV** and **indel** detection in **single or multiple samples**\n", "- detection of both **unique and shared mutations**\n", "- refined mutation detection based on local ploidy information\n", "- **automatic optimization** based on the user-defined list of control samples, \n", "with easily interpretable figures as sanity checks\n", "- option for loading and filtering a preexisting set of mutations\n", "- **basic hierarchical clustering** of samples based on the number of shared mutations\n", "- **plots of SBS, DBS and ID spectra** [source](https://www.biorxiv.org/content/early/2018/05/15/322859)\n", "- **decomposition of SBS, DBS and ID spectra** into a mixture of reference signatures\n", 
"using expectation maximization\n", "- **signature composition plots**\n", "- straightforward querying of sample details at mutated positions\n", "\n", "## Dependencies\n", "\n", "1. **samtools**: To use the functions for mutation calling or ploidy estimation, \n", "samtools must be installed. However, plotting and filtering of mutations are available \n", "without samtools.\n", "2. pandas\n", "3. numpy\n", "4. scipy\n", "5. matplotlib \n", "6. pymc3\n", "7. theano \n", "8. seaborn \n", "9. biopython\n", "\n", "Other than **samtools**, all dependencies can be automatically installed using ``pip``.\n", "\n", "## Authors\n", "\n", "Most of the code was written by Orsolya Pipek; the C code inherited directly \n", "from the original [IsoMut](https://github.com/genomicshu/isomut) software \n", "was written by Dezso Ribli.\n", "\n", "The project was carried out as a collaboration between:\n", "\n", "- Department of Physics of Complex Systems, Eotvos Lorand University \n", "(Orsolya Pipek, Dezso Ribli, Istvan Csabai)\n", "- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of \n", "Sciences (Adam Poti, David Szuts)\n", "- Center for Biological Sequence Analysis, Department of Systems Biology, \n", "Technical University of Denmark (Zoltan Szallasi)\n", "\n", "The C implementation of Fisher's exact test was borrowed from \n", "[Christopher Chang](https://github.com/chrchang/stats).\n", "\n", "## How to cite\n", "\n", "Coming soon." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.3" } }, "nbformat": 4, "nbformat_minor": 2 }