Functions for plotting

Most of these functions are available as methods of a MutationCaller object. To make it easier to analyse any set of mutations, including ones from external sources, they can also be called as standalone functions.

isomut2py.plot.generate_HTML_report_for_ploidy_est(chromosomes, output_dir, min_noise=nan)

Generates an HTML file with figures displaying the results of ploidy estimation and saves it to output_dir/PEreport.html.

Parameters:
  • chromosomes – list of chromosomes in the genome (list of str)
  • output_dir – the path to the directory where PE_fullchrom_[chrom].txt files are located (str)
  • min_noise – the minimal B-allele frequency for a position to be included in the analyses (default: numpy.nan) (float)
isomut2py.plot.plot_DNV_heatmap(matrixDict, return_string=False, normalize_to_1=False)

Plot DNVs (dinucleotide variations) as a heatmap for a database of mutations.

Parameters:
  • matrixDict – a dictionary containing 12x12 element matrices as values and sample names as keys (dictionary)
  • return_string – If True, only a temporary plot is generated and its base64-encoded string is returned, which can be embedded in HTML files. (default: False) (bool)
  • normalize_to_1 – If True, results are plotted as percentages, instead of counts. (default: False) (bool)
Returns:

If the return_string value is True, a base64 encoded string of the image. Otherwise, nothing.
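When return_string is True, the figure comes back as a base64 string instead of being displayed. A minimal sketch of embedding such a string in an HTML page follows. It assumes the string is a base64-encoded PNG held in a Python str (neither detail is stated above), and make_img_tag is a hypothetical helper, not part of isomut2py.

```python
import base64


def make_img_tag(b64_image: str, alt: str = "plot", mime: str = "image/png") -> str:
    """Wrap a base64-encoded image string in an HTML <img> tag using a data URI.

    Assumption: the value returned with return_string=True is a base64-encoded
    PNG stored in a str; pass another mime type if that is not the case.
    """
    return f'<img src="data:{mime};base64,{b64_image}" alt="{alt}">'


# Stand-in for the value returned by plot_DNV_heatmap(..., return_string=True):
# only the 8-byte PNG signature, encoded as base64 text.
placeholder = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
html_snippet = make_img_tag(placeholder, alt="DNV heatmap")
print(html_snippet)
```

The same tag can be written into any HTML report, since data URIs need no separate image file.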

isomut2py.plot.plot_DNV_spectrum(spectrumDict, return_string=False, normalize_to_1=False)

Plots the DNV spectrum, given a dictionary containing the spectra as values.

Parameters:
  • spectrumDict – a dictionary containing DNV spectra as values and sample names as keys (dictionary)
  • normalize_to_1 – If True, results are plotted as percentages, instead of counts. (default: False) (bool)
  • return_string – If True, only a temporary plot is generated and its base64-encoded string is returned, which can be embedded in HTML files. (default: False) (bool)
Returns:

If the return_string value is True, a base64 encoded string of the image. Otherwise, a list of matplotlib figures.
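The normalize_to_1 option plots percentages instead of raw counts. A minimal sketch of that conversion, assuming each sample's vector is rescaled to percentages of its own total (the docstring does not say how empty samples are handled; here they become all-zero vectors). to_percentages is a hypothetical helper.

```python
def to_percentages(spectrum_dict):
    """Rescale each sample's count vector so that its entries sum to 100.

    Assumption: this mirrors what normalize_to_1=True does before plotting;
    samples with no mutations are mapped to all-zero vectors.
    """
    result = {}
    for sample, counts in spectrum_dict.items():
        total = sum(counts)
        result[sample] = [100.0 * c / total if total else 0.0 for c in counts]
    return result


# Toy 4-category spectra, for illustration only.
counts = {"sample_1": [1, 3, 0, 0], "sample_2": [0, 0, 0, 0]}
print(to_percentages(counts))
# {'sample_1': [25.0, 75.0, 0.0, 0.0], 'sample_2': [0.0, 0.0, 0.0, 0.0]}
```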

isomut2py.plot.plot_SNV_spectrum(spectrumDict, return_string=False, normalize_to_1=False)

Plots the triplet spectrum for the 96-element vectors stored as values in spectrumDict.

Parameters:
  • spectrumDict – a dictionary containing spectra as values and sample names as keys (dictionary)
  • normalize_to_1 – If True, results are plotted as percentages, instead of counts. (default: False) (bool)
  • return_string – If True, only a temporary plot is generated and its base64-encoded string is returned, which can be embedded in HTML files. (default: False) (bool)
Returns:

If the return_string value is True, a base64 encoded string of the image. Otherwise, a list of matplotlib figures.
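The 96 elements of a triplet spectrum correspond to 6 pyrimidine-centred substitution classes times 16 combinations of 5' and 3' flanking bases. The sketch below builds matching labels and a zero-filled spectrumDict entry. It assumes the common COSMIC SBS96 ordering; isomut2py's internal order may differ, and sbs96_labels is a hypothetical helper.

```python
from itertools import product

BASES = "ACGT"
# Pyrimidine-centred substitution classes, as in the common COSMIC SBS96 layout.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def sbs96_labels():
    """Return the 96 trinucleotide-context labels, e.g. 'A[C>A]G'.

    Assumption: substitution-major ordering, then 5' base, then 3' base,
    each in ACGT order; isomut2py may order its vectors differently.
    """
    return [f"{five}[{sub}]{three}"
            for sub, five, three in product(SUBSTITUTIONS, BASES, BASES)]


labels = sbs96_labels()
spectrum_dict = {"sample_1": [0] * len(labels)}  # one 96-element vector per sample
print(len(labels), labels[0], labels[-1])  # 96 A[C>A]A T[T>G]T
```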

isomut2py.plot.plot_coverage_distribution(cov_sample=None, chromosomes=None, output_dir=None, cov_max=None, cov_min=None, distribution_dict=None)

Plot the coverage distribution of the sample.

Parameters:
  • cov_sample – a sample of the coverage distribution (default: None) (array-like)
  • chromosomes – the list of chromosomes in the genome (default: None) (list of str)
  • output_dir – the path to the directory where PE_fullchrom_[chrom].txt files are located (default: None) (str)
  • cov_max – the maximum value for the coverage for a position to be included on the plot (default: None) (int)
  • cov_min – the minimum value for the coverage for a position to be included on the plot (default: None) (int)
  • distribution_dict – a dictionary containing the fitted parameters of the coverage distribution (default: None) (dictionary with keys: ‘mu’, ‘sigma’, ‘p’)
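The cov_min and cov_max parameters restrict which positions appear on the coverage plot. A minimal sketch of that filter on a coverage sample, assuming both bounds are inclusive and that None means the bound is not applied (neither is confirmed above); filter_coverage is a hypothetical helper.

```python
from collections import Counter


def filter_coverage(cov_sample, cov_min=None, cov_max=None):
    """Keep coverage values within [cov_min, cov_max].

    Assumption: bounds are inclusive, and a bound set to None is ignored.
    """
    return [c for c in cov_sample
            if (cov_min is None or c >= cov_min) and (cov_max is None or c <= cov_max)]


cov_sample = [0, 3, 12, 12, 15, 240]
kept = filter_coverage(cov_sample, cov_min=5, cov_max=200)
histogram = Counter(kept)  # coverage value -> number of positions
print(kept, dict(histogram))  # [12, 12, 15] {12: 2, 15: 1}
```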
isomut2py.plot.plot_hierarchical_clustering(sample_names=None, mutations_dataframe=None, mutations_filename=None, output_dir=None, return_string=False, method='average')

Generates a heatmap based on the number of shared mutations found in all possible sample pairs, together with a dendrogram obtained by hierarchical clustering of the samples.

Parameters:
  • mutations_dataframe – The dataframe containing the mutations. (default: None) (pandas.DataFrame)
  • sample_names – list of samples names to plot mutation counts for (default: None) (list of str)
  • return_string – If True, only a temporary plot is generated and its base64-encoded string is returned, which can be embedded in HTML files. (default: False) (bool)
  • mutations_filename – The path to the file where mutations are stored. If the mutations attribute of the object does not exist, it is loaded from the file defined here. (default: None) (str)
  • output_dir – the path to the directory where mutation tables are located (default: None) (str)
  • method – method used for seaborn hierarchical clustering (default: ‘average’) (“single”, “complete”, “average”, “weighted”, “median”, “ward”)
Returns:

If the return_string value is True, a base64 encoded string of the image. Otherwise, a list of matplotlib figures.
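The heatmap is built from the number of mutations shared by each pair of samples. A minimal sketch of that pairwise count, assuming each mutation is identified by a (chrom, pos, ref, mut) key; shared_mutation_matrix is a hypothetical helper, not the function's actual internals. A square matrix like this is the kind of input that a clustering heatmap such as seaborn's clustermap (which also takes a method argument) works on.

```python
def shared_mutation_matrix(mutations_by_sample):
    """Count mutations shared by every pair of samples.

    mutations_by_sample maps a sample name to a set of mutation keys,
    assumed here to be (chrom, pos, ref, mut) tuples. Returns the sorted
    sample names and a square matrix whose [i][j] entry is the number of
    mutations both samples carry; the diagonal holds each sample's total.
    """
    names = sorted(mutations_by_sample)
    matrix = [[len(mutations_by_sample[a] & mutations_by_sample[b]) for b in names]
              for a in names]
    return names, matrix


muts = {
    "S1": {("1", 100, "A", "T"), ("1", 200, "C", "G")},
    "S2": {("1", 100, "A", "T")},
    "S3": {("2", 50, "G", "A")},
}
names, matrix = shared_mutation_matrix(muts)
print(names)   # ['S1', 'S2', 'S3']
print(matrix)  # [[2, 1, 0], [1, 1, 0], [0, 0, 1]]
```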

isomut2py.plot.plot_indel_spectrum(spectrumDict, return_string=False, normalize_to_1=False)

Plots the indel spectrum, given a dictionary containing 83-element vectors as values.

Parameters:
  • spectrumDict – a dictionary containing spectra as values and sample names as keys (dictionary)
  • normalize_to_1 – If True, results are plotted as percentages, instead of counts. (default: False) (bool)
  • return_string – If True, only a temporary plot is generated and its base64-encoded string is returned, which can be embedded in HTML files. (default: False) (bool)
Returns:

If the return_string value is True, a base64 encoded string of the image. Otherwise, a list of matplotlib figures.
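For orientation, the sketch below generates 83 indel category labels in the widely used COSMIC ID83 style: 1 bp deletions and insertions of C or T by homopolymer length, longer indels at repeats, and deletions with microhomology. That isomut2py's 83-element vectors follow this exact layout and order is an assumption not confirmed by the docstring; id83_labels is a hypothetical helper.

```python
def id83_labels():
    """Return 83 indel category labels in the COSMIC ID83 style.

    Assumption: isomut2py's 83-element indel vectors use this layout;
    the ordering used internally by isomut2py may differ.
    """
    labels = []
    # 1 bp deletions and insertions of C or T, by homopolymer length class 0-5.
    for kind in ("Del", "Ins"):
        for base in ("C", "T"):
            labels += [f"1:{kind}:{base}:{n}" for n in range(6)]
    # Longer indels (2, 3, 4, 5+ bp) at repeats, by repeat count class 0-5.
    for kind in ("Del", "Ins"):
        for size in range(2, 6):
            labels += [f"{size}:{kind}:R:{n}" for n in range(6)]
    # Deletions with microhomology: 1..size-1 bp for sizes 2-4, 1..5 bp for 5+.
    for size in range(2, 6):
        max_mh = size - 1 if size < 5 else 5
        labels += [f"{size}:Del:M:{m}" for m in range(1, max_mh + 1)]
    return labels


labels = id83_labels()
print(len(labels), labels[0], labels[-1])  # 83 1:Del:C:0 5:Del:M:5
```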

isomut2py.plot.plot_karyotype_for_all_chroms(chromosomes, output_dir, return_string=False)

Plots karyotype information (coverage, estimated ploidy, estimated LOH, reference base frequencies) about the sample for all analysed chromosomes.

Parameters:
  • chromosomes – the list of chromosomes to plot (list of str)
  • output_dir – the path to the directory where PE_fullchrom_[chrom].txt files are stored. (str)
  • return_string – If True, only a temporary plot is generated and its base64-encoded string is returned, which can be embedded in HTML files. (default: False) (bool)
Returns:

If the return_string value is True, a base64 encoded string of the image. Otherwise, nothing.

isomut2py.plot.plot_karyotype_for_chrom(chrom, df, return_string=True)

Plots karyotype information (coverage, estimated ploidy, estimated LOH, reference base frequencies) about the sample for a given chromosome.

Parameters:
  • chrom – The chromosome to plot. (str)
  • df – The dataframe containing ploidy and LOH information. (pandas.DataFrame)
  • return_string – If True, only a temporary plot is generated and its base64-encoded string is returned, which can be embedded in HTML files. (default: True) (bool)
Returns:

If the return_string value is True, a base64 encoded string of the image. Otherwise, a matplotlib figure.

isomut2py.plot.plot_karyotype_summary(haploid_coverage, chromosomes, chrom_length, output_dir, bed_filename, bed_file_sep=', ', binsize=1000000, overlap=50000, cov_min=5, cov_max=200, min_PL_length=3000000, chroms_with_text=None)

Prepares the data and plots a karyotype summary for the whole genome.

Parameters:
  • haploid_coverage – the average coverage of haploid regions (or half the average coverage of diploid regions)
  • chromosomes – list of chromosomes in the genome (list of str)
  • chrom_length – list of chromosome lengths (list of int)
  • output_dir – the path to the directory where PE_fullchrom_[chrom].txt files are located (str)
  • bed_filename – the path to the bed file of the sample with ploidy and LOH information (str)
  • bed_file_sep – bed file separator (default: ‘,’) (str)
  • binsize – the binsize used for moving average (default: 1000000) (int)
  • overlap – the overlap used for moving average (default: 50000) (int, smaller than binsize)
  • cov_min – the minimum coverage for a position to be included (default: 5) (int)
  • cov_max – the maximum coverage for a position to be included (default: 200) (int)
  • min_PL_length – the minimal length of a region to be plotted (default: 3000000) (int)
  • chroms_with_text – the list of chromosomes to be labelled with text on the plot (default: None) (list of str) (If there are many short chromosomes or their names are long, it is useful to label only a few of them.)
Returns:

a matplotlib figure
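Two of the parameters above act on the bed file before plotting: bed_file_sep sets the column separator and min_PL_length drops short regions. A minimal sketch of that preprocessing, assuming a hypothetical column layout of chrom, start, end, ploidy, LOH and an inclusive length threshold; read_long_regions is not part of isomut2py.

```python
def read_long_regions(lines, sep=",", min_PL_length=3_000_000):
    """Parse bed-like lines and keep regions at least min_PL_length bp long.

    Assumption (hypothetical layout): columns are chrom, start, end, ploidy, LOH.
    Returns a list of (chrom, start, end, ploidy, loh) tuples.
    """
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        chrom, start, end, ploidy, loh = line.split(sep)[:5]
        start, end = int(start), int(end)
        if end - start >= min_PL_length:
            regions.append((chrom, start, end, float(ploidy), int(loh)))
    return regions


bed_lines = [
    "chr1,0,5000000,2,0",
    "chr1,5000000,6000000,3,0",  # only 1 Mb long: dropped
    "chr2,0,4000000,1,1",
]
print(read_long_regions(bed_lines))
# [('chr1', 0, 5000000, 2.0, 0), ('chr2', 0, 4000000, 1.0, 1)]
```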

isomut2py.plot.plot_mutation_counts(sample_names=None, mutations_dataframe=None, unique_only=False, return_string=False, mutations_filename=None, output_dir=None, control_samples=None)

Plots the number of mutations found in all the samples in different ploidy regions.

Parameters:
  • mutations_dataframe – The dataframe containing the mutations. (default: None) (pandas.DataFrame)
  • sample_names – list of samples names to plot mutation counts for (default: None) (list of str)
  • unique_only – If True, only unique mutations are plotted for each sample. (default: False) (bool)
  • return_string – If True, only a temporary plot is generated and its base64-encoded string is returned, which can be embedded in HTML files. (default: False) (bool)
  • mutations_filename – The path to the file where mutations are stored. If the mutations attribute of the object does not exist, it is loaded from the file defined here. (default: None) (str)
  • output_dir – the path to the directory where mutation tables are located (default: None) (str)
  • control_samples – List of sample names to be used as control samples, in the sense that no unique mutations are expected in them. The sample names listed here must be a subset of the sample names listed in bam_filename. (default: None) (list of str)
Returns:

If the return_string value is True, a base64 encoded string of the image. Otherwise, a list of matplotlib figures.
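The unique_only and control_samples options both rest on the idea of a unique mutation, one that occurs in exactly one sample. A minimal sketch of per-sample counting under that definition, assuming mutations are given as (sample, chrom, pos, ref, mut) tuples; count_mutations and check_controls are hypothetical helpers, and the per-ploidy breakdown of the real plot is not reproduced.

```python
from collections import Counter


def count_mutations(mutations, unique_only=False):
    """Count mutations per sample.

    mutations: iterable of (sample, chrom, pos, ref, mut) tuples (assumed format).
    With unique_only=True, only mutations present in exactly one sample count.
    """
    rows = set(mutations)  # ignore exact duplicates
    carriers = Counter((chrom, pos, ref, mut) for _, chrom, pos, ref, mut in rows)
    counts = Counter()
    for sample, chrom, pos, ref, mut in rows:
        if not unique_only or carriers[(chrom, pos, ref, mut)] == 1:
            counts[sample] += 1
    return counts


def check_controls(unique_counts, control_samples):
    """Return the control samples that nevertheless carry unique mutations."""
    return [s for s in control_samples if unique_counts.get(s, 0) > 0]


rows = [
    ("S1", "1", 100, "A", "T"), ("S2", "1", 100, "A", "T"),  # shared by S1 and S2
    ("S1", "1", 200, "C", "G"),                               # unique to S1
    ("ctrl", "2", 50, "G", "A"),                              # unique to ctrl
]
print(sorted(count_mutations(rows).items()))                   # [('S1', 2), ('S2', 1), ('ctrl', 1)]
unique = count_mutations(rows, unique_only=True)
print(sorted(unique.items()), check_controls(unique, ["ctrl"]))  # [('S1', 1), ('ctrl', 1)] ['ctrl']
```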

isomut2py.plot.plot_rainfall(mutations_dataframe, chromosomes=None, chrom_length=None, ref_fasta=None, sample_names=None, return_string=False, muttypes=['SNV', 'INS', 'DEL'], unique_only=True, plot_range=None)

Plots a rainfall plot of the mutations. The horizontal axis is the genomic position of each mutation, and the vertical axis is its genomic distance from the previous mutation.
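The vertical coordinate of a rainfall plot can be computed by sorting mutation positions and taking consecutive differences. A minimal sketch, assuming distances are measured within each chromosome and that the first mutation on a chromosome (which has no predecessor) is left out; how isomut2py handles that case is not stated. intermutation_distances is a hypothetical helper.

```python
def intermutation_distances(positions_by_chrom):
    """Return (chrom, position, distance to previous mutation) triples.

    Positions are sorted within each chromosome; the first mutation on a
    chromosome has no predecessor and is therefore left out (an assumption).
    """
    points = []
    for chrom, positions in positions_by_chrom.items():
        ordered = sorted(positions)
        for previous, current in zip(ordered, ordered[1:]):
            points.append((chrom, current, current - previous))
    return points


print(intermutation_distances({"chr1": [300, 100, 150], "chr2": [1000]}))
# [('chr1', 150, 50), ('chr1', 300, 150)]
```

Clusters of closely spaced mutations show up as points with small vertical values.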

Parameters:
  • mutations_dataframe – the pandas.DataFrame containing all mutations (pandas.DataFrame)
  • chromosomes – a list of chromosomes to be plotted (default: None) (list of str)
  • chrom_length – a list of chrom lengths (default: None) (list of int)
  • ref_fasta – the path to the reference fasta file (default: None) (str)
  • sample_names – the list of sample names to be plotted (default: None) (list of str)
  • return_string – If True, only a temporary plot is generated and its base64-encoded string is returned, which can be embedded in HTML files. (default: False) (bool)
  • muttypes – the list of mutation types to be plotted (default: [“SNV”, “INS”, “DEL”]) (any elements of the default list)
  • unique_only – If True, only unique mutations are plotted for each sample. (default: True) (bool)
  • plot_range – the genomic range to be plotted (default: None, the whole genome is plotted) (str, example: “chr9:123134-143441414”)
Returns:

If the return_string value is True, a base64 encoded string of the image. Otherwise, a list of matplotlib figures.
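The plot_range parameter takes a string of the form "chr9:123134-143441414". A minimal sketch of splitting such a string into its parts is shown below. isomut2py's own parsing and validation rules are not documented here, so the strict format check and the start-not-after-end rule are assumptions; parse_plot_range is a hypothetical helper.

```python
import re

# Assumed format: '<chrom>:<start>-<end>' with integer coordinates.
_RANGE = re.compile(r"([^:]+):(\d+)-(\d+)")


def parse_plot_range(plot_range):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    match = _RANGE.fullmatch(plot_range)
    if match is None:
        raise ValueError(f"expected 'chrom:start-end', got {plot_range!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start ({start}) is greater than end ({end})")
    return chrom, start, end


print(parse_plot_range("chr9:123134-143441414"))  # ('chr9', 123134, 143441414)
```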