rna_majiq.PsiCoverage

class rna_majiq.PsiCoverage(df, events)

Summarized raw and bootstrap coverage over LSVs for one or more experiments.

PsiCoverage holds this summarized coverage as input for quantification. Coverage is the total readrate over all bins, excluding stacks and after any preceding batch correction steps. Per-experiment coverage is stored independently along the “prefix” dimension, where each prefix is derived from the BAM file name (e.g. foo/experiment1.bam -> experiment1). Coverage is accompanied by a boolean array indicating whether each event is “passed for quantification” in each experiment (PsiCoverage.passed).
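
The prefix convention described above can be sketched with the standard library. This is only an illustration: prefix_from_bam is a hypothetical helper, not part of rna_majiq, and it assumes the prefix is simply the file name with its directory and final extension removed.

```python
from pathlib import Path


def prefix_from_bam(path: str) -> str:
    """Hypothetical helper: drop the directory and the final extension."""
    return Path(path).stem


# Hypothetical input paths; only the first mirrors the example in the text.
bam_paths = ["foo/experiment1.bam", "bar/experiment2.bam"]
prefixes = [prefix_from_bam(p) for p in bam_paths]
print(prefixes)
```

Under this assumption, two BAM files with the same base name in different directories would map to the same prefix, so input names should be unique.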

Provides functionality for combining and summarizing over experiments and multiple PsiCoverage objects. Functions and attributes enable computation of PSI posterior statistics under MAJIQ models for splicing quantification. Computations are performed over xarray objects. When PsiCoverage is loaded from Zarr files, data are loaded and computations are performed lazily using Dask. These computations have been tested on local clusters using threads rather than processes (expensive computations generally release the GIL).

Generally, for point estimates of location, quantification with raw coverage should be preferred, as bootstrap estimates converge very closely to raw estimates as the number of bootstrap replicates becomes large. For estimates of variability, quantification with bootstrap coverage should be used to account for additional per-bin readrate variability that isn’t fully captured by the Bayesian model on its own.
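
The distinction between raw and bootstrap estimates can be sketched numerically. The example below assumes a Beta posterior obtained by a conjugate update (alpha = alpha_prior + coverage, beta = beta_prior + total - coverage); this update rule, the prior values, and the coverage numbers are all illustrative assumptions rather than the MAJIQ implementation. The variance of an equal-weight mixture over bootstrap replicates follows the law of total variance.

```python
from statistics import fmean, pvariance


def beta_mean(alpha: float, beta: float) -> float:
    return alpha / (alpha + beta)


def beta_variance(alpha: float, beta: float) -> float:
    s = alpha + beta
    return alpha * beta / (s * s * (s + 1.0))


def posterior_params(coverage, total, alpha_prior, beta_prior):
    """Assumed conjugate Beta update (illustrative only)."""
    return alpha_prior + coverage, beta_prior + (total - coverage)


# Hypothetical prior and coverage for one connection in one experiment.
alpha_prior, beta_prior = 0.5, 0.5
raw_coverage, raw_total = 30.0, 100.0
# Hypothetical bootstrap replicates as (coverage, total) pairs.
bootstraps = [(27.0, 96.0), (33.0, 104.0), (30.0, 100.0)]

raw_a, raw_b = posterior_params(raw_coverage, raw_total, alpha_prior, beta_prior)
raw_mean = beta_mean(raw_a, raw_b)
raw_var = beta_variance(raw_a, raw_b)

params = [posterior_params(c, t, alpha_prior, beta_prior) for c, t in bootstraps]
means = [beta_mean(a, b) for a, b in params]
variances = [beta_variance(a, b) for a, b in params]

# Equal-weight mixture: mean of means, and
# variance = mean of variances + variance of means.
mixture_mean = fmean(means)
mixture_var = fmean(variances) + pvariance(means)

print(raw_mean, mixture_mean)
print(raw_var, mixture_var)
```

With these made-up numbers the two means nearly coincide, while the mixture variance exceeds the raw variance because it also includes the spread between replicates.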

Underlying coverage is stored as the total number of reads over the event and the proportion of reads per intron/junction. This requires twice the uncompressed memory of storing the number of reads per intron/junction directly, but permits easier lazy computation with Dask over large datasets.
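
The storage layout above can be illustrated for a single event. This minimal sketch uses plain Python lists and made-up read counts; the variable names are illustrative, and the real arrays are xarray objects indexed by prefix and ec_idx.

```python
# Hypothetical reads for the three connections of one event.
reads_per_connection = [12.0, 30.0, 18.0]

# Stored form: the event total (repeated per connection) and the proportion.
total = sum(reads_per_connection)
stored_total = [total] * len(reads_per_connection)
stored_psi = [r / total for r in reads_per_connection]

# Per-connection coverage is recovered as psi * total.
recovered = [p * t for p, t in zip(stored_psi, stored_total)]

# Two values are kept per connection instead of one.
values_stored = len(stored_total) + len(stored_psi)
print(recovered, values_stored)
```

Keeping the total alongside the proportion means neither has to be recomputed with a grouped reduction over connections, which is the kind of operation that is awkward to express lazily.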

Parameters:
  • df (xarray.Dataset) – Required variables/coordinates as in EXPECTED_VARIABLES

  • events (xarray.Dataset) – dataset that can be loaded along with matching introns/junctions as Events

See also

PsiCoverage.from_sj_lsvs

Create PsiCoverage from SJExperiment and Events

PsiCoverage.from_events_coverage

Create PsiCoverage from EventsCoverage

PsiCoverage.from_zarr

Load PsiCoverage from one or more Zarr files

PsiCoverage.updated

Create updated PsiCoverage with updated arrays

PsiCoverage.sum

Summed PsiCoverage over current prefixes

PsiCoverage.mask_events

Create updated PsiCoverage passing only specified events

PsiCoverage.__getitem__

Get PsiCoverage for subset of prefixes

__init__(df, events)

Initialize PsiCoverage with specified xarray datasets

Parameters:
  • df (xarray.Dataset) – Required variables/coordinates as in EXPECTED_VARIABLES

  • events (xarray.Dataset) – dataset that can be loaded along with matching introns/junctions as Events

Methods

__init__(df, events)

Initialize PsiCoverage with specified xarray datasets

approximate_cdf(x, **indexer_kwargs)

Compute CDF of approximate/smoothed bootstrapped posterior

approximate_discretized_pmf([nbins, ...])

Compute discretized PMF of approximate/smoothed bootstrap posterior

approximate_quantile([quantiles])

Compute quantiles of approximate/smoothed bootstrapped posterior

approximate_stats(labels[, quantiles, ...])

Statistics on approximate posterior means and psisamples.

bootstrap_cdf(x, **indexer_kwargs)

Compute CDF of mixture of bootstrapped posterior distribution

bootstrap_discretized_pmf([nbins, ...])

Compute discretized PMF of bootstrap posterior mixture

bootstrap_psi_mean_population_quantile([...])

Empirical quantiles over prefixes of bootstrap_psi_mean

bootstrap_quantile([quantiles])

Compute quantiles of mixture of bootstrapped posterior distributions

bootstrap_stats(labels[, quantiles, ...])

Statistics on bootstrap posterior means and psisamples.

concat(*objs[, override_args, update_kwargs])

Concatenate multiple instances of class into single one

convert_sj_batch(sjs, lsvs, path[, ...])

Load PsiCoverage from sj paths, save to single output path

dataset([properties, quantiles, psibins, ...])

Extract selected properties into single xr.Dataset

downsample(num_prefixes[, rng])

Get random subset with exactly num_prefixes prefixes

drop_prefixes(prefixes)

events_to_zarr(path, mode[, consolidated])

Save events information to specified path

from_events_coverage(events_coverage[, ...])

Create PsiCoverage from EventsCoverage

from_sj_lsvs(sj, lsvs[, minreads, minbins, ...])

Create PsiCoverage from SJExperiment and Events.

from_zarr(path[, ec_idx_nchunks, prefix_nchunks])

Load PsiCoverage from one or more specified paths

get_events(introns, junctions)

Construct Events using saved dataset and introns, junctions

group([save_zarr, tmp_base_dir, ...])

Create PsiGroup from coverage in self

mask_events(passed)

Return PsiCoverage passing only events that are passed in input

mock_with_psi_and_total(psi, total[, ...])

Returns PsiCoverage over binary events with specified psi/total

passed_min_experiments([min_experiments_f])

Return boolean mask array for events passing min_experiments

plot_violins(ec_idx[, nbins, ...])

Plot posterior distributions over groups of prefixes

raw_psi_mean_population_quantile([...])

Empirical quantiles over prefixes of raw_psi_mean

raw_stats(labels[, use_stats])

Statistics on raw posterior means with respect to labels

raw_total_population_quantile([quantiles, ...])

Empirical quantiles over prefixes of raw_total

rename_prefixes(prefixes)

Rename prefixes as specified

split_prefixes([rng])

Split class randomly into evenly sized parts

subset_mask(prefix_mask)

Subset class to selected prefixes (provided as boolean mask)

sum(new_prefix[, min_experiments_f])

Create aggregated PsiCoverage with sum coverage over prefixes

to_zarr(path[, consolidated, show_progress])

Save PsiCoverage to specified path

to_zarr_slice(path, prefix_slice)

Save PsiCoverage to specified path for specified slice on prefix

to_zarr_slice_init(path, events_df, ...[, ...])

Initialize zarr store for saving PsiCoverage over many writes

updated(bootstrap_psi, raw_psi, **update_attrs)

Create updated PsiCoverage with new values of psi
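
As a rough illustration of what aggregating over prefixes (as in sum and passed_min_experiments) could look like, the sketch below pools made-up coverage for one binary event. It does not call rna_majiq. The rule in required_experiments (values below 1 read as a fraction of prefixes, otherwise as a count) and the choice to sum coverage from every prefix regardless of its passed flag are assumptions made for this example, not behavior stated in this reference.

```python
import math


def required_experiments(min_experiments_f: float, num_prefixes: int) -> int:
    """Hypothetical rule: < 1 is a fraction of prefixes, otherwise a count."""
    if min_experiments_f < 1:
        needed = math.ceil(min_experiments_f * num_prefixes)
    else:
        needed = math.ceil(min_experiments_f)
    # Clamp to the valid range [1, num_prefixes].
    return max(1, min(needed, num_prefixes))


# Hypothetical per-prefix reads for the two connections of one event.
coverage = {
    "experiment1": [8.0, 2.0],
    "experiment2": [15.0, 5.0],
    "experiment3": [1.0, 0.0],
}
# Hypothetical per-prefix "passed for quantification" flags for the event.
passed = {"experiment1": True, "experiment2": True, "experiment3": False}

# Sum coverage over prefixes (assumption: all prefixes contribute).
summed = [sum(values) for values in zip(*coverage.values())]
pooled_total = sum(summed)
pooled_psi = [c / pooled_total for c in summed]

# The aggregated event passes if enough individual experiments passed.
num_passed = sum(passed.values())
group_passed = num_passed >= required_experiments(0.5, len(coverage))

print(summed, pooled_psi, group_passed)
```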

Attributes

DIMS_BEFORE_PREFIX

EVENTS_EXPECTED_VARIABLES

EXPECTED_VARIABLES

alpha_prior

array(ec_idx) alpha parameter of prior distribution on PSI for connection

approximate_alpha

array(prefix, ec_idx) alpha parameter of approximated bootstrap posterior

approximate_alpha_plus_beta

approximate_beta

array(prefix, ec_idx) beta parameter of approximated bootstrap posterior

beta_prior

array(ec_idx) beta parameter of prior distribution on PSI for connection

bootstrap_alpha

array(prefix, ec_idx, bootstrap_replicate) alpha parameter of bootstrapped posterior

bootstrap_beta

array(prefix, ec_idx, bootstrap_replicate) beta parameter of bootstrapped posterior

bootstrap_coverage

array(prefix, ec_idx, bootstrap_replicate) bootstrapped raw_coverage

bootstrap_posterior_mean

array(...) means of mixtures of bootstrapped posteriors

bootstrap_posterior_std

array(...) standard deviations of bootstrap posterior distribution

bootstrap_posterior_variance

array(...) variances of mixtures of bootstrapped posteriors

bootstrap_psi

array(prefix, ec_idx, bootstrap_replicate) bootstrapped raw_psi

bootstrap_psi_mean

array(...) means of bootstrap posterior distribution on PSI (alias)

bootstrap_psi_mean_legacy

array(...) median of means of bootstrapped posteriors

bootstrap_psi_mean_population_median

array(...) median over prefixes of bootstrap_psi_mean

bootstrap_psi_std

array(...) standard deviations of bootstrap posterior distribution (alias)

bootstrap_psi_variance

array(...) variances of bootstrap posterior distribution on PSI (alias)

bootstrap_total

array(prefix, ec_idx, bootstrap_replicate) bootstrapped raw_total

event_passed

array(prefix, ec_idx) indicating if event passed

event_size

array(ec_idx) total number of connections from same event

lsv_idx

array(ec_idx) index identifying event it belongs to

lsv_offsets

array(e_offsets_idx) offsets for events into ec_idx

num_bootstraps

Number of bootstrap replicates used for bootstrapped coverage estimates

num_connections

Total number of connections over all events

num_events

Total number of events

num_passed

Number of prefixes for which an event was passed

num_prefixes

Number of independent experiments

prefix_total

array(prefix) of total number of reads over entire experiment

prefixes

Names of independent units of analysis

raw_alpha

array(prefix, ec_idx) alpha parameter of raw posterior

raw_alpha_plus_beta

raw_beta

array(prefix, ec_idx) beta parameter of raw posterior

raw_coverage

array(prefix, ec_idx) coverage for individual connection (psi * total)

raw_posterior_mean

array(...) means of raw posterior distribution on PSI

raw_posterior_std

array(...) standard deviations of raw posterior distribution

raw_posterior_variance

array(...) variances of raw posterior distribution on PSI

raw_psi

array(prefix, ec_idx) proportion of raw_total for connection

raw_psi_mean

array(...) means of raw posterior distribution on PSI (alias)

raw_psi_mean_population_median

array(...) median over prefixes of raw_psi_mean

raw_psi_std

array(...) standard deviations of raw posterior distribution (alias)

raw_psi_variance

array(...) variances of raw posterior distribution on PSI (alias)

raw_total

array(prefix, ec_idx) raw total reads over event

raw_total_population_median

array(...) median over prefixes of raw_total
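
The relationship between lsv_offsets, lsv_idx, event_size, num_events, and num_connections can be sketched as follows. The example assumes that the offsets are cumulative, with one more entry than the number of events, so that event i owns connections offsets[i] up to but not including offsets[i + 1]. That layout and the values used are assumptions for illustration.

```python
# Hypothetical offsets for 3 events with 2, 3, and 2 connections.
lsv_offsets = [0, 2, 5, 7]

num_events = len(lsv_offsets) - 1
num_connections = lsv_offsets[-1]

lsv_idx = []     # per connection: index of the event it belongs to
event_size = []  # per connection: number of connections in its event
for event, (start, end) in enumerate(zip(lsv_offsets, lsv_offsets[1:])):
    size = end - start
    lsv_idx.extend([event] * size)
    event_size.extend([size] * size)

print(num_events, num_connections)
print(lsv_idx)
print(event_size)
```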