rna_majiq.PsiCoverage
- class rna_majiq.PsiCoverage(df, events)
Summarized raw and bootstrap coverage over LSVs for one or more experiments.
Summarized raw and bootstrap coverage over LSVs for one or more experiments as input for quantification. Coverage is a total readrate over all bins, excluding stacks and after any preceding batch correction steps, ready for quantification. Per-experiment coverage is stored independently over the “prefix” dimension, where prefixes originate from BAM file names (i.e. foo/experiment1.bam -> experiment1). Coverage is accompanied by a boolean array indicating whether an event is “passed for quantification” for each experiment (PsiCoverage.passed).
Provides functionality for combining and summarizing over experiments and multiple PsiCoverage objects. Functions and attributes enable computation of PSI posterior statistics under MAJIQ models for splicing quantification. Computations are performed over xarray objects. When loading PsiCoverage from Zarr files, data/computations will be loaded/performed lazily using Dask. Testing of these computations has been performed over local clusters using threads rather than processes (expensive computations generally release the GIL).
Generally, for point estimates of location, quantification with raw coverage should be preferred, as bootstrap estimates converge very closely to raw estimates as the number of bootstrap replicates becomes large. For estimates of variability, quantification with bootstrap coverage should be used to account for additional per-bin readrate variability that isn’t fully captured by the Bayesian model on its own.
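The prefix convention described above (foo/experiment1.bam -> experiment1) can be illustrated with a minimal sketch. The helper name `prefix_from_bam` is hypothetical and is not part of rna_majiq. It assumes the prefix is simply the file name with its directory and final extension removed, which matches the single example given here but may differ from how MAJIQ treats other naming patterns.

```python
from pathlib import PurePosixPath


def prefix_from_bam(bam_path: str) -> str:
    """Hypothetical helper (not rna_majiq API): derive a prefix from a BAM path.

    Assumption: the prefix is the file name without its directory and without
    its final extension, as in foo/experiment1.bam -> experiment1.
    """
    return PurePosixPath(bam_path).stem


# Illustrative paths only; these files are not expected to exist.
bam_paths = ["foo/experiment1.bam", "bar/experiment2.bam"]
prefixes = [prefix_from_bam(p) for p in bam_paths]
print(prefixes)
```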
Underlying coverage is stored as the total number of reads over the event and the proportion of reads per intron/junction. This requires twice the uncompressed memory vs the number of reads per intron/junction, but permits easier lazy computation with Dask over large datasets.
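The storage layout described above can be sketched for a single event in plain Python. The function names below are hypothetical, and the handling of events with zero total reads is an assumption made only for this illustration. The sketch shows why two values per connection are kept (the event total, repeated for each connection, and the proportion) and how per-connection reads are recovered as psi * total.

```python
from typing import List, Tuple


def to_total_and_psi(reads: List[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: convert per-connection reads for one event into
    (total, proportion) pairs, one pair per connection."""
    total = float(sum(reads))
    # The event total is repeated for every connection, so two numbers are
    # stored per connection instead of one.
    totals = [total] * len(reads)
    # Assumption for this sketch: proportions are 0.0 when the event has no reads.
    psi = [r / total if total > 0 else 0.0 for r in reads]
    return totals, psi


def to_reads(totals: List[float], psi: List[float]) -> List[float]:
    """Recover per-connection reads as psi * total."""
    return [t * p for t, p in zip(totals, psi)]


totals, psi = to_total_and_psi([30.0, 10.0])
reads = to_reads(totals, psi)
print(totals, psi, reads)
```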
- Parameters:
df (xarray.Dataset) – Required variables/coordinates as in EXPECTED_VARIABLES
events (xarray.Dataset) – dataset that can be loaded along with matching introns/junctions as Events
See also
PsiCoverage.from_sj_lsvs
Create PsiCoverage from SJExperiment and Events
PsiCoverage.from_events_coverage
Create PsiCoverage from EventsCoverage
PsiCoverage.from_zarr
Load PsiCoverage from one or more Zarr files
PsiCoverage.updated
Create updated PsiCoverage with updated arrays
PsiCoverage.sum
Summed PsiCoverage over current prefixes
PsiCoverage.mask_events
Create updated PsiCoverage passing only specified events
PsiCoverage.__getitem__
Get PsiCoverage for subset of prefixes
- __init__(df, events)
Initialize PsiCoverage with specified xarray datasets
- Parameters:
df (xarray.Dataset) – Required variables/coordinates as in EXPECTED_VARIABLES
events (xarray.Dataset) – dataset that can be loaded along with matching introns/junctions as Events
Methods
__init__(df, events) – Initialize PsiCoverage with specified xarray datasets
approximate_cdf(x, **indexer_kwargs) – Compute cdf of approximate/smoothed bootstrapped posterior
approximate_discretized_pmf([nbins, ...]) – Compute discretized PMF of approximate/smoothed bootstrap posterior
approximate_quantile([quantiles]) – Compute quantiles of approximate/smoothed bootstrapped posterior
approximate_stats(labels[, quantiles, ...]) – Statistics on approximate posterior means and psisamples.
bootstrap_cdf(x, **indexer_kwargs) – Compute cdf of mixture of bootstrapped posterior distribution
bootstrap_discretized_pmf([nbins, ...]) – Compute discretized PMF of bootstrap posterior mixture
empirical quantiles over prefixes of bootstrap_psi_mean
bootstrap_quantile([quantiles]) – Compute quantiles of mixture of bootstrapped posterior distributions
bootstrap_stats(labels[, quantiles, ...]) – Statistics on bootstrap posterior means and psisamples.
concat(*objs[, override_args, update_kwargs]) – Concatenate multiple instances of class into single one
convert_sj_batch(sjs, lsvs, path[, ...]) – Load PsiCoverage from sj paths, save to single output path
dataset([properties, quantiles, psibins, ...]) – Extract selected properties into single xr.Dataset
downsample(num_prefixes[, rng]) – Get random subset with exactly num_prefixes prefixes
drop_prefixes(prefixes)
events_to_zarr(path, mode[, consolidated]) – Save events information to specified path
from_events_coverage(events_coverage[, ...]) – Create PsiCoverage from EventsCoverage
from_sj_lsvs(sj, lsvs[, minreads, minbins, ...]) – Create PsiCoverage from SJExperiment and Events.
from_zarr(path[, ec_idx_nchunks, prefix_nchunks]) – Load PsiCoverage from one or more specified paths
get_events(introns, junctions) – Construct Events using saved dataset and introns, junctions
group([save_zarr, tmp_base_dir, ...]) – Create PsiGroup from coverage in self
mask_events(passed) – Return PsiCoverage passing only events that are passed in input
mock_with_psi_and_total(psi, total[, ...]) – Returns PsiCoverage over binary events with specified psi/total
passed_min_experiments([min_experiments_f]) – Return boolean mask array for events passing min_experiments
plot_violins(ec_idx[, nbins, ...]) – Plot posterior distributions over groups of prefixes
empirical quantiles over prefixes of raw_psi_mean
raw_stats(labels[, use_stats]) – Statistics on raw posterior means with respect to labels
raw_total_population_quantile([quantiles, ...]) – empirical quantiles over prefixes of raw_total
rename_prefixes(prefixes) – Rename prefixes as specified
split_prefixes([rng]) – Split class randomly into evenly sized parts
subset_mask(prefix_mask) – Subset class to selected prefixes (provided as boolean mask)
sum(new_prefix[, min_experiments_f]) – Create aggregated PsiCoverage with sum coverage over prefixes
to_zarr(path[, consolidated, show_progress]) – Save PsiCoverage to specified path
to_zarr_slice(path, prefix_slice) – Save PsiCoverage to specified path for specified slice on prefix
to_zarr_slice_init(path, events_df, ...[, ...]) – Initialize zarr store for saving PsiCoverage over many writes
updated(bootstrap_psi, raw_psi, **update_attrs) – Create updated PsiCoverage with new values of psi
Attributes
DIMS_BEFORE_PREFIX
EVENTS_EXPECTED_VARIABLES
EXPECTED_VARIABLES
alpha_prior
array(ec_idx) alpha parameter of prior distribution on PSI for connection
array(prefix, ec_idx) alpha parameter of approximated bootstrap posterior
approximate_alpha_plus_beta
array(prefix, ec_idx) beta parameter of approximated bootstrap posterior
beta_prior
array(ec_idx) beta parameter of prior distribution on PSI for connection
array(prefix, ec_idx, bootstrap_replicate) alpha parameter of bootstrapped posterior
array(prefix, ec_idx, bootstrap_replicate) beta parameter of bootstrapped posterior
array(prefix, ec_idx, bootstrap_replicate) bootstrapped raw_coverage
bootstrap_posterior_mean
array(...) means of mixtures of bootstrapped posteriors
bootstrap_posterior_std
array(...) standard deviations of bootstrap posterior distribution
bootstrap_posterior_variance
array(...) variances of mixtures of bootstrapped posteriors
bootstrap_psi
array(prefix, ec_idx, bootstrap_replicate) bootstrapped raw_psi
array(...) means of bootstrap posterior distribution on PSI (alias)
array(...) median of means of bootstrapped posteriors
array(...) median over prefixes of bootstrap_psi_mean
array(...) standard deviations of bootstrap posterior distribution (alias)
bootstrap_psi_variance
array(...) variances of bootstrap posterior distribution on PSI (alias)
array(prefix, ec_idx, bootstrap_replicate) bootstrapped raw_total
array(prefix, ec_idx) indicating if event passed
event_size
array(ec_idx) total number of connections from same event
lsv_idx
array(ec_idx) index identifying event it belongs to
lsv_offsets
array(e_offsets_idx) offsets for events into ec_idx
Number of bootstrap replicates used for bootstrapped coverage estimates
Total number of connections over all events
num_events
Total number of events
Number of prefixes for which an event was passed
Number of independent experiments
prefix_total
array(prefix) of total number of reads over entire experiment
Names of independent units of analysis
array(prefix, ec_idx) alpha parameter of raw posterior
raw_alpha_plus_beta
array(prefix, ec_idx) beta parameter of raw posterior
array(prefix, ec_idx) coverage for individual connection (psi * total)
raw_posterior_mean
array(...) means of raw posterior distribution on PSI
raw_posterior_std
array(...) standard deviations of raw posterior distribution
raw_posterior_variance
array(...) variances of raw posterior distribution on PSI
raw_psi
array(prefix, ec_idx) percentage of raw_total for connection
array(...) means of raw posterior distribution on PSI (alias)
array(...) median over prefixes of raw_psi_mean
array(...) standard deviations of raw posterior distribution (alias)
raw_psi_variance
array(...) variances of raw posterior distribution on PSI (alias)
array(prefix, ec_idx) raw total reads over event
raw_total_population_median
array(...) median over prefixes of raw_total
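Several attributes above describe means and variances of Beta posteriors on PSI, including means and variances of mixtures of bootstrapped posteriors. The following is a minimal sketch of the underlying arithmetic only, not of the rna_majiq implementation. All function names are hypothetical, and it assumes that each bootstrap replicate contributes one Beta(alpha, beta) posterior with equal weight in the mixture.

```python
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha: float, beta: float) -> float:
    """Variance of a Beta(alpha, beta) distribution."""
    s = alpha + beta
    return alpha * beta / (s * s * (s + 1.0))


def mixture_mean_variance(
    alphas: Sequence[float], betas: Sequence[float]
) -> Tuple[float, float]:
    """Mean and variance of an equally weighted mixture of Beta posteriors.

    Assumption: one Beta posterior per bootstrap replicate, equal weights.
    Variance follows the law of total variance:
    average within-component variance + variance of component means.
    """
    means = [beta_mean(a, b) for a, b in zip(alphas, betas)]
    variances = [beta_variance(a, b) for a, b in zip(alphas, betas)]
    mix_mean = fmean(means)
    mix_var = fmean(variances) + pvariance(means, mu=mix_mean)
    return mix_mean, mix_var


# Two illustrative replicates with made-up parameters.
mean, var = mixture_mean_variance([1.0, 3.0], [3.0, 1.0])
print(mean, var)
```

The second term in the mixture variance is what a single raw posterior cannot express: disagreement between replicates widens the mixture even when each component is narrow.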