rna_majiq.PsiCoverage
- class rna_majiq.PsiCoverage(df, events)
Summarized raw and bootstrap coverage over LSVs for one or more experiments.
Summarized raw and bootstrap coverage over LSVs for one or more experiments as input for quantification. Coverage is the total readrate over all bins, excluding stacks and after any preceding batch correction steps, ready for quantification. Per-experiment coverage is stored independently over the “prefix” dimension, where prefixes originate from BAM file names (e.g. foo/experiment1.bam -> experiment1). Coverage is accompanied by a boolean array indicating whether an event is “passed for quantification” for each experiment (PsiCoverage.passed).
Provides functionality for combining and summarizing over experiments and multiple PsiCoverage objects. Functions and attributes enable computation of PSI posterior statistics under MAJIQ models for splicing quantification. Computations are performed over xarray objects. When loading PsiCoverage from Zarr files, data/computations will be loaded/performed lazily using Dask. Testing of these computations has been performed over local clusters using threads rather than processes (expensive computations generally release the GIL).
Generally, for point estimates of location, quantification with raw coverage should be preferred, as bootstrap estimates converge very closely to raw estimates as the number of bootstrap replicates becomes large. For estimates of variability, quantification with bootstrap coverage should be used to account for additional per-bin readrate variability that isn’t fully captured by the Bayesian model on its own.
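The difference between raw and bootstrap estimates can be illustrated with a small standalone sketch. It assumes each posterior on PSI is a Beta(alpha, beta) distribution and that the bootstrap posterior is an equal-weight mixture over replicates; the parameter values and helper names below are hypothetical and are not part of the library. The mixture variance follows the law of total variance, so it includes the spread of replicate means on top of the average within-replicate variance.

```python
from statistics import fmean, pvariance


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total * total * (total + 1.0))


def mixture_moments(params):
    """Mean and variance of an equal-weight mixture of Beta distributions.

    params is a list of (alpha, beta) pairs, one per bootstrap replicate.
    """
    means = [beta_mean(a, b) for a, b in params]
    variances = [beta_variance(a, b) for a, b in params]
    mean = fmean(means)
    # Law of total variance: average within-replicate variance
    # plus the variance of the replicate means.
    variance = fmean(variances) + pvariance(means)
    return mean, variance


# Hypothetical posterior parameters for one connection in one experiment.
raw_params = (10.5, 30.5)
bootstrap_params = [(8.5, 32.5), (10.5, 30.5), (12.5, 28.5)]

raw_mean = beta_mean(*raw_params)
raw_var = beta_variance(*raw_params)
boot_mean, boot_var = mixture_moments(bootstrap_params)
```

In this toy case the bootstrap mixture mean matches the raw posterior mean, while the mixture variance is larger because it also reflects variability between replicates.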
Underlying coverage is stored as the total number of reads over the event and the proportion of reads per intron/junction. This requires twice the uncompressed memory vs the number of reads per intron/junction, but permits easier lazy computation with Dask over large datasets.
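As a rough illustration of this representation, the sketch below converts per-connection read counts for one event into a total and per-connection proportions, recovers coverage as psi * total, and combines two experiments by summing totals and weighting proportions by each experiment's total. The function names and the handling of zero totals are assumptions made for the example, not the library's implementation. In PsiCoverage itself both arrays are indexed per connection, which is why the uncompressed footprint is doubled.

```python
def to_total_and_psi(counts):
    """Convert per-connection read counts for one event to (total, psi)."""
    total = float(sum(counts))
    if total == 0.0:
        # Assumption for this sketch: report zero proportions when no reads.
        return total, [0.0 for _ in counts]
    return total, [c / total for c in counts]


def coverage(total, psi):
    """Recover per-connection coverage as psi * total."""
    return [p * total for p in psi]


def sum_over_prefixes(per_prefix):
    """Combine (total, psi) pairs from several experiments for one event.

    The summed total is the sum of totals; the summed proportion for each
    connection is the total-weighted average of per-experiment proportions.
    """
    new_total = sum(total for total, _ in per_prefix)
    n_connections = len(per_prefix[0][1])
    if new_total == 0.0:
        return new_total, [0.0] * n_connections
    new_psi = [
        sum(total * psi[i] for total, psi in per_prefix) / new_total
        for i in range(n_connections)
    ]
    return new_total, new_psi


# Hypothetical counts for one event with two connections in two experiments.
exp1 = to_total_and_psi([30, 10])
exp2 = to_total_and_psi([5, 15])
combined_total, combined_psi = sum_over_prefixes([exp1, exp2])
```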
- Parameters:
df (xarray.Dataset) – Required variables/coordinates as in EXPECTED_VARIABLES
events (xarray.Dataset) – dataset that can be loaded along with matching introns/junctions as Events
See also
PsiCoverage.from_sj_lsvs – Create PsiCoverage from SJExperiment and Events
PsiCoverage.from_events_coverage – Create PsiCoverage from EventsCoverage
PsiCoverage.from_zarr – Load PsiCoverage from one or more Zarr files
PsiCoverage.updated – Create updated PsiCoverage with updated arrays
PsiCoverage.sum – Summed PsiCoverage over current prefixes
PsiCoverage.mask_events – Create updated PsiCoverage passing only specified events
PsiCoverage.__getitem__ – Get PsiCoverage for subset of prefixes
- __init__(df, events)
Initialize PsiCoverage with specified xarray datasets
- Parameters:
df (xarray.Dataset) – Required variables/coordinates as in EXPECTED_VARIABLES
events (xarray.Dataset) – dataset that can be loaded along with matching introns/junctions as Events
Methods
__init__(df, events) – Initialize PsiCoverage with specified xarray datasets
approximate_cdf(x, **indexer_kwargs) – Compute cdf of approximate/smoothed bootstrapped posterior
approximate_discretized_pmf([nbins, ...]) – Compute discretized PMF of approximate/smoothed bootstrap posterior
approximate_quantile([quantiles]) – Compute quantiles of approximate/smoothed bootstrapped posterior
bootstrap_cdf(x, **indexer_kwargs) – Compute cdf of mixture of bootstrapped posterior distribution
bootstrap_discretized_pmf([nbins, ...]) – Compute discretized PMF of bootstrap posterior mixture
empirical quantiles over prefixes of bootstrap_psi_mean
bootstrap_quantile([quantiles]) – Compute quantiles of mixture of bootstrapped posterior distributions
convert_sj_batch(sjs, lsvs, path[, ...]) – Load PsiCoverage from sj paths, save to single output path
dataset([properties, quantiles, psibins, ...]) – Extract selected properties into single xr.Dataset
events_to_zarr(path, mode[, consolidated]) – Save events information to specified path
from_events_coverage(events_coverage[, ...]) – Create PsiCoverage from EventsCoverage
from_sj_lsvs(sj, lsvs[, minreads, minbins, ...]) – Create PsiCoverage from SJExperiment and Events.
from_zarr(path[, ec_idx_nchunks, ...]) – Load PsiCoverage from one or more specified paths
get_events(introns, junctions) – Construct Events using saved dataset and introns, junctions
mask_events(passed) – Return PsiCoverage passing only events that are passed in input
passed_min_experiments([min_experiments_f]) – Return boolean mask array for events passing min_experiments
empirical quantiles over prefixes of raw_psi_mean
sum(new_prefix[, min_experiments_f]) – Create aggregated PsiCoverage with sum coverage over prefixes
to_zarr(path[, consolidated, show_progress]) – Save PsiCoverage to specified path
to_zarr_slice(path, prefix_slice) – Save PsiCoverage to specified path for specified slice on prefix
to_zarr_slice_init(path, events_df, ...[, ...]) – Initialize zarr store for saving PsiCoverage over many writes
updated(bootstrap_psi, raw_psi, **update_attrs) – Create updated PsiCoverage with new values of psi
Attributes
alpha_prior – array(ec_idx) alpha parameter of prior distribution on PSI for connection
array(prefix, ec_idx) alpha parameter of approximated bootstrap posterior
approximate_alpha_plus_beta – array(prefix, ec_idx) beta parameter of approximated bootstrap posterior
beta_prior – array(ec_idx) beta parameter of prior distribution on PSI for connection
array(prefix, ec_idx, bootstrap_replicate) alpha parameter of bootstrapped posterior
array(prefix, ec_idx, bootstrap_replicate) beta parameter of bootstrapped posterior
array(prefix, ec_idx, bootstrap_replicate) bootstrapped raw_coverage
bootstrap_posterior_mean – array(...) means of mixtures of bootstrapped posteriors
bootstrap_posterior_std – array(...) standard deviations of bootstrap posterior distribution
bootstrap_posterior_variance – array(...) variances of mixtures of bootstrapped posteriors
bootstrap_psi – array(prefix, ec_idx, bootstrap_replicate) bootstrapped raw_psi
array(...) means of bootstrap posterior distribution on PSI (alias)
array(...) median of means of bootstrapped posteriors
array(...) median over prefixes of bootstrap_psi_mean
array(...) standard deviations of bootstrap posterior distribution (alias)
bootstrap_psi_variance – array(...) variances of bootstrap posterior distribution on PSI (alias)
array(prefix, ec_idx, bootstrap_replicate) bootstrapped raw_total
array(prefix, ec_idx) indicating if event passed
event_size – array(ec_idx) total number of connections from same event
lsv_idx – array(ec_idx) index identifying event it belongs to
lsv_offsets – array(e_offsets_idx) offsets for events into ec_idx
Number of bootstrap replicates used for bootstrapped coverage estimates
Total number of connections over all events
num_events – Total number of events
Number of prefixes for which an event was passed
Number of independent experiments
prefix_total – array(prefix) of total number of reads over entire experiment
Names of independent units of analysis
array(prefix, ec_idx) alpha parameter of raw posterior
raw_alpha_plus_beta – array(prefix, ec_idx) beta parameter of raw posterior
array(prefix, ec_idx) coverage for individual connection (psi * total)
raw_posterior_mean – array(...) means of raw posterior distribution on PSI
raw_posterior_std – array(...) standard deviations of raw posterior distribution
raw_posterior_variance – array(...) variances of raw posterior distribution on PSI
raw_psi – array(prefix, ec_idx) percentage of raw_total for connection
array(...) means of raw posterior distribution on PSI (alias)
array(...) median over prefixes of raw_psi_mean
array(...) standard deviations of raw posterior distribution (alias)
raw_psi_variance – array(...) variances of raw posterior distribution on PSI (alias)
array(prefix, ec_idx) raw total reads over event
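The relationship between lsv_offsets, lsv_idx and event_size can be sketched in plain Python. This assumes a common offsets convention that the page does not spell out: the offsets array has one more entry than the number of events, starts at 0, and event i owns connections offsets[i] up to (but not including) offsets[i + 1]. The helper names are hypothetical and the library's actual layout may differ.

```python
def event_index_per_connection(offsets):
    """For each connection, the index of the event it belongs to."""
    idx = []
    for event, (start, end) in enumerate(zip(offsets[:-1], offsets[1:])):
        idx.extend([event] * (end - start))
    return idx


def event_size_per_connection(offsets):
    """For each connection, the number of connections in its event."""
    sizes = []
    for start, end in zip(offsets[:-1], offsets[1:]):
        sizes.extend([end - start] * (end - start))
    return sizes


# Hypothetical offsets: two events with 2 and 3 connections respectively.
offsets = [0, 2, 5]

num_events = len(offsets) - 1   # number of events under this convention
num_connections = offsets[-1]   # total connections over all events
lsv_idx = event_index_per_connection(offsets)
event_size = event_size_per_connection(offsets)

# Slicing a per-connection array by event uses consecutive offsets.
psi = [0.75, 0.25, 0.5, 0.25, 0.25]
psi_by_event = [psi[start:end] for start, end in zip(offsets[:-1], offsets[1:])]
```

Under this convention, per-connection arrays over ec_idx can be grouped by event with simple slices, and the proportions within each slice sum to one.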