API reference

This page provides an auto-generated summary of MAJIQ’s API. For more details and examples, refer to the relevant chapters in the main part of the documentation.

Random number generation

MAJIQ uses its own pool of random number generators to handle multithreaded random number generation. This pool is separate from numpy or dask random number generation. If a task involving random numbers needs more than one thread, call rng_resize() first to size the pool of RNGs to match.

rng_seed(seed)

Set seed for random number generator pools

rng_resize(n)

Resize rng pools to allow n simultaneous threads
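
The idea of a seeded pool with one generator per thread can be sketched with the Python standard library. This is only an illustration of the concept, not MAJIQ's implementation: the RngPool class, its methods, and the seeding scheme are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class RngPool:
    """Hypothetical pool holding one independent generator per worker slot."""

    def __init__(self, seed, n=1):
        self._seed = seed
        self._rngs = []
        self.resize(n)

    def resize(self, n):
        # Rebuild the pool; slot i is seeded deterministically from (seed, i).
        self._rngs = [random.Random(self._seed * 1_000_003 + i) for i in range(n)]

    def draw(self, slot):
        # Each worker only touches its own slot, so no generator is shared.
        return self._rngs[slot].random()


pool = RngPool(seed=42, n=4)
with ThreadPoolExecutor(max_workers=4) as executor:
    draws = list(executor.map(pool.draw, range(4)))
```

Because every slot has its own seeded generator, the draws are reproducible regardless of how the threads are scheduled.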

Build API

MAJIQ builds a SpliceGraph object from GFF3 annotations and coverage from BAMs. The splicegraph is later used to define Events for quantification.

Classes

SJExperiment

Spliced junction and retained intron coverage for the same experiment.

SJJunctionsBins

Per-bin read coverage over junctions

SJIntronsBins

Per-bin read coverage over introns

SpliceGraph

Representation of all possible splicing changes in each gene.

Contigs

Collection of contigs/chromosomes on which genes can be defined

Genes

Collection of genes on Contigs with their coordinates, and ids

Exons

Collection of exons per gene and their annotated/updated coordinates

GeneIntrons

Collection of introns per gene and their coordinates, flags, and exons

GeneJunctions

Collection of junctions per gene and their coordinates, flags, and exons

ExonConnections

Map from exons to the introns and junctions that start or end from them

ExperimentThresholds

Thresholds on intron/junction coverage for inclusion in splicegraph

GroupJunctionsGenerator

Accumulator of SJJunctionsBins that pass per-experiment thresholds

PassedJunctionsGenerator

Accumulator of GroupJunctionsGenerator to create updated GeneJunctions

GroupIntronsGenerator

Accumulator of SJIntronsBins that pass per-experiment thresholds to update GeneIntrons flags

SimplifierGroup

Accumulator of SJExperiment to update simplifier flags

GeneJunctionsAccumulator

Accumulate GeneJunctions objects into combined GeneJunctions

Create a splicegraph from GFF3

The first step in MAJIQ is to build a splicegraph from transcriptome annotations (GFF3).

SpliceGraph.from_gff3(path[, process_ir, ...])

Create SpliceGraph from GFF3 transcriptome annotations

Save/load splicegraphs to zarr

These splicegraphs are saved and loaded with the following commands.

SpliceGraph.to_zarr(store[, mode])

Save SpliceGraph to specified path/store

SpliceGraph.from_zarr(store[, genes])

Load SpliceGraph from specified path/store

Process BAMs for junction/intron coverage

Updating the annotated splicegraph requires information about coverage from RNA-seq experiments, which is represented by SJExperiment objects.

SJExperiment.from_bam(path, sg[, ...])

Load SJExperiment from BAM file

SJExperiment.to_zarr(store[, mode, consolidated])

Save SJExperiment to specified path/store

SJExperiment.from_zarr(store)

Load SJExperiment from specified path

SJExperiment.introns

SJIntronsBins with intron coverage for this experiment

SJExperiment.junctions

SJJunctionsBins with junction coverage for this experiment

Update SpliceGraph structure, passed flags

SpliceGraphs are generally updated in the following manner:

Update junctions using coverage

GeneJunctions can be updated using coverage from SJExperiment objects. This is done by:

GeneJunctions.builder()

Create PassedJunctionsGenerator starting from these junctions

GeneJunctions.build_group(exons)

Create GroupJunctionsGenerator starting from these junctions and exons

GroupJunctionsGenerator.add_experiment(...)

Add SJJunctionsBins experiment to build group

PassedJunctionsGenerator.add_group(group[, ...])

Update passed junctions with GroupJunctionsGenerator build group

PassedJunctionsGenerator.get_passed([...])

Return GeneJunctions with updated flags and novel junctions
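
The logic behind per-experiment thresholds and build groups can be sketched in plain Python. This is a conceptual sketch under stated assumptions, not the library's code: the threshold names min_reads, min_nonzero_bins, and min_experiments, their default values, and the dictionary layout are all hypothetical.

```python
def junction_passes(bin_reads, min_reads=3, min_nonzero_bins=2):
    """Return True if per-bin coverage for one junction meets both thresholds."""
    total_reads = sum(bin_reads)
    nonzero_bins = sum(1 for reads in bin_reads if reads > 0)
    return total_reads >= min_reads and nonzero_bins >= min_nonzero_bins


def group_passed(experiments, min_experiments=2):
    """Junction ids that pass per-experiment thresholds in enough experiments."""
    counts = {}
    for coverage in experiments:  # one dict per experiment: junction id -> per-bin reads
        for junction, bin_reads in coverage.items():
            if junction_passes(bin_reads):
                counts[junction] = counts.get(junction, 0) + 1
    return {junction for junction, n in counts.items() if n >= min_experiments}


# Three hypothetical experiments in one build group
experiments = [
    {"j1": [2, 0, 3], "j2": [5, 0, 0]},
    {"j1": [1, 1, 1], "j2": [0, 0, 0]},
    {"j1": [0, 0, 0], "j2": [4, 1, 0]},
]
passed = group_passed(experiments)
```

Here j1 passes in two experiments and is kept, while j2 passes in only one (its first experiment has enough reads but only one nonzero bin).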

Update junctions from other junctions

Updated GeneJunctions can also be created by loading junctions from previous splicegraphs and combining them using GeneJunctionsAccumulator. Note that they must all share the same Genes object, which can be done by setting the genes argument of GeneJunctions.from_zarr().

GeneJunctionsAccumulator.add(junctions[, ...])

Add GeneJunctions to accumulator

GeneJunctionsAccumulator.accumulated()

Return GeneJunctions with previously added junctions

Update exons

Update Exons to match updated GeneJunctions.

Exons.infer_with_junctions(junctions[, ...])

Return updated Exons accommodating novel junctions per gene

Update introns

Generally, updated introns are obtained by:

Exons.empty_introns()

Return empty GeneIntrons that match these exons

Exons.potential_introns([make_simplified])

GeneIntrons enumerating all possible introns between exons

GeneIntrons.update_flags_from(donor_introns)

Update flags using overlapping donor GeneIntrons

GeneIntrons.build_group()

Create GroupIntronsGenerator to update these introns in place

GroupIntronsGenerator.add_experiment(sj_introns)

Add SJIntronsBins experiment to build group

GroupIntronsGenerator.update_introns([...])

In-place update of original GeneIntrons flags passing group filters

GeneIntrons.filter_passed([keep_annotated, ...])

Return GeneIntrons subset that all passed build filters

Update SpliceGraph

The updated splicegraph is made by:

SpliceGraph.from_components(contigs, genes, ...)

Construct SpliceGraph with given components

SpliceGraph.with_updated_exon_connections(...)

Create SpliceGraph from exon connections with same genes

ExonConnections.create_connecting(exons, ...)

Create ExonConnections mapping exons to introns, junctions

Update simplifier flags

Simplifier flags allow excluding introns and junctions that pass reliability thresholds (raw read rates/nonzero bins) but have negligible coverage relative to the events they are a part of (PSI). These flags are updated in place by creating a SimplifierGroup, which accumulates SJExperiment objects per group and updates intron/junction flags for a group using SimplifierGroup.update_connections().

GeneIntrons._simplify_all()

Set all connections to the simplified state

GeneJunctions._simplify_all()

Set all connections to the simplified state

GeneIntrons._unsimplify_all()

Set all connections to the unsimplified state

GeneJunctions._unsimplify_all()

Set all connections to the unsimplified state

ExonConnections.simplifier()

Create SimplifierGroup to unsimplify introns and junctions

SimplifierGroup.add_experiment(sj[, ...])

Add SJExperiment to simplification group

SimplifierGroup.update_connections([...])

In-place update of connections passing thresholds in enough experiments
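
The simplification idea (start with every connection simplified, then unsimplify those with non-negligible PSI in enough experiments) can be sketched as follows. This is a conceptual illustration only: the function name, the min_psi and min_experiments parameters, and the data layout are assumptions, not the SimplifierGroup API.

```python
def unsimplify(events_per_experiment, min_psi=0.01, min_experiments=1):
    """Return connections whose share of event coverage is non-negligible."""
    votes = {}
    for events in events_per_experiment:  # one list of events per experiment
        for event in events:  # event: dict of connection id -> reads
            total = sum(event.values())
            for connection, reads in event.items():
                psi = reads / total if total > 0 else 0.0
                if psi >= min_psi:
                    votes[connection] = votes.get(connection, 0) + 1
    return {c for c, n in votes.items() if n >= min_experiments}


# Two hypothetical experiments covering the same event with connections a, b, c
experiments = [
    [{"a": 990, "b": 5, "c": 5}],
    [{"a": 50, "b": 50, "c": 0}],
]
unsimplified = unsimplify(experiments)
still_simplified = {"a", "b", "c"} - unsimplified
```

Connection c never reaches the PSI threshold in any experiment, so it remains simplified even though it has reads.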

Events API

An event is defined by a reference exon and connections (junctions and/or introns) that all start or end at the reference exon. The Events class represents a collection of these events as arrays over events (e_idx) and connections per event (ec_idx). The mapping from events to event connections is specified by offsets yielding the start/end indexes of ec_idx for each event. The events use indexes to refer back to the splicegraph/exon connections that were used to create them.

UniqueEventsMasks and Events.unique_events_mask() allow identification of events that are unique or shared between two Events objects. This is useful for analyses involving multiple splicegraphs derived from a common splicegraph (e.g. a common set of controls).
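
What "unique" and "shared" mean can be sketched with plain tuples. This is an illustration under assumptions, not the library's data model: the unique_mask function and the tuple layout (gene id, reference exon, direction, connections) are hypothetical.

```python
def unique_mask(self_events, other_events):
    """Return (unique, shared) boolean masks over the events in self_events."""
    other_keys = set(other_events)
    shared = [event in other_keys for event in self_events]
    unique = [not is_shared for is_shared in shared]
    return unique, shared


# Hypothetical event keys: (gene id, reference exon, direction, connections)
case_events = [
    ("geneA", (100, 200), "source", ((200, 300), (200, 450))),
    ("geneA", (400, 500), "target", ((200, 400), (350, 400))),
    ("geneB", (10, 90), "source", ((90, 150), (90, 220))),
]
control_events = [
    ("geneA", (100, 200), "source", ((200, 300), (200, 450))),
    ("geneB", (10, 90), "source", ((90, 150), (90, 300))),
]
unique, shared = unique_mask(case_events, control_events)
```

In this sketch, two events only count as shared when the reference exon, direction, and every connection match, so the geneB event is unique to the cases because one of its junctions differs.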

Classes

Events

Collections of introns/junctions all starting or ending at the same exon

UniqueEventsMasks

Masks between two Events objects over e_idx for unique/shared events

Create/save events objects

ExonConnections.lsvs([select_lsvs])

Construct Events for all LSVs defined by exon connections

ExonConnections.constitutive()

Construct Events for all constitutive events in ExonConnections

PsiCoverage.get_events(introns, junctions)

Construct Events using saved dataset and introns, junctions

PsiControlsSummary.get_events(introns, junctions)

Construct Events using saved dataset and introns, junctions

Events.to_zarr(store, mode[, consolidated])

Save Events to specified path/store

Events.from_zarr(store, introns, junctions)

Load Events from specified path/store

Work with events objects

Events.unique_events_mask(other)

Get UniqueEventsMasks with shared events and events unique to self

Events.exons

Exons over which events defined

Events.introns

Introns over which events defined

Events.junctions

Junctions over which events defined

Events.df

xr.Dataset with event and event connections information

Events.ec_dataframe([annotated, ...])

pd.DataFrame over event connections detailing genomic information

Information on unique events

Events.e_idx

Index over unique events

Events.ref_exon_idx

Index into self.exons for reference exon of each unique event

Events.event_type

Indicator if source ('s') or target ('t') for each unique event

Events.ec_idx_start

First index into event connections (ec_idx) for each unique event

Events.ec_idx_end

One-past-end index into event connections (ec_idx) for each unique event

Events.connections_slice_for_event(event_idx)

Get slice into event connections for event with specified index
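
The offset layout described by ec_idx_start and ec_idx_end can be illustrated with plain Python lists. The values below are made up, and slice_for_event is a hypothetical stand-in that only mirrors the role of Events.connections_slice_for_event().

```python
# Three events with 2, 3, and 2 connections, stored back to back
ec_idx_start = [0, 2, 5]
ec_idx_end = [2, 5, 7]  # one past the last connection of each event

# One flag per event connection (array over ec_idx)
is_intron = [False, False, False, False, True, False, False]


def slice_for_event(event_idx):
    """Slice into per-connection arrays for the event at event_idx."""
    return slice(ec_idx_start[event_idx], ec_idx_end[event_idx])


second_event_is_intron = is_intron[slice_for_event(1)]
connections_per_event = [end - start for start, end in zip(ec_idx_start, ec_idx_end)]
```

Because the end index is one past the last connection, the end of one event equals the start of the next, and the number of connections per event is simply end minus start.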

Information on connections per event

Events.ec_idx

Index over event connections

Events.is_intron

Indicator if an intron or junction for each event connection

Events.connection_idx

Index into self.introns or self.junctions for each event connection

Events.connection_gene_idx([ec_idx])

Index into self.genes for selected event connections

Events.connection_start([ec_idx])

Start coordinate for each selected event connection

Events.connection_end([ec_idx])

End coordinate for each selected event connection

Events.connection_denovo([ec_idx])

Indicator if connection was denovo for each selected event connection

Events.connection_ref_exon_idx

Index into self.exons for reference exon for each event connection

Events.connection_other_exon_idx([ec_idx])

Index into self.exons for nonreference exon for each event connection

PsiCoverage API

PsiCoverage describes coverage over Events in one or more independent “prefixes”. PsiCoverage can be created over events for a single experiment using PsiCoverage.from_sj_lsvs(). The prefix is determined by the prefix of the original BAM file, which is where the name “prefix” originates. New PsiCoverage files can subsequently be created by loading existing files together or by aggregating coverage over multiple prefixes. Finally, PsiCoverage provides attributes and functions that enable lazy computation of PSI posterior quantities using xarray/Dask.

Classes

PsiCoverage

Summarized raw and bootstrap coverage over LSVs for one or more experiments.

Create/save PsiCoverage

Create PsiCoverage from SJ coverage

PsiCoverage.from_sj_lsvs(sj, lsvs[, ...])

Create PsiCoverage from SJExperiment and Events.

Save PsiCoverage to zarr

PsiCoverage.to_zarr(path[, consolidated, ...])

Save PsiCoverage to specified path

PsiCoverage.to_zarr_slice_init(path, ...[, ...])

Initialize zarr store for saving PsiCoverage over many writes

PsiCoverage.to_zarr_slice(path, prefix_slice)

Save PsiCoverage to specified path for specified slice on prefix

Load and update PsiCoverage

PsiCoverage.from_zarr(path[, ...])

Load PsiCoverage from one or more specified paths

PsiCoverage.updated(bootstrap_psi, raw_psi, ...)

Create updated PsiCoverage with new values of psi

PsiCoverage.sum(new_prefix[, min_experiments_f])

Create aggregated PsiCoverage with sum coverage over prefixes

PsiCoverage.mask_events(passed)

Return PsiCoverage passing only events that are passed in input

Events/prefixes with coverage

PsiCoverage.num_connections

Total number of connections over all events

PsiCoverage.get_events(introns, junctions)

Construct Events using saved dataset and introns, junctions

PsiCoverage.num_prefixes

Number of independent experiments

PsiCoverage.prefixes

Names of independent units of analysis

PsiCoverage.event_passed

array(prefix, ec_idx) indicating if event passed

PsiCoverage.num_passed

Number of prefixes for which an event was passed

PsiCoverage.passed_min_experiments([...])

Return boolean mask array for events passing min_experiments
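
The relationship between event_passed, num_passed, and a minimum-experiments mask can be sketched with nested lists. This is a hypothetical illustration: min_experiments_mask is not a library function, and treating values below 1 as a fraction of prefixes is an assumption that this page does not confirm.

```python
def min_experiments_mask(event_passed, min_experiments_f=0.5):
    """Mask over connections that passed in enough prefixes."""
    num_prefixes = len(event_passed)
    # Count, per connection, the prefixes in which it passed
    num_passed = [sum(column) for column in zip(*event_passed)]
    if min_experiments_f < 1:
        # Assumption: values below 1 are a fraction of the prefixes
        required = min_experiments_f * num_prefixes
    else:
        # Assumption: otherwise an absolute count, capped at the number of prefixes
        required = min(min_experiments_f, num_prefixes)
    return [n >= required for n in num_passed]


# Rows are prefixes, columns are event connections (made-up values)
event_passed = [
    [True, True, False, False],
    [True, False, False, True],
    [True, True, False, False],
]
mask_half = min_experiments_mask(event_passed, 0.5)
mask_all = min_experiments_mask(event_passed, 3)
```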

Raw coverage/posteriors

PsiCoverage.raw_total

array(prefix, ec_idx) raw total reads over event

PsiCoverage.raw_coverage

array(prefix, ec_idx) coverage for individual connection (psi * total)

PsiCoverage.raw_alpha

array(prefix, ec_idx) alpha parameter of raw posterior

PsiCoverage.raw_beta

array(prefix, ec_idx) beta parameter of raw posterior

PsiCoverage.raw_psi_mean

array(...) means of raw posterior distribution on PSI (alias)

PsiCoverage.raw_psi_std

array(...) standard deviations of raw posterior distribution (alias)

PsiCoverage.raw_psi_mean_population_median

array(...) median over prefixes of raw_psi_mean

PsiCoverage.raw_psi_mean_population_quantile([...])

empirical quantiles over prefixes of raw_psi_mean
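
Given alpha and beta parameters of a beta posterior, the mean and standard deviation follow from the standard beta distribution formulas. The sketch below applies them to made-up values laid out as (prefix, ec_idx) and takes a median over prefixes. How MAJIQ derives alpha and beta from coverage is not shown here, and the helper names are hypothetical.

```python
import math
import statistics


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_std(alpha, beta):
    """Standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return math.sqrt(alpha * beta / (total**2 * (total + 1)))


# Rows are prefixes, columns are event connections (made-up values)
raw_alpha = [[8.0, 2.0], [6.0, 4.0], [9.0, 1.0]]
raw_beta = [[2.0, 8.0], [4.0, 6.0], [1.0, 9.0]]

psi_mean = [
    [beta_mean(a, b) for a, b in zip(row_a, row_b)]
    for row_a, row_b in zip(raw_alpha, raw_beta)
]
# Median over prefixes for each event connection
population_median = [statistics.median(column) for column in zip(*psi_mean)]
```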

Bootstrap coverage/posteriors

PsiCoverage.num_bootstraps

Number of bootstrap replicates used for bootstrapped coverage estimates

PsiCoverage.bootstrap_total

array(prefix, ec_idx, bootstrap_replicate) bootstrapped raw_total

PsiCoverage.bootstrap_coverage

array(prefix, ec_idx, bootstrap_replicate) bootstrapped raw_coverage

PsiCoverage.bootstrap_alpha

array(prefix, ec_idx, bootstrap_replicate) alpha parameter of bootstrapped posterior

PsiCoverage.bootstrap_beta

array(prefix, ec_idx, bootstrap_replicate) beta parameter of bootstrapped posterior

PsiCoverage.bootstrap_psi_mean

array(...) means of bootstrap posterior distribution on PSI (alias)

PsiCoverage.bootstrap_psi_mean_legacy

array(...) median of means of bootstrapped posteriors

PsiCoverage.bootstrap_psi_std

array(...) standard deviations of bootstrap posterior distribution (alias)

PsiCoverage.bootstrap_psi_mean_population_median

array(...) median over prefixes of bootstrap_psi_mean

PsiCoverage.bootstrap_psi_mean_population_quantile([...])

empirical quantiles over prefixes of bootstrap_psi_mean

Beta approximation to bootstrap mixture coverage/posteriors

PsiCoverage.approximate_alpha

array(prefix, ec_idx) alpha parameter of approximated bootstrap posterior

PsiCoverage.approximate_beta

array(prefix, ec_idx) beta parameter of approximated bootstrap posterior

PsiCoverage.approximate_quantile([quantiles])

Compute quantiles of approximate/smoothed bootstrapped posterior

PsiCoverage.approximate_discretized_pmf([...])

Compute discretized PMF of approximate/smoothed bootstrap posterior
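
One common way to approximate an equal-weight mixture of beta distributions by a single beta distribution is to match the mixture's mean and variance. The sketch below shows that technique in isolation. Whether MAJIQ computes approximate_alpha and approximate_beta exactly this way is an assumption, and the function name is hypothetical.

```python
def moment_matched_beta(alphas, betas):
    """Single Beta(alpha, beta) matching the mean/variance of an equal-weight mixture."""
    means = [a / (a + b) for a, b in zip(alphas, betas)]
    variances = [
        a * b / ((a + b) ** 2 * (a + b + 1)) for a, b in zip(alphas, betas)
    ]
    n = len(means)
    mean = sum(means) / n
    # Mixture second moment is the average of per-component second moments
    second_moment = sum(v + m * m for v, m in zip(variances, means)) / n
    variance = second_moment - mean * mean
    # For a beta distribution, alpha + beta = mean * (1 - mean) / variance - 1
    concentration = mean * (1 - mean) / variance - 1
    return mean * concentration, (1 - mean) * concentration


# Identical replicates recover the original parameters
same_alpha, same_beta = moment_matched_beta([3.0, 3.0], [5.0, 5.0])
# Disagreeing replicates yield a wider (less concentrated) approximation
wide_alpha, wide_beta = moment_matched_beta([8.0, 2.0], [2.0, 8.0])
```

The second call shows why a mixture matters: two confident but conflicting replicates produce a much less concentrated approximation than either replicate alone.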

Quantifier API

DeltaPsi (replicate PsiCoverage)

DPsiPrior([a, pmix])

Prior on DeltaPsi as weighted mixture of beta distributions (over [-1, 1])

DPsiPrior.empirical_update(psi1, psi2[, ...])

Use reliable binary events from psi1,2 to return updated prior

DeltaPsi(psi1, psi2, prior[, psibins, ...])

Compute DeltaPsi between two groups of PsiCoverage (replicate assumption)

DeltaPsi.dataset([nchunks])

Reduce to DeltaPsiDataset used for VOILA visualization

DeltaPsi.bootstrap_posterior

DeltaPsiPMF for average bootstrapped dpsi posteriors

DeltaPsiPMF(p)

Specialization of PMFSummaries for DeltaPsi on [-1, 1]

DeltaPsiPMF.mean

expectation of position in bins

DeltaPsiPMF.standard_deviation

standard deviation (sqrt of variance)

DeltaPsiPMF.probability_changing([...])

Probability that abs(dPSI) > changing_threshold

DeltaPsiPMF.probability_nonchanging([...])

Probability that abs(dPSI) <= nonchanging_threshold
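
The summaries above can be illustrated for a PMF discretized into equal-width bins on [-1, 1]. This is a simplified sketch, not the DeltaPsiPMF implementation: the function names are hypothetical, and each bin is assigned entirely by its midpoint, whereas a real implementation may handle bins that straddle a threshold differently.

```python
import math


def bin_midpoints(nbins):
    """Midpoints of nbins equal-width bins covering [-1, 1]."""
    width = 2.0 / nbins
    return [-1.0 + width * (i + 0.5) for i in range(nbins)]


def pmf_mean(p):
    return sum(mass * x for mass, x in zip(p, bin_midpoints(len(p))))


def pmf_std(p):
    mean = pmf_mean(p)
    variance = sum(mass * (x - mean) ** 2 for mass, x in zip(p, bin_midpoints(len(p))))
    return math.sqrt(variance)


def prob_changing(p, changing_threshold=0.2):
    """Mass in bins whose midpoint has absolute value above the threshold."""
    return sum(m for m, x in zip(p, bin_midpoints(len(p))) if abs(x) > changing_threshold)


def prob_nonchanging(p, nonchanging_threshold=0.05):
    """Mass in bins whose midpoint has absolute value at or below the threshold."""
    return sum(m for m, x in zip(p, bin_midpoints(len(p))) if abs(x) <= nonchanging_threshold)


uniform = [0.1] * 10  # made-up PMF over 10 bins
changing = prob_changing(uniform, 0.2)
nonchanging = prob_nonchanging(uniform, 0.15)
```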

Heterogen (independent PsiCoverage)

Heterogen(psi1, psi2[, min_experiments_f, ...])

Compare Psi between two groups of PsiCoverage (independence assumption)

Heterogen.dataset([pvalue_quantiles, ...])

Heterogen.raw_stats([ec_idx, use_stats])

Statistics on means, samples from raw posteriors

Heterogen.approximate_stats([ec_idx, ...])

Statistics on means, samples from approximate posteriors

CLIN (in development)

Controls

PsiControlsSummary(df, events[, hold_temporary])

Summary of PSI posterior means over large group of controls

PsiControlsSummary.from_zarr(path)

PsiControlsSummary.to_zarr(path[, ...])

Save PSI coverage dataset as zarr

PsiControlsSummary.q

alias for controls_q

PsiControlsSummary.num_passed

PsiControlsSummary.prefixes

PsiControlsSummary.passed_min_experiments([...])

Get boolean mask of events that pass enough experiments

PsiControlsSummary.psi_median

PsiControlsSummary.psi_quantile

PsiControlsSummary.psi_range

For each controls_alpha, range between lower/upper quantiles (scale)
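
A range between lower and upper quantiles of control PSI values can be sketched as below. The mapping from alpha to the quantiles alpha/2 and 1 - alpha/2, the linear interpolation, and the function names are assumptions made for illustration; they are not confirmed by this page.

```python
def quantile(sorted_values, q):
    """Quantile of sorted values using linear interpolation between neighbors."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)  # floor, since position is non-negative
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def quantile_range(control_psi, alpha):
    """Distance between the (1 - alpha/2) and (alpha/2) quantiles (assumed mapping)."""
    values = sorted(control_psi)
    return quantile(values, 1 - alpha / 2) - quantile(values, alpha / 2)


# Made-up PSI posterior means for one event connection across 11 controls
controls = [i / 10 for i in range(11)]
central_80 = quantile_range(controls, 0.2)
degenerate = quantile_range(controls, 1.0)
```

A small range indicates that controls agree closely on PSI for that connection, which is what makes the range usable as a scale.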

Outliers

PsiOutliers(cases, controls[, alpha_case])

Outliers in PSI between cases and controls