MAJIQ


Majiq Parameters

In the previous quick start section we described a general execution pipeline for MAJIQ, but those three commands have many other parameters that can be adjusted to modify MAJIQ's behavior.

Builder

majiq build [-h] [-j NPROC] -o OUTDIR [--logger LOGGER] [--silent] [--debug] [--mem-profile] [--min-experiments MIN_EXP] -c CONF [--minreads MINREADS] [--minpos MINPOS] [--min-denovo MIN_DENOVO] [--disable-denovo] [--irnbins IRNBINS] [--min-intronic-cov MIN_INTRONIC_COV] [--disable-ir] [--disable-denovo-ir] [--annotated_ir_always] [--junc-files-only] [--incremental] [--simplify [SIMPL_PSI]] [--simplify-min-experiments SIMPL_MIN_EXP] [--simplify-annotated SIMPL_DB] [--simplify-denovo SIMPL_DENOVO] [--simplify-ir SIMPL_IR] [--markstacks PVALUE_LIMIT] [--m M] [--permissive] [--dump-constitutive] [--dump-coverage] transcripts

Mandatory arguments:

  • transcripts: Transcriptome file with the annotation database. Currently, only GFF3 format is accepted; see the annotation file section for a fuller description. transcripts can also be a MAJIQ DB file (named DB.npz) generated by a previous majiq build execution.
  • -c/--conf CONFIG_FILE: The configuration file for the study. This file defines the paths to the BAM files, the read length, the genome version, and other information needed by the Builder. For more detailed information, see the configuration file section.
  • -o/--output OUTDIR: Directory where the output will be placed. MAJIQ Builder writes one .majiq file per BAM file and one splicegraph.sql file. These files are the input for the next steps of the analysis.

Optional arguments:

  • -h, --help Show this help message and exit.
  • -j NPROC, --nproc NPROC Number of threads to use. [Default: 4]
  • --logger LOGGER Path for the logger. [Default is output directory]
  • --silent Silence the logger.
  • --debug This flag is used for debugging purposes. It activates more verbose logging and skips some processing steps. [Default: False]
  • --mem-profile Print memory usage summary at the end of program execution. [Default: False]
  • --min-experiments MIN_EXP Threshold for group filters. This specifies the fraction (value < 1) or absolute number (value >= 1) of experiments passing per-experiment filters (i.e. minreads, minpos, etc.) that must pass individually in order to pass an LSV or junction. If greater than the total number of experiments in a group, requires all experiments to pass individually. [Default: 0.5]
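The fraction-or-count rule described for --min-experiments can be sketched as follows. This is a minimal illustration, not MAJIQ's code: the function names are hypothetical, and rounding a fractional threshold up is an assumption, since the text above does not say how fractions are rounded.

```python
import math


def required_experiments(min_experiments: float, group_size: int) -> int:
    """Number of experiments in a group that must pass per-experiment filters."""
    if min_experiments < 1:
        # Fractional threshold: scale by the group size.
        # ASSUMPTION: round up; the actual rounding rule is not documented here.
        needed = math.ceil(min_experiments * group_size)
    else:
        # Absolute threshold: use the value as a count.
        needed = int(min_experiments)
    # A threshold larger than the group means "all experiments must pass".
    return min(needed, group_size)


def group_passes(passed_flags, min_experiments=0.5):
    """passed_flags: one boolean per experiment (True = passed its filters)."""
    needed = required_experiments(min_experiments, len(passed_flags))
    return sum(passed_flags) >= needed


print(required_experiments(0.5, 4))   # fraction of a 4-experiment group
print(required_experiments(10, 3))    # absolute value above the group size
print(group_passes([True, False, True, False]))
```

With the default of 0.5, a group of four experiments needs two passing experiments under these assumptions.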

Junction Filters:

  • --minreads MINREADS Threshold on the minimum total number of reads for any junction to meet per-experiment filters for the LSVs it is a part of. When the minimum numbers of reads and positions (--minpos) are both met in enough experiments for a group in any junction that is part of an LSV, the LSV is considered admissible and saved in output MAJIQ files for potential downstream quantification. [Default: 3]
  • --minpos MINPOS Threshold on the minimum number of read positions with at least 1 read for any junction to meet per-experiment filters for the LSVs it is a part of. Positions are relative to the aligned query sequences, ignoring soft clipping, and the first few bases of overhang on either end are ignored. When the minimum numbers of reads (--minreads) and positions are both met in enough experiments for a group in any junction that is part of an LSV, the LSV is considered admissible and associated coverage per experiment saved in the output MAJIQ files for potential downstream quantification. [Default: 2]
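As a rough picture of how --minreads and --minpos combine for one junction in one experiment, consider the sketch below. The function and its input layout (one read count per read position) are hypothetical, and it leaves out the handling of soft clipping and ignored overhang bases mentioned above.

```python
def junction_passes(position_reads, minreads=3, minpos=2):
    """Per-experiment filter for a single junction.

    position_reads: read counts per read position (hypothetical layout).
    Both thresholds must be met at the same time.
    """
    total_reads = sum(position_reads)
    positions_with_reads = sum(1 for reads in position_reads if reads >= 1)
    return total_reads >= minreads and positions_with_reads >= minpos


# Three reads stacked on one position: enough reads, too few positions.
print(junction_passes([3, 0, 0]))
# Three reads spread over two positions: both thresholds met.
print(junction_passes([2, 1, 0]))
```

The group-level decision then depends on how many experiments pass this check, as set by --min-experiments.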

Denovo junctions options:

  • --min-denovo MIN_DENOVO Threshold on the minimum total number of reads for a denovo junction to be detected for inclusion in the splicegraph. This per-experiment filter requires the --minpos filter to be satisfied at the same time. [Default: 5]
  • --disable-denovo Disable denovo detection of junctions, splicesites and exons. This will restrict analysis to junctions and exons found in provided annotations file, reducing the number of LSVs detected. Note that this does not disable detection of unannotated intron retention (see --disable-denovo-ir). [Default: denovo enabled]

Intron options:

  • --irnbins IRNBINS Threshold on fraction of intronic read positions (aggregated/normalized to match junctions) with sufficient coverage (set by --min-intronic-cov) to pass per-experiment filters on introns. [Default: 0.5]
  • --min-intronic-cov MIN_INTRONIC_COV Threshold on per-position normalized intronic readrate to be considered to have sufficient coverage at that position. Used with --irnbins to define per-experiment filters on introns. [Default: 0.01]
  • --disable-ir Disable intron retention detection. This applies to both annotated and unannotated retained introns. [Default: intron retention enabled]
  • --disable-denovo-ir Disable detection of denovo introns only, keeping detection of annotated introns enabled. [Default: denovo introns enabled]
  • --annotated_ir_always Automatically pass all annotated introns regardless of coverage. By default, introns with insufficient coverage (not passing group filters) are excluded from the splicegraph and associated LSV definitions even if they are present in the annotations; this flag forces them to be kept.
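The interaction between --irnbins and --min-intronic-cov can be illustrated with a small sketch. It assumes the intron is already summarized as a list of normalized per-position readrates (a hypothetical input), and that both comparisons are inclusive, which the text above does not specify.

```python
def intron_passes(position_readrates, irnbins=0.5, min_intronic_cov=0.01):
    """Per-experiment filter for a single intron.

    position_readrates: normalized readrate per intronic position
    (hypothetical input; the aggregation/normalization step is not shown).
    """
    if not position_readrates:
        return False
    # Count positions whose coverage reaches --min-intronic-cov.
    covered = sum(1 for rate in position_readrates if rate >= min_intronic_cov)
    # The covered fraction must reach --irnbins (ASSUMPTION: inclusive).
    return covered / len(position_readrates) >= irnbins


print(intron_passes([0.02, 0.0, 0.5, 0.0]))   # half of the positions covered
print(intron_passes([0.02, 0.0, 0.0, 0.0]))   # a quarter of the positions covered
```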

Incremental build options:

  • --junc-files-only Stop MAJIQ builder execution after extracting junction information from BAM files into output SJ (*.sj) files for use in MAJIQ incremental builds. [Default: disabled]
  • --incremental Enable use of SJ files generated by previous builds. This removes the need to reprocess the original BAM file again. [Default: False]

Simplifier options:

  • --simplify [SIMPL_PSI] Enable the simplifier. Simplification ignores junctions and introns with consistently low usage within and between build groups in subsequent quantification. Optional threshold specifies maximum value of raw PSI considered as low usage (if not specified, uses 0.01). If enabled, a junction/intron will be simplified if it has PSI above this threshold in less than min-experiments experiments for each build group and event it belongs to. [Default: -1]
  • --simplify-min-experiments SIMPL_MIN_EXP Override the minimum number of experiments used by the simplifier, allowing a more relaxed or more stringent number of experiments passing filters for/against simplification. Specified as with --min-experiments (fraction or absolute number). This is disabled by default and the value from --min-experiments is used instead.
  • --simplify-annotated SIMPL_DB Simplifier minimum number of reads threshold for annotated junctions. If specified, an annotated junction must simultaneously have at least the specified number of reads (in addition to being above the PSI threshold from --simplify) in enough experiments to avoid removal by simplification. [Default: 0]
  • --simplify-denovo SIMPL_DENOVO Simplifier minimum number of reads threshold for denovo junctions. If specified, a denovo junction must simultaneously have at least the specified number of reads (in addition to being above the PSI threshold from --simplify) in enough experiments to avoid removal by simplification. [Default: 0]
  • --simplify-ir SIMPL_IR Simplifier minimum readrate threshold for intron retention. If specified, a retained intron must simultaneously have at least the specified minimum normalized readrate (in addition to being above the PSI threshold from --simplify) in enough experiments to avoid removal by simplification. [Default: 0]
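The simplifier rule above can be sketched for a single junction within a single event. All names here are hypothetical; `needed_per_group` stands in for the count derived from --min-experiments (or --simplify-min-experiments), and the strict "above" comparison on PSI follows the wording of --simplify. A real junction may belong to several events, which this sketch does not model.

```python
def experiment_supports(raw_psi, reads, psi_threshold=0.01, min_reads=0):
    """True if one experiment gives enough evidence to keep the junction."""
    return raw_psi > psi_threshold and reads >= min_reads


def is_simplified(groups, needed_per_group, psi_threshold=0.01, min_reads=0):
    """Decide whether a junction is simplified (ignored in quantification).

    groups: one list per build group of (raw_psi, reads) per experiment.
    needed_per_group: supporting experiments required in each group.
    """
    for experiments, needed in zip(groups, needed_per_group):
        supporting = sum(
            experiment_supports(psi, reads, psi_threshold, min_reads)
            for psi, reads in experiments
        )
        if supporting >= needed:
            # Enough support in this group: the junction is kept.
            return False
    # Low usage in every group: the junction is simplified.
    return True


groups = [[(0.005, 10), (0.02, 10)], [(0.0, 0), (0.0, 0)]]
print(is_simplified(groups, needed_per_group=[2, 1]))
print(is_simplified(groups, needed_per_group=[1, 1]))
```

Raising `min_reads` mirrors --simplify-annotated or --simplify-denovo: an experiment with PSI above the threshold but too few reads no longer counts as support.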

Bootstrap coverage sampling:

  • --markstacks PVALUE_LIMIT P-value threshold used for detecting and removing read stacks (outlier per-position read coverage under Poisson or negative-binomial null distribution). Use a negative value to disable stack detection/removal. [Default: 1e-07]
  • --m M Number of bootstrap samples of total read coverage to save in output SJ and MAJIQ files for downstream quantification. [Default: 30]
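To make the idea behind --markstacks concrete, here is a minimal sketch of outlier detection under a Poisson null. It is an illustration only: the leave-one-out mean, the choice to zero out a flagged position, and the restriction to the Poisson case are all assumptions, and the function names are hypothetical.

```python
import math


def poisson_upper_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean), via 1 - P(X <= k - 1).

    Suitable for small means only; exp(-mean) underflows for very large means.
    """
    if k <= 0:
        return 1.0
    if mean <= 0:
        return 0.0
    term = math.exp(-mean)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= mean / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def remove_stacks(position_reads, pvalue_limit=1e-7):
    """Zero out positions whose coverage is an outlier versus the others."""
    if pvalue_limit < 0:
        # A negative limit disables stack detection/removal.
        return list(position_reads)
    nonzero = [reads for reads in position_reads if reads > 0]
    cleaned = []
    for reads in position_reads:
        if reads == 0 or len(nonzero) < 2:
            cleaned.append(reads)
            continue
        # ASSUMPTION: null mean is the average of the other nonzero positions.
        mean_others = (sum(nonzero) - reads) / (len(nonzero) - 1)
        pvalue = poisson_upper_tail(reads, mean_others)
        cleaned.append(0 if pvalue < pvalue_limit else reads)
    return cleaned


print(remove_stacks([2, 3, 1, 2, 200]))
print(remove_stacks([2, 3, 1, 2, 200], pvalue_limit=-1))
```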

Advanced options:

  • --permissive Consider all distinct LSVs, including events which are contained by other LSVs. By default, MAJIQ ignores all events for which their connections are all present in another event (def: redundant events) unless they are mutually redundant, in which case the events are equivalent (in this case the single-source event is selected). There are some cases where we would like to quantify these redundant events (excluding the equivalent mutually-redundant events); this flag enables more permissive output of splicing events.
  • --dump-constitutive Create constitutive_junctions.tsv file listing all junctions that pass group filters but are not part of any LSV because they are structurally constitutive. [Default: False]
  • --dump-coverage Optionally dump raw junction coverage by position to created SJ files for experimental/debugging purposes
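The notion of redundant and mutually redundant events used by --permissive can be pictured with sets of connections. The representation below (event name mapped to a kind and a frozenset of connection labels) is hypothetical and serves only to illustrate the rule as described above.

```python
def select_events(events, permissive=False):
    """Return the names of the events that would be kept.

    events: dict of name -> (kind, connections), where kind is "source" or
    "target" and connections is a frozenset of connection labels.
    """
    kept = []
    for name, (kind, connections) in events.items():
        drop = False
        for other, (other_kind, other_connections) in events.items():
            if other == name:
                continue
            if connections == other_connections:
                # Mutually redundant (equivalent): keep the single-source event.
                if kind != "source" and other_kind == "source":
                    drop = True
            elif connections < other_connections and not permissive:
                # Strictly contained in another event: redundant by default.
                drop = True
        if not drop:
            kept.append(name)
    return kept


events = {
    "s1": ("source", frozenset({"a", "b"})),
    "t1": ("target", frozenset({"a", "b"})),  # equivalent to s1
    "t2": ("target", frozenset({"a"})),       # contained in s1
}
print(select_events(events))
print(select_events(events, permissive=True))
```

Under these assumptions the permissive mode recovers the contained event t2, while the equivalent event t1 stays excluded in both modes.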

PSI

majiq psi [-h] [-j NPROC] -o OUTDIR [--logger LOGGER] [--silent] [--debug] [--mem-profile] [--min-experiments MIN_EXP] -n NAME [--output-type {voila,tsv,all}] [--minreads MINREADS] [--minpos MINPOS] files [files ...]

Mandatory arguments:

  • files: .majiq file[s] that were created by the MAJIQ Builder execution
  • -n/--name NAME: The name that identifies the quantification group.
  • -o/--output OUTDIR: PSI output directory. It will contain the psi.voila file once the execution is finished.

Optional arguments:

  • -h, --help: Show help message and exit
  • -j/--nproc NPROC: Number of threads to use.
  • --minreads MINREADS: Minimum number of reads, combining all positions, for an LSV to be considered quantifiable. [Default: 10]
  • --minpos MINPOS: Minimum number of start positions with at least 1 read for an LSV to be considered quantifiable. [Default: 3]
  • --min-experiments MIN_EXP: Threshold for group filters: the minimum number of experiments in which the filter checks must be met for an LSV or junction to be considered quantifiable.
  • --output-type {voila,tsv,all}: Specify the type(s) of output files to produce: voila file to use with voila, TSV file with basic quantifications per LSV, or both. [Default: all]

Logger arguments:

  • --logger LOGGER_PATH: Path for the logger. Default is output directory
  • --silent: Silence the logger.
  • --debug: Enable debug messages.

DeltaPSI

majiq deltapsi [-h] [-j NPROC] -o OUTDIR [--logger LOGGER] [--silent] [--debug] [--mem-profile] [--min-experiments MIN_EXP] -grp1 FILES1 [FILES1 ...] -grp2 FILES2 [FILES2 ...] [--default-prior] -n NAMES NAMES [--binsize BINSIZE] [--prior-minreads PRIOR_MINREADS] [--prior-minnonzero PRIOR_MINNONZERO] [--prior-iter ITER] [--output-type {voila,tsv,all}] [--minreads MINREADS] [--minpos MINPOS]

Mandatory arguments:

  • -grp1 FILES1 [FILES1 ...]: Set of .majiq file[s] for the first condition
  • -grp2 FILES2 [FILES2 ...]: Set of .majiq file[s] for the second condition
  • -n/--names NAMES NAMES: Two group identifiers (cond_id1 cond_id2) for grp1 and grp2 respectively.
  • -o/--output OUTDIR: DeltaPSI output directory. It will contain the deltapsi.voila file once the execution is finished.

Optional arguments:

  • -h, --help: Show help message and exit
  • -j/--nproc NPROC: Number of threads to use. [Default: 4]
  • --minreads MINREADS: Minimum number of reads, combining all positions, for an LSV to be considered quantifiable. [Default: 10]
  • --minpos MINPOS: Minimum number of start positions with at least 1 read for an LSV to be considered quantifiable. [Default: 3]
  • --min-experiments MIN_EXP: Threshold for group filters: the minimum number of experiments in which the filter checks must be met for an LSV or junction to be considered quantifiable.
  • --binsize BINSIZE: Width of the bins used for PSI values. With a BINSIZE of 0.025 (default), there are 40 bins.
  • --default-prior: Use a default prior instead of computing it from the empirical data.
  • --prior-minreads PRIOR_MINREADS: Minimum number of reads, combining all positions in a junction, for the junction to be considered in the 'best set' calculation. [Default: 20]
  • --prior-minnonzero PRIOR_MINNONZERO: Minimum number of positions for the best set.
  • --prior-iter ITER: Maximum number of iterations of the EM algorithm.
  • --output-type {voila,tsv,all}: Specify the type(s) of output files to produce: voila file to use with voila, TSV file with basic quantifications per LSV, or both. [Default: all]

Logger arguments:

  • --logger LOGGER_PATH: Path for the logger. Default is output directory
  • --silent: Silence the logger.
  • --debug: Enable debug messages.
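As a usage illustration, the sketch below assembles a majiq deltapsi command line from the arguments listed in this section. The helper function, file paths, and group names are hypothetical; the script only prints the command and does not run MAJIQ.

```python
import shlex


def deltapsi_argv(grp1_files, grp2_files, names, outdir, nproc=4, extra=()):
    """Assemble a `majiq deltapsi` command line as a list of arguments."""
    if len(names) != 2:
        raise ValueError("exactly two group names are required")
    if not grp1_files or not grp2_files:
        raise ValueError("each group needs at least one .majiq file")
    return [
        "majiq", "deltapsi",
        "-grp1", *grp1_files,
        "-grp2", *grp2_files,
        "-n", *names,
        "-o", outdir,
        "-j", str(nproc),
        *extra,
    ]


# Hypothetical file and group names, for illustration only.
argv = deltapsi_argv(
    ["build/ctrl_1.majiq", "build/ctrl_2.majiq"],
    ["build/kd_1.majiq", "build/kd_2.majiq"],
    names=["control", "knockdown"],
    outdir="deltapsi_out",
    extra=["--minreads", "10", "--output-type", "voila"],
)
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run on a machine where majiq is installed; the order of -n follows the order of -grp1 and -grp2.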

Heterogen

majiq heterogen [-h] [-j NPROC] -o OUTDIR [--logger LOGGER] [--silent] [--debug] [--mem-profile] [--min-experiments MIN_EXP] [--minreads MINREADS] [--minpos MINPOS] -grp1 FILES1 [FILES1 ...] -grp2 FILES2 [FILES2 ...] -n NAME_GRP1 NAME_GRP2 [--keep-tmpfiles] [--psi-samples PSI_SAMPLES] [--stats {TTEST,WILCOXON,TNOM,INFOSCORE,ALL} [{TTEST,WILCOXON,TNOM,INFOSCORE,ALL} ...]] [--test_percentile TEST_PERCENTILE] [--visualization-std VISUALIZATION_STD]

Optional arguments:

  • -h, --help: show this help message and exit
  • -j NPROC, --nproc NPROC: Number of threads to use. [Default: 4]
  • -o OUTDIR, --output OUTDIR: Path for output directory to which output files will be saved.
  • --logger LOGGER: Path for the logger. [Default is output directory]
  • --silent: Silence the logger.
  • --debug: This flag is used for debugging purposes. It activates more verbose logging and skips some processing steps. [Default: False]
  • --mem-profile: Print memory usage summary at the end of program execution. [Default: False]
  • --min-experiments MIN_EXP: Threshold for group filters. This specifies the fraction (value < 1) or absolute number (value >= 1) of experiments passing per-experiment filters (i.e. minreads, minpos, etc.) that must pass individually in order to pass an LSV or junction. If greater than the total number of experiments in a group, requires all experiments to pass individually. [Default: 0.5]
  • --minreads MINREADS: Threshold on the minimum total number of reads for any junction or intron to meet per-experiment filters for the LSVs it is a part of. When the minimum numbers of reads and positions (--minpos) are both met in enough experiments for a group in any one junction/intron that is part of an LSV, the LSV is considered quantifiable in that group. [Default: 10]
  • --minpos MINPOS: Threshold on the minimum total number of read positions with at least 1 read for any junction or intron to meet per-experiment filters for the LSVs it is a part of. When the minimum numbers of reads (--minreads) and positions are both met in enough experiments for a group in any one junction/intron that is part of an LSV, the LSV is considered quantifiable in that group. [Default: 3]
  • --keep-tmpfiles: When this argument is specified, majiq heterogen will not remove the psi files that are temporarily generated during the execution. [Default: 0]
  • --psi-samples PSI_SAMPLES: Number of PSI samples to take per LSV junction. If equal to 0, use expected value only. [Default: 100]
  • --stats {TTEST,WILCOXON,TNOM,INFOSCORE,ALL} [{TTEST,WILCOXON,TNOM,INFOSCORE,ALL} ...]: Test statistics to run. TTEST: unpaired two-sample t-test (Welch's t-test). WILCOXON: Mann-Whitney U two-sample test (nonparametric). TNOM: Total Number of Mistakes (nonparametric). INFOSCORE: TNOM but threshold maximizing mutual information with group labels (nonparametric). ALL: use all other available test statistics. [Default: ['TTEST', 'WILCOXON', 'TNOM']]
  • --test_percentile TEST_PERCENTILE: For each statistical test, the p-values obtained per PSI sample are combined by taking a percentile. This argument sets which percentile to use. [Default: 0]
  • --visualization-std VISUALIZATION_STD: Change stochastic estimation error in terms of standard deviation of discretized average posterior per group by sampling additional values of PSI when number of samples is low. [Default: 0.01]
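The way --test_percentile summarizes the p-values from repeated PSI samples can be sketched as a percentile over a list. This is only an illustration: the helper is hypothetical, the percentile is assumed to be given as a fraction in [0, 1], and linear interpolation between order statistics is an assumed convention.

```python
def pvalue_percentile(pvalues, q):
    """Percentile of a list of p-values, with q given as a fraction in [0, 1].

    ASSUMPTION: linear interpolation between neighbouring order statistics.
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    ordered = sorted(pvalues)
    position = q * (len(ordered) - 1)
    lower = int(position)                      # index at or below the position
    upper = min(lower + 1, len(ordered) - 1)   # next index, clamped to the end
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Hypothetical p-values, one per PSI sample, for a single test statistic.
sample_pvalues = [0.01, 0.2, 0.05, 0.5]
print(pvalue_percentile(sample_pvalues, 0.0))   # smallest p-value
print(pvalue_percentile(sample_pvalues, 1.0))   # largest p-value
```

A higher percentile gives a more conservative summary, since it reports a p-value that most PSI samples fall below.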

Required specification of groups:

  • -grp1 FILES1 [FILES1 ...]: Paths to MAJIQ files for the experiment(s) to quantify for the first group (aggregated as replicates if deltapsi, independently if heterogen)
  • -grp2 FILES2 [FILES2 ...]: Paths to MAJIQ files for the experiment(s) to quantify for the second group (aggregated as replicates if deltapsi, independently if heterogen)
  • -n NAME_GRP1 NAME_GRP2, --names NAME_GRP1 NAME_GRP2: The names that identify the groups being compared.