Voila
MODULIZE
Voila provides a method of detecting specific classes / structures of splicing events: the modulizer. It is a comprehensive processor, capable of working with psi, deltapsi, or heterogen MAJIQ analyses, either separately or combined into one detection pipeline. The modulizer detects classic (binary) splicing events as well as a number of relatively common, interesting-looking events, and it provides an output summary for detecting common or novel structures that may be specific to your dataset. The general usage workflow, as well as the modulizer's filtering and special use case options, are provided below.
Quick Start
Modulizer comes with a reasonable set of default options to get you started quickly. You can use the tool with the following command:
voila modulize [splicegraph.sql] [voila file(s)] -d [output directory] -j [threads]
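When the modulizer is driven from a script, the quick start command can be assembled as an argument list. The following is a minimal sketch, not part of Voila itself: the helper name `build_modulize_command` and the file names are hypothetical, it assumes the `voila modulize` subcommand named in the usage statement, and it uses only the `-d`, `-j` and `--overwrite` flags documented in the reference below. It builds the command without running it.

```python
import shlex
from typing import List, Optional, Sequence


def build_modulize_command(
    splicegraph: str,
    voila_files: Sequence[str],
    output_dir: str,
    threads: Optional[int] = None,
    overwrite: bool = False,
) -> List[str]:
    """Assemble an argument list for `voila modulize` (nothing is executed)."""
    if not voila_files:
        raise ValueError("at least one voila file is required")
    # Positional files first, then the required -d output directory.
    cmd = ["voila", "modulize", splicegraph, *voila_files, "-d", output_dir]
    if threads is not None:
        cmd += ["-j", str(threads)]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


# Hypothetical file names, for illustration only.
command = build_modulize_command(
    "splicegraph.sql", ["control_treated.deltapsi.voila"], "modulized_out", threads=4
)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` once the paths point at real files.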
Full Usage reference
The usage statement printed by voila modulize --help is the following:
usage: voila modulize [-h] [--overwrite] [--ignore-inconsistent-group-errors]
                      [--only-binary] [--untrimmed-exons] [--show-all]
                      [--heatmap-selection {shortest_junction,max_abs_dpsi}]
                      [--gene-ids [GENE_IDS [GENE_IDS ...]]]
                      [--debug-num-genes DEBUG_NUM_GENES] [--output-mpe]
                      [--putative-multi-gene-regions]
                      [--keep-constitutive [KEEP_CONSTITUTIVE]]
                      [--keep-no-lsvs-modules] [--keep-no-lsvs-junctions]
                      [--decomplexify-psi-threshold DECOMPLEXIFY_PSI_THRESHOLD]
                      [--decomplexify-deltapsi-threshold DECOMPLEXIFY_DELTAPSI_THRESHOLD]
                      [--decomplexify-reads-threshold DECOMPLEXIFY_READS_THRESHOLD]
                      [--changing-between-group-dpsi CHANGING_BETWEEN_GROUP_DPSI]
                      [--non-changing-between-group-dpsi NON_CHANGING_BETWEEN_GROUP_DPSI]
                      [--changing-between-group-dpsi-secondary CHANGING_BETWEEN_GROUP_DPSI_SECONDARY]
                      [--non-changing-pvalue-threshold NON_CHANGING_PVALUE_THRESHOLD]
                      [--non-changing-within-group-IQR NON_CHANGING_WITHIN_GROUP_IQR]
                      [--changing-pvalue-threshold CHANGING_PVALUE_THRESHOLD]
                      [--probability-changing-threshold PROBABILITY_CHANGING_THRESHOLD]
                      [--probability-non-changing-threshold PROBABILITY_NON_CHANGING_THRESHOLD]
                      -d DIRECTORY [-j NPROC] [--debug] [-l LOGGER] [--silent]
                      files [files ...]
positional arguments:
- files
- List of files or directories which contain the splice graph and voila files.
optional arguments:
- -h, --help
- show this help message and exit
- --overwrite
- If there are files inside of specified --directory, delete them and run classifier anyway
- --ignore-inconsistent-group-errors
- Don't show any warnings / errors when multiple experiments that share the same name, but are in fact different experiments, are analyzed
- --only-binary
- Do not show "complex" modules in the output -- that is, modules with more than one splicing event.
- --untrimmed-exons
- Display original Exon coordinates instead of Trimmed coordinates in output TSVs
- --show-all
- By default, we find classifications only for events which are changing (between multiple analyses). Using this switch bypasses this filter and shows all events
- --heatmap-selection {shortest_junction,max_abs_dpsi}
- For the classifier output "heatmap", the quantification values may be derived from either the shortest junction in the module (default), or optionally, if a het or dpsi file is provided, from the junction with the maximum dpsi value
- -j NPROC, --nproc NPROC
- Number of processes used to produce output. Default is half of system processes.
- --debug
- Exit on errors, more verbose logging
- -l LOGGER, --logger LOGGER
- Set log file and location. There will be no log file if not set.
- --silent
- Do not write logs to standard out.
Limit the amount of data processed to a specific target subset:
- --gene-ids [GENE_IDS [GENE_IDS ...]]
- Gene IDs, separated by spaces, which should remain in the results. e.g. GENE_ID1 GENE_ID2 ...
- --debug-num-genes DEBUG_NUM_GENES
- Modulize only n genes; useful to see an excerpt of the functionality without waiting for a full run to complete.
Alternative use cases / run modes for specialized applications of modulizer:
- --output-mpe
- Outputs a TSV with primer-targetable regions upstream and downstream of every module. It takes into account trimmed exons and constitutive upstream/downstream exons, and uses a format that is easy to process programmatically when designing primers.
- --putative-multi-gene-regions
- Only output a single TSV file describing regions found in inputs with complete breaks in the gene (no junctions connecting at all). Implies "--keep-constitutive"
Include or exclude junctions / modules based on structure or data availability:
- --keep-constitutive [KEEP_CONSTITUTIVE]
- Do not discard modules with only one junction; implies "--show-all". Turns on output of constitutive.tsv and the constitutive column in the summary output
- --keep-no-lsvs-modules
- Do not discard modules that are unquantified by MAJIQ (no LSVs found)
- --keep-no-lsvs-junctions
- If there are no LSVs attached to a specific junction, retain the junction instead of removing it
Options for 'decomplexifier': removing junctions based on simple criteria prior to creating modules:
- --decomplexify-psi-threshold DECOMPLEXIFY_PSI_THRESHOLD
- Filter out junctions where PSI is below a certain value (between 0.0 and 1.0). If multiple input files are used, only the highest PSI value is used. If 0 (or 0.0) is specified, no filtering will be done. The default is "0.05".
- --decomplexify-deltapsi-threshold DECOMPLEXIFY_DELTAPSI_THRESHOLD
- Filter out junctions where abs(E(dPSI)) is below a certain value (between 0.0 and 1.0). If multiple input files are used, only the biggest difference (dPSI) value is used. If 0 (or 0.0) is specified, no filtering will be done. The default is "0.0".
- --decomplexify-reads-threshold DECOMPLEXIFY_READS_THRESHOLD
- Filter out junctions where the number of reads is below a certain value (integer). If multiple input files are used, only the biggest number of reads is used. The default is "1".
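The three decomplexifier filters above can be sketched as a single predicate. This is an illustration of the documented rules rather than Voila's implementation: the function name `passes_decomplexifier` is hypothetical, and it assumes that a junction is dropped as soon as any one filter fails, that a filter with no available values is skipped, and that a threshold is met when the value is greater than or equal to it.

```python
from typing import Sequence


def passes_decomplexifier(
    psi_values: Sequence[float],
    dpsi_values: Sequence[float],
    read_counts: Sequence[int],
    psi_threshold: float = 0.05,
    dpsi_threshold: float = 0.0,
    reads_threshold: int = 1,
) -> bool:
    """Return True if a junction survives all three filters.

    Each sequence holds one value per input file. As documented, only the
    highest value across input files is compared against each threshold.
    """
    # A PSI threshold of 0 disables the PSI filter.
    if psi_threshold > 0 and psi_values and max(psi_values) < psi_threshold:
        return False
    # A dPSI threshold of 0 disables the dPSI filter; the sign is ignored.
    if dpsi_threshold > 0 and dpsi_values:
        if max(abs(d) for d in dpsi_values) < dpsi_threshold:
            return False
    # Read filter: the largest read count across inputs must reach the threshold.
    if read_counts and max(read_counts) < reads_threshold:
        return False
    return True


# PSI is low in one file but high in another: the highest value keeps the junction.
print(passes_decomplexifier([0.01, 0.20], [], [10]))
```

With the defaults, a junction whose PSI stays below 0.05 in every input file is removed, while the dPSI filter is inactive until `--decomplexify-deltapsi-threshold` is raised above 0.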
Adjust the parameters used for determining whether a junction / module is changing or non-changing based on dpsi or heterogen file inputs:
- --changing-between-group-dpsi CHANGING_BETWEEN_GROUP_DPSI
- For determining changing with HET or dPSI inputs. This is the minimum absolute difference in median values of PSI for HET inputs (or E(dPSI) for dPSI inputs) for which an LSV/junction can be marked changing. The default is "0.2"
- --non-changing-between-group-dpsi NON_CHANGING_BETWEEN_GROUP_DPSI
- For determining non-changing with HET or dPSI inputs. This is the maximum absolute difference in median values of PSI for HET inputs (or E(dPSI) for dPSI inputs) for which an LSV/junction can be marked non-changing. The default is "0.05"
- --changing-between-group-dpsi-secondary CHANGING_BETWEEN_GROUP_DPSI_SECONDARY
- Set the secondary changing event definition. In order to be considered "changing", any junction in an event must meet the other changing definitions, and ALL junctions in an event must meet this condition (DPSI value of the junction >= this value). Applies to HET or delta-PSI inputs. The default is "0.1".
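How the primary and secondary between-group thresholds interact can be sketched as follows. This is a simplified reading of the option descriptions above, not Voila's code: the function names are hypothetical, the primary threshold is assumed to act as a lower bound on the absolute difference for a "changing" call, the secondary condition is assumed to apply to the absolute value, and the probability and p-value checks described in the following sections are left out.

```python
from typing import Sequence


def is_changing_event(
    junction_dpsis: Sequence[float],
    primary: float = 0.2,
    secondary: float = 0.1,
) -> bool:
    """At least one junction meets the primary threshold, and ALL junctions
    in the event meet the secondary threshold (absolute values)."""
    abs_values = [abs(d) for d in junction_dpsis]
    if not abs_values:
        return False
    meets_primary = any(v >= primary for v in abs_values)
    meets_secondary = all(v >= secondary for v in abs_values)
    return meets_primary and meets_secondary


def is_non_changing_junction(dpsi: float, threshold: float = 0.05) -> bool:
    """A junction is non-changing if its absolute difference stays within the threshold."""
    return abs(dpsi) <= threshold


print(is_changing_event([0.25, -0.12]))  # one junction >= 0.2, both >= 0.1
print(is_changing_event([0.25, 0.05]))   # second junction fails the secondary check
```

Note that a junction with an absolute difference between 0.05 and 0.2 is neither changing nor non-changing under these defaults.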
Adjust the parameters used for determining whether a junction / module is changing or non-changing based on heterogen file inputs:
- --non-changing-pvalue-threshold NON_CHANGING_PVALUE_THRESHOLD
- For determining non-changing with HET inputs. Minimum p-value for which an LSV/junction can return true. Uses minimum p-value from all tests provided. The default is "0.05".
- --non-changing-within-group-IQR NON_CHANGING_WITHIN_GROUP_IQR
- For determining non-changing with HET inputs. Maximum IQR within a group for which an LSV/junction can return true. The default is "0.1".
- --changing-pvalue-threshold CHANGING_PVALUE_THRESHOLD
- For determining changing with HET inputs. Maximum p-value for which an LSV/junction can return true. Uses maximum p-value from all tests provided. The default is "0.05".
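The HET-specific thresholds above can be sketched in the same way. Again this is an illustration under stated assumptions rather than the tool's implementation: the function names are hypothetical, the within-group IQR values are assumed to be computed beforehand (one per group), comparisons are assumed to be inclusive, and the between-group difference checks from the previous section are omitted.

```python
from typing import Sequence


def het_changing(pvalues: Sequence[float], threshold: float = 0.05) -> bool:
    """Changing: the maximum p-value over all tests must not exceed the threshold."""
    return max(pvalues) <= threshold


def het_non_changing(
    pvalues: Sequence[float],
    group_iqrs: Sequence[float],
    pvalue_threshold: float = 0.05,
    iqr_threshold: float = 0.1,
) -> bool:
    """Non-changing: the minimum p-value over all tests must reach the threshold,
    and the IQR within every group must stay at or below the IQR threshold."""
    pvalue_ok = min(pvalues) >= pvalue_threshold
    iqr_ok = max(group_iqrs) <= iqr_threshold
    return pvalue_ok and iqr_ok


# Two statistical tests, two groups (values are made up for illustration).
print(het_changing([0.01, 0.04]))
print(het_non_changing([0.30, 0.80], [0.05, 0.08]))
```

Using the maximum p-value for "changing" and the minimum for "non-changing" means that every test provided has to agree before a junction receives either label.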
Adjust the parameters used for determining whether a junction / module is changing or non-changing based on dpsi file inputs:
- --probability-changing-threshold PROBABILITY_CHANGING_THRESHOLD
- The default is "0.95"
- --probability-non-changing-threshold PROBABILITY_NON_CHANGING_THRESHOLD
- The default is "0.95"
required named arguments:
- -d DIRECTORY, --directory DIRECTORY
- All generated TSV files will be dumped in this directory