###############
MAJIQ-L
###############
MAJIQ-L takes an input three sources of information: Transcriptome annotation; short reads processed by MAJIQ v2; and long reads in gtf format, processed by the user’s algorithm of choice. It then computes and displays an extensive set of statistics that contrast the available annotation and the two sequencing sources in terms of novel junctions, introns, coverage, inclusion levels, etc. such that existing gaps between the three sources can be captured
Using the three input sources, MAJIQ-L constructs unified gene splice graphs with all isoforms and all LSVs visible for analysis. This unified view is implemented in a new visualization package (VOILA v3), allowing users to inspect each gene of interest where the three sources agree or differ.
Note: for additional scripts/software required to generate all figures from the MAJIQ-L paper, including general purpose algorithms, please see the supplementary MAJIQ-L code repository `here `_
VOILA lr: Unified Visualization
-------------------------------
VOILA lr takes four inputs:
1. A splicegraph database (splicegraph.sql from MAJIQ Builder)
2. Voila files (SAMPLE_ID.psi.voila from MAJIQ quantifiers)
3. Long reads in gtf format, processed by the user’s algorithm of choice
4. A SAMPLE_ID.tsv, a tab separated file, containing transcript read number processed by the user’s algorithm of choice like:
| transcript_id_1 transcript_read_1
| transcript_id_2 transcript_read_2
| transcript_id_3 transcript_read_3
For example, use the following command options:
+------------------------------------------------------------+
| | voila lr |
| | --lr-gtf-file /PAHT/TO/SAMPLE_ID.gtf |
| | --lr-tsv-file /PAHT/TO/SAMPLE_ID.tsv |
| | -sg /PATH/TO/splicegraph.sql |
| | -o OUTPUT_FOLDER |
| |
| Default: 1 |
+------------------------------------------------------------+
Output is the resulting long read voila file (recommended extension in .lr.voila format)
To display the unified splicegraph, the output of voila lr (SAMPLE_ID.lr.voila) is given as input to `VOILA v3 `_. Users may use voila view as a server to display results as shown `here `_. For example, use the following command options:
+------------------------------------------------------------+
| | voila view |
| | /PATH/TO/splicegraph.sql |
| | /PAHT/TO/SAMPLE_ID.psi.voila |
| | /PATH/TO/SAMPLE_ID.lr.voila |
| | -p 7050 |
| | --host 0.0.0.0 |
| |
| Default: 1 |
+------------------------------------------------------------+
Additional details on usage can be found by adding --help to the subcommand of interest (e.g. voila view --help)
Examples of obtaining GTF and TSV files from different LR algorithms
====================================================================
Using `IsoQuant `_:
- **GTF file**: Obtain the GTF file containing the discovered expressed transcripts (``SAMPLE_ID.transcript_models.gtf`` should be provided) by running `IsoQuant `_.
- **TSV file**: Obtain the TSV file with read counts assigned to each transcript (``SAMPLE_ID.transcript_model_counts.tsv`` should be provided) by running `IsoQuant `_.
Using `FLAIR `_:
- **GTF file**: Obtain the GTF file containing the discovered expressed transcripts from the `flair collapse `_ step.
- **TSV file**: Obtain the TSV file with read counts assigned to each transcript from the `flair quantify `_ step. The output of ``flair quantify`` looks like the screenshot on the left below. You need to modify this file by removing the gene_id after the last underscore in the transcript IDs to match the format shown in the screenshot on the right. This modified TSV file should be provided as your input TSV file.
.. image:: flair_quantify.png
VOILA lr
--------
usage: voila lr [-h] [--voila-file VOILA_FILE] [--gene-id GENE_ID]
[--only-update-psi] [--lr-gtf-file LR_GTF_FILE]
[--lr-tsv-file LR_TSV_FILE] -o OUTPUT_FILE
[-sg SPLICE_GRAPH_FILE] [-j NPROC] [--debug] [--memory-map-hdf5]
optional arguments:
-h, --help show this help message and exit
--voila-file VOILA_FILE
This should be a .psi.voila file which we will match LSV
definitions to to run the beta prior. If not provided,
PSI values will not be rendered for long read LSVs
--gene-id GENE_ID Limit to a gene-id for testing
--only-update-psi Instead of re generating all data, only update the PSI
values. Requires -o to point to an existing .lr.voila
file, and --voila-file to be provided as well
-j NPROC, --nproc NPROC
Number of processes used to produce output. Default is
half of system processes.
--debug Show Verbose output
--memory-map-hdf5 by default, hdf5 voila files will be opened and read as
needed, however, for greater performance it may help to
instead preload these files into memory, if your server
has sufficient RAM. Use this option to memory map the
files. If used with view mode, you must also specify an
index file to save to with --index-file
-l LOGGER, --logger LOGGER
Set log file and location. There will be no log file if
not set.
--silent Do not write logs to standard out.
required named arguments:
--lr-gtf-file LR_GTF_FILE
path to the long read GTF file
--lr-tsv-file LR_TSV_FILE
path to the long read TSV file
-o OUTPUT_FILE, --output-file OUTPUT_FILE
the path to write the resulting voila file to
(recommended extension .lr.voila)
-sg SPLICE_GRAPH_FILE, --splice-graph-file SPLICE_GRAPH_FILE
the path to the majiq splice graph file which will be
used to align to annotated exons
Citation
--------
The paper describing MAJIQ-L algorithm is available at https://www.biorxiv.org/content/10.1101/2023.11.21.568046v1.