What statistic should I use for HET differential splicing analysis?

MAJIQ HET analysis includes two types of statistics for identifying differentially spliced LSVs between two conditions: (1) the difference between the medians of the estimated E[PSI] of all samples in each condition, and (2) a test statistic, with a matching p-value, from a statistical test run over all the estimated E[PSI] values against a null distribution. Several statistical tests are implemented in the HET package: Wilcoxon, TNOM, Info, and t-test.

In general, we recommend combining both criteria, (1) and (2), listed above.

The threshold on (1) should reflect the magnitude of the phenotype (splicing change) you are interested in. A commonly used threshold for significant changes is 20%, and a commonly used, more moderate threshold is 10%. In general, we would not recommend going below 10%, given the stochastic nature of splicing, the limited number of reads observed in typical RNA-Seq experiments, and the consequent variability in splicing observed across replicates/individuals in many studies.
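As a rough illustration of criterion (1), the sketch below computes the difference between group medians and checks it against a chosen threshold. It assumes the per-sample E[PSI] values for one junction are already available as plain Python lists; the function names and the example values are hypothetical and are not part of MAJIQ.

```python
from statistics import median


def median_dpsi(group_a, group_b):
    """Difference between the group medians of per-sample E[PSI] values.

    Each value is assumed to be a fraction in [0, 1].
    """
    return median(group_b) - median(group_a)


def passes_magnitude(group_a, group_b, min_abs_dpsi=0.20):
    """True if the absolute median difference reaches the chosen threshold."""
    return abs(median_dpsi(group_a, group_b)) >= min_abs_dpsi


# Hypothetical E[PSI] values for one junction, three samples per condition.
control = [0.10, 0.15, 0.12]
treated = [0.40, 0.35, 0.45]

strong_change = passes_magnitude(control, treated, min_abs_dpsi=0.20)

# A smaller shift that clears the moderate 10% threshold but not 20%.
mild_a = [0.30, 0.32, 0.31]
mild_b = [0.45, 0.44, 0.46]
mild_at_10 = passes_magnitude(mild_a, mild_b, min_abs_dpsi=0.10)
mild_at_20 = passes_magnitude(mild_a, mild_b, min_abs_dpsi=0.20)
```

Thresholds are expressed here as fractions (0.20 for 20%), which is a choice made for this sketch only.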

For criterion (2), users can test which statistic/threshold gives them the best performance using the evaluation criteria and matching evaluation package included in XXXX. In general, when dealing with highly variable data and small sample sizes (e.g. n<5 per group), we found that using TNOM score = 0 combined with criterion (1) above gives a highly reproducible set of differentially spliced LSVs which also validate well with RT-PCR (see details in paper XXXX). A TNOM score = 0 means the sets of E[PSI] values in the two groups are completely separable, while criterion (1) controls the separation of the medians. However, users should note that the threshold in this case is set on the TNOM score statistic, not the TNOM p-value. This is because the TNOM p-value is discrete, which makes it “choppy”, not well calibrated, and bounded below by a limited minimal p-value for low n.

For n > 5 we found the Wilcoxon p-value to generally work well, though in specific settings other tests may perform better. The Info score is significantly slower to compute and is therefore turned off by default. For more conservative sets of LSVs, the user can apply several or all of the statistical tests and require the LSV to pass all of them. A commonly used threshold on the statistical test is p < 0.05.
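To make the "TNOM score = 0" criterion concrete, here is a minimal sketch that treats the TNOM score as the smallest number of samples misclassified by any single cut-off on E[PSI], and combines it with the median-difference threshold from criterion (1). This is an illustrative reimplementation under that assumption, not MAJIQ's own code; in practice the score and median difference would be taken from the HET output. All function names and example values are hypothetical.

```python
from statistics import median


def tnom_score(group_a, group_b):
    """Smallest number of samples misclassified by any single cut-off.

    A score of 0 means the two groups are completely separable.
    """
    labeled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    n_a, n_b = len(group_a), len(group_b)
    # Cut-off below every sample: all samples fall into one class.
    best = min(n_a, n_b)
    a_below = b_below = 0
    i = 0
    while i < len(labeled):
        value = labeled[i][0]
        # Consume all samples tied at this value so a cut-off never splits ties.
        while i < len(labeled) and labeled[i][0] == value:
            if labeled[i][1] == 0:
                a_below += 1
            else:
                b_below += 1
            i += 1
        errors_a_low = b_below + (n_a - a_below)  # rule: A below cut-off, B above
        errors_b_low = a_below + (n_b - b_below)  # rule: B below cut-off, A above
        best = min(best, errors_a_low, errors_b_low)
    return best


def passes_small_n_filter(group_a, group_b, min_abs_dpsi=0.20):
    """Combine criterion (1) with TNOM score = 0 for small sample sizes."""
    dpsi = median(group_b) - median(group_a)
    return abs(dpsi) >= min_abs_dpsi and tnom_score(group_a, group_b) == 0


# Hypothetical E[PSI] values: fully separated groups vs. overlapping groups.
separated = passes_small_n_filter([0.10, 0.15, 0.12], [0.40, 0.35, 0.45])
overlapping = passes_small_n_filter([0.10, 0.20, 0.50], [0.30, 0.60, 0.70])
```

In the overlapping example the medians differ by well over 20%, yet one sample is always on the wrong side of any cut-off, so the LSV is rejected by the TNOM criterion alone.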

Finally, we note that the filtering criteria described above should not be viewed as a calibrated p-value. As shown in the paper XXX, applying any of the statistical tests in (2) over a single randomly selected bootstrap sample per experiment/LSV does give a well-defined and calibrated p-value which can be corrected for multiple hypothesis testing. However, this is not the recommended procedure for most use cases, as we are typically interested in LSVs which exhibit biologically significant changes and not just deviation from a theoretical null distribution. Specifically, there is no null distribution that combines both criteria (1) and (2) above, there is no null distribution for combining the statistical tests together, and there is no procedure for multiple hypothesis correction over such criteria. Instead, we recommend that users apply our RR and IIR criteria (using the package supplied in the XXX paper) to their data to get a more realistic sense of reproducibility and possible false positives (FP) in their specific dataset.