To perform splicing QTL (sQTL) analysis using LeafCutter, you’ll first need to preprocess your RNA-seq data as described in Steps 1 and 2 under Differential Splicing. sQTL analysis is more involved than differential splicing analysis, but we provide a script, scripts/prepare_phenotype_table.py, intended to make this process easier. We assume you will use FastQTL for the sQTL mapping itself, but reformatting the output for another tool (e.g. MatrixEQTL) should be reasonably straightforward. The script is pretty simple: (a) calculate intron excision ratios, (b) filter out introns used in fewer than 40% of individuals or with almost no variation, and (c) output these ratios as gzipped text files along with a user-specified number of PCs.
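To make steps (a) and (b) concrete, here is a minimal sketch in plain Python. It is not the code in scripts/prepare_phenotype_table.py: the in-memory input layout, the function names, the definition of "used" as a nonzero read count, and the min_sd threshold are all assumptions made for illustration.

```python
from statistics import pstdev


def excision_ratios(counts):
    """Map each intron to its per-sample share of its cluster's reads.

    `counts` maps (cluster_id, intron_id) -> list of per-sample read counts
    (a hypothetical layout chosen for this sketch).
    """
    # Sum counts over all introns of a cluster, separately for each sample.
    totals = {}
    for (cluster, _intron), row in counts.items():
        running = totals.get(cluster, [0] * len(row))
        totals[cluster] = [a + b for a, b in zip(running, row)]
    # Ratio = intron count / cluster total; None where the cluster has no reads.
    return {
        intron: [c / t if t > 0 else None for c, t in zip(row, totals[cluster])]
        for (cluster, intron), row in counts.items()
    }


def keep_intron(row, ratios, min_used=0.4, min_sd=0.005):
    """Apply the two filters; min_sd is an illustrative threshold only."""
    # Assumption: an intron is "used" by an individual if its count is nonzero.
    used = sum(1 for c in row if c > 0) / len(row)
    observed = [r for r in ratios if r is not None]
    if used < min_used or len(observed) < 2:
        return False
    return pstdev(observed) >= min_sd


# Toy data: two clusters, five individuals.
counts = {
    ("clu_1", "A"): [8, 2, 5, 0, 6],
    ("clu_1", "B"): [2, 8, 5, 0, 4],
    ("clu_2", "C"): [1000, 1000, 1000, 1000, 1000],  # almost no variation
    ("clu_2", "D"): [0, 0, 0, 0, 1],                 # used by 20% of individuals
}
ratios = excision_ratios(counts)
kept = [i for (_c, i), row in counts.items() if keep_intron(row, ratios[i])]
print(kept)  # ['A', 'B']
```

Introns C and D are dropped by the variation and usage filters respectively, while the fourth individual's ratio in clu_1 is left missing because that cluster has no reads there.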
You’ll need the scikit-learn and scipy Python packages installed, e.g. pip install scikit-learn scipy, for the PCA calculation.
Usage is e.g.
python scripts/prepare_phenotype_table.py example_data/testYRIvsEU_perind.counts.gz -p 10
-p 10 specifies you want to calculate 10 PCs for FastQTL to use as covariates.
FastQTL needs tabix indices. To generate these you’ll need tabix and bgzip, which you may already have as part of samtools; if not, they’re now part of htslib: see https://github.com/samtools/htslib for installation instructions (alternatively, apt-get install tabix worked for me in Ubuntu 14.04). With these dependencies installed, you can run the script that the command above creates and points to in its output.
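Before going further, it can help to confirm that the indexing tools are actually on your PATH. The following is a small sketch using only the Python standard library; the function name missing_tools is hypothetical and not part of LeafCutter.

```python
import shutil


def missing_tools(tools=("tabix", "bgzip")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("tabix and bgzip are both available")
```

If either tool is reported missing, install htslib (or the tabix package on Ubuntu) as described above.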
We assume you’ll run FastQTL separately for each chromosome: the files you’ll need will have names like testYRIvsEU_perind.counts.gz.qqnorm_chr21.gz. The PC file will be e.g. testYRIvsEU_perind.counts.gz.PCs.
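If you are scripting the per-chromosome runs, the input file names can be built from the counts file name. This sketch infers the naming pattern from the two example names above, so treat the pattern, the chromosome labels, and the helper name fastqtl_inputs as assumptions rather than a documented interface.

```python
def fastqtl_inputs(counts_file, chromosomes):
    """Return ({chromosome: phenotype file}, PC file) for a counts file.

    The pattern is inferred from the example names in this section and
    is not a documented guarantee.
    """
    phenotypes = {c: f"{counts_file}.qqnorm_{c}.gz" for c in chromosomes}
    covariates = f"{counts_file}.PCs"
    return phenotypes, covariates


# Assumed chromosome labels of the form chr1 ... chr22.
chromosomes = [f"chr{i}" for i in range(1, 23)]
phenotypes, covariates = fastqtl_inputs("testYRIvsEU_perind.counts.gz", chromosomes)
print(phenotypes["chr21"])
print(covariates)
```

Each phenotype file would then be passed to its own FastQTL run, with the PC file supplied as the covariates.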