SpliceMap


Tutorial

This tutorial will help you get started with SpliceMap by demonstrating how to search for junctions in 100k sample RNA-seq reads of length 100bp from chromosome 21 in the human genome (hg18). If you experience any problems following these steps, please don't hesitate to contact us.

Step 1 - Download and extract the example files

Download the example:

SpliceMap 3.3.5.2 example (Linux-x86 64bit) This is the recommended version for everyone.
SpliceMap 3.3.5.2 example (Linux-x86 32bit) This is the 32-bit version, if your system requires it.
SpliceMap 3.3.5.2 example (OSX 64bit) This is the Mac OSX (intel-64 bit) version.

Extract the example to an empty folder of your choice. After extraction, the folder should contain the following files and folders:

dn800c9107:SpliceMap3352_example_OSX-64 moo$ ls
INSTALL			data			src
LICENSE			genome			temp
all.gene.refFlat.txt	output
bin			run.cfg

The tutorial will be given with the OSX version. However, the steps are the same for all versions.
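
To confirm that the extraction worked, you can check for each entry in the listing above. The following is a minimal sketch only; the function name check_example_dir is hypothetical and not part of SpliceMap.

```shell
#!/bin/sh
# Hypothetical helper (not part of SpliceMap): report which entries from the
# listing above are missing in the given example folder.
check_example_dir() {
    dir="${1:-.}"
    missing=0
    for entry in INSTALL LICENSE all.gene.refFlat.txt bin data genome \
                 output run.cfg src temp; do
        if [ ! -e "$dir/$entry" ]; then
            echo "missing: $entry"
            missing=1
        fi
    done
    # Exit status 0 means every expected entry was found.
    return "$missing"
}
```

From inside the example folder, `check_example_dir . && echo "example folder looks complete"` prints the confirmation only when nothing is missing.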

Step 2 - Build SpliceMap from source (optional)

Try running "./bin/runSpliceMap" in the example folder. If you see the following output, the prebuilt binaries work on your system and you may skip this step.

dn800c9107:SpliceMap3352_example_OSX-64 moo$ ./bin/runSpliceMap
---== Welcome to SpliceMap 3.3.5.2 (55) ==---
Developed by Kin Fai Au and John C. Mu
http://www.stanford.edu/group/wonglab/SpliceMap/
__________
usage: ./runSpliceMap run.cfg
  run.cfg  --  Configuration options for this run, see comments in file for details
See website for further details

However, if you see something like

[johnmu@solomon-0-10 SpliceMap3352_example_linux-32]$ ./bin/runSpliceMap 
./bin/runSpliceMap: /usr/lib/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by ./bin/runSpliceMap)
./bin/runSpliceMap: /usr/lib/libstdc++.so.6: version `GLIBCXX_3.4.11' not found (required by ./bin/runSpliceMap)

then the C++ standard libraries in your Linux distribution are not compatible, and you need to build SpliceMap from source by following these instructions (for 64-bit systems):

  1. Navigate to the "src" directory in the example folder
  2. Type "./install.sh ../bin". This installs SpliceMap into the example bin directory for the purposes of this tutorial; you can install it anywhere you like in the future.
  3. Type "./install-bowtie.sh ../bin". This installs Bowtie into the example bin directory.

or these instructions (for 32-bit systems):

  1. Navigate to the "src" directory in the example folder
  2. Type "./install-32.sh ../bin". This installs SpliceMap into the example bin directory for the purposes of this tutorial; you can install it anywhere you like in the future.
  3. Type "./install-bowtie-32.sh ../bin". This installs Bowtie into the example bin directory.

SpliceMap is now ready to run and you are ready to move to the next step!
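
The decision described in this step can be sketched in shell. This is an illustration under assumptions, not an official script: the helper names needs_source_build and install_commands are hypothetical, and mapping the `uname -m` values x86_64/amd64 to the 64-bit scripts and i386 through i686 to the 32-bit scripts is an assumption about your machine.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SpliceMap).

# Succeeds if the captured output of ./bin/runSpliceMap contains the
# libstdc++ version error shown above.
needs_source_build() {
    printf '%s\n' "$1" | grep -q "GLIBCXX_.*not found"
}

# Print the two install commands from the steps above for a machine type
# as reported by `uname -m`. Unknown machine types are rejected rather
# than guessed.
install_commands() {
    case "$1" in
        x86_64|amd64)
            echo "./install.sh ../bin"
            echo "./install-bowtie.sh ../bin"
            ;;
        i[3-6]86)
            echo "./install-32.sh ../bin"
            echo "./install-bowtie-32.sh ../bin"
            ;;
        *)
            echo "unrecognised machine type: $1" >&2
            return 1
            ;;
    esac
}

# Example use from the example folder (commented out so that nothing is
# built by accident):
#   out=$(./bin/runSpliceMap 2>&1)
#   if needs_source_build "$out"; then
#       (cd src && install_commands "$(uname -m)")
#   fi
```

The example only prints the commands to type in the "src" directory; running them by hand keeps any build errors visible.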

Step 3 - Examine the example directory contents

Before we continue, it will be helpful to learn the purpose of each file in this example. When you run SpliceMap on your data, all of these files can be in separate locations if you wish.
run.cfg
This is the most important file. It is a text file that contains the path to your sequencer reads, the path to your genome files, and the configuration settings. Please see .cfg file format for details. It is simple to edit, and you will need to edit it once for each dataset.
genome directory
This directory contains all of the chromosomes of your organism and the Bowtie index of the same genome. It may be read-only. In this example, we only have chr21 and its associated Bowtie index in the genome directory. For instructions on how to obtain the genome and Bowtie index files, see the manual.
data directory
This directory contains all of the sequencer reads in the example. In your case, this directory could be anywhere and it may be read-only.
temp directory
This is a temporary directory created during the execution of SpliceMap. The results of the initial short-read mapping are stored here, so this directory can be quite large.
output directory
This directory stores all the useful output after executing SpliceMap. It is also created during the execution of SpliceMap.
all.gene.refFlat.txt
This file contains all known gene annotations (hg18) from Ensembl, RefSeq and knowngene. It is provided for your convenience and may be used to find novel junctions.
bin directory
This directory stores all of the SpliceMap binaries. It is important that all the binaries are in the same location. No installation is required! Simply copy this directory to a location convenient for you.
src directory
This directory stores all of the SpliceMap/Bowtie sources.
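
Since the bin directory only needs to be copied as a whole, relocating it can be sketched as follows. The helper name copy_splicemap_bin and any destination path you pass to it are hypothetical; the only point taken from the description above is that all binaries must stay together in one directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of SpliceMap): copy the contents of the bin
# directory to a new location, keeping all binaries together.
copy_splicemap_bin() {
    src_bin="$1"
    dest="$2"
    if [ ! -d "$src_bin" ]; then
        echo "not a directory: $src_bin" >&2
        return 1
    fi
    # -R copies the directory contents recursively; -p preserves file modes,
    # so the binaries remain executable.
    mkdir -p "$dest" && cp -Rp "$src_bin"/. "$dest"/
}
```

For example, `copy_splicemap_bin ./bin "$HOME/splicemap/bin"` copies the binaries, after which runSpliceMap can be invoked by its full path in the new location.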

Step 4 - Run SpliceMap on the example data

Only one command is needed to start SpliceMap.

Make sure your terminal's working directory is the example folder, then type the following on one line:

./bin/runSpliceMap run.cfg

You should then see some output:

dn800c9107:SpliceMap3352_example_OSX-64 moo$ ./bin/runSpliceMap run.cfg
---== Welcome to SpliceMap 3.3.5.2 (55) ==---
Developed by Kin Fai Au and John C. Mu
http://www.stanford.edu/group/wonglab/SpliceMap/
__________
Loading configuration file... run.cfg
output directory exists
temp directory exists
Scaning genome: genome/chr*.fa
List of chromosomes to be searched: 
chr21 | genome/chr21.fa | pos:7 - 47883217
Please check that these are correct... continuing in 7 s ...  < control-c >  to exit
If they are not correct please check chromosome_wildcard: chr*.fa
__________
Temp directory:   temp/
Output directory: output/
Maximum number of multiple mapped reads allowed: 10
Maximum number of mismatches allowed in 25-mer seed: 1
Maximum number of mismatches allowed in full read: 2
Maximum number of bases SpliceMap is allowed to clip: 40
Mapper used: bowtie
(25th-percentile) intron size: 20000
(99th-percentile) intron size: 400000
Annotations path:  name: all.gene.refFlat.txt
Package path:     ./bin/ name: runSpliceMap
Read format: RAW
Number of threads: 2
Number of chromosomes to run together: 2
Will print Cufflinks compatible SAM file
Reads List 1:
data/long_reads_1_100K.txt.seq
Reads List 2:
data/long_reads_2_100K.txt.seq
Preparing the reads!...
Bases removed from front: 0
Using as many bases as possible.
Extracting 25-mers... 
...
...

At this point, feel free to take a break. The mapping and junction search should complete after about 3-4 minutes.
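
If you would like to keep a copy of the console output shown above, the run command can be wrapped as in the following sketch. The function name run_splicemap and the default log file name splicemap_console.log are hypothetical choices; the only SpliceMap command used is the one from this step.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of SpliceMap): check that the binary and the
# configuration file exist, then run SpliceMap and save its console output.
run_splicemap() {
    bin="${1:-./bin/runSpliceMap}"
    cfg="${2:-run.cfg}"
    log="${3:-splicemap_console.log}"
    if [ ! -x "$bin" ]; then
        echo "cannot execute: $bin" >&2
        return 1
    fi
    if [ ! -f "$cfg" ]; then
        echo "configuration file not found: $cfg" >&2
        return 1
    fi
    # tee shows the output on screen and writes it to the log file.
    # Note that the exit status of this pipeline is that of tee.
    "$bin" "$cfg" 2>&1 | tee "$log"
}
```

Calling `run_splicemap` with no arguments from the example folder is equivalent to the command above, with the output also saved to splicemap_console.log.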

Step 5 - Examining the output

All of the output from SpliceMap is automatically copied to the "output" directory. After this execution, it should contain:
dnab4167d9:output moo$ ls
coverage_all.wig		junction_color.bed
coverage_down.wig		junction_color.new.bed
coverage_up.wig			junction_nUM_color.bed
debug_logs			junction_nUM_color.new.bed
good_hits.sam			log

The following is a description of each output file:

junction_color.bed
This file contains the junctions found on all chromosomes. Novel junctions are highlighted in red. Faded junctions are not well supported. The file may be displayed in the UCSC genome browser or the cisGenome browser. The tag associated with each junction is explained in the output formats.
junction_color.new.bed
This file contains the junctions from junction_color.bed not found in all.gene.refFlat.txt.
junction_nUM_color.bed
This file contains the junctions from junction_color.bed that are supported by at least one uniquely mappable read.
junction_nUM_color.new.bed
This file contains the junctions from junction_nUM_color.bed not found in all.gene.refFlat.txt.
coverage_up.wig
This file contains the coverage supported by uniquely mappable reads at each chromosome position.
coverage_down.wig
This file contains the coverage supported by only multiply mappable reads at each chromosome position.
coverage_all.wig
This file contains the coverage of all mapped reads at each chromosome position.
good_hits.sam
This file contains all mapped reads in SAM format.
debug_logs
This folder contains the logs of the output. If you experience problems, please send us the contents of this folder. Otherwise, you can safely ignore it.
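
For a quick summary of a run, the records in the BED and SAM files described above can be counted from the shell. This is a sketch under assumptions: the helper names are hypothetical, and whether SpliceMap writes "track" or "browser" header lines into its BED files is not stated here, so the BED counter simply skips them if they are present.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SpliceMap).

# Count BED records, skipping blank lines and any "track" or "browser"
# header lines.
count_bed_records() {
    awk 'NF > 0 && $1 != "track" && $1 != "browser" { n++ } END { print n + 0 }' "$1"
}

# Count SAM alignment records, skipping blank lines and header lines,
# which start with "@".
count_sam_records() {
    awk 'NF > 0 && substr($0, 1, 1) != "@" { n++ } END { print n + 0 }' "$1"
}

# Example use from the example folder (commented out):
#   count_bed_records output/junction_color.bed       # all junctions
#   count_bed_records output/junction_color.new.bed   # not in all.gene.refFlat.txt
#   count_sam_records output/good_hits.sam            # alignment records
```

Comparing the counts for junction_color.bed and junction_color.new.bed gives a rough idea of how many of the reported junctions are absent from the annotation file.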

Step 6 - Learning how to apply this tutorial to your own data

See the Using SpliceMap section of the manual.