RNA-seq Pipeline for Known Transcripts
==Reference Pages==
* http://seqanswers.com/wiki/How-to/RNASeq_analysis
==Sequence Quality and Trimming==
# Run FastQC to assess the quality of the reads from the sequencer. FastQC is available at http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
## Open run_fastqc on a Windows machine. Open each sequence file individually and allow the analysis to finish. Save the report.
## Check the report to decide whether sequences need to be trimmed or discarded. See http://bioinfo.cipf.es/courses/mda11/lib/exe/fetch.php?media=ngs_qc_tutorial_mda_val_2011.pdf for sample results and an explanation of the quality report.
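When there are many sequence files, opening each one by hand gets tedious. FastQC also runs non-interactively when files are given on the command line, so the step above can be sketched as a loop. This is only a sketch: the function name and the two directory arguments are hypothetical, and it assumes the reads end in .txt or .fastq.

```shell
#!/bin/sh
# Run FastQC on every read file in a directory and collect the reports.
# $1 = directory holding the reads (hypothetical), $2 = report directory (hypothetical).
run_fastqc_batch() {
    reads_dir=$1
    report_dir=$2
    mkdir -p "$report_dir"
    for fq in "$reads_dir"/*.txt "$reads_dir"/*.fastq; do
        [ -e "$fq" ] || continue    # skip glob patterns that matched nothing
        if command -v fastqc >/dev/null 2>&1; then
            fastqc --outdir "$report_dir" "$fq"
        else
            # FastQC is not on the PATH: only show what would be run
            echo "fastqc not found; would run: fastqc --outdir $report_dir $fq"
        fi
    done
}

# Example call (paths are placeholders):
# run_fastqc_batch ./reads ./fastqc-reports
```

The reports written to the output directory can then be checked as described above.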
===Filter Sequences Using FASTX-Toolkit===
# If samples are barcoded, use the FASTX barcode splitter (see http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastx_barcode_splitter_usage for more details):
<pre>
/usr/local/bin/fastx_barcode_splitter.pl --bcfile FILE --prefix PREFIX [--suffix SUFFIX] [--bol|--eol] [--mismatches N] [--exact] [--partial N] [--help] [--quiet] [--debug]
</pre>
* This requires a barcode file in the following format, where the first column (BC#) is the barcode name and the second column is the barcode's nucleotide sequence:
<pre>
#This line is a comment (starts with a 'number' sign)
BC1 GATCT
BC2 ATCGT
BC3 GTGAT
BC4 TGTCT
</pre>
* This file is passed as FILE to the --bcfile option.
* Example command, where s_2_100.txt is the original file, mybarcodes.txt is the barcode file, barcodes are matched at the beginning of each read (--bol), and 2 mismatches are allowed (the default is 1). This generates files named /tmp/bla_BC#.txt:
<pre>
cat s_2_100.txt | /usr/local/bin/fastx_barcode_splitter.pl --bcfile mybarcodes.txt --bol --mismatches 2 --prefix /tmp/bla_ --suffix ".txt"
</pre>
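Before running the splitter it can help to sanity-check the barcode file. The following is a minimal sketch, not part of the FASTX-Toolkit: the function name check_barcodes is made up for illustration, and it assumes two whitespace-separated columns per line, barcodes made only of A, C, G, T or N, and all barcodes of equal length.

```shell
#!/bin/sh
# Validate a barcode file: name in column 1, nucleotide barcode in column 2.
# Prints a message per problem line and returns non-zero if any was found.
check_barcodes() {
    awk '
        /^#/ || NF == 0    { next }    # skip comment and blank lines
        NF != 2            { printf "line %d: expected 2 columns\n", NR; bad = 1; next }
        $2 !~ /^[ACGTN]+$/ { printf "line %d: non-nucleotide barcode %s\n", NR, $2; bad = 1 }
        len && length($2) != len { printf "line %d: barcode length differs\n", NR; bad = 1 }
        { if (!len) len = length($2); n++ }    # remember the first barcode length
        END { if (!bad) printf "%d barcodes OK\n", n; exit (bad ? 1 : 0) }
    ' "$1"
}

# Example call:
# check_barcodes mybarcodes.txt
```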
# Filter for quality, if applicable
# Trim, if applicable, using the FASTX-Toolkit. The following keeps only the reads in which 100% of the bases have a quality score of 25 or greater:
<pre>
fastq_quality_filter -v -q 25 -p 100 -i control-reads.txt -o control-reads-quality25.txt
</pre>
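To make the meaning of -q 25 -p 100 concrete, here is a small awk sketch of the same rule: a read is kept only if every base has a quality score of at least the threshold. It is an illustration, not a replacement for fastq_quality_filter. The function name is hypothetical, and it assumes four-line FASTQ records and that the quality offset (33 or 64, depending on the sequencer's encoding) is passed in by the caller.

```shell
#!/bin/sh
# Keep only reads in which every base has quality >= the given minimum.
# $1 = FASTQ file, $2 = minimum quality score, $3 = ASCII quality offset (33 or 64).
filter_all_bases() {
    awk -v minq="$2" -v offset="$3" '
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }    # char -> ASCII code
        { rec[(NR - 1) % 4] = $0 }    # buffer the current 4-line record
        NR % 4 == 0 {                 # the 4th line holds the quality string
            keep = 1
            for (i = 1; i <= length($0); i++)
                if (ord[substr($0, i, 1)] - offset < minq) { keep = 0; break }
            if (keep) { print rec[0]; print rec[1]; print rec[2]; print rec[3] }
        }
    ' "$1"
}

# Example call, assuming an offset of 33:
# filter_all_bases control-reads.txt 25 33 > control-reads-quality25.txt
```

A single low-quality base is enough to drop the whole read under this rule, which is why a 100% setting can discard a large share of the data.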
==Align Reads==
This can be done with either TopHat or Bowtie, so choose one of the following. The reference genomes are stored in the following directories:
<pre>
/database/davebrid/RNAseq/reference-genomes/hg19
/database/davebrid/RNAseq/reference-genomes/mm9
</pre>
These references are pre-built Bowtie indexes of the UCSC genomes, downloaded from ftp://ftp.cbcb.umd.edu/pub/data/bowtie_indexes/
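A quick way to confirm that an index downloaded completely is to check for its component files. The sketch below assumes the standard small-index layout for Bowtie 1, which is six files ending in .ebwt; large indexes use a different extension and would need the suffix changed. The function name is hypothetical.

```shell
#!/bin/sh
# Check that all six files of a Bowtie 1 index are present.
# $1 = index basename, e.g. /database/davebrid/RNAseq/reference-genomes/hg19/hg19
check_bowtie_index() {
    base=$1
    missing=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -e "$base.$suffix.ebwt" ]; then
            echo "missing: $base.$suffix.ebwt"
            missing=1
        fi
    done
    return "$missing"    # 0 when the index is complete
}

# Example call:
# check_bowtie_index /database/davebrid/RNAseq/reference-genomes/hg19/hg19
```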
===Align Reads to Reference Genome with Bowtie===
Run bowtie to align reads to the reference genome. The following aligns reads to hg19 with the --best flag and writes a SAM-formatted alignment:
<pre>
bowtie --sam --best /database/davebrid/RNAseq/reference-genomes/hg19/hg19 control-reads-quality25.txt control-aligned-quality25.sam
</pre>
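Once the alignment finishes, a rough check of how many reads aligned can be made from the SAM file itself. This sketch counts records by the "unmapped" bit (value 4) of the SAM FLAG field, which is the second column. It assumes that unaligned reads were written to the SAM output; the function name is hypothetical.

```shell
#!/bin/sh
# Count aligned and unaligned records in a SAM file using the FLAG column.
# $1 = SAM file
sam_alignment_summary() {
    awk -F '\t' '
        /^@/ { next }    # skip SAM header lines
        { if (int($2 / 4) % 2 == 1) unaligned++; else aligned++ }    # test bit 4 of FLAG
        END { printf "aligned=%d unaligned=%d\n", aligned, unaligned }
    ' "$1"
}

# Example call:
# sam_alignment_summary control-aligned-quality25.sam
```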
===Align Reads to Reference Genome with Tophat===
Run tophat to align reads to the reference genome. I’ve included a pseudo command line as well as a “real” command line.
<pre>