RNA-seq Pipeline for Known Transcripts: Difference between revisions

added instructions for barcode splitting
added section for aligning reads with bowtie
 
(2 intermediate revisions by the same user not shown)
Line 1: Line 1:
==Reference Pages==
* http://seqanswers.com/wiki/How-to/RNASeq_analysis
==Sequence Quality and Trimming==
# Run FASTQC to assess quality of reads from sequencer and:
Line 21: Line 24:
* Example command where s_2_100.txt is the original file, mybarcodes.txt is the barcode file, 2 mismatches are allowed (default is 1).  This will generate files /tmp/bla_BC#.txt:
<pre>
cat s_2_100.txt | /usr/local/bin/fastx_barcode_splitter.pl --bcfile mybarcodes.txt --bol --mismatches 2 --prefix /tmp/bla_ --suffix ".txt"
</pre>
# Filter for quality, if applicable
# Trim, if applicable
# Trim, if applicable, using the FASTX-Toolkit.  The following keeps only reads in which 100% of the bases have a quality score of 25 or greater:
<pre>
fastq_quality_filter -v -q 25 -p 100 -i control-reads.txt -o control-reads-quality25.txt
</pre>
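To see how many reads a quality filter removed, it can help to count the records in the file before and after filtering. The following is a minimal sketch, assuming plain FASTQ with exactly four lines per record; the function name <code>count_fastq_reads</code> and the demo reads are hypothetical and only there so the sketch runs on its own.

```shell
# Count records in a FASTQ file, assuming exactly 4 lines per record.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Tiny demo input (hypothetical reads) so the sketch is self-contained.
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nII#I\n' > "$demo_dir/reads.txt"

# Print the number of records in the demo file.
count_fastq_reads "$demo_dir/reads.txt"
```

Running the same function on control-reads.txt and on control-reads-quality25.txt and comparing the two numbers shows how many reads the filter discarded.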


==Generate a Reference Genome==
==Align Reads==
# Run bowtie-build to generate a Burrows-Wheeler-transformed reference genome (.ebwt format).
This can be done with either TopHat or Bowtie, so choose one of the following.  The reference genomes are in the following locations:
# http://bowtie-bio.sourceforge.net/index.shtml (bowtie, tophat, and cufflinks are here).
<pre>
# [Optional input and parameter settings are in square brackets.] 
/database/davebrid/RNAseq/reference-genomes/hg19
# <Required parameters are in angle brackets.>
/database/davebrid/RNAseq/reference-genomes/mm9
# This Burrows-Wheeler-transformed reference genome can be created once and then reused.
</pre>
# $ is the command prompt.
These reference indexes are pre-built from UCSC genomes and were downloaded from ftp://ftp.cbcb.umd.edu/pub/data/bowtie_indexes/
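Before aligning, it may be worth confirming that an index directory is complete. The sketch below assumes a standard (small) Bowtie 1 index, which consists of six files named <code>basename.1.ebwt</code> through <code>basename.4.ebwt</code> plus <code>basename.rev.1.ebwt</code> and <code>basename.rev.2.ebwt</code>; large indexes that use the <code>.ebwtl</code> extension are not handled. The function name <code>check_bowtie_index</code> is hypothetical, and the hg19 basename is assumed to be <code>hg19/hg19</code> under the directory listed above.

```shell
# Return 0 if all six Bowtie 1 index files exist for the given basename.
check_bowtie_index() {
    prefix=$1
    for ext in 1 2 3 4 rev.1 rev.2; do
        if [ ! -f "$prefix.$ext.ebwt" ]; then
            echo "missing: $prefix.$ext.ebwt" >&2
            return 1
        fi
    done
    return 0
}

# Example call; the basename is an assumption based on the paths on this page.
if check_bowtie_index /database/davebrid/RNAseq/reference-genomes/hg19/hg19; then
    echo "hg19 index looks complete"
else
    echo "hg19 index incomplete or not found"
fi
```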


===Align Reads to Reference Genome with Bowtie===
Run bowtie to align reads to a reference genome.  The following aligns reads to hg19 with the --best flag and writes the alignments in SAM format:
<pre>
<pre>
$ bowtie-build [-f specifies reference genome is in fasta format] <path to input reference genome (e.g. /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19.fa)> <base name for reference genome output .ebwt files (e.g. hg19)>
bowtie --sam --best /database/davebrid/RNAseq/reference-genomes/hg19/hg19 control-reads-quality25.txt control-aligned-quality25.sam
</pre>
</pre>
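A quick sanity check on the resulting SAM file is to count how many reads aligned. The sketch below relies on the SAM convention that bit 0x4 of the FLAG field (column 2) marks an unmapped read, and that header lines start with "@". The function name <code>sam_alignment_summary</code> and the three demo records are hypothetical; they only make the example self-contained.

```shell
# Summarize a SAM file: count reads with and without the "unmapped" flag (0x4).
# Header lines (starting with "@") are skipped.
sam_alignment_summary() {
    awk -F '\t' '
        /^@/ { next }
        { if (int($2 / 4) % 2 == 1) unaligned++; else aligned++ }
        END { printf "aligned=%d unaligned=%d\n", aligned, unaligned }
    ' "$1"
}

# Tiny demo SAM file (hypothetical records): two aligned reads, one unaligned.
demo_sam=$(mktemp)
printf '@HD\tVN:1.0\tSO:unsorted\n' > "$demo_sam"
printf 'r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$demo_sam"
printf 'r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\n' >> "$demo_sam"
printf 'r3\t16\tchr2\t50\t255\t4M\t*\t0\t0\tGGCC\tIIII\n' >> "$demo_sam"

sam_alignment_summary "$demo_sam"
```

The same function can be pointed at control-aligned-quality25.sam once the bowtie command above has finished.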
==Align Reads to Reference Genome with Tophat==
 
===Align Reads to Reference Genome with Tophat===
Run tophat to align reads to the reference genome. I’ve included a pseudo command line as well as a “real” command line.
<pre>
<pre>