Difference between revisions of "RNA-seq Pipeline for Known Transcripts"
From Bridges Lab Protocols
Davebridges (Talk | contribs) (added categories)
Davebridges (Talk | contribs) (added information about using FastQC)
Revision as of 20:12, 3 October 2011
Sequence Quality and Trimming
- Run FastQC to assess the quality of the reads from the sequencer:
- FastQC is available at http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
- Open run_fastqc on a Windows machine. Open each sequence file individually and allow the analysis to finish, then save the report.
- Check the report to decide whether sequences need to be trimmed or discarded.
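FastQC can also be run from the command line instead of the graphical launcher, which helps when there are many sequence files. The following is a minimal sketch, assuming a fastqc executable is on the PATH. The function name, the output directory and the DRY_RUN switch are illustrative and not part of the original protocol. With DRY_RUN=1 the command is only printed.

```shell
# Sketch: run FastQC on several FASTQ files in one call.
# Assumptions (not from the original protocol): fastqc is on the PATH,
# and reports are written to the directory given as the first argument.

run_fastqc_batch() {
    out_dir="$1"
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: show the command without executing it
        echo fastqc -o "$out_dir" "$@"
    else
        # fastqc -o expects the output directory to exist already
        mkdir -p "$out_dir"
        fastqc -o "$out_dir" "$@"
    fi
}

# Print the command for one file
DRY_RUN=1 run_fastqc_batch ./fastqc_reports s_1_1_sequence.fastq
```

Each input file gets its own report in the output directory, so the reports can be checked in the same way as those saved from the graphical version.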
Filter Sequences Using FastX-Toolkit
- Filter for quality, if applicable
- Trim, if applicable
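The two steps above correspond to the FastX-Toolkit programs fastq_quality_filter and fastx_trimmer. The following is a minimal sketch, assuming both programs are on the PATH. The function name, the thresholds (quality 20 in at least 80% of bases, keep bases 1 to 75) and the file names are placeholders to be chosen from the FastQC report, not values from the original protocol.

```shell
# Sketch: quality-filter, then trim, one FASTQ file with FastX-Toolkit.
# All thresholds and file names are placeholder assumptions.

filter_and_trim() {
    in_fastq="$1"         # raw reads
    out_fastq="$2"        # filtered and trimmed reads
    min_qual="${3:-20}"   # -q: minimum quality score to keep
    min_pct="${4:-80}"    # -p: minimum percent of bases that must reach -q
    last_base="${5:-75}"  # -l: last base to keep when trimming

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: show the pipeline without executing it
        echo "fastq_quality_filter -q $min_qual -p $min_pct -i $in_fastq | fastx_trimmer -f 1 -l $last_base -o $out_fastq"
    else
        # Without -o the filter writes to stdout; without -i the trimmer reads stdin
        fastq_quality_filter -q "$min_qual" -p "$min_pct" -i "$in_fastq" |
            fastx_trimmer -f 1 -l "$last_base" -o "$out_fastq"
    fi
}

# Print the pipeline for one file using the default thresholds
DRY_RUN=1 filter_and_trim A_1_16_85_raw.fastq A_1_16_85.fastq
```

Piping the filter into the trimmer avoids writing an intermediate file. Either step can be dropped if the FastQC report shows it is not needed.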
Generate a Reference Genome
- Run bowtie-build to generate the Burrows-Wheeler transformed reference genome (.ebwt format).
- http://bowtie-bio.sourceforge.net/index.shtml (bowtie, tophat, and cufflinks are here).
- [Optional input and parameter settings are in square brackets.]
- <Required parameters are in angle brackets.>
- This Burrows-Wheeler transformed reference genome can be created once and then reused in future runs.
- $ is the command prompt.
$ bowtie-build [-f specifies reference genome is in fasta format] <path to input reference genome (e.g. /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19.fa)> <base name for reference genome output .ebwt files (e.g. hg19)>
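Because the index is built once and reused, it can be worth checking whether a complete index already exists before running bowtie-build again. For a given base name, bowtie-build writes six files ending in .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt and .rev.2.ebwt. The helper below is a hypothetical sketch, not part of bowtie, and it assumes the standard .ebwt index rather than the large-index format of newer bowtie versions.

```shell
# Sketch: succeed only if all six .ebwt files for a base name exist
# and are non-empty. index_is_complete is an illustrative name.

index_is_complete() {
    base="$1"
    for suffix in 1 2 3 4 rev.1 rev.2; do
        # -s is true when the file exists and has a size greater than zero
        [ -s "$base.$suffix.ebwt" ] || return 1
    done
    return 0
}

# Usage (commented out so nothing is built by accident):
# index_is_complete hg19 || bowtie-build -f hg19.fa hg19
```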
Align Reads to Reference Genome with Tophat
Run tophat to align reads to the reference genome. I’ve included a pseudo command line as well as a “real” command line.
$ tophat [-p #processors -o ./output_directory] <./reference genome in both .ebwt and fasta formats (e.g. /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19)> <reads file to be aligned (e.g. s_1_1_sequence.fastq)>
$ tophat -p 5 -o ./HG19/tophat_out_hg19_001_trimmed /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19 ./HG19/Rich_trim/A_1_16_85.fastq
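When several samples need aligning, the same tophat call can be repeated in a loop. The sketch below only prints the commands; removing the echo would execute them. The function name and the tophat_out_<sample> naming scheme for output directories are assumptions made for illustration.

```shell
# Sketch: one tophat run per FASTQ file, each with its own output directory.
# The tophat_out_<sample> naming scheme is an illustrative assumption.

print_tophat_commands() {
    index_base="$1"   # bowtie index base name, without the .ebwt suffixes
    shift
    for fastq in "$@"; do
        # Strip the directory and the .fastq extension to get a sample name
        sample=$(basename "$fastq" .fastq)
        # echo makes this a dry run; drop it to launch tophat
        echo tophat -p 5 -o "./tophat_out_$sample" "$index_base" "$fastq"
    done
}

print_tophat_commands /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19 ./HG19/Rich_trim/A_1_16_85.fastq
```

Giving each sample its own output directory matters because every tophat run writes a file named accepted_hits.bam, and a shared directory would overwrite it.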
Use Cuffcompare to Generate .gtf Reference
Run cuffcompare to create a .gtf format reference annotation from a generic reference annotation. Note that cuffcompare adds the tss_id and p_id attributes that you will need in cuffdiff. This .gtf reference can be created once and then reused in future runs.
$ cuffcompare [-o ./output_prefix] <input file twice (e.g. /ccmb/CoreBA/BioinfCore/Common/DATA/CufflinksData_hg19/hg19.gtf /ccmb/CoreBA/BioinfCore/Common/DATA/CufflinksData_hg19/hg19.gtf)>
$ cuffcompare -o ./cuffcompare_out /ccmb/CoreBA/BioinfCore/Common/DATA/CufflinksData_hg19/hg19_genes.gtf /ccmb/CoreBA/BioinfCore/Common/DATA/CufflinksData_hg19/hg19_genes.gtf
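Since cuffdiff relies on tss_id and p_id, a quick count of how many records in the combined .gtf carry each one can catch problems early. The helper below is a hypothetical sketch, not part of Cufflinks. The file name in the usage comment assumes the -o ./cuffcompare_out setting from the real command above.

```shell
# Sketch: count the GTF records that carry a given attribute.
# count_gtf_attribute is an illustrative name.

count_gtf_attribute() {
    attr="$1"   # attribute name, e.g. tss_id or p_id
    gtf="$2"    # path to the .gtf file
    # GTF attributes look like: tss_id "TSS1";
    # grep -c prints the number of matching lines (0 when none match)
    grep -c "$attr \"" "$gtf"
}

# Usage (commented out because the file only exists after cuffcompare has run):
# count_gtf_attribute tss_id ./cuffcompare_out.combined.gtf
# count_gtf_attribute p_id   ./cuffcompare_out.combined.gtf
```

A count of 0 for either attribute suggests that the reference should be checked before moving on to cuffdiff.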
Use Cuffdiff to Identify Differentially Expressed Transcripts
Run cuffdiff to identify differentially abundant transcripts.
$ cuffdiff [-p #processors -o ./output_directory -L label1,label2,etc. -T (for time series data) -N (use upper quartile normalization) --compatible-hits-norm (use reference hits in normalization) -b (use the reference genome sequence to reduce bias, include path to fasta file e.g. /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19.fa) -u (improve multi-read weighting)] <transcripts.gtf (produced by cuffcompare) sample_A_accepted_hits1.bam,sample_A_accepted_hits2.bam,etc. (all produced by tophat) sample_B_accepted_hits1.bam,sample_B_accepted_hits2.bam,etc.>
$ cuffdiff -o ./HG19/Cuffdiff_out_options_b_u_N_compatible/ -p 14 -L Control,PUF_kd --no-update-check -b /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19.fa -u -N --compatible-hits-norm /ccmb/CoreBA/BioinfCore/Projects/Goldstrohm_McEachin/HG19/cuffcompare_out.combined.gtf /ccmb/CoreBA/BioinfCore/Projects/Goldstrohm_McEachin/HG19/tophat_out_hg19_001_trimmed/accepted_hits.bam /ccmb/CoreBA/BioinfCore/Projects/Goldstrohm_McEachin/HG19/tophat_out_hg19_002_trimmed/accepted_hits.bam
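cuffdiff expects the replicates of one condition to be joined by commas with no spaces, and different conditions to be separated by spaces. The helper below is a hypothetical sketch, not part of Cufflinks, that builds the comma-separated lists. The directory names in the example are placeholders.

```shell
# Sketch: join replicate BAM paths with commas for cuffdiff.
# join_replicates is an illustrative name.

join_replicates() (
    # Runs in a subshell, so the IFS change does not leak to the caller.
    # "$*" joins all arguments using the first character of IFS.
    IFS=,
    echo "$*"
)

# Placeholder directory names; each holds the accepted_hits.bam from one tophat run
control=$(join_replicates ctrl_1/accepted_hits.bam ctrl_2/accepted_hits.bam)
knockdown=$(join_replicates kd_1/accepted_hits.bam kd_2/accepted_hits.bam)

# echo makes this a dry run; drop it to launch cuffdiff
echo cuffdiff -o ./cuffdiff_out -L Control,PUF_kd transcripts.gtf "$control" "$knockdown"
```

The order of the conditions on the command line should match the order of the labels given to -L.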