Changes

RNA-seq Pipeline for Known Transcripts

236 bytes added, 13:37, 14 September 2011

updated formatting

==Sequence Quality and Trimming==
# Run FASTQC to assess the quality of the reads from the sequencer, and:
## Filter for quality, if applicable.
## Trim, if applicable.
## FASTQC is available at http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
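The quality check can be run on several read files in one call. This is a minimal sketch, assuming fastqc is on the PATH and using only its --outdir and --threads options; the function name run_fastqc and the DRY_RUN switch (print the command instead of running it) are hypothetical. Filtering and trimming are not shown because no tool is named for them here.

```shell
# run_fastqc OUTDIR THREADS FASTQ...   (hypothetical helper)
# Set DRY_RUN to any non-empty value to print the fastqc command instead of running it.
run_fastqc() {
  outdir=$1
  threads=$2
  shift 2
  # FASTQC does not create the output directory itself, so create it first.
  mkdir -p "$outdir"
  ${DRY_RUN:+echo} fastqc --outdir "$outdir" --threads "$threads" "$@"
}
```

For example, `run_fastqc ./qc_reports 4 s_1_1_sequence.fastq` writes the FASTQC report for that file into ./qc_reports.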

==Generate a Reference Genome==
# Run bowtie-build to generate a Burrows-Wheeler-transformed index of the reference genome (.ebwt files).
## Bowtie, TopHat, and Cufflinks can all be found via http://bowtie-bio.sourceforge.net/index.shtml
## [Optional inputs and parameter settings are in square brackets.]
## <Required parameters are in angle brackets (less-than/greater-than signs).>
## The index only needs to be built once; it can then be reused in later analyses.
## $ is the command prompt.
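Because the index only needs to be built once, a small wrapper can skip bowtie-build when the index files are already present. This is a minimal sketch, assuming Bowtie 1 file naming (bowtie-build writes BASE.1.ebwt through BASE.4.ebwt plus BASE.rev.1.ebwt and BASE.rev.2.ebwt); the function name build_index_once and the DRY_RUN switch are hypothetical.

```shell
# build_index_once REFERENCE_FASTA INDEX_BASE   (hypothetical helper)
# Runs bowtie-build only if one of the six Bowtie 1 index files is missing.
# Set DRY_RUN to any non-empty value to print the command instead of running it.
build_index_once() {
  ref=$1
  base=$2
  for suffix in 1 2 3 4 rev.1 rev.2; do
    if [ ! -e "$base.$suffix.ebwt" ]; then
      ${DRY_RUN:+echo} bowtie-build -f "$ref" "$base"
      return
    fi
  done
  echo "index $base already exists, skipping"
}
```

For example, `build_index_once hg19.fa hg19` builds the hg19 index on the first call and only prints a message on later calls. Checking all six files, rather than just the first, avoids treating an interrupted build as complete.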

<pre>

$ bowtie-build [-f (the reference genome is in FASTA format)] <path to the input reference genome (e.g. /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19.fa)> <base name for the output .ebwt index files (e.g. hg19)>

</pre>

==Align Reads to Reference Genome with TopHat==
Run TopHat to align the reads to the reference genome. I’ve included a pseudo command line as well as a “real” command line.
<pre>

$ tophat [-p #processors -o ./output_directory] <base name of the reference genome, available in both .ebwt and FASTA formats (e.g. /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19)> <reads file to be aligned (e.g. s_1_1_sequence.fastq)>

$ tophat -p 5 -o ./HG19/tophat_out_hg19_001_trimmed /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19 ./HG19/Rich_trim/A_1_16_85.fastq
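
# A minimal sketch for aligning several trimmed FASTQ files in one go, each into
# its own OUT_ROOT/tophat_out_<name> directory (which will hold accepted_hits.bam).
# The function name, the directory naming, and DRY_RUN (set it to any non-empty
# value to print the commands instead of running them) are hypothetical.
run_tophat_all() {  # usage: run_tophat_all THREADS INDEX_BASE OUT_ROOT FASTQ...
  threads=$1; index=$2; outroot=$3; shift 3
  for fq in "$@"; do
    name=$(basename "$fq" .fastq)   # ./HG19/Rich_trim/A_1_16_85.fastq -> A_1_16_85
    ${DRY_RUN:+echo} tophat -p "$threads" -o "$outroot/tophat_out_$name" "$index" "$fq"
  done
}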

</pre>

==Use Cuffcompare to Generate a .gtf Reference==
Run cuffcompare on a generic reference annotation (.gtf) to create the .gtf reference used by cuffdiff. Note that cuffcompare adds the tss_id and p_id attributes that you will need in cuffdiff. This .gtf reference can be created once and then reused in later analyses.
<pre>

$ cuffcompare [-o ./output_prefix] <input .gtf file, given twice (e.g. /ccmb/CoreBA/BioinfCore/Common/DATA/CufflinksData_hg19/hg19.gtf /ccmb/CoreBA/BioinfCore/Common/DATA/CufflinksData_hg19/hg19.gtf)>

$ cuffcompare -o ./cuffcompare_out /ccmb/CoreBA/BioinfCore/Common/DATA/CufflinksData_hg19/hg19_genes.gtf /ccmb/CoreBA/BioinfCore/Common/DATA/CufflinksData_hg19/hg19_genes.gtf

</pre>
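Because this reference only has to be made once, the step can be wrapped so that it is skipped when OUT_PREFIX.combined.gtf (the file given to cuffdiff below) already exists, and the result can be checked for the tss_id attribute. This is a minimal sketch that passes the annotation twice, exactly as in the command above; the helper names make_reference_gtf and has_tss_ids and the DRY_RUN switch are hypothetical.

```shell
# make_reference_gtf ANNOTATION_GTF OUT_PREFIX   (hypothetical helper)
# Runs cuffcompare once; later calls reuse OUT_PREFIX.combined.gtf.
# Set DRY_RUN to any non-empty value to print the command instead of running it.
make_reference_gtf() {
  gtf=$1
  prefix=$2
  if [ -e "$prefix.combined.gtf" ]; then
    echo "$prefix.combined.gtf already exists, skipping"
    return 0
  fi
  ${DRY_RUN:+echo} cuffcompare -o "$prefix" "$gtf" "$gtf"
}

# has_tss_ids GTF: succeeds if at least one record carries a tss_id "..." attribute.
has_tss_ids() {
  grep -q 'tss_id "' "$1"
}
```

For example, `make_reference_gtf hg19_genes.gtf ./cuffcompare_out && has_tss_ids ./cuffcompare_out.combined.gtf` builds the reference and confirms the attribute is present before cuffdiff is run.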

==Use Cuffdiff to Identify Differentially Expressed Transcripts==
Run cuffdiff to identify differentially abundant transcripts.
<pre>

$ cuffdiff [-p #processors -o ./output_directory -L label1,label2,etc. -T (for time-series data) -N (upper-quartile normalization) --compatible-hits-norm (count only hits compatible with reference transcripts toward normalization) -b <reference genome FASTA> (fragment bias correction, e.g. -b /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19.fa) -u (improved multi-read weighting)] <combined .gtf produced by cuffcompare> <sample_A_accepted_hits1.bam,sample_A_accepted_hits2.bam,etc.> <sample_B_accepted_hits1.bam,sample_B_accepted_hits2.bam,etc.> (all .bam files are produced by tophat; separate replicates of one condition with commas and no spaces, and separate conditions with a space)

$ cuffdiff -o ./HG19/Cuffdiff_out_options_b_u_N_compatible/ -p 14 -L Control,PUF_kd --no-update-check -b /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19.fa -u -N --compatible-hits-norm /ccmb/CoreBA/BioinfCore/Projects/Goldstrohm_McEachin/HG19/cuffcompare_out.combined.gtf /ccmb/CoreBA/BioinfCore/Projects/Goldstrohm_McEachin/HG19/tophat_out_hg19_001_trimmed/accepted_hits.bam /ccmb/CoreBA/BioinfCore/Projects/Goldstrohm_McEachin/HG19/tophat_out_hg19_002_trimmed/accepted_hits.bam

</pre>
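One easy mistake on the cuffdiff command line is the grouping of the .bam files: replicates of one condition are joined by commas, conditions are separated by spaces, and there must be one -L label per condition, in the same order. This is a minimal sketch of that bookkeeping; the helpers join_commas and run_cuffdiff, the label-count check, and the DRY_RUN switch are hypothetical, and the optional flags from the commands above (-b, -u, -N, --compatible-hits-norm) would need to be added to the cuffdiff line.

```shell
# join_commas FILE...: join the replicate BAM files of one condition with commas.
# The body runs in a subshell, so changing IFS here does not leak out.
join_commas() ( IFS=,; echo "$*" )

# run_cuffdiff OUTDIR THREADS LABELS REF_GTF CONDITION...   (hypothetical helper)
# Each CONDITION is one comma-joined BAM list; LABELS must have one label per condition.
# Set DRY_RUN to any non-empty value to print the command instead of running it.
run_cuffdiff() {
  outdir=$1; threads=$2; labels=$3; ref_gtf=$4
  shift 4
  nlabels=$(( $(echo "$labels" | tr ',' '\n' | wc -l) ))
  if [ "$nlabels" -ne "$#" ]; then
    echo "error: $nlabels label(s) for $# condition(s)" >&2
    return 1
  fi
  ${DRY_RUN:+echo} cuffdiff -o "$outdir" -p "$threads" -L "$labels" "$ref_gtf" "$@"
}
```

For example, with hypothetical replicate directories: `run_cuffdiff ./Cuffdiff_out 14 Control,PUF_kd cuffcompare_out.combined.gtf "$(join_commas ctrl_1/accepted_hits.bam ctrl_2/accepted_hits.bam)" "$(join_commas kd_1/accepted_hits.bam kd_2/accepted_hits.bam)"`.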

Davebridges
