Using GSEA for Gene Set Analyses of Transcriptional Data

From Bridges Lab Protocols

Software

First, download the GSEA Desktop Application from http://software.broadinstitute.org/gsea/downloads.jsp (you may need to register first).

Input File Format

  • This protocol is for using a pre-ranked gene list
  • You should have a list of all analysed genes, sorted by gene expression change (log fold change or fold change), from the highest value to the lowest.
  • For mouse genes you have two options
    • Capitalize them to make them look like human gene symbols. This is mostly OK, but some genes will be missed (i.e. any mouse gene whose human ortholog has a different symbol, as would happen if RUNX1 and Runx1 were not the same gene in human and mouse respectively).
    • Use the conversion table maintained at the Jackson Laboratory: http://www.informatics.jax.org/downloads/reports/HOM_MouseHumanSequence.rpt
      This table can be loaded into R and lists both the mouse and human orthologues. Below is an R script that formats it into a table you can merge with your dataset to find the human ortholog based on HomoloGene:
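library(tidyr)
library(dplyr)
# MGI report listing mouse and human genes grouped by HomoloGene ID
human_mouse_table_file <- 'http://www.informatics.jax.org/downloads/reports/HOM_MouseHumanSequence.rpt'
# quote="" and comment.char="" stop read.table from misreading fields that contain quotes or '#'
# if select() fails below, check names(human_mouse_table) against the header of your download
human_mouse_table <- read.table(human_mouse_table_file, sep="\t", header=TRUE,
                                quote="", comment.char="", stringsAsFactors=FALSE)
human.to.mouse <-
  human_mouse_table %>%
  select(Common.Organism.Name, Symbol, HomoloGene.ID) %>%
  # keep one symbol per species within each HomoloGene group
  distinct(HomoloGene.ID, Common.Organism.Name, .keep_all = TRUE) %>%
  # one row per HomoloGene ID, with one column of symbols per species
  spread(Common.Organism.Name, Symbol) %>%
  rename("mouse"="mouse, laboratory")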
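
The merge step mentioned above could look something like the following minimal sketch. It uses small made-up tables so that it runs on its own: orthologs stands in for human.to.mouse, and de_results with its gene and logFC columns is a hypothetical results table, so adjust the names to match your own data. The HomoloGene IDs and the Gm12345 symbol are placeholders, not real values. Genes without a listed ortholog fall back to the capitalized mouse symbol, which combines the two options described above.

```r
# Toy stand-in for human.to.mouse (the real table comes from the MGI report)
orthologs <- data.frame(
  HomoloGene.ID = c(1, 2),            # placeholder IDs, not real HomoloGene IDs
  human = c("RUNX1", "CDKN1A"),
  mouse = c("Runx1", "Cdkn1a"),
  stringsAsFactors = FALSE
)

# Hypothetical differential expression results keyed by mouse symbol
de_results <- data.frame(
  gene = c("Runx1", "Cdkn1a", "Gm12345"),
  logFC = c(2.1, -1.3, 0.4),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps genes that have no ortholog in the table (human is NA)
de_human <- merge(de_results, orthologs,
                  by.x = "gene", by.y = "mouse", all.x = TRUE)

# Fall back to the capitalized mouse symbol when no ortholog is listed
de_human$human_symbol <- ifelse(is.na(de_human$human),
                                toupper(de_human$gene),
                                de_human$human)
```

Checking how many rows took the fallback (sum(is.na(de_human$human))) gives a rough idea of how many genes the conversion table missed.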


  • You need to save a table that meets the following requirements:
    • Must have two columns: one with the gene name and the other with the numerical value (fold change or log fold change).
    • Must be saved as a tab-delimited file. I generally use a .rnk suffix; this is the extension GSEA documents for ranked lists, so it is the safest choice.
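
As a rough sketch of how such a file might be written from R: the de_human table and its human_symbol and logFC columns below are hypothetical, and the output file name is arbitrary. Dropping repeated symbols by keeping the row with the highest value is just one possible choice and is an assumption here, not part of this protocol.

```r
# Hypothetical results table with human symbols and log fold changes
de_human <- data.frame(
  human_symbol = c("RUNX1", "CDKN1A", "GM12345", "RUNX1"),
  logFC = c(2.1, -1.3, 0.4, 1.8),
  stringsAsFactors = FALSE
)

# Sort from the highest value to the lowest
ranked <- de_human[order(de_human$logFC, decreasing = TRUE), ]

# Keep only the first (highest-valued) row for any symbol that appears twice
ranked <- ranked[!duplicated(ranked$human_symbol), ]

# Two columns, tab-delimited, no quotes, no row names, no header line
rnk_file <- file.path(tempdir(), "ranked_genes.rnk")
write.table(ranked[, c("human_symbol", "logFC")], rnk_file,
            sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Opening the file in a text editor before loading it into GSEA is a quick way to confirm that it contains exactly one gene and one number per line.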

Running GSEA

  • Load GSEA, and drag your file into the box under method 3
  • Click on Run GSEAPreranked on the left panel
  • Under Basic fields click Show, then give your analysis a name (usually based on the gene set database used)
  • Under Gene Sets Database click on the three dots, then select the gene set you want to analyse first. Some really useful sets to consider analysing:
    • KEGG, which is a pathway database covering metabolic and signalling pathways (c2.cp.kegg.v6.2.symbols.gmt)
    • Reactome, which is a different pathway database (c2.cp.reactome.v6.2.symbols.gmt)
    • GO which is all gene ontologies (c5.all.v6.2.symbols.gmt)
    • CGP which is chemical and genetic perturbations from other experiments (c2.cgp.v6.2.symbols.gmt)
    • TRANSFAC which is a dataset of transcription factor regulated genes (c3.tft.v6.2.symbols.gmt)
  • Click on Run at the bottom of the screen. The results will be saved in the folder noted under Basic fields.
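
Because genes whose symbols do not match the gene set database are missed, it can be worth checking the overlap before running. The sketch below assumes the usual .gmt layout (one gene set per line: set name, description, then the member genes, all tab-separated). It writes a tiny made-up .gmt file so that it runs on its own; the set names, the file name and the ranked_symbols vector are hypothetical. To check a real analysis, point gmt_file at a downloaded file such as the ones listed above.

```r
# Write a tiny made-up .gmt file so the sketch is self-contained
gmt_file <- file.path(tempdir(), "toy_sets.gmt")
writeLines(c(
  "TOY_SET_A\tmade-up description\tRUNX1\tCDKN1A\tTP53",
  "TOY_SET_B\tmade-up description\tMYC\tRUNX1"
), gmt_file)

# Split each line into fields: set name, description, then member genes
fields <- strsplit(readLines(gmt_file), "\t")
gene_sets <- lapply(fields, function(x) x[-(1:2)])
names(gene_sets) <- vapply(fields, function(x) x[1], character(1))

# Hypothetical symbols from the first column of the ranked list
ranked_symbols <- c("RUNX1", "CDKN1A", "GM12345")

# Which ranked genes appear in at least one gene set
in_any_set <- ranked_symbols %in% unlist(gene_sets)

# Number of ranked genes found in each gene set
overlap <- vapply(gene_sets,
                  function(s) sum(ranked_symbols %in% s),
                  numeric(1))
```

A low value of mean(in_any_set) would suggest that the symbols in the ranked list were not converted to the format the database expects.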