GEOracle case study: Identifying the conserved response to TGFB stimulation across human cell types.

In this case study we demonstrate a typical use case of GEOracle: integrating multiple microarray data sets to rapidly identify the conserved transcriptomic signature of a particular perturbation. We examine the conserved response to TGFB stimulation across 6 experiments in various human tissues and cell lines. Our input is a text file listing, one per line, the 6 GSE IDs (GSE14491, GSE16416, GSE17708, GSE23952, GSE28448 and GSE42373) that we identified as containing TGFB stimulation experiments in human cells. The file can be downloaded here: TGFB_Case_Study_GSE_IDs.txt
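If you prefer to create the input file yourself rather than download it, the following minimal Python sketch builds the one-ID-per-line text described above. The helper name `gse_file_text` and the simple `GSE` plus digits format check are illustrative assumptions, not part of GEOracle.

```python
import re

# The six GSE IDs used in this case study (taken from the text above).
GSE_IDS = ["GSE14491", "GSE16416", "GSE17708",
           "GSE23952", "GSE28448", "GSE42373"]


def gse_file_text(gse_ids):
    """Return the file content: one GSE ID per line (hypothetical helper)."""
    pattern = re.compile(r"GSE\d+")
    # Assumption: a valid ID is 'GSE' followed by digits only.
    bad = [g for g in gse_ids if not pattern.fullmatch(g)]
    if bad:
        raise ValueError(f"Not valid GSE IDs: {bad}")
    return "\n".join(gse_ids) + "\n"


text = gse_file_text(GSE_IDS)

if __name__ == "__main__":
    # Write the file that is uploaded in the Setup step.
    with open("TGFB_Case_Study_GSE_IDs.txt", "w") as handle:
        handle.write(text)
```

Running the script writes ‘TGFB_Case_Study_GSE_IDs.txt’ to the current directory, ready to upload in the Setup step.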




Setup (~1 min)


  1. Open GEOracle.
  2. Upload ‘TGFB_Case_Study_GSE_IDs.txt’.
  3. Leave ‘Strictness’ as ‘Default’. Click ‘COMPUTE’.


Verification (~5 min)


  1. We will now verify each GSE in the ‘Processed GSEs’ table.
  2. Click on the first row to reveal the verification panel.

GSE14491: GEOracle has detected two comparisons in this multifactor experimental design, but both examine the wrong factor, p53 knockdown. The comparison representing TGFB treatment is ‘treated’ vs ‘untreated’ in the ‘shGFP’ samples. We therefore remove both existing comparisons, then add a new comparison, selecting the appropriate samples and naming it ‘TGFB’. Then click ‘ADD THIS COMPARISON’ at the bottom of the page. Verification of this GSE is complete.

GSE16416: GEOracle has detected three comparisons in this multifactor experimental design. We only want the second comparison ‘treated with TGF’ vs ‘treated with medium’, so we remove the first and third comparisons. We rename the remaining comparison to ‘TGFB’.

GSE17708: This experiment contains 8 treatment time points, so GEOracle has detected 8 comparisons. We remove all of them except the final one, which represents the longest TGFB treatment (72 hours), and rename it to ‘TGFB’.

GSE23952: Due to the unusual placement of the replicate identifier, GEOracle has failed to correctly identify the treatment cluster. We remove all existing comparisons and add a comparison with the correct clustering, naming it ‘TGFB’, as shown below.

GSE28448: GEOracle has detected multiple comparisons in this multifactor experimental design, but unfortunately none of them is the one we want. We remove them all and add a new comparison with ‘ctrl+’ as treatment and ‘ctrl-’ as control samples, as shown below.

GSE42373: GEOracle has detected two comparisons corresponding to the two time points in this experiment. We keep only the second comparison, representing three days of TGFB treatment, and rename it to ‘TGFB’.




Calculating Differential Expression (~3 min)


  1. We have verified all 6 GSEs. Now click ‘NEXT STEP’.
  2. Enter a name for the output such as ‘TGFB_Case_Study’ and click ‘Calculate D.E.’.
  3. Once processing is complete, download the results by clicking ‘Download Results’ and unzip them to your computer.



Conserved response (~1 min)


We now run the R script ‘TGFB_Case_Study_Script.R’, with the working directory set to our unzipped output folder, to detect the conserved TGFB response across all 6 experiments. For each experiment, the script considers the top and bottom n differentially expressed genes by t-statistic (here n = 200) and selects those genes that show a conserved response in 3 or more experiments.
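To make the selection rule concrete, here is a minimal Python sketch of the idea. It is not the R script itself. It assumes that a "conserved response" means a gene falls in the top n (up-regulated) or in the bottom n (down-regulated) genes by t-statistic in at least 3 experiments. The function name `conserved_genes`, the input structure, and the toy gene names and t-statistics are all hypothetical.

```python
from collections import Counter


def conserved_genes(t_stats, n=200, min_experiments=3):
    """Return (up, down) gene lists conserved across experiments.

    t_stats maps experiment name -> {gene: t-statistic}. This layout is an
    assumption for illustration, not the format of GEOracle's output files.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    up_counts, down_counts = Counter(), Counter()
    for gene_t in t_stats.values():
        # Rank genes from the highest to the lowest t-statistic.
        ranked = sorted(gene_t, key=gene_t.get, reverse=True)
        up_counts.update(ranked[:n])     # top n: most up-regulated
        down_counts.update(ranked[-n:])  # bottom n: most down-regulated
    up = sorted(g for g, c in up_counts.items() if c >= min_experiments)
    down = sorted(g for g, c in down_counts.items() if c >= min_experiments)
    return up, down


# Toy data: four made-up experiments, six made-up genes, n = 2.
toy = {
    "exp1": {"geneA": 9.0, "geneB": 7.0, "geneC": 0.3,
             "geneD": -0.2, "geneE": -6.0, "geneF": -8.0},
    "exp2": {"geneA": 5.0, "geneC": 4.0, "geneB": 1.0,
             "geneD": 0.0, "geneF": -3.0, "geneE": -4.0},
    "exp3": {"geneB": 6.0, "geneA": 5.5, "geneD": 1.0,
             "geneC": -1.0, "geneE": -2.0, "geneF": -7.0},
    "exp4": {"geneC": 3.0, "geneD": 2.0, "geneA": 1.0,
             "geneB": 0.0, "geneF": -1.0, "geneE": -5.0},
}

up, down = conserved_genes(toy, n=2, min_experiments=3)
print("conserved up:", up)      # ['geneA']
print("conserved down:", down)  # ['geneE', 'geneF']
```

In the toy data, geneA is among the top 2 genes in three of the four experiments, while geneE and geneF are among the bottom 2 in all four, so only those three genes pass the threshold. For the case study itself the equivalent settings would be n = 200 and a minimum of 3 experiments.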