Mining the GEO database for perturbation data sets

- Launch GEOracle online -

Suitable for small, one-off analyses of fewer than 10 GSEs.

- Download GEOracle standalone -

Suitable for large or repeated analyses involving many GSEs.

- Visit GEOracle on GitHub -

See and contribute to the source code.

What is GEOracle?

GEOracle is an R Shiny app that greatly speeds up the identification and processing of large numbers of perturbation microarray gene expression data sets from GEO. It uses text mining of the GEO metadata, along with machine learning techniques, to automatically:

  1. Identify perturbation GSEs
  2. Cluster GSM samples
  3. Label clusters as control or perturbation
  4. Pair each perturbation cluster with its control
  5. Identify the molecule that is being perturbed and the direction of perturbation

It presents this information in an interactive interface that lets the user verify and change the details of the perturbation experiments. The workflow is described below.
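To make the kind of labelling in the list above more concrete, here is a toy sketch in base R that guesses whether a sample is a control or a perturbation, and in which direction, from keywords in its title. This is not GEOracle's classifier: the sample titles, keyword lists and function names are all hypothetical, and GEOracle itself relies on text mining combined with machine learning rather than a fixed keyword lookup.

```r
# Toy illustration only: GEOracle's real classifier is not reproduced here.
# Hypothetical sample titles, as they might appear in GSM metadata.
titles <- c("WT control rep1", "WT control rep2",
            "Gata4 knockdown rep1", "Gata4 knockdown rep2")

# Hypothetical keyword lists used to guess the role of each sample.
control_words <- c("control", "untreated", "vehicle", "wild type")
down_words <- c("knockdown", "knockout", "sirna", "shrna")
up_words <- c("overexpression", "overexpressing")

# TRUE when any keyword occurs in the lower-cased text.
has_any <- function(text, words) {
  grepl(paste(words, collapse = "|"), tolower(text))
}

# Perturbation keywords are checked first, then control keywords.
label_sample <- function(title) {
  if (has_any(title, down_words)) return("perturbation (decrease)")
  if (has_any(title, up_words)) return("perturbation (increase)")
  if (has_any(title, control_words)) return("control")
  "unknown"
}

labels <- vapply(titles, label_sample, character(1), USE.NAMES = FALSE)
print(data.frame(title = titles, label = labels))
```

A lookup this simple breaks on real metadata (for example, a "control siRNA" sample would be mislabelled), which is why the automatic labels still need to be verified in the interface.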

Installing GEOracle

  1. Make sure you have an up-to-date version of R (3.3 or later) and the Shiny package installed.
  2. Download GEOracle and unzip it somewhere permanent.
  3. Open server.R or ui.R in RStudio.
  4. Click 'Run App' in the top right corner of the screen.

GEOracle will then attempt to install all necessary packages and download the full database of metadata from GEO (~5 GB of disk space required). Once this is complete, GEOracle will open as a Shiny app in a web browser or in a new RStudio window. The user can then upload a list of GSE IDs and begin the analysis.
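The prerequisites from step 1 can also be checked from the R console, and the app can be started without the 'Run App' button. The sketch below assumes GEOracle was unzipped to `~/GEOracle`; that path is a placeholder, not a location the download prescribes.

```r
# Check the prerequisites from step 1 before launching.
r_ok <- getRversion() >= "3.3.0"
shiny_ok <- requireNamespace("shiny", quietly = TRUE)

if (!r_ok) message("R 3.3 or later is required; found ", getRversion())
if (!shiny_ok) message('Shiny is missing; install it with install.packages("shiny")')

# Hypothetical location of the unzipped GEOracle folder (the one that
# contains server.R and ui.R); replace it with your own path.
app_dir <- "~/GEOracle"

# Launch only when everything is in place.
if (r_ok && shiny_ok && dir.exists(path.expand(app_dir))) {
  shiny::runApp(app_dir)
}
```

`shiny::runApp()` takes the directory that holds `server.R` and `ui.R`, so this should be equivalent to opening either file in RStudio and clicking 'Run App'.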

GEOracle analysis steps

- See a detailed case study of how to use GEOracle here -

  1. The user uploads a list of GSE IDs (a simple .txt file with one GSE ID per line).
  2. The metadata for each GSE is then analysed, and clustering, labelling and matching are performed.
  3. The user decides whether to use the default filters for which GSEs to process or to set the desired filters manually, then clicks "COMPUTE".
  4. GEOracle will now try to detect the names of the perturbed molecules and direction of perturbation for each GSE that passes filtering.
  5. Once the GSE IDs appear in the "Processed GSEs" table, the user clicks them one by one to modify / verify / remove the comparisons.
  6. Once the user is happy with the details of every GSE, they click "NEXT STEP" and enter a desired output directory.
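The input file from step 1 can be prepared and checked in R before uploading. This is a minimal sketch that assumes GSE IDs have the usual form of "GSE" followed by digits; the two IDs and the file name are placeholders.

```r
# Example IDs are placeholders; use the GSEs you want to analyse.
gse_ids <- c("GSE12345", "GSE67890")

# Write the list in the format described in step 1: one GSE ID per line.
gse_file <- file.path(tempdir(), "gse_list.txt")
writeLines(gse_ids, gse_file)

# Read it back and flag lines that do not look like a GSE accession
# ("GSE" followed by digits), ignoring surrounding whitespace and blank lines.
lines <- trimws(readLines(gse_file))
lines <- lines[nzchar(lines)]
is_valid <- grepl("^GSE[0-9]+$", lines)

if (any(!is_valid)) {
  warning("Not valid GSE IDs: ", paste(lines[!is_valid], collapse = ", "))
}

# Keep each valid ID once.
valid_ids <- unique(lines[is_valid])
```

Checking the list up front avoids uploading a file with stray headers, duplicated IDs or GSM/GPL accessions mixed in.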

GEOracle will then automatically perform differential expression analysis using the limma method as implemented in NCBI's GEO2R. In addition to the differential expression results, GEOracle will output a list of edges that can be used to build a causal gene regulatory network from the set of input perturbation experiments. This edge list can be loaded directly into Cytoscape or R's igraph package.
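As a rough illustration of the igraph route, the sketch below builds a directed network from a small edge list. The column names (`source`, `target`, `effect`) and the rows are assumptions made for the example, not GEOracle's documented output format, so adjust them to match the file GEOracle actually writes.

```r
# Hypothetical edge list: the column names and rows are assumptions made
# for illustration, not GEOracle's documented output format.
edges <- data.frame(
  source = c("GeneA", "GeneA", "GeneB"),
  target = c("GeneB", "GeneC", "GeneC"),
  effect = c("activates", "represses", "activates"),
  stringsAsFactors = FALSE
)
# With a real output file, a call such as read.delim() on that file
# would replace the data.frame() above.

if (requireNamespace("igraph", quietly = TRUE)) {
  # The first two columns are taken as edge endpoints; any remaining
  # columns become edge attributes.
  g <- igraph::graph_from_data_frame(edges, directed = TRUE)
  # Number of outgoing edges (targets) per perturbed gene.
  print(igraph::degree(g, mode = "out"))
}
```

A directed graph is used because each edge runs from the perturbed molecule to a gene whose expression changed.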

If you are having trouble installing or running GEOracle, please follow the troubleshooting tips below.

This website is still under construction. Please contact with any queries