Here we describe some details about the implementation of GEOracle that may not be immediately obvious.
GEOracle lets you adjust the criteria that determine which GSE pass through for analysis and manual inspection. Each criterion is described below.
Species: If multiple species appear in your list of GSE IDs, you can toggle between them here. The species with the most GSE is selected automatically. Because GEOracle uses species-specific gene names, it can only process GSE from one species at a time.
Min. / Max. Cluster Size: You can set your desired number of samples in a cluster to control the statistical power of your downstream analysis. A higher minimum means you get more replicates and hence more statistical power.
All clusters correct size: By default, at least one cluster in the GSE must have a number of samples within the specified range. Selecting this option requires every cluster to be within the range. NOTE: This can exclude GSE where a typo in the metadata causes a single sample to form its own cluster of size 1.
Simple (2 clusters) only: Check this box to limit your analysis to 'simple' perturbation studies where there is only one comparison to be analysed. This can be useful for fully automated pipelines where you don't want to verify multifactorial experimental designs, for which the automatically generated comparisons often miss the one you want.
Max Comparisons: This removes GSE with more than the specified maximum number of comparisons. It is mainly a performance filter: loading tables with many comparisons (which can result from typos or other irregularities in the metadata) can slow down or even crash the software.
1 platform only: This removes GSE with samples from more than one platform type, as such samples cannot be compared with each other. Often there are additional GSE that represent the platform-specific subsets of samples, and these should be used instead.
1 channel array: GEOracle currently only handles single-channel arrays. Multi-channel array support is functionality we hope to add in the future.
Predicted perturbation: This restricts the analysis to GSE that GEOracle predicts to be perturbation experiments. Support for other experiment types is functionality we hope to add in the future.
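As a rough illustration of how the criteria above combine, here is a minimal Python sketch. It is not GEOracle's own code: the GseRecord structure, its field names, the function names and all default values are assumptions made for this example, and the GSE IDs are placeholders.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class GseRecord:
    """Hypothetical summary of one GSE (not GEOracle's internal structure)."""
    gse_id: str
    species: str
    cluster_sizes: list          # number of samples in each cluster
    n_comparisons: int
    n_platforms: int
    n_channels: int
    predicted_perturbation: bool


def default_species(records):
    """Return the species with the most GSE, mirroring the automatic selection."""
    return Counter(r.species for r in records).most_common(1)[0][0]


def passes_filters(r, species, min_size=2, max_size=10, all_clusters=False,
                   simple_only=False, max_comparisons=20,
                   one_platform=True, one_channel=True, perturbation_only=True):
    """Apply the criteria described above; every default here is an assumption."""
    in_range = [min_size <= s <= max_size for s in r.cluster_sizes]
    # Default: at least one cluster in range. Strict option: every cluster in range.
    size_ok = bool(in_range) and (all(in_range) if all_clusters else any(in_range))
    return (
        r.species == species
        and size_ok
        and (not simple_only or len(r.cluster_sizes) == 2)
        and r.n_comparisons <= max_comparisons
        and (not one_platform or r.n_platforms == 1)
        and (not one_channel or r.n_channels == 1)
        and (not perturbation_only or r.predicted_perturbation)
    )


# Placeholder records for illustration only.
records = [
    GseRecord("GSE_A", "Homo sapiens", [3, 3], 1, 1, 1, True),
    GseRecord("GSE_B", "Homo sapiens", [3, 1, 4], 3, 1, 1, True),
    GseRecord("GSE_C", "Mus musculus", [3, 3], 1, 1, 1, True),
    GseRecord("GSE_D", "Homo sapiens", [3, 3], 1, 2, 1, True),
]

species = default_species(records)
kept = [r.gse_id for r in records if passes_filters(r, species)]
strict = [r.gse_id for r in records
          if passes_filters(r, species, all_clusters=True)]
print(species, kept, strict)
```

In this sketch GSE_B survives the default setting because two of its clusters are within range, but it is dropped once every cluster must be the correct size, which is the situation the NOTE above warns about.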
Adjusting molecular context
GEOracle tries to determine which molecule has been perturbed, and the direction of the perturbation, from the free-text metadata. This is a hard problem, but the result is very useful for automatic gene regulatory network construction. GEOracle provides a convenient interface for quickly correcting these fields when the prediction is wrong.
Currently assigned values (p53 +): The leftmost labels display the molecule and direction currently assigned to this comparison.
Rename: The text box titled 'Rename' allows you to specify the molecule being perturbed. The name should match the official gene symbol format for that species to facilitate accurate automatic gene regulatory network construction (e.g. WNT5A vs. Wnt5a). Once you have entered the gene symbol, click the blue button with the pencil to apply the change.
+ or - : This dropdown menu allows you to select whether the molecule has been removed (-) or added (+) in the perturbation experiment. This is also used for automatic gene regulatory network construction.
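To show why the direction matters for network construction, the following Python sketch derives a signed edge from the perturbation direction and the sign of a gene's fold change. The sign rule, the function name, the edge labels and the significance cutoff are assumptions made for illustration; the source does not describe how GEOracle itself assigns edge signs.

```python
def signed_edges(molecule, direction, de_genes, max_adj_p=0.05):
    """Build (source, relation, target) edges for one comparison.

    direction: "+" if the molecule was added, "-" if it was removed.
    de_genes:  iterable of (gene_symbol, log2_fold_change, adjusted_p_value).
    Assumed rule: a gene that goes up when the molecule is added, or down
    when the molecule is removed, is treated as activated by the molecule;
    the opposite combinations are treated as repression.
    """
    if direction not in ("+", "-"):
        raise ValueError("direction must be '+' or '-'")
    perturb_sign = 1 if direction == "+" else -1

    edges = []
    for gene, log_fc, adj_p in de_genes:
        if adj_p > max_adj_p or log_fc == 0:
            continue  # not significant, or no change in expression
        effect = perturb_sign * (1 if log_fc > 0 else -1)
        relation = "activates" if effect > 0 else "represses"
        edges.append((molecule, relation, gene))
    return edges


# Made-up values for illustration only.
example = [("GENE_A", -2.1, 0.001), ("GENE_B", 1.4, 0.01), ("GENE_C", 0.1, 0.8)]
removed = signed_edges("p53", "-", example)
added = signed_edges("p53", "+", example)
print(removed)
print(added)
```

Under this rule, flipping the direction from - to + inverts every inferred relation, so a wrongly assigned direction would invert the corresponding edges in the resulting network.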
GEOracle produces several outputs that are downloaded in a zip file after the analysis is complete.
GSE folders: Within the output file you will find one folder for each GSE.
AllEdges.txt: This file contains all significant differentially expressed genes from all GSE in this analysis, in the form of an edge list that can be input directly into Cytoscape for gene regulatory network construction.
Differential expression results: Within each GSE folder you will find the differential expression results for each comparison from that GSE, under the name you assigned it. These are the files that begin with '1_', '2_', etc. They contain the informative outputs of the limma pipeline, including gene symbols, log2 fold change and BH adjusted p-value.
Comparison details: The file comparisons.txt records exactly which GSM samples were compared (perturbation vs. control) in each differential expression analysis. This provides a complete audit trail of your results.
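For readers unfamiliar with the BH adjusted p-value reported in the differential expression results, here is a minimal Python sketch of the standard Benjamini-Hochberg adjustment. It is a plain reimplementation for illustration, not code taken from GEOracle or limma, and the function name is our own.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by n / rank and
    # enforcing monotonicity with a running minimum (capped at 1).
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


adj = bh_adjust([0.01, 0.04, 0.03, 0.005])
print(adj)
```

Each raw p-value is multiplied by the number of tests divided by its rank, so genes are judged against a threshold that controls the expected proportion of false discoveries rather than the per-gene error rate.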