Tips on Parameter Choice
Node (Gene Set inclusion) Parameters
Node specific parameters filter the gene sets included in the enrichment map.
For a gene set to be included in the enrichment map it needs to pass both p-value and q-value thresholds.
P-value
All gene sets with a p-value at or below the specified threshold are included in the map.
FDR Q-value
All gene sets with a q-value at or below the specified threshold are included in the map.
The FDR Q-value that Enrichment Map uses to filter gene sets depends on the type of analysis:
For GSEA, the FDR Q-value is the 8th column in the gsea_results file and is called “FDR q-val”.
For Generic, the FDR Q-value is the 4th column in the generic results file.
For David, the FDR Q-value is the 12th column in the david results file and is called “Benjamini”.
For Bingo, the FDR Q-value is the 3rd column in the Bingo results file and is called “corr p-value”.
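The node filter described above can be sketched in a few lines of Python. This is only an illustration, not Enrichment Map's own code: the report excerpt and its values are invented, the function name passing_gene_sets is hypothetical, and the "NAME" and "NOM p-val" column headers are assumed from the usual GSEA report layout (the text above only names "FDR q-val").

```python
import csv
import io

# Hypothetical excerpt of a tab-separated GSEA results report.
# Only the columns needed here are shown; the values are made up.
GSEA_REPORT = (
    "NAME\tNOM p-val\tFDR q-val\n"
    "CELL_CYCLE\t0.000\t0.004\n"
    "DNA_REPAIR\t0.003\t0.080\n"
    "LIPID_TRANSPORT\t0.040\t0.300\n"
)


def passing_gene_sets(report_text, p_threshold, q_threshold):
    """Return the names of gene sets at or below BOTH thresholds."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    return [
        row["NAME"]
        for row in reader
        if float(row["NOM p-val"]) <= p_threshold
        and float(row["FDR q-val"]) <= q_threshold
    ]


print(passing_gene_sets(GSEA_REPORT, 0.005, 0.1))
# prints ['CELL_CYCLE', 'DNA_REPAIR']
```

For the other result formats, the same idea would apply with the q-value read from the column listed above for that format.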
Edge (Gene Set relationship) Parameters
An edge represents the degree of gene overlap that exists between two gene sets, A and B.
Edge specific parameters control the number of edges that are created in the enrichment map.
Only one coefficient type can be chosen to filter the edges.
Jaccard Coefficient
Jaccard Coefficient = [size of (A intersect B)] / [size of (A union B)]
Overlap Coefficient
Overlap Coefficient = [size of (A intersect B)] / [minimum(size of A, size of B)]
Combined Coefficient
The Combined Coefficient is a weighted blend of the Jaccard and Overlap coefficients.
The combined constant, k, shifts the weight between the two: raising k increases the weight of the Overlap Coefficient and lowers the weight of the Jaccard Coefficient by the same amount.
When k = 0.5, the Combined Coefficient is the average of the Jaccard and Overlap coefficients.
Combined Constant = k
Combined Coefficient = (k * Overlap) + ((1-k) * Jaccard)
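As a minimal sketch, the three coefficients defined above can be computed on Python sets as follows. The function names and the example gene sets are illustrative only, and returning 0.0 for empty sets is an assumption made here to avoid division by zero.

```python
def jaccard(a, b):
    """size of (A intersect B) / size of (A union B)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(a, b):
    """size of (A intersect B) / size of the smaller of A and B."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def combined(a, b, k=0.5):
    """(k * Overlap) + ((1 - k) * Jaccard), with k the combined constant."""
    return k * overlap(a, b) + (1 - k) * jaccard(a, b)


# Illustrative gene sets: B is fully contained in the larger set A.
A = {"TP53", "BRCA1", "ATM", "CHEK2"}
B = {"TP53", "BRCA1"}

print(jaccard(A, B))        # 0.5
print(overlap(A, B))        # 1.0
print(combined(A, B, 0.5))  # 0.75
```

Because B is a subset of A, the Overlap Coefficient reaches 1.0 while the Jaccard Coefficient is penalised by the size difference. With k = 0.5 the combined value sits halfway between the two.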
Tips on Parameter Choice
P-value and FDR Thresholds
GSEA can be used with two different significance estimation settings: gene-set permutation and phenotype permutation. Gene-set permutation was used for Enrichment Map application examples.
Gene-set Permutation
Here are different sets of thresholds you may consider for gene-set permutation:
- Very permissive:
p-value < 0.05
FDR < 0.25
- Moderately permissive:
p-value < 0.01
FDR < 0.1
- Moderately conservative:
p-value < 0.005
FDR < 0.075
- Conservative:
p-value < 0.001
FDR < 0.05
For high-quality, high-coverage transcriptomic data, the number of enriched terms at the conservative threshold is usually 100-250 when using gene-set permutation.
Phenotype Permutation
- Recommended:
p-value < 0.05
FDR < 0.25
In general, we recommend using permissive thresholds only if you are having a hard time finding any enriched terms.
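For scripted analyses, the suggested threshold pairs can be kept in a small lookup table. The sketch below copies the values from the lists above. The dictionary layout, key names and the thresholds helper are hypothetical conveniences, not part of Enrichment Map.

```python
# Suggested (p-value, FDR) threshold pairs, copied from the lists above.
# Key names are illustrative only.
PRESETS = {
    "gene_set": {
        "very_permissive": (0.05, 0.25),
        "moderately_permissive": (0.01, 0.1),
        "moderately_conservative": (0.005, 0.075),
        "conservative": (0.001, 0.05),
    },
    "phenotype": {
        "recommended": (0.05, 0.25),
    },
}


def thresholds(permutation, stringency):
    """Look up the (p-value, FDR) pair for a permutation type and stringency."""
    try:
        return PRESETS[permutation][stringency]
    except KeyError:
        raise ValueError(
            f"no preset for {permutation!r} / {stringency!r}"
        ) from None


p_value, fdr = thresholds("gene_set", "moderately_conservative")
print(p_value, fdr)  # 0.005 0.075
```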
Jaccard vs. Overlap Coefficient
The Overlap Coefficient is recommended when relations are expected to occur between large-size and small-size gene-sets, as in the case of the Gene Ontology.
The Jaccard Coefficient is recommended in the opposite case, when relations are expected between gene-sets of similar size.
When the gene-sets are about the same size, the Jaccard Coefficient is about half the Overlap Coefficient for gene-set pairs with a small intersection, whereas it is about the same as the Overlap Coefficient for pairs with a large intersection.
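The equal-size behaviour can be checked numerically. The sketch below assumes two gene-sets of exactly the same size n that share a given number of genes. The function name and the example sizes are chosen only for illustration.

```python
def coefficients_for_equal_sizes(n, shared):
    """Return (jaccard, overlap) for two gene-sets of size n sharing `shared` genes."""
    if not 0 < shared <= n:
        raise ValueError("shared must be between 1 and n")
    union_size = 2 * n - shared      # |A| + |B| - |A intersect B|
    jaccard = shared / union_size
    overlap = shared / n             # both sets have size n
    return jaccard, overlap


for shared in (10, 90):
    j, o = coefficients_for_equal_sizes(100, shared)
    print(f"shared={shared}: jaccard={j:.3f} overlap={o:.3f} ratio={j / o:.2f}")
# shared=10: jaccard=0.053 overlap=0.100 ratio=0.53
# shared=90: jaccard=0.818 overlap=0.900 ratio=0.91
```

With a small intersection the union is close to 2n, so Jaccard is roughly half of Overlap. As the intersection approaches n the union shrinks towards n and the two coefficients converge.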
When using the Overlap Coefficient and the generated map has several large gene-sets excessively connected to many other gene-sets, we recommend switching to the Jaccard Coefficient.
Overlap Thresholds
0.5 is moderately conservative and is recommended for most analyses.
0.3 is permissive, and might result in a messier map.
Jaccard Thresholds
0.5 is very conservative.
0.25 is moderately conservative.