# Tips on Parameter Choice¶

## Node (Gene Set inclusion) Parameters¶

- Node specific parameters filter the gene sets included in the enrichment map.
- For a gene set to be included in the enrichment map it needs to pass both p-value and q-value thresholds.

**P-value**

- All gene sets with a p-value with the specified threshold or below are included in the map.

**FDR Q-value**

- All gene sets with a q-value with the specified threshold or below are included in the map.
- Depending on the type of analysis the FDR Q-value used for filtering genesets by EM is different
- For GSEA the FDR Q-value used is 8th column in the gsea_results file and is called “FDR q-val”.
- For Generic the FDR Q-value used is 4th column in the generic results file.
- For David the FDR Q-value used is 12th column in the david results file and is called “Benjamini”.
- For Bingo the FDR Q-value used is 3rd column in the Bingo results file and is called “core p-value”

## Edge (Gene Set relationship) Parameters¶

- An edge represents the degree of gene overlap that exists between two gene sets, A and B.
- Edge specific parameters control the number of edges that are created in the enrichment map.
- Only one coefficient type can be chosen to filter the edges.

**Jaccard Coefficient**

```
Jaccard Coefficient = [size of (A intersect B)] / [size of (A union B)]
```

**Overlap Coefficient**

```
Overlap Coefficient = [size of (A intersect B)] / [size of (minimum( A , B))]
```

**Combined Coefficient**

- the combined coefficient is a merged version of the jacquard and overlap coefficients.
- the combined constant allows the user to modulate reciprocally the weights associated with the jacquard and overlap coefficients.
- When k = 0.5 the combined coefficient is the average between the jacquard and overlap.

```
Combined Constant = k
Combined Coefficient = (k * Overlap) + ((1-k) * Jaccard)
```

## Tips on Parameter Choice¶

**P-value and FDR Thresholds**

GSEA can be used with two different significance estimation settings: gene-set permutation and phenotype permutation. Gene-set permutation was used for Enrichment Map application examples.

*Gene-set Permutation*

Here are different sets of thresholds you may consider for gene-set permutation:

- Very permissive:

- p-value < 0.05
- FDR < 0.25
- Moderately permissive:

- p-value < 0.01
- FDR < 0.1
- Moderately conservative:

- p-value < 0.005
- FDR < 0.075
- Conservative:

- p-value < 0.001
- FDR < 0.05

For high quality, high coverage transcriptomic data, the number of enriched terms at the very conservative threshold is usually 100-250 when using gene-set permutation.

*Phenotype Permutation*

- Recommended:

- p-value < 0.05
- FDR < 0.25

In general, we recommend to use permissive thresholds only if your having a hard time finding any enriched terms.

**Jaccard vs. Overlap Coefficient**

- The Overlap Coefficient is recommended when relations are expected to occur between large-size and small-size gene-sets, as in the case of the Gene Ontology.
- The Jaccard Coefficient is recommended in the opposite case.
- When the gene-sets are about the same size, Jaccard is about the half of the Overlap Coefficient for gene-set pairs with a small intersection, whereas it is about the same as the Overlap Coefficient for gene-sets with large intersections.
- When using the Overlap Coefficient and the generated map has several large gene-sets excessively connected to many other gene-sets, we recommend switching to the Jaccard Coefficient.

**Overlap Thresholds**

- 0.5 is moderately conservative, and is recommended for most of the analyses.
- 0.3 is permissive, and might result in a messier map.

**Jaccard Thresholds**

- 0.5 is very conservative
- 0.25 is moderately conservative