Title: | Multiscale Clustering of Geometrical Network |
---|---|
Description: | Co-Expression Network Analysis by adopting network embedding technique. Song W.-M., Zhang B. (2015) Multiscale Embedded Gene Co-expression Network Analysis. PLoS Comput Biol 11(11): e1004574. <doi: 10.1371/journal.pcbi.1004574>. |
Authors: | Won-Min Song, Bin Zhang |
Maintainer: | Won-Min Song <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.6 |
Built: | 2025-03-04 04:49:22 UTC |
Source: | https://github.com/songw01/megena |
construction of gene-gene interaction network and dissection into multi-scale functional modules, and network key drivers.
Package: | MEGENA |
Type: | Package |
Version: | 1.3 |
Date: | 2015-12-18 |
License: | GPL (>= 2) |
Multiscale Embedded Gene Co-expression Network Analysis (MEGENA)
Won-Min Song
Maintainer: Won-Min Song <[email protected]>
Song W-M, Zhang B (2015) Multiscale Embedded Gene Co-expression Network Analysis. PLoS Comput Biol 11(11): e1004574. doi: 10.1371/journal.pcbi.1004574
main function to calculate PFN a ranked list of edge pairs
calculate.PFN(edgelist,max.skipEdges = NULL,maxENum = NULL,doPar = FALSE, num.cores = NULL, keep.track = TRUE, save.vertex.names=TRUE, vertex.names.file='vertex.names.RData') resume.calculate.PFN(edgelist, net, max.skipEdges = NULL, doPar = FALSE, num.cores = NULL, keep.track = TRUE, vertex.names.file='vertex.names.RData')
calculate.PFN(edgelist,max.skipEdges = NULL,maxENum = NULL,doPar = FALSE, num.cores = NULL, keep.track = TRUE, save.vertex.names=TRUE, vertex.names.file='vertex.names.RData') resume.calculate.PFN(edgelist, net, max.skipEdges = NULL, doPar = FALSE, num.cores = NULL, keep.track = TRUE, vertex.names.file='vertex.names.RData')
edgelist |
three column edgelist: first two columns are topological edges, and the third column is the weight. Must be a data.frame object. |
max.skipEdges |
Maximum number of edges to be searched by planarity test without any inclusion to PFN. If set NULL, it will be automatically set to number of cores x 1000. It acts as a threhold to quicken PFN construction termination during PCP. |
maxENum |
maximum number of edges to include in final PFN. Default value is NULL, which invokes maximal number of edges allowed in planar network. |
doPar |
TRUE/FALSE logical variable to choose parallelization. |
num.cores |
number of cores to use in parallelization. |
keep.track |
If TRUE, pfg_el.RData will be created in working folder. This file can be used later for restart in case PFN construction did not finish successfully. Default is TRUE. |
net |
a data.frame with intermediate network edges from a PFN job. The edges should have been converted to numeric index of vector |
save.vertex.names |
whether to save vertex names that matches the gene indices in intermediate output. |
vertex.names.file |
file to save vertex names if |
If doPar = TRUE
, then num.cores are registered for PCP.
output is three column edgelist data.frame, third column being the weight.
Won-Min Song
# test simplest case of planar network (a 3-clique). a <- c(1,1,2);b <- c(2,3,3);w <- runif(3,0,1); el <- cbind(a,b,w);el <- as.data.frame(el[order(el[,3],decreasing = TRUE),]) calculate.PFN(edgelist = el,max.skipEdges = Inf,doPar = FALSE,num.cores = NULL)
# test simplest case of planar network (a 3-clique). a <- c(1,1,2);b <- c(2,3,3);w <- runif(3,0,1); el <- cbind(a,b,w);el <- as.data.frame(el[order(el[,3],decreasing = TRUE),]) calculate.PFN(edgelist = el,max.skipEdges = Inf,doPar = FALSE,num.cores = NULL)
correlation analysis with FDR calculation
calculate.rho.signed(datExpr,n.perm = 10,FDR.cutoff = 0.05,estimator = "pearson", use.obs = "na.or.complete", direction = "absolute", rho.thresh = NULL,sort.el = TRUE)
calculate.rho.signed(datExpr,n.perm = 10,FDR.cutoff = 0.05,estimator = "pearson", use.obs = "na.or.complete", direction = "absolute", rho.thresh = NULL,sort.el = TRUE)
datExpr |
gene expression data matrix |
n.perm |
Number of permutations to perform. If |
FDR.cutoff |
FDR threshold to output final results of significant correlations. |
estimator |
Type of correlation coefficient to calculate, and gets passed to 'method' in 'cor()' function. Either "pearson" or "spearman". |
direction |
If direction = "absolute", absolute valued correlations are considered. If direction = "positive" or "negative", signed correlation is considered, and significance is calculated for positive/negative values. |
rho.thresh |
vector of correlation cutoff threshold to calculate significance. If NULL, threshold values will be generated by increment of 0.01 between -1 and 1 (if direction = "positive" or "negative"), or 0 and 1 (if direction = "absolute"). |
sort.el |
logical value to determine whether correlation list should be sorted |
use.obs |
an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs". |
direction = "absolute" can serve as two-tailed significance test, where as direction = "positive"/"negative" is one-tailed test.
output is a list. output$signif.ijw is three column edgelist data.frame, third column being the weight. output$FDR is the permutation FDR table across the threshold values in rho.thresh.
Won-Min Song
data(Sample_Expression) rho.out = calculate.rho.signed(datExpr,n.perm = 10,FDR.cutoff = 0.05,estimator = "pearson", use.obs = "na.or.complete", direction = "absolute", rho.thresh = NULL,sort.el = TRUE)
data(Sample_Expression) rho.out = calculate.rho.signed(datExpr,n.perm = 10,FDR.cutoff = 0.05,estimator = "pearson", use.obs = "na.or.complete", direction = "absolute", rho.thresh = NULL,sort.el = TRUE)
PFN construction by parallelized edge screening.
compute.PFN.par(sortedEdge,Ng,maxENum,Njob,Ncore,max.skipEdges = NULL, keep.track = TRUE,initial.links = NULL)
compute.PFN.par(sortedEdge,Ng,maxENum,Njob,Ncore,max.skipEdges = NULL, keep.track = TRUE,initial.links = NULL)
sortedEdge |
3-column matrix for the input edgelist (e.g. - correlation pair list). Must be sorted by third column, which is usually weight vector. |
Ng |
integer. number of genes included in sortedEdge. |
maxENum |
Maximum number of edges to include in final PFN. The theoretical maximal number enforced by Euler's formula is 3(Ng-2). |
max.skipEdges |
Maximum number of edges to be counted before any valid edge to be included in PFN. This works as a termination condition to avoid exhaustive planarity testing over all edges provided in sortedEdge. |
Njob |
Number of edges to be passed to each core for parallelized edge screening. |
Ncore |
Number of cores to utilize. |
keep.track |
TRUE/FALSE logical. Indicate if the record of PFN construction is saved in temporary file "pfg_el.RData". Default is TRUE. |
initial.links |
If provided, PFN construction will restart by regarding these initial.links as already-built PFN. |
This is parallelized implementation of PFN construction, where it is possible to re-capture PFN construction by providing already computed edgelist into initial.links. Although provivded, this function itself may require careful caution and users are encouraged to use more user-friendly "calculate.PFN()" instead.
A 3-column matrices, where first two columns are integer indices for vertices, and third is the weight vector.
Won-Min Song
A portion of TCGA breast cancer data to test run MEGENA.
Sample_Expression.RData
Sample_Expression.RData
Contains a matrix object, "datExpr". Use data(Sample_Expression) to load.
a gene expression matrix.
1. Song, W.M. and B. Zhang, Multiscale Embedded Gene Co-expression Network Analysis. PLoS Comput Biol, 2015. 11(11): p. e1004574.
multiscale clustering analysis (MCA) and multiscale hub analysis (MHA) pipeline
do.MEGENA(g, do.hubAnalysis = TRUE, mod.pval = 0.05,hub.pval = 0.05,remove.unsig = TRUE, min.size = 10,max.size = 2500, doPar = FALSE,num.cores = 4,n.perm = 100,singleton.size = 3, save.output = FALSE)
do.MEGENA(g, do.hubAnalysis = TRUE, mod.pval = 0.05,hub.pval = 0.05,remove.unsig = TRUE, min.size = 10,max.size = 2500, doPar = FALSE,num.cores = 4,n.perm = 100,singleton.size = 3, save.output = FALSE)
g |
igraph object of PFN. |
do.hubAnalysis |
TRUE/FALSE indicating to perform multiscale hub analysis (MHA) in downstream. Default is TRUE. |
mod.pval |
cluster significance p-value threshold w.r.t random planar networks |
hub.pval |
hub significance p-value threshold w.r.t random planar networks |
remove.unsig |
TRUE/FALSE indicating to remove insignificant clusters in MHA. |
min.size |
minimum cluster size |
max.size |
maximum cluster size |
doPar |
TRUE/FALSE indicating parallelization usage |
num.cores |
number of cores to use in parallelization. |
n.perm |
number of permutations to calculate hub significance p-values/cluster significance p-values. |
singleton.size |
Minimum module size to regard as non-singleton module. Default is 3. |
save.output |
TRUE/FALSE to save outputs from each step of analysis |
Performs MCA and MHA by taking PFN as input. Returns a list object containing clustering outputs, hub analysis outputs, and node summary table.
A series of output files are written in wkdir. Major outputs are,
module.output |
outputs from MCA |
hub.output |
outputs from MHA |
node.summary |
node table summarizing clustering results. |
Won-Min Song
## Not run: rm(list = ls()) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) ## End(Not run)
## Not run: rm(list = ls()) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) ## End(Not run)
Sunburst plot and colored heatmaps
draw_sunburst_wt_fill(module.df, parent.col = "module.parent",id.col = "id", min.angle = 5, feat.col, fill.type = "continuous",log.transform = TRUE, fill.scale = NULL, border.col = "black", border.width = 0.25, theme.adjust = NULL )
draw_sunburst_wt_fill(module.df, parent.col = "module.parent",id.col = "id", min.angle = 5, feat.col, fill.type = "continuous",log.transform = TRUE, fill.scale = NULL, border.col = "black", border.width = 0.25, theme.adjust = NULL )
module.df |
A data.frame table summarizing module information. Must contain module parent and child relation for hierarchy visualization. |
parent.col |
Character object, name for the parent module column in module.df. |
id.col |
Character object for the module id column in module df. |
min.angle |
Minimum angle that rectangles in the sunburst are labeled with respective module id. |
feat.col |
Chracter object, for the feature column in module.df to color the heatmaps. |
fill.type |
continuous/discrete, is the variable numeric (continuous) or factor (discrete)? |
log.transform |
TRUE/FALSE. do log10 transform for p-values? |
fill.scale |
A ggplot object to specify heatmap coloring scheme. Permissible functions are: scale_fill_gradient,scale_fill_gradient2,scale_fill_gradientn,scale_fill_manual. |
border.col |
sunburst border color. Default is black. |
border.width |
sunburst border width. Default is 0.25. |
theme.adjust |
A ggplot object to specify theme for plotting. |
makes use of ggraph scheme to manipulate and draw sunburst plot in ggplot2 framework. fill.scale and theme.adjust provide flexibility to designate heatmap coloring schemes and figure aesthetics.
ggplot object for the figure
Won-Min Song
## Not run: rm(list = ls()) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) output.summary <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 5000, annot.table = NULL,id.col = NULL,symbol.col = NULL, output.sig = TRUE) # no coloring sbobj = draw_sunburst_wt_fill(module.df = output.summary$module.table, feat.col = NULL,id.col = "module.id",parent.col = "module.parent") sbobj # get some coloring (with log transform option) mdf= output.summary$module.table mdf$heat.pvalue = runif(nrow(mdf),0,0.1) sbobj = draw_sunburst_wt_fill(module.df = mdf,feat.col = "heat.pvalue",log.transform = TRUE, fill.type = "continuous", fill.scale = scale_fill_gradient2(low = "white",mid = "white",high = "red", midpoint = -log10(0.05),na.value = "white"), id.col = "module.id",parent.col = "module.parent") sbobj # get discrete coloring done mdf$category = factor(sample(x = c("A","B"),size = nrow(mdf),replace = TRUE)) sbobj = draw_sunburst_wt_fill(module.df = mdf,feat.col = "category", fill.type = "discrete", fill.scale = scale_fill_manual(values = c("A" = "red","B" = "blue")), id.col = "module.id",parent.col = "module.parent") sbobj ## End(Not run)
## Not run: rm(list = ls()) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) output.summary <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 5000, annot.table = NULL,id.col = NULL,symbol.col = NULL, output.sig = TRUE) # no coloring sbobj = draw_sunburst_wt_fill(module.df = output.summary$module.table, feat.col = NULL,id.col = "module.id",parent.col = "module.parent") sbobj # get some coloring (with log transform option) mdf= output.summary$module.table mdf$heat.pvalue = runif(nrow(mdf),0,0.1) sbobj = draw_sunburst_wt_fill(module.df = mdf,feat.col = "heat.pvalue",log.transform = TRUE, fill.type = "continuous", fill.scale = scale_fill_gradient2(low = "white",mid = "white",high = "red", midpoint = -log10(0.05),na.value = "white"), id.col = "module.id",parent.col = "module.parent") sbobj # get discrete coloring done mdf$category = factor(sample(x = c("A","B"),size = nrow(mdf),replace = TRUE)) sbobj = draw_sunburst_wt_fill(module.df = mdf,feat.col = "category", fill.type = "discrete", fill.scale = scale_fill_manual(values = c("A" = "red","B" = "blue")), id.col = "module.id",parent.col = "module.parent") sbobj ## End(Not run)
MSigDB v6.2 sets (BIOCARTA, GO, KEGG, REACTOME, HALLMARK and oncogenic signatures)
data(FrequenctSets.v6.2)
data(FrequenctSets.v6.2)
Contains a list object, "msigdb.sets". Use data(Sample_Expression) to load.
a list of curated gene signatures from MSigDB
Utilizes various layout algorithms from ggraph to visualize module hierarchy with desired root.
get_module_hierarchy_graph(htbl,max.depth = 5,anchor.mid = NULL,h.scale = NULL,is.circular = TRUE,layout = "dendrogram")
get_module_hierarchy_graph(htbl,max.depth = 5,anchor.mid = NULL,h.scale = NULL,is.circular = TRUE,layout = "dendrogram")
htbl |
Two column data.frame table capturing module hierarchy. First column is the parent module, and second column is an immediate child module. |
max.depth |
Maximum number of hierarchy depth to propagate. Default value 5 is often sufficient to capture the entire MEGENA hierarchy. |
anchor.mid |
Default is NULL. If specified, it is taken as the root module, and its module hierarchy is subsetted. |
h.scale |
When layout = "dendrogram" is used, it scales the heights of the dendrogram branching points to make the plot less busy. |
is.circular |
Default is TRUE. If TRUE, circular dendrogram is drawn. |
layout |
Options are "dendrogram" and "treemap". See ?ggraph for layout option descriptions. |
add_names |
If TRUE, it adds modules names at respective branching points. |
Returns a list containing the results.
A list object with the components:
pobj |
ggplot object for the hierarchy graph. |
graph.obj |
igraph object for the hierarchy graph. |
anchor.mid |
The root module id. |
Won-Min Song
## Not run: rm(list = ls()) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) output.summary <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 5000, annot.table = NULL,id.col = NULL,symbol.col = NULL, output.sig = TRUE) tbl = output.summary$module.table htbl = tbl[,c("module.parent","module.id")] hplot = get_module_hierarchy_graph(htbl,max.depth = 5,anchor.mid = NULL,h.scale = NULL,is.circular = FALSE,layout = "dendrogram",add_names = TRUE) print(hplot$pobj) ## End(Not run)
## Not run: rm(list = ls()) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) output.summary <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 5000, annot.table = NULL,id.col = NULL,symbol.col = NULL, output.sig = TRUE) tbl = output.summary$module.table htbl = tbl[,c("module.parent","module.id")] hplot = get_module_hierarchy_graph(htbl,max.depth = 5,anchor.mid = NULL,h.scale = NULL,is.circular = FALSE,layout = "dendrogram",add_names = TRUE) print(hplot$pobj) ## End(Not run)
calculation of module p-values.
get.DegreeHubStatistic(subnetwork,n.perm = 100,doPar = FALSE,n.core = 4)
get.DegreeHubStatistic(subnetwork,n.perm = 100,doPar = FALSE,n.core = 4)
subnetwork |
a planar network as an igraph object. |
n.perm |
number of random networks generated, constraint with number of links and nodes same to "subnetwork". |
doPar |
TRUE/FALSE to parallelize. |
n.core |
number of cores/threads to use. |
Hub significance calculation functionality. Make sure that, if doPar = TRUE, register cores using registerDoParallel() from doParallel package.
a data.frame table showing node-wise statistics.
Won-Min Song
## Not run: rm(list = ls()) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) out <- get.DegreeHubStatistic(subnetwork = g,n.perm = 100,doPar = FALSE,n.core = 4) ## End(Not run)
## Not run: rm(list = ls()) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) out <- get.DegreeHubStatistic(subnetwork = g,n.perm = 100,doPar = FALSE,n.core = 4) ## End(Not run)
hubs in different scales are summarized.
get.hub.summary(MEGENA.output)
get.hub.summary(MEGENA.output)
MEGENA.output |
A list object. The output from "do.MEGENA()". |
returns a data.frame object
A data.frame object with columns:
node |
hub gene node names |
S1 , ...
|
binary vector indicating hubs in each scale |
frequency |
number of scales that respective gene emerges as hub. |
scale.summary |
list of scales that respective gene as hub. |
Won-Min Song
obtain a discrete, disjoint clustering results from multiscale MEGENA modules for a given alpha value.
get.union.cut(module.output,alpha.cut,output.plot = T, plotfname = "validModules_alpha",module.pval = 0.05,remove.unsig = T)
get.union.cut(module.output,alpha.cut,output.plot = T, plotfname = "validModules_alpha",module.pval = 0.05,remove.unsig = T)
module.output |
A direct output from "do.MEGENA". (i.e. MEGENA.output$module.output). |
alpha.cut |
Resolution parameter cut-off (i.e. alpha) value. alpha.cut = 1 corresponds to classical definition of "small-world" compactness. |
output.plot |
TRUE/FALSE to indicate outputting a .png file showing hierarchical structure with final outputted modules highlighted in red. |
plotfname |
.png file outputname. |
module.pval |
module significance p-value. |
remove.unsig |
TRUE/FALSE indicating to remove insignificant clusters. |
Returns a list object where each entry is a module.
Won-Min Song
## Not run: rm(list = ls()) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) get.union.cut(module.output = MEGENA.output$module.output,alpha.cut = 1, output.plot = FALSE,plotfname = NULL,module.pval = 0.05,remove.unsig = TRUE) ## End(Not run)
## Not run: rm(list = ls()) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) get.union.cut(module.output = MEGENA.output$module.output,alpha.cut = 1, output.plot = FALSE,plotfname = NULL,module.pval = 0.05,remove.unsig = TRUE) ## End(Not run)
Summarizes modules into a table.
MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 2500, annot.table = NULL,symbol.col = NULL,id.col = NULL, output.sig = TRUE)
MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 2500, annot.table = NULL,symbol.col = NULL,id.col = NULL, output.sig = TRUE)
MEGENA.output |
A list object. The output from "do.MEGENA()". |
mod.pvalue |
module compactness significance p-value, to identify modules with significant compactness. |
hub.pvalue |
node degree significance p-value to identify nodes with significantly high degree. |
min.size |
minimum module size allowed to finalize in the summary output. |
max.size |
maximum module size allowed to finalize in the summary output. |
annot.table |
Default value is NULL, indicating no mapping is provided between node names to gene symbols. If provided, the mapping between node names (id.col) and gene symbol (symbol.col) are used. |
id.col |
column index of annot.table for node names. |
symbol.col |
column index of annot.table for gene symbols. |
output.sig |
Default value is TRUE, indicating significant modules are outputted. |
output$module.table contains many important information including module hierarchy, as indicated by
A list object with the components:
modules |
Final set of modules obtained upon apply mod.pvalue for significance, min.size and max.size for module size thresholding. |
mapped.modules |
gene symbol mapped modules when "annot.table" is provided. |
module.table |
data.frame object for module summary table. Columns include: id, module.size, module.parent, module.hub, module.scale and module.pvalue. |
Won-Min Song
## Not run: rm(list = ls()) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) output.summary <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 5000, annot.table = NULL,id.col = NULL,symbol.col = NULL, output.sig = TRUE) ## End(Not run)
## Not run: rm(list = ls()) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) output.summary <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 5000, annot.table = NULL,id.col = NULL,symbol.col = NULL, output.sig = TRUE) ## End(Not run)
Summarizes module hub/hierarchy/membership into a data.frame table format.
module_convert_to_table(MEGENA.output,mod.pval = 0.05, hub.pval = 0.05,min.size = 10,max.size)
module_convert_to_table(MEGENA.output,mod.pval = 0.05, hub.pval = 0.05,min.size = 10,max.size)
MEGENA.output |
A list object. The output from "do.MEGENA()". |
mod.pval |
module compactness significance p-value, to identify modules with significant compactness. |
hub.pval |
node degree significance p-value to identify nodes with significantly high degree. |
min.size |
minimum module size allowed to finalize in the summary output. |
max.size |
maximum module size allowed to finalize in the summary output. |
the resulting data.frame contains the following essential columns: id, module.parent and module. If the co-expression network bears significant hubs, it will additionally have node.degree (connectivity), node.strength (sum of edge weights) and is.hub column to supplement hub information.
A data.frame with the columns:
id |
gene name |
module.parent |
parent module id |
module |
module name. |
Won-Min Song
## Not run: rm(list = ls()) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) output.summary <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 5000, annot.table = NULL,id.col = NULL,symbol.col = NULL, output.sig = TRUE) module.df = module_convert_to_table(MEGENA.output,mod.pval = 0.05, hub.pval = 0.05,min.size = 10,max.size) head(module.df) ## End(Not run)
## Not run: rm(list = ls()) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) output.summary <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 5000, annot.table = NULL,id.col = NULL,symbol.col = NULL, output.sig = TRUE) module.df = module_convert_to_table(MEGENA.output,mod.pval = 0.05, hub.pval = 0.05,min.size = 10,max.size) head(module.df) ## End(Not run)
An interface function to output .gmt format gene signature file.
output.geneSet.file(geneSet,outputfname)
output.geneSet.file(geneSet,outputfname)
geneSet |
a list object |
outputfname |
output file name |
Outputs each signature into a single line of lists in outputfname
.
Won-Min Song
FET for all pairs.
perform.AllPairs.FET(geneSets1,geneSets2,background,adjust.FET.pvalue = T)
perform.AllPairs.FET(geneSets1,geneSets2,background,adjust.FET.pvalue = T)
geneSets1 |
a list object containing gene signatures as character vector in each list entry. |
geneSets2 |
a list object containing gene signatures as character vector in each list entry. |
background |
a character vector containing the background gene set. |
adjust.FET.pvalue |
If set TRUE, bonferroni correction is performed and output in corrected.FET.pvalue column. |
Returns a data.frame entailing all comparisons.
In comparison to perform.ijPairs.FET, this function is designed to perform FET on ALL comparisons.
Won-Min Song
## Not run: rm(list = ls()) data(Sample_Expression) data(FrequentSets.v6.2) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) output.summary <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 5000, annot.table = NULL,id.col = NULL,symbol.col = NULL, output.sig = TRUE) FET.results = perform.AllPairs.FET(geneSets1 = output.summary$modules,geneSets2 = msigdb.sets[[2]],background = V(g)$name) FET.results = FET.results[order(FET.results$FET_pvalue),] head(FET.results) ## End(Not run)
## Not run: rm(list = ls()) data(Sample_Expression) data(FrequentSets.v6.2) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) output.summary <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 5000, annot.table = NULL,id.col = NULL,symbol.col = NULL, output.sig = TRUE) FET.results = perform.AllPairs.FET(geneSets1 = output.summary$modules,geneSets2 = msigdb.sets[[2]],background = V(g)$name) FET.results = FET.results[order(FET.results$FET_pvalue),] head(FET.results) ## End(Not run)
FET with given pairs
perform.ijPairs.FET(geneSets1,geneSets2,ij,background,adjust.FET.pvalue = T)
perform.ijPairs.FET(geneSets1,geneSets2,ij,background,adjust.FET.pvalue = T)
geneSets1 |
a list object containing gene signatures as character vector in each list entry. |
geneSets2 |
a list object containing gene signatures as character vector in each list entry. |
ij |
two column integer matrix. Each row contains a pair of indices to compare between geneSet1 and geneSet2. For instance, ij[1,] = c(1,2), then FET is performed between geneSets1[[1]] and geneSets2[[2]]. |
background |
a character vector containing the background gene set. |
adjust.FET.pvalue |
If set TRUE, bonferroni correction is performed and output in corrected.FET.pvalue column. |
Returns a data.frame entailing all comparisons.
In comparison to perform.AllPairs.FET, this function is designed to perform FET on a specified set of comparisons to save time.
Won-Min Song
wrapper function of _MEGENA_planaritytest. imports from Boost graph library, and test planarity of a network
planaritytest(N, rows, cols)
planaritytest(N, rows, cols)
N |
must be an integer. number of nodes in the network. |
rows |
first column of edgelist. a vector of integers. |
cols |
second column of edgelist. a vector of integers. |
cbind(rows,cols) is equivalent to the two column edge list of the network. We assume that the network is undirected.
TRUE/FALSE is returned to indicate planarity. (TRUE -> network is planar).
Won-Min Song
# test simplest case of planar network (a 3-clique). planaritytest(as.integer(3),c(1,1,2),c(2,3,3))
# test simplest case of planar network (a 3-clique). planaritytest(as.integer(3),c(1,1,2),c(2,3,3))
Extract subnetworks for modules and plot.
plot_module(output.summary, PFN, subset.module = NULL, node.rename=NULL, col.names=NULL, gene.set = NULL,color.code = "logFC",show.legend = TRUE, label.hubs.only = TRUE,hubLabel.col = "red",hubLabel.sizeProp = 0.5,show.topn.hubs = 10, node.sizeProp = 13,label.sizeProp = 13,label.scaleFactor = 10,label.alpha = 0.5, layout = "kamada.kawai",output.plot = TRUE,out.dir = "modulePlot")
plot_module(output.summary, PFN, subset.module = NULL, node.rename=NULL, col.names=NULL, gene.set = NULL,color.code = "logFC",show.legend = TRUE, label.hubs.only = TRUE,hubLabel.col = "red",hubLabel.sizeProp = 0.5,show.topn.hubs = 10, node.sizeProp = 13,label.sizeProp = 13,label.scaleFactor = 10,label.alpha = 0.5, layout = "kamada.kawai",output.plot = TRUE,out.dir = "modulePlot")
output.summary |
output from summary function, "MEGENA.ModuleSummary". |
PFN |
igraph object retaining PFN topology. |
subset.module |
A character vector for list of module names to plot. Default = NULL plots all modules in output.summary. |
node.rename |
NULL or a data.frame with the first column being the node ids in the network and the second column being node names for display purpose, eg, this could be used to map ensembl gene ids to gene symbols. |
col.names |
NULL or a character vector for list of colors to be used for coloring children modules. |
gene.set |
A list object containing signatures for customized coloring of nodes in resulting network plot. |
color.code |
A character vector with matched length to "gene.set", to specify colors for each signature. |
label.hubs.only |
TRUE/FALSE to show labels for significant hub genes only, or all genes. Defauly is TRUE. |
hubLabel.col |
Label color for hubs. Default is "red" |
show.legend |
TRUE/FALSE for showing node legend on the bottom of the figure. |
hubLabel.sizeProp |
A multiplicative factor to adjust hub label sizes with respect to node size values. Default is 0.5 |
show.topn.hubs |
Maximal number of hubs to label on module subnetwork. Default is 10. |
node.sizeProp |
A multiplicative factor to adjust node sizes with respect to 90th percentile degree node size. Default is 13 |
label.sizeProp |
A multiplicative factor to adjust node label sizes with respect to 90th percentile degree node size. Default is 13 |
label.scaleFactor |
Overall scale factor to control the final size of node labels appearing in figure. Default is 10. |
label.alpha |
Transparency value ranging from 0 (transparent) to 1 (solid). Default is 0.5. |
layout |
Network layout algorithm to apply. Options are: "kamada.kawai", "fruchterman.reingold". |
output.plot |
logical value. output.plot = TRUE generates figure files under folder, "modulePlot". |
out.dir |
if output.plot = TRUE, then out.dir is created and resulting figures are exported to .png files to the folder. |
Subnetwork plot functionality with application of "ggrepel" package for node labeling. The most effective way to control overall node label size is through label.scaleFactor.
A list object holding ggplot objects for plotted modules.
Won-Min Song
## Not run: rm(list = ls()) library(MEGENA) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) output.summary <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 5000, annot.table = NULL,id.col = NULL,symbol.col = NULL, output.sig = TRUE) pnet.obj <- plot_module(output = output.summary,PFN = g,subset.module = "comp1_2", layout = "kamada.kawai",label.hubs.only = FALSE, gene.set = list("hub.set" = c("CD3E","CD2")),color.code = c("red"), output.plot = FALSE,out.dir = "modulePlot",col.names = c("grey","grey","grey"), hubLabel.col = "black",hubLabel.sizeProp = 1,show.topn.hubs = Inf,show.legend = TRUE) pnet.obj ## End(Not run)
## Not run: rm(list = ls()) library(MEGENA) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) output.summary <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 5000, annot.table = NULL,id.col = NULL,symbol.col = NULL, output.sig = TRUE) pnet.obj <- plot_module(output = output.summary,PFN = g,subset.module = "comp1_2", layout = "kamada.kawai",label.hubs.only = FALSE, gene.set = list("hub.set" = c("CD3E","CD2")),color.code = c("red"), output.plot = FALSE,out.dir = "modulePlot",col.names = c("grey","grey","grey"), hubLabel.col = "black",hubLabel.sizeProp = 1,show.topn.hubs = Inf,show.legend = TRUE) pnet.obj ## End(Not run)
visualized module hierarchical structure.
plot_module_hierarchy(module.table,plot.coord = NULL, edge.color = "grey",node.color = "black",node.label.color = "black", label.scaleFactor = 0.5,node.scaleFactor = 0.2,arrow.size = 0.015, data.col = NULL,low.color = "blue",mid.color = "white", high.color = "red",mid.value = 0.05)
plot_module_hierarchy(module.table,plot.coord = NULL, edge.color = "grey",node.color = "black",node.label.color = "black", label.scaleFactor = 0.5,node.scaleFactor = 0.2,arrow.size = 0.015, data.col = NULL,low.color = "blue",mid.color = "white", high.color = "red",mid.value = 0.05)
module.table |
output from MEGENA.ModuleSummary. Specifically $module.table component of the output. |
plot.coord |
Two column coordinate matrix. rownames must be labelled according to module.table$id. |
edge.color |
Edge color to be shown. |
node.color |
If data.col = NULL, node.color is used to color nodes in figure. |
node.label.color |
Node label color. |
label.scaleFactor |
scale number to adjust node label sizes. |
node.scaleFactor |
scale number to adjust node sizes. |
arrow.size |
scale number to arrow size. |
data.col |
A character to specify data vector to color nodes in module.table. |
low.color |
If data.col != NULL, color to be used in lower value spectrum. |
mid.color |
If data.col != NULL, color to be used in middle value spectrum. |
high.color |
If data.col != NULL, color to be used in high value spectrum. |
mid.value |
If data.col != NULL, value to define middle value spectrum. |
Module hierarchy plotting functionality using ggplot2.
A list containing output$hierarchy.obj = ggplot2 object, output$node.data = node attributes, output$edge.data = edge attributes.
Won-Min Song
## Not run: rm(list = ls()) data(Sample_Expression) ijw <- calculate.correlation(datExpr,doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) output.summary <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 5000, annot.table = NULL,id.col = NULL,symbol.col = NULL, output.sig = TRUE) module.table = output.summary$module.table colnames(module.table)[1] <- "id" output.obj <- plot_module_hierarchy(module.table = module.table, label.scaleFactor = 0.15,arrow.size = 0.005,node.label.color = "blue") print(output.obj[[1]]) ## End(Not run)
## Not run: rm(list = ls()) data(Sample_Expression) ijw <- calculate.correlation(datExpr,doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) output.summary <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 5000, annot.table = NULL,id.col = NULL,symbol.col = NULL, output.sig = TRUE) module.table = output.summary$module.table colnames(module.table)[1] <- "id" output.obj <- plot_module_hierarchy(module.table = module.table, label.scaleFactor = 0.15,arrow.size = 0.005,node.label.color = "blue") print(output.obj[[1]]) ## End(Not run)
A modification of plot_module() function for more general subnetwork plotting purpose.
plot_subgraph(module,hub = NULL,PFN,node.default.color = "black", gene.set = NULL,color.code = "grey",show.legend = TRUE, label.hubs.only = TRUE,hubLabel.col = "red",hubLabel.sizeProp = 0.5,show.topn.hubs = 10, node.sizeProp = 13,label.sizeProp = 13,label.scaleFactor = 10,layout = "kamada.kawai")
plot_subgraph(module,hub = NULL,PFN,node.default.color = "black", gene.set = NULL,color.code = "grey",show.legend = TRUE, label.hubs.only = TRUE,hubLabel.col = "red",hubLabel.sizeProp = 0.5,show.topn.hubs = 10, node.sizeProp = 13,label.sizeProp = 13,label.scaleFactor = 10,layout = "kamada.kawai")
module |
A character vector containing gene names to be subsetted. |
hub |
If provided, genes in hub will be highlighted as triangles in resulting figure. |
PFN |
igraph object retaining PFN topology. |
node.default.color |
Default node colors for those that do not intersect with signatures in gene.set. |
gene.set |
A list object containing signatures for customized coloring of nodes in resulting network plot. |
color.code |
A character vector with matched length to "gene.set", to specify colors for each signature. |
show.legend |
TRUE/FALSE for showing node legend on the bottom of the figure. |
label.hubs.only |
TRUE/FALSE to show labels for significant hub genes only, or all genes. Defauly is TRUE. |
hubLabel.col |
Label color for hubs. Default is "red" |
hubLabel.sizeProp |
A multiplicative factor to adjust hub label sizes with respect to node size values. Default is 0.5 |
show.topn.hubs |
Maximal number of hubs to label on module subnetwork. Default is 10. |
node.sizeProp |
A multiplicative factor to adjust node sizes with respect to 90th percentile degree node size. Default is 13 |
label.sizeProp |
A multiplicative factor to adjust node label sizes with respect to 90th percentile degree node size. Default is 13 |
label.scaleFactor |
Overall scale factor to control the final size of node labels appearing in figure. Default is 10. |
layout |
Network layout algorithm to apply. Options are: "kamada.kawai", "fruchterman.reingold". |
Subnetwork plot functionality with application of "ggrepel" package for node labeling. The most effective way to control overall node label size is through label.scaleFactor.
A list object holding ggplot object and node annotation table.
Won-Min Song
## Not run: rm(list = ls()) library(MEGENA) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) output.summary <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 5000, annot.table = NULL,id.col = NULL,symbol.col = NULL, output.sig = TRUE) pnet.obj <- plot_subgraph(module = output.summary$modules[[1]], hub = c("CD3E","CD2"),PFN = g,node.default.color = "black", gene.set = NULL,color.code = c("grey"),show.legend = TRUE, label.hubs.only = TRUE,hubLabel.col = "red",hubLabel.sizeProp = 0.5, show.topn.hubs = 10,node.sizeProp = 13,label.sizeProp = 13, label.scaleFactor = 10,layout = "kamada.kawai") # the plot pnet.obj[[1]] # the annotation pnet.obj[[2]] ## End(Not run)
## Not run: rm(list = ls()) library(MEGENA) data(Sample_Expression) ijw <- calculate.correlation(datExpr[1:100,],doPerm = 2) el <- calculate.PFN(ijw[,1:3]) g <- graph.data.frame(el,directed = FALSE) MEGENA.output <- do.MEGENA(g = g,remove.unsig = FALSE,doPar = FALSE,n.perm = 10) output.summary <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 5000, annot.table = NULL,id.col = NULL,symbol.col = NULL, output.sig = TRUE) pnet.obj <- plot_subgraph(module = output.summary$modules[[1]], hub = c("CD3E","CD2"),PFN = g,node.default.color = "black", gene.set = NULL,color.code = c("grey"),show.legend = TRUE, label.hubs.only = TRUE,hubLabel.col = "red",hubLabel.sizeProp = 0.5, show.topn.hubs = 10,node.sizeProp = 13,label.sizeProp = 13, label.scaleFactor = 10,layout = "kamada.kawai") # the plot pnet.obj[[1]] # the annotation pnet.obj[[2]] ## End(Not run)
An interface function to read-in .gmt format gene signature file.
read.geneSet(geneSet.file)
read.geneSet(geneSet.file)
geneSet.file |
text file containing gene signatures in .gmt format |
Each line of lists in geneset.file
is a single set of signature.
loads signatures into a list object.
Won-Min Song
A portion of TCGA breast cancer data to test run MEGENA.
data(Sample_Expression)
data(Sample_Expression)
Contains a matrix object, "datExpr". Use data(Sample_Expression) to load.
a gene expression matrix.
1. Song, W.M. and B. Zhang, Multiscale Embedded Gene Co-expression Network Analysis. PLoS Comput Biol, 2015. 11(11): p. e1004574.