Network Science Lesson 4 (Network Visualization with iGraph)

Today I am going to take you on a brief tour of the network visiualization capabilities of the igraph package in R (we’ll cover some simple analyses in the next lesson, igraph part II). The igraph package is large and has many functions for manipulating, visualizing, and analyzing your networks. I recommend reading the package documentation, and the many online tutorials available to explore more advanced functionality. For those who prefer ggplot for their plotting needs, the ggnetwork package may be helpful.

Fair warning, igraph’s interface is a little confusing at first. I recommend playing with your network to get a hang of it (“learning through fiddling”“).

Setup (Making the Network)

First we need a network to visualize. The networks we built in previous lessons are pretty boring visually, so we are going to use a different example today, taken directly from the SpiecEasi documentation. We use their american gut microbiome dataset. See their tutorial for details about this dataset and the functions we use for network construction below.

You should already have installed SpiecEasi and igraph in previous lessons:

library(SpiecEasi)
library(igraph)

## 
## Attaching package: 'igraph'

## The following object is masked from 'package:SpiecEasi':
## 
##     make_graph

## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum

## The following object is masked from 'package:base':
## 
##     union

#Custom colorblind pallette, see: https://stackoverflow.com/questions/57153428/r-plot-color-combinations-that-are-colorblind-accessible
customvermillion<-rgb(213/255,94/255,0/255)
custombluegreen<-rgb(0/255,158/255,115/255)
customblue<-rgb(0/255,114/255,178/255)
customskyblue<-rgb(86/255,180/255,233/255)
customreddishpurple<-rgb(204/255,121/255,167/255)

We build a network using SpiecEasi. I don’t really care what the species in this dataset are, so I’m going to assign them arbitrary names here (either a letter or letter.number combination like “A.1”).

## Generate network, see https://github.com/zdk123/SpiecEasi for a nice walkthrough of this process with this example
#Load dara
data(amgut1.filt)
#Make up some Species "Names" since we don't care and the names aren't easily accessible in this matrix (unless you want to insall phyloseq)
colnames(amgut1.filt) <- make.names(rep(LETTERS[1:26],5),unique = T)[1:ncol(amgut1.filt)]
#Build network w/ spieceasi
se.gl.amgut <- spiec.easi(amgut1.filt, method='glasso', lambda.min.ratio=1e-2,
                          nlambda=20, pulsar.params=list(rep.num=50))

## Applying data transformations...

## Selecting model with pulsar using stars...

## Fitting final estimate with glasso...

## done

We use the getRefit() function to extract the adjacency matrix from the spieceasi output. This is a square matrix with species on rows and columns and a one if two species are connected (“adjacent”) in the network and a zero otherwise.

#Extract adjacency matrix from output  - Explain ADJ MAT vs. EDGE LIST
adj.mat <- getRefit(se.gl.amgut)
table(as.numeric(adj.mat))

## 
##     0     1 
## 15773   356

But we want a weighted network! No worries, we can extract the weights and make a weighted adjacency matrix as well.

#We want weights!
se.cor  <- cov2cor(as.matrix(getOptCov(se.gl.amgut)))
weighted.adj.mat <- se.cor*getRefit(se.gl.amgut)

#Let's take a loot at that adjacency matrix
heatmap(as.matrix(weighted.adj.mat))

It is very easy to change these adjacency matrices into igraph “graph” objects. What’s a graph? Well a network is a graph (look up “graph theory”, it’s a whole field of mathematics). Consider them synonyms.

grph.unweighted <- adj2igraph(adj.mat)
grph <- adj2igraph(weighted.adj.mat)

You might also find the graph_from_edgelist() and graph_from_data_frame() functions useful if you have data stored in different formats. If we are worried about self-loops or redundant edges we could also use the simplify() function to remove them (it turns the network into a “simple graph”).

Let’s take a look at our unweighted network:

plot(grph.unweighted,vertex.size=1,vertex.label=NA)

Great, now our unweighted network:

plot(grph,vertex.size=1,
     vertex.label=NA)

Oh no! That looks very bad. It turns out that our graph layout algorithm isn’t behaving well with negative egde weights. Let’s use a different layout function for now (we will get back to this later in the lesson):

plot(grph,vertex.size=1,
     vertex.label=NA,
     edge.width=1,
     layout=layout.circle(grph))

Adding and Visualizing Metadata

For those used to dealing with dataframes, the graph structure can seem pretty confusing. The two functions you need to know for now are V() and E(), which allow you to acess properties of the vertices (nodes) and edges respectively.

V(grph)

## + 127/127 vertices, named, from 8da9d8a:
##   [1] 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17 
##  [18] 18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34 
##  [35] 35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51 
##  [52] 52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68 
##  [69] 69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85 
##  [86] 86  87  88  89  90  91  92  93  94  95  96  97  98  99  100 101 102
## [103] 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
## [120] 120 121 122 123 124 125 126 127

E(grph)

## + 178/178 edges from 8da9d8a (vertex names):
##  [1]  1-- 25  1-- 35  1-- 58  2-- 15  2-- 68  2--105  4-- 14  4-- 44
##  [9]  4-- 93  5-- 16  5-- 25  5-- 44  5-- 46  5-- 50  5-- 59  5-- 71
## [17]  5-- 76  5-- 90  5--103  5--104  5--116  5--119 11-- 87 11--110
## [25] 12-- 16 12-- 19 12-- 34 12-- 50 12-- 60 12-- 65 12-- 82 12--104
## [33] 12--119 13-- 41 13-- 71 13-- 89 13-- 90 15-- 68 15--105 16-- 34
## [41] 16-- 44 16-- 46 16-- 50 16-- 60 16-- 76 16-- 82 16-- 84 16--103
## [49] 16--104 16--116 16--119 19-- 50 19-- 60 19-- 65 19-- 82 19--119
## [57] 23-- 71 23-- 89 23-- 99 23--123 23--126 24-- 47 24-- 86 25-- 35
## [65] 25-- 59 27-- 32 27-- 62 27--105 27--120 28-- 43 29-- 38 29-- 48
## [73] 29--114 29--125 31-- 77 32-- 75 32-- 95 32--105 34-- 50 34-- 60
## + ... omitted several edges

We can modify these features (similar to how names(df)<-c(“a”,“b”,..) allows us to assign names to a dataframe).

For example, it seems spieceasi got rid of our species names when building the network, so let’s put them back:

V(grph)$name <- colnames(amgut1.filt)
V(grph)

## + 127/127 vertices, named, from 8da9d8a:
##   [1] A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q  
##  [18] R   S   T   U   V   W   X   Y   Z   A.1 B.1 C.1 D.1 E.1 F.1 G.1 H.1
##  [35] I.1 J.1 K.1 L.1 M.1 N.1 O.1 P.1 Q.1 R.1 S.1 T.1 U.1 V.1 W.1 X.1 Y.1
##  [52] Z.1 A.2 B.2 C.2 D.2 E.2 F.2 G.2 H.2 I.2 J.2 K.2 L.2 M.2 N.2 O.2 P.2
##  [69] Q.2 R.2 S.2 T.2 U.2 V.2 W.2 X.2 Y.2 Z.2 A.3 B.3 C.3 D.3 E.3 F.3 G.3
##  [86] H.3 I.3 J.3 K.3 L.3 M.3 N.3 O.3 P.3 Q.3 R.3 S.3 T.3 U.3 V.3 W.3 X.3
## [103] Y.3 Z.3 A.4 B.4 C.4 D.4 E.4 F.4 G.4 H.4 I.4 J.4 K.4 L.4 M.4 N.4 O.4
## [120] P.4 Q.4 R.4 S.4 T.4 U.4 V.4 W.4

E(grph)

## + 178/178 edges from 8da9d8a (vertex names):
##  [1] A  --Y   A  --I.1 A  --F.2 B  --O   B  --P.2 B  --A.4 D  --N  
##  [8] D  --R.1 D  --O.3 E  --P   E  --Y   E  --R.1 E  --T.1 E  --X.1
## [15] E  --G.2 E  --S.2 E  --X.2 E  --L.3 E  --Y.3 E  --Z.3 E  --L.4
## [22] E  --O.4 K  --I.3 K  --F.4 L  --P   L  --S   L  --H.1 L  --X.1
## [29] L  --H.2 L  --M.2 L  --D.3 L  --Z.3 L  --O.4 M  --O.1 M  --S.2
## [36] M  --K.3 M  --L.3 O  --P.2 O  --A.4 P  --H.1 P  --R.1 P  --T.1
## [43] P  --X.1 P  --H.2 P  --X.2 P  --D.3 P  --F.3 P  --Y.3 P  --Z.3
## [50] P  --L.4 P  --O.4 S  --X.1 S  --H.2 S  --M.2 S  --D.3 S  --O.4
## [57] W  --S.2 W  --K.3 W  --U.3 W  --S.4 W  --V.4 X  --U.1 X  --H.3
## [64] Y  --I.1 Y  --G.2 A.1--F.1 A.1--J.2 A.1--A.4 A.1--P.4 B.1--Q.1
## + ... omitted several edges

We might also change things like vertex size or color this way. Let’s make all our vertices the same color right now, and make their size proportional to their degree (number of connections):

V(grph)$size <- (degree(grph) + 1) # the +1 is to avoid size zero vertices
V(grph)$color <- "black"
plot(grph,
     vertex.label=NA,
     layout=layout.circle(grph))

Similarly, maybe we want out edges to be colored according to whether they are positive or negative interactions, and for their widths to be proportional to their weights:

E(grph)$color <- custombluegreen
E(grph)$color[E(grph)$weight<0] <- customreddishpurple
E(grph)$width <- abs(E(grph)$weight)*10
plot(grph,
     vertex.label=NA,
     layout=layout.circle(grph))

It looks like many of our negative interactions are very weak, and it’s difficult to see these very thin lines on the plot above:

plot(density((E(grph)$weight)),xlab="Edge Weight",main="")

boxplot(abs(E(grph)$weight)~(E(grph)$weight>0),
        xlab="Positive Interaction?",
        ylab="Strength of Interaction")

We could emphasize the negative interactions by making them a bit bigger:

E(grph)$width[E(grph)$weight<0] <- E(grph)$width[E(grph)$weight<0]*10
plot(grph,
     vertex.label=NA,
     layout=layout.circle(grph))

Removing Edges and Vertices

Often you will want to remove sets of nodes or edges from your network. For visualization, you want to make sure your network is telling an intelligible story. If there is a lot of unimportant stuff filling up your screen it’s hard to focus on the interesting stuff.

For example, we can remove low-weight edges (you decide what threshold is right for your network):

#Remove edges with very low weight 
weight_threshold <- 0.01
grph <- delete.edges(grph,which(abs(E(grph)$weight)<weight_threshold))

We might only be interested in positive species interactions, so we can get rid of any edges with negative weights:

#Remove negative edges 
grph.pos <- delete.edges(grph,which(E(grph)$weight<0))
plot(grph.pos,
     vertex.label=NA)

Finally, lets get rid of vertices that aren’t connected to anything in the network (these don’t tell us much and take up a lot of space):

#Remove unconnected vertices
grph.pos <- delete.vertices(grph.pos,which(degree(grph.pos)<1))
plot(grph.pos,
     vertex.label=NA)

Clean it up a little (fiddle with node sizes, your color scheme, edge shape), and voila:

#Cleanup a little
V(grph.pos)$size <- V(grph.pos)$size/3
E(grph.pos)$color <- "gray"
plot(grph.pos,
     vertex.label=NA,
     edge.curved=0.5)

Layouts

How does igraph decide where to put the vertices of your network? It implements one of various layout algorithms to distribute nodes in space. Here’s a few examples, but there are a lot of these (check out the igraph documentation). In general, I’ve found that either the Frutherman-Reingold or Circle layouts are the best options for 99% of my applications. You can also igraph pick with layout_nicely().

# Layout with fruchterman-reingold algorithm
plot(grph.pos,
     vertex.label=NA,
     layout=layout_with_fr(grph.pos))

#Layout with kamada-kawai algorithm
plot(grph.pos,
     vertex.label=NA,
     layout=layout_with_kk(grph.pos))

#layout with davidson-harel algorithm
plot(grph.pos,
     vertex.label=NA,
     layout=layout_with_dh(grph.pos))

#supposedly good for larger graphs
plot(grph.pos,
     vertex.label=NA,
     layout=layout_with_lgl(grph.pos))

Importantly, these layout algorithms aren’t typically deterministic, so each time you call them you will get a slightly different looking graph:

plot(grph.pos,
     vertex.label=NA,
     layout=layout_with_fr(grph.pos))

If this really bothers you, you can call your layout function once and store the resulting layout it in a variable that can be referenced repeatedly.

my_unchanging_layout <- layout_with_fr(grph.pos)
plot(grph.pos,
     vertex.label=NA,
     layout=my_unchanging_layout)

Looking at Specific Components

Sometimes we want to look at individual components of our graph (sets of nodes connected by at least one path).

It is easy to retrieve this information:

graph_components <- components(grph.pos)
graph_components

## $membership
##   A   B   D   E   K   L   M   N   O   P   S   W   X   Y A.1 B.1 C.1 E.1 
##   1   2   3   4   5   4   6   3   2   4   4   6   7   1   2   8   9  10 
## F.1 H.1 I.1 L.1 O.1 Q.1 R.1 S.1 T.1 U.1 V.1 X.1 Y.1 Z.1 B.2 D.2 E.2 F.2 
##   2   4   1   9   6   8   4   6   4   7   9   4   2  11   2  12  13   1 
## G.2 H.2 I.2 J.2 K.2 M.2 N.2 O.2 P.2 Q.2 S.2 T.2 V.2 W.2 X.2 Y.2 Z.2 A.3 
##   1   4  14   2  15   4   1  11   2  16   6  14   9   2   4  10  16  17 
## D.3 H.3 I.3 J.3 K.3 L.3 N.3 O.3 Q.3 U.3 Y.3 Z.3 A.4 C.4 E.4 F.4 H.4 J.4 
##   4   7   5   9   6   4  12   3   2   6   4   4   2  15   8   5   2   9 
## L.4 M.4 O.4 P.4 R.4 S.4 T.4 U.4 V.4 W.4 
##   4   3   4   2  13   6  17   1   6   9 
## 
## $csize
##  [1]  7 13  4 17  3  9  3  3  7  2  2  2  2  2  2  2  2
## 
## $no
## [1] 17

We can then visualize our largest component:

grph.largest.component <- 
  induced.subgraph(grph.pos,V(grph.pos)[which(graph_components$membership == which.max(graph_components$csize))])
plot(grph.largest.component,vertex.label=NA)

Or our second largest component (and so on):

#look at graph_components$csize to see which ID is second largest (in this case component 2)
grph.second.largest.component <- 
  induced.subgraph(grph.pos,V(grph.pos)[which(graph_components$membership == 2)])
plot(grph.second.largest.component,vertex.label=NA)

With very big networks this can be one way to easily break-down the visualization problem.

Highlighting Sets of Nodes

Say we want to highlight species “L” and all of the edges conected to it, we can do this by selecting all edges coming from the node using “E(grph.pos)[from(“L”)]”. Here we will color them vermillion:

#Change the color of vertex L
V(grph.pos)$color[V(grph.pos)$name=="L"] <- customvermillion
#Find attached edges
e.L <- E(grph.pos)[from("L")]
#Change the color of those edges
E(grph.pos)$color[E(grph.pos) %in% e.L] <- customvermillion
#Plot
plot(grph.pos,vertex.label=NA)

Maybe we want to highlight a group of nodes? Maybe all species starting with the letter “L”? This is also relatively easy (we accomplish this using the substr() function to get the first letter, but you could format this command differently to pull out a specific genus, for instance):

#Find nodes that start with :
L_index <- substr(V(grph.pos)$name,1,1)=="L"
Ls <- V(grph.pos)[L_index]
#Color them vermillion
V(grph.pos)$color[V(grph.pos) %in% Ls] <- customvermillion
#Find edges connected to these nodes
e.Ls <- E(grph.pos)[from(Ls)]
#Color them vermillion
E(grph.pos)$color[E(grph.pos) %in% e.Ls] <- customvermillion
#Plot
plot(grph.pos,vertex.label=NA)

We might also be interested in making all of the vertices adjacent to the set of “L” vertices a specific color. We can do this using the ego() function (by increasing the order argument we can include nodes 2 or more steps away rather than just immediately adjacent nodes):

#Find nodes <=1 edge away from our focal set
ego.Ls <- ego(grph.pos, order=1, nodes = Ls$name, mode = "all", mindist = 0)
#Color these nodes blue, but not including our original set
V(grph.pos)$color[V(grph.pos) %in% unlist(ego.Ls) & ! V(grph.pos) %in% Ls ] <- customblue
#Plot
plot(grph.pos,vertex.label=NA)

Labels

Labels are a pain. For large networks, my personal opinion is that you should not label more than a handful of nodes, otherwise it ends up being too much small text on the screen. If you must label your nodes, try offsetting labels a little with the vertex.label.dist argument, and playing with the size using vertex.label.cex:

plot(grph.pos,vertex.label.dist=0.75,
     vertex.label.cex=0.5,
     vertex.label.color="black")

Maybe you only want to show some labels (as I recommend above). One way to do this is to create a dummy metadata category with only a few labels in it:

#Create dummy label vertex property
V(grph.pos)$my_label <- ""
#Only add lables for species starting with "L"
V(grph.pos)$my_label[L_index] <- V(grph.pos)$name[L_index]
plot(grph.pos,vertex.label=V(grph.pos)$my_label,
     vertex.label.dist=0.75,
     vertex.label.cex=0.5,
     vertex.label.color="black")

Very Versatile

I’ve given you a brief intro, but the ability to assign specific values to various node/edge attributes means you can represent data in lots of fun ways. For example, maybe we want to color nodes diffferently based on the first letter of their species names (or e.g., by genus, family, phylum, etc.):

E(grph.pos)$color <- "black"
V(grph.pos)$color <- rainbow(26)[as.numeric(as.factor(substr(V(grph.pos)$name,1,1)))]
plot(grph.pos,vertex.label=NA,vertex.size=V(grph.pos)$size*3)

This versatility means that much of the visualization process is not automated - you will have to write at least some of the code to do many of the things you want yourself.