Wrapper function for the Jackknife-after-Bootstrap (JaB) algorithm for networks.
Usage
jab_network(
  network,
  central.func.name,
  central.package.name = NULL,
  central.func.args = NULL,
  bootstrap.func.name,
  bootstrap.package.name = NULL,
  bootstrap.func.args = NULL,
  B = 1000,
  quant = 0.95,
  nodes = NULL,
  return.boot.samples = FALSE
)
Arguments
- network
The network as an igraph object.
- central.func.name
A character string of the function name used to calculate the desired centrality statistic.
- central.package.name
(Optional) A character string of the name of the package that the central.func.name function is in. If left as NULL, the function will be called as loaded in the user's environment.
- central.func.args
(Optional) A list of additional arguments the central.func.name function may need beyond the network object.
- bootstrap.func.name
A character string of the JaB::bootstrap_ function name used to generate bootstrap samples (e.g., bootstrap_snowboot()).
- bootstrap.package.name
(Optional) A character string of the name of the package that the bootstrap.func.name function is in. If left as NULL, the function will be called as loaded in the user's environment.
- bootstrap.func.args
(Optional) A list of additional arguments the bootstrap.func.name function may need beyond the network object.
- B
Number of bootstrap samples. Default is 1,000.
- quant
A numeric value specifying the upper quantile used to flag influential nodes (e.g., 0.95). Default is 0.95.
- nodes
(Optional) A vector of node names to run the JaB algorithm for. If NULL, the algorithm will be run on all \(n\) nodes in network.
- return.boot.samples
Logical; should the list of bootstrap samples used in the algorithm be returned? Default is FALSE.
Value
If return.boot.samples is FALSE, returns a data frame containing:
- Node_Number: Numeric IDs of nodes.
- Node_Name: Names of nodes.
- Orig_Stat: The original centrality statistic of each node.
- Boot_mean, Boot_sd, Boot_skew: The mean, standard deviation, and skewness of the elements of \(\Gamma_i\).
- Upper_Quantile: Upper quantile of the jackknife-after-bootstrap distribution of the centrality statistic for each node.
- Influential: Logical indicating whether each node is influential, i.e., whether Orig_Stat is greater than Upper_Quantile.
- Rank: Rank from most (1) to least (\(n\)) influential. There can be ties in the rankings.
- Can_Jackknife: Logical indicating whether there were bootstrap samples that did not include that node, meaning there are jackknife-after samples from which to generate the distribution of centrality statistics in networks that do not contain that node. If FALSE, then all bootstrap samples contained that node and the Upper_Quantile column will generally be NA. If many nodes are FALSE, it could mean that the bootstrap method is poorly tuned and is sampling more nodes than is appropriate for this data set. If only a few nodes are FALSE, it could mean the bootstrap method is poorly tuned for this data set, or it could mean that those nodes are extremely influential, as it is highly improbable to generate a bootstrap sample that does not contain them. Which explanation is appropriate depends on the data set and the bootstrap method used.
- Num_Boot_Samps: Number of bootstrap samples used to construct the distribution in the Jackknife-after step. If there are \(B\) bootstrap samples in boot.result, then \(B - \text{Num\_Boot\_Samps}\) bootstrap samples contained node \(v_i\) and \(\text{Num\_Boot\_Samps}\) did not contain node \(v_i\). If Can_Jackknife is FALSE, then this number will be 0 (i.e., all bootstrap samples contained node \(v_i\) and thus none of them can be used in the Jackknife-after step).
If return.boot.samples is TRUE, returns a list containing:
- bootstrap: List of B bootstrap samples.
- jack.after: Results of the JaB algorithm as the data frame listed above.
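The diagnostic columns described above can be inspected with ordinary data frame operations. The following is a minimal sketch in base R, assuming a small hand-made data frame (`jab_result`, with invented values) that only mimics the documented column names; it is not output from the package.

```r
# Hypothetical stand-in for the returned data frame; all values are invented
# for illustration and only the column names follow the documentation above.
jab_result <- data.frame(
  Node_Name      = c("a", "b", "c", "d"),
  Orig_Stat      = c(0.7, 0.3, 0.1, 0.2),
  Upper_Quantile = c(NA, 0.54, 0.36, 0.45),
  Can_Jackknife  = c(FALSE, TRUE, TRUE, TRUE),
  Num_Boot_Samps = c(0L, 310L, 642L, 575L)
)
B <- 1000  # number of bootstrap samples assumed for this toy example

# Nodes that appeared in every bootstrap sample and so could not be tested
not_testable <- jab_result[!jab_result$Can_Jackknife, ]

# Share of such nodes; a large share may point to a poorly tuned bootstrap
share_not_testable <- mean(!jab_result$Can_Jackknife)

# Number of bootstrap samples that contained each node
jab_result$Num_Containing <- B - jab_result$Num_Boot_Samps

print(not_testable$Node_Name)
print(share_not_testable)
```

A check like this is only a starting point; as noted above, whether a FALSE value reflects tuning or genuine influence depends on the data set and the bootstrap method.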
Details
Suppose we have a network \(G\) with \(n\) nodes. Specify a centrality statistic \(\gamma\).
For each node \(i = 1,...,n\) we test the hypotheses \(H_0\): node \(i\) is not influential to the network, versus \(H_1\): node \(i\) is influential to the network.
Define \(q \in [0, 1]\) as the upper quantile cutoff value and \(B\) as the number of bootstrap samples.
The JaB algorithm has three steps:
1. Original Centrality Step: Calculate the \(n\) centrality statistics of the original network, \(\gamma_1, \gamma_2, ..., \gamma_n\).
2. Bootstrapping Step: Generate bootstrap samples \(G^{*b}=(V^{*b}, E^{*b})\) for \(b=1,...,B\). For each bootstrap sample, calculate \(\{\gamma_j^{*b} : j=1,...,n\}\).
3. Jackknife-after Step: For \(i = 1,...,n\), do:
A. Calculate \(S_i^{*b} = \boldsymbol{1}(v_i \in V^{*b})\) for \(b=1,...,B\) and construct \(\mathcal{B}_{-i} = \{G^{*b}: S_i^{*b} = 0\}\), which is the set of all bootstrap sample networks that do not contain node \(i\).
B. Let \(\Gamma_{-i} = \{\gamma_j^{*b} : G^{*b} \in \mathcal{B}_{-i}, j=1,...,n\}\) be the set of all centrality statistics calculated from all bootstrap sample networks that do not contain node \(i\).
C. Calculate the cutoff value, \(q_{-i}\), the \(q^{\text{th}}\) quantile of \(\Gamma_{-i}\).
D. If \(\gamma_i > q_{-i}\), reject the null hypothesis for node \(i\) and conclude that node \(i\) is influential.
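The Jackknife-after step (A to D) can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: each bootstrap sample is represented as a named numeric vector of centrality statistics whose names are the original nodes that were sampled, and `boot_stats`, `orig_stat`, and `jackknife_after()` are hypothetical names with invented values.

```r
# Assumed toy output of the Bootstrapping Step: one named vector per sample.
# Names identify the original node; a node may be sampled more than once.
boot_stats <- list(
  c(a = 0.2, b = 0.5, c = 0.1),
  c(b = 0.4, c = 0.3, c = 0.2),
  c(a = 0.6, c = 0.1, d = 0.3),
  c(b = 0.1, d = 0.2, d = 0.4)
)
orig_stat <- c(a = 0.7, b = 0.3, c = 0.1, d = 0.2)  # Original Centrality Step
q <- 0.9                                            # upper quantile cutoff

jackknife_after <- function(node, boot_stats, orig_stat, q) {
  # A: indicator S_i^{*b}, then keep the samples that do NOT contain the node
  contains <- vapply(boot_stats, function(s) node %in% names(s), logical(1))
  without <- boot_stats[!contains]
  if (length(without) == 0) {
    # Every sample contained the node, so no cutoff can be computed
    return(data.frame(node = node, cutoff = NA_real_,
                      influential = NA, n_samples = 0L))
  }
  # B: pool all centrality statistics from those samples
  gamma_minus_i <- unlist(without, use.names = FALSE)
  # C: the q-th quantile of the pooled statistics is the cutoff
  cutoff <- unname(quantile(gamma_minus_i, probs = q))
  # D: flag the node if its original statistic exceeds the cutoff
  data.frame(node = node, cutoff = cutoff,
             influential = orig_stat[[node]] > cutoff,
             n_samples = length(without))
}

result <- do.call(rbind, lapply(names(orig_stat), jackknife_after,
                                boot_stats = boot_stats,
                                orig_stat = orig_stat, q = q))
print(result)
```

The sketch relies on the default quantile definition of `quantile()`; a different quantile type would shift the cutoffs slightly, especially with so few pooled values.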
Once we run the algorithm, we have a set of nodes that are influential and a set of nodes that are not.
We can also generate a ranking of all nodes from most to least influential, determined by \(\gamma_i - q_{-i}\), where \(q_{-i}\) is the cutoff from step 3C. Node \(i\) is considered influential when \(\gamma_i - q_{-i}\) is large and positive, somewhat influential when \(\gamma_i - q_{-i}\) is small and positive, somewhat not influential when \(\gamma_i - q_{-i}\) is small and negative, and very not influential when \(\gamma_i - q_{-i}\) is large and negative.
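As a small illustration of this ranking, the sketch below orders nodes by the difference between the original statistic and the cutoff. The values in `orig_stat` and `cutoff` are invented, and the choice of `ties.method = "min"` is an assumption made here for the example; the documentation only states that ties can occur.

```r
# Hypothetical original statistics and jackknife-after cutoffs per node
orig_stat <- c(a = 0.7, b = 0.3, c = 0.1, d = 0.2)
cutoff    <- c(a = 0.4, b = 0.54, c = 0.36, d = 0.45)

# Distance of each node's statistic above (positive) or below (negative)
# its cutoff
gap <- orig_stat - cutoff

# Rank 1 is the largest gap, i.e. the most influential node; tied gaps
# share the smallest applicable rank under the assumed ties.method
influence_rank <- rank(-gap, ties.method = "min")

print(data.frame(gap = gap, rank = influence_rank))
```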
We construct bootstrap standard errors and confidence intervals from the bootstrap samples generated in Step 2 of this algorithm. Let \(\mathcal{B}_i = \{G^{*b} : S_i^{*b} = 1\}\) be the set of all bootstrap networks that include \(v_i\) and \(\Gamma_i = \{\gamma_j^{*b} : v_j^{*b} = v_i , v_j^{*b} \in V^{*b}\}\) be the set of centrality statistics corresponding to the nodes that are sampled to be \(v_i\). Then the bootstrap standard error of \(\gamma_i\) is \(\sigma_i^{*} = \sqrt{(|\Gamma_i|-1)^{-1} \sum_{\gamma_j^{*b} \in \Gamma_i} (\gamma_j^{*b} - \bar{\gamma}_i)^2}\), where \(\bar{\gamma}_i = |\Gamma_i|^{-1}\sum_{\gamma_j^{*b} \in \Gamma_i} \gamma_j^{*b}\). Using \(\sigma_i^{*}\), bootstrap confidence intervals are constructed in the traditional way. For example, a 95% bootstrap confidence interval for \(\gamma_i\) is \((\gamma_i - 1.96\sigma_i^{*} , \gamma_i + 1.96\sigma_i^{*})\). For many centrality statistics, \(\gamma_i\) must be a non-negative value. If \(\gamma_i\) must be non-negative and \(\gamma_i - 1.96\sigma_i^{*}<0\), we set the 95% bootstrap confidence interval to be \((0, \gamma_i + 1.96\sigma_i^{*})\).
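The standard error and interval described here can be sketched in a few lines of base R. The helper `boot_ci()` and the values in `Gamma_i` are hypothetical and serve only to illustrate the formulas; `sd()` is used because it applies the same \(|\Gamma_i| - 1\) denominator as the standard error above.

```r
# Hypothetical helper: bootstrap standard error and 95% interval for one node
boot_ci <- function(gamma_i, Gamma_i, non_negative = TRUE) {
  se <- sd(Gamma_i)               # sample standard deviation (n - 1 denominator)
  lower <- gamma_i - 1.96 * se
  upper <- gamma_i + 1.96 * se
  # Truncate at zero when the centrality statistic cannot be negative
  if (non_negative) lower <- max(lower, 0)
  c(se = se, lower = lower, upper = upper)
}

# Invented bootstrap statistics for the copies of node v_i
Gamma_i <- c(0.25, 0.40, 0.10, 0.35, 0.30, 0.20)

ci <- boot_ci(gamma_i = 0.10, Gamma_i = Gamma_i)
print(ci)
```

With these invented values the untruncated lower bound would be negative, so the interval starts at zero, matching the non-negative case described above.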
By construction, \(\mathcal{B}_i \cap \mathcal{B}_{-i} = \emptyset\). The bootstrap samples in \(\mathcal{B}_i\) are used for the bootstrap standard errors, and the bootstrap samples in \(\mathcal{B}_{-i}\) are used for the hypothesis test of node \(v_i\)'s influence.
jab_network is a wrapper function for the entire JaB algorithm. For each step of the JaB algorithm, the following functions are used:
1. Original Centrality Step: Calculate the original centrality statistics using get_centrality() according to central.func.name.
2. Bootstrapping Step: Generate bootstrap samples according to bootstrap.func.name and calculate their centrality statistics using get_bootstrap_centrality().
3. Jackknife-after Step: Perform the "Jackknife-after" step of the algorithm with get_jackknife_after().
See vignette("jab-networks") for more details and examples.
Examples
library(igraphdata)
library(igraph)
data(karate)
# JaB with snowboot (1 seed and 2 waves), 1000 bootstrap samples,
# degree (from igraph), and cutoff quantile of 0.90
jab_network(
  network = karate,
  central.func.name = "degree",
  central.package.name = "igraph",
  central.func.args = list(normalized = TRUE),
  bootstrap.func.name = "bootstrap_snowboot",
  bootstrap.package.name = "JaB",
  bootstrap.func.args = list(num.seed = 1, num.wave = 2),
  B = 1000,
  quant = 0.90,
  nodes = NULL,
  return.boot.samples = FALSE
)
#> This graph was created by an old(er) igraph version.
#> Call upgrade_graph() on it to use with the current igraph version
#> For now we convert it on the fly...
#> # A tibble: 34 × 11
#> Node_Number Node_Name Orig_Stat Boot_mean Boot_sd Boot_skew Upper_Quantile
#> <int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 Mr Hi 0.485 0.142 0.174 1458. 0.263
#> 2 34 John A 0.515 0.143 0.175 1434. 0.375
#> 3 33 Actor 33 0.364 0.144 0.175 1479. 0.235
#> 4 28 Actor 28 0.121 0.144 0.175 1479. 0.235
#> 5 29 Actor 29 0.0909 0.144 0.175 1479. 0.235
#> 6 26 Actor 26 0.0909 0.156 0.166 513. 0.333
#> 7 24 Actor 24 0.152 0.142 0.178 1257. 0.4
#> 8 6 Actor 6 0.121 0.137 0.178 1047. 0.375
#> 9 7 Actor 7 0.121 0.137 0.178 1047. 0.375
#> 10 30 Actor 30 0.121 0.139 0.177 1264. 0.4
#> # ℹ 24 more rows
#> # ℹ 4 more variables: Influential <lgl>, Rank <int>, Can_Jackknife <lgl>,
#> # Num_Boot_Samps <int>