JaB Algorithm for Network Data

Wrapper function for the Jackknife-after-Bootstrap (JaB) algorithm for networks.

Usage

jab_network(
  network,
  central.func.name,
  central.package.name = NULL,
  central.func.args = NULL,
  bootstrap.func.name,
  bootstrap.package.name = NULL,
  bootstrap.func.args = NULL,
  B = 1000,
  quant = 0.95,
  nodes = NULL,
  return.boot.samples = FALSE
)

Arguments

network: The network as an igraph object.
central.func.name: A character string of the function name used to calculate the desired centrality statistic.
central.package.name: (Optional) A character string of the name of the package that the central.func.name function is in. If left as NULL, the function will be called as loaded in the user's environment.
central.func.args: (Optional) A list of additional arguments the central.func.name function may need beyond the network object.
bootstrap.func.name: A character string of the JaB::bootstrap_ function name used to generate bootstrap samples (e.g., "bootstrap_snowboot").
bootstrap.package.name: (Optional) A character string of the name of the package that the bootstrap.func.name function is in. If left as NULL, the function will be called as loaded in the user's environment.
bootstrap.func.args: (Optional) A list of additional arguments the bootstrap.func.name function may need beyond the network object.
B: Number of bootstrap samples. Default is 1,000.
quant: A numeric value specifying the upper quantile used to flag influential nodes. Default is 0.95.
nodes: (Optional) A vector of node names for which to run the JaB algorithm. If NULL, the algorithm is run on all $n$ nodes in the network.
return.boot.samples: Logical, should list of bootstrap samples used in algorithm be returned? Default is FALSE.
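
The pairing of a function name with an optional package name can be pictured in base R. This is only a minimal sketch of name-based dispatch, assuming the lookup behaves roughly as the argument descriptions suggest; the helper resolve_fun() is hypothetical and is not part of JaB.

```r
# Hypothetical helper (not part of JaB): look up a function from its name
# and, optionally, the package that exports it.
resolve_fun <- function(func.name, package.name = NULL) {
  if (is.null(package.name)) {
    # No package given: use whatever is visible in the calling environment.
    match.fun(func.name)
  } else {
    # Package given: fetch the exported object from that namespace.
    getExportedValue(package.name, func.name)
  }
}

# Extra arguments are supplied as a list and spliced in with do.call(),
# mirroring the role of central.func.args and bootstrap.func.args.
f <- resolve_fun("quantile", "stats")
q90 <- do.call(f, c(list(1:10), list(probs = 0.9, names = FALSE)))
```

Passing names as strings keeps the wrapper agnostic about which centrality or bootstrap function is used, at the cost of deferring "function not found" errors to run time.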

Value

If return.boot.samples is FALSE, returns a data frame containing:

Node_Number: Numeric IDs of nodes.
Node_Name: Name of nodes.
Orig_Stat: The original centrality statistic of each node.
Boot_mean, Boot_sd, Boot_skew: The mean, standard deviation, and skewness of the elements of $\Gamma_i$ (defined in Details).
Upper_Quantile: Upper quantile of the jackknife-after-bootstrap distribution of centrality statistic for each node.
Influential: Logical indicating if each node is influential, i.e. is Orig_Stat greater than Upper_Quantile?
Rank: Rank from most (1) to least ($n$) influential. There can be ties in the rankings.
Can_Jackknife: Logical indicating whether any bootstrap samples excluded that node, i.e., whether there are jackknife-after samples from which to generate the distribution of centrality statistics in networks that do not contain that node. If FALSE, all bootstrap samples contained that node and the Upper_Quantile column will generally be NA. If many nodes are FALSE, the bootstrap method may be poorly tuned and sampling more nodes than is appropriate for this data set. If only a few nodes are FALSE, the bootstrap method may be poorly tuned for this data set, or those nodes may be extremely influential, so that a bootstrap sample without them is highly improbable. Which explanation is appropriate depends on the data set and the bootstrap method used.
Num_Boot_Samps: Number of bootstrap samples used to construct the distribution in the Jackknife-after step. If there are $B$ bootstrap samples in boot.result, then $B$ minus Num_Boot_Samps bootstrap samples contained node $v_i$ and Num_Boot_Samps did not contain node $v_i$. If Can_Jackknife is FALSE, this number will be 0 (i.e., all bootstrap samples contained node $v_i$ and thus none of them can be used in the Jackknife-after step).

If return.boot.samples is TRUE, returns a list containing:

bootstrap: List of B bootstrap samples.
jack.after: Results of the JaB algorithm as the data frame listed above.
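
Because the return type depends on return.boot.samples, downstream code may want to handle both shapes. A minimal sketch follows; get_jab_table() is a hypothetical helper (not part of JaB) and the toy objects only stand in for real output.

```r
# Hypothetical helper (not part of JaB): return the results table whether
# or not the bootstrap samples were returned alongside it.
get_jab_table <- function(result) {
  # A data frame is also a list, so test for the data frame case first.
  if (is.data.frame(result)) result else result$jack.after
}

# Toy stand-ins for the two return shapes (values are illustrative only).
toy_table <- data.frame(Node_Name = c("a", "b"), Influential = c(TRUE, FALSE))
toy_list  <- list(bootstrap = list(), jack.after = toy_table)

same_table <- identical(get_jab_table(toy_table), get_jab_table(toy_list))
```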

Details

Suppose we have a network $G$ with $n$ nodes. Specify a centrality statistic $\gamma$.

For each node $i = 1,...,n$ we test the hypotheses $H_0$: node $i$ is not influential to the network, versus $H_1$: node $i$ is influential to the network.

Define $q\in [0, 1]$ as the upper quantile cutoff value and $B$ as the number of bootstrap samples.

The JaB algorithm has three steps:

Original Centrality Step: Calculate $n$ centrality statistics of the original network, $\gamma_1, \gamma_2, ..., \gamma_n$.
Bootstrapping Step: Generate bootstrap samples $G^{*b}=(V^{*b}, E^{*b})$ for $b=1,...,B$. For each bootstrap sample, calculate $\{\gamma_j^{*b} : j=1,...,n\}$.
Jackknife-after Step: For $i = 1,...,n$, do:

A. Calculate $S_i^{*b} = \boldsymbol{1}(v_i \in V^{*b})$ for $b=1,...,B$ and construct $\mathcal{B}_{-i} = \{G^{*b}: S_i^{*b} = 0\}$, which is the set of all bootstrap sample networks that do not contain node $i$.

B. Let $\Gamma_{-i} = \{\gamma_j^{*b} : G^{*b} \in \mathcal{B}_{-i}, j=1,...,n\}$ be the set of all centrality statistics calculated from all bootstrap sample networks that do not contain node $i$.

C. Calculate the cutoff value, $q_{-i}$, the $q^{\text{th}}$ quantile of $\Gamma_{-i}$.

D. If $\gamma_i > q_{-i}$, reject null hypothesis for node $i$ and conclude that node $i$ is influential.
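
Steps A through D can be sketched in base R on toy data. This is an illustration under stated assumptions, not the package's implementation: each bootstrap sample is assumed to be a named vector of centrality statistics, and orig_stat, boot_stats, and jackknife_after() are hypothetical names with made-up values.

```r
# Toy inputs (illustrative values only, not output from JaB).
# Original centrality statistics gamma_1, ..., gamma_n.
orig_stat <- c(a = 0.9, b = 0.2, c = 0.3, d = 0.1)

# Each bootstrap sample is a named vector: names are the sampled nodes,
# values are their centrality statistics within that bootstrap network.
boot_stats <- list(
  c(a = 0.8, b = 0.3),
  c(b = 0.2, c = 0.4),
  c(c = 0.3, d = 0.1),
  c(a = 0.7, d = 0.2),
  c(b = 0.1, d = 0.3),
  c(b = 0.3, c = 0.2)
)
q <- 0.9

jackknife_after <- function(node) {
  # Step A: S_i^{*b} = 1 if the node appears in bootstrap sample b.
  contains <- vapply(boot_stats, function(s) node %in% names(s), logical(1))
  # Step B: pool all statistics from samples that exclude the node.
  gamma_minus <- unlist(boot_stats[!contains], use.names = FALSE)
  # Step C: the q-th quantile is the cutoff (NA if no sample excludes the node).
  cutoff <- if (length(gamma_minus) > 0) {
    quantile(gamma_minus, probs = q, names = FALSE)
  } else {
    NA_real_
  }
  # Step D: flag the node when its original statistic exceeds the cutoff.
  data.frame(
    Node_Name = node,
    Orig_Stat = orig_stat[[node]],
    Upper_Quantile = cutoff,
    Influential = orig_stat[[node]] > cutoff,
    Num_Boot_Samps = sum(!contains)
  )
}

result <- do.call(rbind, lapply(names(orig_stat), jackknife_after))
```

In this toy setup only node a exceeds its cutoff, since its original statistic is large relative to the statistics seen in networks that exclude it.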

Once we run the algorithm, we have a set of nodes that are influential and a set of nodes that are not.

We can also generate a ranking of all nodes from most to least influential, determined by $\gamma_i - q_{-i}$, where $q_{-i}$ is the cutoff from step 3C. Node $i$ is considered very influential when $\gamma_i - q_{-i}$ is large and positive, somewhat influential when $\gamma_i - q_{-i}$ is small and positive, somewhat not influential when $\gamma_i - q_{-i}$ is small and negative, and very not influential when $\gamma_i - q_{-i}$ is large and negative.
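
A minimal sketch of this ranking in base R, using made-up values. The tie-handling rule (ties.method = "min") is an assumption for illustration; the documentation only states that ties can occur.

```r
# Toy values (illustrative only): original statistics and jackknife-after cutoffs.
orig_stat <- c(a = 0.9, b = 0.2, c = 0.3, d = 0.1)
cutoff    <- c(a = 0.33, b = 0.58, c = 0.75, d = 0.60)

# Larger differences between statistic and cutoff mean more influential.
gap <- orig_stat - cutoff

# Rank 1 = most influential; tied nodes share the smallest rank among them
# (assumed tie rule, not confirmed by the package documentation).
rank_inf <- rank(-gap, ties.method = "min")
```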

We construct bootstrap standard errors and confidence intervals from the bootstrap samples generated in Step 2 of this algorithm. Let $\mathcal{B}_i = \{G^{*b} : S_i^{*b} = 1\}$ be the set of all bootstrap networks that include $v_i$ and $\Gamma_i = \{\gamma_j^{*b} : v_j^{*b} = v_i, v_j^{*b} \in V^{*b}\}$ be the set of centrality statistics corresponding to the nodes that are sampled to be $v_i$. Then the bootstrap standard error of $\gamma_i$ is $\sigma_i^{*} = \sqrt{(|\Gamma_i|-1)^{-1} \sum_{\gamma \in \Gamma_i} (\gamma - \bar{\gamma}_i)^2}$ where $\bar{\gamma}_i = |\Gamma_i|^{-1}\sum_{\gamma \in \Gamma_i} \gamma$. Using $\sigma_i^{*}$, bootstrap confidence intervals are constructed in the traditional way. For example, a 95% bootstrap confidence interval for $\gamma_i$ is $(\gamma_i - 1.96\sigma_i^{*}, \gamma_i + 1.96\sigma_i^{*})$. Many centrality statistics must be non-negative. If $\gamma_i$ must be non-negative and $\gamma_i - 1.96\sigma_i^{*}<0$, we set the 95% bootstrap confidence interval to be $(0, \gamma_i + 1.96\sigma_i^{*})$.
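
The standard error and truncated interval can be sketched in base R as follows. The values are made up, and gamma_i and Gamma_i are hypothetical variable names standing in for the quantities defined above.

```r
# Toy inputs (illustrative only): the original statistic for one node and
# the bootstrap statistics from samples in which that node was drawn.
gamma_i <- 0.10
Gamma_i <- c(0.10, 0.30, 0.20, 0.00, 0.15)

# sd() divides by (length - 1), matching the standard error formula above.
se_i <- sd(Gamma_i)

# 95% interval centred on the original statistic, using 1.96 as in the text.
ci <- c(lower = gamma_i - 1.96 * se_i, upper = gamma_i + 1.96 * se_i)

# Truncate the lower bound at zero when the statistic cannot be negative.
ci["lower"] <- max(0, ci[["lower"]])
```

Here the untruncated lower bound is negative, so the reported interval starts at 0.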

By construction $\mathcal{B}_i \cap \mathcal{B}_{-i} = \emptyset$. The bootstrap samples in $\mathcal{B}_i$ are used for the bootstrap standard errors and the bootstrap samples in $\mathcal{B}_{-i}$ are used for the hypothesis test of node $v_i$'s influence.

jab_network is a wrapper function for the entire JaB algorithm. For each step of the JaB algorithm, the following functions are used:

Original Centrality Step: Calculate the original centrality statistics using get_centrality() according to central.func.name.
Bootstrapping Step: Generate bootstrap samples according to bootstrap.func.name and calculate their centrality statistics using get_bootstrap_centrality().
Jackknife-after Step: Perform the "Jackknife-after" step of the algorithm with get_jackknife_after().

See vignette("jab-networks") for more details and examples.

Examples

library(igraphdata)
library(igraph)
data(karate)

# JaB with snowboot (1 seed and 2 waves), 1000 bootstrap samples,
# degree (from igraph), and cutoff quantile of 0.90

jab_network(
  network = karate,
  central.func.name = "degree",
  central.package.name = "igraph",
  central.func.args = list(normalized = TRUE),
  bootstrap.func.name = "bootstrap_snowboot",
  bootstrap.package.name = "JaB",
  bootstrap.func.args = list(num.seed = 1, num.wave = 2),
  B = 1000,
  quant = 0.90,
  nodes = NULL,
  return.boot.samples = FALSE)
#> This graph was created by an old(er) igraph version.
#>   Call upgrade_graph() on it to use with the current igraph version
#>   For now we convert it on the fly...
#> # A tibble: 34 × 11
#>    Node_Number Node_Name Orig_Stat Boot_mean Boot_sd Boot_skew Upper_Quantile
#>          <int> <chr>         <dbl>     <dbl>   <dbl>     <dbl>          <dbl>
#>  1           1 Mr Hi        0.485      0.142   0.174     1458.          0.263
#>  2          34 John A       0.515      0.143   0.175     1434.          0.375
#>  3          33 Actor 33     0.364      0.144   0.175     1479.          0.235
#>  4          28 Actor 28     0.121      0.144   0.175     1479.          0.235
#>  5          29 Actor 29     0.0909     0.144   0.175     1479.          0.235
#>  6          26 Actor 26     0.0909     0.156   0.166      513.          0.333
#>  7          24 Actor 24     0.152      0.142   0.178     1257.          0.4  
#>  8           6 Actor 6      0.121      0.137   0.178     1047.          0.375
#>  9           7 Actor 7      0.121      0.137   0.178     1047.          0.375
#> 10          30 Actor 30     0.121      0.139   0.177     1264.          0.4  
#> # ℹ 24 more rows
#> # ℹ 4 more variables: Influential <lgl>, Rank <int>, Can_Jackknife <lgl>,
#> #   Num_Boot_Samps <int>
