
PAFit (version 1.2.10)

only_A_estimate: Estimating the attachment function in isolation by the PAFit method

Description

This function estimates the attachment function \(A_k\) by the PAFit method. The method has a hyper-parameter \(r\), which controls the regularization of \(A_k\). The function first performs a cross-validation step to select the optimal \(r\), then uses that \(r\) to estimate the attachment function with the full data.

Usage

only_A_estimate(net_object                             , 
                net_stat   = get_statistics(net_object), 
                p          = 0.75                      ,
                stop_cond  = 10^-8                     , 
                mode_reg_A = 0                         ,
                MLE        = FALSE                     ,
               ...)

Value

Outputs a Full_PAFit_result object, which is a list containing the following fields:

  • cv_data: a CV_Data object which contains the cross-validation data. Normally the user does not need to pay attention to this data. NULL if MLE = TRUE.

  • cv_result: a CV_Result object which contains the cross-validation result. Normally the user does not need to pay attention to this data. NULL if MLE = TRUE.

  • estimate_result: a PAFit_result object which contains the estimated PA function and its confidence interval. It also includes the estimated attachment exponent \(\alpha\) (assuming the model \(A_k = k^\alpha\)) in the field alpha, and the confidence interval of \(\alpha\) (in the field ci) when possible. In particular, the important fields are:

    • ratio: this is the selected value for the hyper-parameter \(r\).

    • k and A: a degree vector and the estimated PA function.

    • var_A: the estimated variance of \(A\).

    • var_logA: the estimated variance of \(\log A\).

    • upper_A: the upper value of the interval of two standard deviations around \(A\).

    • lower_A: the lower value of the interval of two standard deviations around \(A\).

    • center_k and theta: when we perform binning, these are the centers of the bins and the estimated PA values for those bins. theta is similar to A but with duplicated values removed.

    • var_bin: the variance of theta. Same as var_A but with duplicated values removed.

    • upper_bin: the upper value of the interval of two standard deviations around theta. Same as upper_A but with duplicated values removed.

    • lower_bin: the lower value of the interval of two standard deviations around theta. Same as lower_A but with duplicated values removed.

    • g: the number of bins used.

    • alpha and ci: alpha is the estimated attachment exponent \(\alpha\) (when assuming \(A_k = k^\alpha\)), while ci is its confidence interval.

    • loglinear_fit: this is the fitting result when we estimate \(\alpha\).

    • objective_value: values of the objective function over iterations in the final run with the full data.

    • diverge_zero: a logical value indicating whether the algorithm diverged in the final run with the full data.
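
The fields alpha, ci and loglinear_fit all relate to a fit of the model \(A_k = k^\alpha\). As a rough illustration of that idea only, the base-R sketch below recovers \(\alpha\) from a synthetic degree vector by ordinary least squares on the log-log scale. The vectors k and A are made up for the example, and the unweighted lm call is an assumption; the fitting procedure used inside PAFit may differ.

```r
# Synthetic attachment function A_k = k^0.8 with small multiplicative noise
# (illustrative data, not PAFit output).
set.seed(1)
k <- 1:100
A <- k^0.8 * exp(rnorm(length(k), sd = 0.05))

# Under A_k = k^alpha, log(A_k) = alpha * log(k) + const, so alpha is the
# slope of a straight line on the log-log scale.
fit       <- lm(log(A) ~ log(k))
alpha_hat <- unname(coef(fit)["log(k)"])

# Approximate 95% confidence interval for the slope
ci_hat <- confint(fit, "log(k)", level = 0.95)

print(alpha_hat)
print(ci_hat)
```

The intercept is kept in the fit because an attachment function is only meaningful up to a multiplicative constant.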

Arguments

net_object

An object of class PAFit_net that contains the network.

net_stat

An object of class PAFit_data which contains the summarized statistics needed in estimation. This object is created by the function get_statistics. The default value is get_statistics(net_object).

p

Numeric. This is the ratio of the number of new edges in the learning data to that of the full data. The data is divided into two parts, learning data and testing data, based on p. The learning data is used to estimate the attachment function, and the testing data is then used in cross-validation. Default value is 0.75.
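
To make the role of p concrete, the sketch below splits a sequence of time steps into learning and testing parts so that the learning part holds roughly a fraction p of all new edges. The vector new_edges and the cut-point rule are hypothetical and only illustrate the idea; they are not taken from PAFit's internals.

```r
# Hypothetical number of new edges observed at each time step
new_edges <- c(50, 50, 50, 50, 50, 50, 50, 50)
p         <- 0.75

# Cumulative share of new edges up to each time step
cum_share <- cumsum(new_edges) / sum(new_edges)

# First time step at which the learning data reaches a share of at least p
cut_step <- which(cum_share >= p)[1]

learning_steps <- seq_len(cut_step)
testing_steps  <- setdiff(seq_along(new_edges), learning_steps)

# Realized ratio of new edges in the learning data to that of the full data
realized_p <- sum(new_edges[learning_steps]) / sum(new_edges)

print(cut_step)
print(realized_p)
```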

stop_cond

Numeric. The iterative algorithm stops when \(|h(ii) - h(ii + 1)| / (|h(ii)| + 1) < \text{stop\_cond}\), where \(h(ii)\) is the value of the objective function at iteration \(ii\). We recommend choosing stop_cond to be at most \(10^{-(d + 2)}\), where \(d\) is the number of digits of \(h\), to ensure that when the algorithm stops, the increase in posterior probability is less than 1% of the current posterior probability. The default is 10^-8, which is good enough for most applications.
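
The stopping rule can be sketched in a few lines of base R. The helper has_converged and the toy objective sequence h below are hypothetical and stand in for the real objective function; only the inequality itself comes from the description of stop_cond.

```r
# Relative-change stopping rule: stop when
# abs(h_old - h_new) / (abs(h_old) + 1) < stop_cond
has_converged <- function(h_old, h_new, stop_cond = 10^-8) {
  abs(h_old - h_new) / (abs(h_old) + 1) < stop_cond
}

# Toy objective sequence that approaches -1000 geometrically
# (a stand-in for the real objective function).
h <- -1000 - 0.5^(0:60)

# Find the first iteration at which the rule is satisfied
n_iter <- NA
for (ii in seq_len(length(h) - 1)) {
  if (has_converged(h[ii], h[ii + 1])) {
    n_iter <- ii
    break
  }
}
print(n_iter)
```

The + 1 in the denominator keeps the ratio well defined when the objective value is close to zero.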

mode_reg_A

Binary. Indicates which regularization term is used for \(A_k\):

  • 0: This is the regularization term used in Refs. 1 and 2. Please refer to Eq. (4) in the tutorial for the definition of the term. It approximately enforces the power-law form \(A_k = k^\alpha\). This is the default value.

  • 1: Unlike the default, this regularization term exactly enforces the functional form \(A_k = k^\alpha\). Please refer to Eq. (6) in the tutorial for the definition of the term. Its main drawback is that it is significantly slower to converge, while its gain over the default is marginal in most cases.

MLE

Logical. If TRUE, cross-validation is skipped and the PA function is estimated with r = 0, i.e., by maximum likelihood estimation. Default is FALSE. One might want to set this option to TRUE when one believes that there are sufficient data to get a reasonable MLE result, or when one wants to compare the default, regularized result with the MLE result.

...

Other arguments to pass to the underlying algorithm.

Author

Thong Pham thongphamthe@gmail.com

References

1. Pham, T., Sheridan, P. & Shimodaira, H. (2015). PAFit: A Statistical Method for Measuring Preferential Attachment in Temporal Complex Networks. PLoS ONE 10(9): e0137796. doi:10.1371/journal.pone.0137796.

2. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Joint Estimation of Preferential Attachment and Node Fitness in Growing Complex Networks. Scientific Reports 6, Article number: 32558. doi:10.1038/srep32558.

See Also

See get_statistics for how to create the summarized statistics needed in this function.

See Newman and Jeong for other methods to estimate the attachment function \(A_k\) in isolation.

Examples

if (FALSE) {
  library("PAFit")
  set.seed(1)
  #### Example 1: Linear preferential attachment  #########
  # a network from BA model
  net        <- generate_net(N = 1000 , m = 50 , mode = 1, alpha = 1, s = 0)
  
  net_stats  <- get_statistics(net, only_PA = TRUE)
  result     <- only_A_estimate(net, net_stats)
 
  # plot the estimated attachment function
  plot(result, net_stats)
  
  # true function
  true_A     <- result$estimate_result$center_k
  lines(result$estimate_result$center_k, true_A, col = "red") # true line
  legend("topleft" , legend = "True function" , col = "red" , lty = 1 , bty = "n")
  
  #### Example 2: a non-log-linear preferential attachment  #########
  # A_k = alpha* log (max(k,1))^beta + 1, with alpha = 2, and beta = 2
  set.seed(1)
  net        <- generate_net(N = 1000 , m = 50 , mode = 3, alpha = 2, beta = 2, s = 0)
  
  net_stats  <- get_statistics(net,only_PA = TRUE)
  result     <- only_A_estimate(net, net_stats)
 
  # plot the estimated attachment function
  plot(result, net_stats)
  
  # true function
  true_A     <- 2 * log(pmax(result$estimate_result$center_k,1))^2 + 1 # true function
  lines(result$estimate_result$center_k, true_A, col = "red") # true line
  legend("topleft" , legend = "True function" , col = "red" , lty = 1 , bty = "n")
  
  #############################################################################
  #### Example 3: another non-log-linear preferential attachment kernel ############
  set.seed(1)
  # A_k = min(max(k,1),sat_at)^alpha, with alpha = 1, and sat_at = 200
  # inverse variance of the distribution of node fitnesses = 10
  net        <- generate_net(N = 1000 , m = 50 , mode = 2, alpha = 1, sat_at = 200, s = 0)
  net_stats  <- get_statistics(net, only_PA = TRUE)
  
  result     <- only_A_estimate(net, net_stats)
  
  
  # plot the estimated attachment function
  true_A     <- pmin(pmax(result$estimate_result$center_k,1),200)^1 # true function
  plot(result , net_stats, max_A = max(true_A,result$estimate_result$theta))
  lines(result$estimate_result$center_k, true_A, col = "red") # true line
  legend("topleft" , legend = "True function" , col = "red" , lty = 1 , bty = "n")
  }
