The function summarizes input data into sufficient statistics for estimating the attachment function and node fitness, together with additional information about the data, such as the total number of nodes, the number of time-steps, the maximum degree, and the final degrees of the nodes. It also provides mechanisms to automatically deal with very large datasets by binning the degree, setting a degree threshold, or grouping time-steps.
get_statistics(net_object, only_PA = FALSE,
               only_true_deg_matrix = FALSE,
               binning = TRUE, g = 50,
               deg_threshold = 0,
               compress_mode = 0, compress_ratio = 0.5,
               custom_time = NULL)
An object of class PAFit_data, which is a list. Some important fields are:
A matrix where the (t,k+1) element is the number of nodes with degree \(k\) at time \(t\), counting only the nodes whose number of new edges acquired is less than deg_thresh
A matrix where the (t,k+1) element is the number of nodes with degree \(k\) at time \(t\)
A matrix where the (t,k+1) element is the number of new edges that connect to a degree-\(k\) node at time \(t\)
A vector where the (k+1)-th element is the total number of edges that linked to a degree-\(k\) node, counting over all time steps
A matrix recording the degree of all nodes (that satisfy the deg_threshold condition) at each time step
A vector where the t-th element is the number of new edges at time \(t\)
A vector where the j-th element is the total number of edges that linked to node \(j\)
Numeric. The number of nodes in the network
Numeric. The number of time steps
Numeric. The maximum degree in the final network
A vector containing the ids of all nodes
A vector containing the final degree of all nodes (including those that do not satisfy the deg_threshold condition)
Integer. The specified degree threshold.
Numeric vector. The indices in the node_id vector of the nodes whose fitnesses we want to estimate (i.e. nodes whose number of new edges acquired is at least deg_thresh)
Integer. The specified degree at which we start binning.
Numeric vector containing the beginning degree of each bin
Numeric vector containing the ending degree of each bin
Numeric vector containing the length of each bin.
Logical. Indicates whether binning was applied or not.
Integer. Number of bins
Integer. The mode of time compression.
Integer. The number of time stamps actually used
The time stamps that are actually used
Numeric.
Vector. The time stamps specified by user.
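To make the layout of the count matrices above concrete, the following is a minimal base-R sketch, not the package's implementation. It assumes a hypothetical three-column edge list (from, to, time), counts in-degree only, treats every node as present with degree 0 from the start, and measures degrees at the start of each time step, before that step's edges arrive. All variable names are illustrative.

```r
# Toy edge list: each row is (from, to, time); names are illustrative only
edges <- data.frame(from = c(2, 3, 4, 4),
                    to   = c(1, 1, 2, 1),
                    time = c(1, 2, 3, 3))

nodes <- sort(unique(c(edges$from, edges$to)))
times <- sort(unique(edges$time))
max_deg <- nrow(edges)  # safe upper bound on any in-degree

# n_tk[t, k + 1]: number of nodes with in-degree k at the start of step t
n_tk <- matrix(0, nrow = length(times), ncol = max_deg + 1)
# m_t[t]: number of new edges arriving at step t
m_t <- numeric(length(times))

deg <- setNames(rep(0, length(nodes)), nodes)
for (i in seq_along(times)) {
  # Tabulate the current degrees before adding this step's edges
  n_tk[i, ] <- as.numeric(table(factor(deg, levels = 0:max_deg)))
  new_edges <- edges[edges$time == times[i], ]
  m_t[i] <- nrow(new_edges)
  # Update the in-degree of each destination node
  for (dest in new_edges$to) {
    deg[as.character(dest)] <- deg[as.character(dest)] + 1
  }
}
```

Each row of n_tk sums to the number of nodes, and the (t,k+1) indexing leaves column 1 for degree 0.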
The parameters can be divided into four groups. The first group specifies input data and how the data will be summarized:
net_object: An object of class PAFit_net. You can use the function as.PAFit_net to convert from an edgelist matrix, from_igraph to convert from an igraph object, from_networkDynamic to convert from a networkDynamic object, and graph_from_file to read from a file.
only_PA: Logical. Indicates whether only the statistics for estimating \(A_k\) are summarized. If TRUE, the statistics for estimating \(\eta_i\) are NOT collected. This saves memory at the cost of being unable to estimate node fitness. Default value is FALSE.
only_true_deg_matrix: Logical. Return only the true degree matrix (without binning); no other statistics are returned. The result cannot be used in the PAFit function to estimate PA or fitness. The motivation for this option is that sometimes we only want a degree matrix that summarizes the growth process of a very big network, for example for plotting. Default value is FALSE.
The second group of parameters specifies how to bin the degrees:
binning: Logical. Indicates whether the degrees should be binned together. Default value is TRUE.
g: Positive integer. Number of bins. Should be at least 3. Default value is 50.
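As an illustration of what binning the degrees into g bins can look like, here is a minimal base-R sketch. It assumes logarithmically spaced bins, a common choice for heavy-tailed degree distributions; the rule the package actually applies is not described here and may differ. The values of deg_max and g are hypothetical, and the variable names are only meant to echo the bin-related fields of the returned object.

```r
deg_max <- 200  # hypothetical maximum degree
g <- 10         # requested number of bins

# Log-spaced candidate boundaries over degrees 0..deg_max
# (shifted by 1 so that degree 0 can be included)
raw <- exp(seq(0, log(deg_max + 1), length.out = g + 1))

# Starting degree of each bin; duplicates after rounding are dropped,
# so the result may contain fewer than g bins
begin_deg <- unique(floor(raw[-(g + 1)])) - 1
end_deg <- c(begin_deg[-1] - 1, deg_max)
interval_length <- end_deg - begin_deg + 1

# Bin index of every degree from 0 to deg_max
bin_of <- findInterval(0:deg_max, begin_deg)
```

With this scheme low degrees get narrow bins and high degrees, where observations are sparse, get wide ones.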
The third group contains a single parameter specifying how to reduce the number of node fitnesses:
deg_threshold: Integer. We only estimate the fitnesses of nodes whose number of new edges acquired is at least deg_threshold. The fitnesses of all other nodes are fixed at 1. Default value is 0.
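The effect of the threshold can be sketched in a few lines of base R. This is only an illustration on made-up counts, not the package's code; new_edges_per_node and the other names are hypothetical.

```r
# Hypothetical totals of new edges acquired by five nodes
new_edges_per_node <- c(a = 0, b = 5, c = 2, d = 9, e = 1)
deg_threshold <- 2

# Positions of the nodes whose fitnesses would be estimated
estimated <- which(new_edges_per_node >= deg_threshold)

# Positions of the remaining nodes, whose fitnesses stay fixed at 1
fixed_at_one <- setdiff(seq_along(new_edges_per_node), estimated)
```

Raising the threshold shrinks the set of estimated fitnesses, which is the memory and time saving this parameter is meant to provide.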
The last group of parameters specifies how to group the time-stamps:
compress_mode: Integer. Indicates whether and how the timeline should be compressed. Possible values:
0: No compression.
1: Compressed by using a subset of time-steps. The time stamps in this subset are equally spaced. The size of this subset is compress_ratio times the size of the set of all time stamps.
2: Compressed by starting only from the first time-step at which compress_ratio * 100 percent of the total number of edges (in the final state of the network) have already been added to the network.
3: This mode offers the most flexibility, but requires the user to supply the time stamps in custom_time. Only the time stamps in custom_time will be used. This mode can be used, for example, when investigating the change of the attachment function or node fitness in different time intervals.
Default value is 0, i.e. no compression.
compress_ratio: Numeric. Indicates how much we should compress if compress_mode is 1 or 2. Default value is 0.5.
custom_time: Vector. Custom time stamps. This vector is a subset of the vector that contains all time-stamps. Only effective if compress_mode == 3. In that case, only these time stamps are used.
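The three compression modes can be pictured with a short base-R sketch on toy data. It is an interpretation of the descriptions above rather than the package's implementation: the rounding used for the equally spaced subset and the exact cut-off step in mode 2 are assumptions, and time_stamps and edges_per_step are hypothetical inputs.

```r
time_stamps <- 1:10                                # all observed time stamps (toy data)
edges_per_step <- c(1, 1, 2, 1, 3, 2, 4, 2, 3, 1)  # hypothetical new edges per step
compress_ratio <- 0.5

# Mode 1: keep a roughly equally spaced subset of about
# compress_ratio * length(time_stamps) time stamps
n_keep <- max(2, round(compress_ratio * length(time_stamps)))
idx <- unique(round(seq(1, length(time_stamps), length.out = n_keep)))
mode1 <- time_stamps[idx]

# Mode 2: start from the first step at which a share of compress_ratio
# of all edges is already present
cum_share <- cumsum(edges_per_step) / sum(edges_per_step)
first_step <- which(cum_share >= compress_ratio)[1]
mode2 <- time_stamps[first_step:length(time_stamps)]

# Mode 3: keep only the user-supplied stamps that occur in the data
custom_time <- c(2, 5, 9)
mode3 <- intersect(time_stamps, custom_time)
```

Modes 1 and 2 both shorten the timeline, but mode 1 thins it out evenly while mode 2 discards the early growth phase entirely.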
Thong Pham thongphamthe@gmail.com
For creating the needed input for this function (a PAFit_net object), see as.PAFit_net, from_igraph, from_networkDynamic, and graph_from_file.
For the next step, see Newman, Jeong or only_A_estimate for estimating the attachment function in isolation, only_F_estimate for estimating node fitnesses in isolation, and joint_estimate for joint estimation of the attachment function and node fitnesses.
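library("PAFit")
# Generate a Barabasi-Albert network with 100 nodes, each new node adding one edge
net <- generate_BA(N = 100, m = 1)
# Summarize the network into sufficient statistics (default settings)
net_stats <- get_statistics(net)
summary(net_stats)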