This function takes the results of detect_groups and generates a network from the data. It performs the second step in coordinated detection analysis by identifying users who repeatedly engage in identical actions within a predefined time window. The function offers multiple options to identify various types of networks, allowing for filtering based on different edge weights and facilitating the extraction of distinct subgraphs. See details.
generate_coordinated_network(
x,
fast_net = FALSE,
edge_weight = 0.5,
subgraph = 0,
objects = FALSE
)
A weighted, undirected network (igraph object) where the vertices (nodes)
are users and the edges (links) represent shared membership in coordinated
groups (object_id).
A data.table (result from detect_groups) with the columns: object_id,
account_id, account_id_y, content_id, content_id_y, timedelta.
If the data.table x has been updated with the flag_speed_share function and this parameter is set to TRUE, two columns weight_full and weight_fast are created, the first containing the edge weights of the full graph, the second those of the subgraph that includes the shares made in the narrower time window.
This parameter defines the edge weight threshold, expressed as a percentile of the edge weight distribution within the network. It also applies to the faster network if 'fast_net' is set to TRUE (and the data has been updated using the flag_speed_share function). Each edge is marked 1 if its weight exceeds this threshold and 0 otherwise. The parameter accepts any numeric value between 0 and 1. The default value is 0.5, which corresponds to the median edge weight in the network.
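The thresholding described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the `edges` data frame, its values, and the `above_threshold` column name are hypothetical, and the sketch assumes the percentile is computed with `quantile()` and compared with a strict "greater than".

```r
# Hypothetical edge list; account names and weights are made up.
edges <- data.frame(
  from   = c("a", "a", "b", "c", "d"),
  to     = c("b", "c", "c", "d", "e"),
  weight = c(1, 2, 3, 4, 10),
  stringsAsFactors = FALSE
)

edge_weight <- 0.5  # percentile of the edge weight distribution (the default)

# Weight value at the requested percentile (the median for 0.5).
threshold <- unname(quantile(edges$weight, probs = edge_weight))

# 1 if the edge weight exceeds the threshold, 0 otherwise.
edges$above_threshold <- as.integer(edges$weight > threshold)
```

With the default of 0.5, an edge whose weight equals the median is not flagged under this strict comparison.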
Generate and return the following subgraph (default value is 0, meaning that no subgraph is created):
If 1, reduces the graph to the subgraph whose edges have a value that exceeds the threshold given in the edge_weight parameter (weighted subgraph).
If 2, reduces the subgraph whose nodes exhibit coordinated behavior in the narrowest time window (as established with the flag_speed_share function) to the subgraph whose edges have a value that exceeds the threshold given in the edge_weight parameter (fast weighted subgraph).
If 3, reduces the graph to the subgraph whose nodes exhibit coordinated behavior in the narrowest time window established with the flag_speed_share function (fast subgraph), together with the vertices adjacent to their edges. In other words, this option identifies the fastest network along with a contextual set of accounts that shared the same objects but in the wider time window. It also adds a vertex attribute color_v to facilitate further analyses or the generation of the graph plot. This attribute is 1 for the coordinated accounts and 0 for the neighbor accounts.
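The logic of option 3 can be sketched in base R as follows. This is only one reading of the description above, not the package's code: the `edges` data frame and its `fast` flag are hypothetical stand-ins for the information added by flag_speed_share, and the sketch ignores any edge weight thresholding.

```r
# Hypothetical edge list: `fast` marks pairs that also co-shared within
# the narrower time window.
edges <- data.frame(
  from = c("a", "b", "c", "e"),
  to   = c("b", "c", "d", "f"),
  fast = c(TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Accounts that sit on at least one fast edge (the coordinated accounts).
fast_nodes <- unique(c(edges$from[edges$fast], edges$to[edges$fast]))

# Keep every edge that touches a fast account; this pulls in the neighbors.
keep <- edges$from %in% fast_nodes | edges$to %in% fast_nodes
sub_edges <- edges[keep, ]

# Vertex table with color_v: 1 for coordinated accounts, 0 for neighbors.
vertices <- data.frame(
  name = sort(unique(c(sub_edges$from, sub_edges$to))),
  stringsAsFactors = FALSE
)
vertices$color_v <- as.integer(vertices$name %in% fast_nodes)
```

Here accounts a and b form the fast subgraph, c is kept as a neighbor of b, and the unrelated pairs are dropped.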
Keep track of the IDs of shared objects for further analysis with group_stats
(default FALSE). Setting this option to TRUE may slow the function down. For
smaller datasets the difference is likely negligible, but for very large
datasets, or when performance is critical, the slowdown can be significant.
Two users may coincidentally share the same objects within the same time window, but it is unlikely that they do so repeatedly (Giglietto et al., 2020). Such repetition is thus considered an indicator of potential coordination. This function uses a percentile of the edge weight distribution to capture recurrent shares by the same user pairs within a predefined time window. By considering the edge weight distribution across the data and setting the percentile value p between 0 and 1, we can identify the edges whose weight exceeds the p-th percentile of that distribution. Selecting a sufficiently high percentile (e.g., 0.99) allows us to pinpoint user pairs that share an unusually high number of objects in the same time window (for instance, more than 99% of the user pairs in the network).
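To make the idea of recurrent shares concrete, the following base R sketch counts co-shares per user pair and keeps the pairs above a chosen percentile. It is an illustration under stated assumptions: the `pairs` data are invented, the edge weight is assumed to be the number of co-share rows per pair, and the names `weights`, `cutoff`, and `repeaters` are hypothetical.

```r
# Hypothetical pair table: one row per object co-shared by two accounts
# within the time window (column names follow the input described above).
pairs <- data.frame(
  object_id    = c("u1", "u2", "u3", "u4", "u1", "u5", "u6"),
  account_id   = c("a", "a", "a", "a", "c", "d", "e"),
  account_id_y = c("b", "b", "b", "b", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: the edge weight is the number of co-shares per account pair.
weights <- aggregate(object_id ~ account_id + account_id_y,
                     data = pairs, FUN = length)
names(weights)[3] <- "weight"

# Pairs whose weight exceeds the p-th percentile of the distribution.
p <- 0.75
cutoff <- quantile(weights$weight, probs = p)
repeaters <- weights[weights$weight > cutoff, ]
```

Only the pair that co-shared four objects survives the cut; the three pairs with a single co-share are treated as plausibly coincidental.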
The graph also incorporates the contribution of each node within the pair to
the pair's edge weight, specifically, the number of shared content_id that
contribute to the edge weight. Additionally, an edge_symmetry_score is
included, which equals 1 in cases of equal contributions from both users and
approaches 0 as the contributions become increasingly unequal.
The edge_symmetry_score is determined as the proportion of the unique
content_ids (unique content) shared by each vertex to the total content_ids
shared by both users.
This score, along with the value of contributions, can be used for further
filtering or for examining cases where the score is particularly low. Because
the graph is undirected, the activity of highly active users can
disproportionately affect the weight of edges connecting them to less active
users. For instance, if user A shares the same object (object_id) 100 times,
and user B shares that object only once, but within a time frame that matches
the time_window defined in the parameter for all of user A's 100 shares, then
the edge weight between A and B will be 100, although this weight is almost
entirely driven by the hyperactivity of user A. The edge_symmetry_score, along
with the counts of shares by each user, user_id and user_id_y (n_content_id
and n_content_id_y), allows for monitoring and controlling this phenomenon.
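The example of users A and B can be worked through in a few lines of R. The exact formula used by the package is not given here, so this sketch assumes a min/max ratio of the two contribution counts, chosen only because it has the stated properties (1 for equal contributions, approaching 0 as they diverge). The helper `symmetry_score` is hypothetical.

```r
# ASSUMPTION: symmetry is the smaller contribution divided by the larger one.
# This is an illustrative stand-in, not necessarily the package's formula.
symmetry_score <- function(n_content_id, n_content_id_y) {
  pmin(n_content_id, n_content_id_y) / pmax(n_content_id, n_content_id_y)
}

# User A contributes 100 shares, user B contributes 1.
score_unequal <- symmetry_score(100, 1)

# Two users contributing equally.
score_equal <- symmetry_score(5, 5)

# A low score flags an edge whose weight is dominated by one account.
dominated <- score_unequal < 0.1
```

Under this assumption the A-B edge scores 0.01 despite its weight of 100, so filtering on a minimum score would set it apart from edges built by balanced activity.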
Giglietto, F., Righetti, N., Rossi, L., & Marino, G. (2020). It takes a village to manipulate the media: coordinated link sharing behavior during 2018 and 2019 Italian elections. Information, Communication & Society, 23(6), 867-891.