If it can be assumed that matches should only occur within a given time range (e.g., event data should only match news items from after the event occurred), a low-effort validation is to check whether the matches actually fall within this time range. This function plots the percentage of matches within a given time range (hourdiff) for different thresholds of the weight column. The plot can be used to determine a good threshold.
hourdiff_range_thresholds(
g,
breaks = 20,
hourdiff_range = c(0, Inf),
min_weight = NA,
min_hourdiff = NA,
max_hourdiff = NA
)
Nothing. The function is called for its side effect of drawing the plot.
g: The output of newsflow.compare (either as "igraph" or "edgelist").
breaks: The number of breaks for the weight threshold.
hourdiff_range: The time period (hourdiff range) in which the match 'should' occur.
min_weight: Optionally, filter out all values below the given weight.
min_hourdiff: The lowest possible hourdiff value. This is used to estimate noise. If not specified, it will be estimated from the data.
max_hourdiff: The highest possible hourdiff value.
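The idea behind the plot can be illustrated with a minimal base R sketch. This is not the package's implementation: the `edges` data frame, its column names, the evenly spaced thresholds, and the inclusive range boundaries are all assumptions made for illustration, and the noise estimate based on min_hourdiff and max_hourdiff is left out.

```r
# Hypothetical edge list: one row per match, with a similarity weight and
# the time difference in hours between the matched documents.
edges <- data.frame(
  weight   = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8),
  hourdiff = c(-10,   5,  -3,  12,   2,  -1,   8,   4)
)

# Matches are expected at or after the event (assumed inclusive bounds).
hourdiff_range <- c(0, Inf)

# Candidate weight thresholds (assumed to be evenly spaced).
thresholds <- seq(0, 0.75, by = 0.25)

# For each threshold, keep the matches with weight >= threshold and compute
# the percentage of those matches whose hourdiff lies inside the range.
pct_in_range <- sapply(thresholds, function(th) {
  kept <- edges[edges$weight >= th, ]
  in_range <- kept$hourdiff >= hourdiff_range[1] &
    kept$hourdiff <= hourdiff_range[2]
  100 * mean(in_range)
})

# Plot the percentage of in-range matches against the threshold.
plot(thresholds, pct_in_range, type = "b",
     xlab = "weight threshold", ylab = "% of matches within hourdiff range")
```

In this toy data the percentage rises as the threshold increases, which is the pattern one would look for when choosing a threshold: the point beyond which nearly all remaining matches fall inside the expected time range.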