WikidataR (version 2.3.3)

disambiguate_QIDs: Disambiguate QIDs

Description

Interactive function that presents alternative possible QID matches for a list of text strings and provides options for choosing between alternatives, rejecting all presented alternatives, or creating new items. Useful in cases where a list of text strings may have either missing Wikidata items or multiple alternative potential matches that need to be manually disambiguated. Can also be used on lists of lists (see examples). For long lists of items, the process can be stopped partway through, and the returned vector will indicate where the process was stopped.

Usage

disambiguate_QIDs(
  list,
  variablename = "variables",
  variableinfo = NULL,
  filter_property = NULL,
  filter_variable = NULL,
  filter_firsthit = FALSE,
  limit = 10
)

Arguments

list

a list or vector of text strings for which to find potential QID matches. Can also be a list of lists (see examples)

variablename

type of items in the list that are being disambiguated (used in messages)

variableinfo

additional information about items that are being disambiguated (used in messages)

filter_property

property to filter on (e.g. "P31" to filter on "instance of")

filter_variable

values of that property to use to filter out (e.g. "Q571" to filter out books)

filter_firsthit

whether to apply the filter to the first match presented (TRUE) or only when alternatives are requested (FALSE, the default). Note: TRUE is slower if the filter is not needed on most matches

limit

number of alternative possible Wikidata items to present if there are multiple potential matches

Value

a vector of:

QID

Selected QID (for when an appropriate Wikidata match exists)

CREATE

Mark that a new Wikidata item should be created (for when no appropriate Wikidata match yet exists)

NA

Mark that no Wikidata item is needed

STOP

Mark that the process was halted at this point (so that output can be used as input to the function later)
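
The four outcomes above can be told apart with base R once the function has returned. The following is a minimal sketch, not part of WikidataR: the `items` and `result` vectors are made-up illustrations, `classify_outcome` is a hypothetical helper, and the code assumes the result is a character vector aligned one-to-one with the input, with "not needed" stored either as a missing value or as the string "NA".

```r
# Hypothetical input and result; the values are illustrative, not real output.
items  <- c("Rock", "Pop", "House", "Jazz")
result <- c("Q22731", "CREATE", NA, "STOP")

# Hypothetical helper: label each entry with one of the documented outcomes.
classify_outcome <- function(x) {
  out <- rep("other", length(x))
  out[is.na(x) | x == "NA"] <- "not needed"               # NA: no item needed
  out[!is.na(x) & grepl("^Q[0-9]+$", x)] <- "matched"     # a selected QID
  out[!is.na(x) & x == "CREATE"] <- "create"              # new item to create
  out[!is.na(x) & x == "STOP"] <- "stopped"               # process halted here
  out
}

outcome <- classify_outcome(result)
print(table(outcome))

# Position of the first STOP marker (NA if the run was completed).
stop_at <- which(outcome == "stopped")[1]

# Items from the stopping point onwards, assuming one result per input item.
remaining <- if (is.na(stop_at)) character(0) else items[stop_at:length(items)]
print(remaining)
```

A summary like this can help decide which strings still need attention before the output is passed back to the function for a later session.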

Examples

# NOT RUN {
#Disambiguating possible QID matches for these music genres
#Results should be:
# "Q22731" as the first match
# "Q147538" as the first match
# "Q3947" as the second alternative match
disambiguate_QIDs(list=c("Rock","Pop","House"),
                 variablename="music genre")

#Disambiguating possible QID matches for these three words, but not the music genres
#This will take longer as the filtering step is slower
#Results should be:
# "Q22731" (the material) as the first match
# "Q147538" (the soft drink) as the second alternative match
# "Q3947" (the building) as the first match
disambiguate_QIDs(list=c("Rock","Pop","House"),
                 filter_property="instance of",
                 filter_variable="music genre",
                 filter_firsthit=TRUE,
                 variablename="concept, not the music genre")

#Disambiguating possible QID matches for the multiple expertise of
#these three people as list of lists
disambiguate_QIDs(list=list(alice=list("physics","chemistry","maths"),
                           barry=list("history"),
                           clair=list("law","genetics","ethics")),
                 variablename="expertise")
# }