revision_diff generates a diff between two revisions in a MediaWiki page. This is provided as an XML-parsable blob inside the returned JSON object.
revision_diff(
language = NULL,
project = NULL,
domain = NULL,
revisions,
properties = c("ids", "flags", "timestamp", "user", "userid", "size", "sha1",
"contentmodel", "comment", "parsedcomment", "tags", "flagged"),
direction,
clean_response = FALSE,
...
)
language: The language code of the project you wish to query, if appropriate.
project: The project you wish to query ("wikiquote"), if appropriate. Should be provided in conjunction with language.
domain: As an alternative to a language and project combination, you can also provide a domain ("rationalwiki.org") to the URL constructor, allowing for the querying of non-Wikimedia MediaWiki instances.
revisions: The revision ID(s) of each "start" revision.
properties: Properties you want to retrieve about each revision. Options include "ids" (the revision ID, which is redundant since you supplied it), "flags" (whether the revision was 'minor' or not), "timestamp", "user" (the username of the person who made the revision), "userid" (the user ID of the person who made the revision), "size" (the size, in uncompressed bytes, of the revision), "sha1" (the SHA-1 hash of the revision text), "contentmodel" (the content model of the page, usually "wikitext"), "comment" (the revision summary associated with the revision), "parsedcomment" (the same, but parsed, generating HTML from any wikitext in that comment), "tags" (any tags associated with the revision) and "flagged" (the revision's status under Flagged Revisions).
direction: The direction you want the diff to go in from the revision ID you have provided. Options are "prev" (compare to the previous revision on that page), "next" (compare to the next revision on that page) and "cur" (compare to the current, extant version of the page).
clean_response: Whether to do some basic sanitising of the resulting data structure.
...: Further arguments to pass to httr's GET.
MediaWiki's API is deliberately designed to restrict users' ability to make computationally intensive requests, such as diff computation. As a result, the API allows only one uncached diff per request. If you ask for multiple diffs, some uncached and some cached, you will be provided with the cached diffs, one of the uncached diffs, and a warning.
If you're going to be asking for a lot of diffs, some of which may not be cached, it may be more sensible to retrieve the revisions themselves using revision_content and compute the diffs yourself.
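As a rough illustration of computing a diff locally, the sketch below compares two revision texts line by line in base R. The strings old_text and new_text are hypothetical stand-ins for text you have already extracted from revision_content output. The set-based comparison is a deliberate simplification (it ignores line order and duplicate lines) and is not the algorithm MediaWiki itself uses.

```r
# Hypothetical revision texts; in practice these would come from
# the content returned by revision_content.
old_text <- "Line one\nLine two\nLine three"
new_text <- "Line one\nLine 2\nLine three\nLine four"

# Split each revision into its individual lines.
old_lines <- strsplit(old_text, "\n", fixed = TRUE)[[1]]
new_lines <- strsplit(new_text, "\n", fixed = TRUE)[[1]]

# Lines present in only one of the two revisions.
removed <- setdiff(old_lines, new_lines)
added   <- setdiff(new_lines, old_lines)

print(removed)
print(added)
```

For anything beyond a quick check, a dedicated diffing package is likely a better fit, since it can preserve ordering and report changes within a line.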
page_content for retrieving the current content of a specific page, and revision_content for retrieving the text of specific revisions.
if (FALSE) {
# Wikimedia diff
wp_diff <- revision_diff("en", "wikipedia", revisions = 552373187, direction = "next")
# Non-Wikimedia diff
rw_diff <- revision_diff(domain = "rationalwiki.org", revisions = 88616, direction = "next")
}
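Because the diff is returned as an XML-parsable blob, one way to work with it is to parse it and pull out the changed cells. The sketch below assumes the xml2 package is installed and uses a hand-written diff_blob (hypothetical sample data resembling the table rows MediaWiki emits) rather than a live response. Where the blob sits inside the object returned by revision_diff is not shown here and would need to be checked against an actual response.

```r
library(xml2)

# Hypothetical sample resembling MediaWiki diff table rows; a real blob
# would be extracted from the object returned by revision_diff.
diff_blob <- paste0(
  "<tr>",
  "<td class=\"diff-deletedline\"><div>old text</div></td>",
  "<td class=\"diff-addedline\"><div>new text</div></td>",
  "</tr>"
)

# Wrap the rows in a table so the fragment parses as well-formed HTML.
doc <- read_html(paste0("<table>", diff_blob, "</table>"))

# Match on a substring of the class attribute, since cells may carry
# more than one class.
deleted <- xml_text(xml_find_all(doc, "//td[contains(@class, 'diff-deletedline')]"))
added   <- xml_text(xml_find_all(doc, "//td[contains(@class, 'diff-addedline')]"))

print(deleted)
print(added)
```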