WikipediaR (version 1.1)

WikipediaR-package: R-Based Wikipedia Client


Provides an interface to the Wikipedia web API.

Three functions provide details for a specific Wikipedia page: the links function lists all links present on the page, the backLinks function lists all pages that link to it, and the contribs function lists all contributions (revisions for main pages, and discussions for talk pages). The page is specified by the parameter "page", either as the title (a character string) or as the page ID (a numeric value). The title can include spaces and special characters, and is case-sensitive.

Two functions provide details for a specific user: the userContribs function lists all contributions, and the userInfo function provides general information (such as name, gender, rights or groups). The user is identified by his or her name, which is case-sensitive.

The domain can be specified in all the functions with the parameter "domain". The default domain is "en", for https://en.wikipedia.org.

WikipediaR provides additional information compared to other packages such as WikipediR, and it does not require a login. The multiplex network that can be constructed from the results of the WikipediaR functions can be modeled as a Stochastic Block Model, as in Barbillon P. et al.
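As an illustration of the page-level functions described above, here is a minimal sketch. It assumes the WikipediaR package is installed and that a network connection is available. The wrapper run_demo and the page title "Louis Pasteur" are hypothetical, chosen only for the example; the "page" and "domain" parameters are the ones named in the description. The structure of the returned objects is not shown here, since it is not documented in this overview.

```r
# Sketch only: run_demo() is a hypothetical helper, not part of WikipediaR.
# Nothing is fetched from Wikipedia until run_demo() is actually called.
run_demo <- function(page = "Louis Pasteur", domain = "en") {
  # Skip gracefully when the package is not available.
  if (!requireNamespace("WikipediaR", quietly = TRUE)) {
    message("WikipediaR is not installed; skipping the demo.")
    return(invisible(NULL))
  }
  list(
    # Links present on the page.
    links     = WikipediaR::links(page = page, domain = domain),
    # Pages that link to the page.
    backLinks = WikipediaR::backLinks(page = page, domain = domain),
    # Contributions (revisions) for the page.
    contribs  = WikipediaR::contribs(page = page, domain = domain)
  )
}
```

Calling run_demo() queries the English Wikipedia; run_demo(domain = "fr") would query the French one, with a page title that exists on that domain.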



As part of a PRES Sorbonne Paris Cité project, statisticians, computer scientists and sociologists from Paris Descartes, Paris Diderot and Sciences Po are working on the problem of multi-level networks. One part of the project is to analyze data extracted from Wikipedia with the free software R.

Like the twitteR package, which provides an interface to the Twitter web API, the WikipediaR package aims to provide access to data extracted from Wikipedia and to return it in a format that can be used directly in R. API stands for application programming interface.

A package with a similar objective already exists: WikipediR. That package is still under development and, because it allows modifications to the Wikipedia database, it requires a login with the appropriate rights. For more details about it, see http://ironholds.org/blog/introducing-wikipedir/.

Our package uses the XML package to interact with Wikipedia through the MediaWiki API syntax. This syntax is documented here: http://en.wikipedia.org/w/api.php.
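To give an idea of what the MediaWiki API syntax looks like, the sketch below builds a query URL that asks for the links on a page and requests the answer as XML. It is an illustration only, not the package's internal code: the helper build_query_url is hypothetical, and it handles page titles only (page IDs use a different API parameter). The query parameters action, prop, titles and format are standard MediaWiki API parameters.

```r
# Hypothetical helper: builds a MediaWiki API query URL for a page title.
# It only constructs the URL; no request is sent.
build_query_url <- function(page, domain = "en", prop = "links") {
  base <- sprintf("https://%s.wikipedia.org/w/api.php", domain)
  params <- c(action = "query", prop = prop, titles = page, format = "xml")
  # Percent-encode each value so that spaces and special characters are safe.
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  query <- paste0(names(params), "=", encoded, collapse = "&")
  paste0(base, "?", query)
}

url <- build_query_url("Louis Pasteur")
print(url)
```

The resulting URL can be opened in a browser to inspect the XML answer, which a package such as XML can then parse.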

Which other packages interact with the MediaWiki API? The Tiki Wiki CMS/Groupware framework has an R plugin (PluginR) to run R code from wiki pages and to use data from its own collected web databases (trackers). A demo: http://r.tiki.org

The wikibooks package provides the functions and datasets of the German WikiBook "GNU R".

Remark 1: the "fr" and "en" domains have been tested, but other domains can lead to unanticipated problems. Trying domain="gu" is at your own risk... The encoding is UTF-8 for most of the output.

Remark 2: as the functions retrieve information from the internet in real time, the execution time depends on your internet connection!


backLinks               lists pages that link to a Wikipedia page
contribs                lists contributions for a specific Wikipedia page
links                   lists links on a Wikipedia page
testWikiPage            internal function testWikiPage
testWikiUser            internal function testWikiUser
userContribs            lists contributions for a specific user
userInfo                general information for a Wikipedia user


Barbillon P., Donnet S., Lazega E., and Bar-Hen A. Stochastic Block Models for Multiplex networks: an application to networks of researchers. arXiv:1501.06444, http://arxiv.org/abs/1501.06444.

