detector

detector makes detecting data containing Personally Identifiable Information (PII) quick, easy, and scalable. It provides high-level functions that take vectors and data.frames and return the results in a data.frame; for a data.frame input, the result has one row per column and one logical flag per PII type. Once complete, detector will be able to detect the following types of PII:

  • Full name
  • Home address
  • E-mail address
  • National identification number
  • Passport number
  • Social Security number
  • IP address
  • Vehicle registration plate number
  • Driver's license number
  • Credit card number
  • Date of birth
  • Birthplace
  • Telephone number
  • Latitude and longitude

State of the Union

Complete!

  • E-mail address
  • Telephone number
  • National identification number

Needs more work...

  • Credit card number

Haven't even started :(

  • Full name
  • Date of birth
  • Home address
  • IP address
  • Vehicle registration plate number
  • Driver's license number
  • Birthplace
  • Latitude and longitude

Installation

You can install:

  • the latest released version from CRAN with

    install.packages("detector")

  • the latest development version from GitHub with

    # Install devtools if it is missing or older than 1.6
    if (!requireNamespace("devtools", quietly = TRUE) ||
        packageVersion("devtools") < "1.6") {
      install.packages("devtools")
    }
    devtools::install_github("paulhendricks/detector")

If you encounter a clear bug, please file a minimal reproducible example on GitHub.

API

Generate data containing fake PII

library(dplyr, warn.conflicts = FALSE)
library(generator)
n <- 6
ashley_madison <- 
  data.frame(name = r_full_names(n), 
             email = r_email_addresses(n), 
             phone_number = r_phone_numbers(n, use_hyphens = TRUE, 
                                            use_spaces = TRUE), 
             stringsAsFactors = FALSE)
ashley_madison %>% 
  knitr::kable(format = "markdown")

|name                 |email                   |phone_number   |
|:--------------------|:-----------------------|:--------------|
|Leonardo Rodriguez   |xed@be.eny              |254- 851- 6814 |
|Dee Rice             |ecfoa@rtnlyudbe.yhj     |597- 978- 5193 |
|Conception Marquardt |wnz@xid.anc             |184- 962- 8153 |
|Collette Nitzsche    |tqghfxe@fsleqhmnjd.wkh  |475- 723- 2947 |
|Norman Pfannerstill  |oyhl@szxby.mag          |153- 674- 4219 |
|Katelin Gislason     |vq@zatsl.wyg            |831- 847- 1568 |

Detect data containing PII

library(detector)
ashley_madison %>% 
  detect %>% 
  knitr::kable(format = "markdown")
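
|column_name  |has_email_addresses |has_phone_numbers |has_national_identification_numbers |
|:------------|:-------------------|:-----------------|:-----------------------------------|
|name         |FALSE               |FALSE             |FALSE                               |
|email        |TRUE                |FALSE             |FALSE                               |
|phone_number |FALSE               |TRUE              |FALSE                               |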
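
To make the shape of that summary concrete, here is a minimal base-R sketch of a column-wise scan. It is not detector's implementation: `scan_columns()`, `looks_like_email()`, `looks_like_phone()`, and both regular expressions are hypothetical and deliberately simplistic, chosen only to match the fake data format shown above.

```r
# Hypothetical predicates; the patterns are illustrative assumptions,
# not the rules detector itself applies.
looks_like_email <- function(x) {
  grepl("^[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}$", x)
}

looks_like_phone <- function(x) {
  # Three digits, three digits, four digits, separated by optional
  # hyphens and/or spaces (e.g. "254- 851- 6814").
  grepl("^[0-9]{3}[- ]*[0-9]{3}[- ]*[0-9]{4}$", x)
}

# Return one row per column, with one logical flag per PII type.
scan_columns <- function(df) {
  flag <- function(is_fun) {
    vapply(df,
           function(col) any(is_fun(as.character(col)), na.rm = TRUE),
           logical(1), USE.NAMES = FALSE)
  }
  data.frame(column_name = names(df),
             has_email = flag(looks_like_email),
             has_phone = flag(looks_like_phone),
             stringsAsFactors = FALSE)
}

people <- data.frame(name = c("Dee Rice", "Norman Pfannerstill"),
                     email = c("ecfoa@rtnlyudbe.yhj", "oyhl@szxby.mag"),
                     phone_number = c("597- 978- 5193", "153- 674- 4219"),
                     stringsAsFactors = FALSE)
result <- scan_columns(people)
print(result)
```

Keeping each PII type as a separate predicate means a new type can be added as one more column without touching the scanning loop.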

Version

0.1.0

License

MIT + file LICENSE

Last Published

August 27th, 2015

Functions in detector (0.1.0)

  • has_email_addresses: Test if a character vector has any e-mail addresses.
  • is_email_address: Test if a string is an e-mail address.
  • has_national_identification_numbers: Test if a character vector has any national identification numbers.
  • has_phone_numbers: Test if a character vector has any phone numbers.
  • detect: Detect if a data object contains PII.
  • is_national_identification_number: Test if a string is a national identification number.
  • detector: Detect Data Containing Personally Identifiable Information.
  • is_phone_number: Test if a string is a phone number.
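
The `is_*` and `has_*` names suggest an element-wise test paired with a vector-level summary. The sketch below illustrates that pairing in base R, under the assumption (not confirmed from the package source) that a `has_*` check simply asks whether any element passes the matching `is_*` check. `is_email_like()` and `has_email_like()` are hypothetical stand-ins, and the pattern is a rough approximation rather than detector's own rule.

```r
# Hypothetical element-wise test: one logical per input string.
is_email_like <- function(x) {
  grepl("^[^@[:space:]]+@[^@[:space:]]+\\.[[:alpha:]]+$", x)
}

# Hypothetical vector-level summary: TRUE if any element passes.
has_email_like <- function(x) {
  any(is_email_like(x), na.rm = TRUE)
}

x <- c("Katelin Gislason", "vq@zatsl.wyg", NA)
element_flags <- is_email_like(x)   # one flag per element
any_flag <- has_email_like(x)       # single flag for the whole vector
print(element_flags)
print(any_flag)
```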