detector
detector makes detecting data containing Personally Identifiable Information (PII) quick, easy, and scalable. It provides high-level functions that take vectors and data.frames and return important summary statistics in a convenient data.frame. Once complete, detector will be able to detect the following types of PII:
- Full name
- Home address
- E-mail address
- National identification number
- Passport number
- Social Security number
- IP address
- Vehicle registration plate number
- Driver's license number
- Credit card number
- Date of birth
- Birthplace
- Telephone number
- Latitude and longitude
State of the Union
Complete!
- E-mail address
- Telephone number
- National identification number
Needs more work...
- Credit card number
Haven't even started :(
- Full name
- Date of birth
- Home address
- IP address
- Vehicle registration plate number
- Driver's license number
- Birthplace
- Latitude and longitude
Installation
You can install:
the latest released version from CRAN with
install.packages("detector")
the latest development version from GitHub with
if (packageVersion("devtools") < "1.6") {
  install.packages("devtools")
}
devtools::install_github("paulhendricks/detector")
If you encounter a clear bug, please file a minimal reproducible example on GitHub.
API
Generate data containing fake PII
library(dplyr, warn.conflicts = FALSE)
library(generator)
n <- 6
ashley_madison <-
  data.frame(name = r_full_names(n),
             email = r_email_addresses(n),
             phone_number = r_phone_numbers(n, use_hyphens = TRUE,
                                            use_spaces = TRUE),
             stringsAsFactors = FALSE)
ashley_madison %>%
  knitr::kable(format = "markdown")
name | email | phone_number |
---|---|---|
Leonardo Rodriguez | xed@be.eny | 254- 851- 6814 |
Dee Rice | ecfoa@rtnlyudbe.yhj | 597- 978- 5193 |
Conception Marquardt | wnz@xid.anc | 184- 962- 8153 |
Collette Nitzsche | tqghfxe@fsleqhmnjd.wkh | 475- 723- 2947 |
Norman Pfannerstill | oyhl@szxby.mag | 153- 674- 4219 |
Katelin Gislason | vq@zatsl.wyg | 831- 847- 1568 |
Detect data containing PII
library(detector)
ashley_madison %>%
  detect %>%
  knitr::kable(format = "markdown")
column_name | has_email_addresses | has_phone_numbers | has_national_identification_numbers |
---|---|---|---|
name | FALSE | FALSE | FALSE |
email | TRUE | FALSE | FALSE |
phone_number | FALSE | TRUE | FALSE |
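
To give a feel for what a column-wise check like the one above involves, here is a minimal base R sketch. It is not detector's implementation: the helper names (sketch_has_email, sketch_has_phone, sketch_detect) are hypothetical, and the regular expressions are deliberately simplified assumptions that will miss many real-world formats.

```r
# Hypothetical helper: TRUE if any element of x looks like an e-mail address.
# The pattern is a simplification, not a full address validator.
sketch_has_email <- function(x) {
  any(grepl("^[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}$", x))
}

# Hypothetical helper: TRUE if any element is exactly 10 digits once
# hyphens, dots, spaces, and parentheses are stripped (an assumed format).
sketch_has_phone <- function(x) {
  digits <- gsub("[-. ()]", "", x)
  any(grepl("^[0-9]{10}$", digits))
}

# Run every check on every column and return one summary row per column.
sketch_detect <- function(df) {
  data.frame(
    column_name = names(df),
    has_email_addresses = vapply(df, sketch_has_email, logical(1),
                                 USE.NAMES = FALSE),
    has_phone_numbers = vapply(df, sketch_has_phone, logical(1),
                               USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Toy input reusing two of the fake rows shown earlier.
toy <- data.frame(
  name = c("Dee Rice", "Norman Pfannerstill"),
  email = c("ecfoa@rtnlyudbe.yhj", "oyhl@szxby.mag"),
  phone_number = c("597- 978- 5193", "153- 674- 4219"),
  stringsAsFactors = FALSE
)

result <- sketch_detect(toy)
print(result)
```

Because each helper reduces a whole vector to a single logical value, the same functions work on a bare vector (for example sketch_has_email(toy$email)) as well as inside the data.frame wrapper.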