galah (version 1.4.0)

search_profile_attributes: Search for which quality filters are applied by a data quality profile

Description

Each data quality profile is made up of a series of filters. While some users may wish to simply trust the default filters, it is often useful to check what each filter does, particularly if advanced customization is needed. This function returns all of the filters built into a specific profile.

Usage

search_profile_attributes(profile)

Arguments

profile

string: a data quality profile name, short name or id. See show_all_profiles() for valid profiles

Value

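A data.frame of profile attributes (printed as a tibble in the examples below) with one row per filter and two columns: description, a free-text description of the filter, and filter, the actual filter used.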

Examples

To find all the data quality filters used in the profile "CSDM":

search_profile_attributes("CSDM")
#> # A tibble: 4 x 2
#>   description                                                               filter                       
#>   <chr>                                                                     <chr>                        
#> 1 "Include only records where Spatial validity is \"true\""                 "spatiallyValid:\"true\""    
#> 2 "Exclude potential duplicate records"                                     "-isDuplicateOf:*"           
#> 3 "Exclude all records that are an outlier against any environmental layer" "-outlierLayerCount:[1 TO *]"
#> 4 "Include only records where Year is 1970 to 2099"                         "year:[1970 TO *]"

Then get a free-text description of each filter used in the "CSDM" profile:

profile_info <- search_profile_attributes("CSDM")
profile_info$description
#> [1] "Include only records where Spatial validity is \"true\""                
#> [2] "Exclude potential duplicate records"                                    
#> [3] "Exclude all records that are an outlier against any environmental layer"
#> [4] "Include only records where Year is 1970 to 2099"
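
The filter column can be inspected in the same way. The following is a minimal base R sketch, not part of the package documentation. It assumes the result has the two columns shown above, and it rebuilds that printed output by hand so it runs without a network connection. The names is_exclusion, field and filter_summary are illustrative only. In the output above, filters whose description starts with "Exclude" carry a leading "-", so the sketch uses that prefix to separate exclusion filters from inclusion filters.

```r
# Hand-built copy of the printed output above, so the sketch runs offline.
# In practice this would be: profile_info <- search_profile_attributes("CSDM")
profile_info <- data.frame(
  description = c(
    "Include only records where Spatial validity is \"true\"",
    "Exclude potential duplicate records",
    "Exclude all records that are an outlier against any environmental layer",
    "Include only records where Year is 1970 to 2099"
  ),
  filter = c(
    "spatiallyValid:\"true\"",
    "-isDuplicateOf:*",
    "-outlierLayerCount:[1 TO *]",
    "year:[1970 TO *]"
  ),
  stringsAsFactors = FALSE
)

# TRUE where the filter string starts with "-" (an exclusion filter)
is_exclusion <- startsWith(profile_info$filter, "-")

# Field name: drop an optional leading "-" and keep the text before the first ":"
field <- sub("^-?([^:]+):.*$", "\\1", profile_info$filter)

# One row per filter: which field it targets and whether it excludes records
filter_summary <- data.frame(
  field = field,
  excludes = is_exclusion,
  stringsAsFactors = FALSE
)
print(filter_summary)
```

A summary like this makes it easier to see which fields a profile touches before deciding whether to use the profile as-is or to write custom filters.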

See Also

show_all_profiles() for a list of valid profiles; galah_filter() for how to include this information in a data query.