Raw data downloaded from the EPA website is not efficient to work with directly in R. The files are large, and
loading them completely into the system's memory would put a heavy strain on R. Instead, use this function
to build an SQLite database from the tables. That way, the data can be queried without having to load it all into
memory.
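As a rough illustration of why a database helps, the sketch below queries a small SQLite table through the 'DBI' and 'RSQLite' packages, so that only the matching rows are brought into R. The in-memory database, the table name 'tests_demo' and its columns are made up for this example and are not the actual ECOTOX schema.

```r
# Minimal sketch: query an SQLite database without loading whole tables.
# Assumes the DBI and RSQLite packages are installed.
# The table and column names are hypothetical, not the ECOTOX schema.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical stand-in for one of the database tables.
demo <- data.frame(
  test_id = 1:4,
  species = c("Daphnia magna", "Danio rerio", "Daphnia magna", "Lemna minor"),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "tests_demo", demo)

# The filtering happens inside SQLite; only matching rows reach R.
hits <- dbGetQuery(
  con,
  "SELECT test_id FROM tests_demo WHERE species = ? ORDER BY test_id",
  params = list("Daphnia magna")
)

dbDisconnect(con)
```

With a database file on disk, the same pattern applies: the connection points at the file and the query decides how much data is read.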
EPA provides the raw tables from the ECOTOX database as text files with
pipe characters ('|') as column separators. Although not documented, the tables appear not to contain comment
or quotation characters. Some records do contain the reserved pipe character as part of a value, which would confuse
the table parser. For these records, the pipe character is replaced with a dash character ('-').
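The general idea can be sketched in base R as follows. This is not the package's internal code: the sample lines are invented, and the repair rule (treating any surplus fields as part of the last column) is an assumption made only to keep the example short.

```r
# Minimal sketch: read pipe-delimited text with quoting and comments disabled,
# after replacing stray pipes by dashes. Sample data is made up.
raw <- c(
  "id|name|note",
  "1|acetone|clean record",
  "2|ethanol|range 5|10 mg/L"   # stray pipe inside the last field
)

# Number of columns according to the header line.
n_cols <- length(strsplit(raw[1], "|", fixed = TRUE)[[1]])

# Assumption: surplus fields belong to the last column and are joined by '-'.
repair <- function(line) {
  fields <- strsplit(line, "|", fixed = TRUE)[[1]]
  if (length(fields) > n_cols) {
    fields <- c(
      fields[seq_len(n_cols - 1)],
      paste(fields[n_cols:length(fields)], collapse = "-")
    )
  }
  paste(fields, collapse = "|")
}

fixed <- vapply(raw, repair, character(1), USE.NAMES = FALSE)

# quote = "" and comment.char = "" switch off quote and comment handling.
tab <- read.table(
  text = fixed, sep = "|", header = TRUE,
  quote = "", comment.char = "", stringsAsFactors = FALSE
)
```

Disabling 'quote' and 'comment.char' matters here because apostrophes or hash signs inside values would otherwise be interpreted by the parser.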
In addition, while reading the tables as text files, this package attempts to decode the text as UTF-8. Unfortunately,
this process appears to be platform-dependent and may therefore produce different results on different platforms.
This problem only seems to occur for characters that are listed as 'control characters' under UTF-8. This affects
reproducibility, but only if you build search queries that look for such special characters. It is
therefore advised to stick to common (non-accented) alphanumeric characters in your searches, for the sake of
reproducibility.
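One way to follow this advice is to check search terms before using them. The helper below is hypothetical and not part of this package; it accepts only unaccented letters, digits and spaces (allowing spaces is an assumption of this sketch).

```r
# Minimal sketch: flag search terms that contain anything other than
# plain ASCII letters, digits and spaces. 'is_portable_term' is a
# hypothetical helper, not a function provided by this package.
is_portable_term <- function(x) {
  grepl("^[A-Za-z0-9 ]+$", x, perl = TRUE)
}

terms <- c("benzene", "Daphnia magna", "caf\u00e9")
portable <- is_portable_term(terms)

# Keep only the terms that should behave the same on every platform.
safe_terms <- terms[portable]
```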
Use 'suppressMessages()' to suppress the progress report.
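The sketch below shows the effect with a stand-in function, assuming the progress report is emitted through 'message()'. 'noisy_build' is hypothetical and only mimics a function that reports progress; wrap the actual build call in the same way.

```r
# Minimal sketch: silence progress messages while keeping the return value.
# 'noisy_build' is a hypothetical stand-in for a function that reports
# progress via message().
noisy_build <- function() {
  message("Processing table 1 of 2")
  message("Processing table 2 of 2")
  invisible("done")
}

# Messages are discarded; warnings and errors are still shown.
result <- suppressMessages(noisy_build())
```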