a data.frame of 3 columns; "license" containing the license IDs,
"content" containing the actual text, and "retrieved_from" indicating the
URL the content was returned from. In the event that the plaintext of a license
is not available (because a URL for it was not in the metadata), content
and retrieved_from will be NAs.