The function returns a data frame containing the CIK number, company name,
date of filing, accession number, and various sentiment measures.
It uses the Loughran-McDonald (LM) sentiment dictionaries
(https://sraf.nd.edu/loughranmcdonald-master-dictionary/) to
compute sentiment measures of an EDGAR filing. The text characteristics
and sentiment measures are defined as follows:
file.size = The size of the complete filing on the EDGAR server, in
kilobytes (KB).
word.count = The total number of words in a filing text, excluding HTML
tags and exhibits text.
unique.word.count = The total number of unique words in a filing text,
excluding HTML tags and exhibits text.
stopword.count = The total number of stop words in a filing text,
excluding exhibits text.
char.count = The total number of characters in a filing text, excluding
HTML tags and exhibits text.
complex.word.count = The total number of complex words in the filing text.
A word is identified as complex when it contains more than three vowels
(a, e, i, o, u).
lm.dictionary.count = The number of words in the filing text that occur
in the Loughran-McDonald (LM) master dictionary.
lm.negative.count = The number of LM financial-negative words in the
document.
lm.positive.count = The number of LM financial-positive words in the
document.
lm.strong.modal.count = The number of LM financial-strong modal words
in the document.
lm.moderate.modal.count = The number of LM financial-moderate modal
words in the document.
lm.weak.modal.count = The number of LM financial-weak modal words in
the document.
lm.uncertainty.count = The number of LM financial-uncertainty words
in the document.
lm.litigious.count = The number of LM financial-litigious words in
the document.
hv.negative.count = The number of words in the document that occur in
the 'Harvard General Inquirer' Negative word list, as defined by LM.
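
The definitions above can be illustrated with a minimal Python sketch. It is not the function's actual implementation. It assumes the filing text has already been stripped of HTML tags and exhibits, that words are runs of ASCII letters, and that char.count counts only the letters inside those words. The names tokenize, is_complex, and text_measures are hypothetical, and the stop word and sentiment word lists are toy stand-ins, not entries copied from the LM dictionaries.

```python
import re

VOWELS = set("aeiou")

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())

def is_complex(word):
    """A word is complex when it contains more than three vowels."""
    return sum(ch in VOWELS for ch in word) > 3

def text_measures(text, stopwords, word_lists):
    """Compute text characteristics and per-list word counts for one document.

    word_lists maps a measure name (e.g. "lm.negative.count") to a set of
    lowercase words; every occurrence in the text is counted.
    """
    words = tokenize(text)
    measures = {
        "word.count": len(words),
        "unique.word.count": len(set(words)),
        "stopword.count": sum(w in stopwords for w in words),
        # Assumption: characters are counted within word tokens only.
        "char.count": sum(len(w) for w in words),
        "complex.word.count": sum(is_complex(w) for w in words),
    }
    for name, vocab in word_lists.items():
        measures[name] = sum(w in vocab for w in words)
    return measures

# Toy inputs for illustration only; real lists come from the LM dictionaries.
sample_text = ("The company may face significant litigation and uncertainty, "
               "but the company achieved gains.")
toy_stopwords = {"the", "and", "but"}
toy_word_lists = {
    "lm.negative.count": {"litigation"},
    "lm.positive.count": {"achieved", "gains"},
    "lm.uncertainty.count": {"may", "uncertainty"},
}

result = text_measures(sample_text, toy_stopwords, toy_word_lists)
for name, value in result.items():
    print(f"{name} = {value}")
```

Each sentiment measure is a count of tokens that belong to a word list, so further categories (for example lm.litigious.count or hv.negative.count) can be added by passing another named set.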