When topic names and the corresponding years are given, this function computes the rising or falling trend of each topic over the period, using a linear regression fitted with lm.
topic_trend(year, topic, relative = FALSE, zero = 0)
year: a numeric vector of years corresponding to the topics. If it is not numeric, the function will try to coerce it. Years must be written in full, e.g. 1998 and 2013 rather than 98 and 13. NA is not allowed, and there must be at least 3 unique years; otherwise an error is raised.
topic: a character vector of topics. If it is not character, the function will try to coerce it. topic and year must have the same length. NA is not allowed.
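The constraints on year and topic can be sketched as a validation step. check_topic_inputs is a hypothetical helper written for illustration, not a function of the package; the exact coercion rules and error messages inside topic_trend may differ.

```r
# Hypothetical helper (not part of the package): a sketch of the documented
# requirements on `year` and `topic`.
check_topic_inputs <- function(year, topic) {
  # Coerce as described; failed conversions become NA and are caught below.
  if (!is.numeric(year)) year <- suppressWarnings(as.numeric(as.character(year)))
  if (!is.character(topic)) topic <- as.character(topic)
  if (length(year) != length(topic)) stop("year and topic must have the same length")
  if (anyNA(year) || anyNA(topic)) stop("NA is not allowed in year or topic")
  if (length(unique(year)) < 3) stop("at least 3 unique years are required")
  list(year = year, topic = topic)
}

ok <- check_topic_inputs(c("2011", "2012", "2013"), factor(c("art", "law", "art")))
str(ok)
```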
relative: if FALSE (default), the number of texts on each topic is used. If TRUE, the proportion of a topic in a year relative to the total number of texts in that year is used. For example, if this year has 200 texts on art out of 1000 texts in total, the relative value is 200/1000 = 0.2 rather than the absolute number 200. Note: if relative values are used, an NA amount for a topic is automatically set to 0.
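The relative value can be reproduced with base R's table and prop.table on hypothetical toy data, dividing each year's counts by that year's total. This only illustrates the calculation; it is not the package's internal code.

```r
# Hypothetical toy data: the topic and year of seven texts.
topic <- c("art", "art", "law", "art", "law", "law", "economy")
year  <- c(2011, 2011, 2011, 2012, 2012, 2012, 2012)

counts <- table(topic, year)                # absolute number of texts per topic and year
relative <- prop.table(counts, margin = 2)  # divide each year's column by that year's total
round(relative, 3)
```

In 2011, art accounts for 2 of 3 texts, so its relative value is 2/3 rather than 2.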
zero: either 0 (default) or NA. If a topic has 0 texts in a year, you should determine whether the amount is really 0 or whether the data for that topic in that year are missing. Set this argument to NA to convert all zeros to NA.
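A small illustration of what zero = NA means for a topic-by-year count table, using hypothetical toy data and base R only; it mimics the documented behaviour and is not the package's own code.

```r
# Hypothetical toy data: art has no texts in 2013.
counts <- table(topic = c("art", "art", "law", "law", "law"),
                year  = c(2011, 2012, 2011, 2012, 2013))
amount <- unclass(counts)         # plain integer matrix: topics x years
amount_na <- amount
amount_na[amount_na == 0] <- NA   # treat zero counts as missing data
amount_na
```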
A list. The 1st element is the trend information. The 2nd is a summary of the amount of each topic in each year. If argument relative is TRUE, a 3rd element is returned, containing the relative value (proportion) of each topic in each year.
The trend information in the result contains the following:
(1) trendIndex: a regression with lm is fitted for every topic, with year as x and the amount of the topic as y. trendIndex is the slope k in y = kx + b.
(2) trendLevel: the p value of k.
(3) totalTrend: "rise" if trendIndex is larger than 0, otherwise "fall". If trendLevel is smaller than 0.05, the value becomes "significant rise" or "significant fall".
(4) maxminYear: if totalTrend is "rise" or "significant rise", the year with the largest amount. If several years share the largest amount, the most recent one is returned. If totalTrend is "fall" or "significant fall", the year with the smallest amount is returned.
(5) detailTrend: if totalTrend is "rise" or "significant rise", the function checks whether the year with the largest amount is the last year; if so, the value is "rise along", otherwise "rise and fall". If totalTrend is "fall" or "significant fall", the function checks whether the year with the smallest amount is the last year; if so, the value is "fall along", otherwise "fall and rise".
(6) simpleTrend: compares the amount in the last year with that in the first year: "rise" if it is larger, "fall" if it is smaller, and "equal" if they are the same.
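The classification in items (1) to (6) above can be sketched for a single topic as follows. classify_trend is a hypothetical helper, not the package's implementation. It assumes the years are unique and the amounts contain no NA, and it assumes the most-recent-year tie rule also applies to "fall", which the documentation only states for "rise".

```r
# Hypothetical helper: classify one topic's trend from its yearly amounts.
classify_trend <- function(years, amount) {
  ord <- order(years)                  # put the years in ascending order
  years <- years[ord]
  amount <- amount[ord]

  fit <- lm(amount ~ years)
  k <- unname(coef(fit)["years"])                      # (1) trendIndex: slope k
  p <- summary(fit)$coefficients["years", "Pr(>|t|)"]  # (2) trendLevel: p value of k

  rising <- k > 0
  total <- if (rising) "rise" else "fall"              # (3) totalTrend
  if (p < 0.05) total <- paste("significant", total)

  # (4) maxminYear: most recent year holding the max (rise) or min (fall) amount.
  # Applying the tie rule to "fall" is an assumption.
  target <- if (rising) max(amount) else min(amount)
  maxmin <- max(years[amount == target])

  last <- years[length(years)]
  detail <- if (rising) {                              # (5) detailTrend
    if (maxmin == last) "rise along" else "rise and fall"
  } else {
    if (maxmin == last) "fall along" else "fall and rise"
  }

  first_amt <- amount[1]                               # (6) simpleTrend
  last_amt <- amount[length(amount)]
  simple <- if (last_amt > first_amt) "rise" else if (last_amt < first_amt) "fall" else "equal"

  list(trendIndex = k, trendLevel = p, totalTrend = total,
       maxminYear = maxmin, detailTrend = detail, simpleTrend = simple)
}

res <- classify_trend(2011:2016, c(2, 4, 5, 7, 10, 9))
str(res)
```

Here the peak (10) falls in 2015 rather than the last year, so detailTrend is "rise and fall" even though the overall slope is significantly positive.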
When computing the trend for a topic, if fewer than 3 years have valid values and the values in all other years are NA, then trendIndex, trendLevel and maxminYear will be -999, and the other cells will be "less than 3y". If the amount of a topic does not change across years, trendIndex will be 0, trendLevel and maxminYear will be -999, and totalTrend and detailTrend will be "almost same".
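A minimal sketch of these two edge cases, using a hypothetical helper edge_case_trend that returns NULL when neither applies (in which case the usual lm-based classification would run). It assumes "does not change" means all non-NA amounts are equal, and it leaves simpleTrend out of the constant case because the documentation does not specify it there.

```r
# Hypothetical helper: detect the two documented edge cases for one topic.
edge_case_trend <- function(years, amount) {
  valid <- !is.na(amount)
  if (sum(valid) < 3) {
    # Fewer than 3 years with valid values: numeric cells -999, others "less than 3y".
    return(list(trendIndex = -999, trendLevel = -999, totalTrend = "less than 3y",
                maxminYear = -999, detailTrend = "less than 3y",
                simpleTrend = "less than 3y"))
  }
  if (length(unique(amount[valid])) == 1) {
    # Constant amount across years (assumed to ignore NA years).
    return(list(trendIndex = 0, trendLevel = -999, totalTrend = "almost same",
                maxminYear = -999, detailTrend = "almost same"))
  }
  NULL  # not an edge case: fit lm and classify as usual
}

edge_case_trend(2011:2014, c(NA, 3, NA, 5))
edge_case_trend(2011:2014, c(4, 4, 4, 4))
```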
# NOT RUN {
set.seed(1)
topic <- sample(c("art", "economy", "law", "politics", "sociology"), 50, replace = TRUE)
set.seed(2)
year <- sample(2011:2016, 50, replace = TRUE)
tr1 <- topic_trend(year, topic)
tr2 <- topic_trend(year, topic, zero = NA)
tr3 <- topic_trend(year, topic, relative = TRUE)
# }