chinese.misc (version 0.2.3)

topic_trend: Simple Rise or Fall Trend of Several Years

Description

Given topic names and their corresponding years, this function uses lm to compute the rising or falling trend of each topic over the period.

Usage

topic_trend(year, topic, relative = FALSE, zero = 0)

Arguments

year

a numeric vector of years for the corresponding topics. If it is not numeric, the function will try to coerce it. Years must be written in full, that is, write 1998 and 2013 rather than 98 and 13. No NA is allowed. There must be at least 3 unique years, otherwise an error is raised.

topic

a character vector of topics. If it is not character, the function will try to coerce it. topic and year must have the same length. No NA is allowed.

relative

if FALSE (default), the absolute number of texts on each topic is used. If TRUE, the share of a topic in a year relative to the total number of texts in that year is used. For example, if a year has 200 texts on art out of 1000 texts in total, the relative value is 200/1000 = 0.2 rather than the absolute number 200. Note: when relative values are used, an NA amount for a topic is automatically set to 0.
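The difference between absolute and relative amounts can be sketched with base R alone. This is only an illustration on made-up data (the vectors below are hypothetical); whether topic_trend computes its tables this way internally is an assumption.

```r
# Hypothetical toy data, not taken from the package.
year  <- c(2011, 2011, 2011, 2012, 2012, 2013, 2013, 2013, 2013)
topic <- c("art", "law", "art", "art", "law", "law", "law", "art", "law")

# Absolute amounts: rows are topics, columns are years.
counts <- table(topic, year)

# Relative amounts: each count divided by the total of its year (its column),
# so every column of `shares` sums to 1.
shares <- prop.table(counts, margin = 2)

print(counts)
print(round(shares, 3))
```

With relative = TRUE, a topic whose absolute count stays flat can still show a falling trend if the yearly totals grow.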

zero

this can only be 0 (default) or NA. When a topic has 0 texts in a year, you should decide whether the amount is really 0 or whether the data for that topic in that year is missing. Set this argument to NA to turn every 0 into NA.
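The choice matters because lm drops NA observations by default, so a year treated as missing no longer pulls the fitted line toward 0. A minimal sketch on hypothetical yearly counts (the data and variable names are assumptions, not package internals):

```r
# Hypothetical yearly amounts for one topic.
years  <- 2011:2015
counts <- c(3, 0, 4, 0, 5)

# Treat every 0 as missing, as zero = NA is described to do.
counts_na <- counts
counts_na[counts_na == 0] <- NA

# lm() uses na.action = na.omit by default, so the NA years are dropped.
slope_zero <- coef(lm(counts ~ years))[["years"]]
slope_na   <- coef(lm(counts_na ~ years))[["years"]]

print(c(zeros_kept = slope_zero, zeros_as_NA = slope_na))
```

Note that turning zeros into NA reduces the number of valid years, which can leave a topic with fewer than 3 usable years.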

Value

a list. The 1st element is the trend info. The 2nd is a summary of the amount of each topic in each year. If argument relative is TRUE, a 3rd element is returned, which holds the relative value (share) of each topic in each year.

Details

The detail of trend info in the result is as follows:

  • (1) trendIndex: a regression with function lm is done for every topic, with year as x and the amount of the topic as y. The value of trendIndex is the slope k in y = kx+b.

  • (2) trendLevel: the p value of k.

  • (3) totalTrend: if trendIndex is larger than 0, then "rise", otherwise "fall". If trendLevel is smaller than 0.05, then "significant rise" or "significant fall".

  • (4) maxminYear: if totalTrend is "rise" or "significant rise", this value is the year with the largest amount. If several years share the largest value, the most recent year is returned. If totalTrend is "fall" or "significant fall", the year with the smallest amount is returned.

  • (5) detailTrend: if totalTrend is "rise" or "significant rise", the function checks whether the year with the largest amount is the last year. If it is, then "rise along", otherwise "rise and fall". If totalTrend is "fall" or "significant fall", the function checks whether the year with the smallest amount is the last year. If it is, then "fall along", otherwise "fall and rise".

  • (6) simpleTrend: this only compares the amount of the last year with that of the first year. If it is larger, then "rise"; if smaller, then "fall"; if the same, then "equal".
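The six fields above can be reproduced by hand for a single topic. The following is a minimal sketch under stated assumptions: the data are made up, the variable names are hypothetical, and the tie-breaking rule for the smallest year in the "fall" case (most recent year) is an assumption, since the text only states it for the "rise" case. It is not the package's actual implementation.

```r
# Hypothetical yearly amounts for one topic.
years  <- 2011:2016
amount <- c(2, 3, 5, 4, 7, 6)

# (1) and (2): slope of amount ~ year and its p value.
coefs       <- summary(lm(amount ~ years))$coefficients
trend_index <- coefs["years", "Estimate"]
trend_level <- coefs["years", "Pr(>|t|)"]

# (3): direction, marked as significant when p < 0.05.
rising      <- trend_index > 0
total_trend <- if (rising) "rise" else "fall"
if (trend_level < 0.05) total_trend <- paste("significant", total_trend)

# (4): most recent year holding the extreme amount
# (the tie-breaking for the "fall" case is an assumption).
extreme     <- if (rising) max(amount) else min(amount)
maxmin_year <- max(years[amount == extreme])

# (5): does the extreme fall on the last year?
last_year <- max(years)
if (rising) {
  detail_trend <- if (maxmin_year == last_year) "rise along" else "rise and fall"
} else {
  detail_trend <- if (maxmin_year == last_year) "fall along" else "fall and rise"
}

# (6): compare only the last year with the first year.
change       <- amount[length(amount)] - amount[1]
simple_trend <- if (change > 0) "rise" else if (change < 0) "fall" else "equal"

print(list(trendIndex = trend_index, trendLevel = trend_level,
           totalTrend = total_trend, maxminYear = maxmin_year,
           detailTrend = detail_trend, simpleTrend = simple_trend))
```

In this toy series the peak is in 2015 rather than in the final year, so the detailed label differs from the overall direction.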

When computing the trend for a topic, if fewer than 3 years have valid values and the values in all other years are NA, then trendIndex, trendLevel and maxminYear will be -999, and the other cells will be "less than 3y". If the amount of a topic does not change across years, then trendIndex will be 0, trendLevel and maxminYear will be -999, and totalTrend and detailTrend will be "almost same".

Examples

# NOT RUN {
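# Simulate 50 texts, each with a topic and a publication year
set.seed(1)
topic <- sample(c("art", "economy", "law", "politics", "sociology"), 50, replace = TRUE)
set.seed(2)
year <- sample(2011:2016, 50, replace = TRUE)
# Trend based on absolute amounts
tr1 <- topic_trend(year, topic)
# Treat amounts of 0 as missing
tr2 <- topic_trend(year, topic, zero = NA)
# Trend based on each topic's share of the yearly total
tr3 <- topic_trend(year, topic, relative = TRUE)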
# }