hackeRnews

The hackeRnews package is an R wrapper for the Hacker News API. It was developed as a project for the Advanced R classes at the Warsaw University of Technology.

Installation and basic setup

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("szymanskir/hackeRnews")

The Hacker News API returns a single item per request, so retrieving 200 items requires 200 separate API calls. Processing that many requests sequentially takes a significant amount of time. To address this, the hackeRnews package uses the future.apply package (https://github.com/HenrikBengtsson/future.apply) to fetch the requested items in parallel. This requires some additional setup:

library(hackeRnews)
future::plan(future::multisession) # set up multisession futures (the older multiprocess strategy is deprecated in recent versions of future), read more at https://github.com/HenrikBengtsson/future
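
To illustrate why the plan matters, here is a minimal sketch of the parallel pattern that future.apply provides. It does not reproduce the internals of hackeRnews: fetch_item is a hypothetical stand-in that sleeps instead of calling the API, and the worker count of 2 is an arbitrary assumption.

```r
library(future.apply)

# Run futures in two background R sessions (worker count chosen for illustration)
future::plan(future::multisession, workers = 2)

# Hypothetical stand-in for a single-item API request (no network access)
fetch_item <- function(id) {
  Sys.sleep(0.2)  # simulate request latency
  list(id = id, title = paste("Item", id))
}

ids <- 1:4

# Each id is processed by one of the workers; results keep the order of ids
items <- future_lapply(ids, fetch_item)

# Return to sequential processing when done
future::plan(future::sequential)
```

With a sequential plan the same call still works, but the simulated requests run one after another.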

Examples

Identify buzzwords in job offers on Hacker News

This example shows how to extract the words used in the titles of recent job stories. The words are then visualized as a word cloud to show which ones appear most often.

library(hackeRnews)
library(tidyverse)
library(tidytext)
library(stringr)
library(dplyr)
library(ggwordcloud)

job_stories <- hackeRnews::get_latest_job_stories()

# get titles, normalize the words, replace non-alphabetic characters with spaces
title_words <- unlist(
  unlist(lapply(job_stories, FUN=function(job_story) { job_story$title })) %>% 
  str_replace_all('[^A-Za-z]', ' ') %>%  # a '|' inside [] would be matched literally, so it is left out
  str_to_upper() %>% 
  str_replace_all('\\s+', ' ') %>% 
  str_split(' ')
)

# remove stop words
data('stop_words')
df <- data.frame(word=title_words, stringsAsFactors=FALSE) %>% 
  filter(str_length(word) > 0 & !str_to_lower(word) %in% stop_words$word) %>% 
  count(word)

# add some random colors to beautify visualization
df <- as.data.frame(df) %>% 
  mutate(color=factor(sample(10,nrow(df), replace=TRUE)))


word_cloud <- ggplot(df, aes(label=word, size=n, color=color)) + 
  geom_text_wordcloud() + 
  scale_size_area(max_size = 15)
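
The normalization step can be tried without calling the API. The sketch below applies the same idea to two made-up titles (they are assumptions, not real Hacker News data) and uses str_squish from stringr to collapse whitespace.

```r
library(stringr)

# Made-up titles standing in for job_story$title values
titles <- c(
  "Acme (YC W20) is hiring a Senior R developer",
  "Remote: data engineer | Python, SQL"
)

# Replace everything that is not a letter with a space
cleaned <- str_replace_all(titles, "[^A-Za-z]", " ")

# Normalize case, then trim and collapse repeated whitespace
cleaned <- str_squish(str_to_upper(cleaned))

# One character vector with all words from all titles
words <- unlist(str_split(cleaned, " "))

# Word frequencies, most frequent first
word_counts <- sort(table(words), decreasing = TRUE)
```

Digits and punctuation, including the "|" in the second title, are dropped before the words are split.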

Check what’s trending on Hacker News

This example fetches the ten best stories and plots their titles against their scores.

library(hackeRnews)
library(stringr)
library(ggplot2)

best_stories <- hackeRnews::get_best_stories(max_items=10)
df <- data.frame(
  title=unlist(lapply(best_stories, FUN=function(best_story) { str_wrap(best_story$title, 42) })),
  score=unlist(lapply(best_stories, FUN=function(best_story) { best_story$score })),
  stringsAsFactors=FALSE
)

# order the titles by score so that the bars are drawn in score order
df$title <- factor(df$title, levels=df$title[order(df$score)])

best_stories_plot <- ggplot(df, aes(x = title, y = score, label=score)) +
  geom_col() +
  geom_label() +
  coord_flip() +
  xlab('Story title') +
  ylab('Score') +
  ggtitle('Best stories')
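
The data preparation in this example can be checked offline. The following sketch uses a made-up list of stories (the titles and scores are assumptions) that has the same title and score fields, and shows how the factor levels end up ordered by score.

```r
# Made-up items with the same fields the example reads (title and score)
stories <- list(
  list(title = "Story A", score = 120),
  list(title = "Story B", score = 340),
  list(title = "Story C", score = 95)
)

# vapply() checks that every element yields exactly one value of the given type
df <- data.frame(
  title = vapply(stories, function(s) s$title, character(1)),
  score = vapply(stories, function(s) s$score, numeric(1)),
  stringsAsFactors = FALSE
)

# Order factor levels by score so that bars are drawn in score order
df$title <- factor(df$title, levels = df$title[order(df$score)])

levels(df$title)
# "Story C" "Story A" "Story B"
```

Because the plot uses coord_flip(), the first level is drawn at the bottom, so the story with the highest score appears at the top.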

Sentiment analysis on two best stories from Hacker News

library(hackeRnews)
library(tidyverse)
library(tidytext)
library(dplyr)

best_stories <- hackeRnews::get_best_stories(2)

comments_by_story <- lapply(best_stories,
                   function(story){
                     get_comments(story)$text
                   }
)

# normalize the words, replace non-alphabetic characters with spaces
words_by_story <- lapply(comments_by_story, function(comments) {
  unlist(
    comments %>%
      str_replace_all('[^A-Za-z]', ' ') %>%
      str_to_lower() %>%
      str_replace_all('\\s+', ' ') %>%
      str_split(' ')
  )
})

# remove stop words and empty strings
data('stop_words')
dataframes <- lapply(seq_along(words_by_story), function(story_id) {
  data.frame(word=words_by_story[[story_id]], stringsAsFactors=FALSE, story_id=story_id) %>%
    filter(!word %in% stop_words$word & word != "")
})

df <- bind_rows(dataframes)

# get sentiment for every story
# (the AFINN lexicon is downloaded through the textdata package on first use)
library(textdata)
sentiment <- get_sentiments("afinn")

df %>%
  inner_join(sentiment, by='word') %>%
  mutate(story_title=sapply(story_id, function(id){best_stories[[id]]$title}) ) %>% 
  ggplot(aes(x=value, fill=as.factor(story_title))) +
    geom_density(alpha=0.5) +
    scale_x_continuous(breaks=c(-5, 0, 5),
                       labels=c("Negative", "Neutral", "Positive"),
                       limits=c(-6, 6)) +
    theme_minimal() +
    theme(axis.title.x=element_blank(),
          axis.title.y=element_blank(),
          axis.text.y=element_blank(),
          axis.ticks.y=element_blank(),
          plot.title=element_text(hjust=0.5),
          legend.position = 'top') +
    labs(fill='Story') +
    ggtitle('Sentiment for 2 chosen stories')
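
The join at the heart of this example can be sketched without downloading the AFINN lexicon. Everything in the block below is made up for illustration: the words, the story ids, and the toy lexicon, which only mimics the word and value columns used above.

```r
library(dplyr)

# Hypothetical tokens for two stories (not real Hacker News comments)
words <- data.frame(
  story_id = c(1, 1, 1, 2, 2),
  word = c("great", "broken", "the", "love", "great"),
  stringsAsFactors = FALSE
)

# Toy lexicon with the same columns as the one used above; scores are made up
lexicon <- data.frame(
  word = c("great", "broken", "love"),
  value = c(3, -2, 3),
  stringsAsFactors = FALSE
)

# inner_join() keeps only words present in the lexicon, so "the" is dropped
summary_by_story <- words %>%
  inner_join(lexicon, by = "word") %>%
  group_by(story_id) %>%
  summarise(mean_value = mean(value), n_words = n())
```

A per-story mean is only one possible summary; the density plot above shows the full distribution of scores instead.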

Install

install.packages('hackeRnews')

Version

0.1.0

License

MIT + file LICENSE

Maintainer

Ryszard Szymanski

Last Published

December 13th, 2019

Functions in hackeRnews (0.1.0)

get_item_by_id: Get Hacker News item by id
get_new_stories: Hacker News newest stories
to_datetime_origin: Converts numeric value into POSIXct datetime type
get_content: Retrieves the response content
print.hn_user: Print for "hn_user" type objects
get_max_item_id: Hacker News item largest id
default_if_null: Returns the specified variable, or a default value if the specified variable is null
get_comments_with_root: Hacker News nested comments with root comment
get_best_stories_ids: Hacker News best stories ids
get_latest_ask_stories: Hacker News latest ask stories
.base_url: Returns the base url of the Hacker News API
get_latest_ask_stories_ids: Hacker News latest ask stories ids
get_comments: Hacker News nested comments
get_user_by_username: Get Hacker News user
get_latest_show_stories: Hacker News latest show stories
get_latest_show_stories_ids: Hacker News latest show stories ids
get_items_by_ids: Get Hacker News items by ids
comment_to_dataframe_row: Converts comment to a dataframe row
is_hn_api_response: Checks whether the given object is of the class hn_api_response
get_top_stories: Hacker News top stories
get_new_stories_ids: Hacker News newest stories ids
parse_json: Parses a json response
get_top_stories_ids: Hacker News top stories ids
.send_request: Sends a request to the specified url and retrieves its content
get_updates: Hacker News updated profiles
print.hn_item: Print for "hn_item" type objects
get_latest_job_stories: Hacker News latest job stories
get_best_stories: Hacker News best stories
is_hn_item: Checks whether the given object is of the class hn_item
get_latest_job_stories_ids: Hacker News latest job stories ids
is_hn_user: Checks whether the given object is of the class hn_user
trim_ids_list: Selects only a limited number of ids
validate_hn_api_response: Checks that the given response is not empty and that it did not return an error http code
validate_hn_user: Checks whether the given object is a correctly defined hn_user class
validate_hn_item: Checks whether the given object is a correctly defined hn_item class
create_hn_user: Creates an object representing a Hacker News user
create_request_url: Creates a request url based on the given base url and passed paths. The json extension is added automatically
create_hn_api_response: Creates an object representing a response from the Hacker News API
assert_ids: Checks whether ids are correctly defined. If not, throws an error
assert_max_items: Checks whether max_items is correctly defined. If not, throws an error
add_json_extension: Adds the json extension to the given url
assert: Asserts a given expression and throws an error if it returns FALSE
create_hn_item: Creates an object representing a Hacker News item
add_path: Adds the given path to the given url