boilerpipeR (version 1.3.2)
Interface to the Boilerpipe Java Library
Description
Generic extraction of the main text content from HTML files. Ads,
sidebars and headers are removed using the boilerpipe Java library.
The boilerpipe extraction heuristics perform robustly across a wide
range of web site templates.
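
As a rough illustration of the description above, the following minimal sketch passes an HTML string to one of the package's extractors. It assumes that boilerpipeR and a working Java/rJava setup are installed and that ArticleExtractor() accepts the HTML as a character string. The helper extract_main_text() and the sample page are hypothetical and serve only as illustration.

```r
# Hypothetical helper (not part of boilerpipeR): returns the extracted
# main text, or NA if boilerpipeR (or its Java backend) cannot be loaded.
extract_main_text <- function(html) {
  if (!requireNamespace("boilerpipeR", quietly = TRUE)) {
    return(NA_character_)
  }
  # ArticleExtractor() is assumed to take the HTML content as a string.
  as.character(boilerpipeR::ArticleExtractor(html))[1]
}

# Made-up sample page with navigation and footer clutter around an article.
html <- paste0(
  "<html><body>",
  "<div id='nav'><a href='/'>Home</a> <a href='/about'>About</a></div>",
  "<div id='article'><h1>Sample headline</h1>",
  "<p>This is the first paragraph of the article body. It is long enough ",
  "to look like real running text rather than a navigation element.</p>",
  "<p>This is the second paragraph, which continues the article with ",
  "additional sentences so that the heuristics have some content.</p></div>",
  "<div id='footer'>Copyright notice</div>",
  "</body></html>"
)

result <- extract_main_text(html)
print(result)
```

Other extractors in the package may suit non-article pages better; which one works best depends on the page layout.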