
Rcrawler (version 0.1.5)

LinkNormalization: Link Normalization

Description

A function that takes URLs (as a character vector) and transforms each of them into a canonical form.
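To illustrate what a canonical form can involve, the following base-R sketch applies a few common normalization steps to a single URL: trimming surrounding whitespace, dropping the `#fragment`, and lower-casing the scheme and host, which are case-insensitive. The helper `normalize_url_sketch` is hypothetical and not part of Rcrawler; the exact rules `LinkNormalization` applies may differ.

```r
# Hypothetical helper, NOT part of Rcrawler: a few common URL canonicalization steps.
normalize_url_sketch <- function(url) {
  url <- trimws(url)            # drop leading/trailing whitespace
  url <- sub("#.*$", "", url)   # drop the fragment; it is never sent to the server
  # Split into scheme ("http://"), host ("GloFile.com") and the rest ("/Page.html").
  m <- regmatches(url, regexec("^([A-Za-z][A-Za-z0-9+.-]*://)([^/?]*)(.*)$", url))[[1]]
  if (length(m) == 4) {
    # Scheme and host are case-insensitive; the path may be case-sensitive, so keep it.
    url <- paste0(tolower(m[2]), tolower(m[3]), m[4])
  }
  url                           # relative links (no scheme) are returned unchanged
}

# Usage: apply the sketch element-wise to a vector of links.
normalized <- vapply(c("  HTTP://GloFile.com/Page.html#1 ", "/finance/banks/page-2017.html"),
                     normalize_url_sketch, character(1), USE.NAMES = FALSE)
```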

Usage

LinkNormalization(links, current)

Arguments

links

character vector, the URLs to normalize.

current

character, the URL of the current page, i.e. the page on which the links were found.
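The `current` argument is needed because relative links only have meaning with respect to the page that contains them. The base-R sketch below resolves one link against a page URL, roughly following RFC 3986 for the common cases: absolute URLs, protocol-relative `//` links, root-relative `/` links, and `./` or plain relative paths. The helper `resolve_link_sketch` is hypothetical and not Rcrawler's implementation; it does not collapse `../` segments, and it treats scheme-less inputs such as `www.glofile.com/home/` as relative paths, which may differ from how `LinkNormalization` handles them.

```r
# Hypothetical helper, NOT part of Rcrawler: resolve `link` against the page URL `current`.
resolve_link_sketch <- function(link, current) {
  # Split `current` into origin ("scheme://host") and path (possibly empty).
  m <- regmatches(current, regexec("^([A-Za-z][A-Za-z0-9+.-]*://[^/?#]+)([^?#]*)", current))[[1]]
  if (length(m) == 0) stop("`current` must be an absolute URL")
  origin <- m[2]
  path <- if (nzchar(m[3])) m[3] else "/"
  dir <- sub("[^/]*$", "", path)                  # directory of the current page, ends in "/"
  if (grepl("^[A-Za-z][A-Za-z0-9+.-]*://", link)) {
    link                                          # already absolute: keep as-is
  } else if (startsWith(link, "//")) {
    paste0(sub("//.*$", "", origin), link)        # protocol-relative: reuse the scheme only
  } else if (startsWith(link, "/")) {
    paste0(origin, link)                          # root-relative: attach to the origin
  } else if (startsWith(link, "./")) {
    paste0(origin, dir, substring(link, 3))       # relative to the current directory
  } else {
    paste0(origin, dir, link)                     # plain relative path (no "../" handling)
  }
}

# Usage: resolve several links found on the same page.
resolved <- vapply(c("/finance/banks/page-2017.html", "./section/subscription.php"),
                   resolve_link_sketch, character(1),
                   current = "http://glofile.com/news/index.html", USE.NAMES = FALSE)
```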

Value

A character vector containing the normalized URLs.

Details

This function calls an external Java class.

Examples

# NOT RUN {
# Normalize a set of links found on a page of http://glofile.com
library(Rcrawler)

links <- c("http://www.twitter.com/share?url=http://glofile.com/page.html",
           "/finance/banks/page-2017.html",
           "./section/subscription.php",
           "//section/",
           "www.glofile.com/home/",
           "glofile.com/sport/foot/page.html",
           "sub.glofile.com/index.php",
           "http://glofile.com/page.html#1")

# Normalize every link relative to the URL of the page it was found on
links <- LinkNormalization(links, "http://glofile.com")

# }
