h2o.tokenize

h2o.tokenize is similar to h2o.strsplit. The difference is that h2o.tokenize stores the tokenized
text in a single column, which makes additional processing easier (filtering stop words, the word2vec algorithm, ...).

R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning
platform that offers parallelized implementations of many supervised and
unsupervised machine learning algorithms such as Generalized Linear
Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests,
Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes,
Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection,
Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Authors: Erin LeDell, Navdeep Gill, Spencer Aiello, Anqi Fu, Arno Candel, Cliff Click, Tom Kraljevic,
Tomas Nykodym, Patrick Aboyoun, Michal Kurka, Michal Malohlava, Ludi Rehak, Eric Eckstrand, Brandon Hill,
Sebastian Vidrio, Surekha Jadhawani, Amy Wang, Raymond Peck, Wendy Wong, Jan Gorecki, Matt Dowle, Yuan Tang,
Lauren DiPerna, Tomas Fryda, Veronika Maurerova, H2O.ai

Arguments

<dl><dt>x</dt>
<dd>The column or columns whose strings should be tokenized.</dd>
<dt>split</dt>
<dd>The regular expression to split on.</dd></dl>

Tokenize String — h2o.tokenize




Examples
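The description and arguments above can be illustrated with a minimal sketch. It assumes the h2o package is installed and that a local H2O cluster can be started or reached with `h2o.init()`. The two-sentence corpus, the column name `text`, and the variable names are hypothetical, and the split patterns are only one reasonable choice.

```r
library(h2o)

# Assumption: a local H2O cluster can be started or reached from this session.
h2o.init()

# Hypothetical two-sentence corpus, uploaded as a one-column H2OFrame.
docs <- data.frame(
  text = c("H2O scales machine learning", "Tokenize text before Word2Vec"),
  stringsAsFactors = FALSE
)
docs_hf <- as.h2o(docs)

# Make sure the column is of string type rather than categorical.
text_col <- as.character(docs_hf$text)

# x = the string column, split = a regular expression.
# Here the text is split on runs of non-word characters.
tokens <- h2o.tokenize(text_col, "\\W+")
print(tokens)

# For comparison, h2o.strsplit spreads the pieces of each string across columns.
pieces <- h2o.strsplit(text_col, " ")
print(pieces)
```

Printing both frames side by side shows the difference in layout: `tokens` holds every token in one column, which is the shape that downstream steps such as stop-word filtering or word2vec training consume, while `pieces` keeps one row per input string.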