Dataset of 11,228 newswires from Reuters, each labeled with one of 46 topics. As with dataset_imdb(), every wire is encoded as a sequence of word indexes using the same conventions.
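Each topic label is an integer topic id from 0 to 45. Many classifiers expect one column per topic instead, so the base-R sketch below one-hot encodes a few labels into a 46-column matrix. The label values and the one_hot_topics() helper are hypothetical illustrations; the keras package's to_categorical() serves the same purpose.

```r
# Hypothetical topic labels for four wires (real labels are ids 0 to 45).
labels <- c(3L, 4L, 0L, 45L)

# Hypothetical helper: one-hot encode 0-based topic ids into an n x 46 matrix.
one_hot_topics <- function(labels, num_topics = 46L) {
  m <- matrix(0, nrow = length(labels), ncol = num_topics)
  # Topic j goes in column j + 1 because R matrices are 1-indexed.
  m[cbind(seq_along(labels), labels + 1L)] <- 1
  m
}

y <- one_hot_topics(labels)
dim(y)               # 4 46
which(y[4, ] == 1)   # 46, i.e. topic 45
```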
dataset_reuters(
  path = "reuters.npz",
  num_words = NULL,
  skip_top = 0L,
  maxlen = NULL,
  test_split = 0.2,
  seed = 113L,
  start_char = 1L,
  oov_char = 2L,
  index_from = 3L
)

dataset_reuters_word_index(path = "reuters_word_index.pkl")
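A base-R sketch of how test_split divides the 11,228 wires, using a hypothetical split_wires() helper. It assumes the loader shuffles the wires using seed and then keeps the first floor(n * (1 - test_split)) of them for training. R's sample() does not reproduce the loader's own shuffle, so only the set sizes carry over, not which wires land in each set.

```r
# Hypothetical helper: shuffle n wire positions, then split off the test fraction.
split_wires <- function(n, test_split = 0.2, seed = 113L) {
  set.seed(seed)                       # seeds R's RNG, not the loader's
  idx <- sample.int(n)                 # random permutation of 1..n
  n_train <- as.integer(floor(n * (1 - test_split)))
  list(train = idx[seq_len(n_train)],  # first part of the shuffle
       test  = idx[-seq_len(n_train)]) # remainder
}

s <- split_wires(11228)
lengths(s)  # train 8982, test 2246
```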
dataset_reuters() returns lists of training and test data (train$x, train$y, test$x, test$y) in the same format as dataset_imdb(). dataset_reuters_word_index() returns a list whose names are words and whose values are integers; for example, word_index[["giraffe"]] might return 1234.
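To turn an encoded wire back into text, invert the word index and undo the index_from offset. The base-R sketch below uses a hypothetical toy word_index in place of the real dataset_reuters_word_index() result and a hypothetical decode_wire() helper. It assumes the default start_char = 1, oov_char = 2 and index_from = 3, so the reserved indexes 0 to 2 print as "?".

```r
# Hypothetical toy word index standing in for dataset_reuters_word_index():
# names are words, values are integer ranks.
word_index <- list(the = 1L, of = 2L, said = 3L, shares = 4L, profit = 5L)

# Invert it: a character vector of words, named by their index.
reverse_word_index <- setNames(names(word_index), unlist(word_index))

# Hypothetical helper: map each index back to a word, shifting by index_from.
decode_wire <- function(wire, index_from = 3L) {
  words <- vapply(wire, function(index) {
    if (index < index_from) return("?")   # padding, start or OOV marker
    word <- reverse_word_index[as.character(index - index_from)]
    if (is.na(word)) "?" else unname(word)
  }, character(1))
  paste(words, collapse = " ")
}

# Hypothetical encoded wire: start marker, three real words, one OOV word.
decode_wire(c(1L, 8L, 5L, 7L, 2L))  # "? profit of shares ?"
```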
Arguments:
path: Where to cache the data (relative to ~/.keras/datasets).
num_words: Maximum number of words to include. Words are ranked by how often they occur (in the training set), and only the most frequent words are kept.
skip_top: Skip the top N most frequently occurring words (which may not be informative).
maxlen: Truncate sequences after this length.
test_split: Fraction of the dataset to be used as test data.
seed: Random seed for sample shuffling.
start_char: The start of a sequence will be marked with this character. Set to 1 because 0 is usually the padding character.
oov_char: Words that were cut out because of the num_words or skip_top limit will be replaced with this character.
index_from: Index actual words with this index and higher.
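A base-R sketch of how start_char, index_from, num_words, skip_top and oov_char are assumed to combine on a single wire, using a hypothetical encode_wire() helper. It assumes raw words arrive as 1-based frequency ranks (the values of the word index) and, as in the Python Keras loader, that the num_words and skip_top bounds are checked against the already shifted indexes. It is an illustration, not the package's implementation, and it ignores maxlen.

```r
# Hypothetical helper: encode one wire of 1-based frequency ranks.
encode_wire <- function(ranks, num_words = NULL, skip_top = 0L,
                        start_char = 1L, oov_char = 2L, index_from = 3L) {
  # Shift real words past the reserved indexes and prepend the start marker.
  x <- c(start_char, ranks + index_from)
  # Indexes outside [skip_top, num_words) are replaced by the OOV marker.
  upper <- if (is.null(num_words)) Inf else num_words
  x[x < skip_top | x >= upper] <- oov_char
  x
}

encode_wire(c(1L, 5L, 40L))                   # 1 4 8 43
encode_wire(c(1L, 5L, 40L), num_words = 20L)  # 1 4 8 2
```

With num_words = 20L, the rare word (rank 40, shifted to 43) falls outside the limit and becomes oov_char.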
Other datasets: dataset_boston_housing(), dataset_cifar100(), dataset_cifar10(), dataset_fashion_mnist(), dataset_imdb(), dataset_mnist()