{"Package":"kgrams","Title":"Classical k-gram Language Models","Version":"0.1.0","Authors@R":"\nperson(given = \"Valerio\",\nfamily = \"Gherardi\",\nrole = c(\"aut\", \"cre\"),\nemail = \"vgherard@sissa.it\",\ncomment = c(ORCID = \"0000-0002-8215-3013\"))","Description":"\nTools for training and evaluating k-gram language models in R,\nsupporting several probability smoothing techniques,\nperplexity computations, random text generation and more.","License":"GPL (>= 3)","Encoding":"UTF-8","LazyData":"true","RoxygenNote":"7.1.1","SystemRequirements":"C++11","LinkingTo":"Rcpp, RcppProgress","Imports":"Rcpp, rlang, methods, utils, RcppProgress (>= 0.1), Rdpack","Depends":"R (>= 3.5)","Suggests":"testthat (>= 3.0.0), covr, knitr, rmarkdown","Config/testthat/edition":"3","RdMacros":"Rdpack","VignetteBuilder":"knitr","NeedsCompilation":"yes","Packaged":"2021-02-11 11:31:26 UTC; vale","Author":"Valerio Gherardi [aut, cre] ()","Maintainer":"Valerio Gherardi ","Repository":"CRAN","Date/Publication":"2021-02-15 09:40:03 UTC","repoType":"cran","tarballUrl":"ftp://cran.r-project.org/pub/R/src/contrib/kgrams_0.1.0.tar.gz","jsonAuthors":[{"name":"Valerio Gherardi","email":"vgherard@sissa.it","maintainer":true}],"readme":"\n\n\n# kgrams\n\n\n\n[![Lifecycle:\nexperimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html)\n[![CircleCI build\nstatus](https://circleci.com/gh/vgherard/kgrams.svg?style=svg)](https://circleci.com/gh/vgherard/kgrams)\n[![AppVeyor build\nstatus](https://ci.appveyor.com/api/projects/status/github/vgherard/kgrams?branch=main&svg=true)](https://ci.appveyor.com/project/vgherard/kgrams)\n[![R-CMD-check](https://github.com/vgherard/kgrams/workflows/R-CMD-check/badge.svg)](https://github.com/vgherard/kgrams/actions)\n[![Codecov test\ncoverage](https://codecov.io/gh/vgherard/kgrams/branch/main/graph/badge.svg)](https://codecov.io/gh/vgherard/kgrams?branch=main)\n\n\n`kgrams` provides tools for 
training and evaluating \\(k\\)-gram language\nmodels, including several probability smoothing methods, perplexity\ncomputations, random text generation and more. It pairs a fast C++\nbackend (which can itself be used as a standalone library for \\(k\\)-gram\nbased NLP) with an accessible R API that aims to streamline the process\nof model building. This makes `kgrams` suitable for small- and\nmedium-sized NLP experiments, baseline model building, and pedagogical\npurposes.\n\n## Installation\n\nYou can install the development version from\n[GitHub](https://github.com/) with:\n\n``` r\n# install.packages(\"devtools\")\ndevtools::install_github(\"vgherard/kgrams\")\n```\n\n## Example\n\nThis example shows how to train a modified Kneser-Ney 4-gram model on\nShakespeare’s “Much Ado About Nothing” using `kgrams`.\n\n``` r\nlibrary(kgrams)\n# Get k-gram frequency counts from text, for k = 1:4\nfreqs <- kgram_freqs(kgrams::much_ado, N = 4)\n# Build a modified Kneser-Ney 4-gram model, with discount parameters D1, D2, D3\nmkn <- language_model(freqs, smoother = \"mkn\", D1 = 0.25, D2 = 0.5, D3 = 0.75)\n```\n\nWe can now use this `language_model` to compute sentence and word\ncontinuation probabilities:\n\n``` r\n# Compute sentence probabilities\nprobability(c(\"did he break out into tears ?\",\n              \"we are predicting sentence probabilities .\"),\n            model = mkn)\n#> [1] 2.466856e-04 1.184963e-20\n# Compute word continuation probabilities\nprobability(c(\"tears\", \"pieces\") %|% \"did he break out into\", model = mkn)\n#> [1] 9.389238e-01 3.834498e-07\n```\n\nHere are some sentences sampled from the language model’s distribution\nat temperatures `t = c(1, 0.1, 10)`:\n\n``` r\n# Sample sentences at three different temperatures\nset.seed(840)\nsample_sentences(model = mkn, n = 3, max_length = 10, t = 1)\n#> [1] \"i have studied eight or nine truly by your office [...] 
(truncated output)\"\n#> [2] \"ere you go : \" \n#> [3] \"don pedro welcome signior : \"\nsample_sentences(model = mkn, n = 3, max_length = 10, t = 0.1)\n#> [1] \"i will not be sworn but love may transform me [...] (truncated output)\" \n#> [2] \"i will not fail . \" \n#> [3] \"i will go to benedick and counsel him to fight [...] (truncated output)\"\nsample_sentences(model = mkn, n = 3, max_length = 10, t = 10)\n#> [1] \"july cham's incite start ancientry effect torture tore pains endings [...] (truncated output)\" \n#> [2] \"lastly gallants happiness publish margaret what by spots commodity wake [...] (truncated output)\"\n#> [3] \"born all's 'fool' nest praise hurt messina build afar dancing [...] (truncated output)\"\n```\n\n## Getting Help\n\nFor further help, you can consult the reference page of the `kgrams`\n[website](https://vgherard.github.io/kgrams/) or [open an\nissue](https://github.com/vgherard/kgrams/issues) on the GitHub\nrepository of `kgrams`. The website also hosts a vignette that\nillustrates the process of building language models in depth.\n\n## Development\n\nThis project is at an early stage of development: thorough tests of the\nalgorithms and unit tests still need to be implemented, many\ncomputations leave room for optimization, the API may change,\n*etc.* If you would like to contribute to `kgrams`, here is some useful\ninformation.\n\nDevelopment of `kgrams` takes place on its [GitHub\nrepository](https://github.com/vgherard/kgrams/). If you find a bug,\nplease let me know by [opening an\nissue](https://github.com/vgherard/kgrams/issues), and if you have any\nideas or proposals for improvement, please feel free to [send a pull\nrequest](https://github.com/vgherard/kgrams/pulls), or simply an e-mail\nat .","jobInfo":{"package":"kgrams","version":"0.1.0","parsingStatus":"success","parserVersion":1,"parsedAt":"2021-11-16T10:31:00+0000"}}
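The README lists perplexity computations among the features of `kgrams` without defining the quantity. Under the standard definition, the perplexity of a model on a text of `N` predicted tokens is `exp(-(1/N) * sum(log p(w_i | context)))`, i.e. the inverse geometric mean of the per-token conditional probabilities (lower is better). The base-R sketch below illustrates that definition only: `perplexity_from_probs()` is a hypothetical helper, not part of `kgrams` (whose own perplexity tools are documented on the package reference page), and the probabilities are made up.

``` r
# Hypothetical helper, NOT part of kgrams: perplexity from the conditional
# probabilities p(w_i | context) of the N tokens of a text.
perplexity_from_probs <- function(p) {
  stopifnot(length(p) > 0, all(p > 0))
  exp(-mean(log(p)))  # equals prod(p)^(-1/N), computed in log space for stability
}

# Made-up per-token probabilities for a four-token sentence
p_tokens <- c(0.2, 0.05, 0.5, 0.1)
perplexity_from_probs(p_tokens)  # about 6.69

# Sanity check: a model that is uniform over V = 50 words has perplexity 50
perplexity_from_probs(rep(1 / 50, 10))
```

Working with `mean(log(p))` rather than `prod(p)` avoids underflow on long texts, where the product of many small probabilities (compare the `1.184963e-20` sentence probability above) quickly approaches the limits of double precision.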
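The README samples sentences at `t = c(1, 0.1, 10)` but does not say what the temperature does. A common definition, assumed here (the sampler in `kgrams` may differ in detail), rescales each next-word distribution so that the tempered probability is proportional to `p(w)^(1/t)`: `t < 1` concentrates mass on the most likely words, while `t > 1` flattens the distribution toward uniform. This matches the qualitative behaviour of the samples above (repetitive at `t = 0.1`, near-random word salad at `t = 10`). The helper `temper()` is hypothetical and the toy distribution is made up.

``` r
# Hypothetical helper, NOT part of kgrams: temper a probability vector so that
# the result is proportional to p^(1/t) (assumed definition of temperature).
temper <- function(p, t) {
  stopifnot(all(p >= 0), t > 0)
  w <- p^(1 / t)
  w / sum(w)  # renormalize so the tempered probabilities sum to 1
}

# Made-up next-word distribution, for illustration only
p <- c(tears = 0.7, pieces = 0.2, out = 0.1)

round(temper(p, 1), 3)    # t = 1 leaves the distribution unchanged
round(temper(p, 0.1), 3)  # t < 1: almost all mass on "tears"
round(temper(p, 10), 3)   # t > 1: close to uniform

# Draw five next words from the sharpened distribution
set.seed(840)
sample(names(p), size = 5, replace = TRUE, prob = temper(p, 0.1))
```

Sampling a whole sentence under this scheme would repeat the tempered draw word by word, conditioning each step on the previous `N - 1` words, until an end-of-sentence token or `max_length` is reached.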
