Ready-made models for 65 languages, trained on 101 treebanks from https://universaldependencies.org/, are available.
Some of these models were provided by the UDPipe community. Other models were built using this R package.
You can either download these models manually and use them for annotation,
or use udpipe_download_model
to download the model for a specific language of your choice. The available options are:
udpipe_download_model(
language = c("afrikaans-afribooms", "ancient_greek-perseus", "ancient_greek-proiel",
"arabic-padt", "armenian-armtdp", "basque-bdt", "belarusian-hse", "bulgarian-btb",
"buryat-bdt", "catalan-ancora", "chinese-gsd", "chinese-gsdsimp",
"classical_chinese-kyoto", "coptic-scriptorium", "croatian-set", "czech-cac",
"czech-cltt", "czech-fictree", "czech-pdt", "danish-ddt", "dutch-alpino",
"dutch-lassysmall", "english-ewt", "english-gum", "english-lines", "english-partut",
"estonian-edt", "estonian-ewt", "finnish-ftb", "finnish-tdt", "french-gsd",
"french-partut", "french-sequoia", "french-spoken", "galician-ctg",
"galician-treegal", "german-gsd", "german-hdt", "gothic-proiel", "greek-gdt",
"hebrew-htb", "hindi-hdtb", "hungarian-szeged", "indonesian-gsd", "irish-idt",
"italian-isdt", "italian-partut", "italian-postwita", "italian-twittiro",
"italian-vit", "japanese-gsd", "kazakh-ktb", "korean-gsd", "korean-kaist",
"kurmanji-mg", "latin-ittb", "latin-perseus", "latin-proiel", "latvian-lvtb",
"lithuanian-alksnis", "lithuanian-hse", "maltese-mudt", "marathi-ufal",
"north_sami-giella", "norwegian-bokmaal", "norwegian-nynorsk",
"norwegian-nynorsklia", "old_church_slavonic-proiel", "old_french-srcmf",
"old_russian-torot", "persian-seraji", "polish-lfg", "polish-pdb", "polish-sz",
"portuguese-bosque", "portuguese-br", "portuguese-gsd", "romanian-nonstandard",
"romanian-rrt", "russian-gsd", "russian-syntagrus", "russian-taiga", "sanskrit-ufal",
"scottish_gaelic-arcosg", "serbian-set", "slovak-snk", "slovenian-ssj",
"slovenian-sst", "spanish-ancora", "spanish-gsd", "swedish-lines",
"swedish-talbanken", "tamil-ttb", "telugu-mtg", "turkish-imst", "ukrainian-iu",
"upper_sorbian-ufal", "urdu-udtb", "uyghur-udt", "vietnamese-vtb", "wolof-wtb"),
model_dir = getwd(),
udpipe_model_repo = c("jwijffels/udpipe.models.ud.2.5",
"jwijffels/udpipe.models.ud.2.4", "jwijffels/udpipe.models.ud.2.3",
"jwijffels/udpipe.models.ud.2.0", "jwijffels/udpipe.models.conll18.baseline",
"bnosac/udpipe.models.ud"),
overwrite = TRUE,
...
)
A data.frame with 1 row and the following columns:
language: The language as provided by the input parameter language
file_model: The path to the file on disk where the model was downloaded to
url: The URL where the model was downloaded from
download_failed: A logical indicating whether the download failed, e.g. because of internet connectivity issues
download_message: A character string with the error message in case the download of the model failed
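The returned data.frame can be used to guard the next step of a pipeline. This is a minimal sketch, assuming the udpipe package is installed. The helper load_downloaded_model() is hypothetical and not part of udpipe. It checks the download_failed column and only passes file_model to udpipe_load_model() when the download succeeded.

```r
# Hypothetical helper (not part of the udpipe package): load the model
# described by the 1-row data.frame returned by udpipe_download_model(),
# but only if that data.frame reports a successful download.
load_downloaded_model <- function(info) {
  stopifnot(is.data.frame(info), nrow(info) == 1)
  if (isTRUE(info$download_failed)) {
    # download_message holds the error text when the download failed
    warning(sprintf("Download of '%s' failed: %s",
                    as.character(info$language),
                    as.character(info$download_message)))
    return(NULL)
  }
  # file_model is the path on disk where the model was saved
  udpipe::udpipe_load_model(file = as.character(info$file_model))
}

# Usage (requires internet access and the udpipe package):
# info  <- udpipe::udpipe_download_model(language = "dutch-alpino",
#                                        model_dir = tempdir())
# model <- load_downloaded_model(info)
```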
language: a character string with the Universal Dependencies treebank that was used to build the model. Possible values are:
afrikaans-afribooms, ancient_greek-perseus, ancient_greek-proiel, arabic-padt, armenian-armtdp, basque-bdt, belarusian-hse,
bulgarian-btb, buryat-bdt, catalan-ancora, chinese-gsd, chinese-gsdsimp, classical_chinese-kyoto, coptic-scriptorium, croatian-set, czech-cac, czech-cltt,
czech-fictree, czech-pdt, danish-ddt, dutch-alpino, dutch-lassysmall, english-ewt, english-gum, english-lines,
english-partut, estonian-edt, estonian-ewt, finnish-ftb, finnish-tdt, french-gsd, french-partut, french-sequoia, french-spoken,
galician-ctg, galician-treegal, german-gsd, german-hdt, gothic-proiel, greek-gdt, hebrew-htb, hindi-hdtb, hungarian-szeged,
indonesian-gsd, irish-idt, italian-isdt, italian-partut, italian-postwita, italian-twittiro, italian-vit, japanese-gsd, kazakh-ktb, korean-gsd,
korean-kaist, kurmanji-mg, latin-ittb, latin-perseus, latin-proiel, latvian-lvtb, lithuanian-alksnis, lithuanian-hse, maltese-mudt,
marathi-ufal, north_sami-giella, norwegian-bokmaal, norwegian-nynorsk, norwegian-nynorsklia, old_church_slavonic-proiel,
old_french-srcmf, old_russian-torot, persian-seraji, polish-lfg, polish-pdb, polish-sz, portuguese-bosque, portuguese-br, portuguese-gsd,
romanian-nonstandard, romanian-rrt, russian-gsd, russian-syntagrus, russian-taiga, sanskrit-ufal, scottish_gaelic-arcosg, serbian-set,
slovak-snk, slovenian-ssj, slovenian-sst, spanish-ancora, spanish-gsd, swedish-lines, swedish-talbanken, tamil-ttb,
telugu-mtg, turkish-imst, ukrainian-iu, upper_sorbian-ufal, urdu-udtb, uyghur-udt, vietnamese-vtb, wolof-wtb
Each language should have a treebank extension (e.g. english-ewt, russian-syntagrus, dutch-alpino, ...). If you do not provide a treebank extension (e.g. only english, russian or dutch), the function uses the default treebank of that language as used in Universal Dependencies up to version 2.1.
model_dir: a path where the model will be downloaded to. Defaults to the current working directory.
udpipe_model_repo: location where the models will be downloaded from.
Either 'jwijffels/udpipe.models.ud.2.5', 'jwijffels/udpipe.models.ud.2.4', 'jwijffels/udpipe.models.ud.2.3', 'jwijffels/udpipe.models.ud.2.0', 'jwijffels/udpipe.models.conll18.baseline' or 'bnosac/udpipe.models.ud'.
Defaults to 'jwijffels/udpipe.models.ud.2.5'.
'bnosac/udpipe.models.ud' contains models constructed on Universal Dependencies 2.1 data, mainly released under the CC-BY-SA license, with some models released under the GPL-3 and LGPL-LR licenses
'jwijffels/udpipe.models.ud.2.5' contains models released under the CC-BY-NC-SA license constructed on Universal Dependencies 2.5 data
'jwijffels/udpipe.models.ud.2.4' contains models released under the CC-BY-NC-SA license constructed on Universal Dependencies 2.4 data
'jwijffels/udpipe.models.ud.2.3' contains models released under the CC-BY-NC-SA license constructed on Universal Dependencies 2.3 data
'jwijffels/udpipe.models.ud.2.0' contains models released under the CC-BY-NC-SA license constructed on Universal Dependencies 2.0 data
'jwijffels/udpipe.models.conll18.baseline' contains models released under the CC-BY-NC-SA license constructed on Universal Dependencies 2.2 data for the CoNLL 2018 shared task
See the Details section for further information on which languages are available in each of these repositories.
overwrite: logical indicating whether to overwrite the file if it was already downloaded. Defaults to TRUE,
meaning the model is downloaded and an existing file is overwritten. If set to FALSE,
the model is only downloaded if it does not yet exist on disk in the model_dir folder.
...: currently not used
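As a complement to overwrite = FALSE, the sketch below skips the call to udpipe_download_model() entirely when a matching model file is already present in model_dir. The helper find_or_download_model() is hypothetical and not part of udpipe. It also assumes, without confirmation from this page, that downloaded files are named <language>-<...>.udpipe, for example dutch-alpino-ud-2.5-191206.udpipe.

```r
# Hypothetical helper (not part of udpipe): return the path of a model that
# is already present in model_dir, and only call udpipe_download_model()
# when no matching file is found.
# ASSUMPTION: downloaded files are named "<language>-<...>.udpipe",
# e.g. "dutch-alpino-ud-2.5-191206.udpipe".
find_or_download_model <- function(language, model_dir = getwd()) {
  pattern <- paste0("^", language, "-.*\\.udpipe$")
  existing <- list.files(model_dir, pattern = pattern, full.names = TRUE)
  if (length(existing) > 0) {
    return(existing[1])  # reuse the first matching file on disk
  }
  # Nothing on disk yet: download, without overwriting anything
  info <- udpipe::udpipe_download_model(language = language,
                                        model_dir = model_dir,
                                        overwrite = FALSE)
  if (isTRUE(info$download_failed)) {
    stop("Download failed: ", info$download_message)
  }
  info$file_model
}
```

With a bare language name such as "dutch", the pattern matches any Dutch treebank already on disk, so full language-treebank names such as "dutch-alpino" are the safer input.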
Depending on the value of the udpipe_model_repo argument, the function can download the following language models:
'jwijffels/udpipe.models.ud.2.5': https://github.com/jwijffels/udpipe.models.ud.2.5
UDPipe models constructed on data from Universal Dependencies 2.5
languages-treebanks: afrikaans-afribooms, ancient_greek-perseus, ancient_greek-proiel, arabic-padt, armenian-armtdp, basque-bdt, belarusian-hse, bulgarian-btb, catalan-ancora, chinese-gsd, chinese-gsdsimp, classical_chinese-kyoto, coptic-scriptorium, croatian-set, czech-cac, czech-cltt, czech-fictree, czech-pdt, danish-ddt, dutch-alpino, dutch-lassysmall, english-ewt, english-gum, english-lines, english-partut, estonian-edt, estonian-ewt, finnish-ftb, finnish-tdt, french-gsd, french-partut, french-sequoia, french-spoken, galician-ctg, galician-treegal, german-gsd, german-hdt, gothic-proiel, greek-gdt, hebrew-htb, hindi-hdtb, hungarian-szeged, indonesian-gsd, irish-idt, italian-isdt, italian-partut, italian-postwita, italian-twittiro, italian-vit, japanese-gsd, korean-gsd, korean-kaist, latin-ittb, latin-perseus, latin-proiel, latvian-lvtb, lithuanian-alksnis, lithuanian-hse, maltese-mudt, marathi-ufal, north_sami-giella, norwegian-bokmaal, norwegian-nynorsk, norwegian-nynorsklia, old_church_slavonic-proiel, old_french-srcmf, old_russian-torot, persian-seraji, polish-lfg, polish-pdb, portuguese-bosque, portuguese-gsd, romanian-nonstandard, romanian-rrt, russian-gsd, russian-syntagrus, russian-taiga, scottish_gaelic-arcosg, serbian-set, slovak-snk, slovenian-ssj, slovenian-sst, spanish-ancora, spanish-gsd, swedish-lines, swedish-talbanken, tamil-ttb, telugu-mtg, turkish-imst, ukrainian-iu, urdu-udtb, uyghur-udt, vietnamese-vtb, wolof-wtb
license: CC-BY-NC-SA
'jwijffels/udpipe.models.ud.2.4': https://github.com/jwijffels/udpipe.models.ud.2.4
UDPipe models constructed on data from Universal Dependencies 2.4
languages-treebanks: afrikaans-afribooms, ancient_greek-perseus, ancient_greek-proiel, arabic-padt, armenian-armtdp, basque-bdt, belarusian-hse, bulgarian-btb, catalan-ancora, chinese-gsd, classical_chinese-kyoto, coptic-scriptorium, croatian-set, czech-cac, czech-cltt, czech-fictree, czech-pdt, danish-ddt, dutch-alpino, dutch-lassysmall, english-ewt, english-gum, english-lines, english-partut, estonian-edt, estonian-ewt, finnish-ftb, finnish-tdt, french-gsd, french-partut, french-sequoia, french-spoken, galician-ctg, galician-treegal, german-gsd, gothic-proiel, greek-gdt, hebrew-htb, hindi-hdtb, hungarian-szeged, indonesian-gsd, irish-idt, italian-isdt, italian-partut, italian-postwita, italian-vit, japanese-gsd, korean-gsd, korean-kaist, latin-ittb, latin-perseus, latin-proiel, latvian-lvtb, lithuanian-alksnis, lithuanian-hse, maltese-mudt, marathi-ufal, north_sami-giella, norwegian-bokmaal, norwegian-nynorsk, norwegian-nynorsklia, old_church_slavonic-proiel, old_french-srcmf, old_russian-torot, persian-seraji, polish-lfg, polish-pdb, portuguese-bosque, portuguese-gsd, romanian-nonstandard, romanian-rrt, russian-gsd, russian-syntagrus, russian-taiga, serbian-set, slovak-snk, slovenian-ssj, slovenian-sst, spanish-ancora, spanish-gsd, swedish-lines, swedish-talbanken, tamil-ttb, telugu-mtg, turkish-imst, ukrainian-iu, urdu-udtb, uyghur-udt, vietnamese-vtb, wolof-wtb
license: CC-BY-NC-SA
'jwijffels/udpipe.models.ud.2.3': https://github.com/jwijffels/udpipe.models.ud.2.3
UDPipe models constructed on data from Universal Dependencies 2.3
languages-treebanks: afrikaans-afribooms, ancient_greek-perseus, ancient_greek-proiel, arabic-padt, armenian-armtdp, basque-bdt, belarusian-hse, bulgarian-btb, catalan-ancora, chinese-gsd, coptic-scriptorium, croatian-set, czech-cac, czech-cltt, czech-fictree, czech-pdt, danish-ddt, dutch-alpino, dutch-lassysmall, english-ewt, english-gum, english-lines, english-partut, estonian-edt, finnish-ftb, finnish-tdt, french-gsd, french-partut, french-sequoia, french-spoken, galician-ctg, galician-treegal, german-gsd, gothic-proiel, greek-gdt, hebrew-htb, hindi-hdtb, hungarian-szeged, indonesian-gsd, irish-idt, italian-isdt, italian-partut, italian-postwita, japanese-gsd, korean-gsd, korean-kaist, latin-ittb, latin-perseus, latin-proiel, latvian-lvtb, lithuanian-hse, maltese-mudt, marathi-ufal, north_sami-giella, norwegian-bokmaal, norwegian-nynorsk, norwegian-nynorsklia, old_church_slavonic-proiel, old_french-srcmf, persian-seraji, polish-lfg, polish-sz, portuguese-bosque, portuguese-gsd, romanian-nonstandard, romanian-rrt, russian-gsd, russian-syntagrus, russian-taiga, serbian-set, slovak-snk, slovenian-ssj, slovenian-sst, spanish-ancora, spanish-gsd, swedish-lines, swedish-talbanken, tamil-ttb, telugu-mtg, turkish-imst, ukrainian-iu, urdu-udtb, uyghur-udt, vietnamese-vtb
license: CC-BY-NC-SA
'jwijffels/udpipe.models.ud.2.0': https://github.com/jwijffels/udpipe.models.ud.2.0
UDPipe models constructed on data from Universal Dependencies 2.0
languages-treebanks: ancient_greek-proiel, ancient_greek, arabic, basque, belarusian, bulgarian, catalan, chinese, coptic, croatian, czech-cac, czech-cltt, czech, danish, dutch-lassysmall, dutch, english-lines, english-partut, english, estonian, finnish-ftb, finnish, french-partut, french-sequoia, french, galician-treegal, galician, german, gothic, greek, hebrew, hindi, hungarian, indonesian, irish, italian, japanese, kazakh, korean, latin-ittb, latin-proiel, latin, latvian, lithuanian, norwegian-bokmaal, norwegian-nynorsk, old_church_slavonic, persian, polish, portuguese-br, portuguese, romanian, russian-syntagrus, russian, sanskrit, slovak, slovenian-sst, slovenian, spanish-ancora, spanish, swedish-lines, swedish, tamil, turkish, ukrainian, urdu, uyghur, vietnamese
license: CC-BY-NC-SA
'jwijffels/udpipe.models.conll18.baseline': https://github.com/jwijffels/udpipe.models.conll18.baseline
UDPipe models constructed on data from Universal Dependencies 2.2
languages-treebanks: afrikaans-afribooms, ancient_greek-perseus, ancient_greek-proiel, arabic-padt, armenian-armtdp, basque-bdt, bulgarian-btb, buryat-bdt, catalan-ancora, chinese-gsd, croatian-set, czech-cac, czech-fictree, czech-pdt, danish-ddt, dutch-alpino, dutch-lassysmall, english-ewt, english-gum, english-lines, estonian-edt, finnish-ftb, finnish-tdt, french-gsd, french-sequoia, french-spoken, galician-ctg, galician-treegal, german-gsd, gothic-proiel, greek-gdt, hebrew-htb, hindi-hdtb, hungarian-szeged, indonesian-gsd, irish-idt, italian-isdt, italian-postwita, japanese-gsd, kazakh-ktb, korean-gsd, korean-kaist, kurmanji-mg, latin-ittb, latin-perseus, latin-proiel, latvian-lvtb, mixed, north_sami-giella, norwegian-bokmaal, norwegian-nynorsk, norwegian-nynorsklia, old_church_slavonic-proiel, old_french-srcmf, persian-seraji, polish-lfg, polish-sz, portuguese-bosque, romanian-rrt, russian-syntagrus, russian-taiga, serbian-set, slovak-snk, slovenian-ssj, slovenian-sst, spanish-ancora, swedish-lines, swedish-talbanken, turkish-imst, ukrainian-iu, upper_sorbian-ufal, urdu-udtb, uyghur-udt, vietnamese-vtb
license: CC-BY-NC-SA
'bnosac/udpipe.models.ud': https://github.com/bnosac/udpipe.models.ud
UDPipe models constructed on data from Universal Dependencies 2.1
This repository contains models built with this R package on open data from Universal Dependencies 2.1 that allows commercial usage. The license of these models is mostly CC-BY-SA. Visit that GitHub repository for details on the license of the language of your choice, and contact www.bnosac.be if you need support on these models or require models tuned to your needs.
languages-treebanks: afrikaans, croatian, czech-cac, dutch, english, finnish, french-sequoia, irish, norwegian-bokmaal, persian, polish, portuguese, romanian, serbian, slovak, spanish-ancora, swedish
license: treebank-specific, but mainly CC-BY-SA, GPL-3 and LGPL-LR
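The repository and license information above can be kept as a small lookup table when choosing udpipe_model_repo. This is a minimal sketch: the table udpipe_repos is hypothetical and transcribed by hand from this help page, not retrieved from GitHub. The commercial_use column is only a rough filter on the license string (an NC license means non-commercial), so always check the license of the specific model you download.

```r
# Hypothetical lookup table, transcribed from this help page.
udpipe_repos <- data.frame(
  repo = c("jwijffels/udpipe.models.ud.2.5",
           "jwijffels/udpipe.models.ud.2.4",
           "jwijffels/udpipe.models.ud.2.3",
           "jwijffels/udpipe.models.ud.2.0",
           "jwijffels/udpipe.models.conll18.baseline",
           "bnosac/udpipe.models.ud"),
  ud_version = c("2.5", "2.4", "2.3", "2.0", "2.2", "2.1"),
  license = c(rep("CC-BY-NC-SA", 5),
              "mainly CC-BY-SA, some GPL-3 / LGPL-LR"),
  stringsAsFactors = FALSE
)

# Rough filter: flag repositories whose license string has no "NC"
# (non-commercial) clause.
udpipe_repos$commercial_use <- !grepl("NC", udpipe_repos$license, fixed = TRUE)
```

For example, udpipe_repos$repo[udpipe_repos$commercial_use] returns "bnosac/udpipe.models.ud", which can then be passed as udpipe_model_repo.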
If you need to train models yourself for commercial purposes, or if you want to improve the models, you can do this with udpipe_train,
which is explained in detail in the package vignette.
Note that when you download these models, you must comply with the license of your specific language model.
https://ufal.mff.cuni.cz/udpipe, https://github.com/jwijffels/udpipe.models.ud.2.5, https://github.com/jwijffels/udpipe.models.ud.2.4, https://github.com/jwijffels/udpipe.models.ud.2.3, https://github.com/jwijffels/udpipe.models.conll18.baseline, https://github.com/jwijffels/udpipe.models.ud.2.0, https://github.com/bnosac/udpipe.models.ud
See also: udpipe_load_model
if (FALSE) {
x <- udpipe_download_model(language = "dutch-alpino")
x <- udpipe_download_model(language = "dutch-lassysmall")
x <- udpipe_download_model(language = "russian")
x <- udpipe_download_model(language = "french")
x <- udpipe_download_model(language = "english-partut")
x <- udpipe_download_model(language = "english-ewt")
x <- udpipe_download_model(language = "german-gsd")
x <- udpipe_download_model(language = "spanish-gsd")
x <- udpipe_download_model(language = "spanish-gsd", overwrite = FALSE)
x <- udpipe_download_model(language = "dutch-alpino",
udpipe_model_repo = "jwijffels/udpipe.models.ud.2.5")
x <- udpipe_download_model(language = "dutch-alpino",
udpipe_model_repo = "jwijffels/udpipe.models.ud.2.4")
x <- udpipe_download_model(language = "dutch-alpino",
udpipe_model_repo = "jwijffels/udpipe.models.ud.2.3")
x <- udpipe_download_model(language = "dutch-alpino",
udpipe_model_repo = "jwijffels/udpipe.models.ud.2.0")
x <- udpipe_download_model(language = "english", udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "dutch", udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "afrikaans", udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "spanish-ancora",
udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "dutch-ud-2.1-20180111.udpipe",
udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "english",
udpipe_model_repo = "jwijffels/udpipe.models.conll18.baseline")
}
x <- udpipe_download_model(language = "sanskrit",
udpipe_model_repo = "jwijffels/udpipe.models.ud.2.0",
model_dir = tempdir())
x
## cleanup for CRAN
if(file.exists(x$file_model)) file.remove(x$file_model)