International Organization for Standardization (ISO) codes for the representation of languages. The standard consists of four parts, with more parts in progress. ISO 639-1 consists of 185 two-letter (alpha-2) codes used to identify the world's major languages. ISO 639-2 has three-letter (alpha-3) codes for 485 languages. ISO 639-3 extends the ISO 639-2 alpha-3 codes with the aim of covering all known natural languages. ISO 639-5 defines alpha-3 codes for language families.
ISO_639_2
ISO_639_3
ISO_639_3_Retirements
ISO_639_5
ISO_639_2 is a character data frame with variables Alpha_3_B and Alpha_3_T (the ISO 639-2 bibliographic and terminological codes), Alpha_2 (the corresponding ISO 639-1 alpha-2 code, if available), and Name (the English name of the language).
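A language can be reached through its B-code, its T-code, or its alpha-2 code, so a common task is to look rows up by any of the three. The following Python sketch assumes the table has been loaded into a list of records with the four fields named above. The class and function names (Iso6392Row, build_index, lookup_name, to_alpha_2) are hypothetical, and the rows are a small hand-picked excerpt. The sketch also assumes that a language with a single code stores it in both the B and T fields.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Iso6392Row:
    """One row of a hypothetical in-memory copy of ISO_639_2."""
    alpha_3_b: str            # bibliographic code
    alpha_3_t: str            # terminological code (assumed equal to B when only one exists)
    alpha_2: Optional[str]    # ISO 639-1 code, or None if there is none
    name: str                 # English name


# Illustrative excerpt only; the real table has several hundred rows.
ROWS: List[Iso6392Row] = [
    Iso6392Row("eng", "eng", "en", "English"),
    Iso6392Row("fre", "fra", "fr", "French"),
    Iso6392Row("ger", "deu", "de", "German"),
    Iso6392Row("chi", "zho", "zh", "Chinese"),
]


def build_index(rows: List[Iso6392Row]) -> Dict[str, Iso6392Row]:
    """Index each row under its B-code, T-code and (if present) alpha-2 code."""
    index: Dict[str, Iso6392Row] = {}
    for row in rows:
        for code in (row.alpha_3_b, row.alpha_3_t, row.alpha_2):
            if code:
                index[code] = row
    return index


INDEX = build_index(ROWS)


def lookup_name(code: str) -> Optional[str]:
    """Return the English name for any B, T or alpha-2 code, or None if unknown."""
    row = INDEX.get(code.lower())
    return row.name if row else None


def to_alpha_2(code: str) -> Optional[str]:
    """Map a B- or T-code to its ISO 639-1 alpha-2 code, if one exists."""
    row = INDEX.get(code.lower())
    return row.alpha_2 if row else None
```

Indexing all three codes into one dictionary keeps each lookup to a single hash access. This works because the three code spaces do not collide: alpha-2 codes have two letters and B/T codes have three.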
ISO_639_3 is a data frame with the following variables:
Id: a character vector with the ISO 639-3 three-letter (alpha-3) identifiers.
Part2B: a character vector with the equivalent ISO 639-2 B-code identifiers of the bibliographic applications code set (if any).
Part2T: a character vector with the equivalent ISO 639-2 T-code identifiers of the terminology applications code set (if any).
Part1: a character vector with the equivalent ISO 639-1 two-letter (alpha-2) identifiers (if any).
Scope: a factor with levels "I" (Individual), "M" (Macrolanguage), and "S" (Special).
Type: a factor with levels "L" (Living languages), "E" (Extinct languages), "A" (Ancient languages), "H" (Historic languages), "C" (Constructed languages), and "S" (Special).
Name: a character vector with the reference language names.
Comment: a character vector with a comment relating to one or more of the other variables.
Family: a character vector with the generic English names of the languages' family or macrolanguage.
eng: a character vector with the language names in English.
fra: a character vector with the language names in French (if available).
spa: a character vector with the language names in Spanish (if available).
zho: a character vector with the language names in Chinese (if available).
rus: a character vector with the language names in Russian (if available).
deu: a character vector with the language names in German (if available).
The variables Family and eng through deu are extracted from the Wikipedia pages listing ISO 639-3 language codes.
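The Scope and Type factors are often combined to filter the table, for example to keep only living individual languages. The Python sketch below assumes rows reduced to (Id, Scope, Type, Name) tuples. The helper names are hypothetical, and the four rows are a small excerpt chosen to cover several factor levels.

```python
from typing import List, Tuple

# Level labels for the Scope and Type factors of ISO_639_3.
SCOPE = {"I": "Individual", "M": "Macrolanguage", "S": "Special"}
TYPE = {
    "L": "Living", "E": "Extinct", "A": "Ancient",
    "H": "Historic", "C": "Constructed", "S": "Special",
}

# Hypothetical excerpt: (Id, Scope, Type, Name).
Row = Tuple[str, str, str, str]
ROWS: List[Row] = [
    ("eng", "I", "L", "English"),
    ("epo", "I", "C", "Esperanto"),
    ("zho", "M", "L", "Chinese"),
    ("und", "S", "S", "Undetermined"),
]


def describe(row: Row) -> str:
    """Render a row with its Scope and Type codes spelled out."""
    id_, scope, type_, name = row
    return f"{id_}: {name} ({SCOPE[scope]}, {TYPE[type_]})"


def living_individual(rows: List[Row]) -> List[str]:
    """Ids of individual (not macro or special) languages that are living."""
    return [id_ for id_, scope, type_, _ in rows if scope == "I" and type_ == "L"]
```

With this excerpt, living_individual keeps only eng. zho is excluded because it is a macrolanguage, even though it is living.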
ISO_639_3_Retirements is a data frame giving the languages retired from ISO 639-3, with variables:
Id: a character vector with the retired codes.
Ret_Reason: a factor with levels "C" (change), "D" (duplicate), "N" (non-existent), "S" (split), and "M" (merge).
Change_To: a character vector which, for reasons C, D, and M, gives the identifier to which all instances of the Id should be changed.
Ret_Remedy: a character vector with instructions for updating an instance of the retired (split) identifier.
Effective: a Date object giving the date the retirement became effective.
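Change_To is filled only for reasons C, D, and M. A lookup that upgrades old codes can therefore follow it automatically only for those reasons, while split codes need a human decision based on Ret_Remedy. The following Python sketch assumes the table is loaded into a dictionary keyed by Id. The class and function names are hypothetical, and the rows use codes from the local-use range qaa to qtz as placeholders rather than real retirements. Following chains of retirements is an assumption about how one might use the table, not something the table itself specifies.

```python
from datetime import date
from typing import Dict, NamedTuple, Optional


class Retirement(NamedTuple):
    id: str
    reason: str                # one of "C", "D", "N", "S", "M"
    change_to: Optional[str]   # set for C, D and M
    remedy: Optional[str]      # free-text instructions, mainly for S
    effective: date


# Hypothetical rows; local-use codes (qaa-qtz) stand in for real retired codes.
RETIREMENTS: Dict[str, Retirement] = {
    r.id: r
    for r in [
        Retirement("qaa", "M", "qab", None, date(2020, 1, 1)),
        Retirement("qab", "C", "qac", None, date(2022, 1, 1)),
        Retirement("qad", "S", None, "Split into qae and qaf", date(2020, 1, 1)),
        Retirement("qag", "N", None, None, date(2020, 1, 1)),
    ]
}


def resolve(code: str, as_of: date) -> Optional[str]:
    """Follow C/D/M retirements in force on `as_of`.

    Returns the current code, or None when there is no automatic
    replacement (split, non-existent, or a row lacking Change_To).
    """
    seen = set()
    while code in RETIREMENTS and code not in seen:
        seen.add(code)
        ret = RETIREMENTS[code]
        if ret.effective > as_of:
            break                     # retirement not yet in force on that date
        if ret.reason in ("C", "D", "M") and ret.change_to:
            code = ret.change_to      # follow the replacement, possibly through a chain
        else:
            return None               # S (see Ret_Remedy), N, or missing Change_To
    return code
```

Passing the as-of date makes the Effective column matter. With the placeholder rows above, resolve("qaa", date(2021, 1, 1)) stops at "qab", because the second retirement is not yet in force. The same call with a 2023 date continues to "qac".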
ISO_639_5 is a data frame with the following variables:
Id: a character vector with the three-letter (alpha-3) ISO 639-5 identifiers.
English_Name: the family names in English.
French_Name: the family names in French.
Part2: a factor indicating how the family relates to ISO 639-2, with levels "g" (group: consists of several related languages), "r" (rest group: a group of several related languages from which some specific languages have been excluded), and "" (no ISO 639-2 code).
Hierarchy: an indication of which other language families or groups the current family or group is a member of, given as ISO 639-5 identifiers separated by ":".
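A Hierarchy value can be split on its separator to test membership in a larger family. In this Python sketch the function names are hypothetical. The input "ine : gem : gmw" (Indo-European, Germanic, West Germanic) is an illustrative string. Whether real values pad the separator with spaces is an assumption, so the parser strips whitespace either way.

```python
from typing import List


def parse_hierarchy(hierarchy: str) -> List[str]:
    """Split a ':'-separated Hierarchy string into ISO 639-5 ids, ignoring spaces."""
    return [part.strip() for part in hierarchy.split(":") if part.strip()]


def is_member_of(hierarchy: str, family_id: str) -> bool:
    """True if `family_id` appears anywhere in the Hierarchy string."""
    return family_id in parse_hierarchy(hierarchy)
```

For example, parse_hierarchy("ine : gem : gmw") yields ["ine", "gem", "gmw"], so a group with that hierarchy counts as a member of gem.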
While most languages are given one code by the ISO 639-2 standard, twenty-two of the languages described have two three-letter codes, a “bibliographic” code (ISO 639-2/B, B-code), which is derived from the English name for the language and was a necessary legacy feature, and a “terminological” code (ISO 639-2/T, T-code), which is derived from the native name for the language. The range qaa to qtz is reserved for local use.
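The local-use range is contiguous, so a code can be tested against it with a plain string comparison. The Python function below is a hypothetical helper, not part of the data sets.

```python
import re

_ALPHA3 = re.compile(r"[a-z]{3}")


def is_local_use(code: str) -> bool:
    """True if `code` falls in the ISO 639-2 range qaa-qtz reserved for local use."""
    code = code.lower()
    # For three lowercase letters, string order matches alphabetical order.
    return bool(_ALPHA3.fullmatch(code)) and "qaa" <= code <= "qtz"
```

The regular expression rejects inputs of the wrong length or with non-letters, so the lexicographic bounds check only ever sees well-formed alpha-3 strings.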
ISO 639-3 is a superset of ISO 639-1 and of the individual languages in ISO 639-2. ISO 639-1 and ISO 639-2 focused on major languages, most frequently represented in the total body of the world's literature. Since ISO 639-2 also includes language collections, whereas Part 3 does not, ISO 639-3 is not a superset of ISO 639-2. Where B and T codes exist in ISO 639-2, ISO 639-3 uses the T-codes.
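In practice, ISO 639-2 B-codes must be converted to T-codes before they are matched against ISO 639-3 identifiers. A minimal Python sketch follows. It assumes a B-to-T mapping built from the Alpha_3_B and Alpha_3_T columns of ISO_639_2. The excerpt and the function name are hypothetical.

```python
from typing import Dict

# Hypothetical excerpt of ISO 639-2 B-code -> T-code pairs.
B_TO_T: Dict[str, str] = {"fre": "fra", "ger": "deu", "chi": "zho", "dut": "nld"}


def to_639_3_candidate(code_639_2: str) -> str:
    """Replace an ISO 639-2 B-code by its T-code; other codes pass through.

    The result is only a candidate ISO 639-3 Id: ISO 639-2 collective
    codes have no ISO 639-3 counterpart.
    """
    code = code_639_2.lower()
    return B_TO_T.get(code, code)
```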
ISO 639-2 contains codes for individual languages and for language groups, so any code in it appears in either ISO 639-3 or ISO 639-5; conversely, some ISO 639-5 families have no ISO 639-2 code.
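Following that rule, a code taken from ISO 639-2 can be routed to the table that describes it. The Python sketch below assumes two things. First, the input has already been converted to its T-code where a B/T pair exists. Second, the Id columns of ISO_639_3 and ISO_639_5 are available as sets. The set contents are small hypothetical excerpts, and the function name is illustrative.

```python
from typing import Set

# Hypothetical excerpts of the Id columns of ISO_639_3 and ISO_639_5.
IDS_639_3: Set[str] = {"eng", "fra", "deu", "zho"}
IDS_639_5: Set[str] = {"gem", "ine", "sla"}


def classify_639_2(code: str) -> str:
    """Report which table holds an ISO 639-2 code (T-code or single code)."""
    if code in IDS_639_3:
        return "ISO 639-3"
    if code in IDS_639_5:
        return "ISO 639-5"
    return "not found"
```

With the full Id columns loaded, a "not found" result for a genuine ISO 639-2 code would point to a data or normalization problem, such as an unconverted B-code.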