ISOcodes (version 2024.02.12)

ISO_639: ISO 639 Language Codes

Description

International Organization for Standardization (ISO) codes for the representation of languages. The standard consists of four parts, with further parts in progress. ISO 639-1 consists of 185 two-letter (alpha-2) codes used to identify the world's major languages. ISO 639-2 has three-letter (alpha-3) codes for 485 languages. ISO 639-3 extends the ISO 639-2 alpha-3 codes with the aim of covering all known natural languages. ISO 639-5 defines alpha-3 codes for language families.

Usage

ISO_639_2
ISO_639_3
ISO_639_3_Retirements
ISO_639_5

Format

ISO_639_2 is a character data frame with variables Alpha_3_B and Alpha_3_T (the ISO 639-2 bibliographic and terminological codes), Alpha_2 (the corresponding ISO 639-1 alpha-2 code if available), and Name (the English name of the language).
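As a minimal sketch of how this table might be used, the following looks up a language by its ISO 639-1 alpha-2 code. It assumes the ISOcodes package is installed and that the data sets are available by name once the package is attached, as the Usage section suggests. The object name `german` is only illustrative.

```r
library(ISOcodes)

# Select the row whose ISO 639-1 alpha-2 code is "de".
# Alpha_2 is only filled in where an ISO 639-1 code exists, so which()
# is used to drop comparisons that evaluate to NA.
german <- ISO_639_2[which(ISO_639_2$Alpha_2 == "de"), ]

# Show the bibliographic code, the terminological code, and the English name.
german[, c("Alpha_3_B", "Alpha_3_T", "Alpha_2", "Name")]
```

The same pattern works in the other direction, for example matching on Alpha_3_B or Alpha_3_T to recover the alpha-2 code.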

ISO_639_3 is a data frame with the following variables:

Id:

a character vector with the ISO 639-3 3-letter (alpha-3) identifiers.

Part2B:

a character vector with the equivalent ISO 639-2 B-code identifiers of the bibliographic applications code set (if any).

Part2T:

a character vector with the equivalent ISO 639-2 T-code identifiers of the terminology applications code set (if any).

Part1:

a character vector with the equivalent ISO 639-1 2-letter (alpha-2) identifiers (if any).

Scope:

a factor with levels "I" (Individual), "M" (Macrolanguage) and "S" (Special).

Type:

a factor with levels "L" (Living languages), "E" (Extinct languages), "A" (Ancient languages), "H" (Historic languages), "C" (Constructed languages), and "S" (Special).

Name:

a character vector with the reference language names.

Comment:

a character vector with a comment relating to one or more of the other variables.

Family:

a character vector with the generic English name of each language's family or macrolanguage.

eng:

a character vector with the language names in English.

fra:

a character vector with the language names in French (if available).

spa:

a character vector with the language names in Spanish (if available).

zho:

a character vector with the language names in Chinese (if available).

rus:

a character vector with the language names in Russian (if available).

deu:

a character vector with the language names in German (if available).

Variables Family and eng through deu are extracted from the Wikipedia ISO 639-3 language codes pages.
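The Scope and Type variables lend themselves to tabulation and filtering. The sketch below assumes the ISOcodes package is attached and relies only on the column names and level codes listed above. The object name `living` is illustrative.

```r
library(ISOcodes)

# Cross-tabulate scope (rows) against type (columns) to see how the
# ISO 639-3 entries are distributed.
table(Scope = ISO_639_3$Scope, Type = ISO_639_3$Type)

# Keep only living individual languages, with a few identifying columns.
living <- subset(ISO_639_3,
                 Scope == "I" & Type == "L",
                 select = c(Id, Part1, Name))
head(living)
```

The actual counts depend on the installed package version, so none are shown here.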

ISO_639_3_Retirements is a data frame giving the languages retired from ISO 639-3, with variables:

Id:

a character vector with the retired codes.

Ret_Reason:

a factor with levels "C" (change), "D" (duplicate), "N" (non-existent), "S" (split), and "M" (merge).

Change_To:

a character vector which, for the retirement reasons "C", "D", and "M", gives the identifier to which all instances of the Id should be changed.

Ret_Remedy:

a character vector with instructions for updating an instance of the retired (split) identifier.

Effective:

a Date object giving the date the retirement became effective.
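One way to use the retirements table is to map retired identifiers to their replacements. The helper below, `resolve_retired()`, is hypothetical and not part of ISOcodes. It assumes that Change_To is missing or empty when no single replacement exists (for example for split codes), and it follows one step only, so a replacement that was itself later retired is not resolved further.

```r
library(ISOcodes)

# Hypothetical helper (not part of ISOcodes): replace each identifier by its
# Change_To value when it was retired with a single replacement, and leave
# every other identifier unchanged.
resolve_retired <- function(ids, retirements = ISO_639_3_Retirements) {
  # Position of each id in the retirements table (NA if not retired);
  # match() uses the first matching row if an id occurs more than once.
  idx <- match(ids, retirements$Id)
  replacement <- as.character(retirements$Change_To[idx])
  # Assumption: a missing or empty Change_To means "no direct replacement".
  has_replacement <- !is.na(replacement) & nzchar(replacement)
  ifelse(has_replacement, replacement, ids)
}

# Apply the helper to the first few retired codes in the table.
retired <- head(ISO_639_3_Retirements$Id)
data.frame(retired = retired, current = resolve_retired(retired))
```

For codes retired by a split, Ret_Remedy should be consulted instead, since no single replacement applies.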

ISO_639_5 is a data frame with the following variables:

Id:

a character vector with the 3-letter (alpha-3) ISO 639-5 identifiers.

English_Name:

the family names in English.

French_Name:

the family names in French.

Part2:

a factor indicating how the family relates to 639-2, with levels "g" (group: consists of several related languages), "r" (rest group: a group of several related languages, from which some specific languages have been excluded), or "" (no 639-2 code).

Hierarchy:

an indication of which other language families or groups the current language family or group is a member of (given as 639-5 ids separated by ":").
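Because Hierarchy stores a path of ISO 639-5 ids in a single string, it usually needs splitting before use. The sketch below assumes the ISOcodes package is attached. Whitespace around the separator is trimmed as a precaution, since the exact spacing is not specified here. The names `paths` and `depth` are illustrative.

```r
library(ISOcodes)

# Split each Hierarchy string on ":" and trim surrounding whitespace,
# giving one character vector of ISO 639-5 ids per family.
paths <- lapply(strsplit(as.character(ISO_639_5$Hierarchy), ":", fixed = TRUE),
                trimws)
names(paths) <- as.character(ISO_639_5$Id)

# Number of ids on each path, as a rough measure of nesting depth.
depth <- lengths(paths)

# Inspect a few paths before relying on their exact layout.
head(paths, 3)
```

Whether a path ends with the family's own id is not stated above, so it is worth inspecting the output before building a tree from it.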

Details

While most languages are given one code by the ISO 639-2 standard, twenty-two of the languages described have two three-letter codes: a “bibliographic” code (ISO 639-2/B, B-code), which is derived from the English name for the language and was a necessary legacy feature, and a “terminological” code (ISO 639-2/T, T-code), which is derived from the native name for the language. The range qaa to qtz is reserved for local use.
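The languages with distinct B and T codes can be listed directly from ISO_639_2. This is a sketch that assumes the ISOcodes package is attached. Rows where either code is missing or empty are skipped as a precaution, and the resulting count may differ from the figure quoted above depending on the package version.

```r
library(ISOcodes)

b <- ISO_639_2$Alpha_3_B
t <- ISO_639_2$Alpha_3_T

# Only compare rows where both codes are present and non-empty.
both <- !is.na(b) & !is.na(t) & nzchar(b) & nzchar(t)

# TRUE where the bibliographic and terminological codes differ.
differs <- both & b != t

# List the affected languages and count them.
ISO_639_2[differs, c("Alpha_3_B", "Alpha_3_T", "Name")]
sum(differs)
```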

ISO 639-3 is a superset of ISO 639-1 and of the individual languages in ISO 639-2. ISO 639-1 and ISO 639-2 focused on major languages, most frequently represented in the total body of the world's literature. Since ISO 639-2 also includes language collections, whereas Part 3 does not, ISO 639-3 is not a superset of ISO 639-2. Where B and T codes exist in ISO 639-2, ISO 639-3 uses the T-codes.

ISO 639-2 contains codes for both individual languages and language groups, so any code in it is either in ISO 639-3 or in ISO 639-5. The converse does not hold: ISO 639-5 families may be missing from ISO 639-2.
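The relationships between the parts can be checked against the data. The sketch below assumes the ISOcodes package is attached and uses the T-codes for the comparison, since ISO 639-3 uses T-codes where both exist. Any rows it reports as unmatched (for instance an entry representing the local-use range) are exceptions to inspect, not necessarily errors.

```r
library(ISOcodes)

codes_2 <- ISO_639_2$Alpha_3_T

# Which ISO 639-2 codes appear in ISO 639-3 and which in ISO 639-5?
in_3 <- codes_2 %in% ISO_639_3$Id
in_5 <- codes_2 %in% ISO_639_5$Id
table(in_639_3 = in_3, in_639_5 = in_5)

# ISO 639-2 entries found in neither part, if any.
ISO_639_2[!(in_3 | in_5), ]

# ISO 639-5 families that have no ISO 639-2 code.
setdiff(ISO_639_5$Id, c(ISO_639_2$Alpha_3_B, ISO_639_2$Alpha_3_T))
```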

References

https://en.wikipedia.org/wiki/ISO_639