The type of the vector x is not restricted; it need only have
an as.character method and be sortable (by
sort.list).
Ordered factors differ from factors only in their class, but methods
and the model-fitting functions treat the two classes quite differently.
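As an illustration of that class difference, the following minimal sketch builds the same data as a factor and as an ordered factor. The variable names x, f and o are chosen for this example only.

```r
# Same data, same levels; only the 'ordered' flag differs.
x <- c("low", "high", "low")
f <- factor(x, levels = c("low", "high"))
o <- factor(x, levels = c("low", "high"), ordered = TRUE)

class(f)                                  # "factor"
class(o)                                  # "ordered" "factor"

# The underlying integer codes are the same for both objects.
same_codes <- identical(as.integer(f), as.integer(o))

# Order comparisons are meaningful only for the ordered factor.
first_below_second <- o[1] < o[2]
```

The two objects carry the same codes and levels, but methods such as the comparison operators dispatch on the extra "ordered" class.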
The encoding of the vector happens as follows. First all the values
in exclude are removed from levels. If x[i]
equals levels[j], then the i-th element of the result is
j. If no match is found for x[i] in levels
(which will happen for excluded values) then the i-th element
of the result is set to NA.
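The encoding steps can be traced on a small character vector. This is a sketch for illustration; the input values are arbitrary.

```r
# Levels default to the sorted unique values: "a", "b", "c".
x <- c("a", "b", "c", "a")

# Excluding "b" removes it from the levels before matching.
f <- factor(x, exclude = "b")

lev   <- levels(f)       # "a" "c"
codes <- as.integer(f)   # 1 NA 2 1  (the excluded "b" has no match)
```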
Normally the ‘levels’ used as an attribute of the result are
the reduced set of levels after removing those in exclude, but
this can be altered by supplying labels. This should either
be a set of new labels for the levels, or a character string, in
which case the levels are that character string with a sequence
number appended.
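Both forms of labels can be sketched as follows. The label strings "Low", "High" and "grp" are placeholders chosen for this example.

```r
x <- c("lo", "hi", "lo")

# One new label per level, in the order of 'levels'.
f1 <- factor(x, levels = c("lo", "hi"), labels = c("Low", "High"))

# A single character string: a sequence number is appended per level.
f2 <- factor(x, levels = c("lo", "hi"), labels = "grp")

lev1 <- levels(f1)   # "Low" "High"
lev2 <- levels(f2)   # "grp1" "grp2"
```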
factor(x, exclude = NULL) applied to a factor without
NAs is a no-operation unless there are unused levels: in
that case, a factor with the reduced level set is returned. If
exclude is used, then since R version 3.4.0 excluding non-existing
character levels is equivalent to excluding nothing, and when
exclude is a character vector, it is
applied to the levels of x.
Alternatively, exclude can be a factor with the same level set as
x, in which case the levels present in exclude are excluded.
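The behaviour of exclude on a factor input can be sketched as below, assuming R version 3.4.0 or later. The object names are illustrative only.

```r
# A factor with an unused level "c" and no NAs.
f <- factor(c("a", "b"), levels = c("a", "b", "c"))

# exclude = NULL drops only the unused level.
g <- factor(f, exclude = NULL)

# A character 'exclude' is applied to the levels of f.
h <- factor(f, exclude = "a")

# 'exclude' given as a factor with the same level set as f.
k <- factor(f, exclude = factor("a", levels = levels(f)))

lev_g <- levels(g)   # "a" "b"
lev_h <- levels(h)   # "b"
lev_k <- levels(k)   # "b"
```

In h and k the element that was "a" becomes NA, because its level has been removed.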
The codes of a factor may contain NA. For a numeric
x, set exclude = NULL to make NA an extra
level (prints as <NA>); by default, this is the last level.
If NA is a level, the way to set a code to be missing (as
opposed to the code of the missing level) is to
use is.na on the left-hand-side of an assignment (as in
is.na(f)[i] <- TRUE; indexing inside is.na does not work).
Under those circumstances missing values are currently printed as
<NA>, i.e., identical to entries of level NA.
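The distinction between the NA level and a missing code can be seen in a short sketch with arbitrary input values.

```r
# With exclude = NULL, NA becomes an extra (last) level.
f <- factor(c(1, NA, 2), exclude = NULL)

lev          <- levels(f)       # "1" "2" NA
codes_before <- as.integer(f)   # 1 3 2  (the NA entry has code 3)

# Set the first code to missing; the level set is left unchanged.
is.na(f)[1] <- TRUE
codes_after <- as.integer(f)    # NA 3 2
```

After the assignment the first two elements both print as <NA>, although one is a missing code and the other is an entry of the NA level.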
is.factor is generic: you can write methods to handle
specific classes of objects, see InternalMethods.
Where levels is not supplied, unique is called.
Since factors typically have quite a small number of levels, for large
vectors x it is helpful to supply nmax as an upper bound
on the number of unique values.
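A minimal sketch of supplying nmax follows. The vector length and the bound of 10 are arbitrary choices for this example, and any speed or memory benefit will depend on the data.

```r
# A long vector with only three distinct values.
x <- rep(c("b", "a", "c"), length.out = 1e5)

# nmax is an upper bound on the number of unique values in x.
f <- factor(x, nmax = 10)

lev <- levels(f)   # "a" "b" "c"
```

The result is the same factor as factor(x) would give; nmax only affects how the unique values are found.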