The type of the vector x is not restricted; it only must have an
as.character method and be sortable (by sort.list).
Ordered factors differ from factors only in their class, but methods
and the model-fitting functions treat the two classes quite differently.
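
As a small illustration of the class difference, the sketch below builds the same data once as a factor and once as an ordered factor. The variable names f and o and the level names are made up for the example.

```r
# Same values and levels, built once unordered and once ordered.
f <- factor(c("lo", "hi"), levels = c("lo", "hi"))
o <- factor(c("lo", "hi"), levels = c("lo", "hi"), ordered = TRUE)

class(f)   # "factor"
class(o)   # "ordered" "factor"

# With the class removed, both are the same integer codes plus levels.
same_storage <- identical(unclass(f), unclass(o))

# Comparison operators are meaningful only for the ordered version.
lo_before_hi <- o[1] < o[2]
```

Here same_storage and lo_before_hi are both TRUE; the analogous comparison on f yields NA with a warning.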
The encoding of the vector happens as follows. First all the values
in exclude are removed from levels. If x[i] equals levels[j], then the
i-th element of the result is j. If no match is found for x[i] in
levels (which will happen for excluded values) then the i-th element
of the result is set to NA.
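
The encoding steps above can be traced on a short vector. This is only a sketch; the input values are invented for the example.

```r
x <- c("b", "a", "c", "a")

# "c" is removed from the levels first, so x[3] finds no match.
f <- factor(x, levels = c("a", "b", "c"), exclude = "c")

levels(f)       # "a" "b"
as.integer(f)   # 2 1 NA 1  (position of each x[i] in the reduced levels)
```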
Normally the ‘levels’ used as an attribute of the result are
the reduced set of levels after removing those in exclude, but
this can be altered by supplying labels. This should either
be a set of new labels for the levels, or a character string, in
which case the levels are that character string with a sequence
number appended.
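
The two forms of labels can be compared side by side. The following is a minimal sketch with invented level and label names.

```r
x <- c("lo", "hi", "lo")

# One label per level: the labels replace the levels.
f1 <- factor(x, levels = c("lo", "hi"), labels = c("Low", "High"))
levels(f1)   # "Low" "High"

# A single character string: a sequence number is appended per level.
f2 <- factor(x, levels = c("lo", "hi"), labels = "grp")
levels(f2)   # "grp1" "grp2"
```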
factor(x, exclude = NULL) applied to a factor without NAs is a
no-operation unless there are unused levels: in that case, a factor
with the reduced level set is returned. If exclude is used, since
R version 3.4.0, excluding non-existing character levels is
equivalent to excluding nothing, and when exclude is a character
vector, that is applied to the levels of x.
Alternatively, exclude can be a factor with the same level set as x,
in which case the levels present in exclude are excluded.
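
The behaviour of exclude on a factor argument can be checked with a sketch like the following, assuming R 3.4.0 or later. All object names are illustrative.

```r
# A factor with an unused level "c".
f <- factor(c("a", "b"), levels = c("a", "b", "c"))

g <- factor(f, exclude = NULL)    # drops the unused level only
levels(g)                         # "a" "b"

h <- factor(f, exclude = "zzz")   # no such level: nothing is excluded
levels(h)                         # "a" "b"

k <- factor(f, exclude = "b")     # character exclude acts on the levels of f
levels(k)                         # "a"
as.integer(k)                     # 1 NA

# exclude given as a factor with the same level set as f.
ex <- factor("b", levels = levels(f))
m <- factor(f, exclude = ex)
levels(m)                         # "a"
```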
The codes of a factor may contain NA. For a numeric x, set
exclude = NULL to make NA an extra level (prints as <NA>); by default,
this is the last level.
If NA is a level, the way to set a code to be missing (as
opposed to the code of the missing level) is to use is.na on the
left-hand-side of an assignment (as in is.na(f)[i] <- TRUE; indexing
inside is.na does not work).
Under those circumstances missing values are currently printed as
<NA>, i.e., identical to entries of level NA.
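
The distinction between an NA level and a missing code can be seen in a short sketch. The data are invented for the example.

```r
x <- c(1, 2, NA)

# exclude = NULL keeps NA as an extra (last) level.
f <- factor(x, exclude = NULL)
levels(f)       # "1" "2" NA
as.integer(f)   # 1 2 3   (the NA entry has code 3, it is not missing)
na_before <- is.na(f)   # FALSE FALSE FALSE

# Make the first code itself missing, as opposed to pointing at level NA.
is.na(f)[1] <- TRUE
as.integer(f)   # NA 2 3
```

After the assignment, printing f shows <NA> for both the first and the third element, even though only the first has a missing code.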
is.factor is generic: you can write methods to handle
specific classes of objects, see InternalMethods.
Where levels is not supplied, unique is called.
Since factors typically have quite a small number of levels, for large
vectors x it is helpful to supply nmax as an upper bound
on the number of unique values.
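
A possible use of nmax is sketched below. The vector size and the bound of 10 are arbitrary choices for the example; the bound only needs to be at least the true number of distinct values.

```r
set.seed(1)

# A large vector with only three distinct values.
x <- sample(c("a", "b", "c"), 1e5, replace = TRUE)

# nmax is a generous upper bound on the number of unique values;
# it is passed on to unique() when levels is not supplied.
f <- factor(x, nmax = 10)

nlevels(f)   # 3
```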