The Hmisc library contains many functions useful for data analysis, high-level graphics, and utility operations, as well as functions for computing sample size and power, translating SAS datasets into R, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX code, recoding variables, and bootstrap repeated measures analysis. Most of these functions were written by F Harrell, but a few were collected from statlib and from s-news; other authors are indicated below. This collection of functions includes all of Harrell's submissions to statlib other than the functions in the rms and display libraries. A few of the functions do not have “Help” documentation.
To make Hmisc load silently, issue options(Hverbose=FALSE) before library(Hmisc).
Function Name | Purpose |
abs.error.pred | Computes various indexes of predictive accuracy based on absolute errors, for linear models |
addMarginal | Add marginal observations over selected variables |
all.is.numeric | Check if character strings are legal numerics |
approxExtrap | Linear extrapolation |
aregImpute | Multiple imputation based on additive regression, bootstrapping, and predictive mean matching |
areg.boot | Nonparametrically estimate transformations for both sides of a multiple additive regression, and bootstrap these estimates and \(R^2\) |
ballocation | Optimum sample allocations in 2-sample proportion test |
binconf | Exact confidence limits for a proportion and more accurate (narrower!) score stat.-based Wilson interval (Rollin Brant, mod. FEH) |
bootkm | Bootstrap Kaplan-Meier survival or quantile estimates |
bpower | Approximate power of 2-sided test for 2 proportions. Includes bpower.sim for exact power by simulation |
bpplot | Box-Percentile plot (Jeffrey Banfield, umsfjban@bill.oscs.montana.edu) |
bpplotM | Chart extended box plots for multiple variables |
bsamsize | Sample size requirements for test of 2 proportions |
bystats | Statistics on a single variable by levels of >=1 factors |
bystats2 | 2-way statistics |
character.table | Shows numeric equivalents of all latin characters. Useful for putting many special chars. in graph titles (Pierre Joyet, pierre.joyet@bluewin.ch) |
ciapower | Power of Cox interaction test |
cleanup.import | More compactly store variables in a data frame, and clean up problem data when e.g. Excel spreadsheet had a non-numeric value in a numeric column |
combine.levels | Combine infrequent levels of a categorical variable |
confbar | Draws confidence bars on an existing plot using multiple confidence levels distinguished using color or gray scale |
contents | Print the contents (variables, labels, etc.) of a data frame |
cpower | Power of Cox 2-sample test allowing for noncompliance |
Cs | Vector of character strings from list of unquoted names |
csv.get | Enhanced importing of comma-separated files, optionally with variable labels |
cut2 | Like cut with better endpoint label construction and allows construction of quantile groups or groups with given n |
datadensity | Snapshot graph of distributions of all variables in a data frame. For continuous variables uses scat1d. |
dataRep | Quantify representation of new observations in a database |
ddmmmyy | SAS “date7” output format for a chron object |
deff | Kish design effect and intra-cluster correlation |
describe | Function to describe different classes of objects. Invoke by saying describe(object). It calls one of the following: |
describe.data.frame | Describe all variables in a data frame (generalization of SAS UNIVARIATE) |
describe.default | Describe a variable (generalization of SAS UNIVARIATE) |
dotchart3 | A more flexible version of dotchart |
Dotplot | Enhancement of Trellis dotplot allowing for matrix x-var., auto generation of Key function, superposition |
drawPlot | Simple mouse-driven drawing program, including a function for fitting Bezier curves |
Ecdf | Empirical cumulative distribution function plot |
errbar | Plot with error bars (Charles Geyer, U. Chi., mod FEH) |
event.chart | Plot general event charts (Jack Lee, jjlee@mdanderson.org, Ken Hess, Joel Dubin; Am Statistician 54:63-70,2000) |
event.history | Event history chart with time-dependent cov. status (Joel Dubin, jdubin@uwaterloo.ca) |
find.matches | Find matches (with tolerances) between columns of 2 matrices |
first.word | Find the first word in an R expression (R Heiberger) |
fit.mult.impute | Fit most regression models over multiple transcan imputations, compute imputation-adjusted variances and avg. betas |
format.df | Format a matrix or data frame with much user control (R Heiberger and FE Harrell) |
ftupwr | Power of 2-sample binomial test using Fleiss, Tytun, Ury |
ftuss | Sample size for 2-sample binomial test using Fleiss, Tytun, Ury (both ftupwr and ftuss by Dan Heitjan, dheitjan@biostats.hmc.psu.edu) |
gbayes | Bayesian posterior and predictive distributions when both the prior and the likelihood are Gaussian |
getHdata | Fetch and list datasets on our web site |
hdquantile | Harrell-Davis nonparametric quantile estimator with s.e. |
histbackback | Back-to-back histograms (Pat Burns, Salomon Smith Barney, London, pburns@dorado.sbi.com) |
hist.data.frame | Matrix of histograms for all numeric vars. in data frame. Use hist.data.frame(data.frame.name) |
histSpike | Add high-resolution spike histograms or density estimates to an existing plot |
hoeffd | Hoeffding's D test (omnibus test of independence of X and Y) |
impute | Impute missing data (generic method) |
interaction | More flexible version of builtin function |
is.present | Tests for non-blank character values or non-NA numeric values |
james.stein | James-Stein shrinkage estimates of cell means from raw data |
labcurve | Optimally label a set of curves that have been drawn on an existing plot, on the basis of gaps between curves. Also position legends automatically at emptiest rectangle. |
label | Set or fetch a label for an R-object |
Lag | Lag a vector, padding on the left with NA or '' |
latex | Convert an R object to LaTeX (R Heiberger & FE Harrell) |
list.tree | Pretty-print the structure of any data object (Alan Zaslavsky, zaslavsk@hcp.med.harvard.edu) |
Load | Enhancement of load |
mask | 8-bit logical representation of a short integer value (Rick Becker) |
matchCases | Match each case on one continuous variable |
matxv | Fast matrix * vector, handling intercept(s) and NAs |
mgp.axis | Version of axis() that uses appropriate mgp from mgp.axis.labels and gets around bug in axis(2, ...) that causes it to assume las=1 |
mgp.axis.labels | Used by survplot and plot in rms library (and other functions in the future) so that different spacing between tick marks and axis tick mark labels may be specified for x- and y-axes. Use mgp.axis.labels('default') to set defaults. Users can set values manually using mgp.axis.labels(x,y) where x and y are 2nd value of par('mgp') to use. Use mgp.axis.labels(type=w) to retrieve values, where w='x', 'y', 'x and y', 'xy', to get 3 mgp values (first 3 types) or 2 mgp.axis.labels. |
minor.tick | Add minor tick marks to an existing plot |
mtitle | Add outer titles and subtitles to a multiple plot layout |
multLines | Draw multiple vertical lines at each x in a line plot |
%nin% | Opposite of %in% |
nobsY | Compute no. non-NA observations for left hand formula side |
nomiss | Return a matrix after excluding any row with an NA |
panel.bpplot | Panel function for trellis bwplot - box-percentile plots |
panel.plsmo | Panel function for trellis xyplot - uses plsmo |
pBlock | Block variables for certain lattice charts |
pc1 | Compute first prin. component and get coefficients on original scale of variables |
plotCorrPrecision | Plot precision of estimate of correlation coefficient |
plsmo | Plot smoothed x vs. y with labeling and exclusion of NAs. Also allows a grouping variable and plots unsmoothed data |
popower | Power and sample size calculations for ordinal responses (two treatments, proportional odds model) |
prn | prn(expression) does print(expression) but titles the output with 'expression'. Do prn(expression,txt) to add a heading (‘txt’) before the ‘expression’ title |
pstamp | Stamp a plot with date in lower right corner (pstamp()). Add pwd=T and/or time=T to add current directory name or time. Put additional text for label as first argument, e.g. pstamp('Figure 1') will draw 'Figure 1 date' |
putKey | Different way to use key() |
putKeyEmpty | Put key at most empty part of existing plot |
rcorr | Pearson or Spearman correlation matrix with pairwise deletion of missing data |
rcorr.cens | Somers' Dxy rank correlation with censored data |
rcorrp.cens | Assess difference in concordance for paired predictors |
rcspline.eval | Evaluate restricted cubic spline design matrix |
rcspline.plot | Plot spline fit with nonparametric smooth and grouped estimates |
rcspline.restate | Restate restricted cubic spline in unrestricted form, and create TeX expression to print the fitted function |
reShape | Reshape a matrix into 3 vectors, reshape serial data |
rm.boot | Bootstrap spline fit to repeated measurements model, with simultaneous confidence region - least squares using spline function in time |
rMultinom | Generate multinomial random variables with varying prob. |
samplesize.bin | Sample size for 2-sample binomial problem (Rick Chappell, chappell@stat.wisc.edu) |
sas.get | Convert SAS dataset to S data frame |
sasxport.get | Enhanced importing of SAS transport dataset in R |
Save | Enhancement of save |
scat1d | Add 1-dimensional scatterplot to an axis of an existing plot (like bar-codes, FEH/Martin Maechler, maechler@stat.math.ethz.ch/Jens Oehlschlaegel-Akiyoshi, oehl@psyres-stuttgart.de) |
score.binary | Construct a score from a series of binary variables or expressions |
sedit | A set of character handling functions written entirely in R. sedit() does much of what the UNIX sed program does. Other functions included are substring.location, substring<-, replace.string.wild, and functions to check if a string is numeric or contains only the digits 0-9 |
setTrellis | Set Trellis graphics to use blank conditioning panel strips, line thickness 1 for dot plot reference lines: setTrellis(); 3 optional arguments |
show.col | Show colors corresponding to col=0,1,...,99 |
show.pch | Show all plotting characters specified by pch=. Just type show.pch() to draw the table on the current device. |
showPsfrag | Use LaTeX to compile, and dvips and ghostview to display a postscript graphic containing psfrag strings |
solvet | Version of solve with argument tol passed to qr |
somers2 | Somers' rank correlation and c-index for binary y |
spearman | Spearman rank correlation coefficient spearman(x,y) |
spearman.test | Spearman 1 d.f. and 2 d.f. rank correlation test |
spearman2 | Spearman multiple d.f. \(\rho^2\), adjusted \(\rho^2\), Wilcoxon-Kruskal-Wallis test, for multiple predictors |
spower | Simulate power of 2-sample test for survival under complex conditions. Also contains the Gompertz2, Weibull2, Lognorm2 functions. |
spss.get | Enhanced importing of SPSS files using read.spss function |
src | src(name) = source("name.s") with memory |
store | store an object permanently (easy interface to assign function) |
strmatch | Shortest unique identifier match (Terry Therneau, therneau@mayo.edu) |
subset | More easily subset a data frame |
substi | Substitute one var for another when observations NA |
summarize | Generate a data frame containing stratified summary statistics. Useful for passing to trellis. |
summary.formula | General table making and plotting functions for summarizing data |
summaryD | Summarizing using user-provided formula and dotchart3 |
summaryM | Replacement for summary.formula(..., method='reverse') |
summaryP | Multi-panel dot chart for summarizing proportions |
summaryS | Summarize multiple response variables for multi-panel dot chart or scatterplot |
summaryRc | Summary for continuous variables using lowess |
symbol.freq | X-Y Frequency plot with circles' area prop. to frequency |
sys | Execute unix() or dos() depending on what's running |
tabulr | Front-end to tabular function in the tables package |
tex | Enclose a string with the correct syntax for using with the LaTeX psfrag package, for postscript graphics |
transace | ace() packaged for easily automatically transforming all variables in a matrix |
transcan | Automatic transformation and imputation of NAs for a series of predictor variables |
trap.rule | Area under curve defined by arbitrary x and y vectors, using trapezoidal rule |
trellis.strip.blank | To make the strip titles in trellis more visible, you can make the backgrounds blank by saying trellis.strip.blank(). Use before opening the graphics device. |
t.test.cluster | 2-sample t-test for cluster-randomized observations |
uncbind | Form individual variables from a matrix |
upData | Update a data frame (change names, labels, remove vars, etc.) |
units | Set or fetch "units" attribute - units of measurement for var. |
varclus | Graph hierarchical clustering of variables using squared Pearson or Spearman correlations or Hoeffding D as similarities. Also includes the naclus function for examining similarities in patterns of missing values across variables. |
wtd.mean, wtd.var, wtd.quantile, wtd.Ecdf, wtd.table, wtd.rank, wtd.loess.noiter, num.denom.setup | Set of functions for obtaining weighted estimates |
xy.group | Compute mean x vs. function of y by groups of x |
xYplot | Like trellis xyplot but supports error bars and multiple response variables that are connected as separate lines |
ynbind | Combine a series of yes/no true/false present/absent variables into a matrix |
zoom | Zoom in on any graphical display |
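The entries above are terse, so a few of the simpler utilities are sketched in use here. This is a minimal illustration only, assuming the Hmisc package is installed; the input values are made up for the example, and behavior may differ slightly between Hmisc versions.

```r
# Assumes Hmisc is installed; load it quietly as described at the top.
options(Hverbose = FALSE)
suppressPackageStartupMessages(library(Hmisc))

# %nin% is the opposite of %in%
kept <- c("a", "b", "c") %nin% c("b")

# Cs builds a character vector from unquoted names
vars <- Cs(age, sex, weight)

# trap.rule: area under the line y = x on [0, 1] by the trapezoidal rule
x <- seq(0, 1, by = 0.25)
auc <- trap.rule(x, x)

# wtd.mean: weighted mean, here (1*1 + 2*1 + 3*2) / 4
wm <- wtd.mean(c(1, 2, 3), weights = c(1, 1, 2))

# Lag: shift a vector by one position, padding on the left with NA
lagged <- Lag(1:4)

print(list(kept = kept, vars = vars, auc = auc, wm = wm, lagged = lagged))
```

The illustrative inputs were chosen so the results are easy to check by hand: the trapezoidal rule is exact for a straight line, and the weighted mean reduces to 9/4.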
GENERAL DISCLAIMER
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
In short: You may use it any way you like, as long as you don't charge money for it, remove this notice, or hold anyone liable for its results. Also, please acknowledge the source and communicate changes to the author.
If this software is used in work presented for publication, kindly reference it using, for example: Harrell FE (2014): Hmisc: A package of miscellaneous R functions. Programs available from http://biostat.mc.vanderbilt.edu/Hmisc. Be sure to reference R itself and other libraries used.
See Alzola CF, Harrell FE (2004): An Introduction to S and the Hmisc and Design Libraries at http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RS/sintro.pdf for extensive documentation and examples for the Hmisc package.