pre.install: Update a source and/or installed package from a task package

Description

See mvbutils.packaging.tools before reading or experimenting!

pre.install creates a "source package" from a "task package", ready for first-time installation using install.pkg. You must have called maintain.packages( mypack) at some point in your R session before pre.install( mypack) etc.

patch.install is normally sufficient for subsequent maintenance of an already-installed package (i.e. you rarely need to call install.pkg again). Again, maintain.packages must have been called earlier. It's also expected that the package has been loaded via library() before patch.install is called, but this may not be required. patch.install first calls pre.install and then modifies the installed package accordingly on-the-fly, so there is no need to re-load or re-build or re-install. patch.install also updates the help system with immediate effect, i.e. during the current R session. You don't need to call patch.install after every little maintenance change to your package during an R session; it's usually only necessary when (i) you want updated help, or (ii) you want to make the changes "permanent". However, it's not a problem to call patch.install quite often. patch.installed is a synonym for patch.install.

It's possible to tweak the source-package-creation process, and this is what "pre.install.hook.<<pkg>>" is for; see Details and the section on Overriding defaults below.

spkg is a rarely-needed utility that returns the folder of the source package created by pre.install.

Usage

# 95% of the time you just need:
 # pre.install( pkg)
 # patch.install( pkg)
 # Your own hook: pre.install.hook.<>( default.list, <>, ...)
 pre.install( pkg, character.only=FALSE, force.all.docs=FALSE,
     dir.above.source="+", autoversion=getOption("mvb.autoversion", TRUE),
     R.target.version=getRversion(), ...)
 patch.installed( pkg, character.only=FALSE, force.all.docs=FALSE,
     help.patch=TRUE, DLLs.only=FALSE,
     update.installed.cache=getOption("mvb.update.installed.cache", TRUE),
     pre.inst=!DLLs.only, dir.above.source="+", R.target.version=getRversion(),
     autoversion=getOption("mvb.autoversion", TRUE))
 patch.install(...) # actually, args are exactly as for 'patch.installed'
 spkg( pkg)

Arguments

pkg

package name. Either quoted or unquoted is OK; unquoted will be treated as quoted unless character.only=TRUE. Here and in most other places in mvbutils, you can also specify an actual in-memory-task-package object such as ..mypack.

character.only

Default FALSE, which allows unquoted package names. You can set it to TRUE, or just set e.g. char="my@funny@name", which will trump any use of pkg.

force.all.docs

Normally, help files are created only for objects whose documentation has changed (these are always regenerated, regardless of force.all.docs). If TRUE, then recreate help for all documented objects. Can also be a character vector of specific docfile names (usually function names, but can be the names of the Rd file, without path or the Rd extension), in which case those Rd files will be regenerated.

help.patch

if TRUE, patch the help of the installed package

DLLs.only

just synchronize the DLLs and don't bother with other steps (see Compiled code)

default.list

list of various things-- see under "Overriding..." below

...

arguments to pass to your pre.install.hook.XXX function, usually if you want to be able to build different "flavours" of a package (e.g. a trial version vs. a production version, or versions with and without enormous datasets included). In patch.install, ... is just shorthand for the arg list of patch.installed.

update.installed.cache

If TRUE, then clear the installed-package cache, so that things like installed.packages work OK. The only reason to set to FALSE could be speed, if you have lots of packages; feedback appreciated. Default is TRUE unless you have set options( mvb.update.installed.cache=FALSE).

pre.inst

Run pre.install first? Default is TRUE unless DLLs.only=TRUE; leave it unless you know better.

autoversion

if TRUE, try to automatically increment the version number in the source (and installed, if patch.install) packages; this means you don't have to change the DESCRIPTION object or file. However, if you have changed the DESCRIPTION object or file's version to something beyond the source/installed version, the larger number will take precedence; hence, you can force a "major" revision by manually increasing the 1st or 2nd component of the version in DESCRIPTION. Only versions with at least 3 levels will be updated: so 1.0.0 will go to 1.0.1, 1.0.0.0 will go to 1.0.0.1, but 1.0 will stay the same. Default is TRUE unless you have set options( mvb.autoversion=FALSE).

dir.above.source

folder within which the source package will go, with a + at the start being shorthand for the task package folder (the default). Hence pre.install( pkg=mypack, dir="+/holder") will lead to creation of "holder/mypack" below the task folder of mypack. Set this manually if you have to maintain different versions of the package for different R versions, or different flavours of the package for other reasons, or if your source package must live in a "subversion tree" (whatever that is).

R.target.version

Not needed 99% of the time; use only if you want to create a source package for a different version of R. Supersedes the Rd.version argument of pre.install pre-'mvbutils' 2.5.57, used to control the documentation format. Set R.target.version to something less than "2.10" for ye olde "Rd version 1" format.
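
The version-bumping rule described under autoversion can be sketched in base R as follows. This is only an illustration of the rule as stated there; bump.version is a hypothetical helper, not part of mvbutils, and the real code may differ in detail (for example, in how it handles the DESCRIPTION-takes-precedence case).

```r
# Hypothetical helper (not an mvbutils function): increment the last
# component of a version string, but only if it has at least 3 levels.
bump.version <- function( v) {
  parts <- strsplit( v, ".", fixed=TRUE)[[1]]
  if( length( parts) < 3) {
    return( v)  # e.g. "1.0" is left alone
  }
  n <- length( parts)
  parts[ n] <- as.character( as.integer( parts[ n]) + 1L)
  paste( parts, collapse=".")
}

bump.version( "1.0.0")    # three levels: last component is incremented
bump.version( "1.0.0.0")  # four levels: last component is incremented
bump.version( "1.0")      # two levels: unchanged
```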

Details

As per the Glossary section of mvbutils.packaging.tools: the "task package" is the directory containing the ".RData" file with the guts of your package, which should be linked into the cd task hierarchy. The "source package" is usually the directory "<<pkg>>" below the task package, which will be created if needs be.

The default behaviour of pre.install is as follows-- to change it, see Overriding defaults. A basic source package is created in a subdirectory "<<pkg>>" of the current task. The package will have at least a DESCRIPTION file, a NAMESPACE file, a single R source file with name "<<pkg>>.R" in the "R" subdirectory, possibly a "sysdata.rda" file in the same place to contain non-functions, and a set of Rd files in the "man" subdirectory. Rd files will be auto-created from flatdoc style documentation, although precedence will be given to any pre-existing Rd files found in an "Rd" subdirectory of your task, which get copied directly into the package. Any "inst", "demo", "vignettes", "tests", "src", "exec", and "data" subdirectories will be copied to the source package, recursively (i.e. including any of their subdirectories). There is no compilation of source code, since only a source package is being created; see also Compiled code below.

Most objects in the task package will go into the source package, but there are usually a few you wouldn't want there: objects that are concerned only with how to create the package in the first place, and ephemeral system clutter such as .Random.seed. The default exceptions are: functions pre.install.hook.<<pkg>>, .First.task, and .Last.task; data <<pkg>>.file.exclude.regexes, <<pkg>>.DESCRIPTION, <<pkg>>.VERSION, <<pkg>>.UNSTABLE, forced!exports, .required, .Depends, tasks, .Traceback, .packageName, last.warning, .Last.value, .Random.seed, .SavedPlots; and any character vector whose name ends with ".doc".

All pre-existing files in the "man", "src", "tests", "exec", "demo", "inst", and "R" subdirectories of the source-package directory will be removed (unless you have some mlazy objects; see below). If a ".Rbuildignore" file is present in the task package, it's copied to the package directory, but I've never gotten this feature to work (NB I should include a facility in the pre-install hook for this). To exclude files that would otherwise be copied, i.e. those in "inst/demo/src/data" folders, create a character vector of regexes called <<pkg>>.file.exclude.regexes; any file matching any of these won't be copied. If there is a "changes.txt" file in the task package, it will be copied to the "inst" subdirectory of the package, as will any files in the task's own "inst" subdirectory. A DESCRIPTION file will be created, preferably from a <<pkg>>.DESCRIPTION object in the task package; see mvbutils.packaging.tools for more. Any "Makefile.*" in the task package will be copied, as will any in the "src" subdirectory (not sure why both places are allowed). No other files or subdirectories in the package directory will be created or removed, but some essential files will be modified.
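
As a rough illustration of the exclusion mechanism, the sketch below defines a regex vector for a task package called mypack and applies it to some made-up file names with base R's grepl. The file names and patterns are invented for the example, and whether mvbutils matches against bare file names or full paths is an assumption here, so check before relying on anchors like "^".

```r
# Would live in the task package "mypack": regexes for files NOT to copy
mypack.file.exclude.regexes <- c( "\\.bak$", "^scratch")

# Made-up candidate files, e.g. from the task's "inst" or "src" folders
candidates <- c( "notes.txt", "old.bak", "scratch1.R", "demo1.R")

# A file is excluded if it matches ANY of the regexes
excluded <- vapply( candidates, function( f) {
    any( vapply( mypack.file.exclude.regexes, grepl, logical( 1), x=f))
  }, logical( 1))

kept <- candidates[ !excluded]
kept
```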

If a NAMESPACE file is present in the task (usually no need), then it is copied directly to the package. If not, then pre.install will generate a NAMESPACE file by calling make.NAMESPACE, which makes reasonable guesses about what to import, export, and S3methodize. What is & isn't an S3 method is generally deduced OK (see make.NAMESPACE for gruesome details), but you can override the defaults via the pre-install hook. FWIW, since adding the package-creation features to mvbutils, I have never bothered explicitly writing a NAMESPACE file for any of my packages. By default, only documented functions are exported (i.e. visible to the user or other packages); the rest are only available to other functions in your package.

The R source file will contain functions only. Any doc and export.me attributes are dropped, but other attributes are kept; in particular, source code is kept in the source attribute.

If any of the Rd files starts with a period, e.g. ".dotty.name", it will be renamed to e.g. "01.dotty.name.Rd" to avoid some problems with RCMD. This should never matter, but just so you know...

To speed up conversion of documentation, a list of raw & converted documentation is stored in the file "doc2Rd.info.rda" in the task package, and conversion is only done for objects whose raw documentation has changed, unless force.all.docs is TRUE.

pre.install creates a file "funs.rda" in the package's "R" subdirectory, which is subsequently used by patch.installed. The function build.pkg (or R CMD BUILD) and friends will omit this file (currently with a complaint, which I intend to fix eventually, but which does not cause trouble).

Compiled code

patch.install does not compile source code; currently, you need to do that yourself, though I might add support for that if I can work out a sufficiently general mechanism. If you use R to do your compilation, then install.pkg should work after pre.install, though you may need detach("package:mypack", unload=TRUE) first and that will disrupt your R session. Alternatively, you may be able to use R CMD SHLIB to create the DLL directly, which you can then copy into the "libs" subdirectory of the installed package, without needing to re-install. I haven't tried this, but colleagues have reported success.

If, like me, you pre-compile your own DLLs manually (not allowed on CRAN, but fine for distribution to other users on the same OS), then you can put the DLLs into a folder "inst/libs" of the task (see next for Windows); they will end up as usual in the "libs" folder of the installed package, even though R itself hasn't compiled them. On Windows, put the DLLs one level deeper in "inst/libs/<<arch>>" instead, where "<<arch>>" is found from .Platform$r_arch; for 32-bit Windows, it's currently "i386". All references in this section to "libs", whether in the task or source or installed package, should be taken as meaning "libs/<<arch>>".
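
A minimal sketch of working out where pre-compiled DLLs should go inside the task folder, following the description above. The variable names are just for illustration; treating an empty .Platform$r_arch as meaning plain "inst/libs" is an assumption based on the text, not something checked against the mvbutils source.

```r
# .Platform$r_arch is "" on single-architecture builds of R,
# and e.g. "i386" on 32-bit Windows
arch <- .Platform$r_arch

# Folder, relative to the task package, where the DLLs would be placed
dll.dir <- if( nzchar( arch)) {
    file.path( "inst", "libs", arch)
  } else {
    file.path( "inst", "libs")
  }
dll.dir
```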

To load your package's DLLs, call library.dynam in the .onLoad function, for example like this:

  .onLoad <- function( libname, pkgname){
    library.dynam( 'my_first_dll', package=pkgname)
    library.dynam( 'my_other_dll', package=pkgname)  # fine to have several DLLs
  }

To automatically load all DLLs, you can copy the body of mvbutils:::generic.dll.loader into your own .onLoad, or just include a call to generic.dll.loader(libname,pkgname) if you don't mind having dependence on mvbutils.

After the package has been installed for the first time, I change my compiler settings so that the DLL is created directly in the installed package's "libs" folder; this means I can use the compiler's debugger while R is running. To accommodate this, patch.install behaves as follows:

any new DLLs in the task package are copied to the installed package;
any DLLs in the installed package but not in the task package are deleted;
for any DLLs in both task & installed, both copies are synchronized to the newer version;
the source package always matches the task package

You can call patch.install( mypack, DLLs.only=TRUE) if you only want the DLL-synching step.
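
The first three synchronization rules above amount to simple set operations on the DLL names. The following base-R sketch uses made-up file names purely to show which DLLs fall into which category; it does not touch any files and is not how mvbutils itself is implemented.

```r
# Made-up DLL names in the task package and in the installed package
task.dlls <- c( "my_first_dll.dll", "my_other_dll.dll")
installed.dlls <- c( "my_other_dll.dll", "obsolete.dll")

to.copy <- setdiff( task.dlls, installed.dlls)    # new in task: copy to installed
to.delete <- setdiff( installed.dlls, task.dlls)  # only in installed: delete
to.sync <- intersect( task.dlls, installed.dlls)  # in both: keep the newer copy

list( copy=to.copy, delete=to.delete, sync=to.sync)
```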

(Before version 2.5.57, mvbutils allowed more latitude in where you could put your home-brewed DLLs, but it just made life more confusing. The only place that now works is as above.)

Data objects

Data objects are handled a bit differently to the recommendations in "R extensions" and elsewhere-- but the end result for the package user is the same, or better. The changes have been made to speed up package maintenance, and to improve usability. Specifically:

Undocumented data objects live only in the package's namespace, i.e. visible only to your functions.
Documented data objects appear both in the visible part of the package (i.e. in the search path), and in the namespace. [The R standard is that these should not be visible in the namespace, but this doesn't seem sensible to me.]
The easiest way to export a data object, is to "document" it by putting its name into an alias line of the doc attribute of an existing function. (Alias lines are single-word lines directly after the first line of the doc attr.)
To document a data object xxx in its own right, include a flat-format text object xxx.doc in your task package; see doc2Rd. xxx.doc itself won't appear in the packaged object, but will result in documentation for xxx and any other data objects that are given as alias lines.
Big data objects can be set up for transparent individual lazy-loading (see below) to save time & memory, but lazy-loading is otherwise off by default for individual data objects.
There is no need for the user ever to call data to access a dataset in the package, and in fact it won't work.

Note that the data(...) function has been pretty much obsolete since the advent of lazy-loading in R 2.0; see R-news #4/2.

In terms of package structure, as opposed to operation, there is no "data" subdirectory. Data lives either in the "sysdata.rdb/rdx" files in the "R" subdirectory (but can still be user-visible, which is not normally the case for objects in those files), or in the "mlazy" subdirectory for those objects with individual lazy-loading.

Big data objects

Lazy-loading objects cached with mlazy are handled specially, to speed up pre.install. Such objects get their cache-files copied to "inst/mlazy", and the .onLoad is prepended with code that will load them on demand. By default, they are exported if and only if documented, and are not locked. The following objects are not packaged by default, even if mlazyed: .Random.seed, .Traceback, last.warning, and .SavedPlots. These are mlazyed automatically if options( mvb.quick.cd) is TRUE-- see cd.

Documentation and exporting

Package documentation

Just because you have a package Splendid, it doesn't follow that a user will be able to figure out how to use it from the alphabetical list of functions in library( help=Splendid); even if you've written vignettes, it may not be obvious which to use. The recommended way to provide a package overview is via "package documentation", which the user accesses via package?Splendid. You can write this in a text object called e.g. "Splendid.package.doc", which will be passed through doc2Rd with an extra "docType{package}" field added. The first line should start e.g. "Splendid-package" and the corresponding ".Rd" file will be put first into the index. Speaking as a frequently bewildered would-be user of others' packages-- and one who readily gives up if the "help" is impenetrable-- I urge you to make use of this feature!

Vignettes

See mvbutils.packaging.tools.

Bare minimum for export

Only documented functions and data are exported from your package (unless you resort to the subterfuge described in the subsection after this). Documented things are those found by find.documented( doc="any"). The simplest way to document something is just to add its name as an "alias line" to the existing documentation of another function, before the first empty line. For example, if you're already using flatdoc to document my.beautiful.function, you can technically "document" and thus export other functions like so:

  structure( function( blahblahblah)...
  ,doc=flatdoc())
  my.beautiful.function    package:splendid
  other.exported.function.1
  other.exported.function.2

The package will build & install OK even if you don't provide USAGE and ARGUMENTS sections for the other functions. Of course, R CMD CHECK wouldn't like it (and may have a point on this occasion). If you just are after "legal" (for R CMD CHECK) albeit unhelpful documentation for some of your functions that you can't face writing proper doco for yet, see make.usage.section and make.argument.section.

Exporting undocumented things and vice versa

A bit naughty (R CMD CHECK complains), but quite doable. Note that "things" can be data objects, not just functions. Simply write a pre-install hook (see Overriding defaults) that includes something like this:

  pre.install.hook.mypack <- function( hooklist) {
    hooklist$nsinfo$exports <- c( hooklist$nsinfo$exports, "my.undocumented.thing")
    return( hooklist)
  }

You can follow a similar approach if you want to document something but not to export it (so that it can only be accessed by Splendid:::unexported.thing). This probably isn't naughty.
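
For the document-but-don't-export case, a hook along the following lines might do. It is a sketch only: it assumes, by analogy with the export example, that the relevant names are held in hooklist$nsinfo$exports; unexported.thing is a placeholder name, and the mock list at the end exists only to show the effect.

```r
# Sketch of a hook that removes a documented object from the export list
pre.install.hook.Splendid <- function( hooklist) {
  hooklist$nsinfo$exports <- setdiff( hooklist$nsinfo$exports, "unexported.thing")
  return( hooklist)
}

# Mock hook list, just to show what the hook does
mock <- list( nsinfo=list( exports=c( "my.beautiful.function", "unexported.thing")))
result <- pre.install.hook.Splendid( mock)
result$nsinfo$exports
```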

Overriding defaults

Source package folder can be controlled via options("mvbutils.sourcepkgdir.postfix"), as per "Folders and different R versions" in mvbutils.packaging.tools. You'd only need to do this if you have multiple R versions installed that require different source-package formats (something that does not often change).

If a function pre.install.hook.<<pkgname>> exists in the task "<<pkgname>>", it will be called during pre.install. It will be passed one list-mode argument, containing default values for various installation things that can be adjusted; and it should return a list with the same names. It will also be passed any ... arguments to pre.install, which can be used e.g. to set "production mode" vs "informal mode" of the end product. For example, you might call pre.install(mypack,modo="production") and then write a function pre.install.hook.mypack( hooklist, modo) that includes or excludes certain files depending on the value of modo. The hook can do two things: sort out any file issues not adequately handled by pre.install, and/or change the following elements in the list that is passed in. The return value should be the possibly-modified list. Hook list elements are:

copies: files to copy directly
dll.paths: DLLs to copy directly
extra.filecontents: named list; each element is the contents of a text file, the corresponding name being the path of the file to create eg "inst/src/utils.pas"--- a nonstandard name
extra.docs: names of character-mode objects that constitute flat-format documentation
description: named elements of DESCRIPTION file
task.path: path of task (ready-to-install package will be created as a subdirectory in this)
has.namespace: should a namespace be used?
use.existing.NAMESPACE: ignore default and just copy the existing NAMESPACE file?
nsinfo: default namespace information, to be written iff has.namespace==TRUE and use.existing.NAMESPACE==FALSE
exclude.funs: any functions not to include
exclude.data: non-functions to exclude from sysdata.rda
dont.check.visibility: either TRUE (default default), FALSE, or a specified character vector, to say which objects are not to be checked for "globality" by RCMD CHECK (using the globalVariables mechanism). Leave alone if you don't understand this. You can change the "default default" via options( mvb_dont_check_visibility=FALSE).

There are two reasons for using a hook rather than directly setting parameters in pre.install. The first is that pre.install will calculate sensible but non-obvious default values for most things, and it is easier to change the defaults than to set them up from scratch in the call. The second is that once you have written a hook, you can forget about it-- you don't have to remember special argument values each time you call pre.install for that task.
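
Here is a sketch of a hook that takes a modo argument, as in the pre.install(mypack,modo="production") example above. The object names my.debug.helper and scratch.results are invented, and the "informal" default for modo is an assumption; only the element names exclude.funs and exclude.data come from the list above. The mock list at the end stands in for whatever pre.install would really pass.

```r
# Sketch: drop development-only objects when building in "production" mode
pre.install.hook.mypack <- function( hooklist, modo="informal", ...) {
  modo <- match.arg( modo, c( "informal", "production"))
  if( modo == "production") {
    hooklist$exclude.funs <- union( hooklist$exclude.funs, "my.debug.helper")
    hooklist$exclude.data <- union( hooklist$exclude.data, "scratch.results")
  }
  return( hooklist)  # always return the (possibly modified) list
}

# Mock hook list, for illustration only
mock <- list( exclude.funs=".First.task", exclude.data=".Random.seed")
informal <- pre.install.hook.mypack( mock)
production <- pre.install.hook.mypack( mock, modo="production")
production$exclude.funs
```

With the hook in place, the flavour is chosen at call time and the defaults are left alone otherwise.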

Debugging a pre-install hook

To understand what's in the list and how to write a pre-install hook, the easiest way is probably to write a dummy one and then mtrace it before calling pre.install(mypack). However, it's all a bit clunky at present (July 2011). Because the hook only exists in the "..mypack" shadow environment, mtrace won't find it automatically, so you'll need mtrace( pre.install.hook.mypack, from=..mypack). That's fine, but if you then modify the source of your hook function, you'll get an error following the "Reapplying trace..." message. So you need to do mtrace.off before saving your edited hook-function source, and then mtrace the hook again before calling pre.install(mypack). To be fixed, if I can work out how...

Different versions of R

R seems to be rather fond of changing the structural requirements of source & installed packages. mvbutils tries to shield you from those arcane and ephemeral details-- usually, your task package will not need changing, and pre.install will automatically generate source & installed packages in whatever format R currently requires. However, sometimes you do at least need to be able to build different "instances" of your package for different versions of R. The dir.above.source and maybe the R.target.version arguments of pre.install may help with this.

If you need to build instances of your package for a different version of R, then you may need the R.target.version argument (and dir.above.source). I try to keep mvbutils up-to-date with R's fairly frequent revisions to package structure rules, with the aim that you (or I) can easily produce a source/binary-source package for a version of R later than the one you're using right now, merely by setting R.target.version. However, be warned that this may not always be enough; there might at some point be changes in R that will require you to be running the appropriate R version (and an appropriate version of mvbutils) just to recreate/rebuild your package in an appropriate form.

The nuances of R.target.version change with the changing tides of R versions, but the whole point of pre.install etc is that you shouldn't really need to know about those details; mvbutils tries to look after them for you. For example, though: as of 10/2011, the "detailed behaviour" is to enforce namespaces if R.target.version >= 2.14, regardless of whether your package has a .onLoad or not.

Packages without namespaces pre-R 2.14

You used to be allowed to build packages without namespaces-- not to be encouraged for general distribution IMO, but occasionally a useful shortcut for your own stuff nevertheless (mainly because everything is "exported", documented or not). For R < 2.14, mvbutils will decide for itself whether your package is meant to be namespaced, based on whether any of the following apply: there is a NAMESPACE file in the task package; there is a .onLoad function in the task; there is an "Imports" directive in the DESCRIPTION file.
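
The decision rule can be paraphrased in base R as below. wants.namespace and its arguments are hypothetical names used only to restate the text (namespaces are enforced when R.target.version is 2.14 or later; otherwise any one of the three conditions is enough); this is not the actual mvbutils code.

```r
# Hypothetical restatement of the namespace decision rule
wants.namespace <- function( R.target.version, has.NAMESPACE.file,
    has.onLoad, has.Imports) {
  target <- numeric_version( as.character( R.target.version))
  if( target >= "2.14") {
    return( TRUE)  # namespaces are enforced regardless
  }
  has.NAMESPACE.file || has.onLoad || has.Imports
}

wants.namespace( "2.13.0", FALSE, FALSE, FALSE)  # old R, no indicators
wants.namespace( "2.13.0", FALSE, TRUE, FALSE)   # old R, but .onLoad exists
wants.namespace( "2.15.1", FALSE, FALSE, FALSE)  # enforced from 2.14 on
```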

Examples

# NOT RUN {
# Workflow for simple case:
cd( task.above.mypack)
maintain.packages( mypack)
# First-time setup, or after major R version changes:
pre.install( mypack)
install.pkg( mypack)
library( mypack)
# ... do stuff
# Subsequent maintenance:
maintain.packages( mypack) # only once per session, usually at the start
library( mypack) # maybe optional
# ...do various things involving changes to mypack, then...
patch.install( mypack) # keep disk image up-to-date
# Prepare copies for distribution
build.pkg( mypack) # for Linux or CRAN
build.pkg.binary( mypack) # for Windows or Macs
check.pkg( mypack) # if you like that sort of thing
# }
