mvbutils (version 2.5.4)

cd: Organizing R workspaces

Description

cd allows you to set up and move through a hierarchically-organized set of R workspaces, each corresponding to a directory. While working at any level of the hierarchy, all higher levels are attached on the search path, so you can see objects in the "parents". You can easily switch between workspaces in the same session, you can move objects around in the hierarchy, and you can do several hierarchy-wide things such as searching, even on parts of the hierarchy that aren't currently attached.

Usage

# Occasionally: cd()
# Usually: cd(to)
# Rarely:
 cd(to, execute.First = TRUE, execute.Last = TRUE)

Arguments

to
the path of a task to move to or create, as an unquoted string. If omitted, you'll be given a menu. See Details.
execute.First
should the .First.task code be executed on attachment? Yes, unless there's a bug in it.
execute.Last
should the .Last.task code be executed on detachment? Yes, unless there's a bug in it.

How it works

The mechanism underlying the tree structure is very simple: each task that has any subtasks will contain a character vector called tasks, whose names are the R names of the tasks, and whose elements are the corresponding disk directories. Your ROOT task need contain no more than a .First function and a tasks object.

You can manually modify the tasks vector, and sometimes this is essential. If you decide to move a disk directory, for example, you can manually change the corresponding element of tasks to reflect the change. (Though if you are moving a whole task hierarchy, e.g. when migrating to a new machine, consult cd.change.all.paths. Having said that, the ability to use relative pathnames in tasks, which has been present since about mvbutils version 2.0, makes cd.change.all.paths partly redundant.)

You can also rename a task very easily, via something like

 names( tasks)[ names( tasks)=="my.old.name"] <- "my.new.name"

You can use similar methods to "reparent" a subtask without changing the directory structure.

There is (deliberately, to avoid accidents) no completely automatic way of removing tasks. To "hide" a task from the cd system, you first need to be cd'ed to its parent; then remove the corresponding element of the tasks object, most easily via e.g.

 tasks <- tasks %without.name% "mysubtask"

If you want to remove the directories corresponding to "mysubtask", you have to do so manually, either in the operating system or (for the brave) in R code.

Remember to Save() at some point after manually modifying tasks.
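The manual edits described above are ordinary operations on a named character vector, so they can be tried out in plain R without mvbutils loaded. The following is only a sketch: the task names and directories are hypothetical, and the last step uses base-R subsetting as a stand-in that should behave like %without.name% for this simple case.

```r
# Hypothetical task names (the vector's names) and directories (its elements).
tasks <- c(
  my.old.name = "projects/analysis",
  mysubtask   = "projects/scratch",
  data        = "projects/data"
)

# Rename a task: only the name changes, the directory stays the same.
names(tasks)[names(tasks) == "my.old.name"] <- "my.new.name"

# Reflect a directory that has been moved on disk (hypothetical new path).
tasks["data"] <- "archive/data"

# Hide a task from the cd system by dropping its element.
# Base-R stand-in for: tasks <- tasks %without.name% "mysubtask"
tasks <- tasks[names(tasks) != "mysubtask"]

print(tasks)
```

In a real session these statements would be run while cd'ed to the parent task, followed by Save() so that the modified tasks object is written to disk.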

Options

Various options() can be set, as follows. Remember to put these into your .First function, too.

write.mvb.tasks=TRUE causes a sourceable text representation of the tasks object to be maintained in each directory, in the file tasks.r. This helps in case you accidentally wipe out the .RData file and lose track of where the child tasks live. To create these text representations for the first time throughout the hierarchy, call cd.write.mvb.tasks(0). You need to put the options call in your .First.

abbreviate.cdprompt=n controls the length of the prompt string. Only the first n characters of all ancestral task names will be shown. For example, n=1 would replace the prompt long.task.name/data/funcs> with l/d/funcs>.

mvbutils.update.history.on.cd=FALSE will prevent automatic saving and reloading of the history file when cd is called.

cd checks the R_HISTFILE environment variable and, if unset, sets it to file.path( getwd(), ".Rhistory"). This (combined with the mvbutils replacement of the standard versions of savehistory and loadhistory; see package?mvbutils) ensures that the same history file is used throughout each and every R session. My experience is that a single master history file is safer. However, if you want to override this behaviour, e.g. if you want to use a separate history file for each task, call something like Sys.setenv( R_HISTFILE=".Rhistory") before the first use of cd.
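As an illustration of the options above, here is a minimal sketch of a .First function that sets them; the option names come from the text, while the particular values are just examples. The helper abbreviate_prompt is hypothetical: it is not part of mvbutils and only mimics the documented abbreviation rule, to show what n means.

```r
# Sketch of a ROOT-task .First that sets the options described above.
# (A real .First would normally also make sure mvbutils is loaded.)
.First <- function() {
  options(
    write.mvb.tasks = TRUE,                # keep tasks.r alongside each .RData
    abbreviate.cdprompt = 1,               # show 1 character per ancestor task
    mvbutils.update.history.on.cd = FALSE  # no history save/reload on cd
  )
}

# Hypothetical helper, NOT the mvbutils implementation: shorten every
# ancestral task name to its first n characters, leaving the last name whole.
abbreviate_prompt <- function(prompt_path, n) {
  parts <- strsplit(prompt_path, "/", fixed = TRUE)[[1]]
  last <- length(parts)
  if (last > 1) {
    parts[-last] <- substr(parts[-last], 1, n)
  }
  paste(parts, collapse = "/")
}

abbreviate_prompt("long.task.name/data/funcs", 1)  # "l/d/funcs"
```

Keeping the options() call inside .First means the settings are re-applied in every new session, which is what the text recommends.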

Details

R workspaces can become very cluttered, so that it becomes difficult to keep track of what's what (I have seen workspaces with over 1000 objects in them). If you work on several different projects, it can be awkward to work out where to put "shared" functions, or to remember where things are if you come back to a project after some months away. And if you just want to test out a bit of code without leaving permanent clutter, but while still being able to "see" your important objects, how do you do it? cd helps with all such problems, by letting you organize all your projects into a single tree structure, regardless of where they are stored on disk. Each workspace is referred to (for historical reasons) as a "task".

Note that there is a basic choice when working with R: do you keep everything you write in a text file which you source every time you start; or do you store all the objects in a workspace as a binary image in a ".RData" file, and rely on save and load? [Hybrids are possible, too.] Some people prefer the text-based approach, but others including me prefer the binary image approach; my reasons are that binary images let me organize my work across tasks more systematically, and that repeated text-sourcing is much too slow when lengthy analyses or data extractions are involved. The cd system is really geared to the binary image model and, before cd moves to a new task, either up or down the hierarchy, the current workspace is automatically saved to a binary image. Nevertheless, I don't think cd is incompatible with other ways of working, as long as the ".RData" file (actually the tasks object) is not destroyed from session to session. At any rate, some people who work by sourcing large code files still seem to find cd useful; it's even possible to use the .First.task feature to auto-load a task's source files into a text editor when you cd to that task.

With the ".RData"-only approach, it is highly advisable to have some way of keeping separate text backups, at least of function code. The fixr editing system is geared up to this, and I presume other systems such as ESS are too.

To use the cd system, you will need to start R in the same workspace every time. This will become your ROOT or home task, from which all other tasks stem. There need not be much in this workspace except for an object called tasks (see below), though you can use it for shared functions that you don't want to organize into a package. From the ROOT task, your first action in a new R session will normally be to use cd to switch to a real task. The cd command is used both to switch between existing tasks, and to create new ones.

To set yourself up for working with cd, it's probably a good idea to make the ROOT task a completely new blank workspace, so the first step is to (outside R) create an empty folder with some name like "Rstart". [In MS-Windows, you should think about where to put this, to save yourself inordinate typing later on. If you are planning to create a completely new set of folders for your R projects, you might want to put this ROOT folder near the top of the disk directory structure, rather than in the insane default that Windows proffers, which usually looks something like "c:\document...

See Also

move, task.home, cdtree, cdfind, cditerate, cd.change.all.paths, cd.write.mvb.tasks, cdprompt, fixr, mlazy