workflowr: organized + reproducible + shareable data science in R
The workflowr R package makes it easier for researchers to organize their projects and share their results with colleagues.
Install the latest release (v0.7.0) by running this command in R or RStudio:
```r
devtools::install_github("jdblischak/workflowr", build_vignettes = TRUE)
```
If you are already writing R code to analyze data, and know the basics of Git and GitHub, you can start taking advantage of workflowr immediately. In a matter of minutes, you can create a research website like this. (See also the Divvy data exploration project for a more elaborate example of a workflowr project.)
If you find any problems, or would like to suggest new features, please open an Issue.
- Why use workflowr?
- Quick start
- Upgrading
- More about this repository
- Background and related work
- Credits
- License
- Citation
- Pronunciation
Why use workflowr?
First, hopefully you don't need much convincing to write your analyses in R Markdown. It allows you to combine your R code, text, and figures in the same document! See the R Markdown website to learn about all the cool features. Second, building a website with the rmarkdown package (as opposed to using knitr to produce Markdown files and passing these to a static site generator) enables you to use all the latest R packages (e.g. htmlwidgets) directly in your analyses. Third, the workflowr package provides functions to make it easier for a researcher to maintain a version-controlled R Markdown website:
- A function to start a project with all the necessary files (see `?wflow_start`)
- An R Markdown template that automatically inserts the date and most recent Git commit ID (i.e. SHA1) at the top of the file to aid reproducibility (see `?wflow_open`)
- An organized directory structure for saving generated figures
- A function that handles all the version control operations to track code development and also ensures all the R Markdown files are built in a reproducible manner (see `?wflow_publish`)
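As a rough illustration, the functions listed above might be combined as follows. This is a minimal sketch, not the official tutorial: the project name `myproject`, the file name `first-analysis.Rmd`, and the commit message are hypothetical, and the path conventions are assumed from workflowr v0.7.0 (where `wflow_open()` saves new files in the `analysis/` directory). Check the help pages for your installed version before relying on it.

```r
# A sketch of the basic workflowr cycle.
# "myproject" and "first-analysis.Rmd" are hypothetical names.
library("workflowr")

# 1. Create a new project with all the necessary files (see ?wflow_start).
#    By default this also changes the working directory to the new project.
wflow_start("myproject")

# 2. Create a new R Markdown file from the workflowr template
#    (see ?wflow_open). Assumption: in v0.7.0 the file is saved in analysis/.
wflow_open("first-analysis.Rmd")

# 3. After writing the analysis, commit the source file, build the HTML,
#    and commit the result in one step (see ?wflow_publish).
wflow_publish("analysis/first-analysis.Rmd", message = "Add first analysis")
```

The vignette "Getting started with workflowr" walks through these steps in detail and is the authoritative reference.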
Quick start
workflowr builds on several software tools including Git, pandoc and knitr, but you do not need to have experience using any of these tools to get started with workflowr. You only need to know how to code in R and be generally familiar with the R Markdown format. A basic understanding of Git and the UNIX command line is helpful but not essential.
Here is a minimal set of steps to get you started with workflowr. If you are already using R and Git, you may be able to skip some of these steps.
- Install R (instructions from Software Carpentry).
- (Optional) Install RStudio (workflowr takes advantage of some RStudio features, but RStudio is not required to use workflowr).
- Install Git (instructions from Software Carpentry).
- Create an account on GitHub.
- Configure Git (instructions from Software Carpentry). Run the following commands in the shell, inserting your account information:
```bash
git config --global user.name "Your Name"
git config --global user.email "youremail@domain"
```
- Install the latest stable release of workflowr from GitHub:
```r
install.packages("devtools")
devtools::install_github("jdblischak/workflowr", build_vignettes = TRUE)
```
- Work through the vignette "Getting started with workflowr" to learn how to set up a workflowr project. (You can view all the available vignettes locally with `browseVignettes("workflowr")`.)
- Alternatively, if you have already started your project, read the vignette "Migrating an existing project to use workflowr" to learn how to convert it to a workflowr project.
- Learn more about how to customize your research website.
If you find any unexpected behavior or think of an additional feature that would be nice to have, please open an Issue. When writing your bug report or feature request, please note the version of workflowr you are using (which you can obtain by running `packageVersion("workflowr")`).
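When filing an Issue, it can help to paste the output of a few commands along with your description. The following is a minimal sketch that uses only base R functions. The variable name `pkg` is for illustration only.

```r
# Gather version details to include in a bug report.
pkg <- "workflowr"

if (requireNamespace(pkg, quietly = TRUE)) {
  # Version of the installed workflowr package
  print(packageVersion(pkg))
} else {
  message(pkg, " is not installed")
}

# R version, operating system, and loaded packages
print(sessionInfo())
```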
Upgrading
To upgrade workflowr to the most recent stable release, follow these steps:
- Install the latest release from GitHub:

```r
devtools::install_github("jdblischak/workflowr", build_vignettes = TRUE)
```

- Preview potential changes to your project files with `wflow_update()`:

```r
library("workflowr")
wflow_update()
```

- To implement these changes, set `dry_run = FALSE`:

```r
wflow_update(dry_run = FALSE)
```
More about this repository
This repository contains the workflowr R package. If your goal is to create a workflowr project, you do not need to fork this repository. Instead, follow the Quick start instructions above.
For the most part, I try to follow the guidelines from R packages by Hadley Wickham. The unit tests are performed with testthat, the documentation is built with roxygen2, the online package documentation is created with pkgdown, continuous integration testing is performed by Travis CI, and code coverage is calculated with covr and Codecov.
The template files used by `wflow_start()` to populate a new project are located in `inst/infrastructure/`. The R Markdown templates used by `wflow_open()` are located in `inst/rmarkdown/templates/`. The repository contains the files `LICENSE` and `LICENSE.md` both to adhere to R package conventions for defining the license and to make the license clear in a more conventional manner (suggestions for improvement welcome). `document.R` is a convenience script for regenerating the documentation. The remaining directories are standard for R packages as described in the manual Writing R Extensions.
If you are interested in contributing to this project, please see these instructions.
Background and related work
There is a lot of interest and development around reproducible research with R. Projects like workflowr are possible due to two key developments. First, the R packages knitr and rmarkdown have made it easy for any R programmer to generate reports that combine text, code, output, and figures. Second, the version control software Git, the Git hosting site GitHub, and the static website hosting service GitHub Pages have made it easy to share not only source code but also static HTML files (i.e. no need to purchase a domain name, set up a server, etc.).
My first attempt at sharing a reproducible project online was singleCellSeq. Basically, I started by copying the documentation website of rmarkdown and added some customizations to organize the generated figures and to insert the status of the Git repository directly into the HTML pages. The workflowr R package is my attempt to simplify my previous workflow and provide helper functions so that any researcher can take advantage of this workflow.
workflowr encompasses multiple functions: 1) provides a project template, 2) version controls the R Markdown and HTML files, and 3) builds a website. Furthermore, it provides R functions to perform each of these steps. There are many other related works that provide similar functionality. Some are templates to be copied, some are R packages, and some involve more complex software (e.g. static blog software). Depending on your use case, one of the related works listed below may better suit your needs. Please check them out!
Project template hosted on GitHub:
Project template created via R package:
Create websites from R Markdown files:
Guides for reproducible research with R:
Other:
If you know of other related works I should include, please send a pull request to the "dev" branch.
Credits
workflowr was developed, and is maintained, by John Blischak, a postdoctoral researcher in the laboratory of Matthew Stephens at The University of Chicago. He is funded by a grant from the Gordon and Betty Moore Foundation to MS.
The workflowr package uses many great open source packages. Especially critical for this project are the R packages git2r, knitr, and rmarkdown. Please see the vignette How the workflowr package works to learn about the software that makes workflowr possible.
License
workflowr is available under the MIT license.
Citation
To cite workflowr in publications use:
John D. Blischak, Peter Carbonetto and Matthew Stephens (2017). workflowr: A workflow template for creating a research website. R package version 0.7.0. https://github.com/jdblischak/workflowr
A BibTeX entry for LaTeX users is
@Manual{,
title = {workflowr: A workflow template for creating a research website},
author = {John D. Blischak and Peter Carbonetto and Matthew Stephens},
note = {R package version 0.7.0},
year = {2017},
url = {https://github.com/jdblischak/workflowr},
}
Pronunciation
It is common for R package names to end with an "r", and I tend to pronounce this as if it were "er" because I personally find this the easiest. Thus I pronounce the package "workflow + er". Other equally good options are "workflow + R" or "work + flower".