Managing research projects and data analyses can be challenging when dealing with:
The org package solves these problems by providing a
standardized framework for organizing R projects with clear separation
of concerns and consistent structure across all your analyses.
Here’s how to get started with your first org
project:
library(org)
# 1. Initialize your project structure
org::initialize_project(
env = .GlobalEnv,
home = "my_analysis",
results = "my_results"
)
# 2. Access project paths
org::project$home # Your code location
org::project$results_today # Today's results folder
# 3. Use org functions in your analysis
org::path("data", "file.csv") # Cross-platform paths
org::ls_files("R") # List R files

The concept behind org is straightforward: most
analyses have three main sections:
Each section has unique requirements:
org::initialize_project

This is the main function that sets up your project structure. It
takes 2+ arguments and saves folder locations in
org::project for use throughout your analysis:

- home: Location of Run.R and the R/ folder (accessible via org::project$home)
- results: Results folder that creates date-based subfolders (accessible via org::project$results_today)
- ...: Additional folders as needed (e.g., data_raw, data_clean)

Run.R

This is your main analysis script that orchestrates the entire workflow:
All code sections should be encapsulated in functions in the
R/ folder. You should not have multiple main files, as this
creates confusion when returning to your code later. However, you can
have versioned files (e.g., Run_v01.R,
Run_v02.R) where later versions supersede earlier ones.
R/ directory

All analysis functions should be defined in
org::project$home/R. The initialize_project
function automatically sources all R scripts in this directory.
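
To make the idea of "sourcing every script in R/" concrete, here is a minimal base R sketch. It is not org's internal code: the helper name source_r_folder and the throwaway demo folder are assumptions made purely for illustration.

```r
# Hypothetical helper (not org's internal code): source every .R file in a folder.
source_r_folder <- function(folder, env = globalenv()) {
  scripts <- list.files(folder, pattern = "\\.[Rr]$", full.names = TRUE)
  for (script in scripts) {
    source(script, local = env)
  }
  invisible(scripts)
}

# Demonstration with a throwaway folder containing one function definition.
demo_r <- file.path(tempdir(), "demo_project", "R")
dir.create(demo_r, recursive = TRUE, showWarnings = FALSE)
writeLines(
  "clean_data <- function() data.frame(x = 1:3)",
  file.path(demo_r, "clean_data.R")
)

demo_env <- new.env()
source_r_folder(demo_r, env = demo_env)
nrow(demo_env$clean_data()) # 3
```

Sourcing into a dedicated environment keeps the demonstration from touching your workspace; passing .GlobalEnv instead would make the functions directly available at the console.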
Here’s a complete example of how to structure your project:
# Initialize the project
org::initialize_project(
env = .GlobalEnv,
home = "/git/analyses/2019/analysis3/",
results = "/dropbox/analyses_results/2019/analysis3/",
data_raw = "/data/analyses/2019/analysis3/"
)
# Document changes in archived results
txt <- glue::glue("
2019-01-01:
Included:
- Table 1
- Table 2
2019-02-02:
Changed Table 1 from mean -> median
", .trim=FALSE)
org::write_text(
txt = txt,
file = fs::path(org::project$results, "info.txt")
)
# Load required packages
library(data.table)
library(ggplot2)
# Run analysis
d <- clean_data() # Accesses data from org::project$data_raw
table_1(d) # Saves to org::project$results_today
figure_1(d) # Saves to org::project$results_today
figure_2(d) # Saves to org::project$results_today

When writing research articles, you often need multiple versions
(initial submission, resubmissions). org helps manage this
by using date-based versioning:
- Run.R to Run_YYYY_MM_DD_submission_1.R
- R/ to R_YYYY_MM_DD_submission_1/

This preserves the code that produced results for each submission, ensuring all changes are deliberate and intentional.
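
The naming pattern can be generated rather than typed by hand. The sketch below only builds the archived names; the helper submission_names is hypothetical and not part of org, and whether you then rename or copy the files is left to you.

```r
# Hypothetical helper (not part of org): build the archived names for one submission.
submission_names <- function(submission, date = Sys.Date()) {
  stamp <- format(as.Date(date), "%Y_%m_%d")
  list(
    run_file = sprintf("Run_%s_submission_%d.R", stamp, submission),
    r_folder = sprintf("R_%s_submission_%d/", stamp, submission)
  )
}

archived <- submission_names(1, date = "2019-02-02")
archived$run_file # "Run_2019_02_02_submission_1.R"
archived$r_folder # "R_2019_02_02_submission_1/"
```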
When working with team members who have different folder structures,
you can specify multiple possible paths. The org package
will automatically select the first path that exists:
# Team member setup - org will use the first existing path
org::initialize_project(
env = .GlobalEnv,
home = c(
"/Users/teammate1/projects/analysis3/", # Mac user
"/home/teammate2/analysis3/", # Linux user
"C:/Users/teammate3/analysis3/" # Windows user
),
results = c(
"/Users/teammate1/Dropbox/results/",
"/home/teammate2/dropbox/results/",
"C:/Users/teammate3/Dropbox/results/"
),
data_raw = c(
"/Users/teammate1/data/analysis3/",
"/home/teammate2/data/analysis3/",
"C:/shared_drive/data/analysis3/"
)
)

This approach allows the same initialization code to work across different team members’ machines without modification.
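
The "first path that exists" rule is easy to picture in base R. The following is a sketch of the idea only, not org's actual implementation; first_existing_dir is a hypothetical name.

```r
# Hypothetical helper (not org's internal code): return the first folder that exists.
first_existing_dir <- function(candidates) {
  exists_here <- dir.exists(candidates)
  if (!any(exists_here)) {
    stop(
      "None of the candidate folders exist: ",
      paste(candidates, collapse = ", ")
    )
  }
  candidates[which(exists_here)[1]]
}

# tempdir() exists on every machine, so it is chosen over the made-up first entry.
first_existing_dir(c("/no/such/folder/for/this/example", tempdir()))
```

Failing loudly when no candidate exists is a deliberate choice in this sketch: a silent fallback could send results to an unexpected location.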
Store your project components in appropriate locations:
# Code (GitHub)
git/
└── analyses/
    ├── 2018/
    │   ├── analysis_1/ # org::project$home
    │   │   ├── Run.R
    │   │   └── R/
    │   │       ├── clean_data.R
    │   │       ├── descriptives.R
    │   │       ├── analysis.R
    │   │       └── figure_1.R
    │   └── analysis_2/
    └── 2019/
        └── analysis_3/

# Results (Dropbox)
dropbox/
└── analyses_results/
    ├── 2018/
    │   ├── analysis_1/ # org::project$results
    │   │   ├── 2018-03-12/ # org::project$results_today
    │   │   │   ├── table_1.xlsx
    │   │   │   └── figure_1.png
    │   │   ├── 2018-03-15/
    │   │   └── 2018-03-18/
    │   └── analysis_2/
    └── 2019/
        └── analysis_3/

# Data (Local)
data/
└── analyses/
    ├── 2018/
    │   ├── analysis_1/ # org::project$data_raw
    │   │   └── data.xlsx
    │   └── analysis_2/
    └── 2019/
        └── analysis_3/
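
The dated folders in the results tree (for example 2018-03-12/) follow a simple pattern: the results root plus the current date. As a rough base R sketch of that pattern, assuming a hypothetical helper dated_results_dir that is not part of org:

```r
# Hypothetical helper (not org's internal code): create and return a dated results folder.
dated_results_dir <- function(results, date = Sys.Date()) {
  today_dir <- file.path(results, format(as.Date(date), "%Y-%m-%d"))
  dir.create(today_dir, recursive = TRUE, showWarnings = FALSE)
  today_dir
}

# Use a temporary results root so the example does not touch real project folders.
results_root <- file.path(tempdir(), "analyses_results", "analysis_1")
todays_folder <- dated_results_dir(results_root, date = "2018-03-12")
basename(todays_folder) # "2018-03-12"
```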
For projects on a shared network drive without GitHub/Dropbox:
project_name/ # org::project$home
├── Run.R
├── R/
│   ├── CleanData.R
│   ├── Descriptives.R
│   ├── Analysis1.R
│   └── Graphs1.R
├── paper/
│   └── paper.Rmd
├── results/ # org::project$results
│   └── 2018-03-12/ # org::project$results_today
│       ├── table1.xlsx
│       └── figure1.png
└── data_raw/ # org::project$data_raw
    └── data.xlsx
For projects with limited access:
project_name/ # org::project$home
├── Run.R
├── R/
│   ├── clean_data.R
│   ├── descriptives.R
│   ├── analysis.R
│   └── figure_1.R
├── results/ # org::project$results
│   └── 2018-03-12/ # org::project$results_today
│       ├── table_1.xlsx
│       └── figure_1.png
└── data_raw/ # org::project$data_raw
    └── data.xlsx
Understanding path components is important:
| Component | Name |
|---|---|
| /home/richard/test.src | Absolute (file)path |
| richard/test.src | Relative (file)path |
| /home/richard/ | Absolute (directory) path |
| ./richard/ | Relative (directory) path |
| richard | Directory |
| test.src | Filename |
A path specifies a location in a directory structure, while a filename only includes the file name itself. Directories only include directory name information.
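
Base R can pull these components apart, which may help when checking what a given string actually refers to. This short example reuses the path from the table and relies only on base R functions:

```r
# Split the absolute file path from the table into its components.
p <- "/home/richard/test.src"

file_name <- basename(p)          # "test.src"
dir_path  <- dirname(p)           # "/home/richard"
dir_name  <- basename(dir_path)   # "richard"

# Rebuild the full path from the directory path and the filename.
rebuilt <- file.path(dir_path, file_name) # "/home/richard/test.src"
```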
The org package provides several key functions for
project management:
- org::initialize_project(): Set up project structure and source R files
- org::set_results(): Modify results folder after project initialization
- org::project: Environment containing all project folder locations
- org::path(): Construct cross-platform file paths
- org::ls_files(): List files with optional pattern matching
- org::move_directory(): Move directories safely
- org::write_text(): Write text files with consistent formatting
- org::package_installed(): Check if packages are installed
- org::create_project_quarto_internal_results(): Create Quarto projects with internal results
- org::create_project_quarto_external_results(): Create Quarto projects with external results

# 1. Initialize project structure
org::initialize_project(
env = .GlobalEnv,
home = "/path/to/your/analysis/",
results = "/path/to/results/",
data_raw = "/path/to/data/"
)
# 2. Create analysis functions in R/ folder
# 3. Run analysis from Run.R
# 4. Results automatically saved to org::project$results_today

Recommendation: Always use .GlobalEnv -
it makes life so much easier! All your functions will be directly
accessible without having to worry about environment scoping issues.
The org::path() function ensures your code works across
different operating systems:
# Cross-platform path construction
data_file <- org::path(org::project$data_raw, "survey_data.csv")
output_file <- org::path(org::project$results_today, "analysis_results.xlsx")
# Handles multiple path components
nested_path <- org::path("folder1", "subfolder", "file.txt")
# Removes double slashes automatically
clean_path <- org::path("folder//", "//file.txt") # Returns "folder/file.txt"

org::path() for cross-platform compatibility

help(package = "org")
?org::initialize_project