Getting started with robust systematic conservation planning

library(terra)
library(prioritizr)
library(robust.prioritizr)
library(ggplot2)
library(dplyr)

Introduction

Most systematic conservation planning problems require users to parameterize planning unit data and costs based on statistical models or future projections that inevitably contain uncertainty. For example, planning problems that involve designing protected areas resilient to future climate change must ensure that targets are met not just under one predicted scenario, but across most plausible climate scenarios. Likewise, planning problems that use predictions from species distribution models carry uncertainty that, if ignored, can mean that the targets of the planning problem are frequently violated.

Motivation

To illustrate the extent of the issue, we consider a simplified conservation planning problem with a minimum set objective. It contains 25 planning units with equal costs, 2 features, and an absolute target of 10. We focus on feature_1 in this example. Across all planning units, feature_1 is an uncertain value that follows a normal distribution with mean 1 and standard deviation 0.2. Suppose the researcher does not know this underlying distribution, goes into the field, and obtains an estimate of feature_1 as a random draw from the distribution for use in the planning problem. In other words, the value of feature_1 in some planning units is overestimated (above 1) and in others underestimated (below 1). feature_2 carries no uncertainty and takes the value 1.5 across all planning units.

We solve this problem in prioritizr.

set.seed(500)

mu <- 1
sigma <- 0.2

N <- 5
target <- N*2

feature_1 <- matrix(rnorm(N^2, mean = mu, sd = sigma), nrow = N, ncol = N)
feature_2 <- matrix(1.5, nrow = N, ncol = N)
sim_features_not_robust_raster <- c(rast(feature_1),
                                    rast(feature_2))
names(sim_features_not_robust_raster) <- c("feature_1", "feature_2")

pu <- matrix(1, nrow = N, ncol = N)
sim_pu_raster <- rast(pu)
names(sim_pu_raster) <- c("Cost")

p1 <- problem(sim_pu_raster, sim_features_not_robust_raster) %>%
  add_min_set_objective() %>%
  add_absolute_targets(target) %>%
  add_binary_decisions() %>%
  add_default_solver(verbose = FALSE)

s1 <- solve(p1)
names(s1) <- c('solution')

plot(c(sim_features_not_robust_raster, sim_pu_raster, s1))

Features and cost of the prioritization problem

In the prioritization process, planning units containing higher amounts of feature_1 will be selected over planning units with less of it. As shown below, prioritizr selects the planning units with the highest estimated feature values.

df <- data.frame(
  feature_1 = values(sim_features_not_robust_raster[[1]]),
  solution = values(s1)
  )

df %>%
  arrange(feature_1) %>%
  mutate(selected = if_else(solution == 1, "Yes", "No")) %>%
  mutate(rank = nrow(.)-row_number()+1) %>%
  ggplot(aes(x = rank, y = feature_1, fill = selected)) +
  geom_bar(stat = 'identity') +
  labs(x = "Rank of Feature 1", y = "Feature 1 value", fill = "Selected") +
  theme_bw() +
  theme(panel.grid = element_blank())

Ranking of feature 1 values and its prioritization outcomes

This would give desirable results if the estimates of the number of features in each planning unit were accurate and carried no uncertainty. However, when these estimates are imperfect, the constraint can be violated. In this synthetic case, we know that the number of features in each planning unit follows the same distribution, which means that the variation we see across planning units is caused only by estimation error and does not reflect underlying variation in the planning units’ features. The planning units selected here were only those with an overestimated number of features (values greater than 1).

If uncertainties in features are not accounted for, a prioritization problem will overwhelmingly select planning units with an overpredicted number of features. This makes the solution over-optimistic about the targets that can be achieved, and there is a good chance that the target constraints will be violated in practice.

To see this, we generate 500 realizations of feature_1 from the same distribution and evaluate how well our solution performs across each of them. This lets us estimate how likely it is that our solution violates the target constraint.

# Number of simulations
n_sims <- 500
sim_feature_1 <- replicate(n_sims, matrix(rnorm(N^2, mean = mu, sd = sigma), nrow = N, ncol = N)) %>%
  rast
feature_1_outcomes <- sim_feature_1 * s1
feature_1_targets <- values(feature_1_outcomes) %>%
  apply(2, sum) %>%
  unname

expected_target <- sum(df$feature_1 * df$solution)

data.frame(feature_1 = unname(feature_1_targets)) %>%
  mutate(violated = if_else(feature_1 < target, "Below target", "Above target")) %>%
  mutate(violated = factor(violated, c("Below target", "Above target"))) %>%
  ggplot(aes(x = feature_1, fill = violated)) +
  geom_histogram(color = 'white', bins = 30) +
  theme_bw() +
  geom_vline(xintercept = target, linetype = 2) +
  geom_vline(xintercept = expected_target, color = '#009e73') +
  scale_fill_manual('', values = c('#d55e00', '#0072b2'), drop = FALSE) +
  theme(panel.grid = element_blank())

Feature 1 representation evaluated across the distribution

The green vertical line depicts where we expected the representation of feature_1 to be, and the bars show the distribution of where it actually falls. We can see that the planning solution violated the target for feature_1 in most of the realizations drawn from the statistical distribution.
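This violation frequency can also be quantified directly as the proportion of realizations that fall below the target. A one-line sketch, assuming the `feature_1_targets` vector and `target` value from the chunks above are still in scope:

```r
# Proportion of the 500 realizations in which the solution misses the target
mean(feature_1_targets < target)
```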

Why did the planning solution violate the constraint in so many realizations, even though it was solved using an unbiased estimate of feature_1? This is because the prioritization process disproportionately selects planning units with overestimates of feature_1, as these planning units appear to contain more of the feature.

In what operations researchers call the “Curse of Optimality”, the prioritization process can amplify errors present in the data by selectively choosing planning units with overestimated features or underestimated costs. If robust approaches are not used, the planning solution is likely to violate its targets, yielding outcomes that disappoint and fall well short of what we want from the prioritization process.
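The effect is easy to reproduce outside of any prioritization machinery. The following self-contained sketch draws 25 unbiased estimates of a quantity whose true value is 1 everywhere and keeps the 10 largest, mirroring what an optimizer does when it ranks planning units by their estimated feature values:

```r
# Minimal illustration of the "Curse of Optimality":
# selecting the largest noisy estimates biases the selection upwards.
set.seed(1)
estimates <- rnorm(25, mean = 1, sd = 0.2)       # unbiased estimates of a true value of 1
selected  <- sort(estimates, decreasing = TRUE)[1:10]
mean(estimates)  # close to the true value
mean(selected)   # above the true value: the selection step amplifies overestimates
```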

Robust minimum set objective

To parameterize the uncertainty in the features, we need to supply the amount of each feature under each of the realizations we have simulated. The core of the robust.prioritizr approach is the use of groupings, which tell the algorithm which feature layers are realizations of the same underlying feature.


sim_features_robust_raster <- c(sim_feature_1, rast(feature_2), rast(feature_2))
names(sim_features_robust_raster) <- c(paste0(rep('f1_', nlyr(sim_feature_1)),1:nlyr(sim_feature_1)),
                                       'f2_1', 'f2_2')
feature_groupings <- c(rep('f1', nlyr(sim_feature_1)), 'f2', 'f2')

p2 <- problem(sim_pu_raster, sim_features_robust_raster) %>%
  add_constant_robust_constraints(groups = feature_groupings,
                                  conf_level = .9) %>%
  add_absolute_targets(target) %>%
  add_robust_min_set_objective(method = 'cvar') %>%
  add_binary_decisions() %>%
  add_default_solver(verbose = FALSE)

s2 <- solve(p2, force = TRUE)
soln <- c(s1, s2)
names(soln) <- c("Non-robust Solution", "Robust Solution")

plot(soln)

Prioritization outcome of the non-robust and robust solutions

Using the same approach, we can evaluate the representation of the new solution s2 across all the realizations of Feature 1.

feature_1_targets_robust <- values(sim_feature_1 * s2) %>%
  apply(2, sum) %>%
  unname

data.frame(`not_robust` = unname(feature_1_targets),
           `robust` = unname(feature_1_targets_robust)) %>%
  tidyr::pivot_longer(c('not_robust', 'robust'), names_to = 'name', values_to = 'values') %>%
  mutate(name = factor(name, c('not_robust', 'robust'), c('Non-Robust', 'Robust'))) %>%
  mutate(violated = if_else(values < target, "Below target", "Above target")) %>%
  mutate(violated = factor(violated, c("Below target", "Above target"))) %>%
  ggplot(aes(x = values, fill = violated)) +
  geom_histogram(color = 'white', bins = 30) +
  theme_bw() +
  geom_vline(xintercept = target, linetype = 2) +
  geom_vline(xintercept = expected_target, color = '#009e73') +
  scale_fill_manual('', values = c('#d55e00', '#0072b2'), drop = FALSE) +
  facet_wrap(vars(name), ncol = 1) +
  theme(panel.grid = element_blank())

Faceted histogram comparing the distribution of feature representation for the not-robust and robust solutions. The robust solution shows a much lower violation rate.

The representation of Feature 1 is dramatically improved when evaluated across the entire distribution of Feature 1, with all realizations having Feature 1 representation above the target value.

Robust minimum shortfall objective

Likewise, the problem can be solved with a minimum shortfall objective, where the aim is to minimize the weighted average shortfall between each feature’s representation and its target.
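Concretely, the shortfall of a feature is the amount (if any) by which its representation falls below its target. A small base-R sketch of this calculation with illustrative numbers (the variable names here are not part of the prioritizr API):

```r
# Illustrative shortfall calculation for two features
targets <- c(feature_1 = 10, feature_2 = 10)
held    <- c(feature_1 = 8,  feature_2 = 12)   # representation in a candidate solution
shortfall <- pmax(0, targets - held)           # feature_1 falls short by 2; feature_2 by 0
mean(shortfall / targets)                      # average relative shortfall
```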

We demonstrate this first with the standard (non-robust) use of prioritizr, focusing again on minimizing the shortfall of feature_1 relative to its target.

budget <- target*2

p3 <- problem(sim_pu_raster, sim_features_not_robust_raster) %>%
  add_absolute_targets(target) %>%
  add_min_shortfall_objective(budget = budget) %>%
  add_binary_decisions() %>%
  add_default_solver(verbose = FALSE)

s3 <- solve(p3)

plot(s3)

Prioritization outcome of the non-robust minimum shortfall objective

Using the same approach as before, we evaluate the full distribution of possible representations of feature_1 across the realizations we have simulated above.

feature_1_outcomes <- sim_feature_1 * s3
feature_1_targets <- values(feature_1_outcomes) %>%
  apply(2, sum) %>%
  unname

representation_summary <- eval_feature_representation_summary(p3, s3)

expected_target <- representation_summary[
  representation_summary$feature == 'feature_1',
  'absolute_held'
] %>%
  as.numeric

data.frame(feature_1 = unname(feature_1_targets)) %>%
  mutate(violated = if_else(feature_1 < target, "Below target", "Above target")) %>%
  mutate(violated = factor(violated, c("Below target", "Above target"))) %>%
  ggplot(aes(x = feature_1, fill = violated)) +
  geom_histogram(color = 'white', bins = 30) +
  theme_bw() +
  geom_vline(xintercept = target, linetype = 2) +
  geom_vline(xintercept = expected_target, color = '#009e73') +
  scale_fill_manual('', values = c('#d55e00', '#0072b2'), drop = FALSE) +
  theme(panel.grid = element_blank())

Histogram of feature 1 representation, showing the distribution of outcomes, with bars colored based on whether they are below or above the target. Vertical lines indicate the target and the expected target.

Observe that while prioritizr reports feature_1 as being held at the expected level, indicated by the green line, the actual representation simulated from the distribution is nowhere near that level. This implies that the shortfall metric used in standard prioritizr underestimates the true shortfall.

Now, let’s observe how the solution differs if we instead use a robust approach. Here, the shortfall is quantified against the minimum (worst-case) value in the distribution of representation for feature_1.
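To make the worst-case idea concrete, here is a tiny sketch with illustrative numbers (these values are made up, not prioritizr output):

```r
# With conf_level = 1, the shortfall of feature_1 is measured against the
# worst (minimum) representation across all realizations
representation_by_realization <- c(9.2, 10.4, 8.7, 11.1)
worst_case <- min(representation_by_realization)  # 8.7
shortfall  <- max(0, 10 - worst_case)             # shortfall relative to a target of 10
```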

p4 <- problem(sim_pu_raster, sim_features_robust_raster) %>%
  add_constant_robust_constraints(groups = feature_groupings, conf_level = 1) %>%
  add_absolute_targets(target) %>%
  add_robust_min_shortfall_objective(budget = budget) %>%
  add_binary_decisions() %>%
  add_default_solver(verbose = FALSE)

s4 <- solve(p4)
plot(s4)

Map of the robust minimum shortfall objective solution.

Importantly, the relative improvement of the solution here will be limited by the budget constraint imposed on the problem.

feature_1_targets_robust <- values(sim_feature_1 * s4) %>%
  apply(2, sum) %>%
  unname

representation_summary <- eval_feature_representation_summary(p4, s4)

expected_robust_target <- representation_summary %>%
  filter(substr(feature, 1, 2) == 'f1') %>%
  summarise(absolute_held = min(absolute_held))
expected_robust_target <- as.numeric(expected_robust_target)

plot_df <- data.frame(`not_robust` = unname(feature_1_targets),
           `robust` = unname(feature_1_targets_robust)) %>%
  tidyr::pivot_longer(c('not_robust', 'robust'), names_to = 'name', values_to = 'values') %>%
  mutate(name = factor(name, c('not_robust', 'robust'), c('Not Robust', 'Robust'))) %>%
  mutate(violated = if_else(values < target, "Violated", "Not violated")) %>%
  mutate(violated = factor(violated, c("Violated", "Not violated")))

plot_df %>%
  ggplot(aes(x = values, fill = violated)) +
  geom_histogram(color = 'white', bins = 30) +
  theme_bw() +
  geom_vline(xintercept = target, linetype = 2) +
  geom_vline(data = filter(plot_df, name == 'Not Robust'), aes(xintercept = expected_target),
             color = '#009e73') +
    geom_vline(data = filter(plot_df, name == 'Robust'), aes(xintercept = expected_robust_target),
               color = '#0072b2') +
  scale_fill_manual('', values = c('#d55e00', '#0072b2'), drop = FALSE) +
  facet_wrap(vars(name), ncol = 1) +
  theme(panel.grid = element_blank())

Faceted histogram comparing the distribution of feature representation for the not-robust and robust solutions. The robust solution shows a much lower violation rate.

The robust approach finds a solution with a much higher probability of reaching the species target because it uses the minimum, worst-case realization in the data to identify the solution with the least shortfall.

It is important to recognize, though, that the improvement yielded by a robust solution depends on the other constraints imposed on the problem. Consider these alternative problems with the same formulation but a much smaller budget.

# Use a small budget
small_budget <- 5

# Not robust version
p5 <- problem(sim_pu_raster, sim_features_not_robust_raster) %>%
  add_absolute_targets(target) %>%
  add_min_shortfall_objective(budget = small_budget) %>%
  add_binary_decisions() %>%
  add_default_solver(verbose = FALSE)

s5 <- solve(p5)

representation_summary <- eval_feature_representation_summary(p5, s5)

expected_target <- representation_summary[
  representation_summary$feature == 'feature_1',
  'absolute_held'
] %>%
  as.numeric

p6 <- problem(sim_pu_raster, sim_features_robust_raster) %>%
  add_constant_robust_constraints(groups = feature_groupings) %>%
  add_absolute_targets(target) %>%
  add_robust_min_shortfall_objective(budget = small_budget) %>%
  add_binary_decisions() %>%
  add_default_solver(verbose = FALSE)

s6 <- solve(p6)

fig_rast <- c(s5,s6)
names(fig_rast) <- c('Non-Robust Solution', 'Robust Solution')

plot(fig_rast)

Map comparing the non-robust and robust solutions with a small budget.

Here, the budget is tightly constrained at 5 cells. Robustness is typically achieved by selecting more planning units, so even though a robust formulation is used, the solution does not improve much.
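One way to see this is to count how many planning units each solution selects. Assuming the `s5` and `s6` objects from the chunks above are still in scope (`global()` is a terra function):

```r
# Number of planning units selected under the tight budget
c(non_robust = as.numeric(global(s5, "sum")),
  robust     = as.numeric(global(s6, "sum")))
```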

representation_summary <- eval_feature_representation_summary(p6, s6)

expected_robust_target <- representation_summary %>%
  filter(substr(feature, 1, 2) == 'f1') %>%
  summarise(absolute_held = min(absolute_held))
expected_robust_target <- as.numeric(expected_robust_target)

feature_1_targets <- values(sim_feature_1 * s5) %>%
  apply(2, sum) %>%
  unname

feature_1_targets_robust <- values(sim_feature_1 * s6) %>%
  apply(2, sum) %>%
  unname

plot_df <- data.frame(`not_robust` = unname(feature_1_targets),
           `robust` = unname(feature_1_targets_robust)) %>%
  tidyr::pivot_longer(c('not_robust', 'robust'), names_to = 'name', values_to = 'values') %>%
  mutate(name = factor(name, c('not_robust', 'robust'), c('Not Robust', 'Robust'))) %>%
  mutate(violated = if_else(values < target, "Violated", "Not violated")) %>%
  mutate(violated = factor(violated, c("Violated", "Not violated")))

plot_df %>%
  ggplot(aes(x = values, fill = violated)) +
  geom_histogram(color = 'white', bins = 30) +
  theme_bw() +
  geom_vline(xintercept = target, linetype = 2) +
  geom_vline(data = filter(plot_df, name == 'Not Robust'), aes(xintercept = expected_target),
             color = '#009e73') +
    geom_vline(data = filter(plot_df, name == 'Robust'), aes(xintercept = expected_robust_target),
               color = '#0072b2') +
  scale_fill_manual('', values = c('#d55e00', '#0072b2'), drop = FALSE) +
  facet_wrap(vars(name), ncol = 1) +
  theme(panel.grid = element_blank())

Faceted histogram comparing the distribution of feature representation for the not-robust and robust solutions with a small budget. The distributions are similar due to the tight budget constraint.

We can see that the distribution of outcomes did not change much in this case. Because the budget is so tightly constrained, even the robust approach cannot find a solution within the budget that reduces the total shortfall for feature_1.

Conclusion

robust.prioritizr extends a set of popular objective functions in prioritizr to incorporate robust constraints, and must be used in conjunction with other prioritizr functions. To get started with formulating your own spatial conservation planning problem, we recommend following the guides in the prioritizr documentation first, prototyping an initial version of the planning problem without robust constraints, and then gradually updating the problem specification with the robust constraints in this package.
