Data Version: 2023 (available May 2025)
Citation: Fink, D., T. Auer, A. Johnston, M. Strimas-Mackey, S. Ligocki, O. Robinson, W. Hochachka, L. Jaromczyk, C. Crowley, K. Dunham, A. Stillman, C. Davis, M. Stokowski, P. Sharma, V. Pantoja, D. Burgin, P. Crowe, M. Bell, S. Ray, I. Davies, V. Ruiz-Gutierrez, C. Wood, A. Rodewald. 2024. eBird Status and Trends, Data Version: 2023; Released: 2025. Cornell Lab of Ornithology, Ithaca, New York. https://doi.org/10.2173/WZTW8903
There were no new eBird Trends generated or released in this version. The existing versions will remain on the website; please see the previous changelog.
`terra::slope` function with default parameters on the updated GEBCO
bathymetry data and summarized within the 1.5 km radius neighborhood
(consistent with all other features).

`ranger`
package which is used for the base models, relating
to how features were sorted and sampled in the trees. In the count
model, for approximately 20-30% of stixels this bug caused features,
most often the predicted occurrence, which is a feature in the count
model, to be randomly excluded from tree building even when we specified
that these features must be included. The bug was fixed by the package
author, but we continued to encounter this issue in ~5% of stixels in
the version of `ranger` distributed by CRAN, so we have
maintained a branch of the package which fully fixes this issue.

`mccf1`
threshold and having a
count of zero. After fixing the above bug and implementing the new
binary classification model, we discovered a significant decline in the
quality of the count and relative abundance estimates, as seen in
predictive performance metrics (PPMs), particularly the Poisson deviance
for the count and relative abundance estimates. This was attributed to
an increase in checklists in which species were predicted to be present
but with an observed count of zero entering the count model, as a result
of adding the binary classification model and potentially exacerbated by
all count models seeing all features after the bugfix. The solution was
to remove the “hurdle”: checklists predicted to be present but having
observed counts of zero are now excluded, so that only checklists with
non-zero observed counts enter the count model. This resulted in a
substantial improvement in the PPMs for the count and relative abundance
estimates, especially the Poisson deviance measure, across a set of 20
test species at their full spatiotemporal extent. This change also means
that the relative abundance values themselves are much higher than in
previous versions.

Masking Threshold Values

Threshold | Land-only | Land-and-ocean |
---|---|---|
Site Selection Probability Threshold | 0.25% | Not applied |
Spatial Coverage Threshold | 0.03% | 0.03% |
Spatial Coverage Threshold for Assumed Zero Layer | 1% | 1% |

Updated PPMs

Estimate | Statistic | Name |
---|---|---|
Binary | F1 | binary-f1 |
Binary | Matthews Correlation Coefficient (MCC) | binary-mcc |
Binary | Prevalence | binary-prevalence |
Occurrence | Bernoulli Deviance | occ-bernoulli-dev |
Occurrence | Brier Score | occ-brier |
Occurrence | Precision-Recall AUC (PR AUC) | occ-pr-auc |
Occurrence | PR AUC Greater than Prevalence | occ-pr-auc-gt-prev |
Occurrence | PR AUC Normalized | occ-pr-auc-normalized |
Count, Relative Abundance | Log Pearson Correlation | count/abd-log-pearson |
Count, Relative Abundance | Mean Absolute Error (MAE) | count/abd-mae |
Count, Relative Abundance | Poisson Deviance | count/abd-poisson-dev |
Count, Relative Abundance | Root Mean Squared Error (RMSE) | count/abd-rmse |
Count, Relative Abundance | Spearman Correlation | count/abd-spearman |
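
As a rough illustration of three of the statistics listed above, the following sketch computes F1, MCC, and a mean Poisson deviance using their standard textbook definitions. The changelog does not state how the pipeline computes or aggregates these metrics, so the function names, the per-checklist averaging of the deviance, and the example inputs are all assumptions.

```python
import math


def f1_score(tp, fp, fn):
    """F1 from confusion-matrix counts: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mean_poisson_deviance(observed, predicted):
    """Mean Poisson deviance: 2 * (y * log(y / mu) - (y - mu)), averaged.

    The y * log(y / mu) term is taken as 0 when y == 0.
    Averaging over checklists is an assumption of this sketch.
    """
    total = 0.0
    for y, mu in zip(observed, predicted):
        if mu <= 0:
            raise ValueError("predicted counts must be positive")
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2.0 * (log_term - (y - mu))
    return total / len(observed)


# Hypothetical confusion-matrix counts and counts per checklist.
example_f1 = f1_score(tp=8, fp=2, fn=2)
example_mcc = mcc(tp=8, tn=8, fp=2, fn=2)
example_dev = mean_poisson_deviance([3, 1, 7], [2.5, 1.2, 6.0])
```

Lower deviance indicates a better fit, while higher F1 and MCC indicate better binary classification.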
Data Version: 2022 (available November 2023)
Citation: Fink, D., T. Auer, A. Johnston, M. Strimas-Mackey, S. Ligocki, O. Robinson, W. Hochachka, L. Jaromczyk, C. Crowley, K. Dunham, A. Stillman, I. Davies, A. Rodewald, V. Ruiz-Gutierrez, C. Wood. 2023. eBird Status and Trends, Data Version: 2022; Released: 2023. Cornell Lab of Ornithology, Ithaca, New York. https://doi.org/10.2173/ebirdst.2022
`effort_hrs` and `effort_distance_km` having been maximized with partial
dependence values separately.

Details

The foundation of CCI is a predictive model of checklist-level species richness (\(S\); i.e., number of species). In updating CCI, changes were made to both the form of the predictive model of \(S\) and to the method that attributes variation in richness to particular observers.
Prior to Version 2022, predictive features comprised weather,
landcover, habitat diversity, protocol, day of year, and variables that
are particular to the observer: `observer_id` and `checklist_number` (i.e.,
index of how many checklists a user has ever submitted from any stixel
to eBird; not to be confused with `checklist_id`). A mixed-effects
generalized additive model (GAM) was fit to \(S\). This GAM used as
predictive features the natural log of `checklist_number`, a smooth
spline of `solar_noon_diff`, and the raw values of all other predictors,
with a random effect specification for `observer_id` and
`checklist_number`. The model was used to make predictions \(p_{i}\) of
\(S\) to data representing a “standardized search”, in which all
features except `observer_id` and `checklist_number` were held constant
(at the column-wise mean) across observations. CCI was derived from the
variation in resulting predictions, and scaled to have mean 0 and
variance 1.
\[ CCI_{i} = \frac{p_{i} - \mathrm{mean}(p)}{\mathrm{sd}(p)} \]
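
The scaling step in the formula above can be sketched in a few lines. This is only an illustration: the prediction values are made up, the function name is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the changelog does not say which estimator was used.

```python
from statistics import mean, stdev


def standardize(predictions):
    """Center and scale predictions to mean 0 and (sample) variance 1."""
    center = mean(predictions)
    spread = stdev(predictions)  # sample SD; which estimator is used is an assumption
    if spread == 0:
        raise ValueError("predictions have zero variance; the score is undefined")
    return [(p - center) / spread for p in predictions]


# Hypothetical "standardized search" richness predictions for five checklists.
p = [12.0, 15.5, 9.8, 20.1, 14.6]
cci = standardize(p)
```

Checklists with above-average predicted richness receive positive values and those below average receive negative values.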
Version 2022 changed the functional form of the predictive model from
a (mostly) linear mixed-effects model to a random forest. Further, it
removed `observer_id` and `checklist_number` from the suite of
predictive features; the model is now blind to person-specific effects.
Instead, predictions to real data absent any personal information
establish conditional expectations of richness given habitat, effort,
weather, etc. Each expected value parameterizes a Poisson distribution,
which is used to compute the exceedance probability of the actually
observed \(S\), which is then mapped to a standard-normal quantile. A
GAM with a “factor smooth” basis for `checklist_number` and
`observer_id` is applied to smooth the raw values for each observer.
CCI currently comprises these smoothed values.
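
The mapping from an observed richness to a raw score described above might look roughly like the sketch below. Several details are assumptions not stated in the changelog: the exceedance probability is taken as \(P(X > S)\), the sign is chosen so that richer-than-expected checklists score higher, and probabilities are clipped away from 0 and 1 so the quantile stays finite. The function names are hypothetical, and the final GAM smoothing step is not shown.

```python
import math
from statistics import NormalDist


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        total += term
    return min(total, 1.0)


def raw_score(observed, expected, eps=1e-12):
    """Map an observed richness to a standard-normal quantile.

    exceedance = P(X > observed) under Poisson(expected); the score is the
    normal quantile of 1 - exceedance (sign convention is an assumption).
    """
    exceedance = 1.0 - poisson_cdf(observed, expected)
    prob = min(max(1.0 - exceedance, eps), 1.0 - eps)  # keep inv_cdf finite
    return NormalDist().inv_cdf(prob)


# Hypothetical checklists, all with an expected richness of 10 species.
scores = [raw_score(s, 10.0) for s in (3, 10, 20)]
```

With these inputs the scores increase with the observed richness, which is the behavior one would want from an observer-skill index under the stated sign convention.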
`has_evi` that describes whether the covariate was available at a given
date and location.

`has_shoreline` covariate, as the shoreline covariates are not spatially
exhaustive, describing whether the covariate was available at a given
location.

`effort_distance_km` and `effort_hrs` are set to their 90th quantiles
when making predictions to determine the range boundary. Previously
these were chosen to maximize the partial dependence (PD) curve.

`solar_noon_diff`) are now chosen to maximize the abundance partial
dependence (PD) constrained to values where the species was detected.
Previously they were chosen using the occurrence partial dependence
curve and were not constrained to detections.

`effort_distance_km` is now 2 km, to more closely reflect the
distribution of checklists and to increase overall signal.

Data Version: 2021 (available November 2022)
Citation: Fink, D., T. Auer, A. Johnston, M. Strimas-Mackey, S. Ligocki, O. Robinson, W. Hochachka, L. Jaromczyk, A. Rodewald, C. Wood, I. Davies, A. Spencer. 2022. eBird Status and Trends, Data Version: 2021; Released: 2022. Cornell Lab of Ornithology, Ithaca, New York. https://doi.org/10.2173/ebirdst.2021
Data Version: 2020 (available Fall 2021)
Citation: Fink, D., T. Auer, A. Johnston, M. Strimas-Mackey, O. Robinson, S. Ligocki, W. Hochachka, L. Jaromczyk, C. Wood, I. Davies, M. Iliff, L. Seitz. 2021. eBird Status and Trends, Data Version: 2020; Released: 2021. Cornell Lab of Ornithology, Ithaca, New York. https://doi.org/10.2173/ebirdst.2020
Data Version: 2019 (available Fall 2020)
Citation: Fink, D., T. Auer, A. Johnston, M. Strimas-Mackey, O. Robinson, S. Ligocki, W. Hochachka, C. Wood, I. Davies, M. Iliff, L. Seitz. 2020. eBird Status and Trends, Data Version: 2019; Released: 2020. Cornell Lab of Ornithology, Ithaca, New York. https://doi.org/10.2173/ebirdst.2019