This minor version release expands the scope of simulist to include two new post-processing functions: truncate_linelist() and messy_linelist(). Both of these functions modify a line list <data.frame> from sim_linelist() or sim_outbreak(). The line list now also includes a $date_reporting column.
This release has also focused on making the package interface more consistent and contains bug fixes.
The messy_linelist() function is added. This takes a simulated line list and creates a “messy” line list with inconsistencies, irregularities and missingness found in empirical outbreak data (#187 & #196 & #199).
A reporting delay argument (reporting_delay) is now included in sim_linelist() and sim_outbreak() to simulate reporting delays from the date of symptom onset ($date_onset) to date of reporting ($date_reporting) (#179).
The truncate_linelist() function is added. This takes a simulated line list and can create outbreak snapshots and right-truncation of real-time outbreak data (#179 & #193 & #201 & #211).
A new vignette, reporting_delays-truncation.Rmd, on reporting delays and right-truncation for line list data has been added (#179 & #201).
Alt text is added to all vignette figures (#214).
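The reporting delay and right-truncation features listed above can be pictured with a small base R sketch. This is not the package's implementation: the toy line list, the log-normal delay and the snapshot date are all assumptions made for illustration, and only the column names ($date_onset, $date_reporting) come from these release notes.

```r
# Hypothetical toy line list; the values are made up for illustration.
set.seed(1)
linelist <- data.frame(
  id = 1:6,
  date_onset = as.Date("2023-01-01") + c(0, 2, 5, 9, 14, 20)
)

# Assumed reporting delay: a function returning n delays in days.
reporting_delay <- function(n) rlnorm(n, meanlog = 1, sdlog = 0.5)

# Date of reporting is symptom onset plus the sampled delay,
# rounded up to whole days so reporting is always after onset.
linelist$date_reporting <- linelist$date_onset +
  ceiling(reporting_delay(nrow(linelist)))

# One simple reading of right-truncation: keep only the cases that
# had been reported on or before the (assumed) snapshot date.
snapshot_date <- as.Date("2023-01-15")
snapshot <- linelist[linelist$date_reporting <= snapshot_date, ]
```

Cases with onset before the snapshot date but a reporting date after it are dropped, which is why recent case counts in a real-time snapshot look artificially low.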
{english} is added as a package dependency for messy_linelist() (#187).
R CMD check CI is now run on R v4.1, the minimum required R version for the package (#180).
{epiparameter} is no longer used in testing (#177).
.check_linelist() is added for input checking in post-processing functions (#179).
.check_age_df() and .check_risk_df() have been merged into .check_df() thanks to the standardisation of the structure of <data.frame> objects required by sim_*() function arguments (#200).
create_config() has been updated to accept functions instead of a distribution name and a vector of parameters. This now matches the design of arguments that accept a function in sim_*() functions (#202).
The structure of the age-structured population <data.frame> input into sim_*() functions has been standardised with the age-stratified risk <data.frame>s by using an $age_limit column instead of an $age_range character column (#200).
The line list <data.frame> output by sim_linelist() and sim_outbreak() now contains a $date_reporting column (#179).
Outcome date ($date_outcome) is now conditioned to be after hospitalisation date ($date_admission) using the new internal .sample_outcome_time() function. This is a breaking change as previously outcome times could be before hospitalisation times. sim_linelist() can now throw an error if an outcome time cannot be sampled to be after the hospitalisation time (#178).
The date of first contact is now sampled as the number of days before infection time (equal to symptom onset in the model) rather than days before date of last contact, as the previous approach could place the infection time before the first contact (#206).
The minimum required R version for simulist is increased to v4.1.0 from v3.6.0 due to package dependencies (#180).
The minimum required version of {incidence2} (suggested dependency) is now v2.3.0 (#214).
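The conditioning of outcome dates on hospitalisation dates described above could be approached by resampling, as in the sketch below. The helper name, its arguments and the max_tries limit are hypothetical and are not taken from the package; the internal .sample_outcome_time() function may work differently.

```r
# Hypothetical helper (not the package's internal function): resample
# outcome times until each one falls after the hospitalisation time.
sample_outcome_after_admission <- function(admission_time,
                                           onset_to_outcome,
                                           max_tries = 1000) {
  outcome_time <- onset_to_outcome(length(admission_time))
  tries <- 0
  # NA admission times (cases not hospitalised) impose no constraint
  invalid <- !is.na(admission_time) & outcome_time <= admission_time
  while (any(invalid)) {
    tries <- tries + 1
    if (tries > max_tries) {
      stop("Could not sample an outcome time after hospitalisation time.")
    }
    # Redraw only the outcome times that violate the constraint
    outcome_time[invalid] <- onset_to_outcome(sum(invalid))
    invalid <- !is.na(admission_time) & outcome_time <= admission_time
  }
  outcome_time
}

set.seed(42)
admission_time <- c(2, NA, 5, 1) # days since symptom onset
outcome_time <- sample_outcome_after_admission(
  admission_time,
  onset_to_outcome = function(n) rexp(n, rate = 0.2)
)
```

The error branch mirrors the behaviour noted above: if the supplied delay distribution cannot produce a time after hospitalisation, the sketch stops rather than returning an inconsistent line list.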
design-principles.Rmd vignette (#208).
Date of symptom onset can no longer occur before date of first contact (#206).
Outcome date can no longer occur before hospitalisation date (#178).
A minor version release of simulist containing various minor improvements to the functions and documentation, as well as removing some triggers for warning users. There are also a few bug fixes and internal enhancements.
sim_*() arguments that previously did not have one. Allowing functions to be run without specifying any arguments (e.g. linelist <- sim_linelist()) (#149).
sim_*() function arguments that accept either a function or an <epiparameter> object has been improved (#149).
sim_*() functions no longer warn if the user has not specified *_risk arguments and has set onset_to_* arguments to NULL (#149).
rmarkdown::html_vignette to correctly render the website and for maximum compatibility with {pkgdown} >= 2.1.0. This removes figure numbering and code folding (#153).
.add_hospitalisation() has been vectorised following .add_outcome() in PR #101 (#150).
infect_period and prob_infect have been renamed infectious_period and prob_infection (#143).
sim_*() functions now use NULL instead of NA to turn off processes (e.g. onset_to_death = NULL for no deaths) (#148).
.sample_infect_period() is added that errors if the infectious period function generates a negative number (#142).
sim_linelist() no longer errors when hosp_death_risk is NULL and onset_to_death is parameterised as a delay distribution (#144).
.add_ct() generates the correct number of values and does not duplicate Ct values due to vector recycling (#158).
{epiparameter} usage (#159).
The third release of the simulist R package contains a range of new features, enhancements, documentation and unit tests.
The headline changes to the package are:
date_outcome and outcome columns which can be parameterised with onset_to_death and onset_to_recovery.
onset_to_hosp and onset_to_death arguments can now take NA as input and will return a column of NAs in the line list columns date_admission and date_outcome (#98).
An onset_to_recovery argument has been added to the simulation functions, sim_linelist() and sim_outbreak(), and so the recovery date can be explicitly given in the line list data (#99).
The line list simulation can now use a time-varying case fatality risk. The create_config() function now returns a $time_varying_death_risk list element, which is NULL by default but can take an R function to enable the fatality risk of cases to change over the epidemic (#101).
A new vignette, time-varying-cfr.Rmd, has been added to the package to describe how to use the time-varying case fatality risk functionality and describe a few different time-varying functions that can be used (#101).
A new vignette, wrangling-linelist.Rmd, has been added to go over some of the common post-processing steps that might be required after simulating line list or contact tracing data. This vignette is short and currently only contains a single post-processing example; more examples will be added over time (#104).
The README now has a section on related projects to provide an overview of packages that simulate line list data, or are related to simulist. This section contains a disclosure widget containing a feature table providing a high-level description of the features and development status of each related package (#110).
A Key features section and a Complimentary R packages section have been added to the README (#134).
Updated package architecture diagram in the design-principles.Rmd vignette (#113).
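As a rough illustration of the time-varying case fatality risk mentioned above, the sketch below defines a function that scales a baseline risk down over time. The two-argument signature (risk, time) and the exponential decay rate are assumptions made for this example; the time-varying-cfr.Rmd vignette is the place to check what the package actually expects.

```r
# Assumed signature: baseline risk and time (days since the outbreak
# started) in, adjusted risk out. The decay rate is an illustrative value.
time_varying_death_risk <- function(risk, time) {
  risk * exp(-0.05 * time)
}

baseline_risk <- 0.5
times <- c(0, 10, 50)

# Risk equals the baseline at time 0 and declines as the epidemic progresses
adjusted_risk <- time_varying_death_risk(baseline_risk, times)
```

A function of this shape keeps the adjusted risk between 0 and the baseline, so it remains a valid probability for any non-negative time.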
The .add_deaths() function has been replaced by the .add_outcome() function which can simulate death and recovery times (#99).
.cross_check_sim_input() function has been added to the package to ensure user input is coherent across arguments (#98).
.anonymise() function has been added to convert individuals’ names into alphanumeric codes to anonymise individuals in line list and contact tracing data (#106).
The simulation functions are now parameterised with an infectious period (infect_period argument) instead of a contact interval (contact_interval argument). This moves away from parameterising the simulation with the time delay between a person becoming infected and having contact with a susceptible individual, and instead uses an infectious period distribution within which contacts are uniformly distributed in time (#96).
The simulation functions can now set a maximum as well as a minimum outbreak size. The min_outbreak_size argument in the exported sim_*() functions has been renamed outbreak_size and takes a numeric vector of two elements, the minimum and maximum outbreak size. The maximum outbreak size is a soft limit due to the stochastic nature of the branching process model, so epidemiological data returned can contain more cases and/or contacts than the maximum in outbreak_size, but in this case a warning is returned explaining to the user how many cases/contacts are being returned (#93).
The add_ct argument in sim_linelist() and sim_outbreak() has been removed. The functionality is now equivalent to add_ct = TRUE in the previous simulist version. The add_ct argument was removed to move the package to always returning <data.frame>s with the same number of columns, for consistency and predictability (#104).
The add_names argument in the simulation functions has been renamed to anonymise. The new argument controls whether names are given to each case (anonymise = FALSE, default behaviour) or whether fixed length hexadecimal codes are given to each case (anonymise = TRUE). This ensures the returned <data.frame> has the same number of columns (#106).
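The infectious period parameterisation described above, with contacts uniformly distributed in time within the period, can be sketched in base R as follows. The gamma distribution, the infection time and the number of contacts are illustrative assumptions, not package defaults.

```r
set.seed(7)

# Hypothetical inputs for one infected individual
infection_time <- 3 # days since the start of the outbreak
n_contacts <- 5

# Assumed infectious period: a function returning n durations in days
infectious_period <- function(n) rgamma(n, shape = 2, scale = 2)

# Draw one infectious period for the individual, then place each contact
# uniformly in time between infection and the end of that period.
period <- infectious_period(1)
contact_times <- infection_time + runif(n_contacts, min = 0, max = period)
```

In contrast to a contact interval, no contact can fall outside the individual's infectious window under this scheme.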
.sim_network_bp() now indexes the time vector correctly. Previously a vector indexing bug meant the epidemic would not progress through time (#95).
The second release of simulist updates the core simulation model and, as a result, the arguments of the exported sim_*() functions for simulating line list data and/or contact table data are updated. The internal package architecture is also refactored.
create_config() now returns a new element in the list: $network. By default create_config() returns network = "adjusted", which assumes the simulation is a random network and samples contacts with an excess degree distribution (see Details in ?create_config()). The network effect can be changed to "unadjusted" to switch off the network effect. $network is checked internally (in .sim_network_bp()) and will error if not valid (#60).
design-principles.Rmd (#66).
lint-changed-files.yaml) is added to the suite of continuous integration workflows (#68).
vis-linelist.Rmd (#70).
.sim_network_bp() is added as an internal function and replaces bpmodels::chain_sim() as the core simulation model producing contacted and infected individuals. {bpmodels} is removed as a dependency as a result (#60).
.sample_names() is added as an internal function to utilise randomNames::randomNames() to produce more unique names than randomNames(..., sample.with.replacement = FALSE).
.sim_bp_linelist(), .sim_clinical_linelist() and .sim_contacts_tbl() with .sim_internal() (#66).
sim_utils.R file was renamed to utils.R (#66) and the empty create_linelist.R file was removed (#72).
.add_date_contact() argument outbreak_start_date is now NULL by default instead of missing (#82).
sim_*() functions now use snapshot testing for more detailed data checking (#65).
testdata) files have been updated, as has the testdata/README.md with instructions (#64).
R and serial_interval arguments have been removed from sim_linelist(), sim_contacts() and sim_outbreak() functions and instead contact_distribution, contact_interval and prob_infect are used to parameterise the simulation.
Documentation, for both functions and vignettes, has been updated with these changes (#60).
contact_distribution argument in sim_*() functions requires a density function if supplied as an anonymous function. Information is added to simulist.Rmd to explain this.
sim_linelist() now uses column header sex instead of gender. The contacts table output from sim_contacts() and sim_outbreak() now uses column headers age and sex instead of cnt_age and cnt_gender (#60, #79).
contact_distribution is redefined and redocumented as the distribution of contacts per individual, rather than the number of contacts that do not get infected as it was in v0.1.0.
row.names for <data.frame>s output by sim_linelist(), sim_contacts() and sim_outbreak() are now sequential from 1:nrows (#63).
sim_contacts() now correctly runs with an age-structured population. In the previous version (v0.1.0), sim_contacts() did not call .check_age_df() and as a result the function errored; this is fixed as of PR #81.
Initial release of simulist, an R package containing tools to simulate epidemiological data such as line lists and contact tables.
sim_linelist(): simulate line list data
sim_contacts(): simulate contacts data
sim_outbreak(): simulate both line list and contacts data
simulist.Rmd)
age-strat-risks.Rmd)
age-struct-pop.Rmd)
vis-linelist.Rmd)
design-principles.Rmd)