- ukb_time_skeleton() to create a reusable follow-up time skeleton with
  baseline date, death date, loss-to-follow-up date, administrative
  censoring, follow-up end reason, and valid follow-up indicators.
- time_skeleton support to build_survival_dataset() while preserving the
  default survival workflow.
- ukb_download_rap_dictionary(), ukb_query_dictionary(), and
  ukb_validate_columns(), for official RAP dictionary lookup,
  Chinese/English field search, and column validation.
- ukb_dictionary_zh metadata dataset for Chinese UKB field-path lookup.
- run_regression() with covariate_sets for nested epidemiological models
  such as crude, partially adjusted, and fully adjusted analyses.
- tests/ from package builds and remote
  tracking.
- CancerRegistry disease source using fields 40006, 40005, 40011, and
  40012.
- cancer_icd10_pattern, cancer_histology, and cancer_behaviour to
  create_disease_definition().
- Lung_Cancer with cancer registry, ICD-10, and death-registry
  ascertainment.
- FirstOccurrence disease source.
- first_occurrence_fields and first_occurrence_source_fields to
  create_disease_definition().
- p13xxxx First Occurrence date/source fields, including UKB special date
  coding 819 handling.
- ukb_clean_missing() for converting common UKB
  non-response labels and numeric missing codes into analysis-ready
  values.
- ukb_snapshot() to record row/column counts, missingness, complete rows,
  object size, and deltas across analysis pipeline checkpoints.
- ukb_ml_workflow() API for binary, multiclass, and continuous
  non-survival ML with a frozen final test set.
- ukb_ml_as_split(), enhanced ukb_ml_split_data(),
  ukb_ml_feature_select(), ukb_ml_tune(), ukb_ml_threshold(),
  ukb_ml_fit_final(), and ukb_ml_evaluate_test().
- split_ratio style in
  ukb_ml_split_data() by keeping $internal_validation as an alias for the
  held-out split.
- ukb_shap() to support ukb_ml_workflow and ukb_ml_final objects,
  defaulting to the frozen test set for workflow objects.
- rpart decision tree and naive_bayes model backends to
  ukb_ml_workflow().
- options(UKBAnalytica.auto_install_ml = TRUE); by default, optional
  model packages are checked only when the selected model needs them and
  are not installed automatically.
- ukb_ml_survival_workflow() and survival-specific split,
  feature-selection, tuning, final-refit, and frozen-test evaluation
  helpers for time-to-event ML.
- model = "cox" as the lightweight default survival ML backend and
  aligned survival prediction output with the new workflow object
  structure.
- ukb_ml_survival() as deprecated in favor of ukb_ml_survival_workflow().
- dx extract_dataset and RAP
  table-exporter.
- rap_find_dataset(), rap_list_fields(), rap_plan_extract(),
  rap_extract_pheno(), and rap_submit_extract().
- variables = ... using UKBAnalytica predefined baseline mappings, while
  preserving field_id = ... for all instances and arrays of a UKB field.
- inst/python/ as legacy/helper entry points.
- /mnt/project.
- ukb_ml_workflow() path.
- OPCS4 operative procedure support for hospital
  summary operations via p41272 + p41282_a*.
- opcs4_pattern to create_disease_definition() so procedure evidence is
  opt-in and ignored by default when unspecified.
- OPCS4 in sources, prevalent_sources, and outcome_sources.
- Arrhythmia, Ventricular_Arrhythmia, AV_Block, Intraventricular_Block,
  and SVT.
- Atrial_Fibrillation with OPCS4 support for procedure-augmented atrial
  arrhythmia ascertainment.
- opcs4_pattern and arrhythmia phenotyping with ICD-10 + OPCS4.
- README.md with an ICD-10 + OPCS4 phenotyping example and clarified the
  default opt-in behavior for procedure data.
- build_survival_dataset() with
  show_flow to print step-by-step participant attrition in the terminal
  for wide output.
- (n_before, n_after, excluded, retention rates from previous/raw
  cohort).
- attr(result, "participant_flow").
- dt_threads in build_survival_dataset() to let users temporarily
  configure data.table thread count for large runs.
- .safe_as_date() utility (R/date_utils.R) to parse mixed date formats
  safely and convert malformed values to NA with warnings instead of
  stopping execution.
- as.Date() calls in key pipelines with
  .safe_as_date() (ICD, death, baseline, incident-time utilities, and
  case extraction paths).
- parse_self_reported_illnesses() to handle malformed year values (Inf,
  -Inf, NaN, non-numeric strings) without charToDate crashes.
- p{field}_i0 and p{field} naming conventions for date/source fields.
- (Diabetes, T1DM, T2DM) in cohort construction workflows.
- ukb_ml_split_data() for train/internal-validation splitting.
- seed.
- man/ukb_ml_split_data.Rd and NAMESPACE export.
- add sensitivity analysis module and refine the docs.
- add select_incident_by_years() utility to split incident cases within
  or after a year cutoff from enrollment.
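
As a rough illustration of the year-cutoff split described for select_incident_by_years(), the base R sketch below flags incident cases as falling within or after a cutoff measured from enrollment. The helper name split_incident_by_cutoff(), its arguments, and the 365.25-day year are assumptions made for this example; they are not the package's actual signature or implementation.

```r
# Hypothetical helper: NOT the UKBAnalytica API, only the idea in base R.
split_incident_by_cutoff <- function(enroll_date, event_date, cutoff_years) {
  enroll_date <- as.Date(enroll_date)
  event_date  <- as.Date(event_date)
  # Years from enrollment to the event (NA when there is no event date).
  years_to_event <- as.numeric(event_date - enroll_date) / 365.25
  # Incident cases only: the event must fall after enrollment.
  incident <- !is.na(years_to_event) & years_to_event > 0
  data.frame(
    years_to_event = years_to_event,
    within_cutoff  = incident & years_to_event <= cutoff_years,
    after_cutoff   = incident & years_to_event > cutoff_years
  )
}

flags <- split_incident_by_cutoff(
  enroll_date  = c("2008-01-01", "2008-01-01", "2008-01-01", "2008-01-01"),
  event_date   = c("2009-06-01", "2015-03-01", NA, "2007-05-01"),
  cutoff_years = 2
)
print(flags)
```

In this sketch, participants with no event date or with an event before enrollment are flagged in neither group, so the two flags can be used directly as sensitivity-analysis subsets.
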
ml_model.R
- ukb_ml_model(): Unified interface for training ML models
  - ranger
  - xgboost
  - glmnet
  - e1071
  - nnet
- ukb_ml_predict(): Generate predictions
- ukb_ml_cv(): K-fold cross-validation with optional repeats
- ukb_ml_compare(): Compare multiple models
- ukb_ml_importance(): Extract variable importance

ml_evaluate.R
- ukb_ml_metrics(): Compute performance metrics (AUC, accuracy, etc.)
- ukb_ml_roc(): ROC curve analysis with CI
- ukb_ml_calibration(): Calibration curve with Brier score and ECE
- ukb_ml_confusion(): Confusion matrix

ml_shap.R
- ukb_shap(): Compute SHAP values for model interpretation
- ukb_shap_summary(): Feature importance from SHAP
- ukb_shap_dependence(): Single feature SHAP analysis
- ukb_shap_force(): Single observation explanation

ml_survival.R
- ukb_ml_survival(): Survival machine learning models
  - randomForestSRC
  - gbm
  - glmnet
- ukb_ml_survival_predict(): Survival probability prediction
- ukb_ml_survival_importance(): Variable importance
- ukb_ml_survival_shap(): SHAP for survival models

- plot_ml_importance(): Variable importance bar/dot plot
- plot_ml_roc(): ROC curve plot
- plot_ml_calibration(): Calibration curve plot
- plot_ml_confusion(): Confusion matrix heatmap
- plot_ml_compare(): Model comparison plot
- plot_shap_summary(): SHAP beeswarm/bar plot
- plot_shap_dependence(): SHAP dependence plot
- plot_shap_force(): SHAP waterfall plot

- ranger, xgboost, glmnet, e1071, nnet, fastshap, pROC,
  randomForestSRC

subgroup.R
- run_subgroup_analysis(): Stratified analysis with interaction p-values
- run_multi_subgroup(): Batch analysis across multiple subgroup variables

propensity.R
- estimate_propensity_score(): PS estimation via logistic regression or
  GBM
- match_propensity(): 1:k nearest neighbor matching with caliper
- calculate_weights(): IPTW weights (ATE, ATT, ATC)
- assess_balance(): Covariate balance assessment with SMD
- run_weighted_analysis(): Weighted regression analysis

mediation.R
- run_mediation(): Causal mediation analysis (wrapping regmedint)
- run_multi_mediator(): Test multiple mediators
- run_sensitivity_mediation(): Sensitivity analysis for unmeasured
  confounding

mi_pool.R
- pool_mi_models(): Combine regression results using Rubin’s Rules
- fit_mi_models(): Fit models across imputed datasets
- create_imputation_list(): Convert to mitools imputationList
- pool_custom_estimates(): Pool custom statistics

visualization.R
- plot_forest(): Forest plots for subgroup/regression results
- plot_km_curve(): Kaplan-Meier survival curves
- plot_ps_distribution(): Propensity score distribution
  (histogram/density)
- plot_balance(): Covariate balance before/after matching
- plot_calibration(): Calibration plots
- plot_mediation(): Mediation effect plots (bar, decomposition, path
  diagram)
- plot_mediation_forest(): Multi-mediator forest plot
- plot_mi_pooled(): MI pooled results forest plot
- plot_mi_diagnostics(): FMI and variance diagnostics

docs/08-advanced-analysis.Rmd

- MatchIt, gbm, regmedint, mitools, MASS,
  cobalt

Fix bug in survival.R: a participant whose primary disease occurred
before the initial time now gets NA as survival time, which
distinguishes them from participants whose primary disease occurred
after the initial time (who have a non-NA survival time).
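
The rule behind this fix can be sketched in base R as follows. The function and argument names (surv_time_years(), baseline, disease_date, censor_date) and the 365.25-day year are hypothetical choices for illustration only; the package's own code in survival.R may be organized differently.

```r
# Hypothetical sketch of the rule; not the package's internal code.
surv_time_years <- function(baseline, disease_date, censor_date) {
  baseline     <- as.Date(baseline)
  disease_date <- as.Date(disease_date)
  censor_date  <- as.Date(censor_date)
  # Follow-up ends at the disease date if there is one, else at censoring.
  end_date <- disease_date
  no_event <- is.na(disease_date)
  end_date[no_event] <- censor_date[no_event]
  time <- as.numeric(end_date - baseline) / 365.25
  # Disease before the initial time is prevalent: survival time becomes NA.
  prevalent <- !is.na(disease_date) & disease_date < baseline
  time[prevalent] <- NA_real_
  time
}

t_years <- surv_time_years(
  baseline     = c("2010-01-01", "2010-01-01", "2010-01-01"),
  disease_date = c("2008-05-01", "2012-01-01", NA),
  censor_date  = c("2020-01-01", "2020-01-01", "2020-01-01")
)
print(round(t_years, 2))
```

Here the first participant (disease before the initial time) gets NA, while the incident case and the censored participant keep non-NA follow-up times.
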
Add variable_preprocess.R module for preprocessing
baseline variables.
- primary_disease argument to compute outcome_status and
  outcome_surv_time for a single primary endpoint.
- prevalent_sources and outcome_sources arguments in
  build_survival_dataset() to manage self-report bias.
- sources (ICD-10, ICD-9, self-report, death).
- inst/python/ to extract:
  - (inst/extdata/metabolites_non_ratio.txt).
- man/figures/.