pickmax: Split and Coalesce Duplicated Records
Deduplicates datasets by retaining the most complete and informative records. Identifies duplicated entries based on a specified key column, calculates completeness scores for each row, and compares values within groups. When differences between duplicates exceed a user-defined threshold, records are split into unique IDs; otherwise, they are coalesced into a single, most complete entry. Returns a list containing the original duplicates, the split entries, and the final coalesced dataset. Useful for cleaning survey or administrative data where duplicated IDs may reflect minor data entry inconsistencies.
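The description above outlines a score, compare, then split-or-coalesce loop. The following is a minimal, language-neutral sketch of that idea in Python. It assumes that "completeness" means the count of non-missing fields and that "difference" means the share of jointly observed fields whose values disagree. The function name `pick_max_sketch`, the `threshold` semantics, and the `_1`/`_2` ID suffixes are hypothetical. This is not the pickmax R API, which is built on dplyr and may score and compare records differently.

```python
from itertools import combinations

def completeness(row, key):
    """Hypothetical score: number of non-missing (non-None) values outside the key."""
    return sum(1 for col, val in row.items() if col != key and val is not None)

def disagreement(a, b, key):
    """Share of fields observed in both rows (key excluded) whose values differ."""
    shared = [c for c in a if c != key and a[c] is not None and b.get(c) is not None]
    if not shared:
        return 0.0
    return sum(a[c] != b[c] for c in shared) / len(shared)

def pick_max_sketch(rows, key, threshold=0.5):
    """Assumed logic: split a duplicated group if any pair disagrees more than
    `threshold`; otherwise coalesce it into its most complete row."""
    groups = {}
    for row in rows:  # group by key, preserving first-seen order
        groups.setdefault(row[key], []).append(row)

    duplicates, split, final = [], [], []
    for k, group in groups.items():
        if len(group) == 1:
            final.append(dict(group[0]))
            continue
        duplicates.extend(group)
        worst = max(disagreement(a, b, key) for a, b in combinations(group, 2))
        if worst > threshold:
            # Rows look like different entities: give each one its own ID.
            for i, row in enumerate(group, start=1):
                new_row = dict(row)
                new_row[key] = f"{k}_{i}"
                split.append(new_row)
                final.append(new_row)
        else:
            # Minor inconsistencies: start from the most complete row
            # (stable sort keeps the first row on ties) and fill its gaps.
            ranked = sorted(group, key=lambda r: completeness(r, key), reverse=True)
            merged = dict(ranked[0])
            for other in ranked[1:]:
                for col, val in other.items():
                    if merged.get(col) is None and val is not None:
                        merged[col] = val
            final.append(merged)
    return {"duplicates": duplicates, "split": split, "final": final}

# Usage with made-up example records.
rows = [
    {"id": 1, "age": 34, "sex": "F", "province": None},
    {"id": 1, "age": 34, "sex": None, "province": "KZN"},
    {"id": 2, "age": 51, "sex": "M", "province": "GP"},
    {"id": 2, "age": 19, "sex": "F", "province": "WC"},
    {"id": 3, "age": 40, "sex": "M", "province": "EC"},
]
result = pick_max_sketch(rows, key="id", threshold=0.5)
print(result["final"])
```

Here the two `id == 1` rows never contradict each other on a jointly observed field, so they are coalesced into one row with the gaps filled. The two `id == 2` rows disagree on every field and are split into `2_1` and `2_2`. The returned dictionary mirrors the three outputs the description lists (original duplicates, split entries, final dataset), though the real package returns an R list whose element names may differ.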
Version: 0.1.0
Imports: dplyr, rlang, magrittr
Published: 2025-07-15
DOI: 10.32614/CRAN.package.pickmax
Author: Sbonelo Chamane [aut, cre] (ORCID: 0000-0001-5350-5203), Musawenkosi Mabaso [aut], Ronel Sewpaul [aut], Sean Jooste [aut], Kutloano Skhosana [aut], Khangelani Zuma [aut]
Maintainer: Sbonelo Chamane <SChamane at hsrc.ac.za>
License: GPL-3
NeedsCompilation: no
CRAN checks: pickmax results
Linking: Please use the canonical form https://CRAN.R-project.org/package=pickmax to link to this page.