NUSS: Mixed N-Grams and Unigram Sequence Segmentation

Segmentation of short text sequences, such as hashtags, into sequences of separate words. Segmentation relies on a dictionary, which may be built from a custom text corpus. A unigram dictionary is used to find the most probable word sequence, and an n-gram approach is used to determine the possible segmentations given the text corpus.
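The unigram idea described above can be illustrated with a minimal sketch. This is not the package's implementation or API: the function name `segment`, the `max_word_len` parameter, and the toy counts in `UNIGRAM_COUNTS` are hypothetical, and the sketch is written in Python only to stay independent of the package's R interface. It assumes each word's probability is its count divided by the total count, and uses dynamic programming to pick the split with the highest total log-probability.

```python
import math

# Hypothetical toy unigram counts; a real dictionary would be built from a corpus.
UNIGRAM_COUNTS = {
    "this": 50, "is": 80, "a": 100,
    "hash": 10, "tag": 12, "hashtag": 8, "his": 20,
}


def segment(text, counts=UNIGRAM_COUNTS, max_word_len=20):
    """Return the most probable split of `text` into dictionary words, or None."""
    total = sum(counts.values())
    text = text.lower()
    n = len(text)

    # best[i] holds (log-probability, start of last word) for the best
    # segmentation of the prefix text[:i]; -inf marks an unreachable prefix.
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)

    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word not in counts or best[start][0] == -math.inf:
                continue
            score = best[start][0] + math.log(counts[word] / total)
            if score > best[end][0]:
                best[end] = (score, start)

    if best[n][0] == -math.inf:
        return None  # no sequence of dictionary words covers the whole text

    # Walk the stored split points backwards to recover the words.
    words = []
    i = n
    while i > 0:
        start = best[i][1]
        words.append(text[start:i])
        i = start
    return words[::-1]


print(segment("thisisahashtag"))
```

With these toy counts the call prints `['this', 'is', 'a', 'hashtag']`: the single word "hashtag" scores higher than the pair "hash" + "tag", because multiplying two small probabilities yields a smaller value than one. Text that cannot be covered by dictionary words yields `None`.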

Version: 0.1.0
Depends: R (≥ 3.5)
Imports: dplyr, magrittr, Rcpp, stringr, text2vec, textclean, utils
LinkingTo: BH, Rcpp
Suggests: testthat (≥ 3.0.0)
Published: 2024-08-19
DOI: 10.32614/CRAN.package.NUSS
Author: Oskar Kosch [aut, cre]
Maintainer: Oskar Kosch <contact at oskarkosch.com>
BugReports: https://github.com/theogrost/NUSS/issues
License: GPL (≥ 3)
URL: https://github.com/theogrost/NUSS
NeedsCompilation: yes
Language: en
Materials: README
CRAN checks: NUSS results

Documentation:

Reference manual: NUSS.pdf

Downloads:

Package source: NUSS_0.1.0.tar.gz
Windows binaries: r-devel: NUSS_0.1.0.zip, r-release: NUSS_0.1.0.zip, r-oldrel: NUSS_0.1.0.zip
macOS binaries: r-release (arm64): NUSS_0.1.0.tgz, r-oldrel (arm64): NUSS_0.1.0.tgz, r-release (x86_64): NUSS_0.1.0.tgz, r-oldrel (x86_64): NUSS_0.1.0.tgz

Linking:

Please use the canonical form https://CRAN.R-project.org/package=NUSS to link to this page.
