| Title: | Construct Consistent Time Series from Textual Data |
|---|---|
| Description: | A rolling version of the Latent Dirichlet Allocation, see Rieger et al. (2021) <doi:10.18653/v1/2021.findings-emnlp.201>. By a sequential approach, it enables the construction of LDA-based time series of topics that are consistent with previous states of LDA models. After an initial modeling, updates can be computed efficiently, allowing for real-time monitoring and detection of events or structural breaks. |
| Authors: | Jonas Rieger [aut, cre] (ORCID: <https://orcid.org/0000-0002-0007-4478>) |
| Maintainer: | Jonas Rieger <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.3 |
| Built: | 2026-05-17 08:31:35 UTC |
| Source: | https://github.com/jonasrieger/rollinglda |
RollingLDA is a rolling version of the Latent Dirichlet
Allocation (LDA). By a sequential approach, it enables the construction of
LDA-based time series of topics that are consistent with previous states of
LDA models. After an initial modeling, updates can be computed efficiently,
allowing for real-time monitoring and detection of events or structural breaks.
For bug reports and feature requests please use the issue tracker:
https://github.com/JonasRieger/rollinglda/issues. Also have a look at
the (detailed) example at https://github.com/JonasRieger/rollinglda.

| Topic | Description |
|---|---|
| economy | Example Dataset (576 articles from Wikinews) for testing. |
| as.RollingLDA | RollingLDA objects used in this package. |
| getChunks | Getter for RollingLDA objects. |
| RollingLDA | Performing the method from scratch. |
| updateRollingLDA | Performing updates on RollingLDA objects. |

Rieger, Jonas, Carsten Jentsch and Jörg Rahnenführer (2021). "RollingLDA: An Update Algorithm of Latent Dirichlet Allocation to Construct Consistent Time Series from Textual Data". EMNLP Findings 2021. URL doi:10.18653/v1/2021.findings-emnlp.201.
Constructor for RollingLDA objects used in this package.
This function is useful for creating a RollingLDA object from a standard
LDA object, so that it can serve as the initial model and be
updated with updateRollingLDA.

    as.RollingLDA(x, id, lda, docs, dates, vocab, chunks, param)

    is.RollingLDA(obj, verbose = FALSE)


| Argument | Description |
|---|---|
| x | |
| id | |
| lda | |
| docs | |
| dates | |
| vocab | |
| chunks | If not passed, |
| param | |
| obj | |
| verbose | |

If you call as.RollingLDA on an object x that already has
the structure of a RollingLDA object (in particular a RollingLDA
object itself), the additional arguments id, param, ...
can be used to override the corresponding elements.
[named list] RollingLDA object.
Other RollingLDA functions:
RollingLDA(),
getChunks(),
updateRollingLDA()

    roll_lda = RollingLDA(texts = economy_texts, dates = economy_dates,
                          chunks = "quarter", memory = "3 quarter",
                          init = "2008-07-03", K = 10, type = "lda")

    is.RollingLDA(roll_lda, verbose = TRUE)
    getID(roll_lda)
    roll_lda = as.RollingLDA(roll_lda, id = "newID")
    getID(roll_lda)

Example dataset from Wikinews consisting of 576 articles. It can be used to familiarize yourself with the functions offered by this package.

    data(economy_texts)
    data(economy_dates)

economy_texts is a named list of tokenized texts of length 576.
economy_dates is an object of class Date of length 576.
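
A quick way to get a feel for the two objects is to load them and look at their size and time span. The following is a minimal sketch that assumes the rollinglda package is installed (the code is skipped otherwise); it relies only on the object names and lengths stated above, and the particular summaries printed are merely one possible choice.

```r
# Assumes the rollinglda package is installed; otherwise nothing is run
if (requireNamespace("rollinglda", quietly = TRUE)) {
  data(economy_texts, package = "rollinglda")
  data(economy_dates, package = "rollinglda")

  # Both objects should describe the same 576 articles
  stopifnot(length(economy_texts) == 576, length(economy_dates) == 576)

  # Time span covered by the articles and number of articles per year
  print(range(economy_dates))
  print(table(format(economy_dates, "%Y")))

  # First few tokens of the first article
  print(head(economy_texts[[1]]))
}
```
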
https://github.com/Docma-TU/toscaData
Returns the corresponding element of a RollingLDA object.

    getChunks(x)

    getNames(x)

    getDates(x, names, inverse)

    getDocs(x, names, inverse)

    getVocab(x)

    ## S3 method for class 'RollingLDA'
    getLDA(x, job, reduce, all)

    ## S3 method for class 'RollingLDA'
    getID(x)

    ## S3 method for class 'RollingLDA'
    getParam(x)


| Argument | Description |
|---|---|
| x | |
| names | |
| inverse | |
| job | not implemented for |
| reduce | not implemented for |
| all | not implemented for |

The requested element of a RollingLDA object.
Other RollingLDA functions:
RollingLDA(),
as.RollingLDA(),
updateRollingLDA()
Performs a rolling version of Latent Dirichlet Allocation.

    RollingLDA(...)

    ## Default S3 method:
    RollingLDA(
      texts,
      dates,
      chunks,
      memory,
      vocab.abs = 5L,
      vocab.rel = 0,
      vocab.fallback = 100L,
      doc.abs = 0L,
      memory.fallback = 0L,
      init,
      type = c("ldaprototype", "lda"),
      id,
      ...
    )


| Argument | Description |
|---|---|
| ... | additional arguments passed to |
| texts | |
| dates | |
| chunks | |
| memory | |
| vocab.abs | [integer(1)] |
| vocab.rel | [0,1] |
| vocab.fallback | [integer(1)] |
| doc.abs | [integer(1)] |
| memory.fallback | |
| init | |
| type | |
| id | |

The function first computes an initial LDA model (using
LDARep or LDAPrototype).
Afterwards it models temporal chunks of texts, using a specified memory to
initialize each model chunk.
The function returns a RollingLDA object. Results and all other elements
of this object can be retrieved with the getter functions (see getChunks).
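
The idea of temporal chunks with a memory can be illustrated with base R alone. The sketch below is not the package's implementation; it only assumes that a chunk is a calendar quarter and that the memory of a chunk consists of the documents dated within the three quarters before the chunk starts, mirroring the `chunks = "quarter"` and `memory = "3 quarter"` arguments used in the examples. The toy dates and the names `toy_dates`, `chunk_id`, `chunk_start`, `memory_start` and `in_memory` are made up for illustration.

```r
# Toy publication dates (hypothetical, not taken from the economy dataset)
toy_dates <- as.Date(c("2007-11-05", "2008-01-15", "2008-02-20",
                       "2008-04-03", "2008-07-10", "2008-08-01"))

# Assign each document to the calendar quarter it falls into
chunk_id <- droplevels(cut(toy_dates, breaks = "quarter"))
print(table(chunk_id))

# Memory window for the chunk starting on 2008-07-01 (assumed semantics):
# all documents dated within the three quarters before that start date
chunk_start <- as.Date("2008-07-01")
memory_start <- seq(chunk_start, by = "-3 quarter", length.out = 2)[2]
in_memory <- toy_dates >= memory_start & toy_dates < chunk_start
print(toy_dates[in_memory])
```

In this picture, only the documents flagged by `in_memory` would inform the initialization of the new chunk, while the documents of the chunk itself are the ones being newly modeled.
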
[named list] with entries

| Entry | Description |
|---|---|
| id | [character(1)] See above. |
| lda | LDA object of the fitted RollingLDA. |
| docs | [named list] with modeled texts in a preprocessed format. See LDAprep. |
| dates | [named Date] with dates of the modeled texts. |
| vocab | [character] with the vocabularies considered for modeling. |
| chunks | [data.table] with specifications for each model chunk. |
| param | [named list] with parameter specifications for vocab.abs [integer(1)], vocab.rel [0,1], vocab.fallback [integer(1)] and doc.abs [integer(1)]. See above for explanation. |

Other RollingLDA functions:
as.RollingLDA(),
getChunks(),
updateRollingLDA()

    roll_lda = RollingLDA(texts = economy_texts, dates = economy_dates,
                          chunks = "quarter", memory = "3 quarter",
                          init = "2008-07-03", K = 10, type = "lda")

    roll_lda
    getChunks(roll_lda)
    getLDA(roll_lda)

    roll_proto = RollingLDA(texts = economy_texts, dates = economy_dates,
                            chunks = "quarter", memory = "3 quarter",
                            init = "2007-07-03", K = 10,
                            n = 12, pm.backend = "socket", ncpus = 2)

    roll_proto
    getChunks(roll_proto)
    getLDA(roll_proto)

Performs an update of an existing object consisting of a rolling version of Latent Dirichlet Allocation.

    updateRollingLDA(
      x,
      texts,
      dates,
      chunks,
      memory,
      param = getParam(x),
      compute.topics = TRUE,
      memory.fallback = 0L,
      ...
    )

    ## S3 method for class 'RollingLDA'
    RollingLDA(
      x,
      texts,
      dates,
      chunks,
      memory,
      param = getParam(x),
      compute.topics = TRUE,
      memory.fallback = 0L,
      ...
    )


| Argument | Description |
|---|---|
| x | |
| texts | |
| dates | |
| chunks | |
| memory | |
| param | |
| compute.topics | |
| memory.fallback | |
| ... | not implemented |

The function uses an existing RollingLDA object and
models new texts, using a specified memory to initialize the new LDA chunk.
The function returns a RollingLDA object. Results and all other elements
of this object can be retrieved with the getter functions (see getChunks).
[named list] with entries

| Entry | Description |
|---|---|
| id | [character(1)] See above. |
| lda | LDA object of the fitted RollingLDA. |
| docs | [named list] with modeled texts in a preprocessed format. See LDAprep. |
| dates | [named Date] with dates of the modeled texts. |
| vocab | [character] with the vocabularies considered for modeling. |
| chunks | [data.table] with specifications for each model chunk. |
| param | [named list] with parameter specifications for vocab.abs [integer(1)], vocab.rel [0,1], vocab.fallback [integer(1)] and doc.abs [integer(1)]. See above for explanation. |

Other RollingLDA functions:
RollingLDA(),
as.RollingLDA(),
getChunks()

    roll_lda = RollingLDA(texts = economy_texts[economy_dates < "2008-05-01"],
                          dates = economy_dates[economy_dates < "2008-05-01"],
                          chunks = "month", memory = "month",
                          init = 100, K = 10, type = "lda")

    # updateRollingLDA = RollingLDA, if first argument is a RollingLDA object
    roll_update = RollingLDA(roll_lda,
                             texts = economy_texts[economy_dates >= "2008-05-01"],
                             dates = economy_dates[economy_dates >= "2008-05-01"],
                             chunks = "month", memory = "month")

    roll_update
    getChunks(roll_update)