Title: | Construct Consistent Time Series from Textual Data |
---|---|
Description: | A rolling version of the Latent Dirichlet Allocation, see Rieger et al. (2021) <doi:10.18653/v1/2021.findings-emnlp.201>. By a sequential approach, it enables the construction of LDA-based time series of topics that are consistent with previous states of LDA models. After an initial modeling, updates can be computed efficiently, allowing for real-time monitoring and detection of events or structural breaks. |
Authors: | Jonas Rieger [aut, cre] |
Maintainer: | Jonas Rieger <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.3 |
Built: | 2025-02-20 05:31:36 UTC |
Source: | https://github.com/jonasrieger/rollinglda |
RollingLDA is a rolling version of the Latent Dirichlet
Allocation (LDA). By a sequential approach, it enables the construction of
LDA-based time series of topics that are consistent with previous states of
LDA models. After an initial modeling, updates can be computed efficiently,
allowing for real-time monitoring and detection of events or structural breaks.
For bug reports and feature requests please use the issue tracker:
https://github.com/JonasRieger/rollinglda/issues. Also have a look at
the (detailed) example at https://github.com/JonasRieger/rollinglda.
economy: Example Dataset (576 articles from Wikinews) for testing.
as.RollingLDA: RollingLDA objects used in this package.
getChunks: Getter for RollingLDA objects.
RollingLDA: Performing the method from scratch.
updateRollingLDA: Performing updates on RollingLDA objects.
Maintainer: Jonas Rieger [email protected] (ORCID)
Rieger, Jonas, Carsten Jentsch and Jörg Rahnenführer (2021). "RollingLDA: An Update Algorithm of Latent Dirichlet Allocation to Construct Consistent Time Series from Textual Data". EMNLP Findings 2021. URL doi:10.18653/v1/2021.findings-emnlp.201.
Useful links:
Report bugs at https://github.com/JonasRieger/rollinglda/issues
Constructor for RollingLDA objects used in this package.
The function may be useful to create a RollingLDA object out of a standard LDA object to use it as initial model and update it using updateRollingLDA.
as.RollingLDA(x, id, lda, docs, dates, vocab, chunks, param)
is.RollingLDA(obj, verbose = FALSE)
x |
[ |
id |
[ |
lda |
[ |
docs |
[ |
dates |
[ |
vocab |
[ |
chunks |
[
If not passed, |
param |
[ |
obj |
[ |
verbose |
[ |
If you call as.RollingLDA on an object x which already has the structure of a RollingLDA object (in particular a RollingLDA object itself), the additional arguments id, param, ... may be used to override the specific elements.
[named list] RollingLDA object.
Other RollingLDA functions: RollingLDA(), getChunks(), updateRollingLDA()
roll_lda = RollingLDA(texts = economy_texts, dates = economy_dates,
                      chunks = "quarter", memory = "3 quarter",
                      init = "2008-07-03", K = 10, type = "lda")
is.RollingLDA(roll_lda, verbose = TRUE)
getID(roll_lda)
roll_lda = as.RollingLDA(roll_lda, id = "newID")
getID(roll_lda)
Example dataset from Wikinews consisting of 576 articles. It can be used to become familiar with the functions offered by this package.
data(economy_texts)
data(economy_dates)
economy_texts is a named list of tokenized texts of length 576.
economy_dates is an object of class Date of length 576.
https://github.com/Docma-TU/toscaData
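A quick way to get a feel for the example data is to load both objects and check them against the description above. This is a minimal sketch that assumes the rollinglda package is installed; it uses only base R functions besides the two data() calls shown in the usage.

```r
library(rollinglda)

# Load the two example objects shipped with the package
data(economy_texts)
data(economy_dates)

# Basic structure: a named list of tokenized texts and a Date vector
n_texts = length(economy_texts)
n_dates = length(economy_dates)
class(economy_dates)

# Time span covered and number of articles per month
range(economy_dates)
table(format(economy_dates, "%Y-%m"))

# First tokens of the first article
head(economy_texts[[1]])
```

The monthly counts give a rough idea of how many documents a "month" or "quarter" chunk would contain.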
Returns the corresponding element of a RollingLDA object.
getChunks(x)
getNames(x)
getDates(x, names, inverse)
getDocs(x, names, inverse)
getVocab(x)

## S3 method for class 'RollingLDA'
getLDA(x, job, reduce, all)

## S3 method for class 'RollingLDA'
getID(x)

## S3 method for class 'RollingLDA'
getParam(x)
x |
[ |
names |
[ |
inverse |
[ |
job |
not implemented for |
reduce |
not implemented for |
all |
not implemented for |
The requested element of a RollingLDA object.
Other RollingLDA functions: RollingLDA(), as.RollingLDA(), updateRollingLDA()
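The getters can be tried on a small fitted model. The sketch below reuses the model call from the package examples (type = "lda", K = 10) and then calls each getter from the usage above. getDates and getDocs are called with x only, on the assumption that names and inverse are optional; this is not confirmed by the argument table above.

```r
library(rollinglda)
data(economy_texts)
data(economy_dates)

# Fit a model exactly as in the package examples
roll_lda = RollingLDA(texts = economy_texts, dates = economy_dates,
                      chunks = "quarter", memory = "3 quarter",
                      init = "2008-07-03", K = 10, type = "lda")

# Identifier and parameter settings of the object
id = getID(roll_lda)
param = getParam(roll_lda)

# Chunk specifications and the vocabulary used for modeling
chunks = getChunks(roll_lda)
vocab = getVocab(roll_lda)

# Document-level elements (assumption: names/inverse may be omitted)
doc_names = getNames(roll_lda)
doc_dates = getDates(roll_lda)
docs = getDocs(roll_lda)

# The underlying LDA object
lda = getLDA(roll_lda)
```

Each getter returns the element of the same name described in the value section of RollingLDA, so chunks is a data.table and vocab a character vector.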
Performs a rolling version of Latent Dirichlet Allocation.
RollingLDA(...)

## Default S3 method:
RollingLDA(
  texts,
  dates,
  chunks,
  memory,
  vocab.abs = 5L,
  vocab.rel = 0,
  vocab.fallback = 100L,
  doc.abs = 0L,
  memory.fallback = 0L,
  init,
  type = c("ldaprototype", "lda"),
  id,
  ...
)
... |
additional arguments passed to |
texts |
[ |
dates |
[ |
chunks |
[ |
memory |
[ |
vocab.abs |
[ |
vocab.rel |
[0,1] |
vocab.fallback |
[ |
doc.abs |
[ |
memory.fallback |
[ |
init |
[ |
type |
[ |
id |
[ |
The function first computes an initial LDA model (using LDARep or LDAPrototype).
Afterwards it models temporal chunks of texts with a specified memory for initialization of each model chunk.
The function returns a RollingLDA object. You can receive results and all other elements of this object with getter functions (see getChunks).
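The idea of temporal chunks with a memory can be illustrated in base R without fitting any model. The sketch below is an illustration under stated assumptions, not the package's internal code: it assumes that chunks = "quarter" yields chunk starts from the day after init in quarterly steps, and that memory = "3 quarter" refers to the three quarters before a chunk's start. The dates are hypothetical and are not taken from the economy dataset.

```r
# Hypothetical document dates (illustration only)
dates = as.Date(c("2008-06-15", "2008-07-10", "2008-09-30",
                  "2008-10-05", "2009-01-20", "2009-03-02"))
init = as.Date("2008-07-03")

# Assumed chunk starts: day after the initial period, then every quarter
chunk_starts = seq(from = init + 1, to = max(dates), by = "quarter")

# Chunk index per document: 0 = initial model, 1, 2, ... = later chunks
chunk_id = findInterval(as.numeric(dates), as.numeric(chunk_starts))

# Assumed memory window of the second chunk: the three quarters before its start
memory_start = seq(from = chunk_starts[2], by = "-3 quarter", length.out = 2)[2]
in_memory = dates >= memory_start & dates < chunk_starts[2]

data.frame(dates, chunk_id, in_memory)
```

Under these assumptions, documents flagged in_memory are the ones that would initialize the model of the second chunk, while documents with chunk_id 0 belong to the initial model.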
[named list] with entries
id: [character(1)] See above.
lda: LDA object of the fitted RollingLDA.
docs: [named list] with modeled texts in a preprocessed format. See LDAprep.
dates: [named Date] with dates of the modeled texts.
vocab: [character] with the vocabularies considered for modeling.
chunks: [data.table] with specifications for each model chunk.
param: [named list] with parameter specifications for vocab.abs [integer(1)], vocab.rel [0,1], vocab.fallback [integer(1)] and doc.abs [integer(1)]. See above for explanation.
Other RollingLDA functions: as.RollingLDA(), getChunks(), updateRollingLDA()
roll_lda = RollingLDA(texts = economy_texts, dates = economy_dates,
                      chunks = "quarter", memory = "3 quarter",
                      init = "2008-07-03", K = 10, type = "lda")
roll_lda
getChunks(roll_lda)
getLDA(roll_lda)

roll_proto = RollingLDA(texts = economy_texts, dates = economy_dates,
                        chunks = "quarter", memory = "3 quarter",
                        init = "2007-07-03", K = 10, n = 12,
                        pm.backend = "socket", ncpus = 2)
roll_proto
getChunks(roll_proto)
getLDA(roll_proto)
Performs an update of an existing object consisting of a rolling version of Latent Dirichlet Allocation.
updateRollingLDA(
  x,
  texts,
  dates,
  chunks,
  memory,
  param = getParam(x),
  compute.topics = TRUE,
  memory.fallback = 0L,
  ...
)

## S3 method for class 'RollingLDA'
RollingLDA(
  x,
  texts,
  dates,
  chunks,
  memory,
  param = getParam(x),
  compute.topics = TRUE,
  memory.fallback = 0L,
  ...
)
x |
[ |
texts |
[ |
dates |
[ |
chunks |
[ |
memory |
[ |
param |
[
|
compute.topics |
[ |
memory.fallback |
[ |
... |
not implemented |
The function uses an existing RollingLDA object and models new texts with a specified memory as initialization of the new LDA chunk.
The function returns a RollingLDA object. You can receive results and all other elements of this object with getter functions (see getChunks).
[named list] with entries
id: [character(1)] See above.
lda: LDA object of the fitted RollingLDA.
docs: [named list] with modeled texts in a preprocessed format. See LDAprep.
dates: [named Date] with dates of the modeled texts.
vocab: [character] with the vocabularies considered for modeling.
chunks: [data.table] with specifications for each model chunk.
param: [named list] with parameter specifications for vocab.abs [integer(1)], vocab.rel [0,1], vocab.fallback [integer(1)] and doc.abs [integer(1)]. See above for explanation.
Other RollingLDA functions: RollingLDA(), as.RollingLDA(), getChunks()
roll_lda = RollingLDA(texts = economy_texts[economy_dates < "2008-05-01"],
                      dates = economy_dates[economy_dates < "2008-05-01"],
                      chunks = "month", memory = "month",
                      init = 100, K = 10, type = "lda")

# updateRollingLDA = RollingLDA, if first argument is a RollingLDA object
roll_update = RollingLDA(roll_lda,
                         texts = economy_texts[economy_dates >= "2008-05-01"],
                         dates = economy_dates[economy_dates >= "2008-05-01"],
                         chunks = "month", memory = "month")
roll_update
getChunks(roll_update)