Question

Text generation is well studied using Markov chains or neural networks, but I am not aware of any work on word sequence prediction in terms of subspace learning.

Treating phrases or sentences as temporal data, such as time series, it is possible to represent word sequences as a tensor $T$ of size $|WS| \times |W| \times K$, where $WS$ is the set of word sequences present in the corpus, $W$ is the set of segmented words, and $K$ is the maximum length of the observed sequences. For instance, for a phrase $ws$ = "word sequence prediction", we have $T(ws, \text{``sequence''}, 2) = 1$.
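To make the construction concrete, here is a minimal NumPy sketch of how such a tensor could be populated from a toy two-sentence corpus; the corpus, the variable names, and the 0-based position indexing are my own illustrative assumptions, not part of the question.

```python
import numpy as np

corpus = ["word sequence prediction", "word sequence generation"]
sequences = [s.split() for s in corpus]                  # word sequences (WS)
vocab = sorted({w for seq in sequences for w in seq})    # segmented word set (W)
word_idx = {w: i for i, w in enumerate(vocab)}
K = max(len(seq) for seq in sequences)                   # maximum observed length

# T has shape |WS| x |W| x K; T[ws, w, k] = 1 if word w occurs at position k of sequence ws
T = np.zeros((len(sequences), len(vocab), K))
for ws, seq in enumerate(sequences):
    for k, w in enumerate(seq):
        T[ws, word_idx[w], k] = 1.0

# e.g. for ws = "word sequence prediction", the entry for "sequence" at position 2
# (0-based index 1 here) equals 1
print(T[0, word_idx["sequence"], 1])  # -> 1.0
```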

For an incomplete tensor, where the entries to be predicted are missing, the tensor reconstructed after decomposition can then be used to generate text in terms of the observed word space.
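As one possible reading of this, below is a hedged sketch of tensor completion with a masked CP (PARAFAC) decomposition from TensorLy, reusing the toy tensor `T` and `vocab` from the snippet above; the rank, the choice of which entries to mask, and the argmax decoding step are all illustrative assumptions rather than an established method.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

T_obs = T.copy()
mask = np.ones_like(T_obs)
# Hide the last position of the first sequence, i.e. the word to be predicted.
mask[0, :, -1] = 0
T_obs[0, :, -1] = 0

# Low-rank CP decomposition fitted only on the observed entries (mask = 1).
cp = parafac(tl.tensor(T_obs), rank=2, mask=tl.tensor(mask),
             init='random', random_state=0)
T_hat = tl.to_numpy(tl.cp_to_tensor(cp))

# Decode the missing word as the highest-scoring entry of the reconstructed slice.
predicted = vocab[int(np.argmax(T_hat[0, :, -1]))]
print(predicted)
```

On such a tiny toy corpus the reconstruction is of course not meaningful; the snippet is only meant to show the mechanics of masking, factorizing, and reading predictions back from the reconstructed word space.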

My questions are as follows:

1) Are there any works using tensor factorization or factorization machines for word sequence generation?

2) How do subspace learning models differ from generative models such as Recurrent Neural Networks or Belief Networks? What are the downsides of using subspace methods compared to other established methods?

3) How can one establish a threshold for the length of the predicted sequence? For example, can one look at the $WS_r \times K_r$ space and use cross-validation to find the threshold for each word sequence?

Any pointers or answers to any of the above questions are highly appreciated.

No correct solution

Licensed under: CC-BY-SA with attribution