How to do vocabulary estimation based on observed writings?

https://datascience.stackexchange.com/questions/46938

python
regression
linear-regression
scikit-learn
model-selection

01-11-2019

Question

Below is a scatter plot of the data set I am dealing with. The X axis is the total number of words per essay for a particular individual, and the Y axis is the number of unique words. In principle, as essays get longer, the number of unique words should approach the size of the individual's vocabulary.

I am attempting to estimate that individual's vocabulary from the data below, but I don't know what kind of fit would work. A logarithm has no upper limit, and a quadratic fit doesn't make sense because the gradient should remain non-negative over the entire domain.

In short, I am looking for a decent model to fit the data below, and don't know where to start.

Thank you.
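
One possible starting point, offered only as a sketch and not as an accepted answer: the two constraints stated in the question (a finite upper limit and a gradient that never goes negative) are both satisfied by a saturating exponential, unique = vocab * (1 - exp(-total / scale)). The example below fits that curve with `scipy.optimize.curve_fit`. The function name `saturating`, the parameter names `vocab` and `scale`, and the data are all assumptions made for illustration; the data is synthetic because the original scatter plot's values are not available, so the real observations would have to be substituted for `total_words` and `unique_words`.

```python
import numpy as np
from scipy.optimize import curve_fit


def saturating(n, vocab, scale):
    """Unique-word count that rises monotonically toward the asymptote `vocab`."""
    return vocab * (1.0 - np.exp(-n / scale))


# Synthetic stand-in data (assumption): replace with the observed essay counts.
rng = np.random.default_rng(0)
total_words = np.linspace(50, 3000, 60)
true_vocab, true_scale = 800.0, 900.0
noise = rng.normal(0.0, 10.0, total_words.size)
unique_words = saturating(total_words, true_vocab, true_scale) + noise

# Fit with both parameters constrained to be non-negative.
# Initial guess: the asymptote is at least the largest observed count.
params, cov = curve_fit(
    saturating,
    total_words,
    unique_words,
    p0=[unique_words.max(), total_words.mean()],
    bounds=(0, np.inf),
)
vocab_hat, scale_hat = params
vocab_se = np.sqrt(np.diag(cov))[0]  # rough standard error of the asymptote

print(f"estimated vocabulary: {vocab_hat:.0f} (standard error about {vocab_se:.0f})")
```

The fitted `vocab_hat` is the extrapolated limit, so it is only as trustworthy as the assumed curve shape. If the observed points have not begun to flatten, the asymptote will be poorly determined and the reported standard error will be large. Another saturating form with the same two properties, such as vocab * total / (total + scale), could be swapped into `saturating` and compared on the same data.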

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange