Predicting the next job title

https://datascience.stackexchange.com/questions/67884

08-12-2020
Question

I have a dataset of 30M rows, each of the form [current_jobtitles, nextjobtitles]:

[['junior software programmer', 'senior software programmer'],
 ['senior software programmer', 'lead software programmer'],
 ['sales associate', 'regional sales associate']]

I want to build a deep learning model to predict the nextjobtitle when a currenttitle is given. Is there a way to achieve this with a deep learning model? If yes, what kind of model?

Can we use any of the text generation models for this scenario? I have seen some examples, but I cannot relate them to my problem. Any suggestions, please!

Update:

I have sequences of jobtitles in each row.

[['junior programmer', 'senior programmer', 'lead programmer'],
 ['sales associate', 'sales regional associate', 'sales manager'],
 ['chairman', 'ceo'],
 ['associate', 'intern', 'junior', 'student']]

From the original data (shown in this update), I produced the pairs above (before the update) by taking only the next job title for each title. Can I train a model on the original sequence data to predict the next job title? Note that the rows all have different lengths.
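The preprocessing described here, going from variable-length sequences to (current, next) pairs, can be sketched as follows. This is only an illustration under the assumption that each row is a list of title strings in chronological order; the function name `sequences_to_pairs` is hypothetical.

```python
from typing import List, Tuple


def sequences_to_pairs(sequences: List[List[str]]) -> List[Tuple[str, str]]:
    """Turn each career sequence into consecutive (current, next) title pairs."""
    pairs: List[Tuple[str, str]] = []
    for seq in sequences:
        # zip(seq, seq[1:]) pairs every title with its successor;
        # a row with fewer than two titles contributes no pairs.
        pairs.extend(zip(seq, seq[1:]))
    return pairs


sequences = [
    ["junior programmer", "senior programmer", "lead programmer"],
    ["sales associate", "sales regional associate", "sales manager"],
    ["chairman", "ceo"],
    ["associate", "intern", "junior", "student"],
]

pairs = sequences_to_pairs(sequences)
for current, nxt in pairs:
    print(f"{current} -> {nxt}")
```

Because the pairs are built row by row, rows of different lengths need no padding at this stage.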

Solution

I want to build a deep learning model to predict the next job title when a current title is given. Are there any ways that I could achieve this using some deep learning model?

I think that approaching this as a classification problem (input: an embedding of the current job title; output: the next job title as a class) could be time-consuming, though not necessarily hard, to train.

The reasons behind this:

Assuming that each job-title pair in your dataset is unique, the volume of your dataset implies a very large number of classes, so many that even deep learning models may take a long time to train (processing, optimization, etc.).
Also, there can be many-to-many relationships in the dataset, so the classic classification approach might not be a good option.
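To get a feel for both points before training anything, one could count the distinct target classes and look for one-to-many and many-to-one transitions. The sketch below uses the three pairs from the question plus two made-up rows (marked as hypothetical) so that both kinds of relationship appear.

```python
from collections import defaultdict

pairs = [
    ("junior software programmer", "senior software programmer"),
    ("senior software programmer", "lead software programmer"),
    ("sales associate", "regional sales associate"),
    ("sales associate", "sales manager"),          # hypothetical extra row
    ("junior sales associate", "sales manager"),   # hypothetical extra row
]

next_by_current = defaultdict(set)   # current title -> set of observed next titles
current_by_next = defaultdict(set)   # next title -> set of observed current titles
for current, nxt in pairs:
    next_by_current[current].add(nxt)
    current_by_next[nxt].add(current)

# Every distinct next title would be one output class of a classifier.
num_classes = len(current_by_next)

# Current titles that lead to more than one next title (one-to-many).
one_to_many = {c: n for c, n in next_by_current.items() if len(n) > 1}

# Next titles reached from more than one current title (many-to-one).
many_to_one = {n: c for n, c in current_by_next.items() if len(c) > 1}

print("classes:", num_classes)
print("one-to-many:", one_to_many)
print("many-to-one:", many_to_one)
```

On 30M rows the same counts would show how large the output layer of a classifier would need to be.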

What kind of model?

Going by your examples, the model we need here is one that can learn the hierarchy of job titles from the text itself, except in cases where a sample shows that the person holding that job title changed their domain of work (these can be treated as noise).

Can we use any of the text generation models for this scenario?

Yes, a deep learning model such as seq2seq (https://www.geeksforgeeks.org/seq2seq-model-in-machine-learning/) or a noisy autoencoder (https://towardsdatascience.com/convolutional-autoencoders-for-image-noise-reduction-32fce9fc1763) could be used. I know there is not much material on 'noisy autoencoders for NLP' on the internet (the linked article covers image noise reduction), but we can always try. These models can learn the complex relationships between the words of a job title and the job title itself.
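Either kind of model consumes sequences of token ids rather than raw strings. The following is a minimal sketch of word-level preprocessing only, not of the model itself; the special tokens (`<pad>`, `<sos>`, `<eos>`, `<unk>`) and the helper names `build_vocab` and `encode` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


def build_vocab(pairs: List[Tuple[str, str]]) -> Dict[str, int]:
    """Assign an integer id to every word seen in current or next titles."""
    vocab = {PAD: 0, SOS: 1, EOS: 2, UNK: 3}
    for current, nxt in pairs:
        for word in (current + " " + nxt).split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(title: str, vocab: Dict[str, int], max_len: int,
           add_sos: bool = False) -> List[int]:
    """Map a title to a fixed-length list of ids, ending with EOS, then padding."""
    ids = [vocab.get(word, vocab[UNK]) for word in title.split()]
    ids = ([vocab[SOS]] if add_sos else []) + ids + [vocab[EOS]]
    ids = ids[:max_len]  # titles longer than max_len are truncated (EOS may be cut)
    return ids + [vocab[PAD]] * (max_len - len(ids))


pairs = [
    ("junior software programmer", "senior software programmer"),
    ("senior software programmer", "lead software programmer"),
    ("sales associate", "regional sales associate"),
]
vocab = build_vocab(pairs)

encoder_input = encode(pairs[0][0], vocab, max_len=6)
decoder_input = encode(pairs[0][1], vocab, max_len=6, add_sos=True)
print(encoder_input)
print(decoder_input)
```

Working at the word level keeps the vocabulary far smaller than the number of distinct full titles, which is what lets a generative model sidestep the large number of classes.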

I would recommend noisy autoencoders, because I am making a strong assumption that typically only some of the corresponding words differ between the current and the next job title. This is exactly what noisy autoencoders are suited for:

They can accommodate noise very well (many-to-one and one-to-many relationships between current and next job titles).
They keep most of the content the same between input and output, which I think you will need given the strong assumption mentioned above.
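Since this recommendation rests on the assumption that current and next titles share most of their words, it may be worth checking that on the data first. Below is a small sketch using the Jaccard overlap of the word sets; the function name `shared_word_ratio` is hypothetical, and what counts as "enough" overlap is left open.

```python
def shared_word_ratio(current: str, nxt: str) -> float:
    """Jaccard overlap between the word sets of two job titles (0.0 to 1.0)."""
    a, b = set(current.split()), set(nxt.split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


examples = [
    ("junior software programmer", "senior software programmer"),
    ("sales associate", "regional sales associate"),
    ("chairman", "ceo"),
]
for current, nxt in examples:
    print(f"{current!r} -> {nxt!r}: {shared_word_ratio(current, nxt):.2f}")
```

If most pairs in the dataset score close to zero, as 'chairman' to 'ceo' does, the assumption does not hold and a plain seq2seq model may be the better starting point.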

I hope it helps, thanks!!

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange