Question

I have a question regarding the pre-training section (in particular, the Masked Language Model).

In the example "Let's stick to improvisation in this skit", the word improvisation is masked; after running the sentence through BERT (followed by the FFNN and softmax) and looking at the probabilities assigned to each word in the English vocabulary, we are able to correctly predict that the masked word was improvisation.

Is it possible to actually play with this using my own examples? I was wondering whether I could input a sentence in a different language (using the multilingual model) and get a sorted list of the most probable words for the masked position in the original sentence. If it's possible, what needs to be tweaked?

Any help would be greatly appreciated.


Solution

pip install transformers

Then try this:

from transformers import pipeline

# Note: "bert-base" is not a valid checkpoint name; use "bert-base-cased" or "bert-base-uncased".
nlp = pipeline("fill-mask", model="bert-base-cased")
nlp(f"This is the best thing I've {nlp.tokenizer.mask_token} in my life.")