Question

BERT is an NLP model developed by Google. The original BERT implementation was built by the TensorFlow team, but there is also a version of BERT built with PyTorch. What is the main difference between these two models?


Solution

There are not just two, but many implementations of BERT, and most of them are essentially equivalent.

The implementations you mentioned are the original TensorFlow implementation released by Google Research and the PyTorch one provided by HuggingFace's transformers library.

These are the differences regarding several aspects:

  • In terms of results, there is no difference between using one or the other, as they both use the same checkpoints (same weights) and their outputs have been verified to be equal.
  • In terms of reusability, the HuggingFace library is probably more reusable, as it is designed specifically for that. It also gives you the freedom to choose either TensorFlow or PyTorch as the deep learning framework.
  • In terms of performance, they should be the same.
  • In terms of community support (e.g. asking questions on GitHub or Stack Overflow about them), the HuggingFace library is better suited, as it has a large user base.

Apart from BERT, the transformers library by HuggingFace has implementations of many other models: OpenAI GPT-2, RoBERTa, ELECTRA, and more.
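All of these models share the same loading interface through the library's Auto* classes, which resolve the right architecture from the checkpoint name. A minimal sketch, assuming transformers (with PyTorch) is installed and using "roberta-base" as an example checkpoint:

```python
# Sketch: the Auto* classes pick the correct model class for a checkpoint,
# so the same two lines work for BERT, RoBERTa, GPT-2, ELECTRA, etc.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

# The class is resolved automatically from the checkpoint's config.
resolved_class = type(model).__name__
print(resolved_class)
```

Swapping in a different checkpoint name is all that is needed to switch architectures, which is a large part of why the library is considered reusable.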

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange