What is the reason for the speedup of Transformer-XL?
01-11-2019
Question
Inference with Transformer-XL is faster than with a standard Transformer.
Why?
If state reuse is the reason, is the comparison between two segments of seq_len 32 with state reuse and one segment of seq_len 64 without state reuse?
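To make the comparison concrete, here is a rough back-of-the-envelope sketch that counts the query-key pairs scored by one causal self-attention layer. The helper names (`attention_pairs`, `sliding_window_pairs`) are made up for illustration, and the count ignores everything except attention scores (no feed-forward cost, no relative position terms). The sliding-window baseline assumes the standard model recomputes a full window from scratch for every predicted token, which is how I understand the evaluation setup in the Transformer-XL paper.

```python
def attention_pairs(seg_len, mem_len=0):
    """Query-key pairs scored in one causal self-attention layer for one segment."""
    # Each query sees every cached memory position, plus itself and the
    # earlier positions inside the current segment.
    return seg_len * mem_len + seg_len * (seg_len + 1) // 2


def sliding_window_pairs(num_predictions, context_len):
    """Cost when a full window is recomputed from scratch per predicted token.

    Assumption: one prediction per window, window shifted by one position.
    """
    return num_predictions * attention_pairs(context_len)


# One segment of 64 tokens, no state reuse.
one_long = attention_pairs(64)

# Two segments of 32 tokens; the second reuses the first as cached memory.
two_short = attention_pairs(32) + attention_pairs(32, mem_len=32)

# 64 predictions, each from a freshly recomputed 64-token window.
sliding = sliding_window_pairs(64, 64)

print(one_long, two_short, sliding)  # 2080 2080 133120
```

Under these assumptions the two setups described in the question score the same number of pairs, so state reuse would not explain a speedup in that particular comparison. A large gap only shows up against a baseline that recomputes its whole window for every token.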
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange