Question

Transformer-XL has faster inference than the vanilla Transformer.

Why?

If state reuse is the reason, is the comparison between two segments of seq_len 32 with state reuse and one segment of seq_len 64 without state reuse?
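To make the comparison concrete, here is a rough sketch that counts how many token positions each scheme has to encode during evaluation. It assumes the setup described in the Transformer-XL paper, where the vanilla baseline makes one prediction per forward pass and then shifts its window by a single token, re-encoding the whole window from scratch. The function names and the cost measure (token encodings only, ignoring the per-token attention cost) are my own simplifications, not something from the paper's code.

```python
def vanilla_token_encodings(num_predictions, context_len):
    """Token positions encoded by a sliding-window evaluation (assumed setup)."""
    total = 0
    for _ in range(num_predictions):
        # The window moves right by one token and is re-encoded from scratch.
        total += context_len
    return total


def state_reuse_token_encodings(num_predictions, segment_len):
    """Token positions encoded when earlier hidden states are cached and reused."""
    total = 0
    remaining = num_predictions
    while remaining > 0:
        step = min(segment_len, remaining)
        # Only the new segment is encoded; older states are read from the cache.
        total += step
        remaining -= step
    return total


if __name__ == "__main__":
    print(vanilla_token_encodings(64, 64))      # one window of 64 per prediction
    print(state_reuse_token_encodings(64, 32))  # two segments of 32, each encoded once
```

Under these assumptions the gap does not come from 2 x 32 versus 1 x 64 in a single forward pass, but from the baseline repeating almost the same work for every predicted token.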

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange