When a variable is used for pixelCurrent
in the process, its value is
updated and available immediately, whereas the value of a signal is not ready
until the next cycle.
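The difference can be checked in simulation with a small sketch like the one below. It is not taken from the design in question: the entity name var_vs_sig and the names v and s are made up for illustration. The assertion holds on every clock edge because the variable already carries its new value, while the signal still reads its old one inside the process.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical simulation-only entity, for illustration.
entity var_vs_sig is
end entity var_vs_sig;

architecture sim of var_vs_sig is
  signal clk : std_logic := '0';
  signal s   : integer   := 0;
begin
  -- Free-running 100 MHz clock; stop the simulation after a fixed run time.
  clk <= not clk after 5 ns;

  process (clk)
    variable v : integer := 0;
  begin
    if rising_edge(clk) then
      v := v + 1;  -- variable: new value is visible on the very next statement
      s <= s + 1;  -- signal: update is scheduled, s still reads the old value here
      assert v = s + 1
        report "expected the signal to lag the variable inside the process"
        severity failure;
    end if;
  end process;
end architecture sim;
```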
So when a variable is used, this line implements a RAM with asynchronous
read based on addrb:
pixelCurrent := RAM(to_integer(UNSIGNED(addrb)));
An assignment to a signal, in contrast, implements a RAM with synchronous read, where the value read from the RAM is not available until the next cycle.
Typical FPGA technologies have dedicated hardware for RAMs with synchronous read, but RAMs with asynchronous read are built from combinatorial logic (look-up tables, LUTs).
So the large number of LUTs that appears when using a variable for
pixelCurrent
is there because the synthesis tool tries to map the RAM with
asynchronous read into LUTs, which typically takes a lot of LUTs
and makes the resulting RAM very slow.
In the pipelined design it sounds like the asynchronous RAM read is not
required. If pixelCurrent
is made a signal, a synchronous RAM is used instead,
and the synthesis tool can map the RAM to an internal RAM hardware block, with
code like:
pixelMinus2 := pixelMinus1;
pixelMinus1 := pixelCurrent;
pixelCurrent <= RAM(to_integer(UNSIGNED(addrb)));
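For context, here is a minimal self-contained sketch of a RAM with synchronous read, written in the style that synthesis tools commonly recognize as block RAM. Only RAM, addrb and pixelCurrent come from the code above. The entity name pixel_ram, the generics, and the ports clk, wea, addra, dina and doutb are assumptions for illustration. Whether a given tool actually maps this to a RAM block depends on the tool and target device, so check the synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity; all names except RAM, addrb and pixelCurrent are assumed.
entity pixel_ram is
  generic (
    ADDR_WIDTH : natural := 10;
    DATA_WIDTH : natural := 8
  );
  port (
    clk   : in  std_logic;
    wea   : in  std_logic;                                  -- write enable, port A
    addra : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);  -- write address
    dina  : in  std_logic_vector(DATA_WIDTH - 1 downto 0);  -- write data
    addrb : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);  -- read address
    doutb : out std_logic_vector(DATA_WIDTH - 1 downto 0)   -- registered read data
  );
end entity pixel_ram;

architecture rtl of pixel_ram is
  type ram_t is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
  signal RAM          : ram_t;
  signal pixelCurrent : std_logic_vector(DATA_WIDTH - 1 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if wea = '1' then
        RAM(to_integer(unsigned(addra))) <= dina;
      end if;
      -- Synchronous read: pixelCurrent is a signal, so the data appears
      -- one clock cycle after addrb is presented.
      pixelCurrent <= RAM(to_integer(unsigned(addrb)));
    end if;
  end process;

  doutb <= pixelCurrent;
end architecture rtl;
```

The one-cycle read latency has to be accounted for in the surrounding pipeline, which is what the pixelMinus1 and pixelMinus2 stages above build on.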