Keeping in mind Paul Seeb's and fru1bat's answers on clock speed, Paul's comment on the NEXT anode, fru1bat's suggestion to separate clocked and un-clocked processes, and your note that you had 8 ROMs, there are alternative architectures.
Your architecture, with a ring counter for ANODE and multiple ROMs, happens to be optimal for speed, which as both Paul and fru1bat note isn't needed. Instead you can optimize for area.
Because the clock speed is either set externally or controlled by adding an enable that is supplied periodically, it isn't addressed in the area optimization:
architecture foo of BCDTo7SegDriver is
    signal digit: natural range 0 to 7; -- 3 bit binary counter
    signal bcd: std_logic_vector (3 downto 0); -- input to ROM
begin
UNLABELED:
    process (CLK)
    begin
        if rising_edge(CLK) then
            if digit = 7 then -- integer/unsigned "+" result range
                digit <= 0;   -- not tied to digit range in simulation
            else
                digit <= digit + 1;
            end if;
SEGMENT_REG:
            SEGMENT <= BCD_TO_DEC7(bcd); -- single ROM look up
ANODE_REG:
            for i in ANODE'range loop
                if digit = i then
                    ANODE(i) <= '0';
                else
                    ANODE(i) <= '1';
                end if;
            end loop;
        end if;
    end process;
BCD_MUX:
    with digit select
        bcd <= VAL(3 downto 0)   when 0,
               VAL(7 downto 4)   when 1,
               VAL(11 downto 8)  when 2,
               VAL(15 downto 12) when 3,
               VAL(19 downto 16) when 4,
               VAL(23 downto 20) when 5,
               VAL(27 downto 24) when 6,
               VAL(31 downto 28) when 7;
end architecture;
This trades a 32 bit register (cur_val), an 8 bit ring counter (cur_anode) and seven copies of the ROM implied by function BCD_TO_DEC7 for a three bit binary counter.
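The periodically supplied enable mentioned earlier could look something like the following. This is only a minimal sketch: the entity name refresh_enable, the generic DIVIDE and its default value are assumptions for illustration (the default presumes a 100 MHz clock and a 1 kHz digit rate), not anything taken from your design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper, not part of the original design: produces an
-- ENABLE pulse one clock wide once every DIVIDE clock cycles.
entity refresh_enable is
    generic (
        DIVIDE : positive := 100_000  -- assumed: 100 MHz CLK, 1 kHz digit rate
    );
    port (
        CLK    : in  std_logic;
        ENABLE : out std_logic
    );
end entity;

architecture rtl of refresh_enable is
    signal count : natural range 0 to DIVIDE - 1 := 0;
begin
    process (CLK)
    begin
        if rising_edge(CLK) then
            if count = DIVIDE - 1 then
                count  <= 0;        -- restart the interval
                ENABLE <= '1';      -- asserted for exactly one clock
            else
                count  <= count + 1;
                ENABLE <= '0';
            end if;
        end if;
    end process;
end architecture;
```

In architecture foo the statements under rising_edge(CLK) would then be wrapped in an additional if ENABLE = '1' test, so digit, SEGMENT and ANODE only advance when the pulse arrives while everything stays in one clock domain.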
In truth, the argument over whether or not you should be using separate sequential (clocked) and combinatorial (non-clocked) processes is somewhat reminiscent of Lilliput and Blefuscu going to war over endianness.
Separate processes generally execute a little more efficiently because they don't share sensitivity lists. You could also note that all concurrent statements have process or block statement equivalents. There's also nothing in this design that can take particular advantage of variables, which can result in more efficient simulation while implying a single process. (Shared variables aren't supported by XST.)
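As an illustration of the point about process equivalents, here is roughly what the selected signal assignment labeled BCD_MUX looks like when written out as a process. It is a sketch only: the wrapper entity bcd_mux and the label BCD_MUX_PROC are made up so the example stands on its own, and the port widths are assumed from the slices used in the architecture above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper entity so the example compiles by itself.
entity bcd_mux is
    port (
        VAL   : in  std_logic_vector (31 downto 0);  -- assumed width
        digit : in  natural range 0 to 7;
        bcd   : out std_logic_vector (3 downto 0)
    );
end entity;

architecture equivalent of bcd_mux is
begin
BCD_MUX_PROC:
    process (digit, VAL)   -- every signal read by the assignment
    begin
        case digit is
            when 0 => bcd <= VAL(3 downto 0);
            when 1 => bcd <= VAL(7 downto 4);
            when 2 => bcd <= VAL(11 downto 8);
            when 3 => bcd <= VAL(15 downto 12);
            when 4 => bcd <= VAL(19 downto 16);
            when 5 => bcd <= VAL(23 downto 20);
            when 6 => bcd <= VAL(27 downto 24);
            when 7 => bcd <= VAL(31 downto 28);
        end case;
    end process;
end architecture;
```

Both forms describe the same 4 bit wide 8-to-1 multiplexer; the concurrent form is simply shorter to write.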
I haven't verified that this will synthesize, but after reading through the 14.1 version of the XST User Guide I think it should. If not, you can convert digit to a std_logic_vector with a length of 3.
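Should that conversion be needed, one way it might look is sketched below. I'm assuming package numeric_std here and keeping the count in an unsigned signal, because std_logic_vector has no "+" of its own without an extra package; the entity name digit_counter and the signal name count are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical stand-alone counter showing the 3 bit vector form of digit.
entity digit_counter is
    port (
        CLK   : in  std_logic;
        digit : out std_logic_vector (2 downto 0)
    );
end entity;

architecture rtl of digit_counter is
    signal count : unsigned (2 downto 0) := (others => '0');
begin
    process (CLK)
    begin
        if rising_edge(CLK) then
            -- numeric_std "+" returns a result the same length as count,
            -- so the value rolls over from "111" to "000" with no compare
            count <= count + 1;
        end if;
    end process;

    digit <= std_logic_vector(count);
end architecture;
```

With this form the choices in BCD_MUX become bit string literals ("000" through "111") and the selected assignment needs a when others choice to cover the std_logic metavalues, while the ANODE loop would compare against to_integer(count).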
The + 1 for digit will get optimized; an incrementer is smaller than a full adder.