Question

I'm very new to VHDL and hardware design and was wondering if someone could tell me if my understanding of the following problem I ran into is right.

I've been working on a simple BCD-to-7 segment display driver for the Nexys4 board - this is my VHDL code (with the headers stripped).

entity BCDTo7SegDriver is
    Port ( CLK : in STD_LOGIC;
           VAL : in STD_LOGIC_VECTOR (31 downto 0);
           ANODE : out STD_LOGIC_VECTOR (7 downto 0);
           SEGMENT : out STD_LOGIC_VECTOR (6 downto 0));

   function BCD_TO_DEC7(bcd : std_logic_vector(3 downto 0))
       return std_logic_vector is
   begin
       case bcd is
           when "0000" => return "1000000";
           when "0001" => return "1111001";
           when "0010" => return "0100100";
           when "0011" => return "0110000";
           when others => return "1111111";
       end case;
   end BCD_TO_DEC7;
end BCDTo7SegDriver;

architecture Behavioral of BCDTo7SegDriver is
    signal cur_val : std_logic_vector(31 downto 0);
    signal cur_anode : unsigned(7 downto 0) := "11111101";
    signal cur_seg : std_logic_vector(6 downto 0) := "0000001";
begin

process (CLK, VAL, cur_anode, cur_seg)
begin
    if rising_edge(CLK) then
        cur_val <= VAL;
        cur_anode <= cur_anode rol 1;
        ANODE <= std_logic_vector(cur_anode);
        SEGMENT <= cur_seg;
    end if;

    -- Decode segments
    case cur_anode is
        when "11111110" => cur_seg <= BCD_TO_DEC7(cur_val(3 downto 0));
        when "11111101" => cur_seg <= BCD_TO_DEC7(cur_val(7 downto 4));
        when "11111011" => cur_seg <= BCD_TO_DEC7(cur_val(11 downto 8));
        when "11110111" => cur_seg <= BCD_TO_DEC7(cur_val(15 downto 12));
        when "11101111" => cur_seg <= BCD_TO_DEC7(cur_val(19 downto 16));
        when "11011111" => cur_seg <= BCD_TO_DEC7(cur_val(23 downto 20));
        when "10111111" => cur_seg <= BCD_TO_DEC7(cur_val(27 downto 24));
        when "01111111" => cur_seg <= BCD_TO_DEC7(cur_val(31 downto 28));
        when others => cur_seg <= "0011111";
    end case;
end process;
end Behavioral;

Now, at first I tried to naively drive this circuit from the board clock defined in the constraints file:

## Clock signal
##Bank = 35, Pin name = IO_L12P_T1_MRCC_35,                 Sch name = CLK100MHZ
set_property PACKAGE_PIN E3 [get_ports clk]                         
    set_property IOSTANDARD LVCMOS33 [get_ports clk]
    create_clock -add -name sys_clk_pin -period 10.00 -waveform {0 5} [get_ports clk]

This gave me what looked like almost garbage output on the seven-segment displays - it looked like every decoded digit was being superimposed onto every digit place. Basically if bits 3 downto 0 of the value being decoded were "0001", the display was showing 8 1s in a row instead of 00000001 (but not quite - the other segments were lit but appeared dimmer).

Slowing down the clock to something more reasonable did the trick and the circuit works how I expected it to.

When I look at what elaboration gives me (I'm using Vivado 2014.1), it gives me a circuit with VAL connected to 8 RTL_ROMs in parallel (each one decoding 4 bits of the input). The outputs from these ROMs are fed into an RTL_MUX and the value of cur_anode is being used as the selector. The output of the RTL_MUX feeds the cur_val register; the cur_val and cur_anode registers are then linked to the outputs.

So, with that in mind, which part of the circuit couldn't handle the clock rate? From what I've read I feel like this is related to timing constraints that I may need to add; am I thinking along the right track?

Solution

Did your timing report indicate that you had a timing problem? It looks to me like you were just rolling through the segment values extremely fast. No matter how well you design for higher clock speeds, you're rotating cur_anode every clock cycle, and therefore your display will change accordingly. If your clock is too fast, the display will change much faster than a human would be able to read it.
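One common way to slow the scan without touching the 100 MHz board clock is a clock enable: a counter that emits a one-cycle pulse at the desired digit rate. The following is only a sketch; the entity name ScanTick, the generic DIVIDE and the port TICK are hypothetical names, not part of the original design. With the 10 ns clock period from the constraints file, a divide of 100_000 gives a 1 kHz tick, so each of the 8 digits is driven for 1 ms and the whole display refreshes at 125 Hz.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical helper (not from the original post): produces a pulse
-- one clock wide every DIVIDE clock cycles.
entity ScanTick is
    generic ( DIVIDE : positive := 100_000 );   -- 100 MHz / 100_000 = 1 kHz
    port ( CLK  : in  std_logic;
           TICK : out std_logic := '0' );
end ScanTick;

architecture Behavioral of ScanTick is
    signal count : natural range 0 to DIVIDE - 1 := 0;
begin
    process (CLK)
    begin
        if rising_edge(CLK) then
            if count = DIVIDE - 1 then
                count <= 0;         -- wrap and emit the enable pulse
                TICK  <= '1';
            else
                count <= count + 1;
                TICK  <= '0';
            end if;
        end if;
    end process;
end Behavioral;
```

In the driver, the register updates would then sit inside an extra `if TICK = '1' then ... end if;` nested within the `rising_edge(CLK)` branch, so everything stays in a single clock domain instead of using a divided clock.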

Some other suggestions:

  • You should split your single process into separate clocked and unclocked processes. It's not that what you're doing won't end up synthesizing (obviously), but it's unconventional, and may lead to unexpected results.

  • Your initialization on cur_seg won't really do anything, as it's always driven (combinationally) by your process. It's not a problem - just wanted to make sure you were aware.
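As a rough illustration of the first suggestion, the question's architecture could be split as below. This is a sketch under assumptions: it reuses the BCDTo7SegDriver entity from the question (with IEEE.STD_LOGIC_1164 and IEEE.NUMERIC_STD visible through the stripped headers), and the architecture name TwoProcess and the process labels are made up for the example. Note that the decode process lists cur_val, which the original sensitivity list omitted even though the case statement reads it.

```vhdl
-- Assumes the BCDTo7SegDriver entity and BCD_TO_DEC7 function from the question.
architecture TwoProcess of BCDTo7SegDriver is
    signal cur_val   : std_logic_vector(31 downto 0);
    signal cur_anode : unsigned(7 downto 0) := "11111101";
    signal cur_seg   : std_logic_vector(6 downto 0);  -- no initial value needed
begin

    -- Clocked process: registers only, sensitive to the clock alone.
    registers : process (CLK)
    begin
        if rising_edge(CLK) then
            cur_val   <= VAL;
            cur_anode <= cur_anode rol 1;
            ANODE     <= std_logic_vector(cur_anode);
            SEGMENT   <= cur_seg;
        end if;
    end process registers;

    -- Combinational process: sensitive to every signal it reads.
    decode : process (cur_anode, cur_val)
    begin
        case cur_anode is
            when "11111110" => cur_seg <= BCD_TO_DEC7(cur_val(3 downto 0));
            when "11111101" => cur_seg <= BCD_TO_DEC7(cur_val(7 downto 4));
            when "11111011" => cur_seg <= BCD_TO_DEC7(cur_val(11 downto 8));
            when "11110111" => cur_seg <= BCD_TO_DEC7(cur_val(15 downto 12));
            when "11101111" => cur_seg <= BCD_TO_DEC7(cur_val(19 downto 16));
            when "11011111" => cur_seg <= BCD_TO_DEC7(cur_val(23 downto 20));
            when "10111111" => cur_seg <= BCD_TO_DEC7(cur_val(27 downto 24));
            when "01111111" => cur_seg <= BCD_TO_DEC7(cur_val(31 downto 28));
            when others     => cur_seg <= "0011111";
        end case;
    end process decode;

end TwoProcess;
```

The behavior is meant to match the original; only the structure changes, so each process has one job and a sensitivity list that is easy to check.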

OTHER TIPS

Well there are two parts to this.

Your segments appeared so dim because you are effectively running each digit at a 1/8 duty cycle, and switching faster than the segments have time to react (on every clock pulse you change which digit is lit, then stop driving it on the next pulse).

By increasing the period, you moved the segments from a transient current (they need time to ramp up) to a steady-state current (a longer period lets the current reach its intended level, because you are now switching more slowly than the segments can respond). Hence the brightness increase.

One other thing about your code. You may be aware of this, but when you latch with your clock there, the signal named cur_anode is advanced and actually represents the NEXT anode. You also latch ANODE and SEGMENT to the current anode and segment respectively. Just pointing out that the name cur_anode may be a misnomer (and is confusing because it's usually the NEXT one).

Keeping in mind Paul Seeb's and fru1bat's answers on clock speed, Paul's comment on NEXT anode, and fru1bat's suggestion on separating clocked and un-clocked processes as well as your noting that you had 8 ROMs, there are alternative architectures.

Your architecture with a ring counter for ANODE and multiple ROMs happens to be optimal for speed, which as both Paul and fru1bat note isn't needed. Instead you can optimize for area.

The clock rate is set either externally or by adding a periodically supplied enable, so it isn't addressed in this area optimization:

architecture foo of BCDTo7SegDriver is
    signal digit:   natural range 0 to 7;            -- 3 bit binary counter
    signal bcd:     std_logic_vector (3 downto 0);   -- input to ROM
begin

UNLABELED:
    process (CLK) 
    begin
        if rising_edge(CLK) then

            if digit = 7 then       -- integer/unsigned "+" result range 
                digit <= 0;         -- not tied to digit range in simulation
            else
                digit <= digit + 1;
            end if;

        SEGMENT_REG:
            SEGMENT <= BCD_TO_DEC7(bcd);  -- single ROM look up

        ANODE_REG:
            for i in ANODE'range loop
                if digit = i then
                    ANODE(i) <= '0';
                else
                    ANODE(i) <= '1';
                end if;
            end loop;
        end if;        
    end process;

BCD_MUX:    
    with digit select 
        bcd <= VAL(3 downto 0)   when 0,
               VAL(7 downto 4)   when 1,
               VAL(11 downto 8)  when 2,
               VAL(15 downto 12) when 3,
               VAL(19 downto 16) when 4,
               VAL(23 downto 20) when 5,
               VAL(27 downto 24) when 6,
               VAL(31 downto 28) when 7;

end architecture;

This trades off a 32 bit register (cur_val), an 8 bit ring counter (cur_anode) and seven copies of the ROM implied by function BCD_TO_DEC7 for a three bit binary counter.

In truth, the argument over whether or not you should be using separate sequential (clocked) and combinatorial (non-clocked) processes is somewhat reminiscent of Lilliput and Blefuscu going to war over Endian-ness.

Separate processes generally execute a little more efficiently due to not sharing sensitivity lists. You could also note that all concurrent statements have process or block statement equivalents. There's also nothing in this design that can take particular advantage of using variables which can result in more efficient simulation while implying a single process. (Shared variables aren't supported by XST).

I haven't verified that this will synthesize, but after reading through the 14.1 version of the XST user guide I think it should. If not, you can convert digit to a std_logic_vector with a length of 3.
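If the integer counter does cause trouble, the fallback might look roughly like the sketch below. It is untested and rests on assumptions: it uses unsigned(2 downto 0) from IEEE.NUMERIC_STD (the arithmetic view of a 3-bit vector) rather than a bare std_logic_vector, it assumes NUMERIC_STD is visible via the entity's stripped headers, and the architecture name foo_unsigned is made up for the example.

```vhdl
-- Assumes the BCDTo7SegDriver entity and BCD_TO_DEC7 function from the question.
architecture foo_unsigned of BCDTo7SegDriver is
    signal digit : unsigned(2 downto 0) := (others => '0');  -- 3-bit counter
    signal bcd   : std_logic_vector(3 downto 0);             -- input to ROM
begin

    process (CLK)
    begin
        if rising_edge(CLK) then
            digit   <= digit + 1;           -- 3-bit result wraps from 7 to 0
            SEGMENT <= BCD_TO_DEC7(bcd);    -- single ROM look up

            for i in ANODE'range loop       -- one-hot, active-low anode decode
                if to_integer(digit) = i then
                    ANODE(i) <= '0';
                else
                    ANODE(i) <= '1';
                end if;
            end loop;
        end if;
    end process;

    with digit select
        bcd <= VAL(3 downto 0)   when "000",
               VAL(7 downto 4)   when "001",
               VAL(11 downto 8)  when "010",
               VAL(15 downto 12) when "011",
               VAL(19 downto 16) when "100",
               VAL(23 downto 20) when "101",
               VAL(27 downto 24) when "110",
               VAL(31 downto 28) when others;  -- "111" and any metavalues

end architecture;
```

Because the NUMERIC_STD "+" result has the same width as its unsigned operand, the explicit wrap test at 7 is no longer needed; the `when others` choice is there because a vector selector also has non-binary values that must be covered.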

The + 1 for digit will get optimized, since an incrementer is smaller than a full adder.

Licensed under: CC-BY-SA with attribution