Question

I am creating a small chip block in VHDL: a comparator.
Tools: Quartus II and ModelSim; target device: Cyclone II.

INPUT:  
    IN_FIRST: in UNSIGNED(255 downto 0);  
    IN_SECOND: in UNSIGNED(255 downto 0);  

OUTPUT:  
    OUT_IS_RIGHT_RESULT: out STD_LOGIC; -- IN_SECOND < IN_FIRST  

I have several parallel and sequential implementations, but in some cases the parallel ones perform worse than the sequential ones, and I can't find the best approach.

Some of the implementations:

With generate (time: 18.5 ns)

architecture ComparatorArch of Comparator is
    signal T: UNSIGNED(7 downto 0) := (others => 'U');
    signal H: UNSIGNED(7 downto 0) := (others => 'U');
begin
    generateG: for i in 7 downto 0 generate
        T(i) <= '1' when (IN_FIRST((i + 1) * 32 - 1 downto i * 32) > IN_SECOND((i + 1) * 32 - 1 downto i * 32)) else '0';
        H(i) <= '1' when (IN_FIRST((i + 1) * 32 - 1 downto i * 32) < IN_SECOND((i + 1) * 32 - 1 downto i * 32)) else '0';
    end generate generateG;

    OUT_TARG <= T;
    OUT_HASH <= H;
    OUT_IS_RIGHT_RESULT <= (T(7) or ((not T(7)) and ((not H(7)) and 
        (T(6) or ((not T(6)) and ((not H(6)) and 
        (T(5) or ((not T(5)) and ((not H(5)) and 
        (T(4) or ((not T(4)) and ((not H(4)) and 
        (T(3) or ((not T(3)) and ((not H(3)) and 
        (T(2) or ((not T(2)) and ((not H(2)) and 
        (T(1) or ((not T(1)) and ((not H(1)) and T(0))))))))))))))))))))));
end ComparatorArch;

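The last assignment is the logical expression that combines T and H: the most significant 32-bit chunk in which IN_FIRST and IN_SECOND differ decides the result.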
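For reference, the same priority logic can be written without the hand-nested expression. The sketch below is a minimal version assuming a VHDL-93 tool flow and has not been checked on hardware; the entity name ChunkComparator and the generics WIDTH and CHUNK are hypothetical, and the debug outputs OUT_TARG/OUT_HASH and the IN_READY input from the question are left out. A loop that walks from the least significant chunk upwards lets every more significant chunk that differs override the ones below it, which is exactly what the nested expression encodes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: a parameterised form of the generate-based version.
entity ChunkComparator is
    generic (
        WIDTH : positive := 256;  -- total operand width
        CHUNK : positive := 32    -- chunk width; WIDTH must be a multiple of CHUNK
    );
    port (
        IN_FIRST            : in  unsigned(WIDTH - 1 downto 0);
        IN_SECOND           : in  unsigned(WIDTH - 1 downto 0);
        OUT_IS_RIGHT_RESULT : out std_logic   -- '1' when IN_SECOND < IN_FIRST
    );
end entity ChunkComparator;

architecture rtl of ChunkComparator is
    constant N : positive := WIDTH / CHUNK;
    signal T : std_logic_vector(N - 1 downto 0);  -- chunk i of IN_FIRST is greater
    signal H : std_logic_vector(N - 1 downto 0);  -- chunk i of IN_FIRST is smaller
begin
    assert WIDTH mod CHUNK = 0
        report "WIDTH must be a multiple of CHUNK" severity failure;

    -- Independent per-chunk comparisons, as in the question.
    chunks: for i in 0 to N - 1 generate
        T(i) <= '1' when IN_FIRST((i + 1) * CHUNK - 1 downto i * CHUNK)
                       > IN_SECOND((i + 1) * CHUNK - 1 downto i * CHUNK) else '0';
        H(i) <= '1' when IN_FIRST((i + 1) * CHUNK - 1 downto i * CHUNK)
                       < IN_SECOND((i + 1) * CHUNK - 1 downto i * CHUNK) else '0';
    end generate chunks;

    -- Priority combination: later (more significant) chunks that differ
    -- overwrite the decision taken for less significant chunks.
    combine: process (T, H)
        variable r : std_logic;
    begin
        r := '0';  -- all chunks equal: IN_FIRST is not greater
        for i in 0 to N - 1 loop
            if T(i) = '1' then
                r := '1';
            elsif H(i) = '1' then
                r := '0';
            end if;
        end loop;
        OUT_IS_RIGHT_RESULT <= r;
    end process combine;
end architecture rtl;
```

Synthesis should unroll the loop into a priority chain equivalent to the hand-written expression; whether it beats a plain IN_SECOND < IN_FIRST on a Cyclone II is something only the timing analyzer can tell.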

In a process (time: 35 ns)

architecture ComparatorArch of Comparator is
begin
  mainP: process(IN_READY) begin
        if (rising_edge(IN_READY) and IN_READY = '1') then
            if (IN_SECOND < IN_FIRST) then
                OUT_IS_RIGHT_RESULT <= '1';
            else
                OUT_IS_RIGHT_RESULT <= '0';
            end if;
        end if;
  end process;
end ComparatorArch;

Maybe somebody knows a better approach.

Also, it doesn't work if I change

if (rising_edge(IN_READY) and IN_READY = '1') then  

to

if (IN_READY = '1') then  

Why?

I experimented with some basic examples and noticed that the chip seems to have logic and arithmetic blocks of a fixed input width: a comparison or logical operation takes constant time for any signal width from some minimum up to a specific maximum. For example, comparing BIT with BIT takes the same time as comparing BIT_VECTOR(7 downto 0) with BIT_VECTOR(7 downto 0), about 9 ns. Why does it take so long? Can somebody explain this?


Solution

As commented earlier, a parallel tree will give the best performance as you widen the comparator. For a comparator as narrow as 8 bits, however, routing delays can dominate, and the Cyclone II will perform better using its carry chains (see section 2-2 of the Cyclone II Device Handbook), since they connect directly to neighboring logic elements (LEs). This is why serial logic can outperform parallel logic.
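To make the "parallel tree" concrete, here is a minimal sketch of an explicit tree comparator, assuming power-of-two operand widths. Each leaf compares one bit pair, and each level merges two neighbouring nodes, letting the more significant node decide unless it reports equality, so the logic depth grows with log2 of the width. The entity name TreeComparator and the generic LEVELS are hypothetical. Quartus may restructure this logic during synthesis, so the real delay still has to be read from the timing analyzer; for small widths the plain IN_SECOND < IN_FIRST from the question may well map onto the carry chains and be faster.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: explicit parallel (tree) comparator, width = 2**LEVELS.
entity TreeComparator is
    generic (LEVELS : natural := 8);  -- 2**8 = 256-bit operands
    port (
        IN_FIRST            : in  unsigned(2**LEVELS - 1 downto 0);
        IN_SECOND           : in  unsigned(2**LEVELS - 1 downto 0);
        OUT_IS_RIGHT_RESULT : out std_logic  -- '1' when IN_SECOND < IN_FIRST
    );
end entity TreeComparator;

architecture rtl of TreeComparator is
    constant W : positive := 2**LEVELS;
begin
    tree: process (IN_FIRST, IN_SECOND)
        -- gt(i) / lt(i): node i of the current level has found
        -- "first > second" / "first < second" over the bits it covers.
        variable gt, lt, ngt, nlt : std_logic_vector(W - 1 downto 0);
    begin
        -- Leaves: one node per bit position.
        for i in 0 to W - 1 loop
            gt(i) := IN_FIRST(i) and not IN_SECOND(i);
            lt(i) := IN_SECOND(i) and not IN_FIRST(i);
        end loop;
        ngt := (others => '0');
        nlt := (others => '0');
        -- Each level halves the node count; node 2*i+1 is the more
        -- significant child and decides unless it is "equal".
        for lvl in 1 to LEVELS loop
            for i in 0 to W / 2**lvl - 1 loop
                ngt(i) := gt(2*i + 1) or (not lt(2*i + 1) and gt(2*i));
                nlt(i) := lt(2*i + 1) or (not gt(2*i + 1) and lt(2*i));
            end loop;
            gt := ngt;
            lt := nlt;
        end loop;
        OUT_IS_RIGHT_RESULT <= gt(0);
    end process tree;
end architecture rtl;
```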

As for rising_edge, you've mixed two conventions. Before rising_edge was standard, the same function was written as clk'event and clk = '1'; since rising_edge already requires the new value to be '1', there's no need to test it again. Testing for a high level alone, on the other hand, produces not a D flip-flop but a transparent latch: a rarely desired function that most FPGAs are not optimized for, and one the synthesis tools tend to warn about.
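Applied to the process from the question, the edge test reduces to rising_edge alone. A minimal sketch, assuming IN_READY is a std_logic strobe used as the clock (its declaration is not shown in the question) and using the hypothetical entity name RegComparator:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: registered comparator clocked by IN_READY.
entity RegComparator is
    port (
        IN_READY            : in  std_logic;  -- assumed: strobe used as the clock
        IN_FIRST            : in  unsigned(255 downto 0);
        IN_SECOND           : in  unsigned(255 downto 0);
        OUT_IS_RIGHT_RESULT : out std_logic   -- '1' when IN_SECOND < IN_FIRST
    );
end entity RegComparator;

architecture rtl of RegComparator is
begin
    -- D flip-flop: the comparison result is captured only on a rising edge.
    mainP: process (IN_READY)
    begin
        -- Older equivalent: if IN_READY'event and IN_READY = '1' then
        if rising_edge(IN_READY) then
            if IN_SECOND < IN_FIRST then
                OUT_IS_RIGHT_RESULT <= '1';
            else
                OUT_IS_RIGHT_RESULT <= '0';
            end if;
        end if;
    end process mainP;
end architecture rtl;
```

If a level-sensitive latch really were intended, IN_FIRST and IN_SECOND would also have to be in the sensitivity list: with only IN_READY listed, the simulator re-evaluates the process only when IN_READY changes, while synthesis builds a latch that follows the inputs, so simulation and hardware can disagree.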

As for your timing results, without seeing how you measured them I can't draw any conclusions from the times you mention. Are they even from a post-fit simulation? It's rarely worth going that far for such a small function.

Licensed under: CC-BY-SA with attribution