Question

I have designed 2 FSMs for CRC purposes. I got the base code (XOR tree) from an online CRC generator and built the FSMs around it, one for Tx and one for Rx. It works great. When I test either one ON ITS OWN, I get 200+ MHz. When I test them back to back, the speed drops significantly (below 150 MHz). When I include them in the bigger design with a UART link, it drops even lower (110 MHz). I guess I am missing something very important, but have no idea what. Do you? I have included the code for the Tx one. The Rx is very similar. To be more precise, the limiting path when both are tested is frame_to_output (Tx) to curr_state (Rx). I should also say that I have only recently started working with VHDL, so please point out any stupid mistakes in the design below. (PS: FSM state vector encoding is for another discussion, but any input is more than welcome.)

library IEEE;
use IEEE.std_logic_1164.all;
use ieee.numeric_std.all;

entity Append_Tx_FSM is
port (
CLK                 : in std_logic;                     -- system clock
RESETn          : in std_logic;                     -- global reset
APPEND_CRC          : in    std_logic;                      -- input flag
TX_FRAME_NO_CRC     : in    std_logic_vector(39 downto 0);      -- 40-bit input frame to attach CRC

CRC_APPENDED        : out std_logic;                        -- output flag
TX_FRAME            : out std_logic_vector(47 downto 0)         -- 48-bit output frame
  );
  end Append_Tx_FSM;
  --------------------------------------------------------------------------------
   architecture arch of Append_Tx_FSM is
   --------------------------------------------------------------------------------
   -- Finite State Machine declaration
   --------------------------------------------------------------------------------
TYPE State IS (idle_st, delay_crc, result);
signal curr_st, next_st         : State;        
ATTRIBUTE syn_encoding : STRING;
ATTRIBUTE syn_encoding of curr_st : signal is "gray";
 --ATTRIBUTE syn_state_machine : boolean;
 --ATTRIBUTE syn_state_machine of curr_tx_st : signal is false;
                    -- state vector
signal crc_result_tx            : std_logic_vector (7 downto 0);                -- signal for crc computation
signal last_append          : std_logic;                            -- signal for last value of input flag
signal frame_to_append          : std_logic_vector (39 downto 0);               -- signal for frame construction
signal frame_to_output          : std_logic_vector (47 downto 0);               -- signal for output
signal appended_crc         : std_logic;                            -- signal for output

--------------------------------------------------------------------------------
 begin
--------------------------------------------------------------------------------  
  -- CRC computation for input data(39:0) Polynomial (1+x^1+x^2+x^3+x^5+x^8) (0x97)
--------------------------------------------------------------------------------
CRC_RESULT_TX(0) <= '0' xor TX_FRAME_NO_CRC(0) xor TX_FRAME_NO_CRC(3) xor TX_FRAME_NO_CRC(5) xor TX_FRAME_NO_CRC(7) xor TX_FRAME_NO_CRC(8) xor TX_FRAME_NO_CRC(9) xor TX_FRAME_NO_CRC(10) xor TX_FRAME_NO_CRC(11) xor TX_FRAME_NO_CRC(12) xor TX_FRAME_NO_CRC(15) xor TX_FRAME_NO_CRC(21) xor TX_FRAME_NO_CRC(22) xor TX_FRAME_NO_CRC(23) xor TX_FRAME_NO_CRC(24) xor TX_FRAME_NO_CRC(25) xor TX_FRAME_NO_CRC(27) xor TX_FRAME_NO_CRC(30) xor TX_FRAME_NO_CRC(31) xor TX_FRAME_NO_CRC(32) xor TX_FRAME_NO_CRC(33) xor TX_FRAME_NO_CRC(35) xor TX_FRAME_NO_CRC(36) xor TX_FRAME_NO_CRC(37) xor TX_FRAME_NO_CRC(38);

CRC_RESULT_TX(1) <= '1' xor TX_FRAME_NO_CRC(0) xor TX_FRAME_NO_CRC(1) xor TX_FRAME_NO_CRC(3) xor TX_FRAME_NO_CRC(4) xor TX_FRAME_NO_CRC(5) xor TX_FRAME_NO_CRC(6) xor TX_FRAME_NO_CRC(7) xor TX_FRAME_NO_CRC(13) xor TX_FRAME_NO_CRC(15) xor TX_FRAME_NO_CRC(16) xor TX_FRAME_NO_CRC(21) xor TX_FRAME_NO_CRC(26) xor TX_FRAME_NO_CRC(27) xor TX_FRAME_NO_CRC(28) xor TX_FRAME_NO_CRC(30) xor TX_FRAME_NO_CRC(34) xor TX_FRAME_NO_CRC(35) xor TX_FRAME_NO_CRC(39);

CRC_RESULT_TX(2) <= '0' xor TX_FRAME_NO_CRC(0) xor TX_FRAME_NO_CRC(1) xor TX_FRAME_NO_CRC(2) xor TX_FRAME_NO_CRC(3) xor TX_FRAME_NO_CRC(4) xor TX_FRAME_NO_CRC(6) xor TX_FRAME_NO_CRC(9) xor TX_FRAME_NO_CRC(10) xor TX_FRAME_NO_CRC(11) xor TX_FRAME_NO_CRC(12) xor TX_FRAME_NO_CRC(14) xor TX_FRAME_NO_CRC(15) xor TX_FRAME_NO_CRC(16) xor TX_FRAME_NO_CRC(17) xor TX_FRAME_NO_CRC(21) xor TX_FRAME_NO_CRC(23) xor TX_FRAME_NO_CRC(24) xor TX_FRAME_NO_CRC(25) xor TX_FRAME_NO_CRC(28) xor TX_FRAME_NO_CRC(29) xor TX_FRAME_NO_CRC(30) xor TX_FRAME_NO_CRC(32) xor TX_FRAME_NO_CRC(33) xor TX_FRAME_NO_CRC(37) xor TX_FRAME_NO_CRC(38);

CRC_RESULT_TX(3) <= '0' xor TX_FRAME_NO_CRC(0) xor TX_FRAME_NO_CRC(1) xor TX_FRAME_NO_CRC(2) xor TX_FRAME_NO_CRC(4) xor TX_FRAME_NO_CRC(8) xor TX_FRAME_NO_CRC(9) xor TX_FRAME_NO_CRC(13) xor TX_FRAME_NO_CRC(16) xor TX_FRAME_NO_CRC(17) xor TX_FRAME_NO_CRC(18) xor TX_FRAME_NO_CRC(21) xor TX_FRAME_NO_CRC(23) xor TX_FRAME_NO_CRC(26) xor TX_FRAME_NO_CRC(27) xor TX_FRAME_NO_CRC(29) xor TX_FRAME_NO_CRC(32) xor TX_FRAME_NO_CRC(34) xor TX_FRAME_NO_CRC(35) xor TX_FRAME_NO_CRC(36) xor TX_FRAME_NO_CRC(37) xor TX_FRAME_NO_CRC(39);

CRC_RESULT_TX(4) <= '1' xor TX_FRAME_NO_CRC(1) xor TX_FRAME_NO_CRC(2) xor TX_FRAME_NO_CRC(3) xor TX_FRAME_NO_CRC(5) xor TX_FRAME_NO_CRC(9) xor TX_FRAME_NO_CRC(10) xor TX_FRAME_NO_CRC(14) xor TX_FRAME_NO_CRC(17) xor TX_FRAME_NO_CRC(18) xor TX_FRAME_NO_CRC(19) xor TX_FRAME_NO_CRC(22) xor TX_FRAME_NO_CRC(24) xor TX_FRAME_NO_CRC(27) xor TX_FRAME_NO_CRC(28) xor TX_FRAME_NO_CRC(30) xor TX_FRAME_NO_CRC(33) xor TX_FRAME_NO_CRC(35) xor TX_FRAME_NO_CRC(36) xor TX_FRAME_NO_CRC(37) xor TX_FRAME_NO_CRC(38);

CRC_RESULT_TX(5) <= '1' xor TX_FRAME_NO_CRC(0) xor TX_FRAME_NO_CRC(2) xor TX_FRAME_NO_CRC(4) xor TX_FRAME_NO_CRC(5) xor TX_FRAME_NO_CRC(6) xor TX_FRAME_NO_CRC(7) xor TX_FRAME_NO_CRC(8) xor TX_FRAME_NO_CRC(9) xor TX_FRAME_NO_CRC(12) xor TX_FRAME_NO_CRC(18) xor TX_FRAME_NO_CRC(19) xor TX_FRAME_NO_CRC(20) xor TX_FRAME_NO_CRC(21) xor TX_FRAME_NO_CRC(22) xor TX_FRAME_NO_CRC(24) xor TX_FRAME_NO_CRC(27) xor TX_FRAME_NO_CRC(28) xor TX_FRAME_NO_CRC(29) xor TX_FRAME_NO_CRC(30) xor TX_FRAME_NO_CRC(32) xor TX_FRAME_NO_CRC(33) xor TX_FRAME_NO_CRC(34) xor TX_FRAME_NO_CRC(35) xor TX_FRAME_NO_CRC(39);

CRC_RESULT_TX(6) <= '0' xor TX_FRAME_NO_CRC(1) xor TX_FRAME_NO_CRC(3) xor TX_FRAME_NO_CRC(5) xor TX_FRAME_NO_CRC(6) xor TX_FRAME_NO_CRC(7) xor TX_FRAME_NO_CRC(8) xor TX_FRAME_NO_CRC(9) xor TX_FRAME_NO_CRC(10) xor TX_FRAME_NO_CRC(13) xor TX_FRAME_NO_CRC(19) xor TX_FRAME_NO_CRC(20) xor TX_FRAME_NO_CRC(21) xor TX_FRAME_NO_CRC(22) xor TX_FRAME_NO_CRC(23) xor TX_FRAME_NO_CRC(25) xor TX_FRAME_NO_CRC(28) xor TX_FRAME_NO_CRC(29) xor TX_FRAME_NO_CRC(30) xor TX_FRAME_NO_CRC(31) xor TX_FRAME_NO_CRC(33) xor TX_FRAME_NO_CRC(34) xor TX_FRAME_NO_CRC(35) xor TX_FRAME_NO_CRC(36);

CRC_RESULT_TX(7) <= '1' xor TX_FRAME_NO_CRC(2) xor TX_FRAME_NO_CRC(4) xor TX_FRAME_NO_CRC(6) xor TX_FRAME_NO_CRC(7) xor TX_FRAME_NO_CRC(8) xor TX_FRAME_NO_CRC(9) xor TX_FRAME_NO_CRC(10) xor TX_FRAME_NO_CRC(11) xor TX_FRAME_NO_CRC(14) xor TX_FRAME_NO_CRC(20) xor TX_FRAME_NO_CRC(21) xor TX_FRAME_NO_CRC(22) xor TX_FRAME_NO_CRC(23) xor TX_FRAME_NO_CRC(24) xor TX_FRAME_NO_CRC(26) xor TX_FRAME_NO_CRC(29) xor TX_FRAME_NO_CRC(30) xor TX_FRAME_NO_CRC(31) xor TX_FRAME_NO_CRC(32) xor TX_FRAME_NO_CRC(34) xor TX_FRAME_NO_CRC(35) xor TX_FRAME_NO_CRC(36) xor TX_FRAME_NO_CRC(37);

--------------------------------------------------------------------------------
-- At the desired clock edge, load the next state 
--------------------------------------------------------------------------------
CurStDecode_RX:process (CLK, RESETn)
begin
-- Clear FSM to start state 
if (RESETn = '0') then
    curr_st <= idle_st;
elsif (rising_edge(CLK)) then
    curr_st <= next_st;
end if;
end process CurStDecode_RX;
--------------------------------------------------------------------------------
last_value:process (CLK, RESETn)
begin
if (RESETn = '0') then
    last_append <= '1';
elsif (rising_edge(CLK)) then
    last_append <= APPEND_CRC;
end if;
end process last_value;

--------------------------------------------------------------------------------
-- Using the current state of the counter and the input signals
-- decide what the next state should be
--------------------------------------------------------------------------------
NxStDecode_Tx:process (curr_st, APPEND_CRC, last_append)
begin
-- FSM
case curr_st is
    when idle_st=>
        if APPEND_CRC = '1' and last_append = '0' then
            next_st <= delay_crc;
        else
            next_st <= idle_st;
        end if;
    when delay_crc =>
            next_st <= result;
    when result =>              
        next_st <= idle_st;     
    when others =>
        next_st <= idle_st;
end case;                               
 end process NxStDecode_Tx;
--------------------------------------------------------------------------------
 -- Using the current state of the counter 
 -- decide what the output should be
--------------------------------------------------------------------------------
OuStDecode_Tx:process (curr_st) 
begin
  case (curr_st) is
        when idle_st =>
        appended_crc <= '0';
    when delay_crc =>
        appended_crc <= '0';
        when result => 
        appended_crc <= '1';
        when others => 
        appended_crc <= '0';
end case;                               
end process OuStDecode_Tx;
--------------------------------------------------------------------------------
-- output appended frame                                       
 --------------------------------------------------------------------------------
 process (RESETn, CLK)
 begin
  if (RESETn = '0') then
    frame_to_output <= (others => '0');
elsif (rising_edge(CLK)) then
  if (appended_crc = '1') then               
        frame_to_output(39 downto 0)  <= frame_to_append(39 downto 0);
    frame_to_output(47 downto 40) <= crc_result_tx;
  end if;
end if;
end process ;
--------------------------------------------------------------------------------
 -- output signals                                            
 --------------------------------------------------------------------------------
 TX_FRAME <= frame_to_output;
  frame_to_append <= TX_FRAME_NO_CRC;
 CRC_APPENDED <= appended_crc;
 --------------------------------------------------------------------------------
end arch;

Solution

When I try to test them, back to back, my speed drops significantly (below 150 MHz).
...
And just to be more precise, the limiting factor when they are both tested is, frame-to-output(Tx) to curr_state(Rx).

From this I infer that by "back-to-back" testing you mean connecting the Tx output to the Rx input. As Russell says, you need to review the CRC_APPENDED and TX_FRAME paths between Tx and Rx:

- TX_FRAME (the Tx output) is registered on output in the Tx block. I will assume it goes straight to the CRC decoder in the Rx. This path cannot be pipelined any further.
- CRC_APPENDED comes straight out of the OuStDecode_Tx mux, i.e. combinational logic.

Try generating CRC_APPENDED from a synchronous process:

p_crc_appended : process (CLK, RESETn)  
begin  
   if (RESETn = '0') then
      CRC_APPENDED <= '0';
   elsif (rising_edge(CLK)) then  
      CRC_APPENDED <= appended_crc;  
   end if;  
end process;

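As a bonus, CRC_APPENDED and TX_FRAME will then change on the same clock edge. Currently CRC_APPENDED changes one clock cycle before TX_FRAME. Note that this process replaces the concurrent assignment CRC_APPENDED <= appended_crc; in your code; keeping both would give the port two drivers.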
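To illustrate the alignment, here is a minimal sketch of the output stage with the flag and the frame registered in the same process. The entity Tx_Output_Stage and the port names frame_in and crc_in are hypothetical, made up so the example stands alone; in your design they correspond to the internal signals frame_to_append and crc_result_tx inside Append_Tx_FSM. Treat it as one possible arrangement under those assumptions, not a tested drop-in.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical stand-alone output stage. In the question's design these
-- inputs are internal signals of Append_Tx_FSM rather than ports.
entity Tx_Output_Stage is
port (
   CLK          : in  std_logic;
   RESETn       : in  std_logic;                      -- active-low asynchronous reset
   appended_crc : in  std_logic;                      -- combinational flag from the FSM output decode
   frame_in     : in  std_logic_vector(39 downto 0);  -- frame_to_append in the question
   crc_in       : in  std_logic_vector(7 downto 0);   -- crc_result_tx in the question
   CRC_APPENDED : out std_logic;
   TX_FRAME     : out std_logic_vector(47 downto 0)
);
end Tx_Output_Stage;

architecture sketch of Tx_Output_Stage is
begin
   p_outputs : process (CLK, RESETn)
   begin
      if (RESETn = '0') then
         CRC_APPENDED <= '0';
         TX_FRAME     <= (others => '0');
      elsif (rising_edge(CLK)) then
         -- The flag is delayed by one cycle, so it rises on the same edge
         -- that loads the frame register.
         CRC_APPENDED <= appended_crc;
         if (appended_crc = '1') then
            TX_FRAME(39 downto 0)  <= frame_in;
            TX_FRAME(47 downto 40) <= crc_in;
         end if;
      end if;
   end process p_outputs;
end sketch;
```

With this arrangement both outputs leave the Tx block straight from flip-flops, so the Rx state logic no longer sees the Tx output-decode mux in its input path.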

tl;dr try registering appended_crc to generate CRC_APPENDED

Licensed under: CC-BY-SA with attribution