Question

Once again my battle with the Xilinx tools continues. I am running implementation for a design on the Zynq-7020 in PlanAhead 14.7. The design uses roughly 15-20% of the PL, yet implementation seems to be stuck on Global Placement, which has now been running for over 12 hours (and is still running). I was expecting an hour at most. This is massively inconvenient: I need the design built and tested by Friday, and every refinement will take another 12+ hours to place again!

I am using the 64-bit 14.7 Design Suite, and I read that previous versions had a similar problem with the 64-bit tools. Is there anything I can do to speed up global placement? I've already checked that all the placement options are set for the fastest possible placement.

--UPDATE 2-- I am now on the verge of insanity. This is the whole process (including the design refinements Brian suggested) that is causing me grief, and the problem has something to do with the OR conditions that use state = fwrd_init and state = bkwrd_init:

input : process(clk, rst, dz_ready, row_ready, d_div_stts, counter, bkwrd_stts, state, cd_empty, zd_empty)
    begin
        if(clk'event and clk='1') then
            stack_en <='0';
            bkwrd_drdy <= '0';
            d_rdy <= '0';
            dz_read <='0';
            read_row <='0';
            result_ready <='0';
            --delay dz_ready by one clock to correctly sync with other signals
            dz_ready_p <= dz_ready;

            --d and z register read and write logic--
            if (state = fwrd_init or rst = '1') then
                -- reset to all 1's so the initial division a_n/d_(n-1) = 0 ; a=0, n=0
                d_reg <= (others=>'1');
                z_reg <= (others=>'0');
                dz_ready <= '0';
            elsif(d_stts = '1') then
                d_reg <= d_out;
                z_reg <= z_out;
                dz_ready <= '1';
            end if;

            --fwrd it logic---
            if (dz_ready = '1' or state = fwrd_init) then
                if(row_ready = '1') then
                    d_rdy <= '1';
                    dz_read <='1';
                    read_row <= '1';
                    dz_ready <='0';
                    --register the c value
                end if;
            end if;

            --bkwrd it logic and stack logic -- read has priority over push
            if(bkwrd_stts = '1' or state = bkwrd_init) then
                if (cd_empty = '0' and zd_empty = '0') then
                    bkwrd_drdy <= '1';
                    --pop from stack
                    stack_en <= '1';
                    stack_pshp <= '0';
                end if;
            end if;

            --Set initial values
            if(state = bkwrd_init) then
                bkwrd_v <= (others=>'0');
            else
                bkwrd_v <= result;
            end if;

            --Drive result output from the bkwrd iteration
            if(bkwrd_stts = '1') then
                result_ready <= '1';
                x <= result;
            else
                x <= (others=>'0');
            end if;

            if(d_div_stts = '1' and state = fwrd_it) then
                counter <= counter_next;
                --push data onto the stack
                stack_en <='1';
                stack_pshp <='1';
                stack_din <= cd;
                zdstck_din <= zd;
            end if;

            ---NEXT STATE LOGIC---
            case state is
                when idle =>
                    if (row_ready = '1') then
                        state <= fwrd_init;
                    end if;

                when fwrd_init =>
                    state <= fwrd_it;

                when fwrd_it =>
                    if (counter = N) then
                        state <= bkwrd_init;
                    else
                        state <= fwrd_it;
                    end if;

                when bkwrd_init =>
                    state <= bkwrd_it;

                when bkwrd_it =>
                    if(cd_empty = '1' and zd_empty = '1') then
                        state <= idle;
                    else
                        state <= bkwrd_it;
                    end if;

                when others => NULL;
            end case;
        end if;

    end process;

All other signals are driven by other synchronous modules within the same clock domain, as this is the main routing logic of my design.

BUT if I change the ORs to ANDs, global placement runs fine. Obviously ANDs instead of ORs won't work for my design, so why does it behave this way? (I've also expanded the original single-line if statements, which didn't help either.)

Sam

Solution

Relax your main timing constraint. If your target (and current constraint) is 200 MHz, run a P&R at 50 MHz. It may take minutes instead of hours.

The result may seem useless because it's too slow: the point at this stage is to find out

  • if P&R works at all for your design
  • the slowest path.

The tools slow down massively as the design gets too tight for the timing constraint. (This differs between tool versions, and the newer ones are usually better, but you may have hit a pathological case where the tool just doesn't know when to give up.)

In any case, assuming the relaxed P&R gives a result at (say) 78 MHz, you'll also get details of the slowest path; re-pipeline this path and try again, pushing the constraint up as you improve the design.
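A generic illustration of what re-pipelining means (not Sam's design; the entity mac_pipelined and its ports are hypothetical): a path that computes a*b + c in a single cycle can be split into two register stages, so that each stage contains either the multiplier or the adder but not both. This is a minimal sketch assuming unsigned operands and ieee.numeric_std.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: y = a*b + c, split into two register stages so
-- that no single clock-to-clock path holds both the multiply and the add.
entity mac_pipelined is
  port (
    clk : in  std_logic;
    a   : in  unsigned(15 downto 0);
    b   : in  unsigned(15 downto 0);
    c   : in  unsigned(31 downto 0);
    y   : out unsigned(32 downto 0)   -- one extra bit for the carry
  );
end entity;

architecture rtl of mac_pipelined is
  signal prod_r : unsigned(31 downto 0) := (others => '0');
  signal c_r    : unsigned(31 downto 0) := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: register the product, and delay c by one cycle
      -- so it stays aligned with the product.
      prod_r <= a * b;
      c_r    <= c;
      -- Stage 2: add the two registered operands.
      y <= resize(prod_r, 33) + resize(c_r, 33);
    end if;
  end process;
end architecture;
```

The price is one extra cycle of latency (y appears two clock edges after a, b and c are applied), so any logic that consumes the result has to be re-timed to match.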

If it doesn't, Martin has covered several other bases well.

EDIT following updated question:

There is nothing inherently wrong with these "if" statements: the real problem must be elsewhere, but it is manifesting here.

Certainly, if these statements are part of a clocked process, and especially if this is the "single process" style of state machine (SM), I would expect this to work. If it's a separate unclocked process, there is plenty of room for misbehaviour.

(A comment just popped up saying this is a clocked process: I don't believe it's part of the main SM, since I can't see any state assignments.)

Suspect ALL the inputs to these if expressions, especially those where AND works. Are any of them unclocked signals, or signals from another clock domain? I am beginning to suspect so. If so, there could be some impossibly tight timing paths there that disappear with AND, because logic minimisation can eliminate the critical term.

Asynch inputs here won't work.

Resynch them to THIS clock domain before feeding them into anything more complex than an OR gate before a latch. If necessary, take a signal out into the other clock domain and eliminate the conflict there.

Designing complex resynchronisers that don't add a clock cycle or two of delay is HARD. Xilinx FPGAs offer async FIFOs as an alternative so that most people don't have to...
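For a single-bit, level-type signal, the usual minimal resynchroniser is two flip-flops in series in the destination clock domain. The sketch below assumes exactly that case (the entity sync_2ff and its port names are hypothetical). It adds two clock cycles of latency and is not suitable for multi-bit buses or one-cycle pulses; those are the cases where an async FIFO or a handshake is the better tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-flip-flop synchroniser for ONE level-type bit coming
-- from another clock domain or an unclocked source. Do not use it for
-- multi-bit buses (the bits can resolve on different cycles) or for
-- single-cycle pulses (they can be missed entirely).
entity sync_2ff is
  port (
    clk      : in  std_logic;   -- destination clock
    async_in : in  std_logic;   -- signal from the foreign domain
    sync_out : out std_logic    -- safe to use in clk's domain
  );
end entity;

architecture rtl of sync_2ff is
  signal meta_r : std_logic := '0';  -- first stage: may go metastable
  signal sync_r : std_logic := '0';  -- second stage: gets a full cycle to settle
begin
  process(clk)
  begin
    if rising_edge(clk) then
      meta_r <= async_in;
      sync_r <= meta_r;
    end if;
  end process;

  sync_out <= sync_r;
end architecture;
```

Only sync_out should then feed the state machine's if expressions; the raw async_in should not fan out anywhere else.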

These are only guidelines based on a guess at what the problem is... hope they help.

I'll add a minor nitpick: "if (some cond) then" could be written "if some cond then" for a bit less clutter, but that's irrelevant to the problem at hand.

EDIT again (trying to keep up with the question :-)
The second process is unclocked: you can eliminate it and write "if dz_ready = '1' or state = fwrd_init then", but if the other term, dz_ready, is asynch, that won't help.

I have taken the liberty of rewriting the process into a more usual "single process SM" form. Its behaviour is (I'm pretty sure) equivalent to the original, but it exposes some odd duplication of actions that is rather unusual in style. This may either let you see something unintended, or possibly be less confusing to the synthesis tools. (The double assignment to dz_ready in state fwrd_init may be harmless but looks suspicious!)

input : process(clk)
    begin
        if rising_edge(clk) then
            if rst = '1' then
                -- reset to all 1's so the initial division a_n/d_(n-1) = 0 ; a=0, n=0
                d_reg <= (others=>'1');
                z_reg <= (others=>'0');
                dz_ready <= '0';
                -- state <= ???; -- Good idea to define initial state here
            else
                -- default assignments, overridden where necessary
                stack_en <='0';
                bkwrd_drdy <= '0';
                d_rdy <= '0';
                dz_read <='0';
                read_row <='0';
                result_ready <='0';
                --delay dz_ready by one clock to correctly sync with other signals
                dz_ready_p <= dz_ready;

                bkwrd_v <= result; -- will be overridden in bkwrd_init

                if d_stts = '1' then -- will be overridden in fwrd_init
                    d_reg <= d_out;
                    z_reg <= z_out;
                    dz_ready <= '1';
                end if;

                --fwrd it logic---
                if dz_ready = '1' then
                    if row_ready = '1' then
                        d_rdy <= '1';
                        dz_read <='1';
                        read_row <= '1';
                        dz_ready <='0';
                        --register the c value
                    end if;
                end if;

                --bkwrd it logic and stack logic -- read has priority over push
                if bkwrd_stts = '1' then
                    if cd_empty = '0' and zd_empty = '0' then
                        bkwrd_drdy <= '1';
                        --pop from stack
                        stack_en <= '1';
                        stack_pshp <= '0';
                    end if;
                    result_ready <= '1';
                    x <= result;
                else
                    x <= (others=>'0');
                end if;

                -- STATE LOGIC --
                case state is
                    when idle =>
                        if (row_ready = '1') then
                            state <= fwrd_init;
                        end if;

                    when fwrd_init =>
                        -- actions
                        d_reg <= (others=>'1');
                        z_reg <= (others=>'0');
                        dz_ready <= '0';
                        if(row_ready = '1') then
                            d_rdy <= '1';
                            dz_read <='1';
                            read_row <= '1';
                            dz_ready <='0';
                            --register the c value
                        end if;
                        -- state
                        state <= fwrd_it;

                    when fwrd_it =>
                        if d_div_stts = '1' then
                            counter <= counter_next;
                            --push data onto the stack
                            stack_en <='1';
                            stack_pshp <='1';
                            stack_din <= cd;
                            zdstck_din <= zd;
                        end if;

                        if (counter = N) then
                            state <= bkwrd_init;
                        end if;

                    when bkwrd_init =>
                        if cd_empty = '0' and zd_empty = '0' then
                            bkwrd_drdy <= '1';
                            --pop from stack
                            stack_en <= '1';
                            stack_pshp <= '0';
                        end if;
                        bkwrd_v <= (others=>'0');
                        state <= bkwrd_it;

                    when bkwrd_it =>
                        if(cd_empty = '1' and zd_empty = '1') then
                            state <= idle;
                        end if;

                    when others => NULL;
                end case;
            end if;
        end if;
    end process;

Other tips

After fixing the logic by moving it into case statements, it turned out that the long placement time was caused by a still-existing bug in the Coregen Divider Generator 3.0 cores. http://forums.xilinx.com/t5/Implementation/divider-generator-3-0-problem-with-Virtex-6/m-p/230379#M4737

Can you try the 32-bit tools?

The other thing that springs to mind is that this kind of behaviour often comes from a large memory that you expected to be implemented in block RAM but that the compiler, for some reason, has decided to build out of LUTs and flip-flops.

  • Check your synthesis log files for the modules you expect to have RAMs in them.
  • Check the technology view to see that the RAMs you expect are there.
  • Check the post-MAP netlist in FPGA Editor (or PlanAhead; I forget whether you can do that post-MAP).

It's painful!
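When checking for that, it can help to compare your memory code against a style that synthesis usually maps to block RAM. A minimal sketch, assuming a single-port memory (the entity spram and its generics are hypothetical): the read is registered inside the clocked process and the array itself has no reset. As a rule of thumb, an unregistered read tends to produce distributed (LUT) RAM, and resetting the whole array tends to produce plain flip-flops, which is exactly the kind of blow-up described above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port RAM in a block-RAM-friendly style:
--   * the read is synchronous (registered inside the clocked process)
--   * the memory array is never reset
entity spram is
  generic (
    ADDR_W : natural := 10;   -- 2**10 = 1024 words
    DATA_W : natural := 32
  );
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  unsigned(ADDR_W - 1 downto 0);
    din  : in  std_logic_vector(DATA_W - 1 downto 0);
    dout : out std_logic_vector(DATA_W - 1 downto 0)
  );
end entity;

architecture rtl of spram is
  type ram_t is array (0 to 2**ADDR_W - 1) of std_logic_vector(DATA_W - 1 downto 0);
  signal ram : ram_t;
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(addr)) <= din;
      end if;
      -- Registered read. Because ram is a signal, this returns the OLD
      -- contents during a write to the same address ("read-first").
      dout <= ram(to_integer(addr));
    end if;
  end process;
end architecture;
```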


Now that you've isolated some code, can you synthesise just that module (so the module I/Os become pins)? That might narrow down whether the problem is inside or outside it. At that level you can also sometimes trace the signals more easily in the RTL viewer.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow