Generally, I tend to trust the compiler tools I use and don't fret so much about the results as long as they meet my timing and area budgets.
That said, for multipliers that need to run fast, I find I get better results (in DC, at least) if I create a Verilog module containing the multiply (*) and a retiming register or two, then push down into this module and synthesise it on its own before popping back up to top-level synthesis. It seems as if the compiler gets 'distracted' by other timing paths if you try to do everything at once, so making it focus on a multiplier that you know is going to be tricky seems to help.
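
To make that concrete, here is a rough sketch of the kind of wrapper module I mean. The module name, port names, widths and the choice of two output registers are all just assumptions for illustration, so adjust them to your design:

```verilog
// Hypothetical wrapper: one multiply (*) followed by two retiming registers.
// Names and default widths are illustrative assumptions, not from a real design.
module mult_retime #(
    parameter A_WIDTH = 16,
    parameter B_WIDTH = 16
) (
    input  wire                       clk,
    input  wire [A_WIDTH-1:0]         a,
    input  wire [B_WIDTH-1:0]         b,
    output wire [A_WIDTH+B_WIDTH-1:0] p
);
    // Two register stages after the multiply. No reset is used, which
    // generally leaves the tool freer to move the registers around.
    reg [A_WIDTH+B_WIDTH-1:0] p_r1;
    reg [A_WIDTH+B_WIDTH-1:0] p_r2;

    always @(posedge clk) begin
        p_r1 <= a * b;   // unsigned multiply, full-width product
        p_r2 <= p_r1;    // extra stage available for retiming
    end

    assign p = p_r2;     // result appears two clock cycles after a and b
endmodule
```

As written this is just a multiply with two flops behind it; the registers only get pushed back into the multiplier logic if retiming is actually enabled when you compile the module (in DC, for example, with `optimize_registers` or `compile_ultra -retime`). The idea is then to compile this module by itself with its own clock constraint and instantiate the result at the top level.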