If you can tolerate the added latency, I would recommend a small adjustment, as follows:
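    process (clk_in, reset)
    begin
      if (reset = '1') then
        -- asynchronous reset: clear the pulse, the "done" latch and the delayed start
        init_counter      <= '0';
        init_counter_done <= '0';
        start_d           <= '0';
      elsif rising_edge(clk_in) then
        start_d <= start;  -- start delayed by one clock, used for edge detection
        -- fire on the first rising edge of start only
        if (start = '1' and start_d = '0' and init_counter_done = '0') then
          init_counter      <= '1';  -- high for exactly one clock cycle
          init_counter_done <= '1';  -- stays set until the next reset
        else
          init_counter <= '0';
        end if;
      end if;
    end process;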
This keeps your process synchronous to clk_in (apart from the asynchronous reset). Previously it was slightly confusing, possibly to the tools as well, because "start" was in the sensitivity list while your outputs only updated on the rising clock edge.
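To check the behaviour in simulation, a small self-checking testbench along these lines may help. This is only a sketch: the entity name start_pulse_tb, the 10 ns clock period, the pulses counter and the assumption that every signal is std_logic are mine, not taken from your design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench; names, timing and signal types are assumptions.
entity start_pulse_tb is
end entity start_pulse_tb;

architecture sim of start_pulse_tb is
  constant CLK_PERIOD : time := 10 ns;  -- assumed clock period
  signal clk_in, reset, start                     : std_logic := '0';
  signal start_d, init_counter, init_counter_done : std_logic := '0';
  signal pulses   : natural := 0;     -- clock edges on which init_counter was high
  signal finished : boolean := false; -- stops the clock so the simulation ends
begin
  -- free-running clock until the stimulus process is done
  clk_in <= not clk_in after CLK_PERIOD / 2 when not finished else '0';

  -- process under test, copied from the suggestion above
  process (clk_in, reset)
  begin
    if (reset = '1') then
      init_counter      <= '0';
      init_counter_done <= '0';
      start_d           <= '0';
    elsif rising_edge(clk_in) then
      start_d <= start;
      if (start = '1' and start_d = '0' and init_counter_done = '0') then
        init_counter      <= '1';
        init_counter_done <= '1';
      else
        init_counter <= '0';
      end if;
    end if;
  end process;

  -- count how many clock cycles init_counter is asserted
  count_pulses : process (clk_in)
  begin
    if rising_edge(clk_in) then
      if init_counter = '1' then
        pulses <= pulses + 1;
      end if;
    end if;
  end process;

  stimulus : process
  begin
    reset <= '1';
    wait for 2 * CLK_PERIOD;
    reset <= '0';
    wait for 2 * CLK_PERIOD;
    wait until falling_edge(clk_in);  -- change start away from the rising edge
    start <= '1';                     -- first rising edge: should produce one pulse
    wait for 5 * CLK_PERIOD;
    start <= '0';
    wait for 3 * CLK_PERIOD;
    start <= '1';                     -- second rising edge: should be ignored
    wait for 5 * CLK_PERIOD;
    assert pulses = 1
      report "expected exactly one init_counter pulse" severity error;
    assert init_counter_done = '1'
      report "init_counter_done should stay set until reset" severity error;
    finished <= true;
    wait;
  end process;
end architecture sim;
```

If both assertions pass silently, the logic itself behaves as intended and any remaining problem is more likely to come from how start is driven in hardware.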
If this still doesn't work, can you clarify whether it fails in simulation or in hardware? If it fails in hardware, you may need to debounce your start signal.
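If start comes from a push button or another asynchronous source, a debouncer might look roughly like the sketch below. The entity name start_debounce, the port start_raw and the STABLE_CYCLES generic (with its default of 1000) are hypothetical; pick a cycle count that matches your clock frequency and how long your switch bounces.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical debouncer; names and the STABLE_CYCLES default are assumptions.
entity start_debounce is
  generic (
    STABLE_CYCLES : positive := 1000  -- clocks the input must hold a new level
  );
  port (
    clk_in    : in  std_logic;
    reset     : in  std_logic;
    start_raw : in  std_logic;  -- noisy, asynchronous input
    start     : out std_logic   -- clean level, synchronous to clk_in
  );
end entity start_debounce;

architecture rtl of start_debounce is
  signal sync_0, sync_1 : std_logic := '0';  -- two-flop synchronizer
  signal stable         : std_logic := '0';  -- last accepted level
  signal count          : natural range 0 to STABLE_CYCLES - 1 := 0;
begin
  process (clk_in, reset)
  begin
    if (reset = '1') then
      sync_0 <= '0';
      sync_1 <= '0';
      stable <= '0';
      count  <= 0;
    elsif rising_edge(clk_in) then
      -- bring the raw input into the clk_in domain
      sync_0 <= start_raw;
      sync_1 <= sync_0;
      if sync_1 = stable then
        count <= 0;                -- input matches output: nothing pending
      elsif count = STABLE_CYCLES - 1 then
        stable <= sync_1;          -- held long enough: accept the new level
        count  <= 0;
      else
        count <= count + 1;        -- still waiting for the input to settle
      end if;
    end if;
  end process;

  start <= stable;
end architecture rtl;
```

The output is a clean level rather than a pulse, so it can feed the edge-detecting process above unchanged.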