You are creating a clock from a clock, which is a bad idea. It looks like you are trying to divide by 4. Instead, keep everything on the one clock and create an enable pulse that is high for one cycle in every four:
    NW_CLK : process (clk) is
        -- counts 0, 1, 2, 3 and then wraps
        variable divider : integer range 0 to 3;
    begin
        if rising_edge(clk) then
            if divider = 3 then
                divider := 0;
                screen_process_enable <= '1';  -- high for one clk cycle in four
            else
                divider := divider + 1;
                screen_process_enable <= '0';
            end if;
        end if;
    end process NW_CLK;
Then in the screen process:
    scrn_loc : process (clk) is
    begin
        if rising_edge(clk) and screen_process_enable = '1' then
            etc...
Not related to your question, but I'll comment on it here anyway: You seem to be trying to hold the entire screen in memory - that's quite a lot of storage you are asking for in a real chip (it'll be fine in simulation).
For producing a grid you can just do it on the fly, by assigning to the VGA output depending on the values of your x and y counters. Because you have both the assignments to scrn and vga outside of a process, the synthesiser is probably clever enough to figure out that you never make use of the memory storage you've asked for, and has optimised it away. If at some future point you come to use scrn as a true framebuffer, you may run up against performance or resource limitations, depending on your device.
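To illustrate the on-the-fly approach, here is a minimal sketch rather than a drop-in replacement for your design. The entity name grid_gen, the generic CELL_BITS, the 10-bit counter width, the visible flag and the single-bit vga output are all assumptions of mine, since I don't know how your counters and output are actually declared.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: names, widths and the 1-bit output are assumptions.
entity grid_gen is
  generic (
    CELL_BITS : positive := 4  -- one grid line every 2**CELL_BITS pixels (16 here)
  );
  port (
    x, y    : in  unsigned(9 downto 0);  -- assumed pixel counters from your sync logic
    visible : in  std_logic;             -- assumed '1' inside the active display area
    vga     : out std_logic              -- single-bit pixel output, for brevity
  );
end entity grid_gen;

architecture rtl of grid_gen is
begin
  -- Purely combinational: the pixel is lit whenever the low bits of either
  -- counter are zero, i.e. on every 2**CELL_BITS-th column or row.
  -- No screen-sized memory is needed.
  vga <= '1' when visible = '1' and
                  (x(CELL_BITS - 1 downto 0) = 0 or
                   y(CELL_BITS - 1 downto 0) = 0)
         else '0';
end architecture rtl;
```

Using a power-of-two cell size keeps the comparison down to a few low bits of each counter. A different spacing would need its own small wrap-around counters instead.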