nested generate statements for 32 x 8 register VHDL

Question

You have 8 ROWs of 32 bit COLs connected to I0. Without a reset input to the D_FFs, at best you'd have to write to all 8 rows to get 'X's instead of 'U's.
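If you wanted a known value on the read ports without first writing every row, one option is to give the flip flop a reset. The following is only a sketch: the entity name D_FF_RST and the port RST are assumptions of mine and are not part of your design, and the reset is chosen to be asynchronous and active high for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical variant of D_FF with an asynchronous, active high reset.
-- D_FF_RST and RST are assumed names, not taken from the original design.
entity D_FF_RST is
    port (
        D_in:     in    std_logic;
        CLK:      in    std_logic;
        RST:      in    std_logic;
        Q_out:    out   std_logic;
        QN_out:   out   std_logic
    );
end entity;

architecture sketch of D_FF_RST is

    signal Q:   std_logic;

begin

FF:
    process (CLK, RST)
    begin
        if RST = '1' then               -- reset overrides the clock
            Q <= '0';
        elsif rising_edge(CLK) then     -- otherwise capture D_in on the rising edge
            Q <= D_in;
        end if;
    end process;

    Q_out  <= Q;
    QN_out <= not Q;

end architecture;
```

Pulsing RST once at the start of simulation would put '0' on every Q, so the read ports would show zeros instead of 'U's before the first write.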

Your MUX isn't instantiated for either read port. If you were to implement the array value:

type reg_array is array (0 to 7) of std_logic_vector(31 downto 0);
signal Q,QN: reg_array;

These would replace I0 and Q_out.

From the referenced answer (you apparently just marked as useful - thanks) you could replace I0(COL) and QN_out(COL) in the D_FF instantiation in the inner generate statement with Q(ROW)(COL) and QN(ROW)(COL).

Note: if you're not using the QN (Q NOT) outputs of the D_FFs you can either not provide them as ports or leave them unconnected (open). You could also use the Q outputs for one read MUX and the QN outputs for the other read MUX, inverting the output of that MUX. With only two read ports you aren't reducing the load significantly, so you could just use Q.
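As a sketch of leaving QN unconnected, here is a single 32 bit row built from D_FFs with QN_out associated with open. The entity name ROW_REG and its ports are hypothetical, used only to keep the example self-contained; the D_FF is a plain edge triggered flip flop with the same ports as yours.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity D_FF is
    port (
        D_in:     in    std_logic;
        CLK:      in    std_logic;
        Q_out:    out   std_logic;
        QN_out:   out   std_logic
    );
end entity;

architecture sketch of D_FF is
    signal Q:   std_logic;
begin
    process (CLK)
    begin
        if CLK'EVENT and CLK = '1' then
            Q <= D_in;
        end if;
    end process;

    Q_out  <= Q;
    QN_out <= not Q;
end architecture;

library ieee;
use ieee.std_logic_1164.all;

-- ROW_REG is a hypothetical wrapper: one 32 bit row of D_FFs.
entity ROW_REG is
    port (
        D_in:     in    std_logic_vector(31 downto 0);
        CLK:      in    std_logic;
        Q_out:    out   std_logic_vector(31 downto 0)
    );
end entity;

architecture sketch of ROW_REG is
    component D_FF
        port (
            D_in:     in    std_logic;
            CLK:      in    std_logic;
            Q_out:    out   std_logic;
            QN_out:   out   std_logic
        );
    end component;
begin
GEN_COL:
    for COL in 0 to 31 generate
DFF_X:
        D_FF
        port map (
            D_in   => D_in(COL),
            CLK    => CLK,
            Q_out  => Q_out(COL),
            QN_out => open      -- QN is not used, so leave it unconnected
        );
    end generate;
end architecture;
```

With QN left open you would not need a QN signal of type reg_array at all.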

For a MUX using signal Q defined as the reg_array above, the MUX inputs would be Q(0) through Q(7) and the output would be either PORT_A or PORT_B. S_in would be connected to READ_REG_A or READ_REG_B respectively.

One thing that's not apparent from reading your VHDL design description is why there is a wait for 10 ns in your process DCDR_AND. It delays the write past the low baud of CLK (the low portion of the clock). In a zero timed model you could simply use not CLK instead of CLK (CLK_vals(I) <= O_out(I) and not CLK, and delete the wait for 10 ns; line). A wait for a time delay can't be synthesized, so it would not carry over into a timed model resulting from synthesis. If you intend to synthesize, CLK can be used if you can count on input holdover should WRT_DATA be clock synchronous.
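A minimal sketch of the zero timed alternative follows, pulled out into its own entity so it can be analyzed on its own. The entity name WRITE_GATE is hypothetical, and I'm assuming your process otherwise only computes CLK_vals from O_out and CLK.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- WRITE_GATE is a hypothetical wrapper around the DCDR_AND process.
entity WRITE_GATE is
    port (
        O_out:      in  std_logic_vector(7 downto 0);   -- decoder outputs
        CLK:        in  std_logic;
        CLK_vals:   out std_logic_vector(7 downto 0)    -- gated row clocks
    );
end entity;

architecture sketch of WRITE_GATE is
begin

DCDR_AND:
    process (O_out, CLK)
    begin
        -- No wait statement: the selected row's clock rises when CLK falls,
        -- that is, in the middle of the clock period.
        for I in 0 to 7 loop
            CLK_vals(I) <= O_out(I) and not CLK;
        end loop;
    end process;

end architecture;
```

Because the process has a sensitivity list it cannot also contain a wait statement, which is the reason the wait for 10 ns; line has to go.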

And then your model has discretely instantiated D_FFs and uses MUXes for read ports.

I resisted the urge to modify your code and show it on the off chance you're doing the same class exercise. If any of this is unclear ask in a comment on this answer and I'll add to the answer, clearly mark any corrections to it or demonstrate code if necessary.

In response to the comment "...yet I still see UUU's for PORT_A and PORT_B"

Note that REG_WRT already shows as an inverted clock from the test bench from the previous effort, so I removed the preceding not in process DCDR_AND, otherwise using the previous effort's test bench unchanged other than matching your port names.

Also notice that the PORT_A and PORT_B outputs remain uninitialized until the address (READ_REG_A or READ_REG_B) is written to, which was the point of that particular test bench.

The idea behind writing to the flip flops (collectively eight 32 bit registers) in the middle of a clock cell (REG_WRT) was to avoid clock skew issues, which in the case of the test bench are caused by writing inputs based on delay values instead of on clock edges.

You could similarly have stimulus in a clocked process, which might require balancing clock delays to ensure WRT_DATA and WRT_REG_NUM are valid at the right time. This is also cured by using not REG_WRT.

If you make REG_WRT in the test bench an upright clock instead of inverted you can leave the not in process DCDR_AND.

There are also concurrent signal assignment statements, which incidentally can go in generate statements, allowing the process DCDR_AND to be folded into the first generate statement:

GEN_D_FF:
    for ROW in 0 to 7 generate
    begin
GEN_D_FF0:
        for COL in 0 to 31 generate
        begin
DFF_X:  
            D_FF 
            port map(
                D_in => WRT_DATA(COL), 
                CLK => CLK_vals(ROW), 
                Q_out => Q(ROW)(COL), 
                QN_out => QN(ROW)(COL)
            );
        end generate;
DCDR_AND:
        CLK_vals(ROW) <= O_out(ROW) and REG_WRT;

    end generate;

-- DCDR_AND: 
--     process (O_out, REG_WRT)
--     begin
--
--         I_in <= WRT_REG_NUM;
--         for I in 0 to 7 loop
--             CLK_vals(I) <= O_out(I) and REG_WRT;
--         end loop;
-- end process;

Concurrent signal assignments could also be used for the PORT_A and PORT_B assignments instead of making them inside a process statement. You could also connect PORT_A and PORT_B directly as actuals to O_out in the two MUX instantiations, doing away with either a concurrent signal assignment or a process, for example:

MUX_A: 
    MUX 
        port map ( 
            S_in => READ_REG_A, 
            I7 => Q(7), 
            I6 => Q(6), 
            I5 => Q(5), 
            I4 => Q(4), 
            I3 => Q(3), 
            I2 => Q(2), 
            I1 => Q(1), 
            I0 => Q(0), 
            O_out => Port_A
        );

You could do this because you aren't using the read port data internally and the ports are mode out.

And while doing this I found that eliminating the assignment to I_in as above can cause all the 'U's on your read ports, which can be cured similarly by using WRT_REG_NUM directly as the actual for I_in:

DCDR1: 
    DCDR 
    port map (
        I_in => WRT_REG_NUM, 
        O_out => O_out
    );

Allowing signal declarations for I_in, MUXA_O_out and MUXB_O_out to be eliminated:

-- internal signals used

-- signal I_in:        std_logic_vector(2 downto 0);
signal O_out:       std_logic_vector(7 downto 0);
signal CLK_vals:    std_logic_vector(7 downto 0);
-- signal MUXA_O_out:  std_logic_vector(31 downto 0);
-- signal MUXB_O_out:  std_logic_vector(31 downto 0);

I didn't have a case of always having 'U's on the read ports except when I had accidentally eliminated the inclusion of WRT_REG_NUM in CLK_vals as noted above.

I didn't quite finish prettifying your code:

library ieee;
use ieee.std_logic_1164.all;

entity DCDR is
    port (
        I_in:   in  std_logic_vector (2 downto 0);
        O_out:  out std_logic_vector (7 downto 0)
    );
end entity;

architecture foo of DCDR is
    signal input:   std_logic_vector (2 downto 0);
begin
    input <= TO_X01Z(I_in);

    O_out <= "00000001"  when input = "000" else
             "00000010"  when input = "001" else
             "00000100"  when input = "010" else
             "00001000"  when input = "011" else
             "00010000"  when input = "100" else
             "00100000"  when input = "101" else
             "01000000"  when input = "110" else
             "10000000"  when input = "111" else
             (others => 'X');

end architecture;

library ieee;
use ieee.std_logic_1164.all;

entity D_FF is
    port (
        D_in:     in    std_logic;
        CLK:      in    std_logic;
        Q_out:    out   std_logic;
        QN_out:   out   std_logic
    );
end entity;

architecture foo of D_FF is

    signal Q:   std_logic;

begin

FF:
    process (CLK)
    begin
        if CLK'EVENT and CLK = '1' then
            Q <= D_in;
        end if;
    end process;

    Q_out <= Q;
    QN_out <= not Q;

end architecture;

library ieee;
use ieee.std_logic_1164.all;

entity MUX is
    port (
        S_in:                             in  std_logic_vector(2 downto 0);
        I7, I6, I5, I4, I3, I2, I1, I0:   in  std_logic_vector(31 downto 0);
        O_out:                            out std_logic_vector(31 downto 0)
    );
end entity;

architecture foo of MUX is

begin
    O_out <= I0 when S_in = "000" else
             I1 when S_in = "001" else
             I2 when S_in = "010" else
             I3 when S_in = "011" else
             I4 when S_in = "100" else
             I5 when S_in = "101" else
             I6 when S_in = "110" else
             I7 when S_in = "111" else
             (others => 'X');

end architecture;

library ieee;
use ieee.std_logic_1164.all;

entity REG is
port (
    REG_WRT:        in  std_logic;
    WRT_REG_NUM:    in  std_logic_vector(2 downto 0);
    WRT_DATA:       in  std_logic_vector(31 downto 0);
    READ_REG_A:     in  std_logic_vector(2 downto 0);
    READ_REG_B:     in  std_logic_vector(2 downto 0);
    PORT_A:         out std_logic_vector(31 downto 0);
    PORT_B:         out std_logic_vector(31 downto 0)
);
end REG;

architecture BEHV_32x8_REG of REG is
-- decoder component
component DCDR
    port (
        I_in:       in  std_logic_vector(2 downto 0);
        O_out:      out std_logic_vector(7 downto 0)
    );
end component;
-- D flip flop component
component D_FF
    port (
        D_in:       in    std_logic;
        CLK:        in    std_logic;
        Q_out:      out   std_logic;
        QN_out:     out   std_logic   -- Q not
    );
end component;
-- MUX component
component MUX
    port (
        S_in:                            in std_logic_vector(2 downto 0);
        I7, I6, I5, I4, I3, I2, I1, I0:  in std_logic_vector(31 downto 0);
        O_out:                           out std_logic_vector(31 downto 0)
    );
end component;

-- internal signals used

-- signal I_in:        std_logic_vector(2 downto 0);
signal O_out:       std_logic_vector(7 downto 0);
signal CLK_vals:    std_logic_vector(7 downto 0);
-- signal MUXA_O_out:  std_logic_vector(31 downto 0);
-- signal MUXB_O_out:  std_logic_vector(31 downto 0);

-- two arrays of eight 32 bit vectors - the Q and QN outputs of all D_FFs

type reg_array is array (0 to 7) of std_logic_vector(31 downto 0);
signal Q, QN: reg_array;

begin

-- decoder instance
DCDR1: 
    DCDR 
    port map (
        I_in => WRT_REG_NUM, 
        O_out => O_out
    );

GEN_D_FF:
    for ROW in 0 to 7 generate
    begin
GEN_D_FF0:
        for COL in 0 to 31 generate
        begin
DFF_X:  
            D_FF 
            port map(
                D_in => WRT_DATA(COL), 
                CLK => CLK_vals(ROW), 
                Q_out => Q(ROW)(COL), 
                QN_out => QN(ROW)(COL)
            );
        end generate;

        CLK_vals(ROW) <= O_out(ROW) and REG_WRT;

    end generate;

-- DCDR_AND: 
--     process (O_out, REG_WRT)
--     begin
-- 
--         I_in <= WRT_REG_NUM;
--         for I in 0 to 7 loop
--             CLK_vals(I) <= O_out(I) and REG_WRT;
--         end loop;
-- end process;

-- MUX instances
MUX_A: 
    MUX 
        port map ( 
            S_in => READ_REG_A, 
            I7 => Q(7), 
            I6 => Q(6), 
            I5 => Q(5), 
            I4 => Q(4), 
            I3 => Q(3), 
            I2 => Q(2), 
            I1 => Q(1), 
            I0 => Q(0), 
            O_out => Port_A
        );

MUX_B: 
    MUX 
        port map ( 
            S_in => READ_REG_B, 
            I7 => Q(7), 
            I6 => Q(6), 
            I5 => Q(5), 
            I4 => Q(4), 
            I3 => Q(3), 
            I2 => Q(2), 
            I1 => Q(1), 
            I0 => Q(0), 
            O_out => Port_B
        );

end architecture;

library ieee; 
use ieee.std_logic_1164.all; 

entity reg_tb is    
end entity; 

architecture fum of reg_tb is 

component REG
    port (  
        REG_WRT:        in  std_logic; 
        WRT_REG_NUM:    in  std_logic_vector (2 downto 0);
        WRT_DATA:       in  std_logic_vector (31 downto 0);
        READ_REG_A:     in  std_logic_vector (2 downto 0);
        READ_REG_B:     in  std_logic_vector (2 downto 0);
        PORT_A:         out std_logic_vector (31 downto 0);
        PORT_B:         out std_logic_vector (31 downto 0)
        ); 
    end component; 

signal REG_WRT:         std_logic := '1';
signal WRT_REG_NUM:     std_logic_vector (2 downto 0) := "000";
signal WRT_DATA:        std_logic_vector (31 downto 0) := (others => '0');
signal READ_REG_A:      std_logic_vector (2 downto 0) := "000";
signal READ_REG_B:      std_logic_vector (2 downto 0) := "000";
signal PORT_A:          std_logic_vector (31 downto 0);
signal PORT_B:          std_logic_vector (31 downto 0);

begin 

DUT: 
        REG
        port map (
            REG_WRT => REG_WRT,
            WRT_REG_NUM => WRT_REG_NUM,
            WRT_DATA  => WRT_DATA,
            READ_REG_A => READ_REG_A, 
            READ_REG_B => READ_REG_B, 
            PORT_A => PORT_A, 
            PORT_B => PORT_B
        ); 


STIMULUS:
    process 
    begin 
    wait for 20 ns;
    REG_WRT <= '0';
    wait for 20 ns;
    REG_WRT <= '1';
    wait for 20 ns;
    WRT_DATA <= x"feedface";
    WRT_REG_NUM <= "001";
    REG_WRT <= '0';
    wait for 20 ns;
    REG_WRT <= '1';
    READ_REG_A <= "001";
    wait for 20 ns;
    WRT_DATA <= x"deadbeef";
    WRT_REG_NUM <= "010";
    READ_REG_B <= "010";
    REG_WRT <= '0';
    wait for 20 ns;
    REG_WRT <= '1';
    wait for 20 ns;
    wait for 20 ns;
    wait;
 end process; 
end architecture;

But it runs and produces the waveform shown above. This was done with Tristan Gingold's ghdl (ghdl-0.31) on a Mac (OS X 10.9.2) using Tony Bybell's gtkwave. See Sourceforge pages for ghdl-updates and gtkwave.