Design of a VHDL LUT Module

Question 1

I'm going to go out on a limb here and tell you to let your synthesizer optimize it. Alternatively, you can run a logic minimizer (e.g. espresso) on your table and then code the result in VHDL.

When targeting an FPGA, I'd guess this is what you should do:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity bit_count is
    port (
        a,b,c,d:   in  std_logic;
        x,y,z:     out std_logic    
    );
end entity;

architecture lut of bit_count is
    subtype lutin is std_logic_vector (3 downto 0);
    subtype lutout is std_logic_vector (2 downto 0);
    type lut is array (natural range 0 to 15) of lutout;
    constant bitcount:   lut := (
        "000", "001", "001", "010",
        "001", "010", "010", "011",
        "001", "010", "010", "011",
        "010", "011", "011", "100"
        );

    signal temp:    std_logic_vector (2 downto 0);

begin

    temp <= bitcount( TO_INTEGER ( unsigned (lutin'(a&b&c&d) ) ) );

    (x, y, z) <= lutout'(temp(2), temp(1), temp(0));

end architecture;

And failing that I think hand optimizing it as a ROM is likely to be close in terms of gate count:

--  0000   0001   0010   0011
--  "000", "001", "001", "010",
--  0100   0101   0110   0111
--  "001", "010", "010", "011",
--  1000   1001   1010   1011
--  "001", "010", "010", "011",
--  1100   1101   1110   1111
--  "010", "011", "011", "100"

-- output         Input
-----------------------
-- bit 0  is true 0001 0010 0100 0111 1000 1011 1101 1110
-- bit 1          0011 0101 0110 0111 1001 1010 1011 1100 1101 1110
-- bit 2          1111

architecture rom of bit_count is

    signal t0,t1,t2:    std_logic;
    signal t4,t7,t8:    std_logic;
    signal t11,t13,t14: std_logic;
    signal t15:         std_logic;

begin
-- terms
    t0  <= not a and not b and not c and not d;
    t1  <=     a and not b and not c and not d;
    t2  <= not a and     b and not c and not d;
--  t3  <=     a and     b and not c and not d;
    t4  <= not a and not b and     c and not d;
--  t5  <=     a and not b and     c and not d;
--  t6  <= not a and     b and     c and not d;
    t7  <=     a and     b and     c and not d;
    t8  <= not a and not b and not c and     d;
--  t9  <=     a and not b and not c and     d;
--  t10 <= not a and     b and not c and     d;
    t11 <=     a and     b and not c and     d;
--  t12 <= not a and not b and     c and     d;
    t13 <=     a and not b and     c and     d;
    t14 <= not a and     b and     c and     d;
    t15 <=     a and     b and     c and     d;

-- outputs

    x <= t15;

    y <= not ( t0 or t1 or t2 or t4 or t8 or t15 );

    z <= t1 or t2 or t4 or t7 or t8 or t11 or t13 or t14;

end architecture;

It should be fewer gates than your chained multiplexers and a bit flatter (faster).

The two architectures have been analyzed but not simulated. It's easy to introduce errors when coding gate-level logic by hand.
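Since neither architecture has been simulated, an exhaustive testbench is a cheap way to catch slips in the hand-coded terms. The following is only a minimal sketch and not part of the original answer: the names bit_count_tb, dut and stimulus are made up for illustration, and it assumes the bit_count entity above has been compiled into the work library. It walks through all 16 input combinations and compares x, y, z against a count computed with a plain loop.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench entity; it has no ports.
entity bit_count_tb is
end entity;

architecture sim of bit_count_tb is
    signal a, b, c, d: std_logic := '0';
    signal x, y, z:    std_logic;
begin

    -- Swap "rom" for "lut" to check the other architecture.
    dut: entity work.bit_count(rom)
        port map (a => a, b => b, c => c, d => d,
                  x => x, y => y, z => z);

    stimulus: process
        variable inputs:   std_logic_vector (3 downto 0);
        variable actual:   std_logic_vector (2 downto 0);
        variable expected: natural;
    begin
        for i in 0 to 15 loop
            -- Drive one of the 16 input combinations.
            inputs := std_logic_vector(to_unsigned(i, 4));
            a <= inputs(3);
            b <= inputs(2);
            c <= inputs(1);
            d <= inputs(0);
            wait for 10 ns;

            -- Reference model: count the '1' bits directly.
            expected := 0;
            for n in inputs'range loop
                if inputs(n) = '1' then
                    expected := expected + 1;
                end if;
            end loop;

            -- Compare the three outputs, read as a 3-bit number.
            actual := x & y & z;
            assert to_integer(unsigned(actual)) = expected
                report "bit_count mismatch for input " & integer'image(i)
                severity error;
        end loop;

        report "bit_count_tb finished";
        wait;  -- stop the process so the simulation can end
    end process;

end architecture;
```

Any assertion message in the simulator log points at the input value whose output disagrees with the reference count.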

Question 2

Unless you are just fooling around in VHDL for fun or learning, if you want a LUT, write it directly as a LUT. There is probably no reason to unwrap this into low-level gates and muxes. Instead, simply describe the behavior you want and let the synthesis tool do the work for you.

For example, here is simple VHDL for the combinational logic LUT you've described:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity Number_of_Ones is
    port (
        -- mapped a=3, b=2, c=1, d=0
        abcd : in  std_ulogic_vector(3 downto 0);
        -- mapped x=2, y=1, z=0
        xyz  : out std_ulogic_vector(2 downto 0)
    );
end entity;

architecture any of Number_of_Ones is
begin

    process (abcd) is
    begin
        case abcd is      
        --abcd|xyz
        when "0000" => xyz <= "000";
        when "0001" => xyz <= "001";
        when "0010" => xyz <= "001";
        when "0011" => xyz <= "010";
        when "0100" => xyz <= "001";
        when "0101" => xyz <= "010";
        when "0110" => xyz <= "010";
        when "0111" => xyz <= "011";
        when "1000" => xyz <= "001";
        when "1001" => xyz <= "010";
        when "1010" => xyz <= "010";
        when "1011" => xyz <= "011";
        when "1100" => xyz <= "010";
        when "1101" => xyz <= "011";
        when "1110" => xyz <= "011";
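        when "1111" => xyz <= "100";
        -- std_ulogic has nine values, so the remaining (metavalue) inputs need a branch too
        when others => xyz <= "XXX";
        end case;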
    end process;
end architecture;

As you can see, this is just your truth table rewritten in VHDL syntax. You can of course write this in several different ways, and you might wish to map the ports differently, but this should get you on the right track.
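As one example of a different way to write it, here is a rough sketch using a selected signal assignment that groups the input patterns by how many ones they contain. The architecture name "selected" is made up for illustration, and the sketch assumes the Number_of_Ones entity above.

```vhdl
-- Hypothetical alternative architecture for the Number_of_Ones entity.
architecture selected of Number_of_Ones is
begin

    with abcd select xyz <=
        -- no bits set
        "000" when "0000",
        -- exactly one bit set
        "001" when "0001" | "0010" | "0100" | "1000",
        -- exactly two bits set
        "010" when "0011" | "0101" | "0110" | "1001" | "1010" | "1100",
        -- exactly three bits set
        "011" when "0111" | "1011" | "1101" | "1110",
        -- all four bits set
        "100" when "1111",
        -- any input containing 'U', 'X', 'Z', etc.
        "XXX" when others;

end architecture;
```

A synthesizer should treat this the same as the case statement; the grouping only makes the intent easier to check by eye.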

Question 3

As another "trust the tools" answer, if you want to count the ones, just do that. Your code will be clearer and the synthesizer will make a remarkably good job of it:

process(clk)
  variable count : unsigned(xyz'range);
begin
  if rising_edge(clk) then
     count := (others => '0');
     for i in abcd'range loop
        if abcd(i) = '1' then
           count := count + 1;
        end if;
     end loop;
     -- count is unsigned; convert, assuming xyz is a std_ulogic_vector port
     xyz <= std_ulogic_vector(count);
  end if;
end process;

I haven't compiled or simulated this, but it should give you the idea... Of course, for full code-clarity, you'd encapsulate the count/loop aspect in a function called count_ones and call that from the process.
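A minimal sketch of what that count_ones function could look like is below. Like the process above, it has not been compiled or simulated. The package name count_ones_pkg is made up for illustration, and returning a natural (rather than an unsigned) is just one possible design choice.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package holding the helper function.
package count_ones_pkg is
    -- Returns the number of '1' bits in v.
    function count_ones (v : std_ulogic_vector) return natural;
end package;

package body count_ones_pkg is

    function count_ones (v : std_ulogic_vector) return natural is
        variable count : natural := 0;
    begin
        -- Works for any vector length and index direction.
        for i in v'range loop
            if v(i) = '1' then
                count := count + 1;
            end if;
        end loop;
        return count;
    end function;

end package body;

-- Possible use inside the clocked process, assuming ieee.numeric_std is
-- visible and xyz is a std_ulogic_vector:
--     xyz <= std_ulogic_vector(to_unsigned(count_ones(abcd), xyz'length));
```

Returning a natural keeps the function independent of the output width; the caller decides how many bits to convert it to.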