This seems to be similar to this one and this one.
The first one suggests using dec2bitvec inside a for loop. That may be enough for you (although it is slow).
The second one suggests creating a lookup table with bitget and then using it (instead of using dec2bit or dec2bitvec).
You can try something 'in the middle':
B = 3; % Number of bits per int.
A = randi(7, 16000000, 1); % 16M random elements between 1 and 7 (3bits).
tic
% get each group of bits in a column of K.
K = cell2mat(arrayfun(@(bit)bitget(A, B+1-bit), 1:B, 'UniformOutput', 0))';
% reshape so that each row holds 8 consecutive bits (one byte)
K = reshape(K, [8, numel(K)/8])';
% weight each row by powers of two to get the byte values (U is double; wrap in uint8(...) if you need that class)
U = K*(2.^(size(K,2)-1:-1:0))';
toc
On my machine this ran in about 3.5 seconds (Win8 64-bit, i5, 4 GB RAM).
Instead of creating a lookup table, this code builds a matrix (K) holding the bit values of each integer (stored in columns), reshapes it (to form 8-bit values), and then uses the same math you used before to create the uint8 vector.
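To check the packing by hand, here is a minimal sketch that runs the same steps on a tiny input. The 8 sample values and the final uint8 cast are my own additions for illustration. It assumes numel(A)*B is a multiple of 8; otherwise the reshape fails and you would need to pad A first.

```matlab
B = 3;                         % number of bits per integer
A = [1; 2; 3; 4; 5; 6; 7; 1];  % 8 sample values -> 24 bits -> 3 bytes

% One column of K per integer, most significant bit first.
K = cell2mat(arrayfun(@(bit) bitget(A, B+1-bit), 1:B, 'UniformOutput', false))';

% Read the bit stream column by column and cut it into rows of 8 bits.
K = reshape(K, 8, [])';

% Weight each row by 128, 64, ..., 1 and cast to uint8.
U = uint8(K * (2.^(7:-1:0))');

disp(U')   % bit stream 00101001 11001011 10111001 -> 41 203 185
```

The bits of 1, 2, 3, ... are 001, 010, 011, ..., so the first byte is 00101001 = 41, which matches the first element of U.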