Predicting the runtime of a parallel loop using an a-priori estimate of the effort per iterand (for a given number of workers)

Question

I came up with a somewhat satisfactory solution, so I thought I'd share it in case anyone is interested. I would still appreciate comments on how to improve or fine-tune the approach.

Basically, I decided that the only sensible way is to build a (very) rudimentary model of the scheduler for the parallel loop:

function c=est_cost_para(cost_blocks,cost_it,num_cores)
% Estimate cost of parallel computation

% Inputs:
%   cost_blocks: Estimate of cost per block in arbitrary units. For
%       consistency with the other code this must be in the reverse order
%       that the scheduler is fed, i.e. cost should be ascending!
%   cost_it:     Base cost of iteration (regardless of number of entries)
%       in the same units as cost_blocks.
%   num_cores:   Number of cores
%
% Output:
%   c: Estimated cost of parallel computation

num_blocks=numel(cost_blocks);
c=zeros(num_cores,1);

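% Each of the first min(num_blocks,num_cores) cores starts with one of the
% largest blocks (taken from the end of the ascending cost_blocks).
i=min(num_blocks,num_cores);
c(1:i)=cost_blocks(end-i+1:end)+cost_it;
while i<num_blocks
    i=i+1;
    [~,i_min]=min(c); % which core finished first; is fed with next block
    c(i_min)=c(i_min)+cost_blocks(end-i+1)+cost_it;
end

c=max(c); % the loop is done when the busiest core finishes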

end
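For experimenting outside MATLAB, the same scheduler model can be sketched in Python. This is a port, not the original code: it keeps the greedy rule (blocks handed out largest first, each to the core that becomes free first) but assumes a binary heap from the standard `heapq` module in place of MATLAB's `min` over all cores. Input conventions are unchanged: `cost_blocks` ascending, `cost_it` in the same arbitrary units.

```python
import heapq
from typing import Sequence


def est_cost_para(cost_blocks: Sequence[float], cost_it: float, num_cores: int) -> float:
    """Estimate the cost of a parallel loop with a greedy scheduler model.

    cost_blocks must be ascending; blocks are handed out from the end
    (largest first), mirroring the MATLAB routine.
    """
    # One entry per core: its accumulated cost (i.e. its finish time).
    # A list of equal values is already a valid heap.
    loads = [0.0] * num_cores
    for cost in reversed(cost_blocks):
        # The core that finishes first is fed with the next block.
        earliest = heapq.heappop(loads)
        heapq.heappush(loads, earliest + cost + cost_it)
    # The loop is done when the busiest core finishes.
    return max(loads)


print(est_cost_para([1, 2, 3, 10], 0, 2))
```

For `[1, 2, 3, 10]` on two cores this returns 10.0 rather than the even split 16 / 2 = 8, because the largest block alone keeps one core busy. Ties between idle cores are broken differently than in MATLAB, but since tied cores have equal loads the resulting estimate is the same.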

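The parameter cost_it, the cost of an empty iteration, is a crude blend of many different side effects that could conceivably be separated: the overhead of an empty iteration in a for/parfor-loop (which may also differ per block), as well as the start-up time and the transfer of data to the parfor-loop (and probably more). My main reason for lumping everything together is that I don't want to have to estimate or determine the more granular costs.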
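Because cost_it is charged once per block, it penalizes splitting the same work into many small blocks. A small illustration using a Python port of the scheduler model (a sketch; the block sizes and cost_it values are made up for the example):

```python
import heapq


def est_cost_para(cost_blocks, cost_it, num_cores):
    # Greedy scheduler model: largest block first, each to the least-loaded core.
    loads = [0.0] * num_cores  # equal values already form a valid heap
    for cost in sorted(cost_blocks, reverse=True):
        earliest = heapq.heappop(loads)
        heapq.heappush(loads, earliest + cost + cost_it)
    return max(loads)


# The same total work (12 units) split into 2 large or 12 small blocks, on 4 cores.
few_blocks = [6, 6]
many_blocks = [1] * 12

for cost_it in (0.5, 2.0):
    print(cost_it, est_cost_para(few_blocks, cost_it, 4), est_cost_para(many_blocks, cost_it, 4))
# prints:
# 0.5 6.5 4.5
# 2.0 8.0 9.0
```

With a small per-iteration overhead the 12 small blocks balance better; with a larger one, the three overheads per core outweigh the better balance.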

I use the above routine to determine the cut-off in the following way:

function i=cutoff_ser_para(cost_blocks,cost_it,num_cores)
% Determine cut-off between serial and parallel regime

% Inputs:
%   cost_blocks: Estimate of cost per block in arbitrary units. For
%       consistency with the other code this must be in the reverse order
%       that the scheduler is fed, i.e. cost should be ascending!
%   cost_it:     Base cost of iteration (regardless of number of entries)
%       in the same units as cost_blocks.
%   num_cores:   Number of cores
%
% Output:
%   i: Number of blocks to be calculated serially

num_blocks=numel(cost_blocks);
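cost=zeros(num_blocks+1,2); % one row per candidate cut-off i=0..num_blocks

for i=0:num_blocks
    % serial part: the i largest blocks (end of cost_blocks), their total
    % cost divided by num_cores, plus one base cost per iteration
    cost(i+1,1)=sum(cost_blocks(end-i+1:end))/num_cores + i*cost_it;
    % parallel part: the remaining, smaller blocks in the simulated scheduler
    cost(i+1,2)=est_cost_para(cost_blocks(1:end-i),cost_it,num_cores);
end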

[~,i]=min(sum(cost,2));
i=i-1;

end
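A Python port of the cut-off search, as a sketch under the same conventions (ascending `cost_blocks`, serial cost modeled as total cost over `num_cores` plus `cost_it` per iteration). The strict `<` keeps the first minimum, which matches the index MATLAB's `min` returns.

```python
import heapq


def est_cost_para(cost_blocks, cost_it, num_cores):
    # Greedy scheduler model (port of the MATLAB routine): largest block first.
    loads = [0.0] * num_cores
    for cost in reversed(cost_blocks):
        earliest = heapq.heappop(loads)
        heapq.heappush(loads, earliest + cost + cost_it)
    return max(loads)


def cutoff_ser_para(cost_blocks, cost_it, num_cores):
    """Return how many of the largest blocks to compute serially."""
    n = len(cost_blocks)
    best_i, best_cost = 0, float("inf")
    for i in range(n + 1):
        serial_part = cost_blocks[n - i:]    # the i largest blocks
        parallel_part = cost_blocks[:n - i]  # the remaining, smaller blocks
        serial = sum(serial_part) / num_cores + i * cost_it
        parallel = est_cost_para(parallel_part, cost_it, num_cores)
        if serial + parallel < best_cost:  # keep the first minimum
            best_i, best_cost = i, serial + parallel
    return best_i


print(cutoff_ser_para([1, 1, 1, 1, 8], 0.1, 4))
```

Here only the dominant block of cost 8 ends up in the serial part (the call returns 1), while the four small blocks are left to the parallel loop.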

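Note in particular that est_cost_para assumes (apart from cost_it) the most optimistic scheduling possible. I leave it as is mainly because I don't know what would work best. To be conservative (i.e., to avoid feeding blocks that are too large to the parallel loop), one could of course add some percentage as a buffer, or even apply a power > 1 to inflate the parallel cost.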
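One way to add such a safety margin, sketched in Python: the parallel estimate is inflated to `(1 + buffer) * cost ** power` before the minimum is taken. The function name `cutoff_ser_para_conservative` and the parameters `buffer` and `power` are hypothetical; with the defaults it behaves like the plain cut-off search.

```python
import heapq


def est_cost_para(cost_blocks, cost_it, num_cores):
    # Greedy scheduler model (port of the MATLAB routine): largest block first.
    loads = [0.0] * num_cores
    for cost in reversed(cost_blocks):
        earliest = heapq.heappop(loads)
        heapq.heappush(loads, earliest + cost + cost_it)
    return max(loads)


def cutoff_ser_para_conservative(cost_blocks, cost_it, num_cores, buffer=0.0, power=1.0):
    """Cut-off search with the parallel cost inflated to (1 + buffer) * cost ** power."""
    n = len(cost_blocks)
    costs = []
    for i in range(n + 1):
        serial = sum(cost_blocks[n - i:]) / num_cores + i * cost_it
        parallel = est_cost_para(cost_blocks[:n - i], cost_it, num_cores)
        costs.append(serial + (1.0 + buffer) * parallel ** power)
    # First minimum, matching the index MATLAB's min returns.
    return costs.index(min(costs))


blocks = [1, 1, 1, 1, 8]
print(cutoff_ser_para_conservative(blocks, 0.1, 4))              # 1
print(cutoff_ser_para_conservative(blocks, 0.1, 4, buffer=0.5))  # 5
```

A 50% buffer is enough here to move every block to the serial part. Note that the power variant depends on the cost units: values below 1 shrink rather than grow, so the percentage buffer is the unit-independent option.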

Note also that est_cost_para is called with successively fewer blocks (even though I use the variable name cost_blocks in both routines, one is a subset of the other).

Compared with the approach in my question, I see two main advantages:

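1. The relatively complex dependence between the data (both the number of blocks and their costs) and the number of cores is captured much better by a simulated scheduler than would be possible with a single formula.
2. By calculating the cost for every possible combination of serial/parallel distribution and then taking the minimum, one cannot get "stuck" too early while reading in the data from one side (e.g., at a jump that is large relative to the data seen so far but small compared with the total).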
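Both points can be illustrated with a small Python sketch. `naive_cost_para` is a hypothetical single formula (total work spread evenly over the cores), and `cutoff_early_stop` is a hypothetical one-sided scan that stops as soon as moving one more block to the serial part makes things worse; neither is the approach from the question.

```python
import heapq


def est_cost_para(cost_blocks, cost_it, num_cores):
    # Greedy scheduler model (port of the MATLAB routine): largest block first.
    loads = [0.0] * num_cores
    for cost in reversed(cost_blocks):
        earliest = heapq.heappop(loads)
        heapq.heappush(loads, earliest + cost + cost_it)
    return max(loads)


def naive_cost_para(cost_blocks, cost_it, num_cores):
    # Hypothetical single formula: total work spread evenly over all cores.
    return (sum(cost_blocks) + len(cost_blocks) * cost_it) / num_cores


def total_cost(cost_blocks, cost_it, num_cores, i):
    # Serial part (i largest blocks) plus simulated parallel part.
    n = len(cost_blocks)
    serial = sum(cost_blocks[n - i:]) / num_cores + i * cost_it
    return serial + est_cost_para(cost_blocks[:n - i], cost_it, num_cores)


def cutoff_global(cost_blocks, cost_it, num_cores):
    # Evaluate every split and take the first minimum.
    costs = [total_cost(cost_blocks, cost_it, num_cores, i)
             for i in range(len(cost_blocks) + 1)]
    return costs.index(min(costs))


def cutoff_early_stop(cost_blocks, cost_it, num_cores):
    # Hypothetical alternative: stop at the first split that is not an improvement.
    best = total_cost(cost_blocks, cost_it, num_cores, 0)
    for i in range(1, len(cost_blocks) + 1):
        cost = total_cost(cost_blocks, cost_it, num_cores, i)
        if cost >= best:
            return i - 1
        best = cost
    return len(cost_blocks)


blocks = [1, 1, 1, 1, 1, 1, 1, 20]  # ascending, as est_cost_para expects
print(naive_cost_para(blocks, 0, 4), est_cost_para(blocks, 0, 4))   # 6.75 20.0
print(cutoff_early_stop(blocks, 0, 4), cutoff_global(blocks, 0, 4))  # 1 4
```

The even-split formula misses that the block of cost 20 dominates. The total cost over the cut-off is not monotone (20, 7, 7.25, 7.5, 6.75, ...), so the scan stops at a local minimum, while the exhaustive search finds the better split.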

Of course, calling est_cost_para (with its while-loop) for every candidate cut-off increases the asymptotic complexity, but in my case (num_blocks<500) this is absolutely negligible.
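A rough count: with n = num_blocks and p = num_cores, one call of est_cost_para on k blocks performs at most k steps, each a min over p cores, and cutoff_ser_para calls it once for each k = n, n-1, ..., 0. Taking p = 8 purely as an illustrative assumption:

```latex
\sum_{i=0}^{n} (n-i)\,p \;=\; p\,\frac{n(n+1)}{2} \;=\; \mathcal{O}(n^2 p),
\qquad n = 500,\; p = 8:\quad 8 \cdot \frac{500 \cdot 501}{2} = 1\,002\,000
```

That is about a million elementary steps per call of cutoff_ser_para. The O(n²) from summing the serial part is dominated by this term.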

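Finally, if a decent value for cost_it does not readily present itself, one can try to compute it: measure the actual execution time of each block as well as of the purely parallel part, then fit the resulting data to the cost prediction and obtain an updated value of cost_it for the next call of the routine (either from the difference between total cost and parallel cost, or by inserting a cost of zero into the fitted formula). This should "converge" to the most useful value of cost_it for the problem in question.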
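The "insert a cost of zero" variant might look like this sketch. It assumes a linear relation between estimated block cost and measured time, fitted by ordinary least squares. The intercept is then the time of a zero-cost block, and dividing it by the slope converts it back into the arbitrary units of cost_blocks. The helper names `fit_linear` and `update_cost_it` are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit y ~ slope * x + intercept (needs >= 2 distinct x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def update_cost_it(cost_blocks: Sequence[float], measured_times: Sequence[float]) -> float:
    """Hypothetical helper: re-estimate cost_it from per-block timings."""
    slope, intercept = fit_linear(cost_blocks, measured_times)
    # Predicted time of a zero-cost block (seconds), converted to cost units.
    return intercept / slope


# Made-up timings that follow 0.5 s per cost unit plus 0.2 s overhead per block.
print(update_cost_it([1, 2, 3, 4], [0.7, 1.2, 1.7, 2.2]))  # approximately 0.4
```

The returned value would be passed as cost_it to the next call. A negative intercept would suggest that the linear model does not describe the measurements well.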