Question

I'm trying to understand how my PC behaves under threading. I have a dual-core PC, and I'm testing it with a program I wrote: it multiplies two matrices using threads (in Java), where each thread handles the matrix rows divided by the number of threads. Testing my code on two 1024 x 1024 matrices, I got these results (all results are medians of 10 runs):

- 1 thread: 9.878 seconds
- 2 threads: 5.944 seconds
- 3 threads: 5.062 seconds
- 4 threads: 4.895 seconds
- 5 to 1024 threads: time varies between 4.8 and 5.3 seconds

What I'm trying to figure out is why the time decreases less with each of the first four threads. Shouldn't the work be divided equally between the cores, so that 1 thread takes 10 seconds, 2 threads take 5 seconds, and anything beyond that just takes longer, since I only have 2 cores and adding more threads only creates more context switching?

The second thing I can't understand: assuming that after the 4th thread my PC just switches between threads, which doesn't really split the job but only changes which thread does a certain part of it, shouldn't the time increase drastically with 1024 threads, simply because I'm forcing it to do a lot of context switches?

Thanks in advance for any response on the matter.

Adding my code:

/**
 * A class representing matrix multiplying threads , implements runnable
 * used to test the time difference according to changes in amount of 
 * threads used in the program !
 * 
 * @author R.G
 */
public class MatrixMultThread implements Runnable{

    //Thread fields and constants
    private static final String INT_ERROR = "An error has occurred during thread join";
    private static final String THREAD_COUNT_ERROR = "Invalid number of threads";
    static final int MATRIX_ROW = 1024;
    static final int MATRIX_COL = 1024;
    static final int UPPER_THREAD_LIMIT = MATRIX_ROW;
    private int startRow;
    private int endRow;
    private float[][] target;
    private float[][] sourceTwo;
    private float[][] sourceOne;

    /**
     * MatrixMultThread constructor - constructs the threads that handle multiplication.
     * 
     * @param startRow - the row this thread should start calculating from
     * @param endRow - the row this thread should stop calculating at (included in calc)
     * @param sourceOne - first matrix in the multiplication
     * @param sourceTwo - second matrix in the multiplication
     * @param target - result matrix
     */
    public MatrixMultThread(int startRow, int endRow, float[][] sourceOne, float[][] sourceTwo, float[][] target){
        this.startRow = startRow;
        this.endRow = endRow;
        this.target = target;
        this.sourceOne = sourceOne;
        this.sourceTwo = sourceTwo;

    }

    /**
     * Thread run method, invoking the actual calculation regarding
     * this thread's rows.
     */
    public void run() {
        for (int row = startRow; row <= endRow; row++) {
            for (int j = 0; j < MATRIX_COL; j++) {
                //sum must be a float: an int accumulator would truncate the
                //products of the float matrix entries.
                float sum = 0;
                for (int i = 0; i < MATRIX_ROW; i++) {
                    sum += sourceOne[row][i] * sourceTwo[i][j];
                }
                target[row][j] = sum;
            }
        }
    }

    /**
     * A method used for multiplying two matrices by threads.
     * 
     * @param a - first source matrix
     * @param b - second source matrix
     * @param threadCount - number of threads to use in the multiplication
     * @return the result matrix
     */
    public static float[][] mult(float[][] a, float[][]b, int threadCount) {
        if(threadCount > UPPER_THREAD_LIMIT || threadCount < 1){
            System.out.println(THREAD_COUNT_ERROR);
            System.exit(1);
        }

        //Result matrix
        float[][] result = new float[MATRIX_ROW][MATRIX_COL];
        Thread[] threadList = new Thread[threadCount];

        //Creating the threads
        int firstRow = 0;
        int lastRow = 0;
        for (int i = 0; i < threadCount ; i++){
            firstRow = i * (MATRIX_ROW / threadCount);
            lastRow = ((i + 1) * (MATRIX_ROW / threadCount)) - 1;
            Thread singleThread;

            //If the row count does not divide evenly, the last thread does a
            //bit of extra work to cover the remaining rows.
            if((i + 1) == threadCount){
                 singleThread = new Thread(new MatrixMultThread(firstRow, MATRIX_ROW - 1, a, b, result));
            }else{
                 singleThread = new Thread(new MatrixMultThread(firstRow, lastRow, a, b, result));
            }
            threadList[i] = singleThread; 
            singleThread.start();
        }

        //Join loop
        for (int i = 0; i < threadCount ; i++){
            try {
                threadList[i].join();
            } catch (InterruptedException e) {
                System.out.println(INT_ERROR);
                System.exit(1);
            }
        }
        return result;
    }


    /**
     * Main method: multiplies two matrices using a configurable number of
     * threads and times the operation in milliseconds.
     * 
     * @param args - unused
     */
    public static void main(String[] args) {

        //Thread number and timers for milliseconds calculation.
        int numberOfThreads = 1024;
        long startTimer, endTimer;

        //Initializing matrices
        float[][] a = new float[MATRIX_ROW][MATRIX_COL];
        float[][] b = new float[MATRIX_ROW][MATRIX_COL];

        for(int i = 0 ; i < MATRIX_ROW ; i++){
            for(int j = 0 ; j < MATRIX_COL ; j++){
                a[i][j] = (float)(Math.random() * 100); //Random matrices with
                b[i][j] = (float)(Math.random() * 100); //values in [0, 100).
            }
        }

        //Timing the multiplication.
        startTimer = System.currentTimeMillis();
        mult(a, b, numberOfThreads);
        endTimer = System.currentTimeMillis();
        System.out.println("Matrices multiplied in " + (endTimer - startTimer) + " milliseconds");
    }
}

Solution

Your program is CPU-bound. This means each thread consumes its whole scheduler quantum, so the relative context-switch overhead is small:

overhead = ((consumed_quanta + context_switch_time) / consumed_quanta) - 1
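Plugging in some plausible (assumed, not measured) figures shows how small this gets: with a 10 ms quantum and a 5 µs context switch, the overhead is a twentieth of a percent.

```java
public class OverheadExample {
    public static void main(String[] args) {
        double consumedQuanta = 10_000.0;  // microseconds; assumed 10 ms scheduler quantum
        double contextSwitchTime = 5.0;    // microseconds; assumed switch cost

        double overhead = ((consumedQuanta + contextSwitchTime) / consumedQuanta) - 1;
        System.out.printf("overhead = %.4f%%%n", overhead * 100); // prints: overhead = 0.0500%
    }
}
```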

Context-switch overhead would be much bigger in processes that voluntarily leave the CPU: e.g. two threads continuously passing the same message back and forth (one thread sends the message while the other reads it, then the second thread sends it back to the first, and so on) would have very high context-switch overhead.
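That pathological pattern can be sketched in a few lines of Java (the class name and the 100,000-exchange limit are my own choices for illustration): each thread does almost no work before calling `wait()`, so nearly all the run time is spent blocking and being rescheduled.

```java
public class PingPong {
    static final int LIMIT = 100_000;

    private final Object lock = new Object();
    private boolean pingTurn = true;   // whose turn it is to "send"
    private int exchanges = 0;

    int getExchanges() { return exchanges; }

    void play(boolean isPing) throws InterruptedException {
        synchronized (lock) {
            while (exchanges < LIMIT) {
                while (pingTurn != isPing && exchanges < LIMIT) {
                    lock.wait();           // voluntarily leave the CPU
                }
                if (exchanges >= LIMIT) break;
                exchanges++;               // "send" the message
                pingTurn = !isPing;        // hand the turn to the partner
                lock.notifyAll();          // wake the partner -> context switch
            }
            lock.notifyAll();              // let the waiting partner exit, too
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PingPong game = new PingPong();
        Thread ping = new Thread(() -> { try { game.play(true);  } catch (InterruptedException ignored) {} });
        Thread pong = new Thread(() -> { try { game.play(false); } catch (InterruptedException ignored) {} });
        long start = System.currentTimeMillis();
        ping.start(); pong.start();
        ping.join();  pong.join();
        System.out.println(game.getExchanges() + " exchanges in "
                + (System.currentTimeMillis() - start) + " ms");
    }
}
```

Despite doing essentially no computation, this takes a surprisingly long time, because every exchange forces the scheduler to switch threads.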

SMT (Hyper-Threading in x86 land) allows a single core to service several threads as if it were several logical cores. Since the CPU often has to wait for external resources (e.g. data from memory on a cache miss), letting another thread proceed during those dead periods can improve performance with relatively little extra circuitry (compared to adding another core). Typical quoted figures for the improvement on real workloads (not synthetic benchmarks) are around 10-20%, but YMMV: HT can make performance worse in some edge cases, and can be a much more significant improvement in others.
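This likely explains your plateau at four threads: if your dual-core CPU supports Hyper-Threading, the OS (and the JVM) sees four logical processors, so speedup continues up to four threads and flattens beyond that. You can check what your JVM sees:

```java
public class CoreCount {
    public static void main(String[] args) {
        // Number of logical processors visible to the JVM:
        // physical cores x SMT threads per core. On a dual core
        // with Hyper-Threading this would report 4.
        int logical = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical processors: " + logical);
    }
}
```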

Licensed under: CC-BY-SA with attribution