The important thing is the ability to "yield" the processor to other processes, before the whole potentialy long-running workload is finished. This holds true as well in single-core context or multi-core context (because Concurrency is not the same as Parallelism).
This is exactly what the runtime.Gosched() function does :
Gosched yields the processor, allowing other goroutines to run. It
does not suspend the current goroutine, so execution resumes
automatically.
Be aware that a "context switch" is not free : it costs a little time each time.
- On my machine without yielding, your program runs in 5.1s.
- If you yield in the outer loop (
for n:=len(a); n>1; n--
), it runs in 5.2s : small overhead.
- If you yield in the inner loop (
for i:=0; i<n-1; i++
), it runs in 61.7s : huge overhead !!
Here is the modified program correctly yielding, with the small overhead :
package main
import (
"fmt"
"math/rand"
"runtime"
"time"
)
/* Simple bubblesort algorithm*/
func bubblesort(str string, a []int, ch chan []int) {
for n := len(a); n > 1; n-- {
for i := 0; i < n-1; i++ {
if a[i] > a[i+1] {
a[i], a[i+1] = a[i+1], a[i] //swap
}
}
runtime.Gosched() // yield after part of the workload
}
fmt.Println(str + " done") //done message
ch <- a
}
/*fill slice with pseudo numbers*/
func random_fill(a []int) []int {
for i := 0; i < len(a); i++ {
a[i] = rand.Int()
}
return a
}
func main() {
rand.Seed(time.Now().UTC().UnixNano()) //set seed for rand.
a1 := make([]int, 34589) //create slice
a2 := make([]int, 42) //create slice
a3 := make([]int, 9999) //create slice
a1 = random_fill(a1) //fill slice
a2 = random_fill(a2) //fill slice
a3 = random_fill(a3) //fill slice
fmt.Println("Slices filled ...")
ch1 := make(chan []int) //create channel of result
ch2 := make(chan []int) //create channel of result
ch3 := make(chan []int) //create channel of result
go bubblesort("Thread 1", a1, ch1) //1. Routine Start
go bubblesort("Thread 2", a2, ch2) //2. Routine Start
go bubblesort("Thread 3", a3, ch3) //3. Routine Start
fmt.Println("Main working ...")
<-ch1 // Wait for result 1
<-ch2 // Wait for result 2
<-ch3 // Wait for result 3
}
Output :
Slices filled ...
Main working ...
Thread 2 done
Thread 3 done
Thread 1 done
I also used channels to implement the rendez-vous, as suggested in my previous comment.
Best regards :)