Why do my goroutines wait for each other instead of finishing when done?

Question 1

Indeed, there is no garantee regarding the order in which your goroutines will be executed.

However if you force the true parallel processing by explicitly letting 2 processor cores run :

import (
    "fmt"
    "time" //for time functions e.g. Now()
    "math/rand" //for pseudo random numbers
    "runtime"
)
...

func main() {
   runtime.GOMAXPROCS(2)

   rand.Seed( time.Now().UTC().UnixNano()) //set seed for rand.
...

Then you will get the expected result :

Slices filled ...
Main working ...
Thread 2 done
Thread 3 done
Thread 1 done

Best regards

Question 2

The important thing is the ability to "yield" the processor to other processes, before the whole potentialy long-running workload is finished. This holds true as well in single-core context or multi-core context (because Concurrency is not the same as Parallelism).

This is exactly what the runtime.Gosched() function does :

Gosched yields the processor, allowing other goroutines to run. It does not suspend the current goroutine, so execution resumes automatically.

Be aware that a "context switch" is not free : it costs a little time each time.

On my machine without yielding, your program runs in 5.1s.
If you yield in the outer loop (for n:=len(a); n>1; n--), it runs in 5.2s : small overhead.
If you yield in the inner loop (for i:=0; i<n-1; i++), it runs in 61.7s : huge overhead !!

Here is the modified program correctly yielding, with the small overhead :

package main

import (
    "fmt"
    "math/rand"
    "runtime"
    "time"
)

/* Simple bubblesort algorithm*/
func bubblesort(str string, a []int, ch chan []int) {
    for n := len(a); n > 1; n-- {
        for i := 0; i < n-1; i++ {
            if a[i] > a[i+1] {
                a[i], a[i+1] = a[i+1], a[i] //swap
            }
        }
        runtime.Gosched() // yield after part of the workload
    }
    fmt.Println(str + " done") //done message
    ch <- a
}

/*fill slice with pseudo numbers*/
func random_fill(a []int) []int {
    for i := 0; i < len(a); i++ {
        a[i] = rand.Int()
    }
    return a
}

func main() {
    rand.Seed(time.Now().UTC().UnixNano()) //set seed for rand.

    a1 := make([]int, 34589) //create slice
    a2 := make([]int, 42)    //create slice
    a3 := make([]int, 9999)  //create slice

    a1 = random_fill(a1) //fill slice
    a2 = random_fill(a2) //fill slice
    a3 = random_fill(a3) //fill slice
    fmt.Println("Slices filled ...")

    ch1 := make(chan []int) //create channel of result
    ch2 := make(chan []int) //create channel of result
    ch3 := make(chan []int) //create channel of result

    go bubblesort("Thread 1", a1, ch1) //1. Routine Start
    go bubblesort("Thread 2", a2, ch2) //2. Routine Start
    go bubblesort("Thread 3", a3, ch3) //3. Routine Start
    fmt.Println("Main working ...")

    <-ch1 // Wait for result 1
    <-ch2 // Wait for result 2
    <-ch3 // Wait for result 3
}

Output :

Slices filled ...
Main working ...
Thread 2 done
Thread 3 done
Thread 1 done

I also used channels to implement the rendez-vous, as suggested in my previous comment.

Best regards :)

Question 3

Since the release of Go 1.2, the original program now ~~works~~ may work fine without modification. You may try it in Playground.

This is explained in the Go 1.2 release notes :

In prior releases, a goroutine that was looping forever could starve out other goroutines on the same thread, a serious problem when GOMAXPROCS provided only one user thread. In Go 1.2, this is partially addressed: The scheduler is invoked occasionally upon entry to a function.