Question

I have an optimisation problem. It's about a product that consists of 20 parts (the order of production doesn't matter). I have 3 similar machines that can each produce all 20 parts.

I've got the production time of each of the 20 parts in minutes (e.g. it takes 3 min to produce the first part and 75 min to produce the second part, etc.):

ItemTime<-c(3,75,55,12,45,55,11,8,21,16,65,28,84,3,58,46,5,84,8,48)

So producing one product takes 730 min in total:

sum(ItemTime)

The aim is to minimise the time to produce one product by allocating the items well across the three machines.

sum(ItemTime)/3

So ideally each machine should be loaded as close as possible to 243.33 min (730/3).

The number of possibilities is huge: 3^20.
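For reference, the total, the per-machine lower bound, and the size of the brute-force search space can all be computed directly in base R:

```r
ItemTime <- c(3,75,55,12,45,55,11,8,21,16,65,28,84,3,58,46,5,84,8,48)

sum(ItemTime)       # 730 minutes for one full product
sum(ItemTime) / 3   # 243.33..., the per-machine lower bound
3^20                # 3486784401 candidate assignments
```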

I guess there are many different optimal solutions. I would like R to give me all of them. I don't only need to know the total time each of machines 1, 2 and 3 will need: I also need to know which items to give to machine 1, to machine 2 and to machine 3.

Alternatively, if that takes too long, I would like to sample the assignments without repetition in a way that is as reasonable as possible...

Can I solve my issue with R language?

Solution

I believe your problem is a close variant of the Multiple Knapsack Problem (MKP), which is, a priori, not a piece of cake...

However, your dimension is small enough that the problem can be solved as a Mixed-Integer Program (MIP). I solved it below using Rglpk; it took the solver about four minutes. If you are lucky enough to have access to CPLEX, I would highly recommend you use Rcplex instead: it will smoke it.

ItemTime<-c(3,75,55,12,45,55,11,8,21,16,65,28,84,3,58,46,5,84,8,48)
N <- length(ItemTime)
M <- 3

Problem formulation:

# variables are in this order:
# z: slack variable for the max of (s1, s2, s3)
# s1: sum of times for machine 1
# s2: sum of times for machine 2
# s3: sum of times for machine 3
# a1-a20: booleans for assignment to machine 1
# b1-b20: booleans for assignment to machine 2
# c1-c20: booleans for assignment to machine 3

obj <- c(1, rep(0, 3 + 3*N))

mat <- rbind(
  c(1, -1, 0, 0, rep(0, M*N)),                      # z >= s1
  c(1, 0, -1, 0, rep(0, M*N)),                      # z >= s2
  c(1, 0, 0, -1, rep(0, M*N)),                      # z >= s3
  c(0, -1, 0, 0, ItemTime,  rep(0, N), rep(0, N)),  # s1 = ...
  c(0, 0, -1, 0, rep(0, N), ItemTime,  rep(0, N)),  # s2 = ...
  c(0, 0, 0, -1, rep(0, N), rep(0, N), ItemTime),   # s3 = ...
  cbind(matrix(0, N, 4), diag(N), diag(N), diag(N)) # a_i + b_i + c_i = 1
)

dir <- c( ">=", ">=", ">=", "==", "==", "==" , rep("==", N))

rhs <- c(rep(0, 2*M), rep(1, N))

types <- c(rep("C", 1+M), rep("B", M*N))

Now let's solve it (note that in current versions of Rglpk, types must be passed by name, since bounds comes before it in the argument list, and verbose goes in the control list):

library(Rglpk)
res <- Rglpk_solve_LP(obj, mat, dir, rhs, types = types, max = FALSE,
                      control = list(verbose = TRUE))
res

# GLPK Simplex Optimizer, v4.47
# INTEGER OPTIMAL SOLUTION FOUND
# $optimum
# [1] 244
# 
# $solution
#  [1] 244 243 243 244   0   1   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   1   0   0   0   0   1   1   0   0
# [31]   1   1   1   0   1   0   0   0   1   0   1   0   1   0   1   0   0   0   1   1   0   0   0   1   0   1   0   1   0   1
# [61]   0   0   0   1
# 
# $status
# [1] 0
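To answer the "which items go to which machine" part of the question, the solution vector can be decoded back into assignments. A small base-R sketch (variable names are mine), with the solution printed above hard-coded so it is self-contained:

```r
# Solution vector from the Rglpk output above, in the order
# [z, s1, s2, s3, a1..a20, b1..b20, c1..c20]:
sol <- c(244, 243, 243, 244,
         0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,
         0,0,1,1,0,0,1,1,1,0,1,0,0,0,1,0,1,0,1,0,
         1,0,0,0,1,1,0,0,0,1,0,1,0,1,0,1,0,0,0,1)
ItemTime <- c(3,75,55,12,45,55,11,8,21,16,65,28,84,3,58,46,5,84,8,48)
N <- length(ItemTime)

# One column of 0/1 assignment flags per machine:
assign_mat <- matrix(sol[-(1:4)], nrow = N)
machine <- max.col(assign_mat)      # machine index for each item
split(seq_len(N), machine)          # item indices per machine
tapply(ItemTime, machine, sum)      # per-machine totals: 243 243 244
```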

OTHER TIPS

Edit: this problem is slightly different from the one I remembered: as @han shows, this algorithm is not optimal, merely an approximation (although there is a guarantee that the makespan from this algorithm is never more than 4/3 of the optimal makespan in general, and 11/9 of the optimal for 3 machines).

The link @han provided points to the multiprocessor scheduling article, which is precisely equivalent to this problem. The article tells us that the problem is NP-hard, which means there is no known polynomial-time algorithm to compute the optimal answer (let alone an O(n log n) one like the greedy algorithm here).


You can just use a greedy algorithm: sort the jobs by decreasing time, then go through the list and assign each job to the machine that currently has the least work allocated to it.

As an example, consider just having c(5,3,6,1,2) as your part manufacturing times. First you sort this into decreasing order: c(6,5,3,2,1), and then go through it (in order) assigning tasks.

     Step:  1    2    3    4    5
Machine 1:  6    6    6    6    6
Machine 2:  -    5    5    5    5,1
Machine 3:  -    -    3    3,2  3,2

So machine 1 makes the part that takes 6 minutes, machine 2 makes the ones that take 5 and 1 minutes, and machine 3 makes the ones that take 3 and 2 minutes.

You can see that in step 4, the machine with the shortest total time is machine 3 so that is why we assigned the 2 to it.

From memory, this algorithm is actually optimal; although I don't have a link for that claim. I also don't know if you can adapt it to get all possible optimal results.


If you define a function that takes one job and a list of the machines with their current jobs, you can use Reduce to assign all the jobs. The single job assignment might look something like:

assign.job <- function(machines, job) {
    # sapply, not lapply: which.min needs a numeric vector of loads
    which.machines <- which.min(sapply(machines, sum))
    machines[[which.machines]] <- c(machines[[which.machines]], job)
    machines
}

(I don't like the machines[[which.machines]] line... I'm sure there is a better way to modify a list at a specific index.)

And then the assignment could be like:

allocate <- function(num.machines, job.times) {
    machines <- vector("list", num.machines)  # an empty list with one slot per machine
    Reduce(assign.job,
           sort(job.times, decreasing=TRUE),
           machines)
}

On your example, this gives:

> allocate(3,ItemTime)
[[1]]
[1] 84 58 46 45  8  3     # total time: 244

[[2]]
[1] 84 55 55 21 16  8  5  # total time: 244

[[3]]
[1] 75 65 48 28 12 11  3  # total time: 242

The final step is working out which job corresponds to which time. This can either be done by working backwards after assigning the times (in approximately linear time, since there is a relatively simple mapping from times to jobs and vice versa), or by modifying allocate and assign.job to keep track of the job indices (in which case the order function will be more useful than sort, and data frames with times in one column and indices in another will be more useful than plain vectors).
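A sketch of that second option, using order() instead of sort() so the job indices travel along with the assignment (the function names here are my own, not from the answer above):

```r
# Hypothetical index-tracking variant of the greedy allocator:
# machines hold job *indices*; loads are looked up via times[ix].
ItemTime <- c(3,75,55,12,45,55,11,8,21,16,65,28,84,3,58,46,5,84,8,48)

assign.job.idx <- function(machines, job.idx, times) {
  m <- which.min(sapply(machines, function(ix) sum(times[ix])))
  machines[[m]] <- c(machines[[m]], job.idx)
  machines
}

allocate.idx <- function(num.machines, job.times) {
  machines <- vector("list", num.machines)
  for (i in order(job.times, decreasing = TRUE))
    machines <- assign.job.idx(machines, i, job.times)
  machines
}

alloc <- allocate.idx(3, ItemTime)
alloc                                          # job indices per machine
sapply(alloc, function(ix) sum(ItemTime[ix]))  # totals: 244 244 242
```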

(It should be noted that this solution runs several times faster than the other one, since the other answer uses heavier machinery. As the edit above notes, that machinery is not overkill: this greedy algorithm is only an approximation in general.)

As noted in other answers this is related to the bin packing problem. A simple bin packing algorithm is already implemented in the BBmisc package; we can apply this existing function here for a simple and fast solution:

library(BBmisc)

binlimit <- 3                               # number of machines ("bins")
binsize <- ceiling(sum(ItemTime)/binlimit)  # smallest possible bin size: ceiling(730/3) = 244
bins <- binPack(ItemTime, binsize)          # pack the bins (kept for use below)
bins

 [1] 3 1 2 3 3 2 2 3 3 3 2 3 1 3 2 3 3 1 3 3

In this case, it instantly produces an optimal solution using 3 bins. We can confirm the solution's bin sizes:

library(dplyr)
df <- data.frame(ItemTime, bins)
df %>% group_by(bins) %>% summarise(time = sum(ItemTime))

  bins time
1    1  243
2    2  244
3    3  243

If binPack produces an initial solution using too many bins, it can be placed in a loop that increments binsize and only returns a solution once binPack uses no more than 3 bins.
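That loop might look like the sketch below. To keep it self-contained (and runnable without BBmisc), a tiny first-fit-decreasing helper ffd is used as a stand-in; judging from the output above, binPack appears to behave like first-fit decreasing on this input, but that is an assumption, not a documented guarantee:

```r
# ffd: hypothetical first-fit-decreasing stand-in for BBmisc::binPack.
# Returns a bin index per item, packing items (largest first) into the
# first bin whose running total stays within `capacity`.
ffd <- function(x, capacity) {
  bins <- integer(length(x))
  loads <- numeric(0)
  for (i in order(x, decreasing = TRUE)) {
    fit <- which(loads + x[i] <= capacity)
    if (length(fit) == 0) {           # no bin fits: open a new one
      loads <- c(loads, x[i])
      bins[i] <- length(loads)
    } else {                          # first bin it fits in
      loads[fit[1]] <- loads[fit[1]] + x[i]
      bins[i] <- fit[1]
    }
  }
  bins
}

ItemTime <- c(3,75,55,12,45,55,11,8,21,16,65,28,84,3,58,46,5,84,8,48)
binlimit <- 3
binsize <- ceiling(sum(ItemTime) / binlimit)  # start at the lower bound
repeat {                                      # grow capacity until <= 3 bins
  bins <- ffd(ItemTime, binsize)
  if (max(bins) <= binlimit) break
  binsize <- binsize + 1
}
tapply(ItemTime, bins, sum)   # per-bin totals
```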

Interestingly, binPack returns a solution with the same bin sizes as flodel's answer above, but with different assignments in bins 2 and 3:

   ItemTime Rglpk binPack
1         3     3       3
2        75     1       1
3        55     2       2
4        12     2       3
5        45     3       3
6        55     3       2
7        11     2       2
8         8     2       3
9        21     2       3
10       16     3       3
11       65     2       2
12       28     3       3
13       84     1       1
14        3     3       3
15       58     2       2
16       46     3       3
17        5     2       3
18       84     1       1
19        8     2       3
20       48     3       3

While binPack provides a fast and simple way to solve this problem, its description notes that this algorithm is simple and may not return the optimal solution:

Maps numeric items in x into groups with sum less or equal than capacity. A very simple greedy algorithm is used, which is not really optimized for speed. This is a convenience function for smaller vectors, not a competetive solver for the real binbacking problem. If an element of x exceeds capacity, an error is thrown.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow