Question

Let's say we have a directed graph. We want to visit every node exactly once by traveling on the edges of this graph. Every node is annotated with one or more tags; some nodes may share tags, and even have the exact same set of tags. As we go along our walk, we are collecting a list of every distinct tag we have encountered - our objective is to find the walk which postpones acquisition of new tags as much as possible.

To restate this as a traveler analogy, let's say that a carpet salesman is trying to decide which supplier he should acquire his carpets from. He makes a list of all the carpet factories in the city. He makes an appointment witht every factory, and collect samples of the kinds of carpet they make.

Let's say we have 3 factories, producing the following kinds of carpet:

F1: C1, C2, C3
F2: C1, C4
F3: C1, C4, C5

The salesman could take the following routes:

  1. Start at F1, collect C1, C2, C3. Go to F2, collect C4 (since he already has C1). Go to F3, collect C5 (he already has C1 and C4).
  2. Start at F1, collect C1, C2, C3. Go to F3, collect C4 and C5. Go to F2, collect nothing (since it turns out he already has all their carpets).
  3. Start at F2, collect C1, C4. Go to F1, collect C2, C3. Go to F3 and collect C5.
  4. Start at F2, collect C1, C4. Go to F3, collect C5. Go to F1 and collect C3.
  5. Start at F3, collect C1, C4, C5. Go to F1, collect C2, C3. Go to F2, collect nothing.
  6. Start at F3, collect C1, C4, C5. Go to F2, collect nothing. Go to F1, collect C2, C3.

Note how sometimes, the salesman visits a factory even though he knows he has already collected a sample for every kind of carpet they produce. The analogy breaks down here a bit, but let's say he must visit them because it would be rude to not show up for his appointment.

Now, the carpet samples are heavy, and our salesman is traveling on foot. Distance by itself isn't hugely important (assume every edge has cost 1), but he doesn't want to carry around a whole bunch of samples any more than he needs to. So, he needs to plan his trip such that he visits the factories which have a lot of rare carpets (and where he will have to pick up a lot of new samples) last.

For the example paths above, here are the numbers of samples carried at each leg of the journey (columns 2-4), and the sum (column 5).

1    0    3    4    7
2    0    3    5    8
3    0    2    4    6
4    0    2    3    5
5    0    3    5    8
6    0    3    3    6

We can see now that route 2 is very bad: first he had to carry 3 sample from F1 to F3, then he had to carry 5 samples from F3 to F2! Instead, he could have went with route 4 - he would carry first 2 samples from F2 to F3, and then 3 samples from F3 to F1.

Also, as shown in the last column, the sum of the samples carried through every edge is a good metric for how many samples he had to carry overall: The number of samples he is carrying cannot decrease, so visiting varied factories early on will necessarily inflate the sum, and a low sum is only possible by visiting similar factories with few carpets.

Is this a known problem? Is there an algorithm to solve it?

Note: I would recommend being careful about making assumptions based on my example problem. I came up with it on the spot, and deliberately kept it small for brevity. It is certain there are many edge cases that it fails to catch.

Was it helpful?

Solution

As the size of the Graph is small, we can consider using bit-mask and dynamic programming to solve this problem (Similar with how we solve the traveling salesman problem)

Assume that we have total 6 cities to visit. So the starting state is 0 and the ending is 111111b or 127 in decimal.

From each step, if the state is x, we can easily calculate the number of sampling the salesman is carrying, and the cost from state x to state y will be the number of newly added samples from x to y times the number of unvisited cities .

 public int cal(int mask) {
    if (/*Visit all city*/) {
        return 0;
    }

    HashSet<Integer> sampleSet = new HashSet();//Store current samples
    int left = 0;//Number of unvisited cities
    for (int i = 0; i < numberOfCity; i++) {
        if (((1 << i) & mask) != 0) {//If this city was visited
            sampleSet.addAll(citySample[i]);
        } else {
            left++;
        }
    }
    int cost;
    for (int i = 0; i < numberOfCity; i++) {
        if (((1 << i) & mask) == 0) {
            int dif = number of new sample from city i;
            cost = min(dif * left + cal(mask | (1 << i));

        }
    }
    return cost;
}

OTHER TIPS

In the case where there are edges between every pair of nodes, and each carpet is only available at one location, this looks tractable. If you pick up X carpets when there are Y steps to go, then the contribution from this to the final cost is XY. So you need to minimise SUM_i XiYi where Xi is the number of carpets picked up when you have Yi steps to go. You can do this by visiting the factories in increasing order of the number of carpets to be picked up at that factory. If you provide a schedule in which you pick up more carpets at A than B, and you visit A before B, I can improve it by swapping the times at which you visit A and B, so any schedule that does not follow this rule is not optimal.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top