Running routes

See the original problem on HackerRank.

Solutions

This problem looks like a classical puzzle: merging overlapping intervals.

A naive approach is \(O(N^2)\) and consists in checking each route with the others. In case they overlap, remove the other route from the array and merge both into the first one.

The crucial condition to check if two routes overlap is (assuming route1 precedes route2):

1
route1.end >= route2.begin

For example:

1
2
3
1 5
3 6
10 32

1 5 overlaps with 3 6, indeed 5 >= 3.

Visually:

1
2
3
1 2 3 4 5 6 7 8 ...
_________
    _______

This idea is fine but the overall solution is quite expensive because we need to check if every route overlaps with all the others (the code is left to the reader).

A better (and optimal) approach enables us to dramatically reduce the number of checks and consists in sorting the routes according to the starting point and then checking if a route overlaps only with the following one. The intuition is: if a route does not overlap with the next one then it can’t overlap with any, since the array is sorted!

When two routes are merged, we just have to correctly calculate the ending point. For example:

1
2
2 8
3 5

They overlap, however the ending point is max(8, 5). This simple rule applies until we found two disjointed routes.

Here is a solution in C++:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
int n;
cin >> n;
vector<pair<int, int>> routes(n);
for (auto i=0; i<n; ++i)
{
    cin >> routes[i].first >> routes[i].second;
}

sort(begin(routes), end(routes));
vector<pair<int, int>> merged;
auto j = 0u;
for (auto i=1u; i<routes.size(); ++i)
{
    if (routes[j].second >= routes[i].first) // do they overlap?
    {
        routes[j].second = max(routes[i].second, routes[j].second); // update ending point 
    }
    else
    {
        merged.push_back(routes[j]); // routes[i] and (current) routes[j] are disjointed 
        j = i;
    }
}
merged.push_back(routes[j]);

for (const auto& [i, j] : merged)
    cout << i << " " << j << "\n";

It’s interesting to see that an in-place algorithm (one not needing support storage) is very similar:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
int n;
cin >> n;
vector<pair<int, int>> routes(n);
for (auto i=0; i<n; ++i)
{
    cin >> routes[i].first >> routes[i].second;
}

sort(begin(routes), end(routes));
auto j = 0u;
for (auto i=1u; i<routes.size(); ++i)
{
    if (routes[j].second >= routes[i].first)
    {
        routes[j].second = max(routes[i].second, routes[j].second); 
    }
    else
    {
        ++j;
        routes[j] = routes[i];
    }
}
routes.resize(j + 1);

for (const auto& [i, j] : routes)
    cout << i << " " << j << "\n";

The solution costs \( O(N \cdot log(N)) \).

Here is a solution in Python by Andrea Battistello:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
N = int(input())
pairs = sorted([list(map(int, input().split())) for _ in range(N)])

current_pairs=[pairs[0]]
for s,e in pairs[1:]:
    if s <= current_pairs[-1][1]:
        current_pairs[-1][1] = max(e, current_pairs[-1][1])
    else:
        current_pairs += [[s,e]]

print('\n'.join([f'{a} {b}' for a,b in current_pairs]))

And here follows another that prints the result without using additional space by Yuri Valentini:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
N = int(input())
ROUTES = []
for i in range(N):
    inizio, fine = map(int, input().split())
    ROUTES.append((inizio, fine))
    
ordinato = sorted(ROUTES, key=lambda x: x[0])
prev_inizio, prev_fine = ordinato[0]
for inizio, fine in ordinato[1:]:
    if prev_fine >= inizio:
        prev_fine = max(fine, prev_fine)
    else:
        print(prev_inizio, prev_fine)
        prev_inizio, prev_fine = inizio, fine        

print(prev_inizio, prev_fine)

Playing with patterns

The proposed solution works nicely, yet we can deliberately stress it to have fun and to learn something new.

We start from this example:

1
2
3
4
5
1 2
4 6
5 8
6 7
9 10

That we visualize as follows:

1
2
3
4
5
6
1 2 3 4 5 6 7 8 9 10
___
      _____
        _______ 
          ___
                ____

Suppose we create a support array containing the intermdiate results of the algorithm we have seen before:

1
2
3
4
5
1 2  -> does not overlap
4 6  -> overlap start
4 8  -> overlapping (new ending found)
4 8  -> overlapping (not contributing at all since totally contained in current overlap)
9 10 -> does not overlap

If we have this array, we can easily remove adjacent pairs having the very same starting point and keep only the last one:

1
2
3
4
5
1 2  <- to keep
4 6  <- to remove
4 8  <- to remove
4 8  <- to keep
9 10 <- to keep

Becomes:

1
2
3
1 2
4 8
9 10

The latter part is easy and can be written in terms of erase-unique idiom applied backwards:

1
2
3
4
// we'll get to this "merged" array in a second, suppose we have it
merged.erase(begin(merged), unique(rbegin(merged), rend(merged), [](const auto& i1, const auto& i2){
    return i1.first == i2.first; // do they share the very same starting point?
}).base());

Bear with me, reverse iterators involve some boilerplate. Practically speaking, we go through adjacent pairs and “remove” those sharing the same first value (starting point). erase does the actual removal, wherease unique rearranges the values such that equal pairs make way for others. This part is probably more interesting for C++ developers but might be useful to anyone.

How merged is built is actually more fun. Maybe you already got it…we are constructing a running data structure. When you hear about running somehing, a partial sum knocks on the door! We can actually construct such an array this way:

1
2
3
4
vector<pair<int, int>> merged;
partial_sum(begin(routes), end(routes), back_inserter(merged), [](auto sum, auto i){
    return sum.second >= i.first ? pair{sum.first, max(sum.second, i.second)} : i;
});

In other words, sum is a sort of accumulator that holds the current running route. We check this against the next route and if they overlap then we just update the accumulator, otherwise the current route becomes the accumulator. Note that when the pair is created (std::pair{sum.first, max(sum.second, i.second)}), sum.first is always correct since the input is sorted. Here is the complete solution:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
int n;
cin >> n;
vector<pair<int, int>> routes(n);
for (auto i=0; i<n; ++i)
{
    cin >> routes[i].first >> routes[i].second;
}
sort(begin(routes), end(routes));
vector<pair<int, int>> merged;

partial_sum(begin(routes), end(routes), back_inserter(merged), [](auto sum, auto i){
    return sum.second >= i.first ? pair{sum.first, max(sum.second, i.second)} : i;
});

merged.erase(begin(merged), unique(rbegin(merged), rend(merged), [](const auto& i1, const auto& i2){
    return i1.first == i2.first;
}).base());

for (const auto& [i, j] : merged)
    cout << i << " " << j << "\n";

Sometimes we end up with patterns-based solutions that are worse than explicit for loops. Never mind, the journey to get here involved some good work for our brain. We have visualized some patterns and also understood some limitations. For example: partial_sum makes a full (linear) scan of the array. Can we do things in a single pass instead (as the for-loop solution does)?

Sometimes the answer to such a question is “no, we can’t do better”. This is not the case, though.

We can call C++ ranges to the rescue. Actually, we use range-v3 since it contains more combinators to have fun with. The code above is convoluted because std::unique filters adjacent equal elements out but the first (e.g. applied on a simple array, 1 2 3 3 4 4 4 5 it filters out the second 3 and the second and third 4). That’s why we need to call unique backwards. The good news is that range-v3 (and hopefully standard C++ ranges in the future) introduces adjacent_remove_if that works the other way around. Thus, we end up with this nice piece of code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
sort(routes);
auto merged = views::partial_sum(routes, [](auto sum, auto i){
    return sum.second >= i.first ? std::pair{sum.first, max(sum.second, i.second)} : i;
}) | views::adjacent_remove_if([](auto i1, auto i2){
    return i1.first == i2.first;
});

for (auto [i,j] : merged)
{
    std::cout << i << " " << j << "\n";
}

The code above works lazily, that is: when the range-based for loop pulls out the first element of merged, adjacent_remove_if pulls two elements out of the underlying range that pulls two elements out of the partial_sum, and so on. In other words, everything is a single pass.

Try it on Compiler Explorer.

There is another solution that is actually worse in terms of number of iterations but might be useful for some practical reasons.

Suppose we not only want to merge routes (intervals) together, but also group them (e.g. we want to visualize which routes get merged). We want to group those routes satisfying the property:

1
route1.end >= route2.begin

For example:

1
2
3
4
5
1 2
4 6
5 8
6 7
9 10

We have that 1 2 can’t be grouped with anyone, likewise 9 10. On the other hand, 4 6, 5 8 and 6 7 can be all grouped together, indeed:

6 >= 5 and 8 >= 6

We want to obtain:

1
[{1, 2}], [{4,6}, {5,8}, {6,7}], [{9, 10}]

This is fine for visualizing things. However, we can also pick from each subrange the first starting element and also the maximum ending point among all the routes and we will end up with merging all the overlapping routes!

Here are all the starting points:

1
2
[{1, 2}], [{4,6}, {5,8}, {6,7}], [{9, 10}]
  ^         ^                      ^

Here are all the ending points:

1
2
[{1, 2}], [{4,6}, {5,8}, {6,7}], [{9, 10}]
     ^               ^                 ^

Indeed, the final range is:

1
[{1, 2}, {4, 8}, {9, 10}]

This leads to another solution that is much more intuitive than that based on partial_sum. Indeed, both range-v3 and C++23 support a view that groups things together: chunk_by (in the past known as group_by). Once we group routes together, we then just need to construct the new range where each element consists of the first starting point and the maximum among all the ending points:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
sort(routes);
auto merged = views::chunk_by(routes, [](const auto& i1, const auto& i2){
	return i1.second >= i2.first;
}) | views::for_each([](auto rng){
	return yield(std::pair{front(rng).first, *max_element(rng | views::values)});
});

for (auto [i,j] : merged)
{
	std::cout << i << " " << j << "\n";
}

for_each lazily generates the final range from intermediate subranges (it’s a list comprehension, practically speaking). Although this solution is quite compact and relatively elegant, it adds an extra cost to calculate the maximum element for each subrange (that is succinctly obtained with *max_element(rng | views::values)).

Try it on Compiler Explorer.

We've worked on this challenge in these gyms: modena 
comments powered by Disqus