See the original problem on HackerRank.
Solutions
This problem looks like a classical puzzle: merging overlapping intervals.
A naive approach is \(O(N^2)\) and consists in checking each route with the others. In case they overlap, remove the other route from the array and merge both into the first one.
The crucial condition to check if two routes overlap is (assuming route1
precedes route2
):
|
|
For example:
|
|
1 5
overlaps with 3 6
, indeed 5 >= 3
.
Visually:
|
|
This idea is fine but the overall solution is quite expensive because we need to check if every route overlaps with all the others (the code is left to the reader).
A better (and optimal) approach enables us to dramatically reduce the number of checks and consists in sorting the routes according to the starting point and then checking if a route overlaps only with the following one. The intuition is: if a route does not overlap with the next one then it can’t overlap with any, since the array is sorted!
When two routes are merged, we just have to correctly calculate the ending point. For example:
|
|
They overlap, however the ending point is max(8, 5)
. This simple rule applies until we found two disjointed routes.
Here is a solution in C++:
|
|
It’s interesting to see that an in-place algorithm (one not needing support storage) is very similar:
|
|
The solution costs \( O(N \cdot log(N)) \).
Here is a solution in Python by Andrea Battistello:
|
|
And here follows another that prints the result without using additional space by Yuri Valentini:
|
|
Playing with patterns
The proposed solution works nicely, yet we can deliberately stress it to have fun and to learn something new.
We start from this example:
|
|
That we visualize as follows:
|
|
Suppose we create a support array containing the intermdiate results of the algorithm we have seen before:
|
|
If we have this array, we can easily remove adjacent pairs having the very same starting point and keep only the last one:
|
|
Becomes:
|
|
The latter part is easy and can be written in terms of erase-unique idiom applied backwards:
|
|
Bear with me, reverse iterators involve some boilerplate. Practically speaking, we go through adjacent pairs and “remove” those sharing the same first value (starting point). erase
does the actual removal, wherease unique
rearranges the values such that equal pairs make way for others. This part is probably more interesting for C++ developers but might be useful to anyone.
How merged
is built is actually more fun. Maybe you already got it…we are constructing a running data structure. When you hear about running somehing, a partial sum knocks on the door! We can actually construct such an array this way:
|
|
In other words, sum
is a sort of accumulator that holds the current running route. We check this against the next route and if they overlap then we just update the accumulator, otherwise the current route becomes the accumulator. Note that when the pair is created (std::pair{sum.first, max(sum.second, i.second)}
), sum.first
is always correct since the input is sorted. Here is the complete solution:
|
|
Sometimes we end up with patterns-based solutions that are worse than explicit for loops. Never mind, the journey to get here involved some good work for our brain. We have visualized some patterns and also understood some limitations. For example: partial_sum
makes a full (linear) scan of the array. Can we do things in a single pass instead (as the for-loop solution does)?
Sometimes the answer to such a question is “no, we can’t do better”. This is not the case, though.
We can call C++ ranges to the rescue. Actually, we use range-v3 since it contains more combinators to have fun with. The code above is convoluted because std::unique
filters adjacent equal elements out but the first (e.g. applied on a simple array, 1 2 3 3 4 4 4 5
it filters out the second 3
and the second and third 4
). That’s why we need to call unique
backwards. The good news is that range-v3 (and hopefully standard C++ ranges in the future) introduces adjacent_remove_if
that works the other way around. Thus, we end up with this nice piece of code:
|
|
The code above works lazily, that is: when the range-based for loop pulls out the first element of merged
, adjacent_remove_if
pulls two elements out of the underlying range that pulls two elements out of the partial_sum
, and so on. In other words, everything is a single pass.
Try it on Compiler Explorer.
There is another solution that is actually worse in terms of number of iterations but might be useful for some practical reasons.
Suppose we not only want to merge routes (intervals) together, but also group them (e.g. we want to visualize which routes get merged). We want to group those routes satisfying the property:
|
|
For example:
|
|
We have that 1 2
can’t be grouped with anyone, likewise 9 10
. On the other hand, 4 6
, 5 8
and 6 7
can be all grouped together, indeed:
6 >= 5 and 8 >= 6
We want to obtain:
|
|
This is fine for visualizing things. However, we can also pick from each subrange the first starting element and also the maximum ending point among all the routes and we will end up with merging all the overlapping routes!
Here are all the starting points:
|
|
Here are all the ending points:
|
|
Indeed, the final range is:
|
|
This leads to another solution that is much more intuitive than that based on partial_sum
. Indeed, both range-v3 and C++23 support a view that groups things together: chunk_by (in the past known as group_by
). Once we group routes together, we then just need to construct the new range where each element consists of the first starting point and the maximum among all the ending points:
|
|
for_each
lazily generates the final range from intermediate subranges (it’s a list comprehension, practically speaking). Although this solution is quite compact and relatively elegant, it adds an extra cost to calculate the maximum element for each subrange (that is succinctly obtained with *max_element(rng | views::values)
).
Try it on Compiler Explorer.