See the original problem on HackerRank.
Solutions
This is a problem on intervals, such as this and this.
In this challenge, the main point is to find an efficient way to insert an interval into an array of already sorted and non-overlapping intervals. The naive solution is to just insert it at the end of the array, sort again and finally merge new overlapping pairs. The cost is in the order of \(O(N \cdot logN)\) (and includes also the possible hidden cost of relocating the array, in case the space is not enough).
A better solution makes advantage of the domain of the problem to find the position of the new interval more quickly. This way, sorting and relocating will be avoided. Remember that the intervals are already sorted and non-overlapping.
First of all, we need to find if the new interval merges with others or not. For example, suppose we have this intervals
|
|
and we are inserting 2 5
. In this case, 2 5
will merge with 1 4
somehow. On the other hand, 5 6
or 16 20
do not tie with anything.
In other words, if the new interval does not overlap with anyone else, it will be inserted as it is. Otherwise, the new interval will make room for another larger interval that merges together all the “affected” intervals. For example, if we insert 2 5
in the example above, the first interval will be replaced with 1 5
. If we insert 2 10
, this time both 1 4
and 7 9
will be replaced with a single larger interval 1 10
. Thus, the larger interval starts at the minimum between the new interval start and the start of the first of the intervals to be replaced, and ends at maximum between the new interval end and the end of the last of the intervals to be replaced.
Think about that for a moment and try some more examples.
At this point, the question is: how can we find the first and the last of the intervals to be replaced?
After all, intervals are sorted. This means, we can binary search the array for the new interval. We have an opportunity here: if we use a special predicate to compare the intervals, we can easily find the entire range of pairs that will be changed by the new interval.
Consider again the example above:
|
|
Inserting 2 5
will only make changes on 1 4
. On the other hand, inserting 2 10
will affect both 1 4
and 7 9
. What is this special predicate?
Well, consider the overlapping condition we have seen in other challenges (assiming interval1
precedes interval2
):
|
|
If this condition is true then the two intervals overlap, otherwise they are disjoint.
In C++, equivalence and equality are two different concepts. Operations on sorted sequences consider equivalence to determine the relation between two elements. Their default predicate is the so-called less operator (<
) that is used to check if two elements are the same if neither of the two is less than the other. For example if both A < B
and B < A
are false
, then we can assume A=B
.
In our scenario, two elements should be the same if they overlap. So we must make the overlapping condition return false
to say “they overlap”. This means, changing the condition this way (just swapping >=
with <
):
|
|
A few tests:
1 4
overlaps with 2 5
, indeed 4 < 2
and 5 < 1
are both false
. On the other hand, 1 4
does not overlap with 7 9
, indeed 4 < 7
is true
. But 2 10
overlap with both 1 4
and 7 9
, indeed 10 < 1
and 4 < 2
are false
, and 10 < 7
and 9 < 2
are false
.
Back to the original question, how can we find the first and the last of the intervals to be replaced?
In C++, we have lower_bound
that solves the problem for the first element of the interval. Basically, it’s a binary search that, given a value, finds the first element in the sequence that is not less than the value. For example, lower_bound([1 5 10], 3)
, is 5
because 5
is the first element not less than 3
. Similarly, lower_bound([1 5 10], 5)
is just 5
because 5
is not less than 5
! On the other hand lower_bound([1 5 10], 20)
can’t be found (aka: it returns the end of the range, that is non-dereferencable).
The last of the intervals to be replaced actually correspond to the first non-overlapping element. This means, finding the first element that is strictly greater than the value. In C++ we have upper_bound
for that. upper_bound([1 5 10], 3)
is still 5
, but lower_bound([1 5 10], 5)
is 10
. In terms of the overlapping condition, this means finding the first element for which the condition is true
.
Actually, lower_bound
and upper_bound
are combined into the single function equal_range
that elegantly returns a pair containing both.
If the lower and upper bounds are the same, the new interval does not actually overlap with any other. Think about this for a moment considering a simple array of numbers: lower_bound([1 5 5 5 5 10], 3) = upper_bound([1 5 5 5 5 10], 3) = 5
means that 3
is not the same as any other elements in the array. On the other hand, lower_bound([1 5 5 5 5 10], 5) = 5 ≠ upper_bound([1 5 5 5 5 10], 5) = 10
means that 5
is equal to at least another element in the array.
Thus, same lower and upper bounds is the “easy” case: just insert the element into the found position that will be the correct one (not less than any other).
On the other hand, if the lower and upper bounds differ, it means that this range should be merged together as we have said before:
- the new start corresponds to the minimum of the starts (new interval and lower bound)
- the new end corresponds to the maximum of the ends (new interval and the interval before the upper bound - remember that the upper bound points to the first non-overlapping interval)
- remember to remove all the intervals in the middle
Here is the complete C++ code:
|
|
The cost of this solution is linear (equal_range
is logarithmic and, in the worst case, insert might relocate the entire vector and also erase might end up erasing all the elements - both linear).
Another solution in Python by Andrea Battistello and Gabriele Russo:
|
|
Another one in Rust that works online by Maria Rosaria Zitelli and “Eddie”:
|
|
Another solution in Javascript by Simone Busoli:
|
|