See the original problem on HackerRank.
Given an unsorted array of \( n-1 \) elements, it is missing exactly one element from the sequence of \( 1 \) to \( n\). Find the missing element.
Input Format
An integer \( n \) followed by \( n-1 \) space separated numeric values.
Constraints
\( 1<n<10^6 \)
Output Format
The missing value.
Solutions
This is an example of problems that we call manifold at Coding Gym, since they give us room to find several alternative solutions.
Sort + Search
We sort the input array and we linearly scan the sequence to find the missing value. For example: In C++:
|
|
In Haskell:
|
|
Pros:
- easy to extend to a more generic domain (e.g. if we have a baseline array, we can easily adapt this solution to locate the element missing in the second array)
- with a small effort, can work with duplicates
Cons:
- we pay sort - \( O(N \cdot logN) \)
- requires modifying the array (what if not permitted?)
Variant: use binary search to locate the missing value, since the array is sorted.
Optimized sort + search
Since the domain is \( [1-N] \), we can actually sort in \( O(N) \) by applying Cycle sort (we can even sort with Counting sort or such, however those require some additional storage):
|
|
Pros:
- it’s linear
- minimal number of element writes
Cons:
- we do modify the array
- if the domain changes, cycle sort might be inconvenient (in general, it’s quadratic)
Additional Storage
We could keep track of elements into a support storage (e.g. a flag array). This is just an example:
|
|
Pros:
- it’s linear
- we do not modify the input array
- easy to extend to a more generic domain and also to duplicates (e.g. turn the vector into a frequency array or map)
Cons:
- we pay additional storage
Note: since the domain is relatively small, we can even use constant storage:
|
|
Mathematics
We can exploit the problem domain to the limit.
- calculate the sum of all the numbers from \( 1 \) to \( N \). Call it \( expected \)
- calculate the sum of input numbers. Call if \( sum \)
- the missing value is just \( expected - sum \)
It’s interesting to note that the sum from \( 1 \) to \( N \) is the Gauss Formula:
\( \dfrac{n(n+1)}{2} \)
In C++:
|
|
Pros:
- efficient
- no extra space
- we do not modify the input array
- the same idea (summation, not Gauss) can be reused to solve the same problem in more generic domains
Cons:
- what if \( \dfrac{n(n+1)}{2} \) overflows?
In Haskell there is no overflow, because the type Integer
is unbounded (also known as bignum).
|
|
A C++ variation that does not suffer for integer overflow:
|
|
XOR
The following solution is very similar to the previous one but it never overflows.
It’s based on XOR property:
- \( A \oplus B = C \)
- \( A \oplus C = B \)
- \( B \oplus C = A \)
That’s the idea:
- suppose \( A \) is the result of XORing all the input elements
- suppose \( B \) is the missing value
- the result of XORing \( A \oplus B \) is \( C \)
- then \( C \) is the result of XORing all the elements from 1 to N
- we can easily find \( B \) as \( A \oplus C \)
Talk is cheap, let’s look at some code:
|
|
In Haskell, but reducing with xor on a single list: the concatenation of the
original list plus numbers from 1 to n
.
|
|
Pros:
- never overflows
- the same idea can be reused to solve the same problem in more generic domains
Cons:
- tricky ;)
Marking array elements
Another - a bit hackish - solution to solve the problem consists in marking array elements as negative by using array elements as indexes.
For example:
|
|
The first pass of the algorithm marks as negative those numbers whose indexes are elements of the array (they are not missing elements):
5
can’t lead to any numbers marked because it would go out of bounds1
leads to marking5
as-5
(because5
is at1
’s position that is0
)3
leads to marking3
as-3
(because3
is in the right position)2
leads to marking1
as-1
(because1
is at2
’s position that is1
)
So, after this first pass, the array will look like:
|
|
The second part of the algorithm checks if any of the numbers is not negative. If any, the missing element is just the next index (because the domain starts from 1
). In this case above, the missing element is 4
because 2
is positive and lies at index 3
(so the solution is the number 3+1
).
The only exception is when, after the first pass, all the elements get negative. Here, the missing element is just N
:
|
|
After the first pass:
|
|
The missing element is 5
(N=5
).
Talk is cheap, show me the code:
|
|
Side note: we can restore the original array by turning negative elements positive.
Pros:
- still linear
- can’t overflow
Cons:
- not easily generalizable