See the original problem on HackerRank.
Given an unsorted array of \( n-1 \) elements, it is missing exactly one element from the sequence of \( 1 \) to \( n\). Find the missing element.
Input Format
An integer \( n \) followed by \( n-1 \) space separated numeric values.
Constraints
\( 1<n<10^6 \)
Output Format
The missing value.
Solutions
This is an example of problems that we call manifold at Coding Gym, since they give us room to find several alternative solutions.
Sort + Search
We sort the input array and we linearly scan the sequence to find the missing value. For example: In C++:
|
|
In Haskell:
|
|
Pros:
- easy to extend to a more generic domain (e.g. if we have a baseline array, we can easily adapt this solution to locate the element missing in the second array)
- with a small effort, can work with duplicates
Cons:
- we pay sort - \( O(N \cdot logN) \)
- requires modifying the array (what if not permitted?)
Variant: use binary search to locate the missing value, since the array is sorted.
Optimized sort + search
Since the domain is \( [1-N] \), we can actually sort in \( O(N) \) by applying Cycle sort (we can even sort with Counting sort or such, however those require some additional storage):
|
|
Pros:
- it’s linear
- minimal number of element writes
Cons:
- we do modify the array
- if the domain changes, cycle sort might be inconvenient (in general, it’s quadratic)
Additional Storage
We could keep track of elements into a support storage (e.g. a flag array). This is just an example:
|
|
Pros:
- it’s linear
- we do not modify the input array
- easy to extend to a more generic domain and also to duplicates (e.g. turn the vector into a frequency array or map)
Cons:
- we pay additional storage
Note: since the domain is relatively small, we can even use constant storage:
|
|
Mathematics
We can exploit the problem domain to the limit.
- calculate the sum of all the numbers from \( 1 \) to \( N \). Call it \( expected \)
- calculate the sum of input numbers. Call if \( sum \)
- the missing value is just \( expected - sum \)
It’s interesting to note that the sum from \( 1 \) to \( N \) is the Gauss Formula:
\( \dfrac{n(n+1)}{2} \)
In C++:
|
|
Pros:
- efficient
- no extra space
- we do not modify the input array
- the same idea (summation, not Gauss) can be reused to solve the same problem in more generic domains
Cons:
- what if \( \dfrac{n(n+1)}{2} \) overflows?
In Haskell there is no overflow, because the type Integer is unbounded (also known as bignum).
|
|
A C++ variation that does not suffer for integer overflow:
|
|
XOR
The following solution is very similar to the previous one but it never overflows.
It’s based on XOR property:
- \( A \oplus B = C \)
- \( A \oplus C = B \)
- \( B \oplus C = A \)
That’s the idea:
- suppose \( A \) is the result of XORing all the input elements
- suppose \( B \) is the missing value
- the result of XORing \( A \oplus B \) is \( C \)
- then \( C \) is the result of XORing all the elements from 1 to N
- we can easily find \( B \) as \( A \oplus C \)
Talk is cheap, let’s look at some code:
|
|
In Haskell, but reducing with xor on a single list: the concatenation of the
original list plus numbers from 1 to n.
|
|
Pros:
- never overflows
- the same idea can be reused to solve the same problem in more generic domains
Cons:
- tricky ;)
Marking array elements
Another - a bit hackish - solution to solve the problem consists in marking array elements as negative by using array elements as indexes.
For example:
|
|
The first pass of the algorithm marks as negative those numbers whose indexes are elements of the array (they are not missing elements):
5can’t lead to any numbers marked because it would go out of bounds1leads to marking5as-5(because5is at1’s position that is0)3leads to marking3as-3(because3is in the right position)2leads to marking1as-1(because1is at2’s position that is1)
So, after this first pass, the array will look like:
|
|
The second part of the algorithm checks if any of the numbers is not negative. If any, the missing element is just the next index (because the domain starts from 1). In this case above, the missing element is 4 because 2 is positive and lies at index 3 (so the solution is the number 3+1).
The only exception is when, after the first pass, all the elements get negative. Here, the missing element is just N:
|
|
After the first pass:
|
|
The missing element is 5 (N=5).
Talk is cheap, show me the code:
|
|
Side note: we can restore the original array by turning negative elements positive.
Pros:
- still linear
- can’t overflow
Cons:
- not easily generalizable