Find the Missing Value

See the original problem on HackerRank.

Given an unsorted array of \( n-1 \) elements, it is missing exactly one element from the sequence of \( 1 \) to \( n\). Find the missing element.

Input Format

An integer \( n \) followed by \( n-1 \) space separated numeric values.

Constraints

\( 1<n<10^6 \)

Output Format

The missing value.

Solutions

This is an example of problems that we call manifold at Coding Gym, since they give us room to find several alternative solutions.

We sort the input array and we linearly scan the sequence to find the missing value. For example: In C++:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#include <iterator>
#include <cstdio>
#include <vector>
#include <iostream>
#include <algorithm>
using namespace std;

int main() 
{
    int n; cin >> n;
    vector<int> v(n-1);
    copy_n(istream_iterator<int>(cin), n-1, begin(v));
    sort(begin(v), end(v));
    auto missing = n;
    for (auto i=1; i<n; ++i)
    {
        if (i != v[i-1])
        {
            missing = i;
            break;
        }
    }
    cout << missing;
}

In Haskell:

1
2
3
import Data.List (sort)
findTheMissingValue (_:xs) = head [ i | (x, i) <- zip (sort xs) [1..], i /= x ]
main = interact $ show . findTheMissingValue . fmap read . words

Pros:

  • easy to extend to a more generic domain (e.g. if we have a baseline array, we can easily adapt this solution to locate the element missing in the second array)
  • with a small effort, can work with duplicates

Cons:

  • we pay sort - \( O(N \cdot logN) \)
  • requires modifying the array (what if not permitted?)

Variant: use binary search to locate the missing value, since the array is sorted.

Since the domain is \( [1-N] \), we can actually sort in \( O(N) \) by applying Cycle sort (we can even sort with Counting sort or such, however those require some additional storage):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
int N;
cin >> N;
vector<int> v(N);
copy_n(std::istream_iterator<int>(cin), N - 1, begin(v));
auto i = 0;
while (i<N) 
{
    auto correct = v[i] - 1;
    if (v[i] < N && v[i] != v[correct]) 
    {
        swap(v[i], v[correct]);
    }
    else 
    {
        i++;
    }
}

for (auto i = 0; i<N; i++) 
{
    if (i != v[i] - 1) {
        std::cout << i + 1;
        return 0;
    }
}

cout << N;

Pros:

  • it’s linear
  • minimal number of element writes

Cons:

  • we do modify the array
  • if the domain changes, cycle sort might be inconvenient (in general, it’s quadratic)

Additional Storage

We could keep track of elements into a support storage (e.g. a flag array). This is just an example:

1
2
3
4
5
vector<bool> hits(n, false); // C++ addicted will disapprove this!
for (auto val : v)
    hits[val-1] = true;
    
cout << (distance(begin(hits), find(begin(hits), end(hits), false))+1);

Pros:

  • it’s linear
  • we do not modify the input array
  • easy to extend to a more generic domain and also to duplicates (e.g. turn the vector into a frequency array or map)

Cons:

  • we pay additional storage

Note: since the domain is relatively small, we can even use constant storage:

1
2
3
4
5
6
7
8
int n, el;
array<bool, 1'000'000> hits;
for (auto i=0; i<n; ++i)
{
    cin >> el;
    hits[el-1] = true;
}
cout << (distance(begin(hits), find(begin(hits), end(hits), false))+1);

Mathematics

We can exploit the problem domain to the limit.

  • calculate the sum of all the numbers from \( 1 \) to \( N \). Call it \( expected \)
  • calculate the sum of input numbers. Call if \( sum \)
  • the missing value is just \( expected - sum \)

It’s interesting to note that the sum from \( 1 \) to \( N \) is the Gauss Formula:

\( \dfrac{n(n+1)}{2} \)

In C++:

1
2
3
auto sum = accumulate(begin(v), end(v), 0ull); // aka: reduce
auto expected = n*(n+1)/2;
cout << (expected - sum);

Pros:

  • efficient
  • no extra space
  • we do not modify the input array
  • the same idea (summation, not Gauss) can be reused to solve the same problem in more generic domains

Cons:

  • what if \( \dfrac{n(n+1)}{2} \) overflows?

In Haskell there is no overflow, because the type Integer is unbounded (also known as bignum).

1
2
3
findTheMissingValue :: [Integer] -> Integer
findTheMissingValue (n:xs) = (n * (n + 1) `div` 2) - sum xs
main = interact $ show . findTheMissingValue . fmap read . words

A C++ variation that does not suffer for integer overflow:

1
2
3
4
5
6
auto total = 1;
for (auto i=2; i<=n; ++i) 
{
    total = (total + i) - v[i - 2];
}
cout << total;

XOR

The following solution is very similar to the previous one but it never overflows.

It’s based on XOR property:

  • \( A \oplus B = C \)
  • \( A \oplus C = B \)
  • \( B \oplus C = A \)

That’s the idea:

  • suppose \( A \) is the result of XORing all the input elements
  • suppose \( B \) is the missing value
  • the result of XORing \( A \oplus B \) is \( C \)
  • then \( C \) is the result of XORing all the elements from 1 to N
  • we can easily find \( B \) as \( A \oplus C \)

Talk is cheap, let’s look at some code:

1
2
3
4
5
6
7
8
auto xorVals = accumulate(begin(v), end(v), 0, bit_xor<>{}); // aka: reduce
auto expected = 0;
// naive approach, for an efficient method see, for instance,
// here https://www.geeksforgeeks.org/calculate-xor-1-n/
for (auto i=1; i<=n; ++i)
    expected ^= i;
    
cout << (expected ^ xorVals);

In Haskell, but reducing with xor on a single list: the concatenation of the original list plus numbers from 1 to n.

1
2
3
4
import Data.Bits (xor)
findTheMissingValue :: [Integer] -> Integer
findTheMissingValue (n:xs) = foldl xor 0 ([1..n] ++ xs)
main = interact $ show . findTheMissingValue . fmap read . words

Pros:

  • never overflows
  • the same idea can be reused to solve the same problem in more generic domains

Cons:

  • tricky ;)

Marking array elements

Another - a bit hackish - solution to solve the problem consists in marking array elements as negative by using array elements as indexes.

For example:

1
5 1 3 2

The first pass of the algorithm marks as negative those numbers whose indexes are elements of the array (they are not missing elements):

  • 5 can’t lead to any numbers marked because it would go out of bounds
  • 1 leads to marking 5 as -5 (because 5 is at 1’s position that is 0)
  • 3 leads to marking 3 as -3 (because 3 is in the right position)
  • 2 leads to marking 1 as -1 (because 1 is at 2’s position that is 1)

So, after this first pass, the array will look like:

1
-5 -1 -3 2

The second part of the algorithm checks if any of the numbers is not negative. If any, the missing element is just the next index (because the domain starts from 1). In this case above, the missing element is 4 because 2 is positive and lies at index 3 (so the solution is the number 3+1).

The only exception is when, after the first pass, all the elements get negative. Here, the missing element is just N:

1
4 1 3 2

After the first pass:

1
-4 -1 -3 -2

The missing element is 5 (N=5).

Talk is cheap, show me the code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
int n; cin >> n;
vector<int> v(n-1);
copy_n(istream_iterator<int>(cin), n-1, begin(v));

for (int i = 0; i < n; i++)
{
    int absVal = abs(v[i]);
    if (absVal - 1 < n)
    {
        v[absVal - 1] = -v[absVal - 1];
    }
}

for (int i = 0; i < n; i++)
{
    if (v[i] > 0) 
    {
        cout << (i + 1);
        return 0;
    }
}

cout << n;

Side note: we can restore the original array by turning negative elements positive.

Pros:

  • still linear
  • can’t overflow

Cons:

  • not easily generalizable
We've worked on this challenge in these gyms: modena  padua  milan  barcelona  turin  bari  rome  polimi  lecce  brianza  latina 
comments powered by Disqus