Pairs

See the original problem on HackerRank.

You will be given an array of integers and a target value. Determine the number of pairs of array elements that have a difference equal to a target value.

For example, given an array of \( [1, 2, 3, 4] \) and a target value of \( 1 \), we have three values meeting the condition:

\( 2-1=1, 3-2=1, 4-3=1 \)

Input Format

The first line contains two space-separated integers and , the size of and the target value.

The second line contains space-separated integers of the array arr.

Constraints

  • \( 2 \le n \le 10^5 \)
  • \( 0 \lt k \lt 10^9 \)
  • \( 0 \lt arr[i] \lt 2^{31}-1 \)
  • each integer \( arr[i] \) will be unique

Output Format

An integer representing the number of pairs of integers whose difference is k.

Sample Input

1
2
5 2
1 5 3 4 2

Sample output

1
3

Explanation

There are 3 pairs of integers in the set with a difference of 2: [5,3], [4,2] and [3,1] .

Solutions

A quadratic solution is correct, but it’s too slow and causes some test cases to fail:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
int count = 0;
for (auto i=0u; i<arr.size(); i++)
  {
      for (auto j=0; j<arr.size(); j++)
      {
          if (arr[i]-arr[j] == k)
          {
              count += 1;
              break;
          }
      }
  }

To find a better solution, we can turn the problem space of searching pairs into the space of searching one element for each iteration.

We need to find i - j = k, for each i and j. This means, for each i we just need to find if i - k is in the array.

The problem turns into a searching problem where looking up any element needs to be fast. We can use an efficient data struture for this purpose like tree-based sets (such as C++’s std::set or Java’s TreeSet), hash-based sets, etc, or we can simply sort the original array and use binary search.

It’s worth noting that we can make a quadratic solution that passes all the test cases:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
int count = 0;
sort(arr.begin(), arr.end());
for (auto i=0u; i<arr.size()-1; i++)
{
    for (auto j=i; j<arr.size(); j++)
    {
        if (arr[j]-arr[i] == k)
        {
            count += 1;
            break;
        }
    }
}
return count;

This is still quadratic but we are lucky enough with the problem test cases just because of sorting data beforehand! If we add an extra test case containing all the numbers from 2 to 100'000 and set k=100'001, we will have a timeout! Basically, we will end up visiting 100'000 elements 100'000 times!

We sort the input array and then we apply the idea above:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

using namespace std;

int main()
{
  int n, k;
  cin >> n >> k;
  vector<int> v(n);
  copy_n(istream_iterator<int>(cin), n, begin(v));
  sort(begin(v), end(v));

  // i - k = j
  auto cnt = 0;
  for (auto x : v)
  {
    if (binary_search(begin(v), end(v), x - k)) cnt++;
  }

  cout << cnt;
}

Using some standard algorithms:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
int main()
{
    int n, k; cin >> n >> k;
    vector<int> v(n);
    copy_n(istream_iterator<int>(cin), n, begin(v));
    sort(begin(v), end(v));
    cout << count_if(begin(v), end(v), [&](int curr){
        return binary_search(begin(v), end(v), curr-k);
    });
}

Looking at the previous solution, we notice that for each i, the binary search runs from the beginning of the array. If we change the number we search, we can start the binary search from the current position, cutting down the search space.

Instead of looking for i - k = j, we reverse the equation and look for j + k = i. Since the array is sorted, j+k is always increasing. This means it’s not possible to find i among the elements we have already looked into during the previous iterations.

Here is a working solution:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
int main()
{
  int n;
  cin >> n;
  int k;
  cin >> k;
  vector<int> data;
  for (size_t i = 0; i < n; i++)
  {
    int val;
    cin >> val;
    data.push_back(val);
  }
  sort(data.begin(), data.end());

  size_t count = 0;
  for (size_t i = 0; i < data.size(); ++i)
  {
    if (binary_search(data.begin() + i + 1, data.end(), data[i] + k)) count++;
  }
  cout << count;
}

An even more fine-tuned version has been proposed by Davide Malvezzi: the search range can be made smaller by setting the end at most k steps far from the beginning (the smaller k than n, the more convenient), and the beginning is either i + 1 or one element after the result of the last successful search.

For example, suppose we have:

1
1  4  7  9

And we look for pairs whose difference is k=6. When we start, the search spans from 1 to 1+k:

1
2
1  3  4  5  6  7  8  9
^                 ^

And we find 7:

1
2
1  3  4  5  6  7  8  9
^              *  ^

Then, we search for 3 + k, but it does not make any sense to start from 4 because the element can’t be found before the result of the previous hit (since the elements are sorted). Instead, we can start from 8 (that is one element next the result of the last hit) straight away:

1
2
1  3  4  5  6  7  8  9
                  ^  ^

And we find 9.

Here is the full solution:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
int pairs_count = 0;
auto prev_start = begin(arr);
sort(begin(arr), end(arr));

for(size_t i = 0; i < arr.size() - 1 && prev_start != prev(end(arr)); i++)
{
    auto start = max(begin(arr) + i + 1, prev_start + 1);
    auto end = min(start + k, arr.end());
    auto result = lower_bound(start, end, arr[i] + k);
    if(*result == arr[i] + k)
    {
        prev_start = result;
        pairs_count++;
    } 
}
return pairs_count;

Note that we early exit when prev_start == prev(end(arr)): if we reach the end of the array, no other element can contribute to the count (since they are all unique).

Although the complexity is still \( O(N \cdot log N) \), this solution is a perfect example of exploiting problem constraints (formally, the complexity is \( O(N \cdot log K) \) but in the worst case \( K > N \)).

An interesting related exercise consists in figuring out which test cases are more or less demanding for the solutions discussed above (e.g. in terms of runtime).

Using other data structures

Previously showed solutions are based on sort that is \( O(N \cdot logN) \) + iterating each element and performing binary search that still costs \( O(N \cdot logN) \). We can still apply linear sorting strategy (e.g. counting sort, radix sort), however the cost of performing the second part of the algorithm is still the same.

As said before, alternative solutions involve using other data structures such as maps, sets and dictionaries because we want to optimize looking up i - k = j (or j + k = i). Hash-based associative containers might decrease the overall complexity of the solution down to \( O(N) \), on average.

Here is an elegant solution in Erlang by Giancarlo Valente:

1
2
3
pairs(K, Arr) ->
    S = sets:from_list(Arr),
    length([X || X <- Arr, sets:is_element(X + K, S)]).

The same in Python by kernelfolla:

1
2
3
def pairs(k, arr):
    s = set(arr)
    return len([x for x in arr if x+k in s])

Same patterns in C++:

1
2
3
4
5
6
7
8
int N, K; cin >> N >> K;
unordered_set<int> S{
    istream_iterator<int>(cin),
    istream_iterator<int>()};

cout << count_if(begin(S), end(S), [&](int i){
   return S.count(i-K);
});

Using arrays intersection

We can use a support array to store each expected sum (a[i] + k). To get how many pairs match the requirement, we can count the number of elements are contained in the intersection of such an array with the input.

Here is a solution in PHP by Roberto Peruzzo:

1
2
3
4
5
function pairs($k, $arr) {
    $sum = $arr;
    array_walk($sum, function(&$item, $key, $k) { $item += $k; }, $k);
    return count(array_intersect($arr, $sum));
}

Here is the same idea in C++:

1
2
3
4
5
6
7
8
9
int n, k; cin >> n >> k;
vector<int> v(n);
copy_n(istream_iterator<int>(cin), n, begin(v));
sort(begin(v), end(v));
vector<int> tmp(n);
transform(begin(v), end(v), begin(tmp), [=](auto i){ return i + k; });
vector<int> inters;
set_intersection(begin(tmp), end(tmp), begin(v), end(v), back_inserter(inters));
cout << inters.size();

A more fluent version with C++ ranges (actually, using all the features available in range-v3):

1
2
sort(v);
std::cout << distance(views::set_intersection(v, views::transform(v, [=](auto i){ return i + k; })));

Here, sorting the input elements hides the dominant cost, whereas set intersection is a linear operation.

We've worked on this challenge in these gyms: modena  milan  padua  rome  turin  bari  polimi  lecce  brianza 
comments powered by Disqus