Cut the sticks

See the original problem on HackerRank.

Solutions

The naive solution to this problem consists in sorting the array in ascending order and then, repeatedly, subtracting the minimum value from all the remaining sticks and printing the count of non-zero sticks.
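
For reference, here is a minimal sketch of that naive approach (cutTheSticksNaive is just an illustrative name):

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

// naive sketch: repeatedly cut all the sticks by the current minimum
void cutTheSticksNaive(vector<int> sticks)
{
    sort(begin(sticks), end(sticks)); // ascending, as described above
    while (!sticks.empty())
    {
        cout << sticks.size() << "\n";
        const auto minLen = sticks.front(); // the smallest stick, thanks to the sort
        for (auto& s : sticks)              // cut every stick by the minimum
            s -= minLen;
        // drop the sticks that have been cut down to zero
        sticks.erase(remove(begin(sticks), end(sticks), 0), end(sticks));
    }
}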

We can do better and avoid updating the stick lengths at every step (that makes the solution quadratic).

One way consists in visiting each element only once:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
using namespace std;

int N;
cin >> N;
vector<int> sticks(N);
copy_n(istream_iterator<int>(cin), N, begin(sticks)); // read exactly N values
sort(begin(sticks), end(sticks));

auto it = begin(sticks);
while (it != end(sticks))
{
    // the sticks from it to the end are still standing
    cout << distance(it, end(sticks)) << endl;
    // jump to the first stick longer than the current one
    it = find_if(it, end(sticks), [&](auto i){ return i>*it; });
}

For example:

1 2 3 4 3 3 2 1

After sorting:

1 1 2 2 3 3 3 4

We start from index 0 and print the distance between index 0 and index 8 (the end of the array), that is 8.

Then we find the first element greater than 1 (the value at the current position). It's 2, at index 2. We print the distance between index 2 and the end of the array, that is 6: the two sticks in the span from index 0 to index 2 have just been cut off.

Basically, at every step we print the number of sticks still standing, and the elements we cut off are given by the span of the "smallest" values along the way (indeed, N - distance(it, end(sticks)) gives back the number of elements cut off so far).
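
Tracing the loop on the sorted example gives back the expected output:

1 1 2 2 3 3 3 4
it at index 0 (value 1): prints 8, jumps to index 2
it at index 2 (value 2): prints 6, jumps to index 4
it at index 4 (value 3): prints 4, jumps to index 7
it at index 7 (value 4): prints 1, jumps to the end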

Since the sequence is sorted, that particular application of find_if should suggest another pattern: upper_bound. Basically, a binary search for the first element greater than a given value:

while (it != end(sticks))
{
    cout << distance(it, end(sticks)) << endl;
    it = upper_bound(it, end(sticks), *it);
}

upper_bound may actually do more iterations than find_if. It's interesting to think about that: if the array has unique elements (or only a few duplicates), upper_bound unnecessarily visits more elements (it keeps splitting the search space in two and may check more elements than find_if does). On the other hand, if the array has a lot of duplicates, it can give some advantage, since it can skip many of them (whereas find_if visits all of them linearly).

For example:

1 1 1 1 1 1 1 1 1 1 2 3 4

When we search for the first element greater than 1 with find_if, we linearly go through all the ones first.

On the other hand, upper_bound cuts the space in two, like:

1 1 1 1 1 1 1 1 1 1 2 3 4
^           ^           ^
low        mid         up

Since a[mid] is equal to the wanted element (1), the search moves to the right-hand side:

1 1 1 1 1 1 1 1 1 1 2 3 4
            ^     ^     ^
            low  mid   up

The same happens here, and at the next iteration it finds 2.

As you can see, we visited fewer elements than find_if.

However, when we then need to find the first element greater than 2, you can see that we visit more elements than find_if would.

Anyway, this is just speculation: it should be measured, since other variables come into play (e.g. caching, CPU instructions).
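
A minimal sketch of such a measurement, assuming we only time the two traversal strategies on the same sorted data (timePass and the lambdas below are made-up names):

#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>
using namespace std;

// hypothetical micro-benchmark: time one full pass with a given "advance" policy
template <typename Step>
long long timePass(const vector<int>& sticks, Step step)
{
    const auto start = chrono::steady_clock::now();
    auto it = begin(sticks);
    long long checksum = 0; // keeps the compiler from optimizing the loop away
    while (it != end(sticks))
    {
        checksum += distance(it, end(sticks));
        it = step(it, end(sticks));
    }
    const auto stop = chrono::steady_clock::now();
    cout << "checksum=" << checksum << " ";
    return chrono::duration_cast<chrono::microseconds>(stop - start).count();
}

// usage sketch, on an already sorted vector:
//   auto linear = [](auto it, auto last) { return find_if(it, last, [&](auto i){ return i > *it; }); };
//   auto binary = [](auto it, auto last) { return upper_bound(it, last, *it); };
//   cout << timePass(sticks, linear) << " vs " << timePass(sticks, binary) << " us\n";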

This solution costs \(O(N \cdot \log N)\) overall, since the cost of sorting prevails.

Can we do better?

Linear solution

A linear solution can be found just by using a frequency table. Actually, we could obtain the same cost by using a linear sorting algorithm like counting sort or radix sort, at the price of some additional space.

In this case the domain of the problem is quite small (stick lengths do not exceed 1000), so setting up such additional space costs - practically speaking - nothing.
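
For instance, here is a minimal counting sort sketch for this domain (countingSort is just an illustrative name):

#include <array>
#include <vector>
using namespace std;

// counting sort sketch for values in [1, 1000]: linear time, constant extra space
vector<int> countingSort(const vector<int>& sticks)
{
    array<int, 1000+1> freq{};
    for (auto s : sticks)
        freq[s]++; // count the occurrences of each length

    vector<int> sorted;
    sorted.reserve(sticks.size());
    for (auto v = 1; v <= 1000; ++v)
        sorted.insert(end(sorted), freq[v], v); // emit each length freq[v] times
    return sorted;
}

However, for this problem we don't even need to rebuild the sorted sequence: the frequency table alone suffices.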

The idea is very similar to the others discussed above:

  • we fill up the frequency table with the number of occurrences of each input value
  • we start with count = N (we print it) and idx = 0
  • at every iteration, we search the table for the next non-zero entry (from idx on) and we subtract such an entry from count
  • we print count and move on

Basically, the frequency table turns the problem of “finding the next greater element” into “finding the next non-zero element”:

#include <algorithm>
#include <array>
#include <iostream>
using namespace std;

int N, val;
cin >> N;
array<int, 1000+1> freq{}; // or vector, if we need a much bigger domain
for (auto i=0; i<N; ++i)
{
    cin >> val;
    freq[val]++;
}

auto count = N;
auto it = begin(freq);
// the next non-zero entry corresponds to the shortest stick still standing
while ( (it = find_if(it, end(freq), [](auto i) { return i>0; })) != end(freq))
{
    cout << count << "\n";
    count -= *it; // cut off all the sticks of this length
    ++it;
}
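
On the usual example (1 2 3 4 3 3 2 1), the table contains freq[1]=2, freq[2]=2, freq[3]=3 and freq[4]=1, so the loop prints 8, 6, 4 and 1, just like the other solutions.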

This solution is linear in time and constant in space. If the domain grows, we should decide whether paying the price of allocating more memory is affordable for our scenario.

We've worked on this challenge in these gyms: modena 