Sam and substrings

See the original problem on HackerRank.

Solutions

The brute-force approach to solve this problems involves generating all the possible substrings of the input number. This requires \(O(N^2)\) time, which becomes too expensive in the worst case when the input size can be as large as \(2 \cdot 10^5\).

To find an efficient solution, let’s consider the input 8314.

All the possible substrings are listed below:

1
2
3
4
8
3 83
1 31 831
4 14 314 8314

Each line contains all the substrings ending with the corresponding digit. For example, the second line 3 83 includes all substrings ending with 3. The final result is the sum of all these substrings:

1
2
3
4
5
6
8 +
3 + 83 +
1 + 31 + 831 +
4 + 14 + 314 + 8314 =
___________________
9603

To compute this efficiently, let’s focus on how each line can be calculated. For example:

1
2
3
8
3 83
1 31 831

Notice that the digits from earlier lines “shift” left in subsequent lines. For instance, 8 becomes 80, then 800. Shifting corresponds to multiplying by 10. Thus, each digit contributes in two ways:

  1. directly, as a standalone value (e.g., 8, 3);
  2. indirectly, by extending all substrings from previous digits.

For a digit at position i, the total contribution is the number of substrings it ends (\(i+1\)) multiplied by its value, plus the contributions from extending all substrings ending at the previous digit (scaled up by 10).

Using this, the formula to calculate the sum for each line is:

\( line[i+1] = (i + 1) \cdot digits[i+1] + 10 \cdot line[i] \)

With \(line[0] = digits[0]\).

To solve the problem, we iterate through the input once, applying this formula. We should not forget to use modulo arithmetic as required by the problem:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
size_t substrings(const string& n) 
{
    static constexpr auto bound = 1000000000+7;

    size_t next, acc;
    next = acc = n[0]-'0';
    for (auto i=1ull; i<n.size(); ++i)
    {
        next = (i + 1) * (n[i]-'0') + (10 * next) % bound;
        acc = (acc + next) % bound;
    }
    
    return acc;
}

An equivalent solution in Python:

1
2
3
4
5
6
7
8
9
def substrings(n):
    bound = 1000000000 + 7
    next_sum = acc = int(n[0])
    
    for i in range(1, len(n)):
        next_sum = (i + 1) * int(n[i]) + (10 * next_sum) % bound
        acc = (acc + next_sum) % bound
    
    return acc

The solution is linear in time and constant in space.

This problem is actually similar to the Maximum Subarray problem and it is solved using the same pattern.

At this stage, we can represent this code using functional patterns by breaking down the algorithm into distinct steps:

  • each character is mapped into its corresponding numeric digit;
  • each digit contributes to calculating the next value;
  • an accumulator is updated to maintain the total sum of all next values.

Here is a possible result using STL:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
size_t substrings(const string& n)
{
    static constexpr auto bound = 1000000000+7;
	
    std::vector<int> digits(n.size());
    transform(begin(n), end(n), begin(digits), [](char c){
       return c - '0';
    }); // chars -> digits
	
    std::vector<size_t> nexts(n.size());
    auto i=1;
    partial_sum(begin(digits), end(digits), begin(nexts), [&](size_t psum, int c){
        return (i++ + 1) * c + (10 * psum) % bound;
    }); // calculating next values

    return accumulate(begin(nexts), end(nexts), 0ull, [](size_t i, size_t curr){
        return (i+curr) % bound;
    }); // total sum of all next values
}

Unlike the previous solution, these steps are now executed sequentially, resulting in three passes over the input instead of a single pass. Also, additional space is allocated to store both digits and next values. Additionally, the use of partial_sum in this approach is not stateless, as the variable i is modified during each invocation of the lambda.

All such issues are resolved in the following solution implemented with Niebler’s range-v3 library, where enumerate is used to pair each element with its corresponding position (index):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
size_t substrings(const std::string& n)
{
    static constexpr auto bound = 1000000000+7;
    
    auto rng = 
        views::transform(n, [](char c) -> size_t { return c - '0'; })
        | views::enumerate
        | views::partial_sum([](const auto& sum, const auto& cur){
            return std::make_pair(cur.first, (cur.first+1) * cur.second + (10 * sum.second) % bound);
          }) 
        | views::values;

    return accumulate(rng, 0ull, [](auto acc, auto i){
        return (acc + i) % bound;
    });
}

The first transform is straightforward: it simply converts characters into digits, just as in previous solution, but does so without requiring any additional space.

The most intricate part of this approach is the use of partial_sum. Essentially, partial_sum updates an accumulator while iterating through a range and allows access to the intermediate value of the accumulator at each step. Here, sum serves as the accumulator, with sum.second providing the current value of the accumulator, which was updated in the previous iteration. Meanwhile, cur represents the current input.

The syntax may appear convoluted because we’ve introduced enumerate to get the position of each element. As a result, each element becomes a pair, where the first value indicates the position, adding more complexity to the resulting solution.

Next, views::values is employed to extract only the second values from the pairs, effectively discarding the first values that represent the positions. This is because we are solely interested in the intermediate sums stored in the second values.

Finally, accumulate performs a straightforward summation of these intermediate sums, applying the modulo operation during the process.

This solution has linear time complexity, processes the input in a single pass, and requires no additional space.

A more concise implementation in Python using the same patterns is as follows:

1
2
3
4
5
6
import itertools
from operator import itemgetter

sum(
    map(itemgetter(1), itertools.accumulate(
        enumerate(map(int, n)), lambda sum, cur: [ cur[0], (cur[0]+1)*cur[1]+(10*sum[1]) ])))

To keep the code simpler, we have omitted the modulo operations.

Quick breakdown:

  • itertools.accumulate performs the partial sum operation, akin to views::partial_sum
  • enumerate functions like views::enumerate, adding positional indices to the values
  • map(int, n) mirrors the role of views::transform, converting characters to digits
  • map(itemgetter(1), tuples) corresponds to views::values, extracting only the second element of each tuple
  • finally, sum serves the same purpose as accumulate, summing up the values.
comments powered by Disqus