Hidden unique integers

See the original problem on HackerRank.

Solutions

A solution consists in finding all the contiguous digits in the string, converting them to integers and putting them in a set for removing duplicates. Finally, outputting the size of the set:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
int n; cin >> n;
string input;
while (n--)
{
    cin >> input;
    set<int> integers;
    auto cur = begin(input);
    while (cur != end(input))
    {
        if (cur = std::find_if(cur, end(input), ::isdigit); cur != input.end())
        {
            const auto digitEnd = std::find_if_not(cur, end(input), ::isdigit);
            integers.insert(std::stoi(std::string(cur, digitEnd)));
            cur = digitEnd;    
        }
    }
    cout << integers.size() << "\n";
}

This solution can be redesigned by using C++ ranges. We note that “contiguous digits” are actually “chunks” of characters that are all digits. This means, we can apply views::chunk_by to obtain contiguous sequences of characters that are all of the same “alphabeticalness”. Then we just filter out those which are not digits, convert all the remaining ranges to integers, and finally construct a set from of the resulting range of numbers:

1
2
3
4
5
std::cout << size(input 
    | views::chunk_by([](auto a, auto b) { return isalpha(a) == isalpha(b); })
    | views::filter([](auto rng) { return isdigit(rng[0]); })
    | views::transform([](auto rng) { return std::stoi(to<std::string>(rng)); })
    | to<std::set>());

Strictly speaking, isalpha(a) == isalpha(b) might be false even though both characters are alphabetical because isalpha() returns any positive number if the input is a letter. However, most implementations just return the very same value for every letter and 0 otherwise. Thus, we should raplace that check with something like (bool)isalpha(a) == (bool)isalpha(b).

A similar approach in Python:

1
2
3
4
5
6
for _ in range(int(input()):    
    grouped = map(int, 
        map(lambda rng : "".join(rng[1]), 
            filter(lambda x: x[0], 
                groupby(input(), str.isdigit))))
    print(len(set(grouped)))

Or:

1
2
3
4
5
from itertools import groupby

for _ in range(int(input())):
    grouped = (int("".join(group)) for is_digit, group in groupby(input(), str.isdigit) if is_digit)
    print(len(set(grouped)))

In Haskell:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
alphaOrDigit :: Char -> Char -> Bool
alphaOrDigit l r = isAlpha l == isAlpha r

-- | Count the unique elements in a list.
uniqueCount :: Ord a => [a] -> Int
uniqueCount = Set.size . Set.fromList
-- Or
-- uniqueCount = length . group . sort

-- Solve assuming every integer in the string fits an @Int@.
solInt :: String -> Int
solInt =
  uniqueCount
    . filter isJust
    . fmap (readMaybe :: String -> Maybe Int)
    . groupBy alphaOrDigit

Back to the C++ version, Paolo noticed that filter is not necessary if we just iterate 2 “chunks” at a time by introducing stride:

1
2
3
4
5
6
std::cout << size(input
	| views::chunk_by([](auto a, auto b) { return static_cast<bool>(isalpha(a)) == static_cast<bool>(isalpha(b)); })
	| views::drop(static_cast<bool>(isalpha(input.front())))
	| views::stride(2)
	| views::transform([](auto rng) { return std::stoi(to<std::string>(rng)); })
	| to<std::set>());

If the input starts with a letter, we also drop the very first chunk.

This solution avoids checking every chunk explicitly. The only caveat is that if we introduce other “chunk types” (e.g. we need to distinguish uppercase and lowercase letters for some reason), the positional approach might not work anymore. On the other hand, the filter approach ensures we only consider the chunk containing digits.

Also, Paolo pointed out that, by replacing the alphabetical characters with '\0', you get null-terminated substrings representing integers (for free!). This approach, however, requires a copy of the input string:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
int hidden_unique_integers(std::string s) {
    auto const isAlpha = [](char c) { return std::isalpha(c); };
    std::replace_if(s.begin(), s.end(), isAlpha, '\0');

    // or unordered_set for O(n)
    std::set<int> numSet;

    char const *strEnd = s.data() + s.size();
    for (char const *substr = s.data(); substr < strEnd;) {
        if (*substr) {
            auto const num = std::atoi(substr);
            numSet.insert(num);

            substr += std::strlen(substr);
        }

        ++substr;
    }

    return numSet.size();
}

An extension of this solution consists in handling numbers that are out of built-in integral bounds (e.g. int and size_t). This is possible by removing leading zeros from extracted numbers and store them in a set of strings. In the solution snippet below, we actually used string_view to avoid copying data every time:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
int n; cin >> n;
string input;
while (n--)
{
    cin >> input;
    std::set<std::string_view> numbers;
    auto cur = begin(input);
    while (cur != end(input))
    {
        if (cur = std::find_if(cur, end(input), ::isdigit); cur != input.end())
        {
            const auto digitEnd = std::find_if_not(cur, input.end(), ::isdigit);
            std::string_view digit(cur, digitEnd); // this is just referencing data
            digit.remove_prefix(std::min(digit.find_first_not_of('0'), digit.size())); // this is a "logical" removal of leading zeros
            numbers.insert(digit);
            cur = digitEnd;
        }
    }
    std::cout << numbers.size() << "\n";
}

Here is the same idea with ranges:

1
2
3
4
5
6
7
std::cout << size(input
	| views::chunk_by([](auto a, auto b) { return static_cast<bool>(isalpha(a)) == static_cast<bool>(isalpha(b)); })
	| views::drop(static_cast<bool>(isalpha(input.front())))
	| views::stride(2)
	| views::transform([](auto rng) { return std::string_view(begin(rng), end(rng)); })
	| views::transform([](std::string_view sv) { return sv.substr(0, sv.find_first_not_of('0')); })
	| to<std::set>());

A extension of this exercise consists in handling negative numbers.

Here’s the same idea in Haskell:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
trimLeadingZeros :: String -> String 
trimLeadingZeros = dropWhile (== '0')

-- | Solve _not_ assuming every integer in the string fits an @Int@.
solUnbounded :: String -> Int
solUnbounded =
  uniqueCount
    . fmap trimLeadingZeros
    . filter (any isDigit)
    . groupBy alphaOrDigit

Using regular expressions

Another solution consists in using a regular expression. For instance, here is a Python snippet by Simone Santillo:

1
2
for _ in range(int(input())):
    print(len(set(map(int, re.findall(r"\d+", input())))))

Eugenio Nurrito generalized this idea to work on any numbers (there is no map to int):

1
print(*[len(set(re.findall(r"0*(\d+)", l))) for l in sys.stdin][1:], sep="\n")
We've worked on this challenge in these gyms: modena  milan 
comments powered by Disqus