See the original problem on HackerRank.
Solutions
This problem can be solved in several ways, each carrying its own pros, cons and tradeoffs.
All the solutions are based on this observation: the answer is always \(max(6-n,4-d)\) where \(n\) is string length and \(d\) is the number of different type of characters that are already present in the input password.
Preamble
We call character family the group of characters belonging to the same requirement. The problem requires to check against 4: digits, lowercase letters, uppercase letters, special chars.
As usual, we can assume the problem won’t feed the input with other characters. This is the basic tradeoff.
A data of the problem that could change in a real environment is the number of characters required for each family. In this basic form of the problem is always ‘1’. This leads to solution that very likely just need to set a “bool flag” for each family. Flags could be turned into a counter in case of extending such requirements.
Another point is how to check if a character belongs to a family: in this problem we could even use the standard functions isdigit, isupper, and islower because we know they cover all the cases except the special chars. Again, in a real environment it can happen we must handle this simplification accordingly.
Variations discussed
- changing the content of a family at runtime (e.g. reading it from file)
- adding/removing families at runtime (e.g. “less-strong password” not requiring special_characters)
- different number of occurrences per family (e.g. 2 lowercase, 3 special_characters)
Range check + bit mask
Very likely, the most efficient solution fully exploits the data and structure of the problem.
Considerations:
- we don’t need to dynamically add families,
- we don’t need to dynamically change families,
- digits, lowercase chars and uppercase chars are contiguous in the ASCII table.
Saying that, we can easily map a char to its family this way:
|
|
Tradeoff: we are exploiting the structure of the ASCII table because we just need to check contiguous chars. In addition ,we know that if the char is not a digit, nor a lowercase, nor an uppercase then it must be a special char.
The main point of this solution is the contract between CharFamily
and the bitmask.
Having such mapping from chars to families, we can set a flag whenever a char of a certain family is found. We can use a bit mask. In C++ we can exploit the simplicity of bitset
:
|
|
Pros:
- fast
- compact
- tiny and constant extra space (bitset)
Cons (futurability/extensibility warnings):
- cannot change an existing family at runtime
- cannot add/remove families at runtime
- need to handle if we change the number of required chars (e.g. 2 lowercase, 3 uppercase)
The first cons is easy to handle: we need to turn CharFamily
into a more flexible function or we can design a proper class resolving the family id. Such customization point that can be implemented in terms of other methods such as:
- library functions (e.g. isdigit, islower, isupper)
- regular expressions (generally slower)
- linear (or binary) search on fixed strings (as we’ll do later)
The final cons can be mitigated, for instance, by using an array which stores the frequency of each char family. The final check should be changed accordingly.
Instead of bitmask, we can use an array of integers, even if we are just
flagging them. In rust it is necessary to do sum
(you cannot sum booleans).
|
|
Dynamic bitmask
Instead of using a bitset, we can use a number (e.g. 32bit/64bit int or uint) to handle more families dynamically. For instance, 32bit int supports up to 32 families. The rest of the code should be adapted accordingly: family ids will be mapped to powers of two.
In this case we have to change CharFamily
accordingly.
If we need more than 64 families (64bit ints), we could use a dynamic bitset and, as before, count how many flags are set. In C++, although vector<bool>
has many issues, it would be ok here. In other languages we have proper implementations, like C#’s BitArray
.
Array-based flexible number of chars per family
In this solution we still have fixed families (e.g. lower_case is fixed) but we support configuration over number of expected occurrences per each family by using a vector called expectedFreq
containing the number of chars that i-th family requires:
|
|
The astute reader will turn the second loop into a zip-map-filter-reduce combination.
This code could read expectedFreq dynamically. In addition, it’s independent from each family: the contract is on the “family id”, that has to be contiguously mapped to expectedFreq
.
Pros:
- customizable
Cons:
- CharFamily’s has to output an increasing family id (strong coupling with
expectedFreq
)
The following solutions will take into account the problem of managing families dynamically.
A bit of flexible families
We can easily elaborate a solution where customizing a family is quick, even at runtime:
|
|
(we could even use smarter search methods - e.g. BoyerMoore).
Pros:
- compact
- not using family id
- small (but not as small as before) extra space (families)
- can use (fixed) dynamic families (e.g. customization of “special_characters”)
Cons (futurability/extensibility warnings):
- cannot add/remove families at runtime
- need to handle if we change the number of required chars (e.g. 2 lowercase, 3 uppercase)
Tackling the former cons is easy: we could put the families into an array and perform “find_first_of” on each element (we can even read such array dynamically).
To accommodate the second possible requirement we should pretty much rewrite the algorithm because “find_first_of” does not fit anymore. In that case we have to count the occurrences of chars for each family (e.g. turning the families into a set and performing an intersection or a count operation).
If we really need a lot of flexibility, the next solution is able to handle all the previously discussed new possible requirements.
Fine-grained flexible families
In this solution, we can control families at char-level and we have full flexibility over the family ids (e.g. they are not coupled with an array index).
The basic idea is to put every char into a map with its family id. Also, it gives us control over the number of occurrences for each family, by using another map.
|
|
Pros:
- highly customizable
- family-id can be anything (not necessarily an increasing index)
Cons:
- more complex
- slower than other solutions (just a little)
- requires more space (but it’s constant)
Another similar approach consists in combining each family with the expected minimum number of occurrences:
|
|
Pros:
- highly customizable
- simple
Cons:
- slower than other solutions as each character must be searched into each family, linearly
- requires more space (but it’s constant)
Set intersection
Another interesting solution worth sharing:
|
|
We accumulate how many times the interesection between families and the password is empty.
Accommodating possible dynamic requirements discussed previously is trivial except for configuring how many characters are needed for each family (that requires using a strategy for counting occurrences, like one of those shown previously).
Conclusions
This problem has opened up interesting discussions on customization points that can be added to the algorithm. The spirit of such exercises is to understand pros, cons and tradeoffs of the different implementations.