Frequency Queries

See the original problem on HackerRank.

Solutions

Warning: the problem I/O might result slow. For some languages (e.g. C++), it’s better off reading the whole input first.

This problem requires some additional storage to be solved efficiently. Using only one hash map is not enough because of operation of type 3.

The idea is to use a hash map to store the frequencies of each element (query type 1 and 2) and also an additional hash map “histogram”, that is an inverse map from frequency to elements having such a frequency.

Here is a Python implementation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
def freqQuery(queries):
    results = []
    lookup = dict()
    histogram = defaultdict(set)
    for command, value in queries:
        freq = lookup.get(value, 0)
        if command == 1:
            lookup[value] = freq + 1
            histogram[freq].discard(value)
            histogram[freq + 1].add(value)
        elif command == 2:
            lookup[value] = max(0, freq - 1)
            histogram[freq].discard(value)
            histogram[freq - 1].add(value)
        elif command == 3:
            results.append(1 if histogram[value] else 0)
    return results

We can even improve our algorithm by noticing that the maximum frequency is fixed to \( 10^6 \) (we have at most that number of queries, read the problem constraints) so we can replace the histogram with a static array.

Here is the idea in C++:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
static const int MaxFrequency = 1'000'000+1;

int N, op, elem; cin >> N;
vector<int> q(N), v(N);
for (auto i=0; i<N; ++i)
{
    cin >> q[i] >> v[i];
}    

unordered_map<int, int> freq;
array<int, MaxFrequency> histogram{}; 

auto add = [&](int elem, int operation) {
    auto& oldFreq = freq[elem];
    if (oldFreq > 0)
        histogram[oldFreq]--;
    oldFreq = max(oldFreq+operation, 0);
    histogram[oldFreq]++;
};

for (auto i=0; i<N; ++i)
{
    switch (q[i])
    {
    case 1:
        add(v[i], 1);
        break;
    case 2:
        add(v[i], -1);
        break;
    case 3:
        cout << ( (v[i]<MaxFrequency && histogram[v[i]] > 0) ? "1\n" : "0\n");
    }
}

Sometimes the histogram should contain the identity of each element appearing a certain number of time. In this case, an approach similar to that presented in Python is better.

We should consider some tradeoffs such as:

  • how much storage can I use (balance between efficienty and memory)
  • need to show which elements have a certain frequency (the histogram should contain more information, like in Python approach)
  • if the query of type 3 are a few, the histogram can be avoided - we hopefully can afford a map traversal
  • we can compute the queries of type 3 in batch (between 1 and 2) if the answers are not needed in streaming (one in, one out)
comments powered by Disqus