The Versioned Scroll

See the original problem on HackerRank.

Solutions

The simplest solution to this problem involves storing a copy of the entire array every time a new version is saved. This way, to retrieve the value of an element at a specific version, we can simply access the saved array for that version. However, while easy to understand, this approach is highly inefficient for large arrays or when versions are taken frequently. If there are \(N\) calls to each function, this method would store \(N\) full copies of the array, leading to significant memory usage and poor time performance. Given that the array can have up to 50,000 elements and there can be as many as 50,000 function calls, the straightforward approach of saving the entire array on every version would result in copying a 50,000-element array up to 50,000 times.

This would lead to a staggering 2.5 billion elements being stored in total, resulting in extremely high memory consumption and inefficiency. Clearly, this method isn’t scalable for large inputs, highlighting the need for a more space-efficient solution.

This challenge introduces us to the concept of persistent data structures with our task serving as a straightforward example. A persistent data structure maintains all previous versions of itself after any modification, allowing access to any past version. In essence, whenever you update the structure, both the original and updated versions are preserved and can be accessed later. This capability is especially valuable in situations where it’s essential to track the history of changes or revert to earlier states of the data structure.

The twist here is that this persistence is managed efficiently. Unlike the naive approach of copying entire arrays, a persistent data structure allows us to maintain previous versions without unnecessary duplication. It uses techniques that only store the differences (or “changes”) between versions, making it much more space and time efficient, even as the structure grows or is modified frequently. This way, we can access any historical version of the data structure without the overhead of copying large amounts of data repeatedly.

Persistent arrays are among the simplest types of persistent data structures. A persistent array allows you to access and update elements at specific points in time. One common approach to implementing persistent arrays is using fat nodes, where multiple values are stored at each index without erasing older values.

The approach focuses on tracking the “versions” of individual array elements, rather than maintaining versions of the entire array. Each set operation appends a new (current_version, value) pair to the history of the corresponding index. Whenever a new version is created, the current_version counter is simply incremented.

The get operation then retrieves the most recent value for a given index at a specified version ID using binary search, since the list of changes is ordered by definition.

This approach enables the array to preserve historical versions efficiently, without needing to make full copies of the entire array.

Given the design freedom in this challenge, we take the opportunity to encapsulate this logic within a class:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
class PersistentArray
{
public:
    PersistentArray(int len)
    {
        m_vals.resize(len, {{0, 0}});
    }

    void set(int index, int val)
    {
        m_vals[index].push_back({m_version, val});
    }
    
    int saveVersion()
    {
        return m_version++;
    }
    
    int get(int index, int versionID) 
    {
        return prev(upper_bound(m_vals[index].begin(), m_vals[index].end(), make_pair(versionID, numeric_limits<int>::max())))->second;
    }
private:
    vector<vector<pair<int, int>>> m_vals;
    int m_version = 0;
};

int main()
{
    int P, N, command, arg1, arg2;
    cin >> P >> N;
    
    PersistentArray pa{P};
    
    while (N--)
    {
        cin >> command;
        switch(command)
        {
            case 1:
                cin >> arg1 >> arg2;
                pa.set(arg1, arg2);
                break;
            case 2:
                std::cout << pa.saveVersion() << "\n";
                break;
            case 3:
                cin >> arg1 >> arg2;
                std::cout << pa.get(arg1, arg2) << "\n";
                break;
        }
    }    
}

The main function is straightforward, as it simply reads and processes the input.

Now, let’s add some comments on PersistentArray:

  • each element in the array is a vector of pairs, where each pair represents a version and its corresponding value
  • when initialized, the array consists of P vectors, each containing a single pair (version 0 and value 0, as specified by the problem)
  • when calling set, a new pair is added to the appropriate index’s vector, representing the “change” at a specific version (note that different sets for the same index at the same version are also stored - this might be useful or not)
  • calling saveVersion simply increments the version counter

Let’s break down the implementation of the get function. For each index in the array, we maintain a vector of pairs representing the history of changes at that index across different versions. These pairs are stored in chronological order, and there might be multiple changes at the same version. Here’s how it works:

Initially, the history of a 3-element array looks like:

1
[ [(0, 0)], [(0, 0)], [(0, 0)] ]

Now, suppose we:

  1. Update index 1 to value 2
  2. Save a new version
  3. Update index 1 to value 5
  4. Save again
  5. Update index 2 to value 7

The history will then be:

1
[ [(0, 0)], [(0, 2), (1, 5)], [(2, 7)] ]

Here is how get works. If we call get(1, 0), meaning we want the value of index 1 at version 0 (for brevity, we refer to numeric_limits<int>::max() as INT_MAX):

  • we use upper_bound with the target (0, INT_MAX) to find the first element with a version greater than 0
  • using INT_MAX ensures we pick the highest value for a given version if multiple entries exist
  • upper_bound returns the first element after (0, INT_MAX), which is (1, 5)
  • we step back one position (prev) to get (0, 2) and return the value 2.

Another example: calling get(2, 2):

  • upper_bound searches for (2, INT_MAX)
  • it finds no element after (2, INT_MAX) and returns the end iterator
  • we step back to the last element (2, 7) and return the value 7.

Note that the initial pair (0, 0) guarantees that no vector is ever empty, thus preventing empty history scenarios.

The time Complexity of this approach is \(O(N \cdot log N)\). In particular:

  • set: appends a record to an index’s history in amortized constant time
  • saveVersion: increments a counter in constant time
  • get: performs a binary search on the history of an index, which takes \(O(log N)\) time

The space complexiy is in the order of \(O(N + P)\) - better than before that was \(O(N \cdot P)\) as we store historical records for each index, using O(N) space for the records and O(P) for the array itself.

Here is a Python version of the PersistentArray class:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
class PersistentArray:

    def __init__(self, length: int):
        self.version = 0
        self.vals = [[[0, 0]] for _ in range(length)]
        
    def set(self, index: int, val: int) -> None:
        self.vals[index].append([self.version, val])

    def saveVersion(self) -> int:
        self.version += 1
        return self.version - 1

    def get(self, index: int, versionID: int) -> int:
        version_index = bisect.bisect_right(self.vals[index], [versionID, 10 ** 9])
        return self.vals[index][version_index - 1][1]

This method is shows an example of partially persistent array. This means that, although older versions of the array are accessible, only the latest version can be modified. Instead, in a fully persistent data structure, both old and new versions can be modified.

A simple way to implement this behavior is by using a map instead of a vector of pairs:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
class PersistentArray
{
public:        
    PersistentArray(int length) 
    {
    }
    
    void set(int index, int val)
    {
        m_indexToId[{index, m_version}] = val;
    }
    
    int saveVersion()
    {
        return m_version++;
    }
    
    int get(int index, int versionID)
    {
        const auto it = m_indexToId.upper_bound({index, versionID});
        if(it == m_indexToId.begin() || prev(it)->first.first != index)
            return 0;
        return prev(it)->second;
    }
private:
    int m_version = 0;
    map<pair<int, int>, int> m_indexToId;
};

In this approach, if we want to update the value of a certain index at a previous version, we can modify the set function like this:

1
2
3
4
void set(int version, int index, int val)
{
    m_indexToId[{index, version}] = val;
}

Persistent data structures are particularly significant in various programming languages, especially those that emphasize immutability, like functional programming languages.

We've worked on this challenge in these gyms: modena 
comments powered by Disqus