See the original problem on HackerRank.
Solutions
The simplest solution to this problem involves storing a copy of the entire array every time a new version is saved. This way, to retrieve the value of an element at a specific version, we can simply access the saved array for that version. However, while easy to understand, this approach is highly inefficient for large arrays or when versions are taken frequently. If there are \(N\) calls to each function, this method would store \(N\) full copies of the array, leading to significant memory usage and poor time performance. Given that the array can have up to 50,000
elements and there can be as many as 50,000
function calls, the straightforward approach of saving the entire array on every version would result in copying a 50,000
-element array up to 50,000
times.
This would lead to a staggering 2.5
billion elements being stored in total, resulting in extremely high memory consumption and inefficiency. Clearly, this method isn’t scalable for large inputs, highlighting the need for a more space-efficient solution.
This challenge introduces us to the concept of persistent data structures with our task serving as a straightforward example. A persistent data structure maintains all previous versions of itself after any modification, allowing access to any past version. In essence, whenever you update the structure, both the original and updated versions are preserved and can be accessed later. This capability is especially valuable in situations where it’s essential to track the history of changes or revert to earlier states of the data structure.
The twist here is that this persistence is managed efficiently. Unlike the naive approach of copying entire arrays, a persistent data structure allows us to maintain previous versions without unnecessary duplication. It uses techniques that only store the differences (or “changes”) between versions, making it much more space and time efficient, even as the structure grows or is modified frequently. This way, we can access any historical version of the data structure without the overhead of copying large amounts of data repeatedly.
Persistent arrays are among the simplest types of persistent data structures. A persistent array allows you to access and update elements at specific points in time. One common approach to implementing persistent arrays is using fat nodes, where multiple values are stored at each index without erasing older values.
The approach focuses on tracking the “versions” of individual array elements, rather than maintaining versions of the entire array. Each set
operation appends a new (current_version, value)
pair to the history of the corresponding index. Whenever a new version is created, the current_version
counter is simply incremented.
The get
operation then retrieves the most recent value for a given index at a specified version ID using binary search, since the list of changes is ordered by definition.
This approach enables the array to preserve historical versions efficiently, without needing to make full copies of the entire array.
Given the design freedom in this challenge, we take the opportunity to encapsulate this logic within a class:
|
|
The main
function is straightforward, as it simply reads and processes the input.
Now, let’s add some comments on PersistentArray
:
- each element in the array is a vector of pairs, where each pair represents a version and its corresponding value
- when initialized, the array consists of
P
vectors, each containing a single pair (version0
and value0
, as specified by the problem) - when calling
set
, a new pair is added to the appropriate index’s vector, representing the “change” at a specific version (note that different sets for the same index at the same version are also stored - this might be useful or not) - calling
saveVersion
simply increments the version counter
Let’s break down the implementation of the get
function. For each index in the array, we maintain a vector of pairs representing the history of changes at that index across different versions. These pairs are stored in chronological order, and there might be multiple changes at the same version. Here’s how it works:
Initially, the history of a 3-element array looks like:
|
|
Now, suppose we:
- Update index 1 to value 2
- Save a new version
- Update index 1 to value 5
- Save again
- Update index 2 to value 7
The history will then be:
|
|
Here is how get
works. If we call get(1, 0)
, meaning we want the value of index 1 at version 0 (for brevity, we refer to numeric_limits<int>::max()
as INT_MAX
):
- we use
upper_bound
with the target(0, INT_MAX)
to find the first element with a version greater than0
- using
INT_MAX
ensures we pick the highest value for a given version if multiple entries exist upper_bound
returns the first element after(0, INT_MAX)
, which is(1, 5)
- we step back one position (
prev
) to get(0, 2)
and return the value2
.
Another example: calling get(2, 2)
:
upper_bound
searches for(2, INT_MAX)
- it finds no element after
(2, INT_MAX)
and returns theend
iterator - we step back to the last element
(2, 7)
and return the value7
.
Note that the initial pair (0, 0)
guarantees that no vector is ever empty, thus preventing empty history scenarios.
The time Complexity of this approach is \(O(N \cdot log N)\). In particular:
set
: appends a record to an index’s history in amortized constant timesaveVersion
: increments a counter in constant timeget
: performs a binary search on the history of an index, which takes \(O(log N)\) time
The space complexiy is in the order of \(O(N + P)\) - better than before that was \(O(N \cdot P)\) as we store historical records for each index, using O(N)
space for the records and O(P)
for the array itself.
Here is a Python version of the PersistentArray
class:
|
|
This method is shows an example of partially persistent array. This means that, although older versions of the array are accessible, only the latest version can be modified. Instead, in a fully persistent data structure, both old and new versions can be modified.
A simple way to implement this behavior is by using a map instead of a vector of pairs:
|
|
In this approach, if we want to update the value of a certain index at a previous version, we can modify the set
function like this:
|
|
Persistent data structures are particularly significant in various programming languages, especially those that emphasize immutability, like functional programming languages.