Raid planner

See the original problem on HackerRank.

Solutions

This is the first graph problem at Coding Gym, specifically focusing on determining if a path exists between two nodes. To make it a suitable introductory challenge, we’ve applied a few simplifications: the graph is undirected (meaning each edge can be traversed in both directions), and the input format is designed to be straightforward - we’ll explain why shortly. Additionally, the graph contains no isolated nodes or self-edges.

The input format is favorable for storing the graph as an adjacency list, a well-known graph representation where each node is associated with a list of its neighbors (the nodes it's connected to by edges). This data structure is space-efficient, as it only stores the edges that actually exist, making it ideal for sparse graphs (graphs with relatively few edges compared to the number of nodes). It is also efficient for traversal, as we can quickly access all the neighbors of a node, which is essential for our task here.

An alternative is the adjacency matrix, which uses a 2D array where the entry at position (i, j) indicates whether there's an edge between node i and node j. It's suitable for dense graphs but uses more memory - \(O(n^2)\) space. Yet another option is the edge list, a plain list of all the edges in the graph, which can be helpful in certain algorithms but doesn't offer fast access to a node's neighbors.

For this problem, where the graph is relatively sparse (remember that each node might have at most \(n/4\) edges), the adjacency list is a great choice due to its compactness and efficiency in storing the graph’s structure.

Classical solutions

There are two classical solutions to finding a path between two nodes in a graph, both of which are based on graph traversal techniques: BFS (Breadth-First Search) and DFS (Depth-First Search).

The former explores the graph level by level, starting from the source node and checking all its neighbors before moving to the next level. This guarantees that the shortest path is found if one exists. It is particularly useful when the destination is expected to be close to the source.

The latter, on the other hand, explores as far as possible along each branch before backtracking. It can be more memory-efficient than BFS in some cases but does not guarantee finding the shortest path.

From a coding perspective, the key distinction between BFS and DFS lies in the data structure used to manage the nodes that need to be explored next:

  • BFS uses a queue to store the nodes. The queue ensures that the search progresses in a level-by-level manner;
  • DFS, on the other hand, uses a stack (or recursion, which implicitly uses the call stack). The stack helps DFS dive deep into the graph.

Thus, the BFS solution works by exploring the graph level by level, starting from the given starting node. We initialize a queue with the starting node and also maintain a set to keep track of the nodes we’ve already visited, ensuring that we don’t revisit any.

As BFS proceeds, we dequeue a node and check if it’s the destination node. If it is, we immediately know there’s a path and print ARR. If it’s not, we enqueue all of its unvisited neighbors, marking them as visited in the process. This continues until either the destination is found or all possible nodes have been explored. If the destination is never reached, we print NAY.

The time complexity is \(O(V + E)\), where \(V\) is the number of nodes and \(E\) is the number of edges in the graph. In the worst case, BFS examines every node and edge once.

Here is the full code in C++:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <queue>
#include <set>
#include <vector>
using namespace std;

bool pathExists(int n, const vector<vector<int>>& adj, int source, int destination)
{    
    set<int> visited;
    queue<int> frontier;
    frontier.push(source);
    visited.insert(source);

    while (!frontier.empty())
    {
        const auto current = frontier.front();        
        frontier.pop();
        if (current == destination)
        {
            return true;
        }
        for (auto neighbor : adj[current])
        {
            if (const auto [it, inserted] = visited.insert(neighbor); inserted)
            {               
                frontier.push(neighbor);
            }
        }
    }

    return false;
}

int main()
{
    int n, k;
    cin >> n;
    int source, destination;
    cin >> source >> destination;
    vector<vector<int>> adj(n);
    for (auto i=0; i<n; ++i)
    {
        cin >> k;
        copy_n(istream_iterator<int>(cin), k, back_inserter(adj[i]));
    }
    
    cout << (pathExists(n, adj, source, destination) ? "ARR" : "NAY");
}

Since the graph nodes are always labeled from 0 to n-1, instead of using a set, we can use a vector<bool> (or a similar structure) to track visited nodes. This way, rather than storing the entire node data, we only store the node identifiers, marking each one as visited or not. In general, even when using a set, the full node data is not copied. Instead, only the node identifiers (such as integers or unique labels) are stored in the visited structure.
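Here is a minimal sketch of that variant: the same BFS as above, with `vector<bool>` replacing `set<int>` for the visited flags (valid precisely because the nodes are labeled from 0 to n-1):

```cpp
#include <queue>
#include <vector>
using namespace std;

// BFS path check using vector<bool> for the visited flags instead of a
// set<int>: a direct index lookup replaces the set insertion.
bool pathExists(int n, const vector<vector<int>>& adj, int source, int destination)
{
    vector<bool> visited(n, false);
    queue<int> frontier;
    frontier.push(source);
    visited[source] = true;

    while (!frontier.empty())
    {
        const auto current = frontier.front();
        frontier.pop();
        if (current == destination)
            return true;
        for (auto neighbor : adj[current])
        {
            if (!visited[neighbor])
            {
                visited[neighbor] = true; // mark on push, as before
                frontier.push(neighbor);
            }
        }
    }
    return false;
}
```

This trades the set's \(O(\log n)\) insertions for \(O(1)\) flag lookups, at the cost of always allocating n flags up front.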

The solution based on DFS is very similar to the BFS solution, except that we replace the queue with a stack. Here’s the updated pathExists function using DFS, with comments highlighting the only differences:

bool pathExists(int n, const vector<vector<int>>& adj, int source, int destination)
{    
    set<int> visited;
    stack<int> frontier; // <-- 
    frontier.push(source);
    visited.insert(source);

    while (!frontier.empty())
    {
        const auto current = frontier.top(); // <-- 
        frontier.pop();
        if (current == destination)
            return true;
        for (auto neighbor : adj[current])
        {
            if (const auto [it, inserted] = visited.insert(neighbor); inserted)
            {               
                frontier.push(neighbor);
            }
        }
    }

    return false;
}

Here is the equivalent solution in Python:

def pathExists(graph, start, destination):
    visited = {start}
    stack = [start]

    while stack:
        node = stack.pop()

        if node == destination:
            return True

        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)  # mark when pushing, so each node enters the stack at most once
                stack.append(neighbor)

    return False

An advanced solution

To solve the problem of finding if a path exists between two nodes in a graph, we can think of it as determining whether the two nodes belong to the same component. Imagine the graph as a group of islands, where each node is an island, and the edges are the bridges connecting them. Nodes that are connected directly or indirectly through other nodes form a component - meaning all the islands in that component can reach each other.

Hence, a component is simply a group of nodes that are all reachable from one another. So, if two nodes belong to the same component, there is a path connecting them. A notable data structure helps us efficiently keep track of which nodes belong to the same component: the disjoint-set. This data structure primarily supports two operations:

  • merge (or union): two nodes are “merged” into the same component;
  • find: we check which component a certain node belongs to.

Initially, each node is its own component. When we encounter an edge between two nodes, we unite them, meaning they are now part of the same component. When we have added all the input edges, to check if there is a path between two nodes, we simply check if they belong to the same component.

The naive implementation of a disjoint-set is sufficient for this problem. Essentially, each node keeps track of its “parent”, which serves as the representative of its component. When two nodes are merged, one of them is updated to have the other’s parent, effectively linking their components together. To determine whether two nodes belong to the same component, we repeatedly follow the parent links until we reach the root representative:

#include <iostream>
#include <numeric>
#include <vector>
using namespace std;

class disjoint_set
{
public:
    disjoint_set(int n)
        : parents(n)
    {
        iota(begin(parents), end(parents), 0);
    }
    
    int find(int x)
    {
        while (x != parents[x])
            x = parents[x];
        return x;
    }
    
    void merge(int x, int y)
    {
        const auto px = find(x);
        const auto py = find(y);
        if (px == py)
            return;
        parents[px] = py;
    }
private:
    vector<int> parents;
};

int main()
{
    int n, start, dest, k, neighbor;
    cin >> n >> start >> dest;
    disjoint_set ds{n};
    for (auto i=0; i<n; ++i)
    {
        cin >> k;
        while (k--)
        {
            cin >> neighbor;
            ds.merge(i, neighbor);
        }
    }
    
    cout << (ds.find(start) == ds.find(dest) ? "ARR" : "NAY");
}

The time complexity of find is linear since, in the worst case, the path from a node to its representative can be very deep.
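To make that worst case concrete, here is a small sketch: a copy of the naive structure above, with find instrumented by a step counter (`last_steps` is only an illustration aid, not part of the real structure). Merging 0-1, 1-2, ..., (n-2)-(n-1) in order builds a chain, so a subsequent find(0) walks n-1 parent links:

```cpp
#include <numeric>
#include <vector>
using namespace std;

// Naive disjoint-set with find instrumented to count the parent links
// it follows (last_steps exists only for this demonstration).
struct naive_disjoint_set
{
    vector<int> parents;
    int last_steps = 0;

    naive_disjoint_set(int n) : parents(n) { iota(begin(parents), end(parents), 0); }

    int find(int x)
    {
        last_steps = 0;
        while (x != parents[x]) { x = parents[x]; ++last_steps; }
        return x;
    }

    void merge(int x, int y)
    {
        const auto px = find(x);
        const auto py = find(y);
        if (px != py)
            parents[px] = py;
    }
};

// Merging consecutive nodes in order produces the degenerate chain
// 0 -> 1 -> ... -> n-1; find(0) then follows n-1 links.
int chain_find_steps(int n)
{
    naive_disjoint_set ds(n);
    for (int i = 0; i + 1 < n; ++i)
        ds.merge(i, i + 1);
    ds.find(0);
    return ds.last_steps;
}
```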

This implementation can be enhanced in two key ways. The first improvement, known as union by rank, involves always attaching the smaller tree to the root of the larger tree, rather than the other way around. This can be done by maintaining a vector of ranks or sizes. In this case, we use sizes, as it’s often useful to track the number of nodes in each component. This improvement affects the merge function:

void merge(int x, int y)
{
    const auto px = find(x);
    const auto py = find(y);
    if (px == py)
        return;

    if (sizes[px] >= sizes[py])
    {
        sizes[px] += sizes[py];
        parents[py] = px;
    }
    else
    {
        sizes[py] += sizes[px];
        parents[px] = py;
    }
}

The second improvement, called path compression, focuses on flattening the tree structure whenever find is called. The idea is that each node visited while traversing the tree to find the root can be directly connected to the root, as all nodes in the same component share the same representative. To implement this, as find traverses up the tree, it updates the parent reference of each node to point directly to the root. This results in a much flatter tree, which speeds up future operations - not just for the nodes involved, but also for any other nodes indirectly referencing them. Here is the improved find:

int find(int x)
{
    if (parents[x] != x)
    {
        parents[x] = find(parents[x]);
    }
    return parents[x];
}
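To see the effect, here is a hedged sketch that pairs the naive merge with this path-compressing find, instrumented with a link counter (`last_steps` is only an illustration aid): after building the worst-case chain, the first find(0) pays the full depth and compresses the path, so the second find(0) follows a single link:

```cpp
#include <numeric>
#include <utility>
#include <vector>
using namespace std;

// Naive merge plus the path-compressing find, with find instrumented
// to count the parent links it follows.
struct compressing_disjoint_set
{
    vector<int> parents;
    int last_steps = 0;

    compressing_disjoint_set(int n) : parents(n)
    {
        iota(begin(parents), end(parents), 0);
    }

    int find(int x)
    {
        if (parents[x] != x)
        {
            ++last_steps;
            parents[x] = find(parents[x]); // point x directly at the root
        }
        return parents[x];
    }

    void merge(int x, int y)
    {
        const auto px = find(x);
        const auto py = find(y);
        if (px != py)
            parents[px] = py;
    }
};

// Build the degenerate chain 0 -> 1 -> ... -> n-1, then call find(0)
// twice: the first call walks (and compresses) the whole chain, the
// second follows just one link.
pair<int, int> chain_demo(int n)
{
    compressing_disjoint_set ds(n);
    for (int i = 0; i + 1 < n; ++i)
        ds.merge(i, i + 1);

    ds.last_steps = 0;
    ds.find(0);
    const int first = ds.last_steps;

    ds.last_steps = 0;
    ds.find(0);
    return {first, ds.last_steps};
}
```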

These two techniques complement each other well. When applied together, the amortized time per operation becomes very efficient - specifically, it is \(O(α(n))\), where \(α(n)\) is the inverse of the extremely fast-growing Ackermann function. Since \(α(n)\) grows so slowly, it remains less than 5 for all practically usable values of \(n\) (e.g. \(α(10^{1000}) = 4\) - still smaller than 5). Therefore, the amortized running time per operation is effectively constant, making these optimizations highly efficient in practice.

Here is the full code in C++:

#include <iostream>
#include <numeric>
#include <vector>
using namespace std;

class disjoint_set
{
public:
    disjoint_set(int n)
        : sizes(n, 1), parents(n)
    {
        iota(begin(parents), end(parents), 0);
    }
    
    int find(int x)
    {
        if (parents[x] != x)
        {
            parents[x] = find(parents[x]);
        }
        return parents[x];
    }
    
    void merge(int x, int y)
    {
        const auto px = find(x);
        const auto py = find(y);
        if (px == py)
            return;

        if (sizes[px] >= sizes[py])
        {
            sizes[px] += sizes[py];
            parents[py] = px;
        }
        else
        {
            sizes[py] += sizes[px];
            parents[px] = py;
        }
    }
private:
    vector<int> sizes;
    vector<int> parents;
};

int main()
{
    int n, start, dest, k, neighbor;
    cin >> n >> start >> dest;
    disjoint_set ds{n};
    for (auto i=0; i<n; ++i)
    {
        cin >> k;
        while (k--)
        {
            cin >> neighbor;
            ds.merge(i, neighbor);
        }
    }
    
    cout << (ds.find(start) == ds.find(dest) ? "ARR" : "NAY");
}

A similar implementation in Python:

class DisjointSet:
    def __init__(self, n):       
        self.sizes = [1] * n
        self.parents = list(range(n))

    def find(self, x):        
        if self.parents[x] != x:
            self.parents[x] = self.find(self.parents[x])
        return self.parents[x]

    def merge(self, x, y):       
        px = self.find(x)
        py = self.find(y)
        if px == py:
            return
        
        if self.sizes[px] >= self.sizes[py]:
            self.sizes[px] += self.sizes[py]
            self.parents[py] = px
        else:
            self.sizes[py] += self.sizes[px]
            self.parents[px] = py

if __name__ == "__main__":
    n, start, dest = map(int, input().split())
    ds = DisjointSet(n)
    
    for i in range(n):
        k, *neighbors = map(int, input().split())
        for neighbor in neighbors:
            ds.merge(i, neighbor)
    
    print("ARR" if ds.find(start) == ds.find(dest) else "NAY")