Preamble and goals
I have a working algorithm (and working code) for a custom reachable-vertices computation with memoization on a 3-dimensional graph, but it is not performant enough. My goal is to find ways to make this procedure faster, if possible.
Description
Consider the following problem. Let a graph G contain V nodes and E directed edges between the nodes, set up randomly. Let the graph be dense enough that E approaches V^2, and let it be allowed to contain cycles. Let K be a subset of V, over which we wish to find the set of reachable vertices for each vertex in K.
Let each vertex v_i in G have S states, s in {1, ..., S}, each state having its own reachable-vertices cache. Given this, we can picture the graph as a three dimensional graph.
As is, this algorithm is implemented using an iterative DFS that interacts with a cache system. This cache system is simply a structure that holds a cache for each vertex state on the graph. The cache for a vertex contains the reachable vertices for that vertex at that state.
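A simplified sketch of what I mean by the cache system (the type names, the hash function, and the flat unordered_map layout here are illustrative, not my exact implementation):

#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Identifies one (vertex, state) pair, i.e. one node of the "3D" graph.
struct VertexState {
    uint32_t vertex;
    uint32_t state;
    bool operator==(const VertexState&) const = default;
};

struct VertexStateHash {
    size_t operator()(const VertexState& vs) const {
        // assumes a 64-bit size_t
        return (static_cast<size_t>(vs.vertex) << 32) | vs.state;
    }
};

// One cache per (vertex, state): the vertices reachable from it.
struct ReachabilityCache {
    std::vector<VertexState> reachable_vertices;
    bool complete = false; // set once the DFS has fully finished this node
};

using CacheSystem = std::unordered_map<VertexState, ReachabilityCache, VertexStateHash>;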
The problem
My algorithm is critically slow for a non-negligible set of graph inputs because it spends ~80% of its time sorting. This is especially evident in graphs that have cycles and where E approaches V^2.
The way I implemented it, each time I end the visit on a vertex during the search:
- I take the reachable vertices from the cache of that vertex and sort-and-compact them, so that vertices appearing in different states and duplicated vertices can be compacted together.
- I then append the currently visited vertex to the parent's reachable-vertices cache.
- I append the reachable vertices from the current vertex's cache to the cache of its parent.
procedure EndVisit(X):
reachable_vertices = cache_get(X)->reachable_vertices_get // N elements
sortAndCompact(reachable_vertices) // M unique elements | M <= N
// parent from the current DFS stack:
parent_reachable_vertices = parent_cache_get(X)->reachable_vertices_get
parent_reachable_vertices.append(X)
parent_reachable_vertices.append(reachable_vertices)
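In C++ terms, EndVisit boils down to something like the sketch below (the CacheEntry type and plain int vertex-state ids are stand-ins for my real cache and encoding; the compaction is shown as std::sort followed by std::unique/erase):

#include <algorithm>
#include <vector>

// Stand-in for one cache entry: the reachable vertex-state ids accumulated
// so far for one (vertex, state), possibly with duplicates.
struct CacheEntry {
    std::vector<int> reachable_vertices;
};

// Sort the accumulated ids and drop duplicates in place.
void sortAndCompact(std::vector<int>& v) {
    std::sort(v.begin(), v.end());                      // the O(N log N) hot spot
    v.erase(std::unique(v.begin(), v.end()), v.end());  // keep one of each run
}

// EndVisit(X): compact X's cache, then push X and its reachable set
// into the cache of X's parent on the current DFS stack.
void endVisit(int x, CacheEntry& x_cache, CacheEntry& parent_cache) {
    sortAndCompact(x_cache.reachable_vertices);
    parent_cache.reachable_vertices.push_back(x);
    parent_cache.reachable_vertices.insert(
        parent_cache.reachable_vertices.end(),
        x_cache.reachable_vertices.begin(),
        x_cache.reachable_vertices.end());
}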
Iterative example of the algorithm and considerations
Consider the following trivial graph with cycles.
We wish to find the reachable vertices of vertex B (in state 1) and vertex A (in state 1). This is what happens when running my modified iterative DFS procedure (note that the path the DFS takes, i.e., which children it visits first, is random):
call ReachableVerticesGet(vertex: B, state: 1):
- [b1]
- [b1,c1]
Call EndVisit ==> c1 {}, b1 {c1}
- [b1]
- [b1,a1]
- [b1,a1,b2]
- [b1,a1,b2,c1]
Call EndVisit ==> c1 {}, b2 {c1}
- [b1,a1,b2]
- [b1,a1,b2,a1] Cycle detected, we mark it
Call EndVisit ==> a1 {}, b2 {c1,a1}
- [b1,a1,b2]
Call EndVisit ==> b2 {c1,a1}, a1 {b2,c1,a1}
- [b1,a1]
- [b1,a1,d1]
- [b1,a1,d1,b2]
- [b1,a1,d1,b2,c1]
Call EndVisit ==> c1 {}, b2 {c1,a1,c1}
- [b1,a1,d1,b2]
- [b1,a1,d1,b2,a1] Cycle detected, we mark it
Call EndVisit ==> a1 {b2,c1,a1}, b2 {c1,a1,c1,a1,b2,c1,a1}
- [b1,a1,d1,b2]
Call EndVisit ==> b2 {b2,c1,a1}, d1 {b2,b2,c1,a1}
- [b1,a1,d1]
Call EndVisit ==> d1 {b2,c1,a1}, a1 {b2,c1,a1,d1,b2,c1,a1}
- [b1,a1]
Call EndVisit ==> a1 {b2,c1,a1,d1}, b1 {c1,a1,b2,c1,a1,d1}
- [b1]
Call EndVisit ==> b1 {c1,a1,b2,d1}
return {c1,a1,b2,d1}
call ReachableVerticesGet(vertex: A, state: 1):
A's cache contains {b2,c1,a1,d1}
return {b2,c1,a1,d1}
Now, this algorithm works. I have many tests, including weird graphs with cycles and randomly constructed graphs, and all of them pass. There are some degenerate cases where the algorithm runs for an abnormally long time despite the graph being fairly small, but nothing critical; it runs correctly for the vast majority of cases. The problem is that this algorithm is wildly problematic in terms of performance.
- First, if I remove the SortAndCompact() call from each EndVisit(), RAM usage spikes to >32 GB very quickly with no sign of stopping (there is no memory leak).
- Second, I call std::sort every time I end the visit on a vertex, and in graphs that have cycles and vertices with multiple states, like the one in this example, the sort is sometimes called multiple times for the same vertex. Of course, I would love not to call std::sort at all, since it is O(N log N), but the more I stared at graphs for the problem I'm trying to solve, the more std::sort made sense at the time. There must be better solutions, nevertheless.
What I have tried so far
- The most obvious: use a set/unordered_set instead of a vector for the reachable-vertices cache at each vertex. However, for my use cases this does not give any reasonable improvement, even with custom hash functions. Profiling shows the time simply migrates from std::sort to std::set::insert.
- I tried parallelizing the algorithm by cloning the caches, but merging the caches afterwards was a nightmare and I dropped that line of investigation; there is too much shared state.
- I tried using std::pmr (polymorphic memory resources) to build an unordered_map on a stack buffer every time I would otherwise call std::sort; the map is used to compact the duplicates and replaces std::sort (a sketch of this variant is shown after this list). This gives ~25% improvement in many cases, which is still not satisfactory in my opinion.
- I considered running a polling system that accumulates reachable-vertices caches and invokes SortAndCompact() on them when system RAM reaches a certain threshold. This was a big hit on performance and I dropped the idea as well.
- Most of the input graphs in my application have strongly connected components, so I tried some preprocessing that sets vertex priorities for the DFS accordingly, but without quantifiable improvement.
- I considered rewriting everything from scratch with a new algorithm, but that would take a lot of time, so I opted to look for ways to improve the performance instead.
- I have tried a wide range of different things, and at this point I have been on this for many months.
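For reference, the pmr variant from the list above looks roughly like this (a sketch only: the 64 KB buffer size is arbitrary, I use std::pmr::unordered_set rather than the unordered_map for brevity, and the output ends up deduplicated but unsorted):

#include <array>
#include <cstddef>
#include <memory_resource>
#include <unordered_set>
#include <vector>

// Duplicate removal without std::sort: build a hash set backed by a stack
// buffer (monotonic_buffer_resource), then rewrite the vector with the
// unique ids. Overflowing the buffer falls back to the default resource.
void compactWithPmr(std::vector<int>& reachable_vertices) {
    std::array<std::byte, 64 * 1024> buffer;  // stack scratch space
    std::pmr::monotonic_buffer_resource arena(buffer.data(), buffer.size());

    std::pmr::unordered_set<int> seen(&arena);
    seen.reserve(reachable_vertices.size());

    std::vector<int> unique_ids;
    unique_ids.reserve(reachable_vertices.size());
    for (int id : reachable_vertices) {
        if (seen.insert(id).second)  // first time we see this id
            unique_ids.push_back(id);
    }
    reachable_vertices.swap(unique_ids);
}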
Overall I have tried many tricks involving:
- reducing the calls to std::sort
- eliminating the calls to std::sort
- improving the speed of the calls to std::sort
but I am at a loss for ideas. I fear that I may have to replace this reachability algorithm altogether, but I'm looking for a second opinion before I drop it entirely.