I recently found a framework named ecto.
In this framework, a basic component named “plasm”, which is the ecto Directed Acyclic Graph.In ecto, plasm can be operated by ecto scheduler.
I am wondering what’s the advantage of this mechanism, and in what other situations can we exploit the concept of DAG?
4
Nice Question.
- Code may be represented by a DAG describing the
inputs and outputs of each of the arithmetic operations performed
within the code; this representation allows the compiler to perform
common subexpression elimination efficiently. - Most Source Control Management Systems implement the revisions as a
DAG. - Several Programming languages describe systems of values that are
related to each other by a directed acyclic graph. When one value
changes, its successors are recalculated; each value is evaluated as
a function of its predecessors in the DAG. - DAG are handy in detecting deadlocks as they illustrate the
dependencies amongst a set of processes and resources. - In many randomized algorithms in computational geometry, the
algorithm maintains a history DAG representing features of some
geometric construction that have been replaced by later finer-scale
features; point location queries may be answered, as for the above
two data structures, by following paths in this DAG. - Once we have the DAG in memory, we can write algorithms to
calculate the maximum execution time of the entire set. - While programming spreadsheet systems, the dependency graph that
connects one cell to another if the first cell stores a formula that
uses the value in the second cell must be a directed acyclic graph.
Cycles of dependencies are disallowed because they cause the cells
involved in the cycle to not have a well-defined value. Additionally,
requiring the dependencies to be acyclic allows a topological order
to be used to schedule the recalculations of cell values when the
spreadsheet is changed. - Using DAG we can write algorithms to evaluate the computations in
the correct order.
EDIT :
- Ordering of formula cell evaluation when recomputing formula values
in spreadsheets can be done using DAGs - Git uses DAGs for content storage, reference pointers for heads,
object model representation, and remote protocol. - DAGs is used at Trace scheduling: the first practical approach for
global scheduling, trace scheduling tries to optimize the control
flow path that is executed most often. - Ecto is a processing framework and it uses DAG to model processing
graphs so that the graphs do ordered synchronous execution. Plasm in
Ecto is the DAG and Scheduler operates on it. - DAGs is used at software pipelining, which is a technique used to
optimize loops, in a manner that parallels hardware pipelining.
Good Resources :
- http://www.biomedcentral.com/1471-2288/8/70
- http://www.ncbi.nlm.nih.gov/pubmed/12453109
- http://www.ericsink.com/vcbe/html/directed_acyclic_graphs.html
- http://xlinux.nist.gov/dads/HTML/directAcycGraph.html
9
The answer is that it doesn’t have much of anything to do with programming. It has to do with problem solving.
Just like linked-lists are data structures used for certain classes of problems, graphs are useful for representing certain relationships. Linked lists, trees, graphs, and other abstract structures only have a connection to programming in that you can implement them in code. They exist at a higher level of abstraction. It’s not about programming, it’s about applying data structures in the solution of problems.
If still you want some relationship with programming then please consider following points:
- DAG (known as Wait-For-Graphs – more technical details) are handy in detecting deadlocks as they illustrate the dependencies amongst a set of processes and resources (both are nodes in the DAG). Deadlock would happen when a cycle is detected.
- Once you have the DAG in memory, you can write algorithms to:
- make sure the computations are evaluated in the correct order (topological sort)
- if computations can be done in parallel but each computation has a maximum execution time, you can calculate the maximum execution time of the entire set
1
Other people have applied DAG to data, but I think it is at least as applicable (if not more so) to code. Mahbubur R Aaman mentions this, so really this is more of an addendum to his answer than a complete answer on its own.
It occurs to me than any imperative computer program that is free of infinite loops (thanks @AndresF.) is a Directed Acyclic Graph (DAG). Meaning that the possible paths of execution of the code are directed (first this, then that), and acyclic (not forming infinite loops). They are a graph because the path through any significant code is rarely as simple as a list or a tree.
I worked in XSLT for maybe 4 years. I had a terrible time trying to explain why it was not a good general purpose programming language, but DAG is the reason. Specifically, XSLT is a data driven language. You define functions (yes, in the functional-programming sense) but you don’t necessarily call these functions from your code. Rather, XSLT sets up a combination of selection of, and iteration through, the nodes of an input XML document. This lets the structure of the input data determine which functions are called and in what order.
This was very interesting and very cool until your program encountered a data condition that you didn’t test for at 2:30 AM and you had to wake up and fix it. When you let the data define the DAG, then the definition of the DAG becomes all the possible input conditions – which for any non-trivial business application are beyond incalculable; they are unimaginable.
At first I thought that functional programming may not be a DAG because execution order is sometimes not clear to, or even thought about by the programmer. But a functional program does define dependencies. In fact, the declarative nature of functional programming could be thought of as defining only dependencies (a^2 = b^2 + c^2) without specifying execution order (it does not matter whether ‘b’ or ‘c’ is squared first, so long as they are both squared before being added together).
But while Functional programming may be deliberately vague about order of operations at a detailed level, it is exquisitely clear about dependencies. These are the very features that make it so amenable to concurrency. In any case, there is still a graph of paths through the code, and that graph is still directed (dependencies must be evaluated before dependent tasks), so I think DAG applies there as well.
Nice question – thanks for posting!
2
Currently DAG is underrated in programming. Historically many things related to development were made with trees and hierarchies because moving something in a box is convenient for our brain to make complex things easier to manage. But if you look at events and how they depend on other events and states then you will get DAG because anything in our life and in the program can depend on anything in the past but not in the future, so you would get perfectly “acyclic” relations to be applicable to DAG concept. Although this rarely used explicitly in development, having this in mind would help to understand things better
I am wondering what’s the advantage of Plasm in Ecto…
DAG can be used to model a collection of tasks in a sequence with the constraint that certain tasks must be done before the others. Ecto is a processing framework and it uses DAG to model processing graphs so that the graphs do ordered synchronous execution. Plasm in Ecto is the DAG and Scheduler operates on it.
in what other situations can we exploit the concept of DAG?
- DAWG is a data structure that represents a set of strings and allows for a query operation that tests whether a given string belongs to the set in time proportional to its length.
- Git uses DAGs for content storage, reference pointers for heads, object model representation, and remote protocol.
1
As a real world example, our software is similar to an IDE where the end user can define a series of operations to be performed on an image (machine vision inspection). These inspections can have dependencies on other inspections or may have inspections dependent upon them. Since this is all configurable by the end user, we cannot make optimizations for parallel processing at design time. By representing these inspections and dependencies as a DAG, we can optimize the parallelism of the overall inspection for maximum performance at run time.
Just for another example, the memory management rules in Cocoa apps are made so that all strong references form a directed acyclic graph, which is done to guarantee the absence of leaks.
Adding another answer as have not seen a reference to build systems like make
which uses DAG to find out dependencies for building.
More details here
5