Uneven Ramblings

Lasp and the Google Summer of Code

2016-08-16


Update

If you want to read a more formal explanation, we wrote a paper about our optimization. Grab a copy from arXiv.

Also, I'll be speaking at AGERE! later this year, co-located with SPLASH 2016. If you're attending, please feel free to drop by and say hi!


Over the past few months, I've been working on Lasp as part of the 2016 Google Summer of Code. Here I want to share some of my experiences, as well as give an overview of what I worked on during that time.

Lasp is a new programming model designed to simplify large scale, fault-tolerant, distributed programming. Lasp programs follow a dataflow, functional approach. The runtime seamlessly adapts to a dynamic network topology and replicates your data to different nodes.

My work focused mostly on run-time optimizations, in particular on applying deforestation techniques to remove intermediate values from computations.

I spent most of my time adding ways to monitor programs as they execute (we even added a control flow visualization tool that lets you see how different functions and values interact). I also had to modify some internal libraries that we depended on to support our optimizations.

The list with all the changes I made to the main repository is here; the list of changes to the supporting library, gen_flow, can be found here.

Some Background

What do I mean by intermediate values? And what is deforestation? (skip ahead if you are already familiar with these terms). Imagine the following piece of code:

> sum (map inc [1..10])
65

Here, we increment all values in a list, and then compute the sum of all the elements. We could imagine the above to execute as follows:

sum (map inc [1..10])
     ---------------   
       |
       V
sum [2..11]

We can already see a problem. We don't care about [2..11]; we just want the sum of that list. We call this an intermediate value. In general, every time you compose functions together, you have an implicit intermediate value that holds the result of one function and serves as an argument to the next one.

In a small program like this one, we might not care about it, but for large programs, or for intermediate values that might allocate a lot of memory, we will want to remove them. In the previous example, we could remove the intermediate list by rewriting our program like this:

> let h a m n =
    if m > n then a
    else h (a + inc m) (m + 1) n
  in
  h 0 1 10
65

Okay, so this isn't nice to look at. Maybe better names would help, but this is already harder to read and understand than our original one-liner. In addition, we have lost the ability to reuse the original sum and map functions.

We could let the compiler take care of it. In fact, the Glasgow Haskell Compiler already does, but you can also implement it with certain libraries. Programs without intermediate values are called treeless programs, hence the name deforestation: the process of transforming programs into treeless ones.
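To make this concrete, here is a small sketch (in Python, purely for illustration; this is not Lasp or GHC code) of the same transformation done by hand: the composed pipeline materializes an intermediate list, while the fused version folds increment and sum into a single pass.

```python
def inc(x):
    return x + 1

def with_intermediate(xs):
    # map produces an intermediate list before sum consumes it
    return sum([inc(x) for x in xs])

def fused(xs):
    # treeless version: no intermediate list is ever materialized
    total = 0
    for x in xs:
        total += inc(x)
    return total

print(with_intermediate(range(1, 11)))  # 65
print(fused(range(1, 11)))              # 65
```

Both functions compute the same result; the only difference is whether the list [2..11] ever exists in memory.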

Graphs, Lasp and shortcuts

Intermediate values in Lasp not only waste memory, they also waste network bandwidth, increase latency and degrade run-time performance. This means that the incentive for removing these values is even greater.

When I first joined, I had several vague ideas about how to tackle the project. I looked through a lot of papers about deforestation. After some thinking, and a fruitful meeting with one of my mentors, we arrived at a promising solution, combining graph theory and control flow analysis.

Here's a simple Lasp program (currently implemented in Erlang):

{ok, {S1, _, _, _}} = lasp:declare(orset),
{ok, {S2, _, _, _}} = lasp:declare(orset),
{ok, {S3, _, _, _}} = lasp:declare(orset),

lasp:map(S1, fun(X) ->
  X + 1
end, S2),

lasp:filter(S2, fun(X) ->
  X rem 2 =:= 0
end, S3).

This creates a dataflow computation that reads values from S1, increments the contents and writes them to S2. Then, it keeps the even values, and writes those to S3. Here, our intermediate value is S2. We can also visualize the previous program as a graph:

S1 -> (map inc) -> S2 -> (filter even) -> S3

In fact, we can represent any Lasp program as a directed acyclic graph. When such a graph encodes the dependencies between values, we call it an acyclic dependency graph.

In our previous example, we could easily spot S2 as our intermediate value. However, for larger programs, we need a way to identify these values automatically.

If we modify our previous graph, and encode the functions as edges in the graph:

   map inc   filter even
      |           |
      V           V
S1 ------> S2 ----------> S3

then we can start to form an intuition about what intermediate values look like: any value whose vertex in the dependency graph has exactly one parent and one child.

We can extend this definition to paths of arbitrary length. In S1 -> S2 -> ... -> SN-1 -> SN, the intermediate values are [S2 .. SN-1].
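This degree-counting intuition is easy to sketch in code. The following is an illustrative Python version (not the Lasp implementation, which works on its own graph representation): given a list of edges, it returns every vertex with exactly one parent and one child.

```python
from collections import defaultdict

def intermediate_vertices(edges):
    # count incoming and outgoing edges for every vertex
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    vertices = set()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        vertices.update((src, dst))
    # intermediate values: exactly one parent and one child
    return {v for v in vertices
            if in_deg[v] == 1 and out_deg[v] == 1}

# S1 -> S2 -> S3: only S2 is intermediate
print(intermediate_vertices([("S1", "S2"), ("S2", "S3")]))  # {'S2'}
```

On a longer path S1 -> S2 -> S3 -> S4, the same check finds both S2 and S3, matching the path-based definition above.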

Alright, so now we have a way to identify intermediate values in our program. The next step is to remove them. Going back to our first Lasp example:

   map inc   filter even
      |           |
      V           V
S1 ------> S2 ----------> S3

we can sidestep S2 completely by connecting S1 to S3:

   map inc   filter even
      |           |
      V           V
S1 ------> S2 ----------> S3
 |                        ^
 |________________________|
            ^
            |
  (filter even) ∘ (map inc)

Given that we have composed the old functions into a new one, we can remove S2 from the graph, as well as the intermediate edges:

 (filter even) ∘ (map inc)
           |           
           V           
S1 -----------------> S3

This is what we call edge composition. More generally, it involves a process called edge contraction, and it has been used before to speed up route planning in road networks.
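The contraction step can be sketched as follows. This is a simplified Python illustration, not Lasp's actual machinery; the graph encoding and function names are hypothetical. It composes the two edge functions into one, removes the intermediate vertex, and keeps the original functions around so the step can be undone later.

```python
def contract(edges, v):
    # find the single incoming and single outgoing edge of vertex v
    (src, f), = [(s, fn) for (s, d), fn in edges.items() if d == v]
    (dst, g), = [(d, fn) for (s, d), fn in edges.items() if s == v]
    # drop every edge touching v, add the composed shortcut edge
    contracted = {k: fn for k, fn in edges.items() if v not in k}
    contracted[(src, dst)] = lambda x: g(f(x))  # (g ∘ f)
    # remember the originals so a later cleave can restore v
    undo = {(src, v): f, (v, dst): g}
    return contracted, undo

map_inc = lambda xs: [x + 1 for x in xs]
filter_even = lambda xs: [x for x in xs if x % 2 == 0]

edges = {("S1", "S2"): map_inc, ("S2", "S3"): filter_even}
graph, undo = contract(edges, "S2")
print(graph[("S1", "S3")]([1, 2, 3, 4]))  # [2, 4]
```

The `undo` map is the key to the fault-tolerance story below: restoring the two original edges from it is exactly the vertex cleaving operation.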

Now you might read the above and ask yourself, “Aren't the intermediate values still there, implicit in the composition of the old functions?”

And you'd be right, as the Erlang VM doesn't apply deforestation to programs1. However, the cost of these values in Erlang is really small compared to the cost of intermediate values in Lasp.

What about the network?

We're still not done, however. Remember, Lasp runs the same program concurrently on different machines. When you're dealing with the network, you have to assume that some machines will get disconnected.

What would happen if we applied our optimization in a disconnected node, and discovered a new function writing to S2 when we rejoined the network?

        (filter even) ∘ (map inc)
                  |           
                  V         
(Node 1):  S1 ------------> S3
                  S2
                   ^
(Node 2):  S4 _____|
                ^
                |
            map square

When this happens, we undo our optimization (via a process called vertex cleaving) at the disconnected node:

   map inc
      |
      V
S1 ------   filter even
        |      |
        V      V
        S2 ------> S3
        ^
        |
S4 ------
     ^
     |
  map square

We are able to do this because, when we removed S2, we stored the original functions, map inc and filter even.

Can this lead to inconsistencies? What if S4 read from S2, instead of writing to it? Wouldn't it read old data? Lasp sidesteps this issue by implementing all its variables with special data types called Conflict-Free Replicated Data Types, or CRDTs2 for short. One of the many properties these data types guarantee is that, no matter the order of updates on a particular instance, the final state will always be the same.
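A toy example gives the intuition. A grow-only set (G-Set) is one of the simplest CRDTs: its merge operation is plain set union, which is commutative, associative, and idempotent, so replicas converge regardless of merge order. This sketch is for intuition only; Lasp's orset is a more sophisticated CRDT that also supports removals.

```python
def merge(a, b):
    # G-Set merge is set union: commutative, associative, idempotent
    return a | b

replica_1 = {"x"}
replica_2 = {"y", "z"}

# no matter the order in which updates are merged,
# the final state is the same
assert merge(replica_1, replica_2) == merge(replica_2, replica_1)
print(merge(replica_1, replica_2))
```

It is this convergence guarantee that makes undoing and redoing the optimization safe across temporarily disconnected nodes.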

Results and Future Work

After implementing the optimizations described above (we called the combination of edge contraction and vertex cleaving dynamic path contraction), we ran several benchmarks on different application topologies. For each application, we identified the longest path in its graph representation, and measured the time it took for an update on one endpoint to reach the other. Depending on the topology, we found a 25% to 75% decrease in end-to-end distribution latency.

While the results are very promising, there are still some things that could improve them even more. In particular, there are certain types of intermediate values that the current implementation isn't able to remove, mainly those that involve binary functions. There are also some operations that aren't currently monitored, like stream, that introduce non-determinism into the system.

With that said, I look forward to continuing to work on these issues and improving Lasp further.

All in all, it's been an amazing experience. I've received a lot of support from my mentors and teammates; the Lasp team is full of incredible people. Thank you.


1 – There has been some work done in this area, mainly by Avgerinos, Sagonas and Weinholt. ↩︎

2 – Christopher Meiklejohn has collected a great list of links and information about CRDTs. ↩︎