Akka Persistent Views

In this post I’d like to talk about some aspects of Akka persistence (still experimental at the time of writing). The official documentation does a good job explaining the concepts behind Akka persistence and how to make an actor persistent. However, I’ve noticed one thing that confuses many people when they start dealing with akka-persistence: the Persistent View. Before we talk about it, let’s start off with a pattern that will help us understand persistent views better.

CQRS (Command Query Responsibility Segregation)

The idea of the CQRS pattern is to use one model to update data and a different model to read data, instead of the more classical approach of doing CRUD (create, read, update, delete) operations in the same place. In CRUD terms, create, update and delete are Commands and read is a Query. One of the architectural patterns CQRS fits well with is Event Sourcing (you can read how Akka persistence implements this idea in the official documentation). Before looking at persistent views, let’s take a look at a non-persistent actor first.

CQRS with Non Persistent Actors

Let’s imagine there is an actor that keeps transactions in its internal state. We can use the same actor for storing new transactions and reading them. One of the issues with such a design is the need for completely different representations of the data: for instance, we might want to show all invalid transactions on an administration panel and the current balance for a particular account. But the main problem is not that the code becomes more complex. If the actor is in the middle of updating its state, it cannot be queried, which makes the system less responsive. To separate commands from queries we will most likely end up with more than one actor (each with its own state reflecting changes from the main actor). This works pretty well until the main actor (the one with all the transactions) goes down. When that happens, its state is lost forever. The other actors can still function, but their data will stay out of sync.
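To make the problem concrete, here is a minimal sketch of the naive single-actor design (plain Akka; the message types and the invalid-transaction rule are made up for the illustration):

import akka.actor.Actor

// Hypothetical messages, invented for this illustration
case class RecordTransaction(account: String, amount: Long)
case class GetBalance(account: String)
case object GetInvalidTransactions

// One actor serves both commands and queries. While it is busy
// applying updates it cannot answer queries, and if it crashes
// its in-memory transaction log is gone for good.
class NaiveTransactionActor extends Actor {
  private var transactions = List.empty[RecordTransaction]

  def receive = {
    case t: RecordTransaction =>
      transactions = t :: transactions
    case GetBalance(account) =>
      sender ! transactions.filter(_.account == account).map(_.amount).sum
    case GetInvalidTransactions =>
      // say a transaction with a non-positive amount is invalid
      sender ! transactions.filter(_.amount <= 0)
  }
}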

Persistent Views and CQRS

To make it clear: the Persistent View is the Query part of CQRS. The persistent actor is used for managing state (persisting events, deleting messages, recovering from the journal, saving snapshots, etc.). Persistent views poll messages from the persistent actor’s journal directly (instead of being coupled to the persistent actor itself). All they have to know is the identifier of the persistent actor. If the persistent actor dies, it does not affect the persistent views, as they can still serve data from the journal. When the persistent actor recovers, the persistent views become consistent eventually. Moreover, a persistent view can use other sources to optimize data presentation. Basically, it’s as simple as that: use persistent actors for handling commands and persistent views for queries.

Persistent Views in Action

The working demo project can be found here. The source code consists of three files: one is the persistent actor and the other two are persistent views. TransactionActor is a persistent actor that persists transactions. All the machinery in the source code is about maintaining and recovering state. InvalidTransactionsView is a persistent view that shows all invalid transactions; you can see how it’s decoupled from the TransactionActor. The other persistent view is BalanceView. Instead of keeping the list of transactions, it keeps a map that stores the current amount per account. In the traditional approach, both views would be methods of the same object. With separate views, even if the persistent actor and one of the views die, the remaining view can still process queries.
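If you don’t want to jump to the repository right away, here is a heavily trimmed sketch of the idea (based on the Akka 2.3-era API; the event type and state handling are simplified and may not match the demo code exactly):

import akka.persistence.{PersistentActor, PersistentView}

// Simplified event type for the sketch
case class TransactionRecorded(account: String, amount: Long)

// The command side: persists events to its journal
class TransactionActor extends PersistentActor {
  override def persistenceId = "transactions"

  override def receiveCommand = {
    case evt: TransactionRecorded =>
      persist(evt) { _ => /* update internal state here */ }
  }

  override def receiveRecover = {
    case _: TransactionRecorded => // rebuild state from the journal
  }
}

// The query side: reads the same journal, knowing only the persistenceId
class BalanceView extends PersistentView {
  override def persistenceId = "transactions" // journal of TransactionActor
  override def viewId = "balance-view"        // identity for the view's own snapshots

  private var balances = Map.empty[String, Long].withDefaultValue(0L)

  def receive = {
    case TransactionRecorded(account, amount) if isPersistent =>
      balances += account -> (balances(account) + amount)
  }
}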

I think Persistent Views play an important role when using event sourcing. They make it easier to follow the CQRS pattern than ordinary actors do. I would even say that they are as important as persistent actors. I hope the official documentation becomes more verbose about them, and that this analogy with CQRS helps develop intuition.

Scala: Function Object to the Rescue

The Function object has been around since Scala 1.0. It provides utility methods for dealing with higher-order functions. Despite the simplicity and usefulness of some of them, I’ve found that not only do many beginning developers not use them, they don’t even know about them. In this post I would like to remind you about some of the functions I find useful in certain cases.

chain

To get started, let’s introduce two extremely simple functions from Int to Int. One of them, let’s call it inc, will increase a number by one, and the other, double, will multiply a number by 2.

scala> val inc: Int => Int = x => x + 1
scala> val double: Int => Int = x => x * 2

Note that for both functions we could use the shorter definition form, like val inc: Int => Int = _ + 1. So what do we do when there is a sequence of these functions and we want to combine them? The most popular option in my practice is the following:

scala> List(inc, double, inc) reduce (_ andThen _)

This piece of code takes a sequence of functions and combines them using the andThen method, starting with the first one in the list and returning a resulting function from Int to Int. It is by no means bad code; personally, I like it. Let’s see how it can be simpler using the chain function from the Function object:

scala> Function.chain(List(inc, double, inc))

It does exactly the same as the previous snippet, but it might be more intuitive for beginning developers. If we import all the functions from the Function object first, it becomes dead simple:

scala> import Function._
scala> chain(List(inc, double, inc))

Note: if you want to use compose instead of andThen, you can still do that with chain by reversing the sequence:

scala> chain(List(inc, double, double).reverse)

tupled

Imagine there is a tuple of 2 elements, id and name, representing a user, wrapped into an Option:

scala> val user: Option[(Int, String)] = Some((1, "Bob"))

And a function, let’s call it auth, which accepts id and name as two separate arguments:

scala> val auth: (Int, String) => Boolean = (id, name) => id == 1 && name == "Bob"

We can’t just map the auth function over the user because the former expects 2 arguments while our user is a single tuple of 2 elements. One option would be to extract id and name and pass them as separate arguments to the auth function, but that requires some boilerplate. This is where the tupled function from the Function object comes in. What it does is take a function of, say, 2 arguments and convert it into a function taking a tuple of 2 elements with the types of the initial arguments. That’s exactly what we need to map the auth function over the user:

scala> user map auth.tupled
res4: Option[Boolean] = Some(true)

There are tupled functions defined for functions of arity from 2 to 5 inclusive. I rarely use tuples with more than 2 or 3 elements, so I find that sufficient.
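For instance, the arity-3 overload works like this (a quick REPL sketch; add3 is a name made up for the example):

scala> val add3: (Int, Int, Int) => Int = _ + _ + _
scala> Function.tupled(add3)((1, 2, 3))
res0: Int = 6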

Technically, in the previous example the tupled function was called on the function value itself (it’s defined both in the Function object and in the Function* traits). Another “trick” that can be useful is mapping over a Map in what many people would call a natural way (it also demonstrates the tupled function from the Function object rather than the one defined in the Function* traits). In Scala you can’t write code like this:

scala> val m = Map(1 -> "first", 2 -> "second")
scala> m map { (k, v) => s"$k:$v" }
<console>:12: error: missing parameter type
Note: The expected type requires a one-argument function accepting a 2-Tuple.
      Consider a pattern matching anonymous function, `{ case (k, v) =>  ... }`
              m map { (k, v) => s"$k:$v" }
                       ^
<console>:12: error: missing parameter type
              m map { (k, v) => s"$k:$v" }

What many developers would do is something like this:

scala> m map { case (k, v) => s"$k:$v" }
res5: scala.collection.immutable.Iterable[String] = List(1:first, 2:second)

You can achieve the same using the tupled function:

scala> m map tupled { (k, v) => s"$k:$v" }
res6: scala.collection.immutable.Iterable[String] = List(1:first, 2:second)

Maybe not so useful, but it helps to understand its application.

unlift

To illustrate the usage of unlift, let’s write a function that takes an Int and returns Some(x) if x is greater than or equal to zero and None otherwise:

scala> val f: Int => Option[Int] = Option(_) filter (_ >= 0)

What the unlift function does is turn a function A => Option[B] into a PartialFunction[A, B]. It lets us use our function anywhere a partial function is required. To make it clear, here is how our function can be used to keep only the non-negative integers in a list:

scala> import Function._
scala> List(-1,0,1) collect unlift(f)
res7: List[Int] = List(0, 1)

I don’t use this on a daily basis, but there are cases like this where it comes in very handy.

Bonus: there is an opposite function defined on PartialFunction called lift. To see how it relates to unlift, note that the following equation is always true:

scala> f == unlift(f).lift
res8: Boolean = true

uncurried/untupled

These functions are the opposites of curried and tupled respectively. I haven’t seen them used as much as the functions described above.
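A quick REPL sketch of both (curriedAdd and pairSum are names made up for the example):

scala> val curriedAdd: Int => Int => Int = x => y => x + y
scala> Function.uncurried(curriedAdd)(1, 2)
res0: Int = 3

scala> val pairSum: ((Int, Int)) => Int = { case (x, y) => x + y }
scala> Function.untupled(pairSum)(1, 2)
res1: Int = 3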

Although there is nothing new here, I’ve found that it’s easy to forget about some of the useful API provided by the Scala standard library. I hope this reminder comes in time and can save you a couple of lines of code now or in the future.

Ruby: Symbol#to_proc Is a Lambadass

It all started with the addition of the to_proc method to the Symbol class. It works pretty simply and looks even better. Instead of writing

irb > [Object, Kernel, Class].map {|cls| cls.name }
=> ["Object", "Kernel", "Class"]

it can be written as

irb > [Object, Kernel, Class].map(&:name)
=> ["Object", "Kernel", "Class"]

And what it does is just call the to_proc method on the symbol :name (which returns a Proc) and convert the proc to a block with the & operator (because map takes a block, not a proc).

A naive implementation of Symbol#to_proc would look like this:

class Symbol
  def to_proc
    Proc.new {|obj, *args| obj.send(self, *args) }
  end
end

After fixing the handling of arrays and of a possibly monkey-patched send method, you will end up with something like this:

class Symbol
  def to_proc
    Proc.new {|*args| args.shift.__send__(self, *args) }
  end
end

It’s all well understood and described all over the Internet. To understand why Symbol#to_proc is a lambadass (and what that means), let’s move on to the kinds of Ruby Procs.

Different kinds of Ruby Procs

As you know, there are two kinds of Ruby Procs: procs and lambdas. Not only do they differ in how they check arity and treat the return keyword, they also look different in irb:

Proc

irb > Proc.new {}
=> #<Proc:0x007f944a090c50@(irb):1>

Lambda

irb > lambda {}
=> #<Proc:0x007f944a08ac38@(irb):2 (lambda)>

You can see the (lambda) suffix displayed for lambdas only, and something like a context, (irb):2, for both of them. It turns out there is a third kind of proc, which I call a lambadass, but let’s talk about lambda scope, or context, first.

Scope

Procs and lambdas (which are objects of class Proc too) are closures, like blocks. The only thing important for us here is that they are evaluated in the scope where they are defined or created. It means that any block (or proc, or lambda) includes a set of bindings (local variables, instance variables, etc.) captured at the moment it was defined. A simple example demonstrating this in action:

irb > x = 1

irb > def z(x)
irb >   lambda { x }
irb > end

irb > lambda { x }.call
=> 1

irb > z(2).call
=> 2

As a method definition is a scope gate, the only binding named x known inside method z is the method parameter. To visually grab the context of a defined lambda, consider the {} to be the constructor (not the lambda keyword).

It also means that a lambda defined inside a method body knows nothing about bindings defined outside the method scope:

irb > z = 1

irb > def x
irb >   lambda { z }
irb > end

irb > x.call
NameError: undefined local variable or method `z' for main:Object

Despite the fact that the lambda was called at the top level, it was defined inside the method, where no binding named z existed. Once the scope or context is captured, it remains the same inside the block no matter where the block is called from.

Lambadass

A lambadass is a proc or a lambda which looks similar to a normal proc or lambda but behaves differently.

So let’s get back to Symbol#to_proc. Usually it is used in place of a block in methods like map. Since to_proc returns a proc, what if we want to use it standalone, like any other proc? Let’s do just that:

irb > lambda &:name
=> #<Proc:0x007fcfa305dca8>
irb > (lambda &:name).call Class
=> "Class"

And it works! It does exactly what we expect! So it looks like it is identical to

irb > lambda {|x| x.name }
=> #<Proc:0x007fcfa30465a8@(irb):8 (lambda)>
irb > lambda {|x| x.name }.call Class
=> "Class"

Cool!

But wait a minute… why does the returned Proc object look different? #<Proc:0x007fcfa305dca8> instead of #<Proc:0x007f944a090c50@(irb):1>? It’s still a Proc, it’s still a callable object, but it’s missing something. Looking at the object representation, I would say it’s missing a context. How can we check that?

Binding

All the bindings captured from the scope where a block is defined are stored in a Binding object. We can get it by calling the Proc#binding method:

irb > lambda {}.binding
=> #<Binding:0x007fcfa30363b0>

One thing we can do with a Binding object is evaluate any binding captured by the block:

irb > x = 1
irb > eval('x', lambda {}.binding)
=> 1

or

irb > x = 1
irb > lambda {}.binding.eval 'x'
=> 1

It will raise an exception if a binding with such a name is not defined, but every block (or proc) has an associated Binding object.

irb > lambda {}.binding
=> #<Binding:0x007fcfa40ab808>
irb > lambda {}.binding.eval 'y'
NameError: undefined local variable or method `y' for main:Object

Meet the Lambadass

Now let’s try to get the binding object of a proc created using Symbol#to_proc:

irb > (lambda &:name).binding
ArgumentError: Can't create Binding from C level Proc

Obviously there is something wrong with it. It turns out that Symbol#to_proc is implemented in C in MRI (Matz’s Ruby Interpreter, which is written in C). Of course it doesn’t make any sense to get the context of a C level Proc object (it would be nice though).

Let’s try other interpreters.

Rubinius

rubinius > x = 1
rubinius > lambda {}.binding.eval 'x'
=> 1
rubinius > (lambda &:name).binding.eval 'x'
NameError: undefined local variable or method `x' on name:Symbol.

We got an exception again, but this one says that a binding named x is not defined. As Rubinius (at least its Symbol#to_proc) is written in Ruby itself, let’s look at the implementation:

symbol19.rb
def to_proc
  sym = self
  Proc.new do |*args, &b|
    raise ArgumentError, "no receiver given" if args.empty?
    args.shift.__send__(sym, *args, &b)
  end
end

It looks very similar to what we defined initially. So what’s the problem? Let’s look at the error message again:

NameError: undefined local variable or method `x' on name:Symbol.

Of course there is no variable `x’ on the symbol :name! The key to understanding this is that

lambda {}

is defined right here, where the {} are, but

lambda &:name

is defined inside the Symbol class, in the to_proc method, which knows nothing about bindings defined outside the Symbol object. As a callable object it behaves correctly, but the scope is absolutely different. To understand this better, let’s take a look at the Binding objects:

rubinius > lambda {}.binding
=> #<Binding:0x179c @variables=#<Rubinius::VariableScope:0x17a0 module=Object method=#<Rubinius::CompiledCode irb_binding file=(irb)>> @compiled_code=#<Rubinius::CompiledCode __block__ file=(irb)> @proc_environment=#<Rubinius::BlockEnvironment:0x17a4 scope=#<Rubinius::VariableScope:0x17a0 module=Object method=#<Rubinius::CompiledCode irb_binding file=(irb)>> top_scope=#<Rubinius::VariableScope:0x1484 module=Object method=#<Rubinius::CompiledCode irb_binding file=.../rubinius/lib/19/irb/workspace.rb>> module=Object compiled_code=#<Rubinius::CompiledCode __block__ file=(irb)> constant_scope=#<Rubinius::ConstantScope:0x17a8 parent=nil module=Object>> @constant_scope=#<Rubinius::ConstantScope:0x17a8 parent=nil module=Object> @self=main>
rubinius > (lambda &:name).binding
=> #<Binding:0x17e0 @variables=#<Rubinius::VariableScope:0x17e4 module=Symbol method=#<Rubinius::CompiledCode to_proc file=kernel/common/symbol19.rb>> @compiled_code=#<Rubinius::CompiledCode to_proc file=kernel/common/symbol19.rb> @proc_environment=#<Rubinius::BlockEnvironment:0x17e8 scope=#<Rubinius::VariableScope:0x17e4 module=Symbol method=#<Rubinius::CompiledCode to_proc file=kernel/common/symbol19.rb>> top_scope=#<Rubinius::VariableScope:0x17e4 module=Symbol method=#<Rubinius::CompiledCode to_proc file=kernel/common/symbol19.rb>> module=Symbol compiled_code=#<Rubinius::CompiledCode to_proc file=kernel/common/symbol19.rb> constant_scope=#<Rubinius::ConstantScope:0x14cc parent=#<Rubinius::ConstantScope:0x14d0 parent=nil module=Object> module=Symbol>> @constant_scope=#<Rubinius::ConstantScope:0x14cc parent=#<Rubinius::ConstantScope:0x14d0 parent=nil module=Object> module=Symbol> @self=:name>

You can see that in the first case the module is Object and the compiled code is a block in irb, while in the second output the module is Symbol and the compiled code is the to_proc method in the file kernel/common/symbol19.rb.

Of course, if you wrap lambda &:name in another lambda, the scope of the top lambda will be Object, because it is not defined in Symbol anymore. The scope of the inner lambda, however, remains unchanged:

rubinius > (lambda &:name).binding
=> #<Binding:0x17e0 ... module=Symbol ...
rubinius > lambda { lambda &:name }.binding
=> #<Binding:0x1854 ... module=Object ...
rubinius > lambda { lambda &:name }.call.binding
=> #<Binding:0x1898 ... module=Symbol ...

JRuby

jruby > x = 1
jruby > lambda {}.binding.eval 'x'
=> 1
jruby > (lambda &:name).binding.eval 'x'
=> 1

That’s what almost everybody I asked would expect: no errors, works identically. But if you remember the self-written to_proc method, how scope works in Ruby, and the Rubinius implementation, this behaviour should be considered wrong, even though it is the only one that works without big surprises.

Epilogue

There is a Proc. Sometimes it can be a lambda: the same object with different behaviour. Different from a plain proc, but at least consistent across the interpreters. They called it lambda. They even created a new syntax for it. With the current implementation of Symbol#to_proc we have a third behaviour of Proc, one that differs across interpreters. I call it lambadass.

Scala String Interpolation - It Happened!

Scala 2.10 introduces String Interpolation

I always wanted Scala to have something like Ruby’s string interpolation. It’s a pretty small feature, and somebody would definitely call it unnecessary syntactic sugar (e.g. Java has no string interpolation at all), but it always felt just right for Scala. Starting with Scala 2.10 there is a new mechanism for it called String Interpolation (who would have thought!). The documentation can be found here, with the corresponding SIP here. It’s quite a small overview, so I recommend reading it through. Although the documentation is clear (there isn’t that much to cover, actually), I would like to highlight a few points.

String Interpolation is safe

There are three string interpolation methods available out of the box: s, f and raw. Let’s look at the s interpolator in action with variables:

scala> val x = 1
scala> s"x is $x"
res1: String = x is 1

and with expressions:

scala> s"expr 1+1 equals to ${1 + 1}"
res2: String = expr 1+1 equals to 2

So it works as expected (at least for a Ruby developer :)). Let’s try to interpolate a variable that doesn’t exist:

scala> s"name doesn't exist $name"
<console>:8: error: not found: value name
              s"name doesn't exist $name"
                                    ^

It just doesn’t compile at all! Very nice of the compiler, isn’t it? If you have read the documentation, it’s not difficult to understand why it’s safe. If not, we’ll touch on this in the next section anyway.
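For completeness, here is how the other two built-in interpolators behave (a quick sketch; the exact res numbers will vary). The f interpolator is printf-style formatting, also checked at compile time, and raw leaves escape sequences unprocessed:

scala> val height = 1.9d
scala> f"height is $height%2.2f"
res3: String = height is 1.90

scala> raw"no\nnewline"
res4: String = no\nnewline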

String Interpolation is extensible

If you’re still asking yourself what this s before the string literal is, the answer is that a processed string literal is a code transformation which the compiler turns into a method call s on an instance of StringContext. In other words, an expression like

s"x is $x"

is rewritten by the compiler to

StringContext("x is ", "").s(x)

First of all, it explains why using string interpolation is safe (see the previous section for an example): using a nonexistent variable as a parameter of a method call leads to a not found: value [nonexistent variable here] error. Secondly, it allows us to define our own string interpolators and reuse existing ones.

To see how easy it is, let’s create our own string interpolator, which will work like the s interpolator but add some debug info to the resulting string:

Creation and usage of a simple Log Interpolator
import java.util.Date
import java.text.SimpleDateFormat

object Interpolation {
  implicit class LogInterpolator(val sc: StringContext) extends AnyVal {
    def log(args: Any*): String = {
      val timeFormat = new SimpleDateFormat("HH:mm:ss")
      s"[DEBUG ${timeFormat.format(new Date)}] ${sc.s(args:_*)}"
    }
  }

  val logString = "one plus one is"
  def demo = log"$logString ${1+1}"
}

In the code above, implicit classes and extending AnyVal (so-called Value Classes) are also new features of Scala 2.10, which we’ll talk about later. Since any interpolator is in fact a method of the StringContext class, we can easily reuse existing interpolators in our own (in the example we use the s method to form the resulting string, so as not to bother implementing it in our new interpolator). The string interpolation

log"$logString ${1+1}"

will be rewritten by the compiler to

new LogInterpolator(new StringContext("", " ", "")).log(logString, 2)

which is a nice combination of new Scala 2.10 features itself.

This new technique helps in writing more readable code, is safe, and allows extending and combining existing functionality. The only limitation is that it doesn’t work within pattern matching statements, but that is going to be implemented in the Scala 2.11 release. I would call it String Interpolation with Batteries Included :)

Good News About Unicode in Erlang

Bad News (today)

If there is good news, there should be some bad news nearby. The bad news about Unicode support in Erlang is that it’s impossible to use Unicode string literals in source files, because the Erlang compiler assumes they are Latin-1 encoded. Therefore, in order to write something like "a∘b" in a source file, you have to use "a\x{2218}b" or the even uglier [$a, 8728, $b], both of which are equal to the original string literal "a∘b". Even if you save the source file as UTF-8, the compiler still assumes it’s Latin-1, and there is no way to tell it the truth so far. Another option is keeping Unicode string literals in separate files and reading them at runtime with built-in Erlang functions. (But hey, the Swedish alphabet is covered by the Latin-1 charset, and that’s definitely better than bare US-ASCII :)).

Good News (near future)

Now for the good news, and all I can do is quote the decisions affecting Erlang releases R16 and R17:

The board decided to go for a solution where comments in the code (in the same way as in Python) informs the tool chain about input file encoding formats. This means that only UTF-8 and ISO-Latin-1 encoding will be supported. All source files can be marked as containing UTF-8 encoded Unicode characters by using the same mechanism (even files read using file:consult/1), namely formalized comments in the beginning of the file.

The change to the file format will be done incrementally, so that the tools will accept Unicode input (meaning that source code can contain Unicode strings, even for binary construction), but restrictions regarding characters in atoms will remain for two releases (due to distribution compatibility). The default file encoding will be ISO-Latin-1 in R16, but will be changed to UTF-8 in R17.

Source code will need no change in R16, but adding a comment denoting ISO-Latin-1 encoding will ensure that the code can be compiled with the R17 compiler. Adding a comment denoting UTF-8 encoding will allow for Unicode characters with code points > 255 in string and character literals in R16. The same comment will allow for atoms containing any Unicode code point in R18. From this follows that function names also can contain any Unicode code point in R18.

UTF-8 BOM’s will not be handled due to their limited use.

Variable names will continue to be limited to Latin characters.

It looks like the right decision overall for those who want to use characters outside the Latin-1 character set in string literals.

Awaiting cryptic DSLs in R18 though… :)