Slow Down Your DSL Recipe in Clojure

Recipe history

The recipe should be used for making your elegant (in/ex)ternal DSL code much more slower than it could be. It is said that using suggested ingredients correctly can make your code slower up to 10 times without lack of elegance and readability. (All the examples are based on hypothetical Porter (also called Porter 1) stemming algorithm implementation).

Ingredients: Clojure, understanding of partial function application and what’s under the hood of Clojure data types.

Step I: Parse strings manually when regular expressions could do the dirty work

Just try to figure out whether ‘Y’ letter is a consonant or not. Don’t use regexp. Ever. Did you crave for slower code? Now you have it. So think twice before using regular expressions with JVM-based languages because it can make your code much faster.

Step II: Use partial functions everywhere to beautify your DSL

For example in step 2 we could use the following rule definitions:

1
2
3
{:condition (m > 1) :suffix1 "logi" :suffix2 "log"}
{:condition (m > 1) :suffix1 "ational" :suffix2 "ate"}
;; and so on

where m is a function that takes two arguments, a function and a number, and returns the partial function which is waiting for the last argument (a given word in our case) and returns the measure of the word. Something like this:

1
2
3
(defn m
  [f x]
    (partial #(%1 (measure %3) %2) f x))

where measure takes the word and produces its measure. In spite of Clojure’s prefix notation we could easily make the readable infix expressions in our DSL. It looks nice, doesn’t it? Of course it does! And what’s more important for our recipe that it evaluates slower than it could do! Of course we could avoid it and write something like this:

1
2
3
4
5
(let [m-cond (m > 1)]
  (apply to-given-word
    {:condition m-cond :suffix1 "logi" :suffix2 "log"}
    {:condition m-cond :suffix1 "logi" :suffix2 "log"}
    ;; and so on ))

It’s always faster but not so “DSLish” as the initial example.

Step III: Don’t use partial functions with map, filter and so on.

What? “To use or not to use [partial function application]” you might be asking. As always the best way is to show it by example. Imagine that we have some data structure, let’s say a vector of letters:

1
(def letters [\a \b \c \d \e])

So if we want to check if there are any letters in the other vector that equal to the last letter of the existing vector we could use the built-in function some like this:

1
(some #(= (peek x) %) [\f \g \h \i \j])

In our case it is just a great solution because it is slow :). As you might guess one of the approaches to make it faster (up to 3 times) is to use partial function instead of our anonymous function:

1
(some (partial = (peek x)) [\f \g \h \i \j])

(And yes, peek is faster than last for vectors).

Step IV: Use hashes as frequently as you can

Here is the example of rule definitions you have already seen before:

1
2
3
{:condition (m > 1) :suffix1 "logi" :suffix2 "log"}
{:condition (m > 1) :suffix1 "ational" :suffix2 "ate"}
;; and so on

You already know what to do (or not) with these (m > 1) expressions. I’m sure you know better how to make a DSL slower and the recipe tastier. But it’s not what makes the code much much slower. That’s our last ingredient - curly braces! Of course not as some additional characters or tokens in our expression but as a data structure behind it. Good Old The Java Hash Map. As using curly braces in Clojure is much more pleasant than creating these cumbersome definitions in Java (Map<Symbol, Object> rule = new HashMap<Symbol, Object>(); vs {}) it creates the illusion of some kind of magic under the hood. The trick is that it’s still The Java Hash Map and creating a bunch of them in our DSL is what makes it much much slower than it could be. So as you can see the rule is pretty simple - just create as many hashes in the form of curly braces as possible to kill the performance of any DSL.