Tuesday, July 10, 2012

Recently I needed to parse a list of reports and their recipients and create a list of the recipients with the number of reports each one received. The starting point was a CSV file listing each report alongside a semicolon-delimited list of its recipients, like so:

REPORT_NAME  RECIPIENTS
report_1     bob@bigcorp.com; alice@bigcorp.com
report_2     alice@bigcorp.com; eve@bigcorp.com
...          ...

With an end result of:

EMAIL              COUNT
bob@bigcorp.com    1
alice@bigcorp.com  2
eve@bigcorp.com    1

Pretty simple. Now, full disclosure, I ended up doing this in Excel because I needed it done quickly, but later, when I had a few minutes to spare, I went back and thought about how I would do it in Clojure. Here's what I came up with:

A couple of things to note here:

One, because we're using the ->> macro, we eliminate a lot of the tedious "b = do_something(a); c = do_something_else(b); etc." code that you're probably used to in other languages (you could, by the way, get the same result by nesting the calls, i.e. c = do_something_else(do_something(a)), but that tends to look awful and make your code unreadable). The ->> macro allows for much tidier function composition by making the result of each function evaluation the last argument to the next function call. Hence the file location becomes the argument to text-reader, resulting in a file, which becomes the argument to slurp, which produces one giant text string, and so on.
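To make the threading concrete, here is a minimal sketch that compares a nested call with its ->> equivalent. The sample string and the names recipients, nested, and threaded are made up for illustration and are not taken from the code above; the regex is just one assumed way to pull addresses out of a semicolon-delimited field.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sample value for one RECIPIENTS field.
(def recipients "bob@bigcorp.com; alice@bigcorp.com")

;; Nested calls: you have to read these from the inside out.
(def nested
  (sort (map str/lower-case (re-seq #"[^;\s]+" recipients))))

;; The same steps with ->>: each result becomes the LAST argument
;; of the next form, so the code reads top to bottom.
(def threaded
  (->> recipients
       (re-seq #"[^;\s]+")   ; pull out each run of non-separator characters
       (map str/lower-case)  ; normalize case
       sort))                ; a bare symbol is treated as (sort ...)

(println threaded)
```

Note that ->> only fits functions that take the data as their last argument, which is why this sketch uses re-seq rather than clojure.string/split (split takes the string first).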

Two, whereas in procedural code we would normally use something like a hash map with each email address mapped to a counter that is incremented each time we encounter that address, in Clojure we tend to avoid mutable state by various means; in this case, recursion. The loop and recur special forms are used for recursion. loop sets up a recursion point, complete with bindings of the kind you're used to seeing in function definitions and let bindings; recur then calls the "function" that loop set up with new values. In this case, the new values are the hash map with the emails and their respective counts, the next email address to be counted, and the remainder of the list. When the list is empty, the function prints a nicely formatted list of the emails and their counts.
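The loop/recur pattern described here might look something like the following sketch. It is not the original listing: the function name count-emails and the binding names are assumptions, and for simplicity it returns the map of counts and leaves the printing to the caller.

```clojure
;; Hypothetical helper: counts how often each address appears in `emails`.
(defn count-emails
  [emails]
  (loop [counts    {}              ; immutable map of email -> count so far
         email     (first emails)  ; next address to count
         remaining (rest emails)]  ; what is left of the list
    (if (nil? email)
      counts                       ; nothing left: return the finished map
      ;; recur jumps back to loop with new values for all three bindings;
      ;; assoc returns a new map rather than mutating the old one.
      (recur (assoc counts email (inc (get counts email 0)))
             (first remaining)
             (rest remaining)))))

;; Print one "email count" line per address.
(doseq [[email n] (count-emails ["bob@bigcorp.com" "alice@bigcorp.com"
                                 "alice@bigcorp.com" "eve@bigcorp.com"])]
  (println email n))
```

For this particular job, clojure.core also provides frequencies, which returns the same kind of map in a single call; the explicit loop is mainly useful for seeing how state is carried from one iteration to the next without mutation.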
