Archive for Languages

Clojure Protocols Part 3

Recently there have been some changes to the Clojure protocols code on GitHub. They are not huge changes, but they are enough that the examples I wrote in Part 1 and Part 2 will no longer work. I thought I’d finish out my protocol blog entries by showing how I used protocols and by including the new syntax. I also have a better understanding of how reify can be used (thanks Meikel) and will include some of that.

First, the goal of the protocol usage. I have been working on some comparisons and evaluations of triplestores. Triplestores can be used to store RDF data, which is a series of subject/predicate (or property)/object triples. There are many triplestores out there, and many of them have several interfaces. For example, Oracle has a JDBC interface that uses stored procedures and a Jena API that incorporates pieces of the Jena framework. This was pretty low-hanging fruit from an abstraction perspective. Whether inserting a new triple through Oracle JDBC, Jena (with Oracle) or one of the other triplestore implementations, the operation is the same on the surface: take this subject, predicate and object and store it. The same could be said for querying with SPARQL or deleting entries. I ended up with a protocol named TriplestoreOperations like the one below:

(ns revelytix.triplestore-operations)

(defprotocol TriplestoreOperations
  "Interface for the various operations allowed by a triple store"
  (create-graph [impl graph-name] "Creates a new graph of name graph-name")
  (delete-graph [impl graph-name] "Deletes graph graph-name if graph exists")
  (insert-quad [impl graph-name subject predicate object]
    "Creates a new triple, data is assumed to be a full URI")
  ;; ...
  )

This syntax is unchanged from the earlier posts. The first argument is used to pass in the implementation of TriplestoreOperations. The graph-name, or model in Oracle terms, is what is going to hold the triples. The protocol exists in one namespace (called triplestore-operations above) and the implementations of the protocol are in separate namespaces. The first is an Oracle JDBC implementation of TriplestoreOperations. It’s parameterized by the database connection details and the name of the table to store the data in.

(ns oracle.oracle-jdbc
  (:use clojure.contrib.sql
        triplestore-operations))

(deftype OracleJdbcOperations [db table-name] TriplestoreOperations
  (delete-graph [impl graph-name]
    (let [drop-model-string (create-sql-string DROP-MODEL-SQL graph-name)
          drop-table-string (create-sql-string DROP-TABLE-SQL table-name)]
      (with-connection db
        (with-open [drop-model-statement (.prepareCall (connection) drop-model-string)]
          (do
            (drop-entailment-if-exists db graph-name "RDFS")
            (.execute drop-model-statement)
            (do-commands drop-table-string))))))
  (create-graph [impl graph-name]
    (let [createModelString (create-sql-string CREATE-MODEL-SQL graph-name table-name)
          createTableString (create-sql-string CREATE-TABLE-SQL table-name)]
      (do (with-connection db
            (with-open [createModelStatement
                        (.prepareCall (connection) createModelString)]
              (do-commands createTableString)
              (.execute createModelStatement))))))
  (insert-quad [impl graph-name subject predicate object]
    (create-family-triple table-name db graph-name subject predicate object))
  ;; ...
  )

(defn create-oracle-jdbc-triplestore-instance [table-name]
  (OracleJdbcOperations *oracle-jdbc-props* table-name)) ;; Awkward, see below

One difference between the above code and the code in Part 1 or Part 2 is that, in the previous version of deftype, the implementation parameter disappeared from the method definitions. So the create-graph function above would have had only a single parameter. I like the change. I found the original code a little confusing and kept wondering where the first parameter went.

The next implementation of the TriplestoreOperations protocol was a Jena implementation. The code below makes use of the reify function and feels a little more like idiomatic Clojure, and less like the implementation of a protocol is something special and different from plain functions. I prefer the reify syntax over deftype and I’ve been moving my code over to use it. I’m going to cut a decent portion of the implementation below because it mostly calls Java APIs and is a bit noisy:

(ns jena-operations
  (:use triplestore-operations)
  ;; ...
  )

(defn create-jena-operations-instance [jena-support-impl]
  (reify TriplestoreOperations
    (create-graph [impl modelString] nil)
    (delete-graph [impl modelString]
      (with-triplestore-connection
        ;; ...
        ))
    (insert-quad [impl modelString subject predicate object]
      (with-triplestore-connection
        ;; ...
        ))
    ;; ...
    ))

The reify call above also creates a new instance of the protocol TriplestoreOperations, with the functions defined inline. There’s no need to create an instance of the type separately, as is done in the previous example. From a functionality perspective the end result of deftype or reify is the same; there’s just a different way to get there. Reading through some of the docs, it looks like reify is more dynamic and deftype results in generated code. One difference between Jena and the Oracle JDBC interface is that graphs don’t need to be created explicitly using Jena, so that method does nothing. The above code is also slightly different from the earlier posts in that the implementation parameter no longer disappears.

Another interesting part is that the JenaOperations instance is parameterized by another protocol called JenaSupport. What I have found is that many vendors support the Jena APIs, but they implement them slightly differently. It’s definitely not as pluggable as something like JDBC. This JenaOperations implementation is generic for the Jena APIs and is used by several triplestores with Jena implementations. The JenaSupport protocol abstracts things like getting a Jena connection and creating the correct implementation of Model, which differ from implementation to implementation.
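To make the deftype/reify comparison concrete, here is a minimal, self-contained sketch written against released Clojure (1.2 and later), where a deftype instance is created with the generated constructor `(AtomOperations. ...)` rather than by calling the type name as a function. Everything in it beyond the protocol name and the create-graph and insert-quad signatures is an assumption made for illustration: the atom-backed store, the `AtomOperations` type, the `create-reified-operations` function and the `graph-triples` helper are hypothetical and stand in for a real triplestore.

```clojure
;; Cut-down version of the protocol, plus a hypothetical read helper
;; (graph-triples) so the result can be inspected.
(defprotocol TriplestoreOperations
  (create-graph [impl graph-name] "Creates a new graph of name graph-name")
  (insert-quad [impl graph-name subject predicate object] "Stores one triple in graph-name")
  (graph-triples [impl graph-name] "Returns the set of triples stored in graph-name"))

;; deftype version: the atom is a field of the type and is visible
;; by name inside the method bodies.
(deftype AtomOperations [store]
  TriplestoreOperations
  (create-graph [impl graph-name]
    (swap! store assoc graph-name #{})
    impl)
  (insert-quad [impl graph-name subject predicate object]
    (swap! store update-in [graph-name] (fnil conj #{}) [subject predicate object])
    impl)
  (graph-triples [impl graph-name]
    (get @store graph-name #{})))

;; reify version: the methods close over the function argument instead.
(defn create-reified-operations [store]
  (reify TriplestoreOperations
    ;; Like the Jena case, graphs need no explicit creation here.
    (create-graph [impl graph-name] nil)
    (insert-quad [impl graph-name subject predicate object]
      (swap! store update-in [graph-name] (fnil conj #{}) [subject predicate object])
      impl)
    (graph-triples [impl graph-name]
      (get @store graph-name #{}))))

(def typed (AtomOperations. (atom {})))
(def reified (create-reified-operations (atom {})))

;; Callers see no difference between the two implementations.
(doseq [impl [typed reified]]
  (create-graph impl "people")
  (insert-quad impl "people" "ex:alice" "ex:knows" "ex:bob"))

(= (graph-triples typed "people") (graph-triples reified "people"))
;; => true
```

The calling code only ever touches the protocol functions, which is the property that lets one test suite run against several triplestore back ends.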

Development Gotchas

I have found a few issues when developing Clojure code that uses protocols. I’m using Leiningen and Lein Swank for development of the code. First, I found that if I had AOT compilation enabled and had run lein install, the protocol definition resulted in compiled code in the classes directory of the project. This caused a problem when I tried to change a protocol definition. I’d make a change in Emacs and load the file with the updated protocol code, and the code would behave as though I had made no change to the protocol at all. What was happening was that the old version of the code, the one that had the interface code generated, was still on the classpath in the classes directory. Removing that code (through lein clean or something similar) allowed my changes to take effect. This problem stumped me for a couple of hours. I can avoid this entirely by just not using AOT compilation (I don’t really need it), but others might not be able to.

Another gotcha I found was in the loading of files that use implementations of protocols. In the example above, let’s say I have a test file (I’ll call it test-A) that executes functions from TriplestoreOperations on the JenaOperations implementation, which in turn uses the Oracle implementation of JenaSupport. Just loading the test-A.clj file does not cause the loading of the Jena implementation of TriplestoreOperations or the Oracle version of JenaSupport. Instead, it just complains that there is no implementation of TriplestoreOperations for ‘nil’. Loading those files individually fixes the problem; it just doesn’t happen automatically for me.
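The ‘nil’ complaint can be reproduced in isolation. The sketch below assumes the underlying cause is that the caller ends up holding nil where an implementation was expected (for example because the namespace that builds it was never loaded); that diagnosis is a guess, and `missing-impl` and `error-message` are hypothetical names used only for the demonstration.

```clojure
(defprotocol TriplestoreOperations
  (create-graph [impl graph-name]))

;; Stand-in for an implementation that was never created.
(def missing-impl nil)

;; Dispatch is on the first argument, so calling a protocol function on nil
;; fails unless the protocol has been extended to nil.
(def error-message
  (try
    (create-graph missing-impl "people")
    (catch IllegalArgumentException e
      (.getMessage e))))

(println error-message)
```

If the cause is load order, naming the implementation namespaces in the `ns` form of the test file (for instance with `:require` or `:use`) should make the dependency explicit instead of relying on files having been loaded by hand.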


Clojure Protocols Part 2

Stale code warning

There have been small changes to the protocols code in Clojure. The post below is still useful, but a few details of the example code are different. See part 3 for the updated syntax.

Clojure Protocols Part 2

This is the second in the series of blog entries on Clojure protocols. The first can be found here. This entry continues by using protocols to implement Java interfaces and reify interfaces/protocols inline in a function invocation. First I’ll use reify to define an implementation of the TextOutput interface in-line of the function call. I’ll change the italics syntax to the MediaWiki italics format:

(println (output-string (reify TextOutput
                          (output-string [x] (str "''" x "''")))
                        "stuff"))
''stuff''

The acceptable things to reify are Java interfaces, protocols or Object. I’ve not yet found a use for reify in code that I have written. Regular Java interfaces can also be passed to deftype to define Clojure implementations of Java interfaces. An implementation of Comparator looks like this:

(deftype ThreeCompare [] java.util.Comparator
  (compare [o1 o2]
    (cond (= o1 3) -1
          (= o2 3) 1
          :else (.compareTo o1 o2)))
  (equals [other] (isa? other ThreeCompare)))

The deftype above implements the java.util.Comparator interface so that, when sorting a list of numbers, any values of 3 are always placed first in the list, followed by the rest in ascending order. This can be used like any Java implementation of Comparator:

(def java-list (java.util.ArrayList. (list 1 2 3 4 5 6 7 8)))
(java.util.Collections/sort java-list (ThreeCompare))
(println java-list)
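For comparison, here is a sketch of the same comparator using reify in released Clojure (1.2 and later), where every method takes the instance as an explicit first argument. The name `three-first` and the sample numbers are made up for illustration. One deliberate difference from the deftype above: this version returns 0 when both values are equal, so that two 3s compare as equal.

```clojure
(def three-first
  (reify java.util.Comparator
    ;; The first parameter is the comparator itself and is unused here.
    (compare [_ o1 o2]
      (cond (= o1 o2) 0
            (= o1 3) -1
            (= o2 3) 1
            :else (clojure.core/compare o1 o2)))))

(def numbers (java.util.ArrayList. [5 3 8 1 3 2]))
(java.util.Collections/sort numbers three-first)

(vec numbers)
;; => [3 3 1 2 5 8]
```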

A nice side benefit of deftype is something that reminded me of records in OCaml:

(deftype Point [x y])
(defn midpoint [point1 point2]
  (Point (/ (+ (:x point1) (:x point2)) 2)
         (/ (+ (:y point1) (:y point2)) 2)))
(println (midpoint (Point -1 2) (Point 3 -6)))
#:Point{:x 1, :y -2}

The above code defines a new type Point and a midpoint function that takes two points and returns a new Point that represents the midpoint of the two points.
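In released Clojure (1.2 and later), keyword lookup such as `(:x point)` is provided by defrecord rather than deftype, and instances are created with the constructor form `(Point. x y)`. A sketch of the same midpoint example under that assumption:

```clojure
;; defrecord gives map-like keyword access to the fields.
(defrecord Point [x y])

(defn midpoint [point1 point2]
  (Point. (/ (+ (:x point1) (:x point2)) 2)
          (/ (+ (:y point1) (:y point2)) 2)))

;; Records compare by type and field values.
(= (midpoint (Point. -1 2) (Point. 3 -6))
   (Point. 1 -2))
;; => true
```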

Default Implementations

One feature I was looking for when I first incorporated protocols into some existing Clojure code was the concept of a default implementation of a protocol within a namespace. I think this would be a pretty typical usage of a protocol: you might have several implementations, but generally you’re only working with one at a time. In my case, I was testing three implementations of a protocol for an integration test. I wanted to run the same tests on all three implementations of the protocol. This presented a problem because the deftest macro doesn’t allow passed-in parameters, and yet each function that I called needed to be parameterized based on the implementation I was testing. I first attacked this problem with a bound variable, and had all functions called on the protocol use the bound variable as their implementation. Then, when a switch to another implementation was needed, I’d change the implementation assigned to the bound variable. This worked for me because it was just test code, but I think this will come up more in the future.
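The bound-variable approach might look something like the sketch below. It is not the original test code: the var `*text-output*`, the two reified implementations and the `wraps-input?` check are all hypothetical, and the `^:dynamic` marker assumes Clojure 1.3 or later (in 1.2 any var could be rebound with binding).

```clojure
(defprotocol TextOutput
  (output-string [impl string]))

;; The "current implementation". Checks read this var instead of taking
;; the implementation as a parameter.
(def ^:dynamic *text-output* nil)

(def italics
  (reify TextOutput
    (output-string [_ string] (str "_" string "_"))))

(def wiki-italics
  (reify TextOutput
    (output-string [_ string] (str "''" string "''"))))

;; A parameterless check, in the spirit of a deftest body.
(defn wraps-input? []
  (let [out (output-string *text-output* "stuff")]
    (.contains ^String out "stuff")))

;; Run the same check once per implementation by rebinding the var.
(def results
  (doall
    (map (fn [impl]
           (binding [*text-output* impl]
             (wraps-input?)))
         [italics wiki-italics])))
;; results is (true true)
```

Using binding rather than redefining the var keeps the switch scoped to the body of the form, so a failure in one run cannot leave the wrong implementation in place for the next.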


Clojure Protocols Part 1

Stale code warning

There have been small changes to the protocols code in Clojure. The post below is still useful, but a few details of the example code are different. See part 3 for the updated syntax.

Clojure Protocols

Protocols are a new feature in Clojure, set to be released in the next version. They provide polymorphism in a very Clojure-ish way. I think it’s a great lightweight polymorphism implementation that has a lot of potential. In true Clojure style, I think it meets the polymorphism objective and yet doesn’t require you to totally change the way you already write your code in Clojure. I’m breaking this entry into more than one piece to show some different ways that Clojure protocols can be used. Because the feature is so new, there are not a lot of docs out there on it, but Rich has written good documentation on the macros themselves. If you want to try these examples, make sure you’re running the 1.2 version of Clojure (from Clojars or a local build from the Clojure git repo). First I’ll start by defining a simple protocol:

(defprotocol TextOutput
  (output-string [x string]))

In Java terms, I’m defining a TextOutput interface (actually a Java interface is being created, but more on that later), that has a single function named output-string that includes no implementation details. The input to this function is a little tricky though. I specified a parameter x and another one called string. The first parameter will be used to pass the implementation of the interface into the function. You don’t need to write code to handle the parameter x and when you write your implementation, you’ll act like it doesn’t exist. A wiki type text output of an italics string would look like:

(deftype ItalicsOutput [] TextOutput
  (output-string [string] (str "_" string "_")))

I have begun thinking about this in Java terms as a class ItalicsOutput that implements the TextOutput interface. Here in the output-string function, I only specify one parameter (not two). Next you can use this implementation with the following code:

(output-string (ItalicsOutput) "stuff")
"_stuff_"

I’m telling Clojure I want it to execute the output-string function on the implementation (ItalicsOutput) (more on this below) with the argument “stuff”. I think the version below is a little more readable:

(def italics-impl (ItalicsOutput))
(output-string italics-impl "stuff")
"_stuff_"

This just assigns the instantiated implementation to a variable, which can then be used. These implementations can also have parameters, like:

(deftype PrefixedOutput [prefix-string] TextOutput
  (output-string [string] (str prefix-string " " string)))

I think passing a variable in makes the instantiation step make a little more sense:

(def prefix-with-more (PrefixedOutput "more"))
(output-string prefix-with-more "stuff")
"more stuff"

Both implementations can be used together as well:

(defn print-all []
  (let [italics-impl (ItalicsOutput)
        prefix-with-more (PrefixedOutput "more")]
    (println (output-string italics-impl "stuff"))
    (println (output-string prefix-with-more "stuff"))))

With output that would look like:

(print-all)
_stuff_
more stuff
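For readers on released Clojure (1.2 and later), the examples in this post need two adjustments: each method implementation names the instance as an explicit first parameter, and instances are created with the constructor form `(ItalicsOutput.)` instead of calling the type name as a function. A sketch of the whole example under those rules:

```clojure
(defprotocol TextOutput
  (output-string [this string]))

(deftype ItalicsOutput []
  TextOutput
  (output-string [this string] (str "_" string "_")))

(deftype PrefixedOutput [prefix-string]
  TextOutput
  ;; Fields of the type are visible by name inside method bodies.
  (output-string [this string] (str prefix-string " " string)))

(defn print-all []
  (let [italics-impl (ItalicsOutput.)
        prefix-with-more (PrefixedOutput. "more")]
    (println (output-string italics-impl "stuff"))
    (println (output-string prefix-with-more "stuff"))))

(print-all)
;; prints:
;; _stuff_
;; more stuff
```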


SICP – Chapter 1

I have begun reading through Structure and Interpretation of Computer Programs with a study group (an offshoot of the Lambda Lounge). It’s a classic computer science textbook that I’ve wanted to read for a while now, and I’m amazed that I have avoided reading it thus far. Maybe I missed the SICP cut-off because the book is older. I have to say, I’m impressed with the pace of the book. It’s partially a function of the language, but I really like how the text gets to the bare metal, in that it builds everything from the ground up. Scheme allows it to do this because many things that are syntax in other languages (like basic arithmetic operations) are not syntax in Scheme. Languages like Java have operations like addition and division built into the syntax of the language. My answers to the exercises in the textbook are here. They are written in Clojure and I’ve been pretty surprised at how closely Clojure code corresponds with Scheme code.

The Good Stuff

As I described above, I think Abelson and Sussman getting to the bare metal in terms of Scheme is a real benefit. I’m sure it took a lot of restraint to not use the fancy macros or functions early on and to start small. I really liked the way that they described the benefits of tail recursion. I have been asked on several occasions to give such a description. I have to say their approach using visuals is much better than mine. I will definitely be borrowing theirs when asked that question in the future. Building on that, I thought that the exercises in 1.2 did a good job of covering how to go about making something tail recursive. I think their coverage of higher order functions was thorough, and I look forward to them revisiting the flexibility of this in the coming chapters.

The Bad Stuff

I thought their coverage of recurrences and asymptotic growth was particularly bad. I think recurrences are a very tough topic, and section 1.2.2 barely skimmed the surface, which is not enough to give an exercise like 1.13. Maybe the students at MIT had a prerequisite that covered that or something, but I spent many hours in grad school trying to understand recurrences and I know I would have been drowning with such light coverage of the topic. Maybe asymptotic growth will be covered in depth in another section, but I thought that just skimmed the surface as well. The only other negative comment I have is that the amount of math makes the book less approachable. I know why they did it: it’s the only base that could be built upon easily for them to use their bare metal sort of approach. I also don’t think you need a very extensive math background to read it; they don’t come in expecting a whole lot. But math is intimidating to many people, and just having something that smells and looks like hard math will turn people away.

Conclusion

In conclusion, chapter 1 from SICP has been worth the time and definitely a good start for a foundation in computer science. I think that the first chapter is a good read for any software developer. I’m looking forward to continuing with the rest of the book.


Worse is Better and Clojure

I’ve been writing code in Clojure now for a few weeks and I’m really enjoying the simplicity and power of the language. I think that the progress being made right now in the Clojure community is great and that there are definitely good things to come. I couldn’t help but think back to the Worse is Better series of papers during the first week or so I was learning the language. For those who haven’t read the paper, I’d highly recommend it, along with the rebuttal found here and another here. I wrote a blog entry about it about 3 or 4 years ago, but unfortunately it looks like it’s been taken down. It was on a company blog and it looks like it’s been replaced with another blog system.

I remember reading the article for the first time and realizing how right Richard Gabriel was and how I wanted him to be wrong. The realization that the best solution to the problem isn’t always the right solution floored me. As someone who enjoys a hard problem and tries hard to come up with the best solution I can, I found the C analogy very thought provoking. This brings me to Clojure. Clojure seems like it might well be the compromise talked about in Worse is Better, yet with enough of the essence of Lisp to still have the right solution. There’s no doubt that the Clojure folks have had to make some compromises to fit into the mould of the JVM. An example is tail recursion in Clojure, implemented via the recur special form. As a user of the language, I obviously would prefer tail calls to just work, without me having to tell it. That is a hard problem in the context of the JVM, so I understand the decision. This felt to me like the PC loser-ing problem in the context of Worse is Better. Although the right decision might be to crack the hard problem or, worse yet, wait for tail calls on the JVM, this seems like a small trade-off that is still workable.
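To show what the recur trade-off looks like in practice, here is a small sketch; the function names and the summing example are made up for illustration. The first version recurses normally and consumes a JVM stack frame per call, so a large enough input is likely to overflow the stack. The second marks the tail call explicitly with recur, which the compiler turns into a loop.

```clojure
;; Ordinary recursion: the addition happens after the recursive call returns,
;; so every call needs its own stack frame.
(defn sum-to [n]
  (if (zero? n)
    0
    (+ n (sum-to (dec n)))))

;; Explicit tail call: recur rebinds i and acc and jumps back to loop,
;; so the stack does not grow.
(defn sum-to-iter [n]
  (loop [i n
         acc 0]
    (if (zero? i)
      acc
      (recur (dec i) (+ acc i)))))

(sum-to 100)          ;; => 5050
(sum-to-iter 1000000) ;; => 500000500000
```

The compiler rejects recur when it is not in tail position, so the extra keyword also works as a check that the call really is a tail call.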

Another good call by the Clojure folks, in my opinion, is the Java integration. Below is a quote from Worse is Better on integration:

In the worse-is-better world, integration is linking your .o files together, freely intercalling functions, and using the same basic data representations. You don’t have a foreign loader, you don’t coerce types across function-call boundaries, you don’t make one language dominant, and you don’t make the woes of your implementation technology impact the entire system.

Sound familiar? Not only is calling Java from Clojure seamless, there’s actually syntax sugar (through macros) to make calling Java code easier. There’s no need to convert everything over to a specific Clojure object format or anything like that; it just works. You might have to make a Java collection seq-able or something similar, but it’s pretty minimal fuss. There are also facilities for Clojure code to create Java proxies and Java interfaces (though I’ve not used them). This allows Java code to integrate with Clojure code. It seems to me that the Java integration in Clojure very much fits with the quote from Richard Gabriel. I believe this tight integration will be the path into Clojure for many developers.


Emacs Talk Online

A video of the talk I gave at Lambda Lounge last Thursday can be found here.


Design By Contract with Clojure

I just learned about the design by contract features of Clojure, and I’m impressed by the simplicity. It’s implemented using regular Clojure metadata (i.e. no new language constructs to support this). {Small correction to this previous statement. It looks like metadata, and can be read as metadata, but is actually compiled into the function (i.e. can’t be modified at runtime). Thanks for the correction Alex.} Several times I have wanted DbC in Java and have tried some of the libraries written for Java. The Java ones were generally built on comments or annotations. Bottom line is that they just didn’t feel like they integrated seamlessly into the language, and they seemed to have a short shelf life. By short shelf life, I mean there were a lot of proofs of concept and abandoned projects, but none that were viable over the long term.

DbC in Clojure

Pretty slick how it’s implemented in Clojure. First we take a normal function definition:

(defn pos-add [& args]
  (apply + args))

It doesn’t really do anything interesting, just delegates to the plus operator, but it should only be used for non-negative integers. If you’ve not seen the & symbol, it just collects all remaining function arguments into a sequence. So a precondition of this function is that all arguments passed into pos-add should be zero or greater. To add this, the code looks like:

(defn pos-add [& args]
  {:pre [(not-any? neg? args)]
   :post [(<= 0 %)]}
  (apply + args))

So there are two new pieces. The first is :pre, which takes a vector of expressions; all of the expressions must return true for the precondition to pass. In the example above, there is only one expression, and it ensures that there are not any negative numbers among the arguments. The second is :post, which works the same way on the return value (available as %); here it ensures that the result is 0 or greater. The postcondition isn’t of much value here, but I added it to demonstrate where it would go. Calling the function is the same as calling any other function, but if the pre/post conditions are not met, an AssertionError is thrown. Below are some basic tests for the function:

(is (= 10 (pos-add 1 2 3 4)))
(is (zero? (pos-add 0 0 0 0)))
(is (= 5 (pos-add 1 1 2 1)))
(is (thrown? AssertionError (pos-add 1 2 3 -4)))
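Outside of a test, the same behaviour can be observed by catching the AssertionError directly. The sketch below repeats the pos-add definition so that it stands alone; the `try-pos-add` wrapper and the `:contract-violated` marker are hypothetical names used only to make the outcome visible.

```clojure
(defn pos-add [& args]
  {:pre [(not-any? neg? args)]
   :post [(<= 0 %)]}
  (apply + args))

;; Returns the sum, or a marker keyword when a pre/post condition fails.
(defn try-pos-add [& args]
  (try
    (apply pos-add args)
    (catch AssertionError e
      :contract-violated)))

(try-pos-add 1 2 3 4)  ;; => 10
(try-pos-add 1 2 3 -4) ;; => :contract-violated
```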

What led me to Clojure’s DbC features was reading On Clojure, where there was a proposal for a new DbC syntax. I like DbC in the original style, but I think that the one at On Clojure has some additional benefits because it can provide some hints as to what types are expected in the function. In Smalltalk Best Practice Patterns, Kent Beck recommends naming variables after the type that is expected. So if the method is findByName, the parameter would be aString to give the caller a hint as to what is expected. What was detailed in the On Clojure blog not only provided hints about the type but would also let you know the acceptable values just by looking at the function declaration and the accepted parameters.

I would like to see the pre/post condition information somehow worked into the documentation generated by Clojure. Seems like it would be a very useful feature for callers of APIs.


Emacs Talk at Lambda Lounge

I’m giving a talk tomorrow at the Lambda Lounge on Emacs. I’m planning on spending about half the talk on some Emacs conceptual basics. When it comes to the basics, I’m going to try to avoid the “this key does this” and “that key does that”. I think going over cursor movements etc. will just be lost. If you’re truly new to Emacs, you’ll need to go through the Emacs Tutorial (which comes with Emacs; there’s a link to it on startup) to get the basic movements down. Instead I’m planning to go over the concepts of buffers in Emacs, the modeline, the minibuffer, the .emacs file etc. I’m also planning on spending a decent amount of time going over the help system, both the info system and the built-in Elisp documentation. Then I’m planning on going over some Emacs Lisp basics, Eshell, org-mode and a little on Dired. After that I’m planning on doing some development demos, specifically highlighting REPL environment integration into Emacs with SLIME, Tuareg Mode and maybe some Ruby.


Clojure Apply

I ran into the following situation a few times: I had a list of items, and I was attempting to call a function that accepted individual items. As an example, I had a list of contact information, where the first element is a contact name, the second is that contact’s info, the third is another contact name and so on. This is exactly the shape of input the sorted-map function expects in Clojure, only it expects the items individually, not as a list. The doc for sorted-map is below:

clojure.core/sorted-map
([& keyvals])
keyval => key val
Returns a new sorted map with supplied mappings.

An example of how to call it is something like:

user> (sorted-map 1 2 3 4)
{1 2, 3 4}

What I had instead was something like [1 2 3 4]. An easy way to convert the list to individual arguments is the apply function:

clojure.core/apply
([f args* argseq])
Applies fn f to the argument list formed by prepending args to argseq.

Swapping out the direct sorted-map call for an apply call looks like:

user> (apply sorted-map '(1 2 3 4))
{1 2, 3 4}

Note it returns the same result as calling the function directly. Using apply saved me from writing code to iterate over each item and manually add them to the map.
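Applied to the contact list scenario, a sketch might look like the following. The names and phone numbers are invented sample data. The second call also shows the `args*` part of the apply signature: any arguments given before the final sequence are prepended to it.

```clojure
;; Alternating name / info entries, as in the contact list described above.
(def contacts ["bob" "555-1234" "alice" "555-9876"])

(apply sorted-map contacts)
;; => {"alice" "555-9876", "bob" "555-1234"}

;; Leading arguments are prepended to the sequence before the call.
(apply sorted-map "carol" "555-0000" contacts)
;; => {"alice" "555-9876", "bob" "555-1234", "carol" "555-0000"}
```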


Being Lazy with Clojure

I’ve done development in some other functional languages, but I’ve done the most development in OCaml. OCaml is a functional language, but not lazy by default. There are libraries that can be used to make data structures and such lazy, but you have to go out of your way to use them. Clojure’s sequences, on the other hand, are lazy by default. One interesting ramification of this is in the lazy lists that it creates. I’m still trying to learn the language, so I figured I’d parse a CSV type of file that contained contact information that I dumped from a contact application I use. I wanted to get the lines in the CSV into a list so I could take a look at them from a Clojure perspective. To do this, I wrote some code like below:


(with-open [rdr (reader "/some/directory/contactinfo.csv")]
  (def lst (line-seq rdr)))

The code basically opens the file, executes the code in the body and then closes the file (think of the def lst… part as being in the try block). What I was expecting to happen was for line-seq to read each line and store it in lst. Much to my surprise, when I attempted to look at the first element of lst, I received an error message:

java.io.IOException: Stream closed
[Thrown class java.lang.RuntimeException]

This pointed out two interesting things to me. First, when I accessed the list (since it was lazy) it tried to pull the first line out of the file. Since I used the with-open macro, that file was already closed. Next, I realized that if it was trying to read from the closed file, line-seq was not behaving as I was expecting. The book I’m reading by Stuart Halloway did discuss this, I just forgot. My first reaction was to pass a function that does the parsing to the function above. This would cause all of the operations on the file to occur before the closing of the file. The file I was parsing ended up not being easily parsed line by line, since most of the data items spanned several lines. I found the right function was slurp:


clojure.core/slurp
([f] [f enc])
Reads the file named by f using the encoding enc into a string
and returns it.

Slurp will just pull all of the file into a String. I was able to differentiate the contact title from the contact details easily with the Clojure re-partition function:


clojure.contrib.str-utils/re-partition
([re string])
Splits the string into a lazy sequence of substrings, alternating
between substrings that match the pattern and the substrings
between the matches. The sequence always starts with the substring
before the first match, or an empty string if the beginning of the
string matches.

The only trick to using it was to go back through the list and discard the substrings between the matches. I was easily able to do this with the filter function. More on Clojure and lazy lists to come.
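For files that can be handled line by line, one way around the “Stream closed” error is to force the lazy sequence with doall while the reader is still open. The sketch below is written against clojure.java.io from released Clojure; the temporary file and its contents are invented sample data so that the example is self-contained.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical sample data written to a temp file.
(def sample-file (java.io.File/createTempFile "contactinfo" ".csv"))
(spit sample-file "Alice,555-1234\nBob,555-9876\n")

;; doall walks the whole lazy sequence inside with-open, so every line has
;; been read before the reader is closed.
(def lines
  (with-open [rdr (io/reader sample-file)]
    (doall (line-seq rdr))))

(first lines) ;; => "Alice,555-1234"
(count lines) ;; => 2

;; slurp reads the whole file eagerly and has no such problem.
(def contents (slurp sample-file))
```

The cost of doall is that the entire file is held in memory at once, which is the same trade-off slurp makes.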
