Moving On

Yesterday I accepted a position at Revelytix. I’ll be starting in March. They’re a tech company that is working on Semantic Web technologies. I’m looking forward to the cutting-edge technologies and digging into the Semantic Web. I’ll be working with Alex Miller and a few others as we start on development of a new product. Expect some posts on the Semantic Web, Clojure and Scala!

Spring Remoting – A Step Toward SOA?

Spring Remoting

Spring Remoting is an RMI type of facility built into the Spring framework. Basically you define an interface and an implementation on a remote application. Spring then places a proxy in your application and, when it is called, goes over HTTP to the remote implementation and returns the result as if the implementation were local. It’s really quite easy to implement using Spring: a few lines of configuration for where to find the remote implementation, a few lines to expose the remote implementation over HTTP, and you’re set. This ends up being a very cheap way to start having services exposed in your applications. There are definitely some downsides to this approach. The first is that it’s Java only. There are Hessian/Burlap extensions that you can use, but deeper object graphs have difficulty travelling across the wire. Another is the potential set of dependency problems that can occur when using an RMI-like solution.
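
To make the proxy idea concrete, here is a minimal sketch using a plain JDK dynamic proxy. It is not Spring's implementation: AddressService, AddressServiceImpl and RemotingSketch are hypothetical names, and the HTTP hop is simulated by calling the target object directly in the same process.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical service interface that both client and server must share.
interface AddressService {
    String findAddress(String userName);
}

// Hypothetical server-side implementation.
class AddressServiceImpl implements AddressService {
    @Override
    public String findAddress(String userName) {
        return "123 Main St (" + userName + ")";
    }
}

class RemotingSketch {
    // Builds a client-side proxy for the service interface. A real remoting
    // layer would serialize the call and send it over HTTP; here the "wire"
    // is simulated by invoking the target object directly.
    static AddressService proxyFor(AddressService remoteTarget) {
        InvocationHandler handler =
                (proxy, method, args) -> method.invoke(remoteTarget, args);
        return (AddressService) Proxy.newProxyInstance(
                AddressService.class.getClassLoader(),
                new Class<?>[] {AddressService.class},
                handler);
    }

    public static void main(String[] args) {
        // The caller only sees the interface, as if the service were local.
        AddressService client = proxyFor(new AddressServiceImpl());
        System.out.println(client.findAddress("John Doe"));
    }
}
```

The caller codes against the interface only, which is what makes the approach feel cheap at first; the interface and every type it mentions still have to be on both sides.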

RMI and Dependencies

Probably the most significant downside to RMI doesn’t really occur until you have used it for a while. Maybe you just have a few services that need to be exposed and Spring Remoting seems easy, so you use it. But then it grows; maybe other applications use it and it becomes more critical. The question is: what objects are being transferred over RMI? If you call the service and try to find the address associated with user John Doe, how is the address returned? Probably as some type of Address class. Then the big question: where does the Address class live? The problem is that the Address class needs to be available to both the server (which knows how to look up addresses) and each of the clients calling it. Changes to the service or to the objects can have significant ripple effects in the application. The problem is easy to understand, but it slowly creeps up on a project and becomes a dependency nightmare. I thought that this was a logical first step toward a true web service. The problem with this line of thinking is that if it stays this way too long, there’s already too much damage and the refactor is too costly.

Why not start with a web service? Web services are somewhat expensive to create, so you have to make sure one is necessary. First you must develop some form of input to be accepted. Maybe this is XML or a JSON object; whether or not there is a proper schema doesn’t really matter. It still needs to be thought about and defined, formally or informally. Next, code needs to be written to translate between the request and the business objects of the back end system. The same translation needs to happen for the response. The client also needs to translate to/from this same intermediate format. There are obviously things that can make this easier, like code generation, but it’s still additional work. In early phases of a project, where the inputs/outputs might be changing substantially, this can lead to developers thrashing with the services and producing very little.

Hibernate and RMI

Another potential RMI gotcha is attempting to transfer Hibernate POJOs. First, Hibernate POJOs are special. They have lazy loaded collections and other proxied objects that are more complex than just your basic JDK objects. The immediate consequence of this is that every caller of the RMI service needs not only the POJO classes on its classpath, but also the Hibernate jars. The more subtle consequence is what happens when one of those lazily loaded collections is transferred to the caller. The objects can’t be lazy loaded from the client; the client doesn’t have the database connection etc. From here you really have three options. The first option is to enable remote lazy loading (example here). I’ve not used this; it seems far too complex and error prone. The second option involves just marking all associations non-lazy (or using joins). Lazy fetching is a nice performance feature of Hibernate and the service will no longer be able to leverage it. The third option is to add a custom object serializer to Spring remoting that will exchange the lazy collections for real collections. This will remove the dependency on Hibernate and essentially force non-lazy loading of all associations. All of these solutions make RMI less attractive and all of them are a good indication that you should rethink the need for service remoting, or rethink using RMI over a proper web service.
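
The core of the third option can be sketched without Hibernate at all. In the sketch below, LazyListStandIn is a hypothetical stand-in for a persistence-layer collection wrapper (it is not a Hibernate class) and UserDto is a hypothetical transfer object; the only point is that copying into a plain JDK collection before the object leaves the server forces the elements to be read and leaves nothing but JDK types in the graph.

```java
import java.io.Serializable;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a lazy, persistence-backed collection.
// It is deliberately not a JDK collection class and not Serializable.
class LazyListStandIn<T> extends AbstractList<T> {
    private final List<T> backing;

    LazyListStandIn(List<T> backing) {
        this.backing = backing;
    }

    @Override
    public T get(int index) {
        return backing.get(index);
    }

    @Override
    public int size() {
        return backing.size();
    }
}

// Hypothetical transfer object sent over the wire instead of the entity.
class UserDto implements Serializable {
    final String name;
    final List<String> addresses;

    UserDto(String name, List<String> addresses) {
        this.name = name;
        // Copying reads every element now (on the server) and replaces the
        // wrapper with a plain ArrayList that any client can deserialize.
        this.addresses = new ArrayList<>(addresses);
    }
}

class DtoSketch {
    public static void main(String[] args) {
        List<String> lazy = new LazyListStandIn<>(Arrays.asList("123 Main St", "9 Elm Ave"));
        UserDto dto = new UserDto("John Doe", lazy);
        System.out.println(dto.addresses.getClass().getName());
    }
}
```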

Other workarounds – is it worth it?

There are several techniques that can reduce the symptoms of these problems. Aside from the Hibernate workarounds, you can define interfaces for each request and response, covering the data passed in to and returned from the RMI service. This will reduce the amount of data available to the service, require well defined input and output, and make it easier to refactor to a web service later. All of this adds up to a decent amount of extra work. I think in the end, the extra time involved equals or exceeds the cost of a proper web service.

Lesson Learned

I think that the lesson I have learned is that Spring remoting does not give you cheap services. Rather it gives you services with a low cost of entry, but that cost climbs much more quickly. With web services, you pay more up front, and less over the long term. I think maybe the best of both worlds is to use RMI/Spring remoting for the very early stages of the project (i.e. before going to prod) so that the service can be ironed out. What input data is really needed? What should be returned? Do we know most of what the service needs to do? With answers to these questions (which will only be known after some development) we are better armed for creating a real web service. At this point, the RMI implementation can be swapped and refactored to a web service, hopefully avoiding the longer term RMI issues discussed above.

Back on the Air

I’ve not had a post for quite a while now. Graduate school picked back up in August and things ramped up at work pushing to make a deadline. These things didn’t leave much time for writing! I graduated in December and things have calmed down a bit at work so I’m looking forward to doing some more writing. More to come in 2010!

Vending Machine in Squeak

The Lambda Lounge is having a language shootout this Thursday at its monthly meeting. Basically many people are going to implement a vending machine, each in a different language, and then we’ll compare. There are many languages that are going to be represented: Ruby, Groovy, Fan, Erlang and Haskell, to name a few. The spec can be found here. The objective is to implement a basic vending machine that can vend things such as a candy bar, soda, chips, etc. It also needs to keep track of the state of the vending machine, such as how much cash it has, whether it can make change and so on. I think that it’s a good problem: it requires some thought and effort, but is not too tough. I caught myself a few times wanting to get carried away, so you could definitely take it to the extreme in terms of complexity.

The Squeak Vending Machine

I implemented the vending machine using Squeak. The most interesting thing about this implementation is that it doesn’t rely on any external input. There is no prompt for a string of input, which would then be parsed/interpreted and converted to some running state. Instead everything is implemented inside of the Smalltalk language. Below is an example from one of the tests discussed in the specification:

vendingMachine := VendingMachine new.
result := vendingMachine Q Q Q Q GETB.
self should: result item name = 'Chips'.
self should: result moneyBack total = 0.0.

This is a snippet from an SUnit test. The purpose of the test is to ensure that the item is vended and appropriate change is given. It also checks to make sure the vending machine has the appropriate amount of money afterwards, but I removed that from the example to make it more clear. The first line just instantiates a new vending machine (this is actually done in the setUp method of the unit test). The vending machine is initialized using a default set of items and change pre-loaded. The next line is a little more complex. := is just the assignment operator. The right side calls a method on vendingMachine. Actually it’s calling “Q Q Q Q GETB” on vendingMachine. On the back end, this is incrementing the quarters in the vending machine by 1, 4 times. Then it calls the GETB method. This triggers the vending mechanism to vend the result. That line is 5 method calls total. The next two lines check that the result is in fact a bag of chips and there is no change returned. The interesting thing here, is that the actual code to interact with the vending machine (what the “user” would type) is “vendingMachine Q Q Q Q GETB”. This is valid Smalltalk and can be executed directly.
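
For readers who don't know Smalltalk, the chain of unary messages can be approximated in Java with a fluent interface. This is only an analogy, not the Squeak implementation: the class shapes are hypothetical, and the $1.00 price for item B is an assumption inferred from the test above (four quarters, no change).

```java
// Hypothetical result of a vend: the item name and any change returned.
class VendResult {
    final String itemName;
    final int changeInCents;

    VendResult(String itemName, int changeInCents) {
        this.itemName = itemName;
        this.changeInCents = changeInCents;
    }
}

class VendingMachine {
    // Assumed price of item B (chips), inferred from the test in the post.
    private static final int CHIPS_PRICE_CENTS = 100;
    private int insertedCents = 0;

    // Inserts one quarter and returns the machine so calls can be chained.
    VendingMachine Q() {
        insertedCents += 25;
        return this;
    }

    // Vends item B and returns the item together with any change.
    VendResult GETB() {
        if (insertedCents < CHIPS_PRICE_CENTS) {
            throw new IllegalStateException("Not enough money inserted");
        }
        int change = insertedCents - CHIPS_PRICE_CENTS;
        insertedCents = 0;
        return new VendResult("Chips", change);
    }
}

class VendingDemo {
    public static void main(String[] args) {
        // Five method calls, like the five messages in "Q Q Q Q GETB".
        VendResult result = new VendingMachine().Q().Q().Q().Q().GETB();
        System.out.println(result.itemName + ", change in cents: " + result.changeInCents);
    }
}
```

The Java version needs parentheses and dots for every call; the Smalltalk version reads like the input the spec describes, which is why no separate parser was needed there.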

Running the Code

If you’re interested in running the code, it can be downloaded here. You’re welcome to look at the st files via a text editor, but that’s not the intention. In Squeak, everything exists inside the VM. I think of the Squeak VM as more like a hardware virtualizing product (Xen, Virtualbox etc) and less like the JVM. Everything existing inside the VM means even the source files are in the VM image. So I didn’t just go to the directory that I’ve been working in and copy the st files, I actually had to “export” or as it’s called in Squeak, “file out” the code. This code is then ready to be “filed in” or “installed” into another running Squeak VM. When you’re working in Squeak, you don’t really worry about which file is where. For that reason, traditional version control software doesn’t integrate well with Squeak. I think there have been some recent projects to get Squeak to work with Subversion, but I have not used those libraries. For version control, most Squeak developers use Monticello as far as I know. It’s a source control repository written specifically for Smalltalk. I have used it before, but I don’t have my own repository set up.

To install and run the vending machine in Squeak, download and install Squeak per the instructions here. Then start Squeak by running the squeak executable (depends on the platform, in Linux it’s just squeak). Then follow the below instructions:

  • Download the attached squeak_vendingmachine_source.zip
  • Unzip the source anywhere you’d like, there should be 2 st files and a readme.txt
  • In the running Squeak window, left click on the background and then click open
  • Click file list and navigate to where the files were unzipped
  • Select the Vend.st file and then at the top of the window, click install
  • Do the same for VendTests.st
  • Left click on the background again and again click open
  • About 3/4 of the way down the menu, click Test Runner
  • Navigate down to the bottom of the test package list and select VendTests
  • On the right, select any or all three of the test classes and at the bottom left, click Run Selected
  • Green bars should be visible, showing that the tests passed

If you want to look at the source, you can left click on the background, select open, then class browser. In the far left pane at the top, if you scroll to the bottom, there are two packages, Vend and VendTests with the source code.

April Lambda Lounge – Factor and Parrot

I attended the April Lambda Lounge meeting last week and, as always, there were interesting language topics to discuss. Kyle Cordes presented Factor and Charles Sharp presented Parrot and the new Perl. I think languages like Factor make the Lambda Lounge fun: something outside of what you see every day and very different. I am sceptical about the embracing of the stack, at least logically, in Factor. Developers in OO and functional styles are taught to avoid side effects. This is for good reason. Experience shows that the more side effects methods have, the more difficult they are to understand and support. Factor seems like the exact opposite. Not only does it embrace side effects, it requires them.

JSourceObjectizer – Custom Static Code Analysis

I stumbled onto a pretty nice Java library for custom static code analysis last week called JSourceObjectizer. I call this custom static code analysis because it really isn’t like PMD or FindBugs or other tools that have built in rules for what to look for. Actually, I’m not even using the library for static code analysis (more on what I’m using it for later). Basically, the library is a facade over the Abstract Syntax Tree produced by a Java parser. The parser is written using ANTLR, as well as the tree walker and can be downloaded separately here. I started with the parser, not realizing that the JSourceObjectizer was there and then quickly switched when I found it.

JSourceObjectizer

The simplest API is a fairly large, event driven API based on a single interface. I used it through the TraverseActionAdapter, which has a default implementation of all of the methods in the TraverseAction interface. The method calls look like:

public class Traverser extends TraverseActionAdapter{
  @Override
  public void performAction(MethodDefinition methodDefinition) {
    //Do stuff with method definition
  }

  @Override
  public void actionPerformed(MethodDefinition methodDefinition) {
    //Do stuff with method definition
  }
//...
}

Basically each language construct has two methods, performAction and actionPerformed. performAction is called at the beginning, in this case when the method is declared, and actionPerformed is called at the end of the declaration. This same pattern is repeated for each construct, so at the beginning and end of an if statement, try/catch statement, for/while loop etc. Once the traverse actions are defined, the implementation just needs to be hooked into the library with code like:

public void parseFile() {
  // Create a fresh traverser for every file parsed
  Traverser traverser = new Traverser();

  try {
    // sourceFile is assumed to be a field of the enclosing class
    JavaSource javaSource = new JSourceUnmarshaller().unmarshal(sourceFile, null);
    // Only traverse when unmarshalling succeeded
    javaSource.traverseAll(traverser);
  } catch (JSourceUnmarshallerException ex) {
    //Properly handle error here of course...
  }
}

The events will be sent to the TraverseAction implementation and action can be taken based on what the code looks like. It also gives line numbers with the callbacks to make good error messages easily. It’s important to create a new instance of the TraverseAction implementation for each file parsed. I tried to reuse instances and received some pretty odd results. Creating a new instance each time solved the problem.

Putting JSourceObjectizer to Good Use

My main purpose for using this software is to parse and classify statements in a Java program for the purposes of data mining. In my case I was using a fairly large code base (JBoss) and condensing methods into smaller patterns for sequential pattern mining. This library saved me quite a bit of time. I thought I would probably find an open source parser to use and then have to walk the AST to find what I wanted. This library did almost everything I needed. One thing it didn’t do was provide a callback around the finally block, which would be nice. It also didn’t quite parse all of the files. I think there were 5 or 10 that it had difficulties with. I still need to debug where the problem is, but it looks to be pretty deep in JSourceObjectizer.

OCaml in the Real World

There’s a great video of a talk given by Yaron Minsky on his blog that discusses why Jane Street chose to go with OCaml. It’s definitely worth the time to watch. What’s interesting about the decision was the casualness of it and the fact that it was based on merit and success. Often times at organizations the standard language is decreed. Yaron describes more of a decision based on track record. OCaml was used by Yaron for a research project there, it went well, so more people were brought in to work on it. It turned out it was easier to find good developers if you advertised for OCaml developers. I thought this was interesting, because it goes against what is a fairly commonly held belief in our industry.

Lower Cost for Reuse

One interesting point Yaron makes on the technical merits of OCaml was on reuse. He describes a very typical copy and paste problem in larger code bases. He talks about how this was much worse in object oriented languages than it was in OCaml. When pressed for more details he didn’t have any hard proof, but suggested that higher order functions contribute to this. I agree with him on the reduced code duplication and cleaner code of OCaml. It’s true that duplicate code is a sign that some refactoring needs to take place, but why doesn’t it? We all know that duplicated code is bad. I think the reason this is worse in object oriented languages is that the barrier to entry for reuse is high. To have a small piece of reusable code, it must be put into a class; if it’s an instance method, you need to create an instance of that class, etc. If you have a few lines that are duplicated between two classes, what do you do?

  • Create a new class C (3 – 5 lines of code)
  • Create a method in C that wraps those few lines (3 lines of code)
  • Create an instance of that class in Class A (1 line of code)
  • Swap out the duplicate code for a call to the shared code (swap 3 lines for 1)
  • Create an instance of that class in Class B (1 line of code)
  • Swap out the duplicate code for a call to the shared code (swap 3 lines for 1)
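
The steps above can be sketched as follows. All names here are hypothetical and the tax rule is just a placeholder for "those few lines"; the point is how much scaffolding surrounds the shared logic.

```java
// The new "class C": exists only to hold the shared lines.
class TaxCalculator {
    // The few lines of business logic that used to be duplicated
    // (placeholder rule: add 8% tax, amounts in cents).
    long withTax(long amountInCents) {
        long tax = amountInCents * 8 / 100;
        return amountInCents + tax;
    }
}

// "Class A"
class Invoice {
    // Fluff: an instance of C has to be created here...
    private final TaxCalculator calculator = new TaxCalculator();

    long total(long subtotalInCents) {
        // ...and the duplicated lines become a delegating call.
        return calculator.withTax(subtotalInCents);
    }
}

// "Class B"
class Quote {
    // The same instance creation is repeated in the second class.
    private final TaxCalculator calculator = new TaxCalculator();

    long estimate(long subtotalInCents) {
        return calculator.withTax(subtotalInCents);
    }
}

class ReuseDemo {
    public static void main(String[] args) {
        System.out.println(new Invoice().total(1000));
        System.out.println(new Quote().estimate(1000));
    }
}
```

A static method would remove the instance creation, but the class wrapper and the qualified call remain.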

Is this worth it? We just traded 3 lines of duplicate code for 10+. In the Java world this is still worth it, but it did take quite a bit of work to refactor those 3 lines. Coincidentally, we have just about the same amount of duplicate code as before (i.e. creating the instance of class C is the same, so is the method call). The benefit here is that the business logic isn’t what is duplicated, it’s the fluff code. So if the business logic is centralized, we have achieved quite a bit, but what about the duplication of the fluff code? In OCaml the steps would be

  • Create a new function (1 line)
  • Add those few common lines (3 lines)
  • Swap out the duplicate code in A for a call to the shared code (3 lines for 1)
  • Swap out the duplicate code in B for a call to the shared code (3 lines for 1)

The fluff in defining this shared code is minimal, because defining a function is minimal.

Type Inference vs. Duck Typing

I’ve been working with JavaFX 1.0 lately, which has a new language called JavaFX Script. The language is based on Java, but is intended to make it easier and more concise to write Swing GUIs. The language has some interesting features and the first one I noticed was Type Inference. Having done some work in OCaml, I was pretty excited to see this. One point of confusion that I think a lot of people have is the difference between duck typing and type inference. It is not necessary to explicitly define the type in either of these type systems, so they look very similar on the surface. Dig a little deeper, and they are actually quite different.

Duck Typing

Duck Typing is a dynamic language concept that boils down to this: if an object responds to a message, that’s all we need to know. Put another way, we may know that the object we’re dealing with is an Employee, and we know Employee has a calculatePay() method, but at runtime, the fact that the object is an Employee doesn’t matter, all that matters is that the object happens to have a calculatePay() method. Later, when we add a Consultant class, that has a calculatePay() method, we can treat it as if it were an Employee (in this particular case). The key point here is that it is dynamic. Only at run time does this stuff happen.
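
Java is statically typed, but reflection can imitate the idea well enough to show what "only at run time" means. This is a sketch with hypothetical classes: Employee and Consultant share no interface or superclass, and payFor only cares whether the object has a calculatePay() method when it is called.

```java
import java.lang.reflect.Method;

class DuckTypingSketch {
    // Two unrelated hypothetical classes: no common interface or superclass.
    static class Employee {
        public int calculatePay() {
            return 1000;
        }
    }

    static class Consultant {
        public int calculatePay() {
            return 1500;
        }
    }

    // Looks the method up by name at run time. The compiler knows nothing
    // about the argument beyond Object, so any mistake surfaces only here.
    static int payFor(Object worker) {
        try {
            Method method = worker.getClass().getMethod("calculatePay");
            return (Integer) method.invoke(worker);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "Object does not respond to calculatePay()", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(payFor(new Employee()));
        System.out.println(payFor(new Consultant()));
        // payFor("text") compiles fine but fails at run time,
        // because String has no calculatePay() method.
    }
}
```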

Type Inference

Type inference is when the compiler determines what type an object is based on the operations that are performed on it. The key part is that the compiler makes this check; it does not happen at runtime. As an example, if we had the simple code:

private String example = "text";

Is the String declaration above really necessary? It’s quite obvious that example is a String, since it is assigned the value right there. Now if we tried to treat the example as an Integer, it would fail. It would fail not because it doesn’t have the right method, but because it’s not the right type. Another example is below:

public int add(int x, int y) {
  return x + y;
}

Here we have defined x and y as integers. With type inference, we would name x and y without declaring their types, and the compiler would know that, since + was used on them, they must be Numbers. With JavaFX, trying to pass a String into a type inferred method that expects a number will result in a compile time error (and if using NetBeans, a nice set of red marks).
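
Later versions of Java (10 and up) offer a limited form of this with var, which gives a feel for the compile time nature of the check. This is a sketch in Java, not JavaFX Script, and Java only infers local variable types, not method parameters; the class name is hypothetical.

```java
class TypeInferenceSketch {
    // Parameter types still have to be declared in Java.
    static int add(int x, int y) {
        return x + y;
    }

    public static void main(String[] args) {
        var example = "text";   // inferred as String by the compiler
        var sum = add(2, 3);    // inferred as int from add's return type

        // Both variables are fully typed, so String and int operations
        // are checked at compile time.
        System.out.println(example.length() + sum);

        // int wrong = example;  // would not compile: a String is not an int
    }
}
```

Unlike the duck typing case, the mistake in the commented-out line never reaches a running program.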

Clean Code By Robert Martin – Part 1

I’ve just started reading through Clean Code by Robert C. Martin. It’s a book on writing good software at a low level: things like how to name variables, how methods should look etc. His examples have been in Java, but the concepts are very general. I like what he has to say in the book and I agree with a lot of it. I have some comments on things I found interesting below.

Switch Statements

He shares a dislike for switch statements like I do. He gave the rather typical refactoring scenario of changing the switch code to be more object oriented through an Abstract Factory and Polymorphism. The example code he gave was:

public Money calculatePay(Employee e) throws InvalidEmployeeType {
  switch (e.type) {
    case COMMISSIONED:
      return calculateCommissionedPay(e);
    case HOURLY:
      return calculateHourlyPay(e);
    case SALARIED:
      return calculateSalariedPay(e);
    default:
      throw new InvalidEmployeeType(e.type);
  }
}

This is a pretty textbook refactor to an Abstract Factory. Create a polymorphic method like “calculatePay()” and have the employee subclasses provide their implementation of it. In the book he discusses burying that code deeper in the stack. In the Enterprise Java world, I find myself putting that logic into Hibernate. In this case, the Employee has a type: COMMISSIONED, HOURLY or SALARIED. Typically what I do is have the Employee become abstract and, in the Hibernate mapping, indicate the discriminator as maybe “employeeType”. Then I map all three employee types as their own subclass of Employee (maybe using single table inheritance). I end up with three more classes, CommissionedEmployee, HourlyEmployee and SalariedEmployee, but they should probably be pretty small classes. With this setup Hibernate does the dirty work previously done by the Abstract Factory described in Clean Code.
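
The polymorphic half of that refactor might look roughly like the sketch below. It leaves out the Hibernate mapping entirely; the fields and pay formulas are placeholders I made up for illustration, and pay is in cents to keep the example free of a Money class.

```java
class PaySketch {
    // Employee becomes abstract; each subclass knows how to pay itself.
    static abstract class Employee {
        abstract long calculatePay();
    }

    static class CommissionedEmployee extends Employee {
        private final long salesInCents;

        CommissionedEmployee(long salesInCents) {
            this.salesInCents = salesInCents;
        }

        @Override
        long calculatePay() {
            return salesInCents / 10; // placeholder: 10% commission
        }
    }

    static class HourlyEmployee extends Employee {
        private final long hours;
        private final long rateInCents;

        HourlyEmployee(long hours, long rateInCents) {
            this.hours = hours;
            this.rateInCents = rateInCents;
        }

        @Override
        long calculatePay() {
            return hours * rateInCents;
        }
    }

    static class SalariedEmployee extends Employee {
        private final long annualSalaryInCents;

        SalariedEmployee(long annualSalaryInCents) {
            this.annualSalaryInCents = annualSalaryInCents;
        }

        @Override
        long calculatePay() {
            return annualSalaryInCents / 12; // placeholder: monthly pay
        }
    }

    public static void main(String[] args) {
        // No switch on a type code: the right calculatePay() is chosen
        // by the runtime class of each object.
        Employee[] staff = {
            new CommissionedEmployee(500_000),
            new HourlyEmployee(40, 2_500),
            new SalariedEmployee(6_000_000)
        };
        for (Employee e : staff) {
            System.out.println(e.calculatePay());
        }
    }
}
```

With a discriminator mapping in place, the persistence layer would be the one deciding which of the three subclasses to instantiate for a given row.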

How to Name Your Interfaces

This always seems to be a hot button issue. I’m not entirely sure why. There is the more straightforward case of several up front implementations of an interface. For example, in a bank scenario you have Account, and implementers of Account are maybe SavingsAccount, CheckingAccount etc. What seems to cause controversy is when initially there is only one implementation. As an example, maybe there is an Account Service. In Clean Code, the suggestion is having AccountService and then AccountServiceImpl. As a nice counter example, in Implementation Patterns by Kent Beck, I remember him taking the approach of IAccountService and AccountService. In general, I have found fierce opposition to prefixing interfaces with I. My personal preference is that if it can be named differently (such as Account above) it should be; if there’s only one implementation, I prefer Kent Beck’s approach. To me, it seems like Impl is redundant and doesn’t tell me much. Every implementation of the interface is an “impl”. However, I have found that there are far more pressing issues in the code than whether or not to have Impl on the end of a class name.

Comments

After reading through the chapter on comments and giving it some thought, I agree with a lot of what he has to say. One quote in particular at first struck me as odd: “The proper use of comments is to compensate for our failure to express ourself in code. Note that I used the word failure. I meant it. Comments are always failures.” But when I thought about it, it made a lot of sense. The code is what matters; it’s what is executed, and in the end comments are just decoration. We know the code has to be up to date; whether the comments are is questionable. If I’m putting comments in to help understand the code, then the code is not very understandable on its own, and I should work to make it better. Several times he mentions that when too many comments appear, or the same comment appears too frequently, we automatically block them out. I agree with this and I find myself blocking out most comments most of the time. I think the reason these noisy or redundant comments make their way into the code is a culture of comments. Early on, we are taught that comments are good and we should write them. So when a code review happens and there are no comments on the methods of a class, there’s usually an “oh, there should be a comment here” response. It usually has nothing to do with the actual method and most of the time we (myself included) don’t ask why there should be a comment there. Is it because the method name isn’t intention revealing? Is the parameter or return type ambiguous?

Maven and Ivy

Recently I have been working a lot with Maven. The shop I’m working in now is mostly Maven, but previously I had used Ant and Ivy. To be honest, I’m not that big of a fan of either. I think that Ant is far too verbose, and it’s too difficult to write good build files that are reusable. I don’t like Maven much either. It seems to do most of what I want, but that last piece of what I want it to do seems to be a very long road. It also seems to have many bugs and is very low on documentation.  Maybe sometime soon Gradle will come to the rescue.  For this post though, I’m ignoring the build/deploy portions of Maven and am focusing on dependency management.  I’ll discuss some of my favorite features of Ivy and Maven dependency management.

latest.integration

Ivy has the concept of latest.SOMETHING in your Ivy config. This basically says find the latest version with the tag SOMETHING. So this could be “latest.integration”, which would pull whatever the latest version of a library is. Now at first this seems like the SNAPSHOT concept in Maven, but it is very different. First, you’re not relying on 2.0.0-SNAPSHOT; it can be any version of the library. Another difference is when the link to the latest version of the library happens. In Maven, when you rely upon the SNAPSHOT build, that build can change because the artifact that is produced in the build has a relationship to the SNAPSHOT. In Ivy, once the build artifact is created, latest.integration changes to an actual version. So what I’ve done in the past is just increment a version number. Assume version 123 is the latest version of library B and I’m going to build version 22 of library A. In library A’s Ivy file, I indicate I want the latest.integration version of B. Now, when I build version 22 of library A, the artifact that is created has a dependency on version 123 of library B. So let’s say several new versions of library B come out, but library A stays at version 22. When program C depends on library A, it will automatically get version 123 of library B.

“latest.integration” is good for early development of a library, but not the greatest long term strategy. The integration part of this can be changed for other configurations. I have also created milestone builds that tended to be more stable, users of the library simply needed to use latest.milestone. With this, whatever build is the latest version with the milestone tag is pulled in.

Dependency Math

Maven never seems to pick the version of the library I want it to pick if there is a conflict. I know the algorithm pulls the version of the software that is in the “nearest” pom.xml.  In many situations, I find that this isn’t what I want.  Ivy approaches this differently.  By default, it resolves conflict by picking the latest version number.  Although there can be problems with this approach, it seems that the Maven solution to the problem caused me a lot more grief.  Although I’ve not needed it, I thought it was interesting that Ivy allows other conflict resolution strategies.
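
The difference between the two strategies is easy to see in a small sketch. This is a simplified model, not either tool's actual code: Candidate is a hypothetical type, versions are plain integers (real version strings have richer ordering rules), and ties between candidates at the same depth are not modeled.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ConflictSketch {
    // One candidate version of a library, found at some depth in the
    // dependency tree (1 = declared directly in the project being built).
    static final class Candidate {
        final int version;
        final int depth;

        Candidate(int version, int depth) {
            this.version = version;
            this.depth = depth;
        }
    }

    // Maven-style "nearest wins": the candidate closest to the root.
    static Candidate nearest(List<Candidate> candidates) {
        return candidates.stream()
                .min(Comparator.comparingInt((Candidate c) -> c.depth))
                .get();
    }

    // Ivy's default as described above: the highest version number.
    static Candidate latest(List<Candidate> candidates) {
        return candidates.stream()
                .max(Comparator.comparingInt((Candidate c) -> c.version))
                .get();
    }

    public static void main(String[] args) {
        // Version 1 is pulled in at depth 2, version 3 at depth 4.
        List<Candidate> found = Arrays.asList(new Candidate(1, 2), new Candidate(3, 4));
        System.out.println("nearest picks version " + nearest(found).version);
        System.out.println("latest picks version " + latest(found).version);
    }
}
```

In this example the nearest strategy settles on the older version simply because of where it sits in the tree, which is the kind of surprise described above.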

Module Configurations

Another nice feature of Ivy is being able to define configurations for the modules. Basically this lets the publisher of modules define different dependencies based on configuration settings. An example of this would be a library that supports more than one XML parser. Maybe there’s a version that uses basic dom4j and another that supports a higher level framework. As a user of the library, I would indicate a dependency on that module with the “dom4j” configuration (this can be named anything). These configurations are published as part of the Ivy module definition, and the user of the library doesn’t need to know anything about the internals of the library or its dependencies. To do this in Maven, the only way that I know of is to explicitly exclude the libraries that I don’t want. This is less than ideal, because now the user of the library needs to know internal details of the library. How do I know which dependencies to exclude? Will it work if I exclude dependency X?

Central Repository

Although the central Maven repository can cause issues from time to time, it still saves a lot of time over Ivy. Ivy does have the ability to pull dependencies from the central Maven repository (it converts the Maven metadata to something Ivy can use), but I have not used it. Usually what I have done is create an enterprise repository and populate it by hand with everything that is needed (see below).

Repository Administrator UIs

This is something I frequently found myself wanting in the Ivy world. There are artifact administrator UIs in the Maven world, such as Nexus, that make maintenance of the repository very easy. You can do things like proxy remote repositories such as the Maven central repository, and it’s easy to upload an artifact into a local repository. The initial setup and the ongoing maintenance of an Ivy repository is probably the most painful aspect of Ivy. It’s all done by hand, without any slick web based tools, and the files are stored in a normal file system hierarchy, which means no (easy) searching like you would get with a Nexus type of UI.
