Working on iText®

2011 brought me a new challenge. Since February I’ve been working on the iText® library, a library for creating PDF with Java (or C#). This is quite a change: instead of working on products/sites/applications at client sites, I’m now coding in-house in cooperation with the founder of iText® and another colleague of mine.

Since iText seems to be the leading PDF library for Java, this is an interesting project. When I joined the project I introduced Maven to it (I couldn’t resist the urge to mavenize).

Data Grids and Infinispan


A partner talk by Manik Surtani, lead and founder of the Infinispan project at JBoss.

What are they? Memory storage nodes connected by a network (UDP or TCP) that together give the illusion of a lot of memory. Data grids are supposed to be very fast because the data is kept in memory. For this they need consistent hashing (of the keys) to allow fast retrieval, and a certain degree of data locality (your data close to you, so the transit time is short).
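To make the consistent hashing idea concrete, here is a minimal Java sketch of a hash ring. It is not Infinispan's implementation; the class name, the MD5-based hash and the number of virtual nodes per server are assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical consistent-hash ring: nodes and keys are hashed onto the same circle. */
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(List<String> nodes, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String node : nodes) addNode(node);
    }

    void addNode(String node) {
        // Each physical node gets several positions ("virtual nodes") to even out the load.
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i));
    }

    /** The owner is the first node clockwise from the key's position, wrapping around. */
    String ownerOf(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no nodes in the ring");
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    /** Takes the first 8 bytes of an MD5 digest as the ring position. */
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

The point of the ring: when a node leaves, only the keys it owned move to a neighbour, whereas a naive `hash % nodeCount` scheme would reshuffle most keys.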

They should also be easy to use, with features like automatic recovery and automatic node detection.

What’s the main purpose of data grids?
To make your RDBMS go faster and to give it some degree of fault tolerance and high availability.

Data grids are usually NoSQL (Not Only SQL) solutions, and most NoSQL systems aim for good scalability, high availability, fault tolerance, etc. However, not all of them offer all of these.

Infinispan is an example of a data grid. It is built with Java and Scala and borrows some concepts from Amazon's Dynamo paper. It locates data using consistent-hashing-based distribution, which is very efficient and fast, and since it is fully distributed there is no single point of failure. Infinispan uses MVCC (multi-version concurrency control) locking, a highly concurrent locking mechanism. It also supports XA transactions through a two-phase commit cycle, with deadlock detection. Currently under development, together with some research groups, is an atomic broadcast mechanism that should help maintain consistency while reducing network load by 50%.
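Two-phase commit is easiest to grasp in code. The sketch below is a generic coordinator, not Infinispan's XA machinery; the Participant interface, its method names and the TwoPhaseCoordinator class are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical participant in a distributed transaction (e.g. one grid node). */
interface Participant {
    boolean prepare(String txId); // vote: true means "ready to commit"
    void commit(String txId);
    void rollback(String txId);
}

/** Hypothetical two-phase commit coordinator: commit only if every participant votes yes. */
class TwoPhaseCoordinator {
    boolean run(String txId, List<? extends Participant> participants) {
        List<Participant> prepared = new ArrayList<>();
        // Phase 1: ask every participant to prepare; stop at the first "no" vote.
        for (Participant p : participants) {
            if (p.prepare(txId)) {
                prepared.add(p);
            } else {
                // Abort: undo the participants that had already prepared.
                // The "no" voter is assumed to have aborted on its own.
                for (Participant q : prepared) q.rollback(txId);
                return false;
            }
        }
        // Phase 2: everyone voted yes, so tell them all to commit.
        for (Participant p : participants) p.commit(txId);
        return true;
    }
}
```

Real implementations also log the coordinator's decision so they can recover if the coordinator crashes between the two phases; this sketch leaves that out.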

The plan is to integrate map/reduce into Infinispan; part of it is already in the Infinispan trunk.
Through Hibernate Search or Lucene you can even query Infinispan, and some form of JPA-QL support is planned for the future.

Infinispan can also flush data to disk so it can recover after a crash. This is a pluggable mechanism, so you can write your own persistence implementation.
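A pluggable persistence mechanism boils down to an interface the cache writes through to and reloads from on startup. The names below (EntryStore, MapEntryStore, PersistentCache) are hypothetical and do not match Infinispan's own cache-store API; the in-memory store merely stands in for a real disk-backed one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical pluggable store: implement this to persist entries wherever you like. */
interface EntryStore {
    void store(String key, String value);
    void remove(String key);
    Map<String, String> loadAll();
}

/** In-memory stand-in; a real implementation would write to disk or a database. */
class MapEntryStore implements EntryStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    public void store(String key, String value) { data.put(key, value); }
    public void remove(String key) { data.remove(key); }
    public Map<String, String> loadAll() { return new HashMap<>(data); }
}

/** Hypothetical cache that writes every change through to its store and can be rebuilt from it. */
class PersistentCache {
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    private final EntryStore store;

    PersistentCache(EntryStore store) {
        this.store = store;
        memory.putAll(store.loadAll()); // recover previous state, e.g. after a crash
    }

    void put(String key, String value) {
        store.store(key, value); // write-through: persist first, then update memory
        memory.put(key, value);
    }

    void remove(String key) {
        store.remove(key);
        memory.remove(key);
    }

    String get(String key) { return memory.get(key); }
}
```

Swapping persistence strategies then only means passing a different EntryStore implementation to the cache.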

Infinispan supports different eviction strategies, from FIFO and LIFO to some newer, more obscure techniques.
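As a rough illustration of the two simplest strategies mentioned, here is a bounded map in Java that evicts either the oldest entry (FIFO) or the newest one (LIFO) when it is full. It is a hypothetical sketch, not how Infinispan implements eviction.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical bounded map that evicts by insertion order: FIFO (oldest) or LIFO (newest). */
class EvictingMap<K, V> {
    enum Policy { FIFO, LIFO }

    private final int capacity;
    private final Policy policy;
    private final Map<K, V> entries = new HashMap<>();
    private final Deque<K> order = new ArrayDeque<>(); // head = oldest key, tail = newest key

    EvictingMap(int capacity, Policy policy) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
        this.capacity = capacity;
        this.policy = policy;
    }

    void put(K key, V value) {
        if (entries.containsKey(key)) { // update in place; insertion order is unchanged
            entries.put(key, value);
            return;
        }
        if (entries.size() == capacity) {
            // Full: pick the victim according to the policy.
            K victim = policy == Policy.FIFO ? order.pollFirst() : order.pollLast();
            entries.remove(victim);
        }
        entries.put(key, value);
        order.addLast(key);
    }

    V get(K key) { return entries.get(key); }

    boolean containsKey(K key) { return entries.containsKey(key); }
}
```

With a capacity of 2, inserting `a`, `b`, `c` evicts `a` under FIFO but `b` under LIFO.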

The API is fairly simple: it is a key/value store, so it behaves much like a map. There are discussions about other, higher-level APIs to allow Infinispan to be used from other languages.
Every method for talking to Infinispan has an asynchronous equivalent that returns a java.util.concurrent.Future.
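The sync/async pairing can be sketched with plain java.util.concurrent classes. The AsyncCache class below is hypothetical and only mimics the shape of such an API (a map-like interface where each call has an async twin returning a Future); it is not Infinispan's Cache.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical key/value cache: every blocking method has an async twin returning a Future. */
class AsyncCache<K, V> implements AutoCloseable {
    private final ConcurrentMap<K, V> data = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newFixedThreadPool(2);

    // Synchronous, map-like operations.
    V put(K key, V value) { return data.put(key, value); }
    V get(K key) { return data.get(key); }
    V remove(K key) { return data.remove(key); }

    // Asynchronous twins: the caller gets a Future immediately and can collect the result later.
    Future<V> putAsync(K key, V value) { return executor.submit(() -> put(key, value)); }
    Future<V> getAsync(K key) { return executor.submit(() -> get(key)); }
    Future<V> removeAsync(K key) { return executor.submit(() -> remove(key)); }

    /** Stops the worker threads so the JVM can exit. */
    @Override
    public void close() { executor.shutdown(); }
}
```

The async variants let a caller fire off several writes and only block (via `Future.get()`) when it actually needs the results.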

Manik mentions that if you happen to be using JBoss Cache or Ehcache, configuring Infinispan works the same way. I wonder who the hell is still using JBoss Cache…

Infinispan supports REST, Memcached (a text-based protocol; a nice commercial move, since memcached is widely used from many languages) and Hot Rod (wtf?) as communication protocols. It uses Netty, an NIO server framework, to create its server sockets.

Hot Rod is a wire protocol for client/server communication, developed for Infinispan itself. It is memcached-like, but binary rather than text based. It is also a two-way protocol, meaning that client and server both talk to each other. Hot Rod clients have built-in fail-over, load balancing and smart routing. The server can share its consistent hash algorithm with the client, so the client can pick a nearby server, or the server that already holds the key, instead of one server having to forward the request to the server that actually contains the data. In other words, the client decides up front which server is the right one.
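Here is a rough sketch of what smart routing with fail-over means on the client side, assuming the client has received the servers' hash ring. The class name, the hash function and the `isUp` check are hypothetical; the real Hot Rod client and protocol differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Predicate;

/**
 * Hypothetical "smart" client: it holds a copy of the servers' hash ring (the topology the
 * server shares) and sends each request straight to the key's owner, falling back to the
 * next server on the ring when the owner is unreachable.
 */
class SmartRoutingClient {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    SmartRoutingClient(List<String> servers) {
        for (String s : servers) ring.put(position(s), s);
    }

    /** All servers in ring order, starting with the key's owner and wrapping around. */
    List<String> candidatesFor(String key) {
        int pos = position(key);
        List<String> result = new ArrayList<>(ring.tailMap(pos, true).values());
        result.addAll(ring.headMap(pos, false).values());
        return result;
    }

    /** Route to the owner if it is up, otherwise to the next reachable server (fail-over). */
    String route(String key, Predicate<String> isUp) {
        for (String server : candidatesFor(key)) {
            if (isUp.test(server)) return server;
        }
        throw new IllegalStateException("no server available for " + key);
    }

    /** Client and server must use the same function; this one just mixes String.hashCode bits. */
    private static int position(String s) {
        int h = s.hashCode();
        return h ^ (h >>> 16);
    }
}
```

Because the client computes the owner itself, the common case needs a single network hop; the fallback list only matters when a server is down.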

The first version of Infinispan is 4.0.0. Manik does not want to bore the audience with why that is, but I’ll tell you: JBoss Cache was his previous project and went up to 3.x.x. Infinispan is actually the successor of JBoss Cache, which is a tree cache (resulting in a lot of problems), so they started almost from scratch for Infinispan.

A nice note: on a network failure between a couple of nodes, Infinispan does not know what to do, but you can write your own callback handlers for that 🙂

Literal Testing – Justifying APIs

Literal testing

A BOF I attended was about how one should create user documentation for a framework or API. The speaker (Peter Arrenbrecht) had an interesting view on that: “For a certain use case, first write a tutorial and use a lot of examples, then build your API from these examples. This will result in an API with more understandable method and class names.” To elaborate a bit: because you write your tutorial with real examples, your API grows out of the tutorial, which forces you to think about how the API is built.

There are two approaches to this: code first and prose first.

In code first, you write use-case-oriented tests that can be used as examples in a tutorial. With a tool like Bumblebee you can transform the Javadoc of your use-case-oriented tests into a tutorial document.
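A code-first example might look like the use-case test below: the Javadoc carries the tutorial prose and the method bodies are the examples a reader copies. It is a plain-Java sketch with a hypothetical word-count use case; it does not use Bumblebee's actual input format or any test framework.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Tutorial: counting words (hypothetical use case).
 *
 * <p>This use-case test doubles as documentation: the Javadoc is the prose and the
 * method bodies are the examples.
 */
class WordCountTutorialTest {

    /**
     * Step 1: lower-case the text and split it on whitespace.
     * Step 2: merge each word into a sorted map, adding 1 to its count.
     */
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) counts.merge(word, 1, Integer::sum); // skip leading-space artefact
        }
        return counts;
    }

    /** The use case itself: what a reader should expect to see. */
    static void countsRepeatedWords() {
        Map<String, Integer> counts = countWords("the cat saw the dog");
        if (counts.get("the") != 2) throw new AssertionError("expected 'the' twice");
        if (counts.get("cat") != 1) throw new AssertionError("expected 'cat' once");
    }
}
```

Writing the use case first is what pushes you toward a method name like `countWords` that reads well in the tutorial text.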

In prose first, you write the documentation first and, with a tool such as JCite, mark parts of your tests through annotations; when you run JCite, it picks up those parts and puts them into your documentation.

JCite is actually good for that, because when you refactor your code (and hence your tests) and run JCite again, it warns you about the parts that changed. This forces you to look at the documentation for the parts you refactored or changed, and triggers you to check whether the change still makes sense.
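The prose-first idea, including the "warn me when a cited snippet changed" part, can be sketched in a few lines. This is not JCite's real syntax or API: the `// begin:NAME` / `// end:NAME` markers and the SnippetExtractor class are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical prose-first helper in the spirit of JCite (not its real syntax or API):
 * pulls a marked region out of test source so it can be pasted into documentation, and
 * reports when that region no longer matches the version the docs were last checked against.
 */
class SnippetExtractor {

    /** Returns the lines between "// begin:NAME" and "// end:NAME" (markers invented for this sketch). */
    static String extract(String source, String name) {
        List<String> out = new ArrayList<>();
        boolean inside = false;
        for (String line : source.split("\n")) {
            String t = line.trim();
            if (t.equals("// begin:" + name)) { inside = true; continue; }
            if (inside && t.equals("// end:" + name)) return String.join("\n", out);
            if (inside) out.add(line);
        }
        throw new IllegalArgumentException("snippet not found or unterminated: " + name);
    }

    /** True when the snippet differs from the text the documentation was last approved with. */
    static boolean needsReview(String source, String name, Map<String, String> approved) {
        return !Objects.equals(approved.get(name), extract(source, name));
    }
}
```

Running `needsReview` after a refactoring gives exactly the nudge described above: a changed snippet is flagged so someone rereads the prose around it.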

See this post for links.