For the last two weeks our operations team has had to do daily restarts of a site that gets between 800K and 1M hits per day. At a certain point the web servers couldn't accept any more connections, died, and had to be restarted.
To track down the problem they tried to simulate the same behaviour on a staging environment, but at first we never succeeded in bringing those servers down. It took the team two weeks to reproduce the exact behaviour, which turned out to be a combination of a back-end search service performing a re-indexing operation and a bug in the client used to request data from that engine. (Seven out of ten hits trigger a request to the search engine.)
At a certain point during the indexing, the search engine started to spit out exception responses instead of the single expected result response. The application code that requests data from that search engine had recently been changed to use a pool of JAXB Unmarshallers created from javax.xml.bind.JAXBContext instead of creating new unmarshallers on the fly.
As you may know (and if you didn't, you do now), creating JAXB unmarshallers is a rather expensive operation, and using them is not thread-safe. Hence a pool was created from which an unmarshaller can be borrowed and to which it is released after use. And that was the little bugger!
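As a rough illustration of the idea, a minimal blocking pool could look like the sketch below. The class and method names here are hypothetical (the real project likely used a library such as Apache Commons Pool); the point is that creation happens once, up front, and `borrowObject()` blocks when everything is checked out:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Minimal blocking object pool (hypothetical sketch, not the actual code).
// borrowObject() blocks when all pooled instances are checked out.
class SimplePool<T> {
    private final BlockingQueue<T> available;

    SimplePool(int size, Supplier<T> factory) {
        available = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            available.add(factory.get()); // expensive creation happens once, up front
        }
    }

    T borrowObject() throws InterruptedException {
        return available.take(); // blocks when the pool is exhausted
    }

    void returnObject(T obj) {
        available.offer(obj);
    }

    int idleCount() {
        return available.size();
    }
}
```

Each request thread borrows an unmarshaller, has exclusive use of it (sidestepping the thread-safety issue), and returns it when done.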
The code as it was:
```java
Unmarshaller um = pool.borrowObject();
T unmarshalled = (T) um.unmarshal(inputStream);
pool.returnObject(um);
return unmarshalled;
```
Can you see what’s wrong with the above code?
The unmarshaller cannot unmarshal the exception response. That in itself is not an issue: we don't need that response, it's handled somewhere else, and it can happen once in a while. But because um.unmarshal throws an exception, the um object is never released back to its pool, and once the maximum pool size was reached, the pool ran out of objects. Every user request thread that then asked the pool for an object was put into a wait state, which resulted in thread starvation on the Tomcat engines. (The bigger the pool size, the longer it would have taken the web server to crash. With an unbounded pool it would eventually have run out of memory instead.)
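The failure mode is easy to reproduce in isolation. In the sketch below (all names hypothetical), a Semaphore stands in for the pool's capacity: every call that throws leaks a permit, and once all permits are gone the next borrower would block forever, just like the request threads on the Tomcat engines did:

```java
import java.util.concurrent.Semaphore;

public class PoolLeakDemo {
    static final int POOL_SIZE = 3;
    // A semaphore as a stand-in for the bounded unmarshaller pool.
    static final Semaphore pool = new Semaphore(POOL_SIZE);

    // Buggy variant: the permit is never released when unmarshalling throws.
    static void handleRequestBuggy(boolean backendThrows) throws InterruptedException {
        pool.acquire();                       // borrowObject()
        if (backendThrows) {
            throw new IllegalStateException("exception response from search engine");
        }
        pool.release();                       // returnObject() -- never reached on error
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < POOL_SIZE; i++) {
            try {
                handleRequestBuggy(true);     // back end is re-indexing: every call throws
            } catch (IllegalStateException expected) {
                // the exception itself is handled elsewhere in the real application
            }
        }
        // All permits have leaked; the next acquire() would block forever.
        System.out.println("permits left: " + pool.availablePermits());
    }
}
```

Running this prints `permits left: 0`: after three failing requests the "pool" is empty even though no work is in flight.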
The code should have been:
```java
Unmarshaller um = pool.borrowObject();
try {
    T unmarshalled = (T) um.unmarshal(inputStream);
    return unmarshalled;
} finally {
    // No matter what, always release back to the pool
    if (null != um) {
        pool.returnObject(um);
    }
}
```
Someone had not thought that pooling through. Had I reviewed that code two weeks ago, I would have spotted it, and the operations team wouldn't have had to get out of bed at night and on weekends to restart the application! (Sorry, ops!) We can count ourselves lucky that the search engine spat out a lot of exceptions in a short period while performing its indexing operations. It sometimes throws an exception on malformed queries or very strange search terms, but those are so rare that it would have taken a long time for the unmarshaller pool to run out of free objects, and the bug would have been much harder to find.
Conclusion: don't be afraid to review each other's code! It can save you time later, and you can learn from each other. (Or do pair programming!! 😀)
Hi,
It may look like a good idea to always return the object to the pool, but if the pooled object is in an invalid state, it may be better not to return it.
As such, it may be better to catch the exception and call pool.invalidateObject instead.
You should probably also let the pool grow when exhausted, combined with setting a removeAbandonedTimeout to clean up objects that were never returned to the pool.
Assuming the pool used is Apache GenericKeyedObjectPool
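For illustration, the catch-and-invalidate pattern described above could look roughly like the sketch below. It is written against a minimal, made-up pool interface rather than the real Apache Commons Pool API, so all names here are only indicative:

```java
import java.util.function.Function;

// Minimal stand-in for a pool API such as Apache Commons Pool's ObjectPool<T>.
interface Pool<T> {
    T borrowObject() throws Exception;
    void returnObject(T obj) throws Exception;      // object is healthy, reuse it
    void invalidateObject(T obj) throws Exception;  // object may be broken, destroy it
}

class SafeUnmarshal {
    // On success, return the object to the pool; on failure, invalidate it
    // so a possibly corrupt instance never goes back into rotation.
    static <U, T> T unmarshalSafely(Pool<U> pool, Function<U, T> unmarshal) throws Exception {
        U um = pool.borrowObject();
        try {
            T result = unmarshal.apply(um);
            pool.returnObject(um);      // success: hand the object back
            return result;
        } catch (Exception e) {
            pool.invalidateObject(um);  // failure: drop it from the pool
            throw e;
        }
    }
}
```

The trade-off is that every failure destroys a (possibly expensive) pooled object, which matters when, as in this story, the failure is in the response rather than in the unmarshaller itself.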
Hi Geert
I cannot recall which pool was in use four years back. I don't work there anymore, so I can't check the source history, but it could very well have been a Commons Pool.
It is usually better to invalidate. In this case, however, it is arguable that it is not the unmarshaller object itself that should be invalidated: the unmarshaller did its job correctly and can perfectly well be reused.