MessagePack vs JSON in Ruby
MessagePack (msgpack for short) gives us a big performance boost when serializing data to our data store. JSON is the reigning champ among data serialization formats on the web because it is easy to use, nearly universally supported, human readable, relatively efficient, and compresses very well. I love JSON and we use it extensively at MyDrive. But there is one use case where we get a big win from using MessagePack instead.
We store a lot of time series data in Cassandra, and that data is about 16 KB per time slice when encoded as JSON. That’s a pretty hefty chunk of data at the rate at which we write it. The size is less of a concern, though, than the time it takes to serialize and deserialize it, because we often deserialize hundreds of these blobs at once. Deserializing alone takes about 27% of the job time for this particular stage of our workflow. For reference, that 27% is on average about half a second with MessagePack. We use JSON over the wire in most places, but to the data store we write MessagePack.
We made the decision to use MessagePack some time ago. But I recently switched a lot of our current JSON code over to Peter Ohler’s optimized JSON library for Ruby, Oj. It’s really fast: nearly 8x the performance of YAJL for our over-the-wire data.
I thought that Oj might now be faster than MessagePack for the data store as well, and that perhaps we should look at switching. Some testing showed that this was not the case: MessagePack still outperforms JSON in Ruby, even against this excellent and highly optimized library. It is still about 2x as fast when serializing and 1.6x as fast when deserializing. Keep in mind that this is about a 16 KB document in JSON and 14 KB in MessagePack.
Encoding | Size in Bytes |
---|---|
JSON | 16402 |
MessagePack | 14063 |
When testing performance in Ruby I often reach for `pry` and `benchmark`. I test things out directly in the REPL. Doing so, I ran some numbers. I use the `Benchmark.bmbm` method here to try to rule out garbage collection timing interference and other factors. You can read about it here if you want to know more. Benchmarks were on Ruby 1.9.3.
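A comparison along these lines can be sketched as follows. This is a minimal sketch, not the original benchmark script: the `SAMPLE` data is a hypothetical stand-in for one time slice, the `CODECS` table and `run_benchmark` helper are names invented for illustration, and the use of Oj's `:compat` mode is an assumption. The standard library `json` is included only as a baseline so the script still runs when the `oj` and `msgpack` gems are not installed.

```ruby
require 'benchmark'
require 'json'

# Optional gems: each one is benchmarked only if it can be loaded.
begin
  require 'oj'
rescue LoadError
  # oj gem not installed; it is skipped below.
end
begin
  require 'msgpack'
rescue LoadError
  # msgpack gem not installed; it is skipped below.
end

# Hypothetical stand-in for one time slice (the real data is not shown).
SAMPLE = {
  'readings' => Array.new(500) do |i|
    { 'ts' => 1_350_000_000 + i, 'speed' => i * 0.5, 'ok' => true }
  end
}

# label => [encoder, decoder]
CODECS = {
  'json (stdlib)' => [->(obj) { JSON.generate(obj) }, ->(str) { JSON.parse(str) }]
}
if defined?(Oj)
  # :compat mode is an assumption; it keeps Oj's output plain JSON.
  CODECS['oj'] = [->(obj) { Oj.dump(obj, mode: :compat) },
                  ->(str) { Oj.load(str, mode: :compat) }]
end
if defined?(MessagePack)
  CODECS['msgpack'] = [->(obj) { MessagePack.pack(obj) },
                       ->(str) { MessagePack.unpack(str) }]
end

ITERATIONS = 1000

# Runs a rehearsal pass and then a measured pass for every codec.
# direction is :deserialize or :serialize.
def run_benchmark(codecs, sample, iterations, direction)
  Benchmark.bmbm do |bm|
    codecs.each do |label, (encode, decode)|
      blob = encode.call(sample) # encoded once, outside the timed block
      if direction == :deserialize
        bm.report(label) { iterations.times { decode.call(blob) } }
      else
        bm.report(label) { iterations.times { encode.call(sample) } }
      end
    end
  end
end

run_benchmark(CODECS, SAMPLE, ITERATIONS, :deserialize)
run_benchmark(CODECS, SAMPLE, ITERATIONS, :serialize)
```

`Benchmark.bmbm` runs every report twice, once as a rehearsal and once for the measured pass, and prints user, system, total, and real times for each label. Absolute timings will vary with the data, the Ruby version, and the machine, so the relative ordering is the more useful thing to compare.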
I did this a number of times, but here are some representative numbers from my 2011 MacBook Air:
Deserializing
The ‘real’ (elapsed wall-clock) number in the bottom right of the benchmark output is the one we care about most. Doing 1000 calls took 0.408 seconds with MessagePack and 0.681 seconds with Oj, a 1.6x improvement.
Serializing
So 0.074 seconds for MessagePack and 0.156 seconds for Oj. That’s over a 2x improvement in speed.
Compression
There is one place where JSON is the clear winner with our data, though. JSON gzips very well, and when compressed it is about 8% smaller than the gzipped MessagePack data. Note that gzipped here means gzip's default compression level, not the highest one.
Encoding | Size in Bytes when gzipped |
---|---|
JSON | 1077 |
MessagePack | 1171 |
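Sizes like the ones in these tables can be measured with Ruby's standard `zlib` library. The following is a minimal sketch under stated assumptions: `SAMPLE` is hypothetical data rather than the real time slice, `gzipped_size` is a helper name invented for illustration, and `Zlib.gzip` requires Ruby 2.4 or later (it is not available on the 1.9.3 used for the benchmarks). The MessagePack row is printed only if the `msgpack` gem is installed.

```ruby
require 'json'
require 'zlib'

begin
  require 'msgpack'
rescue LoadError
  # msgpack gem not installed; only the JSON row is printed.
end

# Hypothetical stand-in for one time slice (the real data is not shown).
SAMPLE = {
  'readings' => Array.new(500) do |i|
    { 'ts' => 1_350_000_000 + i, 'speed' => i * 0.5, 'ok' => true }
  end
}

# Size in bytes after gzip. The default level corresponds to plain gzip;
# pass Zlib::BEST_COMPRESSION to try the highest level instead.
def gzipped_size(bytes, level = Zlib::DEFAULT_COMPRESSION)
  Zlib.gzip(bytes, level: level).bytesize
end

ENCODED = { 'JSON' => JSON.generate(SAMPLE) }
ENCODED['MessagePack'] = MessagePack.pack(SAMPLE) if defined?(MessagePack)

ENCODED.each do |name, bytes|
  puts format('%-12s raw: %6d bytes  gzipped: %6d bytes',
              name, bytes.bytesize, gzipped_size(bytes))
end
```

How well each encoding compresses depends heavily on the shape of the data, so the ratio seen with synthetic input like this may differ from the figures above.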
Conclusion
Oj is a very good and fast JSON library, and if you’re using Ruby you should consider it. JSON is great over the wire, and if we were more concerned with space than with deserializing speed, we’d be using JSON. But for performance, MessagePack wins hands down, and that is why we use it for serializing to our data store.