Protocol Buffers – Missing Usage Guide?

By libor

Last week Google released code called Protocol Buffers (PB) under an open source license. The code serializes hierarchical data, described in an IDL-like definition, into a binary form, with the advantage of working across languages (presently Python, Java and C++) and across operating systems.
Not long after the release, many articles appeared discussing the code's capabilities and its (dis)advantages compared to XML. A big wave of reaction was also spurred by the suggested use of binary messages together with RPC method calls.
Among those posting on this topic were highly respected people such as Ted Neward, Stefan Tilkov, Steve Vinoski (here and here) and Dare Obasanjo. Unfortunately, those “big shooters” fell short of giving a clear recommendation on which part of the solution (if any) to use and under which conditions.
IMHO their list should look something like the following:
Use it:

  1. Only for internal cross-service communication. Avoid exposing it in a client-facing API.
  2. When you need to handle high-volume and/or low-latency traffic, and message parsing cost and message size are high on your list of concerns.
  3. When the underlying protocol is TCP, message queuing or similar (the text nature of HTTP definitely negates the gains from binary here).
  4. When message size or processing speed does not justify compressing a more general text format like XML.
  5. When multiple languages are used across the communicating services and you don’t need a general, heavy IDL-style format (e.g. WSDL or similar).
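
To make the size argument in items 2 and 4 concrete, here is a minimal sketch in Python that hand-encodes one small message in the style of the PB wire format (varint keys and values, length-prefixed strings) and compares it with an XML rendering of the same data. The "order" message, its field numbers and the helper names are my own assumptions for illustration. This is not the generated PB code, and it only handles non-negative integers and strings.

```python
import xml.etree.ElementTree as ET


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Key is (field_number << 3) | wire type 0 (varint), then the value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Key with wire type 2 (length-delimited), byte length, UTF-8 payload."""
    payload = text.encode("utf-8")
    key = encode_varint((field_number << 3) | 2)
    return key + encode_varint(len(payload)) + payload


def order_as_binary(order_id: int, customer: str, quantity: int) -> bytes:
    # Hypothetical message layout: 1 = id, 2 = customer, 3 = quantity.
    return (
        encode_int_field(1, order_id)
        + encode_string_field(2, customer)
        + encode_int_field(3, quantity)
    )


def order_as_xml(order_id: int, customer: str, quantity: int) -> bytes:
    # The same data as a small XML document, for comparison only.
    root = ET.Element("order")
    ET.SubElement(root, "id").text = str(order_id)
    ET.SubElement(root, "customer").text = customer
    ET.SubElement(root, "quantity").text = str(quantity)
    return ET.tostring(root, encoding="utf-8")


if __name__ == "__main__":
    binary = order_as_binary(150, "acme", 3)
    xml = order_as_xml(150, "acme", 3)
    print(len(binary), "bytes binary vs", len(xml), "bytes XML")
```

The gap shrinks once the XML is compressed, which is exactly the trade-off item 4 is about: compression costs CPU time on both ends.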

Not recommended:

  1. For a client-facing API.
  2. When you need document exchange rather than specialized business/processing data (e.g. a source document that is transformed in one or several stages into an XHTML presentation, such as invoice-type data).
  3. When an HTTP-based communication protocol is used.
  4. When you post-process the data with standardized tools (e.g. XSLT).

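For item 3, a rough sketch of the per-message framing cost may help. It compares a 4-byte length prefix on a raw TCP stream with a minimal HTTP/1.1 POST carrying the same binary payload. The length-prefix convention, the host name and the path are assumptions for illustration; PB itself does not define framing. Real HTTP clients usually send more headers than this, and the sketch measures bytes only, not connection setup or header parsing time.

```python
import struct


def frame_length_prefixed(payload: bytes) -> bytes:
    """Frame a message as a 4-byte big-endian length followed by the payload."""
    return struct.pack(">I", len(payload)) + payload


def read_length_prefixed(data: bytes) -> bytes:
    """Read one length-prefixed message back from the start of a buffer."""
    (length,) = struct.unpack(">I", data[:4])
    return data[4:4 + length]


def frame_http_post(payload: bytes,
                    host: str = "service.internal",
                    path: str = "/orders") -> bytes:
    """Wrap the payload in a minimal HTTP/1.1 POST (host and path are made up)."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/octet-stream\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + payload


def framing_overhead(framed: bytes, payload: bytes) -> int:
    """Bytes added around the payload by the framing."""
    return len(framed) - len(payload)


if __name__ == "__main__":
    payload = b"\x08\x96\x01"  # a tiny 3-byte binary message
    tcp = frame_length_prefixed(payload)
    http = frame_http_post(payload)
    print("length prefix overhead:", framing_overhead(tcp, payload))
    print("HTTP header overhead:  ", framing_overhead(http, payload))
```

For small messages the text header can be many times larger than the payload itself, which is where the binary gain gets lost.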
Questionable:

  1. The suggested use of RPC as offered by the abstract classes in PB. An interface supplied without a reference implementation might not be good guidance for users on how to implement it properly over different communication protocols (e.g. there is a danger of easily falling into a leaky RPC solution).
  2. Use as a storage data format. The main concern here is the database's limited ability to understand the message content, which limits use of the data in the WHERE part of queries.
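
The storage concern in item 2 can be sketched with SQLite from the Python standard library. The table names, the order fields and the `fake_serialize` helper are hypothetical; the helper is only a stand-in for a PB-encoded message, since any opaque binary blob shows the same problem. With the data in a BLOB column the database cannot filter on a field, so the application has to fetch and decode every row.

```python
import sqlite3
import struct


def fake_serialize(customer: str, quantity: int) -> bytes:
    """Stand-in for a binary message: 4-byte quantity, then the customer name."""
    return struct.pack(">I", quantity) + customer.encode("utf-8")


def fake_deserialize(body: bytes):
    """Inverse of fake_serialize; returns (customer, quantity)."""
    (quantity,) = struct.unpack(">I", body[:4])
    return body[4:].decode("utf-8"), quantity


def build_demo_db(orders):
    """Store each order twice: as an opaque blob, and with queryable columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders_blob (id INTEGER PRIMARY KEY, body BLOB)")
    conn.execute(
        "CREATE TABLE orders_cols "
        "(id INTEGER PRIMARY KEY, customer TEXT, quantity INTEGER)"
    )
    for customer, quantity in orders:
        conn.execute(
            "INSERT INTO orders_blob (body) VALUES (?)",
            (fake_serialize(customer, quantity),),
        )
        conn.execute(
            "INSERT INTO orders_cols (customer, quantity) VALUES (?, ?)",
            (customer, quantity),
        )
    return conn


def big_orders_via_sql(conn, min_quantity: int):
    """The database filters, because it understands the quantity column."""
    rows = conn.execute(
        "SELECT id FROM orders_cols WHERE quantity >= ? ORDER BY id",
        (min_quantity,),
    )
    return [row[0] for row in rows]


def big_orders_via_scan(conn, min_quantity: int):
    """The blob is opaque to SQL: read every row and decode it in the application."""
    result = []
    for row_id, body in conn.execute("SELECT id, body FROM orders_blob ORDER BY id"):
        _, quantity = fake_deserialize(body)
        if quantity >= min_quantity:
            result.append(row_id)
    return result


if __name__ == "__main__":
    conn = build_demo_db([("acme", 3), ("globex", 12), ("initech", 7)])
    print(big_orders_via_sql(conn, 5), big_orders_via_scan(conn, 5))
```

Both functions return the same ids, but only the first one lets the database do the work (and use an index). A common middle ground is to store the blob and copy the few fields you query on into real columns.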

I don’t think PB is a danger to XML, REST, EDA, SOA or anything else. It only shows Google's pragmatic approach to the use of technology. It is nice to be an XML fan, but who will pay the bills for servers, electricity and maintenance? If you control the environment from end to end, you can afford this kind of solution very easily, since you can ensure consistency and management of the definitions. Moreover, it is not a problem to generate some form of XML schema and processor code from a “proto” file to enable exchanging data in that format.
The right tool for the right job is the key.

-Libor


4 Responses to “Protocol Buffers – Missing Usage Guide?”

  1. links for 2008-07-15 « Breyten’s Dev Blog Says:

    [...] Protocol Buffers – Missing Usage Guide? " Libor.SOUCEK("WEBLog" (tags: distributed-computing google) [...]

  2. bmadigan Says:

    How does HTTP negate performance gains of binary format vs character format? You generally read the content as a byte stream, so setting the content-type and content-encoding properly should prevent any problems.

  3. libor Says:

    HTTP is a stateless protocol. Each request might start with opening a TCP connection and only then sending the request to the server side.

    So you have a perf. degradation right there.

    Additionally, you have to parse the text-based HTTP header to know that binary data is being sent.

    I agree that in most cases this probably does not matter much, but in real-time or high-transaction-volume systems (e.g. a Google-like case) even such small overheads add up in the total results.

  4. estinaKet Says:

    Super site:D will definitely visit soon,,
