Generating a System-Wide Unique Identity Field in an SOA-Based System

By libor

When you build a truly distributed system (and SOA-based distributed systems are very popular these days…), you often need to assign a system-wide unique key that lets you reference a business object not only from the machine’s point of view but also in a human-readable way, and that also works when integrating with external systems that have limited integration capabilities.

A typical example of a human-readable ID is an invoice number or an electronic exchange order number. Moreover, from an ergonomic point of view the ID should be as short as possible, so that users can read it easily and work with it during manual intervention (e.g. when a user must resolve an error personally).

There are several ways to generate an identity that is unique across the whole system. A nice list of possibilities is available, for example, in the Identity Field pattern section of Martin Fowler’s book Patterns of Enterprise Application Architecture. Surprisingly, that section is concerned only with mapping object data to a relational database. It says nothing about integration with third-party systems (a typical requirement from a third-party system is to receive object references as plain integers, because the external application cannot handle compound keys, strings, etc.) or about human-readable IDs.
 
I have found that none of the commonly used choices (DB auto-increment, database counter, GUID, table scan, “identity key range broker”, etc.) proved to be performant, scalable and human friendly all at once on a distributed system.

Among the most problematic aspects of the allocation techniques listed above, I see the following:

  • With DB-based identity fields (auto-increment, table scan, etc.) you are constrained to use the same DB for all running services (if you need a globally unique number, of course). This has a profound effect on execution speed and system scalability.
  • With a GUID you hope that no collision occurs when generating values, and you can forget about sorting by this field, as the order carries little meaning (third-party systems typically load data in ID order). On top of that, all values are very long and look alike, so the ergonomics for users are greatly reduced.
  • The “identity key range broker” (in a distributed architecture, a broker is one or more services dedicated to keeping track of allocated numbers and handing out ranges of unassigned keys to the services that request them) tends to be a stateful service. This poses risks for performance, scalability and failover. Also, because the broker hands out whole ranges, unused keys are more difficult to collect and reuse later.

In our trading application, users strongly demanded not only system-wide uniqueness and performance (our application can use as many as 1000 unique IDs per second per service process!) but also ergonomic IDs and the use of “plain numbers” only (numbers are required to simplify data exchange with third-party systems that accept only simple integer-like values as data references).

Due to those constraints I had to come up with a different approach to generating unique IDs than the cases mentioned above.

The main idea for a system-wide unique ID is to use a sort of compound key that is presented to users and third-party systems as a single integer value.

Our electronic trading application is an SOA-based enterprise application in which all running services are managed and monitored in a dedicated framework environment with a centralized service registration and monitoring facility. This design allows us to use a simple technique: each service is assigned its own unique identity number when it is created and added to the system. The individual business services then manage their own data and are responsible for unique data “numbering” within the local service scope. A proper combination of those two IDs then has the properties of a DB compound key, yet can be expressed as a single integer/long value.

The global ID is formed logically as a “string” concatenation of the individual values, in the following order:

[Local Service Object Number] [Individual Service ID Number] [Service ID Number Length – 1]

You have to choose right at the beginning of system installation how many digits the last part of the global key (the “Service ID Number Length – 1” part) will always contain. We have settled internally on one digit as sufficient. Since that digit can take the values 0 to 9, service IDs may be up to 10 digits long, which gives us a convenient range of up to 9,999,999,999 individual services.

Because we are using the .NET Framework and our “global” number is defined as the long data type, the largest number we can allocate is 9,223,372,036,854,775,807, which has 19 digits. Therefore, if the maximum service ID length is used (10 digits, plus one more digit for the length part), only 8 digits remain for the local number, so a service can allocate at most about 92,233,720 individual objects. At 1,000,000 IDs a day this lasts roughly 92 days before the service’s local numbering space is exhausted.

Realistically, an SOA system ends up with something like 10,000 services at most defined over its lifetime, i.e. service IDs of up to four digits. This leaves “room” for 92,233,720,368,547 individual objects per service. If you again assume that a service allocates 1,000,000 IDs a day, you will be fine for the next 252 thousand (!) years without a single ID ever being reused. This, I think, is more than enough.
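
The headroom for a given service ID length can be checked with a few lines of arithmetic. The following is only a sketch in Python, not our production code: the names max_local_number and days_until_exhausted are made up for illustration, and INT64_MAX stands in for the upper bound of a .NET long. Keep in mind that the trailing length digit also occupies one position of the number.

```python
INT64_MAX = 2**63 - 1  # 9,223,372,036,854,775,807, the upper bound of a .NET long


def max_local_number(service_id_digits: int) -> int:
    """Approximate upper bound for the local number of one service.

    The global ID ends with the service ID followed by one length digit,
    so that many trailing digits are not available to the local number.
    For the very last value, only some service IDs still fit into a long.
    """
    if not 1 <= service_id_digits <= 10:
        raise ValueError("service IDs are assumed to have 1 to 10 digits")
    suffix_digits = service_id_digits + 1  # service ID plus the length digit
    return INT64_MAX // 10**suffix_digits


def days_until_exhausted(service_id_digits: int, ids_per_day: int = 1_000_000) -> int:
    """Whole days a service can run at the given allocation rate."""
    return max_local_number(service_id_digits) // ids_per_day


if __name__ == "__main__":
    for digits in (4, 10):
        print(digits, max_local_number(digits), days_until_exhausted(digits))
```

Dividing the number of days by 365 gives the lifetime in years for the chosen service ID length.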

What a “real” global ID looks like is shown in the following example. Say you have an invoicing service with individual service number 14 and a locally created invoice number 52. The service number has two digits, so the last part is 2 – 1 = 1, and the global unique number for the invoice is the concatenation of the numbers 52, 14 and 1, in that order. The invoice’s “global” number is therefore 52,141.

With this algorithm the smallest possible number used in the system is 110 (if both the individual services and the local service objects start numbering from 1).
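
The concatenation rule can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the code we run: the function name compose_global_id is hypothetical, both numbers are assumed to start from 1, and the overflow check mimics the range of a .NET long.

```python
INT64_MAX = 2**63 - 1  # upper bound of a .NET long


def compose_global_id(local_number: int, service_id: int) -> int:
    """Concatenate local number, service ID and (service ID length - 1)."""
    if local_number < 1 or service_id < 1:
        raise ValueError("numbering is assumed to start from 1")
    service_digits = str(service_id)
    if len(service_digits) > 10:
        # A single trailing digit can only describe lengths 1 to 10.
        raise ValueError("service ID must not exceed 10 digits")
    length_digit = len(service_digits) - 1
    global_id = int(f"{local_number}{service_digits}{length_digit}")
    if global_id > INT64_MAX:
        raise OverflowError("global ID does not fit into a 64-bit long")
    return global_id


if __name__ == "__main__":
    print(compose_global_id(52, 14))  # the invoice example
    print(compose_global_id(1, 1))    # the smallest possible ID
```

Building the number through a string keeps the sketch close to the description. In production code you would more likely use integer multiplication by a power of ten.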

Because services are independent once they have received their service ID, the algorithm allows each service to store the data it manages separately. One can even freely select the best persistent storage strategy per service and thereby properly address the scalability, performance and throughput of the entire system.

Although Martin Fowler’s book discourages the use of meaningful IDs in software systems, the described algorithm gives the ID some meaning for free. Contrary to Martin’s reasoning, this pays off when resolving errors: you can pinpoint the faulty service much more quickly just by parsing the “global” ID value, especially when the system generates millions of global IDs across many running services.
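
Parsing a global ID back into its parts could look roughly like the following Python sketch. The function name split_global_id is hypothetical, and the code assumes the single trailing length digit described earlier.

```python
from typing import Tuple


def split_global_id(global_id: int) -> Tuple[int, int]:
    """Return (local_number, service_id) recovered from a global ID."""
    if global_id <= 0:
        raise ValueError("global IDs are positive numbers")
    digits = str(global_id)
    # The last digit stores the service ID length minus one.
    service_len = int(digits[-1]) + 1
    # At least one local digit, the service ID and the length digit are needed.
    if len(digits) < service_len + 2:
        raise ValueError("value is too short to be a global ID")
    service_id = int(digits[-1 - service_len:-1])
    local_number = int(digits[:-1 - service_len])
    return local_number, service_id


if __name__ == "__main__":
    local_number, service_id = split_global_id(52141)
    print(f"object {local_number} belongs to service {service_id}")
```

With the service ID in hand, a support engineer can go straight to the logs or storage of the one service that issued the number.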

I think this algorithm can easily be adapted to a “less compact” system, provided the system design allows newly created services to register once at a central place (the system should have a dedicated registry service that tracks the last used service ID and allocates a new one every time a service asks for it). Registration of a new service must be done either during deployment of the service or before its first use in a business process.
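
Such a registry can be very small. The following in-memory Python sketch only illustrates the idea: the class name ServiceRegistry and its register method are made up, and a real registry would have to persist the last used ID so that it survives restarts.

```python
import threading
from typing import Dict


class ServiceRegistry:
    """Hypothetical registry that hands out sequential service IDs."""

    def __init__(self, last_used_id: int = 0) -> None:
        self._last_used_id = last_used_id
        self._ids_by_name: Dict[str, int] = {}
        self._lock = threading.Lock()

    def register(self, service_name: str) -> int:
        """Return the ID of a service, allocating one on first registration."""
        with self._lock:
            known_id = self._ids_by_name.get(service_name)
            if known_id is not None:
                # A service registers only once; repeated calls get the same ID.
                return known_id
            self._last_used_id += 1
            self._ids_by_name[service_name] = self._last_used_id
            return self._last_used_id


if __name__ == "__main__":
    registry = ServiceRegistry()
    print(registry.register("invoicing"))
    print(registry.register("orders"))
```

The registry is contacted only once per service, so it does not become the bottleneck that a key range broker can be.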

I hope the outlined algorithm will help someone out there the same way it helped me.

- Libor
