Thoughts on using Azure Service Bus Queues
Recently we switched from using Azure Storage Queues (part of the standard storage account) to Windows Azure Service Bus queues.
The primary motivations behind this were:
- In-built dead letter queues
- Ease of complete and abandon operations
- The opportunity to use topics and subscriptions in the future
The initial transition was fairly straightforward, with the latest Service Bus NuGet package being simple to use.
Worth a mention is Azure Service Bus Explorer 2.0, an excellent tool developed by Paolo Salvatori that allows you to monitor, profile and fix Service Bus queues.
We ended up using a DataContractSerializer to store our objects within the brokered message body, as shown in this Stack Overflow question. The disadvantage is having to use the DataContract, DataMember and EnumMember attributes everywhere, though to be honest this does clearly indicate which classes are used by the Service Bus.
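For illustration, here is a minimal sketch of the approach; the NotificationMessage and NotificationType contracts are hypothetical stand-ins for our real types:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;
using Microsoft.ServiceBus.Messaging;

// Hypothetical contract; the attributes make it obvious this type travels over the Service Bus.
[DataContract]
public class NotificationMessage
{
    [DataMember]
    public Guid NotificationId { get; set; }

    [DataMember]
    public List<Guid> RecipientIds { get; set; }

    [DataMember]
    public NotificationType Type { get; set; }
}

[DataContract]
public enum NotificationType
{
    [EnumMember]
    Email,

    [EnumMember]
    Sms
}

public static class NotificationQueue
{
    public static void Send(QueueClient queueClient, NotificationMessage notification)
    {
        // The BrokeredMessage(object) constructor serializes the payload into the
        // message body using a DataContractSerializer (with a binary XML writer).
        queueClient.Send(new BrokeredMessage(notification));
    }

    public static NotificationMessage ReceiveNext(QueueClient queueClient)
    {
        // GetBody<T>() uses the same serializer to rehydrate the object.
        var received = queueClient.Receive();
        return received.GetBody<NotificationMessage>();
    }
}
```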
During extensive load testing, two issues came to light:
Concurrent Connection Limit
Windows Azure Service Bus allows a maximum of 100 open concurrent connections at any one time, as highlighted by Abhishek Lal in this Stack answer. We use Castle Windsor as our IoC container with the default lifestyle of a component set to PerWebRequest; this meant that rather than reusing the same underlying connection, we were creating a new one for each request received by our API.
The resolution was to change the lifestyle to a Singleton; the same Stack answer indicates that we can safely use a Singleton as the MessagingFactory manages connections internally.
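As a rough sketch of what the registration looks like (the connection string variable and the "notifications" queue name are assumptions for illustration):

```csharp
using Castle.MicroKernel.Registration;
using Castle.Windsor;
using Microsoft.ServiceBus.Messaging;

public static class ServiceBusInstaller
{
    public static void Install(IWindsorContainer container, string serviceBusConnectionString)
    {
        container.Register(
            // Previously this defaulted to PerWebRequest, so every API request opened a
            // new connection; as a Singleton the MessagingFactory reuses its connection.
            Component.For<MessagingFactory>()
                .UsingFactoryMethod(() =>
                    MessagingFactory.CreateFromConnectionString(serviceBusConnectionString))
                .LifestyleSingleton(),

            // QueueClients created from the factory can also be long-lived.
            Component.For<QueueClient>()
                .UsingFactoryMethod(kernel =>
                    kernel.Resolve<MessagingFactory>().CreateQueueClient("notifications"))
                .LifestyleSingleton());
    }
}
```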
LockLostException
As part of our migration to the Service Bus we implemented granular logging of which users had been processed for a given message.
The aim is to ensure that, no matter what happens during the processing of a queue message, if the message is not marked as complete we can reprocess it without fear of sending duplicate notifications.
The fundamental workflow is:
- Using PeekLock, receive the BrokeredMessage from the Service Bus queue and lock it for 5 minutes
- For each user relating to the message:
  - Check whether the user has already been processed
  - If already processed, skip this user
  - Else process the user
  - Log the user as processed
- Mark BrokeredMessage as Complete
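A minimal sketch of that loop, reusing the hypothetical NotificationMessage contract from earlier plus two hypothetical collaborators (IProcessedUserLog and INotifier) standing in for our real services:

```csharp
using System;
using Microsoft.ServiceBus.Messaging;

public class NotificationWorker
{
    private readonly QueueClient _client;             // created with ReceiveMode.PeekLock
    private readonly IProcessedUserLog _processedLog; // hypothetical: records processed users
    private readonly INotifier _notifier;             // hypothetical: does the actual work

    public NotificationWorker(QueueClient client, IProcessedUserLog processedLog, INotifier notifier)
    {
        _client = client;
        _processedLog = processedLog;
        _notifier = notifier;
    }

    public void ProcessNext()
    {
        // PeekLock: the message stays on the queue, locked for the queue's LockDuration.
        BrokeredMessage message = _client.Receive();
        if (message == null) return;

        try
        {
            var payload = message.GetBody<NotificationMessage>();

            foreach (var userId in payload.RecipientIds)
            {
                // Skip anyone already handled on a previous delivery of this message.
                if (_processedLog.IsProcessed(payload.NotificationId, userId))
                    continue;

                _notifier.Notify(payload.NotificationId, userId);
                _processedLog.MarkProcessed(payload.NotificationId, userId);
            }

            // Only complete once every user has been handled; if we fail before this,
            // the lock eventually expires and the message is redelivered.
            message.Complete();
        }
        catch (Exception)
        {
            // Abandon releases the lock so the message becomes visible again straight away.
            message.Abandon();
            throw;
        }
    }
}

// Hypothetical collaborator interfaces.
public interface IProcessedUserLog
{
    bool IsProcessed(Guid notificationId, Guid userId);
    void MarkProcessed(Guid notificationId, Guid userId);
}

public interface INotifier
{
    void Notify(Guid notificationId, Guid userId);
}
```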
As this processes each user in turn whilst the message is locked, the more users that are related to a message, the greater the chance that it will take over 5 minutes to process.
The maximum lock duration when using PeekLock with a Service Bus queue is 5 minutes; if you exceed that, then attempting to either Abandon or Complete the message throws a MessageLockLostException and the operation does not complete.
OK, so a message overruns its 5-minute limit, we call Complete, catch the exception and prepare to reprocess it. This isn't a total failure: we have such fine-grained logging of who has been processed that we just loop through the recipients again, skipping each one that has already been handled.
However, for sufficiently large messages, just checking whether each recipient has been processed may take over 5 minutes, causing our workers to loop through the same message until they hit the queue's MaxDeliveryCount (we set this to 3; instead of being delivered a fourth time, the message is sent to the dead-letter queue for debugging).
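For reference, the relevant knobs live on the QueueDescription, and the overrun shows up as a MessageLockLostException when we try to complete (the queue name and connection string below are assumptions):

```csharp
using System;
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

public static class QueueSetup
{
    public static void EnsureQueue(string connectionString)
    {
        var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);

        if (!namespaceManager.QueueExists("notifications"))
        {
            namespaceManager.CreateQueue(new QueueDescription("notifications")
            {
                LockDuration = TimeSpan.FromMinutes(5), // 5 minutes is the maximum allowed
                MaxDeliveryCount = 3                    // after this, off to the dead-letter queue
            });
        }
    }

    public static void TryComplete(BrokeredMessage message)
    {
        try
        {
            message.Complete();
        }
        catch (MessageLockLostException)
        {
            // The lock expired while we were processing; the message will be redelivered
            // and the granular processed-user log stops us sending duplicates.
        }
    }
}
```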
There are three ways around this:
- Use a set-based operation to filter out users that have already been processed at the start
- Use the RenewLock method on a BrokeredMessage periodically to ensure we maintain the lock on the message (sketched after this list)
- Split the processing logic so that each BrokeredMessage represents a single recipient rather than a collection of recipients
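A rough sketch of the second option, renewing the lock on a timer while a caller-supplied delegate (standing in for our per-user processing) does the long-running work:

```csharp
using System;
using System.Threading;
using Microsoft.ServiceBus.Messaging;

public static class LockRenewal
{
    public static void ProcessWithRenewal(BrokeredMessage message, Action<BrokeredMessage> processAllRecipients)
    {
        // Renewing every 4 minutes keeps us safely inside the 5-minute LockDuration.
        using (new Timer(_ =>
        {
            try
            {
                message.RenewLock();
            }
            catch (MessageLockLostException)
            {
                // Too late - the lock has already expired and the message will be redelivered.
            }
        }, null, TimeSpan.FromMinutes(4), TimeSpan.FromMinutes(4)))
        {
            processAllRecipients(message); // the long-running, per-user work
            message.Complete();
        }
    }
}
```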
The first option means that we may lose some of the essential granularity we’ve introduced to our messaging framework.
The second would be the easiest, ensuring we continue to hold the lock long after the 5-minute limit. However, a message taking over 5 minutes to process is almost certainly a smell of a greater issue.
The third option makes the most sense to me: instead of using a single BrokeredMessage for a set of recipients, each recipient receives their own message, which can be processed individually without fear of losing the lock. It would also make our architecture more scalable, allowing us to identify load more accurately.
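Sketching that fan-out, again with a hypothetical per-recipient contract: the sender enumerates the recipients up front and sends one small message each, which the existing worker can then complete well within the lock duration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.Serialization;
using Microsoft.ServiceBus.Messaging;

// One recipient per message; small enough to process well inside the lock duration.
[DataContract]
public class RecipientNotificationMessage
{
    [DataMember]
    public Guid NotificationId { get; set; }

    [DataMember]
    public Guid RecipientId { get; set; }
}

public static class NotificationFanOut
{
    public static void Send(QueueClient queueClient, Guid notificationId, IEnumerable<Guid> recipientIds)
    {
        var messages = recipientIds
            .Select(userId => new BrokeredMessage(new RecipientNotificationMessage
            {
                NotificationId = notificationId,
                RecipientId = userId
            }))
            .ToList();

        // Each message is then locked, processed and completed independently.
        queueClient.SendBatch(messages);
    }
}
```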
If you use the Service Bus to process large messages like this, the 5-minute lock limit is definitely something to bear in mind.
Miscellaneous Stack Overflow Questions
I asked a lot of questions during this work and, as always, Stack was invaluable: