Scaling background jobs

Most systems have recurring background jobs that execute at regular intervals: database maintenance tasks, importing data from other systems, caching data, sending newsletters, and so on. On a multi-tenant platform you usually have these recurring jobs per tenant, which can mean hundreds or thousands of concurrent jobs. Some of them may be long-running, while others are very CPU-intensive.

Lots of open source schedulers allow you to configure (dynamically or not) when your jobs run, but they often require the jobs to run in-proc, that is, inside the scheduler’s own process, by implementing an IJob interface that executes the job’s code.

public interface IJob
{
    Task Execute(JobRequest job);
}

This is required so that the scheduler can measure total execution time and respect the interval between executions, but also so it knows whether a job completed or failed and needs to be retried. Scaling the scheduler itself often requires distributed locks to synchronize executions and a way to split jobs among multiple scheduler instances, all while executing the actual jobs. With thousands of concurrent jobs this becomes a serious problem: you often end up with hundreds of threads in your scheduler processes and/or massive CPU/memory usage, putting a lot of strain on the scheduler and letting some jobs affect the stability of others. The root cause is that you’re mixing two different concerns: scheduling the jobs versus processing them.

In order to achieve higher scalability you need to offload jobs and run them outside the scheduler. The most common solution is to use message queues, where you typically only need to worry about scaling your message handlers, which is as simple as adding more machines/processes. If the queue itself becomes a bottleneck, check whether your queuing technology supports partitioning; if it doesn’t, you can use multiple queues (perhaps one per tenant or per group of tenants).
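As a minimal sketch of the offloading idea, the scheduler-side IJob can do nothing but publish a message describing the work. The IMessageQueue abstraction and the JobRequest/RunJobMessage shapes below are assumptions for illustration (the original only defines IJob); map them onto your own scheduler and queuing technology:

```csharp
using System.Threading.Tasks;

// Assumed shapes -- adapt to your scheduler's actual types.
public class JobRequest { public string JobId; public string TenantId; }
public class RunJobMessage { public string JobId; public string TenantId; }

public interface IJob
{
    Task Execute(JobRequest job);
}

// Hypothetical abstraction over your queuing technology.
public interface IMessageQueue
{
    Task EnqueueAsync(string queueName, object message);
}

// Scheduler-side job: it only enqueues a message; the real work runs
// in separate handler processes that consume the queue, so the
// scheduler schedules and the handlers process.
public class EnqueueJob : IJob
{
    private readonly IMessageQueue _queue;

    public EnqueueJob(IMessageQueue queue)
    {
        _queue = queue;
    }

    public Task Execute(JobRequest job)
    {
        // One queue per tenant (or per tenant group) is one way to keep
        // a single queue from becoming the bottleneck.
        var queueName = "jobs." + job.TenantId;
        return _queue.EnqueueAsync(queueName, new RunJobMessage
        {
            JobId = job.JobId,
            TenantId = job.TenantId
        });
    }
}
```

With this split, scaling job processing is just a matter of running more handler processes against the queues, independently of the scheduler.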

The scheduler still needs to be notified about job completions so it can enforce proper intervals and avoid concurrent executions of the same job. If your scheduler has an API where the message handlers can report job completion status, great. Otherwise, don’t despair: it’s relatively easy to implement one using messaging, with a jobId as the correlation id, and have your scheduler’s IJob await until the actual job completes.
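One way to sketch that await-until-completion mechanism (the class and method names here are hypothetical, not part of any scheduler API) is a dictionary of pending jobs keyed by jobId, resolved when the completion message arrives:

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Tracks in-flight jobs by jobId so the scheduler-side IJob can await
// the completion message reported back by the message handlers.
public class JobCompletionTracker
{
    private readonly ConcurrentDictionary<string, TaskCompletionSource<bool>> _pending =
        new ConcurrentDictionary<string, TaskCompletionSource<bool>>();

    // Called by the scheduler-side job right after enqueuing the work;
    // the returned task completes when Complete() is called for the jobId.
    public Task<bool> WaitForCompletion(string jobId)
    {
        var tcs = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        _pending[jobId] = tcs;
        return tcs.Task;
    }

    // Called by the subscription that receives completion messages;
    // the jobId is the correlation id, succeeded is the reported outcome.
    public void Complete(string jobId, bool succeeded)
    {
        if (_pending.TryRemove(jobId, out var tcs))
            tcs.TrySetResult(succeeded);
    }
}
```

The scheduler’s IJob.Execute then becomes “enqueue, await WaitForCompletion, throw on failure”, so the scheduler’s existing retry and interval logic keeps working even though the job ran elsewhere.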



Reliable Redis Request-Reply using service contracts in .NET

In my previous post I wrote about ServiceProxy and how easy it should be to integrate with messaging frameworks and do request/reply with service contracts even if your messaging framework has no support for it.

If you’re not familiar with Redis, you should take some time to get acquainted with it. There is a lot of material on the Redis website to get you started, and you’ll be amazed at how Redis helps you solve complex problems in a simple, easy and performant way. There are many use cases for Redis, and messaging is certainly one of them. Redis supports queues and pub/sub, which are probably all the messaging capabilities most systems will ever need.

Request-Reply can be done with Redis in many ways: queues with or without blocking primitives, pub/sub or a combination of pub/sub and queues.


ServiceProxy.Redis is an implementation of ServiceProxy using Redis and the Booksleeve Redis client.

Update (07 April 2014): ServiceProxy.Redis now uses the StackExchange.Redis library, Booksleeve’s successor. The source code changes can be found here.

ServiceProxy.Redis uses an asynchronous and reliable request-reply pattern built on queues. There are two ways of dequeuing items in Redis: non-blocking and blocking. The non-blocking operation returns immediately whether or not there is an item to dequeue, returning null in the latter case. The blocking operation takes a timeout parameter: Redis blocks the client’s connection while the queue is empty, until an item can be dequeued or the timeout is reached. When multiple connections are blocked on the same queue, Redis does fair queueing among them, so load balancing is trivially simple.

Since the connection to Redis is blocked while dequeuing, it is recommended to use a separate connection for enqueuing and/or issuing other non-blocking commands. The same principle applies to pub/sub (one connection to subscribe to channels, another to publish and/or issue other commands).
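The queue-based request-reply flow can be sketched as follows. This is not ServiceProxy.Redis’s actual code: the IRedisList interface is a hypothetical stand-in for your Redis client’s LPUSH/BLPOP operations (remember that the blocking call needs its own connection), and a unique reply queue per request serves as the correlation:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical minimal client surface; map these onto your Redis
// library's LPUSH/BLPOP commands. BlockingDequeue must run on a
// dedicated connection, since it blocks while waiting.
public interface IRedisList
{
    Task Enqueue(string queue, string value);
    Task<string> BlockingDequeue(string queue, TimeSpan timeout); // null on timeout
}

public static class RedisRequestReply
{
    // Client side: push the request tagged with a unique reply queue
    // name, then block on that reply queue until the response arrives.
    public static async Task<string> Request(
        IRedisList redis, string requestQueue, string payload)
    {
        var replyQueue = "reply." + Guid.NewGuid().ToString("N");
        await redis.Enqueue(requestQueue, replyQueue + "|" + payload);
        return await redis.BlockingDequeue(replyQueue, TimeSpan.FromSeconds(30));
    }

    // Server side: block on the request queue (Redis fair-queues across
    // blocked consumers, so adding servers load-balances automatically),
    // handle the payload, and push the response to the caller's reply queue.
    public static async Task ServeOne(
        IRedisList redis, string requestQueue, Func<string, string> handler)
    {
        var message = await redis.BlockingDequeue(requestQueue, TimeSpan.FromSeconds(30));
        if (message == null) return; // timed out; a real server would loop
        var parts = message.Split(new[] { '|' }, 2);
        await redis.Enqueue(parts[0], handler(parts[1]));
    }
}
```

Because requests sit in a Redis list until dequeued, they survive handler restarts, which is what makes this variant of request-reply reliable rather than fire-and-forget.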


Asynchronous request-reply messaging using service contracts and ServiceProxy

Typically, messaging frameworks support the request/reply pattern by sending messages between client and server, using a correlation id to map response callbacks and the client’s address or inbound channel to know whom the response should be sent to. There is usually a Send method that takes a message and a Reply method that takes a response message and the client’s address/channel. Other frameworks simply rely on queues and have no concept of responses at all, so you have to define queues for the client and tell the server to drop the response message there. This works well for message-oriented architectures with very well-defined messages.

When you have a lot of different requests and responses, you’ll have to create a lot of classes to represent your messages and a lot of message handlers to process them and send responses back. This model can lead to a very bloated architecture if you start creating messages like GetCustomerByName, GetCustomerById, ListCustomers, UpdateCustomer and such, or a generic CustomerQueryMessage with lots of properties and a single message handler. In these cases it is much cleaner to have your application components interact through service interfaces that group related operations.
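To make the contrast concrete, the four message classes above collapse into one service contract. This sketch uses illustrative type names (Customer and its members are assumptions); the operations are Task-based because the calls ultimately travel over a messaging transport:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Illustrative data shape -- not from any real API.
public class Customer
{
    public int Id;
    public string Name;
}

// One contract replaces the GetCustomerByName, GetCustomerById,
// ListCustomers and UpdateCustomer message classes and their
// individual message handlers.
public interface ICustomerService
{
    Task<Customer> GetByName(string name);
    Task<Customer> GetById(int id);
    Task<List<Customer>> List();
    Task Update(Customer customer);
}
```

The grouping also gives callers one discoverable surface for all customer operations instead of a bag of loosely related message types.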

Enter ServiceProxy

ServiceProxy is a lightweight asynchronous proxy for .NET that allows you to use service contracts in a request/reply manner with any messaging framework. It is open source and is the result of my past work with service-oriented architectures using high-performance messaging frameworks that had no support for services (e.g. ZeroMQ, Redis).
