In this article, I will cover guidelines and best practices for writing performant background jobs in your Rails application.
Firstly, when should we move our web transactions to be processed in the background? We should use a background job, instead of processing the transaction immediately and making the user wait, when any of these three criteria applies:
- The transaction always takes more than your average response time to complete
- The transaction contacts an external service over the network
- The user does not care if the transaction is completed immediately
Now, let’s talk about how we can write safe, performant and reliable background jobs. There are a few characteristics and best practices that we can follow:
Make your job idempotent
In programming, the term “idempotent” describes an operation that produces the same result whether it is executed once or multiple times. In our case, a background job must produce the same result, without additional side effects, regardless of how many times we run it.
Below is an example of a non-idempotent background job of the kind we can find in most Rails applications:
```ruby
class RegistrationMailJob < ActiveJob::Base
  queue_as :default

  def perform(user)
    # Every run sends another email, so running this job twice sends two.
    UserMailer.signup_email(to: user).deliver_now
  end
end
```
If we run the job above twice, we will send two welcome emails, which is bad from a user's perspective. When writing a background job, we must always assume that any given job may run more than once, because most background job processors cannot guarantee that a job will run only once. One way to handle this is to combine a row-level database lock with a flag that records whether the email has been sent, as shown below:
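```ruby
# app/models/user.rb
class User < ApplicationRecord
  # Enqueue the email job once the new record has been committed.
  after_commit :send_signup_email, on: :create

  def send_signup_email
    RegistrationMailJob.perform_later(self)
  end
end

# app/jobs/registration_mail_job.rb
class RegistrationMailJob < ActiveJob::Base
  queue_as :default

  # If delivery raises, the flag below is never set, so retrying is safe.
  retry_on StandardError, wait: 5.seconds, attempts: 3

  def perform(user)
    UserMailer.signup_email(to: user).deliver_now
  end

  around_perform do |job, block|
    user = job.arguments.first
    # with_lock reloads the record and holds a row-level lock in a transaction.
    user.with_lock do
      # Use `next`, not `return`, to leave a block early.
      next if user.signup_email_sent

      block.call
      user.update!(signup_email_sent: true)
    end
  end
end
```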
Referring to the example above, the around_perform block handles the following scenarios:
- If the RegistrationMailJob job is enqueued more than once, the email will only be sent once. The first job sets the user.signup_email_sent attribute to true, and the second job exits after checking it.
- In the rare case where two jobs with the same user are executed at the same time, the with_lock block stops the second worker from executing the job until the first job is completed (once the first job is completed, user.signup_email_sent will be set to true and the second job will exit, as per the first point).
- If the deliver method fails, the signup_email_sent flag is never set, so the job can be retried safely.
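The lock-then-check pattern can also be tried outside Rails. The following is a minimal plain-Ruby sketch, not the Rails implementation: FakeUser, its Mutex-backed with_lock, and send_signup_email_once are hypothetical stand-ins for the Active Record model, the row-level lock, and the job.

```ruby
# Hypothetical stand-in for a User record. A Mutex plays the role of the
# row-level database lock that with_lock takes in Active Record.
class FakeUser
  attr_accessor :signup_email_sent

  def initialize
    @signup_email_sent = false
    @lock = Mutex.new
  end

  def with_lock
    @lock.synchronize { yield }
  end
end

# Hypothetical stand-in for the job: "sending" appends to an outbox array.
def send_signup_email_once(user, outbox)
  user.with_lock do
    next if user.signup_email_sent # an earlier run already sent the email

    outbox << :signup_email
    user.signup_email_sent = true
  end
end

user = FakeUser.new
outbox = []

# Simulate the same job being picked up by five workers at once.
5.times.map { Thread.new { send_signup_email_once(user, outbox) } }.each(&:join)

puts outbox.size # prints 1
```

However many workers run the job, only the first one to take the lock sends the email; the others see the flag and exit.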
Writing the smallest job possible
You should write your job to be as small as possible in terms of lines of code and execution time. When possible, instead of bulk processing a bunch of objects in a single job, split the work into multiple jobs. The rule of thumb is that every job in a queue should have roughly the same average execution time. No job should take significantly longer than the others.
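As a rough illustration of splitting bulk work, the sketch below slices a list of record IDs into fixed-size batches and enqueues one small unit of work per batch. The helper name enqueue_in_batches is hypothetical, and a plain array stands in for the job queue; in a Rails application each batch would typically be handed to a job with perform_later instead.

```ruby
# Hypothetical helper: pushes one small unit of work per batch onto `queue`
# instead of one large unit covering every ID.
def enqueue_in_batches(ids, batch_size:, queue:)
  ids.each_slice(batch_size) { |batch| queue << batch }
  queue
end

queue = enqueue_in_batches((1..10).to_a, batch_size: 4, queue: [])
queue.each { |batch| puts batch.inspect }
```

Each batch has a bounded size, so the jobs that process them have similar execution times and a single failure only requires retrying one batch.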
Set timeout aggressively
To prevent a worker from being stuck on a job with an extraordinarily long response time, we can set an aggressive timeout. Since we are writing idempotent jobs, as mentioned earlier, there is really no reason to have a long timeout: we can always retry the job without any drawback.
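One place to apply this is the HTTP client a job uses to reach an external service. The sketch below, which assumes the job talks to the service through Ruby's standard Net::HTTP, builds a client that gives up quickly; the helper name build_fast_failing_client and the one-second and two-second values are illustrative assumptions, not recommendations from this article.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: returns a Net::HTTP client with short timeouts so a
# slow external service makes the job fail fast instead of blocking a worker.
# No connection is opened until a request is actually made.
def build_fast_failing_client(url, open_timeout: 1, read_timeout: 2)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = open_timeout # seconds allowed to open the connection
  http.read_timeout = read_timeout # seconds allowed to wait for each read
  http
end

client = build_fast_failing_client("http://example.com")
puts client.open_timeout
puts client.read_timeout
```

When a timeout fires, Net::HTTP raises an exception (Net::OpenTimeout or Net::ReadTimeout), which lets the job fail and be retried later.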
Say no to job uniqueness
If you designed your job to be idempotent, you do not need to be concerned about the uniqueness of any job, since an idempotent job can be executed any number of times without changing the outcome. If what you actually want is to limit how often a job runs, you can achieve that with throttling instead.
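To make the idea of throttling concrete, here is a minimal in-memory sketch. The Throttle class is hypothetical and works only within a single process; a real deployment with several workers would need shared storage for the timestamps, which is outside the scope of this sketch.

```ruby
# Hypothetical throttle: runs the block for a given key at most once per
# `min_interval` seconds and skips calls that arrive too early.
class Throttle
  def initialize(min_interval:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @min_interval = min_interval
    @clock = clock
    @last_run = {}
  end

  # Returns true if the block ran, false if the call was throttled.
  def run(key)
    now = @clock.call
    last = @last_run[key]
    return false if last && (now - last) < @min_interval

    @last_run[key] = now
    yield
    true
  end
end

throttle = Throttle.new(min_interval: 60)
throttle.run(:sync_user_42) { puts "syncing" } # runs
throttle.run(:sync_user_42) { puts "syncing" } # skipped: within 60 seconds
```

Unlike a uniqueness check, this does not try to prove that a job is the only one of its kind; it only limits how often the work happens.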
Proper error handling
You should always implement an error handler for the exceptions a job can raise. Often it is good enough to run the job's work inside a database transaction or a row-level database lock. The point is that we do not want to leave work half done: a job should either fail completely and change nothing, or succeed with all of its work completed.
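The all-or-nothing behaviour can be sketched without a database. In the example below, Account and transfer are hypothetical: a Struct stands in for a database row, and restoring a snapshot in the rescue clause stands in for the rollback that a real database transaction would perform.

```ruby
# Hypothetical in-memory stand-in for a database row.
Account = Struct.new(:balance)

# Moves `amount` between accounts. If anything fails part-way through, both
# balances are restored and the error is re-raised, so no partial update
# is ever left behind.
def transfer(from, to, amount)
  snapshot = [from.balance, to.balance]
  from.balance -= amount
  raise ArgumentError, "insufficient funds" if from.balance.negative?

  to.balance += amount
  true
rescue StandardError
  from.balance, to.balance = snapshot # manual "rollback"
  raise
end

a = Account.new(100)
b = Account.new(0)
transfer(a, b, 30)
puts a.balance # prints 70
puts b.balance # prints 30
```

Because a failed run leaves the data exactly as it was, retrying the job later is safe.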
Raise a red flag on problematic jobs
Oftentimes, some jobs will keep failing and will never complete. In such cases, you need to set up a flag so that you are notified when a certain number of failures occur and further action can be taken.
For example, Sidekiq has a “retry” queue and, by default, Sidekiq will retry your job a maximum of 25 times before moving the job into the “dead” queue.
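Independently of the job processor, the flag itself can be as simple as a failure counter with a threshold. The sketch below is a minimal in-memory version; FailureTracker and its notifier callback are hypothetical names, and a real setup would persist the counts and send the notification to an alerting service.

```ruby
# Hypothetical failure tracker: counts failures per job name and calls the
# notifier exactly once, when the count reaches the threshold.
class FailureTracker
  def initialize(threshold:, notifier:)
    @threshold = threshold
    @notifier = notifier
    @failures = Hash.new(0)
  end

  # Records one failure and returns the running count for that job.
  def record_failure(job_name)
    count = (@failures[job_name] += 1)
    @notifier.call(job_name, count) if count == @threshold
    count
  end
end

tracker = FailureTracker.new(
  threshold: 3,
  notifier: ->(name, count) { puts "#{name} has failed #{count} times" }
)
3.times { tracker.record_failure("RegistrationMailJob") }
```

The third failure triggers the notification; later failures keep counting without notifying again.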
Conclusion
We should use background jobs when services over an external network are involved, when the action need not be completed immediately, or when the action takes a very long time to complete. Background jobs should always be idempotent, so that we can run a job multiple times without breaking anything. We should break our background jobs down to be as simple and small as possible instead of packing everything into one job. When setting timeouts, we should take an aggressive approach, as it is better to fail fast than to wait for a very slow background job to respond. Last but not least, we should have a red flag that notifies us about problematic jobs that have failed more than a certain number of times.