In Django I have a table that contains records that need to be processed by celery workers.
The way this works is that a worker grabs the next X rows from the table in a batch, processes them, and then marks them processed. With one worker this has been fine, but the number of rows has increased and I would like to add more workers.
My concern is that the other workers will grab the same rows and try to process them.
With that in mind, I have implemented the following logic.
from django.db import models, transaction


class Row(models.Model):
    # Define your row fields here
    processed = models.BooleanField(default=False)
    worker = models.ForeignKey('Worker', on_delete=models.SET_NULL, null=True)


class Worker(models.Model):
    # Define your worker fields here
    name = models.CharField(max_length=100)


def distribute_rows_to_workers():
    workers = Worker.objects.all()
    batch_size = 10
    for worker in workers:
        with transaction.atomic():
            # Lock the next batch of unclaimed rows (SELECT ... FOR UPDATE)
            row_ids = list(
                Row.objects.select_for_update()
                .filter(processed=False)
                .values_list('pk', flat=True)[:batch_size]
            )
            if row_ids:
                # A sliced queryset can't be updated directly, so claim the rows by primary key
                Row.objects.filter(pk__in=row_ids).update(worker=worker)


def process_rows_for_worker(worker):
    rows = Row.objects.filter(worker=worker, processed=False)
    for row in rows:
        # Process the row here
        row.processed = True
        row.save()
What I want to know is: will that transaction block prevent the other workers from selecting those rows until the select and the subsequent update are finished? Essentially, will it lock those rows so that other workers won't perform the select until the update is done?
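For context, this is roughly how the functions above get called from the Celery side. It's only a sketch of my setup; the task names, the myapp module path, and the beat scheduling are placeholders, not my exact code:

# tasks.py - rough sketch; "myapp", task names, and scheduling are placeholders
from celery import shared_task

from myapp.models import Worker, distribute_rows_to_workers, process_rows_for_worker


@shared_task
def distribute_batches():
    # Run periodically (e.g. via celery beat); once there is more than one
    # worker, several of these could overlap, which is the concern above.
    distribute_rows_to_workers()


@shared_task
def process_assigned_rows(worker_id):
    # Each Celery worker only processes the rows assigned to its Worker record.
    worker = Worker.objects.get(pk=worker_id)
    process_rows_for_worker(worker)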