As the title indicates, when I try to order by my custom through table, the field now shows duplicates. It seems to be doing a left outer join, displaying all fields on both sides instead of showing only unique values, and distinct() seems to do nothing here. I’m using sqlite3 and distinct(‘field-name’) isn’t supported with sqlite3. Here’s a dumbed-down summary of my code, showing only relevant fields:
# Models
class DataRepository(Base):
class Meta:
ordering = ('-updated_at',)
name = models.CharField(max_length=100, unique=True)
class Dataset(models.Model):
class Meta:
ordering = ('-updated_at',)
name = models.CharField(max_length=100, unique=True) # Dataset name must be unique across BrainSTEM
datarepositories = models.ManyToManyField(DataRepository, through="DatasetDatarepository", blank=True) # Data repositories where the data from this dataset is stored. Create private data repositories [here]
class DatasetDatarepository(models.Model):
class Meta:
ordering = ('order',)
unique_together = (('dataset', 'datarepository'),)
dataset = models.ForeignKey(Dataset, on_delete=models.CASCADE)
datarepository = models.ForeignKey(DataRepository, on_delete=models.CASCADE)
order = models.PositiveIntegerField(null=True, blank=True, default=0)
# Admin
class DatasetAdmin(admin.ModelAdmin):
fieldsets = (
(None, {'fields': (
'name',
'datarepositories',
)}),
)
form = DatasetForm
# Form
class DatasetForm(forms.ModelForm):
class Meta:
model = Dataset
fields = [
"name",
"datarepositories",
]
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
if self.fields.get('datarepositories', None):
datarepositories = self.fields.get("datarepositories")
dr = DataRepository.objects.none()
projects = self.instance.projects.all()
for project in projects:
for group in get_groups_with_perms(project):
dr |= get_objects_for_group(group, 'change_datarepository', DataRepository, accept_global_perms=False)
datarepositories.queryset = dr.order_by('datasetdatarepository')
That very last line, when I comment it out, it won’t show duplicates. Otherwise, it shows duplicates. And it’s probably because Django doesn’t know whether an associate exists or not, so it does a left outer join. The problem with this is that it pulls in every instance of a datarepository field, which could be associated with different datasets, and it doesn’t know which order to choose.
Is there a way to make this queryset work like this?
- Get every datasetrepository that meets the criteria in the form’s init (all the ones that are associated with projects whose groups have permissions to change them)
- From that list, check to see if any of the ones from that list are already associated with this dataset, and if so, order by their ‘order’ value in the through table without showing duplicates.
I saw a post here that was similar, and they used annotate to get the max value of a field. I tried that like this:
DataRepository.object.annotate(max_order=Max('datasetdatarepository__order')).order_by('max_order')
This doesn’t work because it’s not guaranteed to get the right datarepository field (the one associated with this dataset). Now it just takes whatever one has the highest value. And I need to be associated with the current dataset or else not be included.
Also, it needs to return a queryset, so I can’t convert this to some other kind of object.