I am building a web application tutorial using Pony ORM. My Python is working, but feels very clunky: in particular, I’m managing date conversion by hand after fetching records, and am doing concatenate-then-split string operations to handle a one-to-many relationship where I would naturally use a JSON field. DB schema, Pony entities, my query, and my conversion code are below; I would be very grateful for advice on how to simplify this.
Database Schema
This database keeps track of which staff members have performed which experiments (many-to-many), which experimental plates are involved in which experiments (many-to-one), and which of those plates have been invalidated (optional one-to-one).
-- staff members: staff_id is primary key
CREATE TABLE staff (
staff_id BIGINT,
personal TEXT,
family TEXT
);
-- experiments: sample_id is primary key
CREATE TABLE experiment (
sample_id BIGINT,
kind TEXT,
start TEXT,
"end" TEXT -- may be NULL if the experiment is ongoing
);
-- join table showing which staff were involved in which experiment
-- there is at least one staff member associated with every experiment
CREATE TABLE performed (
staff_id BIGINT,
sample_id BIGINT
);
-- plate_id is primary key; plate-to-experiment is many to one
- there is at least one plate associated with each experiment
CREATE TABLE plate (
plate_id BIGINT,
sample_id BIGINT,
date TEXT,
filename TEXT
);
-- invalidated plates (along with who invalidated the plate and when)
-- this table only contains records for plates that have been invalidated,
-- and contains at most one such record for each plate
CREATE TABLE invalidated (
plate_id BIGINT,
staff_id BIGINT,
date TEXT
);
Entity Classes
These are straightforward given the table definitions above.
class Staff(DB.Entity):
staff_id = orm.PrimaryKey(int)
personal = orm.Required(str)
family = orm.Required(str)
performed = orm.Set("Performed")
invalidated = orm.Set("Invalidated")
class Experiment(DB.Entity):
sample_id = orm.PrimaryKey(int)
kind = orm.Required(str)
start = orm.Required(str)
end = orm.Optional(str)
performed = orm.Set("Performed")
plate = orm.Set("Plate")
class Performed(DB.Entity):
staff_id = orm.Required(Staff)
sample_id = orm.Required(Experiment)
orm.PrimaryKey(staff_id, sample_id)
class Plate(DB.Entity):
plate_id = orm.PrimaryKey(int)
sample_id = orm.Required(Experiment)
date = orm.Required(date)
filename = orm.Required(str)
invalidated = orm.Set("Invalidated")
class Invalidated(DB.Entity):
plate_id = orm.Required(Plate)
staff_id = orm.Required(Staff)
date = orm.Required(date)
orm.PrimaryKey(plate_id, staff_id)
My Problematic Query
I want the following for each experiment:
- experiment ID (so that I can construct a hyperlink to the experiment’s page)
- start and end dates (again, the end date may be
None
if the experiment is ongoing) - whether or not the experiment has any invalidated plates
- the IDs and names of all staff involved in the experiment
What I have found is:
- The first three fields (ID, start date, and end date) are easy to get, but I have to convert the two date fields from text to Python
date
objects after retrieving the values: Pony has astr2date
function, but there doesn’t appear to be a way to use that inside a query. (Trying to do so produces error messages.) - I was expecting
any()
andall()
functions to go along with Pony’scount()
,sum()
, and other aggregation functions, but they don’t appear to exist and Pony rejects use of the built-in functions. I’m therefore usingcount() > 0
to answer the question, “Have any of the plates in this experiment been invalidated?” - Since several staff may be involved in a single experiment, I am constructing a string with each staff member’s ID, personal names, and family name, then using Pony’s
group_concat()
to join those to get a single string in the query result. I then post-process this (strip, split, and convert the ID back to an integer). I realize may actually be necessary because Pony doesn’t support nested fields (i.e., doesn’t seem to be able to create JSON results), but I’m hoping there’s a simpler way.
def experiment_details():
query = orm.select(
(
e.sample_id,
e.start,
e.end,
orm.count(e.plate.invalidated) != 0,
orm.group_concat(
f"<{s.staff_id}|{s.personal}|{s.family}>"
for s in Staff
if e.sample_id in s.performed.sample_id.sample_id
),
)
for e in Experiment
)
rows = list(query.order_by(lambda eid, start, end, inv, staff: eid))
reformatters = (_reformat_as_is, _reformat_date, _reformat_date, _reformat_as_is, _reformat_staff)
rows = [[f(r) for (f, r) in zip(reformatters, row)] for row in rows]
return rows
def _reformat_as_is(text):
"""Do not reformat (used for uniform zipping)."""
return text
def _reformat_date(text):
"""Reformat possibly-empty date."""
return None if text is None else str2date(text)
def _reformat_staff(text):
"""Convert concatenated staff information back to list of tuples."""
fields = text.lstrip("<").rstrip(">").split(">,<")
values = [f.split("|") for f in fields]
result = [(int(v[0]), v[1], v[2]) for v in values]
return result
My thanks in advance to anyone who can help make this simpler—my future students will be grateful as well. If you want the database and code in question, I am [email protected]
.