I’m sorry if this is an absolutely sophomoric question, but I’m curious what the best practices are out there, and I can’t seem to find a good answer on Google.
In Python, I usually use an empty class as a super-catchall data structure container (sort of like a JSON file), and add attributes along the way:
```python
class DataObj:
    "Catch-all data object"
    def __init__(self):
        pass

def processData(inputs):
    data = DataObj()
    # attributes are attached ad hoc after the object is created
    data.a = 1
    data.b = "sym"
    data.c = [2,5,2,1]
    return data
```
This gives me a tremendous amount of flexibility, because the container object can essentially store anything. So if new requirements crop up, I just add whatever they need as another attribute on the DataObj object (which I pass around in my code).
However, recently it has been impressed upon me (by FP programmers) that this is an awful practice, because it makes it very hard to read the code. One has to go through all the code to figure out what attributes DataObj actually has.
Question: How can I rewrite this for greater maintainability without sacrificing flexibility?
Are there any ideas from functional programming that I can adopt?
I’m looking for best practices out there.
Note: one idea is to pre-initialize the class with all the attributes that one expects to encounter, e.g.
```python
class DataObj:
    "Catch-all data object"
    def __init__(self):
        # declare every expected attribute up front with a placeholder value
        self.a = 0
        self.b = ""
        self.c = []

def processData(inputs):
    data = DataObj()
    data.a = 1
    data.b = "sym"
    data.c = [2,5,2,1]
    return data
```
Is this actually a good idea? What if I don’t know what my attributes are a priori?
> How can I rewrite this for greater maintainability without sacrificing flexibility?
You don’t. The flexibility is precisely what causes the problem. If any code anywhere may change what attributes an object has, maintainability is already in pieces. Ideally, every class has a set of attributes that’s set in stone after `__init__` and is the same for every instance. That’s not always possible or sensible, but it should be the case whenever you don’t have really good reasons for avoiding it.
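One way to have Python itself enforce a fixed attribute set is `__slots__`. The sketch below assumes the three attributes from the question; the class name `Record` is hypothetical. Assigning any name not listed in `__slots__` raises `AttributeError` instead of silently creating a new attribute.

```python
class Record:
    """Hypothetical container whose attribute names are fixed by __slots__."""

    # Only these names may ever be assigned on an instance.
    __slots__ = ("a", "b", "c")

    def __init__(self, a, b, c):
        # Every attribute is assigned here, so each instance has the same set.
        self.a = a
        self.b = b
        self.c = c


data = Record(1, "sym", [2, 5, 2, 1])
data.a = 2  # fine: 'a' is a declared slot

try:
    data.d = 3  # a typo or an ad-hoc attribute is rejected
except AttributeError as exc:
    print("rejected:", exc)
```

A side effect is that slotted instances have no per-instance `__dict__`, which also saves some memory.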
> one idea is to pre-initialize the class with all the attributes that one expects to encounter
That’s not a good idea. Sure, the attribute is then there, but it may have a bogus value, or even a valid one that covers up for code not assigning the value (or assigning to a misspelled attribute name instead). An `AttributeError` is scary, but getting wrong results is worse. Default values in general are fine, but to choose a sensible default (and decide what is required) you need to know what the object is used for.
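A minimal sketch of "deciding what is required", using the standard `dataclasses` module: fields without a default must be passed to the constructor, so forgetting one fails immediately, while only genuinely optional fields get a default. Which fields count as required here is an assumption for illustration, and `Record` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    a: int  # required: assumed to have no sensible default
    b: str  # required
    c: list = field(default_factory=list)  # optional: a fresh empty list per instance


ok = Record(a=1, b="sym")  # c falls back to an empty list
print(ok)  # Record(a=1, b='sym', c=[])

try:
    Record(a=1)  # forgot b: fails at construction instead of silently using ""
except TypeError as exc:
    print("missing field:", exc)
```

`default_factory=list` is used rather than `c: list = []` because a shared mutable default would be reused across instances (and `dataclasses` rejects it).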
> What if I don’t know what my attributes are a priori?
Then you’re screwed in any case and should use a `dict` or `list` instead of hardcoding attribute names. But I take it you meant “… at the time I write the container class”. Then the answer is: “You can edit files in lockstep, duh.” Need a new attribute? Add a frigging attribute to the container class. There’s more code using that class and it doesn’t need that attribute? Consider splitting things up into two separate classes (use mixins to stay DRY), or make the attribute optional if that makes sense.
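When the names really are only known at runtime (for example, keys read from a config file), a dict keyed by those names is the honest structure. The helper `parse_pairs` and its `key=value` input format are hypothetical, chosen only to illustrate this.

```python
def parse_pairs(lines):
    """Hypothetical helper: turn 'key=value' lines into a dict.

    The keys are only known at runtime, so they become dict keys
    rather than attribute names.
    """
    result = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")  # split on the first '=' only
        result[key.strip()] = value.strip()
    return result


settings = parse_pairs(["a = 1", "# a comment", "b=sym", ""])
print(settings)  # {'a': '1', 'b': 'sym'}
```

Code that consumes `settings` can then ask `"a" in settings` or use `settings.get("a")` explicitly, instead of guessing which attributes an object happens to carry.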
If you’re afraid of writing repetitive container classes: apply metaprogramming judiciously, or use `collections.namedtuple` if you don’t need to mutate the members after creation (your FP buddies would be pleased).
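A sketch of the `namedtuple` route applied to the question’s example. The field list is declared once and documents exactly which attributes exist; the record is built in one expression and never extended afterwards. The `process_data` function and its unused `inputs` argument mirror the question and are placeholders.

```python
from collections import namedtuple

# The field list documents exactly which attributes exist.
DataObj = namedtuple("DataObj", ["a", "b", "c"])


def process_data(inputs):
    # Build the whole record in one expression; nothing is added later.
    return DataObj(a=1, b="sym", c=[2, 5, 2, 1])


data = process_data(None)
print(data)  # DataObj(a=1, b='sym', c=[2, 5, 2, 1])

# "Changing" a field produces a new tuple; the original is untouched.
updated = data._replace(a=42)
print(data.a, updated.a)  # 1 42
```

Assigning `data.a = 5` or adding `data.d = 1` raises `AttributeError`. Note that the immutability is shallow: the list stored in `c` can still be mutated in place.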
You could always use Alex Martelli’s Bunch class. In your case:
```python
class DataObj:
    "Catch-all data object"
    def __init__(self, **kwds):
        # every keyword argument becomes an instance attribute
        self.__dict__.update(kwds)

def processData(inputs):
    data = DataObj(a=1, b="sym", c=[2,5,2,1])
    return data
```
That way, at least it’s clear to the reader that `data` is just a dumb data store, and one can see immediately which values are stored under what name, since it all happens in one line.
And yes, doing things this way is in fact a good idea, sometimes.
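Since Python 3.3 the standard library ships `types.SimpleNamespace`, which behaves much like the `Bunch`-style class above: keyword arguments become attributes. Unlike the hand-written class, it also provides a readable `repr` and equality comparison.

```python
from types import SimpleNamespace

# SimpleNamespace(**kwargs) sets each keyword as an attribute, like Bunch.
data = SimpleNamespace(a=1, b="sym", c=[2, 5, 2, 1])
print(data)    # namespace(a=1, b='sym', c=[2, 5, 2, 1])
print(data.b)  # sym

# Two namespaces compare equal when their attributes are equal.
print(data == SimpleNamespace(a=1, b="sym", c=[2, 5, 2, 1]))  # True
```

It is still open-ended (anyone can add attributes later), so the readability argument above applies only as long as the object is built in one place.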
I would likely use the second approach, possibly using `None` to indicate invalid data. It is true that the code is difficult to read and maintain if you add attributes later. However, more information on the purpose of this class/object would give insight into why the first idea is a bad design: where would you ever have a completely empty class with no methods or default data? Why wouldn’t you know what attributes the class has?
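A minimal sketch of the second approach with `None` as the "not set / invalid" marker. The `FIELDS` tuple and the `missing()` helper are hypothetical additions that report which attributes were never given a real value.

```python
class DataObj:
    """Pre-declares every attribute; None means 'not set yet / invalid'."""

    FIELDS = ("a", "b", "c")  # hypothetical: the declared attribute names

    def __init__(self):
        self.a = None
        self.b = None
        self.c = None

    def missing(self):
        """Return the names of fields that still hold None."""
        return [name for name in self.FIELDS if getattr(self, name) is None]


data = DataObj()
data.a = 1
data.b = "sym"
print(data.missing())  # ['c']
```

The check uses `is None` rather than truthiness, so legitimate falsy values such as `0`, `""` or `[]` still count as set.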
It’s possible that `processData` might be better as a method (`process_data`, to follow Python naming conventions), since it acts upon the object. Given the example, though, it looks like it might be better as a plain data structure (where a `dict` may suffice).
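A sketch of both options side by side. The real processing logic isn’t shown in the question (its example ignores `inputs` too), so both bodies simply reproduce the question’s constant assignments; `process_data_as_dict` is a hypothetical name.

```python
class DataObj:
    """Container that owns the logic which fills it in."""

    def __init__(self):
        # All attributes declared up front (None = not processed yet).
        self.a = None
        self.b = None
        self.c = None

    def process_data(self, inputs):
        # Placeholder body: the real processing of `inputs` would go here.
        self.a = 1
        self.b = "sym"
        self.c = [2, 5, 2, 1]
        return self  # returning self allows DataObj().process_data(x)


def process_data_as_dict(inputs):
    # The plain-dict alternative: all keys are visible in one place.
    return {"a": 1, "b": "sym", "c": [2, 5, 2, 1]}


data = DataObj().process_data(inputs=None)
print(data.a, data.b, data.c)  # 1 sym [2, 5, 2, 1]
```

With the method version, every attribute the class can have is visible in one class body; the dict version avoids a class altogether when there is no behavior to attach.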
Given a real example, you might consider taking the question to CodeReview, where they could help to refactor the code.