Let me preface this by saying this is not my code nor my coworkers’ code. Years ago when our company was smaller, we had some projects we needed done that we did not have the capacity for, so they were outsourced. Now, I have nothing against outsourcing or contractors in general, but the codebase they produced is a mass of WTFs. That being said, it does (mostly) work, so I suppose it’s in the top 10% of outsourced projects I’ve seen.
As our company has grown, we’ve tried to take more of our development in house. This particular project landed in my lap so I’ve been going over it, cleaning it up, adding tests, etc etc.
There’s one pattern I see repeated a lot and it seems so mindblowingly awful that I wondered if maybe there is a reason and I just don’t see it. The pattern is an object with no public methods or members, just a public constructor that does all the work of the object.
For example, (the code is in Java, if that matters, but I hope this to be a more general question):
public class Foo {
private int bar;
private String baz;
public Foo(File f) {
execute(f);
}
private void execute(File f) {
// FTP the file to some hardcoded location,
// or parse the file and commit to the database, or whatever
}
}
If you’re wondering, this type of code is often called in the following manner:
for(File f : someListOfFiles) {
new Foo(f);
}
Now, I was taught long ago that instantiated objects in a loop is generally a bad idea, and that constructors should do a minimum of work. Looking at this code it looks like it would be better to drop the constructor and make execute
a public static method.
I did ask the contractor why it was done this way, and the response I got was “We can change it if you want”. Which was not really helpful.
Anyway, is there ever a reason to do something like this, in any programming language, or is this just another submission to the Daily WTF?
9
Ok, going down the list:
I was taught long ago that instantiated objects in a loop is generally a bad idea
Not in any languages I’ve used.
In C it is a good idea to declare your variables up front, but that is different from what you said. It might be slightly faster if you declare objects above the loop and reuse them, but there are plenty of languages where this increase in speed will be meaningless (and probably some compilers out there that do the optimization for you 🙂 ).
In general, if you need an object within a loop, create one.
constructors should do a minimum of work
Constructors should instantiate the fields of an object and do any other initialization necessary to make the object ready to use. This is generally means constructors are small, but there are scenarios where this would be a substantial amount of work.
is there ever a reason to do something like this, in any programming language, or is this just another submission to the Daily WTF?
It’s a submission to the Daily WTF. Granted, there are worse things you can do with code. The problem is that the author has a vast misunderstanding of what classes are and how to use them. Specifically, here’s what I see that is wrong with this code:
- Misuse of classes: The class is basically acting like a function. As you mentioned, it should either be replaced with a static function, or the function should just be implemented in the class that is calling it. Depends on what it does and where it is used.
- Performance overhead: Depending on the language, creating an object can be slower than calling a function.
- General confusion: It is generally confusing to the programmer how to use this code. Without seeing it used, no one would know how the author intended to use the code in question.
9
For me, it is a bit of a surprise when you get an object doing much work in the constructor, so just on that I’d recommend against it (principle of least surprise).
An attempt at a good example:
I’m not sure if this usage is an anti-pattern, but in C# I’ve seen code which was along the lines of (somewhat made up example):
using (ImpersonationContext administrativeContext = new ImpersonationContext(ADMIN_USER))
{
// Perform actions here under the administrator's security context
}
// We're back to "normal" privileges here
In this case, the ImpersonationContext
class does all of its work in the constructor and the Dispose method. It takes advantage of the C# using
statement, which is fairly well understood by C# developers, and thus guarantees that the setup which occurs in the constructor is rolled back once we’re outside of the using
statement, even if an exception occurs. This is probably the best usage I’ve seen for this (i.e. “work” in the constructor), although I’m still not sure I’d consider it “good”. In this case, it’s fairly self-explanatory, functional and succint.
An example of a good, and similar pattern would be the command pattern. You definitely don’t execute the logic in the constructor there, but it’s similar in the sense that the entire class is equivalent to a method invocation (including the parameters needed to invoke the method). In this way, method invocation information can be serialised or passed for later usage, which is immensely useful. Perhaps your contractors were attempting something similar, although I highly doubt it.
Comments on your example:
I would say it’s a very definite anti-pattern. It adds nothing, goes against what’s expected, and performs overly long processing in the constructor (FTP in the constructor? WTF indeed, although I guess people have been known to call web services in this fashion).
I’m struggling to find the quote, but Grady Booch’s book “Object Solutions” has an anecdote about a team of C developers transitioning to C++. Apparently they had been through training, and were told by management that they absolutely had to do things “object oriented”. Brought in to figure out what’s wrong, the author used a code metric tool, and found that the average number of methods per class was exactly 1, and were variations of the phrase “Do It”. I must say, I’ve never encountered this particular problem before, but apparently it can be a symptom of people being forced to create object oriented code, without really understanding how to do so, and why.
3
No.
If all the work is done in the constructor it means the object was never needed in the first place. Most likely, the contractor was merely using the object for modularity. In that case, a non-instantiated class with a static method would have gained the same benefit without creating any superfluous object instances.
There is only one case in which doing all the work in an object constructor is acceptable. That is when the purpose of the object is specifically to represent a long term unique identity, rather than perform work. In such a case, the object would be kept around for reasons other than performing meaningful work. For example, a singleton instance.
I’ve certainly created many classes that do a lot of work to calculate something in the constructor, but the object is immutable, so it makes the results of that computation available via properties, and then never does anything else. That’s normal immutability in action.
I think the weird thing here is to put code with side effects into a constructor. Building an object and throwing it away without accessing the properties or methods looks weird (but at least in this case it’s pretty obvious that something else is going on).
This would be better if the function was moved to a public method, and then have an instance of the class injected, then have the for loop call the method repeatedly.
Is there ever a reason to do all an object’s work in a constructor?
No.
- A constructor shouldn’t have any side-effects.
- Anything more than private field initialization should be viewed as a side-effect.
- A constructor with side-effects breaks the Single-Responsibilty-Principle (SRP) and runs contrary to the spirit of object-oriented-programming (OOP).
- A constructor should be light and should never fail.
- For instance, I always shudder when I see a try-catch block inside a constructor. A constructor should not throw exceptions or log errors.
One could reasonably question these guidelines and say, “But I don’t follow these rules, and my code works fine!”
To that, I would reply, “Well that may be true, until it isn’t.”
- Exceptions and errors inside of a constructor are very unexpected. Unless they are told to do so, future programmers will not be inclined to surround these constructor calls with defensive code.
- If anything fails in production, the generated stack trace may be difficult to parse. The top of the stack trace may point to the constructor call, but many things happen in the constructor, and it may not point to the actual LOC that failed.
- I’ve parsed many .NET stack traces where this was the case.
3
It seems to me that the person doing this was trying to hack around the limitation of Java that does not allow to pass functions around as first class citizens. Whenever you have to pass a function, you have to wrap it inside a class with a method like apply
or similar.
So I guess he just used a shortcut and used directly a class as a function replacement. Whenever you want to execute the function, you just instantiate the class.
It is definitely not a good pattern, at least because we are here trying to figure it out, but I guess it may be the least verbose way to do something similar to functional programming in Java.
2
For me the rule of thumb is not to do anything in the constructor, except for initialising member variables. The first reason for that is that it helps to follow SRP principle, as usually a lengthy initialization process is an indication that the class does more than it should do, and the initialization should be done somewhere outside of the class. The second reason is that this way only the necessary parameters are passed, so you create less coupled code. A constructor with complicated initialisation usually needs parameters which are used only to construct another object, but which are not used by the original class.
I’m going to go against the tide here — in languages where everything has to be in some class AND you can’t have a static class, you can run into the situation where you have an isolated bit of functionality that needs to be put somewhere and doesn’t fit with anything else. Your choices are a utility class that covers various unrelated bits of functionality, or a class that does basically just this one thing.
If you have just one bit like this, then your static class is your specific class. So, either you have a constructor that does all the work, or a single function that does all the work. The advantage of doing everything in the constructor is that it happens just once, it allows you to use private fields, properties and methods without worrying about them being reused or threading. If you have a constructor/method you may be tempteded to let the parameters be passed to the single method, but if you use private fields or proprties that introduces potential threading issues.
Even with a utility class, most people won’t think to have nested static classes (and that might not be possible either depending upon the language).
Basically this is a relatively understandable way of isolating the behavior from the rest of the system, within what is allowed by the language, while still allowing you to take advantage of as much of the language as possible.
Frankly I would have answered much as your contractor did, it’s a minor adjustment to turn it into two calls, there’s not all that much to recommend one over the other. What dis/advatanges there are, are probably more hypothetical than actual, why try to justify the decision when it doesn’t matter and it probably wasn’t so much decided as just the default action?
There are patterns common in languages where functions and classes are much more the same thing, like Python, where one uses an object basically to be function with too many side-effects or with multiple named objects returned.
In this case, the bulk, if not all of the work may be done in the constructor, as the object itself is just a wrapper around a single bundle of work, and there is no good reason to stuff it into a separate function. Something like
var parser = XMLParser()
var x = parser.parse(string)
string a = x.LookupTag(s)
is really no better than
var x = ParsedXML(string)
string a = x.LookupTag(s)
Instead of directly having the side effects outright, you can call the object to enforce the side-effect on cue, which gives you better visibility. If I am going to have a bad side effect on an object, I can do all my work and then get passed the object I will badly effect and do my damage to it. Identifying the object and isolating the call this way is clearer than passing multiple such objects into a function at once.
x.UpdateView(v)
You can make the multiple return values through accessors on the object. All the work is done, but I want the stuff all in context, and I don’t really want to pass multiple references into my function.
var x = new ComputeStatistics(inputFile)
int max = x.MaxOptions;
int mean = x.AverageOption;
int sd = x.StandardDeviation;
int k = x.Kurtosis;
...
So as a joint Python/C# programmer, this does not seem all that weird to me.
1
One case comes to mind where the constructor/destructor does all the work:
Locks. You normally never interact with a lock during it’s lifetime. C#’s using architecture is good for handling such cases without explicitly referring to the destructor. Not all locks are implemented this way, though.
I have also written what amounted to a constructor-only bit of code to patch a runtime library bug. (Actually fixing the code would have caused other problems.) There was no destructor because there was no need to undo the patch.
2
This sounds like a pattern I actually use. No idea if it is a good idea, but it works for me. Code example in C++ (struct
means everything is public).
// Throws Blqh on failure.
struct Connection()
{
using Socket = int;
Connection()
: _s{ connect() }
{}
const Socket _s;
private:
static Socket connect()
{
// do stuff and return an open socket
// or throw an exception
// but clean up before that
}
}
Advantages:
- terse (code is a liability)
- should appeal to the functional folks
- C++ permits RAII, which makes ctors and dtors much much more useful than in a garbage-collected environment
Disadvantages:
- other programmers do not expect the lack of get/setters, the public fields (which we can safely have because of
const
& all the work being done in the ctor)
One use case: It is nice to have immutable objects. But in practice, I create an object using a constructor, then modify it until it is exactly what I want it to be, and then I don’t want it to change anymore. So I want the object to change from mutable to immutable. Good luck.
That can be fixed if the constructor does all the work, so there will be no need ever to modify the object once it is constructed. Of course that means more parameters and more work for the constructor.
1
One reason I can think to do this is to work around deficiencies in the programming language.
Java in particular lacks parameter pass by reference and only has a single return value. So if you want to return multiple values from a function then you have to return an object.
So what do you do when you want to return more than one value? well you create a class to represent the returned data.
If you have a class whose role is purely to store the results of a given function, it’s an easy step from there to figuring that you may as well just put the code in the constructor of the class that is used to store the results.
I agree with AndyBursh. It seems some lack of knowledge issue.
However it’s important to mention frameworks like NHibernate which require you just to code in constructor (when creating map classes). However this is not a limitation of the framework, it’s just what you need.
When it comes to reflection, the constructor it’s the only method which will always exist (although it might be private) and this might be helpful if you have to call a method of a class dynamically.
“I was taught long ago that instantiated objects in a loop is
generally a bad idea”
Yes, may be you are recalling why we need the StringBuilder.
There are two schools of Java.
The builder was promoted by the first one, which teaches us to reuse (because sharing = identity to efficiency). However, in the beginning of 2000 I started to read that creating objects in Java is so cheap (as opposed to C++) and GC is so efficient that reuse is worthless and the only thing that matters is productivity of programmer. For productivity, he must write manageable code, which means modularization (aka narrowing the scope). So,
this is bad
byte[] array = new byte[kilos or megas]
for_loop:
use(array)
this is good.
for_loop:
byte[] array = new byte[kilos or megas]
use(array)
I asked this question when forum.java.sun existed, and people agreed that they would prefer declaring the array as narrow as possible.
Modern java programmer never hesitates to declare a new class or create an object. He is encouraged to do so. Java gives all the reason to do so, with its idea that “everything is an object” and cheap creation.
Furthermore, modern enterprise programming style is, when you need to execute an action, you create a special class (or even better, a framework) that will execute that simple action. Good Java IDEs, that help here. They generate tons of crap automatically, as breathing.
So, I think your objects lack the Builder or Factory pattern. It is not decent to create instances directly, by constructor. A special factory class for every your object is advised.
Now, when functional programming is coming into play, creating objects becomes even simpler. You will create objects at every step as breathing, without even noticing this. So, I expect your code will become even more “efficient” in new era
“do not do anything in your constructor. It is for initialization only”
Personally, I see no reason in the guideline. I consider it a just another method. Factoring code out is necessary only when you can share it.
2