I have a polyline “class” in my Clojure program, which is represented by a vector of points. (It’s not really a class or anything.)
The polyline’s length (in the geometric sense) is something that is often needed for computations. It’s also fairly slow to calculate every time it’s needed.
In OO code, I’d just have the length as a property of the Polyline class.
But in functional code, I’m lost. Do I duplicate the OO pattern and make the polyline a map of {:points points, :length (length points)}
? Do I make a length
function and memoize
it? Do I just depend on the compiler to cache the results?
5
The most literal and direct answer to the asked question is “you don’t.”
More helpful but still not quite sufficient as an answer is “as function results.”
Neither is completely enough when just starting out looking at things from a functional viewpoint though so let me share some things that helped me.
-
Functions are not methods — Functions operate on whatever is passed to them, not the ‘attributes’ of their ‘object’. They are not bound to a thing. This is easy to grasp but equally easy to forget. In OO methods and functions are treated as synonyms.
-
The OO concept ‘object’ isn’t helpful — data is a thing of its own and functions are things of their own. They are not bound together into a conceptual unit.
-
Data doesn’t have ‘properties’ in the OO sense — Since your data isn’t an instance it doesn’t carry around details within that instance. And you probably don’t want to subvert this. Doval and user39685 touch on why in their comments. I do also, below.
Looked at this way you don’t really have a polyline ‘class’. You have polylines as a concept and you have chosen to represent this concept as a vector of points. Since it is a vector you use it like any other vector!
If it was a vector of names and you needed how many, you’d call length. If it was a vector of airplanes and you needed how many, you’d call length, etc. all the same!
If performance is a problem then you start looking at changing the algorithm, datatype, or both. After profiling.
As user39685 said in his comment “beware of premature optimization.” If it turns out you spend all your time calculating the length of points then you memoize or wrap the polyline up in a map with a :length key like you have (but think twice because now you have a special snowflake to support, see below).
If you don’t rush to bundle your data up into ad-hoc objects you don’t have to un-bundle to use all the primitive functions already written for you. Your functions then also tend to become smaller, simpler, and more general because you don’t have to wrap and unwrap or parse and bundle data before operating on it.
3
First of all, make sure you really have a problem. Profiling and benchmarking are key.
There are a multitude of options, and ultimately it’s up to you to determine which one(s) is most suited for your specific application. But I can give you a few ideas.
But in functional code, I’m lost.
Don’t forget that Clojure is a multi-paradigm language. There’s no reason to use one paradigm when another is a better fit. In fact, Clojure itself has quite the repertoire of classes and interfaces.
Do I duplicate the OO pattern and make the polyline a map of
{:points points, :length (length points)}
?
That’s certainly one way to do it, but it’s definitely not the only way. For starters, note that in the specific example you gave, you are calculating the length of the polyline up-front. But if you only use it “often” (but not all the time), then you are wasting time calculating the length if it’s never used for that particular polyline. Using delay
, you can defer the computation until it’s definitely needed:
{:points points
:length (delay (length points))}
Now, storing the length
on the map (either directly or wrapped in a delay
) has one major drawback: you need to remember to “reset” the length anytime a new polyline map is generated. For example, suppose we want to implement a conj-point
function to “add” a point to a polyline. We might try this:
(defn conj-point [polyline point]
(update polyline :points conj point))
But in this case, the new polyline would have the old polyline’s length. So conj-point
needs to update the length as well:
(defn conj-point [polyline point]
(let [points' (conj (:points polyline) point)
length' (delay (length points'))]
(assoc polyline :points points' :length length')))
With this code, we still have the problem that we are locked into one specific representation of a polyline (a Clojure map). If we wish to change the representation, we would have to change every function that deals (directly) with that representation. In Java, we’d make an IPolyline
interface. We can do the same in Clojure, but we can also use Protocols:
(defprotocol IPolyline
(conj-point [polyline point]))
Now, we can extend IPolyline
to any representation of polylines we wish to use. For example, Clojure’s IPersistentMap
interface:
(extend-protocol IPolyline
clojure.lang.IPersistentMap
(conj-point [polyline point]
...))
Now, if we wish to be more specific about which maps we accept, we could define a record type:
(defrecord Polyline [points length]
IPolyline
(conj-point [self point] ...))
Now, we can use the functions we normally would with maps (get
, update
, assoc
, etc.) to work with Polyline
s (though we still need to be careful to recalculate the length as necessary). But we also have a type we can “tag” polylines with to distinguish them from other maps.
If the map interface isn’t important, we can use deftype
instead of defrecord
:
(deftype Polyline [points length]
IPolyline
(conj-point [self point] ...))
Of course, all three implementations could co-exist.
But let’s take a step back now. Logically, conj-point
is just the same as conj
. The only difference is that conj-point
is “specialized” for polylines. There may be good reasons to retain the conj-point
interface (for instance, you are using map or defrecord
representations). But, there’s also good reason to want to just use the conj
function directly — it makes your code more general, and you can take advantage of code which isn’t aware of polyline but is aware of conj
able things.
Since your original representation of a polyline was a vector, it might make sense to use IPersistentVector
as your primary abstraction. You can then define (using deftype
, genclass
, or Java) a Polyline
class that implements IPersistentVector
in a way that ensures consistency of the length
. However, if you go this route, be aware that many of Clojure’s collection interfaces are not officially considered to be part of the public API. They are largely undocumented, and (at least in theory) could change without warning.
Do I make a
length
function andmemoize
it?
That’s almost certainly a bad idea — at least if you’re talking about memoize
ing length
at the top-level. The problem is that memoize
d functions retain references to their arguments for the lifetime of the object representing the function. That is, when a memoize
d function is called, its arguments are first looked up in a hash map. If an entry is found for those arguments, the “remembered” value is returned. If not, the underlying function is called and the return value is “remembered” by adding an entry to the hash map.
I’m specifically avoiding use of the word “cache” here, because “cache” has the connotation that things “could be forgotten / purged from the cache” if the cache becomes “too full”. But that’s not how memoize
works — it never purges entries from the map, so it’s not really a “cache” in that sense. It remembers for its lifetime, which in the case of a top-level function is “forever”. So a memoize
d length
function would retain all the polylines it was ever invoked with — they would not be garbage collected!
Now, if there happens to be only a few polyline values that you actually use in your application, this might be ok. However, IIRC, memoize
uses value-based comparison. So if your polylines have a lot of points, you might just be trading time spent calculating the length of a polyline for time spent comparing polylines.
One approach that might be effective is to add an instance of a memoize
d length
function to the polyline map when it is constructed. For example:
(defn make-polyline [points]
{:points points
:length (memoize length)})
You could then get the length of polyline using something like ((:length polyline) (:points polyline))
(which, realistically, you would abstract into a function). Since each call to make-polyline
constructs a different memoize
ation of length
, you would only be retaining polylines during the lifetime of that specific memoize
ation. In theory, that would be the same as the lifetime of the polyline on which it was initially attached to. But if you happen to forget to construct a new :length
entry in some case, you will at least still compute the correct length, and you will keep unnecessary garbage to a minimum.
Do I just depend on the compiler to cache the results?
Not unless you implement your own compiler ;-). Clojure doesn’t cache the results produced by any expressions (unless you count local variables). You can, of course, “cache” the results explicitly by storing them in a datastructure (e.g., map, delay
, or even a closure).