I am looking for the name of a computer algorithm. I’m fairly sure that one should exist which will help me, but I can’t for the life of me figure out what to search for.
If you have implementations of the algorithm, that’d be awesome – but I’m really just researching this because I’d like to figure out how to build it myself.
Basically, I’m looking for an algorithm that can predict future usage of some quantity based on my previous usage. For example, my electricity supplier knows how much power I’ve used for the first 20 days of a 30-day month, so they can extrapolate from that and predict how quickly I will run out of money on my power account. I’d like to know if there’s an algorithm out there that can tell me how long it will be before I’m likely to run out.
What I’m looking for is something that given the following inputs:
- A range (e.g. 30 days)
- How far through that range we are (e.g. day 4; never smaller than zero, never greater than the maximum of the range)
- Some values for previous time periods (either the previous days, or the same days from previous months)
- A total value for this range (e.g. 300)
Can give me a date range in which I’m likely to run out of the total value (300). What I’d like at the other end would be something simple – e.g. “Based on previous usage, you’re going to run out in 2 days.”
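To be clear, the trivial version I could write myself just divides what’s left by the average daily rate so far – something like this (a rough sketch, names are my own):

```java
// Naive extrapolation: assume a constant average daily rate and
// work out how many days remain before the total is used up.
public class RunOutEstimate {
    public static double daysUntilExhausted(double totalBudget,
                                            double usedSoFar,
                                            double daysElapsed) {
        double dailyRate = usedSoFar / daysElapsed; // average use per day so far
        double remaining = totalBudget - usedSoFar; // budget left
        return remaining / dailyRate;               // days until it hits zero
    }

    public static void main(String[] args) {
        // e.g. 200 units used in 20 days of a 300-unit budget -> 10 more days
        System.out.println(daysUntilExhausted(300, 200, 20));
    }
}
```

But I’m hoping there’s a named algorithm that does something smarter than a flat average.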
I’ve done the usual Google searches but haven’t come up with anything that even looks close, yet as far as I can tell this is a reasonably common algorithm used all over the place.
> Basically, I’m looking for an algorithm that can calculate future usage of some set of data based on my previous usage.
First of all, there is no way to predict the future. The best thing you can do is to make a projection, i.e. to make a statement like
Given all the information at hand, for some specific question it is useful (or reasonable or fair) to assume that this unknown variable has value X.
That said, there is of course no generic answer to your question: you are basically asking for a general approach to model anything, and there is no such thing.
Basically you need to model your values by guessing how they are distributed. Once you have picked a distribution, look at the past data to estimate the parameters of that distribution. Then decide whether the values are independent or not. If they are independent, you can draw the missing values, compute the day on which you run out, and, by repeating the process, compute the expected value of the day on which you run out. If the values are not independent, then you are stuck, because it is hard to simulate non-independent values (except for a few distributions, including the normal distribution).
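For illustration, here is a rough sketch of that simulation in Java, assuming independent, normally distributed daily usage (the normal assumption and the names are mine, purely as an example):

```java
import java.util.Random;

public class RunOutSimulation {

    public static double expectedRunOutDay(double[] pastUsage, double remainingBudget,
                                            int maxFutureDays, int trials) {
        // Estimate the parameters of a normal distribution from past daily usage.
        double mean = 0;
        for (double u : pastUsage) mean += u;
        mean /= pastUsage.length;
        double var = 0;
        for (double u : pastUsage) var += (u - mean) * (u - mean);
        double stdDev = Math.sqrt(var / pastUsage.length);

        Random rng = new Random();
        double totalRunOutDay = 0;
        for (int t = 0; t < trials; t++) {
            double budget = remainingBudget;
            int day = maxFutureDays; // default if the budget never runs out
            for (int d = 1; d <= maxFutureDays; d++) {
                // Independent draw of one future day's usage (clamped at zero).
                double usage = Math.max(0, mean + stdDev * rng.nextGaussian());
                budget -= usage;
                if (budget <= 0) { day = d; break; }
            }
            totalRunOutDay += day;
        }
        // Average over all trials gives the expected run-out day.
        return totalRunOutDay / trials;
    }
}
```

With a different assumed distribution you would only change how the parameters are estimated and how each day's usage is drawn; the repeat-and-average structure stays the same.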
But note that the correct modelling depends on the system. Your energy company will take into account that you need more energy in winter than in summer, and will also note whether the winter is especially cold. A banker analysing my withdrawals and deposits should take into account that some are weekly, daily, or monthly, while others are random, and so on.
This is a case of linear regression. The most commonly used approach is simple linear regression using the least squares method.
An implementation in Java can be found in LinearRegression.java. See also this Weighted Linear Regression in Java question.
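As a quick sketch (not the linked implementation), you can fit cumulative usage against the day number with the least-squares formulae and solve for the day on which the line reaches the total budget; the variable names here are just for illustration:

```java
public class UsageRegression {

    // Fit cumulative = intercept + slope * day by least squares,
    // then solve for the day on which cumulative usage reaches totalBudget.
    public static double estimatedRunOutDay(double[] days, double[] cumulativeUsage,
                                             double totalBudget) {
        int n = days.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int i = 0; i < n; i++) {
            sumX += days[i];
            sumY += cumulativeUsage[i];
            sumXY += days[i] * cumulativeUsage[i];
            sumXX += days[i] * days[i];
        }
        double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        double intercept = (sumY - slope * sumX) / n;
        // cumulative usage hits totalBudget when day = (totalBudget - intercept) / slope
        return (totalBudget - intercept) / slope;
    }
}
```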
Linear regression is one answer, but it might not be the correct answer. The appropriate extrapolation method to use depends on the nature of the data itself.
For example, electricity demand is often cyclical, with daily, weekly, and yearly patterns superimposed on each other. Linear extrapolation would fail to take such patterns into account; seasonal adjustment may be called for.
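A crude way to respect a weekly cycle, for instance, is to forecast each coming day with the average usage seen on that day of the week rather than a single overall average. A small sketch (my own example, not a full seasonal-adjustment method):

```java
public class WeeklyPatternForecast {

    // pastDailyUsage[0] fell on startDayOfWeek (0..6); walk forward using the
    // per-day-of-week average until the remaining budget is used up.
    public static int daysUntilExhausted(double[] pastDailyUsage, int startDayOfWeek,
                                         double remainingBudget) {
        double[] sum = new double[7];
        int[] count = new int[7];
        for (int i = 0; i < pastDailyUsage.length; i++) {
            int dow = (startDayOfWeek + i) % 7;
            sum[dow] += pastDailyUsage[i];
            count[dow]++;
        }
        // Average usage for each day of the week (the weekly profile).
        double[] profile = new double[7];
        for (int d = 0; d < 7; d++) profile[d] = count[d] > 0 ? sum[d] / count[d] : 0;

        int futureDow = (startDayOfWeek + pastDailyUsage.length) % 7;
        int day = 0;
        while (remainingBudget > 0 && day < 10000) { // cap to avoid looping forever
            remainingBudget -= profile[(futureDow + day) % 7];
            day++;
        }
        return day; // days until the budget is expected to run out
    }
}
```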
As an extreme example, financial markets are chaotic. Many smart people have tried to write programs to predict the markets, sometimes successfully, sometimes not.
Anyway, only you can decide what method of extrapolation is appropriate. It could be extrapolation based on linear regression, some other kind of regression, or extrapolation may not be reasonable at all.