I'm trying to do linear regression on some data points and was wondering why gnuplot's fit gives a different result than an Excel trendline and my PowerShell script.
My data points:
Timestamp Value
1709289398 3.89
1711018216 3.9
1711450016 3.93
1711460288 3.91
1711462458 3.92
1711618830 3.95
1711620395 3.94
1711630819 3.95
1711635811 3.93
1711638487 3.91
1712057838 3.96
1712578056 3.93
1712590907 3.89
1712662297 3.95
1712918775 3.9
1712924128 3.94
1713170534 3.98
1713191398 3.94
1713773713 3.94
1713776247 3.97
1713777374 3.96
1713790153 3.97
1713860582 3.96
1713881074 3.86
1713882272 3.89
1714032046 3.94
1714033372 3.86
1714045066 3.8
1714055405 3.81
1714382965 3.82
1714478636 3.78
1714494189 3.96
1714496044 3.95
1714496681 3.95
1714652333 3.97
1714654135 4.01
1714732802 4.02
1714735452 4.01
1714736266 4.07
1714737131 3.99
1714748973 4.02
1714984178 4.24
1715006630 4
1715181736 3.91
1715599718 3.88
1715675239 3.87
1717430326 3.91
1717432138 3.88
1717750408 3.98
This is my PowerShell function, which gives me nearly the same output as a trendline in Excel:
function Find-LinearRegression($dataPoints) {
    # Ordinary least squares: slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
    $n = $dataPoints.Length
    $sumX = 0
    $sumY = 0
    $sumXY = 0
    $sumXX = 0
    foreach ($point in $dataPoints) {
        $x = $point[0]
        $y = $point[1]
        $sumX += $x
        $sumY += $y
        $sumXY += $x * $y
        $sumXX += $x * $x
    }
    $meanX = $sumX / $n
    $meanY = $sumY / $n
    $slope = (($n * $sumXY) - ($sumX * $sumY)) / (($n * $sumXX) - ($sumX * $sumX))
    $intercept = $meanY - ($slope * $meanX)
    return @{
        Slope     = $slope
        Intercept = $intercept
    }
}
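For reference, here is a minimal sketch of how the function can be called on the data above (assuming GnuPlotValues.txt holds the header line and the two whitespace-separated columns shown at the top):

# Sketch: read the file (skipping the header) into [x, y] pairs and run the regression
$dataPoints = Get-Content 'GnuPlotValues.txt' |
    Select-Object -Skip 1 |
    ForEach-Object {
        $parts = $_ -split '\s+'
        # unary comma keeps each [x, y] pair as a single pipeline item
        ,([double]$parts[0], [double]$parts[1])
    }
$result = Find-LinearRegression $dataPoints
"y = {0}x + {1}" -f $result.Slope, $result.Intercept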
Excel:      y = 0.000000005343803x - 5.221042665245090
PowerShell: y = 0.000000005343795x - 5.2210293420428
Now I saw that gnuplot has a fit function and was wondering whether both (trendline and fit) do the same thing. But after a few tries I can't figure it out. Here is my gnuplot script:
data = 'GnuPlotValues.txt'
set ticslevel 0
set tics nomirror
set key noautotitle
set grid
set key font ',14'
# treat column 1 as Unix timestamps on the x axis
set xdata time
set timefmt "%s"
set format x "%d/%m/%y"
set xtics rotate by 45 right
# starting values for the fit parameters
a = 1
b = 1
g(x) = a*x + b
fit g(x) data using 1:2 via a, b
# the line from Excel / the PowerShell script, for comparison
f(x) = 5.34379555506111E-09*x - 5.2210293420428
plot data using 1:2 with points pt 7 ps 0.6 lc rgb 'red' title 'Datapoints', \
     g(x) with lines, \
     f(x) with lines title 'Linear regression: f(x) = 5.3437E-09x - 5.221'
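To compare the numbers directly rather than just the plotted lines, the fitted coefficients can be printed right after the fit command (gnuplot also writes them to fit.log); a one-line addition to the script above:

# after "fit" has run, a and b hold the fitted slope and intercept
print sprintf("gnuplot fit: a = %.15e  b = %.15f", a, b)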
The output looks like this: red is the gnuplot fit and blue is the line from the PowerShell function. Now I'm wondering which one is correct and which is the better choice for a linear regression. Can someone tell me why the two have such a difference in slope?