Data fitting using fmin
Fitting nonlinear models to data
Simple example of curve-fitting
1) Get Data
Let's say you have some data you want to fit:

from pylab import *
from numpy import *

## Generating data to fit
n = 20
xmin = 1
xmax = 11
a = 2
b = 4
funct = lambda x: a*x + b
x = linspace(xmin, xmax, n)
y = funct(x)

plot(x, y, 'ro')
show()
This data was generated from the line y=ax+b where the slope a=2 and intercept b=4.
However, if the data were just handed to you, or you recorded it yourself, you would not know this. Which brings us to step 2.
2) Guess the Relationship between the x and y coordinates
You might guess that y=ax+b is the right relationship between the data points. "Hey," you say, "that looks like a line!" And yes, you would be right. You might even be able to guess, with a decent amount of certainty, what the slope (a) and intercept (b) should be. However, sometimes you get messier relationships that are not so straightforward when you fit data. So for example's sake, we will continue to pretend we are dumbfounded as to what the slope (a) and intercept (b) should be. We can graph the relationship and throw out a guess for "a" and "b"...
from pylab import *
from numpy import *

## Generating data to fit
n = 20
xmin = 1
xmax = 11
a = 2
b = 4
funct = lambda x: a*x + b
x = linspace(xmin, xmax, n)
y = funct(x)

# Our function we think might fit the data
A = 3
B = 4
my_guess_function = lambda x: A*x + B
X = linspace(xmin, xmax, 100)
Y = my_guess_function(X)

plot(x, y, 'ro', X, Y, '-')
title('The Data with Best Guess Function')
show()
Well... almost. I guessed A=3 and B=4. We can keep guessing and graphing to find something closer. But how do we tell if one guess is closer than another? Past a certain point it becomes hard to eyeball. This brings us to step 3.
3) Write a Function to Score the Fit
How close is close? We want to minimize the distance between the blue line (our guess) and the red dots (the data). Ideally we want them on top of each other (i.e., the distance from the blue line to each red dot is zero).
Whatever guess gets closest to the most data points wins... basically.
You can determine which guess gets closest by calculating the "sum of squares".
If you have never heard of it, the idea is simple: take the vertical distance between each data point and your guess curve, square it, and add all of those squared distances together. There are other ways to calculate how close your guess is to the data, but this one is the most straightforward and useful.
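As a quick illustration, here is the sum of squares computed by hand on three made-up points (these numbers are for demonstration only, not part of the data above):

from numpy import array

# Hypothetical data: three points that lie on y = 2x + 4
x = array([1.0, 2.0, 3.0])
y = array([6.0, 8.0, 10.0])

# A guess of y = 3x + 4 misses each point by x, since (3x+4) - (2x+4) = x
guess = 3*x + 4

# Sum of squared distances: 1^2 + 2^2 + 3^2 = 14
ss = ((guess - y)**2).sum()
print(ss)   # 14.0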
Here is an example of how to code it, to check and see which guess is best:
from numpy import linspace
import matplotlib.pyplot as plt

## Generating data to fit
n = 20
xmin = 1
xmax = 11
a = 2
b = 4
funct = lambda x: a*x + b
x = linspace(xmin, xmax, n)
y = funct(x)

# The function we think fits the data
my_guess_function = lambda x: A*x + B

# Use sum of squared distances (SS) to score the fit
ss = lambda x, y: ((my_guess_function(x) - y)**2).sum()

# Our 1st fit attempt
A = 3
B = 4
X1 = linspace(xmin, xmax, 100)
Y1 = my_guess_function(X1)
ss1 = ss(x, y)

# Our 2nd fit attempt
A = 1.5
B = 2
X2 = linspace(xmin, xmax, 100)
Y2 = my_guess_function(X2)
ss2 = ss(x, y)

# Plotting the data and the fits
plt.plot(x, y, 'ro', X1, Y1, '-', X2, Y2, '-')
plt.legend(('data', 'fit 1', 'fit 2'), loc='upper center')
graphname = 'Fit1: ' + str(ss1) + ' Fit2: ' + str(ss2)
plt.title(graphname)
plt.show()
The summed squares of the distances are listed in the title of the figure. We can see that the second fit, in green, has a lower score (about 546) than the first fit, in blue (about 904). This means that the green line was, overall, closer to the red data points, since its summed squared distances were smaller.
4) Optimize to find the Best Fit.
Previously, we figured out a way to determine whether one fit was better than another. Now we need to find the 'best' fit possible. We could keep guessing over and over, trying to find the fit that gives us the lowest sum of squared distances. But this is a pain. Really, we want an optimizer to do this for us.
The optimizer will use an algorithm to find the values of A and B that minimize
the sum of squares. It will give us the 'least sum of squares' fit.
from numpy import linspace
import matplotlib.pyplot as plt
from scipy.optimize import fmin

#==============================================
## Generating data to fit
n = 20
xmin = 1
xmax = 11
a = 2
b = 4
funct = lambda x, a, b: a*x + b
x = linspace(xmin, xmax, n)
y = funct(x, a, b)

#===============================================
# Scores the fit of the function for a set of parameters
def score(parms):
    #1 pull off parameters from optimizer input
    A, B = parms
    #2 data you are matching to
    funct = lambda x, a, b: a*x + b
    x = linspace(1, 11, 20)
    y = funct(x, 2, 4)
    #3 write function to fit data
    my_guess_function = lambda x, A, B: A*x + B
    #4 write cost function to see how well it fits
    ss = lambda x, y, A, B: ((my_guess_function(x, A, B) - y)**2).sum()
    #5 return the fit score for this try
    return ss(x, y, A, B)

# Find the parameters that give us the lowest score
answ = fmin(score, (3, 1), full_output=1, maxiter=1000)

# Create a function from the lowest scoring parameters
A = answ[0][0]
B = answ[0][1]
X1 = linspace(xmin, xmax, n*5)
Y1 = funct(X1, A, B)

#===============================================
# Plotting the data and the lowest scoring (aka best) fit
plt.plot(x, y, 'ro', X1, Y1, '-')
plt.legend(('data', 'the lowest scoring fit'), loc='upper center')
graphname = 'optimized fit'
plt.title(graphname)
plt.show()
#===============================================
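Why answ[0][0]? With full_output=1, fmin returns a tuple containing the best parameters, the final score, and some bookkeeping, rather than just the parameter array. Here is a minimal sketch of unpacking it (the score function is restated inline so the snippet runs on its own; the variable names are my own):

from numpy import linspace
from scipy.optimize import fmin

def score(parms):
    # same cost function as above: sum of squares against y = 2x + 4
    A, B = parms
    x = linspace(1, 11, 20)
    return (((A*x + B) - (2*x + 4))**2).sum()

# With full_output=1, fmin returns (xopt, fopt, iterations, funcalls, warnflag)
xopt, fopt, niter, funcalls, warnflag = fmin(score, (3, 1), full_output=1, maxiter=1000)

print(xopt)      # best-fit parameters, approximately [2. 4.]
print(fopt)      # sum of squares at those parameters, near 0
print(niter)     # number of simplex iterations used
print(warnflag)  # 0 means the optimizer converged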
5) Other Optimization Functions
There are a variety of other optimization functions in scipy.optimize that we could have used.
For instance, if you have a discrete optimization problem, it might make
more sense to use an optimizer that anneals...
However, if you have a problem whose cost function does not
tend to 'jump' around as you shift parameters, then you should stick
with something like fmin.
from numpy import linspace
import matplotlib.pyplot as plt
from scipy.optimize import anneal  # only in older SciPy; see the note below

#==============================================
## Generating data to fit
n = 20
xmin = 1
xmax = 11
a = 2
b = 4
funct = lambda x, a, b: a*x + b
x = linspace(xmin, xmax, n)
y = funct(x, a, b)

#===============================================
# Scores the fit of the function for a set of parameters
def score(parms):
    #1 pull off parameters from optimizer input
    A, B = parms
    #2 data you are matching to
    funct = lambda x, a, b: a*x + b
    x = linspace(1, 11, 20)
    y = funct(x, 2, 4)
    #3 write function to fit data
    my_guess_function = lambda x, A, B: A*x + B
    #4 write cost function to see how well it fits
    ss = lambda x, y, A, B: ((my_guess_function(x, A, B) - y)**2).sum()
    #5 return the fit score for this try
    return ss(x, y, A, B)

# Find the parameters that give us the lowest score
answ = anneal(score, (3, 1), schedule='boltzmann', full_output=1, maxiter=10000)

# Create a function from the lowest scoring parameters
A = answ[0][0]
B = answ[0][1]
X1 = linspace(xmin, xmax, n*5)
Y1 = funct(X1, A, B)

#===============================================
# Plotting the data and the lowest scoring (aka best) fit
plt.plot(x, y, 'ro', X1, Y1, '-')
plt.legend(('data', 'the lowest scoring fit'), loc='upper center')
graphname = 'optimized fit'
plt.title(graphname)
plt.show()
#===============================================
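A caveat: anneal was deprecated and then removed from SciPy (it is gone as of version 0.16), so the listing above only runs on older installations. On a modern SciPy, basinhopping fills a similar global-optimization niche: it repeatedly perturbs the parameters and runs a local minimizer from each perturbed starting point, keeping the best result. Here is a minimal sketch of the same fit using it (the niter value is illustrative, not tuned):

from numpy import linspace
from scipy.optimize import basinhopping

def score(parms):
    # same cost function as above: sum of squares against y = 2x + 4
    A, B = parms
    x = linspace(1, 11, 20)
    return (((A*x + B) - (2*x + 4))**2).sum()

# Hop between local minima, running a local minimizer from each start
result = basinhopping(score, (3, 1), niter=100)

print(result.x)    # best-fit parameters, approximately [2. 4.]
print(result.fun)  # sum of squares at those parameters, near 0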