Curve Fitting


Contents Summary

Detailed Contents
Introduction
Command Line
File Format
Plot Instructions
Expressions
Curve Fitting
Three-Dimensional Plots
PostScript Files
Appendix: Line Styles
Index


6 Curve Fitting

A powerful feature of DigiPlot is its ability to evaluate least squares fits to the data it plots. At present, only functions which are linear in the initially unknown coefficients may be fitted. Nevertheless, when combined with transformation of both x and y data, the linear least squares procedure provides flexible curve fitting of a wide range of functions.

Consider a set of data (xi,yi), for i=0,n­1. We first transform the y data by some function f(y) to give

Fi = f(yi).

The transformed data Fi is then approximated by the function G(x) defined by

G(x) = a0g0(x) + a1g1(x) + … + a1g1(x),

where aj, j=1,1 (m  n) are the coefficients to be determined by the least squares process and g(xi) describe the function to be fitted. By definition, the least squares fit determines the values of the coefficients to minimise r2, where

.

In DigiPlot the coefficients aj are evaluated using a Householder transformation.

Consider the plot file below as a simple example of fitting a quadratic function

G(x) = a0 + a1x + a2x2.

# Fitting quadratic to data by least squares
C 0 10 0 100
A
P 2
X [1
Y [2
0 0.1
1 0.99
2 3.9
3 9.2
4 16
5 24.6
6 36.4
8 63
9 81.7
10 100
SF Y : 1.0 : X : X*X
I 128
SP F
SE 1 80
XT Time (&It&R)
YT Height (&IH&R)
G Growth of mixed region


As with the earlier example, the coordinate systems and x and y expressions are specified prior to processing any data. The data points will be plotted using + marks (P 2 instruction). Once the data has been plotted, the SF plot instruction indicates that a least squares fit for the data should be evaluated. The function to be fitted is specified by the remainder of the line. The first entry, up until the first colon (:), indicates the transformation f(yi) to be applied to the y data points. This expression may include any of the normal DigImage operators and Y is used to represent the yi data. The function g0(x) is given by the text between the first and second colon (i.e. 1.0), with g1(x) given between the second and third colons (i.e. X) and g2(x) after the last colon (i.e. X*X).

Once the least squares fit has been evaluated, it is normal for the fit to be utilised in some manner. The SP plot instruction tells DigiPlot to plot the fitted equation using the current plotting intensity (here set to 128 on the previous line by the I plot instruction). DigiPlot must be told how to relate the value of the fitted function G(x) to the y coordinate of the plot. The text on the SP line gives

y = Y(F),

where the functions f(y) and Y(F) are related by

y = Y(f(y)).

The function Y(F) is specified on the SP line in terms of F using the normal DigImage rules for expressions. For the present example, Fi = f(yi) = yi, so y = Y(F) = F.

The final instruction related to the fitting procedure is SE which is used to instruct DigiPlot to write out the equation and coefficients which have been fitted. The two numeric parameters following the SE instruction inform DigiPlot where on the plot the equation should be written. This position is specified in terms of the current coordinate system.

In addition to being available to the SP and SE plot instructions, the coefficients determined by SF are available through the !!Rn return variables. The first coefficient is returned in !!R0, and the last in !!R2 (in this example). The RMS error is returned in !!R3. In general if there are m terms in the fit, then the last coefficient is returned in !!R1 and the root mean square error in !!Rm.

Note we have also used embedded character formatting in the XT and YT plot instructions. Here we simply italicise the t and H.

DigiPlot may fit more than one form of G(x) to the data by simply repeating the appropriate SF, SP, and SE instructions, with SF and SP reflecting any changes in f(y), gi(x) and Y(F) for each G(x). Alternatively, DigiPlot may be requested to fit the curve to only a subset of the data by using the N plot instruction to indicate the start of a new set of data.

In our second example, we shall evaluate power law fits to two sets of data. For this example we shall use the R plot instruction to read the data from a secondary file GROWTH.DAT rather than intermix the plot instructions and data as was done previously. We shall assume the file GROWTH.DAT contains a number of columns of data: the first column is the position, the second the thickness at time zero, the third the thickness at time one and so on.

# Fitting power law to multiple data sets

# Determine the number of data sets to be fitted
!^ Number of data sets ?
!!1:=!!K
# Initialise pointers
!!0:=1        Start with first data set
!!2:=255      Start with solid line
!!3:=180      Start position of equations
# Set up axes
C 0 100 0 200
A
# Loop around the data sets
!!:NextDataSet
  # Specify plot mark for this data set
  P !!0
  # Specify start of new data set
  N
  # Specify column for this data set
  !!0+=1        Increment to obtain column number for y data
  X [1
  Y [!!0
  # Read and plot data from file
  R GROWTH.DAT
  # Fit data and plot out
  SF LN(Y) : 1.0 : LN(X)
  I !!2
  SP EXP(F)
  SE 10 !!3
  # Update plot intensity/line type
  !!2-=32       Update plot intensity
  !!3-=20       Update
!!0<=!!1 :NextDataSet

XT Position
YT Layer thickness
G Power laws for growth of layer thickness


This example has used some of the more advanced features of DigiPlot to simplify the plotting of the curves and avoid duplication of plotting instructions. The main mechanism through which this is achieved is the DigImage command language (refer to the System Overview for complete details). Before setting up the coordinate system and axes, DigiPlot prompts the user for the number of data sets to be plotted (one less than the number of columns in GROWTH.DAT). This is read into a DigImage variable for later use. A number of other DigImage variables are also initialised at this point: these will keep track of the type of marks to be used, the intensity/line type for the fitted curves and the position at which the equation will be printed out.

After the coordinate system and axes have been specified, the main loop of the plotting procedure is started with a DigImage label (!!:NextDataSet). DigiPlot is instructed to use the mark type dictated by the current contents of the !!0 variable (which is used to specify which data set is being plotted), and the fact that it is a new data set indicated by the N instruction. The !!0 variable is incremented to give the column number before specifying the x and y data columns using the X and Y plot instructions. The data itself is read in and plotted with the R GROWTH.DAT read command, and then the power law determined by fitting a straight line to the ln(x) and ln(y) data. The fitted power law is plotted out with the intensity specified by !!2, the EXP(F) being required to undo the earlier LN(Y) operation, and the equation printed out at the appropriate place. Finally the DigImage variables are incremented (decremented) before the conditional jump to process the next data set.

If a single power law fit to all the data sets was required instead of a separate fit to each data set, all that would be needed is to remove (or comment out) the N new data set instruction and move the SF, SP and SE fitting instructions to after the end of the loop.

This plot file could be made even more flexible by introducing further DigImage commands and variables. For example, we may replace the hard coded data file GROWTH.DAT with the following:

# Fitting power law to multiple data sets

# Determine name of data file
!^ Name of data file to be plotted (excluding .DAT extension) ?
!0:=!K
# Determine the number of data sets to be fitted
...
  # Read and plot data from file
  R !0.DAT
...


Additional refinements might include interactive input of the function to be fitted.

As an alternative to the SF plot instruction for performing the least squares fit, the SW instruction offers the additional flexibility of a user-defined weighting function. Using a combination of the PC conditional plotting and SF fitting it is possible to implement rectangular weighting functions to include only specified data points. However, with SW more complex weightingings may be given. Effectively we minimise

where gi is some weighting function specified as the first argument of the SW plot instruction. At present gi may be specified as either a function of x or a function of y, but not a function of both or of some independent quantity.

If, for a given data point (xi,yi) the weighting gi is less than 10-12 (or is negative), then the data from that point is not used in evaluating the fit. The SF plot instruction is equivalent to SP with the weighting function set to unity (or any other positive constant).

If the optional ampersand (&) is included in any of the three forms of the least squares fit plot instructions (SF, SW or SV), then the x and y data are interchanged before fitting, making y the independent variable and x the dependent variable. The expressions should be quoted in terms of the appropriate variable name (i.e. the opposite way around when compared with the plot instruction without the ampersand).

Goto next document (3DPlots)


Links

DIGIPLOT

Contents
Index

OTHER

DigImage documentation page
DigImage home page
Stuart Dalziel's home page


Stuart Dalziel, last page update: 21 June 1999