Copyright
Copyright (C) 1986 - 1993, 1998 Thomas Williams, Colin Kelley
Permission to use, copy, and distribute this software and its
documentation for any purpose with or without fee is hereby granted,
provided that the above copyright notice appear in all copies and
that both that copyright notice and this permission notice appear
in supporting documentation.
Permission to modify the software is granted, but not the right to
distribute the complete modified source code. Modifications are to
be distributed as patches to the released version. Permission to
distribute binaries produced by compiling modified sources is granted,
provided you
1. distribute the corresponding source modifications from the
released version in the form of a patch file along with the binaries,
2. add special version identification to distinguish your version
in addition to the base release version number,
3. provide your name and address as the primary contact for the
support of your modified version, and
4. retain our contact information in regard to use of the base
software.
Permission to distribute the released version of the source code along
with corresponding source modifications in the form of a patch file is
granted with same provisions 2 through 4 for binary distributions.
This software is provided "as is" without express or implied warranty
to the extent permitted by applicable law.
AUTHORS
Original Software:
Thomas Williams, Colin Kelley.
Gnuplot 2.0 additions:
Russell Lang, Dave Kotz, John Campbell.
Gnuplot 3.0 additions:
Gershon Elber and many others.
Introduction
gnuplot is a command-driven interactive function and data plotting program.
It is case sensitive (commands and function names written in lowercase are
not the same as those written in CAPS). All command names may be abbreviated
as long as the abbreviation is not ambiguous. Any number of commands may
appear on a line (with the exception that load or call must be the final
command), separated by semicolons (;). Strings are indicated with quotes.
They may be either single or double quotation marks, e.g.,
load "filename"
cd 'dir'
although there are some subtle differences (see syntax for more details).
Any command-line arguments are assumed to be names of files containing
gnuplot commands, with the exception of standard X11 arguments, which are
processed first. Each file is loaded with the load command, in the order
specified. gnuplot exits after the last file is processed. When no load
files are named, gnuplot enters into an interactive mode. The special
filename "-" is used to denote standard input. See "help batch/interactive"
for more details.
Many gnuplot commands have multiple options. These options must appear in
the proper order, although unwanted ones may be omitted in most cases. Thus
if the entire command is "command a b c", then "command a c" will probably
work, but "command c a" will fail.
Commands may extend over several input lines by ending each line but the last
with a backslash (\). The backslash must be the _last_ character on each
line. The effect is as if the backslash and newline were not there. That
is, no white space is implied, nor is a comment terminated. Therefore,
commenting out a continued line comments out the entire command (see
comment). But note that if an error occurs somewhere on a multi-line
command, the parser may not be able to locate precisely where the error is
and in that case will not necessarily point to the correct line.
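The joining rule can be illustrated with a short Python sketch. This is not gnuplot's parser; join_continued is a hypothetical helper that only mimics the behaviour described above, namely that a trailing backslash and the newline after it are dropped and nothing is inserted in their place.

```python
def join_continued(lines):
    """Join physical lines into logical commands (hypothetical helper)."""
    commands = []
    pending = ""
    for line in lines:
        if line.endswith("\\"):
            # Drop the backslash; the newline vanishes and no space is added.
            pending += line[:-1]
        else:
            commands.append(pending + line)
            pending = ""
    if pending:
        # A dangling continuation at the end of input still yields a command.
        commands.append(pending)
    return commands


script = [
    "plot sin(x) \\",
    "  title 'sine', \\",
    "  cos(x)",
]
print(join_continued(script))
```

Because no white space is implied, any separation between tokens has to come from the lines themselves. The same sketch shows why a comment line ending in a backslash swallows the line that follows it.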
In this document, curly braces ({}) denote optional arguments and a vertical
bar (|) separates mutually exclusive choices. gnuplot keywords or help
topics are indicated by backquotes or boldface (where available). Angle
brackets (<>) are used to mark replaceable tokens. In many cases, a default
value of the token will be taken for optional arguments if the token is
omitted, but these cases are not always denoted with braces around the angle
brackets.
For on-line help on any topic, type help followed by the name of the topic,
or just help or ? to get a menu of available topics.
The new gnuplot user should begin by reading about plotting (if on-line,
type help plotting).
Seeking-assistance
There is a mailing list for gnuplot users. Note, however, that the
newsgroup
comp.graphics.apps.gnuplot
is identical to the mailing list (they both carry the same set of messages).
We prefer that you read the messages through the newsgroup rather than
subscribing to the mailing list. Administrative requests should be sent to
majordomo@dartmouth.edu
Send a message with the body (not the subject) consisting of the single word
"help" (without the quotes) for more details.
The address for mailing to list members is:
info-gnuplot@dartmouth.edu
Bug reports and code contributions should be mailed to:
bug-gnuplot@dartmouth.edu
The list of those interested in beta-test versions is:
info-gnuplot-beta@dartmouth.edu
There is also a World Wide Web page with up-to-date information, including
known bugs:
http://www.cs.dartmouth.edu/gnuplot_info.html
Before seeking help, please check the
FAQ (Frequently Asked Questions) list.
If you do not have a copy of the FAQ, you may request a copy by email from
the Majordomo address above, ftp a copy from
ftp://ftp.dartmouth.edu/pub/gnuplot
or see the WWW gnuplot page.
When posting a question, please include full details of the version of
gnuplot, the machine, and operating system you are using. A _small_ script
demonstrating the problem may be useful. Function plots are preferable to
datafile plots. If emailing to info-gnuplot, please state whether or not
you are subscribed to the list, so that users who use news will know to email
a reply to you. There is a form for such postings on the WWW site.
What's New in version 3.7
Gnuplot version 3.7 contains many new features. This section gives a partial
list and links to the new items in no particular order.
1. fit f(x) 'file' via uses the Marquardt-Levenberg method to fit data.
(This is only slightly different from the gnufit patch available for 3.5.)
2. Greatly expanded using command. See plot using.
3. set timefmt allows for the use of dates as input and output for time
series plots. See Time/Date data and
timedat.dem.
4. Multiline labels and font selection in some drivers.
5. Minor (unlabeled) tics. See set mxtics.
6. key options for moving the key box in the page (and even outside of the
plot), putting a title on it and a box around it, and more. See set key.
7. Multiplots on a single logical page with set multiplot.
8. Enhanced postscript driver with super/subscripts and font changes.
(This was a separate driver (enhpost) that was available as a patch for
3.5.)
9. Second axes: use the top and right axes independently of the bottom and
left, both for plotting and labels. See plot.
10. Special datafile names '-' and "". See plot special-filenames.
11. Additional coordinate systems for labels and arrows. See coordinates.
12. set size can try to plot with a specified aspect ratio.
13. set missing now treats missing data correctly.
14. The call command: load with arguments.
15. More flexible range commands with reverse and writeback keywords.
16. set encoding for multi-lingual encoding.
17. New x11 driver with persistent and multiple windows.
18. New plotting styles: xerrorbars, histeps, financebars and more.
See set style.
19. New tic label formats, including "%l %L" which uses the mantissa and
exponents to a given base for labels. See set format.
20. New drivers, including cgm for inclusion into MS-Office applications
and gif for serving plots to the Web.
21. Smoothing and spline-fitting options for plot. See plot smooth.
22. set margin and set origin give much better control over where a
graph appears on the page.
23. set border now controls each border individually.
24. The new commands if and reread allow command loops.
25. Point styles and sizes, line types and widths can be specified on the
plot command. Line types and widths can also be specified for grids,
borders, tics and arrows. See plot with. Furthermore these types may be
combined and stored for further use. See set linestyle.
26. Text (labels, tic labels, and the time stamp) can be written vertically
by those terminals capable of doing so.
Batch/Interactive Operation
gnuplot may be executed in either batch or interactive mode, and the two
may even be mixed together on many systems.
Any command-line arguments are assumed to be names of files containing
gnuplot commands (with the exception of standard X11 arguments, which are
processed first). Each file is loaded with the load command, in the order
specified. gnuplot exits after the last file is processed. When no load
files are named, gnuplot enters into an interactive mode. The special
filename "-" is used to denote standard input.
Both the exit and quit commands terminate the current command file and
load the next one, until all have been processed.
Examples:
To launch an interactive session:
gnuplot
To launch a batch session using two command files "input1" and "input2":
gnuplot input1 input2
To launch an interactive session after an initialization file "header" and
followed by another command file "trailer":
gnuplot header - trailer
Command-line-editing
Command-line editing is supported by the Unix, Atari, VMS, MS-DOS and OS/2
versions of gnuplot. Also, a history mechanism allows previous commands to
be edited and re-executed. After the command line has been edited, a newline
or carriage return will enter the entire line without regard to where the
cursor is positioned.
(The readline function in gnuplot is not the same as the readline used in
GNU Bash and GNU Emacs. If the GNU version is desired, it may be selected
instead of the gnuplot version at compile time.)
The editing commands are as follows:
Line-editing:
^B moves back a single character.
^F moves forward a single character.
^A moves to the beginning of the line.
^E moves to the end of the line.
^H and DEL delete the previous character.
^D deletes the current character.
^K deletes from current position to the end of line.
^L,^R redraws line in case it gets trashed.
^U deletes the entire line.
^W deletes the last word.
History:
^P moves back through history.
^N moves forward through history.
On the IBM PC, the use of a TSR program such as DOSEDIT or CED may be desired
for line editing. The default makefile assumes that this is the case; by
default gnuplot will be compiled with no line-editing capability. If you
want to use gnuplot's line editing, set READLINE in the makefile and add
readline.obj to the link file. The following arrow keys may be used on the
IBM PC and Atari versions if readline is used:
Left Arrow - same as ^B.
Right Arrow - same as ^F.
Ctrl Left Arrow - same as ^A.
Ctrl Right Arrow - same as ^E.
Up Arrow - same as ^P.
Down Arrow - same as ^N.
The Atari version of readline defines some additional key aliases:
Undo - same as ^L.
Home - same as ^A.
Ctrl Home - same as ^E.
Esc - same as ^U.
Help - help plus return.
Ctrl Help - help .
Coordinates
The commands set arrow, set key, and set label allow you to draw
something at an arbitrary position on the graph. This position is specified
by the syntax:
{<system>} <x>, {<system>} <y> {,{<system>} <z>}
Each <system> can be first, second, graph or screen.
first places the x, y, or z coordinate in the system defined by the left
and bottom axes; second places it in the system defined by the second axes
(top and right); graph specifies the area within the axes---0,0 is bottom
left and 1,1 is top right (for splot, 0,0,0 is bottom left of plotting area;
use negative z to get to the base---see set ticslevel); and screen
specifies the screen area (the entire area---not just the portion selected by
set size), with 0,0 at bottom left and 1,1 at top right.
If the coordinate system for x is not specified, first is used. If the
system for y is not specified, the one used for x is adopted.
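The defaulting rule for the x and y systems can be written out as a small Python sketch. The function name resolve_systems is hypothetical and the code is only a model of the rule stated above, not gnuplot's implementation; the z coordinate is left out because the text does not state its default.

```python
SYSTEMS = ("first", "second", "graph", "screen")


def resolve_systems(x_system=None, y_system=None):
    """Apply the documented defaults to the x and y coordinate systems."""
    if x_system is None:
        x_system = "first"      # x defaults to the first axes
    if y_system is None:
        y_system = x_system     # y adopts whatever x uses
    for system in (x_system, y_system):
        if system not in SYSTEMS:
            raise ValueError("unknown coordinate system: " + system)
    return x_system, y_system


print(resolve_systems())                  # ('first', 'first')
print(resolve_systems("graph"))           # ('graph', 'graph')
print(resolve_systems(None, "screen"))    # ('first', 'screen')
```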
If one (or more) axis is timeseries, the appropriate coordinate should
be given as a quoted time string according to the timefmt format string.
See set xdata and set timefmt. gnuplot will also accept an integer
expression, which will be interpreted as seconds from 1 January 2000.
Environment
A number of shell environment variables are understood by gnuplot. None of
these is required, but they may be useful.
If GNUTERM is defined, it is used as the name of the terminal type to be
used. This overrides any terminal type sensed by gnuplot on start-up, but
is itself overridden by the .gnuplot (or equivalent) start-up file (see
start-up) and, of course, by later explicit changes.
On Unix, AmigaOS, AtariTOS, MS-DOS and OS/2, GNUHELP may be defined to be the
pathname of the HELP file (gnuplot.gih).
On VMS, the logical name GNUPLOT$HELP should be defined as the name of the
help library for gnuplot. The gnuplot help can be put inside any system
help library, allowing access to help from both within and outside gnuplot
if desired.
On Unix, HOME is used as the name of a directory to search for a .gnuplot
file if none is found in the current directory. On AmigaOS, AtariTOS,
MS-DOS and OS/2, gnuplot is used. On VMS, SYS$LOGIN: is used. See help
start-up.
On Unix, PAGER is used as an output filter for help messages.
On Unix, AtariTOS and AmigaOS, SHELL is used for the shell command. On
MS-DOS and OS/2, COMSPEC is used for the shell command.
On MS-DOS, if the BGI or Watcom interface is used, PCTRM is used to tell
the maximum resolution supported by your monitor by setting it to
S<max. horizontal resolution>. E.g. if your monitor's maximum resolution is
800x600, then use:
set PCTRM=S800
If PCTRM is not set, standard VGA is used.
FIT_SCRIPT may be used to specify a gnuplot command to be executed when a
fit is interrupted---see fit. FIT_LOG specifies the filename of the
logfile maintained by fit.
Expressions
In general, any mathematical expression accepted by C, FORTRAN, Pascal, or
BASIC is valid. Operator precedence follows the specifications of the C
programming language. White space (spaces and tabs) is ignored inside
expressions.
Complex constants are expressed as {<real>,<imag>}, where <real> and <imag>
must be numerical constants. For example, {3,2} represents 3 + 2i; {0,1}
represents 'i' itself. The curly braces are explicitly required here.
Note that gnuplot uses both "real" and "integer" arithmetic, like FORTRAN and
C. Integers are entered as "1", "-10", etc.; reals as "1.0", "-10.0", "1e1",
"3.5e-1", etc. The most important difference between the two forms is in
division. Division of integers truncates: 5/2 = 2; division of reals does
not: 5.0/2.0 = 2.5. In mixed expressions, integers are "promoted" to reals
before evaluation: 5/2e0 = 2.5. The result of division of a negative integer
by a positive one may vary among compilers. Try a test like "print -5/2" to
determine if your system chooses -2 or -3 as the answer.
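The two possible answers for a negative integer quotient correspond to two rounding conventions. The Python sketch below contrasts them; it is an illustration only, and the function names are hypothetical. Python's own // operator rounds toward negative infinity, so truncation toward zero is modelled with math.trunc.

```python
import math


def divide_truncating(a, b):
    """Integer division rounding toward zero (gives -2 for -5/2)."""
    return math.trunc(a / b)


def divide_flooring(a, b):
    """Integer division rounding toward negative infinity (gives -3 for -5/2)."""
    return a // b


print(divide_truncating(5, 2), divide_flooring(5, 2))      # 2 2
print(divide_truncating(-5, 2), divide_flooring(-5, 2))    # -2 -3
print(5 / 2.0)                                             # 2.5, real division
```

For positive operands the two conventions agree, which is why the difference only shows up in a test such as "print -5/2".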
The integer expression "1/0" may be used to generate an "undefined" flag,
which causes a point to be ignored; the ternary operator gives an example.
The real and imaginary parts of complex expressions are always real, whatever
the form in which they are entered: in {3,2} the "3" and "2" are reals, not
integers.
Functions
Operators
User-defined
Ternary
There is a single ternary operator:
Symbol   Example   Explanation
?:       a?b:c     ternary operation
The ternary operator behaves as it does in C. The first argument (a), which
must be an integer, is evaluated. If it is true (non-zero), the second
argument (b) is evaluated and returned; otherwise the third argument (c) is
evaluated and returned.
The ternary operator is very useful both in constructing piecewise functions
and in plotting points only when certain conditions are met.
Examples:
Plot a function that is to equal sin(x) for 0 <= x < 1, 1/x for 1 <= x < 2,
and undefined elsewhere:
f(x) = 0<=x && x<1 ? sin(x) : 1<=x && x<2 ? 1/x : 1/0
plot f(x)
Note that gnuplot quietly ignores undefined values, so the final branch of
the function (1/0) will produce no plottable points. Note also that f(x)
will be plotted as a continuous function across the discontinuity if a line
style is used. To plot it discontinuously, create separate functions for the
two pieces. (Parametric functions are also useful for this purpose.)
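For readers more familiar with if/else chains, the same piecewise function can be sketched in Python. This is an analogy rather than gnuplot code: None stands in for the undefined value produced by 1/0, and the list comprehension plays the role of gnuplot skipping undefined points.

```python
import math


def f(x):
    """Piecewise function from the example; None means 'undefined'."""
    if 0 <= x < 1:
        return math.sin(x)
    if 1 <= x < 2:
        return 1 / x
    return None  # plays the role of 1/0: the point is skipped


samples = [-0.5, 0.5, 1.5, 2.5]
plottable = [(x, f(x)) for x in samples if f(x) is not None]
print(plottable)
```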
For data in a file, plot the average of the data in columns 2 and 3 against
the datum in column 1, but only if the datum in column 4 is non-negative:
plot 'file' using 1:( $4<0 ? 1/0 : ($2+$3)/2 )
Please see plot data-file using for an explanation of the using syntax.
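The effect of that using expression on each record can be modelled in Python as follows. The sample rows and the name select_points are made up for illustration; the sketch assumes every record has four numeric columns.

```python
def select_points(rows):
    """Return (x, y) pairs, skipping rows whose fourth column is negative."""
    points = []
    for c1, c2, c3, c4 in rows:
        if c4 < 0:
            continue  # corresponds to the 1/0 branch: the point is ignored
        points.append((c1, (c2 + c3) / 2))
    return points


rows = [
    (1.0, 2.0, 4.0, 1.0),
    (2.0, 3.0, 5.0, -1.0),   # column 4 is negative, so this row is skipped
    (3.0, 6.0, 8.0, 0.0),
]
print(select_points(rows))   # [(1.0, 3.0), (3.0, 7.0)]
```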
User-defined
New user-defined variables and functions of one through five variables may
be declared and used anywhere, including on the plot command itself.
User-defined function syntax:
<func-name>( <dummy1> {,<dummy2>} ... {,<dummy5>} ) = <expression>
where <expression> is defined in terms of <dummy1> through <dummy5>.
User-defined variable syntax:
<variable-name> = <constant-expression>
Examples:
w = 2
q = floor(tan(pi/2 - 0.1))
f(x) = sin(w*x)
sinc(x) = sin(pi*x)/(pi*x)
delta(t) = (t == 0)
ramp(t) = (t > 0) ? t : 0
min(a,b) = (a < b) ? a : b
comb(n,k) = n!/(k!*(n-k)!)
len3d(x,y,z) = sqrt(x*x+y*y+z*z)
plot f(x) = sin(x*a), a = 0.2, f(x), a = 0.4, f(x)
Note that the variable pi is already defined. But it is in no way magic;
you may redefine it to be whatever you like.
Valid names are the same as in most programming languages: they must begin
with a letter, but subsequent characters may be letters, digits, "$", or "_".
Note, however, that the fit mechanism uses several variables with names
that begin "FIT_". It is safest to avoid using such names. "FIT_LIMIT",
however, is one that you may wish to redefine. See the documentation
on fit for details.
See show functions, show variables, and fit.
Glossary
Throughout this document an attempt has been made to maintain consistency of
nomenclature. This cannot be wholly successful because as gnuplot has
evolved over time, certain command and keyword names have been adopted that
preclude such perfection. This section contains explanations of the way
some of these terms are used.
A "page" or "screen" is the entire area addressable by gnuplot. On a
monitor, it is the full screen; on a plotter, it is a single sheet of paper.
A screen may contain one or more "plots". A plot is defined by an abscissa
and an ordinate, although these need not actually appear on it, as well as
the margins and any text written therein.
A plot contains one "graph". A graph is defined by an abscissa and an
ordinate, although these need not actually appear on it.
A graph may contain one or more "lines". A line is a single function or
data set. "Line" is also a plotting style. The word will also be used in
the sense "a line of text". Presumably the context will remove any ambiguity.
The lines on a graph may have individual names. These may be listed
together with a sample of the plotting style used to represent them in
the "key", sometimes also called the "legend".
The word "title" occurs with multiple meanings in gnuplot. In this
document, it will always be preceded by the adjective "plot", "line", or
"key" to differentiate among them.
A graph may have up to four labelled axes. Various commands have the name of
an axis built into their names, such as set xlabel. Other commands have
one or more axis names as options, such as set logscale xy. The names of
the four axes for these usages are "x" for the axis along the bottom border
of the plot, "y" for the left border, "x2" for the top border, and "y2" for
the right border. "z" also occurs in commands used with 3-d plotting.
When discussing data files, the term "record" will be resurrected and used
to denote a single line of text in the file, that is, the characters between
newline or end-of-record characters. A "point" is the datum extracted from
a single record. A "datablock" is a set of points from consecutive records,
delimited by blank records. A line, when referred to in the context of a
data file, is a subset of a datablock.
Plotting
There are three gnuplot commands which actually create a plot: plot,
splot and replot. plot generates 2-d plots, splot generates 3-d
plots (actually 2-d projections, of course), and replot appends its
arguments to the previous plot or splot and executes the modified
command.
Much of the general information about plotting can be found in the discussion
of plot; information specific to 3-d can be found in the splot section.
plot operates in either rectangular or polar coordinates -- see set polar
for details of the latter. splot operates only in rectangular coordinates,
but the set mapping command allows for a few other coordinate systems to be
treated. In addition, the using option allows both plot and splot to
treat almost any coordinate system you'd care to define.
splot can plot surfaces and contours in addition to points and/or lines.
In addition to splot, see set isosamples for information about defining
the grid for a 3-d function; splot datafile for information about the
requisite file structure for 3-d data values; and set contour and set
cntrparam for information about contours.
Syntax
The general rules of syntax and punctuation in gnuplot are that keywords
and options are order-dependent. Options and any accompanying parameters are
separated by spaces whereas lists and coordinates are separated by commas.
Ranges are separated by colons and enclosed in brackets [], text and file
names are enclosed in quotes, and a few miscellaneous things are enclosed
in parentheses. Braces {} are used for a few special purposes.
Commas are used to separate coordinates on the set commands arrow,
key, and label; the list of variables being fitted (the list after the
via keyword on the fit command); lists of discrete contours or the loop
parameters which specify them on the set cntrparam command; the arguments
of the set commands dgrid3d, dummy, isosamples, offsets, origin,
samples, size, time, and view; lists of tics or the loop parameters
which specify them; the offsets for titles and axis labels; parametric
functions to be used to calculate the x, y, and z coordinates on the plot,
replot and splot commands; and the complete sets of keywords specifying
individual plots (data sets or functions) on the plot, replot and splot
commands.
Parentheses are used to delimit sets of explicit tics (as opposed to loop
parameters) and to indicate computations in the using filter of the fit,
plot, replot and splot commands.
(Parentheses and commas are also used as usual in function notation.)
Brackets are used to delimit ranges, whether they are given on set, plot
or splot commands.
Colons are used to separate extrema in range specifications (whether they
are given on set, plot or splot commands) and to separate entries in
the using filter of the plot, replot, splot and fit commands.
Semicolons are used to separate commands given on a single command line.
Braces are used in text to be specially processed by some terminals, like
postscript. They are also used to denote complex numbers: {3,2} = 3 + 2i.
Text may be enclosed in single- or double-quotes. Backslash processing of
sequences like \n (newline) and \345 (octal character code) is performed for
double-quoted strings, but not for single-quoted strings.
The justification is the same for each line of a multi-line string. Thus the
center-justified string
"This is the first line of text.\nThis is the second line."
will produce
This is the first line of text.
This is the second line.
but
'This is the first line of text.\nThis is the second line.'
will produce
This is the first line of text.\nThis is the second line.
Filenames may be entered with either single- or double-quotes. In this
manual the command examples generally single-quote filenames and double-quote
other string tokens for clarity.
At present you should not embed \n inside {} when using the enhanced option
of the postscript terminal.
The EEPIC, Imagen, Uniplex, LaTeX, and TPIC drivers allow a newline to be
specified by \\ in a single-quoted string or \\\\ in a double-quoted string.
Back-quotes are used to enclose system commands for substitution.
Time/Date data
gnuplot supports the use of time and/or date information as input data.
This feature is activated by the commands set xdata time, set ydata time,
etc.
Internally all times and dates are converted to the number of seconds from
the year 2000. The command set timefmt defines the format for all inputs:
data files, ranges, tics, label positions---in short, anything that accepts a
data value must receive it in this format. Since only one input format can
be in force at a given time, all time/date quantities being input at the same
time must be presented in the same format. Thus if both x and y data in a
file are time/date, they must be in the same format.
The conversion to and from seconds assumes Universal Time (which is the same
as Greenwich Standard Time). There is no provision for changing the time
zone or for daylight savings. If all your data refer to the same time zone
(and are all either daylight or standard) you don't need to worry about these
things. But if the absolute time is crucial for your application, you'll
need to convert to UT yourself.
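The internal representation can be illustrated with a Python sketch that converts a time/date string to seconds from 1 January 2000, treating the input as Universal Time. This is a model only: seconds_from_2000 is a hypothetical name, and Python's strptime format codes resemble timefmt specifiers but are not guaranteed to match them in every case.

```python
from datetime import datetime, timezone

# Reference instant: 1 January 2000, 00:00 Universal Time.
EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def seconds_from_2000(text, fmt="%m/%d/%y %H:%M"):
    """Parse text with a strptime-style format; return seconds from 2000 (UT)."""
    moment = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int((moment - EPOCH).total_seconds())


print(seconds_from_2000("01/01/00 00:00"))   # 0
print(seconds_from_2000("01/02/00 00:00"))   # 86400
print(seconds_from_2000("03/21/95 10:00"))   # negative: before the year 2000
```

No time-zone or daylight adjustment is applied anywhere in the sketch, which mirrors the limitation described above.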
Commands like show xrange will re-interpret the integer according to
timefmt. If you change timefmt, and then show the quantity again, it
will be displayed in the new timefmt. For that matter, if you give the
deactivation command (like set xdata), the quantity will be shown in its
numerical form.
The command set format defines the format that will be used for tic labels,
whether or not the specified axis is time/date.
If time/date information is to be plotted from a file, the using option
_must_ be used on the plot or splot command. These commands simply use
white space to separate columns, but white space may be embedded within the
time/date string. If you use tabs as a separator, some trial-and-error may
be necessary to discover how your system treats them.
The following example demonstrates time/date plotting.
Suppose the file "data" contains records like
03/21/95 10:00 6.02e23
This file can be plotted by
set xdata time
set timefmt "%m/%d/%y"
set xrange ["03/21/95":"03/22/95"]
set format x "%m/%d"
set timefmt "%m/%d/%y %H:%M"
plot "data" using 1:3
which will produce xtic labels that look like "03/21".
See the descriptions of each command for more details.
call
The call command is identical to the load command with one exception: you
can have up to ten additional parameters to the command (delimited according
to the standard parser rules) which can be substituted into the lines read
from the file. As each line is read from the called input file, it is
scanned for the sequence $ (dollar-sign) followed by a digit (0-9). If
found, the sequence is replaced by the corresponding parameter from the
call command line. If the parameter was specified as a string in the
call line, it is substituted without its enclosing quotes. $ followed by
any character other than a digit will be that character. E.g. use $$ to
get a single $. Providing more than ten parameters on the call command
line will cause an error. A parameter that was not provided substitutes as
nothing. Files being called may themselves contain call or load
commands.
The call command _must_ be the last command on a multi-command line.
Syntax:
call "<input-file>" <parameter-0> <parm-1> ... <parm-9>
The name of the input file must be enclosed in quotes, and it is recommended
that parameters are similarly enclosed in quotes (future versions of gnuplot
may treat quoted and unquoted arguments differently).
Example:
If the file 'calltest.gp' contains the line:
print "p0=$0 p1=$1 p2=$2 p3=$3 p4=$4 p5=$5 p6=$6 p7=x$7x"
entering the command:
call 'calltest.gp' "abcd" 1.2 + "'quoted'" -- "$2"
will display:
p0=abcd p1=1.2 p2=+ p3='quoted' p4=- p5=- p6=$2 p7=xx
NOTE: there is a clash in syntax with the datafile using callback
operator. Use $$n or column(n) to access column n from a datafile inside
a called datafile plot.
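The substitution rules for call can be modelled by a single left-to-right pass over each line, sketched here in Python. This is not gnuplot's source; substitute is a hypothetical function, and it assumes the parameters have already been split by the parser and stripped of their enclosing quotes.

```python
def substitute(line, params):
    """Replace $0..$9 in line with the matching entry of params."""
    if len(params) > 10:
        raise ValueError("at most ten parameters may be supplied")
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "$" and i + 1 < len(line):
            nxt = line[i + 1]
            if nxt in "0123456789":
                index = int(nxt)
                # A parameter that was not provided substitutes as nothing.
                out.append(params[index] if index < len(params) else "")
            else:
                # $ followed by any other character yields that character,
                # so $$ produces a single $.
                out.append(nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


template = 'print "p0=$0 p1=$1 p2=$2 p3=$3 p4=$4 p5=$5 p6=$6 p7=x$7x"'
params = ["abcd", "1.2", "+", "'quoted'", "-", "-", "$2"]
print(substitute(template, params))
```

Substituted text is not scanned again, which is why the seventh parameter appears literally as $2 in the calltest.gp output shown above.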
fit
The fit command can fit a user-defined function to a set of data points
(x,y) or (x,y,z), using an implementation of the nonlinear least-squares
(NLLS) Marquardt-Levenberg algorithm. Any user-defined variable occurring in
the function body may serve as a fit parameter, but the return type of the
function must be real.
Syntax:
fit {[xrange] {[yrange]}} <function> '<datafile>'
{datafile-modifiers}
via '<parameter file>' | <var1>{,<var2>,...}
Ranges may be specified to temporarily limit the data which is to be fitted;
any out-of-range data points are ignored. The syntax is
[{dummy_variable=}{<min>}{:<max>}],
analogous to plot; see plot ranges.
<function> is any valid gnuplot expression, although it is usual to use a
previously user-defined function of the form f(x) or f(x,y).
<datafile> is treated as in the plot command. All the plot datafile
modifiers (using, every,...) except smooth are applicable to fit.
See plot datafile.
The default data formats for fitting functions with a single independent
variable, y=f(x), are {x:}y or x:y:s; those formats can be changed with
the datafile using qualifier. The third item (a column number or an
expression), if present, is interpreted as the standard deviation of the
corresponding y value and is used to compute a weight for the datum, 1/s**2.
Otherwise, all data points are weighted equally, with a weight of one.
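To make the weighting concrete, the Python sketch below computes the weighted sum of squared residuals that a least-squares fit tries to minimize. It is an illustration under stated assumptions, not gnuplot's fitting code: the name wssr, the straight-line model, and the sample data are all made up, and each point's weight is 1/s**2 as described above.

```python
def wssr(f, xs, ys, sigmas=None):
    """Weighted sum of squared residuals with weights 1/s**2."""
    if sigmas is None:
        # No error estimates: every point gets a weight of one.
        sigmas = [1.0] * len(xs)
    return sum(((y - f(x)) / s) ** 2 for x, y, s in zip(xs, ys, sigmas))


def model(x):
    """Hypothetical fitting function with fixed parameter values."""
    return 2 * x + 1


xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.5, 4.0]

print(wssr(model, xs, ys))                     # 1.25
print(wssr(model, xs, ys, [1.0, 0.5, 1.0]))    # 2.0
```

Halving the standard deviation of the second point quadruples its contribution, so a fit is pulled harder toward points with small error estimates.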
To fit a function with two independent variables, z=f(x,y), the required
format is using with four items, x:y:z:s. The complete format must be
given---no default columns are assumed for a missing token. Weights for
each data point are evaluated from 's' as above. If error estimates are
not available, a constant value can be specified as a constant expression
(see plot datafile using), e.g., using 1:2:3:(1).
Multiple datasets may be simultaneously fit with functions of one
independent variable by making y a 'pseudo-variable', e.g., the dataline
number, and fitting as two independent variables. See fit multibranch.
The via qualifier specifies which parameters are to be adjusted, either
directly, or by referencing a parameter file.
Examples:
f(x) = a*x**2 + b*x + c
g(x,y) = a*x**2 + b*y**2 + c*x*y
FIT_LIMIT = 1e-6
fit f(x) 'measured.dat' via 'start.par'
fit f(x) 'measured.dat' using 3:($7-5) via 'start.par'
fit f(x) './data/trash.dat' using 1:2:3 via a, b, c
fit g(x,y) 'surface.dat' using 1:2:3:(1) via a, b, c
After each iteration step, detailed information about the current state
of the fit is written to the display. The same information about the
initial and final states is written to a log file, "fit.log". This file
is always appended to, so as to not lose any previous fit history; it
should be deleted or renamed as desired.
The fit may be interrupted by pressing Ctrl-C (any key but Ctrl-C under
MSDOS and Atari Multitasking Systems). After the current iteration
completes, you have the option to (1) stop the fit and accept the current
parameter values, (2) continue the fit, (3) execute a gnuplot command
as specified by the environment variable FIT_SCRIPT. The default for
FIT_SCRIPT is replot, so if you had previously plotted both the data
and the fitting function in one graph, you can display the current state
of the fit.
Once fit has finished, the update command may be used to store final
values in a file for subsequent use as a parameter file. See update
for details.
adjustable parameters
beginner's guide
error estimates
fit controlling
multi-branch
starting values
tips
adjustable parameters
There are two ways that via can specify the parameters to be adjusted,
either directly on the command line or indirectly, by referencing a
parameter file. The two use different means to set initial values.
Adjustable parameters can be specified by a comma-separated list of variable
names after the via keyword. Any variable that is not already defined is
created with an initial value of 1.0. However, the fit is more likely
to converge rapidly if the variables have been previously declared with more
appropriate starting values.
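For instance, assuming a data file named 'measured.dat' (a placeholder) and
starting values guessed from a plot of the data, the direct form might look
like this:

```gnuplot
f(x) = a*x**2 + b*x + c
# Declare starting values first; any parameter left undefined starts at 1.0
a = 0.5; b = -2.0; c = 10.0
fit f(x) 'measured.dat' via a, b, c
```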
In a parameter file, each parameter to be varied and a corresponding initial
value are specified, one per line, in the form
varname = value
Comments, marked by '#', and blank lines are permissible. The
special form
varname = value # FIXED
means that the variable is treated as a 'fixed parameter', initialized by the
parameter file, but not adjusted by fit. For clarity, it may be useful to
designate variables as fixed parameters so that their values are reported by
fit. The keyword # FIXED has to appear in exactly this form.
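A parameter file following these rules could look like the sketch below.
The file name 'start.par' and all values are illustrative assumptions.

```gnuplot
# start.par: starting values for f(x) = a*x**2 + b*x + c

a = 0.5
b = -2.0
c = 10.0 # FIXED
```

Here a and b would be adjusted by fit, while c would be initialized to 10.0
and held constant.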
beginner's guide
fit is used to find a set of parameters that 'best' fits your data to your
user-defined function. The fit is judged on the basis of the sum of the
squared differences or 'residuals' (SSR) between the input data points and
the function values, evaluated at the same places. This quantity is often
called 'chisquare' (i.e., the Greek letter chi, to the power of 2). The
algorithm attempts to minimize SSR, or more precisely, WSSR, as the residuals
are 'weighted' by the input data errors (or 1.0) before being squared; see
fit error_estimates for details.
That's why it is called 'least-squares fitting'. Let's look at an example
to see what is meant by 'non-linear', but first we had better go over some
terms. Here it is convenient to use z as the dependent variable for
user-defined functions of either one independent variable, z=f(x), or two
independent variables, z=f(x,y). A parameter is a user-defined variable
that fit will adjust, i.e., an unknown quantity in the function
declaration. Linearity/non-linearity refers to the relationship of the
dependent variable, z, to the parameters which fit is adjusting, not of
z to the independent variables, x and/or y. (To be technical, the
second {and higher} derivatives of the fitting function with respect to
the parameters are zero for a linear least-squares problem).
For linear least-squares (LLS), the user-defined function will be a sum of
simple functions, not involving any parameters, each multiplied by one
parameter. NLLS handles more complicated functions in which parameters can
be used in a large number of ways. An example that illustrates the
difference between linear and nonlinear least-squares is the Fourier series.
One member may be written as
z=a*sin(c*x) + b*cos(c*x).
If a and b are the unknown parameters and c is constant, then estimating
values of the parameters is a linear least-squares problem. However, if
c is an unknown parameter, the problem is nonlinear.
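The distinction can be sketched with two fit commands on the same function.
The data file 'signal.dat' and the starting values are assumptions made only
for illustration.

```gnuplot
f(x) = a*sin(c*x) + b*cos(c*x)
a = 1.0; b = 1.0; c = 2.0
# c stays at its current value: the problem is linear in a and b
fit f(x) 'signal.dat' via a, b
# c is adjusted as well: the problem is now nonlinear
fit f(x) 'signal.dat' via a, b, c
```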
In the linear case, parameter values can be determined by comparatively
simple linear algebra, in one direct step. However LLS is a special case
which is also solved along with more general NLLS problems by the iterative
procedure that gnuplot uses. fit attempts to find the minimum by doing
a search. Each step (iteration) calculates WSSR with a new set of parameter
values. The Marquardt-Levenberg algorithm selects the parameter values for
the next iteration. The process continues until a preset criterion is met,
either (1) the fit has "converged" (the relative change in WSSR is less than
FIT_LIMIT), or (2) it reaches a preset iteration count limit, FIT_MAXITER
(see fit control variables). The fit may also be interrupted
and subsequently halted from the keyboard (see fit).
Often the function to be fitted will be based on a model (or theory) that
attempts to describe or predict the behaviour of the data. Then fit can
be used to find values for the free parameters of the model, to determine
how well the data fits the model, and to estimate an error range for each
parameter. See fit error_estimates.
Alternatively, in curve-fitting, functions are selected independent of
a model (on the basis of experience as to which are likely to describe
the trend of the data with the desired resolution and a minimum number
of parameters and functions). The fit solution then provides an analytic
representation of the curve.
However, if all you really want is a smooth curve through your data points,
the smooth option to plot may be what you've been looking for rather
than fit.
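A sketch of that alternative, assuming a two-column file 'measured.dat'
(a placeholder name), draws the raw points together with a cubic-spline
curve through them:

```gnuplot
plot 'measured.dat' with points, \
     'measured.dat' smooth csplines with lines
```

No parameters are estimated here; the curve is for display only.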
statistical overview
The theory of non-linear least-squares (NLLS) is generally described in terms
of a normal distribution of errors, that is, the input data is assumed to be
a sample from a population having a given mean and a Gaussian (normal)
distribution about the mean with a given standard deviation. For a sample of
sufficiently large size, and knowing the population standard deviation, one
can use the statistics of the chisquare distribution to describe a "goodness
of fit" by looking at the variable often called "chisquare". Here, it is
sufficient to say that a reduced chisquare (chisquare/degrees of freedom,
where degrees of freedom is the number of datapoints less the number of
parameters being fitted) of 1.0 is an indication that the weighted sum of
squared deviations between the fitted function and the data points is the
same as that expected for a random sample from a population characterized by
the function with the current value of the parameters and the given standard
deviations.
If the standard deviation for the population is not constant, as in counting
statistics where variance = counts, then each point should be individually
weighted when comparing the observed sum of deviations and the expected sum
of deviations.
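For counting data, where variance = counts, the standard deviation of each
point can be computed in the using list. The sketch below assumes a
hypothetical file 'counts.dat' with x in column 1 and counts in column 2,
and an exponential-plus-background model chosen only for illustration.
Points with zero counts would get a zero standard deviation and need
separate treatment.

```gnuplot
f(x) = a*exp(-x/tau) + bg
a = 1000.0; tau = 5.0; bg = 10.0
# Third using entry: standard deviation s = sqrt(counts) for each point
fit f(x) 'counts.dat' using 1:2:(sqrt($2)) via a, tau, bg
```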
At the conclusion fit reports 'stdfit', the standard deviation of the fit,
which is the rms of the residuals, and the variance of the residuals, also
called 'reduced chisquare' when the data points are weighted. The number of
degrees of freedom (the number of data points minus the number of fitted
parameters) is used in these estimates because the parameters used in
calculating the residuals of the datapoints were obtained from the same data.
To estimate confidence levels for the parameters, one can use the minimum
chisquare obtained from the fit and chisquare statistics to determine the
value of chisquare corresponding to the desired confidence level, but
considerably more calculation is required to determine the combinations of
parameters which produce such values.
Rather than determine confidence intervals, fit reports parameter error
estimates which are readily obtained from the variance-covariance matrix
after the final iteration. By convention, these estimates are called
"standard errors" or "asymptotic standard errors", since they are calculated
in the same way as the standard errors (standard deviation of each parameter)
of a linear least-squares problem, even though the statistical conditions for
designating the quantity calculated to be a standard deviation are not
generally valid for the NLLS problem. The asymptotic standard errors are
generally over-optimistic and should not be used for determining confidence
levels, but are useful for qualitative purposes.
The final solution also produces a correlation matrix, which gives an
indication of the correlation of parameters in the region of the solution;
if one parameter is changed, increasing chisquare, does changing another
compensate? The main diagonal elements, autocorrelation, are all 1; if
all parameters were independent, all other elements would be nearly 0. Two
variables which completely compensate each other would have an off-diagonal
element of unit magnitude, with a sign depending on whether the relation is
proportional or inversely proportional. The smaller the magnitudes of the
off-diagonal elements, the closer the estimates of the standard deviation
of each parameter would be to the asymptotic standard error.
practical guidelines
If you have a basis for assigning weights to each data point, doing so lets
you make use of additional knowledge about your measurements, e.g., take into
account that some points may be more reliable than others. That may affect
the final values of the parameters.
Weighting the data provides a basis for interpreting the additional fit
output after the last iteration. Even if you weight each point equally,
estimating an average standard deviation rather than using a weight of 1
makes WSSR a dimensionless variable, as chisquare is by definition.
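As a sketch, an estimated average standard deviation can be supplied as a
constant expression in the third using entry. The value 0.05 and the file
name 'measured.dat' are assumptions for illustration.

```gnuplot
f(x) = a*x + b
sigma = 0.05      # estimated average standard deviation, in units of z
fit f(x) 'measured.dat' using 1:2:(sigma) via a, b
```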
Each fit iteration will display information which can be used to evaluate
the progress of the fit. (An '*' indicates that it did not find a smaller
WSSR and is trying again.) The 'sum of squares of residuals', also called
'chisquare', is the WSSR between the data and your fitted function; fit
has minimized that. At this stage, with weighted data, chisquare is expected
to approach the number of degrees of freedom (data points minus parameters).
The WSSR can be used to calculate the reduced chisquare (WSSR/ndf) or stdfit,
the standard deviation of the fit, sqrt(WSSR/ndf). Both of these are
reported for the final WSSR.
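The arithmetic can be repeated by hand from the numbers in the fit output.
The values below are made up for illustration; substitute the WSSR and the
number of data points and parameters from your own fit.

```gnuplot
wssr = 52.3            # hypothetical final WSSR read from the fit output
ndf = 50 - 3           # hypothetical: 50 data points, 3 fitted parameters
print wssr/ndf         # reduced chisquare
print sqrt(wssr/ndf)   # stdfit
```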
If the data are unweighted, stdfit is the rms value of the deviation of the
data from the fitted function, in user units.
If you supplied valid data errors, the number of data points is large enough,
and the model is correct, the reduced chisquare should be about unity. (For
details, look up the 'chi-squared distribution' in your favourite statistics
reference.) If so, there are additional tests, beyond the scope of this
overview, for determining how well the model fits the data.
A reduced chisquare much larger than 1.0 may be due to incorrect data error
estimates, data errors not normally distributed, systematic measurement
errors, 'outliers', or an incorrect model function. A plot of the residuals,
e.g., plot 'datafile' using 1:($2-f($1)), may help to show any systematic
trends. Plotting both the data points and the function may help to suggest
another model.
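A sketch of these diagnostic plots follows, assuming f(x) still holds the
parameter values left by a previous fit and that 'measured.dat' (a
placeholder) contains x, z, and the standard deviation in columns 1 to 3.

```gnuplot
# Data and fitted function on the same graph
plot 'measured.dat' using 1:2 with points, f(x) with lines
# Residuals, with a zero line for reference
plot 'measured.dat' using 1:($2-f($1)) with points, 0 with lines
# Residuals in units of the standard deviation of each point
plot 'measured.dat' using 1:(($2-f($1))/$3) with points, 0 with lines
```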
Similarly, a reduced chisquare less than 1.0 indicates WSSR is less than that
expected for a random sample from the function with normally distributed
errors. The data error estimates may be too large, the statistical
assumptions may not be justified, or the model function may be too general,
fitting fluctuations in a particular sample in addition to the underlying
trends. In the latter case, a simpler function may be more appropriate.
You'll have to get used to both fit and the kind of problems you apply it
to before you can relate the standard errors to some more practical estimates
of parameter uncertainties or evaluate the significance of the correlation
matrix.
Note that fit, in common with most NLLS implementations, minimizes the
weighted sum of squared distances (y-f(x))**2. It does not provide any means
to account for "errors" in the values of x, only in y. Also, any "outliers"
(data points outside the normal distribution of the model) will have an
exaggerated effect on the solution.
control variables
The default epsilon limit (1e-5) may be changed by declaring a value for
FIT_LIMIT
When the sum of squared residuals changes between two iteration steps by
a factor less than this number (epsilon), the fit is considered to have
'converged'.
The maximum number of iterations may be limited by declaring a value for
FIT_MAXITER
A value of 0 (or not defining it at all) means that there is no limit.
If you need even more control over the algorithm, and know the
Marquardt-Levenberg algorithm well, there are some more variables to
influence it. The startup value of lambda is normally calculated
automatically from the ML-matrix, but if you want to, you may provide
your own with
FIT_START_LAMBDA
Specifying FIT_START_LAMBDA as zero or less will re-enable the automatic
selection. The variable
FIT_LAMBDA_FACTOR
gives the factor by which lambda is increased or decreased whenever
the chi-squared target function increased or decreased significantly.
Setting FIT_LAMBDA_FACTOR to zero re-enables the default factor of
10.0.
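The control variables are ordinary gnuplot variables, so they are set by
assignment before calling fit. The values below are arbitrary examples
rather than recommendations, and 'measured.dat' is a placeholder.

```gnuplot
FIT_LIMIT = 1e-8          # tighter convergence criterion than the default
FIT_MAXITER = 50          # give up after 50 iterations
FIT_START_LAMBDA = 0.01   # user-supplied startup value of lambda
FIT_LAMBDA_FACTOR = 5.0   # step factor for lambda
f(x) = a*x**2 + b*x + c
fit f(x) 'measured.dat' via a, b, c
# Restore automatic lambda selection and the default factor
FIT_START_LAMBDA = 0
FIT_LAMBDA_FACTOR = 0
```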
Other variables with the FIT_ prefix may be added to fit, so it is safer
not to use that prefix for user-defined variables.
The variables FIT_SKIP and FIT_INDEX were used by earlier releases of
gnuplot with a 'fit' patch called gnufit and are no longer available.
The datafile every modifier provides the functionality of FIT_SKIP.
FIT_INDEX was used for multi-branch fitting, but multi-branch fitting of
one independent variable is now done as a pseudo-3D fit in which the
second independent variable and using are used to specify the branch.
See fit multi-branch.
multi-branch
In multi-branch fitting, multiple data sets can be simultaneously fit with
functions of one independent variable having common parameters by minimizing
the total WSSR. The function and parameters (branch) for each data set are
selected by using a 'pseudo-variable', e.g., either the dataline number (a
'column' index of -1) or the datafile index (-2), as the second independent
variable.
Example: Given two exponential decays of the form, z=f(x), each describing
a different data set but having a common decay time, estimate the values of
the parameters. If the datafile has the format x:z:s, then
f(x,y) = (y==0) ? a*exp(-x/tau) : b*exp(-x/tau)
fit f(x,y) 'datafile' using 1:-1:2:3 via a, b, tau
For a more complicated example, see the file "hexa.fnc" used by the
"fit.dem" demo.
Appropriate weighting may be required since unit weights may cause one
branch to predominate if there is a difference in the scale of the dependent
variable. Fitting each branch separately, using the multi-branch solution
as initial values, may give an indication as to the relative effect of each
branch on the joint solution.
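One way to carry out that check is sketched below. It assumes the two
branches of 'datafile' are separated by a single blank record, so that
every can select each one by its dataline number. The helper names f1, f2,
tau0, and tau1 are introduced only for this example.

```gnuplot
# Joint fit of both branches
f(x,y) = (y==0) ? a*exp(-x/tau) : b*exp(-x/tau)
fit f(x,y) 'datafile' using 1:-1:2:3 via a, b, tau
tau0 = tau                 # remember the joint solution

# First branch alone, starting from the joint solution
f1(x) = a*exp(-x/tau)
fit f1(x) 'datafile' every :::0::0 using 1:2:3 via a, tau
tau1 = tau

# Second branch alone, again starting from the joint solution
tau = tau0
f2(x) = b*exp(-x/tau)
fit f2(x) 'datafile' every :::1::1 using 1:2:3 via b, tau

print tau0
print tau1
print tau
```

Comparing the three decay times gives a rough idea of how strongly each
branch pulls on the common parameter.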
starting values
Nonlinear fitting is not guaranteed to converge to the global optimum (the
solution with the smallest sum of squared residuals, SSR), and can get stuck
at a local minimum. The routine has no way to determine that; it is up to
you to judge whether this has happened.
fit may, and often will, get "lost" if started far from a solution, where
SSR is large and changing slowly as the parameters are varied, or it may
reach a numerically unstable region (e.g., too large a number causing a
floating point overflow) which results in an "undefined value" message
or gnuplot halting.
To improve the chances of finding the global optimum, you should set the
starting values at least roughly in the vicinity of the solution, e.g.,
within an order of magnitude, if possible. The closer your starting values
are to the solution, the less chance of stopping at another minimum. One way
to find starting values is to plot data and the fitting function on the same
graph and change parameter values and replot until reasonable similarity
is reached. The same plot is also useful to check whether the fit stopped at
a minimum with a poor fit.
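The plot-and-adjust procedure might look like the following sketch, with a
hypothetical data file 'decay.dat' and guessed values:

```gnuplot
f(x) = a*exp(-x/tau)
a = 10.0; tau = 2.0
plot 'decay.dat', f(x)     # compare the first guess with the data
tau = 5.0; replot          # adjust a parameter and look again
a = 8.0; replot
# Once the curve roughly follows the data, start the fit from there
fit f(x) 'decay.dat' via a, tau
replot                     # check the result against the data
```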
Of course, a reasonably good fit is not proof there is not a "better" fit (in
either a statistical sense, characterized by an improved goodness-of-fit
criterion, or a physical sense, with a solution more consistent with the
model.) Depending on the problem, it may be desirable to fit with various
sets of starting values, covering a reasonable range for each parameter.
tips
Here are some tips to keep in mind to get the most out of fit. They're not
very organized, so you'll have to read them several times until their essence
has sunk in.
The two forms of the via argument to fit serve two largely distinct
purposes. The via "file" form is best used for (possibly unattended) batch
operation, where you just supply the startup values in a file and can later
use update to copy the results back into another (or the same) parameter
file.
The via var1, var2, ... form is best used interactively, where the command
history mechanism may be used to edit the list of parameters to be fitted or
to supply new startup values for the next try. This is particularly useful
for hard problems, where a direct fit to all parameters at once won't work
without good starting values. To find such, you can iterate several times,
fitting only some of the parameters, until the values are close enough to the
goal that the final fit to all parameters at once will work.
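Such a staged approach can be sketched as a sequence of fit commands in
which the parameter list grows. The function, the data file 'measured.dat',
and the order in which parameters are released are assumptions for
illustration; the best order depends on the problem.

```gnuplot
f(x) = a*exp(-x/tau) + c
a = 10.0; tau = 2.0; c = 0.0
fit f(x) 'measured.dat' via a            # amplitude only
fit f(x) 'measured.dat' via a, tau       # then the decay time as well
fit f(x) 'measured.dat' via a, tau, c    # finally all parameters at once
```

Each fit starts from the values left by the previous one.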
Make sure that there is no mutual dependency among parameters of the function
you are fitting. For example, don't try to fit a*exp(x+b), because
a*exp(x+b)=a*exp(b)*exp(x). Instead, fit either a*exp(x) or exp(x+b).
A technical issue: the parameters must not be too different in magnitude.
The larger the ratio of the largest to the smallest absolute parameter
value, the slower the fit will converge. If the ratio is close to or above
the inverse of the machine floating point precision, it may take next to
forever to converge, or refuse to converge at all. You will have to adapt
your function to avoid this, e.g., replace 'parameter' by '1e9*parameter' in
the function definition, and divide the starting value by 1e9.
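As a sketch of this rescaling, suppose a rate constant k is expected to be
of order 1e9 while the amplitude a is of order 1. The names g and ks and
the file 'fast_decay.dat' are hypothetical.

```gnuplot
# Unscaled form, with parameters of very different magnitude:
#   f(x) = a*exp(-k*x)
# Scaled form: ks is of order 1 and stands for k/1e9
g(x) = a*exp(-(1e9*ks)*x)
a = 1.0
ks = 3.0                   # starting value, corresponds to k = 3e9
fit g(x) 'fast_decay.dat' via a, ks
print 1e9*ks               # convert back to the original parameter
```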
If you can write your function as a linear combination of simple functions
weighted by the parameters to be fitted, by all means do so. That helps a
lot, because the problem is no longer nonlinear and should converge with only
a small number of iterations, perhaps just one.
Some prescriptions for analysing data, given in practical experimentation
courses, may have you first fit some functions to your data, perhaps in a
multi-step process of accounting for several aspects of the underlying
theory one by one, and then extract the information you really wanted from
the fitting parameters of those functions. With fit, this may often be
done in one step by writing the model function directly in terms of the
desired parameters. Transforming data can also quite often be avoided,
though sometimes at the cost of a more difficult fit problem. If you think
this contradicts the previous paragraph about simplifying the fit function,
you are correct.
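For example, if the quantity of interest is a half-life rather than a
decay constant, the model can be written in terms of the half-life
directly. This is only a sketch: the parameter name thalf, the data file
'decay.dat', and the starting values are assumptions.

```gnuplot
# log() is the natural logarithm in gnuplot
f(x) = a*exp(-log(2.0)*x/thalf)
a = 100.0; thalf = 5.0
fit f(x) 'decay.dat' using 1:2:3 via a, thalf
```

The standard error reported for thalf then applies to the desired quantity
without further conversion.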
A "singular matrix" message indicates that this implementation of the
Marquardt-Levenberg algorithm can't calculate parameter values for the next
iteration. Try different starting values, writing the function in another
form, or a simpler function.
Finally, a nice quote from the manual of another fitting package (fudgit),
that kind of summarizes all these issues: "Nonlinear fitting is an art!"
if
The if command allows commands to be executed conditionally.
Syntax:
if (<condition>) <command-line>
<condition> will be evaluated. If it is true (non-zero), then the command(s)
of the <command-line> will be executed. If <condition> is false (zero), then
the entire <command-line> is ignored. Note that use of ; to allow multiple
commands on the same line will _not_ end the conditionalized commands.
Examples:
pi=3
if (pi!=acos(-1)) print "?Fixing pi!"; pi=acos(-1); print pi
will display:
?Fixing pi!
3.14159265358979
but
if (1==2) print "Never see this"; print "Or this either"
will not display anything.
See reread for an example of how if and reread can be used together to
perform a loop.
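A sketch of such a loop follows. The lines below are meant to be the
contents of a separate command file, here given the hypothetical name
"looper":

```gnuplot
a = a + 1
plot sin(x*a)
pause -1 "Hit return to continue"
if (a < 5) reread
```

Setting a=0 and then typing load 'looper' would draw five plots, one for
each value of a from 1 to 5, waiting for a carriage return after each.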
load
The load command executes each line of the specified input file as if it
had been typed in interactively. Files created by the save command can
later be loaded. Any text file containing valid commands can be created
and then executed by the load command. Files being loaded may themselves
contain load or call commands. See comment for information about
comments in commands. To load with arguments, see call.
The load command _must_ be the last command on a multi-command line.
Syntax:
load "<input-file>"
The name of the input file must be enclosed in quotes.
The special filename "-" may be used to load commands from standard input.
This allows a gnuplot command file to accept some commands from standard
input. Please see "help batch/interactive" for more details.
Examples:
load 'work.gnu'
load "func.dat"
The load command is performed implicitly on any file names given as
arguments to gnuplot. These are loaded in the order specified, and
then gnuplot exits.
pause
The pause command displays any text associated with the command and then
waits a specified amount of time or until the carriage return is pressed.
pause is especially useful in conjunction with load files.
Syntax:
pause <time> {"<string>"}
<time> may be any integer constant or expression. Choosing -1 will wait
until a carriage return is hit, zero (0) won't pause at all, and a positive
integer will wait the specified number of seconds. pause 0 is synonymous
with print.
Note: Since pause communicates with the operating system rather than the
graphics, it may behave differently with different device drivers (depending
upon how text and graphics are mixed).
Examples:
pause -1 # Wait until a carriage return is hit
pause 3 # Wait three seconds
pause -1 "Hit return to continue"
pause 10 "Isn't this pretty? It's a cubic spline."
plot
plot is the primary command for drawing plots with gnuplot. It creates
plots of functions and data in many, many ways. plot is used to draw 2-d
functions and data; splot draws 2-d projections of 3-d surfaces and data.
plot and splot contain many common features; see splot for differences.
Note specifically that splot's binary and matrix options do not exist
for plot.
Syntax:
plot {<ranges>}
{<function> | {"<datafile>" {datafile-modifiers}}}
{axes <axes>} {<title-spec>} {with <style>}
{, {definitions,} <function> ...}
where either a <function> or the name of a data file enclosed in quotes is
supplied. A function is a mathematical expression or a pair of mathematical
expressions in parametric mode. The expressions may be defined completely or
in part earlier in the stream of gnuplot commands (see user-defined).
It is also possible to define functions and parameters on the plot command
itself. This is done merely by isolating them from other items with commas.
There are four possible sets of axes available; the keyword <axes> is used to
select the axes for which a particular line should be scaled. x1y1 refers
to the axes on the bottom and left; x2y2 to those on the top and right;
x1y2 to those on the bottom and right; and x2y1 to those on the top and
left. Ranges specified on the plot command apply only to the first set of
axes (bottom left).
Examples:
plot sin(x)
plot f(x) = sin(x*a), a = .2, f(x), a = .4, f(x)
plot [t=1:10] [-pi:pi*2] tan(t), \
"data.1" using (tan($2)):($3/$4) smooth csplines \
axes x1y2 notitle with lines 5
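As a further sketch of the axes keyword, two functions of very different
magnitude can share one graph by scaling the second against the right-hand
axis. The set commands are included so that the second axis is labelled
with its own tics.

```gnuplot
set ytics nomirror
set y2tics
plot sin(x) axes x1y1 title "sin(x), left axis", \
     100*cos(x) axes x1y2 title "100*cos(x), right axis"
```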
data-file
errorbars
parametric
ranges
title
with
data-file
Discrete data contained in a file can be displayed by specifying the name of
the data file (enclosed in single or double quotes) on the plot command line.
Syntax:
plot '<file_name>' {index <index list>}
{every <every list>}
{thru <thru expression>}
{using <using list>}
{smooth <option>}
The modifiers index, every, thru, using, and smooth are discussed
separately. In brief, index selects which data sets in a multi-data-set
file are to be plotted, every specifies which points within a single data
set are to be plotted, using determines how the columns within a single
record are to be interpreted (thru is a special case of using), and
smooth allows for simple interpolation and approximation. ('splot' has a
similar syntax, but does not support the smooth and thru options.)
Data files should contain at least one data point per record (using can
select one data point from the record). Records beginning with # (and
also with ! on VMS) will be treated as comments and ignored. Each data
point represents an (x,y) pair. For plots with error bars (see set style
errorbars), each data point is (x,y,ydelta), (x,y,ylow,yhigh), (x,y,xdelta),
(x,y,xlow,xhigh), or (x,y,xlow,xhigh,ylow,yhigh). In all cases, the numbers
on each record of a data file must be separated by white space (one or more
blanks or tabs), unless a format specifier is provided by the using option.
This white space divides each record into columns.
Data may be written in exponential format with the exponent preceded by the
letter e, E, d, D, q, or Q.
Only one column (the y value) need be provided. If x is omitted, gnuplot
provides integer values starting at 0.
In datafiles, blank records (records with no characters other than blanks and
a newline and/or carriage return) are significant---pairs of blank records
separate indexes (see plot datafile index). Data separated by double
blank records are treated as if they were in separate data files.
Single blank records designate discontinuities in a plot; no line will join
points separated by a blank record (if they are plotted with a line style).
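The effect of a single blank record can be tried with inline data, as in
this sketch; the numbers are arbitrary.

```gnuplot
plot '-' with lines
1 1
2 2

3 1
4 2
e
```

Two separate line segments should appear, with no line joining (2,2) to
(3,1).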
If autoscaling has been enabled (set autoscale), the axes are automatically
extended to include all datapoints, with a whole number of tic marks if tics
are being drawn. This has two consequences: i) For splot, the corner of
the surface may not coincide with the corner of the base. In this case, no
vertical line is drawn. ii) When plotting data with the same x range on a
dual-axis graph, the x coordinates may not coincide if the x2tics are not
being drawn. This is because the x axis has been autoextended to a whole
number of tics, but the x2 axis has not. The following example illustrates
the problem:
reset; plot '-', '-' axes x2y1
1 1
19 19
e
1 1
19 19
e
every
example datafile
index
smooth
special-filenames
thru
using