On Fri, May 27, 2011 at 18:31, Ethan A Merritt wrote:
>
>> However I have
>> /Library/Frameworks/QtCore.framework
>> and others present and I'm able to compile and use other Qt applications.
>
> Aha. Sounds promising.
> Can you see how the Makefiles for those other Qt applications
> refer to Qt, and borrow equivalent commands for gnuplot's configure script?
I just checked the source code of Geant4
(http://geant4.cern.ch/support/download.shtml). The configure script
includes 500 lines of code devoted to Qt only.
At the end it sets the following variables (which are later used as
flags for the compiler):
QTFLAGS=-I/Library/Frameworks/QtCore.framework/Headers -I/Library/Frameworks/QtGui.framework/Headers -I/Library/Frameworks/QtOpenGl.framework/Headers
QTLIBS=-F/Library/Frameworks -framework QtCore -framework QtGui
I now commented out a few lines in gnuplot's configure script and ran
export QT_CFLAGS="-I/Library/Frameworks/QtCore.framework/Headers -I/Library/Frameworks/QtGui.framework/Headers -I/Library/Frameworks/QtNetwork.framework/Headers -I/Library/Frameworks/QtSvg.framework/Headers"
export QT_LIBS="-F/Library/Frameworks -framework QtCore -framework QtGui -framework QtNetwork -framework QtSvg"
./configure --enable-qt
which got as far as building gnuplot, but then complained about
qtterminal/QtGnuplotWidget.cpp:280:34: error: ui_QtGnuplotSettings.h:
No such file or directory
which seems to be a valid complaint. I cannot find such a file in the
gnuplot sources either.
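(For what it's worth, ui_*.h headers are normally generated at build time
from the corresponding Qt Designer .ui file by Qt's uic tool, so perhaps
the build system simply never ran it. Assuming a QtGnuplotSettings.ui file
ships in the qtterminal directory, a command along these lines should
regenerate the header:
uic qtterminal/QtGnuplotSettings.ui -o qtterminal/ui_QtGnuplotSettings.h)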
Mojca

On Friday, May 27, 2011 12:20:26 am Mojca Miklavec wrote:
> Hello,
>
> I wanted to test Qt compilation, but I cannot even start it.
The configure script is assuming that a pkg-config file exists to
describe the local installation of Qt. I think we've been through
this before: OSX does not [typically] use pkg-config. So you
may have to modify the autoconf scripts to deal with the OSX way
of doing things. Here's a start:
> configure script asks for lrelease-qt4, but the name of the binary on Mac
> seems to be lrelease-4.7 (it probably depends on the version).
Modify the following line:
configure.in:1189:AC_CHECK_PROGS(LRELEASE, lrelease-qt4 lrelease, no)
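A minimal sketch of the change, assuming the Mac binary really is named
lrelease-4.7 (the version suffix will vary, so more candidate names, or a
wildcard search, may be needed):
configure.in:1189:AC_CHECK_PROGS(LRELEASE, lrelease-qt4 lrelease lrelease-4.7, no)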
> On top of that it complains about
>
> No package 'QtCore' found
> No package 'QtGui' found
> No package 'QtNetwork' found
> No package 'QtSvg' found
Those would have been picked up in the pkg-config files, I think.
But on OSX they may well be defined in a system header that needs to
be included. The linux libqt4 distribution includes header files for
a surprising number of other systems. I see, for example:
/usr/lib/qt4/mkspecs/common/mac-g++.conf
/usr/lib/qt4/mkspecs/common/mac-llvm.conf
/usr/lib/qt4/mkspecs/common/mac.conf
/usr/lib/qt4/mkspecs/cygwin-g++
/usr/lib/qt4/mkspecs/cygwin-g++/qmake.conf
/usr/lib/qt4/mkspecs/cygwin-g++/qplatformdefs.h
/usr/lib/qt4/mkspecs/darwin-g++
/usr/lib/qt4/mkspecs/darwin-g++/qmake.conf
/usr/lib/qt4/mkspecs/darwin-g++/qplatformdefs.h
/usr/lib/qt4/mkspecs/macx-xcode
/usr/lib/qt4/mkspecs/macx-xcode/Info.plist.app
/usr/lib/qt4/mkspecs/macx-xcode/Info.plist.lib
/usr/lib/qt4/mkspecs/macx-xcode/qmake.conf
/usr/lib/qt4/mkspecs/macx-xcode/qplatformdefs.h
/usr/lib/qt4/mkspecs/macx-xlc
/usr/lib/qt4/mkspecs/macx-xlc/qmake.conf
/usr/lib/qt4/mkspecs/macx-xlc/qplatformdefs.h
It seems likely that among those (and many more I didn't list) are
the definitions and configuration information you need.
> However I have
> /Library/Frameworks/QtCore.framework
> and others present and I'm able to compile and use other Qt applications.
Aha. Sounds promising.
Can you see how the Makefiles for those other Qt applications
refer to Qt, and borrow equivalent commands for gnuplot's configure script?

Hello,
I wanted to test Qt compilation, but I cannot even start it. The
configure script asks for lrelease-qt4, but the name of the binary on Mac
seems to be lrelease-4.7 (it probably depends on the version).
On top of that it complains about
No package 'QtCore' found
No package 'QtGui' found
No package 'QtNetwork' found
No package 'QtSvg' found
However I have
/Library/Frameworks/QtCore.framework
and others present and I'm able to compile and use other Qt applications.
Mojca

On Tuesday, May 24, 2011 12:52:52 am Christoph Bersch wrote:
> On 24.05.2011 05:35, sfeam (Ethan Merritt) wrote:
> > Christoph Bersch <usenet@...> wrote:
> >>
> >> is there a specific reason why the 'transparent' terminal option applies
> >> only to the pngcairo and not to the pdfcairo terminal?
> >>
> >
> > So far as I know, pdf and PostScript files have no "background" to be
> > set transparent or opaque.
>
> Correct, but in cairo.trm the background of the pdf is explicitly set
> to white. This can be avoided with the 'transparent' option, which is
> used only by the pngcairo terminal.
You are right.
The default for PDF output should be transparent. It already accepts an
explicit background command to override this, but I'll add your patch to
allow the {no}transparent keywords also.
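As a sketch of how the two controls would then coexist (assuming the patch
goes in; the 'background' option already exists):
set terminal pdfcairo                       # background would default to transparent
set terminal pdfcairo background "#ffffff"  # explicit opaque white background
set terminal pdfcairo notransparent         # keyword form added by the patch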
cheers,
Ethan

On 24.05.2011 05:35, sfeam (Ethan Merritt) wrote:
> Christoph Bersch <usenet@...> wrote:
>>
>> is there a specific reason why the 'transparent' terminal option applies
>> only to the pngcairo and not to the pdfcairo terminal?
>>
>
> So far as I know, pdf and PostScript files have no "background" to be
> set transparent or opaque.
Correct, but in cairo.trm the background of the pdf is explicitly set
to white. This can be avoided with the 'transparent' option, which is
used only by the pngcairo terminal.
> Do you actually see a difference in the output file after your patch?
Yes.
I came to this problem when I tried to combine the epslatex and the
pdfcairo terminal. I could not use the pdf file unless all text was set
to front.
> Using what tool to view it?
To show the problem, consider the following gnuplot code:
set terminal pdfcairo transparent
set output 'test-inc.pdf'
set tics lc rgbcolor 'white'
plot sin(x) lw 4
together with this LaTeX document, which overlays the output on a black square:
\documentclass{minimal}
\usepackage{graphicx}
\begin{document}
\rule{3cm}{3cm}\hspace*{-3cm}\includegraphics{test-inc}
\end{document}
Without my patch, the file test-inc.pdf completely covers the black
square. Only with my patch is the background of the pdf file transparent,
so that the black square shows through behind it.
Christoph

On 05/24/11 02:08, Ethan Merritt wrote:
> On Monday, May 23, 2011 01:12:23 am plotter@... wrote:
>>> if there is an alternative approach for the case when x
>>> uncertainty can't be ignored, keep the discussion going and maybe we can
>>> add such a feature to the list of items we'd like to add in the future.
>>>
>>> Dan
>>
>> Yes, I would love to propose that. Some kind of total least squares may
>> be an option; I don't know enough about that to make a concrete proposal.
>
> It is standard in my field to use instead a maximum likelihood residual
> that allows for separate error distributions on x and y. The maximum
> likelihood treatment reduces to a weighted least squares treatment
> if and only if the distributions of errors on both x and y are
> (1) Gaussian and (2) of equal magnitude.
>
> Gnuplot allows for input of precalculated non-uniform weights on y,
> which partially addresses (2). But there is no provision for non-Gaussian
> errors on y, and no provision for any error model at all on x.
>
> If the errors in your data do not follow the simple Gaussian model,
> then almost certainly you would be better off using maximum likelihood
> rather than least-squares. But in order to do so you need first a model
> for the error distributions. That may be obvious for any particular
> experiment or source of data, but it's difficult to impossible for a
> general-purpose program to figure it out for you. You're the one who
> knows the source of the data, so it's up to you to provide an appropriate
> explicit error model along with the data.
>
> The point is that switching to a better minimization residual is more
> than just a matter of changing the internal code. It would require a
> more complex description of your data that includes error models for
> both the independent and dependent variables.
>
> NB: When I say "x", I really mean the full set of independent variables
> [x1,x2,x3,...]. "y" is the single dependent variable estimated by
> f(x1,x2,x3,...)
>
> Ethan
>
Thanks for all that detail, Ethan.
It seems likely that this is a job for external processing by some
stats-capable package rather than a job for a plotting tool.
That brings us back to the original point of this thread: the idea of
putting a specific warning about the limitations of the least-squares
techniques used by gnuplot into the help text for "fit".
I won't bore everyone with anecdotes, but I never cease to be amazed by
the widespread ignorance of this issue, even at PhD level.
For the cost of a couple of lines it would be good if at least gnuplot
users were made aware of it.
best regards, Peter.

Christoph Bersch <usenet@...> wrote:
>
> is there a specific reason why the 'transparent' terminal option applies
> only to the pngcairo and not to the pdfcairo terminal?
>
> Christoph
So far as I know, pdf and PostScript files have no "background" to be
set transparent or opaque. When you print the file, the color of the paper
it is printed on becomes the background color.
Do you actually see a difference in the output file after your patch?
Using what tool to view it?
Ethan

On Monday, May 23, 2011 01:12:23 am plotter@... wrote:
> > if there is an alternative approach for the case when x
> > uncertainty can't be ignored, keep the discussion going and maybe we can
> > add such a feature to the list of items we'd like to add in the future.
> >
> > Dan
>
> Yes, I would love to propose that. Some kind of total least squares may
> be an option; I don't know enough about that to make a concrete proposal.
It is standard in my field to use instead a maximum likelihood residual
that allows for separate error distributions on x and y. The maximum
likelihood treatment reduces to a weighted least squares treatment
if and only if the distributions of errors on both x and y are
(1) Gaussian and (2) of equal magnitude.
Gnuplot allows for input of precalculated non-uniform weights on y,
which partially addresses (2). But there is no provision for non-Gaussian
errors on y, and no provision for any error model at all on x.
If the errors in your data do not follow the simple Gaussian model,
then almost certainly you would be better off using maximum likelihood
rather than least-squares. But in order to do so you need first a model
for the error distributions. That may be obvious for any particular
experiment or source of data, but it's difficult to impossible for a
general-purpose program to figure it out for you. You're the one who
knows the source of the data, so it's up to you to provide an appropriate
explicit error model along with the data.
The point is that switching to a better minimization residual is more
than just a matter of changing the internal code. It would require a
more complex description of your data that includes error models for
both the independent and dependent variables.
NB: When I say "x", I really mean the full set of independent variables
[x1,x2,x3,...]. "y" is the single dependent variable estimated by
f(x1,x2,x3,...)
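For reference, one standard approximation in this spirit is the
"effective variance" method, which propagates a Gaussian error model on x
through the local slope of the model, so the sum can still be minimized
like a weighted least squares:
$$ \chi^2_{\mathrm{eff}} = \sum_i \frac{\left(y_i - f(x_i)\right)^2}{\sigma_{y,i}^2 + \left(f'(x_i)\right)^2 \sigma_{x,i}^2} $$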
Ethan
--
Ethan A Merritt
Biomolecular Structure Center, K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742

Hi,
is there a specific reason why the 'transparent' terminal option applies
only to the pngcairo and not to the pdfcairo terminal?
I removed the two respective lines in term/cairo.trm and it worked well
for me:
--- gnuplot-cvs/term/cairo.trm 2011-05-23 13:48:46.000000000 +0200
+++ gnuplot/term/cairo.trm 2011-05-23 14:00:16.000000000 +0200
@@ -299,13 +299,11 @@
break;
case CAIROTRM_TRANSPARENT:
c_token++;
- if (!strcmp(term->name,"pngcairo"))
- cairo_params->transparent = TRUE;
+ cairo_params->transparent = TRUE;
break;
case CAIROTRM_NOTRANSPARENT:
c_token++;
- if (!strcmp(term->name,"pngcairo"))
- cairo_params->transparent = FALSE;
+ cairo_params->transparent = FALSE;
break;
case CAIROTRM_CROP:
c_token++;
I only tested it with
set terminal pdfcairo transparent
set output 'test-transparency.pdf'
plot sin(x)
But because there is an explicit test for the pngcairo terminal, I
suppose there was a good reason for this?
Christoph

On 05/23/11 02:55, Daniel J Sebald wrote:
> On 05/22/2011 01:14 PM, plotter@... wrote:
>> On 05/21/11 23:04, Daniel J Sebald wrote:
>>> On 05/21/2011 03:15 AM, plotter@... wrote:
>>>> Hi,
>>>>
>>>> 'help fit' reports that the fit command uses Levenberg–Marquardt algo to
>>>> do the fit.
>>>>
>>>> I think this raises an important question that is very often overlooked
>>>> by many users of least-squares techniques, even at maths PhD level.
>>>>
>>>> Such techniques often only optimise the y error rather than the
>>>> perpendicular error from the line. This is implicitly assuming y
>>>> uncertainty >> x uncertainty. While this condition is often satisfied
>>>> in a controlled experiment, there are many situations where this is not
>>>> applicable and it gets totally overlooked.
>>>>
>>>> A common case is scatter plots, which are frequently used to seek a
>>>> relation between two quantities, each with significant errors /
>>>> uncertainties.
>>>>
>>>> In this situation the fitted line is "wrong". In fact it's the
>>>> application that is wrong, hence the wrong result. This may or may not
>>>> be apparent to the eye.
>>>>
>>>> I have seen this happen so many times (including once in a PhD thesis
>>>> report!) that I think it needs a serious health warning in the doc.
>>>>
>>>> "Warning: using least-squares inappropriately can seriously damage your
>>>> reputation". ;)
>>>>
>>>> Firstly, could you confirm the basis on which this algo is applied in
>>>> gnuplot? Does it only optimise vertical y residuals?
>>>
>>> Often it is an assumption that the independent variables are exact
>>> measurements. Not true, typically, but if the variance is small and
>>> homoscedastic, the two can probably be lumped together. I.e., we are
>>> searching for a relationship:
>>>
>>> Y = f(X + eps1) + eps2
>>> ~= f(X) + C eps1 + residual + eps2
>>> ~= f(X) + (C eps1 + eps2)
>>>
>>> where hopefully the residual due to nonlinearity of the relationship is
>>> small compared to other randomness. It's up to the user's judgment and
>>> knowledge of the application to determine that.
>>>
>>> Anyway, your point is true of most software packages: details are so
>>> often lacking. That's why it would be nice to have a set of white
>>> papers to go along with the software so that people know exactly what
>>> the algorithm is, both for the benefit of the user and other developers.
>>> Most of the time it is "here's a hunk of code, use it at your own risk".
>>>
>>> Dan
>>>
>>
>> Thanks, I've read up on that algo and clearly this is only doing NLLS on
>> y residuals.
>>
>> So my suggestion is that this is made abundantly clear in the help text.
>> I'm not suggesting your "white paper", but just some comment to the
>> effect that 'fit' will not give correct results if there are
>> non-negligible errors in x values.
>>
>> It absolutely amazes me how few people realise this, even highly
>> qualified ones, so this is not some pedantic nicety.
>>
>> Most people seem to think once they've heard of doing a least squares
>> fit that's all there is to it and it's some magical formula that works
>> for all cases.
>>
>> I suggest modifying the first paragraph of help fit with something like
>> the following:
>>
>> >>
>> The `fit` command can fit a user-supplied expression to a set of data
>> points
>> (x,z) or (x,y,z), using an implementation of the nonlinear least-squares
>> (NLLS) Marquardt-Levenberg algorithm. Any user-defined variable
>> occurring in
>> the expression may serve as a fit parameter, but the return type of the
>> expression must be real.
>> >>
>>
>> new
>> >>
>> The `fit` command can fit a user-supplied expression to a set of data
>> points
>> (x,z) or (x,y,z), using an implementation of the non-linear least-squares
>> (NLLS) Marquardt-Levenberg algorithm. This algorithm optimises y
>> residuals only and
>> carries the implicit assumption that error/uncertainties in x are
>> negligible. If that is not the case the fit may succeed but will give
>> wrong results.
>
> I'm OK with the change, but for a few things.
>
> First, you stated 'optimizes y' and 'uncertainties in x', but please
> check that this precisely describes the algorithm. The beginning of the
> paragraph lists (x, z) or (x, y, z). The first expression has no 'y',
> so what is optimized in that case? The second expression has (x, y, z),
> so is it only the 'x' in that case that is assumed exact? Or is it both
> 'x' and 'y'? Perhaps that sentence should be, "This algorithm optimises
> z residuals only and carries the implicit assumption that
> error/uncertainties in x (and y) are negligible."
Very good point; I was trying to keep it brief (and perhaps dumb it
down), but it should correctly refer to dependent and independent
variables. I was concerned that might mean the warning would be lost on
many users. Perhaps "independent variable (e.g. x axis)" or similar would
be better.
>
> Second, I would hold off using the statement "wrong results" and maybe
> use "a poor fit". Saying the result is wrong means the algorithm is
> broken, but it just provides numbers. The user is using the wrong tool.
> Also, "wrong" means a bad fit in this context and one can get a bad
> fit and misinterpret results even if uncertainty in x is small.
No, I think wrong result is exactly the case. Wrong result does not mean
the algo is "broken"; it means the wrong tool was used and the result
obtained was wrong. The result is not "poor", it is wrong. Watering down
the language does not shift the blame.
Maybe some wording like "wrong technique" could help underline the cause
of the problem, but I think "wrong" definitely needs to be in there.
There's no hedging around the fact: it can be considerably wrong.
>
> Third, if there is an alternative approach for the case when x
> uncertainty can't be ignored, keep the discussion going and maybe we can
> add such a feature to the list of items we'd like to add in the future.
>
> Dan
>
Yes, I would love to propose that. Some kind of total least squares may
be an option; I don't know enough about that to make a concrete proposal.
I suspect it may be too variable to include as a turn-key option like
fit, but I would really like that option if it is possible.
thanks for your thoughtful comments.

On 05/22/2011 01:14 PM, plotter@... wrote:
> On 05/21/11 23:04, Daniel J Sebald wrote:
>> On 05/21/2011 03:15 AM, plotter@... wrote:
>>> Hi,
>>>
>>> 'help fit' reports that the fit command uses Levenberg–Marquardt algo to
>>> do the fit.
>>>
>>> I think this raises an important question that is very often overlooked
>>> by many users of least-squares techniques, even at maths PhD level.
>>>
>>> Such techniques often only optimise the y error rather than the
>>> perpendicular error from the line. This is implicitly assuming y
>>> uncertainty >> x uncertainty. While this condition is often satisfied
>>> in a controlled experiment, there are many situations where this is not
>>> applicable and it gets totally overlooked.
>>>
>>> A common case is scatter plots, which are frequently used to seek a
>>> relation between two quantities, each with significant errors /
>>> uncertainties.
>>>
>>> In this situation the fitted line is "wrong". In fact it's the
>>> application that is wrong, hence the wrong result. This may or may not
>>> be apparent to the eye.
>>>
>>> I have seen this happen so many times (including once in a PhD thesis
>>> report!) that I think it needs a serious health warning in the doc.
>>>
>>> "Warning: using least-squares inappropriately can seriously damage your
>>> reputation". ;)
>>>
>>> Firstly, could you confirm the basis on which this algo is applied in
>>> gnuplot? Does it only optimise vertical y residuals?
>>
>> Often it is an assumption that the independent variables are exact
>> measurements. Not true, typically, but if the variance is small and
>> homoscedastic, the two can probably be lumped together. I.e., we are
>> searching for a relationship:
>>
>> Y = f(X + eps1) + eps2
>> ~= f(X) + C eps1 + residual + eps2
>> ~= f(X) + (C eps1 + eps2)
>>
>> where hopefully the residual due to nonlinearity of the relationship is
>> small compared to other randomness. It's up to the user's judgment and
>> knowledge of the application to determine that.
>>
>> Anyway, your point is true of most software packages: details are so
>> often lacking. That's why it would be nice to have a set of white
>> papers to go along with the software so that people know exactly what
>> the algorithm is, both for the benefit of the user and other developers.
>> Most of the time it is "here's a hunk of code, use it at your own risk".
>>
>> Dan
>>
>
> Thanks, I've read up on that algo and clearly this is only doing NLLS on
> y residuals.
>
> So my suggestion is that this is made abundantly clear in the help text.
> I'm not suggesting your "white paper", but just some comment to the
> effect that 'fit' will not give correct results if there are
> non-negligible errors in x values.
>
> It absolutely amazes me how few people realise this, even highly
> qualified ones, so this is not some pedantic nicety.
>
> Most people seem to think once they've heard of doing a least squares
> fit that's all there is to it and it's some magical formula that works
> for all cases.
>
> I suggest modifying the first paragraph of help fit with something like
> the following:
>
> >>
> The `fit` command can fit a user-supplied expression to a set of data
> points
> (x,z) or (x,y,z), using an implementation of the nonlinear least-squares
> (NLLS) Marquardt-Levenberg algorithm. Any user-defined variable
> occurring in
> the expression may serve as a fit parameter, but the return type of the
> expression must be real.
> >>
>
> new
> >>
> The `fit` command can fit a user-supplied expression to a set of data
> points
> (x,z) or (x,y,z), using an implementation of the non-linear least-squares
> (NLLS) Marquardt-Levenberg algorithm. This algorithm optimises y
> residuals only and
> carries the implicit assumption that error/uncertainties in x are
> negligible. If that is not the case the fit may succeed but will give
> wrong results.
I'm OK with the change, but for a few things.
First, you stated 'optimizes y' and 'uncertainties in x', but please
check that this precisely describes the algorithm. The beginning of the
paragraph lists (x, z) or (x, y, z). The first expression has no 'y',
so what is optimized in that case? The second expression has (x, y, z),
so is it only the 'x' in that case that is assumed exact? Or is it both
'x' and 'y'? Perhaps that sentence should be, "This algorithm optimises
z residuals only and carries the implicit assumption that
error/uncertainties in x (and y) are negligible."
Second, I would hold off using the statement "wrong results" and maybe
use "a poor fit". Saying the result is wrong means the algorithm is
broken, but it just provides numbers. The user is using the wrong tool.
Also, "wrong" means a bad fit in this context and one can get a bad
fit and misinterpret results even if uncertainty in x is small.
Third, if there is an alternative approach for the case when x
uncertainty can't be ignored, keep the discussion going and maybe we can
add such a feature to the list of items we'd like to add in the future.
Dan

On 05/21/11 23:04, Daniel J Sebald wrote:
> On 05/21/2011 03:15 AM, plotter@... wrote:
>> Hi,
>>
>> 'help fit' reports that the fit command uses Levenberg–Marquardt algo to
>> do the fit.
>>
>> I think this raises an important question that is very often overlooked
>> by many users of least-squares techniques, even at maths PhD level.
>>
>> Such techniques often only optimise the y error rather than the
>> perpendicular error from the line. This is implicitly assuming y
>> uncertainty >> x uncertainty. While this condition is often satisfied
>> in a controlled experiment, there are many situations where this is not
>> applicable and it gets totally overlooked.
>>
>> A common case is scatter plots, which are frequently used to seek a
>> relation between two quantities, each with significant errors /
>> uncertainties.
>>
>> In this situation the fitted line is "wrong". In fact it's the
>> application that is wrong, hence the wrong result. This may or may not
>> be apparent to the eye.
>>
>> I have seen this happen so many times (including once in a PhD thesis
>> report!) that I think it needs a serious health warning in the doc.
>>
>> "Warning: using least-squares inappropriately can seriously damage your
>> reputation". ;)
>>
>> Firstly, could you confirm the basis on which this algo is applied in
>> gnuplot? Does it only optimise vertical y residuals?
>
> Often it is an assumption that the independent variables are exact
> measurements. Not true, typically, but if the variance is small and
> homoscedastic, the two can probably be lumped together. I.e., we are
> searching for a relationship:
>
> Y = f(X + eps1) + eps2
> ~= f(X) + C eps1 + residual + eps2
> ~= f(X) + (C eps1 + eps2)
>
> where hopefully the residual due to nonlinearity of the relationship is
> small compared to other randomness. It's up to the user's judgment and
> knowledge of the application to determine that.
>
> Anyway, your point is true of most software packages: details are so
> often lacking. That's why it would be nice to have a set of white
> papers to go along with the software so that people know exactly what
> the algorithm is, both for the benefit of the user and other developers.
> Most of the time it is "here's a hunk of code, use it at your own risk".
>
> Dan
>
Thanks, I've read up on that algo and clearly this is only doing NLLS on
y residuals.
So my suggestion is that this is made abundantly clear in the help text.
I'm not suggesting your "white paper", but just some comment to the
effect that 'fit' will not give correct results if there are
non-negligible errors in x values.
It absolutely amazes me how few people realise this, even highly
qualified ones, so this is not some pedantic nicety.
Most people seem to think once they've heard of doing a least squares
fit that's all there is to it and it's some magical formula that works
for all cases.
I suggest modifying the first paragraph of help fit with something like
the following:
>>
The `fit` command can fit a user-supplied expression to a set of data
points
(x,z) or (x,y,z), using an implementation of the nonlinear least-squares
(NLLS) Marquardt-Levenberg algorithm. Any user-defined variable
occurring in
the expression may serve as a fit parameter, but the return type of the
expression must be real.
>>
new
>>
The `fit` command can fit a user-supplied expression to a set of data
points
(x,z) or (x,y,z), using an implementation of the non-linear least-squares
(NLLS) Marquardt-Levenberg algorithm. This algorithm optimises y
residuals only and
carries the implicit assumption that error/uncertainties in x are
negligible. If that is not the case the fit may succeed but will give
wrong results.
Any user-defined variable occurring in
the expression may serve as a fit parameter, but the return type of the
expression must be real.
>>
regards, Peter.

On 05/21/2011 03:15 AM, plotter@... wrote:
> Hi,
>
> 'help fit' reports that the fit command uses Levenberg–Marquardt algo to
> do the fit.
>
> I think this raises an important question that is very often overlooked
> by many users of least-squares techniques, even at maths PhD level.
>
> Such techniques often only optimise the y error rather than the
> perpendicular error from the line. This is implicitly assuming y
> uncertainty >> x uncertainty. While this condition is often satisfied
> in a controlled experiment, there are many situations where this is not
> applicable and it gets totally overlooked.
>
> A common case is scatter plots, which are frequently used to seek a
> relation between two quantities, each with significant errors /
> uncertainties.
>
> In this situation the fitted line is "wrong". In fact it's the
> application that is wrong, hence the wrong result. This may or may not
> be apparent to the eye.
>
> I have seen this happen so many times (including once in a PhD thesis
> report!) that I think it needs a serious health warning in the doc.
>
> "Warning: using least-squares inappropriately can seriously damage your
> reputation". ;)
>
> Firstly, could you confirm the basis on which this algo is applied in
> gnuplot? Does it only optimise vertical y residuals?
Often it is an assumption that the independent variables are exact
measurements. Not true, typically, but if the variance is small and
homoscedastic, the two can probably be lumped together. I.e., we are
searching for a relationship:
Y = f(X + eps1) + eps2
~= f(X) + C eps1 + residual + eps2
~= f(X) + (C eps1 + eps2)
where hopefully the residual due to nonlinearity of the relationship is
small compared to other randomness. It's up to the user's judgment and
knowledge of the application to determine that.
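(In the expansion above, C is the local slope from the first-order Taylor
term: $f(X + \epsilon_1) \approx f(X) + f'(X)\,\epsilon_1$, so $C = f'(X)$.)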
Anyway, your point is true of most software packages: details are so
often lacking. That's why it would be nice to have a set of white
papers to go along with the software so that people know exactly what
the algorithm is, both for the benefit of the user and other developers.
Most of the time it is "here's a hunk of code, use it at your own risk".
Dan

Hi,
'help fit' reports that the fit command uses Levenberg–Marquardt algo to
do the fit.
I think this raises an important question that is very often overlooked
by many users of least-squares techniques, even at maths PhD level.
Such techniques often only optimise the y error rather than the
perpendicular error from the line. This is implicitly assuming y
uncertainty >> x uncertainty. While this condition is often satisfied
in a controlled experiment, there are many situations where this is not
applicable and it gets totally overlooked.
A common case is scatter plots, which are frequently used to seek a
relation between two quantities, each with significant errors /
uncertainties.
In this situation the fitted line is "wrong". In fact it's the
application that is wrong, hence the wrong result. This may or may not
be apparent to the eye.
I have seen this happen so many times (including once in a PhD thesis
report!) that I think it needs a serious health warning in the doc.
"Warning: using least-squares inappropriately can seriously damage your
reputation". ;)
Firstly, could you confirm the basis on which this algo is applied in
gnuplot? Does it only optimise vertical y residuals?
Thanks, Peter.

On 14.05.2011 13:06, Petr Mikulik wrote:
> A colleague has noticed wrong parameter values after 2D fitting via
> fit f(x,y) 'data.txt' us 1:2:3 via a,b,c
>
> Gnuplot was fitting correctly after adding the dummy weight column as
> "help fit" indicates:
> fit f(x,y) 'data.txt' us 1:2:3:(1) via a,b,c
>
> I think gnuplot should report an error about not having enough columns,
> or about the missing weights column in the former command, or it should
> fill the weights with ones if the 4th column is missing.
That idea is based on a misconception. The problem at hand is not that
gnuplot didn't fill in the weights --- it's that the user incorrectly
believed the first command given above was a 2-variable fit (somewhat
incorrectly referred to as a "2D" one), which it's not. It is a
perfectly valid command to perform a 1-variable fit (assuming a variable
'y' was defined beforehand), and gnuplot has no way of knowing that's
not what the user wanted.
I.e. there is no "missing" column specification here, so nothing to
diagnose.
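A hedged sketch of the distinction, as I read the column rules ("help fit"
is authoritative):
y = 1.0                                       # any earlier definition of 'y' makes the 3-column form legal
fit f(x,y) 'data.txt' us 1:2:3 via a,b,c      # 1-variable fit: columns read as x : z : s, 'y' stays fixed
fit f(x,y) 'data.txt' us 1:2:3:(1) via a,b,c  # 2-variable fit: columns read as x : y : z : s, unit weights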

A colleague has noticed wrong parameter values after 2D fitting via
fit f(x,y) 'data.txt' us 1:2:3 via a,b,c
Gnuplot was fitting correctly after adding the dummy weight column as
"help fit" indicates:
fit f(x,y) 'data.txt' us 1:2:3:(1) via a,b,c
I think gnuplot should report an error about not having enough columns,
or about the missing weights column in the former command, or it should
fill the weights with ones if the 4th column is missing.
---
PM

On 13.05.2011 14:35, plotter@... wrote:
>>> While the subject is being raised , may I suggest a couple of possible
>>> feature improvements in this area?
>>>
>>> 1) This cycling of "mode" could also be done by clicking on the status
>>> message (click is mapped to "1" key). This would avoid switching mouse
>>> to keyboard which can be handy.
>>
>> Could be done, certainly, but there is currently no code to interpret the
>> mouse location in terms like "on the status message".
>
> Well there is click on graph area (pause mouse etc) so I would have
> thought x_bottom_graph and x_bottom_window would be close to what is
> needed.
Mouse events outside the graph area don't get passed on to the gnuplot
kernel. It would be straightforward to translate those events to
keyboard events, though. A trivial patch for Windows is attached.
Bastian

On 05/13/11 05:27, sfeam (Ethan Merritt) wrote:
> On Thursday, 12 May 2011, plotter@... wrote:
>>
>> I had tried the h key help but had not appreciated what those variables
>> referred to:
>>
>> 1 `builtin-decrement-mousemode`
>> 2 `builtin-increment-mousemode`
>>
>> Maybe those terms could be more explicit, it's not clear what they refer
>> to.
>>
>> The content of the various modes is equally a bit arcane and the help
>> doesn't explain it:
>>
>> >> This corresponds to the key
>> >> bindings '1', '2', '3', '4' (see the driver's documentation)
>>
>> I find no detail for example in help wxt. What does this comment refer to?
>
> I was surprised to find that the documentation is seriously out of date,
> as in "doesn't describe the actual behavior of any gnuplot version since
> before the start of the CVS repository". I have now updated the 4.5
> documentation a bit.
OK, thanks, glad you caught that one.
>
>> While the subject is being raised , may I suggest a couple of possible
>> feature improvements in this area?
>>
>> 1) This cycling of "mode" could also be done by clicking on the status
>> message (click is mapped to "1" key). This would avoid switching mouse
>> to keyboard which can be handy.
>
> Could be done, certainly, but there is currently no code to interpret the
> mouse location in terms like "on the status message".
Well there is click on graph area (pause mouse etc) so I would have
thought x_bottom_graph and x_bottom_window would be close to what is
needed.
>
>> 2) As I understand it this only shows x1y1 coords. It would clearly be
>> useful when using x2y2 as well if those were available in this read-out.
>> One way to trigger this would be a click on the legend entry for a
>> line or on/near the relevant axis. (some thought needed to avoid
>> ambiguity with the new SVG toggle , pause mouse or other functions. )
>
> Huh. For me the x2y2 axes are echoed properly if they are active in the
> current plot. Tested on wxt, x11, and canvas. It is true that I ignored
> x2y2 in the very recent svg mouse-tracking code. What terminal are you
> using that fails to show them?
Sorry, my recollection was incorrect. It was most likely in relation to
the SVG changes that I'd retained that idea.
>
> Ethan
>

On Thursday, 12 May 2011, plotter@... wrote:
>
> I had tried the h key help but had not appreciated what those variables
> referred to:
>
> 1 `builtin-decrement-mousemode`
> 2 `builtin-increment-mousemode`
>
> Maybe those terms could be more explicit, it's not clear what they refer
> to.
>
> The content of the various modes is equally a bit arcane and the help
> doesn't explain it:
>
> >> This corresponds to the key
> >> bindings '1', '2', '3', '4' (see the driver's documentation)
>
> I find no detail for example in help wxt. What does this comment refer to?
I was surprised to find that the documentation is seriously out of date,
as in "doesn't describe the actual behavior of any gnuplot version since
before the start of the CVS repository". I have now updated the 4.5
documentation a bit.
> While the subject is being raised , may I suggest a couple of possible
> feature improvements in this area?
>
> 1) This cycling of "mode" could also be done by clicking on the status
> message (click is mapped to "1" key). This would avoid switching mouse
> to keyboard which can be handy.
Could be done, certainly, but there is currently no code to interpret the
mouse location in terms like "on the status message".
> 2) As I understand it this only shows x1y1 coords. It would clearly be
> useful when using x2y2 as well if those were available in this read-out.
> One way to trigger this would be a click on the legend entry for a
> line or on/near the relevant axis. (some thought needed to avoid
> ambiguity with the new SVG toggle , pause mouse or other functions. )
Huh. For me the x2y2 axes are echoed properly if they are active in the
current plot. Tested on wxt, x11, and canvas. It is true that I ignored
x2y2 in the very recent svg mouse-tracking code. What terminal are you
using that fails to show them?
Ethan

On 05/12/11 17:13, sfeam (Ethan Merritt) wrote:
> On Wednesday, 11 May 2011, plotter@... wrote:
>> Hi,
>>
>> I work quite a lot with time data (years and months scale). Although
>> gnuplot is very flexible in reading time data and labelling x axis the
>> mouse cursor seems a bit challenged.
>>
>> If I have "%Y %m", having 6.004538e+09 as mouse coord is really little
>> more than useless.
>>
>> Since the scaling is available for axis labels why isn't this applied to
>> the mouse coords? It would seem to be very simple and I can't see the
>> utility in the current date in seconds.
>>
>> Is this just an oversight?
>
> Mouse coordinate readout is available in 5 or 6 different formats,
> including user-specified. See "help mouseformat" and the help
> message printed by typing "h" in the plot window.
> In general you can cycle through the different options using the
> characters "1" and "2" as hotkeys.
>
>
Hi Ethan,
thanks for that. It appears that there is no 'mouseformat' help, but I
did find it under 'mouse':
gnuplot> help mouseformat
Sorry, no help for 'mouseformat'
gnuplot> help mouse
I had tried the h key help but had not appreciated what those variables
referred to:
1 `builtin-decrement-mousemode`
2 `builtin-increment-mousemode`
Maybe those terms could be more explicit; it's not clear what they refer
to.
The content of the various modes is equally a bit arcane, and the help
doesn't explain it:
>> This corresponds to the key
>> bindings '1', '2', '3', '4' (see the driver's documentation)
I find no detail for example in help wxt. What does this comment refer to?
While the subject is being raised , may I suggest a couple of possible
feature improvements in this area?
1) This cycling of "mode" could also be done by clicking on the status
message (click is mapped to the "1" key). This would avoid switching from
mouse to keyboard, which can be handy.
2) As I understand it this only shows x1y1 coords. It would clearly be
useful when using x2y2 as well if those were available in this read-out.
One way to trigger this would be a click on the legend entry for a
line or on/near the relevant axis. (some thought needed to avoid
ambiguity with the new SVG toggle , pause mouse or other functions. )
Not priority issues but hopefully useful.
best regards, Peter.

On Wednesday, 11 May 2011, plotter@... wrote:
> Hi,
>
> I work quite a lot with time data (years and months scale). Although
> gnuplot is very flexible in reading time data and labelling x axis the
> mouse cursor seems a bit challenged.
>
> If I have "%Y %m", having 6.004538e+09 as mouse coord is really little
> more than useless.
>
> Since the scaling is available for axis labels why isn't this applied to
> the mouse coords? It would seem to be very simple and I can't see the
> utility in the current date in seconds.
>
> Is this just an oversight?
Mouse coordinate readout is available in 5 or 6 different formats,
including user-specified. See "help mouseformat" and the help
message printed by typing "h" in the plot window.
In general you can cycle through the different options using the
characters "1" and "2" as hotkeys.
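A sketch of the relevant commands for the time-data case (the option
numbering may differ between versions, so "help mouse" is authoritative):
set xdata time
set timefmt "%Y %m"
set mouse mouseformat 3               # readout goes through 'timefmt'
set mouse mouseformat "x: %g, y: %g"  # or a user-specified format string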

Hi,
I work quite a lot with time data (years and months scale). Although
gnuplot is very flexible in reading time data and labelling the x axis,
the mouse cursor seems a bit challenged.
If I have "%Y %m", having 6.004538e+09 as mouse coord is really little
more than useless.
Since the scaling is available for axis labels, why isn't this applied to
the mouse coords? It would seem to be very simple, and I can't see the
utility in the current date-in-seconds display.
Is this just an oversight?
regards, Peter.

On 2011-05-06, at 3:56 PM, Daniel J Sebald wrote:
> On 05/06/2011 04:10 PM, Thomas Mattison wrote:
>>
>> 2. Gnuplot will indeed refuse to fit a constant function to a single
>> data point, or a line to two points. Yes, I hadn't done the
>> experiment. Sorry.
>>
>> However I regard item 2 as a bug not a feature. It is perfectly well
>> defined to ask the question, what is the chisquare for agreement
>> between data and model, as a function of the parameter values, for
>> these cases. There is a perfectly well defined answer to that
>> question. There is a perfectly well defined point of minimum
>> chisquare, which gives the best fit parameters. And the curvature of
>> the chisquare curve or surface gives the fit errors. Nothing
>> anomalous happens in the case where the number of data points equals
>> the number of parameters.
>
> It seems like it should.
But it doesn't, from a mathematical point of view.
> From H.B.B.'s last post, if there are only two points and the curve to fit is a line (two parameters, first order equation**) then the fit is exact in the sense that the residuals are zero...if no other information is given about the data. But that doesn't say anything about the measurement "error" in the fit. (I sort of prefer the word "uncertainty", but that's just me.) We can only glean information about the uncertainty of the fit (i.e., the statistics of the original data) if there are additional data points beyond the number of parameters.
You are right that there need to be more measurements than parameters in order to use the chisquare/degree-of-freedom to infer something about the errors of the fit INPUTS from the quality of the fit.
But in many cases, the user ALREADY KNOWS what the input data measurement errors are. He knows the precision of the markings on his ruler, or how many decimal places his voltmeter has, or square-root-of-N for the bin contents of a histogram, etc [I am not discussing "systematic" errors here, like is the calibration of your voltmeter correct].
In those cases, the "raw" fit errors are the appropriate ones, and the "rescaled" errors are less appropriate.
>
> OK, so I'm beginning to see there is this vital piece of information about the quality of fit, reflecting the underlying statistics or the uncertainty in the data (i.e., how noisy it is). That's what is meant by error, right?
The way I would phrase it is, the noise in the input data propagates to noise in the fit parameters. The "standard errors" of the fit parameters are the standard deviation of the input data (assumed to be known) propagated to the standard deviation of the fit parameters.
>
>> I hope we can all agree that when fitting a line to two data points
>> that have errors, the slope and intercept DO have errors.
>
> Yes, but I don't see how one can estimate the size of that error without additional data beyond two data points, unless some information is known a priori.
>
Sure, but when the user has supplied meaningful y-errors on his data, we have precisely the a-priori information that your intuition is telling you is needed.
> But I'm surprised that no well defined, commonly understood terminology has developed for what seems like an important idea in the numerical analysis community. Bastian points out that:
>
> * Minuit does not scale
> * Origin offers both options, the default being version dependent
> * SAS and Mathematica default to scaling of errors
>
> That inconsistency is the problem.
To me, that looks like most programs give you a choice. Mathematica may default to scaling, but it looks like it's optional. I don't have much knowledge about SAS, but I would expect a big package like that to have an option buried somewhere.
Minuit comes from the physics community, nuclear and particle physics in particular. In physics, we tend to have pretty good a-priori estimates for the errors of the measurements going into a fit. We use the chisquare/DOF as a measure of the (statistical) fit quality. Physicists would seldom rescale fit parameter errors by sqrt(chisq/DOF). And since Minuit is normally used in an expert-programming context instead of an ignorant-GUI-user context, the authors would expect the user to be able to multiply by sqrt(chisq/DOF) himself.
In bioscience, and even more so in medical and social science, there's often next to nothing known about the errors of the input measurements a-priori. In such cases, there's not much the user can do except set all the errors equal (or not tell gnuplot any errors, which will have the same result). Then the "raw" errors from the fit aren't very meaningful, but the "rescaled" errors have at least some utility. So defaulting to rescaled makes some sense.
My guess would be that there are more gnuplot fit users who don't know much about their input data errors than those who know a great deal about their input data errors. So if I were forced to make a choice of only one error to present, I'd probably present the rescaled error. It will be the "right thing" for users who don't give errors or give uniform and arbitrary errors. And for the people who supply accurate errors, if the model really reflects the data, then chisq/DOF will be close to 1, and the "rescaled" errors will not be very different, very often, from the "raw" errors.
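For concreteness, the rescaling discussed throughout this thread is:
$$ \sigma_{\mathrm{rescaled}} = \sigma_{\mathrm{raw}} \sqrt{\chi^2 / \mathrm{DOF}} $$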
But why not give both?
Cheers
Prof. Thomas Mattison Hennings 276
University of British Columbia Dept. of Physics and Astronomy
6224 Agricultural Road Vancouver BC V6T 1Z1 CANADA
mattison@... phone: 604-822-9690 fax:604-822-5324

On 05/06/2011 04:10 PM, Thomas Mattison wrote:
> I stand corrected about two things:
>
> 1. Gnuplot fits have always done the "right" thing and divided
> chisquare by degrees of freedom rather than measurements. Sorry, I
> haven't looked at what gets printed out in a while; I mostly use my
> own fitting programs these days.
>
> 2. Gnuplot will indeed refuse to fit a constant function to a single
> data point, or a line to two points. Yes, I hadn't done the
> experiment. Sorry.
>
> However I regard item 2 as a bug not a feature. It is perfectly well
> defined to ask the question, what is the chisquare for agreement
> between data and model, as a function of the parameter values, for
> these cases. There is a perfectly well defined answer to that
> question. There is a perfectly well defined point of minimum
> chisquare, which gives the best fit parameters. And the curvature of
> the chisquare curve or surface gives the fit errors. Nothing
> anomalous happens in the case where the number of data points equals
> the number of parameters.
It seems like it should. From H.B.B.'s last post, if there are only two
points and the curve to fit is a line (two parameters, first order
equation**) then the fit is exact in the sense that the residuals are
zero...if no other information is given about the data. But that
doesn't say anything about the measurement "error" in the fit. (I sort
of prefer the word "uncertainty", but that's just me.) We can only
glean information about the uncertainty of the fit (i.e., the statistics
of the original data) if there are additional data points beyond the
number of parameters.
** Polynomial order of the fitting curve is important. For example,
exp() has infinite polynomial order, which has ramifications on uniqueness.
> I agree that in those two particular cases, it's not necessary to "do
> a fit" to find the parameters; it can easily be done by hand. But
> mathematically the fit is still well-defined.
OK, so I'm beginning to see there is this vital piece of information
about the quality of fit, reflecting the underlying statistics or the
uncertainty in the data (i.e., how noisy it is). That's what is meant
by error, right?
In this discussion
http://www.erikburd.org/projects/pitfall/evaluate.html
is a column for "standard error". Is that what is being referred to,
the standard error typically used in statistical analysis? (There are
the underlying actual measurement errors, but we can't know what those
are because there is uncertainty in the fit.)
> I hope we can all agree that when fitting a line to two data points
> that have errors, the slope and intercept DO have errors.
Yes, but I don't see how one can estimate the size of that error without
additional data beyond two data points, unless some information is known
a priori.
> Is it really so much to ask to have both raw and rescaled errors
> printed?
I'm fine with that idea. But I'm surprised that no well defined,
commonly understood terminology has developed for what seems like an
important idea in the numerical analysis community. Bastian points out
that:
* Minuit does not scale
* Origin offers both options, the default being version dependent
* SAS and Mathematica default to scaling of errors
That inconsistency is the problem.
Dan