Inconsistent results for the covariance matrix between scipy.optimize.curve_fit and numpy.polyfit

Inconsistent results for the covariance matrix between scipy.optimize.curve_fit and numpy.polyfit

Jonathan Tammo Siebert
Hi,

I hope this is the appropriate place to ask something like
this, otherwise please let me know (or feel free to ignore
this). Also I hope that I have not misunderstood something or
made some silly mistake. If so, please let me know as well!

TLDR:
When scaling the covariance matrix based on the residuals,
scipy.optimize.curve_fit uses a factor of chisq(popt)/(M-N)
(with M = number of points, N = number of parameters) and
numpy.polyfit uses chisq(popt)/(M-N-2). I am wondering which
is correct.

I am somewhat confused about the different results I am getting
for the covariance matrix of a simple linear fit when
comparing `scipy.optimize.curve_fit` and `numpy.polyfit`. I
am aware that `curve_fit` solves the more general non-linear
problem numerically, while `polyfit` finds an analytical
solution to the linear problem. However, both converge to the
same solution, so I suspect that this difference is not
important here. The difference I am curious about is not in
the returned parameters but in the estimate of the
corresponding covariance matrix. As I understand it, there are
two different ways to estimate it: based either on the
absolute values of the provided uncertainties, or by
interpreting those only as weights and then scaling the
matrix to produce an appropriate reduced chisq. To that end,
curve_fit has the parameter `absolute_sigma`:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html:
"absolute_sigma : bool, optional
If True, sigma is used in an absolute sense and the
estimated parameter covariance pcov reflects these absolute
values.
If False, only the relative magnitudes of the sigma values
matter. The returned parameter covariance matrix pcov is
based on scaling sigma by a constant factor. This constant
is set by demanding that the reduced chisq for the optimal
parameters popt when using the scaled sigma equals unity. In
other words, sigma is scaled to match the sample variance of
the residuals after the fit. Mathematically,
pcov(absolute_sigma=False) = pcov(absolute_sigma=True) *
chisq(popt)/(M-N)"
https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html
on the other hand, does not say anything about how the
covariance matrix is estimated. To my understanding, its
default should correspond to `absolute_sigma=False` for
`curve_fit`. As `polyfit` has a weight parameter instead of
an uncertainty parameter, I guess the difference in default
behavior is not that surprising.
However, even when specifying `absolute_sigma=False`,
`curve_fit` and `polyfit` produce different covariance
matrices, as the applied scaling factors are
chisq(popt)/(M-N-2) for `polyfit`
(https://github.com/numpy/numpy/blob/6a58e25703cbecb6786faa09a04ae2ec8221348b/numpy/lib/polynomial.py#L598-L605)
and chisq(popt)/(M-N) for `curve_fit`
(https://github.com/scipy/scipy/blob/607a21e07dad234f8e63fcf03b7994137a3ccd5b/scipy/optimize/minpack.py#L781-L782).
The argument given in a comment next to the scaling in `polyfit` is:
"Some literature ignores the extra -2.0 factor in the
denominator, but it is included here because the covariance
of Multivariate Student-T (which is implied by a Bayesian
uncertainty analysis) includes it. Plus, it gives a slightly
more conservative estimate of uncertainty.",
but honestly, in a quick search I was not able to find any
literature that does not ignore the extra "factor". Obviously, I
could very well be misunderstanding something.
Nonetheless, as `curve_fit` ignores it as well, I was
wondering whether those two shouldn't give consistent
results and, if so, which would be the correct solution.
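
To make the discrepancy concrete, here is a minimal sketch (synthetic
straight-line data with values I just made up; with the `polyfit` version
linked above, the elementwise ratio of the two returned covariance
matrices should come out to roughly (M-N)/(M-N-2)):

    import numpy as np
    from scipy.optimize import curve_fit

    np.random.seed(0)
    M, N = 20, 2                      # number of points, number of parameters
    x = np.linspace(0.0, 10.0, M)
    y = 1.5 * x + 0.7 + np.random.normal(scale=0.5, size=M)

    # curve_fit default (absolute_sigma=False): pcov scaled by chisq/(M-N)
    popt, pcov_cf = curve_fit(lambda x, a, b: a * x + b, x, y)

    # polyfit with cov=True: covariance scaled by chisq/(M-N-2)
    coef, pcov_pf = np.polyfit(x, y, 1, cov=True)

    print(pcov_pf / pcov_cf)          # elementwise, approx. (M-N)/(M-N-2)
    print((M - N) / (M - N - 2.0))    # 18/16 = 1.125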


Best, 

Jonathan

Re: Inconsistent results for the covariance matrix between scipy.optimize.curve_fit and numpy.polyfit

josef.pktd


On Tue, May 29, 2018 at 9:14 AM, Jonathan Tammo Siebert <[hidden email]> wrote:
[original message quoted in full; snipped]


I've never seen the -2 in any literature, and there is no reference in the code comment.
(I would remove it as a bug-fix. Even if there is some Bayesian interpretation, it is not what users would expect.)

There was a similar thread in 2013:
https://mail.scipy.org/pipermail/numpy-discussion/2013-February/065664.html

Josef

Re: Inconsistent results for the covariance matrix between scipy.optimize.curve_fit and numpy.polyfit

Jonathan Tammo Siebert
On Tue, 2018-05-29 at 10:47 -0400, [hidden email] wrote:

> [original message quoted in full; snipped]
>
> I've never seen the -2 in any literature, and there is no reference
> in the code comment.
> (I would remove it as a bug-fix. Even if there is some Bayesian
> interpretation, it is not what users would expect.)

That would be my preferred fix as well. If there aren't any objections,
I'll open a corresponding issue and PR.

> There was a similar thread in 2013
> https://mail.scipy.org/pipermail/numpy-discussion/2013-February/065664.html

Thanks for the link. I must've somehow missed that earlier discussion.
Would it be appropriate to also add a parameter along the
lines of curve_fit's `absolute_sigma`, with default `False` to keep it
consistent? I already felt that something like this was missing for
cases where proper standard errors are known for the data points, and it
was apparently already discussed in 2013. As far as I can see, the main
reason against that is the fact that `polyfit` accepts `w`
(weights, i.e. `1/sigma`) as a parameter and not `sigma`, which would make
the documentation somewhat less intuitive than in the case of
`curve_fit`.
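
Just to illustrate what I mean (a rough sketch only, with a made-up helper
name, not an existing or proposed API), the relative scaling can already be
undone by hand, assuming the current chisq/(M-N-2) factor in `polyfit`:

    import numpy as np

    def polyfit_cov_absolute(x, y, deg, sigma):
        # Treat w = 1/sigma as absolute uncertainties, analogous to
        # curve_fit(..., absolute_sigma=True).
        w = 1.0 / np.asarray(sigma)
        coef, pcov_rel = np.polyfit(x, y, deg, w=w, cov=True)
        # chisq of the weighted residuals at the optimum
        chisq = np.sum((w * (y - np.polyval(coef, x))) ** 2)
        dof = len(x) - (deg + 1)          # M - N
        # polyfit currently scales the covariance by chisq/(dof - 2);
        # dividing that factor out recovers the unscaled covariance
        return coef, pcov_rel * (dof - 2.0) / chisq

If the -2 is removed as discussed above, the last line would of course use
dof instead of (dof - 2.0).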


Jonathan


Re: Inconsistent results for the covariance matrix between scipy.optimize.curve_fit and numpy.polyfit

josef.pktd


On Tue, May 29, 2018 at 2:21 PM, Jonathan Tammo Siebert <[hidden email]> wrote:
> [earlier messages quoted in full; snipped]
>
> Would it be appropriate to also add a parameter along the
> lines of curve_fit's `absolute_sigma`, with default `False` to keep it
> consistent? I already felt that something like this was missing for
> cases where proper standard errors are known for the data points, and it
> was apparently already discussed in 2013. As far as I can see, the main
> reason against that is the fact that `polyfit` accepts `w`
> (weights, i.e. `1/sigma`) as a parameter and not `sigma`, which would make
> the documentation somewhat less intuitive than in the case of
> `curve_fit`.

It would work with "absolute_weights".

After the long discussions about scaling in curve_fit, I think it's fine to add it.

asides:

I still don't really understand why users would want it for the covariance of the parameter estimates.
However, I also added an option to statsmodels OLS and WLS to keep the scale fixed instead of using the estimated scale.

There is a reason that polyfit might have different, larger standard errors than curve_fit, if we assume that curve_fit has a correctly specified mean function and polyfit is just a low-order approximation to an arbitrary non-linear function. That is, polyfit combines a functional approximation error with the stochastic error from the random observations (which is also ignored if the scale is fixed).
(But I never tried to figure out those non- or semi-parametric approximation details.)

A further aside: in Poisson regression, we go the opposite way and use an estimated residual scale instead of a fixed scale = 1, so we can correct the standard errors for the parameters when there is over-dispersion. (I just ran a Monte Carlo example where a hypothesis test under the Poisson assumption is very wrong because of the over-dispersion. I.e., if the chi2 is far away from 1, then the standard errors for the parameters are "useless" if the scale is assumed to be 1.)
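
(A bare-bones sketch of that correction, with placeholder names rather than
the statsmodels API: the standard errors from the scale-1 Poisson fit get
inflated by the square root of the estimated dispersion.)

    import numpy as np

    def dispersion_corrected_se(se_poisson, y, mu, n_params):
        # Pearson chi2 of a Poisson model: sum((y - mu)^2 / mu)
        pearson_chi2 = np.sum((y - mu) ** 2 / mu)
        # estimated residual scale; approx. 1 if the Poisson assumption holds
        phi = pearson_chi2 / (len(y) - n_params)
        return se_poisson * np.sqrt(phi)  # phi > 1 inflates the standard errors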

Josef


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion