Proposal: split numpy.distutils into its own package

Proposal: split numpy.distutils into its own package

Dustin Spicuzza
Hey all,

[copied from https://github.com/numpy/numpy/issues/17620, as requested
by the feature request guidelines]

Cross-compiling scipy and other projects that depend on numpy's
distutils is a huge pain right now, because to do it [in addition to
lots of other details that you have to get right] you have to have both
a native and cross-compiled version of numpy installed. It seems pretty
unreasonable that I need a native version of numpy installed to compile
scipy. One might ask, why is this needed?

Well, scipy's setup.py uses numpy's distutils fork for the Fortran
support (I think). Because numpy.distutils is a subpackage of numpy, if
you're working with a cross-compiled version of numpy, importing it
eventually tries to load some .so that wasn't compiled for the machine
doing the build, and everything dies -- thus you have to have a native
numpy installed to allow numpy.distutils to import correctly.
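
To illustrate the failure mode described above: Python always imports a parent package before any of its subpackages, so a pure-Python subpackage is unreachable if the parent's `__init__` fails. The sketch below is only a simulation under stated assumptions. The package name `crosspkg` is hypothetical, and a raised ImportError stands in for a compiled extension built for another platform; it does not touch numpy itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_tree(root: Path) -> None:
    """Create a hypothetical package whose top level fails to import."""
    pkg = root / "crosspkg"
    (pkg / "distutils").mkdir(parents=True)
    # Stand-in for a top-level __init__ that loads a compiled extension
    # built for a different architecture.
    (pkg / "__init__.py").write_text(
        "raise ImportError('extension module built for another platform')\n"
    )
    # The subpackage is pure Python and would import fine on its own.
    (pkg / "distutils" / "__init__.py").write_text("PURE_PYTHON = True\n")


def try_import(name: str) -> str:
    """Return 'ok', or the ImportError message raised while importing name."""
    try:
        importlib.import_module(name)
    except ImportError as exc:
        return str(exc)
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    build_demo_tree(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # The parent package is executed first, so its failure is what we see.
        result = try_import("crosspkg.distutils")
    finally:
        sys.path.remove(tmp)

print(result)
```

The subpackage never gets a chance to run, which is why moving it to its own top-level name sidesteps the problem.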

As far as I can tell, numpy's distutils fork is pure Python and doesn't
actually use anything else in numpy itself, and so is completely
separable from numpy. If it were its own top-level package that scipy et
al. could use, then cross-compiling would be significantly less tricky.

The mechanics of this would work like so:

* the contents of numpy.distutils would be moved to a 'numpy_distutils'
  package
* importing numpy.distutils would raise a deprecation warning and
  redirect to numpy_distutils, probably using import hook magic
* scipy and other packages would then use numpy_distutils instead of
  numpy.distutils at build time
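
One possible shape for the warn-and-redirect step in the list above is sketched here. It is not taken from any PR. The names `fakenp` and `fakenp_distutils` are hypothetical stand-ins for the old and new locations, and the stub uses plain `sys.modules` aliasing rather than a full import hook.

```python
import importlib
import sys
import tempfile
import warnings
from pathlib import Path

# Hypothetical stub that would live at the old location (fakenp/distutils).
STUB = '''\
import sys
import warnings

import fakenp_distutils

warnings.warn(
    "fakenp.distutils is deprecated; import fakenp_distutils instead",
    DeprecationWarning,
    stacklevel=2,
)
# Alias the old dotted name to the new module so both names share one object.
sys.modules[__name__] = fakenp_distutils
'''


def build_demo_tree(root: Path) -> None:
    """Lay out the new top-level package and the old-location stub."""
    (root / "fakenp_distutils").mkdir()
    (root / "fakenp_distutils" / "__init__.py").write_text("MOVED = True\n")
    (root / "fakenp" / "distutils").mkdir(parents=True)
    (root / "fakenp" / "__init__.py").write_text("")
    (root / "fakenp" / "distutils" / "__init__.py").write_text(STUB)


with tempfile.TemporaryDirectory() as tmp:
    build_demo_tree(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            old = importlib.import_module("fakenp.distutils")
            new = importlib.import_module("fakenp_distutils")
    finally:
        sys.path.remove(tmp)

same_module = old is new
warned = any(issubclass(w.category, DeprecationWarning) for w in caught)
print(same_module, warned)
```

A caveat with this simple aliasing: a submodule imported through the old dotted name would be loaded as a second copy under that name unless it is aliased as well, which is presumably where the import hook would come in.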

Initially, I considered proposing a separate PyPI package for
numpy_distutils, but I expect that would be more trouble than it's
worth. One could also propose creating a new PEP 517 build backend,
moving to CMake, or other huge changes, but those also seem more
trouble than they're worth.

Thanks for your consideration.

Dustin

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Proposal: split numpy.distutils into its own package

bashtage
Separating it is definitely a good idea. The only thing that I think would be better is if the key features that are not in setuptools could be added there, so that NumPy distutils could eventually be retired.

Kevin


Re: Proposal: split numpy.distutils into its own package

Pauli Virtanen-3
In reply to this post by Dustin Spicuzza
On Sat, 2020-10-24 at 03:11 -0400, Dustin Spicuzza wrote:
> Cross-compiling scipy and other projects that depend on numpy's
> distutils is a huge pain right now, because to do it [in addition to
> lots of other details that you have to get right] you have to have
> both a native and cross-compiled version of numpy installed. It seems
> pretty unreasonable that I need a native version of numpy installed
> to compile scipy. One might ask, why is this needed?

Factoring out numpy.distutils from numpy alone will not enable
compiling scipy without numpy being installed. It probably can help,
though, and might make sense also in view of the incoming deprecation
of Python distutils (https://www.python.org/dev/peps/pep-0632/).

Extension modules, including f2py, need numpy headers and probably also
their platform-specific configuration. There are also some assumptions
about data type sizes and NumPy versions at build time being compatible
with the ones at runtime, which factoring out distutils won't address.
IIUC, cross-compilation is not actually supported, so it is surprising
that it can be made to work.
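
As a rough illustration of the header dependency mentioned above: a build script typically asks the numpy it can import for its header directory via `numpy.get_include()`, so the headers come from whichever numpy the build-time interpreter sees. The helper name `numpy_include_dir` below is hypothetical, and the sketch returns None when no numpy is importable.

```python
import importlib.util
import sysconfig
from typing import Optional


def numpy_include_dir() -> Optional[str]:
    """Return the header directory of the importable numpy, or None."""
    if importlib.util.find_spec("numpy") is None:
        return None
    import numpy

    # Directory containing numpy's C headers for this particular install.
    return numpy.get_include()


# Platform of the interpreter running the build, not of any cross target.
build_platform = sysconfig.get_platform()
include_dir = numpy_include_dir()

print("build interpreter platform:", build_platform)
print("numpy headers:", include_dir)
```

In a cross build, both values describe the build machine's interpreter unless the tooling redirects them, which is one reason splitting out distutils alone does not remove the need for a numpy that matches the target.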

        Pauli



Re: Proposal: split numpy.distutils into its own package

Dustin Spicuzza

On 10/24/2020 8:31 AM Pauli Virtanen <[hidden email]> wrote:
> On Sat, 2020-10-24 at 03:11 -0400, Dustin Spicuzza wrote:
>> Cross-compiling scipy and other projects that depend on numpy's
>> distutils is a huge pain right now, because to do it [in addition to
>> lots of other details that you have to get right] you have to have
>> both a native and cross-compiled version of numpy installed. It seems
>> pretty unreasonable that I need a native version of numpy installed
>> to compile scipy. One might ask, why is this needed?
>
> Factoring out numpy.distutils from numpy alone will not enable
> compiling scipy without numpy being installed. It probably can help,
> though, and might make sense also in view of the incoming deprecation
> of Python distutils (https://www.python.org/dev/peps/pep-0632/).
>
> Extension modules, including f2py, need numpy headers and probably also
> their platform-specific configuration. There are also some assumptions
> about data type sizes and Numpy versions at build-time being compatible
> with the ones at runtime, which factoring out distutils won't address.
> IIUC, cross-compilation is not actually supported, so that it can be
> made to work is surprising.

Yes, as I said, there are a lot of little details that have to be correct for cross-compiling to work, but making numpy.distutils a separate top-level package will simplify other aspects of the process.

For those interested, the crossenv project (https://github.com/benfogle/crossenv) takes care of a lot of those other details pretty well. I posted my steps for cross-compiling numpy/scipy utilizing crossenv at https://github.com/scipy/scipy/issues/8571#issuecomment-715877299

Since this initial feedback seems mostly positive, I'll go ahead and take a stab at refactoring it and make a PR potentially this weekend. It should be fairly straightforward.

Dustin


Re: Proposal: split numpy.distutils into its own package

Dustin Spicuzza

On 10/24/20 2:59 PM, Dustin Spicuzza wrote:
> Since this initial feedback seems mostly positive, I'll go ahead and
> take a stab at refactoring it and make a PR potentially this weekend.
> It should be fairly straightforward.
>
> Dustin


I took a first stab at it, and... surprise, surprise, there were a few
more warts than I had expected from my initial survey. The biggest
unexpected result is that numpy.f2py would also need to be a top-level
package. I did get the refactor cross-compiled and started on scipy, but
there are a few more issues that will have to be resolved on the scipy
side.

I posted a detailed set of notes on the issue (#17620) and made a draft
PR with my initial results (#17632) if you want to get a sense for how
invasive this is (or isn't, depending on your point of view).

Dustin



Re: Proposal: split numpy.distutils into its own package

mattip
Administrator

On 10/25/20 10:46 AM, Dustin Spicuzza wrote:

> I took a first stab at it, and... surprise, surprise, there were a few
> more warts than I had originally expected in my initial survey. The
> biggest unexpected result is that numpy.f2py would need to also be a
> toplevel package. I did get the refactor cross-compiled and started on
> scipy, but there's a few more issues that will have to be resolved on
> the scipy side.
>
> I posted a detailed set of notes on the issue (#17620) and made a draft
> PR with my initial results (#17632) if you want to get a sense for how
> invasive this is (or isn't depending on your point of view).
>
> Dustin
>
Is there a way to do this without modifying SciPy? That would reassure
me that this change will not break other people's workflows. It is hard
to believe that only SciPy uses numpy.distutils. If the changes break
backward compatibility, they need to be done like any other deprecation:
warn for 4 releases (two years) before actually breaking workflows.


Matti


Re: Proposal: split numpy.distutils into its own package

bashtage
NumPy could take an explicit runtime dependency on numpy-distutils so that the code would technically be in a different repo but would always be available through NumPy. Eventually this could be removed as a runtime dependency.

Kevin



Re: Proposal: split numpy.distutils into its own package

Dustin Spicuzza
In reply to this post by mattip
On 10/25/20 5:23 AM, Matti Picus wrote:

>
> On 10/25/20 10:46 AM, Dustin Spicuzza wrote:
>> I took a first stab at it, and... surprise, surprise, there were a few
>> more warts than I had originally expected in my initial survey. The
>> biggest unexpected result is that numpy.f2py would need to also be a
>> toplevel package. I did get the refactor cross-compiled and started on
>> scipy, but there's a few more issues that will have to be resolved on
>> the scipy side.
>>
>> I posted a detailed set of notes on the issue (#17620) and made a draft
>> PR with my initial results (#17632) if you want to get a sense for how
>> invasive this is (or isn't depending on your point of view).
>>
>> Dustin
>>
> Is there a way to do this without modifying SciPy? That would reassure
> me that this change will not break other peoples' workflow. It is hard
> to believe that only SciPy uses numpy.distutils. If the changes break
> backward compatibility, they need to be done like any other
> deprecation: warn for 4 releases (two years) before actually breaking
> workflows.
>
>
> Matti


Sorry for not being clear: when I was discussing modifications to scipy,
I was referring to the specific use case of cross-compilation. The goal
is that existing native builds keep working without any break in
backwards compatibility. To that end, there's a package redirection stub
in my PR for both numpy.distutils and numpy.f2py.

I just tried a native build using my current PR branch, and at the moment
scipy doesn't build. However, the failure is a size mismatch during
compilation as opposed to an ImportError, so I probably just missed a
subtlety when I moved things. I would definitely expect that the
finalized version of this set of changes will not break existing users.

Dustin



Re: Proposal: split numpy.distutils into its own package

Dustin Spicuzza
On 10/25/20 11:39 AM, Dustin Spicuzza wrote:

> On 10/25/20 5:23 AM, Matti Picus wrote:
>> On 10/25/20 10:46 AM, Dustin Spicuzza wrote:
>>> I took a first stab at it, and... surprise, surprise, there were a few
>>> more warts than I had originally expected in my initial survey. The
>>> biggest unexpected result is that numpy.f2py would need to also be a
>>> toplevel package. I did get the refactor cross-compiled and started on
>>> scipy, but there's a few more issues that will have to be resolved on
>>> the scipy side.
>>>
>>> I posted a detailed set of notes on the issue (#17620) and made a draft
>>> PR with my initial results (#17632) if you want to get a sense for how
>>> invasive this is (or isn't depending on your point of view).
>>>
>>> Dustin
>>>
>> Is there a way to do this without modifying SciPy? That would reassure
>> me that this change will not break other peoples' workflow. It is hard
>> to believe that only SciPy uses numpy.distutils. If the changes break
>> backward compatibility, they need to be done like any other
>> deprecation: warn for 4 releases (two years) before actually breaking
>> workflows.
>>
>>
>> Matti
>
> Sorry for not being clear, when I was discussing modifications to scipy
> I was referring to the specific use case of cross-compilation. The goal
> is that existing native builds would not break backwards compatibility.
> To that end, there's a package redirection stub in my PR for both
> numpy.distutils and numpy.f2py.
>
> Just tried a native build using my current PR branch and at the moment
> scipy doesn't work. However, it's a size mismatch during compilation as
> opposed to an ImportError, so I probably just missed a subtlety when I
> moved things. But I would definitely expect the finalized version of
> this set of changes should not break existing users.
>
> Dustin

PR now builds unmodified scipy natively.

Dustin



Re: Proposal: split numpy.distutils into its own package

mattip
Administrator
In reply to this post by Dustin Spicuzza

On 10/25/20 5:39 PM, Dustin Spicuzza wrote:
> Sorry for not being clear, when I was discussing modifications to scipy
> I was referring to the specific use case of cross-compilation. The goal
> is that existing native builds would not break backwards compatibility.
> To that end, there's a package redirection stub in my PR for both
> numpy.distutils and numpy.f2py.
>
> Dustin


Thanks for the clarification.

Matti


Re: Proposal: split numpy.distutils into its own package

ralfgommers
In reply to this post by bashtage


On Sun, Oct 25, 2020 at 1:20 PM Kevin Sheppard <[hidden email]> wrote:
> NumPy could take an explicit runtime dependency on numpy-distutils so that the code would technically be in a different repo but would always be available through NumPy. Eventually this could be removed as a runtime dependency.

I put some more thoughts in https://github.com/numpy/numpy/issues/17620. We cannot remove numpy.distutils, so that separate package may be needed for cross-compilation, but we don't need to use it in NumPy itself.

From a high level: being able to cross-compile would be great. However, long-term we can hopefully put everything in setuptools, so I'd like the changes now to be as non-invasive as possible.

> On Sun, Oct 25, 2020, 09:23 Matti Picus <[hidden email]> wrote:
>> On 10/25/20 10:46 AM, Dustin Spicuzza wrote:
>>> I took a first stab at it, and... surprise, surprise, there were a few
>>> more warts than I had originally expected in my initial survey. The
>>> biggest unexpected result is that numpy.f2py would need to also be a
>>> toplevel package. I did get the refactor cross-compiled and started on
>>> scipy, but there's a few more issues that will have to be resolved on
>>> the scipy side.
>>>
>>> I posted a detailed set of notes on the issue (#17620) and made a draft
>>> PR with my initial results (#17632) if you want to get a sense for how
>>> invasive this is (or isn't depending on your point of view).
>>>
>>> Dustin
>>
>> Is there a way to do this without modifying SciPy?

The goal is to modify SciPy here, so it can be cross-compiled.

>> That would reassure me that this change will not break other people's
>> workflows. It is hard to believe that only SciPy uses numpy.distutils.

That is indeed not the case; numpy.distutils is widely used. The `numpy.distutils` namespace must remain accessible imho.

Cheers,
Ralf

