Is it OK to extend the ndarray structure?

Sebastian Berg
Hi all,

just curious, does anyone have reservations about extending the ndarray
struct (and the void scalar one)?

The reason is that I am starting to dislike the way we handle the
buffer interface.
Due to backward-compatibility issues, we cannot use the "right"
way to free the buffer information. Because of that, we solve it
by storing lists of pointers in a dictionary...

To me this seems a bit complicated, and it is annoying since it adds a
dictionary lookup to every single array deletion (and an insertion to
every buffer creation). Also, it looks a bit like a memory leak in
some cases (although that probably only annoys me, and only when
running valgrind).

It seems that it would be much simpler to tag the buffer info onto the
array object itself. That, however, would require extending the array
object by a single pointer [1].
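
A rough sketch of the two approaches contrasted above, for illustration
only. All names here (buffer_info, toy_array, table_store, table_pop)
are hypothetical stand-ins and not NumPy's actual types or functions,
and a toy hash table stands in for the Python dictionary:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the exported buffer description. */
typedef struct buffer_info {
    int ndim;
} buffer_info;

/* Approach 1: a global side table keyed by the object's address. */
#define TABLE_SIZE 64

typedef struct entry {
    const void *key;
    buffer_info *info;
    struct entry *next;
} entry;

static entry *table[TABLE_SIZE];

static size_t slot_for(const void *key)
{
    return (size_t)(((uintptr_t)key >> 4) % TABLE_SIZE);
}

/* One insertion per buffer export. Returns 0 on success, -1 on error. */
int table_store(const void *key, buffer_info *info)
{
    entry *e = malloc(sizeof(*e));
    if (e == NULL) {
        return -1;
    }
    size_t s = slot_for(key);
    e->key = key;
    e->info = info;
    e->next = table[s];
    table[s] = e;
    return 0;
}

/* One lookup per deallocation, even if no buffer was ever exported.
 * Removes and returns the info stored for key, or NULL if none. */
buffer_info *table_pop(const void *key)
{
    entry **link = &table[slot_for(key)];
    while (*link != NULL) {
        entry *e = *link;
        if (e->key == key) {
            buffer_info *info = e->info;
            *link = e->next;
            free(e);
            return info;
        }
        link = &e->next;
    }
    return NULL;
}

/* Approach 2: the object carries the pointer itself. */
typedef struct toy_array {
    void *data;
    buffer_info *info;  /* the single extra pointer */
} toy_array;

/* Deallocation needs no lookup: free(NULL) is a no-op. */
void toy_array_clear_info(toy_array *arr)
{
    free(arr->info);
    arr->info = NULL;
}
```

With the second layout the cleanup cost is a single pointer read, at the
price of one extra pointer in every object.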

Extending is in theory an ABI break if anyone subclasses ndarray from C
(extending the struct) and does not very carefully anticipate the
possibility.  I am not even sure we support that, but it's hard to be
sure...
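
To make the concern concrete, here is a minimal sketch with stand-in
structs. base_v1, base_v2 and sub_v1 are hypothetical and do not
reflect the real PyArrayObject layout. It shows why a subclass compiled
against the smaller base collides with a base that has since grown by
one pointer:

```c
#include <stddef.h>

/* Stand-in base layouts; NOT NumPy's real fields. */
typedef struct base_v1 {
    void *data;
    int nd;
} base_v1;

typedef struct base_v2 {
    void *data;
    int nd;
    void *buffer_info;  /* the newly added pointer */
} base_v2;

/* A C subclass compiled against the old (v1) header. */
typedef struct sub_v1 {
    base_v1 super;
    void *extra;        /* the extension's own data */
} sub_v1;

/* Offset at which the old extension reads and writes its own field. */
size_t old_extension_offset(void)
{
    return offsetof(sub_v1, extra);
}

/* Number of bytes the new base considers its own. */
size_t new_base_size(void)
{
    return sizeof(base_v2);
}

/* Returns 1 if the old subclass's field lies inside the new base. */
int layouts_collide(void)
{
    return old_extension_offset() < new_base_size();
}
```

Here layouts_collide() returns 1: the old binary's extra field sits in
memory that the grown base also uses, so both sides would overwrite each
other's data.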

Cheers,

Sebastian


[1] The size difference should not matter IMO, and with Cython's
memoryviews, buffers are not an uncommon feature in any case. For the
void scalar the increase is a bit bigger, but those are also very rare.
(I thought of using weak references, but the CPython API seems not very
fleshed out, or at least not documented, so I am not sure about that.)

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion
Re: Is it OK to extend the ndarray structure?

ralfgommers


On Fri, May 22, 2020 at 10:14 PM Sebastian Berg <[hidden email]> wrote:
<snip>

> Extending is in theory an ABI break if anyone subclasses ndarray from
> C (extending the struct) and does not very carefully anticipate the
> possibility.  I am not even sure we support that, but it's hard to be
> sure...

I had no idea if we support that, so I crowdsourced some input.

Feedback from Travis: "I would be quite sure there are extensions out there that do this.  Please just break the ABI and change the version number to do that."

Feedback from Pearu: "ndarray itself (PyArrayObject) is a kind-of subclass of PyObject. See https://www.python.org/dev/peps/pep-0253.  Something like the following might work:

typedef struct {
  PyArrayObject super;
  /* insert extensions here */
} MyPyArrayObject;

"

Cheers,
Ralf




Re: Is it OK to extend the ndarray structure?

Sebastian Berg
On Wed, 2020-05-27 at 18:36 +0200, Ralf Gommers wrote:
> On Fri, May 22, 2020 at 10:14 PM Sebastian Berg <[hidden email]> wrote:
<snip>

>
> I had no idea if we support that, so I crowdsourced some inputs.
>
> Feedback from Travis: "I would be quite sure there are extensions out
> there that do this.  Please just break the ABI and change the version
> number to do that."
>
> Feedback from Pearu: "ndarray itself (PyArrayObject) is a kind-of
> subclass of PyObject. See https://www.python.org/dev/peps/pep-0253.
> Something like the following might work:
>
> typedef struct {
>   PyArrayObject super;
>   /* insert extensions here */
> } MyPyArrayObject;
>
> "

Yes, it is a break if someone subclasses from C (or probably Cython)
without being very careful (and we do not help with it well right now).

But, the ABI break is very mild in the sense that it is very easy to
recompile such a library to be compatible with *both* old and new
versions [1]. And I still think that it will be super rare (which I
would love to check [2]).

In either case, though, I have been pretty convinced for a long time now
that a major version is becoming more and more something we should
simply do.
There are many good reasons to make 1.20 a 2.0 release aside from
such an ABI break (if only because we are expecting a lot of code
churn due to both SIMD and changes in the core).

To be clear, I personally do *not* want to aim for a serious ABI break.
The vast majority of libraries should not require recompilation, and
IMO it must be easy to create a single binary compatible with both old
and new versions.
If someone wants to aim for a real ABI break, I would be interested to
see the thoughts on feasibility, but to me that simply feels like
aiming high. And I am not sure there is much gain?
But a small wave of C-API deprecations and small, technically
incompatible changes that most users will never notice does seem
plausible to me.

Cheers,

Sebastian


[1]  You simply have to manually include the larger struct (or we
update our headers). The only annoyance is that the crashes/errors that
happen if you run a non-recompiled/old version against a new NumPy
version may be pretty random.
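
As an illustration of that recompilation, the sketch below builds the
subclass against the larger base, so its own field lies beyond the base
under either version. The structs are hypothetical stand-ins rather
than NumPy's real layout, and the load-time check is an added
assumption, not something NumPy provides:

```c
#include <stddef.h>

/* Stand-in base layouts; NOT NumPy's real fields. */
typedef struct base_old {
    void *data;
    int nd;
} base_old;

typedef struct base_new {
    void *data;
    int nd;
    void *buffer_info;  /* the newly added pointer */
} base_new;

/* Subclass compiled against the LARGER base struct. */
typedef struct sub_compat {
    base_new super;
    void *extra;        /* the extension's own data */
} sub_compat;

/* Returns 1 if the subclass's own field starts at or beyond the end of
 * a base object of the given runtime size, 0 if they would overlap. */
int extension_is_clear_of_base(size_t runtime_base_size)
{
    return offsetof(sub_compat, extra) >= runtime_base_size;
}
```

With the old base the subclass merely wastes one pointer of space; with
the new base the layouts line up. A real extension could presumably run
such a check at import time against the base type's tp_basicsize and
raise an error instead of crashing randomly.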

[2] I would also like to do an Anaconda or pip search to sift through
actual code and confirm that, while it may technically be an ABI break,
it will affect practically no largish libraries (i.e. ones large enough
to land in Anaconda).
If anyone knows how best to do that, I would be interested.

