A little about XND


teoliphant
Hi everyone, 

I'm glad I'm able to contribute back to this discussion thread.  I wanted to post a quick message to this group to make sure there is no misinformation about XND, which has finally reached the point where it can be experimented with (http://xnd.io) and commented on.

XND came out of thoughts and conversations we had at Continuum (now Anaconda) when thinking about cross-language array computing and how to enable improved features for high-level users in many languages (including Python, R, Ruby, Node, Scala, Rust, Go, etc.).  Technically, three projects make up XND (hence the name Plures for the GitHub organization). Each of these projects has a C library and a high-level interface (right now we only have the resources to develop the Python interface, but we would love to see support for other languages).  xnd (libxnd) is the typed container.  ndtypes (libndtypes) is the (datashape-like) type system with a grammar, parser, and type matcher.  gumath (libgumath) provides the generalized ufuncs, which make up the entire function system on xnd.

We will be talking more about XND in the coming months and years, but for the purposes of this list, I wanted to make it clear that 

1) XND is not trying to replace NumPy. XND is a low-level library and intended to be such. It would be most welcome if someday NumPy uses XND.  We understand this may be a while and certainly not before NumPy 2.0 or 3.0.  

2) Our initial target users are Numba, pandas, Dask, xarray, and other higher-level objects at the moment. We are eagerly searching for integration opportunities to connect more developers (or advanced users) to xnd before making more progress.  

3) We do discuss array-like things in the public channels. NumPy users and developers are welcome in those channels.  Everything is done in public including the weekly meeting which anyone can attend: 


Live discussions: https://gitter.im/Plures/xnd for the libraries themselves
                  https://gitter.im/Plures/xnd-ml for integrations

Issues and PRs:   https://github.com/plures (under the various projects)

4) We are thinking about adding a custom-dtype to NumPy that uses xnd and would be happy for anyone's help on that project.  

5) We are in the early stages of exploring a high-level array interface (using the ideas of MoA and the Psi Calculus with Lenore Mullin, who worked on APL years ago).  The first place this is likely to see some initial progress is in an ND sparse array that uses XND.

We welcome participation and input from all.  Stefan Krah has written the majority of the code and so we tend to respect his point of view.  Pearu Peterson (of f2py and SciPy fame) has made some useful contributions recently.  

Stefan and I have been talking roughly weekly for a couple of years, so I am certainly responsible for some of the problems currently there.

Two of our immediate goals are to work with the Numba team to get support for ndtypes in Numba and to allow Numba to use libgumath in nopython mode.

I look forward to continuing the conversation with any of you who want to participate.  Perhaps some of us can meet up during NumPy sprints to discuss more.

XND is also currently looking for funding and time from interested parties to continue its development.

-Travis

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: A little about XND

Marten van Kerkwijk
Hi Travis,

More of a detailed question, but as we are currently thinking about extending the signature of gufuncs (i.e., things like `(m,n),(n,p)->(m,p)` for matrix multiplication), and as you must have thought about this for libgumath, could you point me to how one would document the signature in your new system? (I briefly tried but there's no docs yet and I couldn't immediately find it in the code). If it is at all similar to numpy's and you have extended it, we should at least check whether we can do the same thing.

Thanks, all best wishes,

Marten


Re: A little about XND

teoliphant


On Sun, Jun 17, 2018, 7:48 PM Marten van Kerkwijk <[hidden email]> wrote:
> Hi Travis,
>
> More of a detailed question, but as we are currently thinking about extending the signature of gufuncs (i.e., things like `(m,n),(n,p)->(m,p)` for matrix multiplication), and as you must have thought about this for libgumath, could you point me to how one would document the signature in your new system? (I briefly tried but there's no docs yet and I couldn't immediately find it in the code). If it is at all similar to numpy's and you have extended it, we should at least check whether we can do the same thing.

I have been reading these gufunc proposals with interest and have pointed them out to the gumath devs.  Right now, gumath doesn't go much beyond NumPy's syntax except for its use of a more extensible type system.  It uses the same notion of a dimension signature, though with a syntax derived from datashape, which you can read more about here: http://datashape.readthedocs.io/en/latest/

Stefan Krah, Pearu, or Saul may have more comments.

Thanks,

-Travis


> Thanks, all best wishes,
>
> Marten

Re: A little about XND

Marten van Kerkwijk
Interesting. If nothing else, it would be a nice way to mark our internal functions, including the loops. It also should not be difficult to have (g)ufunc signatures exported in that way, combining `signature` and `types`.

In more detail, I see the grammar clearly allows fixed dimensions in a way that easily translates, but it isn't immediately obvious to me how one would express broadcasting or possibly missing ones, so perhaps there is room for sharing how to indicate that (although it is at a higher level; the function signature is fine).

-- Marten

For others, direct link to datashape grammar: http://datashape.readthedocs.io/en/latest/grammar.html
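
As a rough illustration of exporting a NumPy-style gufunc signature in datashape-like notation, here is a minimal sketch in plain Python. The helper name `gufunc_to_datashape` is hypothetical and is not part of NumPy, ndtypes, or gumath. It assumes that every argument gets a leading ellipsis and that dimension names are uppercased, on the assumption that datashape expects symbolic dimensions to start with an uppercase letter.

```python
import re


def gufunc_to_datashape(signature, dtypes):
    """Render a NumPy-style gufunc signature in a datashape-like notation.

    `dtypes` lists one dtype name per argument, inputs first, then outputs.
    This is an illustrative helper, not an existing API.
    """
    inputs, outputs = signature.replace(" ", "").split("->")
    in_cores = re.findall(r"\(([^()]*)\)", inputs)
    out_cores = re.findall(r"\(([^()]*)\)", outputs)
    if len(in_cores) + len(out_cores) != len(dtypes):
        raise ValueError("expected one dtype per argument")

    def render(core, dtype):
        # "(m,n)" -> "... * M * N * dtype"; "()" -> "... * dtype"
        dims = [name.upper() for name in core.split(",") if name]
        return " * ".join(["..."] + dims + [dtype])

    rendered = [render(c, d) for c, d in zip(in_cores + out_cores, dtypes)]
    n_in = len(in_cores)
    return ", ".join(rendered[:n_in]) + " -> " + ", ".join(rendered[n_in:])


print(gufunc_to_datashape("(m,n),(n,p)->(m,p)", ["float64"] * 3))
```

The sketch only covers fixed core dimensions; broadcastable or possibly missing dimensions are not handled.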



Re: A little about XND

Stefan Krah
In reply to this post by Marten van Kerkwijk
On Sun, Jun 17, 2018 at 08:47:02PM -0400, Marten van Kerkwijk wrote:
> More of a detailed question, but as we are currently thinking about
> extending the signature of gufuncs (i.e., things like `(m,n),(n,p)->(m,p)`
> for matrix multiplication), and as you must have thought about this for
> libgumath, could you point me to how one would document the signature in
> your new system? (I briefly tried but there's no docs yet and I couldn't
> immediately find it in the code).

The docs are a bit scattered across the three libraries; here is something
about types and pattern matching:

   http://ndtypes.readthedocs.io/en/latest/ndtypes/types.html
   http://ndtypes.readthedocs.io/en/latest/ndtypes/pattern-matching.html

A couple of example signatures:

   https://github.com/plures/gumath/blob/5f1f6de3d2c9a003b9dfb224fe09c63ae81bf18b/libgumath/extending/quaternion.c#L121
   https://github.com/plures/gumath/blob/5f1f6de3d2c9a003b9dfb224fe09c63ae81bf18b/libgumath/extending/pdist.c#L115



The function signature for float64-specialized matrix multiplication is:

  "... * N * M * float64, ... * M * P * float64 -> ... * N * P * float64"


The function signature for generic matrix multiplication is:

  "... * N * M * T, ... * M * P * T -> ... * N * P * T"


A function that only accepts scalars:

  "... * N * M * Scalar, ... * M * P * Scalar -> ... * N * P * Scalar"


A couple of observations:  Functions are multimethods, so function dispatch
on concrete arguments works by trying to locate a matching kernel.

For example, if only the above "float64" kernel is present, all other
dtypes will fail.
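
To illustrate the idea of dispatch by matching, here is a toy matcher in plain Python. It is not the libndtypes matcher: all names (`split_type`, `bind`, `match_arg`, `find_kernel`) are hypothetical, only the input side of a signature is checked, kinds such as Scalar and named ellipses are not supported, and the outer dimensions absorbed by "..." are not checked for broadcast compatibility.

```python
def split_type(text):
    """Split "2 * 3 * float64" into (["2", "3"], "float64")."""
    *dims, dtype = [part.strip() for part in text.split("*")]
    return dims, dtype


def bind(bindings, name, value):
    """Bind a variable, or check that an earlier binding agrees."""
    return bindings.setdefault(name, value) == value


def match_arg(pattern, concrete, bindings):
    """Match one argument pattern against one concrete type string."""
    pat_dims, pat_dtype = split_type(pattern)
    con_dims, con_dtype = split_type(concrete)
    if pat_dims and pat_dims[0] == "...":
        inner = pat_dims[1:]
        if len(con_dims) < len(inner):
            return False
        # The ellipsis absorbs the outer dimensions; keep only the inner ones.
        con_dims = con_dims[len(con_dims) - len(inner):]
    else:
        inner = pat_dims
        if len(con_dims) != len(inner):
            return False
    for pat, con in zip(inner, con_dims):
        if pat.isdigit():
            if pat != con:  # fixed dimension must match exactly
                return False
        elif not bind(bindings, pat, con):  # symbolic dimension
            return False
    if pat_dtype[0].isupper():  # type variable such as T
        return bind(bindings, pat_dtype, con_dtype)
    return pat_dtype == con_dtype


def find_kernel(signatures, args):
    """Return the first signature whose inputs match all concrete args."""
    for sig in signatures:
        inputs = sig.split("->")[0].split(",")
        bindings = {}
        if len(inputs) == len(args) and all(
            match_arg(p, a, bindings) for p, a in zip(inputs, args)
        ):
            return sig
    return None
```

With only the float64 signature registered, `find_kernel` returns None for int64 arguments; adding the generic `T` signature to the list makes the same call succeed.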


Casting
-------

It is still under debate how we handle casting.  The current examples in
libgumath/kernels simply generate *all* signatures that allow exact
casting of the input for a specific function.


This is feasible for unary and binary kernels, but could lead to case
explosion for functions with many arguments.


The kernel writer however is always free to use the above type variable
or Scalar signatures and handle casting inside the kernel.
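
A small Python sketch of why generating all exact-cast signatures can blow up with the number of arguments. The dtype list and the `can_cast_exactly` rule are simplified assumptions made for this example (six dtypes, integer-to-float exactness judged by significand width) and are not taken from libgumath.

```python
from itertools import product

INT_BITS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}
FLOAT_MANTISSA = {"float32": 24, "float64": 53}  # significand bits
DTYPES = list(INT_BITS) + list(FLOAT_MANTISSA)   # search order for results


def can_cast_exactly(src, dst):
    """True if every value of `src` is representable in `dst`."""
    if src in INT_BITS and dst in INT_BITS:
        return INT_BITS[src] <= INT_BITS[dst]
    if src in INT_BITS and dst in FLOAT_MANTISSA:
        # Magnitude bits (without the sign bit) must fit in the significand.
        return INT_BITS[src] - 1 <= FLOAT_MANTISSA[dst]
    if src in FLOAT_MANTISSA and dst in FLOAT_MANTISSA:
        return FLOAT_MANTISSA[src] <= FLOAT_MANTISSA[dst]
    return False  # float -> int is not exact in general


def exact_cast_signatures(nargs):
    """All input combinations that share an exactly representable result."""
    sigs = []
    for combo in product(DTYPES, repeat=nargs):
        out = next(
            (d for d in DTYPES if all(can_cast_exactly(s, d) for s in combo)),
            None,
        )
        if out is not None:
            ins = ", ".join("... * " + s for s in combo)
            sigs.append(ins + " -> ... * " + out)
    return sigs


# Number of generated signatures for unary, binary and ternary functions.
for nargs in (1, 2, 3):
    print(nargs, len(exact_cast_signatures(nargs)))
```

Even with this tiny dtype list, the count grows close to geometrically with the number of arguments, which is the case explosion mentioned above.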


Explicit gufuncs
----------------

Gufuncs are explicit and require leading ellipses.  A signature of
"N * M * float64" is not a gufunc and does not allow outer dimensions.


Disable broadcasting
--------------------

  "D... * N * M * float64, D... * M * P * float64 -> D... * N * P * float64"

Dimension variables match a sequence of dimensions, so in the above example
all outer dimensions must be exactly the same.
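
The difference between the two kinds of ellipsis can be sketched on shape tuples in plain Python. Both function names are hypothetical, and the first one simply reimplements NumPy-style broadcasting of the outer dimensions; neither is the libgumath implementation.

```python
def broadcast_outer(*shapes):
    """Plain "...": outer dimensions are broadcast NumPy-style."""
    ndim = max(len(s) for s in shapes)
    # Left-pad shorter shapes with 1s so all shapes have the same length.
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    result = []
    for dims in zip(*padded):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError("cannot broadcast outer dimensions %r" % (dims,))
        result.append(sizes.pop() if sizes else 1)
    return tuple(result)


def identical_outer(*shapes):
    """Named "D...": every argument must have exactly the same outer dims."""
    first = tuple(shapes[0])
    if any(tuple(s) != first for s in shapes[1:]):
        raise ValueError("outer dimensions differ: %r" % (shapes,))
    return first


print(broadcast_outer((5, 1), (3,)))
print(identical_outer((5, 3), (5, 3)))
```

Outer shapes such as `(5, 1)` and `(3,)` are accepted by the first function but rejected by the second.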


Non-symbolic matches
--------------------

"... * 2 * 3 * int8" only accepts "2 * 3 * int8" as the inner dimensions.


Sorry for the long mail, I hope this clears up a bit what function signatures
generally look like.



Stefan Krah




Re: A little about XND

Marten van Kerkwijk
Hi Stefan,

That looks quite nice and expressive. In the context of a discussion we have been having about describing `matmul/@` and possibly broadcastable dimensions, I think from your description it sounds like one would describe `@` with multiple functions (the multiple dispatch we have been (are?) considering as well):


"... * N * M * T, ... * M * P * T -> ... * N * P * T"
"M * T, ... * M * P * T -> ... * P * T"
"... * N * M * T, M * T -> ... * N * T"
"M * T, M * T -> T"

Is there a way to describe broadcasting?  The sample case we've come up with is a function that calculates a weighted mean. This might take (values, sigmas) and return (mean, sigma_mean), which would imply a signature like:

"... * N * T, ... * N * T -> ... * T, ... * T"

But would your signature allow indicating that one could pass in a single sigma? I.e., broadcast the second 1 to N if needed?

I realize that this is no longer about describing precisely what the function doing the calculation expects, but rather what an upper level is allowed to do before calling the function (i.e., take a dimension of 1 and broadcast it).

All the best,

Marten


Re: A little about XND

Stefan Krah

Hi Marten,

On Mon, Jun 18, 2018 at 12:34:03PM -0400, Marten van Kerkwijk wrote:

> That looks quite nice and expressive. In the context of a discussion we
> have been having about describing `matmul/@` and possibly broadcastable
> dimensions, I think from your description it sounds like one would describe
> `@` with multiple functions (the multiple dispatch we have been (are?)
> considering as well):
>
>
> "... * N * M * T, ... * M * P * T -> ... * N * P * T"
> "M * T, ... * M * P * T -> ... * P * T"
> "... * N * M * T, M * T -> ... * N * T"
> "M * T, M * T -> T"

Yes, that's the way, and the outer dimensions (the part matched by the
ellipsis) are always broadcast like in NumPy.
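
To make the four-way split concrete, here is a small Python sketch that picks one of the `@` signatures from the number of dimensions of each operand. The table and the name `select_matmul_kernel` are hypothetical; looking only at ndim is a stand-in for the type-based pattern matching discussed in this thread.

```python
# Keys: (left operand is a matrix or stack, right operand is a matrix or stack)
MATMUL_KERNELS = {
    (True, True): "... * N * M * T, ... * M * P * T -> ... * N * P * T",
    (False, True): "M * T, ... * M * P * T -> ... * P * T",
    (True, False): "... * N * M * T, M * T -> ... * N * T",
    (False, False): "M * T, M * T -> T",
}


def select_matmul_kernel(ndim_a, ndim_b):
    """Choose a signature for `a @ b` from the operands' ndim."""
    if ndim_a < 1 or ndim_b < 1:
        raise ValueError("matmul operands need at least one dimension")
    # ndim == 1 is a vector; ndim >= 2 is a matrix with optional outer dims.
    return MATMUL_KERNELS[(ndim_a >= 2, ndim_b >= 2)]


print(select_matmul_kernel(3, 2))
print(select_matmul_kernel(1, 1))
```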


> Is there a way to describe broadcasting?  The sample case we've come up
> with is a function that calculates a weighted mean. This might take
> (values, sigmas) and return (mean, sigma_mean), which would imply a
> signature like:
>
> "... * N * T, ... * N * T -> ... * T, ... * T"
>
> But would your signature allow indicating that one could pass in a single
> sigma? I.e., broadcast the second 1 to N if needed?

Actually I came across this today when implementing optimized matching
for binary functions.

I wanted the faster kernel

  "... * N * int64, ... * N * int64 -> ... * N * int64"

to also match e.g. the input

  "int64, 10 * int64".


The generic datashape spec would forbid this, but perhaps the '?' that
you propose in nep-0020 would offer a way out of this for ndtypes.


It's a bit confusing for datashape, since there is already a question mark
for missing variable dimensions (that have shape==0 in the data).

  >>> ndt("var * ?var * int64")
  ndt("var * ?var * int64")

This would be the type for e.g. [[0], None, [1,2,3]].


But for symbolic dimensions (which only match fixed dimensions) perhaps this

   "... * ?N * int64, ... * ?N * int64 -> ... * ?N * int64"

or, as in the NEP,

   "... * N? * int64, ... * N? * int64 -> ... * N? * int64"

should mean "At least one input has ndim >= 1, broadcast as necessary".


This still means that for the "all ndim==0" case one would need an
additional kernel "int64, int64 -> int64".


> I realize that this is no longer about describing precisely what the
> function doing the calculation expects, but rather what an upper level is
> allowed to do before calling the function (i.e., take a dimension of 1 and
> broadcast it).

Yes, for datashape the problem is that it also allows non-broadcastable
signatures like "N * float64", really the same as "double x[]" in C.

But the '?' with occasionally one additional kernel for ndim==0 could
solve this.


Stefan Krah




Re: A little about XND

Marten van Kerkwijk
Hi Stefan,

Just to clarify: the ? we propose in the NEP is really for matmul - it indicates a truly missing dimension (i.e., the array cannot have outer broadcast dimensions as well). For inner loop broadcasting, I'm proposing a "|1" postfix, which means a dimension could also be missing, but can also be there and be 1, in which case it can do outer broadcast as well.  So, for your function in your notation, it might look like:

"... * N|1 * int64, ... * N|1 * int64 -> ... * N * int64"

(Note that the output of course always has N - if both inputs have 1 then N=1; it is not meant to be absent).
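
A minimal Python sketch of how an upper level might resolve N for arguments marked "|1". The function name `resolve_flexible_dim` is hypothetical, a missing dimension is represented by None, and returning 1 when every input lacks the dimension is an assumption of this sketch rather than something the proposal specifies.

```python
def resolve_flexible_dim(*sizes):
    """Resolve N for arguments declared with an "N|1" inner dimension.

    Each entry is that argument's inner size, or None if the dimension
    is missing. Sizes of 1 and missing dimensions broadcast to N.
    """
    candidates = {s for s in sizes if s is not None and s != 1}
    if len(candidates) > 1:
        raise ValueError("inner dimensions disagree: %s" % sorted(candidates))
    # All inputs were 1 or missing: the output dimension is 1 (assumed).
    return candidates.pop() if candidates else 1


print(resolve_flexible_dim(10, None))  # second input has no inner dimension
print(resolve_flexible_dim(10, 1))     # second input has an inner size of 1
```

Two inputs with inner sizes 10 and 3 are rejected, since neither can be broadcast to the other.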

I think that actually looks quite clear, although perhaps one might want parentheses around it (since "|" = "or" normally does not have precedence over "*" = multiply), i.e.,

"... * (N|1) * int64, ... * (N|1) * int64 -> ... * N * int64"

All the best,

Marten

