On Wed, Jun 4, 2014 at 12:33 AM, Charles R Harris <[hidden email]> wrote:

> On Tue, Jun 3, 2014 at 5:08 PM, Kyle Mandli <[hidden email]> wrote:

>>

>> Hello everyone,

>>

>> As one of the co-chairs in charge of organizing the birds-of-a-feather

>> sessions at the SciPy conference this year, I wanted to solicit through the

>> NumPy list to see if we could get enough interest to hold a NumPy centered

>> BoF this year. The BoF format would be up to those who would lead the

>> discussion, a couple of ideas used in the past include picking out a few of

>> the lead devs to be on a panel and have a Q&A type of session or an open Q&A

>> with perhaps an audience-guided list of topics. I can help facilitate

>> organization of something but we would really like to get something

>> organized this year (last year NumPy was the only major project that was not

>> really represented in the BoF sessions).

>

> I'll be at the conference, but I don't know who else will be there. I feel

> that NumPy has matured to the point where most of the current work is

> cleaning stuff up, making it run faster, and fixing bugs. A topic that I'd

> like to see discussed is where do we go from here. One option to look at is

> Blaze, which looks to have matured a lot in the last year. The problem with

> making it a NumPy replacement is that NumPy has become quite widespread,

> with downloads from PyPI running at about 3 million per year. With that much

> penetration it may be difficult for a new core like Blaze to gain traction.

> So I'd like to also discuss ways to bring the two branches of development

> together at some point and explore what NumPy can do to pave the way. Mind,

> there are definitely things that would be nice to add to NumPy, a better

> type system, missing values, etc., but doing that is difficult given the

> current design.

I won't be at the conference unfortunately (I'm on the wrong continent

and have family commitments then anyway), but I think there's lots of

exciting stuff that can be done in numpy-land.

We absolutely could rewrite the dtype system, and this would

straightforwardly give us excellent support for missing values, units,

categorical data, automatic differentiation, better datetimes, etc.

etc. -- and make numpy much more friendly in general to third-party

extensions.

I'd like to see the ufunc system revisited in the light of all the

things we know now, to make gufuncs more first-class, provide better

support for user-defined types, more flexible loop selection (e.g.

make it possible to implement np.add.reduce(a, type="kahan")), etc.;

one goal would be to convert a lot of ufunc-like functions (np.mean

etc.) into being real ufuncs, and then they'd automatically benefit

from __numpy_ufunc__, which would also massively improve

interoperability with alternative array types like blaze.
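To make the loop-selection example concrete: the type="kahan" argument above is hypothetical and is not something np.add.reduce accepts. What such a loop would compute is compensated (Kahan) summation, which can be sketched in pure Python as follows. The function names kahan_sum and naive_sum are made up for illustration.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation, for comparison."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation            # apply the correction to the next term
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total


data = [0.1] * 1_000_000
reference = math.fsum(data)  # correctly rounded sum, used as ground truth

print("naive error:", abs(naive_sum(data) - reference))
print("kahan error:", abs(kahan_sum(data) - reference))
```

A real implementation would live in the C inner loop rather than in Python; the point is only that the reduction algorithm is a separate choice from the ufunc being reduced.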

I'd like to see support for extensible label-based indexing, like pandas.
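As a rough illustration of what label-based indexing means, here is a minimal sketch of a 1-D wrapper that translates labels into positions before indexing the underlying array. LabeledVector is a hypothetical class, not a NumPy or pandas API, and an extensible design would need to cover far more cases (slices, multiple axes, duplicate labels).

```python
import numpy as np


class LabeledVector:
    """Hypothetical 1-D array wrapper that is indexed by label, not position."""

    def __init__(self, data, labels):
        self.data = np.asarray(data)
        self.labels = list(labels)
        if len(self.labels) != self.data.shape[0]:
            raise ValueError("need exactly one label per element")
        # Map each label to its integer position in the array.
        self._position = {label: i for i, label in enumerate(self.labels)}

    def __getitem__(self, key):
        if isinstance(key, list):
            # A list of labels selects several elements and keeps their labels.
            positions = [self._position[k] for k in key]
            return LabeledVector(self.data[positions], key)
        # A single label selects a single element.
        return self.data[self._position[key]]


v = LabeledVector([10.0, 20.0, 30.0], ["a", "b", "c"])
print(v["b"])
print(v[["c", "a"]].data)
```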

Internally, I'd like to see code migrating out of C and into

Cython -- we have hundreds of lines of code that could be replaced

with a few lines of Cython and no-one would notice. (Combining this

with a cffi cython backend and pypy would be pretty interesting

too...)

I'd like to see sparse ndarrays, with integration into the ufunc

looping machinery so all ufuncs just work. Or even better, I'd like to

see the right hooks added so that anyone can write a sparse ndarray

package using only public APIs, and have all ufuncs just work. (I was

going to put down deferred/loop-fused/out-of-core computation as a

wishlist item too, but if we do it right then this too could be

implemented by anyone without needing to be baked into numpy proper.)
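For a sense of what "the right hooks" could look like: a hook along the lines of the __numpy_ufunc__ proposal later shipped in NumPy 1.13 under the name __array_ufunc__. The sketch below assumes that protocol and uses a hypothetical SparseVector class. It is a toy, not a real sparse package: it handles only plain ufunc calls, keeps the sparsity pattern for unary ufuncs that map zero to zero, and otherwise falls back to a dense array.

```python
import numpy as np


class SparseVector:
    """Hypothetical 1-D sparse vector that stores only its nonzero entries."""

    def __init__(self, size, indices, values):
        self.size = size
        self.indices = np.asarray(indices, dtype=np.intp)
        self.values = np.asarray(values, dtype=float)

    def to_dense(self):
        dense = np.zeros(self.size)
        dense[self.indices] = self.values
        return dense

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls such as np.sin(x) are handled; reductions,
        # out= arguments, and so on are declined.
        if method != "__call__" or kwargs:
            return NotImplemented
        if ufunc.nin == 1 and ufunc.nout == 1:
            with np.errstate(all="ignore"):
                zero_preserving = ufunc(0.0) == 0.0
            if zero_preserving:
                # f(0) == 0, so only the stored values need to change.
                return SparseVector(self.size, self.indices, ufunc(self.values))
        # Fallback: densify any SparseVector inputs and let NumPy do the work.
        dense = [x.to_dense() if isinstance(x, SparseVector) else x
                 for x in inputs]
        return ufunc(*dense)


x = SparseVector(5, [1, 3], [2.0, -4.0])
y = np.negative(x)   # stays sparse
z = np.add(x, 1.0)   # adding a nonzero constant destroys sparsity
print(type(y).__name__, y.to_dense())
print(type(z).__name__, z)
```

Everything here uses only public API, which is the property being asked for above; what the toy does not show is how ufunc-like functions such as np.mean would participate.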

All of these things would take some work and care, but I think they

could all be done incrementally and without breaking backwards

compatibility. Compare to ipython, which -- as Fernando likes to point

out :-) -- went from a little console program to its current

distributed-notebook-skynet-whatever-it-is by merging one working PR

at a time. Certainly these changes would be much easier and less

disruptive than any plan that involves throwing out numpy and starting

over. But they also do help smooth the way for an incremental

transition to a world where numpy is regularly used alongside other

libraries.

-n

--

Nathaniel J. Smith

Postdoctoral researcher - Informatics - University of Edinburgh

http://vorpus.org