Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

cdavid
Hi,

    I've just started working on a prototype for a plugin system for
numpy. The plugin system aims at providing a framework for the following
use cases:
    - runtime selection of blas/lapack/etc...: instead of hardcoding one
blas/lapack implementation in the binary, numpy could choose the
SSE-optimized one if the CPU supports SSE, etc...
    - this could also be used for core numpy, for example ufuncs: if we
want to start implementing some tight loops with aggressively optimized
code (SSE, etc...), we could again ship with a default pure C
implementation, and choose the best one at runtime.
    - we could even have a system to choose a different implementation
(for example, scipy currently ships with a slow fft, mainly for licensing
reasons; people who install fftw could then tell scipy to use fftw
instead of the included one).
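The first two use cases come down to choosing one implementation among several at load time and calling it through a function pointer. A minimal sketch, assuming GCC or Clang on x86 for the CPU check (`__builtin_cpu_init` and `__builtin_cpu_supports` are real compiler builtins; other targets simply keep the pure C path). All `dsum_*` and `npyw_dsum` names are hypothetical, and the unrolled loop only stands in for a real SSE kernel.

```c
#include <stddef.h>

/* Default implementation: plain C, runs on any CPU. */
static double dsum_default(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Hypothetical stand-in for an SSE kernel: 4-way unrolled loop with
   independent partial sums, the shape a SIMD version would have. */
static double dsum_unrolled(const double *x, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; ++i)   /* leftover elements */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}

/* Every caller goes through this pointer; it starts on the safe default. */
static double (*npyw_dsum)(const double *, size_t) = dsum_default;

/* CPU check: GCC/Clang builtins on x86, "no SSE2" everywhere else. */
static int cpu_has_sse2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2");
#else
    return 0;
#endif
}

/* Run once at module init time: pick the best implementation for this CPU. */
static void init_dsum(void)
{
    npyw_dsum = cpu_has_sse2() ? dsum_unrolled : dsum_default;
}
```

Callers only ever write `npyw_dsum(x, n)`; adding a new kernel means adding a candidate in `init_dsum`, not touching call sites.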

Right now, the prototype does not do much, and only works for linux; I
mainly focused on automatic generation of the plugin from a list of
functions, and on transparent use from numpy's point of view. It provides
the plugin API through pure function pointers, without the user needing
to be aware of it. For example, if you have an API with the following
functions:

void    foo1();
int     foo2();
int     foo3(int);
int     foo4(double* , double*);
int     foo5(double* , double*, int);

The current implementation would build the boilerplate to load those
functions, etc... and you would just use those functions in numpy like
the following:

init_foo();

/* all functions are prefixed with npyw, for numpy wrapper */
npyw_foo1();
npyw_foo3(n);
/* etc... */
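The boilerplate generated behind `init_foo()` could look roughly like the following minimal sketch: one pointer per API function with the original signature, filled in by name. Unlike the `init_foo()` in the message, this hypothetical version takes a resolver callback (`npyw_resolver`, assumed here) so it can be exercised without a shared library; in a real loader the resolver would wrap `dlsym()` or `GetProcAddress()`.

```c
#include <stddef.h>

/* Generic function pointer type used only to carry symbols around; each one
   is cast back to its real signature before being called. */
typedef void (*npyw_generic_fn)(void);

/* Hypothetical: maps a symbol name to its address, or NULL if missing. */
typedef npyw_generic_fn (*npyw_resolver)(const char *name);

/* One pointer per API function, with the signature of the original. */
void (*npyw_foo1)(void);
int  (*npyw_foo2)(void);
int  (*npyw_foo3)(int);
int  (*npyw_foo4)(double *, double *);
int  (*npyw_foo5)(double *, double *, int);

/* Resolve every symbol; returns 0 on success, -1 on the first missing one
   (pointers resolved before the failure are left set). */
int init_foo(npyw_resolver resolve)
{
    npyw_generic_fn f;

    if ((f = resolve("foo1")) == NULL) return -1;
    npyw_foo1 = (void (*)(void))f;
    if ((f = resolve("foo2")) == NULL) return -1;
    npyw_foo2 = (int (*)(void))f;
    if ((f = resolve("foo3")) == NULL) return -1;
    npyw_foo3 = (int (*)(int))f;
    if ((f = resolve("foo4")) == NULL) return -1;
    npyw_foo4 = (int (*)(double *, double *))f;
    if ((f = resolve("foo5")) == NULL) return -1;
    npyw_foo5 = (int (*)(double *, double *, int))f;
    return 0;
}
```

Because a call through a function pointer has the same syntax as a direct call, the numpy side just writes `npyw_foo3(n)`.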

The code can be found here:

https://code.launchpad.net/~david-ar/+junk/numplug

And some notes (still pretty thin for now):

http://www.scipy.org/RuntimeOptimization

cheers,

David
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion

Re: Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

Stéfan van der Walt
2008/4/28 David Cournapeau <[hidden email]>:
>     - this could also be used for core numpy, for example ufuncs: if we
>  want to start implementing some tight loop with aggressively optimized
>  code (SSE, etc...), we could again ship with a default pure C
>  implementation, and choose the best one at runtime.

This would be a *fantastic* addition, especially if a user can add his
own ufuncs written in, say, Cython.

>  Right now, the prototype does not do much, and only works for linux; I
>  mainly focused on automatic generation of the plugin from a list of
>  functions, and transparent use from numpy point of view. It provides the
>  plugin api through pure function pointers, without the need for the user
>  to be aware of it. For example, if you have an api with the following
>  functions:
>
>  void    foo1();
>  int     foo2();
>  int     foo3(int);
>  int     foo4(double* , double*);
>  int     foo5(double* , double*, int);
>
>  The current implementation would build the boilerplate to load those
>  functions, etc... and you would just use those functions in numpy like
>  the following:

I assume that, since you call it a plugin system, it can be done at
runtime a-la ctypes?

Cheers
Stéfan

Re: Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

David Cournapeau
On Tue, Apr 29, 2008 at 1:00 AM, Stéfan van der Walt <[hidden email]> wrote:

>  I assume that, since you call it a plugin system, it can be done at
>  runtime a-la ctypes?

I am not sure I understand exactly what you mean by a-la ctypes, but
yes, the actual implementation of the npyw_* functions would be
decided at runtime; that's the whole point. For example, instead of
using the cblas_*dot* functions directly in blasdot, we would use
npyw_cblas*dot* functions, which would point into an SSE3-optimized
atlas when run on an SSE3 CPU, etc... For the code changes in numpy
and scipy to be minimal, they should only involve renaming
functions, which is what this first prototype focuses on.
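The rename described above can be sketched as follows, assuming the standard CBLAS prototype for `cblas_ddot` from `cblas.h`. `ref_ddot`, `npyw_set_ddot` and the NULL-means-fallback rule are hypothetical; a real loader would install the ATLAS or MKL symbol rather than a stub.

```c
#include <stddef.h>

/* Same signature as cblas_ddot in cblas.h. */
typedef double (*ddot_fn)(const int N, const double *X, const int incX,
                          const double *Y, const int incY);

/* Portable reference ddot, used when no optimized BLAS was found.
   Follows the BLAS convention for negative strides. */
static double ref_ddot(const int N, const double *X, const int incX,
                       const double *Y, const int incY)
{
    double s = 0.0;
    int ix = incX < 0 ? (1 - N) * incX : 0;
    int iy = incY < 0 ? (1 - N) * incY : 0;
    for (int i = 0; i < N; ++i, ix += incX, iy += incY)
        s += X[ix] * Y[iy];
    return s;
}

/* The renamed symbol blasdot would call instead of cblas_ddot. */
ddot_fn npyw_cblas_ddot = ref_ddot;

/* Hypothetical hook used by the loader; NULL restores the reference code. */
void npyw_set_ddot(ddot_fn f)
{
    npyw_cblas_ddot = f != NULL ? f : ref_ddot;
}
```

At the call site, `cblas_ddot(n, a, 1, b, 1)` simply becomes `npyw_cblas_ddot(n, a, 1, b, 1)`.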

Once the system is ready to be integrated (not anytime soon; for one
thing, we would need support in the build system for building dynamically
loaded libraries), numpy and scipy would work like matlab, which ships
with several blas/lapack implementations (atlas_sse, atlas_sse2, mkl),
without the user having to deal with this kind of low-level detail.

It would also encourage (if only by providing the incentive) a cleaner
separation between pure C implementations and Python C API boilerplate
code. I think that deploying ufuncs optimized with SSE and co would be
a nightmare without runtime selection. This is basically tied to the
directions I am most interested in for future numpy releases.

cheers,

David

Re: Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

Matthew Brett
In reply to this post by Stéfan van der Walt
>  This would be a *fantastic* addition, especially if a user can add his
>  own ufuncs written in, say Cython.

I'd like to add some large number to whatever *fantastic* means in terms of +N!

Best,

Matthew

Re: Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

Stéfan van der Walt
In reply to this post by David Cournapeau
2008/4/28 David Cournapeau <[hidden email]>:
> On Tue, Apr 29, 2008 at 1:00 AM, Stéfan van der Walt <[hidden email]> wrote:
>
>  >  I assume that, since you call it a plugin system, it can be done at
>  >  runtime a-la ctypes?

What I meant was: would the plugin "slots" be decided beforehand, or
could we manipulate them at runtime?  I.e. what I would really enjoy
doing is define arbitrary ufuncs and plug them in (not only the blas
funcs and a select few others).

Either way, this is good news for distributors -- it would make it so
much easier to provide an optimised version of scipy to windows users,
who can't easily compile it themselves.

Cheers
Stéfan

Re: Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

David Cournapeau
On Tue, Apr 29, 2008 at 4:03 AM, Stéfan van der Walt <[hidden email]> wrote:
>
>  What I meant was: would the plugin "slots" be decided beforehand, or
>  could we manipulate them at runtime?  I.e. what I would really enjoy
>  doing is define arbitrary ufuncs and plug them in (not only the blas
>  funcs and a select few others).

For each set of functions, there is an init function (just like python
extensions) that you must call before calling any function, and you can do
what you want there: this could be controlled by environment variables
and the like. I still don't have a clear idea of the API for handling
the different ways you would like to load the plugin (multi-threaded vs
single-threaded, SSE1/2/3/4 vs no SSE, etc...), though. I would prefer
every plugin to have exactly the same signature for the init function,
and would like to avoid global state as much as possible.
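One way to combine environment-variable control with little global state is to keep the decision in a pure function and let the init function only gather inputs. A minimal sketch, assuming a hypothetical `NPYW_BLAS_VARIANT` environment variable, hypothetical variant names, and a CPU level detected elsewhere.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical builds a BLAS plugin could ship in. */
typedef enum {
    NPYW_VARIANT_GENERIC,
    NPYW_VARIANT_SSE2,
    NPYW_VARIANT_SSE3
} npyw_variant;

/* Pure decision function, no global state.
   requested: user override, e.g. from the environment (may be NULL)
   cpu_level: highest SSE level detected (0 = none, 2 = SSE2, 3 = SSE3) */
static npyw_variant npyw_choose_variant(const char *requested, int cpu_level)
{
    if (requested != NULL) {
        if (strcmp(requested, "generic") == 0)
            return NPYW_VARIANT_GENERIC;
        if (strcmp(requested, "sse2") == 0 && cpu_level >= 2)
            return NPYW_VARIANT_SSE2;
        if (strcmp(requested, "sse3") == 0 && cpu_level >= 3)
            return NPYW_VARIANT_SSE3;
        /* unknown or unsupported request: fall back to auto-detection */
    }
    if (cpu_level >= 3)
        return NPYW_VARIANT_SSE3;
    if (cpu_level >= 2)
        return NPYW_VARIANT_SSE2;
    return NPYW_VARIANT_GENERIC;
}

/* Init entry point: collects the inputs and reports the choice through an
   out-parameter instead of a global. Returns 0 on success, -1 on bad args. */
int npyw_blas_init(int cpu_level, npyw_variant *chosen)
{
    if (chosen == NULL)
        return -1;
    *chosen = npyw_choose_variant(getenv("NPYW_BLAS_VARIANT"), cpu_level);
    return 0;
}
```

Keeping the policy pure makes it testable without touching the process environment, and the same shape could be reused by each plugin's init function.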

One thing I wanted to do, but quickly gave up on because of the
complexity, is unloading/reloading a plugin. For example, you may want
to load atlas for the blas at first, and then switch to mkl instead.

cheers,

David

Re: Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

cdavid
In reply to this post by Stéfan van der Walt
Stéfan van der Walt wrote:
>
> What I meant was: would the plugin "slots" be decided beforehand, or
> could we manipulate them at runtime?  I.e. what I would really enjoy
> doing is define arbitrary ufuncs and plug them in (not only the blas
> funcs and a select few others).
>  

Do you want to define the ufuncs at runtime? Or do you want to be able
to select the ufuncs at runtime?

cheers,

David

Re: Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

Stéfan van der Walt
2008/4/29 David Cournapeau <[hidden email]>:

> Stéfan van der Walt wrote:
>  >
>  > What I meant was: would the plugin "slots" be decided beforehand, or
>  > could we manipulate them at runtime?  I.e. what I would really enjoy
>  > doing is define arbitrary ufuncs and plug them in (not only the blas
>  > funcs and a select few others).
>  >
>
>  Do you want to define the ufuncs at runtime ? Or do you want to be able
>  to select the ufuncs at runtime ?

Both, eventually, but I realise that this is somewhat out of the scope
of your original suggestion.

Don't let me distract you, I'm very keen to see what you come up with!

Cheers
Stéfan

Re: Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

cdavid
Stéfan van der Walt wrote:
> Don't let me distract you, I'm very keen to see what you come up with!
>  

Well, the point of making my preliminary work public is to get
"distracted", as you put it :) It is really easy to come up with
something not that useful without feedback or remarks from other people.

For example, I would never have thought about the usefulness of defining
ufunc functions on the fly; I have my own ideas on how/why I do things,
but I would consider it a failure if the system were limited to that.

cheers,

David

Re: Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

Lisandro Dalcin
In reply to this post by cdavid
David, I briefly took a look at your code, and I have a very, very
important observation.

Your implementation makes use of low-level dlopen()ing. That means you
are going to have to manage all the oddities of runtime loading on the
different systems. In this case, 'libtool' could really help. I know
it is GPL, but AFAIK it has some special licensing that lets you ship
it with your code under the same licence terms as your code.

But I definitely think that a better approach would be a stubs
mechanism à la Tcl, or what is currently used in some extension modules
in core python, like cStringIO. In short, you access all your
functions through a pointer to a struct (statically or heap allocated)
where each struct member is filled with a pointer to a function. This
is pretty similar to C++ virtual tables, or to Cython cdef classes
with cdef methods. And then you just let Python do the work of
dynamically loading extension modules. The numpy C API uses a
similar approach, but with an array. IMHO, the struct approach is
cleaner.

What do you think about this?
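The stubs mechanism described above can be pictured with this minimal sketch; the names (`npyw_api`, `dsum`, `dscal`) are hypothetical. In a real extension, the address of the table would be handed from the providing module to its users (as cStringIO does with its C API object) rather than living in the same file.

```c
#include <stddef.h>

/* The whole API as one struct of function pointers (a "stubs" table). */
typedef struct {
    int version;                                 /* bumped on incompatible changes */
    double (*dsum)(const double *x, size_t n);
    void   (*dscal)(double a, double *x, size_t n);
} npyw_api;

static double dsum_c(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

static void dscal_c(double a, double *x, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        x[i] *= a;
}

/* Statically allocated table holding the default implementations. */
static const npyw_api npyw_default_api = { 1, dsum_c, dscal_c };

/* The one pointer client code goes through; a provider module would hand
   out the address of its own table instead. */
static const npyw_api *npyw = &npyw_default_api;
```

Call sites then read `npyw->dsum(x, n)` rather than `dsum(x, n)`, which is the call-site change weighed in the reply.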




--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

Re: Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

Andreas Klöckner-3
On Dienstag 29 April 2008, Lisandro Dalcin wrote:
> Your implementation make uses of low level dlopening. Then, your are
> going to have to manage all the oddities of runtime loading in the
> different systems.

Argh. -1 for a hard dependency on dlopen(). At some point in my life, I might
be forced to compile numpy on an IBM Bluegene/L, which does *not* have
dynamic linking at all. (Btw, anybody done something like this before?)

Andreas


Re: Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

cdavid
In reply to this post by Lisandro Dalcin
Lisandro Dalcin wrote:
> David, I briefly took a look at your code, and I have a very, very
> important observation.
>
> Your implementation make uses of low level dlopening. Then, your are
> going to have to manage all the oddities of runtime loading in the
> different systems. In this case, 'libtool' could really help. I know,
> it is GPL, but AFAIK it has some special licencing that let's you ship
> it with your code in the same licence terms than your code.

OK, there are several issues here:
    1. cross-platform runtime loading
    2. how to access the plugin capabilities (function pointers,
       interfaces, etc...)
    3. how to build

1: the implementation is not cross-platform, but the API is; it took me
~2 hours to refactor symbol loading and get implementations for
POSIX/win32/Mac OS X. I don't know of any OS with runtime loading
capabilities that cannot load a file and a symbol from it; an OS without
that ability cannot be used by python anyway: everything would have to
be built statically at the same time as python, which we do not support
in numpy anyway, AFAIK.
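Point 1 amounts to a thin shim over each platform's loader. A minimal sketch using the real POSIX `dlopen`/`dlsym`/`dlclose` and Win32 `LoadLibraryA`/`GetProcAddress`/`FreeLibrary` calls; the `npyw_lib_*` wrapper names are hypothetical. The POSIX branch also covers Mac OS X, which provides `dlopen`; on glibc older than 2.34 the program must be linked with `-ldl`.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <dlfcn.h>
#endif

/* Generic function pointer; cast back to the real signature before calling. */
typedef void (*npyw_generic_fn)(void);

/* Open a plugin library; returns NULL on failure. */
static void *npyw_lib_open(const char *path)
{
#ifdef _WIN32
    return (void *)LoadLibraryA(path);
#else
    return dlopen(path, RTLD_NOW | RTLD_LOCAL);
#endif
}

/* Look up a symbol; returns NULL if it is not exported. */
static npyw_generic_fn npyw_lib_symbol(void *lib, const char *name)
{
#ifdef _WIN32
    return (npyw_generic_fn)GetProcAddress((HMODULE)lib, name);
#else
    /* POSIX guarantees dlsym's result can be converted to a function
       pointer, even though ISO C alone does not. */
    return (npyw_generic_fn)dlsym(lib, name);
#endif
}

/* Release a library handle; NULL is accepted and ignored. */
static void npyw_lib_close(void *lib)
{
    if (lib == NULL)
        return;
#ifdef _WIN32
    FreeLibrary((HMODULE)lib);
#else
    dlclose(lib);
#endif
}
```

The rest of the plugin code only sees `npyw_lib_open`/`npyw_lib_symbol`/`npyw_lib_close`, so porting means adding one more `#ifdef` branch.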

2: I studied several approaches quite a bit before using this one. That
was my main concern at first. For plugins, you have the following
possibilities I am aware of:
    - raw function pointers
    - COM
    - pre-defined API

Raw function pointers are the simplest, but do not really scale. COM
is this big monstrosity, extremely awkward to use, but it can be extended
ad nauseam, without a pre-defined interface. By pre-defined API, I mean
something like VST plugins and the like (used in music software, where a
host can load many plugins to provide sound effects; VST is the de facto
standard, and Mac OS X has its own built-in equivalent called AudioUnit,
which is the same thing for what matters here).

For the usage I have in mind (blas, fft, lapack, etc...), the API
cannot be pre-defined (each one is totally different), so I quickly
dismissed the VST-like approach. Then there is the COM thing, which is
really complicated. Although each plugin interface is totally different
(blas vs lapack), each one is relatively set in stone, so I thought that
by using generated code, the scalability problem of raw pointers could
be alleviated.
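The generated-code idea can be illustrated with C X-macros, used here as a swapped-in stand-in for whatever generator the prototype actually uses (this is not the prototype's code). The API is written down once, and both the pointer declarations and the loader are expanded from that single list. `FOO_API`, `npyw_resolver` and the resolver-taking `init_foo` are hypothetical.

```c
#include <stddef.h>

typedef void (*npyw_generic_fn)(void);
typedef npyw_generic_fn (*npyw_resolver)(const char *name);

/* The API, written once: return type, name, parenthesized parameter list. */
#define FOO_API(X)                              \
    X(void, foo1, (void))                       \
    X(int,  foo2, (void))                       \
    X(int,  foo3, (int))                        \
    X(int,  foo4, (double *, double *))         \
    X(int,  foo5, (double *, double *, int))

/* Expansion 1: one typed pointer per function, prefixed with npyw_. */
#define NPYW_DECLARE_PTR(ret, name, args) ret (*npyw_##name) args;
FOO_API(NPYW_DECLARE_PTR)
#undef NPYW_DECLARE_PTR

/* Expansion 2: the loader, resolving every symbol by its name. */
int init_foo(npyw_resolver resolve)
{
    npyw_generic_fn f;
#define NPYW_LOAD_PTR(ret, name, args)          \
    if ((f = resolve(#name)) == NULL)           \
        return -1;                              \
    npyw_##name = (ret (*) args)f;
    FOO_API(NPYW_LOAD_PTR)
#undef NPYW_LOAD_PTR
    return 0;
}
```

Adding a sixth function is a one-line change to `FOO_API`; the declarations and the loader follow automatically.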

3: libtool does not know about windows, and I think it is way overkill.
We can't use libtool for building (which is one of the big things
libtool provides). A dlopen-like approach is not as portable as
libtool, but it is as portable as python, which is good enough for numpy :)

> But, I definitely think that a betther approach would be using a stubs
> mechanism ala TCL, or wath is currently used in some extension modules
> in core python, like cStringIO. In short, you access all your
> functions from a pointer do a struct (statically or heap allocated)
> where each struct member is filled with a pointer to a function. This
> is pretty much similar to C++ virtual tables, or the Cython cdef's
> classes with cdef's methods. And then you just let Python do the work
> of dynamic loading of extension modules. The numpy C/API uses a
> similar approach, but uses an array. IMHO, the struct approach is
> cleaner.
>
> What do you think about this?

It may be cleaner, but I am not convinced it buys us much. With my
approach, all that is needed for the existing code (numpy.core,
numpy.linalg, etc...) is a renaming of the functions used (which can be
done in 5 minutes with sed), because in C, calling a function and calling
a function pointer look exactly the same. With the struct-of-pointers
approach, you will have to replace all the function calls for blas,
lapack, etc... with a system that passes around the table of function
pointers. That sounds like a nightmare to me, because I don't see how to
do that automatically. Maybe I just don't see it; in that case, what
would be your approach?

I was convinced that the function pointer approach is usable by
looking at liboil, which does exactly what we need:

http://liboil.freedesktop.org/wiki/

(you can look at the liboil/liboilfuncs* and liboil/liboilfunc.* files).
Liboil's approach is more complicated, because the function pointer is
provided by a function factory (so that each function can be initialized
differently), but I don't think we need the factory in our case (and if
we do need it, we can add it without changing anything in the code that
uses the plugin).
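To make the factory idea concrete: below, the function pointer starts on a trampoline that asks a per-function factory for an implementation on first use and then patches itself. This is a hypothetical sketch loosely in the spirit of the liboil design described above, not liboil's actual API; the code that calls `npyw_dsum` is the same with or without the factory.

```c
#include <stddef.h>

typedef double (*dsum_fn)(const double *, size_t);

static double dsum_generic(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Hypothetical per-function factory: decides which implementation this
   function gets. A real one could check CPU flags or time candidates. */
static int dsum_factory_calls = 0;
static dsum_fn dsum_factory(void)
{
    ++dsum_factory_calls;
    return dsum_generic;
}

static double dsum_first_call(const double *x, size_t n);

/* Starts on the trampoline; the first call replaces it with the real code. */
static dsum_fn npyw_dsum = dsum_first_call;

static double dsum_first_call(const double *x, size_t n)
{
    /* Not thread-safe as written: concurrent first calls race on the
       pointer. Eager initialization at module load avoids that. */
    npyw_dsum = dsum_factory();
    return npyw_dsum(x, n);
}
```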

cheers,

David

Re: Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

cdavid
In reply to this post by Andreas Klöckner-3
Andreas Klöckner wrote:
>
> Argh. -1 for a hard dependency on dlopen().

There is no hard dependency on dlopen; there is a hard dependency on
runtime loading, because, well, that's the point of a plugin system. It
should not be difficult to disable the plugin system on platforms that
do not support it (and do as we do today), though I am not sure it is
really useful.

> At some point in my life, I might
> be forced to compile numpy on an IBM Bluegene/L, which does *not* have
> dynamic linking at all. (Btw, anybody done something like this before?)

How will you build numpy on a system without dynamic linking? The
only solution is then to build numpy and link it statically into the
python interpreter. Systems without dynamic linking are common (embedded
systems), though.

cheers,

David



Re: Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

Andreas Klöckner-3
On Dienstag 29 April 2008, David Cournapeau wrote:
> Andreas Klöckner wrote:
> > Argh. -1 for a hard dependency on dlopen().
>
> There is no hard dependency on dlopen, there is a hard dependency on
> runtime loading, because well, that's the point of a plugin system. It
> should not be difficult to be able to disable the plugin system for
> platforms who do not support it, though (and do as today), but I am not
> sure it is really useful.

As long as it's easy to disable (for example with a preprocessor define), I
guess I'm ok.
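A compile-time switch of the kind asked for here could look like this minimal sketch. `NPYW_NO_PLUGINS`, `foo3_default` and the loader stub are hypothetical names. With the macro defined, `npyw_foo3` collapses to a direct call, so a fully static build needs no runtime loading at all, and call sites are identical in both configurations.

```c
#include <stddef.h>

/* Default implementation, always compiled in. */
static int foo3_default(int n) { return n + 1; }

#ifdef NPYW_NO_PLUGINS
/* Static builds (no dynamic linking): npyw_foo3 is just the built-in
   function, and no loader code is compiled. */
#define npyw_foo3 foo3_default
static int init_foo(void) { return 0; }
#else
/* Plugin builds: npyw_foo3 is a pointer the loader may replace. */
static int (*npyw_foo3)(int) = foo3_default;
static int init_foo(void)
{
    /* A real build would try to load a plugin here and keep the default
       on failure; this stub only keeps the default. */
    return 0;
}
#endif
```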

> > At some point in my life, I might
> > be forced to compile numpy on an IBM Bluegene/L, which does *not* have
> > dynamic linking at all. (Btw, anybody done something like this before?)
>
> How will you build numpy in the case of a system without dynamic linking
> ? The only solution is then to build numpy and link it statically to the
> python interpreter. Systems without dynamic linking are common (embedded
> systems), though.

Yes, obviously everything will need to be linked into one big static
executable blob. I am somewhat certain that distutils will be of no help
there, so I will need to "roll my own". There is a CMake-based build of
Python for BG/L, I was planning to work off that.

But as things stand, I might not end up having to do all that, for which
I'd be endlessly grateful.

Andreas


Re: Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

cdavid
Andreas Klöckner wrote:
>
> Yes, obviously everything will need to be linked into one big static
> executable blob. I am somewhat certain that distutils will be of no help
> there, so I will need to "roll my own". There is a CMake-based build of
> Python for BG/L, I was planning to work off that.
>  

You will have to build numpy too. Not that I want to discourage you, but
that will be a hell of a lot of work.

cheers,

David

Re: Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

cdavid
In reply to this post by Andreas Klöckner-3
Andreas Klöckner wrote:
> But so far, I might not end up having to do all that, for which I'd be
> endlessly grateful.
>  

If you really need it, note that numpy can be built with scons instead
of distutils, and the scons scripts are now available in numpy svn (and
will be included in the release sources starting from 1.1.0).

Scons is severely lacking in the cross-compilation department, but I
think scons scripts are easier to adapt to cmake than distutils setup.py
files if you need to use cmake :) I would actually be quite interested in
making numpy build in a cross-compilation environment with scons.

cheers,

David

Re: Starting to work on runtime plugin system for plugin (automatic sse optimization, etc...)

Andreas Klöckner-3
In reply to this post by cdavid
On Dienstag 29 April 2008, David Cournapeau wrote:
> Andreas Klöckner wrote:
> > Yes, obviously everything will need to be linked into one big static
> > executable blob. I am somewhat certain that distutils will be of no help
> > there, so I will need to "roll my own". There is a CMake-based build of
> > Python for BG/L, I was planning to work off that.
>
> You will have to build numpy too. Not that I want to discourage you, but
> that will be a hell of a work.

Good news is that Bluegene/P (the next version of that architecture) *does*
support dynamic linking. It's probably broken in some obscure way, but that's
(hopefully) better than nonexistent. :)

In any case, if I can't dodge porting my code to BG/L, you'll hear from me. :)

Andreas
