Ruby benchmark -- numpy is slower.... was: Re: Ruby's NMatrix and NVector


Sebastian Haase
Hi,
Can someone comment on these timing numbers?
http://narray.rubyforge.org/bench.html.en

Is the current numpy faster?

Cheers,
Sebastian Haase


On Sat, May 3, 2008 at 2:07 AM, Travis E. Oliphant
<[hidden email]> wrote:

>
> http://narray.rubyforge.org/matrix-e.html
>
> It seems they've implemented some of what Tim is looking for, in
> particular.  Perhaps there is information to be gleaned from what they
> are doing.   It looks promising..
>
> -Travis
>
>
> _______________________________________________
> Numpy-discussion mailing list
> [hidden email]
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>

Re: Ruby benchmark -- numpy is slower.... was: Re: Ruby's NMatrix and NVector

cdavid
Sebastian Haase wrote:
> Hi,
> can someone comment on these timing numbers ?
> http://narray.rubyforge.org/bench.html.en
>
> Is the current numpy faster ?
>  

It is hard to know without the same machine or the benchmark sources.
But apart from add, all the operations rely on the underlying
BLAS/LAPACK (without CBLAS, only the matrix operations do), so I am a
bit surprised by the results.

FWIW, doing 100 x "c = a + b" with 1e6 elements on a Pentium 4 Prescott
@ 3.2 GHz takes about 2 sec, and that includes numpy start-up:

import numpy as np

# Integer size: recent numpy versions reject a float such as 1e6 here.
n = 1000000
a = np.random.randn(n)
b = np.random.randn(n)

for i in range(100):
    a + b

And np.dot(a, b) for 3 iterations on 500x500 matrices takes 0.5 seconds
(again including the numpy import), but what you are really benchmarking
here is your underlying BLAS (if numpy.dot uses BLAS at all, which it
does at least when built with ATLAS).
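
A minimal sketch of how the import cost could be separated from the operations themselves, so that the three numbers are reported independently. It assumes a recent Python 3 and numpy; the variable names and the use of time.perf_counter() are illustrative choices, not part of the original benchmark.

```python
import time

# Time the import on its own so it does not pollute the other numbers.
t0 = time.perf_counter()
import numpy as np
import_seconds = time.perf_counter() - t0

n = 1_000_000  # integer size, same element count as in the message above
rng = np.random.default_rng(0)
a = rng.standard_normal(n)
b = rng.standard_normal(n)

# 100 x "c = a + b" on 1e6 elements.
t0 = time.perf_counter()
for _ in range(100):
    c = a + b
add_seconds = time.perf_counter() - t0

# 3 x dot on 500x500 matrices; this mostly measures the BLAS numpy uses.
m = rng.standard_normal((500, 500))
t0 = time.perf_counter()
for _ in range(3):
    p = np.dot(m, m)
dot_seconds = time.perf_counter() - t0

print(f"import numpy : {import_seconds:.3f} s")
print(f"100 x a + b  : {add_seconds:.3f} s")
print(f"3 x dot      : {dot_seconds:.3f} s")
```

Note that perf_counter() measures wall-clock time, so the figures are not directly comparable with CPU-time measurements.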

cheers,

David

Re: Ruby benchmark -- numpy is slower.... was: Re: Ruby's NMatrix and NVector

Anne Archibald
2008/5/16 David Cournapeau <[hidden email]>:

> Sebastian Haase wrote:
>> Hi,
>> can someone comment on these timing numbers ?
>> http://narray.rubyforge.org/bench.html.en
>>
>> Is the current numpy faster ?
>>
>
> It is hard to know without getting the same machine or having the
> benchmark sources. But except for add, all other operations rely on
> underlying blas/lapack (only matrix operations do if you have no cblas),
> so I am a bit surprised by the results.
>
> FWIW, doing 100 x "c = a + b" with 1e6 elements on a PIV prescott @ 3.2
> Ghz is about 2 sec, and I count numpy start:
>
> import numpy as np
>
> a = np.random.randn(1000000)
> b = np.random.randn(1000000)
>
> for i in range(100):
>    a + b
>
> And np.dot(a, b) for 3 iterations and 500x500 takes 0.5 seconds (again
> taking into account numpy import), but what you really do here is
> benchmarking your underlying BLAS (if numpy.dot does use BLAS, again,
> which it does at least when built with ATLAS).

There are four benchmarks: add, multiply, dot, and solve. dot and
solve use BLAS, and for them numpy, Ruby, and Octave are comparable.
Add and multiply are much slower in numpy, but they are implemented in
numpy itself.

Exactly why add and multiply are slower is an interesting question:
loop overhead? striding? cache behaviour?
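
One way to probe those questions is to time the same addition under different memory layouts. The sketch below is only an illustration under assumptions: it needs a recent numpy, the helper best_of() is hypothetical, and the three cases (fresh result array, preallocated output, strided inputs) are my own choice of experiment, not anything taken from the NArray benchmark.

```python
import time
import numpy as np

def best_of(func, repeats=5):
    """Return the best wall-clock time of several calls to func."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - t0)
    return best

n = 1_000_000
a = np.arange(n, dtype=np.float64)
b = np.arange(n, dtype=np.float64)
out = np.empty(n)

# Strided views: every second element of a buffer twice as long.
a2 = np.arange(2 * n, dtype=np.float64)[::2]
b2 = np.arange(2 * n, dtype=np.float64)[::2]

timings = {
    "contiguous, new result array": best_of(lambda: a + b),
    "contiguous, preallocated out": best_of(lambda: np.add(a, b, out=out)),
    "strided inputs": best_of(lambda: a2 + b2),
}

for label, seconds in timings.items():
    print(f"{label:30s} {seconds * 1e3:8.3f} ms")
```

A large gap between the first two cases would point at allocation cost, and a gap between the first and third at striding or cache effects.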

Anne

Re: Ruby benchmark -- numpy is slower.... was: Re: Ruby's NMatrix and NVector

David Cournapeau
On Sat, May 17, 2008 at 12:00 AM, Anne Archibald
<[hidden email]> wrote:

>
> There are four benchmarks: add, multiply, dot, and solve. dot and
> solve use BLAS, and for them numpy ruby and octave are comparable. Add
> and multiply are much slower in numpy, but they are implemented in
> numpy itself.

The benchmark was done in 2005, and we do not know how it was done (no
source). I don't know anything about Ruby (this is my first Ruby
"program"), but:

cat > test1.rb << EOF
require "narray"

a = NArray.float(1e6).fill(0)
b = NArray.float(1e6).fill(0)

for i in 1..200
  a + b
end
EOF

cat > test1.py << EOF
import numpy as np

# Integer size: recent numpy versions reject a float such as 1e6 here.
a = np.zeros(1000000)
b = np.zeros(1000000)

for i in range(200):
    a + b
EOF

These give me extremely close results (now on my MacBook with a Core 2
Duo). One nice thing about narray is how fast it loads (10 x faster),
but that may well be because narray is much smaller than numpy.

cheers,

David

Re: Ruby benchmark -- numpy is slower.... was: Re: Ruby's NMatrix and NVector

Pauli Virtanen-3
On Sat, 2008-05-17 at 00:39 +0900, David Cournapeau wrote:

> On Sat, May 17, 2008 at 12:00 AM, Anne Archibald
> <[hidden email]> wrote:
>
> >
> > There are four benchmarks: add, multiply, dot, and solve. dot and
> > solve use BLAS, and for them numpy ruby and octave are comparable. Add
> > and multiply are much slower in numpy, but they are implemented in
> > numpy itself.
>
> The benchmark was done in 2005, and we do not know how it was done (no
> source). I don't know anything about ruby (that's my first ruby
> "program") but:
[clip]

The benchmark sources are in NArray's source directory.

I took a look, and my conclusion is that the benchmark is simply flawed:
for Ruby, only user time is counted, while for Python, both user and
system time are counted. The Python code uses time.clock(), which
according to the documentation returns the CPU time (apparently user +
system). The Ruby code uses Process.times.utime, which is the elapsed
user time only.
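
The distinction can be made visible from Python itself. This is a minimal sketch, assuming Python 3.3 or later: os.times() reports user and system CPU time separately, and time.process_time() returns their sum (time.clock(), used by the original script, no longer exists as of Python 3.8). The workload in the loop is a hypothetical stand-in, not the benchmark code.

```python
import os
import time

def cpu_times():
    """Return (user, system) CPU seconds consumed by this process so far."""
    t = os.times()
    return t.user, t.system

u0, s0 = cpu_times()
p0 = time.process_time()

# Hypothetical workload: repeated large allocations plus some arithmetic,
# so that both user and system time have a chance to grow.
total = 0
for _ in range(50):
    buf = bytearray(4_000_000)
    total += sum(buf[::4096])

p1 = time.process_time()
u1, s1 = cpu_times()

user = u1 - u0
system = s1 - s0
print(f"user only     : {user:.3f} s   (what Process.times.utime counts)")
print(f"system only   : {system:.3f} s")
print(f"user + system : {user + system:.3f} s")
print(f"process_time(): {p1 - p0:.3f} s   (user + system in one call)")
```

Comparing a "user only" figure from one language with a "user + system" figure from another gives exactly the kind of mismatch described above.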

Running the original tests as they are in NArray 0.5.9 yields the
following. (I picked representative results from several runs; eyeballing
them, the standard deviation between runs appeared to be on the order of
0.1 to 0.2 s.)
       
        ### Numeric 24.2 (24.2-8ubuntu2)
        ### Narray 0.5.9 (0.5.9-2)
        ### numpy 1.0.4 (1:1.0.4-6ubuntu3)
        ###
        ### All of these from Ubuntu 8.04 packages.
       
        $ time ruby mul.rb
        a = NArray.float(1000000):
        [ 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0,
        12.0, ... ]
        b = NArray.float(1000000):
        [ 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0,
        12.0, ... ]
        calculating c = a*b ...
         Time: 3.05 sec
       
       
        real 0m5.039s
        user 0m3.116s
        sys 0m1.564s
       
Obviously, the reported time here is the user time only!
       
        $ time python mul.py   # the old Numeric
        a.typecode: d ,  a.shape: (1000000,)
        b.typecode: d ,  b.shape: (1000000,)
        calculating c = a*b ...
          Time:   6.020 sec
       
        real 0m6.999s
        user 0m4.308s
        sys 0m2.164s
       
Whereas here it must be the sum of the user and system times!

Running tests for numpy and fixed time counting for Ruby:

        $ time python mul_numpy.py   # the new numpy
        a.typecode: float64 ,  a.shape: (1000000,)
        b.typecode: float64 ,  b.shape: (1000000,)
        calculating c = a*b ...
          Time:   4.580 sec
       
        real 0m5.774s
        user 0m3.352s
        sys 0m1.996s
       
        $ time ruby mul_correct.rb  # using T.times.utime + T.times.stime
        a = NArray.float(1000000):
        [ 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0,
        12.0, ... ]
        b = NArray.float(1000000):
        [ 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0,
        12.0, ... ]
        calculating c = a*b ...
         Time: 4.57 sec
       
       
        real 0m5.045s
        user 0m3.060s
        sys 0m1.620s

I think this shows that there is no discernible difference between the
performance of numpy and Ruby's NArray. The performance of numpy and
NArray is indeed better than that of Numeric, but the difference is not
as large as the original benchmark led one to believe.
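
As a quick arithmetic check, the figures printed above can be tabulated side by side. The sketch below only re-uses numbers from the four runs shown in this message; the dictionary layout and labels are illustrative.

```python
# (time reported by the script, user, sys) in seconds, copied from the runs above.
runs = {
    "ruby mul.rb (NArray, user only)":       (3.05, 3.116, 1.564),
    "python mul.py (Numeric)":               (6.020, 4.308, 2.164),
    "python mul_numpy.py (numpy)":           (4.580, 3.352, 1.996),
    "ruby mul_correct.rb (NArray, usr+sys)": (4.57, 3.060, 1.620),
}

totals = {}
for name, (reported, user, system) in runs.items():
    totals[name] = round(user + system, 3)
    print(f"{name:40s} reported {reported:5.2f}  "
          f"user {user:5.3f}  user+sys {totals[name]:5.3f}")
```

The uncorrected Ruby figure (3.05) sits next to the process user time (3.116), while the corrected one (4.57) sits next to user + sys (4.680). The Python figures fall below their whole-process user + sys totals, presumably because `time` also counts interpreter start-up and array setup outside the timed section.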

Benchmark files attached, in case someone wants to contest my analysis.

--
Pauli Virtanen



mul.py (372 bytes)
mul.rb (188 bytes)
mul_correct.rb (200 bytes)
mul_numpy.py (368 bytes)
mybench.py (434 bytes)
mybench.rb (656 bytes)
mybench_correct.rb (700 bytes)
mybench_numpy.py (426 bytes)
signature.asc (196 bytes)