how much does binary size matter?


how much does binary size matter?

ralfgommers
Hi all,

In https://github.com/numpy/numpy/pull/13207 a discussion started about the tradeoff between performance gain for one function vs increasing the size of a NumPy build by a couple of percent. We also discussed that in the community call on Wednesday and concluded that it may be useful to ask here for some more feedback. Beyond disk/memory usage and download bandwidth, are there specific problems that people are struggling with regarding size of numpy binaries?

Right now a wheel is 16 MB. If we increase that by 10%/50%/100% - are we causing a real problem for someone?
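To make the question concrete, here is a back-of-the-envelope sketch of what those increases mean in download time. The 16 MB figure is from this thread; the link speeds (1, 6.7, and 100 Mb/s) are illustrative assumptions, not measurements:

```python
# Download time for the current 16 MB wheel and the proposed increases,
# at a few illustrative link speeds (megabits per second).

def download_seconds(size_mb: float, link_mbit_per_s: float) -> float:
    """Seconds to transfer size_mb megabytes over the given link
    (8 bits per byte, assuming a steady transfer rate)."""
    return size_mb * 8 / link_mbit_per_s

current = 16.0
for increase in (0.0, 0.10, 0.50, 1.00):
    size = current * (1 + increase)
    times = ", ".join(
        f"{download_seconds(size, speed):6.1f} s @ {speed} Mb/s"
        for speed in (1, 6.7, 100)
    )
    print(f"+{increase:>4.0%}: {size:5.1f} MB -> {times}")
```

On a 1 Mb/s link the current wheel already takes over two minutes, so the percentages translate to minutes, not seconds, for the slowest users.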

Thanks,
Ralf


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: how much does binary size matter?

Éric Depagne
On Friday, 26 April 2019 at 11:10:56 SAST, Ralf Gommers wrote:
Hi Ralf,

>
> Right now a wheel is 16 MB. If we increase that by 10%/50%/100% - are we
> causing a real problem for someone?
Access to large bandwidth is not universal at all, and in many countries (I'd
even say in most countries around the world), 16 MB is a significant
amount of data, so increasing it is a burden.

Cheers,
Éric.

>
> Thanks,
> Ralf


--
Un clavier azerty en vaut deux (an AZERTY keyboard is worth two)
----------------------------------------------------------
Éric Depagne                            




Re: how much does binary size matter?

Ilhan Polat
Here is a baseline: https://en.wikipedia.org/wiki/List_of_countries_by_Internet_connection_speeds . Throttling those values to about 60% of the listed bandwidth gives a crude estimate of the average delay each extra 1 MB would cause worldwide.
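Ilhan's estimate can be sketched in a few lines. The country labels and speeds below are illustrative placeholders, not figures taken from the Wikipedia list, and the 60% throttle is the rough effective-rate assumption from his message:

```python
# Crude per-MB delay estimate: take a country's average connection speed,
# throttle it to 60% as an effective rate, and compute the seconds each
# additional MB of wheel size would add.

def delay_per_mb(avg_mbit_per_s: float, throttle: float = 0.6) -> float:
    """Seconds added per extra megabyte at the throttled rate."""
    effective = avg_mbit_per_s * throttle  # assumed usable bandwidth
    return 8 / effective                   # 1 MB = 8 megabits

# Hypothetical average speeds in Mb/s, for illustration only.
for country, speed in {"fast": 25.0, "median": 6.7, "slow": 1.5}.items():
    print(f"{country:>6} ({speed} Mb/s avg): ~{delay_per_mb(speed):.1f} s per extra MB")
```

At a 6.7 Mb/s average this works out to roughly two seconds per extra MB, but nearly nine seconds at 1.5 Mb/s.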




Re: how much does binary size matter?

Éric Depagne
On Friday, 26 April 2019 at 12:49:39 SAST, Ilhan Polat wrote:
Hi Ilhan,

That's an interesting link, but it gives averages, which are not a very
good indicator. I myself have a 100 Mb/s link where I live, yet Akamai
ranks my country at an average speed of 6.7 Mb/s, which means a lot of
people have connections that do not reach 1 Mb/s. Of course, many of those
people will not be interested in downloading numpy, so it might not be an issue.

Éric.


Re: how much does binary size matter?

Matthew Brett
Hi,

Obviously this is a trade-off; if we can increase binary size we can
add more optimizations, for more platforms, and development will be a
little faster, because we won't have to spend time optimizing for
binary size.

If people on slow internet connections had to download numpy multiple
times, I guess this would be an issue, but I've lived behind
excruciatingly slow and unreliable connections, in Cuba, and for a
download you do a few times a year I very much doubt there would be a
practical difference between 16 MB and, say, 20 MB. If it's 16 MB vs
50 MB, then I think it's worth having the discussion, with the relevant
trade-offs.
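Matthew's point can be put in numbers. Assuming a steady 1 Mb/s connection (an illustrative worst case, not a measured figure), the extra wait over the current 16 MB wheel is:

```python
# Extra download time of larger wheels over the current 16 MB one,
# on an assumed 1 Mb/s link.

def seconds_at(size_mb: float, mbit_per_s: float = 1.0) -> float:
    """Seconds to download size_mb megabytes at the given link speed."""
    return size_mb * 8 / mbit_per_s

base = seconds_at(16)
for size in (20, 50):
    extra = seconds_at(size) - base
    print(f"{size} MB wheel: {extra / 60:.1f} extra minutes vs 16 MB")
```

16 MB vs 20 MB is about half a minute of extra wait even at 1 Mb/s; 16 MB vs 50 MB is over four extra minutes, which is where the discussion starts to matter.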

Cheers,

Matthew


Re: how much does binary size matter?

Julian Taylor
In reply to Éric Depagne
Hi,
We understand that it can be a burden; of course a larger binary is bad,
but that bad usually comes with good, such as better performance or more
features.

How much of a burden is it, and where is the line between "I need twice as
long to download it", which is just annoying, and "I cannot use it anymore
because, for example, it no longer fits on my device"?

Are there actual environments, or do you know of any, where the size of the
numpy binary has an impact on whether it can be deployed, or where it is
more preferable for numpy to be small than to be fast or full of features?

This is interesting to us just to judge how to handle marginal
improvements that come with relatively large increases in binary size.
With some use-case information we can better estimate where it is
worthwhile to think about alternatives or to spend more benchmarking
work to determine the most important cases, and where it is not.

If there are such environments, there are options other than blocking or
complicating future enhancements, for example a compile-time option to
make the binary smaller again by, e.g., stripping out hardware-specific
code or avoiding size-expensive optimizations. But without concrete use
cases this does not appear to be something worth spending time on.
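To illustrate the kind of compile-time switch meant here, a minimal sketch of a build-script toggle. Everything in it is hypothetical: `NPY_SMALL_BUILD` is a made-up environment variable, not a real NumPy build option, and the file names are invented stand-ins for size-expensive, hardware-specific sources:

```python
# Hypothetical sketch: a build script that drops size-expensive,
# hardware-specific sources when a small build is requested.

import os

ALL_SOURCES = ["core.c", "loops_generic.c", "loops_avx2.c", "loops_avx512.c"]
SIZE_EXPENSIVE = {"loops_avx2.c", "loops_avx512.c"}  # invented names

def select_sources(env: dict) -> list:
    """Return the source list, stripping hardware-specific files when
    the (hypothetical) small-build flag is set."""
    if env.get("NPY_SMALL_BUILD") == "1":
        return [s for s in ALL_SOURCES if s not in SIZE_EXPENSIVE]
    return ALL_SOURCES

print(select_sources(os.environ))                 # full build by default
print(select_sources({"NPY_SMALL_BUILD": "1"}))   # generic sources only
```

The point of the sketch is only that such a switch is cheap to add on the build side; the hard part, as the paragraph above says, is knowing whether any real deployment would use it.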

On 26.04.19 11:47, Éric Depagne wrote:

> Le vendredi 26 avril 2019, 11:10:56 SAST Ralf Gommers a écrit :
> Hi Ralf,
>
>>
>> Right now a wheel is 16 MB. If we increase that by 10%/50%/100% - are we
>> causing a real problem for someone?
> Access to large bandwidth is not universal at all, and in many countries (I'd
> even say in most of the countries around the world), 16 Mb is a significant
> amount of data so increasing it is a burden.
>

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: how much does binary size matter?

Éric Depagne
On Friday, 26 April 2019 at 21:13:22 SAST, Julian Taylor wrote:
Hi all,

It seems that my message was misinterpreted, so let me clarify a few things.

I'm not saying that increasing the size of the binary is a bad thing,
especially if there are lots of improvements that caused this increase.

My message was just a note to be sure that bandwidth availability is not
forgotten, as it's fairly easy (I'm guilty of that myself) to take for granted
that downloads will always be fast and hassle-free.

Concerning the environments where it matters: I currently live in South Africa,
and even if things are improving fast in terms of bandwidth availability,
there is still a long way to go before people can get fast access at home for
an affordable fee. So I'd say that environments where the size of binaries has
no impact are the clear minority here.

That said, as I've raised the issue I wanted to raise and you are aware of it,
I see no reason to make this thread any longer.

Cheers,
Éric.


Re: how much does binary size matter?

ralfgommers


On Sat, Apr 27, 2019 at 8:04 PM Éric Depagne <[hidden email]> wrote:
> On Friday, 26 April 2019 at 21:13:22 SAST, Julian Taylor wrote:
> Hi all,
>
> It seems that my message was misinterpreted, so let me clarify a few things.
>
> I'm not saying that increasing the size of the binary is a bad thing,
> especially if there are lots of improvements that caused this increase.
>
> My message was just a note to be sure that bandwidth availability is not
> forgotten, as it's fairly easy (I'm guilty of that myself) to take for
> granted that downloads will always be fast and hassle-free.

Thanks Éric, your point is clear and we definitely won't forget to consider users on older hardware or behind slow connections.

> Concerning the environments where it matters: I currently live in South
> Africa, and even if things are improving fast in terms of bandwidth
> availability, there is still a long way to go before people can get fast
> access at home for an affordable fee. So I'd say that environments where
> the size of binaries has no impact are the clear minority here.
>
> That said, as I've raised the issue I wanted to raise and you are aware of
> it, I see no reason to make this thread any longer.
>
> Cheers,
> Éric.
>
>> Hi,
>> We understand that it can be a burden; of course a larger binary is bad,
>> but that bad usually comes with good, such as better performance or more
>> features.
>>
>> How much of a burden is it, and where is the line between "I need twice
>> as long to download it", which is just annoying, and "I cannot use it
>> anymore because, for example, it no longer fits on my device"?
>>
>> Are there actual environments, or do you know of any, where the size of
>> the numpy binary has an impact on whether it can be deployed, or where it
>> is more preferable for numpy to be small than to be fast or full of
>> features?

Here is my take on it:
The case of this PR is borderline. If we wrote down a hard criterion, this PR likely would not meet it. Rationale: if we got 100 PRs like this, the average performance of numpy for a user would not change all that much, yet we would by then have blown up the size NumPy takes up (disk/RAM/download/etc.) by a factor of 2.4. *However*, we won't get 100 PRs like this, so judging this one based on such a criterion isn't quite right. We have this PR now, and it's good to go. Presumably it helps @qwhelan significantly, so I'm +0.5 for merging it.

Also note that Cython has the same problem: taking one function and putting it in a .pyx file adds a huge amount of bloat (example: `scipy.ndimage.label`). We had the same discussion there, but it never became a practical issue because there were not many other PRs like that.

tl;dr: let's merge this, and let's try not to make these kinds of changes a habit

Cheers,
Ralf

