The mu.py script will keep running and never end.

Hongyi Zhao
Hi,

My environment is Ubuntu 20.04 with Python 3.8.3 managed by pyenv. I
tried to run the script
<https://notebook.rcc.uchicago.edu/files/acs.chemmater.9b05047/Data/bulk/dft/mu.py>,
but it keeps running and never ends. When I use 'Ctrl + c' to
terminate it, it gives the following output:

$ python mu.py
[-10.999 -10.999 -10.999 ...  20.     20.     20.   ] [4.973e-84 4.973e-84 4.973e-84 ... 4.973e-84 4.973e-84 4.973e-84]

I had to terminate it, and obtained the following traceback:

^CTraceback (most recent call last):
  File "mu.py", line 38, in <module>
    integrand=DOS*fermi_array(energy,mu,kT)
  File "/home/werner/.pyenv/versions/datasci/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2108, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/home/werner/.pyenv/versions/datasci/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2192, in _vectorize_call
    outputs = ufunc(*inputs)
  File "mu.py", line 8, in fermi
    return 1./(exp((E-mu)/kT)+1)
KeyboardInterrupt


Any help or hints on this problem would be highly appreciated.

Regards,
--
Hongyi Zhao <[hidden email]>
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: The mu.py script will keep running and never end.

Robert Kern-2
You don't need to use vectorize() on fermi(). fermi() will work just fine on arrays and should be much faster. 
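
A minimal sketch of this suggestion, assuming the `exp` used inside fermi() is NumPy's. The `energy` and `DOS` arrays and the values of `mu` and `kT` below are hypothetical stand-ins, not the data that mu.py actually loads. np.vectorize() calls the Python function once per element, while the plain function lets np.exp process the whole array in one call:

```python
import numpy as np


def fermi(E, mu, kT):
    """Fermi-Dirac occupation; works elementwise on scalars and on arrays."""
    return 1.0 / (np.exp((E - mu) / kT) + 1.0)


# Hypothetical stand-in data (assumption: the real script reads these from files).
energy = np.linspace(-1.0, 1.0, 2001)
DOS = np.ones_like(energy)
mu, kT = 0.1, 0.05

# Slow path: np.vectorize is a convenience wrapper around a Python-level loop.
fermi_array = np.vectorize(fermi)
slow = DOS * fermi_array(energy, mu, kT)

# Fast path: call fermi() directly on the array, no wrapper needed.
fast = DOS * fermi(energy, mu, kT)
```

Both paths should give the same integrand; only the number of Python-level function calls differs.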

On Sat, Oct 10, 2020, 8:23 AM Hongyi Zhao <[hidden email]> wrote:
> My environment is Ubuntu 20.04 and python 3.8.3 managed by pyenv. I
> try to run the script
> <https://notebook.rcc.uchicago.edu/files/acs.chemmater.9b05047/Data/bulk/dft/mu.py>,
> but it will keep running and never end.

Re: The mu.py script will keep running and never end.

Hongyi Zhao
On Sun, Oct 11, 2020 at 1:48 AM Robert Kern <[hidden email]> wrote:
>
> You don't need to use vectorize() on fermi(). fermi() will work just fine on arrays and should be much faster.

Yes, it really does the trick. See the following for the benchmark
based on your suggestion:

$ time python mu.py
[-10.999 -10.999 -10.999 ...  20.     20.     20.   ] [4.973e-84 4.973e-84 4.973e-84 ... 4.973e-84 4.973e-84 4.973e-84]

real    0m41.056s
user    0m43.970s
sys    0m3.813s


But are there any ways to further improve/increase efficiency?

Regards,
HY


Re: The mu.py script will keep running and never end.

Andrea Gavana
Hi,

On Sun, 11 Oct 2020 at 00.27, Hongyi Zhao <[hidden email]> wrote:
> But are there any ways to further improve/increase efficiency?


I believe it will get a bit better if you don’t column_stack an array 6000 times - maybe pre-allocate your output first?

Andrea.

Re: The mu.py script will keep running and never end.

Andrea Gavana


On Sun, 11 Oct 2020 at 07.14, Andrea Gavana <[hidden email]> wrote:
> I believe it will get a bit better if you don’t column_stack an array 6000 times - maybe pre-allocate your output first?


I’m sorry, scratch that: I saw a ghost whitespace in front of your column_stack call, which made me think you were stacking your results many times over. That is not the case.






Re: The mu.py script will keep running and never end.

Hongyi Zhao
On Sun, Oct 11, 2020 at 1:33 PM Andrea Gavana <[hidden email]> wrote:

> I’m sorry, scratch that: I’ve seen a ghost white space in front of your column_stack call and made me think you were stacking your results very many times, which is not the case.

I'm still not clear on your solution to this problem. Could you
please post the corresponding snippet of your enhancement here?

Regards,
HY


Re: The mu.py script will keep running and never end.

Andrea Gavana


On Sun, 11 Oct 2020 at 07.52, Hongyi Zhao <[hidden email]> wrote:
> Still not so clear on your solutions for this problem. Could you
> please post here the corresponding snippet of your enhancement?

I have no solution. I originally thought you were calling “column_stack” 6000 times in the loop, but that is not the case; I was mistaken. My apologies for that.

The timing of your approach is highly dependent on the size of your “energy” and “DOS” arrays, not to mention calling trapz 6000 times in a loop. Maybe there’s a better way to do it with another approach, but at the moment I can’t think of one...




Re: The mu.py script will keep running and never end.

Hongyi Zhao
On Sun, Oct 11, 2020 at 2:02 PM Andrea Gavana <[hidden email]> wrote:

> I have no solution, I originally thought you were calling “column_stack” 6000 times in the loop, but that is not the case, I was mistaken. My apologies for that.
>
> The timings of your approach is highly dependent on the size of your “energy” and “DOS” array -

The sizes of the “energy” and “DOS” arrays are problem-related and
shouldn't be reduced arbitrarily.

> not to mention calling trapz 6000 times in a loop.

I'm currently thinking about parallelizing the execution of the for
loop, say, with joblib <https://github.com/joblib/joblib>, but I still
haven't figured out the corresponding code. If you have some
experience with this type of solution, could you please give me some
more hints?

>  Maybe there’s a better way to do it with another approach, but at the moment I can’t think of one...

Re: The mu.py script will keep running and never end.

Evgeni Burovski
The script seems to be computing the particle numbers for an array of chemical potentials.

Two ways of speeding it up, both of which are likely simpler than using dask:

First: use numpy

1. Construct mu_all outside the loop (np.linspace)
2. Arrange the integrands into a 2D array
3. Apply np.trapz along the axis that corresponds to a single integrand array
(Or avoid the overhead of trapz by implementing the trapezoid formula manually)

Second:

Move the loop into cython.
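
The first (NumPy) option can be sketched roughly as follows. This is an independent illustration under stated assumptions, not the code of mu.py: the helper name `particle_numbers`, the stand-in `energy` and `DOS` arrays, and the values of `kT` and `mu_all` are all hypothetical. It builds `mu_all` once, forms all integrands as one 2D array by broadcasting, and applies the trapezoid formula by hand along the energy axis:

```python
import numpy as np


def fermi(E, mu, kT):
    """Fermi-Dirac occupation; broadcasts over array arguments."""
    return 1.0 / (np.exp((E - mu) / kT) + 1.0)


def particle_numbers(energy, DOS, mu_all, kT):
    """Integrate DOS(E) * fermi(E, mu) over E for every mu at once."""
    # Shape (n_mu, n_E): one row of integrand values per chemical potential.
    integrand = DOS[np.newaxis, :] * fermi(
        energy[np.newaxis, :], mu_all[:, np.newaxis], kT
    )
    # Trapezoid rule written out manually along the energy axis (axis=1).
    dE = np.diff(energy)
    return 0.5 * np.sum((integrand[:, 1:] + integrand[:, :-1]) * dE, axis=1)


# Hypothetical stand-in data (assumption: the real arrays come from the DFT output).
energy = np.linspace(-1.0, 1.0, 401)
DOS = np.exp(-energy**2)
kT = 0.05
mu_all = np.linspace(-0.5, 0.5, 60)  # constructed once, outside any loop

N_all = particle_numbers(energy, DOS, mu_all, kT)

# Reference: the same integral computed one mu at a time in a Python loop.
N_loop = np.empty_like(mu_all)
for i, mu in enumerate(mu_all):
    y = DOS * fermi(energy, mu, kT)
    N_loop[i] = 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(energy))
```

The 2D integrand holds n_mu * n_E floats, so for very large arrays it may be necessary to process `mu_all` in chunks to keep memory use reasonable.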




On Sun, 11 Oct 2020 at 9:32, Hongyi Zhao <[hidden email]> wrote:
> I'm currently thinking on parallelization the execution of the for
> loop, say, with joblib <https://github.com/joblib/joblib>, but I still
> haven't figured out the corresponding codes.

Re: The mu.py script will keep running and never end.

Evgeni Burovski
On Sun, Oct 11, 2020 at 9:55 AM Evgeni Burovski
<[hidden email]> wrote:

>
> The script seems to be computing the particle numbers for an array of chemical potentials.
>
> Two ways of speeding it up, both likely simpler than using dask:
>
> First: use numpy
>
> 1. Move constructing mu_all out of the loop (np.linspace)
> 2. Arrange the integrands into a 2d array
> 3. np.trapz along an axis which corresponds to a single integrand array
> (Or avoid the overhead of trapz by just implementing the trapezoid formula manually)


Roughly like this:
https://gist.github.com/ev-br/0250e4eee461670cf489515ee427eb99
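
The three numbered steps can be sketched as below. This is a minimal illustration and not the contents of the linked gist: the energy grid, the toy DOS, kT, and the range of mu are placeholder values chosen for the example, and the trapezoid rule is written out by hand rather than calling np.trapz.

```python
import numpy as np

def fermi(E, mu, kT):
    # Fermi-Dirac occupation; works elementwise on arrays via broadcasting.
    return 1.0 / (np.exp((E - mu) / kT) + 1.0)

def particle_numbers(energy, dos, mu_all, kT):
    # Step 2: one integrand per row, shape (len(mu_all), len(energy)).
    integrand = dos[None, :] * fermi(energy[None, :], mu_all[:, None], kT)
    # Step 3: trapezoid rule along the energy axis, written out manually.
    dE = np.diff(energy)
    return 0.5 * np.sum((integrand[:, 1:] + integrand[:, :-1]) * dE, axis=1)

# Placeholder data (assumptions, not taken from mu.py).
energy = np.linspace(-11.0, 20.0, 2001)
dos = np.exp(-0.5 * energy**2)   # toy density of states
kT = 0.1

# Step 1: build all chemical potentials once, outside any loop.
mu_all = np.linspace(-5.0, 5.0, 200)

N = particle_numbers(energy, dos, mu_all, kT)
print(N.shape)
```

The whole (mu, energy) grid is held in memory at once, which is fine for a few thousand points per axis; for much larger grids the mu values could be processed in chunks.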



> Second:
>
> Move the loop into cython.

Re: The mu.py script will keep running and never end.

Hongyi Zhao
In reply to this post by Evgeni Burovski
On Sun, Oct 11, 2020 at 2:56 PM Evgeni Burovski
<[hidden email]> wrote:
>
> The script seems to be computing the particle numbers for an array of chemical potentials.
>
> Two ways of speeding it up, both likely simpler than using dask:

What do you mean by *dask*?

>
> First: use numpy
>
> 1. Move constructing mu_all out of the loop (np.linspace)
> 2. Arrange the integrands into a 2d array
> 3. np.trapz along an axis which corresponds to a single integrand array
> (Or avoid the overhead of trapz by just implementing the trapezoid formula manually)
>
> Second:
>
> Move the loop into cython.

Will this be more efficient than parallelizing the loop with a
Python module such as joblib?


Re: The mu.py script will keep running and never end.

Hongyi Zhao
In reply to this post by Evgeni Burovski
On Sun, Oct 11, 2020 at 3:42 PM Evgeni Burovski
<[hidden email]> wrote:

> Roughly like this:
> https://gist.github.com/ev-br/0250e4eee461670cf489515ee427eb99

I can't find the Cython part you suggested, i.e., moving the loop
into Cython. Also, I have learned that NumPy arrays are optimized
and perform close to C/C++.
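
A small sketch of why plain NumPy can be enough here: the fermi expression shown in the traceback already broadcasts over arrays, so wrapping it in np.vectorize only adds a Python-level loop over the elements. The grid size and the values of mu and kT below are placeholders, not values from mu.py.

```python
import numpy as np

def fermi(E, mu, kT):
    # Same expression as in the traceback, using NumPy's exp.
    return 1.0 / (np.exp((E - mu) / kT) + 1.0)

# np.vectorize calls fermi once per element from Python.
fermi_array = np.vectorize(fermi)

energy = np.linspace(-11.0, 20.0, 5001)  # placeholder grid
mu, kT = 0.5, 0.1                        # placeholder values

direct = fermi(energy, mu, kT)        # one call on the whole array
looped = fermi_array(energy, mu, kT)  # element-by-element calls

print(np.allclose(direct, looped))
```

Both calls return the same values; only the amount of Python overhead differs.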


Re: The mu.py script will keep running and never end.

Evgeni Burovski


Sun, Oct 11, 2020, 13:31 Hongyi Zhao <[hidden email]>:
> I can't find the Cython part you suggested, i.e., moving the loop
> into Cython. Also, I have learned that NumPy arrays are optimized
> and perform close to C/C++.

Basically, it seems pure numpy is OK here and nothing more sophisticated is needed.


Re: The mu.py script will keep running and never end.

Hongyi Zhao
In reply to this post by Evgeni Burovski
On Sun, Oct 11, 2020 at 3:42 PM Evgeni Burovski
<[hidden email]> wrote:

> Roughly like this:
> https://gist.github.com/ev-br/0250e4eee461670cf489515ee427eb99

I tried to run this notebook, but none of the functions, variables,
or methods can be found when I invoke them in separate cells. See
here for more details:

https://github.com/hongyi-zhao/test/blob/master/fermi_integrate_np.ipynb

Any hints for this problem?

Regards,
HY


Re: The mu.py script will keep running and never end.

Evgeni Burovski
Just remove %%timeit 
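
A likely explanation, stated as an assumption about the notebook rather than something checked against it: %%timeit runs the cell body inside a timing function, so names assigned in that cell are not left behind for later cells. The standard-library timeit module behaves the same way and is used below as a stand-in for the cell magic; the name mu_all is only an example.

```python
import timeit

# Time a statement that assigns a name, as a %%timeit cell would.
elapsed = timeit.timeit("mu_all = list(range(100))", number=1000)

# The assignment happened inside timeit's own function scope,
# so the name is not defined here afterwards.
leaked = "mu_all" in globals()
print(leaked)
```

Running the definitions once in an ordinary cell, and timing only the call in a separate cell, keeps the names available.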

Mon, Oct 12, 2020, 5:56 Hongyi Zhao <[hidden email]>:
> I tried to run this notebook, but none of the functions, variables,
> or methods can be found when I invoke them in separate cells. See
> here for more details:
>
> https://github.com/hongyi-zhao/test/blob/master/fermi_integrate_np.ipynb
>
> Any hints for this problem?
>
> Regards,
> HY

>
>
>
> > Second:
> >
> > Move the loop into cython.
> >
> >
> >
> >
> > вс, 11 окт. 2020 г., 9:32 Hongyi Zhao <[hidden email]>:
> >>
> >> On Sun, Oct 11, 2020 at 2:02 PM Andrea Gavana <[hidden email]> wrote:
> >> >
> >> >
> >> >
> >> > On Sun, 11 Oct 2020 at 07.52, Hongyi Zhao <[hidden email]> wrote:
> >> >>
> >> >> On Sun, Oct 11, 2020 at 1:33 PM Andrea Gavana <[hidden email]> wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Sun, 11 Oct 2020 at 07.14, Andrea Gavana <[hidden email]> wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> On Sun, 11 Oct 2020 at 00.27, Hongyi Zhao <[hidden email]> wrote:
> >> >> >>>
> >> >> >>> On Sun, Oct 11, 2020 at 1:48 AM Robert Kern <[hidden email]> wrote:
> >> >> >>> >
> >> >> >>> > You don't need to use vectorize() on fermi(). fermi() will work just fine on arrays and should be much faster.
> >> >> >>>
> >> >> >>> Yes, it really does the trick. See the following for the benchmark
> >> >> >>> based on your suggestion:
> >> >> >>>
> >> >> >>> $ time python mu.py
> >> >> >>> [-10.999 -10.999 -10.999 ...  20.     20.     20.   ] [4.973e-84
> >> >> >>> 4.973e-84 4.973e-84 ... 4.973e-84 4.973e-84 4.973e-84]
> >> >> >>>
> >> >> >>> real    0m41.056s
> >> >> >>> user    0m43.970s
> >> >> >>> sys    0m3.813s
> >> >> >>>
> >> >> >>>
> >> >> >>> But are there any ways to further improve/increase efficiency?
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> I believe it will get a bit better if you don’t column_stack an array 6000 times - maybe pre-allocate your output first?
> >> >> >>
> >> >> >> Andrea.
> >> >> >
> >> >> >
> >> >> >
> >> >> > I’m sorry, scratch that: a stray whitespace in front of your column_stack call made me think you were stacking your results many times, which is not the case.
> >> >>
> >> >> Still not so clear on your solutions for this problem. Could you
> >> >> please post here the corresponding snippet of your enhancement?
> >> >
> >> >
> >> > I have no solution, I originally thought you were calling “column_stack” 6000 times in the loop, but that is not the case, I was mistaken. My apologies for that.
> >> >
> >> > The timing of your approach is highly dependent on the size of your “energy” and “DOS” arrays -
> >>
> >> The size of the “energy” and “DOS” array is Problem-related and
> >> shouldn't be reduced arbitrarily.
> >>
> >> > not to mention calling trapz 6000 times in a loop.
> >>
> >> I'm currently thinking about parallelizing the execution of the for
> >> loop, say, with joblib <https://github.com/joblib/joblib>, but I still
> >> haven't figured out the corresponding code. If you have some
> >> experience with this type of solution, could you please give me some
> >> more hints?
> >>
> >> >  Maybe there’s a better way to do it with another approach, but at the moment I can’t think of one...
> >> >
> >> >>
> >> >>
> >> >> Regards,
> >> >> HY
> >> >> >
> >> >> >>
> >> >> >>
> >> >> >>>
> >> >> >>>
> >> >> >>> Regards,
> >> >> >>> HY
> >> >> >>>
> >> >> >>> >
> >> >> >>> > On Sat, Oct 10, 2020, 8:23 AM Hongyi Zhao <[hidden email]> wrote:
> >> >> >>> >>
> >> >> >>> >> Hi,
> >> >> >>> >>
> >> >> >>> >> My environment is Ubuntu 20.04 and python 3.8.3 managed by pyenv. I
> >> >> >>> >> try to run the script
> >> >> >>> >> <https://notebook.rcc.uchicago.edu/files/acs.chemmater.9b05047/Data/bulk/dft/mu.py>,
> >> >> >>> >> but it will keep running and never end. When I use 'Ctrl + c' to
> >> >> >>> >> terminate it, it will give the following output:
> >> >> >>> >>
> >> >> >>> >> $ python mu.py
> >> >> >>> >> [-10.999 -10.999 -10.999 ...  20.     20.     20.   ] [4.973e-84
> >> >> >>> >> 4.973e-84 4.973e-84 ... 4.973e-84 4.973e-84 4.973e-84]
> >> >> >>> >>
> >> >> >>> >> I have to terminate it and obtained the following information:
> >> >> >>> >>
> >> >> >>> >> ^CTraceback (most recent call last):
> >> >> >>> >>   File "mu.py", line 38, in <module>
> >> >> >>> >>     integrand=DOS*fermi_array(energy,mu,kT)
> >> >> >>> >>   File "/home/werner/.pyenv/versions/datasci/lib/python3.8/site-packages/numpy/lib/function_base.py",
> >> >> >>> >> line 2108, in __call__
> >> >> >>> >>     return self._vectorize_call(func=func, args=vargs)
> >> >> >>> >>   File "/home/werner/.pyenv/versions/datasci/lib/python3.8/site-packages/numpy/lib/function_base.py",
> >> >> >>> >> line 2192, in _vectorize_call
> >> >> >>> >>     outputs = ufunc(*inputs)
> >> >> >>> >>   File "mu.py", line 8, in fermi
> >> >> >>> >>     return 1./(exp((E-mu)/kT)+1)
> >> >> >>> >> KeyboardInterrupt
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> Any helps and hints for this problem will be highly appreciated?
> >> >> >>> >>
> >> >> >>> >> Regards,
> >> >> >>> >> --
> >> >> >>> >> Hongyi Zhao <[hidden email]>

Re: The mu.py script will keep running and never end.

Hongyi Zhao
In reply to this post by Evgeni Burovski
On Sun, Oct 11, 2020 at 2:56 PM Evgeni Burovski
<[hidden email]> wrote:

>
> The script seems to be computing the particle numbers for an array of chemical potentials.
>
> Two ways of speeding it up, both likely simpler than using dask:
>
> First: use numpy
>
> 1. Move constructing mu_all out of the loop (np.linspace)
> 2. Arrange the integrands into a 2d array
> 3. np.trapz along an axis which corresponds to a single integrand array
> (Or avoid the overhead of trapz by just implementing the trapezoid formula manually)

Could you please explain in more detail why doing so can improve
performance?
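
The three numbered steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the gist: the `energy`, `DOS`, `kT` and `mu_all` values are toy stand-ins, and only `fermi` follows the expression shown in the traceback. The idea is that a single broadcasted expression replaces thousands of Python-level loop iterations, at the cost of holding a 2D array of shape (number of mu values, number of energy points) in memory.

```python
import numpy as np

def fermi(E, mu, kT):
    # Fermi-Dirac occupation, same expression as in the mu.py traceback.
    return 1.0 / (np.exp((E - mu) / kT) + 1.0)

# Toy stand-ins (assumptions): the real grid and DOS come from the data files.
energy = np.linspace(-11.0, 20.0, 2001)
DOS = np.exp(-0.5 * (energy - 5.0) ** 2)
kT = 0.025
mu_all = np.linspace(10.3, 10.9, 61)   # step 1: built once, outside any loop

# Step 2: 2D integrand via broadcasting; row i is DOS * fermi(energy, mu_all[i], kT).
integrand = DOS[None, :] * fermi(energy[None, :], mu_all[:, None], kT)

# Step 3: trapezoid rule along the energy axis, written out by hand.
dE = np.diff(energy)
N_all = 0.5 * np.sum((integrand[:, 1:] + integrand[:, :-1]) * dE, axis=1)

# Reference: the loop style, one 1D integration per mu.
N_loop = np.array([
    0.5 * np.sum((f[1:] + f[:-1]) * dE)
    for f in (DOS * fermi(energy, mu, kT) for mu in mu_all)
])
print(np.allclose(N_all, N_loop))
```

Whether this is actually faster depends on the array sizes: for very large grids the 2D intermediate may no longer fit comfortably in memory.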

>
> Second:
>
> Move the loop into cython.



--
Hongyi Zhao <[hidden email]>

Re: The mu.py script will keep running and never end.

Hongyi Zhao
In reply to this post by Evgeni Burovski
On Sun, Oct 11, 2020 at 3:42 PM Evgeni Burovski
<[hidden email]> wrote:

>
> On Sun, Oct 11, 2020 at 9:55 AM Evgeni Burovski
> <[hidden email]> wrote:
> >
> > The script seems to be computing the particle numbers for an array of chemical potentials.
> >
> > Two ways of speeding it up, both likely simpler than using dask:
> >
> > First: use numpy
> >
> > 1. Move constructing mu_all out of the loop (np.linspace)
> > 2. Arrange the integrands into a 2d array
> > 3. np.trapz along an axis which corresponds to a single integrand array
> > (Or avoid the overhead of trapz by just implementing the trapezoid formula manually)
>
>
> Roughly like this:
> https://gist.github.com/ev-br/0250e4eee461670cf489515ee427eb99

I've compared the real execution time of your method above with that of
the original Python script, modified to call fermi() directly without
applying vectorize() to it. Very surprisingly, the latter is more
efficient than the former; see the following for more info:

$ time python fermi_integrate_np.py
[[1.03000000e+01 4.55561775e+17]
 [1.03001000e+01 4.55561780e+17]
 [1.03002000e+01 4.55561786e+17]
 ...
 [1.08997000e+01 1.33654085e+21]
 [1.08998000e+01 1.33818034e+21]
 [1.08999000e+01 1.33982054e+21]]

real    1m8.797s
user    0m47.204s
sys    0m27.105s
$ time python mu.py
[[1.03000000e+01 4.55561775e+17]
 [1.03001000e+01 4.55561780e+17]
 [1.03002000e+01 4.55561786e+17]
 ...
 [1.08997000e+01 1.33654085e+21]
 [1.08998000e+01 1.33818034e+21]
 [1.08999000e+01 1.33982054e+21]]

real    0m38.829s
user    0m41.541s
sys    0m3.399s

So I think the benchmark dataset you used for testing code efficiency
is not very representative. What is your view of these test
results?

Regards,
HY
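
One possible explanation, not confirmed anywhere in this thread: the large `sys` time of the 2D version may come from allocating the full (number of mu values) x (number of energy points) array at once. Under that assumption, a sketch that processes the mu values in chunks keeps the intermediate array bounded. The function name `particle_numbers`, the `chunk` parameter, and the toy data are all hypothetical.

```python
import numpy as np

def fermi(E, mu, kT):
    # Fermi-Dirac occupation, same expression as in the mu.py traceback.
    return 1.0 / (np.exp((E - mu) / kT) + 1.0)

def particle_numbers(energy, DOS, mu_all, kT, chunk=256):
    """Integrate DOS * fermi over energy for each mu, `chunk` values at a time."""
    dE = np.diff(energy)
    out = np.empty(mu_all.size)
    for start in range(0, mu_all.size, chunk):
        mu = mu_all[start:start + chunk, None]            # column of mu values
        f = DOS[None, :] * fermi(energy[None, :], mu, kT)  # at most chunk rows
        out[start:start + chunk] = 0.5 * np.sum((f[:, 1:] + f[:, :-1]) * dE, axis=1)
    return out

# Toy stand-ins (assumptions), not the real DOS data.
energy = np.linspace(-11.0, 20.0, 2001)
DOS = np.exp(-0.5 * (energy - 5.0) ** 2)
mu_all = np.linspace(10.3, 10.9, 1000)

N_chunked = particle_numbers(energy, DOS, mu_all, kT=0.025, chunk=128)
N_full = particle_numbers(energy, DOS, mu_all, kT=0.025, chunk=mu_all.size)
print(np.allclose(N_chunked, N_full))
```

The chunk size trades Python loop overhead against peak memory; a suitable value would have to be found by measurement on the real data.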




--
Hongyi Zhao <[hidden email]>

Re: The mu.py script will keep running and never end.

Andrea Gavana
Hi,

On Mon, 12 Oct 2020 at 14:38, Hongyi Zhao <[hidden email]> wrote:
So I think the benchmark dataset you used for testing code efficiency
is not very representative. What is your view of these test
results?


Evgeni has provided an interesting example of how to speed up your code - granted, he used toy data but the improvement is real. As far as I can see, you haven't specified how big your DOS etc. vectors are, so it's not obvious how to draw any conclusions. I find it highly puzzling that his implementation appears to be slower than your original code.

In any case, if performance is so paramount for you, then I would suggest moving in the direction Evgeni was proposing, i.e. shifting your implementation to C/Cython or Fortran/f2py. I had much better results myself using Fortran/f2py than pure NumPy or C/Cython, but this is mostly because my knowledge of Cython is quite limited. That said, your problem should be fairly easy to implement in a compiled language.

Andrea.

 


Re: The mu.py script will keep running and never end.

Hongyi Zhao
On Mon, Oct 12, 2020 at 9:33 PM Andrea Gavana <[hidden email]> wrote:

>
> Hi,
>
> On Mon, 12 Oct 2020 at 14:38, Hongyi Zhao <[hidden email]> wrote:
>>
>> On Sun, Oct 11, 2020 at 3:42 PM Evgeni Burovski
>> <[hidden email]> wrote:
>> >
>> > On Sun, Oct 11, 2020 at 9:55 AM Evgeni Burovski
>> > <[hidden email]> wrote:
>> > >
>> > > The script seems to be computing the particle numbers for an array of chemical potentials.
>> > >
>> > > Two ways of speeding it up, both likely simpler than using dask:
>> > >
>> > > First: use numpy
>> > >
>> > > 1. Move constructing mu_all out of the loop (np.linspace)
>> > > 2. Arrange the integrands into a 2d array
>> > > 3. np.trapz along an axis which corresponds to a single integrand array
>> > > (Or avoid the overhead of trapz by just implementing the trapezoid formula manually)
>> >
>> >
>> > Roughly like this:
>> > https://gist.github.com/ev-br/0250e4eee461670cf489515ee427eb99
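For readers who cannot open the gist, here is a minimal sketch of the three steps listed above. It is not the gist's actual content: the data are synthetic stand-ins, and the helper names trapezoid() and particle_numbers() are made up for illustration. Only fermi(), energy, DOS, kT and mu_all follow the names used in this thread.

```python
import numpy as np


def fermi(E, mu, kT):
    """Fermi-Dirac occupation; plain NumPy, so it broadcasts over arrays."""
    return 1.0 / (np.exp((E - mu) / kT) + 1.0)


def trapezoid(y, x):
    """Trapezoid rule written out by hand along the last axis of y."""
    dx = np.diff(x)
    return 0.5 * np.sum((y[..., 1:] + y[..., :-1]) * dx, axis=-1)


def particle_numbers(energy, dos, mu_all, kT):
    """Integrate dos * fermi over energy for every mu at once."""
    # Shape (n_mu, n_energy): one row of integrand values per chemical potential.
    integrand = dos * fermi(energy, mu_all[:, np.newaxis], kT)
    return trapezoid(integrand, energy)


# Synthetic stand-in data (assumption: the real energy/DOS arrays are read from a file).
energy = np.linspace(-11.0, 20.0, 2001)
dos = np.sqrt(np.clip(energy, 0.0, None))  # toy density of states
kT = 0.025
mu_all = np.linspace(10.3, 10.9, 50)  # built once, outside any loop

# Vectorized version: a single 2-D integrand, integrated along the energy axis.
n_vec = particle_numbers(energy, dos, mu_all, kT)

# Reference version: one integral per mu in a Python loop, as in the original script.
n_loop = np.array([trapezoid(dos * fermi(energy, mu, kT), energy) for mu in mu_all])

print(np.column_stack((mu_all, n_vec))[:3])
```

One caveat with this layout: the 2-D integrand holds len(mu_all) * len(energy) floats at once. With several thousand values of mu and a long energy grid, that temporary array can become large, which might be one reason a fully vectorized version spends more system time than the loop. Processing mu_all in chunks would cap the memory use while keeping most of the benefit.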
>>
>> I've compared the real execution time of your version above with that
>> of the original script, modified to call fermi() directly without
>> wrapping it in vectorize(). Very surprisingly, the latter is faster
>> than the former; see the following for more info:
>>
>> $ time python fermi_integrate_np.py
>> [[1.03000000e+01 4.55561775e+17]
>>  [1.03001000e+01 4.55561780e+17]
>>  [1.03002000e+01 4.55561786e+17]
>>  ...
>>  [1.08997000e+01 1.33654085e+21]
>>  [1.08998000e+01 1.33818034e+21]
>>  [1.08999000e+01 1.33982054e+21]]
>>
>> real    1m8.797s
>> user    0m47.204s
>> sys    0m27.105s
>> $ time python mu.py
>> [[1.03000000e+01 4.55561775e+17]
>>  [1.03001000e+01 4.55561780e+17]
>>  [1.03002000e+01 4.55561786e+17]
>>  ...
>>  [1.08997000e+01 1.33654085e+21]
>>  [1.08998000e+01 1.33818034e+21]
>>  [1.08999000e+01 1.33982054e+21]]
>>
>> real    0m38.829s
>> user    0m41.541s
>> sys    0m3.399s
>>
>> So I think the benchmark dataset you used for testing the code's
>> efficiency is not really appropriate. What is your view on these
>> test results?
>
>
>
>   Evgeni has provided an interesting example of how to speed up your code. Granted, he used toy data, but the improvement is real. As far as I can see, you haven't specified how big your DOS etc. vectors are, so it's not obvious how to draw any conclusions. I find it highly puzzling that his implementation appears to be slower than your original code.
>
> In any case, if performance is so paramount for you, then I would suggest moving in the direction Evgeni was proposing, i.e. shifting your implementation to C/Cython or Fortran/f2py.

If so, I think that the C/Fortran based implementations should be more
efficient than the ones using Cython/f2py.


> I had much better results myself using Fortran/f2py than pure NumPy or C/Cython, but this is mostly because my knowledge of Cython is quite limited. That said, your problem should be fairly easy to implement in a compiled language.
>
> Andrea.
>
>
>>
>>
>> Regards,
>> HY
>>
>> >
>> >
>> >
>> > > Second:
>> > >
>> > > Move the loop into cython.
>> > >
>> > >
>> > >
>> > >

Re: The mu.py script will keep running and never end.

Andrea Gavana
Hi,

On Mon, 12 Oct 2020 at 16.22, Hongyi Zhao <[hidden email]> wrote:
If so, I think that the C/Fortran based implementations should be more
efficient than the ones using Cython/f2py.

That is not what I meant. What I meant is: write the time-consuming part of your code in C or Fortran, and then bridge it to Python using Cython or f2py.

Andrea.



