# help creating a reversed cumulative histogram

11 messages

## help creating a reversed cumulative histogram

Hello fellow NumPy users,

I posted some questions on histograms recently [1, 2] but still couldn't find a solution.

I am trying to create an inverse cumulative histogram [3], which should look like [4] but with the higher values at the left.

The classification should follow this example rule:

    class 1: 0
    all values > 0

    class 2: 10
    all values > 10

    class 3: 15
    all values > 15

    class 4: 20
    all values > 20

    class 5: 25
    all values > 25

    [...]

I could get this easily in a spreadsheet by creating a matrix with conditional statements (if VALUES_COL > CLASS_BOUNDARY; VALUES_COL; '-').

With Python (numpy or pylab) I was not successful. The plotted histogram envelope turned out to be just the inverted version of the curve created with the spreadsheet app.

I have briefly visualised the issue here [5]. I hope that this makes it more understandable.

Later I would like to sum and count all values in each bin as discussed in [2].

Could someone give me a pointer or hint on how to improve my code below to achieve the desired histogram?

Thanks a lot in advance,
Timmie

[1]: http://www.nabble.com/np.hist-with-masked-values-to25243905.html
[2]: http://www.nabble.com/histogram%3A-sum-up-values-in-each-bin-to25171265.html
[3]: http://en.wikipedia.org/wiki/Histogram#Cumulative_histogram
[4]: http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=126
[5]: http://www.scribd.com/doc/19371606/Distribution-Histogram

    ##### CODE #####
    normed = False
    values  # loaded data as array
    bins = 10

    ### sum
    ## taken from
    ## http://www.nabble.com/Scipy-and-statistics%3A-probability-density-function-to24683007.html#a24683304
    sums = np.histogram(values, weights=values,
                        normed=normed,
                        bins=bins)
    ecdf_sums = np.hstack([0.0, sums[0].cumsum() ])
    ecdf_inv_sums = ecdf_sums[::-1]
    pylab.plot(sums[1], ecdf_inv_sums)
    pylab.show()

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
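The spreadsheet rule described above (for each class boundary, keep every value strictly greater than it) can be written directly with a boolean mask. The following is only a minimal sketch: the function name `exceedance_table` and the sample values are made up for illustration and do not come from the thread.

```python
import numpy as np


def exceedance_table(values, boundaries):
    """For each class boundary, count and sum the values strictly above it.

    Hypothetical helper, mirroring the spreadsheet rule
    (if VALUES_COL > CLASS_BOUNDARY; VALUES_COL; '-').
    """
    values = np.asarray(values, dtype=float)
    boundaries = np.asarray(boundaries, dtype=float)
    # Row i of the mask marks the values that exceed boundaries[i].
    mask = values[np.newaxis, :] > boundaries[:, np.newaxis]
    counts = mask.sum(axis=1)
    sums = (mask * values).sum(axis=1)
    return counts, sums


# Made-up sample data; the boundaries are the class limits from the post.
values = [5.0, 12.0, 18.0, 22.0, 30.0]
boundaries = [0, 10, 15, 20, 25]
counts, sums = exceedance_table(values, boundaries)
print(counts)  # number of values above each boundary
print(sums)    # sum of the values above each boundary
```

The mask has one row per boundary, so it grows with `len(boundaries) * len(values)`; for large data sets a histogram-based approach, as discussed in the replies, is likely the better fit.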

## Re: help creating a reversed cumulative histogram

On Wed, Sep 2, 2009 at 18:15, Tim Michelsen<[hidden email]> wrote:
> Hello fellow NumPy users,
> I posted some questions on histograms recently [1, 2] but still couldn't
> find a solution.
>
> I am trying to create an inverse cumulative histogram [3] which shall
> look like [4] but with the higher values at the left.

Okay. That is completely different from what you've asked before.

> The classification shall follow this exemplary rule:
>
> class 1: 0
> all values > 0
>
> class 2: 10
> all values > 10
>
> class 3: 15
> all values > 15
>
> class 4: 20
> all values > 20
>
> class 5: 25
> all values > 25
>
> [...]
>
> I could get this easily in a spreadsheet by creating a matrix with
> conditional statements (if VALUES_COL > CLASS_BOUNDARY; VALUES_COL; '-').
>
> With python (numpy or pylab) I was not successful. The plotted histogram
> envelope turned out to be just the inverted curve as the one created
> with the spreadsheet app.

> sums = np.histogram(values, weights=values,
>                     normed=normed,
>                     bins=bins)
> ecdf_sums = np.hstack([0.0, sums[0].cumsum() ])
> ecdf_inv_sums = ecdf_sums[::-1]

This is not the kind of "inversion" that you are looking for. You want

    ecdf_inv_sums = ecdf_sums[-1] - ecdf_sums

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
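To make the difference between the two kinds of "inversion" concrete, here is a small sketch with made-up per-bin sums standing in for `sums[0]`. The array `bin_sums` and the names `mirrored` and `complement` are illustrative assumptions, not part of the original code.

```python
import numpy as np

# Made-up per-bin sums, standing in for sums[0] from np.histogram.
bin_sums = np.array([1.0, 2.0, 3.0, 4.0])

# Cumulative sum with a leading zero: one entry per bin edge.
ecdf_sums = np.hstack([0.0, bin_sums.cumsum()])

# Reversing the array only mirrors the curve left to right.
mirrored = ecdf_sums[::-1]

# Subtracting from the total gives, at each edge, the sum of all bins
# at or above that edge, which is the quantity asked for.
complement = ecdf_sums[-1] - ecdf_sums

print(ecdf_sums)   # [ 0.  1.  3.  6. 10.]
print(mirrored)    # [10.  6.  3.  1.  0.]
print(complement)  # [10.  9.  7.  4.  0.]
```

Both results start at the total and end at zero, but only `complement` decreases by the size of each bin in its original order.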

## Re: help creating a reversed cumulative histogram

On Wed, Sep 2, 2009 at 7:26 PM, Robert Kern<[hidden email]> wrote:
> On Wed, Sep 2, 2009 at 18:15, Tim Michelsen<[hidden email]> wrote:
>> Hello fellow NumPy users,
>> I posted some questions on histograms recently [1, 2] but still couldn't
>> find a solution.
>>
>> I am trying to create an inverse cumulative histogram [3] which shall
>> look like [4] but with the higher values at the left.
>
> Okay. That is completely different from what you've asked before.
>
>> The classification shall follow this exemplary rule:
>>
>> class 1: 0
>> all values > 0
>>
>> class 2: 10
>> all values > 10
>>
>> class 3: 15
>> all values > 15
>>
>> class 4: 20
>> all values > 20
>>
>> class 5: 25
>> all values > 25
>>
>> [...]
>>
>> I could get this easily in a spreadsheet by creating a matrix with
>> conditional statements (if VALUES_COL > CLASS_BOUNDARY; VALUES_COL; '-').
>>
>> With python (numpy or pylab) I was not successful. The plotted histogram
>> envelope turned out to be just the inverted curve as the one created
>> with the spreadsheet app.
>
>> sums = np.histogram(values, weights=values,
>>                     normed=normed,
>>                     bins=bins)
>> ecdf_sums = np.hstack([0.0, sums[0].cumsum() ])
>> ecdf_inv_sums = ecdf_sums[::-1]
>
> This is not the kind of "inversion" that you are looking for. You want
>
> ecdf_inv_sums = ecdf_sums[-1] - ecdf_sums

and you can plot the histogram with bar

    eisf_sums = ecdf_sums[-1] - ecdf_sums   # empirical inverse survival function of weights
    width = sums[1][1] - sums[1][0]
    rects1 = plt.bar(sums[1], eisf_sums, width, color='b')

Are you sure you want cumulative weights in the histogram?

Josef
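The bar-plot suggestion can be sketched as a self-contained script. The helper names `survival_bars` and `plot_survival_bars` and the sample data are assumptions made for illustration. The sketch also passes `align='edge'` so that each bar starts at its bin edge, since current matplotlib versions centre bars on `x` by default.

```python
import numpy as np


def survival_bars(values, bins=10):
    """Return (edges, heights, width) for a bar plot of the reversed cumulative sums."""
    values = np.asarray(values, dtype=float)
    # Weighted histogram: each bin holds the sum of the values falling into it.
    bin_sums, edges = np.histogram(values, bins=bins, weights=values)
    ecdf_sums = np.hstack([0.0, bin_sums.cumsum()])
    # One height per bin edge: the sum of all bins at or above that edge.
    heights = ecdf_sums[-1] - ecdf_sums
    # Equal-width bins are assumed, as produced by an integer `bins`.
    width = edges[1] - edges[0]
    return edges, heights, width


def plot_survival_bars(values, bins=10):
    """Draw the bars; matplotlib is imported here so the helper above works without it."""
    import matplotlib.pyplot as plt
    edges, heights, width = survival_bars(values, bins)
    plt.bar(edges, heights, width, color='b', align='edge')
    plt.show()


# Made-up sample data.
edges, heights, width = survival_bars([1.0, 2.0, 3.0, 4.0], bins=3)
print(edges, heights, width)
```

Calling `plot_survival_bars(values)` on real data then draws the chart. The last bar always has height zero, because nothing lies above the final edge.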

## Re: help creating a reversed cumulative histogram

Hello Robert and Josef,
thanks for the quick answers! I really appreciate this.

>>> I am trying to create an inverse cumulative histogram [3] which shall
>>> look like [4] but with the higher values at the left.
>> Okay. That is completely different from what you've asked before.

You are right. But it's sometimes hard to describe a desired and expected output in Python terms and pseudocode. I still have to learn more numpy vocabulary...

I will evaluate your answers and give feedback.

Regards,
Timmie

## Re: help creating a reversed cumulative histogram

On Wed, Sep 2, 2009 at 19:11, Tim Michelsen<[hidden email]> wrote:
> Hello Robert and Josef,
> thanks for the quick answers! I really appreciate this.
>
>>>> I am trying to create a inverse cumulative histogram [3] which shall
>>>> look like [4] but with the higher values at the left.
>>> Okay. That is completely different from what you've asked before.
> You are right.
> But it's soemtimes hard to decribe a desired and expected output in
> python terms and pseudocode.
> I still have to lern more numpy vocabs...

Actually, I apologize. I meant to delete that line before sending the message. It was unnecessary and abusive.

--
Robert Kern

## Re: help creating a reversed cumulative histogram

> >>> Okay. That is completely different from what you've asked before.
> > You are right.
> > But it's sometimes hard to describe a desired and expected output in
> > python terms and pseudocode.
> > I still have to learn more numpy vocabulary...
>
> Actually, I apologize. I meant to delete that line before sending the
> message. It was unnecessary and abusive.

Don't worry. I understood it the way you meant it initially. No offence.

Coding and math problems become clearer once you make the effort to explain and visualise them for others.

You spend quite a lot of time responding here. I appreciate that.

Best regards,
Timmie

## Re: help creating a reversed cumulative histogram

In reply to this post by josef.pktd

Hello,
I have checked the snippets you proposed. They do what I wanted to achieve. Obviously, I had to subtract the values as Robert demonstrated. This could also be seen from the figure I posted.

I still have to see how I can optimise the code (cf. below) or modify it to be less complicated. It seemed so simple in the spreadsheet...

> eisf_sums = ecdf_sums[-1] - ecdf_sums
> # empirical inverse survival function of weights

Can you recommend a (literature) source where I can look up this term? I learned statistics in my mother tongue and seem to need a refresher on distributions... I would like to come up with the right terms next time.

> Are you sure you want cumulative weights in the histogram?

You mean it doesn't make sense at all?

I need:

1) the count of occurrences sorted in each bin

        counts = np.histogram(values,
                              normed=normed,
                              bins=bins)

    => here I now obtain the same as in the spreadsheet

2) the sum of all values sorted in each bin

        sums = np.histogram(values, weights=values,
                            normed=normed,
                            bins=bins)

    => here I still obtain different values for the first histogram value (eisf_sums[0]):

    Numpy: eisf_sums

        335.50026738, 319.21363636, 266.07724942,
        198.10258741, 126.69270396, 67.98125874,
        38.47335664,  24.75062937, 13.42121212,
        2.48636364, 0.

    Spreadsheet:

        335.2351159, 319.2136364, 266.0772494,
        198.1025874, 126.692704, 67.98125874,
        38.47335664, 24.75062937, 13.42121212,
        2.486363636, 0

Additionally, I would like to see these implemented as convenience functions in numpy or scipy. There should be out-of-the-box functions for all kinds of distributions. Where is the best place to contribute a final version? scipy.stats?

Thanks again for your input,
Timmie

    ##### below the distilled code #####
    ## histogram settings
    normed = False
    bins = 10

    ## counts: gives expected results
    counts = np.histogram(values,
                          normed=normed,
                          bins=bins)

    ecdf_counts = np.hstack([1.0, counts[0].cumsum() ])
    ecdf_inv_counts = ecdf_counts[::-1]
    # empirical inverse survival function of weights
    eisf_counts = ecdf_counts[-1] - ecdf_counts

    ### sum: does have deviations
    sums = np.histogram(values, weights=values,
                        normed=normed,
                        bins=bins)
    ecdf_sums = np.hstack([1.0, sums[0].cumsum() ])
    ecdf_inv_sums = ecdf_sums[::-1]
    # empirical inverse survival function of weights
    eisf_sums = ecdf_sums[-1] - ecdf_sums

    ##
    # configure plot
    xlabel = 'Bins'
    ylabel_left = 'Counts'
    ylabel_right = 'Sum'

    fig1 = plt.figure()
    ax1 = fig1.add_subplot(111)

    # counts
    ax1.plot(counts[1], ecdf_inv_counts, 'r-')
    ax1.set_xlabel(xlabel)
    ax1.set_ylabel(ylabel_left, color='b')
    for tl in ax1.get_yticklabels():
        tl.set_color('b')

    # sums
    ax2 = ax1.twinx()
    ax2.plot(sums[1], eisf_sums, 'b-')
    ax2.set_ylabel(ylabel_right, color='r')
    for tl in ax2.get_yticklabels():
        tl.set_color('r')
    plt.show()

## Re: help creating a reversed cumulative histogram

On Thu, Sep 3, 2009 at 9:23 AM, Tim Michelsen<[hidden email]> wrote:
>>
> Hello,
> I have checked the snippets you proposed.
> It does what I wanted to achieve.
> Obviously, I had to subtract the values as Robert
> demonstrated. This could also be perceived from
> the figure I posted.
>
> I still have see how I can optimise the code
> (c.f. below) or modify to be less complicated.
> It seemed so simple in the spreadsheet...
>
>> eisf_sums = ecdf_sums[-1] - ecdf_sums
>> # empirical inverse survival

this should have inverse in it, it was a cut and paste error
empirical survival function would be just 1-ecdf
however, as distributions they would require to be normed to 1,

>> function of weights
> Can you recommend me a (literature) source where
> I can look up this term?
> I learned statistics in my mother tongue and seem
> to need a refresher on distributions...
> I would like to come up with the right terms
> next time.

My first stop is usually wikipedia:

http://en.wikipedia.org/wiki/Survival_function
http://de.wikipedia.org/wiki/Verteilungsfunktion#.C3.9Cberlebenswahrscheinlichkeit

and the ISI - INTERNATIONAL STATISTICAL INSTITUTE glossary for terms in different languages

http://isi.cbs.nl/glossary/bloken83.htm

>
>> Are you sure you want cumulative weights in
>> the histogram?
> You mean it doesn't make sense at all?

It depends on what you want. The ecdf as it is calculated, with the weights argument in the histogram, gives you the cumulative sum of the values, not the count.

In the case of the weight of pigs, it would be the cumulative weight of all pigs with a weight less than the given bin boundary weight. If values were income, then it would be the aggregated income of all individuals with an income below the bin boundary. So it makes sense, given this is what you want (below).

>
> I need:
> 1) the count of occurrences sorted in each bin
>    counts = np.histogram(values,
>                          normed=normed,
>                          bins=bins)
>    => here I obtain now the same as in the
>    spreadsheet
>
> 2) the sum of all values sorted in each bin
>    sums = np.histogram(values, weights=values,
>                        normed=normed,
>                        bins=bins)
>
>    => here I still obtain different values for the first
>    histogram value (eisf_sums[0]):
>    Numpy: eisf_sums
>    335.50026738, 319.21363636, 266.07724942,
>    198.10258741, 126.69270396, 67.98125874,
>    38.47335664,  24.75062937, 13.42121212,
>    2.48636364, 0.
>
>    Spreadsheet:
>    335.2351159, 319.2136364, 266.0772494,
>    198.1025874, 126.692704, 67.98125874,
>    38.47335664, 24.75062937, 13.42121212,
>    2.486363636, 0

there might be a mistake in the treatment of a cell when reversing; when I run your example the highest value is not equal to values.sum()

this might match the spreadsheet, but I haven't compared

    isf = sums[0][::-1].cumsum()[::-1]

But I'm not sure yet, what's going on.

Josef

>
> Additionally, I would like to see these implemented
> as convenience functions in numpy or scipy.
> There should be out of the box functions for all kinds
> of distributions.
> Where is the best place to contribute a final version?
> The scipy.stats?
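The distinction between counts and weighted sums, and the reversed cumulative sum suggested above, can be illustrated with a small sketch. The data and bin edges below are made up, and the names `isf_counts` and `isf_sums` are assumptions chosen for this example.

```python
import numpy as np

# Made-up data and explicit bin edges.
values = np.array([0.5, 1.5, 1.5, 2.5, 3.5])
edges = [0.0, 1.0, 2.0, 3.0, 4.0]

# Without weights: number of values per bin.
counts, _ = np.histogram(values, bins=edges)
# With weights=values: sum of the values per bin.
sums, _ = np.histogram(values, bins=edges, weights=values)

# Reverse, accumulate, reverse again: entry i is the total over bins i and above.
isf_counts = counts[::-1].cumsum()[::-1]
isf_sums = sums[::-1].cumsum()[::-1]

print(isf_counts)  # [5 4 2 1]
print(isf_sums)    # [9.5 9.  6.  3.5]

# Sanity check mentioned in the reply: the highest value equals values.sum().
print(isf_sums[0] == values.sum())  # True
```

Unlike the `np.hstack([...])` variants, this form yields one entry per bin rather than one per bin edge, so there is no leading element to choose.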

## Re: help creating a reversed cumulative histogram

> My first stop is usually wikipedia:
[...]

Thanks. So if I had known that I have to call the beast an "empirical inverse survival function", Robert would also have found it easier to help. Anyway, step by step...

> In the case of the weight of pigs, it would be the cumulative weight of
> all pigs with a weight less than the given bin boundary weight.
> If values were income, then it would be the aggregated income of all
> individuals with an income below the bin boundary.
> So it makes sense, given this is what you want (below).

Exactly! Or for precipitation:

a) count: number of precipitation events that occurred up to a certain limit
b) sum: precipitation total registered up to that limit

> there might be a mistake in the treatment of a cell when
> reversing, when I run your example the highest value is
> not equal to values.sum()

This has made me think again. Small point. See here:

    ecdf_sums = np.hstack([0.0, sums[0].cumsum() ])
    ecdf_sums = np.hstack([sums[0].cumsum() ])

I had to adjust the classes in the spreadsheet by replacing the first class limit with 0.0. I had modified this yesterday to a different value (0.265152) as I was testing the code.

from:

    0.265152, 0.487273, 0.709394, 0.931515, 1.153636, 1.375758,
    1.597879, 1.820000, 2.042121, 2.264242, 2.486364

to:

    0.0, 0.487273, 0.709394, 0.931515, 1.153636, 1.375758,
    1.597879, 1.820000, 2.042121, 2.264242, 2.486364

Now everything is fine. Results and curves match.

> But I'm not sure yet, what's going on.

1) first I didn't know how to develop the code for an "empirical inverse survival function" in numpy
2) I messed up my spreadsheet classes while testing and verifying my numpy code.

Again, would a function for the "empirical inverse survival function" qualify for inclusion in numpy or scipy?

Thanks for the help.

Best regards,
Timmie
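One possible reading of the mismatch, which the thread does not confirm: with an integer `bins`, `np.histogram` places the first edge at the smallest value, and a strict "value > class limit" test at that edge leaves out the minimum itself. The difference between the two first values reported earlier (335.50026738 and 335.2351159, about 0.2652) is close to the original first class limit of 0.265152, which would be consistent with this. The sketch below uses made-up data to show the effect.

```python
import numpy as np

# Made-up data; the smallest value plays the role of the first class limit.
values = np.array([0.25, 0.75, 1.25, 1.75, 2.25])

# With an integer `bins`, the first edge is the smallest value.
_, auto_edges = np.histogram(values, bins=4)

# A strict "value > edge" test at the first edge drops the minimum itself.
sum_above_min = values[values > auto_edges[0]].sum()

# Replacing the first class limit with 0.0 brings the minimum back in.
edges = auto_edges.copy()
edges[0] = 0.0
sum_above_zero = values[values > edges[0]].sum()

print(auto_edges[0])                   # 0.25
print(sum_above_min)                   # 6.0
print(sum_above_zero)                  # 6.25
print(sum_above_zero == values.sum())  # True
```

The two totals differ by exactly the smallest value, so with a first class limit of 0.0 the top of the curve equals `values.sum()` for strictly positive data.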