How to Create Bins From Continuous Measure in Google Sheets

You've probably summarized data with a single number, like an average. For instance, the median is an average. It tells you what number is in the middle of your data if you sorted the numbers from smallest to largest.

Averages tell you nothing about how the numbers are spread out. Think about the mean transaction value for an online store. Do most of your customers spend about that much? Or do ninety percent of them pay substantially less, while those remaining are big spenders? A histogram can answer this question.

What is a histogram?

A histogram is a chart describing the distribution of data. It partitions the range of the data and shows how many values fall into each partition.

How to make a histogram

  1. Choose how many bins you need.

  2. Calculate the width of the bins.

  3. Sort the data into bins.

  4. Create the histogram with the bar chart tool.

Example histogram application

We'll work through an example using transaction data from Distilled's online store. Specifically, we'll make a histogram that charts the distribution of transaction values.

See the example Sheet

First, choose how many buckets you need

Follow along on the "Buckets" tab of the example sheet.

First, we need to know how many bins to use. Strictly speaking, you can choose any number of these buckets. The best amount will be small enough to summarize the data, and large enough to infer something interesting from the resulting chart.

Here's a reliable approach:

  1. Count how many numbers you have.

  2. Take the square root of the count.

  3. Round the square root up to the nearest whole number.

Say you have 48 data points. Then, √48 ≈ 6.93, which rounds up to 7. That's how many bins to use.

Choose the width of the buckets

Follow along on the "Buckets" tab of the example sheet.

Once you know the number of bins, you must calculate the corresponding width of the bins, so that they cover all your data. If your buckets are too small, they won't include every number in your data.

As with the choice of how many bins to use, the bin width also matters. It should make the chart easier for the viewer to interpret. Using multiples of 2, 5, or 10 for the bin width can help. You'll see what I mean in the example sheet.

Here's the process:

  1. Find the range of your data (i.e., the maximum minus the minimum).

  2. Divide the range by the number of bins.

  3. Round up to the nearest "nice" number.

In the example sheet, the maximum is $5,473. The minimum is $20. So the range is $5,473 - $20 = $5,453.

The number of bins is 7. $5,453 / 7 = $779. That rounds up to $800, which is the nearest "nice" multiple of 2 and 10. We'll use $800 as our bin width.

You should always choose the width of the bins yourself. Don't let a spreadsheet app determine that for you. Your audience will use the number and size of the bins to interpret your data, and you want to shape what they see.

Sort your data into buckets

Follow along on the "Buckets" tab of the example sheet.

First, you need to determine the smallest value to chart. In the example, we've chosen to start at zero. Often (at least in business), that will be a reasonable choice. Transaction amounts start at free ($0) and get bigger from there. If you need the chart to start above zero, start at the nearest "nice" number below the smallest number in your data.

Now, list the upper boundary of each bin. That's easy with a spreadsheet. Start with the first number (the first "top end" of a bin). Get the rest of the rows by taking the number in the row before it and adding the bin width you previously calculated:

Note that there is no "0" row in our list of bins. Each number is the upper bound of a bin.

Now, let's count how many numbers are in each bucket. Fortunately, Sheets will do this for us. We'll use the FREQUENCY function. The result looks like this:

This cell is where we're using FREQUENCY.

FREQUENCY is counting the number of data points at or below each boundary. A transaction for $800 exactly falls into the $1,600 bin.

Create the histogram as a bar chart

Follow along on the "Buckets" tab of the example sheet.

Sheets has a built-in histogram tool. Don't use it. It's hard to control the resulting chart.

The good thing is, you've already done all the work you need to draw the histogram without fancy tools. The built-in bar chart is good enough. It's dead simple and much easier to control.

Start by selecting the bins and count:

Then click Insert > Chart:

Sheets should decide to insert a bar chart. If it doesn't, select "Column" as your chart type in the editor:

...and you're done! You should have something that looks like this chart:

Now, inspect your data

Follow along on the "Histogram" tab of the example sheet.

That was a lot of work for a simple chart. Here's the payoff. What do you see when you look at the histogram?

  • There are two kinds of transactions. Lots of transaction values fall in the smallest bin, and the few bigger ones that are more spread out. (In our business, smaller amounts tend to be subscription payments. Larger values are conference tickets.)
  • The thicker bar at $1,600 says, "lots of folks are buying one ticket for Distilled's conference."
  • About half of the "ticket" transactions are for multiple tickets.

If we had used "average transaction value" instead of a histogram, we wouldn't even see that there are two types of transactions!

 See the example Sheet

Further reading

  • Constructing a Histogram. Another summary with more technical details.

  • The Average Page Is a Myth. Ilya Grigorik writes about why page weight can't be described with averages. He uses histograms in his post to explain this.

Get blog posts via email

segerspostencell.blogspot.com

Source: https://www.distilled.net/resources/how-to-make-a-histogram-using-google-sheets/

0 Response to "How to Create Bins From Continuous Measure in Google Sheets"

Postar um comentário

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel