Of course! The terms "bins" and "labels" in Python are most commonly associated with grouping numerical data into intervals (bins) and then assigning a descriptive name (label) to each interval.

This is a fundamental task in data analysis and visualization. The most popular library for this is Pandas, which has a powerful function called pd.cut().
Let's break it down with clear examples.
The Core Concept: Bins and Labels
Imagine you have a list of people's ages and you want to group them into categories like "Child", "Teen", "Adult", etc.
- Data: [5, 17, 25, 32, 45, 8, 91]
- Bins (the intervals): the age ranges we want to group by:
  - 0-12 (Child)
  - 13-19 (Teen)
  - 20-64 (Adult)
  - 65+ (Senior)
- Labels (the names): the descriptive names for those intervals:
  ['Child', 'Teen', 'Adult', 'Senior']
The goal is to convert the raw age data into categorical data based on these rules.
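Here is one way to express this age example with pd.cut(). It is a sketch that assumes whole-number ages. Because the ranges are written as inclusive integer ranges (0-12, 13-19, ...), it is convenient to use left-closed intervals (right=False) with each edge at the first age of a group. np.inf then serves as an open top edge for "65+".

```python
import numpy as np
import pandas as pd

ages = [5, 17, 25, 32, 45, 8, 91]

# Each edge marks where a group starts; np.inf leaves "Senior" open-ended.
# With right=False the intervals are [0, 13), [13, 20), [20, 65), [65, inf).
bin_edges = [0, 13, 20, 65, np.inf]
labels = ['Child', 'Teen', 'Adult', 'Senior']

groups = pd.cut(ages, bins=bin_edges, labels=labels, right=False)
print(list(groups))
# ['Child', 'Teen', 'Adult', 'Adult', 'Adult', 'Child', 'Senior']
```

pd.cut() expects exactly one label per interval, which means one fewer label than there are edges. A mismatch raises a ValueError.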

Using pandas.cut()
This is the most direct and flexible way to achieve this. pd.cut() takes an array of values and divides it into discrete intervals.
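The following sketch shows what pd.cut() returns when no labels are given, using a small pandas Series as an assumed input. Each value is mapped to a pandas Interval. Because the input is a Series, the result is also a Series, with category dtype, so it can be assigned straight to a DataFrame column.

```python
import pandas as pd

s = pd.Series([1, 5, 10], name='value')

# No labels: each value maps to a pandas Interval; bins are closed on the right by default,
# so 5 lands in (0, 5] and 10 lands in (5, 10].
out = pd.cut(s, bins=[0, 5, 10])
print(out)
print(out.dtype)  # category
print(list(out.cat.categories))
```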
A. Simple Example: Equal Width Bins
Let's start by creating some sample data and dividing it into a set number of bins of equal width.
```python
import pandas as pd
import numpy as np

# 1. Sample data
data = np.random.randint(0, 101, size=20)  # 20 random integers from 0 to 100 inclusive
print("Original Data:")
print(data)
# Example output (varies from run to run):
# [85 12 57 91 3 49 61 33 78 50 42 9 29 70 54 44 67 25 19 98]

# 2. Choose the number of bins and their labels
# With an integer `bins`, pd.cut() splits the range from the data's minimum to its
# maximum (not a fixed 0-100) into 3 equal-width intervals.
num_bins = 3
bin_labels = ['Low', 'Medium', 'High']

# 3. Use pd.cut()
# `bins` can be an integer (for equal-width bins) or a list of edges.
# `labels` assigns a name to each bin.
# `right=False` makes the intervals [left, right) (left-inclusive, right-exclusive).
# By default (right=True), they are (left, right] (right-inclusive).
binned_data = pd.cut(data, bins=num_bins, labels=bin_labels, right=False)
print("\nBinned Data (as a Categorical object):")
print(binned_data)
# Example output for the data above (edges at about 3, 34.67, 66.33 and 98.1;
# exact formatting varies by pandas version):
# ['High', 'Low', 'Medium', 'High', 'Low', 'Medium', 'Medium', 'Low', 'High', 'Medium', ...]
# Categories (3, object): ['Low' < 'Medium' < 'High']

# 4. Put values and categories side by side in a DataFrame
df = pd.DataFrame({'Value': data, 'Category': binned_data})
print("\nDataFrame with Categories:")
print(df)
```
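To see where integer bins actually fall, pd.cut() accepts retbins=True, which also returns the computed edges. This sketch reuses the example data from the output comment above as a fixed array instead of random numbers, so the result is reproducible.

```python
import numpy as np
import pandas as pd

# The example data from above, fixed so the result is reproducible.
data = np.array([85, 12, 57, 91, 3, 49, 61, 33, 78, 50,
                 42, 9, 29, 70, 54, 44, 67, 25, 19, 98])

binned, edges = pd.cut(data, bins=3, labels=['Low', 'Medium', 'High'],
                       right=False, retbins=True)

# Edges are spaced evenly between the data's min (3) and max (98), not 0-100.
# pandas widens the range by 0.1% on the open side, so with right=False the
# last edge is nudged just above the maximum to keep 98 inside the top bin.
print(edges)  # roughly 3.0, 34.67, 66.33, 98.095
print(list(binned[:10]))
```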
B. Example: Custom Bin Edges
Often, you want to define the exact boundaries for your bins, especially for real-world data like ages or income.
```python
import pandas as pd

# 1. Sample data (ages)
ages = [8, 15, 22, 35, 48, 60, 70, 5, 18, 25, 99]

# 2. Define custom bin edges
# With right=False (below), these edges give the intervals
# [0, 18), [18, 36), [36, 66), [66, 100), i.e. ages 0-17, 18-35, 36-65 and 66-99.
bin_edges = [0, 18, 36, 66, 100]

# 3. Define the corresponding labels (one fewer than the number of edges)
age_labels = ['Child', 'Young Adult', 'Middle-Aged', 'Senior']

# 4. Use pd.cut()
# Instead of a bin count, we pass the explicit `bin_edges` list.
# `right=False` matters here: it puts 0-17.999... in 'Child' and 18-35.999... in
# 'Young Adult', so someone who is exactly 18 is a 'Young Adult'.
age_categories = pd.cut(ages, bins=bin_edges, labels=age_labels, right=False)

# 5. Display in a DataFrame
df_ages = pd.DataFrame({'Age': ages, 'Age Group': age_categories})
print(df_ages)
```
Output:

```
    Age    Age Group
0     8        Child
1    15        Child
2    22  Young Adult
3    35  Young Adult
4    48  Middle-Aged
5    60  Middle-Aged
6    70       Senior
7     5        Child
8    18  Young Adult
9    25  Young Adult
10   99       Senior
```
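Explicit edges have one consequence worth knowing. Values outside them are not assigned to any bin and come back as NaN. With right=False, this includes a value equal to the last edge. The sketch below assumes the same edges and labels as above. Replacing the last edge with np.inf is one way to make the top group open-ended.

```python
import numpy as np
import pandas as pd

age_labels = ['Child', 'Young Adult', 'Middle-Aged', 'Senior']
ages = [17, 18, 100, 120]

# Same edges as above: 100 falls outside [66, 100) and 120 is past the last edge.
closed_top = pd.cut(ages, bins=[0, 18, 36, 66, 100], labels=age_labels, right=False)
print(list(closed_top))  # ['Child', 'Young Adult', nan, nan]

# An infinite last edge makes the top interval [66, inf).
open_top = pd.cut(ages, bins=[0, 18, 36, 66, np.inf], labels=age_labels, right=False)
print(list(open_top))    # ['Child', 'Young Adult', 'Senior', 'Senior']
```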
Using numpy.histogram()
Sometimes, you just need the counts for each bin without creating categorical labels. numpy.histogram() is perfect for this. It returns the counts and the bin edges.
You can then manually assign labels if you wish.
```python
import numpy as np

# 1. Sample data
data = np.random.randn(100)  # 100 random numbers from a standard normal distribution

# 2. Define the number of bins (equal width, spanning data.min() to data.max())
num_bins = 5
# Or define the edges manually (values outside them are not counted):
# bin_edges = [-3, -2, -1, 0, 1, 2, 3]

# 3. Use np.histogram()
counts, bin_edges = np.histogram(data, bins=num_bins)
print("Counts for each bin:")
print(counts)
# Example output (varies from run to run): [ 2 15 38 32 13]
print("\nEdges of the bins:")
print(bin_edges)
# Example output: [-3.14159265 -1.88543673 -0.62928081  0.62687511  1.88203103  3.13718695]

# 4. (Optional) Create labels from the edges
# A common choice is each bin's midpoint (the average of its two edges)
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
bin_labels = [f"Bin {i+1} ({bin_centers[i]:.2f})" for i in range(len(bin_centers))]
print("\nGenerated Labels:")
print(bin_labels)
# Example output: ['Bin 1 (-2.51)', 'Bin 2 (-1.26)', 'Bin 3 (-0.00)', 'Bin 4 (1.25)', 'Bin 5 (2.51)']
```
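np.histogram() itself does not attach labels to individual values. One option for doing that is np.digitize() against the interior edges. The sketch below assumes equal-width bins from np.histogram() and swaps in a seeded generator so reruns give the same data. Using only bin_edges[1:-1] maps every value to an index from 0 to num_bins - 1. This matches histogram's convention that each bin is half-open [a, b), except the last one, which also includes its right edge.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so reruns give the same data
data = rng.standard_normal(100)

num_bins = 5
counts, bin_edges = np.histogram(data, bins=num_bins)

bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
bin_labels = [f"Bin {i+1} ({c:.2f})" for i, c in enumerate(bin_centers)]

# Interior edges only: values below the first interior edge get index 0,
# values at or above the last interior edge (including the maximum) get num_bins - 1.
bin_index = np.digitize(data, bin_edges[1:-1])
labels_per_value = np.array(bin_labels)[bin_index]

print(labels_per_value[:5])
# Sanity check: per-bin tallies agree with np.histogram's counts.
print(np.bincount(bin_index, minlength=num_bins), counts)
```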
Using pandas.qcut() for Quantile Bins
What if your data is not uniformly distributed? Income data is a typical example: most people fall in the lower brackets, with a few outliers earning much more. With equal-width bins, pd.cut() would put most people in the first bin, because the outliers stretch the range that the equal widths are computed from.
In this case, pd.qcut() is your friend. It divides the data into bins based on quantiles (e.g., percentiles), so that each bin has (approximately) the same number of data points.
```python
import pandas as pd
import numpy as np

# 1. Sample skewed data (e.g., incomes)
# Most values are moderate, with a long tail of very large ones.
incomes = np.random.lognormal(mean=4, sigma=0.5, size=1000)

# 2. Define quantiles
# 4 quartiles: 0-25%, 25-50%, 50-75%, 75-100% (equivalent to q=4)
quantiles = [0, 0.25, 0.5, 0.75, 1.0]

# 3. Define labels for the quantile bins
income_labels = ['Low', 'Lower-Middle', 'Upper-Middle', 'High']

# 4. Use pd.qcut()
# Each group gets roughly 250 people (1000 / 4).
income_groups = pd.qcut(incomes, q=quantiles, labels=income_labels)

# 5. Display in a DataFrame
df_income = pd.DataFrame({'Income': incomes, 'Income Group': income_groups})

# Verify the counts are roughly equal
print("Counts for each income group:")
print(df_income['Income Group'].value_counts())
```
Output:

```
Counts for each income group:
Upper-Middle    250
Low             250
Lower-Middle    250
High            250
Name: Income Group, dtype: int64
```
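Notice that the counts are exactly equal here, because 1000 distinct values split evenly into four groups. Equal (or as-close-as-possible) group sizes are the key feature of qcut. The exact printout depends on your pandas version: the order of tied counts is not meaningful, and pandas 2.x names the result count instead of Income Group.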
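The following sketch makes the contrast with pd.cut() concrete by binning the same kind of skewed data both ways. It assumes a seeded generator in place of np.random.lognormal so the results are repeatable. With retbins=True, pd.qcut() also returns its edges, which are simply the sample quantiles. pandas uses linear interpolation for these by default, the same as np.quantile.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
incomes = rng.lognormal(mean=4, sigma=0.5, size=1000)  # right-skewed

labels = ['Low', 'Lower-Middle', 'Upper-Middle', 'High']

# Equal-width bins: the long right tail stretches the range, so the low bin is crowded.
by_width = pd.cut(incomes, bins=4, labels=labels)
# Quantile bins: each bin gets about 1/4 of the observations; the edges adapt instead.
by_rank, q_edges = pd.qcut(incomes, q=4, labels=labels, retbins=True)

print(pd.Series(by_width).value_counts(sort=False))
print(pd.Series(by_rank).value_counts(sort=False))

# qcut's edges are the sample quantiles at 0, 25, 50, 75 and 100 percent.
print(np.allclose(q_edges, np.quantile(incomes, [0, 0.25, 0.5, 0.75, 1])))  # True
```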
Summary: cut() vs. qcut()
| Feature | pandas.cut() | pandas.qcut() |
|---|---|---|
| Division Method | By value (e.g., 0-10, 10-20). | By rank/quantile (e.g., 0-25th percentile, 25-50th). |
| Bin Width | Equal width when bins is an integer; any widths with custom edges. | Varies; each bin holds roughly the same number of items. |
| Best For | Roughly uniform data, or meaningful fixed boundaries (e.g., age groups). | Skewed data (e.g., income, city populations). |
| Key Parameter | bins (integer or list of edges). | q (integer or list of quantiles). |
Key Parameters for pd.cut() and pd.qcut()
- x: The input array or Series of data.
- bins / q: For cut, the number of equal-width bins or a list of bin edges. For qcut, the number of quantiles or a list of quantiles (e.g., [0, 0.25, 0.5, 0.75, 1.0]).
- labels: (Optional) A list of names, one per bin. If omitted, the bins are labeled by their intervals (e.g., (0, 18]). labels=False returns integer bin positions (0, 1, 2...) instead.
- right (pd.cut only; default True): Whether bins are closed on the right edge, (left, right], or on the left edge, [left, right). This is crucial for edge cases (e.g., is a person who is exactly 18 a "Child" or a "Young Adult"?).
- include_lowest (pd.cut only; default False): Whether the first interval should also include its left edge. This matters with the default right=True, where a value equal to the lowest edge would otherwise fall outside every bin and become NaN.
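The sketch below shows how right, include_lowest and labels=False change the result at the boundaries. It assumes two hypothetical groups split at age 18. The edges and labels are made up for illustration.

```python
import pandas as pd

ages = [0, 18, 30]
edges = [0, 18, 65]          # hypothetical edges
names = ['Child', 'Adult']   # hypothetical labels

# Default right=True: intervals (0, 18] and (18, 65]; 0 is outside, so it becomes NaN.
default_right = list(pd.cut(ages, bins=edges, labels=names))
# include_lowest=True closes the first interval on the left, so 0 is kept.
lowest = list(pd.cut(ages, bins=edges, labels=names, include_lowest=True))
# right=False: intervals [0, 18) and [18, 65); now 18 counts as 'Adult'.
left_closed = list(pd.cut(ages, bins=edges, labels=names, right=False))
# labels=False: integer bin positions instead of names.
codes = pd.cut(ages, bins=edges, right=False, labels=False)

print(default_right)  # [nan, 'Child', 'Adult']
print(lowest)         # ['Child', 'Child', 'Adult']
print(left_closed)    # ['Child', 'Adult', 'Adult']
print(codes)          # [0 1 1]
```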
