How Do the bins and labels Parameters Work Together in Python? - 杰瑞科技汇

The terms "bins" and "labels" in Python are most commonly associated with grouping numerical data into intervals (bins) and then assigning a descriptive name (label) to each interval.

This is a fundamental task in data analysis and visualization. The most popular library for this is Pandas, which has a powerful function called pd.cut().

Let's break it down with clear examples.

The Core Concept: Bins and Labels

Imagine you have a list of people's ages and you want to group them into categories like "Child", "Teen", "Adult", etc.

Data: [5, 17, 25, 32, 45, 8, 91]
Bins (The Intervals): We need to define the age ranges.
- 0-12 (Child)
- 13-19 (Teen)
- 20-64 (Adult)
- 65+ (Senior)
Labels (The Names): The descriptive names for those intervals.
- ['Child', 'Teen', 'Adult', 'Senior']

The goal is to convert the raw age data into categorical data based on these rules.
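The mapping from ages to categories can be sketched without any library. The snippet below is a minimal illustration using the standard-library bisect module; the names `cut_points` and `age_group` are made up for this sketch and are not part of pandas.

```python
from bisect import bisect_right

ages = [5, 17, 25, 32, 45, 8, 91]

# Lower bounds of every bin after the first: Teen starts at 13, Adult at 20, Senior at 65.
cut_points = [13, 20, 65]
labels = ['Child', 'Teen', 'Adult', 'Senior']  # one label per bin


def age_group(age):
    """Return the label of the bin that contains `age`."""
    # bisect_right counts how many cut points are <= age, which is the bin index.
    return labels[bisect_right(cut_points, age)]


groups = [age_group(age) for age in ages]
print(groups)
# ['Child', 'Teen', 'Adult', 'Adult', 'Adult', 'Child', 'Senior']
```

Note that there is always one more bin than there are cut points, and exactly one label per bin.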

Using `pandas.cut()`

This is the most direct and flexible way to achieve this. pd.cut() takes an array of values and divides it into discrete intervals.

A. Simple Example: Equal Width Bins

Let's start by creating some sample data and dividing it into a set number of bins of equal width.

import pandas as pd
import numpy as np
# 1. Sample Data
data = np.random.randint(0, 101, size=20) # 20 random numbers between 0 and 100
print("Original Data:")
print(data)
# Example output: [85 12 57 91  3 49 61 33 78 50 42  9 29 70 54 44 67 25 19 98]
# 2. Create Bins and Labels
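# Let's create 3 equal-width bins. With an integer, pd.cut() splits the observed
# range of the data (min to max), so the bins are 0-33, 34-66, 67-100 only if the data spans 0 to 100.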
num_bins = 3
bin_labels = ['Low', 'Medium', 'High']
# 3. Use pd.cut()
# `bins` can be an integer (for equal-width bins) or a list of cut-offs.
# `labels` assigns a name to each bin.
# `right=False` means the intervals are [left, right) (left-inclusive, right-exclusive).
# By default, it's (left, right] (right-inclusive).
binned_data = pd.cut(data, bins=num_bins, labels=bin_labels, right=False)
print("\nBinned Data (as a Categorical object):")
print(binned_data)
# Example output (for the sample data shown above):
# ['High', 'Low', 'Medium', 'High', 'Low', 'Medium', 'Medium', 'Low', 'High', 'Medium', ...]
# Categories (3, object): ['Low' < 'Medium' < 'High']
# 4. Create a DataFrame to see it clearly
df = pd.DataFrame({'Value': data, 'Category': binned_data})
print("\nDataFrame with Categories:")
print(df)
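When `bins` is an integer, the edges depend on the data you pass in. The sketch below uses a small made-up array to show the edges pandas derives (via the `retbins=True` option) and, as one possible alternative, how to bin on a fixed 0-100 scale by passing explicit edges.

```python
import numpy as np
import pandas as pd

data = np.array([20, 35, 50, 80])  # observed range is 20 to 80, not 0 to 100

# retbins=True also returns the edges pandas actually used.
_, auto_edges = pd.cut(data, bins=3, retbins=True)
print(auto_edges)  # spans roughly 20 to 80 (the lowest edge is nudged slightly below 20)

# To bin on a fixed 0-100 scale regardless of the data, pass the edges explicitly.
fixed_edges = np.linspace(0, 100, 4)  # 0, 33.33..., 66.67..., 100
fixed = pd.cut(data, bins=fixed_edges, labels=['Low', 'Medium', 'High'])
print(list(fixed))
# ['Low', 'Medium', 'Medium', 'High']
```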

B. Example: Custom Bin Edges

Often, you want to define the exact boundaries for your bins, especially for real-world data like ages or income.

import pandas as pd
# 1. Sample Data (ages)
ages = [8, 15, 22, 35, 48, 60, 70, 5, 18, 25, 99]
# 2. Define Custom Bin Edges
# With right=False these edges define the intervals [0, 18), [18, 36), [36, 66), [66, 100),
# i.e. ages 0-17, 18-35, 36-65, 66-99. An age of exactly 100 would fall outside every bin and become NaN.
bin_edges = [0, 18, 36, 66, 100]
# 3. Define Corresponding Labels
age_labels = ['Child', 'Young Adult', 'Middle-Aged', 'Senior']
# 4. Use pd.cut()
# We pass the `bin_edges` list instead of a number of bins.
# `right=False` is important here: it puts ages 0-17.999... in 'Child' and 18-35.999... in 'Young Adult'.
age_categories = pd.cut(ages, bins=bin_edges, labels=age_labels, right=False)
# 5. Display in a DataFrame
df_ages = pd.DataFrame({'Age': ages, 'Age Group': age_categories})
print(df_ages)

Output:

    Age    Age Group
0     8        Child
1    15        Child
2    22  Young Adult
3    35  Young Adult
4    48  Middle-Aged
5    60  Middle-Aged
6    70       Senior
7     5        Child
8    18  Young Adult
9    25  Young Adult
10   99       Senior
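The choice of `right` only matters for values that sit exactly on an edge. The following sketch reuses the edges and labels from the example and checks two boundary ages; `boundary_ages` is a name introduced here for illustration.

```python
import pandas as pd

bin_edges = [0, 18, 36, 66, 100]
age_labels = ['Child', 'Young Adult', 'Middle-Aged', 'Senior']
boundary_ages = [18, 100]

# Left-inclusive bins: [0, 18), [18, 36), [36, 66), [66, 100)
left_closed = pd.cut(boundary_ages, bins=bin_edges, labels=age_labels, right=False)
# Right-inclusive bins (the default): (0, 18], (18, 36], (36, 66], (66, 100]
right_closed = pd.cut(boundary_ages, bins=bin_edges, labels=age_labels)

print(list(left_closed))   # ['Young Adult', nan]
print(list(right_closed))  # ['Child', 'Senior']
```

Values that fall outside every bin are returned as NaN instead of raising an error, so it is worth checking for missing categories after binning.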

Using `numpy.histogram()`

Sometimes, you just need the counts for each bin without creating categorical labels. numpy.histogram() is perfect for this. It returns the counts and the bin edges.

You can then manually assign labels if you wish.

import numpy as np
# 1. Sample Data
data = np.random.randn(100) # 100 random numbers from a standard normal distribution
# 2. Define number of bins or bin edges
num_bins = 5
# Or, define edges manually:
# bin_edges = [-3, -2, -1, 0, 1, 2, 3]
# 3. Use np.histogram()
counts, bin_edges = np.histogram(data, bins=num_bins)
print("Counts for each bin:")
print(counts)
# Example output: [ 2 15 38 32 13]
print("\nEdges of the bins:")
print(bin_edges)
# Example output (equal-width bins, so the edges are evenly spaced):
# [-3.14159265 -1.88543673 -0.62928081  0.62687511  1.88303103  3.13918695]
# 4. (Optional) Create labels from the edges
# A common way is to use the midpoint of each bin's two edges
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
bin_labels = [f"Bin {i+1} ({bin_centers[i]:.2f})" for i in range(len(bin_centers))]
print("\nGenerated Labels:")
print(bin_labels)
# Example output: ['Bin 1 (-2.51)', 'Bin 2 (-1.26)', 'Bin 3 (-0.00)', 'Bin 4 (1.25)', 'Bin 5 (2.51)']
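np.histogram() only returns counts; it does not say which bin each value landed in. One way to assign labels manually is np.digitize(), sketched below on a small made-up array with hypothetical labels. Note that np.histogram() includes the rightmost edge in its last bin, whereas np.digitize() with its default settings does not, so this sketch keeps all values strictly below the last edge.

```python
import numpy as np

data = np.array([1, 4, 12, 15, 18, 27])
edges = np.array([0, 10, 20, 30])
labels = np.array(['low', 'mid', 'high'])  # one label per bin

# Counts per bin, as in the example above.
counts, _ = np.histogram(data, bins=edges)
print(counts)  # [2 3 1]

# np.digitize returns i such that edges[i-1] <= x < edges[i];
# subtracting 1 turns that into a 0-based bin index.
bin_index = np.digitize(data, edges) - 1
assigned = labels[bin_index]
print(assigned.tolist())
# ['low', 'low', 'mid', 'mid', 'mid', 'high']
```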

Using `pandas.qcut()` for Quantile Bins

What if your data is not uniformly distributed? For example, income data. Most people are in a lower income bracket, with a few outliers earning much more. Using pd.cut() with equal-width bins would result in most people being in the first bin.

In this case, pd.qcut() is your friend. It divides the data into bins based on quantiles (e.g., percentiles), so that each bin has (approximately) the same number of data points.

import pandas as pd
import numpy as np
# 1. Sample Skewed Data (e.g., incomes)
# Most values are small, with a few very large ones.
incomes = np.random.lognormal(mean=4, sigma=0.5, size=1000)
# 2. Define Quantiles
# Let's create 4 quartiles (0-25%, 25-50%, 50-75%, 75-100%)
quantiles = [0, 0.25, 0.5, 0.75, 1.0]
# 3. Define Labels for the Quantiles
income_labels = ['Low', 'Lower-Middle', 'Upper-Middle', 'High']
# 4. Use pd.qcut()
# This will ensure each group has roughly 250 people (1000 / 4).
income_groups = pd.qcut(incomes, q=quantiles, labels=income_labels)
# 5. Display in a DataFrame
df_income = pd.DataFrame({'Income': incomes, 'Income Group': income_groups})
# Verify the counts are roughly equal
print("Counts for each income group:")
print(df_income['Income Group'].value_counts())

Output:

Counts for each income group:
Upper-Middle    250
Low             250
Lower-Middle    250
High            250
Name: Income Group, dtype: int64

Notice how the counts are exactly equal (or as close as possible), which is the key feature of qcut.
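To see the contrast with equal-width bins directly, the sketch below runs both functions on the same skewed sample. The fixed seed and the use of `value_counts(sort=False)` are choices made for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
incomes = pd.Series(rng.lognormal(mean=4, sigma=0.5, size=1000))

labels = ['Low', 'Lower-Middle', 'Upper-Middle', 'High']

# sort=False keeps the rows in label order instead of ordering by count.
cut_counts = pd.cut(incomes, bins=4, labels=labels).value_counts(sort=False)
qcut_counts = pd.qcut(incomes, q=4, labels=labels).value_counts(sort=False)

print(cut_counts)   # equal-width bins: most values land in the lowest bins
print(qcut_counts)  # quantile bins: 250 values in each
```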

Summary: `cut()` vs. `qcut()`

Feature	`pandas.cut()`	`pandas.qcut()`
Division Method	By value ranges (e.g., 0-10, 10-20).	By rank/quantiles (e.g., 0-25th percentile, 25th-50th).
Bin Width	Equal width when `bins` is an integer; whatever you specify when `bins` is a list of edges.	Varies; edges are chosen so that each bin holds roughly the same number of items.
Best For	Roughly uniform data, or bins with meaningful fixed boundaries (e.g., age groups).	Skewed data (e.g., income, city populations).
Key Parameter	`bins` (integer or list of edges).	`q` (integer or list of quantiles).

Key Parameters for `pd.cut()` and `pd.qcut()`

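x: The input array or Series of data (must be one-dimensional).
bins / q: For cut, the number of equal-width bins or a list of bin edges; for qcut, the number of quantiles or a list of quantile cut points between 0 and 1.
labels: (Optional) A list of labels to name the bins, with exactly one label per bin. If not provided, each value is labeled with its interval, e.g., (0, 18]. Pass labels=False to get integer bin codes (0, 1, 2...) instead.
right: (cut only; default: True) Defines whether the bins are closed on the right edge (left, right] or on the left edge [left, right). This is crucial for handling edge cases (e.g., is a person who is exactly 18 a "Child" or a "Young Adult"?).
include_lowest: (cut only; default: False) Whether the first interval should also include its left edge. Useful with the default right=True, where the lowest edge would otherwise be excluded, e.g., (0, 18] does not contain 0.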
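The effect of `labels` and `include_lowest` is easiest to see on a value that sits on the lowest edge. The sketch below uses a tiny made-up input where the values coincide with the bin edges.

```python
import pandas as pd

values = [0, 5, 10]
edges = [0, 5, 10]

# labels omitted: each value is labeled with its interval, and 0 is not in (0, 5].
intervals = pd.cut(values, bins=edges)
print(list(intervals))

# labels=False: integer bin codes instead of names (NaN where no bin matches).
codes = pd.cut(values, bins=edges, labels=False)
print(codes)

# include_lowest=True also closes the first interval on the left, so 0 gets bin 0.
codes_with_lowest = pd.cut(values, bins=edges, labels=False, include_lowest=True)
print(codes_with_lowest)
```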
