杰瑞科技汇

Python List vs. NumPy Array: How to Choose?

Choosing between a Python list and a NumPy array is a fundamental decision in Python data science. This article compares the two in detail: how they differ, when to use each, and how to convert between them.


High-Level Summary

| Feature | Python list | NumPy ndarray |
|---|---|---|
| Element types | Can hold elements of any data type (e.g., int, str, bool, other lists). | All elements share one data type (dtype), e.g., all int or all float. |
| Performance | Slower for numerical operations. | Very fast for numerical operations, because the loops run in compiled C code. |
| Memory usage | Higher: stores a pointer to a separate Python object for every element. | Lower: stores raw values in one contiguous block of memory. |
| Functionality | Basic built-in methods (.append(), .pop(), etc.). | Rich library of mathematical, statistical, and linear algebra functions. |
| Syntax for math | Element-wise operations need loops or comprehensions. | Supports vectorized operations (e.g., arr1 + arr2). |
| Multidimensionality | Achieved with "lists of lists"; rows can be jagged (different lengths). | True multidimensional structure; the shape must be rectangular (every row the same length). |
| Dependencies | Built in; no installation needed. | Requires the numpy library (pip install numpy). |

Python Lists

A list is the built-in, workhorse data structure in Python. It's a collection that is ordered, changeable, and allows duplicate members.

Key Characteristics:

  • Heterogeneous: You can mix different data types in a single list.
  • Flexible: You can add or remove elements easily.
  • Inefficient for Math: Element-wise calculations require explicit loops or comprehensions, which run in the Python interpreter and are slow on large lists.

Example:

# A list can hold different data types
my_list = [1, "hello", 3.14, True, [5, 6, 7]]
# Element-wise addition requires a loop
list_a = [1, 2, 3, 4]
list_b = [10, 20, 30, 40]
result_list = []
for a, b in zip(list_a, list_b):  # zip pairs up corresponding elements
    result_list.append(a + b)
print(result_list)
# Output: [11, 22, 33, 44]
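One pitfall worth knowing: the `+` and `*` operators do work on lists, but they treat the list as a sequence rather than a vector. The sketch below reuses the two lists from the example; the names `concatenated`, `repeated`, and `summed` are only illustrative.

```python
# Plain Python lists: the arithmetic operators are defined, but they act on
# the sequence as a whole rather than element by element.
list_a = [1, 2, 3, 4]
list_b = [10, 20, 30, 40]

concatenated = list_a + list_b  # joins the two lists end to end
repeated = list_a * 2           # repeats the list's contents twice

print(concatenated)  # [1, 2, 3, 4, 10, 20, 30, 40]
print(repeated)      # [1, 2, 3, 4, 1, 2, 3, 4]

# Idiomatic element-wise addition in pure Python pairs items with zip()
summed = [a + b for a, b in zip(list_a, list_b)]
print(summed)        # [11, 22, 33, 44]
```

With a NumPy array, the same `+` and `* 2` would add and scale element by element instead.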

NumPy Arrays (NumPy ndarray)

NumPy (Numerical Python) is a fundamental package for scientific computing in Python. Its main object is the ndarray (N-dimensional array). It's a highly optimized, memory-efficient array for numerical operations.

Key Characteristics:

  • Homogeneous: All elements in an array share a single data type (its dtype), e.g., all integers or all floats. Together with storing the values in one contiguous block of memory, this is what allows its high performance.
  • Vectorized Operations: You can perform mathematical operations on entire arrays at once without writing loops. This is the single biggest performance benefit.
  • Multidimensional: It's designed to handle multi-dimensional data (like vectors, matrices, and tensors) efficiently.
  • Rich Functionality: Comes with a huge library of functions for linear algebra, Fourier transforms, random number generation, and more.
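To see the homogeneity rule in practice, the short sketch below (illustrative values only) shows NumPy choosing one dtype for mixed input. The exact string dtype NumPy picks can differ between versions, so the code checks only its kind.

```python
import numpy as np

# NumPy picks one dtype for the whole array and converts every element to it.
ints_and_float = np.array([1, 2, 3.5])
print(ints_and_float.dtype)  # float64: the integers were upcast to floats

# Mixing numbers and strings turns every element into a string
mixed = np.array([1, "hello"])
print(mixed.dtype.kind)      # U (a Unicode string dtype)
print(mixed[0] == "1")       # True: the integer 1 is now the string "1"

# dtype=object opts out of homogeneity: elements stay arbitrary Python objects,
# but vectorized math then falls back to slow per-element Python calls
objects = np.array([1, "hello", 3.14], dtype=object)
print(objects.dtype)         # object
```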

Example:

import numpy as np
# Create NumPy arrays
arr_a = np.array([1, 2, 3, 4])
arr_b = np.array([10, 20, 30, 40])
# Element-wise addition is simple and fast
result_array = arr_a + arr_b
print(result_array)
# Output: [11 22 33 44]
# You can also perform operations with a single number (broadcasting)
scaled_array = arr_a * 2
print(scaled_array)
# Output: [2 4 6 8]
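Multiplying by a single number is the simplest case of broadcasting. The same rule lets arrays of different but compatible shapes combine; in this sketch the arrays `matrix`, `row`, and `column` are arbitrary examples.

```python
import numpy as np

# A 2-D array of shape (3, 3) and a 1-D array of shape (3,)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
row = np.array([10, 20, 30])

# Broadcasting applies `row` to every row of `matrix`
# (NumPy does this without building an expanded copy of `row`)
shifted = matrix + row
print(shifted)
# [[11 22 33]
#  [14 25 36]
#  [17 28 39]]

# A column vector of shape (3, 1) is instead applied across the columns
column = np.array([[100], [200], [300]])
print(matrix + column)
# [[101 102 103]
#  [204 205 206]
#  [307 308 309]]
```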

Detailed Comparison

Performance and Speed

This is the most important reason to use NumPy. NumPy operations run in pre-compiled C code (and, for some linear algebra routines, Fortran libraries), which is typically one or more orders of magnitude faster than an equivalent interpreted Python loop.

Demonstration:

import numpy as np
import time
# Create a large list and a large NumPy array
size = 1000000
python_list = list(range(size))
numpy_array = np.arange(size)
# Time adding two lists
start_time = time.perf_counter()  # perf_counter is a high-resolution timer suited to benchmarks
result_list = [a + b for a, b in zip(python_list, python_list)]
list_time = time.perf_counter() - start_time
# Time adding two NumPy arrays
start_time = time.perf_counter()
result_array = numpy_array + numpy_array
array_time = time.perf_counter() - start_time
print(f"List addition took: {list_time:.4f} seconds")
print(f"NumPy addition took: {array_time:.6f} seconds")
# Example output (exact timings vary by machine and NumPy version):
# List addition took: 0.0987 seconds
# NumPy addition took: 0.002100 seconds
# NumPy is ~47x faster in this example!
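A single timed run is noisy. As a hedged alternative, the standard-library `timeit` module repeats each operation and lets you keep the best run. The function names `add_lists` and `add_arrays` are illustrative, and the speed-up you see depends on your hardware and NumPy build.

```python
import timeit
import numpy as np

size = 1_000_000
python_list = list(range(size))
numpy_array = np.arange(size)

def add_lists():
    # Pure-Python element-wise addition
    return [a + b for a, b in zip(python_list, python_list)]

def add_arrays():
    # Vectorized addition executed in compiled code
    return numpy_array + numpy_array

# Run each version several times and keep the best run to reduce noise
repeats, number = 3, 3
list_best = min(timeit.repeat(add_lists, repeat=repeats, number=number)) / number
array_best = min(timeit.repeat(add_arrays, repeat=repeats, number=number)) / number

print(f"List addition:  {list_best:.6f} s per call")
print(f"NumPy addition: {array_best:.6f} s per call")
print(f"Speed-up: ~{list_best / array_best:.0f}x")

# Both versions compute the same values
assert add_lists() == add_arrays().tolist()
```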

Memory Usage

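NumPy arrays are more memory-efficient because they store raw values of a single data type in one contiguous block of memory. A Python list instead stores a pointer to a separate Python object for every element, and each of those objects carries its own overhead (a small int typically takes about 28 bytes on 64-bit CPython).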

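import sys
import numpy as np
# A Python list of integers. sys.getsizeof measures only the list object itself
# (its header plus one pointer per element), not the int objects it points to,
# so the element sizes are added separately (approximate: CPython shares small ints)
python_list = [1, 2, 3, 4, 5]
list_bytes = sys.getsizeof(python_list) + sum(sys.getsizeof(x) for x in python_list)
print(f"Size of Python list (incl. elements): {list_bytes} bytes")
# A NumPy array of integers; nbytes is the size of its raw data buffer
numpy_array = np.array([1, 2, 3, 4, 5])
print(f"Size of NumPy array: {numpy_array.nbytes} bytes")
# For a large number of elements, the difference is substantial
large_list = list(range(1000000))  # about a million distinct int objects
large_list_bytes = sys.getsizeof(large_list) + sum(sys.getsizeof(x) for x in large_list)
large_array = np.arange(1000000)  # default integer dtype (usually int64, 8 bytes each)
small_array = np.zeros(1000000, dtype=np.int8)  # int8 uses 1 byte per element
print(f"\nSize of large Python list (incl. elements): {large_list_bytes} bytes")
print(f"Size of large NumPy array: {large_array.nbytes} bytes")
print(f"Size of large int8 NumPy array: {small_array.nbytes} bytes")
# Both NumPy arrays are much smaller than the list plus its int objects.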
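Choosing `int8` saves memory only because each element gets a single byte, which limits the range to -128 through 127. Array arithmetic that leaves that range wraps around without raising an error, unlike Python's arbitrary-precision ints. A short sketch of the trade-off, with arbitrary example values:

```python
import numpy as np

# Each dtype has a fixed size and range
info = np.iinfo(np.int8)
print(info.min, info.max)           # -128 127
print(np.dtype(np.int8).itemsize)   # 1 byte per element
print(np.dtype(np.int64).itemsize)  # 8 bytes per element

# Array arithmetic that leaves the range wraps around silently
a = np.array([100, 120], dtype=np.int8)
b = np.array([100, 10], dtype=np.int8)
print(a + b)  # [ -56 -126]

# Python ints grow as needed and never overflow
print([x + y for x, y in zip([100, 120], [100, 10])])  # [200, 130]
```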

Functionality and Syntax

NumPy provides a vast suite of mathematical functions that operate on arrays.

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
# Basic math
print(f"Sum: {np.sum(arr)}")
print(f"Mean: {np.mean(arr)}")
print(f"Standard Deviation: {np.std(arr)}")
# More complex operations
print(f"Square: {arr ** 2}")
print(f"Square Root: {np.sqrt(arr)}")
# With a plain list, you'd import the `math` module and use a list comprehension:
import math
py_list = [1, 2, 3, 4, 5]
sqrt_list = [math.sqrt(x) for x in py_list]
print(f"Square Root (list): {sqrt_list}")
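Beyond element-wise math, NumPy supports boolean masking: a comparison yields an array of True/False values that can index the original array. A minimal sketch with the same five values (`mask` and `evens_only` are illustrative names):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# A comparison produces a boolean array of the same shape...
mask = arr > 2
print(mask)        # [False False  True  True  True]

# ...which can be used directly as an index to select matching elements
print(arr[mask])   # [3 4 5]

# Masks also work for in-place updates, e.g. zeroing out the odd values
evens_only = arr.copy()
evens_only[evens_only % 2 == 1] = 0
print(evens_only)  # [0 2 0 4 0]

# The pure-Python equivalent of the selection is a filtered comprehension
py_list = [1, 2, 3, 4, 5]
print([x for x in py_list if x > 2])  # [3, 4, 5]
```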

Multidimensional Data

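NumPy excels here. A "2D list" in Python is just a list of lists: nothing forces the rows to have the same length, and operations such as transposing need manual loops. A NumPy 2D array has a fixed rectangular shape, so it behaves like a true matrix.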

import numpy as np
# Python list of lists (can be jagged)
matrix_list = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11] # This row is shorter!
]
# NumPy 2D array (must be rectangular)
matrix_array = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
# Accessing elements is similar
print(f"Python list element: {matrix_list[0][1]}")
print(f"NumPy array element: {matrix_array[0, 1]}")
# NumPy makes matrix operations trivial
# Transpose the matrix
transposed_array = matrix_array.T
print("\nTransposed NumPy Array:")
print(transposed_array)

When to Use Which?

Use a Python list when:

  • You need to store a collection of items of different data types (e.g., [1, "apple", True]).
  • The data size is small and performance is not a concern.
  • You need a dynamic structure where the size changes frequently (e.g., constantly appending or removing items). A NumPy array has a fixed size, so every resize creates a new array and copies the data.
  • You are working with jagged arrays (lists of lists where inner lists have different lengths).
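The reason lists win when the size keeps changing is that `list.append` runs in amortized constant time, whereas `np.append` returns a new array and copies every existing element on each call. A minimal sketch of the anti-pattern and two common workarounds, with illustrative names:

```python
import numpy as np

# Growing a list is cheap: append() is amortized O(1)
values = []
for i in range(5):
    values.append(i * i)

# np.append allocates a new array and copies everything on every call,
# so growing an array this way in a loop costs O(n^2) overall
arr = np.array([], dtype=np.int64)
for i in range(5):
    arr = np.append(arr, i * i)

# Workaround 1: collect in a list, then convert once at the end
squares = np.array(values)
print(squares)  # [ 0  1  4  9 16]

# Workaround 2: if the final size is known, preallocate and fill in place
prealloc = np.empty(5, dtype=np.int64)
for i in range(5):
    prealloc[i] = i * i
print(prealloc)  # [ 0  1  4  9 16]
```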

Use a NumPy ndarray when:

  • You are performing numerical computations on large datasets (e.g., in data science, machine learning, physics simulations).
  • You need high performance and low memory usage.
  • You are working with homogeneous numerical data (e.g., all integers or all floats).
  • You are working with multi-dimensional data like vectors, matrices, or images.
  • You need to leverage linear algebra or other advanced mathematical functions.
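As a small illustration of the linear algebra point, `np.linalg.solve` solves a system of linear equations directly. The 2x2 system here is an arbitrary example chosen so the answer is easy to check by hand.

```python
import numpy as np

# Solve the 2x2 linear system
#   3x + 1y = 9
#   1x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)
print(x)  # [2. 3.]

# Check the solution with the matrix-vector product operator @
print(np.allclose(A @ x, b))  # True

# Other routines live in the same module, e.g. the determinant (3*2 - 1*1)
print(np.linalg.det(A))  # approximately 5.0
```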

Conversion Between Lists and NumPy Arrays

It's very easy to convert between them.

import numpy as np
# 1. Convert a Python list to a NumPy array
my_list = [10, 20, 30, 40, 50]
my_array = np.array(my_list)
print(f"Original list: {my_list}")
print(f"Converted array: {my_array}")
print(f"Type of converted object: {type(my_array)}")
# 2. Convert a NumPy array to a Python list
back_to_list = my_array.tolist()
print("\n----------------------------------")
print(f"Original array: {my_array}")
print(f"Converted list: {back_to_list}")
print(f"Type of converted object: {type(back_to_list)}")
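Two details are worth knowing when converting: `np.array` infers a dtype unless you pass one, and `tolist()` returns native Python scalars while `list(arr)` keeps NumPy scalar types. A short sketch, reusing the example values (`ints`, `floats`, and `same` are illustrative names):

```python
import numpy as np

my_list = [10, 20, 30, 40, 50]

# np.array infers the dtype from the contents; pass dtype= to choose it
ints = np.array(my_list)
floats = np.array(my_list, dtype=np.float64)
print(ints.dtype.kind, floats.dtype)  # i float64

# np.asarray skips the copy when the input is already a suitable array
same = np.asarray(ints)
print(same is ints)  # True

# tolist() returns native Python scalars; list() keeps NumPy scalar types
print(type(ints.tolist()[0]))                 # <class 'int'>
print(isinstance(list(ints)[0], np.integer))  # True

# tolist() also converts nested arrays into nested lists
print(np.arange(6).reshape(2, 3).tolist())  # [[0, 1, 2], [3, 4, 5]]
```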