Of course! Working with DICOM images that contain JPEG (or JPEG-LS, JPEG 2000) pixel data is a very common task in medical imaging. The standard Python library for this is pydicom.

The key challenge is that pydicom can read the DICOM metadata (like patient information, image dimensions, etc.) from a JPEG-compressed file, but it cannot directly decompress the pixel data into a usable NumPy array. For that, you need an additional library.
Here's a complete guide covering the essentials.
Prerequisites: The Right Tools
You need two main libraries:
- pydicom: The core library for reading, writing, and manipulating DICOM files.
- gdcm or pyjpegls: A decompressor library that pydicom can use.
  - gdcm (GDCM, Grassroots DICOM): A powerful C++-based library with Python bindings. It's the most robust choice and supports JPEG, JPEG-LS, and JPEG 2000. This is the recommended option.
  - pyjpegls: A Python wrapper for the CharLS C++ library. It's lighter but only supports JPEG-LS compression.
Installation
First, install pydicom. Then, install one of the decompressors.

Option A: Recommended (using gdcm)
This is the most reliable method. The python-gdcm package on PyPI ships prebuilt wheels that bundle the C++ library, so a plain pip install is usually enough; a system-level install is only needed if no wheel exists for your platform.
# Install pydicom
pip install pydicom
# Install the Python bindings for GDCM
pip install python-gdcm
# Alternatively, with conda:
conda install -c conda-forge gdcm
# Or, on Debian/Ubuntu, for the system Python only:
sudo apt-get update
sudo apt-get install python3-gdcm
Option B: Simpler (using pyjpegls)
Use this if you are certain you only need to handle JPEG-LS compressed files and want to avoid system-level dependencies.
pip install pydicom pyjpegls
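After installing, it can help to confirm that a decoder is importable from the same Python environment that runs pydicom. The sketch below uses only the standard library; the helper name `find_decoders` is made up for illustration, and the module names reflect how these packages are normally imported (python-gdcm as `gdcm`, pyjpegls as `jpeg_ls`).

```python
import importlib.util

# Importable module name for each optional decoder package. The module name
# can differ from the pip package name.
DECODER_MODULES = {
    "python-gdcm": "gdcm",
    "pyjpegls": "jpeg_ls",
}


def find_decoders(modules=DECODER_MODULES):
    """Return {package_name: bool}: whether each module can be imported."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in modules.items()
    }


if __name__ == "__main__":
    for package, available in find_decoders().items():
        print(f"{package}: {'installed' if available else 'missing'}")
```

If both lines report "missing", accessing pixel_array on a compressed file is expected to fail.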
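The Core Problem: Compressed Transfer Syntaxes
When you read a DICOM file, pydicom looks at the Transfer Syntax UID in the file meta information to determine how the pixel data is encoded. If it names a compressed syntax such as JPEG Baseline, JPEG-LS, or JPEG 2000, the pixel data is encapsulated and must be decoded before use. The PhotometricInterpretation tag (for example MONOCHROME2, RGB, or YBR_FULL_422) only describes the color model of the samples; it does not by itself indicate compression.
If you try to access pixel_array on a compressed file without a decompressor, pydicom will raise an error: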

# This will FAIL if no decompressor is found
import pydicom
import numpy as np
# ds = pydicom.dcmread("path/to/your_compressed_image.dcm")
# print(f"Photometric Interpretation: {ds.PhotometricInterpretation}")
# print(f"Rows: {ds.Rows}, Columns: {ds.Columns}")
# This line typically raises a RuntimeError or NotImplementedError,
# depending on the pydicom version and which handlers are present
# pixel_array = ds.pixel_array
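Rather than waiting for the error, you can check up front whether a decoder will be needed by looking at the transfer syntax. The following is a minimal standard-library sketch; `needs_decoder` is an illustrative helper, not a pydicom function. pydicom's own UID objects expose an `is_compressed` property that serves the same purpose.

```python
# Transfer syntaxes whose pixel data is stored natively (no image codec needed).
NATIVE_TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2",       # Implicit VR Little Endian
    "1.2.840.10008.1.2.1",     # Explicit VR Little Endian
    "1.2.840.10008.1.2.1.99",  # Deflated Explicit VR Little Endian
    "1.2.840.10008.1.2.2",     # Explicit VR Big Endian (retired)
}


def needs_decoder(transfer_syntax_uid) -> bool:
    """Return True if pixel data under this transfer syntax is compressed."""
    return str(transfer_syntax_uid) not in NATIVE_TRANSFER_SYNTAXES


print(needs_decoder("1.2.840.10008.1.2.1"))     # Explicit VR Little Endian -> False
print(needs_decoder("1.2.840.10008.1.2.4.90"))  # JPEG 2000 (Lossless Only) -> True
```

With a dataset loaded by pydicom, the call would be `needs_decoder(ds.file_meta.TransferSyntaxUID)`.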
The Solution: Reading and Decompressing
With gdcm installed, pydicom will automatically detect it and use it to decompress the pixel data when you access the pixel_array attribute. The process is seamless.
Here is a complete, working example.
Example Code
import pydicom
import numpy as np
import matplotlib.pyplot as plt
from pydicom.data import get_testdata_file

# --- 1. Make sure you have a DICOM file with JPEG compression ---
# For demonstration we use JPEG2000.dcm, a JPEG 2000 compressed sample from
# pydicom's bundled test data. (CT_small.dcm from the same collection is
# stored uncompressed, so it would not exercise the decompressor.)
# Replace this with the actual path to your DICOM file.
# You can find sample DICOM files online (e.g., from The Cancer Imaging Archive - TCIA).
filename = get_testdata_file("JPEG2000.dcm")
try:
    ds = pydicom.dcmread(filename)
except FileNotFoundError:
    print(f"Error: The file '{filename}' was not found.")
    print("Please replace it with the path to your DICOM file.")
    raise SystemExit(1)
# --- 2. Check the metadata to confirm compression ---
print(f"File: {filename}")
print(f"Photometric Interpretation: {ds.PhotometricInterpretation}")
print(f"Transfer Syntax UID: {ds.file_meta.TransferSyntaxUID.name}")
print(f"Rows: {ds.Rows}, Columns: {ds.Columns}")
print(f"Bits Allocated: {ds.BitsAllocated}")
print("-" * 30)
# --- 3. The Magic: Access pixel_array ---
# With gdcm installed, this line will automatically decompress the JPEG data.
# It might take a moment for large or highly compressed images.
try:
    pixel_array = ds.pixel_array
    print("Successfully read and decompressed pixel data!")
    print(f"Pixel array shape: {pixel_array.shape}")
    print(f"Pixel array dtype: {pixel_array.dtype}")
    print(f"Pixel value range: [{pixel_array.min()}, {pixel_array.max()}]")
except Exception as e:
    print("Failed to read pixel data. This usually means no compatible decompressor (like gdcm) was found.")
    print(f"Error: {e}")
    raise SystemExit(1)
# --- 4. Visualize the image (optional, requires matplotlib: pip install matplotlib) ---
# matplotlib is imported at the top of the script, so a missing install
# surfaces there rather than here.
plt.figure(figsize=(8, 8))
# For grayscale, use a colormap
plt.imshow(pixel_array, cmap="gray")
plt.title("Decompressed DICOM Image")
plt.colorbar()
plt.axis("off")
plt.show()
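The shape of the decoded array depends on the image geometry. As far as I know, pydicom orders the axes as (frames, rows, columns, samples) and omits the frames and samples axes when their size is 1. The helper below is a hypothetical illustration of that rule, useful as a sanity check against `pixel_array.shape`.

```python
def expected_shape(rows, columns, samples_per_pixel=1, number_of_frames=1):
    """Shape of the decoded array: (frames, rows, columns, samples), with the
    frames and samples axes left out when their size is 1."""
    shape = []
    if number_of_frames > 1:
        shape.append(number_of_frames)
    shape.extend([rows, columns])
    if samples_per_pixel > 1:
        shape.append(samples_per_pixel)
    return tuple(shape)


print(expected_shape(128, 128))       # single-frame grayscale
print(expected_shape(480, 640, 3))    # single-frame color
print(expected_shape(64, 64, 1, 10))  # multi-frame grayscale
```

For a loaded dataset, the arguments would come from ds.Rows, ds.Columns, ds.SamplesPerPixel, and int(getattr(ds, "NumberOfFrames", 1)).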
Writing a DICOM File with JPEG Compression
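Sometimes, you might want to take an uncompressed image (e.g., from a NumPy array) and save it in a DICOM file with JPEG compression to save space.
Setting TransferSyntaxUID to a JPEG-compatible value is necessary but not sufficient: the UID only labels the encoding, and pydicom writes PixelData exactly as you supply it. The pixel data itself has to be encoded and encapsulated first. In pydicom 3.0 and later, Dataset.compress() does this for JPEG-LS (with pyjpegls installed), JPEG 2000 (with pylibjpeg and pylibjpeg-openjpeg installed), and RLE Lossless. It does not encode the classic JPEG syntaxes (Baseline, Lossless), which need an external encoder.
Common JPEG Transfer Syntax UIDs
- JPEG Baseline (Process 1): pydicom.uid.JPEGBaseline8Bit
- JPEG Lossless (Process 14, Selection Value 1): pydicom.uid.JPEGLosslessSV1
- JPEG-LS Lossless: pydicom.uid.JPEGLSLossless
- JPEG-LS Near-Lossless (lossy): pydicom.uid.JPEGLSNearLossless
- JPEG 2000 Image Compression: pydicom.uid.JPEG2000Lossless (lossless only) or pydicom.uid.JPEG2000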
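Each of these names stands for a fixed UID string defined by the DICOM standard. The lookup table below is a small sketch of that mapping, which can be handy when you see a raw UID in a file's meta information. The dictionary and helper names are illustrative, and the boolean marks syntaxes that are lossless by definition (the others may or may not lose information).

```python
# UID string -> (description, always lossless?)
JPEG_TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2.4.50": ("JPEG Baseline (Process 1)", False),
    "1.2.840.10008.1.2.4.70": ("JPEG Lossless (Process 14, Selection Value 1)", True),
    "1.2.840.10008.1.2.4.80": ("JPEG-LS Lossless", True),
    "1.2.840.10008.1.2.4.81": ("JPEG-LS Near-Lossless", False),
    "1.2.840.10008.1.2.4.90": ("JPEG 2000 (Lossless Only)", True),
    "1.2.840.10008.1.2.4.91": ("JPEG 2000", False),
}


def describe_transfer_syntax(uid):
    """Return a readable label for a JPEG-family UID, or None if not listed."""
    entry = JPEG_TRANSFER_SYNTAXES.get(str(uid))
    if entry is None:
        return None
    description, always_lossless = entry
    suffix = "always lossless" if always_lossless else "may be lossy"
    return f"{description} ({suffix})"


print(describe_transfer_syntax("1.2.840.10008.1.2.4.80"))
print(describe_transfer_syntax("1.2.840.10008.1.2.4.50"))
```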
Example: Writing a JPEG-LS Compressed DICOM
import pydicom
import numpy as np
from pydicom.dataset import Dataset, FileMetaDataset
from pydicom.uid import ExplicitVRLittleEndian, JPEGLSLossless
# NOTE: this example assumes pydicom >= 3.0 with pyjpegls installed,
# which is what provides JPEG-LS encoding through Dataset.compress().
# 1. Create a new DICOM dataset
# This will be our header information
new_ds = Dataset()
# 2. Add some required standard tags (you can customize these)
new_ds.SOPClassUID = '1.2.840.10008.5.1.4.1.1.2' # CT Image Storage
new_ds.SOPInstanceUID = pydicom.uid.generate_uid()
new_ds.PatientName = "Test^Patient"
new_ds.PatientID = "12345"
new_ds.StudyInstanceUID = pydicom.uid.generate_uid()
new_ds.SeriesInstanceUID = pydicom.uid.generate_uid()
# 3. Add pixel data related tags
# Let's create a sample 512x512 grayscale image
pixel_data = np.random.randint(0, 256, size=(512, 512), dtype=np.uint8)
new_ds.Rows = 512
new_ds.Columns = 512
new_ds.BitsAllocated = 8
new_ds.BitsStored = 8
new_ds.HighBit = 7
new_ds.PixelRepresentation = 0 # 0 for unsigned
new_ds.SamplesPerPixel = 1
new_ds.PhotometricInterpretation = "MONOCHROME2"
# 4. Start from an uncompressed transfer syntax
# Setting a JPEG-LS UID here would only relabel the data, not compress it
new_ds.file_meta = FileMetaDataset()
new_ds.file_meta.TransferSyntaxUID = ExplicitVRLittleEndian
# 5. Compress the pixel data
# compress() encodes the array, stores the encapsulated result in PixelData,
# and updates TransferSyntaxUID to JPEG-LS Lossless
new_ds.compress(JPEGLSLossless, pixel_data)
# 6. Save the file
# enforce_file_format=True adds the preamble and required file meta elements
output_filename = "my_jpeg_ls_image.dcm"
pydicom.dcmwrite(output_filename, new_ds, enforce_file_format=True)
print(f"Successfully created a JPEG-LS compressed DICOM file: {output_filename}")
# --- Verification: Let's read it back ---
# Decoding works here because pyjpegls is installed (gdcm would also work)
read_ds = pydicom.dcmread(output_filename)
print("\nVerification of written file:")
print(f"Photometric Interpretation: {read_ds.PhotometricInterpretation}")
print(f"Transfer Syntax UID: {read_ds.file_meta.TransferSyntaxUID.name}")
reconstructed_pixels = read_ds.pixel_array
print(f"Read back pixel array shape: {reconstructed_pixels.shape}")
print(f"Original and reconstructed arrays are equal: {np.array_equal(pixel_data, reconstructed_pixels)}")
Summary
| Task | Key Code/Concept | Required Libraries |
|---|---|---|
| Read DICOM with JPEG | ds = pydicom.dcmread("file.dcm"); pixel_array = ds.pixel_array | pydicom + gdcm (or pyjpegls) |
| Check if Compressed | print(ds.file_meta.TransferSyntaxUID.name); print(ds.file_meta.TransferSyntaxUID.is_compressed) | pydicom |
| Write DICOM with JPEG-LS | new_ds.compress(pydicom.uid.JPEGLSLossless, pixel_data), then pydicom.dcmwrite(...) | pydicom 3.0+ and pyjpegls |
| Why it Works | pydicom's pixel_array attribute acts as a property that triggers decompression if a compatible library is found. | pydicom's internal logic + external decompressor |
