Image Preprocessing¶
Raw images captured by robot cameras are rarely ready for direct use in perception algorithms. Lighting variations, sensor noise, irrelevant background details, and inconsistent image sizes all degrade downstream tasks such as object detection, feature matching, and pose estimation. Image preprocessing transforms raw camera data into a cleaner, more consistent representation so that higher-level algorithms can operate reliably.
This tutorial covers the essential preprocessing techniques every robotics student should master—from color-space conversion and filtering to morphological operations and end-to-end pipelines.
Learning Objectives¶
After completing this tutorial you will be able to:
- Convert images between common color spaces (BGR, HSV, grayscale)
- Apply spatial filtering (Gaussian, median, bilateral) and understand their trade-offs
- Enhance image contrast using histogram equalization and CLAHE
- Use morphological operations to clean up binary masks
- Apply geometric transformations (resize, affine, perspective)
- Build a complete preprocessing pipeline for a robot vision task
Prerequisites¶
| Requirement | Details |
|---|---|
| Python | 3.8+ |
| Libraries | opencv-python, numpy, matplotlib |
| Prior knowledge | Basic Python, NumPy array indexing |
Install dependencies if needed:
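pip install opencv-python numpy matplotlib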
1. Color Spaces¶
A color space defines how pixel colors are numerically represented. Choosing the right color space for a given task can simplify downstream processing significantly.
1.1 RGB / BGR¶
Most libraries (PIL, matplotlib, scikit-image) use RGB ordering, but OpenCV defaults to BGR. This historical choice traces back to early OpenCV development when the BGR byte order was common in video capture hardware.
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load an image (returned in BGR order)
img_bgr = cv2.imread("robot_workspace.jpg")
# Convert BGR -> RGB for correct display with matplotlib
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].imshow(img_bgr) # wrong colors — channels swapped
axes[0].set_title("Raw BGR (displayed as RGB)")
axes[1].imshow(img_rgb) # correct colors
axes[1].set_title("Converted RGB")
for ax in axes:
ax.axis("off")
plt.tight_layout()
plt.show()
Common Pitfall
Forgetting the BGR→RGB conversion before calling plt.imshow() is the #1 beginner mistake in OpenCV + matplotlib workflows. The red and blue channels end up swapped, so reds appear blue and blues appear red.
1.2 HSV¶
HSV (Hue, Saturation, Value) separates color (hue) from intensity (value), making it well suited to color-based segmentation—for example, detecting a colored ball or lane markings under varying illumination.
| Channel | Range (OpenCV) | Meaning |
|---|---|---|
| H (Hue) | 0–179 | Color angle (0–360°, halved to fit 8 bits) |
| S (Saturation) | 0–255 | Color purity |
| V (Value) | 0–255 | Brightness |
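A quick way to find the OpenCV HSV encoding of a reference color is to convert a single pixel (a minimal sketch):
import cv2
import numpy as np
# Pure red in BGR; OpenCV halves the hue, so red maps to H ≈ 0 on the 0–179 scale
red_bgr = np.uint8([[[0, 0, 255]]])
print(cv2.cvtColor(red_bgr, cv2.COLOR_BGR2HSV))  # [[[  0 255 255]]]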
Interactive HSV Trackbar Demo¶
import cv2
import numpy as np
def nothing(x):
pass
img = cv2.imread("robot_workspace.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
cv2.namedWindow("Trackbars")
cv2.createTrackbar("L-H", "Trackbars", 0, 179, nothing)
cv2.createTrackbar("L-S", "Trackbars", 0, 255, nothing)
cv2.createTrackbar("L-V", "Trackbars", 0, 255, nothing)
cv2.createTrackbar("U-H", "Trackbars", 179, 179, nothing)
cv2.createTrackbar("U-S", "Trackbars", 255, 255, nothing)
cv2.createTrackbar("U-V", "Trackbars", 255, 255, nothing)
while True:
l_h = cv2.getTrackbarPos("L-H", "Trackbars")
l_s = cv2.getTrackbarPos("L-S", "Trackbars")
l_v = cv2.getTrackbarPos("L-V", "Trackbars")
u_h = cv2.getTrackbarPos("U-H", "Trackbars")
u_s = cv2.getTrackbarPos("U-S", "Trackbars")
u_v = cv2.getTrackbarPos("U-V", "Trackbars")
lower = np.array([l_h, l_s, l_v])
upper = np.array([u_h, u_s, u_v])
mask = cv2.inRange(hsv, lower, upper)
result = cv2.bitwise_and(img, img, mask=mask)
cv2.imshow("Original", img)
cv2.imshow("Mask", mask)
cv2.imshow("Result", result)
if cv2.waitKey(1) & 0xFF == 27: # ESC to quit
break
cv2.destroyAllWindows()
Static HSV Color Detection Example¶
import cv2
import numpy as np
import matplotlib.pyplot as plt
img_bgr = cv2.imread("robot_workspace.jpg")
hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
# Detect blue objects (typical range for blue in OpenCV HSV)
lower_blue = np.array([100, 50, 50])
upper_blue = np.array([130, 255, 255])
mask = cv2.inRange(hsv, lower_blue, upper_blue)
result = cv2.bitwise_and(img_bgr, img_bgr, mask=mask)
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].imshow(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB))
axes[0].set_title("Original")
axes[1].imshow(mask, cmap="gray")
axes[1].set_title("Blue Mask")
axes[2].imshow(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))
axes[2].set_title("Detected Blue Regions")
for ax in axes:
ax.axis("off")
plt.tight_layout()
plt.show()
1.3 Grayscale Conversion¶
Many algorithms (edge detection, template matching, feature extraction) operate on single-channel grayscale images. Converting to grayscale cuts the data volume to one third and simplifies computation.
import cv2
import matplotlib.pyplot as plt
img_bgr = cv2.imread("robot_workspace.jpg")
gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
print(f"Original shape: {img_bgr.shape}") # (H, W, 3)
print(f"Grayscale shape: {gray.shape}") # (H, W)
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].imshow(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB))
axes[0].set_title("Color (RGB)")
axes[1].imshow(gray, cmap="gray")
axes[1].set_title("Grayscale")
for ax in axes:
ax.axis("off")
plt.tight_layout()
plt.show()
2. Image Filtering¶
Filtering (also called smoothing or blurring) suppresses noise, but it also suppresses fine detail. The choice of filter depends on the noise type and whether you need to preserve edges.
2.1 Gaussian Filter¶
The Gaussian filter replaces each pixel with a weighted average of its neighbors, using a bell-curve (Gaussian) kernel. It is effective against Gaussian noise and is the most commonly used low-pass filter.
import cv2
import matplotlib.pyplot as plt
img = cv2.imread("robot_workspace.jpg", cv2.IMREAD_GRAYSCALE)
# Apply Gaussian blur with increasing kernel sizes
blur_3 = cv2.GaussianBlur(img, (3, 3), 0)
blur_7 = cv2.GaussianBlur(img, (7, 7), 0)
blur_15 = cv2.GaussianBlur(img, (15, 15), 0)
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, im, title in zip(axes,
[img, blur_3, blur_7, blur_15],
["Original", "3×3", "7×7", "15×15"]):
ax.imshow(im, cmap="gray")
ax.set_title(f"Gaussian {title}")
ax.axis("off")
plt.tight_layout()
plt.show()
Kernel Size Rules
- Must be odd and positive: 3, 5, 7, …
- Larger kernels → more smoothing, slower computation
- The third argument 0 tells OpenCV to compute σ from the kernel size
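The reverse also works: pass a zero kernel size and an explicit σ, and OpenCV derives the kernel size from σ (a minimal sketch):
import cv2
img = cv2.imread("robot_workspace.jpg", cv2.IMREAD_GRAYSCALE)
# ksize=(0, 0): OpenCV computes the kernel size from sigmaX
blur_sigma2 = cv2.GaussianBlur(img, (0, 0), sigmaX=2)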
2.2 Median Filter¶
The median filter replaces each pixel with the median of its neighborhood. It is especially effective at removing salt-and-pepper noise (random black/white pixels) while preserving edges better than a Gaussian filter.
import cv2
import numpy as np
import matplotlib.pyplot as plt
img = cv2.imread("robot_workspace.jpg", cv2.IMREAD_GRAYSCALE)
# Simulate salt-and-pepper noise
def add_salt_pepper(image, amount=0.02):
noisy = image.copy()
num_salt = int(amount * image.size / 2)
# Salt
coords = tuple(np.random.randint(0, d, num_salt) for d in image.shape)
noisy[coords] = 255
# Pepper
coords = tuple(np.random.randint(0, d, num_salt) for d in image.shape)
noisy[coords] = 0
return noisy
noisy = add_salt_pepper(img, amount=0.03)
median_3 = cv2.medianBlur(noisy, 3)
median_7 = cv2.medianBlur(noisy, 7)
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, im, title in zip(axes,
[noisy, median_3, median_7],
["Noisy", "Median 3×3", "Median 7×7"]):
ax.imshow(im, cmap="gray")
ax.set_title(title)
ax.axis("off")
plt.tight_layout()
plt.show()
2.3 Bilateral Filter¶
The bilateral filter smooths the image while preserving edges. It considers both spatial distance and intensity difference when weighting neighbors—pixels across an edge contribute very little.
import cv2
import matplotlib.pyplot as plt
img = cv2.imread("robot_workspace.jpg", cv2.IMREAD_GRAYSCALE)
# cv2.bilateralFilter(src, d, sigmaColor, sigmaSpace)
bilateral = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
gaussian = cv2.GaussianBlur(img, (9, 9), 0)
fig, axes = plt.subplots(1, 3, figsize=(14, 4))
for ax, im, title in zip(axes,
[img, gaussian, bilateral],
["Original", "Gaussian 9×9", "Bilateral d=9"]):
ax.imshow(im, cmap="gray")
ax.set_title(title)
ax.axis("off")
plt.tight_layout()
plt.show()
| Parameter | Meaning |
|---|---|
| d | Diameter of the pixel neighborhood (use -1 to auto-compute it from sigmaSpace) |
| sigmaColor | Larger → more colors in the neighborhood are mixed |
| sigmaSpace | Larger → farther pixels influence each other |
2.4 Filter Comparison¶
| Filter | Best For | Preserves Edges? | Speed |
|---|---|---|---|
| Gaussian | Gaussian / general noise | No | Fast |
| Median | Salt-and-pepper noise | Moderate | Medium |
| Bilateral | Edge-aware smoothing | Yes | Slow |
import cv2
import matplotlib.pyplot as plt
img = cv2.imread("robot_workspace.jpg", cv2.IMREAD_GRAYSCALE)
filters = {
"Gaussian (5×5)": cv2.GaussianBlur(img, (5, 5), 0),
"Median (5)": cv2.medianBlur(img, 5),
"Bilateral": cv2.bilateralFilter(img, 5, 50, 50),
}
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
axes[0].imshow(img, cmap="gray")
axes[0].set_title("Original")
for ax, (name, im) in zip(axes[1:], filters.items()):
ax.imshow(im, cmap="gray")
ax.set_title(name)
for ax in axes:
ax.axis("off")
plt.tight_layout()
plt.show()
3. Histogram Equalization¶
Histogram equalization redistributes pixel intensities to span the full dynamic range, improving contrast in poorly lit scenes—common in indoor robot environments.
3.1 Basic Histogram Equalization¶
import cv2
import matplotlib.pyplot as plt
img = cv2.imread("robot_workspace.jpg", cv2.IMREAD_GRAYSCALE)
equ = cv2.equalizeHist(img)
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
axes[0, 0].imshow(img, cmap="gray")
axes[0, 0].set_title("Original")
axes[0, 1].imshow(equ, cmap="gray")
axes[0, 1].set_title("Equalized")
axes[1, 0].hist(img.ravel(), bins=256, range=(0, 256))
axes[1, 0].set_title("Original Histogram")
axes[1, 1].hist(equ.ravel(), bins=256, range=(0, 256))
axes[1, 1].set_title("Equalized Histogram")
for ax in axes[0]:  # hide axes only on the image subplots; keep histogram axes visible
    ax.axis("off")
plt.tight_layout()
plt.show()
3.2 CLAHE (Contrast Limited Adaptive Histogram Equalization)¶
Basic equalization uses a global histogram, which can over-amplify noise in uniform regions. CLAHE divides the image into small tiles and equalizes each tile independently, with a clip limit to prevent noise amplification.
import cv2
import matplotlib.pyplot as plt
img = cv2.imread("robot_workspace.jpg", cv2.IMREAD_GRAYSCALE)
# Global equalization
equ = cv2.equalizeHist(img)
# CLAHE
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
clahe_img = clahe.apply(img)
fig, axes = plt.subplots(1, 3, figsize=(14, 4))
for ax, im, title in zip(axes,
[img, equ, clahe_img],
["Original", "Global Equalization", "CLAHE (clip=2.0)"]):
ax.imshow(im, cmap="gray")
ax.set_title(title)
ax.axis("off")
plt.tight_layout()
plt.show()
When to Use CLAHE
CLAHE is usually the better choice in robotics because scenes often contain both bright and dark regions (e.g., a robot navigating from a hallway into a sunlit room). It avoids the washed-out look of global equalization.
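For color frames, a common pattern is to equalize only the lightness channel so hues stay untouched. Below is a sketch using the Lab color space (reusing robot_workspace.jpg; the tutorial itself applies CLAHE only to grayscale):
import cv2
img = cv2.imread("robot_workspace.jpg")
# Apply CLAHE to the L (lightness) channel only, leaving color information intact
lab = cv2.cvtColor(img, cv2.COLOR_BGR2Lab)
l, a, b = cv2.split(lab)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
lab_eq = cv2.merge((clahe.apply(l), a, b))
enhanced = cv2.cvtColor(lab_eq, cv2.COLOR_Lab2BGR)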
4. Morphological Operations¶
Morphological operations process binary (or grayscale) images based on shape. They are essential for cleaning up thresholded masks—removing small noise blobs, filling tiny holes, and separating touching objects.
4.1 Erosion and Dilation¶
Erosion shrinks bright regions: a pixel becomes 1 only if all pixels under the kernel are 1.
Dilation expands bright regions: a pixel becomes 1 if any pixel under the kernel is 1.
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Create a synthetic binary image for demonstration
img = np.zeros((200, 200), dtype=np.uint8)
cv2.circle(img, (60, 60), 30, 255, -1)
cv2.circle(img, (140, 60), 30, 255, -1)
cv2.circle(img, (100, 140), 40, 255, -1)
cv2.rectangle(img, (20, 120), (50, 180), 255, -1)
# Add sparse salt noise (about 2% of pixels)
noise = (np.random.rand(*img.shape) < 0.02).astype(np.uint8) * 255
img_noisy = cv2.bitwise_or(img, noise)
# Define a kernel
kernel = np.ones((5, 5), np.uint8)
eroded = cv2.erode(img_noisy, kernel, iterations=1)
dilated = cv2.dilate(img_noisy, kernel, iterations=1)
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, im, title in zip(axes,
[img_noisy, eroded, dilated],
["Noisy Binary", "Erosion (5×5)", "Dilation (5×5)"]):
ax.imshow(im, cmap="gray")
ax.set_title(title)
ax.axis("off")
plt.tight_layout()
plt.show()
Kernel Variations¶
import cv2
import numpy as np
# Rectangular kernel
kernel_rect = np.ones((5, 5), np.uint8)
# Elliptical kernel
kernel_ellip = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
# Cross-shaped kernel
kernel_cross = cv2.getStructuringElement(cv2.MORPH_CROSS, (5, 5))
print("Rectangular:\n", kernel_rect)
print("Elliptical:\n", kernel_ellip)
print("Cross:\n", kernel_cross)
4.2 Opening and Closing¶
- Opening = erosion followed by dilation. Removes small bright noise while preserving larger structures.
- Closing = dilation followed by erosion. Fills small dark holes inside bright objects.
import cv2
import numpy as np
import matplotlib.pyplot as plt
img = np.zeros((200, 200), dtype=np.uint8)
cv2.circle(img, (100, 100), 50, 255, -1)
# Add sparse bright noise outside the circle
salt = (np.random.rand(*img.shape) < 0.02).astype(np.uint8) * 255
img_dirty = cv2.bitwise_or(img, salt)
# Simulate holes: knock sparse random pixels to 0 inside the bright circle
holes = (np.random.rand(*img.shape) < 0.02).astype(np.uint8) * 255
img_dirty = cv2.bitwise_and(img_dirty, cv2.bitwise_not(holes))
kernel = np.ones((7, 7), np.uint8)
opened = cv2.morphologyEx(img_dirty, cv2.MORPH_OPEN, kernel)
closed = cv2.morphologyEx(img_dirty, cv2.MORPH_CLOSE, kernel)
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, im, title in zip(axes,
[img_dirty, opened, closed],
["Dirty Binary", "Opening (noise removal)", "Closing (hole filling)"]):
ax.imshow(im, cmap="gray")
ax.set_title(title)
ax.axis("off")
plt.tight_layout()
plt.show()
4.3 Practical Use: Noise Removal and Hole Filling¶
In robotics, after thresholding a color-detected object, the mask is often noisy. A typical cleanup sequence:
import cv2
import numpy as np
def clean_mask(mask, kernel_size=5, iterations=2):
"""Clean a binary mask using morphological operations."""
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
# Step 1: Remove small noise (opening)
cleaned = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=iterations)
# Step 2: Fill small holes (closing)
cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel, iterations=iterations)
return cleaned
# Usage with a color-detected mask:
# mask = cv2.inRange(hsv, lower_color, upper_color)
# clean = clean_mask(mask)
5. Image Transformation¶
Geometric transformations adjust the position, orientation, and size of images. They are used for normalizing input to neural networks, correcting camera perspective, and aligning images for stitching.
5.1 Resize and Crop¶
import cv2
import matplotlib.pyplot as plt
img = cv2.imread("robot_workspace.jpg")
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# Resize to exact dimensions
resized = cv2.resize(img_rgb, (320, 240))
# Resize by scale factor
half = cv2.resize(img_rgb, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
# Crop a region of interest (y1:y2, x1:x2)
h, w = img_rgb.shape[:2]
cropped = img_rgb[h//4:3*h//4, w//4:3*w//4]
fig, axes = plt.subplots(1, 3, figsize=(14, 4))
for ax, im, title in zip(axes,
[resized, half, cropped],
[f"Resized {resized.shape[1]}×{resized.shape[0]}",
f"Scaled 50% {half.shape[1]}×{half.shape[0]}",
f"Cropped {cropped.shape[1]}×{cropped.shape[0]}"]):
ax.imshow(im)
ax.set_title(title)
ax.axis("off")
plt.tight_layout()
plt.show()
Interpolation Methods
- cv2.INTER_AREA — best for shrinking (avoids moiré)
- cv2.INTER_LINEAR — good default for enlargement (bilinear)
- cv2.INTER_CUBIC — higher quality but slower
- cv2.INTER_NEAREST — fastest, nearest-neighbor (use for masks)
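For example, when shrinking a binary mask, nearest-neighbor keeps the values strictly 0/255; bilinear interpolation would introduce gray edge pixels (a sketch with a stand-in mask):
import cv2
import numpy as np
mask = np.zeros((480, 640), dtype=np.uint8)  # stand-in binary mask
mask[100:300, 200:400] = 255
mask_small = cv2.resize(mask, (160, 120), interpolation=cv2.INTER_NEAREST)
print(np.unique(mask_small))  # still only [  0 255]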
5.2 Affine Transform¶
An affine transform preserves parallel lines (translation, rotation, scaling, shearing). You specify three pairs of corresponding points.
import cv2
import numpy as np
import matplotlib.pyplot as plt
img = cv2.imread("robot_workspace.jpg")
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
h, w = img.shape[:2]
# Source points (in the original image)
pts_src = np.float32([[50, 50], [200, 50], [50, 200]])
# Destination points (where they should map to)
pts_dst = np.float32([[10, 100], [200, 50], [100, 250]])
# Compute the affine matrix
M = cv2.getAffineTransform(pts_src, pts_dst)
# Apply the transform
affine_img = cv2.warpAffine(img_rgb, M, (w, h))
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].imshow(img_rgb)
axes[0].set_title("Original")
for pt in pts_src:
axes[0].plot(pt[0], pt[1], 'ro')
axes[1].imshow(affine_img)
axes[1].set_title("Affine Transform")
for pt in pts_dst:
axes[1].plot(pt[0], pt[1], 'ro')
for ax in axes:
ax.axis("off")
plt.tight_layout()
plt.show()
Rotation Around a Point¶
import cv2
import numpy as np
img = cv2.imread("robot_workspace.jpg")
h, w = img.shape[:2]
# Rotate 30 degrees around the center
center = (w // 2, h // 2)
angle = 30
scale = 1.0
M = cv2.getRotationMatrix2D(center, angle, scale)
rotated = cv2.warpAffine(img, M, (w, h))
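Note that warping back into the original w×h canvas clips the rotated corners. If you need the full rotated image, a common recipe (a sketch, not a dedicated OpenCV API) enlarges the output canvas and shifts the translation terms of M:
import cv2
import numpy as np
img = cv2.imread("robot_workspace.jpg")
h, w = img.shape[:2]
M = cv2.getRotationMatrix2D((w / 2, h / 2), 30, 1.0)
# Bounding size of the rotated image
cos, sin = abs(M[0, 0]), abs(M[0, 1])
new_w = int(h * sin + w * cos)
new_h = int(h * cos + w * sin)
# Move the rotation center to the center of the enlarged canvas
M[0, 2] += new_w / 2 - w / 2
M[1, 2] += new_h / 2 - h / 2
rotated_full = cv2.warpAffine(img, M, (new_w, new_h))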
5.3 Perspective Transform¶
A perspective transform (homography) maps four source points to four destination points, correcting perspective distortion. This is useful for document scanning, AR tag detection, and bird's-eye-view generation.
import cv2
import numpy as np
import matplotlib.pyplot as plt
img = cv2.imread("robot_workspace.jpg")
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
h, w = img.shape[:2]
# Four corners of a document/tag in the image (may be skewed)
pts_src = np.float32([
[56, 65],
[368, 52],
[28, 387],
[389, 390]
])
# Desired output rectangle
output_w, output_h = 300, 400
pts_dst = np.float32([
[0, 0],
[output_w, 0],
[0, output_h],
[output_w, output_h]
])
# Compute perspective matrix
M = cv2.getPerspectiveTransform(pts_src, pts_dst)
# Apply perspective warp
warped = cv2.warpPerspective(img_rgb, M, (output_w, output_h))
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].imshow(img_rgb)
axes[0].set_title("Original (with skew)")
# Draw source quadrilateral
src_quad = np.vstack([pts_src, pts_src[0]])
axes[0].plot(src_quad[:, 0], src_quad[:, 1], 'r-', linewidth=2)
for pt in pts_src:
axes[0].plot(pt[0], pt[1], 'ro')
axes[1].imshow(warped)
axes[1].set_title("Perspective Corrected")
for ax in axes:
ax.axis("off")
plt.tight_layout()
plt.show()
6. Practical Pipeline for Robot Vision¶
In real robot systems, you combine multiple preprocessing steps into a pipeline. Here is a complete example that detects a colored object and computes its center.
Pipeline Steps¶
Capture frame
→ Convert to HSV
→ Apply color range mask
→ Clean mask (morphological opening + closing)
→ Find contours
→ Compute bounding box / centroid
→ Draw result
Complete Pipeline Code¶
import cv2
import numpy as np
import matplotlib.pyplot as plt
def preprocess_frame(frame):
"""Apply noise reduction and enhancement."""
# Denoise with bilateral filter (preserves edges)
denoised = cv2.bilateralFilter(frame, d=9, sigmaColor=75, sigmaSpace=75)
return denoised
def create_color_mask(hsv_frame, lower, upper):
"""Create and clean a binary mask for the target color."""
mask = cv2.inRange(hsv_frame, lower, upper)
# Morphological cleanup
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=2)
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=2)
return mask
def detect_object(frame, mask):
"""Find the largest contour and compute its centroid."""
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if not contours:
return None, None, frame
# Find the largest contour by area
largest = max(contours, key=cv2.contourArea)
# Skip tiny contours (noise)
if cv2.contourArea(largest) < 500:
return None, None, frame
# Compute bounding box and centroid
x, y, w, h = cv2.boundingRect(largest)
M = cv2.moments(largest)
if M["m00"] > 0:
cx = int(M["m10"] / M["m00"])
cy = int(M["m01"] / M["m00"])
else:
cx, cy = x + w // 2, y + h // 2
# Draw results
output = frame.copy()
cv2.rectangle(output, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.circle(output, (cx, cy), 5, (0, 0, 255), -1)
cv2.putText(output, f"({cx}, {cy})", (cx + 10, cy),
cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
return (cx, cy), (x, y, w, h), output
def run_pipeline(image_path):
"""Full preprocessing and detection pipeline."""
# --- Step 1: Capture / Load ---
frame = cv2.imread(image_path)
if frame is None:
raise FileNotFoundError(f"Cannot load image: {image_path}")
# --- Step 2: Denoise ---
denoised = preprocess_frame(frame)
# --- Step 3: Convert to HSV ---
hsv = cv2.cvtColor(denoised, cv2.COLOR_BGR2HSV)
# --- Step 4: Create color mask (targeting blue objects) ---
lower_blue = np.array([100, 50, 50])
upper_blue = np.array([130, 255, 255])
mask = create_color_mask(hsv, lower_blue, upper_blue)
# --- Step 5: Detect object ---
centroid, bbox, output = detect_object(denoised, mask)
if centroid:
print(f"Object detected at centroid: {centroid}, bbox: {bbox}")
else:
print("No object detected.")
# --- Step 6: Visualize ---
fig, axes = plt.subplots(1, 4, figsize=(18, 4))
axes[0].imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
axes[0].set_title("1. Original")
axes[1].imshow(cv2.cvtColor(denoised, cv2.COLOR_BGR2RGB))
axes[1].set_title("2. Denoised")
axes[2].imshow(mask, cmap="gray")
axes[2].set_title("3. Cleaned Mask")
axes[3].imshow(cv2.cvtColor(output, cv2.COLOR_BGR2RGB))
axes[3].set_title("4. Detection Result")
for ax in axes:
ax.axis("off")
plt.suptitle("Robot Vision Preprocessing Pipeline", fontsize=14)
plt.tight_layout()
plt.show()
return centroid, bbox
# --- Run the pipeline ---
if __name__ == "__main__":
centroid, bbox = run_pipeline("robot_workspace.jpg")
Live Camera Pipeline (for real robots)¶
import cv2
import numpy as np
def live_pipeline(camera_index=0):
"""Run the preprocessing pipeline on a live camera feed."""
cap = cv2.VideoCapture(camera_index)
if not cap.isOpened():
print("Error: Cannot open camera.")
return
# Define color range (adjust for your target object)
lower = np.array([100, 50, 50])
upper = np.array([130, 255, 255])
print("Press 'q' to quit.")
while True:
ret, frame = cap.read()
if not ret:
break
# Pipeline
denoised = cv2.bilateralFilter(frame, 9, 75, 75)
hsv = cv2.cvtColor(denoised, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, lower, upper)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=2)
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=2)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
if cv2.contourArea(cnt) > 500:
x, y, w, h = cv2.boundingRect(cnt)
cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
# Show results
cv2.imshow("Frame", frame)
cv2.imshow("Mask", mask)
if cv2.waitKey(1) & 0xFF == ord("q"):
break
cap.release()
cv2.destroyAllWindows()
if __name__ == "__main__":
live_pipeline()
7. Exercises¶
Exercise 1: Color Space Exploration (Easy)¶
Load an image containing at least two distinct colored objects. Convert to HSV and use trackbars to isolate each object by color. Report the HSV ranges you found.
Exercise 2: Noise Removal Challenge (Medium)¶
Add Gaussian noise (σ=25) and salt-and-pepper noise (5%) to a clean image. Apply Gaussian, median, and bilateral filters. Compare the results using PSNR (Peak Signal-to-Noise Ratio):
import numpy as np
def psnr(original, processed):
mse = np.mean((original.astype(float) - processed.astype(float)) ** 2)
if mse == 0:
return float('inf')
return 10 * np.log10(255.0 ** 2 / mse)
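If you need a starting point for synthesizing the Gaussian noise (a sketch; σ is in 8-bit intensity units):
import numpy as np
def add_gaussian_noise(image, sigma=25):
    """Add zero-mean Gaussian noise, then clip back to the valid 8-bit range."""
    noisy = image.astype(np.float64) + np.random.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)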
Which filter performs best for each noise type? Why?
Exercise 3: Build a Lane Detector (Hard)¶
Using a top-down camera view of a road/track:
- Convert to HSV and mask the lane markings (typically white or yellow)
- Clean the mask with morphological operations
- Apply perspective transform to get a bird's-eye view
- Detect lane lines using cv2.HoughLinesP (a minimal call sketch follows this list)
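For the last step, a minimal cv2.HoughLinesP call might look like the sketch below; the parameter values are illustrative and will need tuning for your track:
import cv2
import numpy as np
# Stand-in for your cleaned, bird's-eye-view lane mask
edges = np.zeros((480, 640), dtype=np.uint8)
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                        minLineLength=50, maxLineGap=10)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        print(f"lane segment: ({x1}, {y1}) -> ({x2}, {y2})")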
Exercise 4: Document Scanner (Hard)¶
Write a function that:
- Detects the largest quadrilateral contour in an image
- Applies a perspective transform to produce a flat, rectangular output
- Applies CLAHE to enhance readability
Exercise 5: Preprocessing Tuning (Medium)¶
Given an image captured under poor lighting conditions (provided by your TA), design a preprocessing pipeline that maximizes the visibility of objects of interest. Document each step and explain your choices.
References¶
- OpenCV Documentation — docs.opencv.org
- OpenCV-Python Tutorials — opencv-python-tutroals.readthedocs.io
- Szeliski, R. (2022). Computer Vision: Algorithms and Applications, 2nd ed. Springer. Chapter 3: Image Processing.
- Gonzalez, R. C., & Woods, R. E. (2018). Digital Image Processing, 4th ed. Pearson.
- Bradski, G. (2000). The OpenCV Library. Dr. Dobb's Journal of Software Tools.
This tutorial is part of the Robotics Course at CUHK-SZ.