Image Preprocessing

Raw images captured by robot cameras are rarely ready for direct use in perception algorithms. Lighting variations, sensor noise, irrelevant background details, and inconsistent image sizes all degrade downstream tasks such as object detection, feature matching, and pose estimation. Image preprocessing transforms raw camera data into a cleaner, more consistent representation so that higher-level algorithms can operate reliably.

This tutorial covers the essential preprocessing techniques every robotics student should master—from color-space conversion and filtering to morphological operations and end-to-end pipelines.


Learning Objectives

After completing this tutorial you will be able to:

  • Convert images between common color spaces (BGR, HSV, grayscale)
  • Apply spatial filtering (Gaussian, median, bilateral) and understand their trade-offs
  • Enhance image contrast using histogram equalization and CLAHE
  • Use morphological operations to clean up binary masks
  • Apply geometric transformations (resize, affine, perspective)
  • Build a complete preprocessing pipeline for a robot vision task

Prerequisites

Requirement       Details
Python            3.8+
Libraries         opencv-python, numpy, matplotlib
Prior knowledge   Basic Python, NumPy array indexing

Install dependencies if needed:

pip install opencv-python numpy matplotlib

1. Color Spaces

A color space defines how pixel colors are numerically represented. Choosing the right color space for a given task can simplify downstream processing significantly.

1.1 RGB / BGR

Most libraries (PIL, matplotlib, scikit-image) use RGB ordering, but OpenCV defaults to BGR. This historical choice traces back to early OpenCV development when the BGR byte order was common in video capture hardware.

import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load an image (returned in BGR order)
img_bgr = cv2.imread("robot_workspace.jpg")

# Convert BGR -> RGB for correct display with matplotlib
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].imshow(img_bgr)          # wrong colors — channels swapped
axes[0].set_title("Raw BGR (displayed as RGB)")
axes[1].imshow(img_rgb)          # correct colors
axes[1].set_title("Converted RGB")
for ax in axes:
    ax.axis("off")
plt.tight_layout()
plt.show()

Common Pitfall

Forgetting the BGR→RGB conversion before calling plt.imshow() is the #1 beginner mistake in OpenCV + matplotlib workflows. The image will look bluish/reddish because the red and blue channels are swapped.

1.2 HSV / HSL

HSV (Hue, Saturation, Value) separates color (hue) from intensity (value), making it ideal for color-based segmentation—for example, detecting a colored ball or lane markings regardless of lighting.

Channel          Range (OpenCV)   Meaning
H (Hue)          0–179            Color angle on the wheel
S (Saturation)   0–255            Color purity
V (Value)        0–255            Brightness

Interactive HSV Trackbar Demo

import cv2
import numpy as np

def nothing(x):
    pass

img = cv2.imread("robot_workspace.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

cv2.namedWindow("Trackbars")
cv2.createTrackbar("L-H", "Trackbars", 0, 179, nothing)
cv2.createTrackbar("L-S", "Trackbars", 0, 255, nothing)
cv2.createTrackbar("L-V", "Trackbars", 0, 255, nothing)
cv2.createTrackbar("U-H", "Trackbars", 179, 179, nothing)
cv2.createTrackbar("U-S", "Trackbars", 255, 255, nothing)
cv2.createTrackbar("U-V", "Trackbars", 255, 255, nothing)

while True:
    l_h = cv2.getTrackbarPos("L-H", "Trackbars")
    l_s = cv2.getTrackbarPos("L-S", "Trackbars")
    l_v = cv2.getTrackbarPos("L-V", "Trackbars")
    u_h = cv2.getTrackbarPos("U-H", "Trackbars")
    u_s = cv2.getTrackbarPos("U-S", "Trackbars")
    u_v = cv2.getTrackbarPos("U-V", "Trackbars")

    lower = np.array([l_h, l_s, l_v])
    upper = np.array([u_h, u_s, u_v])
    mask = cv2.inRange(hsv, lower, upper)
    result = cv2.bitwise_and(img, img, mask=mask)

    cv2.imshow("Original", img)
    cv2.imshow("Mask", mask)
    cv2.imshow("Result", result)

    if cv2.waitKey(1) & 0xFF == 27:  # ESC to quit
        break

cv2.destroyAllWindows()

Static HSV Color Detection Example

import cv2
import numpy as np
import matplotlib.pyplot as plt

img_bgr = cv2.imread("robot_workspace.jpg")
hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)

# Detect blue objects (typical range for blue in OpenCV HSV)
lower_blue = np.array([100, 50, 50])
upper_blue = np.array([130, 255, 255])
mask = cv2.inRange(hsv, lower_blue, upper_blue)
result = cv2.bitwise_and(img_bgr, img_bgr, mask=mask)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].imshow(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB))
axes[0].set_title("Original")
axes[1].imshow(mask, cmap="gray")
axes[1].set_title("Blue Mask")
axes[2].imshow(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))
axes[2].set_title("Detected Blue Regions")
for ax in axes:
    ax.axis("off")
plt.tight_layout()
plt.show()

1.3 Grayscale Conversion

Many algorithms (edge detection, template matching, feature extraction) operate on single-channel grayscale images. Converting to grayscale reduces the data volume by 3× and simplifies computation; OpenCV uses the standard luminance weighting Y = 0.299·R + 0.587·G + 0.114·B.

import cv2
import matplotlib.pyplot as plt

img_bgr = cv2.imread("robot_workspace.jpg")
gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)

print(f"Original shape: {img_bgr.shape}")   # (H, W, 3)
print(f"Grayscale shape: {gray.shape}")     # (H, W)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].imshow(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB))
axes[0].set_title("Color (RGB)")
axes[1].imshow(gray, cmap="gray")
axes[1].set_title("Grayscale")
for ax in axes:
    ax.axis("off")
plt.tight_layout()
plt.show()

2. Image Filtering

Filtering (also called smoothing or blurring) suppresses noise at the cost of some fine detail. The choice of filter depends on the noise type and on whether you need to preserve edges.

2.1 Gaussian Filter

The Gaussian filter replaces each pixel with a weighted average of its neighbors, using a bell-curve (Gaussian) kernel. It is effective against Gaussian noise and is the most commonly used low-pass filter.

import cv2
import matplotlib.pyplot as plt

img = cv2.imread("robot_workspace.jpg", cv2.IMREAD_GRAYSCALE)

# Apply Gaussian blur with increasing kernel sizes
blur_3 = cv2.GaussianBlur(img, (3, 3), 0)
blur_7 = cv2.GaussianBlur(img, (7, 7), 0)
blur_15 = cv2.GaussianBlur(img, (15, 15), 0)

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, im, title in zip(axes,
    [img, blur_3, blur_7, blur_15],
    ["Original", "3×3", "7×7", "15×15"]):
    ax.imshow(im, cmap="gray")
    ax.set_title(f"Gaussian {title}")
    ax.axis("off")
plt.tight_layout()
plt.show()

Kernel Size Rules

  • Must be odd and positive: 3, 5, 7, …
  • Larger kernels → more smoothing, slower computation
  • The third argument 0 tells OpenCV to compute σ from the kernel size

2.2 Median Filter

The median filter replaces each pixel with the median of its neighborhood. It is especially effective at removing salt-and-pepper noise (random black/white pixels) while preserving edges better than a Gaussian filter.

import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread("robot_workspace.jpg", cv2.IMREAD_GRAYSCALE)

# Simulate salt-and-pepper noise
def add_salt_pepper(image, amount=0.02):
    noisy = image.copy()
    num_salt = int(amount * image.size / 2)
    # Salt
    coords = tuple(np.random.randint(0, d, num_salt) for d in image.shape)
    noisy[coords] = 255
    # Pepper
    coords = tuple(np.random.randint(0, d, num_salt) for d in image.shape)
    noisy[coords] = 0
    return noisy

noisy = add_salt_pepper(img, amount=0.03)
median_3 = cv2.medianBlur(noisy, 3)
median_7 = cv2.medianBlur(noisy, 7)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, im, title in zip(axes,
    [noisy, median_3, median_7],
    ["Noisy", "Median 3×3", "Median 7×7"]):
    ax.imshow(im, cmap="gray")
    ax.set_title(title)
    ax.axis("off")
plt.tight_layout()
plt.show()

2.3 Bilateral Filter

The bilateral filter smooths the image while preserving edges. It considers both spatial distance and intensity difference when weighting neighbors—pixels across an edge contribute very little.

import cv2
import matplotlib.pyplot as plt

img = cv2.imread("robot_workspace.jpg", cv2.IMREAD_GRAYSCALE)

# cv2.bilateralFilter(src, d, sigmaColor, sigmaSpace)
bilateral = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
gaussian = cv2.GaussianBlur(img, (9, 9), 0)

fig, axes = plt.subplots(1, 3, figsize=(14, 4))
for ax, im, title in zip(axes,
    [img, gaussian, bilateral],
    ["Original", "Gaussian 9×9", "Bilateral d=9"]):
    ax.imshow(im, cmap="gray")
    ax.set_title(title)
    ax.axis("off")
plt.tight_layout()
plt.show()

Parameter    Meaning
d            Diameter of the pixel neighborhood (a non-positive value such as -1 is computed from sigmaSpace)
sigmaColor   Larger values mean more dissimilar colors within the neighborhood are mixed
sigmaSpace   Larger values mean farther pixels influence each other

2.4 Filter Comparison

Filter      Best For                   Preserves Edges?   Speed
Gaussian    Gaussian / general noise   No                 Fast
Median      Salt-and-pepper noise      Moderate           Medium
Bilateral   Edge-aware smoothing       Yes                Slow

import cv2
import matplotlib.pyplot as plt

img = cv2.imread("robot_workspace.jpg", cv2.IMREAD_GRAYSCALE)

filters = {
    "Gaussian (5×5)": cv2.GaussianBlur(img, (5, 5), 0),
    "Median (5)":     cv2.medianBlur(img, 5),
    "Bilateral":      cv2.bilateralFilter(img, 5, 50, 50),
}

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
axes[0].imshow(img, cmap="gray")
axes[0].set_title("Original")
for ax, (name, im) in zip(axes[1:], filters.items()):
    ax.imshow(im, cmap="gray")
    ax.set_title(name)
for ax in axes:
    ax.axis("off")
plt.tight_layout()
plt.show()

3. Histogram Equalization

Histogram equalization redistributes pixel intensities to span the full dynamic range, improving contrast in poorly lit scenes—common in indoor robot environments.

3.1 Basic Histogram Equalization

import cv2
import matplotlib.pyplot as plt

img = cv2.imread("robot_workspace.jpg", cv2.IMREAD_GRAYSCALE)
equ = cv2.equalizeHist(img)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
axes[0, 0].imshow(img, cmap="gray")
axes[0, 0].set_title("Original")
axes[0, 1].imshow(equ, cmap="gray")
axes[0, 1].set_title("Equalized")
axes[1, 0].hist(img.ravel(), bins=256, range=(0, 256))
axes[1, 0].set_title("Original Histogram")
axes[1, 1].hist(equ.ravel(), bins=256, range=(0, 256))
axes[1, 1].set_title("Equalized Histogram")
for ax in axes[0]:
    ax.axis("off")
plt.tight_layout()
plt.show()

3.2 CLAHE (Contrast Limited Adaptive Histogram Equalization)

Basic equalization uses a global histogram, which can over-amplify noise in uniform regions. CLAHE divides the image into small tiles and equalizes each tile independently, with a clip limit to prevent noise amplification.

import cv2
import matplotlib.pyplot as plt

img = cv2.imread("robot_workspace.jpg", cv2.IMREAD_GRAYSCALE)

# Global equalization
equ = cv2.equalizeHist(img)

# CLAHE
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
clahe_img = clahe.apply(img)

fig, axes = plt.subplots(1, 3, figsize=(14, 4))
for ax, im, title in zip(axes,
    [img, equ, clahe_img],
    ["Original", "Global Equalization", "CLAHE (clip=2.0)"]):
    ax.imshow(im, cmap="gray")
    ax.set_title(title)
    ax.axis("off")
plt.tight_layout()
plt.show()

When to Use CLAHE

CLAHE is the preferred choice for robotics because scenes often have both bright and dark regions (e.g., a robot navigating from a hallway into a sunlit room). It avoids the washed-out look of global equalization.


4. Morphological Operations

Morphological operations process binary (or grayscale) images based on shape. They are essential for cleaning up thresholded masks—removing small noise blobs, filling tiny holes, and separating touching objects.

4.1 Erosion and Dilation

Erosion shrinks bright regions: a pixel becomes 1 only if all pixels under the kernel are 1.

Dilation expands bright regions: a pixel becomes 1 if any pixel under the kernel is 1.

import cv2
import numpy as np
import matplotlib.pyplot as plt

# Create a synthetic binary image for demonstration
img = np.zeros((200, 200), dtype=np.uint8)
cv2.circle(img, (60, 60), 30, 255, -1)
cv2.circle(img, (140, 60), 30, 255, -1)
cv2.circle(img, (100, 140), 40, 255, -1)
cv2.rectangle(img, (20, 120), (50, 180), 255, -1)

# Add sparse salt noise (about 2% of pixels)
noise = (np.random.rand(*img.shape) < 0.02).astype(np.uint8) * 255
img_noisy = cv2.bitwise_or(img, noise)

# Define a kernel
kernel = np.ones((5, 5), np.uint8)

eroded = cv2.erode(img_noisy, kernel, iterations=1)
dilated = cv2.dilate(img_noisy, kernel, iterations=1)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, im, title in zip(axes,
    [img_noisy, eroded, dilated],
    ["Noisy Binary", "Erosion (5×5)", "Dilation (5×5)"]):
    ax.imshow(im, cmap="gray")
    ax.set_title(title)
    ax.axis("off")
plt.tight_layout()
plt.show()

Kernel Variations

import cv2
import numpy as np

# Rectangular kernel
kernel_rect = np.ones((5, 5), np.uint8)

# Elliptical kernel
kernel_ellip = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

# Cross-shaped kernel
kernel_cross = cv2.getStructuringElement(cv2.MORPH_CROSS, (5, 5))

print("Rectangular:\n", kernel_rect)
print("Elliptical:\n", kernel_ellip)
print("Cross:\n", kernel_cross)

4.2 Opening and Closing

  • Opening = erosion followed by dilation. Removes small bright noise while preserving larger structures.
  • Closing = dilation followed by erosion. Fills small dark holes inside bright objects.

import cv2
import numpy as np
import matplotlib.pyplot as plt

img = np.zeros((200, 200), dtype=np.uint8)
cv2.circle(img, (100, 100), 50, 255, -1)

# Add bright salt noise outside the circle and dark holes inside it
salt = (np.random.rand(*img.shape) < 0.02).astype(np.uint8) * 255
pepper = (np.random.rand(*img.shape) < 0.02).astype(np.uint8) * 255
img_dirty = cv2.bitwise_or(img, salt)
img_dirty[pepper > 0] = 0

kernel = np.ones((7, 7), np.uint8)
opened = cv2.morphologyEx(img_dirty, cv2.MORPH_OPEN, kernel)
closed = cv2.morphologyEx(img_dirty, cv2.MORPH_CLOSE, kernel)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, im, title in zip(axes,
    [img_dirty, opened, closed],
    ["Dirty Binary", "Opening (noise removal)", "Closing (hole filling)"]):
    ax.imshow(im, cmap="gray")
    ax.set_title(title)
    ax.axis("off")
plt.tight_layout()
plt.show()

4.3 Practical Use: Noise Removal and Hole Filling

In robotics, after thresholding a color-detected object, the mask is often noisy. A typical cleanup sequence:

import cv2
import numpy as np

def clean_mask(mask, kernel_size=5, iterations=2):
    """Clean a binary mask using morphological operations."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))

    # Step 1: Remove small noise (opening)
    cleaned = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=iterations)

    # Step 2: Fill small holes (closing)
    cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel, iterations=iterations)

    return cleaned

# Usage with a color-detected mask:
# mask = cv2.inRange(hsv, lower_color, upper_color)
# clean = clean_mask(mask)

5. Image Transformation

Geometric transformations adjust the position, orientation, and size of images. They are used for normalizing input to neural networks, correcting camera perspective, and aligning images for stitching.

5.1 Resize and Crop

import cv2
import matplotlib.pyplot as plt

img = cv2.imread("robot_workspace.jpg")
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Resize to exact dimensions
resized = cv2.resize(img_rgb, (320, 240))

# Resize by scale factor
half = cv2.resize(img_rgb, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)

# Crop a region of interest (y1:y2, x1:x2)
h, w = img_rgb.shape[:2]
cropped = img_rgb[h//4:3*h//4, w//4:3*w//4]

fig, axes = plt.subplots(1, 3, figsize=(14, 4))
for ax, im, title in zip(axes,
    [resized, half, cropped],
    [f"Resized {resized.shape[1]}×{resized.shape[0]}",
     f"Scaled 50% {half.shape[1]}×{half.shape[0]}",
     f"Cropped {cropped.shape[1]}×{cropped.shape[0]}"]):
    ax.imshow(im)
    ax.set_title(title)
    ax.axis("off")
plt.tight_layout()
plt.show()

Interpolation Methods

  • cv2.INTER_AREA — best for shrinking (avoids moiré)
  • cv2.INTER_LINEAR — good default for enlargement (bilinear)
  • cv2.INTER_CUBIC — higher quality but slower
  • cv2.INTER_NEAREST — fastest, nearest-neighbor (use for masks)

5.2 Affine Transform

An affine transform preserves parallel lines (translation, rotation, scaling, shearing). You specify three pairs of corresponding points.

import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread("robot_workspace.jpg")
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
h, w = img.shape[:2]

# Source points (in the original image)
pts_src = np.float32([[50, 50], [200, 50], [50, 200]])

# Destination points (where they should map to)
pts_dst = np.float32([[10, 100], [200, 50], [100, 250]])

# Compute the affine matrix
M = cv2.getAffineTransform(pts_src, pts_dst)

# Apply the transform
affine_img = cv2.warpAffine(img_rgb, M, (w, h))

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].imshow(img_rgb)
axes[0].set_title("Original")
for pt in pts_src:
    axes[0].plot(pt[0], pt[1], 'ro')
axes[1].imshow(affine_img)
axes[1].set_title("Affine Transform")
for pt in pts_dst:
    axes[1].plot(pt[0], pt[1], 'ro')
for ax in axes:
    ax.axis("off")
plt.tight_layout()
plt.show()

Rotation Around a Point

import cv2
import numpy as np

img = cv2.imread("robot_workspace.jpg")
h, w = img.shape[:2]

# Rotate 30 degrees around the center
center = (w // 2, h // 2)
angle = 30
scale = 1.0

M = cv2.getRotationMatrix2D(center, angle, scale)
rotated = cv2.warpAffine(img, M, (w, h))

5.3 Perspective Transform

A perspective transform (homography) maps four source points to four destination points, correcting perspective distortion. This is useful for document scanning, AR tag detection, and bird's-eye-view generation.

import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread("robot_workspace.jpg")
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
h, w = img.shape[:2]

# Four corners of a document/tag in the image (may be skewed)
pts_src = np.float32([
    [56, 65],
    [368, 52],
    [28, 387],
    [389, 390]
])

# Desired output rectangle
output_w, output_h = 300, 400
pts_dst = np.float32([
    [0, 0],
    [output_w, 0],
    [0, output_h],
    [output_w, output_h]
])

# Compute perspective matrix
M = cv2.getPerspectiveTransform(pts_src, pts_dst)

# Apply perspective warp
warped = cv2.warpPerspective(img_rgb, M, (output_w, output_h))

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].imshow(img_rgb)
axes[0].set_title("Original (with skew)")
# Draw source quadrilateral
src_quad = np.vstack([pts_src, pts_src[0]])
axes[0].plot(src_quad[:, 0], src_quad[:, 1], 'r-', linewidth=2)
for pt in pts_src:
    axes[0].plot(pt[0], pt[1], 'ro')
axes[1].imshow(warped)
axes[1].set_title("Perspective Corrected")
for ax in axes:
    ax.axis("off")
plt.tight_layout()
plt.show()

6. Practical Pipeline for Robot Vision

In real robot systems, you combine multiple preprocessing steps into a pipeline. Here is a complete example that detects a colored object and computes its center.

Pipeline Steps

Capture frame
    → Convert to HSV
    → Apply color range mask
    → Clean mask (morphological opening + closing)
    → Find contours
    → Compute bounding box / centroid
    → Draw result

Complete Pipeline Code

import cv2
import numpy as np
import matplotlib.pyplot as plt


def preprocess_frame(frame):
    """Apply noise reduction and enhancement."""
    # Denoise with bilateral filter (preserves edges)
    denoised = cv2.bilateralFilter(frame, d=9, sigmaColor=75, sigmaSpace=75)
    return denoised


def create_color_mask(hsv_frame, lower, upper):
    """Create and clean a binary mask for the target color."""
    mask = cv2.inRange(hsv_frame, lower, upper)

    # Morphological cleanup
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=2)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=2)

    return mask


def detect_object(frame, mask):
    """Find the largest contour and compute its centroid."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    if not contours:
        return None, None, frame

    # Find the largest contour by area
    largest = max(contours, key=cv2.contourArea)

    # Skip tiny contours (noise)
    if cv2.contourArea(largest) < 500:
        return None, None, frame

    # Compute bounding box and centroid
    x, y, w, h = cv2.boundingRect(largest)
    M = cv2.moments(largest)
    if M["m00"] > 0:
        cx = int(M["m10"] / M["m00"])
        cy = int(M["m01"] / M["m00"])
    else:
        cx, cy = x + w // 2, y + h // 2

    # Draw results
    output = frame.copy()
    cv2.rectangle(output, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.circle(output, (cx, cy), 5, (0, 0, 255), -1)
    cv2.putText(output, f"({cx}, {cy})", (cx + 10, cy),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)

    return (cx, cy), (x, y, w, h), output


def run_pipeline(image_path):
    """Full preprocessing and detection pipeline."""
    # --- Step 1: Capture / Load ---
    frame = cv2.imread(image_path)
    if frame is None:
        raise FileNotFoundError(f"Cannot load image: {image_path}")

    # --- Step 2: Denoise ---
    denoised = preprocess_frame(frame)

    # --- Step 3: Convert to HSV ---
    hsv = cv2.cvtColor(denoised, cv2.COLOR_BGR2HSV)

    # --- Step 4: Create color mask (targeting blue objects) ---
    lower_blue = np.array([100, 50, 50])
    upper_blue = np.array([130, 255, 255])
    mask = create_color_mask(hsv, lower_blue, upper_blue)

    # --- Step 5: Detect object ---
    centroid, bbox, output = detect_object(denoised, mask)

    if centroid:
        print(f"Object detected at centroid: {centroid}, bbox: {bbox}")
    else:
        print("No object detected.")

    # --- Step 6: Visualize ---
    fig, axes = plt.subplots(1, 4, figsize=(18, 4))
    axes[0].imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    axes[0].set_title("1. Original")
    axes[1].imshow(cv2.cvtColor(denoised, cv2.COLOR_BGR2RGB))
    axes[1].set_title("2. Denoised")
    axes[2].imshow(mask, cmap="gray")
    axes[2].set_title("3. Cleaned Mask")
    axes[3].imshow(cv2.cvtColor(output, cv2.COLOR_BGR2RGB))
    axes[3].set_title("4. Detection Result")
    for ax in axes:
        ax.axis("off")
    plt.suptitle("Robot Vision Preprocessing Pipeline", fontsize=14)
    plt.tight_layout()
    plt.show()

    return centroid, bbox


# --- Run the pipeline ---
if __name__ == "__main__":
    centroid, bbox = run_pipeline("robot_workspace.jpg")

Live Camera Pipeline (for real robots)

import cv2
import numpy as np


def live_pipeline(camera_index=0):
    """Run the preprocessing pipeline on a live camera feed."""
    cap = cv2.VideoCapture(camera_index)

    if not cap.isOpened():
        print("Error: Cannot open camera.")
        return

    # Define color range (adjust for your target object)
    lower = np.array([100, 50, 50])
    upper = np.array([130, 255, 255])

    print("Press 'q' to quit.")

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        # Pipeline
        denoised = cv2.bilateralFilter(frame, 9, 75, 75)
        hsv = cv2.cvtColor(denoised, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, lower, upper)

        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=2)
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=2)

        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for cnt in contours:
            if cv2.contourArea(cnt) > 500:
                x, y, w, h = cv2.boundingRect(cnt)
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

        # Show results
        cv2.imshow("Frame", frame)
        cv2.imshow("Mask", mask)

        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    live_pipeline()

7. Exercises

Exercise 1: Color Space Exploration (Easy)

Load an image containing at least two distinct colored objects. Convert to HSV and use trackbars to isolate each object by color. Report the HSV ranges you found.

Exercise 2: Noise Removal Challenge (Medium)

Add Gaussian noise (σ=25) and salt-and-pepper noise (5%) to a clean image. Apply Gaussian, median, and bilateral filters. Compare the results using PSNR (Peak Signal-to-Noise Ratio):

import numpy as np

def psnr(original, processed):
    mse = np.mean((original.astype(float) - processed.astype(float)) ** 2)
    if mse == 0:
        return float('inf')
    return 10 * np.log10(255.0 ** 2 / mse)

Which filter performs best for each noise type? Why?
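A quick sanity check of the metric (the psnr helper is repeated so the snippet runs on its own): identical images give infinite PSNR, and noise drives the score down:

```python
import numpy as np

def psnr(original, processed):
    mse = np.mean((original.astype(float) - processed.astype(float)) ** 2)
    if mse == 0:
        return float('inf')
    return 10 * np.log10(255.0 ** 2 / mse)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64), dtype=np.uint8)
noisy = np.clip(img + rng.normal(0, 25, img.shape), 0, 255).astype(np.uint8)

print(psnr(img, img))              # inf
print(round(psnr(img, noisy), 1))  # roughly 20 dB for sigma = 25
```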

Exercise 3: Build a Lane Detector (Hard)

Using a top-down camera view of a road/track:

  1. Convert to HSV and mask the lane markings (typically white or yellow)
  2. Clean the mask with morphological operations
  3. Apply perspective transform to get a bird's-eye view
  4. Detect lane lines using cv2.HoughLinesP

Exercise 4: Document Scanner (Hard)

Write a function that:

  1. Detects the largest quadrilateral contour in an image
  2. Applies a perspective transform to produce a flat, rectangular output
  3. Applies CLAHE to enhance readability

Exercise 5: Preprocessing Tuning (Medium)

Given an image captured under poor lighting conditions (provided by your TA), design a preprocessing pipeline that maximizes the visibility of objects of interest. Document each step and explain your choices.


References

  1. OpenCV Documentation, docs.opencv.org
  2. OpenCV-Python Tutorials, opencv-python-tutroals.readthedocs.io
  3. Szeliski, R. (2022). Computer Vision: Algorithms and Applications, 2nd ed. Springer. Chapter 3: Image Processing.
  4. Gonzalez, R. C., & Woods, R. E. (2018). Digital Image Processing, 4th ed. Pearson.
  5. Bradski, G. (2000). The OpenCV Library. Dr. Dobb's Journal of Software Tools.

This tutorial is part of the Robotics Course at CUHK-SZ.