Object Tracking¶
Object tracking is one of the fundamental perception capabilities required for autonomous robots. While an object detector can tell you what is in a frame and where, a tracker tells you which object is which across time — maintaining consistent identities even through occlusions, appearance changes, and missed detections.
For a mobile robot navigating a crowded hallway, a drone following a target, or a robotic arm tracking a moving conveyor belt, reliable multi-object tracking (MOT) is essential for safe and effective operation.
```
   Frame t          Frame t+1         Frame t+2
┌──────────┐      ┌──────────┐      ┌──────────┐
│  ┌───┐   │      │  ┌───┐   │      │  ┌───┐   │
│  │ A │   │      │  │ A │   │      │  │ A │   │
│  └───┘   │      │  └───┘   │      │  └───┘   │
│          │      │          │      │          │
│  ┌───┐   │      │  ┌───┐   │      │  ┌───┐   │
│  │ B │   │      │  │ B │   │      │  │ B │   │
│  └───┘   │      │  └───┘   │      │  └───┘   │
└──────────┘      └──────────┘      └──────────┘

Detection only:   "two boxes"       "two boxes"       "two boxes"
With tracking:     A=1, B=2          A=1, B=2          A=1, B=2
```
Learning Objectives¶
By the end of this chapter you will be able to:
- Explain the difference between object detection and object tracking and when each is appropriate.
- Implement sparse and dense optical flow using OpenCV.
- Build a Kalman filter for single-object state estimation and tracking.
- Describe the SORT and DeepSORT tracking pipelines.
- Understand transformer-based tracking architectures.
- Evaluate trackers using standard MOT metrics (MOTA, MOTP, HOTA, IDF1).
- Integrate tracking into a ROS 2 robotics system.
1. Tracking vs. Detection¶
| Aspect | Detection | Tracking |
|---|---|---|
| Output | Bounding boxes + class labels per frame | Consistent IDs across frames |
| Temporal info | None (per-frame) | Maintains identity over time |
| Handles occlusion | Object disappears, reappears as "new" | Keeps ID through brief occlusions |
| Speed | Slower (full forward pass) | Faster (prediction without detection) |
| Failure mode | False positives / negatives | ID switches, track fragmentation |
When to use detection only¶
- Static scenes (e.g., counting objects on a shelf once).
- When you only need the current state, not a history.
When to use tracking¶
- Any task requiring temporal consistency: following a person, counting objects entering/leaving a zone, trajectory prediction.
- When you need to smooth noisy detections (Kalman filter predictions bridge gaps).
- When downstream planning depends on object identity (e.g., "avoid the red car specifically").
A common paradigm: detect then track — run a detector at each frame and use a tracker to associate detections over time.
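In code, the paradigm is a simple per-frame loop. Here is a toy, self-contained sketch of the idea — the nearest-center matching and the 50-pixel gate are illustrative assumptions, not a real tracker:

```python
from dataclasses import dataclass, field

@dataclass
class NearestCenterTracker:
    """Toy detect-then-track: greedily match each detection to the nearest
    previous track center, else open a new track. Illustrative only —
    real trackers use Hungarian assignment (Section 4)."""
    tracks: dict = field(default_factory=dict)  # id -> (x, y)
    next_id: int = 0

    def update(self, detections, max_dist=50.0):
        assigned = {}
        for (x, y) in detections:
            best = min(self.tracks.items(),
                       key=lambda kv: (kv[1][0] - x)**2 + (kv[1][1] - y)**2,
                       default=None)
            if best and (best[1][0] - x)**2 + (best[1][1] - y)**2 <= max_dist**2:
                tid = best[0]                   # re-use the existing identity
            else:
                tid, self.next_id = self.next_id, self.next_id + 1
            self.tracks[tid] = (x, y)
            assigned[tid] = (x, y)
        return assigned

tracker = NearestCenterTracker()
for frame_dets in [[(10, 10), (100, 100)], [(12, 11), (98, 103)]]:
    print(tracker.update(frame_dets))  # the same object keeps the same ID
```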
2. Optical Flow¶
Optical flow estimates the apparent motion of pixels (or features) between two consecutive frames. It is a low-level building block used inside many trackers.
2.1 Lucas-Kanade (Sparse Optical Flow)¶
The Lucas-Kanade method tracks a sparse set of feature points. Given a small window \(W\) around a point, it assumes brightness constancy and solves for the displacement \((u, v)\) that minimizes:

\[
E(u, v) = \sum_{(x, y) \in W} \left[ I(x + u,\, y + v,\, t + 1) - I(x, y, t) \right]^2
\]

Using a first-order Taylor expansion, this becomes the linear system:

\[
\begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix}
\begin{bmatrix} u \\ v \end{bmatrix}
= -\begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix}
\]

where \(I_x, I_y, I_t\) are the spatial and temporal image derivatives summed over the window. The system is well conditioned only where the image has strong gradients in both directions — which is why corners make good features to track.
Code: Sparse Optical Flow with Lucas-Kanade
```python
import cv2
import numpy as np


def lucas_kanade_demo(video_path=0):
    """Track sparse features using Lucas-Kanade optical flow."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Cannot open video: {video_path}")
        return

    # Parameters for Shi-Tomasi corner detection
    feature_params = dict(
        maxCorners=200,
        qualityLevel=0.01,
        minDistance=10,
        blockSize=7
    )

    # Lucas-Kanade parameters
    lk_params = dict(
        winSize=(15, 15),
        maxLevel=2,
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03)
    )

    # Random colors for visualization
    np.random.seed(42)
    colors = np.random.randint(0, 255, (500, 3), dtype=np.uint8)

    ret, old_frame = cap.read()
    if not ret:
        return
    old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)
    p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)

    # Create a mask image for drawing tracks
    mask = np.zeros_like(old_frame)

    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Calculate optical flow
        p1, st, err = cv2.calcOpticalFlowPyrLK(
            old_gray, frame_gray, p0, None, **lk_params
        )

        # Default to an empty set of points so the re-detection logic
        # below still works when tracking fails entirely (p1 is None)
        good_new = np.empty((0, 1, 2), dtype=np.float32)
        if p1 is not None:
            # Select points that were successfully tracked (status == 1)
            good_new = p1[st.flatten() == 1]
            good_old = p0[st.flatten() == 1]

            # Draw tracks
            for i, (new, old) in enumerate(zip(good_new, good_old)):
                a, b = new.ravel().astype(int)
                c, d = old.ravel().astype(int)
                mask = cv2.line(mask, (a, b), (c, d), colors[i % 500].tolist(), 2)
                frame = cv2.circle(frame, (a, b), 5, colors[i % 500].tolist(), -1)

        img = cv2.add(frame, mask)
        cv2.imshow('Lucas-Kanade Optical Flow', img)
        if cv2.waitKey(30) & 0xFF == 27:  # ESC to quit
            break

        # Update previous frame and points
        old_gray = frame_gray.copy()
        # Re-detect features when too many tracks have been lost
        if len(good_new) < 50:
            p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)
        else:
            p0 = good_new.reshape(-1, 1, 2)

    cap.release()
    cv2.destroyAllWindows()


if __name__ == '__main__':
    lucas_kanade_demo(0)  # Use 0 for webcam, or provide a video file path
```
Key parameters:
| Parameter | Description |
|---|---|
| `maxCorners` | Maximum number of features to track |
| `qualityLevel` | Minimum accepted quality of corners (0–1) |
| `minDistance` | Minimum Euclidean distance between features |
| `winSize` | Size of the search window for LK |
| `maxLevel` | Number of pyramid levels (handles larger motions) |
2.2 Farneback Dense Optical Flow¶
Unlike sparse methods, dense optical flow computes a velocity vector for every pixel. Farneback's algorithm models the signal in a neighborhood with a polynomial expansion and estimates displacement.
```python
import cv2
import numpy as np


def farneback_dense_flow_demo(video_path=0):
    """Compute and visualize dense optical flow using the Farneback method."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Cannot open video: {video_path}")
        return

    ret, old_frame = cap.read()
    if not ret:
        return
    old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)

    hsv_mask = np.zeros_like(old_frame)
    hsv_mask[:, :, 1] = 255  # Set saturation to maximum

    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Compute dense optical flow
        flow = cv2.calcOpticalFlowFarneback(
            prev=old_gray,
            next=frame_gray,
            flow=None,
            pyr_scale=0.5,   # Pyramid scale
            levels=3,        # Number of pyramid levels
            winsize=15,      # Window size
            iterations=3,    # Iterations per pyramid level
            poly_n=5,        # Polynomial neighborhood size
            poly_sigma=1.2,  # Gaussian std for polynomial smoothing
            flags=0
        )

        # Convert flow to polar coordinates (magnitude, angle)
        magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])

        # Use angle for hue, magnitude for value
        hsv_mask[:, :, 0] = angle * 180 / np.pi / 2
        hsv_mask[:, :, 2] = cv2.normalize(
            magnitude, None, 0, 255, cv2.NORM_MINMAX
        )

        # Convert HSV to BGR for display
        rgb_flow = cv2.cvtColor(hsv_mask, cv2.COLOR_HSV2BGR)
        # Overlay on original frame
        overlay = cv2.addWeighted(frame, 0.6, rgb_flow, 0.4, 0)

        cv2.imshow('Dense Optical Flow (Farneback)', overlay)
        cv2.imshow('Flow Field', rgb_flow)
        if cv2.waitKey(30) & 0xFF == 27:
            break

        old_gray = frame_gray

    cap.release()
    cv2.destroyAllWindows()


if __name__ == '__main__':
    farneback_dense_flow_demo(0)
```
Interpreting the HSV color-wheel visualization (hue encodes motion direction, brightness encodes speed):

- 0° (red) → motion right
- 90° (green) → motion down
- 180° (cyan) → motion left
- 270° (blue) → motion up
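A common next step is to reduce the dense flow field to a motion mask. A minimal sketch, continuing inside the loop above after `flow` is computed (the 2.0 px/frame threshold is an illustrative assumption and is scene-dependent):

```python
# Keep only pixels moving faster than a chosen speed threshold.
magnitude, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
motion_mask = (magnitude > 2.0).astype(np.uint8) * 255  # fast pixels only
moving_only = cv2.bitwise_and(frame, frame, mask=motion_mask)
cv2.imshow('Moving regions', moving_only)
```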
3. Kalman Filter for Tracking¶
The Kalman filter is the workhorse of object tracking. It provides an optimal (minimum variance) estimate of a linear system's state given noisy observations. Even when the true dynamics are mildly nonlinear (as in most tracking scenarios), it works remarkably well.
3.1 State Prediction¶
We model the object state as a vector. For tracking a bounding box center with constant velocity:

\[
\mathbf{x} = \begin{bmatrix} x & y & \dot{x} & \dot{y} \end{bmatrix}^T
\]

The state transition (predict) step:

\[
\hat{\mathbf{x}}_{k|k-1} = \mathbf{F}\, \mathbf{x}_{k-1}, \qquad
\mathbf{P}_{k|k-1} = \mathbf{F}\, \mathbf{P}_{k-1}\, \mathbf{F}^T + \mathbf{Q}
\]

where \(\mathbf{F}\) is the state transition matrix:

\[
\mathbf{F} = \begin{bmatrix}
1 & 0 & \Delta t & 0 \\
0 & 1 & 0 & \Delta t \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]

and \(\mathbf{Q}\) is the process noise covariance (it models uncertainty in the motion model).
3.2 Measurement Update¶
When a measurement \(\mathbf{z}_k\) arrives (the detected bounding box center):

\[
\begin{aligned}
\mathbf{K}_k &= \mathbf{P}_{k|k-1} \mathbf{H}^T \left( \mathbf{H} \mathbf{P}_{k|k-1} \mathbf{H}^T + \mathbf{R} \right)^{-1} \\
\mathbf{x}_k &= \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k \left( \mathbf{z}_k - \mathbf{H}\, \hat{\mathbf{x}}_{k|k-1} \right) \\
\mathbf{P}_k &= \left( \mathbf{I} - \mathbf{K}_k \mathbf{H} \right) \mathbf{P}_{k|k-1}
\end{aligned}
\]

where \(\mathbf{H}\) is the observation matrix (maps state to measurement space), \(\mathbf{R}\) is the measurement noise covariance, and \(\mathbf{K}_k\) is the Kalman gain, which weighs how strongly the measurement corrects the prediction.
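To make the two steps concrete, here is a minimal NumPy sketch of one predict/update cycle for the constant-velocity model above. The matrices follow the equations directly; the noise magnitudes and initial state are illustrative assumptions:

```python
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)    # state transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)    # observe position only
Q = np.eye(4) * 1e-2                          # process noise (assumed)
R = np.eye(2) * 1e-1                          # measurement noise (assumed)

x = np.array([[0.0], [0.0], [1.0], [0.5]])    # state [x, y, vx, vy]
P = np.eye(4)                                 # initial error covariance

# Predict
x = F @ x
P = F @ P @ F.T + Q

# Update with a detection at (1.1, 0.4)
z = np.array([[1.1], [0.4]])
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
x = x + K @ (z - H @ x)
P = (np.eye(4) - K @ H) @ P
print(x.ravel())  # corrected state estimate
```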
3.3 Implementation — Tracking a Ball with cv2.KalmanFilter¶
```python
import cv2
import numpy as np


def kalman_ball_tracker():
    """Track a colored ball using a Kalman filter and HSV color segmentation."""
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        print("Cannot open camera")
        return

    # --- Kalman Filter Setup ---
    # State: [x, y, vx, vy]   Measurement: [x, y]
    kf = cv2.KalmanFilter(4, 2)
    dt = 1.0  # Assume 1 time unit per frame

    # State transition matrix F
    kf.transitionMatrix = np.array([
        [1, 0, dt, 0],
        [0, 1, 0, dt],
        [0, 0, 1, 0],
        [0, 0, 0, 1]
    ], dtype=np.float32)

    # Measurement matrix H
    kf.measurementMatrix = np.array([
        [1, 0, 0, 0],
        [0, 1, 0, 0]
    ], dtype=np.float32)

    # Process noise covariance Q
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
    # Measurement noise covariance R
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1
    # Initial error covariance P
    kf.errorCovPost = np.eye(4, dtype=np.float32)

    initialized = False

    # HSV range for a yellow tennis ball (adjust for your object)
    lower_color = np.array([29, 86, 6])
    upper_color = np.array([64, 255, 255])

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, lower_color, upper_color)
        mask = cv2.erode(mask, None, iterations=2)
        mask = cv2.dilate(mask, None, iterations=2)

        # Find contours
        contours, _ = cv2.findContours(
            mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
        )

        measurement = None
        if contours:
            c = max(contours, key=cv2.contourArea)
            ((mx, my), radius) = cv2.minEnclosingCircle(c)
            if radius > 10:
                measurement = np.array([[np.float32(mx)], [np.float32(my)]])
                # Draw the detection
                cv2.circle(frame, (int(mx), int(my)), int(radius), (0, 255, 0), 2)

        # --- Kalman Filter ---
        if not initialized and measurement is not None:
            # Initialize state with first measurement
            kf.statePost = np.array([
                [measurement[0, 0]],
                [measurement[1, 0]],
                [0], [0]
            ], dtype=np.float32)
            initialized = True

        if initialized:
            # Predict
            prediction = kf.predict()
            px, py = int(prediction[0, 0]), int(prediction[1, 0])

            # Update (only if we have a detection)
            if measurement is not None:
                kf.correct(measurement)

            # Draw predicted position
            cv2.circle(frame, (px, py), 10, (0, 0, 255), 2)
            cv2.putText(frame, f"Predicted ({px},{py})", (px + 15, py),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)

            # Draw corrected position
            cx = int(kf.statePost[0, 0])
            cy = int(kf.statePost[1, 0])
            cv2.circle(frame, (cx, cy), 6, (255, 0, 0), -1)
            cv2.putText(frame, f"Corrected ({cx},{cy})", (cx + 15, cy + 20),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1)

        cv2.imshow('Kalman Ball Tracker', frame)
        if cv2.waitKey(30) & 0xFF == 27:
            break

    cap.release()
    cv2.destroyAllWindows()


if __name__ == '__main__':
    kalman_ball_tracker()
```
Tip: The Kalman filter continues to predict even when the detector misses the object (measurement is None). This "bridging" of gaps is one of its greatest strengths for tracking.
4. SORT — Simple Online Realtime Tracking¶
SORT (Bewley et al., 2016) is a pragmatic tracker that combines:
- External detections from any detector (e.g., Faster R-CNN, YOLO).
- Kalman filters — one per tracked object — to predict the next state.
- Hungarian algorithm — to associate predicted tracks with new detections.
```
┌────────────────────────────────────────────────────────────┐
│                       SORT Pipeline                        │
│                                                            │
│  Frame t ──► Detector ──► Detections D_t                   │
│                                │                           │
│  Existing Tracks ──► Kalman Predict ──► Predictions P_t    │
│                                │                           │
│                     ┌──────────┴─────────┐                 │
│                     │    Association     │                 │
│                     │  (Hungarian algo   │                 │
│                     │   on IoU matrix)   │                 │
│                     └──────────┬─────────┘                 │
│             ┌──────────────────┼──────────────────┐        │
│          Matched          Unmatched D        Unmatched T   │
│          (update)        (create new)     (increment age)  │
│                                                 │          │
│                                    Tracks older than       │
│                                    max_age frames ──► del  │
└────────────────────────────────────────────────────────────┘
```
Key design choices¶
| Component | Choice |
|---|---|
| Motion model | Kalman filter (constant velocity, [x,y,s,r,ẋ,ẏ,ṡ]) where s=area, r=aspect ratio |
| Association metric | IoU (Intersection over Union) between predicted boxes and detections |
| Assignment | Hungarian algorithm (scipy.optimize.linear_sum_assignment) |
| New tracks | Unmatched detections spawn tentative tracks; a track is reported only after min_hits consecutive matches |
| Track deletion | After max_age frames without a matched detection |
Strengths: Simple, fast (~260 FPS), works with any detector.
Weaknesses: ID switches during occlusions (only motion, no appearance).
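The two geometric ingredients from the table fit in a few lines: the box ↔ \([x, y, s, r]\) state conversion, and IoU-based Hungarian assignment. A minimal sketch (corner-format boxes `[x1, y1, x2, y2]` and the 0.3 IoU gate are assumptions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def bbox_to_z(box):
    """[x1, y1, x2, y2] -> [cx, cy, s, r], with s = area and r = aspect ratio."""
    w, h = box[2] - box[0], box[3] - box[1]
    return np.array([box[0] + w / 2.0, box[1] + h / 2.0, w * h, w / h])


def z_to_bbox(z):
    """[cx, cy, s, r] -> [x1, y1, x2, y2]."""
    w = np.sqrt(z[2] * z[3])  # s = w*h and r = w/h  =>  w = sqrt(s*r)
    h = z[2] / w
    return np.array([z[0] - w / 2, z[1] - h / 2, z[0] + w / 2, z[1] + h / 2])


def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)


def associate(predicted_boxes, detections, iou_gate=0.3):
    """Hungarian assignment on (1 - IoU); reject pairs below the IoU gate."""
    cost = np.array([[1.0 - iou(p, d) for d in detections]
                     for p in predicted_boxes])
    rows, cols = linear_sum_assignment(cost)
    return [(t, d) for t, d in zip(rows, cols) if 1.0 - cost[t, d] >= iou_gate]
```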
5. DeepSORT¶
DeepSORT (Wojke et al., 2017) extends SORT by adding an appearance descriptor — a deep Re-ID (re-identification) embedding that helps distinguish objects even when they overlap or are occluded.
5.1 Deep Appearance Features¶
A small CNN (typically trained on a person Re-ID dataset) extracts a 128-dimensional embedding vector from each detected crop. Each track maintains a gallery of the last \(N\) embeddings.
```
Detection crop (128×64) ──► CNN (ResNet-based) ──► 128-D embedding
                                                        │
                                          Stored in track's appearance
                                          gallery (ring buffer of last 100)
```
The appearance cost between track \(i\) and detection \(j\) is the smallest cosine distance between the detection's embedding \(\mathbf{r}_j\) and the track's gallery \(\mathcal{R}_i\) (all embeddings L2-normalized):

\[
d_{\text{app}}(i, j) = \min \left\{ 1 - \mathbf{r}_j^T \mathbf{r}_k^{(i)} \;\middle|\; \mathbf{r}_k^{(i)} \in \mathcal{R}_i \right\}
\]
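As a sketch of this cost (assuming the embeddings are already L2-normalized NumPy vectors):

```python
import numpy as np

def appearance_cost(gallery, detection_embedding):
    """Smallest cosine distance between a detection and a track's gallery.

    gallery: (N, 128) array of the track's past embeddings, L2-normalized.
    detection_embedding: (128,) L2-normalized vector.
    """
    cosine_sim = gallery @ detection_embedding  # (N,) dot products
    return float(1.0 - cosine_sim.max())
```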
5.2 Matching Cascade¶
DeepSORT replaces the single global Hungarian assignment with a cascade that gives priority to tracks that have been seen recently:
```
For age = 1, 2, 3, ..., max_age:
┌───────────────────────────────────────────────┐
│ Cost matrix = λ · d_motion + (1 − λ) · d_app  │
│ (Mahalanobis motion distance + appearance)    │
│ Run Hungarian on the remaining tracks of this │
│ age and the still-unmatched detections        │
└───────────────────────────────────────────────┘
```
This gives recently seen tracks first claim on detections. Without it, a long-occluded track — whose Kalman covariance has grown, making its Mahalanobis distance deceptively small — could "steal" a detection that actually belongs to a track seen in the last frame.
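A compact sketch of the cascade loop. The `cost_matrix` and `match` helpers stand in for the combined-cost and gated-Hungarian steps above and are assumptions, not DeepSORT's exact API:

```python
def matching_cascade(tracks, detections, max_age, cost_matrix, match):
    """Match tracks to detections, giving recently updated tracks priority.

    tracks: objects with a `time_since_update` counter (1 = seen last frame).
    cost_matrix(tracks, dets) -> 2-D array of combined motion+appearance costs.
    match(cost) -> list of (row, col) pairs, e.g. a gated Hungarian solver.
    """
    matches = []
    unmatched = list(range(len(detections)))
    for age in range(1, max_age + 1):
        if not unmatched:
            break
        # Only tracks last updated exactly `age` frames ago compete this round
        candidates = [i for i, t in enumerate(tracks) if t.time_since_update == age]
        if not candidates:
            continue
        cost = cost_matrix([tracks[i] for i in candidates],
                           [detections[j] for j in unmatched])
        pairs = match(cost)
        for r, c in pairs:
            matches.append((candidates[r], unmatched[c]))
        taken = {c for _, c in pairs}
        unmatched = [j for k, j in enumerate(unmatched) if k not in taken]
    return matches, unmatched
```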
Architecture diagram:
```
┌───────────────────────────────────────────────────────┐
│                       DeepSORT                        │
│                                                       │
│  ┌──────────┐    ┌──────────┐    ┌────────────────┐   │
│  │ Detector │───►│Detection │───►│   Re-ID CNN    │   │
│  │  (YOLO,  │    │  Crops   │    │ (128-D embed)  │   │
│  │   etc.)  │    └──────────┘    └───────┬────────┘   │
│  └──────────┘                            │            │
│          ┌───────────────────────────────┘            │
│          ▼                                            │
│  ┌────────────────────────────────────────────┐       │
│  │              Cascade Matching              │       │
│  │  ┌──────────────┐    ┌───────────────────┐ │       │
│  │  │   IoU cost   │    │  Appearance cost  │ │       │
│  │  │    matrix    │    │      matrix       │ │       │
│  │  └──────┬───────┘    └────────┬──────────┘ │       │
│  │         └──────────┬──────────┘            │       │
│  │           Combined cost matrix             │       │
│  │                    │                       │       │
│  │       ┌────────────┼────────────┐          │       │
│  │    Matched    Unmatched D   Unmatched T    │       │
│  └────────────────────────────────────────────┘       │
│                       │                               │
│      Kalman filter update / new track / delete        │
└───────────────────────────────────────────────────────┘
```
6. Transformer-Based Trackers¶
Recent MOT methods leverage transformers to jointly perform detection and tracking in a single architecture.
MOTRv2 (2023)¶
MOTRv2 builds on DETR-style detectors. Key ideas:
- Track queries: Each existing track has a learnable query embedding that is propagated across frames. The decoder predicts the box for that track in the current frame.
- Detection queries: New objects are detected by "newborn" queries that are not associated with any track.
- The Hungarian matcher jointly handles detection and association in one step.
```
Frame t ──► CNN backbone ──► features
                                 │
Track queries (t-1) ──────────►  │
                                 ▼
                       Transformer Decoder
             ┌─────────────────────────────┐
             │ Track queries (t)           │──► existing tracks
             │ New detection queries       │──► new tracks
             └─────────────────────────────┘
```
TrackFormer (2022)¶
- Also DETR-based; introduces track embeddings that attend to the current frame's features.
- Uses attention to propagate identity across frames without explicit motion models.
- No need for a separate Kalman filter or Re-ID network — the transformer learns to track implicitly.
Comparison of tracking paradigms:
| Method | Motion Model | Appearance | Speed | Complexity |
|---|---|---|---|---|
| SORT | Kalman filter | None | ~260 FPS | Low |
| DeepSORT | Kalman filter | Re-ID CNN | ~40 FPS | Medium |
| TrackFormer | Learned attention | Learned | ~15 FPS | High |
| MOTRv2 | Learned attention | Learned | ~20 FPS | High |
7. Evaluation Metrics¶
Evaluating multi-object trackers requires metrics that capture different aspects of performance:
| Metric | Full Name | Measures | Range | Notes |
|---|---|---|---|---|
| MOTA | Multi-Object Tracking Accuracy | 1 - (FN + FP + ID_switches) / GT | (-∞, 100%] | Most common; penalizes misses, false alarms, and ID switches equally |
| MOTP | Multi-Object Tracking Precision | Average IoU of matched detections | [0, 1] | Measures localization quality, not association |
| HOTA | Higher Order Tracking Accuracy | Geometric mean of detection and association quality | [0, 100] | Balances DetA and AssA; preferred modern metric |
| IDF1 | ID F1 Score | Ratio of correctly identified detections over time | [0, 1] | Focuses on identity preservation |
HOTA decomposes into:

\[
\text{HOTA}_\alpha = \sqrt{\text{DetA}_\alpha \cdot \text{AssA}_\alpha}
\]

where \(\text{DetA}\) is detection accuracy and \(\text{AssA}\) is association accuracy at a given IoU threshold \(\alpha\); the reported score averages \(\text{HOTA}_\alpha\) over a range of thresholds.
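As a quick worked example of the MOTA formula from the table (the counts are made up for illustration):

```python
def mota(num_fn, num_fp, num_id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, where GT is the number of
    ground-truth object instances summed over all frames."""
    return 1.0 - (num_fn + num_fp + num_id_switches) / num_gt

# 1000 ground-truth boxes in a sequence: 80 misses, 50 false positives,
# 10 identity switches  ->  MOTA = 1 - 140/1000 = 0.86
print(mota(num_fn=80, num_fp=50, num_id_switches=10, num_gt=1000))
```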
Common benchmark datasets:
| Dataset | Domain | Sequences | Key Challenge |
|---|---|---|---|
| MOT17/20 | Pedestrian | 14 / 8 | Crowded scenes |
| KITTI | Driving | 21 | 3D tracking, varying scale |
| DanceTrack | Dance | 100+ | Fast motion, similar appearance |
| BDD100K | Driving | 1,000 | Diverse weather/conditions |
8. Multi-Object Tracking in ROS 2¶
In a ROS 2 robotics stack, tracking data flows as messages on topics.
Standard message types¶
Perception Pipeline in ROS 2:
```
Camera/RGB ──► /image_raw (sensor_msgs/Image)
                    │
                    ▼
Detector Node ──► /detections (vision_msgs/Detection2DArray)
                    │
                    ▼
Tracker Node ──► /tracked_objects (vision_msgs/Detection2DArray)
                    │             with unique IDs in detection results
                    ▼
          Planning/Control Node
```
Key message types:
| Message | Package | Purpose |
|---|---|---|
| `vision_msgs/Detection2DArray` | vision_msgs | Array of 2D detections with bounding boxes |
| `vision_msgs/Detection3DArray` | vision_msgs | 3D detections (for LiDAR or depth cameras) |
| `vision_msgs/ObjectHypothesisWithPose` | vision_msgs | Class + confidence + pose for each hypothesis |
| `geometry_msgs/PoseStamped` | geometry_msgs | Single tracked object pose |
| `custom_msg/TrackedObjectArray` | Custom | Often created for richer tracking info (ID, velocity, etc.) |
Example: Tracker node skeleton¶
```python
import rclpy
from rclpy.node import Node
from vision_msgs.msg import Detection2DArray, Detection2D, ObjectHypothesisWithPose
import numpy as np
from scipy.optimize import linear_sum_assignment


class SimpleTrackerNode(Node):
    def __init__(self):
        super().__init__('simple_tracker')
        self.sub = self.create_subscription(
            Detection2DArray, '/detections', self.detection_callback, 10
        )
        self.pub = self.create_publisher(
            Detection2DArray, '/tracked_objects', 10
        )
        self.tracks = {}  # id -> state (cx, cy, w, h)
        self.next_id = 0
        self.get_logger().info('SimpleTracker started.')

    def detection_callback(self, msg: Detection2DArray):
        detections = []
        for det in msg.detections:
            cx = det.bbox.center.position.x
            cy = det.bbox.center.position.y
            w = det.bbox.size_x
            h = det.bbox.size_y
            detections.append([cx, cy, w, h])
        if not detections:
            return

        det_array = np.array(detections)
        track_ids = list(self.tracks.keys())
        track_states = (np.array([self.tracks[tid] for tid in track_ids])
                        if track_ids else np.empty((0, 4)))

        if len(track_states) > 0:
            # Cost = Euclidean distance between track and detection centers
            cost = np.linalg.norm(
                track_states[:, None, :2] - det_array[None, :, :2], axis=2
            )
            row_idx, col_idx = linear_sum_assignment(cost)
            matched_dets = set()
            for r, c in zip(row_idx, col_idx):
                if cost[r, c] < 100:  # gating threshold (pixels)
                    self.tracks[track_ids[r]] = det_array[c]
                    matched_dets.add(c)
            # Create new tracks for unmatched detections.
            # Note: unmatched tracks are kept forever in this skeleton;
            # lifecycle management (aging, deletion) is left as Exercise 5.
            for i in range(len(det_array)):
                if i not in matched_dets:
                    self.tracks[self.next_id] = det_array[i]
                    self.next_id += 1
        else:
            for det in det_array:
                self.tracks[self.next_id] = det
                self.next_id += 1

        # Publish tracked objects with their IDs
        out_msg = Detection2DArray()
        out_msg.header = msg.header
        for tid, state in self.tracks.items():
            d = Detection2D()
            d.bbox.center.position.x = float(state[0])
            d.bbox.center.position.y = float(state[1])
            d.bbox.size_x = float(state[2])
            d.bbox.size_y = float(state[3])
            hyp = ObjectHypothesisWithPose()
            hyp.hypothesis.class_id = str(tid)  # track ID carried in class_id
            hyp.hypothesis.score = 1.0
            d.results.append(hyp)
            out_msg.detections.append(d)
        self.pub.publish(out_msg)


def main():
    rclpy.init()
    node = SimpleTrackerNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```
Popular ROS 2 tracking packages:
| Package | Description |
|---|---|
| `ros2_deepsort` | DeepSORT wrapper for ROS 2 |
| `yolov8_ros` | YOLOv8 with built-in tracking (BoT-SORT, ByteTrack) |
| `open_track` | Open-source multi-object tracker with ROS 2 interface |
9. Exercises¶
Exercise 1: Optical Flow Visualization¶
Run the Farneback dense optical flow demo on a recorded video. Modify the HSV visualization to highlight only fast-moving objects (threshold on magnitude). What happens when you increase levels and winsize?
Exercise 2: Kalman Filter Tuning¶
Using the ball-tracking Kalman filter code:
1. Increase processNoiseCov by 10x. What happens to the predictions?
2. Increase measurementNoiseCov by 10x. How does the filter respond?
3. Add width and height to the state vector so the filter also predicts the bounding box size.
Exercise 3: Implement SORT from Scratch¶
Using the Kalman filter code from Section 3 as a starting point:
1. Create a Track class that wraps cv2.KalmanFilter.
2. Implement IoU computation between two bounding boxes.
3. Use scipy.optimize.linear_sum_assignment for association.
4. Test on the MOT17 validation set.
Exercise 4: DeepSORT Appearance Features¶
Replace the IoU cost in Exercise 3 with a combined motion + appearance cost:
1. Load a pretrained Re-ID model (e.g., torchreid library).
2. Extract embeddings from detection crops.
3. Implement the matching cascade from Section 5.2.
Exercise 5: ROS 2 Tracker Node¶
Extend the ROS 2 tracker node skeleton from Section 8 to:
1. Use a Kalman filter per track instead of simple position matching.
2. Add track lifecycle management (creation after N consecutive detections, deletion after M missed frames).
3. Publish velocity estimates alongside position.
References¶
- Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. IJCAI.
- Farnebäck, G. (2003). Two-frame motion estimation based on polynomial expansion. SCIA.
- Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. ASME Journal of Basic Engineering.
- Bewley, A., Ge, Z., Ott, L., Ramos, F., & Upcroft, B. (2016). Simple online and realtime tracking. ICIP.
- Wojke, N., Bewley, A., & Paulus, D. (2017). Simple online and realtime tracking with a deep association metric. ICIP.
- Zhang, Y., Sun, P., Jiang, Y., et al. (2022). ByteTrack: Multi-object tracking by associating every detection box. ECCV.
- Meinhardt, T., Kirillov, A., Leal-Taixe, L., & Feichtenhofer, C. (2022). TrackFormer: Multi-object tracking with transformers. CVPR.
- Zhang, Y., Wang, T., & Zhang, X. (2023). MOTRv2: Bootstrapping end-to-end multi-object tracking by pretrained object detectors. CVPR.
- Luiten, J., Osep, A., Dendorfer, P., et al. (2021). HOTA: A higher order metric for evaluating multi-object tracking. IJCV.
- OpenCV Documentation. Optical Flow — https://docs.opencv.org/4.x/d7/d8b/tutorial_py_lucas_kanade.html