Hands-on Projects

This course culminates in 10 hands-on projects that cover the major robot task categories — from navigation and perception to manipulation, language understanding, and multi-agent coordination. Each project is designed to give you end-to-end experience with a real robotic task, progressing through three algorithm tiers:

| Tier | Era | Philosophy | Examples |
|------|-----|------------|----------|
| Traditional | Pre-deep-learning | Model-based, hand-crafted features | PID, Bug, PDDL, IK |
| Classical Learning | Deep learning era | Data-driven, end-to-end neural networks | DRL, DeepSORT, GraspNet, Seq2Seq |
| Modern / Foundation | Foundation model era | Pre-trained, few-shot, generalist | LLM, Transformer, Foundation Model |

By implementing all three tiers for each task, you will develop an intuition for when traditional methods suffice, when learning-based methods shine, and where foundation models are heading.


Project Overview

| # | Project | Task Category | Algorithm Spectrum | Difficulty |
|---|---------|---------------|--------------------|------------|
| 1 | Line Following Robot | Navigation | PID → Stanley → RL | ⭐⭐ |
| 2 | Autonomous Obstacle Avoidance | Navigation | Bug/VFH → DWA → DRL | ⭐⭐⭐ |
| 3 | SLAM & Autonomous Navigation | SLAM | gmapping → Cartographer → ORB-SLAM3 | ⭐⭐⭐⭐ |
| 4 | Visual Object Tracking | Perception | KCF → SORT/DeepSORT → Transformer | ⭐⭐⭐ |
| 5 | Robotic Arm Grasping | Manipulation | IK → GraspNet → RL | ⭐⭐⭐ |
| 6 | Voice Interaction Robot | Language | ASR+keywords → NLU pipeline → LLM | ⭐⭐⭐ |
| 7 | Multi-Robot Formation | Multi-Agent | Leader-Follower → Consensus → MARL | ⭐⭐⭐⭐ |
| 8 | Vision-Language Navigation | VLN | Modular → Seq2Seq → Foundation Model | ⭐⭐⭐⭐ |
| 9 | Object Assembly with TAMP | TAMP | PDDL → TAMP → LLM-based | ⭐⭐⭐⭐⭐ |
| 10 | Mobile Manipulation | Mobile+Manipulation | Decoupled → Joint → Foundation Model | ⭐⭐⭐⭐⭐ |

The projects are arranged in a dependency graph. Start from the bottom (foundations) and work upward.

                    ┌─────────────────────────────────┐
                    │  10. Mobile Manipulation (⭐⭐⭐⭐⭐) │
                    └──────────┬──────────┬────────────┘
                               │          │
              ┌────────────────┘          └────────────────┐
              ▼                                            ▼
┌──────────────────────┐                    ┌───────────────────────────┐
│ 5. Robotic Arm       │                    │ 9. Assembly with TAMP     │
│    Grasping (⭐⭐⭐)    │                    │    (⭐⭐⭐⭐⭐)               │
└──────────┬───────────┘                    └─────────────┬─────────────┘
           │                                              │
           │         ┌───────────────────────────┐        │
           │         │ 8. Vision-Language        │        │
           │         │    Navigation (⭐⭐⭐⭐)    │        │
           │         └──────┬──────────┬────────┘        │
           │                │          │                 │
           │     ┌──────────┘          └──────────┐      │
           │     ▼                                ▼      │
           │ ┌──────────────────┐  ┌────────────────────┐│
           │ │ 4. Visual Object │  │ 6. Voice Interaction││
           │ │    Tracking (⭐⭐⭐)│  │    Robot (⭐⭐⭐)    ││
           │ └────────┬─────────┘  └─────────┬──────────┘│
           │          │                      │           │
           └──────────┼──────────────────────┼───────────┘
                      │                      │
           ┌──────────┼──────────────────────┘
           ▼          ▼
┌─────────────────────────────┐   ┌──────────────────────────┐
│ 7. Multi-Robot Formation    │   │ 3. SLAM & Autonomous     │
│    (⭐⭐⭐⭐)                  │   │    Navigation (⭐⭐⭐⭐)    │
└──────────┬──────────────────┘   └─────────────┬────────────┘
           │                                    │
           │         ┌──────────────────┐       │
           └────────►│ 2. Obstacle      │◄──────┘
                     │    Avoidance (⭐⭐⭐)│
                     └────────┬─────────┘
                     ┌────────▼─────────┐
                     │ 1. Line Following │
                     │    Robot (⭐⭐)    │
                     └──────────────────┘

Reading the diagram: arrows point from each project to its prerequisites. For example, Project 10 (Mobile Manipulation) requires skills from both Project 5 (Arm Grasping) and Project 9 (TAMP). You may work on projects at the same "level" in parallel.


Project Descriptions

1. Line Following Robot

Task: Build a robot that autonomously follows a line track on the ground using a camera or infrared sensor array.

Algorithm spectrum: Start with a classic PID controller to stay centered on the line. Advance to the Stanley controller for smoother path tracking with curvature awareness. Finally, train a reinforcement learning (RL) agent that learns the control policy directly from sensor observations, handling complex track geometries without explicit tuning.
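
To make the traditional tier concrete, here is a minimal PID sketch. It assumes a hypothetical `read_line_offset()` that returns the line's lateral offset normalized to [-1, 1] and a hypothetical `set_wheel_speeds(left, right)` interface for a differential-drive base; the gains are placeholders you would tune on your own robot.

```python
import time

# Minimal PID line-follower sketch. read_line_offset() and
# set_wheel_speeds() are hypothetical interfaces; adapt to your hardware.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def control_loop(read_line_offset, set_wheel_speeds, base_speed=0.3, dt=0.02):
    pid = PID(kp=0.8, ki=0.0, kd=0.15, dt=dt)  # placeholder gains; tune on hardware
    while True:
        error = read_line_offset()              # +1: line far right of center
        turn = pid.step(error)
        # Steer toward the line: positive error speeds the left wheel,
        # slows the right, turning the robot back onto the track.
        set_wheel_speeds(base_speed + turn, base_speed - turn)
        time.sleep(dt)
```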

Skills you will practice: Sensor reading, PWM motor control, feedback loops, basic RL training loops.


2. Autonomous Obstacle Avoidance

Task: Navigate a mobile robot through an environment cluttered with obstacles, reaching a goal without collisions.

Algorithm spectrum: Begin with the Bug algorithm and Vector Field Histogram (VFH) — reactive methods that make local decisions based on sensor readings. Progress to the Dynamic Window Approach (DWA), which plans velocity-space trajectories respecting kinematic constraints. Finish with Deep Reinforcement Learning (DRL), training an end-to-end policy in simulation (e.g., Gazebo or Isaac Sim) that transfers to the real robot.
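
To give a flavor of the middle tier, here is a simplified DWA sketch: sample velocities inside the dynamic window, roll out short constant-velocity trajectories, and score them by goal progress, clearance, and speed. The limits, cost weights, and safety radius are illustrative assumptions, not the exact terms of the original DWA formulation.

```python
import math
import numpy as np

# Simplified Dynamic Window Approach. State: (x, y, theta); obstacles: Nx2 array.

def simulate(state, v, w, dt=0.1, steps=10):
    """Roll out a constant (v, w) command; return trajectory points."""
    x, y, th = state
    traj = []
    for _ in range(steps):
        th += w * dt
        x += v * math.cos(th) * dt
        y += v * math.sin(th) * dt
        traj.append((x, y))
    return np.array(traj)

def dwa_step(state, v_now, w_now, goal, obstacles,
             v_max=0.5, w_max=1.5, a_max=0.5, aw_max=2.0, dt=0.1):
    # Dynamic window: velocities reachable within one control cycle.
    vs = np.linspace(max(0.0, v_now - a_max * dt), min(v_max, v_now + a_max * dt), 7)
    ws = np.linspace(max(-w_max, w_now - aw_max * dt), min(w_max, w_now + aw_max * dt), 15)
    best, best_cost = (0.0, 0.0), float("inf")
    for v in vs:
        for w in ws:
            traj = simulate(state, v, w)
            clearance = np.min(np.linalg.norm(obstacles[None] - traj[:, None], axis=2))
            if clearance < 0.2:        # assumed safety radius: discard colliding rollouts
                continue
            cost = np.linalg.norm(traj[-1] - goal) + 0.3 / clearance - 0.1 * v
            if cost < best_cost:
                best, best_cost = (v, w), cost
    return best                         # (v, w) command to execute this cycle

# Example: v, w = dwa_step((0, 0, 0), 0.2, 0.0, np.array([3.0, 1.0]),
#                          np.array([[1.5, 0.2], [2.0, -0.5]]))
```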

Skills you will practice: Range sensor processing, local planning, velocity profiling, sim-to-real transfer.


3. SLAM & Autonomous Navigation

Task: Build a map of an unknown indoor environment and use it for autonomous goal-directed navigation.

Algorithm spectrum: Start with gmapping (a particle-filter-based 2D SLAM system). Move to Cartographer for more robust graph-based 2D/3D SLAM with loop closure. Finally, explore ORB-SLAM3 — a feature-based visual SLAM system that works with monocular, stereo, and RGB-D cameras, enabling SLAM in GPS-denied environments.
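
A core data structure behind these 2D systems is the log-odds occupancy grid. The sketch below shows a single-beam update with assumed increment values; a real system like gmapping folds this into a full particle filter over robot poses.

```python
import numpy as np

# Log-odds occupancy grid update: each range beam lowers the log-odds of
# cells it passes through (free) and raises the cell where it hits (occupied).

L_OCC, L_FREE = 0.85, -0.4            # assumed log-odds increments

def bresenham(x0, y0, x1, y1):
    """Integer grid cells along a beam from (x0, y0) to (x1, y1)."""
    cells, dx, dy = [], abs(x1 - x0), -abs(y1 - y0)
    sx, sy = (1 if x0 < x1 else -1), (1 if y0 < y1 else -1)
    err = dx + dy
    while True:
        cells.append((x0, y0))
        if (x0, y0) == (x1, y1):
            return cells
        e2 = 2 * err
        if e2 >= dy: err += dy; x0 += sx
        if e2 <= dx: err += dx; y0 += sy

def update_grid(grid, robot_cell, hit_cell):
    """Apply one beam: free space up to the hit, occupied at the hit."""
    beam = bresenham(*robot_cell, *hit_cell)
    for cx, cy in beam[:-1]:
        grid[cy, cx] += L_FREE
    hx, hy = beam[-1]
    grid[hy, hx] += L_OCC

grid = np.zeros((100, 100))           # log-odds 0 means unknown (p = 0.5)
update_grid(grid, robot_cell=(10, 10), hit_cell=(40, 25))
prob = 1 - 1 / (1 + np.exp(grid))     # convert log-odds back to probability
```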

Skills you will practice: LiDAR/camera calibration, occupancy grids, pose graph optimization, loop closure detection, map-based navigation (Nav2).


4. Visual Object Tracking

Task: Track a specific moving object across video frames in real time, maintaining identity even through occlusions.

Algorithm spectrum: Implement the Kernelized Correlation Filter (KCF) as a fast, hand-crafted tracker. Advance to SORT/DeepSORT, which combines detection (YOLO) with Kalman filtering and Hungarian assignment for multi-object tracking. Push further with Transformer-based trackers (e.g., TransTrack, OSTrack) that leverage attention mechanisms for robust, long-term tracking.
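
The heart of SORT is associating the current frame's detections with existing tracks. Below is a minimal version of that step using IoU cost and `scipy.optimize.linear_sum_assignment` (the Hungarian step); the IoU threshold is an assumed tuning value, and the Kalman prediction of track boxes is omitted.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# SORT-style data association. Boxes are (x1, y1, x2, y2).

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_threshold=0.3):
    """Return (matches, unmatched_track_ids, unmatched_detection_ids)."""
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    # Cost = 1 - IoU, so the Hungarian solver maximizes total overlap.
    cost = np.array([[1 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols)
               if 1 - cost[r, c] >= iou_threshold]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    return (matches,
            [i for i in range(len(tracks)) if i not in matched_t],
            [j for j in range(len(detections)) if j not in matched_d])
```

Unmatched detections seed new tracks; tracks unmatched for several frames are dropped.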

Skills you will practice: Image feature extraction, bounding box association, Kalman filtering, attention mechanisms, real-time inference optimization.


5. Robotic Arm Grasping

Task: Enable a robotic arm to grasp objects of various shapes from a tabletop.

Algorithm spectrum: Begin with analytical inverse kinematics (IK) for known object poses using geometric or numerical solvers. Graduate to GraspNet, a deep learning model that predicts 6-DoF grasp poses from point clouds. Conclude with RL-based grasping, where a policy learns through trial-and-error in simulation (Isaac Gym) to handle unknown objects, partial occlusions, and cluttered scenes.
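
For intuition about the analytical tier, here is closed-form IK for a planar two-link arm, the textbook warm-up before a full 6-DoF solver:

```python
import math

# Analytical IK for a planar 2-link arm with link lengths l1, l2.
# Target (x, y) is expressed in the arm's base frame.

def two_link_ik(x, y, l1, l2, elbow_up=True):
    """Return joint angles (theta1, theta2) reaching (x, y), or None."""
    d2 = x * x + y * y
    # Law of cosines gives the elbow angle directly.
    c2 = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(c2) > 1:
        return None                     # target out of reach
    s2 = math.sqrt(1 - c2 * c2) * (1 if elbow_up else -1)
    theta2 = math.atan2(s2, c2)
    # Shoulder angle: direction to target minus the wrist's angular offset.
    theta1 = math.atan2(y, x) - math.atan2(l2 * s2, l1 + l2 * c2)
    return theta1, theta2

# Example: reach (0.5, 0.3) with two 0.4 m links.
print(two_link_ik(0.5, 0.3, 0.4, 0.4))
```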

Skills you will practice: Forward/inverse kinematics, URDF modeling, point cloud processing, reward shaping, sim-to-real for manipulation.


6. Voice Interaction Robot

Task: Build a robot that understands spoken commands and responds intelligently through natural conversation.

Algorithm spectrum: Start with a pipeline of automatic speech recognition (ASR) + keyword spotting for simple command detection. Build a full NLU pipeline with intent classification and slot filling for structured command understanding. Finally, integrate a large language model (LLM) that handles open-ended dialogue, contextual reasoning, and complex instruction following.
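
The first tier can be surprisingly little code once a speech-to-text engine hands you a transcript. A toy keyword-spotting sketch follows; the command vocabulary is made up for illustration.

```python
# Map an ASR transcript to a robot command by keyword spotting.
# The transcript string would come from any speech-to-text engine.

COMMANDS = {
    "forward":    {"move forward", "go forward", "go straight"},
    "stop":       {"stop", "halt", "freeze"},
    "turn_left":  {"turn left", "go left"},
    "turn_right": {"turn right", "go right"},
}

def parse_command(transcript: str):
    """Return the first command whose keyword phrase appears in the text."""
    text = transcript.lower()
    for command, phrases in COMMANDS.items():
        if any(phrase in text for phrase in phrases):
            return command
    return None  # no keyword matched; a real system would ask for clarification

print(parse_command("please go forward a little"))  # -> "forward"
print(parse_command("uh, turn right now"))          # -> "turn_right"
```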

Skills you will practice: Audio processing, speech-to-text APIs, intent/slot models, prompt engineering, LLM API integration, latency management.


7. Multi-Robot Formation

Task: Coordinate a team of mobile robots to maintain a desired geometric formation while navigating.

Algorithm spectrum: Implement the Leader-Follower approach where one robot leads and others track relative positions. Upgrade to Consensus-based algorithms where all robots agree on shared states through distributed communication. Explore Multi-Agent Reinforcement Learning (MARL), where agents learn cooperative policies that adapt to dynamic environments and communication failures.
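
A minimal leader-follower sketch: each follower P-controls toward a target pose rigidly attached to the leader's frame. The gains and the differential-drive command convention are assumptions to adapt to your platform.

```python
import math

# Leader-follower station keeping. Poses are (x, y, theta) in the world
# frame; the command is (v, omega) for a differential-drive robot.

def follower_cmd(follower, leader, offset, k_v=1.0, k_w=2.0):
    """P-control toward a point at `offset` in the leader's body frame."""
    lx, ly, lth = leader
    # Rotate the body-frame offset into the world frame.
    tx = lx + offset[0] * math.cos(lth) - offset[1] * math.sin(lth)
    ty = ly + offset[0] * math.sin(lth) + offset[1] * math.cos(lth)
    fx, fy, fth = follower
    dx, dy = tx - fx, ty - fy
    distance = math.hypot(dx, dy)
    heading_error = math.atan2(dy, dx) - fth
    # Wrap the heading error to (-pi, pi].
    heading_error = math.atan2(math.sin(heading_error), math.cos(heading_error))
    return k_v * distance, k_w * heading_error  # (v, omega)

# Follower keeps station 1 m behind and 0.5 m left of the leader.
v, w = follower_cmd(follower=(0, 0, 0), leader=(2, 0, 0), offset=(-1.0, 0.5))
```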

Skills you will practice: Distributed systems, ROS 2 multi-robot communication, graph theory for consensus, MARL training (e.g., MAPPO), communication-robust policies.


8. Vision-Language Navigation

Task: Guide a robot through an environment using natural language instructions (e.g., "Go past the red sofa and into the kitchen").

Algorithm spectrum: Build a modular pipeline with separate components for instruction parsing, visual feature extraction, and path planning. Train a Seq2Seq model that maps language and visual observations directly to navigation actions. Explore foundation model-based approaches (e.g., using CLIP, LLaVA, or GPT-4V) that perform zero-shot or few-shot VLN by grounding language in visual scenes.
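
A skeleton of the modular tier, with perception and navigation stubbed out: extract landmark words from the instruction, match them against current detections, and emit a subgoal. Everything here (the landmark vocabulary, the detection format) is a hypothetical placeholder for your own components.

```python
# Modular VLN skeleton: instruction parsing -> grounding -> subgoal selection.

LANDMARKS = {"sofa", "kitchen", "table", "door", "chair"}  # toy vocabulary

def parse_instruction(instruction: str):
    """Extract landmark words in the order they are mentioned."""
    words = instruction.lower().replace(",", " ").split()
    return [w for w in words if w in LANDMARKS]

def next_subgoal(instruction, detections):
    """detections: dict mapping detected object name -> (x, y) position."""
    for landmark in parse_instruction(instruction):
        if landmark in detections:
            return landmark, detections[landmark]
    return None  # nothing mentioned is visible yet; fall back to exploration

goal = next_subgoal("Go past the red sofa and into the kitchen",
                    detections={"sofa": (3.0, 1.5), "chair": (1.0, 0.5)})
print(goal)  # -> ("sofa", (3.0, 1.5))
```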

Skills you will practice: Vision-language models, attention mechanisms, embodied AI simulators (Habitat, AI2-THOR), instruction grounding, evaluation metrics (SR, SPL).


9. Object Assembly with TAMP

Task: Plan and execute a multi-step object assembly task (e.g., building a simple structure from blocks).

Algorithm spectrum: Start with PDDL-based classical planning where task logic is hand-written in a planning domain description language. Move to integrated Task and Motion Planning (TAMP) that interleaves symbolic task planning with continuous motion planning (e.g., PDDLStream). Finally, explore LLM-based task planning, where a large language model generates action sequences from natural language task descriptions and scene observations.
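
The shape of the TAMP tier is a plan-refine-execute loop, sketched below. All interfaces are illustrative stubs: `task_planner`, `motion_planner`, and `world` stand in for, e.g., a PDDL solver, MoveIt, and your simulator.

```python
# Plan-refine-execute loop at the core of TAMP-style assembly.

def execute_plan(plan, motion_planner, world):
    """Refine each symbolic action into motion; False if one is infeasible."""
    for action, args in plan:
        trajectory = motion_planner(action, args, world)  # geometric refinement
        if trajectory is None:
            return False      # action is symbolically valid but geometrically infeasible
        world.execute(trajectory)
    return True

def tamp_loop(task_planner, motion_planner, world, max_replans=5):
    """Interleave symbolic planning with motion refinement."""
    for _ in range(max_replans):
        plan = task_planner(world.symbolic_state())  # e.g., a PDDL solver
        if plan is None:
            return False                             # task is unsolvable
        if execute_plan(plan, motion_planner, world):
            return True                              # assembly complete
        # A refinement failure updated the world state; re-plan symbolically.
    return False
```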

Skills you will practice: Symbolic planning, PDDL encoding, motion planning (MoveIt), plan-and-execute loops, LLM grounding for robotics, constraint satisfaction.


10. Mobile Manipulation

Task: Combine mobile base locomotion with arm manipulation to perform tasks that require whole-body coordination (e.g., fetching objects from shelves).

Algorithm spectrum: Start with a decoupled approach where the base navigates and the arm manipulates independently with a handoff. Progress to joint planning that coordinates base and arm motions simultaneously for improved reachability and efficiency. Explore foundation model-based control (e.g., RT-2, Octo) that leverages large-scale pre-training for generalized mobile manipulation across diverse tasks and environments.
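
A sketch of the decoupled tier, with navigation, perception, and grasping hidden behind hypothetical interfaces (`navigate_to`, `perceive`, `plan_grasp`, and `execute_grasp` are placeholders for your own stack, e.g., Nav2 for the base and MoveIt for the arm):

```python
import math

# Decoupled mobile manipulation: navigate, hand off, then grasp.

def compute_standoff_pose(object_pose, distance):
    """Base pose `distance` meters back from the object, facing it."""
    ox, oy, oth = object_pose
    return (ox - distance * math.cos(oth), oy - distance * math.sin(oth), oth)

def fetch_object(robot, object_pose, standoff=0.6):
    # 1. Base phase: park within arm's reach of the object.
    base_goal = compute_standoff_pose(object_pose, distance=standoff)
    if not robot.navigate_to(base_goal):       # e.g., a Nav2 action call
        return False
    # 2. Handoff: re-observe the object from the new viewpoint, since
    #    navigation error makes the original pose estimate stale.
    refined_pose = robot.perceive(object_pose)
    # 3. Arm phase: plan and execute a grasp from the parked base.
    grasp = robot.plan_grasp(refined_pose)     # e.g., a MoveIt pipeline
    return grasp is not None and robot.execute_grasp(grasp)
```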

Skills you will practice: Whole-body kinematics, integrated motion planning, task sequencing, foundation model fine-tuning, real-world deployment challenges.


Skills Prerequisites

Before starting these projects, you should have:

Foundational Skills

| Skill | Minimum Level | Recommended Resources |
|-------|---------------|-----------------------|
| Python | Intermediate — classes, async, decorators | Preliminary |
| ROS 2 | Basic — nodes, topics, services, launch files | ROS Tutorials |
| Linux CLI | Comfortable — bash, SSH, tmux | Preliminary |
| Git | Basic — branch, merge, PR workflow | |
| Linear Algebra | Vectors, matrices, transforms | Robotics Math |
| Control Theory | PID, state-space basics | Planning |

Project-Specific Knowledge

| Project Group | Recommended Knowledge |
|---------------|------------------------|
| Navigation (Projects 1–2) | Planning algorithms, Simulation |
| SLAM (Project 3) | ROS 2, Perception basics |
| Perception (Project 4) | Deep learning basics, OpenCV |
| Manipulation (Projects 5, 10) | Manipulation fundamentals, Simulation |
| Language (Project 6) | LLM basics, NLP fundamentals |
| Multi-Agent (Project 7) | Multi-agent RL, ROS 2 multi-robot |
| VLN (Project 8) | Perception, Language grounding |
| TAMP (Project 9) | Planning, Manipulation, Agents |

Hardware & Software

| Resource | Details |
|----------|---------|
| Simulation | Gazebo, Isaac Sim, MuJoCo, PyBullet, Habitat (see Simulation) |
| Robot Platforms | TurtleBot 3/4, Franka Emika, UR5, custom ROS 2 robots |
| Compute | GPU recommended for Projects 4–10 (RTX 3060 or better) |
| ROS Version | ROS 2 Humble / Iron (see ROS setup) |

How to Use These Projects

  1. Pick a project that matches your current skill level and interests.
  2. Complete the Traditional tier first — it builds intuition and forces you to debug the full pipeline end to end.
  3. Move to the Classical Learning tier — compare against the traditional baseline.
  4. Experiment with the Modern tier — push the boundaries of what's possible.
  5. Document your results — each project should produce a short report comparing all three tiers.

Project Portfolio

Completing all 10 projects gives you a portfolio spanning the full robotics stack — from low-level control to high-level reasoning with foundation models. This is exactly the breadth that top robotics labs and companies look for.