3D Surgical Tool Tracking with ArUco Markers and OpenCV
Overview:
This project showcases a real-time 3D tracking system for surgical tools using computer vision techniques. The system employs a monocular webcam, a set of ArUco markers, and a custom 3D-printed tool mount to estimate the position and orientation of a surgical instrument’s tip with millimeter-level accuracy.
Hardware & Setup:
- Tool Design: A 3D-printed rigid body (10×10×8 cm cube) with a truncated pyramid fixture holding five ArUco markers (IDs 5–9), designed in a known geometric configuration (one possible way to encode such a configuration is sketched after this list).
- Reference Frame: Four table-mounted ArUco markers (IDs 0–3) define a global coordinate system; marker ID 0 serves as the origin.
- Camera: A standard webcam positioned above the workspace captures real-time video input.
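As a rough illustration, the tool's marker set can be described to OpenCV as an ArUco board, i.e. a set of marker corners expressed in the tool's own coordinate frame. The dictionary, marker size, and corner coordinates below are placeholders (the real values come from the 3D-printed fixture's geometry, which is not reproduced here), and the Board_create call assumes the legacy cv2.aruco API:

```python
import numpy as np
import cv2

aruco = cv2.aruco

# Dictionary and marker size are assumptions; the project's actual choices
# are not stated in this description.
dictionary = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)
marker_len = 0.04  # marker side length in meters (placeholder)

def square(cx, cy, cz, s=marker_len):
    """Four corners (top-left, top-right, bottom-right, bottom-left) of a
    square marker centered at (cx, cy, cz), lying flat in the tool's XY plane."""
    h = s / 2.0
    return np.array([[cx - h, cy + h, cz],
                     [cx + h, cy + h, cz],
                     [cx + h, cy - h, cz],
                     [cx - h, cy - h, cz]], dtype=np.float32)

# PLACEHOLDER layout: five markers (IDs 5-9) in a flat cross pattern. The real
# fixture is a truncated pyramid, so the true corner coordinates differ.
obj_points = [square(0.00,  0.00, 0.0),
              square(0.06,  0.00, 0.0),
              square(-0.06, 0.00, 0.0),
              square(0.00,  0.06, 0.0),
              square(0.00, -0.06, 0.0)]
ids = np.array([[5], [6], [7], [8], [9]], dtype=np.int32)

# Legacy aruco API (OpenCV <= 4.6); newer releases construct cv2.aruco.Board(...) directly.
tool_board = aruco.Board_create(obj_points, dictionary, ids)

# Per frame, after detectMarkers(), the whole-tool pose can then be estimated with:
# retval, rvec, tvec = aruco.estimatePoseBoard(corners, ids, tool_board,
#                                              camera_matrix, dist_coeffs, None, None)
```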
Technical Features:
The system is developed entirely in Python with OpenCV and includes the following key components (a minimal capture-and-detection sketch follows the list below):
OpenCV Tools & Techniques Used:
- cv2.VideoCapture(): Captures live video stream from webcam.
- cv2.cvtColor(): Converts frames to grayscale for ArUco detection.
- aruco.getPredefinedDictionary() and aruco.detectMarkers(): Detects ArUco markers from each video frame.
- aruco.estimatePoseSingleMarkers() and cv2.solvePnP(): Computes the pose (rotation + translation vectors) of each marker or the entire tool using camera intrinsics.
- cv2.Rodrigues(): Converts rotation vectors to matrices for 3D transformations.
- cv2.putText() and cv2.circle(): Renders pose information and tracking trails on screen.
- cv2.KalmanFilter(): Smooths the 3D position data using a 6D state Kalman filter (position + velocity).
- aruco.estimatePoseBoard(): Estimates the full tool’s pose from multiple rigidly-mounted markers.
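A minimal sketch of how these calls fit together in the capture loop is shown below. It assumes the legacy cv2.aruco module API, a placeholder dictionary and marker size, and placeholder calibration arrays camera_matrix and dist_coeffs; the variable names are illustrative, not taken from the project's source.

```python
import cv2
import numpy as np

aruco = cv2.aruco

# Placeholder calibration values; the project loads real intrinsics and
# distortion coefficients from a prior calibration step.
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)

dictionary = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)  # assumed dictionary
parameters = aruco.DetectorParameters_create()                 # legacy API (OpenCV <= 4.6)
marker_len = 0.04                                              # marker side length in meters (assumed)

cap = cv2.VideoCapture(0)                                      # live webcam stream
while True:
    ok, frame = cap.read()
    if not ok:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)             # grayscale for detection
    corners, ids, _ = aruco.detectMarkers(gray, dictionary, parameters=parameters)

    if ids is not None:
        # One rotation/translation vector pair per detected marker.
        rvecs, tvecs, _ = aruco.estimatePoseSingleMarkers(
            corners, marker_len, camera_matrix, dist_coeffs)
        for i, marker_id in enumerate(ids.flatten()):
            cv2.drawFrameAxes(frame, camera_matrix, dist_coeffs,
                              rvecs[i], tvecs[i], marker_len * 0.5)
            origin = tuple(int(v) for v in corners[i][0][0])   # top-left corner in pixels
            cv2.putText(frame, f"id {marker_id}", origin,
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):                      # quit on 'q'
        break

cap.release()
cv2.destroyAllWindows()
```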
Algorithmic Highlights:
- Camera Calibration is performed beforehand, so the system uses the camera's actual intrinsic matrix and distortion coefficients rather than assumed values.
- Pose Estimation is performed using either solvePnP() (per marker) or estimatePoseBoard() (using the entire tool marker configuration).
- The tool tip position is calculated by applying a rigid transformation to a predefined offset vector (11 cm below the cube center); a combined tool-tip and filtering sketch follows this list.
- To ensure consistency despite camera scaling or alignment drift, a custom per-axis scale correction factor is applied before filtering.
- A Kalman Filter is implemented to reduce jitter and improve the stability of the pose output.
- A visual trail of the tool tip is maintained and displayed to show its motion history.
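The tool-tip computation and the filtering step can be sketched roughly as follows. The 11 cm offset magnitude comes from the description above; the choice of the tool's -Z axis for that offset, the per-axis scale factors, the frame rate, and the noise covariances are all placeholder assumptions:

```python
import cv2
import numpy as np

# Offset from the cube center to the tool tip, expressed in the tool's own
# frame. The 0.11 m magnitude is from the project description; mapping
# "below the cube center" to the tool's -Z axis is an assumption.
TIP_OFFSET = np.array([0.0, 0.0, -0.11])

# Placeholder per-axis scale correction factors (the real values are tuned
# to compensate for scaling/alignment drift).
SCALE = np.array([1.0, 1.0, 1.0])

def tool_tip_position(rvec, tvec):
    """Map the tool-frame tip offset into camera coordinates."""
    R, _ = cv2.Rodrigues(rvec)                  # rotation vector -> 3x3 matrix
    tip_cam = R @ TIP_OFFSET + tvec.reshape(3)  # rigid transform of the offset
    return tip_cam * SCALE                      # per-axis scale correction

# 6-state constant-velocity Kalman filter: state [x y z vx vy vz], measurement [x y z].
kf = cv2.KalmanFilter(6, 3)
dt = 1.0 / 30.0                                 # assumed frame period (30 fps)
kf.transitionMatrix = np.array([[1, 0, 0, dt, 0, 0],
                                [0, 1, 0, 0, dt, 0],
                                [0, 0, 1, 0, 0, dt],
                                [0, 0, 0, 1, 0, 0],
                                [0, 0, 0, 0, 1, 0],
                                [0, 0, 0, 0, 0, 1]], dtype=np.float32)
kf.measurementMatrix = np.hstack([np.eye(3), np.zeros((3, 3))]).astype(np.float32)
kf.processNoiseCov = np.eye(6, dtype=np.float32) * 1e-4      # placeholder tuning
kf.measurementNoiseCov = np.eye(3, dtype=np.float32) * 1e-3  # placeholder tuning
kf.errorCovPost = np.eye(6, dtype=np.float32)                # initial uncertainty

def smooth_tip(tip_cam):
    """Predict, then correct with the latest tip measurement; returns the
    filtered 3D tip position."""
    kf.predict()
    state = kf.correct(tip_cam.astype(np.float32).reshape(3, 1))
    return state[:3].flatten()
```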
Precision & Output:
- All position estimates are converted to millimeters and rendered with sub-centimeter resolution.
- The output includes real-time 3D coordinates of the tool tip, its Euler angles (roll, pitch, yaw), and the pose of each reference marker; a short angle-conversion sketch follows this list.
- The system provides consistent tracking results suitable for surgical training, simulation, or robotic control applications.
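For reference, one common way to derive roll/pitch/yaw from a rotation vector and to report positions in millimeters is sketched below; the ZYX angle convention and the meter-based internal units are assumptions, not confirmed by the description above:

```python
import cv2
import numpy as np

def euler_from_rvec(rvec):
    """Convert a Rodrigues rotation vector to (roll, pitch, yaw) in degrees,
    using a ZYX (yaw-pitch-roll) convention."""
    R, _ = cv2.Rodrigues(rvec)
    sy = np.sqrt(R[0, 0] ** 2 + R[1, 0] ** 2)
    if sy > 1e-6:
        roll = np.arctan2(R[2, 1], R[2, 2])
        pitch = np.arctan2(-R[2, 0], sy)
        yaw = np.arctan2(R[1, 0], R[0, 0])
    else:                                    # near gimbal lock
        roll = np.arctan2(-R[1, 2], R[1, 1])
        pitch = np.arctan2(-R[2, 0], sy)
        yaw = 0.0
    return np.degrees([roll, pitch, yaw])

def to_millimeters(position_m):
    """Scale a position from meters to millimeters for display."""
    return np.asarray(position_m) * 1000.0

# Example overlay of the reported values on the current frame:
# roll, pitch, yaw = euler_from_rvec(rvec)
# x, y, z = to_millimeters(tip_filtered)
# cv2.putText(frame, f"tip [mm]: {x:.1f} {y:.1f} {z:.1f}", (10, 30),
#             cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
```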
A live demo video of the system in action is available here: (insert video link)