3D Surgical Tool Tracking with ArUco Markers and OpenCV

Overview:

This project showcases a real-time 3D tracking system for surgical tools built on computer vision techniques. The system combines a monocular webcam, a set of ArUco markers, and a custom 3D-printed tool mount to estimate the position and orientation of a surgical instrument’s tip with millimeter-level accuracy.

Hardware & Setup:

  • Tool Design: A 3D-printed rigid body (a 10×10×8 cm block) with a truncated-pyramid fixture holding five ArUco markers (IDs 5–9) placed in a known geometric configuration (see the board-definition sketch after this list).

  • Reference Frame: Four table-mounted ArUco markers (IDs 0–3) define a global coordinate system; marker ID 0 serves as the origin.

  • Camera: A standard webcam positioned above the workspace captures the live video feed.
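
To make the marker layout concrete, the sketch below shows how the two marker sets could be described to OpenCV as aruco boards. The dictionary (DICT_4X4_50), the 4 cm marker side length, and the 30 cm table spacing are placeholder assumptions rather than the project's real dimensions, and the snippet assumes OpenCV 4.7 or newer (older builds use aruco.Board_create() instead of the Board constructor).

    import numpy as np
    from cv2 import aruco

    DICT = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)  # dictionary choice is an assumption
    MARKER_LEN = 0.04      # marker side length in meters (placeholder)
    SPACING = 0.30         # distance between table markers in meters (placeholder)

    def flat_marker(cx, cy, size=MARKER_LEN):
        """Corners of a marker lying flat on the table (z = 0), centered at (cx, cy).
        Corner order expected by OpenCV: top-left, top-right, bottom-right, bottom-left."""
        h = size / 2.0
        return np.array([[cx - h, cy + h, 0.0],
                         [cx + h, cy + h, 0.0],
                         [cx + h, cy - h, 0.0],
                         [cx - h, cy - h, 0.0]], dtype=np.float32)

    # Reference board: markers 0-3 at the corners of a square, with marker 0 at the origin.
    ref_corners = np.stack([flat_marker(0.0, 0.0),
                            flat_marker(SPACING, 0.0),
                            flat_marker(SPACING, SPACING),
                            flat_marker(0.0, SPACING)])
    ref_ids = np.array([0, 1, 2, 3], dtype=np.int32)
    ref_board = aruco.Board(ref_corners, DICT, ref_ids)

    # The tool board (IDs 5-9 on the truncated-pyramid faces) is built the same way,
    # with each marker's four corners expressed in the tool's own coordinate frame.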

Technical Features:

The system is developed entirely in Python with OpenCV and includes the following key components (a sketch of how they fit together follows the list):

OpenCV Tools & Techniques Used:

  • cv2.VideoCapture(): Captures the live video stream from the webcam.
  • cv2.cvtColor(): Converts frames to grayscale for ArUco detection.
  • aruco.getPredefinedDictionary() and aruco.detectMarkers(): Detect ArUco markers in each video frame.
  • aruco.estimatePoseSingleMarkers() and cv2.solvePnP(): Compute the pose (rotation and translation vectors) of each marker or of the entire tool using the camera intrinsics.
  • cv2.Rodrigues(): Converts rotation vectors to rotation matrices for 3D transformations.
  • cv2.putText() and cv2.circle(): Render pose information and the tracking trail on screen.
  • cv2.KalmanFilter(): Smooths the 3D position data with a six-state Kalman filter (3D position + velocity).
  • aruco.estimatePoseBoard(): Estimates the full tool’s pose from the multiple rigidly mounted markers.
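
The sketch below shows how these calls could fit together in a single frame-processing loop. The variable names, the calibration file (calibration.npz) and its keys, the dictionary, and the 4 cm marker size are illustrative assumptions rather than the project's actual code, and the detector API shown assumes opencv-contrib-python 4.7 or newer (older versions call aruco.detectMarkers(gray, dictionary) directly).

    import numpy as np
    import cv2
    from cv2 import aruco

    # Load previously saved calibration results (file and key names are assumptions).
    calib = np.load("calibration.npz")
    camera_matrix, dist_coeffs = calib["camera_matrix"], calib["dist_coeffs"]

    dictionary = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)
    detector = aruco.ArucoDetector(dictionary, aruco.DetectorParameters())
    MARKER_LEN = 0.04                          # marker side length in meters (placeholder)

    cap = cv2.VideoCapture(0)                  # live webcam stream
    while True:
        ok, frame = cap.read()
        if not ok:
            break

        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        corners, ids, _ = detector.detectMarkers(gray)

        if ids is not None:
            aruco.drawDetectedMarkers(frame, corners, ids)
            # Legacy per-marker pose helper (contrib module); newer code can use cv2.solvePnP instead.
            rvecs, tvecs, _ = aruco.estimatePoseSingleMarkers(
                corners, MARKER_LEN, camera_matrix, dist_coeffs)
            for rvec, tvec, marker_id in zip(rvecs, tvecs, ids.flatten()):
                R, _ = cv2.Rodrigues(rvec)     # rotation vector -> 3x3 rotation matrix
                cv2.drawFrameAxes(frame, camera_matrix, dist_coeffs, rvec, tvec, 0.03)
                label = f"id {marker_id}: {np.round(tvec.ravel() * 1000, 1)} mm"
                cv2.putText(frame, label, (10, 30 + 20 * int(marker_id)),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

        cv2.imshow("tracking", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()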

Algorithmic Highlights:

  • Camera calibration is performed beforehand, so pose estimation uses real, measured intrinsic and distortion coefficients rather than nominal values.

  • Pose estimation is performed with either solvePnP() (per marker) or estimatePoseBoard() (using the entire tool marker configuration).

  • The tool tip position is calculated by applying the tool’s rigid-body transformation to a predefined offset vector (11 cm below the center of the tool body); see the tip-computation sketch after this list.

  • To keep the axes consistent despite camera scaling or alignment drift, a custom per-axis scale-correction factor is applied before filtering.

  • A Kalman filter is applied to reduce jitter and stabilize the pose output.

  • A visual trail of the tool tip is maintained and displayed to show its motion history.
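
A minimal sketch of the tip computation and filtering steps above, assuming the tool pose (rvec, tvec) has already been estimated with aruco.estimatePoseBoard(). The direction of the 11 cm offset (along the tool’s -Z axis here), the 30 fps frame rate, and the noise covariances are assumptions chosen for illustration.

    import numpy as np
    import cv2

    # Tip offset in the tool's own frame: 11 cm below the body center.
    # The axis along which "below" lies is an assumption (tool -Z here).
    TIP_OFFSET = np.array([[0.0], [0.0], [-0.11]])           # meters

    def tool_tip_camera_frame(rvec, tvec):
        """Map the predefined tip offset into camera coordinates using the tool pose."""
        R, _ = cv2.Rodrigues(rvec)                           # rotation vector -> 3x3 matrix
        return R @ TIP_OFFSET + tvec.reshape(3, 1)           # rigid transformation

    # Six-state Kalman filter: state = [x, y, z, vx, vy, vz], measurement = [x, y, z].
    kf = cv2.KalmanFilter(6, 3)
    dt = 1.0 / 30.0                                          # assumed frame interval
    kf.transitionMatrix = np.eye(6, dtype=np.float32)
    kf.transitionMatrix[:3, 3:] = dt * np.eye(3, dtype=np.float32)   # constant-velocity model
    kf.measurementMatrix = np.hstack([np.eye(3), np.zeros((3, 3))]).astype(np.float32)
    kf.processNoiseCov = 1e-4 * np.eye(6, dtype=np.float32)
    kf.measurementNoiseCov = 1e-2 * np.eye(3, dtype=np.float32)

    def filtered_tip(rvec, tvec):
        """Predict, then correct with the newly measured tip position; returns smoothed x, y, z."""
        tip = tool_tip_camera_frame(rvec, tvec).astype(np.float32)
        kf.predict()
        return kf.correct(tip)[:3].ravel()

    # For the on-screen trail, the smoothed tip can be projected back into the image
    # with cv2.projectPoints() and the last N positions drawn with cv2.circle().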


Precision & Output:

  • All position estimates are converted to millimeters and displayed with sub-centimeter resolution.

  • The output includes the real-time 3D coordinates of the tool tip, its orientation as Euler angles (roll, pitch, yaw), and the pose of each reference marker; see the conversion sketch after this list.

  • The system provides consistent tracking results suitable for surgical training, simulation, or robotic control applications.
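
A sketch of one common way to produce these outputs from an estimated pose. The ZYX (yaw-pitch-roll) Euler convention used below is an assumption, since the write-up does not state which convention the project uses.

    import numpy as np
    import cv2

    def euler_zyx_degrees(rvec):
        """Convert a Rodrigues rotation vector to (roll, pitch, yaw) in degrees,
        using the ZYX (yaw-pitch-roll) convention."""
        R, _ = cv2.Rodrigues(rvec)
        sy = np.sqrt(R[0, 0] ** 2 + R[1, 0] ** 2)
        if sy > 1e-6:                        # regular case
            roll = np.arctan2(R[2, 1], R[2, 2])
            pitch = np.arctan2(-R[2, 0], sy)
            yaw = np.arctan2(R[1, 0], R[0, 0])
        else:                                # gimbal-lock fallback
            roll = np.arctan2(-R[1, 2], R[1, 1])
            pitch = np.arctan2(-R[2, 0], sy)
            yaw = 0.0
        return np.degrees([roll, pitch, yaw])

    def to_millimeters(tvec):
        """Translations come out in the same unit as the marker size (meters here)."""
        return tvec.ravel() * 1000.0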

A live demo video of the system in action is available here: (insert video link)