Sassan Mokhtar

PhD Candidate · Computer Vision Group · University of Bonn

Hi there, I’m Sassan, a PhD candidate in the Computer Vision Group at the University of Bonn, supervised by Prof. Jürgen Gall. My research asks a simple question: can we stop human mistakes before they happen? I build systems that forecast errors from video and deliver real-time auditory feedback.

Previously, I interned at Endress + Hauser, applying large language models to industrial anomaly classification, and at the Robot Learning Lab, University of Freiburg, where I worked on robot manipulation and scene understanding. I come from an applied-mathematics background, holding an M.Sc. in Scientific Computing from Heidelberg University and a B.Sc. in Applied Mathematics from Shiraz University.

View CV

Publications

Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models (CVPRW 2025)

PDF | Code | Poster | Video

Present VELM – a two-stage pipeline pairing an unsupervised vision expert with an LLM head to classify industrial anomaly types
Release MVTec-AC and VisA-AC, upgraded anomaly-classification benchmarks
Achieve 80.4% accuracy on MVTec-AD (+5% vs. prior work) and 84% on MVTec-AC

CenterArt: Joint Shape Reconstruction and 6-DoF Grasp Estimation of Articulated Objects (ICRA Workshop 2024)

PDF | Code | Poster | Video

First method to jointly reconstruct 3-D shape and predict 6-DoF grasps for articulated objects
Create a dataset of valid 6-DoF grasp poses for articulated objects
Generate photo-realistic kitchen scenes with articulated objects

Syn-Mediverse: A Multimodal Synthetic Dataset for Intelligent Scene Understanding of Healthcare Facilities (RA-L 2023)

PDF | Website | Video

First hyper-realistic multimodal synthetic dataset of diverse healthcare facilities
Over 1.5 million annotations spanning five scene-understanding tasks
Public benchmark with online evaluation server

Projects

Policy Learning for Real-time Generative Grasp Synthesis

Slides | Code

Designed a realistic Isaac Sim setup for mobile-manipulation grasping
Compared computer-vision and policy-learning approaches
Developed an interactive imitation-learning model that outperforms baselines

Robot Skill Adaptation via Soft Actor-Critic Gaussian Mixture Models

Poster

Learned a dynamical model with Gaussian mixture models from a few demonstrations
Refined the model using a Soft Actor-Critic policy
Processed visual observations in latent space via an autoencoder

Optimal Importance Sampling Change of Measure for Large Sums of Random Variables