Koorosh's Website

Papers

Minimizing Cover Time in Multi-Agent Variational Option Discovery

Variational option discovery methods in multi-agent reinforcement learning (MARL) are powerful tools for hierarchical control, especially in settings with sparse rewards. However, these methods often struggle with a critical challenge: they tend to learn localized options that explore only a small portion of the state space. This issue stems from the difficulty of encouraging widespread exploration while maximizing the variational lower bound inherent in these frameworks. We address this by proposing the Multi-Agent Variational Covering Option Discovery (MAVCOD) algorithm. Our core contribution is the Connectivity-Aware Replay Buffer Graph (CARBG), a novel and efficient data structure that dynamically tracks approximate bounds on the connectivity of the individual and joint state-transition graphs. By using these connectivity bounds as intrinsic rewards, MAVCOD explicitly guides agents to discover covering options that bridge disparate regions of the state space. We provide theoretical insights into how maximizing our intrinsic rewards minimizes the expected cover time of the state-transition graphs. Empirically, we demonstrate on challenging sparse-reward benchmarks that MAVCOD significantly outperforms a state-of-the-art baseline. Furthermore, state visitation heatmaps visually confirm that our method achieves substantially better exploration.

Hierarchical Planning Multi-Agent Reinforcement Learning
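
To make "cover time" concrete, here is a minimal Python sketch; it is not the MAVCOD or CARBG implementation, whose details are not given here. It builds an undirected state-transition graph from (state, next_state) pairs, as a replay buffer might hold them, and estimates the expected random-walk cover time by Monte Carlo. The helper names `build_graph` and `estimate_cover_time` and the 10-state corridor are hypothetical. The toy shows that a single bridging transition, the kind of link a covering option adds, substantially shortens cover time.

```python
import random
from collections import defaultdict


def build_graph(transitions):
    """Undirected state-transition graph from (state, next_state) pairs."""
    adj = defaultdict(set)
    for s, s_next in transitions:
        if s != s_next:  # ignore self-loops
            adj[s].add(s_next)
            adj[s_next].add(s)
    return {s: sorted(nbrs) for s, nbrs in adj.items()}


def estimate_cover_time(adj, start, trials=2000, seed=0):
    """Monte Carlo estimate of the expected number of uniform random-walk steps
    needed to visit every state at least once. Assumes the graph is connected."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        visited, s, steps = {start}, start, 0
        while len(visited) < len(adj):
            s = rng.choice(adj[s])
            visited.add(s)
            steps += 1
        total += steps
    return total / trials


# Hypothetical 10-state corridor: neighbouring states i and i + 1 are linked.
corridor = [(i, i + 1) for i in range(9)]
# One extra "bridging" transition joins the two ends, turning the chain into a cycle.
bridged = corridor + [(9, 0)]

chain_time = estimate_cover_time(build_graph(corridor), start=0)  # exact value: 81
cycle_time = estimate_cover_time(build_graph(bridged), start=0)   # exact value: 45
print(round(chain_time, 1), round(cycle_time, 1))
```

For a chain of n states started at one end, the exact cover time is (n - 1)^2, and for a cycle it is n(n - 1)/2, so the two estimates should land near 81 and 45.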

Learning Bilateral Team Formation in Cooperative Multi-Agent Reinforcement Learning

RLC 2025 (CoCoMARL Workshop)

Koorosh Moslemi Prof. Chi-Guhn Lee

Team formation and the dynamics of team-based learning have drawn significant interest in the context of Multi-Agent Reinforcement Learning (MARL). However, existing studies primarily focus on unilateral groupings, predefined teams, or fixed-population settings, leaving the effects of algorithmic bilateral grouping choices in dynamic populations underexplored. To address this gap, we introduce a framework for learning two-sided team formation in dynamic multi-agent systems. Through this study, we gain insight into which algorithmic properties of bilateral team formation influence policy performance and generalization. We validate our approach using widely adopted multi-agent scenarios, demonstrating competitive performance and improved generalization in most scenarios.

Team Formation Multi-Agent Reinforcement Learning

Paper

A Machine Learning Enhanced Decomposition Approach to Solving Maximum Clique on Quantum Annealers

Koorosh Moslemi Jerry Sun Zhixiao Xiong

Quantum computing, especially quantum annealing, holds promise for tackling intricate optimization challenges. However, its practical implementation confronts limitations such as restricted hardware connectivity. This report describes our efforts to improve the performance of quantum annealers in solving the maximum clique problem through traditional graph decomposition techniques and machine learning methods. Building on the DBK decomposition algorithm proposed by Pelofske et al. [2021], we propose a new RL-enhanced decomposition step and two learning-assisted vertex selection methods (imitation learning and reinforcement learning). Preliminary experiments on medium-scale synthetic datasets show considerable improvements. All relevant data and code for this research are openly available in our GitHub repository.

GNN MCTS

Code Paper
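
The decomposition idea can be illustrated with a short Python sketch. It relies on the standard identity that a maximum clique either contains a chosen vertex v (and then lies inside v's neighbourhood) or avoids v. It splits recursively until each subgraph has at most `limit` vertices, which stands in for the annealer's capacity. This is not the DBK implementation of Pelofske et al. [2021], which includes further reductions not shown here. The exhaustive `brute_force_max_clique` merely stands in for the quantum annealer, and `highest_degree` is a hypothetical vertex-selection rule marking where learned (imitation- or reinforcement-learning) selectors would plug in.

```python
from itertools import combinations


def brute_force_max_clique(adj, nodes):
    """Stand-in for the quantum annealer: exact search on a small subgraph."""
    nodes = sorted(nodes)
    for size in range(len(nodes), 0, -1):
        for cand in combinations(nodes, size):
            if all(b in adj[a] for a, b in combinations(cand, 2)):
                return set(cand)
    return set()


def highest_degree(adj, nodes):
    """Hypothetical vertex-selection rule; a learned policy would replace this."""
    return max(nodes, key=lambda u: (len(adj[u] & nodes), u))


def decompose_max_clique(adj, nodes, limit, select_vertex=highest_degree):
    """Split until each subproblem has at most `limit` vertices, using
    omega(G) = max(1 + omega(G[N(v)]), omega(G - v)) for any vertex v."""
    nodes = set(nodes)
    if len(nodes) <= limit:
        return brute_force_max_clique(adj, nodes)
    v = select_vertex(adj, nodes)
    # Cliques containing v lie inside v's neighbourhood.
    with_v = {v} | decompose_max_clique(adj, adj[v] & nodes, limit, select_vertex)
    # Cliques avoiding v lie in the graph with v removed.
    without_v = decompose_max_clique(adj, nodes - {v}, limit, select_vertex)
    return with_v if len(with_v) >= len(without_v) else without_v


# Toy graph: a 4-clique {0, 1, 2, 3}, a triangle {4, 5, 6}, and a tail to 7.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (3, 4), (4, 5), (4, 6), (5, 6), (6, 7)]
adj = {u: set() for u in range(8)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

print(sorted(decompose_max_clique(adj, adj, limit=3)))  # [0, 1, 2, 3]
```

Without pruning, the number of subproblems can grow exponentially, which is why the choice of split vertex matters and is a natural target for learning.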

Projects

Advanced Line Follower Robot

An advanced line follower robot that uses a camera and OpenCV for image processing to detect lines and shapes of different colors.

Hobby Robotics C++ OpenCV AVR

Community Detection Using Genetic Algorithm and Cuckoo Search

An AI project on community detection in networks using Genetic Algorithm and Cuckoo Search, including locus-based representation and modularity as a fitness function.

Bsc AI Genetic Algorithm Cuckoo Search Community Detection
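
A minimal Python sketch of the two ingredients named above, under their usual definitions. In the locus-based representation, gene i holding value j links node i to its neighbour j, and the connected components of those links are the communities. Modularity, Q = sum over communities c of (L_c / m - (d_c / 2m)^2), with L_c internal edges, d_c total degree and m edges, scores the resulting partition. The function names and the two-triangle example graph are illustrative assumptions, and the genetic operators and the Cuckoo Search step are not shown.

```python
def decode_locus(genotype):
    """Gene i holding value j links nodes i and j; connected components of
    these links are the communities. Returns one community label per node."""
    parent = list(range(len(genotype)))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for i, j in enumerate(genotype):
        parent[find(i)] = find(j)
    return [find(i) for i in range(len(genotype))]


def modularity(edges, labels):
    """Q = sum_c [L_c / m - (d_c / 2m)^2] for an undirected, unweighted graph."""
    m = len(edges)
    internal, degree = {}, {}
    for a, b in edges:
        for c in (labels[a], labels[b]):
            degree[c] = degree.get(c, 0) + 1
        if labels[a] == labels[b]:
            internal[labels[a]] = internal.get(labels[a], 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = decode_locus([1, 2, 0, 4, 5, 3])   # each gene points to a neighbour
merged = decode_locus([1, 2, 3, 4, 5, 3])  # gene 2 -> 3 joins the triangles
print(modularity(edges, split), modularity(edges, merged))  # about 0.357 and 0.0
```

Splitting the two triangles gives Q = 5/14, while the genotype that merges everything gives Q = 0, so a search that maximizes modularity prefers the first. A convenient property of the locus-based encoding is that any genotype whose genes point to real neighbours decodes to a valid partition, and the number of communities never has to be fixed in advance.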

Detecting Iranian Paper Money to Help People With Visual Disabilities

A deep learning project to detect Iranian paper money for visually impaired individuals, using TensorFlow Lite and transfer learning on MobileNet.

Hobby Deep Learning Machine Learning TensorFlow Lite MobileNet Android

Designing an ETL Pipeline for a Data Warehouse

A three-phase project for a database course, including database design, ETL pipeline creation, and a ’time machine’ feature for database restoration.

Bsc ETL PostgreSQL Python Database Data Warehouse
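
The description does not say how the 'time machine' was built. One common way to support point-in-time restoration is to keep every row version together with a validity interval. The sketch below does this in Python with the standard-library `sqlite3` module, so it runs without a PostgreSQL server. The `product_history` table, its columns, and the integer timestamps are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_history (
    product_id INTEGER NOT NULL,
    price      REAL    NOT NULL,
    valid_from INTEGER NOT NULL,  -- time this version became current
    valid_to   INTEGER            -- NULL while the version is still current
);
""")


def upsert(conn, product_id, price, ts):
    """Close the current version (if any) and append a new one."""
    conn.execute(
        "UPDATE product_history SET valid_to = ? "
        "WHERE product_id = ? AND valid_to IS NULL",
        (ts, product_id),
    )
    conn.execute(
        "INSERT INTO product_history (product_id, price, valid_from, valid_to) "
        "VALUES (?, ?, ?, NULL)",
        (product_id, price, ts),
    )


def as_of(conn, ts):
    """Reconstruct the table exactly as it looked at time `ts`."""
    rows = conn.execute(
        "SELECT product_id, price FROM product_history "
        "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY product_id",
        (ts, ts),
    )
    return rows.fetchall()


upsert(conn, 1, 9.99, ts=100)
upsert(conn, 2, 4.50, ts=100)
upsert(conn, 1, 11.49, ts=200)  # price change for product 1

print(as_of(conn, 150))  # [(1, 9.99), (2, 4.5)]
print(as_of(conn, 250))  # [(1, 11.49), (2, 4.5)]
```

Because old versions are closed rather than overwritten, restoring to any past moment is a read-only query, at the cost of a history table that grows with every change.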

Solving a Real Examination Timetabling Problem

A combinatorial optimization project to schedule mid-term exams for the MCS department at Amirkabir University of Technology, using mathematical modeling and GAMS.

Bsc Combinatorial Optimization Mathematical Modeling GAMS MINLP
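
The original model was written in GAMS as an MINLP, and its exact constraints are not described here. As a hedged illustration of the kind of model involved, this Python sketch enumerates every assignment of exams to slots for a tiny hypothetical instance. It treats "no student sits two exams in the same slot" as a hard constraint and minimizes back-to-back exams as a soft objective. Exhaustive search only works at toy sizes; a real department-wide instance needs a proper solver.

```python
from itertools import product

# Hypothetical toy instance: the exams each student is enrolled in.
enrolment = {
    "s1": {"Calculus", "Algebra"},
    "s2": {"Algebra", "Programming"},
    "s3": {"Calculus", "Statistics"},
    "s4": {"Programming", "Statistics"},
}
exams = sorted(set().union(*enrolment.values()))
slots = range(3)  # three consecutive exam days


def cost(assign):
    """None if a student has two exams in one slot (hard constraint);
    otherwise the number of back-to-back exam pairs (soft objective)."""
    penalty = 0
    for taken in enrolment.values():
        days = sorted(assign[e] for e in taken)
        if len(set(days)) < len(days):
            return None
        penalty += sum(1 for a, b in zip(days, days[1:]) if b - a == 1)
    return penalty


best = None
for choice in product(slots, repeat=len(exams)):
    assign = dict(zip(exams, choice))
    c = cost(assign)
    if c is not None and (best is None or c < best[0]):
        best = (c, assign)
print(best)
```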

Playing Lunar Lander with Expected Sarsa and a Neural Network

An improvement on a reinforcement learning project, using Expected Sarsa with a neural network to play the Lunar Lander game, trained with Keras and TensorFlow.

Bsc Reinforcement Learning TensorFlow Keras Neural Networks
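
At the core of Expected Sarsa is its update target, which averages the next state's action values under the current policy instead of sampling one action: target = r + γ Σ_a π(a|s') Q(s', a). A minimal Python sketch of that target for an ε-greedy policy follows. In a neural-network agent like the one described, the network's Q-value for the action taken would be regressed toward this number. The function names, γ = 0.99, ε = 0.1, and the four example Q-values (Lunar Lander has four discrete actions) are assumptions.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """pi(a|s) for an epsilon-greedy policy (ties broken toward the lowest index)."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def expected_sarsa_target(reward, next_q, gamma, epsilon, done):
    """r + gamma * sum_a pi(a|s') Q(s', a); no bootstrapping from terminal states."""
    if done:
        return reward
    probs = epsilon_greedy_probs(next_q, epsilon)
    return reward + gamma * sum(p * q for p, q in zip(probs, next_q))


# Illustrative numbers: Q(s', .) for Lunar Lander's four discrete actions.
next_q = [1.0, 2.0, 0.5, 0.0]
target = expected_sarsa_target(-0.3, next_q, gamma=0.99, epsilon=0.1, done=False)
print(round(target, 4))  # 1.5686
```

Here the policy puts probability 0.925 on the greedy action and 0.025 on each other action, so the expected next value is 1.8875 and the target is -0.3 + 0.99 × 1.8875. Averaging over actions removes the sampling noise of plain Sarsa's next action.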

Designing a Shooting Simulator Game

A shooting simulator game created for a school fair that uses a laser-equipped toy gun and an IP camera for scoring.

Hobby Computer Vision C++ OpenCV Game Development
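
The scoring loop can be sketched in two steps: locate the laser dot as the brightest pixel above a threshold, then convert its distance from the target centre into a ring score. This pure-Python sketch treats a frame as a 2-D list of red-channel intensities instead of an OpenCV image. The threshold, ring width, and 10-point scale are hypothetical; the actual game was written in C++ with OpenCV.

```python
import math


def find_laser_spot(frame, threshold=240):
    """(row, col) of the brightest pixel with intensity >= threshold, else None."""
    best, best_val = None, threshold - 1
    for r, row in enumerate(frame):
        for c, val in enumerate(row):
            if val > best_val:
                best, best_val = (r, c), val
    return best


def score_hit(spot, center, ring_width=10, max_score=10):
    """Concentric rings: max_score in the bullseye, one point less per ring outward."""
    if spot is None:
        return 0
    ring = int(math.dist(spot, center) // ring_width)
    return max(0, max_score - ring)


# Synthetic 50x50 red-channel frame with one bright laser dot.
frame = [[0] * 50 for _ in range(50)]
frame[27][31] = 255
spot = find_laser_spot(frame)
print(spot, score_hit(spot, center=(25, 25)))  # (27, 31) 10
```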

A Simple Line Follower Robot

A line follower robot developed during junior high school using an AVR microcontroller, designed with Proteus, and programmed with Bascom.

Hobby Robotics AVR Proteus Bascom

Planning and Control of a Quadcopter

This project implements an enhanced 3D RRT* (Rapidly-exploring Random Tree Star) algorithm to plan drone trajectories through a sequence of gates while avoiding obstacles. The planner is adapted for realistic racing conditions, supporting both take-off and landing phases and fine-grained path smoothing for controller execution. Key innovations include sampling biases, strict collision checking during rewiring, and customized handling of drone kinematics (speed profiles, banking angles).

PhD Robotics RRT* Crazyflie
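
For reference, here is a compact Python sketch of plain 3D RRT* with goal-biased sampling and collision checks on both new edges and rewired edges. It assumes spherical obstacles and a fixed neighbourhood radius, and it models no gates, drone kinematics, or smoothing. It therefore illustrates the base algorithm rather than the enhanced planner described above, and all names and parameter values are hypothetical.

```python
import math
import random


def segment_clear(p, q, obstacles):
    """True if segment p-q stays outside every sphere given as (center, radius)."""
    d = [b - a for a, b in zip(p, q)]
    dd = sum(x * x for x in d)
    for center, radius in obstacles:
        # Clamp the projection of the centre onto the segment to [0, 1].
        t = 0.0 if dd == 0 else sum((c - a) * x for c, a, x in zip(center, p, d)) / dd
        t = max(0.0, min(1.0, t))
        closest = [a + t * x for a, x in zip(p, d)]
        if math.dist(closest, center) <= radius:
            return False
    return True


def steer(p, q, step):
    """Move from p toward q by at most `step`."""
    dist = math.dist(p, q)
    if dist <= step:
        return tuple(q)
    return tuple(a + (b - a) * step / dist for a, b in zip(p, q))


def rrt_star(start, goal, obstacles, bounds, iters=1000, step=1.0,
             radius=2.5, goal_bias=0.1, goal_tol=1.0, seed=0):
    """Plain RRT* in 3-D; `radius` must be at least `step`."""
    rng = random.Random(seed)
    nodes, parent = [tuple(start)], [None]

    def cost(i):
        # Path length back to the root; recomputed so rewiring never leaves stale values.
        total = 0.0
        while parent[i] is not None:
            total += math.dist(nodes[i], nodes[parent[i]])
            i = parent[i]
        return total

    for _ in range(iters):
        # Goal-biased sampling.
        if rng.random() < goal_bias:
            sample = tuple(goal)
        else:
            sample = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        nearest = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        new = steer(nodes[nearest], sample, step)
        if new == nodes[nearest] or not segment_clear(nodes[nearest], new, obstacles):
            continue
        # Neighbours with a collision-free edge to the new node (reused for rewiring).
        near = [i for i in range(len(nodes))
                if math.dist(nodes[i], new) <= radius
                and segment_clear(nodes[i], new, obstacles)]
        best = min(near, key=lambda i: cost(i) + math.dist(nodes[i], new))
        nodes.append(new)
        parent.append(best)
        k = len(nodes) - 1
        # Rewire: route neighbours through the new node when that shortens their path.
        new_cost = cost(k)
        for i in near:
            if new_cost + math.dist(new, nodes[i]) < cost(i):
                parent[i] = k

    reached = [i for i in range(len(nodes)) if math.dist(nodes[i], goal) <= goal_tol]
    if not reached:
        return None
    i, path = min(reached, key=cost), []
    while i is not None:
        path.append(nodes[i])
        i = parent[i]
    return path[::-1]


obstacles = [((5.0, 5.0, 5.0), 2.0)]  # one sphere blocking the straight line
path = rrt_star((0.5, 0.5, 0.5), (9.5, 9.5, 9.5), obstacles, bounds=[(0.0, 10.0)] * 3)
print(len(path), "waypoints")
```

Recomputing costs by walking parent pointers is slower than caching them, but it keeps every cost correct after rewiring without having to propagate updates to descendants.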

Mobile Manipulation Capstone Project

This repository contains the capstone project for the Coursera Modern Robotics Specialization (Course 6). The project implements autonomous mobile manipulation for a KUKA youBot to pick and place objects using trajectory planning, odometry, and feedback control.

Bsc CoppeliaSim Mobile Manipulation PI Controller Forward Kinematics

Extended Kalman Filter (EKF) for Mobile Robot

This repository contains a reference implementation of an Extended Kalman Filter (EKF) for planar mobile robot state estimation. The EKF estimates the robot pose (x, y, θ) using odometry/control inputs and range–bearing measurements to known landmarks.

PhD Kalman Filter State Estimation
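
The two EKF steps can be sketched in plain Python. The prediction step uses a unicycle motion model driven by speed v and turn rate ω. The update step uses one range-bearing measurement to a known landmark, with h(x) = [sqrt(dx² + dy²), atan2(dy, dx) - θ], where (dx, dy) is the landmark offset from the robot. The motion model, noise matrices, and helper names are assumptions for illustration and may differ from the repository's implementation.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def wrap(angle):
    """Wrap an angle into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def ekf_predict(x, P, v, w, dt, Q):
    """Propagate pose (x, y, theta) and covariance with a unicycle model."""
    px, py, th = x
    x_new = [px + v * dt * math.cos(th), py + v * dt * math.sin(th), wrap(th + w * dt)]
    F = [[1.0, 0.0, -v * dt * math.sin(th)],   # Jacobian of the motion model
         [0.0, 1.0, v * dt * math.cos(th)],
         [0.0, 0.0, 1.0]]
    return x_new, add(matmul(matmul(F, P), transpose(F)), Q)


def ekf_update(x, P, z, landmark, R):
    """Correct the pose with a range-bearing measurement z = [r, phi]."""
    dx, dy = landmark[0] - x[0], landmark[1] - x[1]
    q = dx * dx + dy * dy
    r = math.sqrt(q)
    z_hat = [r, wrap(math.atan2(dy, dx) - x[2])]
    H = [[-dx / r, -dy / r, 0.0],               # Jacobian of h(x)
         [dy / q, -dx / q, -1.0]]
    innov = [z[0] - z_hat[0], wrap(z[1] - z_hat[1])]
    Ht = transpose(H)
    S = add(matmul(matmul(H, P), Ht), R)
    K = matmul(matmul(P, Ht), inv2(S))          # Kalman gain, 3x2
    x_new = [xi + sum(k * e for k, e in zip(row, innov)) for xi, row in zip(x, K)]
    x_new[2] = wrap(x_new[2])
    KH = matmul(K, H)
    I_KH = [[(1.0 if i == j else 0.0) - KH[i][j] for j in range(3)] for i in range(3)]
    return x_new, matmul(I_KH, P)


P0 = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.05]]
Q = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.005]]
R = [[0.04, 0.0], [0.0, 0.01]]   # range and bearing noise variances
x1, P1 = ekf_predict([0.0, 0.0, 0.0], P0, v=1.0, w=0.0, dt=1.0, Q=Q)
z = [5.0, math.atan2(4.0, 3.0)]  # measurement of a landmark at (4, 4)
x2, P2 = ekf_update(x1, P1, z, landmark=(4.0, 4.0), R=R)
print(x1, x2)
```

In this example the measurement matches the prediction exactly, so the mean barely moves while the covariance still shrinks; with real data the innovation would pull the pose toward the landmark-consistent estimate.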

Sparse Batch Optimization for 6-DOF Pose Estimation with IMU-Camera Fusion

The goal of this project is to estimate the full 3D position and orientation trajectory of a sensor head as it moves through space. The solution fuses measurements from two different sensors, a stereo camera and an Inertial Measurement Unit (IMU), using the Gauss-Newton method.

PhD Sensor Fusion Gauss-Newton State Estimation
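
Gauss-Newton repeatedly linearizes the residuals r(x) and solves the normal equations JᵀJ Δx = -Jᵀr. The sketch below applies that loop to a deliberately small stand-in problem: locating a 2-D point from range measurements to known beacons. The real project optimizes a 6-DOF trajectory with stereo and IMU residuals and exploits sparsity, none of which is modelled here. The beacon positions and ranges are made up.

```python
import math


def gauss_newton_position(beacons, ranges, x0, iters=10):
    """Minimise sum_i (||x - b_i|| - d_i)^2 over a 2-D position x."""
    x, y = x0
    for _ in range(iters):
        # Accumulate the normal equations (J^T J) delta = -J^T r.
        a11 = a12 = a22 = g1 = g2 = 0.0
        for (bx, by), d in zip(beacons, ranges):
            dist = math.hypot(x - bx, y - by)
            r = dist - d                                  # residual
            jx, jy = (x - bx) / dist, (y - by) / dist     # Jacobian row
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            g1 += jx * r
            g2 += jy * r
        # Solve the 2x2 system in closed form.
        det = a11 * a22 - a12 * a12
        dx = -(a22 * g1 - a12 * g2) / det
        dy = -(-a12 * g1 + a11 * g2) / det
        x, y = x + dx, y + dy
    return x, y


beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = (3.0, 4.0)
ranges = [math.dist(true_pos, b) for b in beacons]  # noise-free for clarity
est = gauss_newton_position(beacons, ranges, x0=(5.0, 5.0))
print(est)
```

In a batch trajectory problem, the same update is applied to all poses at once. The Jacobian is then large but sparse, because each measurement touches only one or two poses, which is what makes sparse solvers worthwhile.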

Multi-Drone Pursuit-Evasion

This project introduces a comprehensive framework for training autonomous drone swarms in pursuit-evasion tasks using multi-agent reinforcement learning. The work’s central innovation is a ‘progressive scenario architecture’, a six-tier system designed to systematically increase coordination complexity. This tiered approach guides agents from basic trajectory-following toward static goals to the cooperative capture of intelligent, evasive targets. By bridging theoretical coordination strategies with realistic quadrotor physics and aerodynamics, this progressive system demonstrates that agents can learn sophisticated, cooperative behaviors in complex, multi-constraint environments.

PhD MARL Pursuit-Evasion Environment Design

Education

University of Toronto

2023-2028

Ph.D. in Reinforcement Learning and Robotics

CGPA: 3.91 out of 4

Extracurricular Activities:

University of Toronto Formula Racing (UTFR) - Working on Model Predictive Contouring Control (MPCC) in Driverless Vehicle (DV) division.
University of Toronto Aerospace Team (UTAT) - Working on planning and control for a hybrid V/STOL aircraft in Unmanned Aerial Systems (UAS) division.

Thesis:

Test-Time Adaptive Team-Aware Hierarchical Planning in Cooperative Multi-Agent Reinforcement Learning

Supervisor:

Prof. Chi-Guhn Lee

Amirkabir University of Technology

2019-2023

B.Sc. in Computer Science

CGPA: 19.57 out of 20

Taken Courses:

Course Name	Maximum Grade	Obtained Grade
Game Theory	20	20
Graph Machine Learning	20	20
Data Structures and Algorithms	20	20
Design and Analysis of Algorithms	20	18.5
Numerical Linear Algebra	20	20
Statistical Computing	20	20
Linear Optimization	20	20
Combinatorial Optimization	20	20
Data Mining	20	20
Artificial Intelligence	20	20

Accomplishments:

Ranked 1st in the class of 2023 among 82 students.

National Organization for Development of Exceptional Talents (NODET) School

2015-2019

Higher Secondary School Certificate

CGPA: 19.97 out of 20

Extracurricular Activities:

Junior Soccer League robotics team at the Iran Open International RoboCup Competitions: developed a real-time ball-tracking algorithm with OpenCV on a Raspberry Pi, and designed a 3D-printed hyperbolic mirror for a 360-degree camera view using SolidWorks.
Competitive Programming.

Accomplishments:

Ranked 434 out of 164,278 students (top 0.3%) in the Mathematics field in the Iranian National University Entrance Exam.

Patent:

No. 13965014000301156 - “Circular Ruler for Measuring Angles, Drawing Circles, and Linear Measurement with Integrated Set Square Functionality”

Hi, I am Koorosh

Koorosh Moslemi

PhD Student at University of Toronto
