Photo of Sabariswaran Mani

About Me

I'm Sabariswaran Mani (Sabarish), a fourth-year undergraduate at the Indian Institute of Technology, Kharagpur. My interests lie in computer vision and robotics, with a focus on autonomous ground vehicles and generative imagery. I'm passionate about bringing these technologies to real-world applications. Beyond research, I have a long-standing love for travel—scroll down to see some of my adventures!

Currently, I am a researcher at the Vision & AI Lab, IISc Bangalore, under the guidance of Prof. Venkatesh Babu, where I explore diffusion models and their applications. I'm also a member of the Autonomous Ground Vehicles Research Group and Quant Club at IIT KGP. Beyond research, I enjoy football, computer games, and a good plate of parotta.

My Research

PreciseControl: Enhancing Text-to-Image Diffusion Models with Fine-Grained Attribute Control
Rishubh Parihar, Sachidanand V S, Sabariswaran Mani, Tejan Karmali, R. Venkatesh Babu
ECCV 2024 (European Conference on Computer Vision)

website | paper | abstract | bibtex
@inproceedings{rishubh2024precisecontrol,
      title={PreciseControl: Enhancing Text-to-Image Diffusion Models with Fine-Grained Attribute Control},
      author={Rishubh Parihar and Sachidanand V S and Sabariswaran Mani and Tejan Karmali and R. Venkatesh Babu},
      booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
      year={2024},
}
              

Recently, we have seen a surge of personalization methods for text-to-image (T2I) diffusion models to learn a concept using a few images. Existing approaches, when used for face personalization, struggle to achieve convincing inversion with identity preservation and rely on semantic text-based editing of the generated face. However, more fine-grained control is desired for facial attribute editing, which is challenging to achieve solely with text prompts. In contrast, StyleGAN models learn a rich face prior and enable smooth, fine-grained attribute editing by latent manipulation. This work uses the disentangled W+ space of StyleGANs to condition the T2I model. This approach allows us to precisely manipulate facial attributes, such as smoothly introducing a smile, while preserving the existing coarse text-based control inherent in T2I models. To enable conditioning of the T2I model on the W+ space, we train a latent mapper to translate latent codes from W+ to the token embedding space of the T2I model. The proposed approach excels in the precise inversion of face images with attribute preservation and facilitates continuous control for fine-grained attribute editing. Furthermore, our approach can be readily extended to generate compositions involving multiple individuals. We perform extensive experiments to validate our method for face personalization and fine-grained attribute editing.
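The key mechanism above, a learned mapper from StyleGAN's W+ space into the T2I token embedding space, can be sketched as follows. All dimensions and the two-layer MLP are illustrative assumptions, not the paper's actual architecture; a trained network replaces the random weights.

```python
import numpy as np

# Assumed dimensions: StyleGAN W+ is 18 layers x 512 dims;
# the T2I token embedding space is taken to be 768-dimensional.
W_PLUS_LAYERS, W_DIM, TOKEN_DIM = 18, 512, 768

rng = np.random.default_rng(0)

# Minimal two-layer MLP as a stand-in for the learned latent mapper.
W1 = rng.standard_normal((W_PLUS_LAYERS * W_DIM, 1024)) * 0.01
W2 = rng.standard_normal((1024, TOKEN_DIM)) * 0.01

def map_wplus_to_token(w_plus: np.ndarray) -> np.ndarray:
    """Flatten a W+ code and project it into the token embedding space."""
    h = np.maximum(w_plus.reshape(-1) @ W1, 0.0)  # ReLU hidden layer
    return h @ W2

w_plus = rng.standard_normal((W_PLUS_LAYERS, W_DIM))
token_embedding = map_wplus_to_token(w_plus)  # shape (768,)

# Fine-grained editing: nudge W+ along a (hypothetical) "smile"
# direction, then re-map; the T2I prompt stays unchanged.
smile_dir = rng.standard_normal((W_PLUS_LAYERS, W_DIM))
smile_dir /= np.linalg.norm(smile_dir)
edited_embedding = map_wplus_to_token(w_plus + 0.5 * smile_dir)
```

Because the edit happens in W+ before mapping, the attribute strength can be varied continuously via the scalar step size.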

DiffClone: Enhanced Behaviour Cloning in Robotics with Diffusion-Driven Policy Learning
Sabariswaran Mani, Sreyas Venkataraman*, Abhranil Chandra*, Adyan Rizvi*, Yash Sirvi*, Soumojit Bhattacharya*, Aritra Hazra
NeurIPS 2023 Workshop (Train Offline Test Online (TOTO) Workshop, Conference on Neural Information Processing Systems)

website | paper | abstract | bibtex
@article{mani2024diffclone,
      title={DiffClone: Enhanced Behaviour Cloning in Robotics with Diffusion-Driven Policy Learning},
      author={Sabariswaran Mani and Abhranil Chandra and Sreyas Venkataraman and Adyan Rizvi and Yash Sirvi and Soumojit Bhattacharya and Aritra Hazra},
      journal={arXiv preprint arXiv:2401.09243},
      year={2024}
}
              

Robot learning tasks are extremely compute-intensive and hardware-specific. Training robot manipulation agents on a diverse dataset of offline demonstrations is therefore a very appealing way to tackle these challenges. The Train-Offline-Test-Online (TOTO) Benchmark provides a well-curated open-source dataset for offline training, comprised mostly of expert data, along with benchmark scores for common offline-RL and behaviour-cloning agents. In this paper, we introduce DiffClone, an offline algorithm that enhances behaviour cloning with diffusion-based policy learning, and we measure the efficacy of our method on real online physical robots at test time. This is also our official submission to the Train-Offline-Test-Online (TOTO) Benchmark Challenge organized at NeurIPS 2023. We experimented with both pre-trained visual representations and agent policies. In our experiments, we find that a MoCo-finetuned ResNet50 performs best in comparison to other finetuned representations. Goal-state conditioning and mapping to transitions resulted in a minute increase in success rate and mean reward. As for the agent policy, we developed DiffClone, a behaviour-cloning agent improved using conditional diffusion.
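The diffusion-based policy at the heart of the method samples an action by iteratively denoising Gaussian noise, conditioned on the current observation. The sketch below shows standard DDPM ancestral sampling with a toy noise predictor; the step count, action dimension, and the linear stand-in for the learned network are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
ACTION_DIM, T = 7, 50  # e.g. a 7-DoF arm; 50 diffusion steps (assumed)

# Standard DDPM linear beta schedule
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(a_t, t, obs):
    """Toy stand-in for the learned conditional noise predictor
    eps_theta(a_t, t, obs); in DiffClone a trained network goes here."""
    return 0.1 * a_t + 0.01 * obs[:ACTION_DIM]

def sample_action(obs):
    """DDPM ancestral sampling of an action conditioned on an observation."""
    a = rng.standard_normal(ACTION_DIM)  # start from pure noise
    for t in reversed(range(T)):
        eps = eps_model(a, t, obs)
        # Posterior mean step of the reverse process
        a = (a - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # add noise on all but the final step
            a += np.sqrt(betas[t]) * rng.standard_normal(ACTION_DIM)
    return a

obs = rng.standard_normal(64)  # e.g. a visual feature from the MoCo ResNet50
action = sample_action(obs)    # shape (ACTION_DIM,)
```

Conditioning the noise predictor on the observation (and optionally a goal state) is what turns plain DDPM sampling into a cloned visuomotor policy.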

Experience

Research Intern

Vision & AI Lab, IISc Bangalore - Prof. Venkatesh Babu

  • Editing, non-destructive personalization, and multi-person compositional generation in text-to-image diffusion models.
  • Long-range Video Generation with Motion Diffusion Models.

AI Project Intern

Fractal Analytics

  • Multimodal (VLM) agents for navigation tasks in real GUI environments.
  • Integrated the T5-XL language model into Kalaido, Fractal's AI image generator, for enhanced semantic alignment.

Undergraduate Researcher

Autonomous Ground Vehicles Research Group (AGV) - Prof. Debashis Chakravarty

  • Building our own self-driving car.
  • Competed in the Machine Learning Reproducibility Challenge and F1-Tenth.
  • Built a vision pipeline covering lane detection, object detection, drivable-area detection, and semantic segmentation.

Executive Head & Senior Quant Researcher

Quant Club, IIT Kharagpur

  • Machine-learning-based algorithmic trading alphas.
  • Genetic algorithms for portfolio optimization and rule-based strategy optimization.
  • Organized Summer of Quant 2024, a free course on quantitative finance that attracted over 3,000 participants nationwide.

Junior Developer

Computer Graphics Lab, IIT Kharagpur

  • Applications combining augmented reality and computer vision.
  • 3D animation, rigging, and game development.
  • Organized workshops introducing these topics at IIT Kharagpur.

Competitive Achievements

Gold

Train Offline Test Online Workshop Challenge - NeurIPS 2023

Topped the leaderboard with a mean reward of 12 and a 62% success rate on the online Franka Panda, and a mean reward of 52 and 91% success rate in the MuJoCo simulator, on the pouring task, outperforming the closest competitor (BC + MoCo).

Proposed DiffClone for offline behavioural cloning, using conditional DDPMs for offline visual policy learning.

Gold

Adobe Behaviour Simulation Challenge

Inter IIT Techmeet 12.0 - Madras

Developed a method to predict tweet popularity from metadata and text analysis using a non-contextual approach.

Fine-tuned mPLUG-Owl2 and Llama-2 with policy-based reinforcement learning for bandit-informed, personalized tweet generation, maximizing engagement metrics such as likes through routed LLMs.

Gold

GrowSimplee's Warehouse Automation Challenge

Developed a prototype industrial-grade dimensioning system using live 3D point-cloud data from a depth camera.

Reduced measurement error to just 0.2% across all three dimensions for arbitrarily shaped objects moving on a conveyor-belt system.
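The core measurement behind such a system can be sketched as follows: extract an object's extents from its point cloud. This is a toy axis-aligned version with synthetic data; the real pipeline must also segment the object from the belt, fuse depth frames, and handle rotated objects.

```python
import numpy as np

def dimensions_from_point_cloud(points: np.ndarray) -> np.ndarray:
    """Axis-aligned extents (length, width, height) of an object's
    point cloud, given as an (N, 3) array of XYZ coordinates."""
    return points.max(axis=0) - points.min(axis=0)

# Synthetic stand-in for a depth-camera capture: a 10 x 20 x 5 box
# densely sampled as a point cloud (units arbitrary, e.g. cm).
rng = np.random.default_rng(2)
pts = rng.uniform(0.0, 1.0, size=(5000, 3)) * np.array([10.0, 20.0, 5.0])

dims = dimensions_from_point_cloud(pts)  # approximately [10, 20, 5]
```

For rotated objects one would first fit a minimum oriented bounding box (e.g. via PCA of the points) before taking the extents.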

Travel

Paris, France

A beautiful trip exploring the streets of Paris, including the Eiffel Tower and Louvre Museum.

Tokyo, Japan

Visiting vibrant Tokyo with stops at Shibuya, Tokyo Tower, and the serene Meiji Shrine.

New York, USA

Exploring the Big Apple with visits to Times Square, Central Park, and the Statue of Liberty.