Saisamarth Rajesh (Sai) Phaye | Lead Audio AI/ML Engineer

Hello!

I'm Saisamarth Rajesh (Sai) Phaye. I build AI for real-time audio, speech, and sound understanding, bringing intelligence and clarity to sound.


Background

I'm a Lead Audio AI/ML Engineer at Logitech, where I work on Echo Cancellation for our video-conferencing devices and manage the audio pipeline for an upcoming flagship product. I’ve also co-led the development of a Speech Enhancement algorithm for real-time denoising and dereverberation, and our training approach was recently published at Interspeech 2025.

Previously, I co-founded Echolair, a music AI startup that transforms samples into unique variations, and served as Founder-in-Residence at The Sound of AI Accelerator. I also co-developed the first AI-powered Acoustic Echo Cancellation system at Zoom, successfully deployed in Zoom Rooms, and built NLP models for recommendation systems at Shopee serving eight countries.

I graduated from the Indian Institute of Technology Ropar in 2018 and conducted graduate research at the National University of Singapore on Computational Sound Scene Analysis, publishing in IEEE ICASSP 2019.

When I'm not at my desk, I enjoy composing music and creating original pieces such as Child's Play. I'm also a former member of the NUS Guitar Ensemble. Outside of music, I’ve traveled to over 23 countries, and I like to play badminton, run, and climb rocks.

Skills
Languages
  • Python
  • Golang
  • JavaScript
  • Java
  • C
  • C++
  • SQL
  • MATLAB
Frameworks
  • PyTorch
  • Keras
  • TensorFlow
  • scikit-learn
  • OpenCV
  • ONNX
  • Librosa
  • NumPy/Pandas
Specializations
  • Speech Enhancement
  • Acoustic Echo Cancellation
  • Generative Speech Synthesis
  • Sound Scene Analysis
  • Music Information Retrieval
  • Face Analytics
  • Multimedia Processing
  • Anomaly Detection
Experience
Since Jan 2024
Lead Audio AI/ML Engineer
April 2023 - Jan 2024
Co-Founder & CTO
April 2023 - July 2023
Founder-in-Residence
July 2023 - Dec 2023
Audio AI Engineer (Founding Team)
April 2023 - June 2023
AI Consultant
Oct 2021 - June 2023
Audio AI Engineer
July 2020 - Sept 2021
Machine Learning Research Engineer
Machine Learning Research Engineer
Graduate Researcher
B.Tech. in Computer Science
Research Work

Introduced a one-stage, step-invariant flow-matching model for speech enhancement (SFMSE) that enables high-quality denoising in a single step while matching the perceptual performance of diffusion-based baselines that require ~60 neural function evaluations.

Diffusion, Flow Matching, Speech Signal Processing

Proposed a novel paradigm that uses a model's own encoder as the loss function for speech enhancement, improving performance over traditional handcrafted or deep-feature losses.

Deep Learning, Speech Enhancement

Created a metric that evaluates the statistical similarity between two data sources (for example, "does your training data match the deployment conditions?").

Python, PyTorch, Deep Learning, Audio Signal Processing

Built a two-stage system for claim verification that improves evidence retrieval using enriched question generation, achieving strong results on the AVeriTeC benchmark.

PyTorch, Large Language Models, Prompt Engineering

End-to-end unsupervised anomaly detection system for CCTV factory videos at Panasonic. Novel ML algorithm for real-time detection of unauthorized access and machine anomalies.

Machine Learning, Edge Computing, Computer Vision, Anomaly Detection

Novel deep learning architecture for acoustic scene classification. Leverages band-wise temporal information, achieving a 14% relative improvement over the DCASE 2018 baseline.
👨🏻‍💻 GitHub Source Code 👨🏻‍💻

TensorFlow, PyTorch, Sound Scene Analysis

Top-10 team in the REFUGE Challenge for glaucoma assessment from fundus photographs. Developed a two-level model for optic disc localization and cup/disc segmentation.

TensorFlow, Medical Imaging, Computer Vision

Enhanced Capsule Networks with multiple capsule levels and DenseNet integration. Published in ACCV 2018 and WiML NeurIPS 2019. Achieved state-of-the-art on MNIST with 20-fold reduction in training iterations.
👨🏻‍💻 GitHub Source Code 👨🏻‍💻

TensorFlow, Deep Learning, Capsule Networks

Multi-modal system using SVMs for audio and an MLP for image processing to create synchronized multi-instrument videos, deployed as an Android application.

Machine Learning, Multimedia Processing, Audio Processing