Engineering

AI/ML Engineer

Full-time | Hybrid - Chennai/Madurai, TN | Exp. 3-4 Years


AI/ML Engineer - Multimodal Systems & Model Training

Skills Required

Python, PyTorch, TensorFlow, Keras, Linear Regression, NLP, NumPy, SciPy, Pandas, Matplotlib, Seaborn, Docker, Git, GitHub, Vectors, Agentic Frameworks, VLMs, RAG, LLMs

Role Summary


We seek an AI/ML Engineer to design, train, and optimize the multimodal AI systems powering Vizhi's intelligent coaching. You will focus on:

  • Vision-Language Models (VLMs): Adapting models (LLaVA, BLIP-2, Qwen-VL) to understand human movement and generate personalized coaching

  • Multimodal Data Pipelines: Fusing vision (pose/video), language (coaching cues), and sensor data (heart rate, IMU)

  • Model Training & Optimization: Implementing training loops, running experiments, and deploying models to edge devices (<100ms latency)

  • Agentic Systems & RAG: Building retrieval-augmented generation and multi-step reasoning agents for grounded coaching

You'll collaborate with CV, Mobile ML, and Engineering teams to power real-time intelligent assistance.


Key Responsibilities


1) VLM Architecture & Multimodal Fusion

  • Evaluate and select state-of-the-art VLMs for the fitness domain (LLaVA, Flamingo, BLIP-2, Qwen-VL)

  • Understand architecture tradeoffs: model size vs. accuracy vs. latency, vision encoder capacity, language decoder design

  • Design multimodal pipelines fusing pose keypoints, coaching text, and sensor data with temporal alignment

  • Build cross-modal embeddings using contrastive learning (align poses to coaching cues)
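
For illustration, a minimal sketch of the kind of contrastive alignment involved: a CLIP-style symmetric InfoNCE loss that pulls matched pose and coaching-cue embeddings together (all names here are illustrative, not an existing Vizhi API):

```python
# Minimal sketch (illustrative only): CLIP-style symmetric contrastive loss
# aligning pose-sequence embeddings with coaching-cue text embeddings.
import torch
import torch.nn.functional as F

def pose_text_contrastive_loss(pose_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """pose_emb, text_emb: (batch, dim) embeddings of matched pose/cue pairs."""
    pose_emb = F.normalize(pose_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = pose_emb @ text_emb.t() / temperature        # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Each pose should match its own cue (rows) and each cue its own pose (columns).
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```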

2) Model Training & Fine-Tuning

  • Implement training loops and model definitions in PyTorch

  • Use LoRA/QLoRA for efficient fine-tuning (training only 1-5% of parameters) on consumer GPUs (see the sketch after this list)

  • Run experiments: hyperparameter sweeps, architecture variations, augmentation strategies

  • Implement training pipelines with contrastive loss, supervised fine-tuning, and instruction tuning

  • Monitor metrics (loss curves, accuracy, validation) and debug training issues (NaNs, divergence, data mismatches)

  • Handle data imbalance through augmentation and oversampling
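
As a hedged sketch of the parameter-efficient fine-tuning mentioned above, the snippet below attaches LoRA adapters with Hugging Face PEFT; the checkpoint name and target modules are placeholders that depend on the VLM actually selected:

```python
# Hypothetical sketch: LoRA adapters via Hugging Face PEFT so that only a small
# fraction of parameters (roughly 1-5%) is trained. The checkpoint name and
# target_modules are placeholders and depend on the chosen VLM backbone.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("your-vlm-language-backbone")  # placeholder
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; names vary by architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # reports the small trainable-parameter fraction
```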

3) Data Pipeline & Dataset Management

  • Design and maintain datasets for vision, language, and multimodal tasks

  • Implement data pipelines: train/val/test splits, versioning (DVC), quality checks (a split sketch follows this list)

  • Build preprocessing scripts: normalization, scaling, sequence preparation, temporal smoothing

  • Curate exercise descriptions, trainer coaching scripts, and form feedback annotations

  • Extract multimodal data from trainer recordings (video, audio, pose) and create aligned training pairs

  • Create gold-standard templates for exercises (5-10 perfect rep examples)
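
A minimal sketch of the split hygiene above, assuming each sample carries a stable ID: hashing that ID makes train/val/test assignment deterministic across reruns and dataset versions (field names and ratios are assumptions for illustration):

```python
# Illustrative sketch: deterministic train/val/test assignment keyed on a stable
# sample ID, so splits survive re-shuffles and dataset re-versioning.
import hashlib

def assign_split(sample_id: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    # Hash the ID into [0, 1) so the assignment is stable across runs and machines.
    bucket = int(hashlib.sha256(sample_id.encode()).hexdigest(), 16) % 10_000 / 10_000
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"

# Example usage: splits = {sid: assign_split(sid) for sid in sample_ids}
```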

4) Generative Coaching & Form Assessment

  • Train generative models: pose sequence → natural language coaching cue (5-30 tokens, <100ms); see the generation sketch after this list

  • Implement safety filters preventing dangerous suggestions

  • Support coaching personas (motivational, technical, balanced) via prompt engineering

  • Build form quality scoring (0-100) with explainable components: positioning, stability, range of motion (ROM), timing, symmetry

  • Detect form degradation under fatigue
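
For illustration, a hedged sketch of short-cue generation with the Hugging Face generate API; the model name, prompt format, and pose summary are placeholders, and a production path would feed vision features directly into the VLM:

```python
# Hypothetical sketch: turning a summarized pose observation into a short coaching
# cue with a fine-tuned causal LM. Checkpoint name and prompt format are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-finetuned-coaching-model")  # placeholder
model = AutoModelForCausalLM.from_pretrained("your-finetuned-coaching-model")

prompt = "Exercise: squat. Observation: knees cave inward at the bottom. Cue:"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30, do_sample=False)  # cap cue length
cue = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(cue)
```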

5) Model Optimization & Deployment

  • Quantization: INT8/FP16 post-training and quantization-aware training (<2% accuracy loss)

  • Knowledge distillation: train smaller student models (100M-300M params) mimicking teacher VLMs

  • Architecture optimization: efficient encoders (MobileViT), pruning, dynamic quantization

  • Export to ONNX/TFLite/CoreML for edge deployment (see the sketch after this list)

  • Profile on target devices (Snapdragon XR2+ smart glasses) and achieve <100ms latency

  • Apply batch inference, embedding caching, and other inference optimizations
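
A hedged sketch of the optimization/export step: post-training dynamic INT8 quantization of Linear layers plus ONNX export on a toy stand-in model; real deployments would typically use quantization-aware training and hardware-specific toolchains:

```python
# Illustrative sketch: dynamic INT8 quantization of Linear layers, then ONNX export.
# The model and input shape are placeholders standing in for the compressed student.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 128),
                      nn.ReLU(), nn.Linear(128, 8))
model.eval()

# Post-training dynamic quantization of Linear layers (CPU inference path).
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# ONNX export of the FP32 model; quantized export/conversion paths are toolchain-specific.
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model, dummy_input, "model.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=17, dynamic_axes={"input": {0: "batch"}},
)
```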

6) Agentic Systems & RAG

  • Agentic Coaching: Design multi-step reasoning agents analyzing form, retrieving knowledge, generating grounded feedback

  • Multimodal RAG: Build retrieval-augmented generation using vector search over exercise standards and trainer libraries (see the retrieval sketch after this list)

  • Tool-Augmented VLM: Enable VLMs to invoke tools (pose analysis, biomechanics calculators, rep counters)

  • Agent Training Pipeline: Design agents monitoring data quality, detecting labeling issues, suggesting improvements

  • Model Coordination: Coordinate VLM, CV models, sensors through structured tool calls and shared context
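
For illustration, a minimal retrieval sketch for the RAG work above: embed a small corpus of exercise standards, retrieve the closest passages, and ground the coaching prompt on them. The embedding model and documents are placeholders; production would use a proper vector database rather than in-memory search:

```python
# Minimal RAG retrieval sketch (illustrative only).
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Squat depth standard: hip crease below the top of the knee.",
    "Deadlift setup: neutral spine, bar over mid-foot.",
    "Push-up standard: body in a straight line from head to heels.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")          # placeholder embedding model
doc_emb = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)
    scores = (doc_emb @ q.T).ravel()   # cosine similarity (embeddings are normalized)
    return [docs[i] for i in np.argsort(-scores)[:k]]

context = "\n".join(retrieve("How deep should a squat go?"))
prompt = f"Use only this context to coach the athlete:\n{context}\nQuestion: ..."
```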

7) Evaluation & Benchmarking

  • Define and track metrics: form classification accuracy (>90%), form quality prediction error (<5 MAE), coaching accuracy (>85%)

  • Language quality: BLEU, ROUGE, METEOR, human evaluation (fluency, relevance, correctness)

  • Latency targets: vision encoder <20ms, language decoder <60ms, end-to-end <100ms

  • Test robustness across body types, environments, exercises, edge cases

  • Build validation datasets: 20+ exercises, diverse demographics, occlusions, extreme angles

  • Implement automated regression tests and CI integration
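
As a hedged example of such regression tests, a pytest sketch that fails CI when accuracy or latency drop below the stated targets; `evaluate_form_classifier` and `run_end_to_end` are hypothetical project helpers, not an existing API:

```python
# Illustrative regression tests (pytest). The evaluation helpers are hypothetical.
import time

def test_form_classification_accuracy():
    accuracy = evaluate_form_classifier(split="val")        # hypothetical helper
    assert accuracy > 0.90, f"form classification regressed: {accuracy:.3f}"

def test_end_to_end_latency_budget():
    start = time.perf_counter()
    run_end_to_end(sample_clip="tests/fixtures/squat.mp4")  # hypothetical helper
    elapsed_ms = (time.perf_counter() - start) * 1000
    assert elapsed_ms < 100, f"latency {elapsed_ms:.1f} ms exceeds the 100 ms budget"
```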

8) Reproducibility & Experiment Tracking

  • Use experiment tracking tools (Weights & Biases, MLflow, TensorBoard)

  • Ensure reproducibility: fix seeds, document dependencies, version datasets (see the sketch after this list)

  • Maintain documentation: training scripts, preprocessing steps, experiment results

  • Summarize findings in reports and dashboards for the team
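
A minimal reproducibility sketch combining seed fixing with Weights & Biases tracking; the project name and config values are placeholders:

```python
# Illustrative sketch: fix seeds across libraries and log config + metrics to W&B.
import random
import numpy as np
import torch
import wandb

def set_seed(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(42)
run = wandb.init(project="coaching-vlm",                    # placeholder project name
                 config={"lr": 2e-4, "lora_r": 16, "seed": 42})
for step in range(3):                                       # stand-in for a training loop
    wandb.log({"train/loss": 1.0 / (step + 1)}, step=step)
run.finish()
```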

9) Cross-Functional Collaboration

  • Partner with CV Engineers on pose representation formats and quality validation

  • Work with Mobile ML on resource-aware deployment and profiling

  • Collaborate with Product/Trainers on coaching quality feedback

  • Participate in technical reviews, code reviews, and retrospectives


Required Skills & Experience


Educational Background

  • Bachelor's degree in Computer Science, ML, Mathematics, Statistics, Physics, or related field

  • Master's degree is a plus

  • Strong foundation in linear algebra, probability, statistics, calculus


Core ML & Deep Learning (3-4 years)


Programming & Frameworks:

  • Expert-level Python: NumPy, SciPy, pandas, Matplotlib/Seaborn

  • PyTorch (1.5+ years): building/training networks, custom loss functions, data loading

  • Hugging Face Transformers (1+ year): fine-tuning, model architectures, vision models

  • Experience with at least one DL framework (PyTorch primary, TensorFlow/Keras acceptable)

ML Fundamentals:

  • Train/val/test splits, overfitting vs. underfitting

  • Loss functions (MSE, cross-entropy), optimizers (SGD, Adam)

  • Regularization (dropout, weight decay, early stopping)

  • Gradient descent and training dynamics

  • Experience training small-to-medium models end-to-end

Data Engineering:

  • Working with real datasets: cleaning, transforming, augmenting

  • Data loaders and preprocessing pipelines

  • Logging metrics, saving checkpoints

  • Dataset versioning and quality checks


VLM & Multimodal AI (2+ years preferred)


Vision-Language Models:

  • Experience fine-tuning VLMs (CLIP, LLaVA, Flamingo, BLIP-2, Qwen-VL)

  • Understanding VLM architecture: vision encoders, language decoders, cross-attention

  • Multimodal fusion techniques: early/late fusion, contrastive learning

NLP & Language Models:

  • Transformer architecture: attention, self-attention, encoder-decoder

  • Autoregressive generation, beam search

  • Fine-tuning: instruction tuning, LoRA, parameter-efficient methods

  • Prompt engineering and tokenization (BPE, WordPiece)

  • Evaluation: BLEU, ROUGE, METEOR, human evaluation

Computer Vision (Intermediate):

  • Understanding of pose estimation: 2D/3D keypoints, skeleton representations, temporal modeling

  • Basic image processing: rotation, scaling, normalization

  • Comfortable reading CV code and papers

Model Optimization & Deployment

  • Quantization: INT8, FP16, post-training and QAT

  • Knowledge distillation: student-teacher models

  • Pruning and model compression

  • ONNX/TFLite conversion

  • Edge inference and low-latency optimization

Agentic AI & RAG Systems

  • RAG Systems: Vector databases, embedding selection, chunking strategies, retrieval evaluation

  • Agent Frameworks: ReAct-style agents, multi-agent patterns, tool/function calling, agent memory

  • LLM Orchestration: Prompt engineering, chain-of-thought, multi-step workflows, function calling

  • Tool Protocols: Model-tool interfaces, context sharing, JSON schemas, cross-model coordination

Software Engineering

  • Git workflows and code review

  • Clean, modular, well-documented code

  • Reproducibility: seed management, documentation

  • Debugging, logging, testing

  • CI/CD basics

Mathematical Foundation

  • Linear Algebra: Vectors, matrices, decompositions, eigenvalues

  • Calculus: Derivatives, chain rule, gradients, optimization

  • Probability & Statistics: Distributions, expectation/variance, hypothesis testing

  • Numerical Methods: Least-squares fitting, handling ill-conditioned problems


Preferred Qualifications


Advanced Multimodal & Generative AI

  • Experience with cutting-edge VLMs: GPT-4V, Gemini Vision, Claude Vision

  • Generative AI: seq2seq, RAG, RLHF

  • Multimodal distillation across modalities

  • Production ML: monitoring, drift detection, retraining, A/B testing

Specialized Domain Knowledge

  • Biomechanics or sports science

  • Human pose estimation experience

  • Computer vision tasks (detection, segmentation, tracking)

Advanced Optimization

  • Torch JIT, CUDA kernels

  • Mixed-precision training

  • Distributed training

  • GPU optimization and parallel processing

Research & Open Source

  • Published papers (NeurIPS, ICML, CVPR, ICCV, arXiv)

  • Kaggle competitions

  • GitHub repos with VLM/multimodal projects

  • Hugging Face contributions

Production Experience

  • Deployed ML models to real users

  • Model monitoring and retraining pipelines

  • Experiment tracking tools (W&B, MLflow)

  • Format conversion (ONNX, TFLite, CoreML)


What You'll Gain

  • Technical ownership of multimodal AI pipeline (data to production)

  • Deep expertise in VLMs, cross-modal learning, and agentic systems

  • Real-world impact on thousands of users' fitness experiences

  • Research-to-product bridge: cutting-edge AI shipped to real users

  • Contribution to Nutpaa's multimodal AI patents

  • Collaboration with CV, mobile, and product specialists

  • Path to Senior ML Engineer or Research Scientist roles

  • Hybrid working from Month 6+


MVP Success Criteria (8 Months)

  1. VLM selected, fine-tuned on 1,000+ pose-coaching pairs

  2. Multimodal data pipeline operational (vision + language + sensor alignment)

  3. Coaching generation >85% trainer approval, form assessment >90% accuracy

  4. Quantized model <2% accuracy loss, <100ms inference on device

  5. RAG and agentic coaching system functional

  6. Comprehensive evaluation across 20+ exercises, diverse demographics

  7. Training pipelines reproducible with experiment tracking

  8. Documentation and handoff ready


Work Arrangement

  • Duration: 8 months (March–October 2026)

  • Commitment: Full-time, 45–50 hours/week

  • Location: Hybrid – Chennai or Madurai

  • Post-MVP: Flexible hybrid working from Month 6+


Application Process

Email careers@nutpaa.ai with:

1. Resume highlighting:

  • ML/AI experience (3–4+ years)

  • VLM or multimodal ML projects (2+ years preferred)

  • PyTorch and Hugging Face expertise

  • Production deployment experience

  • RAG/agent framework experience

2. Portfolio:

  • GitHub: VLM fine-tuning, multimodal pipelines, training experiments, RAG systems

  • Technical writing: blog posts, papers, project documentation

  • Published work: arXiv, Kaggle, Hugging Face contributions

3. Statement (~200–250 words):

  • Interest in VLMs and multimodal AI for fitness

  • One ML/VLM project you led (challenge, approach, results)

  • Why real-world AI deployment matters to you

  • Excitement about early-stage deep-tech

Email Subject: AI/ML Engineer – Multimodal Systems – [Your Name]


Equal Opportunity

Nutpaa is an equal opportunity employer. We value ML fundamentals, VLM/multimodal expertise, and a shipping mentality over strict credentialism.

Non-traditional backgrounds welcome: candidates without formal degrees but with demonstrable expertise (portfolio, projects, shipped products) are encouraged to apply.

Questions?

Email: careers@nutpaa.ai


Apply now to join us
