Greater Bengaluru Area, India

Hi, I'm Shuban.

I build reliable AI-driven systems and data pipelines — turning ambiguous problems into production software that teams can trust.

01 — About

Engineering that holds up in production.

I'm Shuban, a computer engineering graduate with close to two years of experience building AI-driven systems and data pipelines. I work best at the intersection of strong Python engineering and applied AI.

I don't just build models or write backend code — I focus on turning ideas into reliable systems that can actually be used, maintained, and scaled. That means clean code, thoughtful design, and a constant eye on edge cases and long-term impact.

I've designed end-to-end pipelines, built APIs, worked with messy, imperfect data, and shaped AI solutions so they deliver consistent results rather than flashy demos. I care about correctness, clarity, and making systems that teams can trust.

What I bring to the table is ownership. I'm comfortable taking a problem from ambiguity to execution, asking the right questions early, and balancing speed with quality. I think like an engineer, but I always keep the business goal in mind — why something is being built matters as much as how.

Top skills

  • Python Engineering
  • Applied AI & Machine Learning
  • Data Pipeline Design
  • System Reliability & Scalability
  • Problem Solving & Ownership

02 — Experience

Where I've built things.

  1. AI Engineer & SDE-1 · CGI

    Jun 2025 — Present

    Full-time · Bangalore, India · On-site

    • Built a multi-agent OCR pipeline bridging unstructured documents to structured data, achieving ~97% extraction accuracy — a 93% improvement over traditional OCR — with human-in-the-loop escalation limited to ~4% of cases.
    • Reduced document processing costs by ~80% by replacing Vision LLMs with an optimized agentic architecture, processing 500-page documents in ~1.5 hours at ~$1 per document using cloud models.
    • Developed an intelligent document search and retrieval system surpassing baseline RAG approaches, validated with strong retrieval and grounding metrics.
  2. Lead Data Scientist — Research · Vellore Institute of Technology

    Jun 2024 — Aug 2024

    Full-time · Chennai, India · On-site

    • Collected a medical gait dataset using a ₹4L Xsens MTw Awinda IMU kit, capturing stair ascent/descent motion from tailbone, knees, and feet.
    • Built Python data pipelines to clean IMU signals and extract time- and frequency-domain features (FFT, IQR, mean, median).
    • Applied unsupervised clustering (K-Means, Agglomerative, Spectral) for BMI- and weight-based classification; agglomerative clustering performed best, and weight-based labels outperformed BMI-based labels in several cases.
  3. Founding ML & Backend Engineer · GreenStitch.io

    Aug 2023 — Jan 2024

    Full-time · Bangalore, India · On-site

    • Owned the FastAPI-based Python backend, building and running production APIs for Scope 3 carbon emissions prediction with real-time and batch inference.
    • Designed normalized tables to store emission data and built a vector-based semantic search engine using NLP techniques like NER and word embeddings, hitting ~99% retrieval accuracy on sustainability data.
    • Built and deployed stacked ensemble models with TensorFlow and Keras, fed by clean ETL and preprocessing pipelines, consistently achieving 99%+ accuracy and strong F1 scores in production.
    • Created advanced Excel carbon accounting tools with heavy edge-case handling — hidden source tables, locked formulas, complex queries, and embedded regression models for precise reporting.

Education

Vellore Institute of Technology, Chennai

2021 — 2025

B.Tech, Electronics and Computer Engineering

Grade: A · CGPA 8.48

Coursework: Artificial Intelligence, Machine Learning, Data Science, DSA, DBMS, Software Engineering

03 — Skills

Tools I reach for.

Languages & Data

  • Python
  • SQL
  • PostgreSQL
  • Redis
  • Pandas
  • NumPy

Machine Learning

  • PyTorch
  • TensorFlow
  • Keras
  • Scikit-learn
  • Jupyter

Engineering & Cloud

  • FastAPI
  • Docker
  • Azure
  • Git
  • GitHub
  • VS Code

04 — Projects

Things I've shipped.

Catalyst

A modular study companion combining FastAPI, Streamlit, Celery, and fine-tuned LLMs for adaptive learning, real-time doubt resolution, and progress tracking.

  • FastAPI
  • LLMs
  • NLP
  • PostgreSQL

Parking Lot Occupancy

Real-time parking slot occupancy detection from a single CCTV feed, powered by YOLOv8 and a custom PSAT alignment algorithm — 87.9% accuracy even at full capacity.

  • YOLOv8
  • Keras
  • Azure
  • Computer Vision

FALCON-Net

A Streamlit app analyzing the robustness of Siamese and Prototypical networks against adversarial attacks, with built-in defense strategies.

  • Adversarial ML
  • Few-Shot
  • Streamlit

EVenture Backend

A FastAPI-powered service for effortless road-trip planning tailored to EV owners, handling routing and charging logistics.

  • FastAPI
  • Python
  • APIs

Self-Driving Car

A self-driving car simulator built from scratch in Pygame, with a Deep Q-Network agent that learns to navigate dynamically generated tracks.

  • Reinforcement Learning
  • Pygame
  • DQN

2 Patents Filed

Filed patents including a Generative AI learning system with blockchain integration.

Best Paper Award — IEEE

Published in IEEE and received a Best Paper Award for research in applied AI.

Hackathon Finalist

Finalist in 3 national hackathons and 1 international hackathon.

Invited Speaker

Conducted a session on Azure Fundamentals and deploying ML models on Azure.

05 — Research

Published papers.

FALCONNet: A Multi-Defense Approach for Securing Few-Shot Learning Against Adversarial Attacks

This paper compares few-shot learning models (Siamese networks, Prototypical Networks, and our proposed FALCONNet) on the Omniglot dataset. We assess their robustness against adversarial attacks (PGD, FGSM) and test defense strategies such as adversarial training and defensive distillation. With these defenses, FALCONNet outperforms the other models in accuracy and stability under attack.


06 — Contact

Let's build something worth building.

I'm always open to interesting problems in applied AI, GenAI, and data engineering. Reach out — I'd love to talk.
