Machine Learning Engineer

Building smaller,
smarter models.

I specialize in fine-tuning and deploying Small Language Models — making AI efficient, private, and production-ready. Currently at wAI Advanced Industries, Islamabad.

View Projects ↓ GitHub
About

A bit about me

I'm a Machine Learning Engineer based in Islamabad, Pakistan, with a deep interest in making AI smaller, more efficient, and deployable at the edge. I believe the future of AI isn't just in scaling up — it's in making powerful models accessible everywhere.

My work centers on fine-tuning Small Language Models (270M–3B parameters), optimizing them through quantization, and deploying them for real-world enterprise use. I build full pipelines — from data preparation and training to evaluation and production serving.

Beyond my core ML work, I develop agentic AI systems using LangChain and LangGraph, build robust backends with FastAPI, and work with RAG systems to create intelligent, context-aware applications.

Education
BS Information Technology
The Islamia University of Bahawalpur · 2021 – 2025
Current Role
ML Engineer
wAI Advanced Industries · Nov 2025 – Present
Focus Areas
SLMs · Quantization · Edge AI
Making AI efficient, private, and production-ready
Experience

Where I've worked

Building production ML systems and fine-tuning pipelines.
Nov 2025 – Present
Machine Learning Engineer
wAI Advanced Industries · Islamabad
Fine-tuning Small Language Models for enterprise products. Building automated evaluation pipelines, deploying models to production servers, and engineering FastAPI backends that orchestrate ML workflows end-to-end. Focused on reducing inference latency and memory footprint.
Jul 2025 – Sep 2025
AI Engineer — Intern
Nexus Technologies
Worked on applied AI projects, gaining hands-on experience with model training workflows, data pipeline design, and foundational ML engineering practices.
Projects

Selected work

Real-world ML systems I've built and shipped.
🧠
Alara SLM Training
Multi-Agent Swarm Enablement
Fine-tuned SLMs (270M–3B) for the Alara Multi-Agent Swarm architecture. Trained task-specific models for sentiment analysis, structured output, and reasoning, quantized to 4-bit for efficient deployment.
PyTorch Unsloth Quantization Multi-Agent
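To illustrate the core idea behind the 4-bit quantization used here (the actual training relied on Unsloth/bitsandbytes; this is a minimal absmax-int4 sketch, not the production code):

```python
def quantize_4bit(weights, block_size=64):
    """Absmax 4-bit quantization: map each block of floats to signed ints in [-7, 7]."""
    quantized, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block) or 1.0  # per-block scaling factor
        scales.append(scale)
        quantized.append([round(w / scale * 7) for w in block])
    return quantized, scales

def dequantize_4bit(quantized, scales):
    """Recover approximate float weights from 4-bit codes and per-block scales."""
    return [q / 7 * s for block, s in zip(quantized, scales) for q in block]
```

Schemes like NF4 replace the uniform grid with a non-uniform codebook, but the storage win is the same: one 4-bit code per weight plus one scale per block.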
🌍
Multilingual Sentiment Engine
Gemma 270M · 4 Languages · 89% Accuracy
Built a multilingual sentiment model covering English, Urdu, Arabic, and Roman Urdu. Achieved 4× inference speedup through quantization while maintaining accuracy.
Gemma Unsloth Multilingual NLP
📋
Structured Output LLM
Schema-Valid JSON Generation
Fine-tuned Gemma 270M to reliably produce schema-valid JSON outputs for automation workflows. Designed end-to-end dataset transformation and validation pipelines.
Gemma JSON Schema Data Pipelines
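The validation side of this project can be sketched with a simple check of model output against a required schema (the schema below is hypothetical, for illustration only):

```python
import json

# Hypothetical schema for illustration: required keys and their expected types.
SCHEMA = {"intent": str, "priority": int, "entities": list}

def validate_output(raw: str) -> dict:
    """Parse model output and check it against the schema; raise on any violation."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    for key, expected in SCHEMA.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected):
            raise ValueError(f"wrong type for {key}: expected {expected.__name__}")
    return data
```

Rejecting invalid generations at this gate is what makes the fine-tuned model safe to wire into automation workflows.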
⚙️
Automated Fine-Tuning System
FastAPI Backend · Full Pipeline
Engineered a FastAPI service that ingests JSONL datasets, triggers configurable fine-tuning pipelines, and generates evaluation reports with version tracking.
FastAPI JSONL MLOps
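The JSONL ingestion step can be sketched as follows (field names like "prompt"/"response" are assumptions here, not the service's actual record format):

```python
import json

def load_jsonl(lines, required=("prompt", "response")):
    """Parse an iterable of JSONL lines into training records, skipping bad rows.

    Returns (records, errors) so the pipeline can report ingestion quality.
    """
    records, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((lineno, str(exc)))
            continue
        if not all(key in row for key in required):
            errors.append((lineno, f"missing required keys: {required}"))
            continue
        records.append(row)
    return records, errors
```

Collecting per-line errors instead of failing fast lets the evaluation report show exactly which dataset rows were dropped and why.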
🔗
Reasoning-Enhanced Model
Llama 3.2 3B · Chain-of-Thought
Improved logical inference through Chain-of-Thought fine-tuning on Llama 3.2 3B. Deployed via Llama.cpp for structured reasoning evaluation.
Llama 3.2 CoT Llama.cpp
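Chain-of-Thought fine-tuning supervises the model on intermediate reasoning, not just the final answer. A minimal sketch of formatting one such training example (the template is hypothetical, not the project's actual prompt format):

```python
# Hypothetical CoT training template, for illustration only.
COT_TEMPLATE = (
    "Question: {question}\n"
    "Let's think step by step.\n"
    "{reasoning}\n"
    "Answer: {answer}"
)

def format_cot_example(question, reasoning_steps, answer):
    """Join intermediate reasoning steps into one supervised target string."""
    reasoning = "\n".join(f"{i}. {step}" for i, step in enumerate(reasoning_steps, 1))
    return COT_TEMPLATE.format(question=question, reasoning=reasoning, answer=answer)
```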
🔍
Research Assistant System
LangChain + LangGraph + arXiv
Developed a multi-step retrieval assistant that synthesizes information from arXiv and other research sources using contextual memory and LangGraph orchestration.
LangChain LangGraph RAG
Skills

Tech stack

Tools and technologies I work with daily.
⚙️
ML & LLM Engineering
PyTorch scikit-learn Unsloth LLaMA-Factory Gemma Qwen Llama 4-bit Quantization Chain-of-Thought Multi-task Learning
🛠️
Backend & Pipelines
Python FastAPI REST APIs JSONL Pipelines Gradio LangChain LangGraph
🚀
Deployment & DevOps
GGUF Conversion Ollama Llama.cpp Hugging Face Spaces Docker AWS EC2 Git Ubuntu Linux
📊
Data & Analysis
Pandas NumPy Matplotlib Seaborn Multilingual Datasets
🤖
AI Systems
RAG Systems MCP Agentic Frameworks Multi-Agent Swarms Structured Outputs
🎯
Specializations
Small Language Models Edge AI Model Optimization On-device Inference Privacy-focused AI
Blog

Writing & thoughts

Notes on ML engineering, small models, and building in public.
Coming Soon
Why Small Language Models Are the Future of Enterprise AI
Exploring why efficiency, privacy, and on-device deployment matter more than parameter counts for real-world applications.
Read →
Coming Soon
A Practical Guide to 4-bit Quantization
How to shrink models by 4× without destroying accuracy — the tools, tradeoffs, and techniques I use daily.
Read →
Coming Soon
Building Automated Fine-Tuning Pipelines with FastAPI
From JSONL ingestion to evaluation reports — how to build a self-serve fine-tuning system for your team.
Read →
Contact

Let's connect

Open to collaborations, freelance work, and interesting conversations about AI.