AI Engineering Intern
StitchStudio · Chicago, IL · Remote
Client: Internal Project
- Building LangChain agents on Llama 3 with prompt-routing and state-machine orchestration.
- RAG with FAISS—sub-100ms retrieval over million-scale embeddings.
I build backends that survive real traffic, CUDA stacks that earn their speedup, and LLM systems that ship in production—not demos.
Project · GPT-2 Inference Engine
On A40, the largest gains came from not recomputing past keys and values—memory traffic dropped before raw FLOPs did.
CUDA · FlashAttention-2 · Nsight Compute
Project · ThinkerCUDA
ThinkerCUDA taught me that oversized thread blocks can hide latency on paper—and lose on real hardware.
CUDA · C++ · HPC
Course · UIUC CS 598 · PACT
PACT research: a lightweight speculator can truncate redundant LLM turns without tanking task accuracy.
LLM agents · DPO · Phi-3-mini
Research · Intelligent Surveillance over 5G Edge
Our 5G surveillance work showed when to infer on-device—and when centralized GPUs still win.
Edge AI · 5G · Real-time inference
Total Industry Experience (3+ Years)
Production software across telecom, AI platforms, and chartered accountancy workflow automation.
StitchStudio · Chicago, IL · Remote
Client: Internal Project
Cognizant Technology Solutions · Tamil Nadu, India
Client: Verizon
BSP & CO. · Tamil Nadu, India
Client: Internal Project
SMZ & CO. · Kuala Lumpur · Remote
Client: Internal Project
Hands-on projects across inference, GPU kernels, agents, and cloud-native platforms.
GPT-2 forward pass from scratch: FlashAttention-2, KV-cache, and memory tiling profiled on NVIDIA A40.
3D convolution and tiled matmul kernels—6× throughput over CPU through coalescing and occupancy tuning.
Agentic audit workflows with BMAD-METHOD that route prompts by task complexity for throughput and accuracy.
EKS microservices with ALB autoscaling; blue-green and canary releases for zero-downtime deploys.
Peer-reviewed edge AI and in-progress work on faster, leaner LLM agents.
UIUC CS 598 · In progress
Speculator model (Phi-3-mini) trims LLM over-deliberation; DPO alignment balances accuracy and latency.
Conference paper
Optimized real-time inference across edge and cloud for latency and bandwidth under 5G constraints.
Languages and platforms I reach for when performance and reliability both matter.
Daily drivers
Scale & reliability
Deploy & operate
Models & pipelines
University of Illinois Urbana-Champaign
Aug 2025 – Present
Parallel programming · Systems for GenAI · Applied ML · Cloud · LLMs
SRM Institute of Science and Technology
Jun 2019 – Jul 2023
DSA · Compilers · DBMS · Networks · Automata
Dean's recognition for students who lead at scale—across academics, sport, and campus life.
Dean's Award · Undergraduate cohort 2019–2023
SRM Institute of Science and Technology · Jun 2019 – May 2023
Core strength: Leadership skills
Awarded for sustained campus impact—scaling student operations, mentoring peers, and connecting technical communities with university-wide programs.
Open to full-time roles in systems, AI infrastructure, autonomy, and GPU computing.
+1 217-249-4900 · Champaign, IL