AI Engineer with 3+ years of experience building proprietary middleware and agentic workflows to drive operational efficiency. Specialized in Generative AI fine-tuning (RLHF, GRPO, LoRA) and deploying high-throughput, cloud-native solutions on GCP/Vertex AI and AWS.
- AI & Agentic Systems: LangGraph (State Graphs), CrewAI, Prompt Engineering, RAG Pipelines (LangChain), Agentic Feedback Loops, GitHub Copilot
- Fine-Tuning & Eval: RLHF, GRPO, LoRA/QLoRA, SFT, LLM Evaluation (BLEU, ROUGE)
- Engineering & Ops: FastAPI, SQL, Streamlit, React, Docker, Git, GitHub Actions
- Clinical & Compliance: HIPAA, GDPR/ENISA, RBAC, FHIR Standards, DICOM
- Cloud & Hardware: GCP (Vertex AI), AWS, NVIDIA H100, AMD MI300X, vLLM, PySpark
- Built a multimodal assistant using MedGemma 4B for real-time medical image analysis (X-ray, MRI) and SOAP note generation.
- Architected a HIPAA-compliant RAG pipeline for patient context retrieval using FHIR-compliant EHR data and encrypted DICOM images.
- Developed a LangGraph-based state graph in which multiple MedGemma 4B instances run in parallel to generate independent differential diagnoses and measure inter-agent agreement.
- Developed a fine-tuning framework combining SFT and GRPO with LoRA, using Unsloth and Hugging Face.
- Optimized model inference using vLLM's PagedAttention and multi-GPU setups to ensure low-latency, real-time reasoning.
- Developed a LangGraph workflow using Chain-of-Thought (CoT) patterns and a RAG architecture to analyze complex soccer queries.
- Served the workflow with vLLM's PagedAttention on a multi-GPU setup to keep response latency low for real-time agentic reasoning.
- Deployed a high-concurrency AlphaZero agent on Hugging Face Spaces.
- Engineered the communication infrastructure to handle real-time state updates.
- Optimized model inference for a web-based environment, serving complex AI models through serverless-style hosting.
- Orchestrated a PySpark and BigQuery pipeline to process over 1 million images for real-time feature extraction.
- Developed object detection models on top of this large-scale image data pipeline.
Freelance | Machine Learning Specialist
- Deployed a custom "Morning Briefing" multi-agent system using CrewAI to automate patient data synthesis and streamline clinical reporting.
- Executed RLHF and GRPO fine-tuning on NVIDIA H100 and AMD MI300X GPUs to optimize model performance for clinical safety.
Sopra Steria | Machine Learning Engineer
- Engineered a RAG pipeline using LangChain and ChromaDB to transform static GDPR/ENISA policy documents into interactive agentic interfaces.
- Developed FastAPI-based middleware and BiLSTM models to automate the routing of 10,000+ monthly service tickets, reducing manual triage by 40%.
- MS in Applied Data Analytics, Boston University (Jan 2024 - May 2025).
- Research Assistant (Reinforcement Learning): Architected an open-source C++ package integrating LibTorch with OpenAI Gym for high-performance RL, including A2C and PPO implementations.
- Teaching Assistant (Big Data Analytics): Led sessions on deploying PySpark solutions on GCP and AWS for large-scale batch and streaming workloads.
- Email: 23.deepak.s@gmail.com
- Links: LinkedIn | GitHub | AI Writeups