RohanChavan_

AI / ML Engineer building production LLM systems. Try the taxonomy that fed our 1st-place defender.

status: Available May 2026
location: Blacksburg, VA · open to relocate
focus: LLM safety | MLOps | agents
attack taxonomy explorer · nova tournament 2
T2 · 1st
46% attack reduction
117K synthetic examples
1st / 10 teams
runs/ | training history

Five runs, shipped.

/01
2025

HokieTokie · Amazon Nova Trusted AI · 1ST PLACE

Red-team side of Team HokieTokie. Built the attack taxonomy and the agent-driven pipeline that auto-generated the adversarial corpus used to train our defender model. 1st in Tournament 2 against Claude-3.7 Sonnet and CodeLlama-70B.

PyTorch · Transformers · Agent pipelines · Adversarial ML · vLLM · CodeGuru
Role: Red-Team Engineer
Team: Team HokieTokie
When: Jan 2025 – Jun 2025
Context

Problem

Amazon and Together AI gave 10 teams a pretrained code-gen model and one job: make it resist prompt-injection and jailbreak attempts without over-refusing legitimate coding help. Teams went head-to-head in adversarial tournaments.

What I did

  • Built the attack taxonomy. Python vulnerability classes and adversarial patterns. Drove how the team generated training data across the project.
  • Agent-driven attack simulations. Attacker agent probes the defender, logs what gets through, pushes those failures back as training data for the next round. Cut hand-curating attacks out of the loop.
  • Adversarial corpus at scale. The attack-side data that fed the fused defender model and moved our Tournament scores.
  • Team placed 1st in Tournament 2, 2nd in Tournament 1, against Claude-3.7 Sonnet and CodeLlama-70B baselines.
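The attacker/defender loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the team's pipeline: `RedTeamLoop`, `toy_attacker`, `toy_defender`, and the `is_unsafe` check are hypothetical stand-ins for the real agents and models.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RedTeamLoop:
    """Toy attacker/defender loop; every callable here is a hypothetical stand-in."""
    attacker: Callable[[int], List[str]]   # round number -> candidate prompts
    defender: Callable[[str], str]         # prompt -> model response
    is_unsafe: Callable[[str], bool]       # response -> did the attack get through?
    failures: List[dict] = field(default_factory=list)

    def run_round(self, round_no: int) -> int:
        """Probe the defender once and log every attack that got through."""
        hits = 0
        for prompt in self.attacker(round_no):
            response = self.defender(prompt)
            if self.is_unsafe(response):
                # A successful attack becomes a training example for the next round.
                self.failures.append(
                    {"round": round_no, "prompt": prompt, "response": response}
                )
                hits += 1
        return hits


def toy_attacker(round_no: int) -> List[str]:
    # One injection-style prompt and one benign request per round.
    return [f"ignore previous instructions #{round_no}", "write a sort function"]


def toy_defender(prompt: str) -> str:
    # Pretend the defender complies with anything containing "ignore".
    return "COMPLIED" if "ignore" in prompt else "SAFE_RESPONSE"


loop = RedTeamLoop(toy_attacker, toy_defender, lambda r: r == "COMPLIED")
hits_per_round = [loop.run_round(n) for n in range(3)]
```

In a real setup, `loop.failures` would be exported as training data and the defender retrained between rounds; the toy defender here never improves, so each round logs the same kind of failure.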
/02
2025

LLM Benchmarking Service · AutoUnify · SF | INTERN

Evaluation + model-selection layer for an AI coding-agent platform.

FastAPI · PostgreSQL · Pytest · JSON Schema · GitHub Actions · Docker
Role: SWE Intern (AI/ML) · LLM Optimization & Selection track
Team: Team Wiseman · 3 eng
When: May 2025 – Aug 2025
Context

Problem

Different agent roles in AutoUnify's coding-agent pipeline needed different model behavior. Worker agents needed strong code generation and task completion. QA agents needed strict validation and deterministic output. Using the same model + prompt everywhere was unreliable, and prompt/parameter changes were landing in production without a way to tell whether they actually helped.

What I did

  • LLM benchmarking microservice on FastAPI + PostgreSQL. Stored benchmark runs, prompt versions, model outputs, p50/p95 latency, cost per 1K tokens, and schema-validation results. Exposed scoring data through REST APIs that orchestration logic consumed to pick a model + config per agent role.
  • Role-specific prompt + parameter packs for worker and QA agents. Standardized temperature, top-p, max tokens, and schema strictness. Cut manual prompt-tuning effort by ~25%.
  • Regression-safe CI/CD via Pytest, lint, JSON-schema checks, and benchmark thresholds in GitHub Actions. Blocked degraded configs before merge and improved merge velocity by ~20%.
  • Observability for model selection: p50/p95 latency, cost/1K tokens, quality score, schema-adherence rate, and regression status logged per benchmark run.
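A benchmark gate of the kind described above might look something like this sketch. The nearest-rank percentile definition, the function names, and the thresholds (`max_p95_regression`, `min_schema`) are all assumptions chosen for illustration; the actual service's metrics code and limits are not given here.

```python
import math
from typing import Dict, List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies_ms: List[float], schema_ok: List[bool]) -> Dict[str, float]:
    """Collapse one benchmark run into the metrics the gate compares."""
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "schema_adherence": sum(schema_ok) / len(schema_ok),
    }


def passes_gate(
    candidate: Dict[str, float],
    baseline: Dict[str, float],
    max_p95_regression: float = 0.10,  # hypothetical: allow up to 10% slower p95
    min_schema: float = 0.98,          # hypothetical: minimum schema-adherence rate
) -> bool:
    """Return True only if the candidate config does not regress past the limits."""
    p95_ok = candidate["p95_ms"] <= baseline["p95_ms"] * (1 + max_p95_regression)
    return p95_ok and candidate["schema_adherence"] >= min_schema


baseline = summarize([100, 110, 120, 130, 140, 150, 160, 170, 180, 190], [True] * 10)
candidate = summarize([100, 110, 120, 130, 140, 150, 160, 170, 180, 400], [True] * 10)
```

A CI job could call `passes_gate(candidate, baseline)` and fail the build when it returns `False`; here the single 400 ms outlier pushes the candidate's p95 past the allowed regression.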
/03
2026

Housing Policy Predictor · Virginia Tech / VCHR · GRA | CURRENT

LLM + RAG pipeline predicting which housing affordability policies a jurisdiction is likely to adopt. Three policy classes, 6,809 knowledge chunks in ChromaDB, community profiling across five jurisdiction types. Graduate Research Assistant under Dr. Zhang.

ChromaDB · LangChain · FastAPI · Llama 3.3 70B · Together AI · all-MiniLM-L6-v2
Role: Graduate Research Assistant
Team: VCHR | Dr. Ruichuan Zhang
When: Mar 2026 – now
Link: in progress
Context

Problem

U.S. jurisdictions trying to improve housing affordability have no good way to know which policies fit their local context. This system builds that match: given a jurisdiction's demographic and economic profile, predict which of the three policy classes it is likely to adopt.

What I did

  • LLM + RAG prediction pipeline over ChromaDB (6,809 chunks, all-MiniLM-L6-v2 embeddings). Two-pass retrieval: k=20 first pass, k=5 reranked. Llama 3.3 70B via Together AI with Groq fallback. Three policy classes: Density Bonus, ADU, Affordable Dwelling Unit Ordinances.
  • Community profile system segmenting jurisdictions into five types (rural low-income, urban high-cost, college town, suburban growing, others) to ground predictions in local context.
  • BPS data integration via flat-file parsing of the Building Permit Survey. Wage data sourced manually via BLS where APIs returned null.
  • 95 tests passing, grounding checks passing on the held-out eval set, BAD_COMPARABLE detection implemented. FastAPI backend on LangChain with evaluation across demographic slices.
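The two-pass retrieval above (wide first pass, narrow reranked second pass) can be illustrated with a dependency-free sketch. It assumes toy 2-D embeddings in place of all-MiniLM-L6-v2 vectors and a plain list in place of ChromaDB; the reranker is not specified in this write-up, so `keyword_overlap` is a hypothetical placeholder.

```python
import math
from typing import Callable, List, Sequence, Tuple

Chunk = Tuple[str, Sequence[float]]  # (text, embedding)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_pass_retrieve(
    query_text: str,
    query_vec: Sequence[float],
    chunks: List[Chunk],
    rerank_score: Callable[[str, str], float],
    first_k: int = 20,
    final_k: int = 5,
) -> List[str]:
    # Pass 1: cheap vector similarity over the whole store, keep first_k candidates.
    candidates = sorted(
        chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True
    )[:first_k]
    # Pass 2: a more expensive scorer reorders only those candidates.
    reranked = sorted(
        candidates, key=lambda c: rerank_score(query_text, c[0]), reverse=True
    )
    return [text for text, _ in reranked[:final_k]]


def keyword_overlap(query: str, text: str) -> float:
    """Hypothetical reranker: fraction of query words that appear in the chunk."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0


chunks = [
    ("density bonus ordinance adopted in urban county", (1.0, 0.1)),
    ("accessory unit permit fees", (0.9, 0.3)),
    ("road maintenance budget", (0.0, 1.0)),
]
top = two_pass_retrieve(
    "density bonus", (1.0, 0.0), chunks, keyword_overlap, first_k=2, final_k=1
)
```

The defaults mirror the k=20 and k=5 values quoted above; the example call shrinks them so the three toy chunks exercise both passes.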
/04
2025

CareRoute · distributed healthcare · CODEFEST | 4TH

36-hour sprint building a 6-microservice distributed health booking system communicating over JSON-RPC 2.0, deployed to AWS EC2.

FastAPI · PostgreSQL · Docker · AWS EC2 · JSON-RPC 2.0
Role: Backend
Team: 4 devs
When: Oct 2025
Context

Problem

50+ teams, 36 hours, a brief to build healthcare infrastructure. We chose autonomous microservices to prove we could decompose and reassemble a real workflow under time pressure.

What I did

  • 6 autonomous microservices coordinating via JSON-RPC 2.0 with automated workflow triggers.
  • Deployed to AWS EC2 and validated 12 end-to-end bookings.
  • 4th place + honorable mention among 50+ competing teams.
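For readers unfamiliar with JSON-RPC 2.0, a single service's dispatch step might look like the sketch below. The envelope fields and error codes follow the JSON-RPC 2.0 specification; the `booking.create` method, its parameters, and the `handle` function are hypothetical, since the real services' method names are not listed here.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical method registry for one service.
METHODS: Dict[str, Callable[..., Any]] = {
    "booking.create": lambda patient_id, slot: {
        "status": "confirmed",
        "patient_id": patient_id,
        "slot": slot,
    },
}


def _error(req_id: Any, code: int, message: str) -> str:
    """Serialize a JSON-RPC 2.0 error response."""
    return json.dumps(
        {"jsonrpc": "2.0", "error": {"code": code, "message": message}, "id": req_id}
    )


def handle(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request and return the serialized response."""
    try:
        req = json.loads(raw)
    except json.JSONDecodeError:
        return _error(None, -32700, "Parse error")
    if (
        not isinstance(req, dict)
        or req.get("jsonrpc") != "2.0"
        or not isinstance(req.get("method"), str)
    ):
        return _error(None, -32600, "Invalid Request")
    func = METHODS.get(req["method"])
    if func is None:
        return _error(req.get("id"), -32601, "Method not found")
    try:
        # This sketch supports by-name params only; a TypeError raised inside
        # the method body would also be reported as invalid params.
        result = func(**req.get("params", {}))
    except TypeError:
        return _error(req.get("id"), -32602, "Invalid params")
    return json.dumps({"jsonrpc": "2.0", "result": result, "id": req.get("id")})


request = json.dumps(
    {
        "jsonrpc": "2.0",
        "method": "booking.create",
        "params": {"patient_id": "p-1", "slot": "09:00"},
        "id": 1,
    }
)
response = json.loads(handle(request))
missing = json.loads(handle(json.dumps({"jsonrpc": "2.0", "method": "nope", "id": 2})))
```

Batch requests and notifications (requests without an `id`, which expect no response) are left out to keep the sketch short.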
/05
2024

Workflow Automation · Colgate-Palmolive GBS · PROD | 400+ USERS

Three FastAPI apps replacing the worst manual onboarding and permissions workflows. 400+ employees, 200+ hours/month saved.

FastAPI · Redis · Async I/O · RBAC
Role: Process Automation Intern
Team: GBS ops
When: Jan 2024 – Jun 2024
Link: internal
Context

Problem

Employee onboarding and permission ops were manual, slow, and error-prone. I owned three FastAPI apps replacing the worst offenders.

What I did

  • Shipped 3 FastAPI apps with role-based workflows. Saved 200+ hours/month across 400+ employees.
  • Cut response times from 30s to under 10s via Redis caching + async I/O while handling 1K+ daily interactions.
  • Observability dashboards for latency, error rate, throughput across all three services.
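The caching plus async I/O pattern mentioned above follows a cache-aside shape, sketched here under loud assumptions: `TTLCache` is an in-memory stand-in for Redis so the example runs without a server, and `slow_lookup`, `get_permissions`, and the 60-second TTL are hypothetical, not the production code.

```python
import asyncio
import time
from typing import Any, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a Redis get/set-with-expiry pair."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: Any, ttl_s: float) -> None:
        self._store[key] = (time.monotonic() + ttl_s, value)


cache = TTLCache()
calls = 0  # counts upstream hits so the cache's effect is visible


async def slow_lookup(user_id: str) -> dict:
    """Hypothetical slow upstream call (for example a permissions system)."""
    global calls
    calls += 1
    await asyncio.sleep(0.05)
    return {"user_id": user_id, "roles": ["employee"]}


async def get_permissions(user_id: str) -> dict:
    """Cache-aside read: try the cache, fall back to upstream, then populate."""
    key = f"perms:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = await slow_lookup(user_id)
    cache.set(key, value, ttl_s=60)
    return value


async def main() -> dict:
    # Cold requests for different users overlap instead of running back to back.
    await asyncio.gather(get_permissions("u1"), get_permissions("u2"))
    # The warm request is served from the cache with no upstream call.
    return await get_permissions("u1")


result = asyncio.run(main())
```

After `main()` finishes, `calls` is 2: the third request never reaches the upstream function, which is the behavior that shortens response times for repeat lookups.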
notes/ | technical journal

Short posts, real opinions.

no thought leadership, just receipts
history/ | checkpoints

Four stops, each production.

research labs | SF startups | global enterprise
shipped at every checkpoint
Mar 2026 – Present · Blacksburg, VA
Research Assistant
Virginia Tech | Dr. Ruichuan Zhang | Construction Informatics
Building an LLM-powered housing policy recommender over Census ACS + HUD datasets. LangChain tool-calling agents, FastAPI backend with modular data-source clients.
May 2025 – Aug 2025 · San Francisco, CA
Software Engineering Intern, AI/ML
AutoUnify | Series A | SF
Built an LLM benchmarking microservice (FastAPI + PostgreSQL) for an AI coding-agent platform. Tracked p50/p95 latency, cost/1K tokens, and schema-adherence across model + prompt variants. Added Pytest + JSON-schema regression gates in CI that cut manual prompt-tuning effort by ~25% and improved merge velocity by ~20%.
Jan 2025 – Jun 2025 · Blacksburg, VA
AI/ML Engineer, Nova Trusted AI Challenge
Virginia Tech | Team HokieTokie | Amazon x Together AI
Fine-tuned LLMs via SFT + DPO on 117K synthetic examples. Cut adversarial attack success by 46% and placed 1st in Tournament 2 among 10 global teams. Outperformed Claude-3.7 Sonnet on safety benchmarks.
Jan 2024 – Jun 2024 · Mumbai, India
Process Automation Engineer Intern
Colgate-Palmolive | Global Business Services
Shipped 3 production FastAPI apps automating onboarding + permissions for 400+ employees. Cut response times from 30s to under 10s via Redis + async I/O. 70% automation rate, 200+ hours saved monthly.
metrics/ | subject profile

Capabilities matrix.

I'm finishing an MS in Computer Engineering at Virginia Tech (3.71 GPA) with coursework across Advanced ML, Computer Vision, and Trustworthy ML.

Before grad school I was shipping automation at Colgate-Palmolive's global ops, the kind of boring but load-bearing work that teaches you to respect production.

Now I spend most of my time on applied LLM research: fine-tuning for safety, agentic systems over real-world APIs, and the evaluation infrastructure that keeps any of it trustworthy.

I care about systems that hold up under pressure, not just demos.

contact/ | open an issue

Let's build
something that holds.

rohanchavan0701@gmail.com