RohanChavan_

AI / ML Engineer building production LLM systems. Try the taxonomy that fed our 1st-place defender.

statusAvailable May 2026

locationBlacksburg, VA · open to relocate

focusLLM safety | MLOps | agents

attack taxonomy explorer · nova tournament 2

T21st

input

46%attack reduction

117Ksynthetic examples

1st/ 10 teams

runs/ | training history

Five runs, shipped.

/01
2025

HokieTokie Amazon Nova Trusted AI1ST PLACE

Red-team side of Team HokieTokie. Built the attack taxonomy and the agent-driven pipeline that auto-generated the adversarial corpus used to train our defender model. 1st in Tournament 2 against Claude-3.7 Sonnet and CodeLlama-70B.

PyTorchTransformersAgent pipelinesAdversarial MLvLLMCodeGuru

RoleRed-Team Engineer

TeamTeam HokieTokie

WhenJan 2025 – Jun 2025

Link↗ github

Context

Problem

Amazon and Together AI gave 10 teams a pretrained code-gen model and one job: make it resist prompt-injection and jailbreak attempts without over-refusing legitimate coding help. Teams went head-to-head in adversarial tournaments.

What I did

Built the attack taxonomy. Python vulnerability classes and adversarial patterns. Drove how the team generated training data across the project.
Agent-driven attack simulations. Attacker agent probes the defender, logs what gets through, pushes those failures back as training data for the next round. Cut hand-curating attacks out of the loop.
Adversarial corpus at scale. The attack-side data that fed the fused defender model and moved our Tournament scores.
Team placed 1st in Tournament 2, 2nd in Tournament 1, against Claude-3.7 Sonnet and CodeLlama-70B baselines.

/02
2025

LLM Benchmarking Service AutoUnifySF | INTERN

Evaluation + model-selection layer for an AI coding-agent platform.

FastAPIPostgreSQLPytestJSON SchemaGitHub ActionsDocker

RoleSWE Intern (AI/ML) · LLM Optimization & Selection track

TeamTeam Wiseman · 3 eng

WhenMay 2025 – Aug 2025

Link↗ writeup

Context

Problem

Different agent roles in AutoUnify's coding-agent pipeline needed different model behavior. Worker agents needed strong code generation and task completion. QA agents needed strict validation and deterministic output. Using the same model + prompt everywhere was unreliable, and prompt/parameter changes were landing in production without a way to tell whether they actually helped.

What I did

LLM benchmarking microservice on FastAPI + PostgreSQL. Stored benchmark runs, prompt versions, model outputs, p50/p95 latency, cost per 1K tokens, and schema-validation results. Exposed scoring data through REST APIs that orchestration logic consumed to pick a model + config per agent role.
Role-specific prompt + parameter packs for worker and QA agents. Standardized temperature, top-p, max tokens, and schema strictness. Cut manual prompt-tuning effort by ~25%.
Regression-safe CI/CD via Pytest, lint, JSON-schema checks, and benchmark thresholds in GitHub Actions. Blocked degraded configs before merge and improved merge velocity by ~20%.
Observability for model selection: p50/p95 latency, cost/1K tokens, quality score, schema-adherence rate, and regression status logged per benchmark run.

/03
2026

Housing Policy Predictor Virginia Tech / VCHRGRA | CURRENT

LLM + RAG pipeline predicting which housing affordability policies a jurisdiction is likely to adopt. Three policy classes, 6,809 knowledge chunks in ChromaDB, community profiling across five jurisdiction types. Graduate Research Assistant under Dr. Zhang.

ChromaDBLangChainFastAPILlama 3.3 70BTogether AIall-MiniLM-L6-v2

RoleGraduate Research Assistant

TeamVCHR | Dr. Ruichuan Zhang

WhenMar 2026 – now

Linkin progress

Context

Problem

U.S. jurisdictions trying to improve housing affordability have no good way to know which policies fit their local context. This system builds that match: given a jurisdiction's demographic and economic profile, predict which of the three policy classes it is likely to adopt.

What I did

LLM + RAG prediction pipeline over ChromaDB (6,809 chunks, all-MiniLM-L6-v2 embeddings). Two-pass retrieval: k=20 first pass, k=5 reranked. Llama 3.3 70B via Together AI with Groq fallback. Three policy classes: Density Bonus, ADU, Affordable Dwelling Unit Ordinances.
Community profile system segmenting jurisdictions into five types (rural low-income, urban high-cost, college town, suburban growing, others) to ground predictions in local context.
BPS data integration via flat-file parsing of the Building Permit Survey. Wage data sourced manually via BLS where APIs returned null.
95 tests passing, grounding checks passing on the held-out eval set, BAD_COMPARABLE detection implemented. FastAPI backend on LangChain with evaluation across demographic slices.

/04
2025

CareRoute distributed healthcareCODEFEST | 4TH

36-hour sprint building a 6-microservice distributed health booking system communicating over JSON-RPC 2.0, deployed to AWS EC2.

FastAPIPostgreSQLDockerAWS EC2JSON-RPC 2.0

RoleBackend

Team4 devs

WhenOct 2025

Link↗ github

Context

Problem

50+ teams, 36 hours, a brief to build healthcare infrastructure. We chose autonomous microservices to prove we could decompose and reassemble a real workflow under time pressure.

What I did

6 autonomous microservices coordinating via JSON-RPC 2.0 with automated workflow triggers.
Deployed to AWS EC2 and validated 12 end-to-end bookings.
4th place + honorable mention among 50+ competing teams.

/05
2024

Workflow Automation Colgate-Palmolive GBSPROD | 400+ USERS

Three FastAPI apps replacing the worst manual onboarding and permissions workflows. 400+ employees, 200+ hours/month saved.

FastAPIRedisAsync I/ORBAC

RoleProcess Automation Intern

TeamGBS ops

WhenJan 2024 – Jun 2024

Linkinternal

Context

Problem

Employee onboarding and permission ops were manual, slow, and error-prone. I owned three FastAPI apps replacing the worst offenders.

What I did

Shipped 3 FastAPI apps with role-based workflows. Saved 200+ hours/month across 400+ employees.
Cut response times from 30s to under 10s via Redis caching + async I/O while handling 1K+ daily interactions.
Observability dashboards for latency, error rate, throughput across all three services.

notes/ | technical journal

Short posts, real opinions.

no thought leadership, just receipts

history/ | checkpoints

Four stops, each production.

research labs | SF startups | global enterprise
shipped at every checkpoint

Mar 2026 – PresentBlacksburg, VA

Research Assistant

Virginia Tech | Dr. Ruichuan Zhang | Construction Informatics

Building an LLM-powered housing policy recommender over Census ACS + HUD datasets. LangChain tool-calling agents, FastAPI backend with modular data-source clients.

May 2025 – Aug 2025San Francisco, CA

Software Engineering Intern, AI/ML

AutoUnify | Series A | SF

Built an LLM benchmarking microservice (FastAPI + PostgreSQL) for an AI coding-agent platform. Tracked p50/p95 latency, cost/1K tokens, and schema-adherence across model + prompt variants. Added Pytest + JSON-schema regression gates in CI that cut manual prompt-tuning effort by ~25% and improved merge velocity by ~20%.

Jan 2025 – Jun 2025Blacksburg, VA

AI/ML Engineer, Nova Trusted AI Challenge

Virginia Tech | Team HokieTokie | Amazon x Together AI

Fine-tuned LLMs via SFT + DPO on 117K synthetic examples. Cut adversarial attack success 46% and placed 1st in Tournament 2 among 10 global teams. Outperformed Claude-3.7 Sonnet on safety benchmarks.

Jan 2024 – Jun 2024Mumbai, India

Process Automation Engineer Intern

Colgate-Palmolive | Global Business Services

Shipped 3 production FastAPI apps automating onboarding + permissions for 400+ employees. Cut response times from 30s to under 10s via Redis + async I/O. 70% automation rate, 200+ hours saved monthly.

metrics/ | subject profile

Capabilities matrix.

I'm finishing an MS in Computer Engineering at Virginia Tech (3.71 GPA) with coursework across Advanced ML, Computer Vision, and Trustworthy ML.

Before grad school I was shipping automation at Colgate-Palmolive's global ops, the kind of boring but load-bearing work that teaches you to respect production.

Now I spend most of my time on applied LLM research: fine-tuning for safety, agentic systems over real-world APIs, and the evaluation infrastructure that keeps any of it trustworthy.

I care about systems that hold up under pressure, not just demos.

(contact)contact/ | open an issue

Let's build
something that holds.

rohanchavan0701@gmail.com

GitHub ↗LinkedIn ↗Resume.pdf ↗

RohanChavan_

Five runs, shipped.

HokieTokie Amazon Nova Trusted AI1ST PLACE

Problem

What I did

LLM Benchmarking Service AutoUnifySF | INTERN

Problem

What I did

Housing Policy Predictor Virginia Tech / VCHRGRA | CURRENT

Problem

What I did

CareRoute distributed healthcareCODEFEST | 4TH

Problem

What I did

Workflow Automation Colgate-Palmolive GBSPROD | 400+ USERS

Problem

What I did

Short posts, real opinions.

Data beat algorithms on Nova

Gates beat evals

Agents over real APIs: what's hard, what isn't

Benchmarking AI coding agents, not just models

Four stops, each production.

Capabilities matrix.

Let's build
something that holds.

RohanChavan_

Five runs, shipped.

HokieTokie Amazon Nova Trusted AI1ST PLACE

Problem

What I did

LLM Benchmarking Service AutoUnifySF | INTERN

Problem

What I did

Housing Policy Predictor Virginia Tech / VCHRGRA | CURRENT

Problem

What I did

CareRoute distributed healthcareCODEFEST | 4TH

Problem

What I did

Workflow Automation Colgate-Palmolive GBSPROD | 400+ USERS

Problem

What I did

Short posts, real opinions.

Data beat algorithms on Nova

Gates beat evals

Agents over real APIs: what's hard, what isn't

Benchmarking AI coding agents, not just models

Four stops, each production.

Capabilities matrix.

Let's buildsomething that holds.

Let's build
something that holds.