Taj Khunkhun

Founding AI Engineer

Transforming experimental prototypes into production-grade agentic AI systems. 8+ years in backend systems, 3+ years specializing in Multi-Agent Orchestration and Autonomous Reasoners.

LangGraph · CrewAI · AutoGen · MCP · RAG

What I Work With

Technical Skills

🤖

AI & Agents

LangChain · LangGraph · CrewAI · AutoGen · AutoGPT · MCP · Agentic RAG · GraphRAG · LangSmith · Multi-Agent Orchestration · Prompt Engineering
🧠

ML & Deep Learning

PyTorch · TensorFlow · Keras · scikit-learn · spaCy · NLP · LoRA / QLoRA · Reinforcement Learning · Neural Networks · MLOps / LLMOps

Languages & Frameworks

Python · Golang · TypeScript · JavaScript · Django · FastAPI · Flask · Node.js · Next.js · React · Angular
🗄️

Data & Infrastructure

Snowflake · Databricks · Spark · Kafka · Airflow · BigQuery · Neo4j · Redis · Pinecone · Elasticsearch · ChromaDB · PostgreSQL
☁️

Cloud & DevOps

AWS · Azure · GCP · Kubernetes · Docker · Terraform · CI/CD · Microsoft Fabric · Copilot Studio · Azure Data Lake

Where I've Worked

Experience

Lextar AI Legal Solutions, Inc.

Founding AI Engineer

Remote

Jan 2026 - Present
  • Architected a full-stack enterprise legal AI platform for Canadian and US immigration law using FastAPI, Next.js, PostgreSQL, SQLAlchemy, and ChromaDB for hybrid RAG.
  • Engineered an 11-step lawyer-faithful reasoning pipeline integrating DeepSeek LLM, hierarchical issue framing, conflict resolution, and strict grounding verification.
  • Built a hybrid GraphRAG retrieval system combining ChromaDB vector search with an in-memory knowledge graph (1,300+ legal nodes, 445 relationships) across 6,200+ ingested chunks.
  • Implemented multi-tenant SaaS with Clerk authentication, role-based access control, Stripe checkout, PDF invoices, and automated SMTP email receipts.
  • Designed superadmin dashboard with platform-wide analytics, user management, per-organization usage tracking, and real-time reasoning unit consumption monitoring.
  • Built real-time streaming legal analysis workspace with SSE, live reasoning trace visualization, and role-specific output formatting.
  • Implemented strict legal output governance including grounding verification, cross-reference resolution, and confidence-weighted scoring.

Product Demo

Hewlett Packard Enterprise

Senior AI Engineer | Agentic AI Engineer

Spring, Texas (Remote)

Jan 2023 - Dec 2025
  • Designed multi-agent Plan-and-Execute architecture using LangGraph, reducing infinite loop failures by 35%.
  • Architected MCP server layer standardizing tool integration, reducing custom tool-binding code by 60%.
  • Reduced inference costs by 45% via Router Agent dynamically triaging between Llama 2 7B and GPT-4.
  • Implemented HITL checkpoint system for financial workflows with 0.85 confidence threshold.
  • Built observability dashboards using LangSmith and Arize Phoenix, resolving 10s+ latency bottlenecks.
  • Architected LLM evaluation framework using DeepEval across 5,000+ test cases.
  • Reduced costs by 52% and p95 latency by 300ms through semantic caching with Redis.
  • Developed Shadow Deployment pipeline for prompt engineering A/B tests on live traffic with 0% user impact.
  • Designed multi-agent collaboration framework with Planner, Executor, and Critic agents via structured message-passing.
  • Eliminated catastrophic forgetting in Llama 2 by mixing 15% pre-training replay data.
  • Reduced training VRAM by 65% using QLoRA 4-bit quantization for 70B parameter models.
  • Improved inference throughput 4x via multi-LoRA serving (vLLM/LoRAX) for 10+ adapters.
  • Architected multi-region Kafka with sub-second cross-region replication.
  • Engineered tiered memory system preserving intent across 20+ agent handoffs.
  • Led migration from a monolithic Django application to microservices with Docker and Kubernetes.
  • Built high-concurrency FastAPI architecture handling 10k+ concurrent WebSocket connections.
  • Built MCP-compliant tool registry enabling dynamic tool discovery at runtime.
  • Architected Self-Correction loop for SQL agent, reducing syntax errors by 50%.
  • Built NER pipeline with Keras Bi-LSTM achieving 25% F1-score improvement.
  • Resolved critical GIL bottlenecks by refactoring to Python Multiprocessing and Celery/Redis worker cluster.
  • Engineered custom spaCy and Transformer models integrated with Fabric Lakehouse using lazy-loading across Spark clusters.
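
The HITL checkpoint with a 0.85 confidence threshold, listed above, can be illustrated with a minimal sketch. This is not the production code: the `AgentAction` and `Decision` types, the `checkpoint` function, and the rule that a score exactly at the threshold passes are all assumptions made for illustration. Only the 0.85 value comes from the bullet.

```python
from dataclasses import dataclass
from enum import Enum

# Threshold taken from the bullet above; everything else here is hypothetical.
CONFIDENCE_THRESHOLD = 0.85


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    NEEDS_HUMAN_REVIEW = "needs_human_review"


@dataclass
class AgentAction:
    description: str
    confidence: float  # assumed: a score in [0, 1] reported by the agent


def checkpoint(action: AgentAction, threshold: float = CONFIDENCE_THRESHOLD) -> Decision:
    """Route an action to a human reviewer when confidence is below the threshold."""
    if not 0.0 <= action.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Assumption: a score exactly at the threshold is allowed through.
    if action.confidence >= threshold:
        return Decision.AUTO_APPROVE
    return Decision.NEEDS_HUMAN_REVIEW


if __name__ == "__main__":
    print(checkpoint(AgentAction("Post journal entry", 0.91)))
    print(checkpoint(AgentAction("Reverse wire transfer", 0.62)))
```

In a real workflow the `NEEDS_HUMAN_REVIEW` branch would pause the agent and persist its state until a reviewer responds; that part is omitted here.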

Adobe

Data & Machine Learning Engineer

San Jose, California

Aug 2019 - Jan 2023
  • Architected ETL pipeline processing 5TB+ multi-modal data using Spark.
  • Eliminated vector-relational desync via CDC (Debezium + Kafka) with 99.9% consistency.
  • Engineered Blue-Green re-indexing for zero-downtime migrations across 50M+ vectors.
  • Optimized semantic search by 40% via hierarchical document indexing.
  • Reduced vector storage costs by $8k/month through tiered data strategy.
  • Deployed Semantic Data Guard monitoring data drift with 15% deviation alerting.
  • Standardized AI Data Contracts across four teams enforcing GDPR/CCPA compliance.
  • Reduced inference latency by 65% via model distillation on NVIDIA A100 GPUs.
  • Built synthetic data engine using SDV and GPT-3.5, improving minority tasks by 18%.
  • Engineered Fail-Soft orchestration saving $15k/month in compute costs.
  • Solved multi-modal cold start problem with tiered embedding cache (Redis + Elasticsearch), reducing first-response time from 2.4s to 400ms.
  • Built programmatic A/B testing framework promoting models based on Ground Truth alignment with statistical significance tests.

Enterprise Work

Projects

Full-stack legal AI SaaS platform with hybrid GraphRAG retrieval (1,300+ node knowledge graph + vector search), 11-step structured legal reasoning pipeline, multi-tenant RBAC, real-time streaming analysis workspace, and governance-grade output verification for Canadian and US immigration law.

FastAPI · Next.js · PostgreSQL · ChromaDB · DeepSeek · GraphRAG · Clerk · Stripe · SSE

Multi-agent Plan-and-Execute architectures, Router Agents, MCP server integrations, HITL checkpoints, semantic caching, shadow deployment pipelines, and LLM evaluation frameworks for HPE's enterprise AI platform.

LangGraph · MCP · NVIDIA NIM · RAG · Multi-Agent · LangSmith · DeepEval · GPT-4 · Llama 2

LLM fine-tuning with QLoRA on GPU clusters, multi-region Kafka replication, NER pipelines, multi-agent memory systems, and multi-LoRA inference serving across a Kubernetes-based ML platform.

Spark · Kafka · Kubernetes · spaCy · Keras · Bi-LSTM · PyTorch · QLoRA · vLLM

Monolith-to-microservices migration, high-concurrency event-driven APIs, MCP-compliant tool registries, SQL-generating agents, and GIL/integration bottleneck resolution.

Django · FastAPI · Docker · Kubernetes · MCP · Agentic AI · Celery · Redis

High-throughput ETL/search pipelines, CDC-based vector sync, embedding drift management, synthetic data generation, inference optimization, and A/B testing frameworks.

Spark · Kafka · Debezium · Pinecone · Elasticsearch · Grafana · GPT-3.5 · NVIDIA A100

Open Source

Side Projects

Personal projects exploring agentic AI patterns, multi-agent architectures, and intelligent automation.

Agentic AI Chat Analyzer

AI-powered platform for analyzing agent chat transcripts. Performs exploratory data analysis, LLM-based summarization, and sentiment classification through an interactive Streamlit frontend.

  • Modular data pipeline (ingestion, cleaning, transformation)
  • EDA with word clouds and sentiment visualizations
  • Model caching for offline operation
FastAPI · Streamlit · HuggingFace · Flan-T5 · RoBERTa · Docker · Pandas
View on GitHub

AI Recruiter

Intelligent recruitment platform that automates candidate discovery by scanning GitHub profiles and Google Scholar to identify qualified AI/ML professionals with relevance scoring.

  • Multi-source profile analysis (GitHub + Google Scholar)
  • Relevance-based intelligence scoring for AI/ML skills
  • Geographic filtering and co-author extraction
Python · Flask · Web Scraping · Docker · NLP
View on GitHub

AI Email Agent (Supervisor Mode)

Email automation system using supervisor-pattern multi-agent architecture that categorizes emails, generates RAG-powered responses, proofreads with AI, and sends replies via Gmail.

  • Supervisor pattern for dynamic agent coordination
  • RAG-powered response generation from knowledge base
  • AI proofreading layer before sending
LangChain · LangGraph · Groq · Llama 3.3 · ChromaDB · Gmail API · FastAPI
View on GitHub
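
The supervisor pattern described for the email agent can be sketched in plain Python. This is an illustration only, not the project's LangGraph implementation: the worker functions are hypothetical stubs, the "knowledge base" is a hard-coded dictionary standing in for RAG retrieval, and `send` only sets a flag rather than calling the Gmail API.

```python
from typing import Callable, Dict

State = Dict[str, str]


def categorize(state: State) -> State:
    # Stub classifier: a real system would use an LLM here.
    text = state["email"].lower()
    state["category"] = "billing" if "invoice" in text else "general"
    return state


def draft_reply(state: State) -> State:
    # Stand-in for RAG: look up a canned answer by category.
    knowledge_base = {
        "billing": "Invoices are issued on the first of each month. ",
        "general": "Thanks for reaching out. ",
    }
    state["draft"] = knowledge_base[state["category"]]
    return state


def proofread(state: State) -> State:
    # Stub proofreader: only trims whitespace.
    state["final"] = state["draft"].strip()
    return state


def send(state: State) -> State:
    # A real system would call the Gmail API; here we only record the status.
    state["status"] = "sent"
    return state


WORKERS: Dict[str, Callable[[State], State]] = {
    "categorize": categorize,
    "draft_reply": draft_reply,
    "proofread": proofread,
    "send": send,
}


def supervisor(state: State) -> str:
    """Decide which worker runs next based on what the state still lacks."""
    if "category" not in state:
        return "categorize"
    if "draft" not in state:
        return "draft_reply"
    if "final" not in state:
        return "proofread"
    if "status" not in state:
        return "send"
    return "done"


def run(email: str) -> State:
    state: State = {"email": email}
    # The supervisor is consulted after every step, so routing stays dynamic.
    while (step := supervisor(state)) != "done":
        state = WORKERS[step](state)
    return state


if __name__ == "__main__":
    print(run("Hello, where is my invoice?"))
```

The point of the pattern is that workers never call each other; only the supervisor decides the next step, which keeps each worker independently testable.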

Simple Chatbot

Lightweight, rule-based chatbot with Gradio UI that answers questions about healthcare automation agents using fuzzy string matching -- no external LLM calls required.

  • Weighted scoring: string similarity (60%) + keyword matching (40%)
  • Confidence threshold for answer selection
  • Graceful fallback responses listing available topics
Python · Gradio · NLP · Fuzzy Matching
View on GitHub
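
The chatbot's weighted scoring can be sketched as follows. This is a minimal illustration, not the repository's code: the 60/40 weights come from the description above, while the knowledge base entries, the 0.3 confidence threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical knowledge base: topic -> (keywords, answer).
KNOWLEDGE_BASE = {
    "scheduling agent": (
        {"schedule", "appointment", "booking"},
        "The scheduling agent books and reschedules patient appointments.",
    ),
    "billing agent": (
        {"billing", "invoice", "claim"},
        "The billing agent prepares invoices and insurance claims.",
    ),
}

SIMILARITY_WEIGHT = 0.6     # from the project description
KEYWORD_WEIGHT = 0.4        # from the project description
CONFIDENCE_THRESHOLD = 0.3  # assumed value for illustration


def score(question: str, topic: str, keywords: set) -> float:
    """Combine fuzzy string similarity with the share of keywords matched."""
    q = question.lower()
    similarity = SequenceMatcher(None, q, topic).ratio()
    words = set(re.findall(r"[a-z]+", q))
    keyword_overlap = len(words & keywords) / len(keywords)
    return SIMILARITY_WEIGHT * similarity + KEYWORD_WEIGHT * keyword_overlap


def answer(question: str) -> str:
    best_topic, best_score = None, 0.0
    for topic, (keywords, _) in KNOWLEDGE_BASE.items():
        s = score(question, topic, keywords)
        if s > best_score:
            best_topic, best_score = topic, s
    # Fall back to listing the available topics when confidence is too low.
    if best_topic is None or best_score < CONFIDENCE_THRESHOLD:
        topics = ", ".join(sorted(KNOWLEDGE_BASE))
        return f"I'm not sure. I can answer questions about: {topics}."
    return KNOWLEDGE_BASE[best_topic][1]


if __name__ == "__main__":
    print(answer("How does the billing agent handle an invoice?"))
    print(answer("xyz"))
```

Because both components are normalized to the range 0 to 1, the combined score is also in that range, which makes a single confidence threshold meaningful.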

Academic Background

Education

Master of Computer Science

Santa Clara University

Aug 2019 - Jun 2020

Bachelor of Computer Science

Santa Clara University

Sep 2015 - Jun 2019

Let's Connect

Get in Touch

I'm always open to discussing new opportunities in Agentic AI, Multi-Agent Systems, and production ML engineering.