About the Role
You will build the AI layer that makes our platform work: agents that reason about enterprise data, pipelines that process it reliably, and evaluation systems that catch failures before they reach production. This is not prompt wrapping or demo engineering. It is production AI at enterprise scale, in compliance-heavy environments, on data that is messy by design.
The Problem You’ll Own:
LLMs are powerful. They are also brittle in ways that matter enormously when a model is making decisions about a multi-million-dollar purchase order, a patient record, or a financial transaction. Hallucinations, context failures, retrieval mismatches, and inconsistent outputs are not acceptable in our use cases.
Your job is to build the systems that make AI reliable in exactly these environments: validation agents that reason about business rules, evaluation pipelines that catch regressions, and observability tooling that tells you something has gone wrong before the customer notices.
What You’ll Do:
+ Build and deploy LLM-powered agents for enterprise data validation: reading specs, reasoning about business rules, identifying failure modes, and generating structured outputs
+ Design and own evaluation frameworks: automated test suites, LLM-as-judge pipelines, regression detection, and benchmarks that track whether our agents are improving
+ Build RAG pipelines that work reliably on real enterprise data: messy schemas, inconsistent formats, and mixed structured and unstructured content
+ Integrate AI systems with enterprise infrastructure (SAP, Snowflake, Databricks, Postgres, REST APIs) with attention to latency, data residency, and compliance
+ Design agentic workflows with tool use, multi-step reasoning, and deterministic guardrails
+ Build observability tooling: trace agent reasoning, track output reliability, and detect hallucinations or drift in production
+ Work directly with forward-deployed software engineers (FDSEs) to understand real deployment failures and translate them into system improvements
The Stack:
+ Languages: Python (primary), Go, Node.js
+ AI/ML: LLMs (Claude, GPT-4, Command R+), RAG, vector databases, embeddings, fine-tuning
+ Evaluation: LLM-as-judge, automated eval pipelines, custom benchmarks
+ Data: Snowflake, Databricks, Postgres
+ Infra: containers, Kubernetes / ECS / Cloud Run
+ Tools: LangChain, LlamaIndex, OpenAI / Anthropic APIs, LangSmith
Compensation & Logistics:
Salary: INR 30 to 45 lakh (mid-level) / INR 70 lakh to 1 crore+ (senior), depending on experience
Equity: Early-stage equity grant
Requirements
Production builder: you’ve shipped LLM-powered features real users depend on and debugged them when they broke
LLM practitioner: you understand hallucinations, retrieval failures, context limits, and what it takes to make agents deterministic enough for enterprise use
Systems thinker: you design for latency, failure modes, retry logic, and observability before you build features
Enterprise-aware: data residency, compliance, audit trails, and deterministic guardrails are first-class design constraints for you
Background That Maps Well:
3+ years in AI/ML or backend engineering with strong AI exposure
Hands-on production experience with LLM APIs (Anthropic, OpenAI, Cohere)
Experience designing evaluation frameworks: automated evals, regression tests, or LLM-as-judge pipelines
Strong Python; experience with LangChain, LlamaIndex, or similar agentic frameworks
Familiarity with RAG architectures: chunking, embedding models, vector DBs, retrieval quality
About the Company
We’re building systems that continuously validate data and business processes across large enterprise environments. Enterprises run on multiple systems: ERP (e.g., SAP), APIs, internal tools, and data platforms (Databricks, Snowflake, Postgres). Inconsistencies in data, whether they come from external vendors, internal processes, or data migrations, break workflows. When AI is layered on top, those failures scale.
We build the layer that:
+ Prevents inconsistent data entry
+ Detects inconsistencies across systems
+ Validates business logic in real time
+ Enables AI-driven workflows to run safely and reliably
We’re already live at a Fortune 100 AI company and are launching at Fortune 500-scale companies in healthcare and financial services.
