Our client is seeking a Senior AI Engineer (hybrid, Toronto, 3 days onsite) responsible for building and scaling agentic AI and LLM-powered systems in production, with a focus on reliability, safety, governance, automation, and measurable business impact.
In this role, you will design, deploy, and optimize production-grade agentic AI solutions that integrate with operational and engineering ecosystems. The position focuses on tool-calling agents, LLM orchestration, RAG pipelines, evaluation frameworks, and governance controls that improve diagnostics, automation, and operational resilience.
Required skills and experience
- 5+ years of software development experience in one or more languages such as Python, C/C++, Go, or Java.
- Strong hands-on experience building and maintaining large-scale Python applications.
- 3+ years of experience designing, architecting, testing, and deploying production ML systems.
- Experience with model deployment and serving, evaluation and monitoring, data processing pipelines, and model fine-tuning workflows.
- Practical experience with Large Language Models, including API integration, prompt engineering, fine-tuning or adaptation, and applications using RAG and tool-using agents.
- Experience with vector retrieval, function calling, and secure tool execution.
- Understanding of the capabilities of commercial and open-source LLMs, including model families such as OpenAI's GPT, Gemini, Llama, Qwen, and Claude.
- Strong foundation in applied statistics, machine learning concepts, algorithms, and data structures.
- Excellent analytical and problem-solving skills with a strong sense of ownership and urgency.
- Ability to communicate complex ideas clearly and collaborate effectively across global teams.
Preferred skills
- Proficiency in building and operating cloud-based infrastructure, ideally within AWS.
- Experience with containerized services such as ECS and EKS.
- Experience with serverless architecture such as Lambda.
- Familiarity with data services including S3, DynamoDB, and Redshift.
- Experience with orchestration tools such as Step Functions.
- Experience with model serving platforms such as SageMaker.
- Experience with infrastructure-as-code tools such as Terraform and CloudFormation.
Technology environment
Python, C/C++, Go, Java, LLMs, RAG, vector retrieval, function calling, secure tool execution, AWS, ECS, EKS, Lambda, S3, DynamoDB, Redshift, Step Functions, SageMaker, Terraform, CloudFormation
Key responsibilities
- Design and build agentic AI systems using tool-calling agents, retrieval, structured reasoning, and secure action execution aligned with the Model Context Protocol (MCP).
- Develop and enforce guardrails for safety, compliance, policy adherence, and least-privilege access.
- Productionize LLM-based solutions by building evaluation frameworks for open-source and foundation models.
- Implement retrieval pipelines, prompt engineering strategies, response validation, and self-correction mechanisms for production use cases.
- Integrate AI agents with observability, incident management, and deployment platforms to support automated diagnostics, runbook execution, remediation, and post-incident summarization with full traceability.
- Partner with production engineers and application teams to translate operational challenges into agentic AI roadmaps and business-aligned solutions.
- Define objective functions tied to reliability, risk reduction, cost optimization, and measurable operational outcomes.
- Build safety and governance controls including validator models, adversarial testing, policy checks, deterministic fallbacks, circuit breakers, and rollback strategies.
- Establish continuous evaluation practices to measure usefulness, correctness, reliability, and risk.
- Optimize system performance, latency, and cost through prompt engineering, context management, caching, model routing, and model distillation.
- Leverage batching, streaming, and parallel tool-calling approaches to meet production SLOs under real-world load.
- Build and maintain RAG pipelines, including domain knowledge curation, data quality validation, feedback loops, milestone frameworks, and knowledge freshness processes.
- Drive strong engineering standards through design reviews, experimentation rigor, and mentorship on agent architecture, evaluation methodologies, and safe deployment patterns.
#LI-TF1