Make your AI assistants reliable: discover a modular routing architecture to reach 99.9% accuracy in function execution.
Introduction
Modern conversational AI systems can generate fluent responses, but creating dependable task-oriented assistants requires more than natural language generation capabilities. While OpenAI's function-calling interface represents significant progress by enabling large language models (LLMs) to select appropriate backend functions and populate JSON arguments, practical implementations typically achieve only 90% tool-selection accuracy. For search and e-commerce applications, each misrouted call results in lost revenue or poor user experience, making this accuracy threshold insufficient.
To achieve 99.9% reliability, we redesigned the system architecture by limiting the LLM's role to natural language understanding only. A dedicated orchestration layer manages the interaction between the model and business logic through three components:
- Intent router: Deterministic classification rules or specialized models determine which API to invoke.
- Schema reconstruction: Type validators and template-based parameter filling ensure all arguments meet syntactic and semantic requirements before API calls.
- Fallback loops: When required arguments are missing, the system automatically requests clarification from users and retries the operation.
This explicit, modular approach provides granular control, complete traceability, and the ability to correct errors easily. The resulting system maintains conversational fluidity while delivering the reliability of traditional programmatic interfaces.
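To make the pattern concrete before diving into each layer, here is a deliberately minimal, self-contained sketch of the control flow. Every helper is a toy stand-in (the names, the regex, and the intent labels are illustrative, not a prescribed API); the sections below develop each component further.

```python
import re

# Toy stand-ins: each helper is developed further in the layer-by-layer
# sections; only `condense` would need to be a real LLM call.
def condense(conversation: list[str]) -> str:
    return " ".join(conversation)  # stand-in for an LLM summarization call

def classify_intent(summary: str) -> str:
    # Intent router: a deterministic rule standing in for a small classifier.
    return "ORDER_STATUS" if "order" in summary.lower() else "PRODUCT_SEARCH"

def extract_args(intent: str, summary: str) -> dict:
    # Schema reconstruction: pull candidate parameters out of the summary.
    m = re.search(r"\b(\d{2}-[A-Z])\b", summary)  # e.g. "59-A"
    return {"order_id": m.group(1)} if m else {}

REQUIRED = {"ORDER_STATUS": {"order_id"}, "PRODUCT_SEARCH": {"category"}}

def handle_turn(conversation: list[str]) -> str:
    summary = condense(conversation)       # LLM: understand the language
    intent = classify_intent(summary)      # code: route deterministically
    args = extract_args(intent, summary)   # code: rebuild the API schema
    missing = REQUIRED[intent] - args.keys()
    if missing:                            # fallback loop: ask, don't guess
        return "Could you share the " + ", ".join(sorted(missing)) + "?"
    return f"-> calling {intent} with {args}"  # hand off to business logic

print(handle_turn(["Where is my order?", "It's 59-A"]))
# -> calling ORDER_STATUS with {'order_id': '59-A'}
```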
This article explains the pattern through a complete chatbot implementation supporting three common retail functions:
- Product recommendation: Classical product search through natural language queries.
- Order tracking: Retrieve the status of a customer's order.
- Store location: Obtain the address of the nearest store.
This framework extends to returns processing, warranty claims, personalized recommendations, and other scenarios requiring conversational interfaces with reliable API integration.
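In code, the assistant's API surface then reduces to three ordinary functions. The stubs below are hypothetical (signatures and return shapes are invented for illustration); real implementations would call the catalog, order-management, and store-locator services.

```python
# Hypothetical backend stubs for the three retail functions.
def recommend_products(category: str, color: str | None = None) -> list[dict]:
    """Product recommendation: query the catalog with structured filters."""
    return [{"sku": "BIKE-042", "name": "City bike, blue", "price": 349.0}]

def track_order(order_id: str) -> dict:
    """Order tracking: return the current status of an order."""
    return {"order_id": order_id, "status": "shipped", "eta": "tomorrow"}

def nearest_store(postal_code: str) -> dict:
    """Store location: return the closest store for a postal code."""
    return {"name": "Downtown store", "address": "1 Main St"}
```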
Enhancing e-commerce search through controlled function routing
One of the main advantages of the proposed solution is how easily it extends an existing stack: it simply plugs in between the customer interface (the web page) and the back end.

Traditional keyword search systems, including those enhanced with vector retrieval, face four fundamental challenges that impact retail performance. Controlled routing addresses each issue systematically.
- Vocabulary mismatch to semantic bridging: Customers rarely use merchants' exact product taxonomies. Terms like "fireplace", "wood-burning stove", and "chimney hearth" may reference identical product categories, but lexical search treats them as distinct entities. By processing all queries through an LLM to extract semantic meaning - producing structured output like {category:"fire_stove"} - and passing this normalized intent to catalog APIs, the system eliminates synonym variations and consistently returns relevant results.
- Long-form query processing to signal extraction: Consider the query: "I'm looking for a gift for my 18-year-old son who likes blue bikes and hiking." Bag-of-words engines assign equal weight to words like "gift", "old", and "likes" alongside critical product attributes. The controlled routing system extracts the essential information into a structured format, allowing search backends to receive clean, structured filters rather than noisy text input (a simplified extractor is sketched after this list).
- Advanced filter adoption to conversational filtering: While sophisticated users understand faceted search capabilities, most avoid complex filter interfaces. The chatbot naturally collects filter criteria through conversation: "Blue, yes. Prefer city or BMX, not mountain. He rides mostly in town." The system then executes the advanced query automatically. Users experience natural interaction, while merchandising teams receive precise analytics data.
- Dual intent resolution to unified response: When customers ask "iPhone 15 Pro water-resistance depth limits", traditional search must choose between providing specifications or suggesting products. Controlled routing handles both intents simultaneously. The orchestrator identifies both specification lookup and product recommendation needs, performs Retrieval-Augmented Generation (RAG) over documentation to provide "IP68, 6 meters for 30 minutes", and includes a product carousel with available variants. This approach satisfies multiple user needs without requiring navigation between different interfaces.
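The simplified extractor below illustrates the first two points: synonym bridging and signal extraction. A keyword table stands in for the LLM call, and the category names are invented for the example.

```python
import re

# Toy synonym table standing in for LLM-based semantic extraction.
SYNONYMS = {
    "fire_stove": ("fireplace", "wood-burning stove", "chimney hearth"),
    "bike": ("bike", "bicycle", "bmx"),
}

def normalize_query(query: str) -> dict:
    """Turn free text into the structured filters the catalog API expects."""
    text = query.lower()
    filters: dict = {}
    for category, terms in SYNONYMS.items():
        if any(term in text for term in terms):
            filters["category"] = category  # synonym variants collapse here
    if m := re.search(r"(\d+)-year-old", text):
        filters["recipient_age"] = int(m.group(1))  # signal, not noise
    for color in ("blue", "red", "black"):
        if color in text:
            filters["color"] = color
    return filters

print(normalize_query("a gift for my 18-year-old son who likes blue bikes"))
# {'category': 'bike', 'recipient_age': 18, 'color': 'blue'}
```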
Layered conversational-routing architecture
The proposed architecture replaces monolithic function-calling approaches with a layered routing pipeline that positions a lightweight, deterministic controller between the LLM and all downstream APIs. Each layer maintains a single, well-defined responsibility - intent detection, parameter completion, or business logic execution - ensuring failures remain isolated, traceable, and correctable.
In this design, the LLM's function is limited to its core strength: natural language interpretation. All decisions regarding tool selection, input completeness validation, and error recovery are managed through explicit, testable code components. This approach produces a conversational system that maintains the natural interaction quality of end-to-end LLM solutions while achieving the high reliability standards required by enterprise applications.
Intent detection layer
“Spot the subject, not the syntax”
Before executing any functional operations, the system must determine the user's actual request. This determination process consists of two stages:
- Conversation condensation: Multiple messages, clarification exchanges, and system prompts are consolidated into concise one- or two-sentence summaries, ensuring subsequent prompts remain brief, cost-effective, and contextually rich. These summaries are cached for efficiency. When users modify specific details ("actually it's order 59-A, not 59-B"), the system updates only the relevant portion of the summary rather than reprocessing the entire conversation thread.
- Intent classification: The condensed summary is processed by a lightweight classifier - implemented through few-shot prompting, small fine-tuned models, or regular expression heuristics for minimum viable products - that outputs a single label from a predefined set such as PRODUCT_SEARCH, ORDER_STATUS, or STORE_LOCATION. The bounded and deterministic nature of this task enables classifier replacement without affecting other system components and supports real-time precision monitoring through production dashboards.
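For a minimum viable product, the classifier can be exactly the regular-expression heuristics mentioned above. A sketch (the patterns and the default label are illustrative):

```python
import re

# Ordered heuristics: first match wins. A few-shot prompt or a small
# fine-tuned model can replace this function without touching anything else.
INTENT_RULES = [
    ("ORDER_STATUS",   re.compile(r"\border\b|\btracking\b|\bdeliver", re.I)),
    ("STORE_LOCATION", re.compile(r"\bstore\b|\bnearest\b|\baddress\b", re.I)),
]

def classify_intent(summary: str) -> str:
    """Map a condensed conversation summary to one label from the
    predefined set; the bounded output keeps the router testable."""
    for label, pattern in INTENT_RULES:
        if pattern.search(summary):
            return label
    return "PRODUCT_SEARCH"  # illustrative default when nothing else matches

assert classify_intent("Customer asks for the status of order 59-A") == "ORDER_STATUS"
assert classify_intent("Wants a blue city bike under 400 euros") == "PRODUCT_SEARCH"
```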
Benefits:
- Accelerated routing: Summarization reduces token requirements and response latency.
- Complete traceability: Intent labels and confidence scores are logged for every interaction turn.
- Maintenance efficiency: Routing errors are corrected by adjusting the classifier component rather than retraining the primary language model.
Request-analysis layer
“Fill in the blanks, politely”
Following intent identification, the system requires complete and valid argument sets for the corresponding API. This layer implements a dedicated sub-pipeline with three sequential stages:
- Use-case-specific extraction: Each intent uses a dedicated extraction prompt or rule set to identify candidate parameters from the conversation summary. The system prioritizes the most recent value for each field, preventing outdated information retention ("blue… no, make that red"). The modular design of these small, single-purpose prompts enables independent iteration, translation, or replacement, achieving true plug-and-play modularity.
- Completeness verification: A schema definition specifies required, optional, and mutually exclusive fields for each API call. When mandatory fields are absent, the system generates automatic follow-up queries: "I can check that order for you - could you share the order number?" This validation loop continues until all requirements are satisfied or the user terminates the interaction.
- Input validation and sanitization: Regular expressions and type validators identify malformed inputs including email addresses, phone numbers, postal codes, and order numbers. Upon validation failure, the system requests corrections using natural language rather than displaying technical error messages. Successfully processed, typed values are transmitted as structured dictionaries or data classes ready for backend consumption.
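A compact sketch of the completeness check and sanitization loop; the field names, patterns, and phrasing are illustrative:

```python
import re

# Illustrative per-intent schema: required fields and their validators.
SCHEMAS = {
    "ORDER_STATUS": {"order_id": re.compile(r"\d{2}-[A-Z]")},  # e.g. "59-A"
    "STORE_LOCATION": {"postal_code": re.compile(r"\d{5}")},
}

def validate(intent: str, candidate: dict) -> tuple[dict, list[str]]:
    """Return (clean_args, follow_up_questions); the dialogue loop keeps
    asking until the question list comes back empty."""
    clean, questions = {}, []
    for field, pattern in SCHEMAS[intent].items():
        value = str(candidate.get(field, "")).strip()
        label = field.replace("_", " ")
        if not value:  # completeness verification
            questions.append(f"Could you share the {label}?")
        elif not pattern.fullmatch(value):  # input validation
            # Natural-language correction instead of a technical error.
            questions.append(f"That {label} doesn't look right - could you double-check it?")
        else:
            clean[field] = value
    return clean, questions

print(validate("ORDER_STATUS", {"order_id": "59-A"}))  # ({'order_id': '59-A'}, [])
print(validate("ORDER_STATUS", {}))  # ({}, ['Could you share the order id?'])
```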
Benefits:
- Separation of responsibilities: Natural language processing remains within LLM capabilities while data integrity is managed through deterministic code.
- Safe extensibility: Adding new intents requires only creating a single extractor prompt rather than modifying the entire system architecture.
Business logic layer
“Execute and explain”
With intent identification and parameter validation complete, the final layer executes two primary functions efficiently:
- API orchestration: The system routes clean argument objects to appropriate microservices - product catalogs, order management systems, or store locators. This layer manages retries, timeout handling, and authentication requirements centrally, ensuring frontend code never exposes credentials or handles low-level service communication.
- Response generation: Raw JSON payloads from backend services are processed through dedicated templating LLM calls focused exclusively on natural language formatting: "Your order 59-A shipped on June 14 and is expected tomorrow." For product search operations, the layer can enhance results with structured formats including markdown tables or carousel components, providing ready-to-use output for any user interface implementation.
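Wiring it together, a sketch of the dispatch and templating step. It reuses the hypothetical backend stubs from the introduction's function list, and the render step is shown as a plain format string where a real system would make the dedicated templating LLM call.

```python
# Dispatch table: intent label -> backend call. Retries, timeouts, and
# authentication would live here, centrally, in a real implementation.
HANDLERS = {
    "ORDER_STATUS":   lambda args: track_order(**args),
    "PRODUCT_SEARCH": lambda args: recommend_products(**args),
    "STORE_LOCATION": lambda args: nearest_store(**args),
}

def execute(intent: str, args: dict) -> str:
    payload = HANDLERS[intent](args)  # API orchestration
    return render(intent, payload)    # response generation

def render(intent: str, payload) -> str:
    """Stand-in for the templating LLM call: raw JSON in, prose out."""
    if intent == "ORDER_STATUS":
        return (f"Your order {payload['order_id']} has {payload['status']} "
                f"and is expected {payload['eta']}.")
    return str(payload)  # raw payload fallback for the other intents

print(execute("ORDER_STATUS", {"order_id": "59-A"}))
# Your order 59-A has shipped and is expected tomorrow.
```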
Main benefit:
Enhanced observability: All outbound API calls and incoming payloads can be logged, monitored, or replayed without additional LLM prompting, maintaining debugging processes within standard DevOps tooling environments.
Conclusion
Large language models provide unprecedented natural language understanding capabilities, but comprehension alone does not guarantee reliable execution. In critical e-commerce applications, where individual API routing errors can result in lost transactions or failed support commitments, system reliability must approach perfection. The architecture presented shows how both objectives can be achieved simultaneously: delivering the natural, conversational interactions customers expect while maintaining the deterministic, auditable performance engineering teams require.
Through decomposition into three specialized layers (Intent Detection, Request Analysis, and Business Logic), the system restricts LLM functionality to semantic interpretation while delegating all irreversible decisions to verifiable code components. This approach delivers measurable improvements:
- Enhanced accuracy: Tests demonstrate 99.9%+ tool-selection accuracy, compared with approximately 90% for monolithic function-calling approaches.
- Accelerated development: New intents and validation rules can be added by writing a single extractor prompt and schema entry, without modifying the rest of the pipeline.
- Operational visibility: Clean logging, confidence scoring, and schema-validated payloads integrate seamlessly with existing DevOps infrastructure.
- Business impact: Complex real-world queries are efficiently resolved into precise search results, order updates, and location services, improving customer satisfaction and revenue generation.
The framework's domain-agnostic design enables broad application. Substituting "product search" with "policy lookup", "ticket creation", or "travel booking" requires no architectural modifications. Any scenario requiring conversational input to API execution benefits from controlled function routing's stabilizing influence on LLM behavior.
When developing conversational features, avoid routing all functionality through single "intelligent" prompts. Instead, leverage models for human-like meaning extraction while employing traditional code for machine-optimized tasks: deterministic decision-making, input validation, and service invocation. This approach produces conversational assistants that feel intuitive while performing predictably, transforming customer inquiries into measurable business outcomes.