# ADR-0024: Multilingual SQL Intent Detection via LLM Classification
**Status:** Accepted
**Date:** 2026-02-02
**Deciders:** ant11 (Architect), George (Product Owner)
**Technical Story:** Task-SQL-TOOL — UA intent detection failing for Ukrainian questions

## Context
The current `detectSqlQueryIntent()` method in `optimized-ai.service.ts` uses a hardcoded list of English keywords to decide whether a user question requires SQL generation. This approach fails for:
- Ukrainian questions ("skilky hravtsiv?", i.e. "how many players?")
- German, Spanish, French, and other languages
- Semantic equivalents without exact keyword matches
## Problem Statement

When a user asks "skilky hravtsiv zareyestruvalosia vchora?" ("How many players registered yesterday?"), the keyword-based detection returns false, and the system skips SQL generation entirely.
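The failure mode can be illustrated with a minimal sketch of the keyword approach; the keyword list and function name below are illustrative, not the actual `optimized-ai.service.ts` implementation:

```typescript
// Illustrative reconstruction of the keyword-based detection described above.
const SQL_KEYWORDS = ["how many", "count", "top", "list", "average", "registered"];

function detectSqlQueryIntentByKeywords(question: string): boolean {
  const q = question.toLowerCase();
  return SQL_KEYWORDS.some((kw) => q.includes(kw));
}

// English phrasing matches a keyword:
detectSqlQueryIntentByKeywords("How many players registered yesterday?"); // true
// The Ukrainian equivalent matches nothing, so SQL generation is skipped:
detectSqlQueryIntentByKeywords("skilky hravtsiv zareyestruvalosia vchora?"); // false
```

No amount of English keyword tuning changes the second result; only per-language lists (Option A) or a language-agnostic classifier (Options B/C) can.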
## Decision Drivers
- Scalability — Must support 100+ languages without maintaining keyword lists
- Maintainability — Minimize ongoing keyword curation effort
- Latency — Keep response time under 3 seconds
- Cost — Minimize additional LLM calls
- Reliability — Graceful degradation on LLM failures
## Considered Options

### Option A: Expand Keyword Lists

Add Ukrainian keywords to the existing detection method.
Pros:
- Zero latency impact
- No additional LLM calls
Cons:
- Requires manual curation for each language
- Doesn't scale (100+ languages × 50+ keywords = 5000+ entries)
- Missing keywords cause silent failures
### Option B: Pre-Translation Layer

Translate the user question to English, then apply the existing keyword detection.
Pros:
- Leverages existing English keywords
- Works for any language
Cons:
- +1 LLM call per question (+200-500ms latency)
- Translation errors compound
- Additional cost
### Option C: LLM Classification (Selected)

Use the LLM's native multilingual capabilities to classify intent directly.
Pros:
- Zero keyword maintenance
- Works for any language the LLM understands
- Single LLM call with classification prompt
- Leverages existing SQL generation infrastructure
Cons:
- Requires async refactoring of `detectSqlQueryIntent()`
- Needs a fallback mechanism for LLM failures
- Slightly higher complexity
## Decision

**Option C: LLM Classification** — use LLM-based intent classification with an English keyword fallback.
## Weighted Evaluation

| Criteria | Weight | A (Keywords) | B (Translation) | C (LLM) |
|---|---|---|---|---|
| Scalability | 3 | 1 | 2 | 3 |
| Maintainability | 3 | 1 | 2 | 3 |
| Latency | 2 | 3 | 1 | 2 |
| Cost | 1 | 3 | 1 | 2 |
| Reliability | 2 | 2 | 2 | 2 |
| **Total** | | 1.8 | 1.9 | 2.6 |
## Implementation

### Phase 1: Create async classification method

```typescript
async classifySqlIntent(question: string): Promise<{
  requiresSql: boolean;
  confidence: number;
  reasoning?: string;
}> {
  const prompt = `Classify whether this question requires a database SQL query.
Question: "${question}"
Respond with JSON:
{"requiresSql": true/false, "confidence": 0.0-1.0, "reasoning": "brief explanation"}
Examples of SQL-required questions:
- "How many players registered yesterday?" → true
- "skilky hravtsiv?" → true
- "What are the top 10 games?" → true
- "Hello, how are you?" → false
- "What can you do?" → false`;

  const result = await this.chatWithModel([
    { role: 'system', content: 'You are a question classifier. Output only valid JSON.' },
    { role: 'user', content: prompt }
  ], {
    model: 'gemini-3-flash-preview', // fallback: gemini-flash-latest (NEVER use <2.5)
    temperature: 0,
    maxTokens: 100
  });

  // Parse and return (assumes chatWithModel returns the raw completion text)
  return JSON.parse(result);
}
```
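The parse step above is fragile if the model wraps its JSON in markdown fences, which LLMs frequently do even when told to output only JSON. A more defensive sketch — `parseIntentResponse` is a hypothetical helper, not part of the existing service:

```typescript
interface IntentResult {
  requiresSql: boolean;
  confidence: number;
  reasoning?: string;
}

// Hypothetical helper: strip optional markdown fences, parse, and validate the
// shape so that anything malformed throws and the caller can fall back to keywords.
function parseIntentResponse(raw: string): IntentResult {
  const cleaned = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, "")
    .replace(/```\s*$/, "");
  const parsed = JSON.parse(cleaned);
  if (typeof parsed.requiresSql !== "boolean" || typeof parsed.confidence !== "number") {
    throw new Error("Malformed classification response");
  }
  return parsed as IntentResult;
}
```

Throwing on malformed output (rather than defaulting to `false`) matters here: it routes the question into the keyword fallback in Phase 2 instead of silently skipping SQL generation.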
### Phase 2: Add fallback mechanism

```typescript
async classifySqlIntentWithFallback(question: string): Promise<boolean> {
  try {
    const result = await this.classifySqlIntent(question);
    if (result.confidence > 0.7) {
      return result.requiresSql;
    }
    // Low confidence → fall back to keywords
    return this.detectSqlQueryIntentByKeywords(question);
  } catch (error) {
    this.logger.warn('LLM classification failed, using keyword fallback');
    return this.detectSqlQueryIntentByKeywords(question);
  }
}
```
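The `catch` branch only fires if the LLM call rejects; a hung request would still block the response and blow the 3-second latency budget. One way to bound it is a timeout race — `withTimeout` and the 2000 ms budget below are assumptions, not part of the service:

```typescript
// Hypothetical sketch: reject the classification promise after `ms` milliseconds
// so a hung LLM request degrades to the keyword path instead of blocking.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    promise,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error(`LLM classification timed out after ${ms}ms`)), ms)
    ),
  ]);
}

// Usage inside classifySqlIntentWithFallback:
// const result = await withTimeout(this.classifySqlIntent(question), 2000);
```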
### Phase 3: Refactor callers

Update all callers to use the async version:

```typescript
// Before
if (this.detectSqlQueryIntent(question)) { ... }

// After
if (await this.classifySqlIntentWithFallback(question)) { ... }
```
## Consequences

### Positive
- Multilingual support — Works for any language without keyword lists
- Semantic understanding — Catches paraphrased SQL questions
- Reduced maintenance — No keyword curation required
- Future-proof — New languages automatically supported
### Negative
- Async refactoring — All callers must be updated
- LLM dependency — Intent detection now requires LLM availability
- Latency — +50-100ms for classification call (mitigated by fast model)
### Neutral
- English keywords retained — As fallback mechanism
- Testing complexity — Need mocks for LLM classification tests
## Verification

- Unit tests for `classifySqlIntent()` with multilingual examples
- Integration test: Ukrainian question → SQL generated → results returned
- Latency benchmark: measure classification overhead
- Failure mode test: LLM timeout → fallback to keywords
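The failure-mode check can be sketched as a self-contained harness; the `IntentClassifier` wrapper and fake client below are test scaffolding, not the real service:

```typescript
// Minimal stand-in for the LLM call: takes a prompt, returns raw completion text.
type LlmCall = (prompt: string) => Promise<string>;

// Test scaffolding mirroring the fallback logic of classifySqlIntentWithFallback.
class IntentClassifier {
  constructor(private llm: LlmCall, private keywords: string[]) {}

  async classifyWithFallback(question: string): Promise<boolean> {
    try {
      const raw = await this.llm(question);
      return JSON.parse(raw).requiresSql === true;
    } catch {
      // LLM failed or returned garbage: fall back to keyword matching.
      const q = question.toLowerCase();
      return this.keywords.some((kw) => q.includes(kw));
    }
  }
}

async function run() {
  // Fake client that always fails, simulating an LLM timeout.
  const failingLlm: LlmCall = async () => { throw new Error("timeout"); };
  const classifier = new IntentClassifier(failingLlm, ["how many"]);
  const result = await classifier.classifyWithFallback("How many players?");
  console.assert(result === true, "keyword fallback should fire");
}
run();
```

The same harness covers the low-confidence path by swapping in a fake client that returns `{"requiresSql": true, "confidence": 0.2}`.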
## References

- ../../agent_ops/OUTBOX/task_sql_tool_implementation.md
- Gemini 2.0 Flash Docs