ADR-0024: Multilingual SQL Intent Detection via LLM Classification

Status: Accepted
Date: 2026-02-02
Deciders: ant11 (Architect), George (Product Owner)
Technical Story: Task-SQL-TOOL — UA intent detection failing for Ukrainian questions

Context

The current detectSqlQueryIntent() method in optimized-ai.service.ts uses hardcoded English keywords to determine if a user question requires SQL generation. This approach fails for:

Ukrainian questions ("skilky hravtsiv?")
German, Spanish, French, and other languages
Semantic equivalents without exact keyword matches

Problem Statement

When a user asks "skilky hravtsiv zareyestruvalosia vchora?" (How many players registered yesterday?), the keyword-based detection returns false, causing the system to skip SQL generation entirely.
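
The failure can be reproduced with a minimal sketch of keyword matching. The keyword list below is hypothetical (the actual list in optimized-ai.service.ts is not shown in this ADR), and the function is an illustrative stand-in rather than the production method:

```typescript
// Hypothetical keyword list; the real list in optimized-ai.service.ts is not reproduced here.
const SQL_KEYWORDS: readonly string[] = ['how many', 'count', 'top', 'total', 'average', 'list'];

// Illustrative stand-in for the keyword check: true if any keyword occurs in the question.
function detectSqlQueryIntentByKeywords(question: string): boolean {
  const normalized = question.toLowerCase();
  return SQL_KEYWORDS.some((keyword) => normalized.includes(keyword));
}

// The English question matches "how many"; the Ukrainian one matches nothing.
const english = detectSqlQueryIntentByKeywords('How many players registered yesterday?');
const ukrainian = detectSqlQueryIntentByKeywords('skilky hravtsiv zareyestruvalosia vchora?');
```

Both questions carry the same intent, yet only the English one is detected, and the miss produces no error or log entry.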

Decision Drivers

Scalability — Must support 100+ languages without maintaining keyword lists
Maintainability — Minimize ongoing keyword curation effort
Latency — Keep response time under 3 seconds
Cost — Minimize additional LLM calls
Reliability — Graceful degradation on LLM failures

Considered Options

Option A: Expand Keyword Lists

Add Ukrainian keywords to the existing detection method.

Pros:

Zero latency impact
No additional LLM calls

Cons:

Requires manual curation for each language
Doesn't scale (100+ languages × 50+ keywords = 5000+ entries)
Missing keywords cause silent failures

Option B: Pre-Translation Layer

Translate user question to English, then apply existing keyword detection.

Pros:

Leverages existing English keywords
Works for any language

Cons:

+1 LLM call per question (+200-500ms latency)
Translation errors compound
Additional cost

Option C: LLM Classification (Selected)

Use the LLM's native multilingual capabilities to classify intent directly.

Pros:

Zero keyword maintenance
Works for any language the LLM understands
Single LLM call with classification prompt
Leverages existing SQL generation infrastructure

Cons:

Requires async refactoring of detectSqlQueryIntent()
Needs a fallback mechanism for LLM failures
Slightly higher complexity

Decision

Option C: LLM Classification — Use LLM-based intent classification with English keyword fallback.

Weighted Evaluation

Criteria	Weight	A (Keywords)	B (Translation)	C (LLM)
Scalability	3	1	2	3
Maintainability	3	1	2	3
Latency	2	3	1	2
Cost	1	3	1	2
Reliability	2	2	2	2
Total (weighted mean)		1.7	1.7	2.5
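
The ADR does not state how the Total row is derived. One plausible reading, assumed here, is a weight-normalized mean of the per-criterion scores. The sketch below recomputes the totals under that assumption so the ranking can be checked; the type and function names are illustrative only:

```typescript
type OptionId = 'A' | 'B' | 'C';

interface Criterion {
  name: string;
  weight: number;
  scores: Record<OptionId, number>;
}

// Weights and scores copied from the evaluation table.
const criteria: Criterion[] = [
  { name: 'Scalability', weight: 3, scores: { A: 1, B: 2, C: 3 } },
  { name: 'Maintainability', weight: 3, scores: { A: 1, B: 2, C: 3 } },
  { name: 'Latency', weight: 2, scores: { A: 3, B: 1, C: 2 } },
  { name: 'Cost', weight: 1, scores: { A: 3, B: 1, C: 2 } },
  { name: 'Reliability', weight: 2, scores: { A: 2, B: 2, C: 2 } },
];

// Assumed formula: sum(weight * score) / sum(weight).
function weightedMean(option: OptionId): number {
  const totalWeight = criteria.reduce((sum, c) => sum + c.weight, 0);
  const weightedSum = criteria.reduce((sum, c) => sum + c.weight * c.scores[option], 0);
  return weightedSum / totalWeight;
}

const totals = { A: weightedMean('A'), B: weightedMean('B'), C: weightedMean('C') };
```

Under this formula A and B both come to 19/11 and C to 28/11, so Option C ranks first.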

Implementation

Phase 1: Create async classification method

async classifySqlIntent(question: string): Promise<{
  requiresSql: boolean;
  confidence: number;
  reasoning?: string;
}> {
  const prompt = `Classify whether this question requires a database SQL query.
  
Question: "${question}"

Respond with JSON:
{"requiresSql": true/false, "confidence": 0.0-1.0, "reasoning": "brief explanation"}

Examples:
- "How many players registered yesterday?" → true
- "skilky hravtsiv?" → true  
- "What are the top 10 games?" → true
- "Hello, how are you?" → false
- "What can you do?" → false`;

  const result = await this.chatWithModel([
    { role: 'system', content: 'You are a question classifier. Output only valid JSON.' },
    { role: 'user', content: prompt }
  ], {
    model: 'gemini-3-flash-preview', // fallback: gemini-flash-latest (NEVER use <2.5)
    temperature: 0,
    maxTokens: 100
  });
  
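  // Parse and validate the model output.
  // Assumption: chatWithModel() resolves to the raw response text.
  const parsed = JSON.parse(result);
  if (typeof parsed.requiresSql !== 'boolean' || typeof parsed.confidence !== 'number') {
    // Throwing here lets the caller fall back to keyword detection.
    throw new Error('Malformed classification response');
  }
  return parsed;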
}
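
The parsing step of classifySqlIntent() could be factored into a standalone helper that tolerates extra text around the JSON. The following is a minimal sketch, not the project's implementation: parseSqlIntentResponse is a hypothetical name, and it assumes the model response is available as a plain string.

```typescript
interface SqlIntentClassification {
  requiresSql: boolean;
  confidence: number;
  reasoning?: string;
}

// Hypothetical helper: extracts and validates the JSON object from raw model text.
function parseSqlIntentResponse(raw: string): SqlIntentClassification {
  // Models sometimes wrap JSON in prose or a code fence; keep only the outermost {...}.
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in classifier response');
  }
  const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('Classifier response is not a JSON object');
  }
  const { requiresSql, confidence, reasoning } = parsed as Record<string, unknown>;
  if (typeof requiresSql !== 'boolean') {
    throw new Error('Field "requiresSql" must be a boolean');
  }
  if (typeof confidence !== 'number' || Number.isNaN(confidence)) {
    throw new Error('Field "confidence" must be a number');
  }
  return {
    requiresSql,
    // Clamp to [0, 1] so threshold comparisons behave predictably.
    confidence: Math.min(1, Math.max(0, confidence)),
    // Include reasoning only when the model supplied a string.
    ...(typeof reasoning === 'string' ? { reasoning } : {}),
  };
}
```

Throwing on malformed output, rather than returning a default, keeps the decision about what to do next in the caller, where the keyword fallback lives.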

Phase 2: Add fallback mechanism

async classifySqlIntentWithFallback(question: string): Promise<boolean> {
  try {
    const result = await this.classifySqlIntent(question);
    if (result.confidence > 0.7) {
      return result.requiresSql;
    }
    // Low confidence → fall back to keywords
    return this.detectSqlQueryIntentByKeywords(question);
  } catch (error) {
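    // Include the failure reason so fallbacks can be diagnosed from the logs.
    const reason = error instanceof Error ? error.message : String(error);
    this.logger.warn(`LLM classification failed (${reason}), using keyword fallback`);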
    return this.detectSqlQueryIntentByKeywords(question);
  }
}
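
The try/catch above only reaches the fallback if the LLM call actually rejects. A call that hangs would stall intent detection instead. One way to bound it, sketched here under assumptions, is to race the call against a timer. Both withTimeout and classifyOrFallback are hypothetical helpers, and the timeout budget is left to the caller because this ADR does not specify one:

```typescript
// Hypothetical helper: rejects if `task` does not settle within `ms` milliseconds.
function withTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared in either case.
  return Promise.race([task, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Uses the classifier result if it arrives in time, otherwise the synchronous fallback.
async function classifyOrFallback(
  classify: () => Promise<boolean>,
  fallback: () => boolean,
  timeoutMs: number,
): Promise<boolean> {
  try {
    return await withTimeout(classify(), timeoutMs);
  } catch {
    return fallback();
  }
}
```

Note that Promise.race does not cancel the underlying request. The slow call keeps running in the background unless the LLM client supports cancellation.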

Phase 3: Refactor callers

Update all callers to use the async version:

// Before
if (this.detectSqlQueryIntent(question)) { ... }

// After  
if (await this.classifySqlIntentWithFallback(question)) { ... }

Consequences

Positive

Multilingual support — Works for any language without keyword lists
Semantic understanding — Catches paraphrased SQL questions
Reduced maintenance — No keyword curation required
Future-proof — New languages automatically supported

Negative

Async refactoring — All callers must be updated
LLM dependency — Intent detection now requires LLM availability
Latency — +50-100ms for classification call (mitigated by fast model)

Neutral

English keywords retained — As fallback mechanism
Testing complexity — Need mocks for LLM classification tests

Verification

Unit tests for classifySqlIntent() with multilingual examples
Integration test: Ukrainian question → SQL generated → results returned
Latency benchmark: measure classification overhead
Failure mode test: LLM timeout → fallback to keywords
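
The fallback paths can be exercised without a live LLM by injecting fakes. The sketch below mirrors the Phase 2 logic in a standalone function. classifyWithFallback, the fake classifiers, and the default threshold parameter are illustrative assumptions, not the service's actual API:

```typescript
interface Classification {
  requiresSql: boolean;
  confidence: number;
}

type Classifier = (question: string) => Promise<Classification>;

// Mirrors the Phase 2 logic, with the LLM call and keyword check injected so they can be faked.
async function classifyWithFallback(
  question: string,
  classify: Classifier,
  keywordFallback: (question: string) => boolean,
  threshold = 0.7,
): Promise<boolean> {
  try {
    const result = await classify(question);
    if (result.confidence > threshold) {
      return result.requiresSql;
    }
  } catch {
    // LLM failure: fall through to the keyword check below.
  }
  return keywordFallback(question);
}

// Runs the three paths: confident answer, low confidence, and LLM failure.
async function runFallbackChecks(): Promise<boolean[]> {
  const keywordsSayNo = () => false;
  const confident: Classifier = async () => ({ requiresSql: true, confidence: 0.95 });
  const unsure: Classifier = async () => ({ requiresSql: true, confidence: 0.4 });
  const failing: Classifier = async () => {
    throw new Error('LLM timeout');
  };
  const question = 'skilky hravtsiv zareyestruvalosia vchora?';
  return Promise.all([
    classifyWithFallback(question, confident, keywordsSayNo),
    classifyWithFallback(question, unsure, keywordsSayNo),
    classifyWithFallback(question, failing, keywordsSayNo),
  ]);
}
```

Only the confident case uses the LLM's answer. In the other two cases a non-English question still depends on the English keyword list, which is worth covering explicitly in the test suite.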

References

../../agent_ops/OUTBOX/task_sql_tool_implementation.md
Gemini 2.0 Flash Docs