ADR-0024: Multilingual SQL Intent Detection via LLM Classification

Status: Accepted
Date: 2026-02-02
Deciders: ant11 (Architect), George (Product Owner)
Technical Story: Task-SQL-TOOL — UA intent detection failing for Ukrainian questions

Context

The current detectSqlQueryIntent() method in optimized-ai.service.ts uses hardcoded English keywords to determine if a user question requires SQL generation. This approach fails for:

  • Ukrainian questions ("skilky hravtsiv?", i.e. "how many players?")
  • German, Spanish, French, and other languages
  • Semantic equivalents without exact keyword matches

Problem Statement

When a user asks "skilky hravtsiv zareyestruvalosia vchora?" (How many players registered yesterday?), the keyword-based detection returns false, causing the system to skip SQL generation entirely.
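The failure is easy to reproduce with any substring-based matcher. The sketch below is a minimal stand-in, not the actual detectSqlQueryIntent() code, which this ADR does not reproduce. The keyword list and the function name detectByKeywords are hypothetical.

```typescript
// Hypothetical subset of English keywords; the real list in
// optimized-ai.service.ts is not shown in this ADR.
const SQL_KEYWORDS: readonly string[] = ['how many', 'count', 'top', 'total', 'registered'];

// Keyword matching on the lowercased question: any hit means "needs SQL".
function detectByKeywords(question: string): boolean {
  const q = question.toLowerCase();
  return SQL_KEYWORDS.some((kw) => q.includes(kw));
}

console.log(detectByKeywords('How many players registered yesterday?')); // true
console.log(detectByKeywords('skilky hravtsiv zareyestruvalosia vchora?')); // false
```

The Ukrainian question asks exactly the same thing, but it shares no substring with the English list. Catching it would mean adding Ukrainian keywords, and then repeating that for every other language.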

Decision Drivers

  1. Scalability — Must support 100+ languages without maintaining keyword lists
  2. Maintainability — Minimize ongoing keyword curation effort
  3. Latency — Keep response time under 3 seconds
  4. Cost — Minimize additional LLM calls
  5. Reliability — Graceful degradation on LLM failures

Considered Options

Option A: Expand Keyword Lists

Add Ukrainian keywords to the existing detection method.

Pros:

  • Zero latency impact
  • No additional LLM calls

Cons:

  • Requires manual curation for each language
  • Doesn't scale (100+ languages × 50+ keywords = 5000+ entries)
  • Missing keywords cause silent failures

Option B: Pre-Translation Layer

Translate user question to English, then apply existing keyword detection.

Pros:

  • Leverages existing English keywords
  • Works for any language

Cons:

  • +1 LLM call per question (+200-500ms latency)
  • Translation errors propagate into keyword matching
  • Additional cost

Option C: LLM Classification (Selected)

Use the LLM's native multilingual capabilities to classify intent directly.

Pros:

  • Zero keyword maintenance
  • Works for any language the LLM understands
  • Only one additional, lightweight LLM call (short classification prompt, output capped at 100 tokens)
  • Leverages existing SQL generation infrastructure

Cons:

  • Requires async refactoring of detectSqlQueryIntent()
  • Need fallback mechanism for LLM failures
  • Slightly higher complexity

Decision

Option C: LLM Classification — Use LLM-based intent classification with English keyword fallback.

Weighted Evaluation

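Criteria          Weight   A (Keywords)   B (Translation)   C (LLM)
Scalability       3        1              2                 3
Maintainability   3        1              2                 3
Latency           2        3              1                 2
Cost              1        3              1                 2
Reliability       2        2              2                 2
Total                      1.73           1.73              2.55

Scores run from 1 (worst) to 3 (best). Total = Σ(weight × score) / Σ weight, where Σ weight = 11.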
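The sketch below computes the weighted mean, Σ(weight × score) / Σ weight, using the weights and scores from the table. It is handy for re-checking the totals whenever a score or weight changes. The type and function names are illustrative only.

```typescript
// Criterion weights and per-option scores (higher is better),
// copied from the Weighted Evaluation table.
type Option = 'A' | 'B' | 'C';

interface Criterion {
  name: string;
  weight: number;
  scores: Record<Option, number>;
}

const criteria: Criterion[] = [
  { name: 'Scalability', weight: 3, scores: { A: 1, B: 2, C: 3 } },
  { name: 'Maintainability', weight: 3, scores: { A: 1, B: 2, C: 3 } },
  { name: 'Latency', weight: 2, scores: { A: 3, B: 1, C: 2 } },
  { name: 'Cost', weight: 1, scores: { A: 3, B: 1, C: 2 } },
  { name: 'Reliability', weight: 2, scores: { A: 2, B: 2, C: 2 } },
];

// Weighted mean: sum(weight * score) / sum(weight).
function weightedTotal(option: Option): number {
  const totalWeight = criteria.reduce((acc, c) => acc + c.weight, 0);
  const weighted = criteria.reduce((acc, c) => acc + c.weight * c.scores[option], 0);
  return weighted / totalWeight;
}

for (const opt of ['A', 'B', 'C'] as const) {
  console.log(opt, weightedTotal(opt).toFixed(2));
}
// prints, one per line: A 1.73, B 1.73, C 2.55
```

A and B come out tied (19/11). C leads (28/11) because it scores highest on the two most heavily weighted criteria.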

Implementation

Phase 1: Create async classification method

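async classifySqlIntent(question: string): Promise<{
  requiresSql: boolean;
  confidence: number;
  reasoning?: string;
}> {
  // JSON.stringify quotes and escapes the user's text, so embedded quotes
  // or newlines cannot break the prompt structure.
  const prompt = `Classify whether this question requires a database SQL query.

Question: ${JSON.stringify(question)}

Respond with JSON only:
{"requiresSql": true/false, "confidence": 0.0-1.0, "reasoning": "brief explanation"}

Examples:
- "How many players registered yesterday?" → true
- "skilky hravtsiv?" → true
- "What are the top 10 games?" → true
- "Hello, how are you?" → false
- "What can you do?" → false`;

  // Assumption: chatWithModel resolves to the model's raw text reply.
  // Adapt this line if it returns a response object instead.
  const raw: string = await this.chatWithModel([
    { role: 'system', content: 'You are a question classifier. Output only valid JSON.' },
    { role: 'user', content: prompt }
  ], {
    model: 'gemini-3-flash-preview', // fallback: gemini-flash-latest (NEVER use <2.5)
    temperature: 0,
    maxTokens: 100
  });

  // Some models wrap JSON in Markdown code fences despite instructions; strip them.
  const json = raw.trim().replace(/^`{3}(?:json)?\s*|\s*`{3}$/g, '');
  // JSON.parse throws on malformed output; the caller treats that as a failure.
  const parsed = JSON.parse(json) as Record<string, unknown>;

  if (typeof parsed.requiresSql !== 'boolean' || typeof parsed.confidence !== 'number') {
    throw new Error(`Unexpected classifier output: ${raw}`);
  }
  return {
    requiresSql: parsed.requiresSql,
    // Clamp in case the model returns a value outside 0..1.
    confidence: Math.min(Math.max(parsed.confidence, 0), 1),
    reasoning: typeof parsed.reasoning === 'string' ? parsed.reasoning : undefined
  };
}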

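Phase 2: Add fallback mechanism

The existing English keyword logic is kept as detectSqlQueryIntentByKeywords(). It is used whenever the LLM call fails or returns a confidence of 0.7 or lower.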

async classifySqlIntentWithFallback(question: string): Promise<boolean> {
  try {
    const result = await this.classifySqlIntent(question);
    if (result.confidence > 0.7) {
      return result.requiresSql;
    }
    // Low confidence → fall back to English keyword detection
    return this.detectSqlQueryIntentByKeywords(question);
  } catch (error) {
    // Any error from classifySqlIntent (LLM call failure, malformed JSON) lands here
    const reason = error instanceof Error ? error.message : String(error);
    this.logger.warn(`LLM classification failed (${reason}), using keyword fallback`);
    return this.detectSqlQueryIntentByKeywords(question);
  }
}
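The fallback above only fires when classifySqlIntent() throws. If the LLM call simply hangs, the 3-second budget is blown before the keyword path ever runs, yet Verification item 4 expects a timeout to trigger that path. One way to get this behavior is to race the classification against a timer. The sketch below is a standalone variant with the classifier and the keyword detector injected as functions. The names withTimeout and classifyWithTimeoutFallback, and the 1500 ms default, are hypothetical and not part of the existing service.

```typescript
interface SqlIntent {
  requiresSql: boolean;
  confidence: number;
  reasoning?: string;
}

// Rejects if `promise` has not settled within `ms` milliseconds.
// The timer is cleared either way so it does not keep the process alive.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical standalone variant of classifySqlIntentWithFallback():
// use the LLM answer if it arrives in time and is confident, else keywords.
async function classifyWithTimeoutFallback(
  question: string,
  classify: (q: string) => Promise<SqlIntent>,
  byKeywords: (q: string) => boolean,
  timeoutMs = 1500,
  minConfidence = 0.7,
): Promise<boolean> {
  try {
    const result = await withTimeout(classify(question), timeoutMs);
    return result.confidence > minConfidence ? result.requiresSql : byKeywords(question);
  } catch {
    // Timeout, transport error, or unparseable output: degrade to keywords.
    return byKeywords(question);
  }
}
```

Promise.race only stops waiting. It does not cancel the underlying request, so a timed-out LLM call still completes (and is still billed) unless chatWithModel supports cancellation, for example via an AbortSignal.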

Phase 3: Refactor callers

Update all callers to await the async version. Each calling function must itself become async:

// Before
if (this.detectSqlQueryIntent(question)) { ... }

// After  
if (await this.classifySqlIntentWithFallback(question)) { ... }
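In a plain if condition, TypeScript 4.3+ with strictNullChecks already reports a Promise that is always truthy. Callbacks are the riskier spot. Array.prototype.filter, for example, does not await its predicate, and a pending Promise is truthy, so every element is kept without any compiler error. The sketch below uses a hypothetical stub classifier (needsSql) to show the bug and one fix.

```typescript
// Hypothetical stand-in for classifySqlIntentWithFallback():
// only questions mentioning "players" need SQL.
async function needsSql(question: string): Promise<boolean> {
  return question.includes('players');
}

const questions = ['How many players registered yesterday?', 'Hello, how are you?'];

// Wrong: filter() does not await, and a pending Promise is truthy,
// so every question is kept.
const wrong = questions.filter((q) => needsSql(q));

// Right: resolve all classifications first, then filter by index.
async function filterSqlQuestions(qs: string[]): Promise<string[]> {
  const flags = await Promise.all(qs.map((q) => needsSql(q)));
  return qs.filter((_, i) => flags[i]);
}
```

Promise.all issues the classifications concurrently. That keeps wall-clock latency close to a single call, but it multiplies LLM requests, so large batches may call for a concurrency limit.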

Consequences

Positive

  • Multilingual support — Works for any language without keyword lists
  • Semantic understanding — Catches paraphrased SQL questions
  • Reduced maintenance — No keyword curation required
  • Future-proof — New languages automatically supported

Negative

  • Async refactoring — All callers must be updated
  • LLM dependency — Intent detection now requires LLM availability
  • Latency — +50-100ms for classification call (mitigated by fast model)

Neutral

  • English keywords retained — As fallback mechanism
  • Testing complexity — Need mocks for LLM classification tests

Verification

  1. Unit tests for classifySqlIntent() with multilingual examples
  2. Integration test: Ukrainian question → SQL generated → results returned
  3. Latency benchmark: measure classification overhead
  4. Failure mode test: LLM timeout → fallback to keywords
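For the latency benchmark (item 3), percentiles say more than a mean does, because the 3-second budget is really about the slow tail. The sketch below assumes the classifier can be called as an async function. benchmarkClassifier and the nearest-rank percentile method are illustrative choices, not an existing utility. Date.now() has millisecond resolution, which is adequate for calls in the tens-to-hundreds of milliseconds range.

```typescript
// Nearest-rank percentile of an ascending-sorted array (p in 0..100).
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) throw new Error('no samples');
  const rank = Math.ceil((p / 100) * sortedMs.length);
  return sortedMs[Math.min(Math.max(rank, 1), sortedMs.length) - 1];
}

// Times each call sequentially and reports p50/p95/max in milliseconds.
async function benchmarkClassifier(
  classify: (q: string) => Promise<unknown>,
  questions: string[],
): Promise<{ p50: number; p95: number; max: number }> {
  const samples: number[] = [];
  for (const q of questions) {
    const start = Date.now();
    await classify(q);
    samples.push(Date.now() - start);
  }
  samples.sort((a, b) => a - b);
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    max: samples[samples.length - 1],
  };
}
```

Running the benchmark once against the real classifier and once against a no-op stub isolates the overhead that classification adds.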

References