5 min read · Mohammad Shaker

Arabic Diacritics Done Right: How Amal Handles Tashkeel, Shadda, and Hamza

Amal handles the full complexity of Arabic diacritics: 8 tashkeel marks, 5 alef variants, 3 hamza variants, and Lam-Alef ligatures. The app's speech recognition, text rendering, and similarity scoring all treat diacritized Arabic differently from undiacritized Arabic.



Amal handles the full complexity of Arabic diacritics: 8 tashkeel marks (fatha, damma, kasra, shadda, sukun, fathatan, dammatan, kasratan), 5 alef variants (standard, madda, hamza above, hamza below, wasla), 3 hamza variants (isolated, on waw, on ya), and Lam-Alef ligatures. The app's speech recognition, text rendering, and similarity scoring all treat diacritized Arabic ("كَتَبَ") differently from undiacritized Arabic ("كتب"), a critical distinction most Arabic learning apps ignore.

### Why Diacritics Matter for Learning

**The Ambiguity Problem**

Arabic without diacritics is ambiguous. "كتب" can mean:

- "kataba" (he wrote): past tense verb
- "kutub" (books): plural noun
- "kutiba" (it was written): passive voice

All three are spelled identically once the diacritics are removed; the diacritics are what tell them apart.

**The Learning Progression**

1. **Beginner**: Learn to read WITH diacritics (easy, because every vowel is marked)
2. **Intermediate**: Keep practicing WITH diacritics until reading is automatic
3. **Advanced**: Gradually remove diacritics; reading becomes harder at first
4. **Fluent**: Read undiacritized text fluently (native-level reading)

Most Arabic learning apps skip step 1: they either don't teach diacritics at all or strip them away, which builds bad reading habits. Amal follows this progression, starting beginners on fully diacritized text.
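The ambiguity problem above is easy to reproduce in code. The following is a minimal Python sketch, not Amal's implementation: `strip_diacritics` and `has_diacritics` are hypothetical helpers that treat the eight tashkeel marks as the contiguous code point range U+064B to U+0652, and the three words are written as escapes so every mark is explicit.

```python
import re

# Hypothetical helpers (illustrative only, not Amal's confirmed code).
# The eight tashkeel marks form the contiguous range U+064B..U+0652:
# fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun.
TASHKEEL = re.compile("[\u064B-\u0652]")


def strip_diacritics(text: str) -> str:
    """Remove every tashkeel mark, keeping only the base letters."""
    return TASHKEEL.sub("", text)


def has_diacritics(text: str) -> bool:
    """Return True if the text contains at least one tashkeel mark."""
    return TASHKEEL.search(text) is not None


# Letters: ك = U+0643, ت = U+062A, ب = U+0628
readings = {
    "kataba (he wrote)": "\u0643\u064E\u062A\u064E\u0628\u064E",        # كَتَبَ
    "kutub (books)": "\u0643\u064F\u062A\u064F\u0628",                  # كُتُب
    "kutiba (it was written)": "\u0643\u064F\u062A\u0650\u0628\u064E",  # كُتِبَ
}

# With tashkeel the three words are three different strings...
assert len(set(readings.values())) == 3
assert all(has_diacritics(word) for word in readings.values())
# ...but stripping it collapses them into the single spelling كتب
assert {strip_diacritics(word) for word in readings.values()} == {"\u0643\u062A\u0628"}
```

Shadda (U+0651) sits inside the same range, so it is stripped too; a scorer that treats letter doubling as part of the base pronunciation could exclude it from the pattern.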
### Our Unicode-Level Implementation

**The Diacritical Marks** (8 total)

```dart
// lib/src/utils/arabic_extension.dart
class ArabicExtension {
  static const Map<String, String> tashkeelMarks = {
    'FATHA': '\u064E', // َ (vowel 'a')
    'DAMMA': '\u064F', // ُ (vowel 'u')
    'KASRA': '\u0650', // ِ (vowel 'i')
    'SUKUN': '\u0652', // ْ (no vowel)
    'SHADDA': '\u0651', // ّ (doubled letter)
    'FATHATAN': '\u064B', // ً (tanween 'an')
    'DAMMATAN': '\u064C', // ٌ (tanween 'un')
    'KASRATAN': '\u064D', // ٍ (tanween 'in')
  };

  static const Map<String, String> alefVariants = {
    'ALEF_STANDARD': '\u0627', // ا
    'ALEF_WITH_MADDA': '\u0622', // آ (elongated)
    'ALEF_WITH_HAMZA_ABOVE': '\u0623', // أ
    'ALEF_WITH_HAMZA_BELOW': '\u0625', // إ
    'ALEF_WASLA': '\u0671', // ٱ (hamzat al-wasl, connecting alef)
  };

  static const Map<String, String> hamzaVariants = {
    'HAMZA_ISOLATED': '\u0621', // ء standalone hamza
    'HAMZA_ON_WAW': '\u0624', // ؤ hamza on waw (و + hamza)
    'HAMZA_ON_YEH': '\u0626', // ئ hamza on yeh (ي + hamza)
  };
}
```

**Quranic Diacritics and Uthmani Stops**

For Thurayya, we also support Quranic-specific marks: the waqf (pause) signs, the madd sign, and the end-of-ayah marker:

```dart
// Also defined in ArabicExtension
static const Map<String, String> quranicMarks = {
  'STOP_NECESSARY': '\u06D8', // ۘ (meem: stopping is required)
  'STOP_PERMISSIBLE': '\u06DA', // ۚ (jeem: stopping is allowed)
  'STOP_PREFERRED': '\u06D7', // ۗ (qala: stopping is preferred)
  'CONTINUE_PREFERRED': '\u06D6', // ۖ (sala: continuing is preferred)
  'NO_STOP': '\u06D9', // ۙ (lam-alef: do not stop)
  'TAJWEED_ELONGATION': '\u0653', // ٓ (maddah above: prolonged vowel)
  'END_OF_AYAH': '\u06DD', // ۝ (verse-end marker)
};
```

### Diacritic-Aware Speech Recognition

**Context Biasing with Diacritics**

When a child is learning "كَتَبَ" (he wrote, past tense), we bias speech recognition toward that exact vocalization:

```python
# src/services/stt_client.py
from google.cloud import speech

google_stt_client = speech.SpeechClient()


def recognize_with_diacritical_context(audio_bytes, expected_text):
    # expected_text = "كَتَبَ" (with diacritics)

    # Speech context hint: high boost for the expected text
    speech_context = speech.SpeechContext(phrases=[expected_text], boost=20.0)

    config = speech.RecognitionConfig(
        language_code='ar-SA',
        speech_contexts=[speech_context],
    )
    audio = speech.RecognitionAudio(content=audio_bytes)

    # Send to Google Cloud STT
    response = google_stt_client.recognize(config=config, audio=audio)

    # Result: Google STT is biased toward the "kataba" reading
    return response
```

**Diacritic-Aware Similarity Scoring**

Similarity scoring distinguishes diacritized
from undiacritized input:

```python
# strip_diacritics, string_similarity, has_diacritics and
# diacritics_match_ratio are project helpers (not shown here).

BASE_WEIGHT = 0.85       # Share of the score earned by the base letters
DIACRITIC_WEIGHT = 0.15  # Share earned by correct diacritics (up to +15%)


def compare_pronunciations(expected, actual):
    """
    expected: "كَتَبَ" (with diacritics)
    actual: "كتب" (child's attempt, possibly undiacritized)
    """
    # Strip diacritics for a coarse comparison of the base letters
    expected_base = strip_diacritics(expected)  # "كتب"
    actual_base = strip_diacritics(actual)  # "كتب"

    # Base similarity (ignoring diacritics)
    base_similarity = string_similarity(expected_base, actual_base)  # 1.0 (perfect)

    # Diacritic accuracy counts only if the child's attempt includes diacritics
    diacritic_accuracy = 0.0
    if has_diacritics(actual):
        diacritic_accuracy = diacritics_match_ratio(expected, actual)
    diacritic_bonus = diacritic_accuracy * DIACRITIC_WEIGHT

    # Weighted sum: perfect base letters alone reach 85%,
    # and only correct diacritics supply the remaining 15%
    final_score = base_similarity * BASE_WEIGHT + diacritic_bonus

    if base_similarity < 1.0:
        feedback = 'Good try! Keep practicing the letters of this word.'
    elif diacritic_accuracy < 1.0:
        feedback = 'Great! Pronunciation is perfect. Next, practice the diacritical marks.'
    else:
        feedback = 'Excellent! Every letter and diacritical mark is correct.'

    return {
        'base_score': base_similarity,
        'diacritic_bonus': diacritic_bonus,
        'final_score': final_score,
        'feedback': feedback,
    }
```

This means:

- Child says "كتب" (undiacritized) → 85% score (correct base, missing diacritics)
- Child says "كَتَبَ" (fully diacritized, every mark correct) → 100% score (perfect)
- The progression is clear: first master base pronunciation, then add diacritical subtlety

### RTL Rendering Challenges

**Text Direction Management**

```dart
// lib/src/screens/lesson_screen.dart
Column(
  children: [
    Directionality(
      textDirection: TextDirection.rtl, // For Arabic text
      child: Text(
        'كَتَبَ',
        textAlign: TextAlign.right, // Right-aligned for RTL
        style: TextStyle(
          fontFamily: 'IBMPlexSansArabic',
          fontSize: 36,
          height: 1.8, // Extra line height for diacritics
        ),
      ),
    ),
    // English instructions below
    Directionality(
      textDirection: TextDirection.ltr, // For English
      child: Text(
        'Pronounce: "he wrote"',
        textAlign: TextAlign.left, // Left-aligned for LTR
      ),
    ),
  ],
)
```

**Connected Letter Shaping**

Arabic letters change form depending on position:

- Isolated: "ك" (Kaf)
- Initial: "كَـــ" (Kaf at start of word)
- Medial: "ـــكَـــ" (Kaf in middle)
- Final: "ـــكَ" (Kaf
at end)

Flutter's text engine shapes Arabic automatically with the IBMPlexSansArabic font, picking each letter's form from its neighbors. What we need to supply is a clean Unicode sequence: plain letters in logical order, with no joining characters added by hand:

```dart
// Correct: plain letters in logical (reading) order; the shaping
// engine selects the isolated/initial/medial/final forms at render time
final word = 'ك' + 'ت' + 'ب'; // Same string as 'كتب'

// Avoid: tatweel (kashida, U+0640) is a visual stretching character,
// not a joining control. Shaping does not need it, and it changes the
// string, so comparisons such as stretched == 'كتب' fail
final stretched = 'ك' + '\u0640' + 'ت' + '\u0640' + 'ب';
```

### Bidirectional Text Mixing

When English and Arabic appear together, set the paragraph direction to match the sentence's main language and let the Unicode bidirectional algorithm order the embedded runs:

```dart
RichText(
  textDirection: TextDirection.ltr, // Overall LTR: the sentence is English
  text: TextSpan(
    children: [
      TextSpan(text: 'means ', style: englishStyle), // English (LTR run)
      TextSpan(text: 'كتاب', style: arabicStyle), // Arabic (RTL run)
      TextSpan(text: ' (book)', style: englishStyle), // English (LTR run)
    ],
  ),
)
```

Result: "means كتاب (book)" is displayed in the correct bidirectional order. With `TextDirection.rtl` as the base direction, the same spans would be laid out right to left, putting "means" on the far right.

### FAQ

**Q: Why force diacritics on beginner learners? Doesn't that make it harder?**

A: Initially, yes. But learning with diacritics creates stronger letter-sound associations, and reading research suggests it leads to faster fluency. Once learners have mastered reading with diacritics, reading without them is the natural next step.

**Q: What if my child's keyboard doesn't support typing diacritics?**

A: The app never asks children to type diacritics; recognition and pronunciation practice are speech-based. Only adults (teachers, content creators) need to input diacritics, and they use specialized Arabic keyboards.

**Q: Does Amal support non-standard diacritical combinations?**

A: We support all Unicode-standardized combinations. Rare or custom combinations may not render correctly, but standard Quranic and modern Arabic text are fully supported.
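Supporting standardized combinations also means treating equivalent encodings as equal when comparing text. The same visible word can arrive as different code point sequences: shadda typed before or after fatha, alef followed by a combining hamza (U+0654) instead of the precomposed أ (U+0623), or letters stretched with tatweel. A minimal Python sketch, assuming a hypothetical `canonicalize` step applied to copies of the expected and recognized text before scoring, combines tatweel removal with standard-library NFC normalization:

```python
import unicodedata

# Hypothetical pre-scoring step (illustrative, not Amal's confirmed code).
TATWEEL = "\u0640"  # kashida: visual stretching only, no linguistic content


def canonicalize(text: str) -> str:
    """Return a comparison-ready copy of Arabic text.

    1. Drop tatweel, which Unicode normalization leaves untouched.
    2. Apply NFC: stacked combining marks are sorted by canonical
       combining class (fatha = 30 comes before shadda = 33), and a base
       letter plus combining hamza/madda is composed into the precomposed
       letter (e.g. U+0627 U+0654 becomes U+0623).
    """
    return unicodedata.normalize("NFC", text.replace(TATWEEL, ""))


# beh + shadda + fatha vs. beh + fatha + shadda: identical on screen
shadda_first = "\u0628\u0651\u064E"
fatha_first = "\u0628\u064E\u0651"
print(shadda_first == fatha_first)                              # False
print(canonicalize(shadda_first) == canonicalize(fatha_first))  # True

# alef + combining hamza above vs. precomposed alef with hamza above (أ)
print(canonicalize("\u0627\u0654") == "\u0623")  # True

# tatweel-stretched كتب matches the plain spelling once canonicalized
print(canonicalize("\u0643\u0640\u062A\u0640\u0628") == "\u0643\u062A\u0628")  # True
```

Normalizing only the copies used for comparison keeps the displayed lesson text exactly as authored.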
