Mohammad Shaker · 4 min read

# How Our Analytics Lake Tracks Learning Outcomes (Not Just Clicks)

Unlike most EdTech apps that track clicks and time-on-screen, Alphazed's analytics lake tracks actual learning outcomes: pronunciation improvement, concept mastery transitions, spaced repetition effectiveness, and Bloom's level progression.

Engineering


Unlike most EdTech apps that track clicks and time-on-screen, Alphazed's analytics lake tracks actual learning outcomes: pronunciation accuracy improvement over time, concept mastery transitions (beginner → intermediate → advanced), spaced repetition effectiveness (are review sessions reducing forgetting?), and Bloom's Taxonomy level progression. This data drives curriculum improvements and proves to parents that their children are genuinely learning, not just playing.

### Three-Tier Analytics Architecture

**Tier 1: Mobile Events** (real-time from the app)

When a child completes an exercise, the app sends an event:

```json
{
  "event_type": "attempt_complete",
  "concept_id": "letter_ba",
  "exercise_type": "select",
  "accuracy_score": 0.89,
  "attempt_number": 3,
  "session_id": "session_abc123",
  "timestamp": "2026-03-28T14:35:22Z",
  "is_correct": true,
  "response_time_ms": 2400
}
```

**Tier 2: Backend Enrichment** (context added server-side)

The backend augments the event with user properties:

```json
{
  "...event...",
  "user_id": "user_456",
  "age_group": "5-7",
  "persona": "intermediate",
  "days_since_signup": 34,
  "total_practice_minutes": 487,
  "app_name": "amal",
  "device_type": "Android",
  "country": "US"
}
```

**Tier 3: Analytics Lake** (asynchronous, SQL-queryable)

```
Backend sends enriched event → SQS queue (fire-and-forget)
    ↓ (doesn't wait for analytics)
    ↓ (user experience unaffected)
Kinesis Firehose (batches events every 5 min or 100 MB)
    ↓
S3 (partitioned: s3://alphazed-analytics/amal/2026/03/28/events.parquet)
    ↓
AWS Glue (crawls S3 every hour, infers schema)
    ↓
Athena (Presto SQL engine for querying)
    ↓
Dashboards (real-time parent dashboard + internal analytics)
```

### Learning-Outcome Metrics We Track

**Event Type 1: Attempt Complete**

Triggered whenever a child completes an exercise.
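As an aside on Tier 2: conceptually, enrichment is just a server-side merge of the raw mobile event with the user's profile before the result is queued. A minimal Python sketch of that idea (`enrich_event` and `UserProfile` are illustrative names, not our actual backend code):

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    """Server-side user properties joined onto each raw event (illustrative fields)."""
    user_id: str
    age_group: str
    persona: str
    days_since_signup: int
    total_practice_minutes: int
    app_name: str
    device_type: str
    country: str


def enrich_event(raw_event: dict, profile: UserProfile) -> dict:
    """Tier 2: merge the mobile event with user context before queueing.

    The raw event is copied, never mutated; an analytics failure must
    never affect the user-facing request path.
    """
    enriched = dict(raw_event)
    enriched.update(vars(profile))
    return enriched


# Example: the attempt_complete event from Tier 1 above
raw = {
    "event_type": "attempt_complete",
    "concept_id": "letter_ba",
    "accuracy_score": 0.89,
    "is_correct": True,
}
profile = UserProfile("user_456", "5-7", "intermediate", 34, 487,
                      "amal", "Android", "US")
event = enrich_event(raw, profile)
print(event["app_name"])  # field added server-side, absent from the raw event
```

In production, the enriched dictionary is what gets handed to the SQS queue fire-and-forget; the request thread never waits on it.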
```sql
SELECT
  user_id,
  concept_id,
  ROUND(AVG(accuracy_score), 2) AS avg_accuracy,
  COUNT(*) AS total_attempts,
  SUM(CASE WHEN is_correct THEN 1 ELSE 0 END) AS correct_count,
  DATE(FROM_UNIXTIME(timestamp / 1000)) AS date
FROM analytics_lake.attempt_complete
WHERE app_name = 'amal'
  AND concept_id = 'letter_ba'
GROUP BY user_id, concept_id, DATE(FROM_UNIXTIME(timestamp / 1000))
ORDER BY date DESC
```

Result: "Letter ب: user_456 improved from 72% accuracy (week 1) to 94% (week 3)."

**Event Type 2: Concept Mastery Transition**

Triggered when a concept changes mastery level (e.g., beginner → intermediate):

```json
{
  "event_type": "mastery_transition",
  "concept_id": "word_kitab",
  "from_level": "beginner",
  "to_level": "intermediate",
  "hlr_half_life_before": 4.0,
  "hlr_half_life_after": 8.0,
  "timestamp": "2026-03-20T10:15:00Z"
}
```

Tracking mastery transitions reveals curriculum effectiveness:

- How many children reach intermediate per concept?
- What is the average time to reach intermediate?
- Which concepts are bottlenecks?

**Event Type 3: HLR Half-Life Growth**

During spaced repetition, we track memory strength:

```sql
SELECT
  user_id,
  concept_id,
  DATE(FROM_UNIXTIME(timestamp / 1000)) AS date,
  MAX(hlr_half_life_hours) AS max_half_life,
  SUM(CASE WHEN is_correct THEN 1 ELSE 0 END) AS correct_reviews,
  SUM(CASE WHEN NOT is_correct THEN 1 ELSE 0 END) AS incorrect_reviews
FROM analytics_lake.hlr_update
GROUP BY user_id, concept_id, DATE(FROM_UNIXTIME(timestamp / 1000))
```

Result: "Juz Amma Al-Ikhlas: user_789 achieved a 256-hour half-life (2-week stability) after 7 correct reviews."

**Event Type 4: Speech Recognition Accuracy Trends**

Pronunciation improvement over time:

```sql
SELECT
  user_id,
  DATE_TRUNC('week', FROM_UNIXTIME(timestamp / 1000)) AS week,
  AVG(similarity_score) AS avg_pronunciation_accuracy,
  APPROX_PERCENTILE(similarity_score, 0.5) AS median_accuracy
FROM analytics_lake.speech_recognition_result
WHERE concept_type = 'letter'
GROUP BY user_id, DATE_TRUNC('week', FROM_UNIXTIME(timestamp / 1000))
ORDER BY week DESC
```

Result: "User's pronunciation accuracy improved 18% over 8 weeks of consistent practice."

**Event Type 5: Bloom's Taxonomy Progression**

Tracking cognitive level advancement:

```json
{
  "event_type": "blooms_level_completion",
  "concept_id": "word_kitab",
  "blooms_level_achieved": 4,
  "user_age_group": "5-7",
  "time_to_level_days": 14,
  "attempt_count": 47,
  "timestamp": "2026-03-25T16:45:00Z"
}
```

This tells us how many children reach Bloom's Level 4 (Analyze), and how long it takes them on average.

### How This Drives Product Decisions

**Decision 1: Redesign a Content Byte**

- Query: "Which content bytes have >40% incorrect attempts?"
- Result: "Word-building exercise for consonant clusters has a 52% error rate"
- Action: Content team redesigns the exercise (more scaffolding, slower progression)
- Validation: Re-run the query 2 weeks later; the error rate should drop below 25%

**Decision 2: Adjust the Exercise Mix**

- Query: "Which exercise types have the highest engagement and best learning outcomes?"
- Result: Physics games show 30% higher engagement and a 15% higher accuracy improvement
- Action: Increase physics game frequency in adaptive lessons

**Decision 3: Identify Struggling Concepts**

- Query: "For which concepts do >30% of users never reach intermediate level?"
- Result: "Emphatic consonants (ص, ض, ط, ظ) are consistently difficult"
- Action: Create supplementary content (more pronunciation drills, slower progression)

### Comparing to Competitors

| Metric | Duolingo | Amal/Thurayya |
|--------|----------|---------------|
| **Tracks clicks** | ✓ XP, streak | ✓ (but secondary) |
| **Tracks accuracy** | ✗ | ✓ Per-concept |
| **Tracks memory decay** | ✗ | ✓ HLR half-life |
| **Tracks learning outcomes** | ✗ | ✓ Mastery transitions |
| **Tracks pronunciation** | ✗ | ✓ Speech accuracy trends |
| **Data-driven product decisions** | Engagement focus | Learning focus |

### FAQ

**Q: Is my child's data in the analytics lake?**

A: Yes, anonymized. We track learning metrics, not personally identifiable information.
You can see your child's metrics in the parent dashboard; researchers cannot see individual children's names.

**Q: How long is data retained?**

A: Live data (the past 12 months) stays in Athena for querying; historical data is archived to S3 for 7 years (compliance). Retention is configurable per data type.

**Q: Can I export my child's learning analytics?**

A: Yes. The dashboard has an "Export Report" button that generates a PDF of personalized learning outcomes for the past 3 months.
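On the anonymization point: one common approach (a sketch of the idea, not necessarily our exact scheme) is a keyed one-way hash of the user ID applied before events enter the lake, so per-learner grouping in queries still works while the lake never stores a real identifier:

```python
import hashlib
import hmac

# SECRET_SALT is a hypothetical server-side secret kept outside the lake;
# rotating it severs the link between old and new pseudonyms.
SECRET_SALT = b"rotate-me-server-side"


def pseudonymize(user_id: str) -> str:
    """Keyed one-way hash: analytics rows can still be grouped per learner,
    but the real identifier never reaches the analytics lake."""
    return hmac.new(SECRET_SALT, user_id.encode(), hashlib.sha256).hexdigest()[:16]


alias = pseudonymize("user_456")
print(alias)  # stable 16-character alias: same child → same alias, every event
```

Because the alias is deterministic for a given salt, all of the per-user queries above keep working unchanged against the pseudonymized column.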
