📊 ExplainGrade Test Results

Performance Metrics & Dataset Validation

Last Updated: Loading...
Metric | Value | Description
🔗 Pearson Correlation | Loading... | Linear relationship between predicted and expected scores
✓ Accuracy | Loading... | % of correct predictions (±0.5 threshold)
📏 Mean Absolute Error | Loading... | Average prediction deviation
📊 R-Squared (R²) | Loading... | Fraction of variance explained
🎓 Grade Agreement | Loading... | Share of predictions in the same letter grade as expected
📈 RMSE | Loading... | Root mean squared error
[Charts: 📊 Prediction Distribution; 🎯 Predicted vs Expected]

📊 Test Datasets

Dataset | Size | Domain | Status
SQuAD (Stanford) | 100K+ Q&A pairs | Wikipedia reading comprehension | ✓ Ready
Mohler Dataset | 2,442 grades | Short answer grading (Computer Science, data structures) | ✓ Ready
Combined Test | 2,500+ samples | Multi-domain validation | ✓ Ready

📖 Metric Interpretations

🔗 Pearson Correlation Coefficient (-1 to 1)
What it measures: Linear relationship between predicted and expected scores.
Values: Loading...
Interpretation: Values > 0.7 indicate strong agreement with expected grading patterns.
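As an illustration of the definition above, here is a minimal sketch of the Pearson coefficient in plain Python. The function name `pearson_correlation` and the list-based inputs are assumptions for this example, not ExplainGrade's actual code.

```python
import math

def pearson_correlation(predicted, expected):
    """Pearson r between two equal-length lists of scores (hypothetical helper)."""
    n = len(predicted)
    if n != len(expected) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_p = sum(predicted) / n
    mean_e = sum(expected) / n
    # Covariance and variances, all left unnormalized: the 1/n factors cancel.
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, expected))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in expected)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined when either list is constant")
    return cov / math.sqrt(var_p * var_e)
```

Note that r only captures how well the two score lists move together linearly. A grader that is consistently 2 points too harsh can still reach r close to 1, which is why the error metrics are reported alongside it.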
✓ Accuracy (%)
What it measures: Percentage of predictions that fall within the acceptable threshold (±0.5 points) of the expected score.
Calculation: (Correct Predictions / Total) × 100
Target: >85% accuracy for production deployment.
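The calculation above could be sketched as follows. The name `accuracy_percent` is hypothetical, and counting a deviation of exactly 0.5 as correct is an assumption, since the page does not say whether the threshold is inclusive.

```python
def accuracy_percent(predicted, expected, threshold=0.5):
    """Percentage of predictions within `threshold` points of the expected score."""
    if len(predicted) != len(expected) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    # Assumption: the boundary case |p - e| == threshold counts as correct.
    correct = sum(1 for p, e in zip(predicted, expected) if abs(p - e) <= threshold)
    return 100.0 * correct / len(predicted)
```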
📏 Mean Absolute Error (MAE)
What it measures: Average absolute difference between predicted and expected scores.
Unit: Points (0-10 scale)
Target: <0.5 points for high-quality grading.
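A minimal sketch of MAE, assuming both score lists are on the same 0-10 scale; `mean_absolute_error` is an illustrative name, not a function from the project.

```python
def mean_absolute_error(predicted, expected):
    """Average absolute difference, in points, between predicted and expected scores."""
    if len(predicted) != len(expected) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - e) for p, e in zip(predicted, expected)) / len(predicted)
```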
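📊 R-Squared (R², typically 0 to 1)
What it measures: Proportion of variance in expected scores explained by predictions. When computed as 1 − SS_res/SS_tot, the value can fall below 0 if predictions fit worse than always predicting the mean expected score.
Values: 0.5+ is acceptable, 0.7+ is strong, 0.9+ is excellent.
Interpretation: R² = 0.85 means the predictions explain 85% of the variance in expected grades.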
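One common way to compute this quantity is the coefficient of determination, 1 − SS_res/SS_tot. The sketch below assumes that definition; the page does not state which formula ExplainGrade uses, and some tools report the squared Pearson coefficient instead. The name `r_squared` is hypothetical.

```python
def r_squared(predicted, expected):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(predicted) != len(expected) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    mean_e = sum(expected) / len(expected)
    # Residual sum of squares: error left over after using the predictions.
    ss_res = sum((e - p) ** 2 for p, e in zip(predicted, expected))
    # Total sum of squares: spread of the expected scores around their mean.
    ss_tot = sum((e - mean_e) ** 2 for e in expected)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all expected scores are equal")
    return 1 - ss_res / ss_tot
```

Under this definition, predicting the mean expected score for every answer gives 0, and predictions worse than that give a negative value.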
🎓 Grade Agreement (%)
What it measures: Percentage of predictions that fall in the same letter grade (A/B/C/D/F) as the expected score.
Grades: F(0-2), D(2-4), C(4-6), B(6-8), A(8-10)
Target: >90% for consistent grade assignment.
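The grade bands listed above share their boundary values (a score of exactly 2 appears in both F and D), so any implementation has to pick a side. The sketch below assumes each boundary score belongs to the higher grade; that choice, and the names `letter_grade` and `grade_agreement_percent`, are assumptions for illustration.

```python
def letter_grade(score):
    """Map a 0-10 score to a letter grade (boundaries go to the higher grade)."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score < 2:
        return "F"
    if score < 4:
        return "D"
    if score < 6:
        return "C"
    if score < 8:
        return "B"
    return "A"

def grade_agreement_percent(predicted, expected):
    """Percentage of predictions that land in the same letter grade as expected."""
    if len(predicted) != len(expected) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(
        1 for p, e in zip(predicted, expected) if letter_grade(p) == letter_grade(e)
    )
    return 100.0 * matches / len(predicted)
```

Because the bands are discrete, a prediction of 7.9 against an expected 8.0 counts as a disagreement here even though it is well within the ±0.5 accuracy threshold.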
📈 RMSE (Root Mean Squared Error)
What it measures: Square root of the average squared difference between predicted and expected scores (penalizes large errors).
vs MAE: More sensitive to outliers than MAE.
Target: <0.6 points for robust grading.
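To make the comparison with MAE concrete, here is a small sketch that computes both on the same data. The function names are illustrative and not taken from ExplainGrade.

```python
import math

def root_mean_squared_error(predicted, expected):
    """Square root of the mean squared difference between the two score lists."""
    if len(predicted) != len(expected) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    mse = sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(predicted)
    return math.sqrt(mse)

def mean_absolute_error(predicted, expected):
    """Mean absolute difference, included here only for comparison."""
    return sum(abs(p - e) for p, e in zip(predicted, expected)) / len(predicted)

# Three perfect predictions and one that is off by 4 points.
predicted = [5, 5, 5, 5]
expected = [5, 5, 5, 9]
print(mean_absolute_error(predicted, expected))      # 1.0
print(root_mean_squared_error(predicted, expected))  # 2.0
```

The single 4-point miss doubles RMSE relative to MAE, which is the outlier sensitivity described above.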

⚡ Performance Benchmarks

Metric | ExplainGrade | Industry Standard | Status
Pearson Correlation | Computing... | 0.70+ |
Accuracy | Computing... | 85%+ |
MAE | Computing... | <0.5 |
R-Squared | Computing... | 0.70+ |
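Once the metric values are available, a Status entry could be derived by comparing each value against the threshold in the Industry Standard column. The sketch below is one possible approach: the `BENCHMARKS` mapping, the `benchmark_status` helper, and the "Pass"/"Fail" labels are all hypothetical, and reading "0.70+" and "85%+" as inclusive lower bounds is an assumption.

```python
# Thresholds copied from the Industry Standard column; the structure is hypothetical.
BENCHMARKS = {
    "Pearson Correlation": (">=", 0.70),
    "Accuracy": (">=", 85.0),  # percent
    "MAE": ("<", 0.5),         # points on the 0-10 scale
    "R-Squared": (">=", 0.70),
}

def benchmark_status(metric, value):
    """Return "Pass" or "Fail" for a metric value (labels are illustrative)."""
    comparison, target = BENCHMARKS[metric]
    if comparison == ">=":
        passed = value >= target
    else:
        passed = value < target
    return "Pass" if passed else "Fail"
```

For example, `benchmark_status("MAE", 0.42)` returns "Pass" under these assumptions. The 0.42 is an arbitrary input chosen for illustration, not a measured ExplainGrade result.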