Release Notes
Stay up to date with every InceptBench release — new features, improvements, and fixes.
v2.3.6 — Integrity Checks & AP Content Support
March 17, 2026
New Features
- Added integrity_check metric to detect prompt injection and score manipulation attempts in submitted content.
- Added support for evaluating AP (Advanced Placement) content.
Bug Fixes
- Fixed an issue causing blank PNGs to be generated when converting SVGs to PNG format.
- Fixed substandard matching logic that aligned content to the wrong curriculum standards.
v2.3.5 — Evaluator Accuracy & LLM Consolidation
March 10, 2026
Improvements
- Migrated to Gemini 3, improving response quality and evaluation consistency.
- Consolidated LLM factory for cleaner model management across evaluators.
- Integrated Langfuse for better observability and tracing of evaluation runs.
Bug Fixes
- Fixed evaluator image analysis incorrectly counting visual elements, leading to inaccurate scores.
- Fixed difficulty alignment misassessment where content was being assigned incorrect difficulty bands.
- Fixed curriculum alignment misinterpretation causing content to be mapped to the wrong standards.
- Fixed SVG rendering issues affecting image-based evaluation accuracy.
v2.3.3 — Curriculum Versioning & Image Evaluation
February 10, 2026
New Features
- Added support for versioning in the curriculum API, enabling callers to pin evaluations to a specific curriculum version.
Improvements
- Evaluator now more strictly follows curriculum metadata, reducing drift between question content and expected standards alignment.
- Significant improvements to image evaluation accuracy and consistency.
v2.3.2 — Content Type Enforcement & Image Evaluation
February 2026
New Features
- Evaluator now penalizes when generated content type doesn’t match the requested type (e.g. request is fill-in-the-blank but generated content is MCQ).
- Evaluator now penalizes when an image is expected in the content but not present.
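The two penalties above can be pictured with a minimal sketch. It is an illustration only, not InceptBench's actual scoring code: the `Request` and `Generated` classes, the `apply_penalties` function, the field names, and the penalty weights are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    # Hypothetical request fields, not InceptBench's actual schema.
    content_type: str      # e.g. "fill-in-the-blank" or "mcq"
    expects_image: bool    # True when the content should include an image


@dataclass
class Generated:
    # Hypothetical shape of a generated item.
    content_type: str
    image_url: Optional[str] = None


def apply_penalties(
    base_score: float,
    request: Request,
    generated: Generated,
    type_penalty: float = 0.3,   # assumed weight, for illustration only
    image_penalty: float = 0.2,  # assumed weight, for illustration only
) -> float:
    """Subtract illustrative penalties from base_score and clamp to [0, 1]."""
    score = base_score
    # Penalize when the generated type differs from the requested type.
    if generated.content_type.lower() != request.content_type.lower():
        score -= type_penalty
    # Penalize when an image was expected but none is present.
    if request.expects_image and not generated.image_url:
        score -= image_penalty
    return max(0.0, min(1.0, score))
```

Under these assumed weights, a fill-in-the-blank request with an expected image that receives an MCQ without an image loses both penalties, while an item with the matching type and an image keeps its base score.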
Improvements
- Improved image evaluation using Gemini Agentic for more accurate visual content analysis.
- Reduced hallucinations in curriculum assessment boundary detection.
v2.3.0 — Evaluator Flexibility
January 10, 2026
Improvements
- Made major evaluator improvements, including greater flexibility around curriculum stems, allowing more natural question variations while maintaining alignment with standards.