InceptBench Evaluators & API
One unified evaluator for questions, quizzes, reading passages, and articles.
Automatically classifies content and routes to specialized assessment methods.
What Gets Evaluated?
InceptBench automatically classifies your content and evaluates it across 8-12 quality dimensions depending on content type:
- Questions
- Quizzes
- Reading Passages
Every evaluation returns:
- Content type — Automatically classified (question, quiz, fiction_reading, nonfiction_reading, article, other)
- Overall rating — Qualitative rating (SUPERIOR, ACCEPTABLE, INFERIOR)
- Overall score (0.0-1.0) — Holistic quality assessment
- Dimension scores — Individual scores with reasoning for each metric
- Suggested improvements — Actionable recommendations
Quickstart
Evaluate educational content in seconds with our evaluator:
API Usage
```bash
curl -X POST "https://api.inceptbench.com/evaluate" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  --data '{
    "generated_content": [
      {
        "id": "q1",
        "content": "What is 2+2? A) 3 B) 4 C) 5 D) 6"
      }
    ]
  }'
```
Note: To obtain an API key, please reach out to the InceptBench team.
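The same call can be made from Python. Below is a minimal sketch using only the standard library; the endpoint and headers come from the curl example above, and YOUR_API_KEY is a placeholder:

```python
import json
import urllib.request

API_URL = "https://api.inceptbench.com/evaluate"
API_KEY = "YOUR_API_KEY"  # placeholder: request a key from the InceptBench team

def build_payload(items):
    """Wrap raw content strings in the unified input format."""
    return {
        "generated_content": [
            {"id": f"q{i + 1}", "content": text} for i, text in enumerate(items)
        ]
    }

def evaluate(items, timeout=120):
    """POST content items to /evaluate and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(items)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    # Evaluations typically take 10-30 seconds per item, so keep the timeout generous.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

payload = build_payload(["What is 2+2? A) 3 B) 4 C) 5 D) 6"])
```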
Input Format
InceptBench uses a single unified input format. The content field accepts any format — plain text, JSON object, or structured data. The evaluator automatically classifies and routes your content.
Content Item Schema
```json
{
  "generated_content": [
    {
      "id": "optional-unique-id",
      "curriculum": "common_core",
      "request": {
        "grade": "7",
        "subject": "mathematics",
        "type": "mcq",
        "difficulty": "medium",
        "locale": "en-US",
        "skills": {
          "lesson_title": "Congruent and Similar Triangles",
          "substandard_id": "CCSS.MATH.CONTENT.7.G.A.1+3"
        },
        "instructions": "A real-world problem involving congruent and similar triangles"
      },
      "content": "Your content here (string, JSON object, or any format)"
    }
  ],
  "curriculum_version": "1.2"
}
```
Top-level Fields
| Field | Required | Default | Description |
|---|---|---|---|
| generated_content | Yes | — | Array of content items to evaluate (1-100 items) |
| curriculum_version | No | Latest | Curriculum version to use (e.g., "1.2"). If not specified, uses the latest version. |
Content Item Fields
| Field | Required | Default | Description |
|---|---|---|---|
| content | Yes | — | Content to evaluate. Accepts any format: plain text, JSON object, or structured data |
| id | No | Auto-generated | Unique identifier for the content item |
| curriculum | No | common_core | Curriculum for alignment evaluation |
| request | No | null | Optional metadata about the generation request (see below) |
Request Metadata Fields (all optional)
| Field | Description | Example Values |
|---|---|---|
| grade | Grade level | "K", "1", "7", "12" |
| subject | Subject area | "mathematics", "english", "science" |
| type | Content type hint | "mcq", "fill-in", "article", "quiz" |
| difficulty | Difficulty level | "easy", "medium", "hard" |
| locale | Language-region code | "en-US", "en-AE", "ar-AE" |
| skills | Skills information | JSON object or string with lesson/standard info |
| instructions | Generation prompt | The instruction/prompt used to generate this content |
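A content item with metadata can be assembled programmatically from these fields. A small helper sketch (the function name is illustrative, not part of any official SDK; it validates against the field names in the table above):

```python
def make_content_item(content, item_id=None, curriculum=None, **request_fields):
    """Assemble one content item; all request metadata fields are optional."""
    allowed = {"grade", "subject", "type", "difficulty", "locale", "skills", "instructions"}
    unknown = set(request_fields) - allowed
    if unknown:
        raise ValueError(f"unknown request fields: {sorted(unknown)}")
    item = {"content": content}
    if item_id is not None:
        item["id"] = item_id
    if curriculum is not None:
        item["curriculum"] = curriculum
    if request_fields:
        item["request"] = request_fields
    return item
```

For example, `make_content_item("What is 2+2?", item_id="q1", grade="3", type="mcq")` yields an item with an `id` and a nested `request` block, ready to drop into `generated_content`.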
The content field has no enforced schema. You can pass plain text like "What is 2+2?" or structured JSON with question, answer, options, etc. InceptBench will automatically parse and evaluate whatever format you provide.
Content Examples
Here are examples of different content types. These are examples, not required schemas — you can structure your content however you prefer.
Multiple Choice Question
```json
{
  "generated_content": [
    {
      "id": "mcq-example",
      "request": {
        "grade": "3",
        "subject": "math",
        "type": "mcq",
        "difficulty": "easy"
      },
      "content": {
        "question": "Which figure shows equal parts?",
        "answer": "A",
        "answer_explanation": "Only A has all equal parts",
        "answer_options": [
          { "key": "A", "text": "Figure A" },
          { "key": "B", "text": "Figure B" },
          { "key": "C", "text": "Figure C" },
          { "key": "D", "text": "Figure D" }
        ]
      }
    }
  ]
}
```
Plain Text Question
```json
{
  "generated_content": [
    {
      "content": "What is the value of x in 3x + 7 = 22? A) 3 B) 4 C) 5 D) 6. The answer is C because subtracting 7 from both sides gives 3x = 15, then dividing by 3 gives x = 5."
    }
  ]
}
```
Reading Passage
```json
{
  "generated_content": [
    {
      "id": "reading-1",
      "request": {
        "grade": "5",
        "subject": "english",
        "type": "article"
      },
      "content": "# The Water Cycle\n\nWater is always moving on Earth. It goes from oceans to clouds to rain and back again. This is called the water cycle...\n\n## Questions\n\n1. What powers the water cycle?\n2. Where does most evaporation occur?"
    }
  ]
}
```
Quiz (Multiple Questions)
```json
{
  "generated_content": [
    {
      "id": "quiz-1",
      "request": {
        "grade": "4",
        "subject": "math",
        "type": "quiz"
      },
      "content": {
        "title": "Fractions Assessment",
        "questions": [
          {
            "question": "What is 1/2 + 1/4?",
            "answer": "3/4",
            "options": ["1/2", "3/4", "1/4", "1"]
          },
          {
            "question": "Which fraction is larger: 2/3 or 3/5?",
            "answer": "2/3",
            "options": ["2/3", "3/5", "They are equal"]
          }
        ]
      }
    }
  ]
}
```
Images in Content
Images are automatically detected from the content string. No separate image_url field is required.
Supported formats:
| Format | Example |
|---|---|
| Direct URL | https://example.com/image.png |
| Markdown | `![alt text](https://example.com/image.png)` |
| HTML | <img src="https://example.com/image.png"> |
Example with image:
```json
{
  "generated_content": [
    {
      "content": "Look at the triangle below:\n\n![Triangle](https://example.com/triangle.png)\n\nWhat is the area of the triangle if the base is 6 cm and height is 4 cm?"
    }
  ]
}
```
When images are detected:
- They are sent to vision-capable models for analysis
- Object counting is performed automatically
- Visual properties are analyzed for educational relevance
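All three supported forms embed a plain image URL, so detection can be approximated with a single pattern. A sketch, purely illustrative of how such scanning might work (this is not InceptBench's actual implementation):

```python
import re

# Matches http(s) URLs ending in a common image extension. Because Markdown
# and HTML forms both contain a bare URL, one pattern covers all three cases.
IMAGE_URL = re.compile(
    r"https?://[^\s\"'()<>\]]+\.(?:png|jpe?g|gif|webp|svg)", re.IGNORECASE
)

def find_images(content: str) -> list[str]:
    """Return all image URLs embedded in a content string."""
    return IMAGE_URL.findall(content)
```

For example, `find_images('<img src="https://example.com/a.jpg">')` and `find_images("![alt](https://example.com/a.jpg)")` both return the same single URL.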
Example Response
```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "evaluations": {
    "q1": {
      "content_type": "question",
      "overall_rating": "ACCEPTABLE",
      "overall": {
        "score": 0.85,
        "reasoning": "Well-constructed question with clear answer options and appropriate difficulty.",
        "suggested_improvements": "Consider adding a visual stimulus for enhanced engagement."
      },
      "factual_accuracy": {
        "score": 1.0,
        "reasoning": "The mathematical content and answer are correct.",
        "suggested_improvements": null
      },
      "educational_accuracy": {
        "score": 1.0,
        "reasoning": "Fulfills educational intent for the target grade level.",
        "suggested_improvements": null
      },
      "localization_quality": {
        "score": 1.0,
        "reasoning": "Content is culturally and linguistically appropriate for the target audience.",
        "suggested_improvements": null
      },
      "curriculum_alignment": {
        "score": 1.0,
        "reasoning": "Aligns well with CCSS.MATH.3.OA standards.",
        "suggested_improvements": null
      },
      "clarity_precision": {
        "score": 1.0,
        "reasoning": "Clear wording appropriate for grade 3 students.",
        "suggested_improvements": null
      },
      "specification_compliance": {
        "score": 1.0,
        "reasoning": "Meets format and structure requirements.",
        "suggested_improvements": null
      },
      "reveals_misconceptions": {
        "score": 1.0,
        "reasoning": "Distractors effectively target common misconceptions.",
        "suggested_improvements": null
      },
      "difficulty_alignment": {
        "score": 1.0,
        "reasoning": "Difficulty matches intended grade level.",
        "suggested_improvements": null
      },
      "passage_reference": {
        "score": 1.0,
        "reasoning": "N/A - no passage context required.",
        "suggested_improvements": null
      },
      "distractor_quality": {
        "score": 1.0,
        "reasoning": "Distractors are plausible and educationally meaningful.",
        "suggested_improvements": null
      },
      "stimulus_quality": {
        "score": 0.0,
        "reasoning": "No visual stimulus provided.",
        "suggested_improvements": "Consider adding a visual diagram to enhance engagement."
      },
      "mastery_learning_alignment": {
        "score": 1.0,
        "reasoning": "Supports mastery learning principles.",
        "suggested_improvements": null
      }
    }
  },
  "evaluation_time_seconds": 12.34,
  "inceptbench_version": "2.3.0",
  "curriculum_version": "1.2",
  "failed_items": null
}
```
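When post-processing a response, it is often useful to pull out the dimensions that dragged the overall score down. An illustrative helper, not part of the API, that works on one entry of the response's evaluations map:

```python
def low_scoring_dimensions(evaluation: dict, threshold: float = 0.85) -> dict:
    """Collect per-dimension scores below `threshold` from one evaluation.

    Metric fields are the nested objects that carry a numeric "score";
    string fields like content_type and overall_rating are skipped.
    """
    skip = {"content_type", "overall_rating"}
    return {
        name: detail["score"]
        for name, detail in evaluation.items()
        if name not in skip
        and isinstance(detail, dict)
        and detail.get("score", 1.0) < threshold
    }
```

Applied to the example above, only stimulus_quality (score 0.0) falls below the default 0.85 threshold.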
Content Types & Metrics
The type field in your request can be any string (e.g., "mcq", "fill-in", "article"). The evaluator will automatically classify your content and map it to one of these internal types:
| Content Type | Description | Key Metrics |
|---|---|---|
| question | Single educational question | curriculum_alignment, clarity_precision, specification_compliance, reveals_misconceptions, difficulty_alignment, passage_reference, distractor_quality, stimulus_quality, mastery_learning_alignment |
| quiz | Multiple questions together | concept_coverage, difficulty_distribution, non_repetitiveness, test_preparedness, answer_balance |
| fiction_reading | Fictional narrative passages | reading_level_match, length_appropriateness, topic_focus, engagement, accuracy_and_logic, question_quality, stimulus_quality |
| nonfiction_reading | Informational passages | reading_level_match, length_appropriateness, topic_focus, engagement, accuracy_and_logic, question_quality, stimulus_quality |
| article | Instructional educational articles | curriculum_alignment, teaching_quality, worked_examples, practice_problems, follows_direct_instruction, stimulus_quality, diction_and_sentence_structure |
| other | General educational content | educational_value, direct_instruction_alignment, content_appropriateness, clarity_and_organization, engagement, stimulus_quality |
Universal Metrics (all content types)
| Metric | Type | Description |
|---|---|---|
| overall | 0.0-1.0 | Holistic quality score |
| overall_rating | Enum | Qualitative rating: SUPERIOR (≥0.99), ACCEPTABLE (≥0.85), INFERIOR (<0.85) |
| factual_accuracy | Binary | Content is factually correct |
| educational_accuracy | Binary | Fulfills educational intent |
| localization_quality | Binary | Cultural and linguistic appropriateness for target audience |
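The rating bands map directly onto the overall score. A small helper illustrating the thresholds in the table above (the function itself is a sketch, not part of any SDK):

```python
def overall_rating(score: float) -> str:
    """Map an overall score (0.0-1.0) to its qualitative rating band."""
    if score >= 0.99:
        return "SUPERIOR"
    if score >= 0.85:
        return "ACCEPTABLE"
    return "INFERIOR"
```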
API Reference
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /evaluate | Evaluate content items (1-100 per request) |
| GET | /versions | List available InceptBench versions |
| GET | /curriculums | List available curriculums and versions |
| GET | /health | Health check |
/versions Response
```json
{
  "current": "2.3.3",
  "supported_versions": ["2.3.3", "2.3.2", "2.3.1"],
  "default": "2.3.3"
}
```
/curriculums Response
```json
{
  "default_curriculum": "common_core",
  "curricula": {
    "common_core": {
      "versions": [
        {"version": "1.2", "version_date": "2026-01-14"}
      ],
      "latest": "1.2",
      "description": "Common Core State Standards (CCSS)"
    }
  }
}
```
Versioning
InceptBench supports two types of versioning to give you control over evaluation consistency:
InceptBench Version
Control which version of the InceptBench evaluator is used for your evaluations.
Check available versions:
```bash
curl https://api.inceptbench.com/versions
```
Use a specific version (via query parameter):
```bash
curl -X POST "https://api.inceptbench.com/evaluate?version=2.3.1" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"generated_content": [{"content": "What is 2+2?"}]}'
```
For the Python package, install a specific version via pip:
```bash
pip install inceptbench==2.3.1
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| version | Query string | No | Latest | InceptBench version to use (e.g., "2.3.1") |
Curriculum Version
Control which version of the curriculum standards is used for alignment evaluation.
Check available curriculums and versions:
```bash
curl https://api.inceptbench.com/curriculums \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Use a specific curriculum version (via request body):
```json
{
  "generated_content": [
    {"content": "What is 2+2?"}
  ],
  "curriculum_version": "1.2"
}
```
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| curriculum_version | String | No | Latest | Curriculum version to use (e.g., "1.2") |
Response Version Info
All evaluation responses include version information so you know exactly which versions were used:
```json
{
  "request_id": "abc-123",
  "inceptbench_version": "2.3.3",
  "curriculum_version": "1.2",
  "evaluations": {...},
  "evaluation_time_seconds": 15.2
}
```
Authentication
Include your API key in the Authorization header:
```
Authorization: Bearer YOUR_API_KEY
```
Rate Limits
- 100 items per request maximum
- 10 concurrent requests per API key
- Typical evaluation time: 10-30 seconds per item
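Given the 100-item cap, larger workloads need to be split client-side. A sketch of a chunking helper (the limit comes from the list above; the function name is illustrative):

```python
def chunk_items(items: list, max_per_request: int = 100) -> list[list]:
    """Split a flat list of content items into request-sized batches,
    each no larger than the API's per-request maximum."""
    return [items[i:i + max_per_request] for i in range(0, len(items), max_per_request)]
```

Each resulting batch can then be submitted as its own /evaluate request, keeping no more than 10 requests in flight per API key.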
Interactive API Docs
Explore the full API with our interactive documentation.
CLI Reference
Installation
```bash
pip install inceptbench
```
Environment Variables
Before using the CLI, set the following environment variables:
| Variable | Required | Description |
|---|---|---|
| OPENAI_API_KEY | Yes | OpenAI API key (powers evaluations) |
| INCEPT_API_KEY | Yes | InceptBench API key (for curriculum search) |
| GEMINI_API_KEY | No | Google Gemini API key (for image analysis) |
| ANTHROPIC_API_KEY | No | Anthropic API key (for image analysis fallback) |
Commands
| Command | Description |
|---|---|
| inceptbench example | Create a sample content.json file |
| inceptbench evaluate <file.json> | Evaluate content from JSON file |
| inceptbench evaluate --raw "content" | Evaluate raw content string |
| inceptbench --version | Show version |
Options
| Option | Short | Description |
|---|---|---|
| --output FILE | -o | Save results to JSON file |
| --verbose | -v | Show verbose/debug output |
| --full | -f | Output complete JSON evaluation results |
| --raw | | Evaluate raw content (string, file, or folder path) |
| --curriculum NAME | | Curriculum for evaluation (default: common_core) |
| --curriculum-version VER | | Curriculum version (e.g., "1.2"). Default: latest |
| --generation-prompt | | Generation prompt (with --raw only) |
| --max-threads N | | Maximum parallel evaluations for folder mode (default: 10) |
Examples
```bash
# Create sample input file
inceptbench example

# Evaluate from JSON file
inceptbench evaluate content.json

# Evaluate with specific curriculum version
inceptbench evaluate content.json --curriculum-version 1.2

# Evaluate and save results
inceptbench evaluate content.json -o results.json

# Evaluate raw content
inceptbench evaluate --raw "What is 2+2? A) 3 B) 4 C) 5 D) 6"

# Evaluate with curriculum context and version
inceptbench evaluate --raw "Solve for x: 2x + 5 = 15" \
  --curriculum common_core \
  --curriculum-version 1.2 \
  --generation-prompt "Grade 7 algebra"

# Evaluate all files in a folder (parallel processing)
inceptbench evaluate ./my_content/
inceptbench evaluate ./my_content/ -o results.json --max-threads 20

# Full JSON output mode
inceptbench evaluate content.json -f

# Verbose mode
inceptbench evaluate content.json -v
```