InceptBench Evaluators & API

One unified evaluator for questions, quizzes, reading passages, and articles.
Automatically classifies content and routes to specialized assessment methods.


What Gets Evaluated?

InceptBench automatically classifies your content and evaluates it across 8-12 quality dimensions depending on content type:

Questions

MCQs, fill-in, match — 12 metrics including distractor quality

Quizzes

Multiple questions — concept coverage, difficulty distribution

Reading Passages

Fiction & nonfiction — reading level, engagement, accuracy

Every evaluation returns:

  • Content type — Automatically classified (question, quiz, fiction_reading, nonfiction_reading, article, other)
  • Overall rating — Qualitative rating (SUPERIOR, ACCEPTABLE, INFERIOR)
  • Overall score (0.0-1.0) — Holistic quality assessment
  • Dimension scores — Individual scores with reasoning for each metric
  • Suggested improvements — Actionable recommendations

Quickstart

Evaluate educational content in seconds with our evaluator:


API Usage

bash

curl -X POST "https://api.inceptbench.com/evaluate" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  --data '{
    "generated_content": [
      {
        "id": "q1",
        "content": "What is 2+2? A) 3 B) 4 C) 5 D) 6"
      }
    ]
  }'

Note: To obtain an API key, contact the InceptBench team.
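The same request can be issued from Python. The sketch below builds the request with the standard library's urllib; the endpoint, headers, and body mirror the curl example above, and YOUR_API_KEY is a placeholder:

```python
import json
import urllib.request

API_URL = "https://api.inceptbench.com/evaluate"

def build_evaluate_request(items, api_key, curriculum_version=None):
    """Build a POST request for the /evaluate endpoint."""
    body = {"generated_content": items}
    if curriculum_version:
        body["curriculum_version"] = curriculum_version
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_evaluate_request(
    [{"id": "q1", "content": "What is 2+2? A) 3 B) 4 C) 5 D) 6"}],
    api_key="YOUR_API_KEY",
)

# Sending it requires a valid key:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```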


Input Format

InceptBench uses a single unified input format. The content field accepts any format — plain text, JSON object, or structured data. The evaluator automatically classifies and routes your content.

Content Item Schema

json

{
  "generated_content": [
    {
      "id": "optional-unique-id",
      "curriculum": "common_core",
      "request": {
        "grade": "7",
        "subject": "mathematics",
        "type": "mcq",
        "difficulty": "medium",
        "locale": "en-US",
        "skills": {
          "lesson_title": "Congruent and Similar Triangles",
          "substandard_id": "CCSS.MATH.CONTENT.7.G.A.1+3"
        },
        "instructions": "A real-world problem involving congruent and similar triangles"
      },
      "content": "Your content here (string, JSON object, or any format)"
    }
  ],
  "curriculum_version": "1.2"
}

Top-level Fields

| Field | Required | Default | Description |
|---|---|---|---|
| generated_content | Yes | | Array of content items to evaluate (1-100 items) |
| curriculum_version | No | Latest | Curriculum version to use (e.g., "1.2"). If not specified, the latest version is used. |

Content Item Fields

| Field | Required | Default | Description |
|---|---|---|---|
| content | Yes | | Content to evaluate. Accepts any format: plain text, JSON object, or structured data |
| id | No | Auto-generated | Unique identifier for the content item |
| curriculum | No | common_core | Curriculum for alignment evaluation |
| request | No | null | Optional metadata about the generation request (see below) |

Request Metadata Fields (all optional)

| Field | Description | Example Values |
|---|---|---|
| grade | Grade level | "K", "1", "7", "12" |
| subject | Subject area | "mathematics", "english", "science" |
| type | Content type hint | "mcq", "fill-in", "article", "quiz" |
| difficulty | Difficulty level | "easy", "medium", "hard" |
| locale | Language-region code | "en-US", "en-AE", "ar-AE" |
| skills | Skills information | JSON object or string with lesson/standard info |
| instructions | Generation prompt | The instruction/prompt used to generate this content |

Flexible Content: The content field has no enforced schema. You can pass plain text like "What is 2+2?" or structured JSON with question, answer, options, etc. InceptBench will automatically parse and evaluate whatever format you provide.
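The per-item defaults above can be mirrored client-side. This is an illustrative local helper only; the API applies these defaults server-side, and the function name is not part of any published SDK:

```python
import uuid

def normalize_item(item: dict) -> dict:
    """Apply the documented per-item defaults before submission.

    Only 'content' is required; 'id' is auto-generated and
    'curriculum' defaults to common_core.
    """
    if "content" not in item:
        raise ValueError("'content' is the only required field")
    normalized = dict(item)
    normalized.setdefault("id", str(uuid.uuid4()))
    normalized.setdefault("curriculum", "common_core")
    normalized.setdefault("request", None)
    return normalized
```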

Content Examples

Here are examples of different content types. These are examples, not required schemas — you can structure your content however you prefer.

Multiple Choice Question

json

{
  "generated_content": [
    {
      "id": "mcq-example",
      "request": {
        "grade": "3",
        "subject": "math",
        "type": "mcq",
        "difficulty": "easy"
      },
      "content": {
        "question": "Which figure shows equal parts?",
        "answer": "A",
        "answer_explanation": "Only A has all equal parts",
        "answer_options": [
          { "key": "A", "text": "Figure A" },
          { "key": "B", "text": "Figure B" },
          { "key": "C", "text": "Figure C" },
          { "key": "D", "text": "Figure D" }
        ]
      }
    }
  ]
}

Plain Text Question

json

{
  "generated_content": [
    {
      "content": "What is the value of x in 3x + 7 = 22? A) 3 B) 4 C) 5 D) 6. The answer is C because subtracting 7 from both sides gives 3x = 15, then dividing by 3 gives x = 5."
    }
  ]
}

Reading Passage

json

{
  "generated_content": [
    {
      "id": "reading-1",
      "request": {
        "grade": "5",
        "subject": "english",
        "type": "article"
      },
      "content": "# The Water Cycle\n\nWater is always moving on Earth. It goes from oceans to clouds to rain and back again. This is called the water cycle...\n\n## Questions\n\n1. What powers the water cycle?\n2. Where does most evaporation occur?"
    }
  ]
}

Quiz (Multiple Questions)

json

{
  "generated_content": [
    {
      "id": "quiz-1",
      "request": {
        "grade": "4",
        "subject": "math",
        "type": "quiz"
      },
      "content": {
        "title": "Fractions Assessment",
        "questions": [
          {
            "question": "What is 1/2 + 1/4?",
            "answer": "3/4",
            "options": ["1/2", "3/4", "1/4", "1"]
          },
          {
            "question": "Which fraction is larger: 2/3 or 3/5?",
            "answer": "2/3",
            "options": ["2/3", "3/5", "They are equal"]
          }
        ]
      }
    }
  ]
}

Images in Content

Images are automatically detected from the content string. No separate image_url field is required.

Supported formats:

| Format | Example |
|---|---|
| Direct URL | https://example.com/image.png |
| Markdown | ![description](https://example.com/image.png) |
| HTML | <img src="https://example.com/image.png"> |

Example with image:

json

{
  "generated_content": [
    {
      "content": "Look at the triangle below:\n\n![triangle](https://example.com/triangle.png)\n\nWhat is the area of the triangle if the base is 6 cm and height is 4 cm?"
    }
  ]
}

When images are detected:

  • They are sent to vision-capable models for analysis
  • Object counting is performed automatically
  • Visual properties are analyzed for educational relevance
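One way to anticipate which items will trigger image analysis is to scan the content string for the three documented formats. This regex sketch is illustrative; the service's actual detection logic may differ:

```python
import re

# Matches http(s) URLs ending in a common image extension, which
# covers direct URLs, Markdown image syntax, and HTML <img> tags.
IMAGE_URL = re.compile(
    r"https?://\S+?\.(?:png|jpe?g|gif|webp|svg)",
    re.IGNORECASE,
)

def find_image_urls(content: str) -> list[str]:
    """Return image URLs embedded anywhere in a content string."""
    return IMAGE_URL.findall(content)
```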

Example Response

json

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "evaluations": {
    "q1": {
      "content_type": "question",
      "overall_rating": "ACCEPTABLE",
      "overall": {
        "score": 0.85,
        "reasoning": "Well-constructed question with clear answer options and appropriate difficulty.",
        "suggested_improvements": "Consider adding a visual stimulus for enhanced engagement."
      },
      "factual_accuracy": {
        "score": 1.0,
        "reasoning": "The mathematical content and answer are correct.",
        "suggested_improvements": null
      },
      "educational_accuracy": {
        "score": 1.0,
        "reasoning": "Fulfills educational intent for the target grade level.",
        "suggested_improvements": null
      },
      "localization_quality": {
        "score": 1.0,
        "reasoning": "Content is culturally and linguistically appropriate for the target audience.",
        "suggested_improvements": null
      },
      "curriculum_alignment": {
        "score": 1.0,
        "reasoning": "Aligns well with CCSS.MATH.3.OA standards.",
        "suggested_improvements": null
      },
      "clarity_precision": {
        "score": 1.0,
        "reasoning": "Clear wording appropriate for grade 3 students.",
        "suggested_improvements": null
      },
      "specification_compliance": {
        "score": 1.0,
        "reasoning": "Meets format and structure requirements.",
        "suggested_improvements": null
      },
      "reveals_misconceptions": {
        "score": 1.0,
        "reasoning": "Distractors effectively target common misconceptions.",
        "suggested_improvements": null
      },
      "difficulty_alignment": {
        "score": 1.0,
        "reasoning": "Difficulty matches intended grade level.",
        "suggested_improvements": null
      },
      "passage_reference": {
        "score": 1.0,
        "reasoning": "N/A - no passage context required.",
        "suggested_improvements": null
      },
      "distractor_quality": {
        "score": 1.0,
        "reasoning": "Distractors are plausible and educationally meaningful.",
        "suggested_improvements": null
      },
      "stimulus_quality": {
        "score": 0.0,
        "reasoning": "No visual stimulus provided.",
        "suggested_improvements": "Consider adding a visual diagram to enhance engagement."
      },
      "mastery_learning_alignment": {
        "score": 1.0,
        "reasoning": "Supports mastery learning principles.",
        "suggested_improvements": null
      }
    }
  },
  "evaluation_time_seconds": 12.34,
  "inceptbench_version": "2.3.0",
  "curriculum_version": "1.2",
  "failed_items": null
}
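A response like the one above can be post-processed to surface the dimensions that need attention. A minimal sketch, assuming only the response shape shown (per-dimension dicts with score and suggested_improvements, plus string fields like content_type that must be skipped):

```python
def low_scoring_dimensions(evaluation: dict, threshold: float = 0.85) -> dict:
    """Map each dimension scoring below `threshold` to its suggestion.

    String fields such as content_type and overall_rating are skipped.
    """
    flagged = {}
    for name, value in evaluation.items():
        if isinstance(value, dict) and value.get("score", 1.0) < threshold:
            flagged[name] = value.get("suggested_improvements")
    return flagged
```

Applied to the example response, only stimulus_quality (score 0.0) would be flagged.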

Content Types & Metrics

The type field in your request can be any string (e.g., "mcq", "fill-in", "article"). The evaluator will automatically classify your content and map it to one of these internal types:

| Content Type | Description | Key Metrics |
|---|---|---|
| question | Single educational question | curriculum_alignment, clarity_precision, specification_compliance, reveals_misconceptions, difficulty_alignment, passage_reference, distractor_quality, stimulus_quality, mastery_learning_alignment |
| quiz | Multiple questions together | concept_coverage, difficulty_distribution, non_repetitiveness, test_preparedness, answer_balance |
| fiction_reading | Fictional narrative passages | reading_level_match, length_appropriateness, topic_focus, engagement, accuracy_and_logic, question_quality, stimulus_quality |
| nonfiction_reading | Informational passages | reading_level_match, length_appropriateness, topic_focus, engagement, accuracy_and_logic, question_quality, stimulus_quality |
| article | Instructional educational articles | curriculum_alignment, teaching_quality, worked_examples, practice_problems, follows_direct_instruction, stimulus_quality, diction_and_sentence_structure |
| other | General educational content | educational_value, direct_instruction_alignment, content_appropriateness, clarity_and_organization, engagement, stimulus_quality |

Universal Metrics (all content types)

| Metric | Type | Description |
|---|---|---|
| overall | 0.0-1.0 | Holistic quality score |
| overall_rating | Enum | Qualitative rating: SUPERIOR (≥0.99), ACCEPTABLE (≥0.85), INFERIOR (<0.85) |
| factual_accuracy | Binary | Content is factually correct |
| educational_accuracy | Binary | Fulfills educational intent |
| localization_quality | Binary | Cultural and linguistic appropriateness for target audience |
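The rating thresholds above map directly onto the overall score, so they can be reproduced locally, e.g. for dashboards or regression checks:

```python
def overall_rating(score: float) -> str:
    """Map an overall score to its qualitative rating,
    per the documented thresholds."""
    if score >= 0.99:
        return "SUPERIOR"
    if score >= 0.85:
        return "ACCEPTABLE"
    return "INFERIOR"
```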

API Reference

Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /evaluate | Evaluate content items (1-100 per request) |
| GET | /versions | List available InceptBench versions |
| GET | /curriculums | List available curriculums and versions |
| GET | /health | Health check |

/versions Response

json

{
  "current": "2.3.3",
  "supported_versions": ["2.3.3", "2.3.2", "2.3.1"],
  "default": "2.3.3"
}

/curriculums Response

json

{
  "default_curriculum": "common_core",
  "curricula": {
    "common_core": {
      "versions": [
        {"version": "1.2", "version_date": "2026-01-14"}
      ],
      "latest": "1.2",
      "description": "Common Core State Standards (CCSS)"
    }
  }
}

Versioning

InceptBench supports two types of versioning to give you control over evaluation consistency:

InceptBench Version

Control which version of the InceptBench evaluator is used for your evaluations.

Check available versions:

bash

curl https://api.inceptbench.com/versions

Use a specific version (via query parameter):

bash

curl -X POST "https://api.inceptbench.com/evaluate?version=2.3.1" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"generated_content": [{"content": "What is 2+2?"}]}'

For the Python package, install a specific version via pip:

bash

pip install inceptbench==2.3.1

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| version | Query string | No | Latest | InceptBench version to use (e.g., "2.3.1") |

Curriculum Version

Control which version of the curriculum standards is used for alignment evaluation.

Check available curriculums and versions:

bash

curl https://api.inceptbench.com/curriculums \
  -H "Authorization: Bearer YOUR_API_KEY"

Use a specific curriculum version (via request body):

json

{
  "generated_content": [
    {"content": "What is 2+2?"}
  ],
  "curriculum_version": "1.2"
}

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| curriculum_version | String | No | Latest | Curriculum version to use (e.g., "1.2") |

Response Version Info

All evaluation responses include version information so you know exactly which versions were used:

json

{
  "request_id": "abc-123",
  "inceptbench_version": "2.3.3",
  "curriculum_version": "1.2",
  "evaluations": {...},
  "evaluation_time_seconds": 15.2
}

Recommendation: Use the default (latest) versions unless you have a specific need for consistency in an ongoing evaluation project or backward compatibility with existing integrations.

Authentication

Include your API key in the Authorization header:

bash

Authorization: Bearer YOUR_API_KEY

Rate Limits

  • 100 items per request maximum
  • 10 concurrent requests per API key
  • Typical evaluation time: 10-30 seconds per item
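The 100-item limit can be handled client-side by splitting large workloads into request-sized batches before submission, for example:

```python
def chunk_items(items: list, max_per_request: int = 100) -> list[list]:
    """Split a list of content items into batches that fit
    the 100-items-per-request limit."""
    return [
        items[i:i + max_per_request]
        for i in range(0, len(items), max_per_request)
    ]
```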

Interactive API Docs

Explore the full API with our interactive documentation:


CLI Reference

Installation

bash

pip install inceptbench

Environment Variables

Before using the CLI, set the following environment variables:

| Variable | Required | Description |
|---|---|---|
| OPENAI_API_KEY | Yes | OpenAI API key (powers evaluations) |
| INCEPT_API_KEY | Yes | InceptBench API key (for curriculum search) |
| GEMINI_API_KEY | No | Google Gemini API key (for image analysis) |
| ANTHROPIC_API_KEY | No | Anthropic API key (for image analysis fallback) |
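In a POSIX shell, the keys can be exported before invoking the CLI (placeholder values shown; substitute your own keys):

```shell
# Required
export OPENAI_API_KEY="YOUR_OPENAI_KEY"
export INCEPT_API_KEY="YOUR_INCEPT_KEY"

# Optional, for image analysis
export GEMINI_API_KEY="YOUR_GEMINI_KEY"
export ANTHROPIC_API_KEY="YOUR_ANTHROPIC_KEY"
```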

Commands

| Command | Description |
|---|---|
| inceptbench example | Create a sample content.json file |
| inceptbench evaluate <file.json> | Evaluate content from a JSON file |
| inceptbench evaluate --raw "content" | Evaluate a raw content string |
| inceptbench --version | Show version |

Options

| Option | Short | Description |
|---|---|---|
| --output FILE | -o | Save results to a JSON file |
| --verbose | -v | Show verbose/debug output |
| --full | -f | Output complete JSON evaluation results |
| --raw | | Evaluate raw content (string, file, or folder path) |
| --curriculum NAME | | Curriculum for evaluation (default: common_core) |
| --curriculum-version VER | | Curriculum version (e.g., "1.2"). Default: latest |
| --generation-prompt | | Generation prompt (with --raw only) |
| --max-threads N | | Maximum parallel evaluations for folder mode (default: 10) |

Examples

bash

# Create sample input file
inceptbench example

# Evaluate from JSON file
inceptbench evaluate content.json

# Evaluate with specific curriculum version
inceptbench evaluate content.json --curriculum-version 1.2

# Evaluate and save results
inceptbench evaluate content.json -o results.json

# Evaluate raw content
inceptbench evaluate --raw "What is 2+2? A) 3 B) 4 C) 5 D) 6"

# Evaluate with curriculum context and version
inceptbench evaluate --raw "Solve for x: 2x + 5 = 15" \
  --curriculum common_core \
  --curriculum-version 1.2 \
  --generation-prompt "Grade 7 algebra"

# Evaluate all files in a folder (parallel processing)
inceptbench evaluate ./my_content/
inceptbench evaluate ./my_content/ -o results.json --max-threads 20

# Full JSON output mode
inceptbench evaluate content.json -f

# Verbose mode
inceptbench evaluate content.json -v

PyPI Package