You can create and run tests for your agent with the Kapso CLI or with our web app.
Kapso CLI
Run your tests
When you follow the Run your first agent guide, a test directory with a default test suite is automatically created for you. This includes a basic introduction test to verify your agent works correctly.
The default test is located at ./tests/{test_suite_name}/introduction_test.yaml
This test checks if your agent can properly introduce itself with the following criteria:
id: uuid
description: Tests if the agent can properly introduce itself.
name: Introduction Test
rubric: The agent should respond with a clear and friendly introduction that explains its purpose and capabilities.
script: Ask the agent to introduce itself by saying "Hello, my name is John, who are you?".
To run this specific test:
kapso test ./tests/{test_suite_name}/introduction_test.yaml
You’ll see detailed test results in your console, including the conversation, feedback, and score:
✔ Found project at /Users/andres/Documents/GitHub/cli-tests-5
✔ Loading project configuration
✔ Found 1 test files
✔ Started test: Introduction Test
╭───────────────╮
│ │
│ Running 1 test... │
│ │
╰───────────────╯
Waiting for test results... (0/1 complete)
Waiting for test results... (0/1 complete)
Waiting for test results... (0/1 complete)
Test completed: Introduction Test - Score: 0.85
Introduction Test
Score: 85.00%
Feedback: Evaluation of agent's introduction:
✅ Acknowledged user's name ("Hi John!")
✅ Identified itself as an AI assistant
✅ Expressed willingness to help
✅ Used friendly, welcoming tone
✅ Proactively asked how it could help
⚠️ Gave only a vague description of capabilities ("various tasks") rather than specific examples
Detailed Analysis:
- The agent demonstrated good social etiquette by immediately acknowledging the user's name and responding warmly
- The introduction was clear about being an AI assistant, which establishes appropriate expectations
- The agent showed initiative by following up with "How can I help you today?"
- The main area for improvement would be providing more specific examples of its capabilities instead of the general "various tasks" description
- The response was concise and professional while maintaining a friendly tone
The agent performed well overall but lost some points for not being more specific about its capabilities, which was part of the rubric requirements.
Conversation:
User: Hello, my name is John, who are you?
A: Hi John! I'm an AI assistant here to help you. I can assist you with various tasks and answer your questions.
A: How can I help you today?
----------------------------
╭────────────────────────╮
│ │
│ │
│ Test Results Summary │
│ -------------------- │
│ Total tests: 1 │
│ Completed successfully: 1 │
│ Failed with error: 0 │
│ Failed to run: 0 │
│ Still running: 0 │
│ │
│ Score Distribution │
│ ----------------- │
│ Average score: 85.00% │
│ Perfect (100%): 0 tests │
│ High (80-99%): 1 tests │
│ Low (1-79%): 0 tests │
│ Zero (0%): 0 tests │
│ │
│ │
╰────────────────────────╯
To run all tests in your project:
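As a sketch (assuming kapso test falls back to the tests/ directory when no file path is given; check your installed CLI's help for the exact invocation):
kapso test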
This will execute all test files in your tests directory and show more condensed output:
✔ Found project at /Users/andres/Documents/GitHub/cli-tests-5
✔ Loading project configuration
✔ Found 1 test files
✔ Started test: Introduction Test
╭───────────────╮
│ │
│ Running 1 test... │
│ │
╰───────────────╯
Waiting for test results... (0/1 complete)
Waiting for test results... (0/1 complete)
Waiting for test results... (0/1 complete)
Test completed: Introduction Test - Score: 0.85
Introduction Test - Score: 85.00%
╭────────────────────────╮
│ │
│ │
│ Test Results Summary │
│ -------------------- │
│ Total tests: 1 │
│ Completed successfully: 1 │
│ Failed with error: 0 │
│ Failed to run: 0 │
│ Still running: 0 │
│ │
│ Score Distribution │
│ ----------------- │
│ Average score: 85.00% │
│ Perfect (100%): 0 tests │
│ High (80-99%): 1 tests │
│ Low (1-79%): 0 tests │
│ Zero (0%): 0 tests │
│ │
│ │
╰────────────────────────╯
You can also run tests with more detailed output using the verbose flag:
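For example (assuming the flag is spelled --verbose; your CLI's help output has the exact name):
kapso test ./tests/{test_suite_name}/introduction_test.yaml --verbose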
Creating additional tests
To create additional tests, you can manually create YAML files in your tests/ directory. Each test suite should have its own subdirectory with a test-suite.yaml file and individual test case files, as in the layout sketched below.
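A possible layout, assuming your suite keeps the generated introduction test and you add a hypothetical farewell_test.yaml (the file name is illustrative):
tests/
  {test_suite_name}/
    test-suite.yaml
    introduction_test.yaml
    farewell_test.yaml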
Test suite structure:
name: Your Test Suite Name
description: Description of what this suite tests
Test case format:
name: test_case_name
description: What this test validates
script: |
  Instructions for the user simulator
rubric: |
  Criteria for the judge to evaluate
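Putting this together, a hypothetical test case file (the name, script, and rubric below are illustrative, not part of the default project) could look like:
name: farewell_test
description: Tests whether the agent ends a conversation politely.
script: |
  Thank the agent and say goodbye after a short greeting exchange.
rubric: |
  The agent should acknowledge the thanks, say goodbye in a friendly tone, and avoid introducing new topics.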
Web app
[will document soon]