CLI Reference

Complete reference for Skillet command-line tools.

eval

Run evaluations against a baseline or with a skill loaded.

```bash
skillet eval <name> [skill] [options]
```

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| `name` | Yes | Eval set name, directory path, or single `.yaml` file |
| `skill` | No | Path to skill directory (omit for baseline) |

The name argument accepts:

  • A name: looks in ~/.skillet/evals/<name>/
  • A directory path: loads all .yaml files recursively
  • A single .yaml file path
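The three accepted forms can be told apart with ordinary file tests. The sketch below uses a hypothetical helper, `classify_eval_target`, which is not part of Skillet; the order of the checks and the use of `SKILLET_DIR` for the name lookup are assumptions, not documented behavior.

```bash
# Hypothetical helper: classify what a <name> argument would refer to.
# Assumed check order: existing .yaml file, then directory, then eval set name.
classify_eval_target() {
    target=$1
    base=${SKILLET_DIR:-$HOME/.skillet}
    case $target in
        *.yaml)
            if [ -f "$target" ]; then
                echo "file: $target"
                return 0
            fi
            ;;
    esac
    if [ -d "$target" ]; then
        echo "directory: $target"
    else
        # Anything else is treated as an eval set name under the base directory
        echo "name: $base/evals/$target/"
    fi
}
```

Calling `classify_eval_target ./evals/my-skill/001.yaml` on an existing file reports it as a single file, while a bare word such as `browser-fallback` falls through to the name lookup.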

Options

| Flag | Short | Type | Default | Description |
| --- | --- | --- | --- | --- |
| `--samples` | `-s` | int | 3 | Number of iterations per eval |
| `--max-evals` | `-m` | int | all | Maximum evals to run (randomly sampled) |
| `--tools` | | str | all | Comma-separated list of allowed tools |
| `--parallel` | `-p` | int | 3 | Number of parallel workers |
| `--skip-cache` | | bool | false | Skip reading from cache (still writes) |
| `--trust` | | bool | false | Skip confirmation for setup/teardown scripts |
| `--no-summary` | | bool | false | Skip the failure summary LLM call |
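As a rough way to budget a run, the sample and eval limits can be multiplied together. This is only a sketch: `estimate_runs` is a hypothetical helper, and the formula (evals selected times samples) is an assumed reading of the flags rather than documented Skillet behavior.

```bash
# Hypothetical estimate: runs = min(total evals, --max-evals) * --samples.
estimate_runs() {
    total_evals=$1      # number of evals in the set
    samples=${2:-3}     # mirrors --samples (default 3)
    max_evals=${3:-}    # mirrors --max-evals (empty means "all")
    selected=$total_evals
    if [ -n "$max_evals" ] && [ "$max_evals" -lt "$total_evals" ]; then
        selected=$max_evals
    fi
    echo $((selected * samples))
}
```

Under that assumption, `estimate_runs 12 5 5` prints 25, since 5 of the 12 evals are sampled and each runs 5 times.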

Examples

```bash
# Baseline eval (no skill)
skillet eval browser-fallback

# Eval with skill
skillet eval browser-fallback ~/.claude/skills/browser-fallback

# Single file
skillet eval ./evals/my-skill/001.yaml skill/

# More samples for statistical confidence
skillet eval my-skill -s 5

# Random sample of 5 evals
skillet eval my-skill -m 5

# More parallelism
skillet eval my-skill -p 5

# Force fresh runs (ignore cache)
skillet eval my-skill --skip-cache

# Skip script confirmation prompts
skillet eval my-skill --trust

# Skip failure summary (saves one LLM call)
skillet eval my-skill --no-summary

# Restrict available tools
skillet eval my-skill --tools "Read,Write,Bash"
```

tune

Iteratively improve a skill until it reaches the target pass rate or the maximum number of rounds is used up.

```bash
skillet tune <name> <skill> [options]
```

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| `name` | Yes | Eval set name or path |
| `skill` | Yes | Path to skill directory |

Options

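| Flag | Short | Type | Default | Description |
| --- | --- | --- | --- | --- |
| `--rounds` | `-r` | int | 5 | Maximum tuning rounds |
| `--target` | `-t` | float | 100.0 | Target pass rate percentage |
| `--samples` | `-s` | int | 1 | Samples per eval per round |
| `--parallel` | `-p` | int | 3 | Number of parallel workers |
| `--output` | `-o` | path | auto | Custom output path for results JSON |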

How It Works

  1. Runs all evals against the current skill
  2. Records the pass rate and failure details
  3. If the pass rate is below the target, uses DSPy optimization to generate an improved skill
  4. Writes the improved skill to disk
  5. Repeats until the target is reached or the maximum number of rounds is used up

Results are saved to ~/.skillet/tunes/<name>/<timestamp>.json (or to a custom path given with -o).
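The control flow of the steps above can be sketched in a few lines of shell. This illustrates the loop only and is not Skillet's implementation: `run_round` is a hypothetical stand-in that replays canned pass rates instead of running evals, and `tune_loop` is likewise an invented name.

```bash
# Hypothetical stand-in for "run all evals and report the pass rate".
run_round() {
    case $1 in
        1) echo 60.0 ;;
        2) echo 85.0 ;;
        *) echo 100.0 ;;
    esac
}

tune_loop() {
    max_rounds=${1:-5}      # mirrors --rounds
    target=${2:-100.0}      # mirrors --target
    round=1
    while [ "$round" -le "$max_rounds" ]; do
        rate=$(run_round "$round")
        echo "round $round: pass rate $rate%"
        # awk does the floating-point comparison that [ ] cannot
        if awk -v r="$rate" -v t="$target" 'BEGIN { exit !(r >= t) }'; then
            echo "target reached"
            return 0
        fi
        # A real run would generate and write an improved skill here.
        round=$((round + 1))
    done
    echo "max rounds reached"
    return 1
}
```

With the canned rates, `tune_loop 5 80` stops after the second round, and `tune_loop 1 100` gives up after one round with a non-zero status.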

Examples

```bash
# Default tuning (5 rounds, 100% target)
skillet tune conventional-comments ~/.claude/skills/conventional-comments

# Aim for 80% pass rate
skillet tune conventional-comments skill/ -t 80

# More rounds
skillet tune conventional-comments skill/ -r 10

# Custom output file
skillet tune conventional-comments skill/ -o results.json

# Multiple samples for more reliable pass rates
skillet tune conventional-comments skill/ -s 3
```

create

Create a new skill from captured evals.

```bash
skillet create <name> [options]
```

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| `name` | Yes | Eval set name (from `~/.skillet/evals/<name>/`) |

Options

| Flag | Short | Type | Default | Description |
| --- | --- | --- | --- | --- |
| `--dir` | `-d` | path | `~` | Base directory for output |
| `--prompt` | | str | none | Extra prompt to customize generation |

Output is written to <dir>/.claude/skills/<name>/SKILL.md.
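To see where a given invocation would write, the documented layout can be expanded by hand. `skill_output_path` below is a hypothetical helper for illustration, not a Skillet command; it only mirrors the `<dir>/.claude/skills/<name>/SKILL.md` pattern stated above.

```bash
# Hypothetical helper: expand <dir>/.claude/skills/<name>/SKILL.md.
skill_output_path() {
    name=$1
    dir=${2:-$HOME}     # --dir defaults to the home directory (~)
    # ${dir%/} drops one trailing slash so the path has no "//"
    echo "${dir%/}/.claude/skills/$name/SKILL.md"
}
```

For example, `skill_output_path browser-fallback .` prints `./.claude/skills/browser-fallback/SKILL.md`.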

Examples

```bash
# Create in home directory (default)
skillet create browser-fallback
# Creates ~/.claude/skills/browser-fallback/SKILL.md

# Create in current directory
skillet create browser-fallback -d .
# Creates ./.claude/skills/browser-fallback/SKILL.md

# Create in specific project
skillet create browser-fallback -d /path/to/project

# Add extra instructions
skillet create browser-fallback --prompt "Be concise, max 20 lines"
```

generate-evals

Generate candidate eval files from a SKILL.md.

```bash
skillet generate-evals <skill> [options]
```

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| `skill` | Yes | Path to skill directory or `SKILL.md` file |

Options

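| Flag | Short | Type | Default | Description |
| --- | --- | --- | --- | --- |
| `--output` | `-o` | path | auto | Output directory for candidate files |
| `--max` | | int | 5 | Max evals per category |
| `--domain` | `-d` | str | all | Filter to specific domain(s): `triggering`, `functional`, `performance` |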

Domains

Each generated eval is tagged with a domain indicating what aspect of the skill it tests:

| Domain | Description |
| --- | --- |
| `triggering` | Does the skill activate when it should (and stay silent when it shouldn't)? |
| `functional` | Does the skill produce correct output once triggered? |
| `performance` | Does the skill meet quality, latency, or efficiency expectations? |

By default all domains are generated. Use --domain to filter to specific ones. The flag can be repeated to select multiple domains.
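When the domain list comes from a script, the repeated flag can be built programmatically. `domain_flags` is a hypothetical helper written for this sketch; only the three domain names and the repeated `--domain` form come from the text above.

```bash
# Hypothetical helper: emit one --domain flag per requested domain.
domain_flags() {
    flags=""
    for d in "$@"; do
        case $d in
            triggering|functional|performance)
                flags="$flags --domain $d" ;;
            *)
                echo "unknown domain: $d" >&2
                return 1 ;;
        esac
    done
    # Strip the leading space left by the first append
    echo "${flags# }"
}
```

The result is meant to be expanded unquoted so each word becomes its own argument, for example `skillet generate-evals skill/ $(domain_flags triggering functional)`.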

Examples

```bash
# Generate from skill directory
skillet generate-evals ~/.claude/skills/browser-fallback

# Custom output location
skillet generate-evals skill/ -o ./my-evals/

# Limit to 3 per category
skillet generate-evals skill/ --max 3

# Only triggering evals
skillet generate-evals skill/ -d triggering

# Triggering and functional evals
skillet generate-evals skill/ -d triggering -d functional
```

lint

Lint a SKILL.md file for common issues.

```bash
skillet lint <path> [options]
```

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| `path` | Yes* | Path to a `SKILL.md` file (*not required with `--list-rules`) |

Options

| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| `--no-llm` | bool | false | Skip LLM-assisted lint rules |
| `--list-rules` | bool | false | List all available rules and exit |

Rules

Skillet ships with 14 built-in rules across three categories:

Naming

| Rule | Severity | Description |
| --- | --- | --- |
| `filename-case` | warning | Skill file must be named exactly `SKILL.md` |
| `folder-kebab-case` | warning | Skill folder name must be kebab-case |
| `name-kebab-case` | warning | Name field must be kebab-case |
| `name-matches-folder` | warning | Name field must match the parent folder name |
| `name-no-reserved` | error | Name must not contain `claude` or `anthropic` |

Structure

| Rule | Severity | Description |
| --- | --- | --- |
| `frontmatter-valid` | warning | Frontmatter must include `name` and `description` |
| `frontmatter-delimiters` | error | YAML frontmatter must have `---` delimiters |
| `frontmatter-no-xml` | error | No XML angle brackets (`<` `>`) in frontmatter |
| `description-length` | warning | Description must be under 1,024 characters |
| `body-word-count` | warning | Skill body should be under 5,000 words |
| `no-readme` | warning | No `README.md` inside skill folder |

Fields

| Rule | Severity | Description |
| --- | --- | --- |
| `field-license` | warning | Recommended `license` field should be present |
| `field-compatibility` | warning | Recommended `compatibility` field should be present |
| `field-metadata` | warning | Recommended `metadata` field should be present |

Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | No issues found |
| 1 | One or more findings |
| 2 | Error (file not found, invalid input) |
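In a CI script, these codes make it possible to separate lint findings from a broken invocation. The sketch below assumes nothing beyond the three codes listed above; `describe_lint_status` is a hypothetical helper, not part of Skillet.

```bash
# Hypothetical helper: map a lint exit code to a short label.
# Intended use:
#   skillet lint --no-llm path/to/SKILL.md
#   describe_lint_status $?
describe_lint_status() {
    case $1 in
        0) echo "clean" ;;
        1) echo "findings" ;;
        2) echo "error" ;;
        *) echo "unexpected exit code $1" ;;
    esac
}
```

A pipeline might treat "findings" as a soft failure to report and "error" as a hard failure that stops the job, though that policy is a choice for the script author.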

Examples

```bash
# Lint a skill
skillet lint ~/.claude/skills/my-skill/SKILL.md

# Lint without LLM-assisted checks
skillet lint --no-llm path/to/SKILL.md

# List all available rules
skillet lint --list-rules
```

compare

Compare baseline vs skill results from cache.

```bash
skillet compare <name> <skill>
```

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| `name` | Yes | Eval set name or path |
| `skill` | Yes | Path to skill directory |

Run skillet eval <name> and skillet eval <name> <skill> first to populate the cache.
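Put together, the prerequisite amounts to three commands run in order. The wrapper below is a sketch: `compare_workflow` and the `SKILLET` override variable are hypothetical conveniences (the override lets the sequence be dry-run with `SKILLET=echo`), not Skillet features.

```bash
# Hypothetical wrapper: populate the cache, then compare.
compare_workflow() {
    name=$1
    skill=$2
    cmd=${SKILLET:-skillet}   # assumed override, handy for dry runs
    # 1. baseline run, 2. run with the skill loaded, 3. compare cached results;
    # && stops the chain at the first command that fails
    "$cmd" eval "$name" &&
        "$cmd" eval "$name" "$skill" &&
        "$cmd" compare "$name" "$skill"
}
```

Chaining with `&&` means the comparison is skipped if either eval run fails, so it never reads a half-populated cache.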

Examples

```bash
skillet compare browser-fallback ~/.claude/skills/browser-fallback
```

show

Inspect cached eval results without re-running any evals.

```bash
skillet show <name> [options]
```

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| `name` | Yes | Eval set name or path |

Options

| Flag | Short | Type | Default | Description |
| --- | --- | --- | --- | --- |
| `--eval` | `-e` | str | none | Show detailed results for a specific eval file |
| `--skill` | `-s` | path | none | Show results with a skill loaded instead of baseline |

Examples

```bash
# Show cached baseline results
skillet show browser-fallback

# Show results with a skill
skillet show browser-fallback --skill ~/.claude/skills/browser-fallback

# Show detailed results for a specific eval
skillet show browser-fallback --eval 001.yaml

# Combine: specific eval with skill
skillet show browser-fallback --skill path/to/skill --eval 001.yaml
```

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `SKILLET_DIR` | `~/.skillet` | Base directory for evals, cache, and tune results |
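The effect of the variable can be previewed without running Skillet. This sketch assumes that `SKILLET_DIR` simply replaces the `~/.skillet` prefix in the eval and tune paths given elsewhere in this reference; `skillet_paths` is a hypothetical helper, and the cache location is left out because it is not documented here.

```bash
# Hypothetical helper: show where an eval set and its tune results would live.
skillet_paths() {
    name=$1
    base=${SKILLET_DIR:-$HOME/.skillet}   # falls back to the documented default
    echo "evals: $base/evals/$name/"
    echo "tunes: $base/tunes/$name/"
}
```

Running `SKILLET_DIR=/tmp/skillet-ci` before the call is one way to check that a CI job would keep its data out of the home directory, under the assumption stated above.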

Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Error (invalid arguments, missing files, etc.) |

Released under the MIT License.