Use non-Claude models

Don't have Claude? Most Chinese LLM providers (GLM, Qwen, Moonshot, DeepSeek, etc.) expose an OpenAI-compatible API, so you can use the openai-api executor directly:

```bash
# GLM (Zhipu)
export OPENAI_API_KEY="your Zhipu API key"
export OPENAI_BASE_URL="https://open.bigmodel.cn/api/paas/v4"
omk eval --executor openai-api --model glm-4-plus \
  --judge-models openai-api:glm-4-plus --no-cache

# Qwen (Alibaba)
export OPENAI_API_KEY="your Qwen API key"
export OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
omk eval --executor openai-api --model qwen-plus \
  --judge-models openai-api:qwen-plus

# DeepSeek
export OPENAI_API_KEY="your DeepSeek API key"
export OPENAI_BASE_URL="https://api.deepseek.com"
omk eval --executor openai-api --model deepseek-chat \
  --judge-models openai-api:deepseek-chat

# Moonshot (Kimi)
export OPENAI_API_KEY="your Moonshot API key"
export OPENAI_BASE_URL="https://api.moonshot.cn/v1"
omk eval --executor openai-api --model moonshot-v1-8k \
  --judge-models openai-api:moonshot-v1-8k
```
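
The four setups differ only in the API key, the base URL, and the model name, so switching between them can be wrapped in a small shell function. The following is a minimal sketch and not part of omk: the function name `use_provider` and the short provider labels are hypothetical, and the URLs are simply the ones listed for each provider.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with omk): select the OpenAI-compatible
# endpoint for a provider. The URLs are copied from the provider examples.
use_provider() {
  case "$1" in
    glm)      export OPENAI_BASE_URL="https://open.bigmodel.cn/api/paas/v4" ;;
    qwen)     export OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1" ;;
    deepseek) export OPENAI_BASE_URL="https://api.deepseek.com" ;;
    moonshot) export OPENAI_BASE_URL="https://api.moonshot.cn/v1" ;;
    *)        echo "unknown provider: $1" >&2; return 1 ;;
  esac
}

# Point the current shell at Qwen and show the resulting endpoint.
use_provider qwen
echo "$OPENAI_BASE_URL"
```

After calling `use_provider` and exporting the matching OPENAI_API_KEY, the `omk eval` commands run unchanged. An unknown label returns a non-zero status instead of leaving a stale URL in place silently.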

Ollama local model:

```bash
cd examples/custom-executor
omk eval --control baseline --treatment echo-assistant \
  --executor "python ollama-executor.py" --model llama3 --no-judge --report-only
```

--report-only is useful for this tiny demo set: omk still prints the verdict, but a verdict drawn from such a small teaching sample does not change the command's exit code.

About the judge

--judge-models <list> picks the LLM judge(s). Format: executor:model[,executor:model]. Default: ${executor}:haiku, or claude:haiku when --executor is not set.
One entry means a single judge; two or more entries form a multi-judge ensemble with inter-judge agreement.
If you don't have Claude, point --judge-models at whatever you have, e.g. --judge-models openai-api:glm-4-plus.
Add --no-judge to skip the LLM judge and rely on assertions alone.
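
To make the executor:model[,executor:model] format concrete, here is a minimal sketch that builds a multi-judge value from one executor and several model names. The helper `judge_list` is hypothetical and not part of omk, and `qwen-max` is an assumed second model name used only for illustration. The sketch also assumes both judges are served from the same endpoint, since the openai-api examples configure a single OPENAI_BASE_URL.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with omk): build a --judge-models value
# in the executor:model[,executor:model] format.
# The function body runs in a subshell, so its variables stay local.
judge_list() (
  executor="$1"; shift
  out=""
  for model in "$@"; do
    # Prefix a comma only when an earlier entry already exists.
    out="${out:+$out,}${executor}:${model}"
  done
  printf '%s\n' "$out"
)

# Two judges on the same endpoint; qwen-max is an assumed model name.
JUDGES="$(judge_list openai-api qwen-plus qwen-max)"

# Print the resulting command instead of running it.
echo "omk eval --executor openai-api --model qwen-plus --judge-models $JUDGES"
```

Passing a single model name yields a one-entry list, which corresponds to the single-judge case.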

See Executors for the full executor list and Artifact & variant layout for how to specify what gets evaluated.
