Extella Guide
1. What is Extella
1.1 Why an Agent, Not a Chatbot
You've probably used ChatGPT, Claude, or other large language models. Every conversation starts from scratch. The model doesn't remember yesterday. It can't open your file, send an email, run a script, or save a report. At best, it generates code that you then copy and run yourself.
Extella is not a chatbot. It's an AI agent that executes tasks rather than advising you how to do them.
The formula: AI chat + automation + persistent memory + execution on your device + personal toolkit — all in one place.
You describe a need. Extella creates an Expert (an executable module), saves it to your library, and runs it. The result — an actual file, a sent message, processed data — stays on your device. This isn't text in a chat. It's an object that persists after the session ends.
1.2 Extella vs Standard LLMs: Fundamental Differences
ChatGPT tells you what to do. Extella does it for you.
| Feature | ChatGPT / Standard LLMs | Extella |
|---|---|---|
| Primary Purpose | Text generation | Real action execution |
| Memory | Current chat only. New session — clean slate | Persistent: Concepts, Rules, Experts are saved permanently |
| Code Execution | Generates code. You run it yourself | Runs automatically via Experts on your device |
| Reusability | Each request — from scratch | Created Expert runs repeatedly with any parameters |
| Security | Data goes to OpenAI/Anthropic | Files are processed locally and never leave your device |
| Personalization | System prompt is fixed | Rules — dynamic prompt, changes during execution |
| Outcome | Text in chat | Files, data, reports, automated processes |
| Integrations | Plugins (limited set) | Any: Telegram, email, API, file system |
1.3 Architecture: How It Works
Extella uses a client-server architecture with two components:
- Server component — the AI brain: language model, knowledge base, Expert and agent management.
- Client component — Listener: a background process on your device that receives tasks from the agent and runs them locally.
Listener is the executor. When the agent says "create a PDF," Listener runs the corresponding Expert on your machine with full access to your files.
By default, Experts run directly in the Listener environment. When strict dependency isolation is needed, use the isolated=true parameter — the Expert then runs in a clean Python venv. This isn't Docker: no heavy virtualization, no root access required. Full access to the user's file system is preserved.
1.4 Data Security
- Anthropic Claude is the primary model for chat and code generation in Extella. It processes text requests. The Pro plan allows connecting your own LLM providers and local models.
- OpenAI API is used only for vector embeddings (semantic search of Concepts, KV Store, Rules, and Experts). Data is vectorized but not stored by OpenAI.
- Corporate files and API keys are NEVER sent to providers. Files are processed locally. Keys are stored in an encrypted KV Store on your device.
1.5 Key Terms: Glossary
Before moving forward, it's important to understand the seven core entities of the platform. Each will be explained in detail in the following sections.
| Entity | What It Is | Analogy |
|---|---|---|
| Expert | Saved executable module — a function that does one specific thing | Tool on your shelf |
| Agent | AI-specialist with model, tools, instructions and memory | Employee with a job title |
| Profile | Group of agents under one project/client | A department in a company |
| Concept | Unit of technical knowledge with semantic search | Note in knowledge base |
| KV Store | Encrypted key-value store: API keys, tokens, data | Safe with deposit boxes |
| Rule | Behavioral instruction embedded in the agent's system prompt | Job description |
| Team | Group of agents working on one project with shared Concepts, Rules, and an orchestrating agent | Department or project team |
1.6 The Compounding Effect: Why Extella Gets Stronger Every Day
Typical AI tools don't accumulate value. Every new ChatGPT conversation starts from scratch. Extella works fundamentally differently.
Experts:
- Day 1: Created an Expert "read Excel spreadsheet" → saved
- Day 30: 15 Experts — a library of tools
- Day 90: 50+ Experts — you're not "asking AI" anymore — you're "running tools"
Concepts:
- First time solving a PDF issue → saved as a Concept
- A pattern for working with a specific API → saved
- Each Concept makes the system smarter. This is institutional memory
Rules:
- First: "always ask for confirmation before deleting"
- Then: "save files to ~/Documents/Extella/"
- Then: "if task > 1 step — describe the plan first"
With each Rule, the agent becomes more precise. After 30 days, Extella understands you better than ever.
Metaphor: A Stone Bridge
Each task you solve with Extella is a stone in the foundation. One stone changes nothing. A hundred stones build a bridge to automation of any complexity. In a year, you'll have a personal system that knows your context, tools, and preferences—one that grows stronger every day.
2. Quick Start
A step-by-step guide from downloading the application to completing your first task. Follow the steps in order—each builds on the previous one.
2.1 Step 1: Installing the Application and Creating an Account
- Download Extella Desktop from www.extella.ai for your OS (macOS, Linux, Windows).
- Install it like any standard application.
- Create an account and sign in.
Immediately after signing in, Listener starts in the background and performs initial registration:
- → Creates a device record in the system
- → Retrieves a unique Device ID (Target UUID)
- → Establishes a connection with the Extella server
System tray status: "Connected" — everything is working.
2.2 Step 2: Understanding Device ID
Device ID is your device's unique identifier. It looks like: 09f7d600-996c-4c9f-a19e-f5bfe433da0e.
Why you need it:
- The agent knows WHERE to execute tasks. "Read my file" — the system understands which machine the file is on.
- If you have multiple devices (Mac Studio at home, MacBook at the office) — each has its own Device ID. You choose where to run the task.
Where to find Device ID:
- In the Extella Desktop interface — bottom section of the application.
- Via agent: "show my devices" — the agent returns a list with UUIDs and descriptions.
- Via API: POST https://api.extella.ai/api/defaults/get_target
Default Target — the device where Experts run by default. Changed via set_default_target.
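A Device ID follows the standard UUID text format, so a script can sanity-check one before using it as a target. The sketch below relies only on Python's standard uuid module; the helper name is_valid_device_id is hypothetical and not part of the Extella API.

```python
import uuid


def is_valid_device_id(value: str) -> bool:
    """Return True if value is a canonical hyphenated UUID string (case-insensitive)."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes and missing hyphens;
    # comparing against the canonical form rejects those variants.
    return str(parsed) == value.lower()


print(is_valid_device_id("09f7d600-996c-4c9f-a19e-f5bfe433da0e"))
print(is_valid_device_id("not-a-device-id"))
```

A check like this catches copy-paste mistakes (a truncated or mistyped ID) before a task is sent to the wrong or a non-existent target.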
2.3 Step 3: Getting an API Token
API token — a key string that verifies your identity in the system. Required for Listener authentication and programmatic calls.
Via agent (easiest method):
Type in chat: "Generate an API token for me" — the agent creates a token instantly. Optionally specify a name, e.g., "Mac Studio listener".
Copy and save the token — it's used for Listener configuration.
Managing tokens via agent:
- "Show my tokens" — list of all active tokens
- "Revoke token [name]" — instant deactivation
2.4 Step 4: Your First Agent Request
Open the Extella Desktop interface. Type your first natural language request:
"Create a 3-slide PDF presentation about our product. Save it to Downloads."
What happens:
- The agent analyzes the request
- Creates an Expert (an executable module — e.g., using ReportLab)
- Runs it on your device via the Listener
- Within seconds, a PDF file appears in ~/Downloads/
This is your first Extella result.
2.5 Step 5: Expert Saved to Library
The created Expert is automatically saved to your personal library with a name (e.g., generate_product_presentation_pdf). Now you can:
- Run it again with different parameters (different text, different title)
- Modify it: "change the background color to blue"
- Use it as part of a more complex workflow
This is the key difference from chatbots: solve a task once—the tool remains forever.
2.6 Step 6: First Rule
A Rule is an instruction that applies to every interaction. Add your first rule:
"Add a rule: always respond in English"
Or other useful rules:
- "Save all files to ~/Documents/Extella/"
- "If a task takes more than one step—describe the plan first"
- "Always ask for confirmation before deleting data"
Now every time the agent generates a response or creates a file, these rules are applied automatically—no reminders needed.
2.7 Step 7: First Concepts
Concepts are the agent's long-term memory. They accumulate automatically. After the first completed task, the agent saves:
- Which library worked best for PDF generation
- How to handle errors from a specific API
- Which approach worked for your task
You can also add them manually:
"Remember: I prefer pandas over openpyxl for working with tables"
The more tasks you solve, the smarter your agent becomes.
2.8 Checklist: Ready to Go
| # | Action | Status |
|---|---|---|
| 1 | Extella Desktop installed and running | ⬜ |
| 2 | Listener shows Connected in system tray | ⬜ |
| 3 | Device ID registered (visible in interface) | ⬜ |
| 4 | API token obtained and saved | ⬜ |
| 5 | First request sent and response received | ⬜ |
| 6 | First Expert created and visible in library | ⬜ |
| 7 | First Rule added | ⬜ |
| 8 | First Concept saved | ⬜ |
If all 8 items are checked, your platform is configured. Now the real fun begins: scaling the system.
3. KV Store, Concepts, Rules
Section 1 provided a brief glossary. Section 2 covered the practical quickstart. This section examines each of the three components in detail: how they work, what they can do, when to use them, and what not to store.
Note: the default agent comes with a set of pre-installed basic Rules and Concepts. You can view and modify them at any time.
3.1 Why Three Storage Systems Instead of One
| Example | Solution | Why this approach |
|---|---|---|
| Look up an API key by exact name | KV Store — exact search by key | Data is encrypted, searched exactly and quickly |
| Recall context from a past PDF session | Concepts — semantic search | Semantic search, not keyword matching |
| Agent always responds in English | Rules — auto-loaded into every prompt | No need to search — always active |
KV Store is a "vault"—you store items with an exact name and retrieve them by that name. Concepts is a "knowledge search index"—it finds information by query meaning, even when the wording differs. Rules are "reflexes"—they load automatically with every user message, so the agent operates with them from the first word of the conversation.
3.2 KV Store: Encrypted Data Storage
Each KV entry has three fields: a unique key name, a value (text or JSON, up to 1 GB), and a description used for semantic search.
Encryption and PIN
All values are encrypted with your PIN. The agent decrypts them automatically via kv_get. This protects credentials from leaking in logs and exports.
Important when running an Expert on a different device: if the PIN on that device differs, decryption returns garbage—you'll see an 'invalid decimal literal' error. Solution: pass the pin explicitly when calling: run_expert('name', {}, pin='your_pin').
What KV Store Contains
Typical data categories:
| Category | Key examples |
|---|---|
| Service API keys | telegram_bot_token, anthropic_api_key, openai_api_key, tavily_api_key |
| Device Target UUIDs | mac_studio_target, ubuntu_vm_target, macbook_target |
| URLs and endpoints | aios_backend_url, webhook_slack, api_crm_url |
| Session data | session_history, cache_results (JSON arrays) |
| Configurations | typefully_social_set_id, redis_url, redis_token |
KV Store holds more than just short strings. A value can be a complete JSON array with session data history—KV becomes a fast key-value cache for agents.
Semantic Search in KV Store
Each record has an embedding (OpenAI text-embedding-3-small) generated from key + description. This enables semantic search:
# Forgot the exact key name?
kv_search("telegram bot token")
# Finds: telegram_bot_token, telegram_bot_token_taskboard
# Works even if description is in Russian and query in English, or vice versa
Writing Good Descriptions
A description isn't a comment—it's a semantic search index. The more precise and informative, the more reliably agents will find the key.
# Good description:
kv_set(key="anthropic_key", value="sk-...",
       description="Anthropic Claude API key (main production, updated 2025-03)")
# Bad description — search won't help:
kv_set(key="k1", value="sk-...", description="key")
Agent Auto-Search Algorithm — The Golden Rule
Agents NEVER ask for credentials first. They follow a strict algorithm:
- 1. Need a key? → kv_search("<service> key token")
- 2. Found it? → kv_get(key) — automatic decryption
- 3. Not found? → only then ask the user
- 4. User provided it? → kv_set + permanent storage
If you saved tavily_api_key with the description "Tavily web search API key" once, the next time you request web search, the agent finds it automatically—without asking a single question.
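The four-step lookup above can be sketched as a small function. This is an illustration under stated assumptions, not Extella's implementation: the KV operations are passed in as plain callables with simplified return values, and resolve_credential, fake_search, and fake_ask are hypothetical names used only for this example.

```python
from typing import Callable, Optional


def resolve_credential(
    service: str,
    kv_search: Callable[[str], list],
    kv_get: Callable[[str], Optional[str]],
    kv_set: Callable[[str, str, str], None],
    ask_user: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Follow the four-step lookup: search, get, ask, store."""
    # Step 1: semantic search for a matching key.
    matches = kv_search(f"{service} key token")
    if matches:
        # Step 2: found, so fetch (and decrypt) the first match.
        return kv_get(matches[0])
    # Step 3: nothing stored, so only now ask the user.
    value = ask_user(f"Please provide the API key for {service}")
    if value:
        # Step 4: persist it so the question is not asked again.
        kv_set(f"{service}_api_key", value, f"{service} API key")
    return value


# In-memory stand-ins for the real KV Store and the chat prompt (assumptions).
store: dict = {}
questions: list = []


def fake_search(query: str) -> list:
    service = query.split()[0]
    return [key for key in store if key.startswith(service)]


def fake_ask(prompt: str) -> str:
    questions.append(prompt)
    return "tvly-demo-value"


def fake_set(key: str, value: str, description: str) -> None:
    store[key] = value


first = resolve_credential("tavily", fake_search, store.get, fake_set, fake_ask)
second = resolve_credential("tavily", fake_search, store.get, fake_set, fake_ask)
print(first == second, len(questions))
```

The second call is answered from the store, so the user is asked only once.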
Core Principle: Experts Never Access KV Directly
This is a fundamental architectural security principle. An Expert is a pure function. Agents inject credentials via params. The Expert receives the already-decrypted value as a parameter—and knows nothing about KV Store at all.
# WRONG: Expert accesses KV directly
def send_telegram(text: str) -> dict:
    import requests
    # Value is encrypted ($enc:...), so the Expert can't decrypt it!
    r = requests.post("https://api.extella.ai/api/kv/get", ...)
    token = r.json()["value"]  # gets garbage

# RIGHT: Agent decrypts and injects
def send_telegram(text: str, bot_token: str = "") -> dict:
    import requests
    # bot_token already decrypted by agent and passed via params
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    # ... rest of the logic
This ensures: security (credentials not in code), reusability (one Expert, different tokens), testability (any input data without KV dependency).
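The testability point can be illustrated with a function that receives the token as a parameter and can therefore be exercised with a dummy value. This is a sketch only: build_send_message_request is a hypothetical helper that describes the Telegram sendMessage call without sending it, and the token shown is a placeholder.

```python
def build_send_message_request(text: str = "", chat_id: str = "", bot_token: str = "") -> dict:
    """Validate inputs and describe the Telegram sendMessage call without sending it."""
    required = (("text", text), ("chat_id", chat_id), ("bot_token", bot_token))
    missing = [name for name, value in required if not value]
    if missing:
        return {"status": "error", "message": "missing: " + ", ".join(missing)}
    return {
        "status": "success",
        "url": f"https://api.telegram.org/bot{bot_token}/sendMessage",
        "payload": {"chat_id": chat_id, "text": text},
    }


# No KV Store and no network are needed to check the logic.
print(build_send_message_request("hello", "42", "123:PLACEHOLDER"))
print(build_send_message_request("hello", "42"))
```

Because nothing in the function reaches into KV Store, the same code works with any token the agent injects.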
3.3 Concepts: Semantic Knowledge Memory
A Concept is a text fragment (knowledge, pattern, solution) stored with semantic search. Concepts use vector search: the meaning of your query is matched against saved knowledge, not just keywords.
When an agent saves a concept like "For PDF generation in Docker, use ReportLab — not wkhtmltopdf, which requires X11," the system immediately sends the text to OpenAI, receives a 1536-dimensional vector, and stores it alongside the text. When searching for "create PDF in container," the query is also converted to a vector, and the system finds the nearest match. Even though the query contains neither "ReportLab" nor "X11," the semantic distance is minimal.
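The nearest-match step can be illustrated with a toy example. The sketch assumes cosine similarity as the distance measure (the guide does not say which metric Extella uses) and replaces the real 1536-dimensional embeddings with made-up 3-dimensional vectors.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings (illustration only).
concepts = {
    "For PDF generation in Docker, use ReportLab": [0.9, 0.1, 0.2],
    "Telegram getUpdates: offset = last update_id + 1": [0.1, 0.9, 0.3],
}
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of "create PDF in container"

best = max(concepts, key=lambda text: cosine_similarity(concepts[text], query_vector))
print(best)
```

The query shares no keywords with the stored text; the match comes purely from the vectors pointing in a similar direction.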
Concept examples
- "Extella execution environment: Experts run on the local device via the Listener. Python venv is an optional parameter (isolated=true). Not Docker."
- "For reading .docx files: python-docx. Installation: extella-pip install python-docx"
- "PDF on Linux: if wkhtmltopdf is unavailable — use ReportLab directly. Installation: extella-pip install reportlab"
- "Telegram bot getUpdates: offset = last update_id + 1. Otherwise the same messages will be received again."
What to store vs what not to store
| Store in Concepts | Do NOT store in Concepts |
|---|---|
| Patterns and problem solutions | API-keys and tokens (→KV Store) |
| Library installation instructions | JSON-session data and caches |
| Architectural decisions of the project | Specific file paths |
| Business requirements and specifications | Personal data |
| Insights from work experience | Configurations containing passwords or secrets |
| Technical limitations and workarounds | Temporary data that becomes outdated |
Why you should never store credentials in Concepts: concepts are found by meaning. If you save an API key, semantic search will find it when queried for "need a key" — and it will appear in the context as plain text, without encryption. This is a security violation.
Correct pattern: generalized insight from experience
Agent encounters an error → resolves it → extracts generalized knowledge → saves it:
# After the agent solved the PDF problem:
concept_add(
"For PDF generation in headless environments (Docker, server without X11)"
" use ReportLab. wkhtmltopdf requires a graphical display"
" and doesn't work in containers without Xvfb."
)
# In the future, it will be automatically found when querying:
concept_search("PDF in Docker")
# -> Finds with high similarity, even though 'wkhtmltopdf' isn't in the query
A Concept is a generalized insight from experience, not raw data.
Concept Operations
| Operation | MCP tool | Description |
|---|---|---|
| Create | concept_add | Text → embedding → save |
| Find | concept_search | Semantic search by meaning (not by keywords) |
| Update | concept_update | Edit text + regenerate embedding |
| Delete | concept_remove | Delete by ID |
| List | concept_list | All concepts of an agent or profile |
The global=true parameter enables searching Concepts across all agents in the profile: knowledge saved by one agent becomes accessible to the others. Without global=true, an agent can only access its own Concepts.
3.4 Rules: Dynamic System Prompt
Loading Mechanism
A Rule is a behavioral instruction loaded with EVERY user message via rules_list. Here's how it works:
- 1. User sends a message
- 2. System calls rules_list() -> retrieves all active Rules
- 3. Rules are embedded into the system prompt BEFORE processing the request
- 4. Agent generates a response considering all Rules
This cycle occurs automatically on every conversation turn. A Rule is not a memory query—it's part of the agent's "personality."
The agent doesn't "recall" Rules; it operates with them from the first word of the conversation. This is the fundamental difference from Concepts, which must be explicitly searched.
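The per-message loading cycle can be sketched as follows. This is a simplified model, not Extella's code: build_system_prompt and handle_message are hypothetical names, and the prompt layout (Rules appended under a "Rules:" heading) is an assumption.

```python
def build_system_prompt(base_prompt: str, rules: list) -> str:
    """Append every active Rule to the base prompt for this turn."""
    if not rules:
        return base_prompt
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return f"{base_prompt}\n\nRules:\n{rule_lines}"


def handle_message(message: str, base_prompt: str, rules_list) -> dict:
    # Rules are re-read on every turn, so an edit takes effect on the next message.
    system_prompt = build_system_prompt(base_prompt, rules_list())
    return {"system": system_prompt, "user": message}


rules = ["Always respond in English"]
first_turn = handle_message("Hi", "You are a helpful agent.", lambda: list(rules))
rules.append("Always ask for confirmation before deleting files or data")
second_turn = handle_message("Delete old logs", "You are a helpful agent.", lambda: list(rules))
print(second_turn["system"])
```

The Rule added between the two turns appears in the second system prompt without any search step.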
Example Rules to Get Started
- "Always respond in English"
- "Always ask for confirmation before deleting files or data"
- "If a task requires more than one step—describe the plan first, then execute"
- "When a task is complete—briefly explain what was done"
- "Save all created files to ~/Documents/Extella/"
- "Never store credentials in Concepts—use KV Store"
Limits and Restrictions
Maximum length per Rule: 4,000 characters. This is sufficient for detailed instructions. For extensive technical knowledge, use Concepts.
Rules are independent: you can have 50 Rules, and all will apply simultaneously. Application order is not guaranteed—write Rules so they don't conflict with each other.
global=true for Rules
Rules with global=true are visible to all agents in the profile. This lets you define common behavior guidelines for all agents in one profile without configuring each one individually.
Difference between Rules and Concepts
| Parameter | Rules | Concepts |
|---|---|---|
| Loading | Automatically with each message | Only on explicit search concept_search |
| Impact | Always in system prompt | Only when found and added to context |
| Search | None — all are loaded | Semantic search by meaning |
| Data type | Instructions, constraints, style | Knowledge, facts, solution patterns |
| Size limit | 4000 characters per rule | Unlimited (TEXT in PostgreSQL) |
| Example | "Always respond in English" | "For PDF in Docker use ReportLab" |
Mnemonic: if the behavior should ALWAYS apply—it's a Rule. If the knowledge might be needed SOMETIMES—it's a Concept.
Operations with Rules
| Operation | MCP tool | Description |
|---|---|---|
| Create | rules_add | New rule (rule_id is generated automatically) |
| Update | rules_update | Edit the text of an existing rule |
| Delete | rules_remove | Delete a rule by rule_id |
| List | rules_list | Get all rules (called automatically with each message) |
3.5 Comparison Table of Three Storage Types
| Characteristic | KV Store | Concepts | Rules |
|---|---|---|---|
| Data type | Key-value + description | Semantic knowledge (text) | Behavioral instruction |
| Encryption | ✅ User PIN | ❌ No | ❌ No |
| Search | Exact by key + semantic | Semantic only | No — all are loaded |
| Auto-loading | ❌ | ❌ | ✅ On every message |
| Embeddings | pgvector (from key + description) | pgvector (from concept text) | N/A |
| Embedding model | text-embedding-3-small | text-embedding-3-small | N/A |
| Value limit | TEXT (up to 1 GB) | TEXT | 4000 characters |
| What it stores | API-keys,UUID, URL, JSON, session data | Knowledge, patterns, solutions, insights | Constraints, protocols, response style |
| global flag | ✅ | ✅ | ✅ |
| Isolation | By agent_id / profile_id | By agent_id / profile_id | By agent_id / profile_id |
3.6 Data Isolation: Three Levels
All core tables (KV, Concepts, Rules, Targets, Experts) include agent_id and profile_id columns. The three-level isolation model:
| Level | Analogy | Description |
|---|---|---|
| user_id | Building owner | Global user identifier |
| profile_id | Floor (department) | Group of agents for a single project/client |
| agent_id | Room on the floor | Specific agent within a profile |
How the global flag works
- global=false (default), "I see only my office": the agent sees only its own data (filtered by agent_id). The "Researcher" agent cannot see concepts belonging to the "Writer" agent.
- global=true, "I see the entire floor": the agent sees data from all agents in the profile (filtered by profile_id). Researcher + Writer + Analyst: all three agents share one profile.
INSERT is always yours
When creating a new record (concept_add, kv_set, rules_add), the system ALWAYS uses the current agent_id and profile_id. You cannot create a record "for another agent." This ensures data belongs to whoever created it.
When reading/updating/deleting (concept_search, kv_get, rules_list), results are filtered by the global flag. Without global=true — only your data. With global=true — data from all agents in the profile.
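The read-side filtering can be modeled in a few lines. The sketch assumes an in-memory list in place of the real tables; Record and visible_records are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    text: str
    profile_id: str
    agent_id: str


def visible_records(records: list, profile_id: str, agent_id: str, global_flag: bool = False) -> list:
    """Filter by agent_id by default, or by profile_id only when global_flag is set."""
    if global_flag:
        return [r for r in records if r.profile_id == profile_id]
    return [r for r in records if r.profile_id == profile_id and r.agent_id == agent_id]


records = [
    Record("ReportLab tip", "p1", "researcher"),
    Record("Brand voice notes", "p1", "writer"),
    Record("Other client data", "p2", "researcher"),
]

own = visible_records(records, "p1", "researcher")
shared = visible_records(records, "p1", "researcher", global_flag=True)
print(len(own), len(shared))
```

Even with the global flag set, records from another profile stay invisible: the flag widens the view from one agent to one profile, never beyond it.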
4. Experts & Automations
An Expert is an atomic automation that persists forever. Create it once — it runs indefinitely. This section covers everything from Expert types to creating scheduled automated tasks.
4.1 Four Types of Experts
1. SIMPLE — Single-Task Building Blocks
| Expert | What it does | API key? |
|---|---|---|
| convert_pdf_to_text | Extracts text from PDF | No |
| send_telegram_message | Sends message | Yes (bot_token) |
| excel_query | SQL query to .xlsx | No |
| word_generate | Generates .docx from JSON | No |
2. COMPLEX — Multi-Stage Pipelines
- decompile_binary_to_pseudocode: file → disassembly → graph → pseudocode
- generate_3d_model_from_photo: photo → depth map → 3D mesh → .obj
3. NESTED — Orchestrators (cspl=nohup)
Call other Experts via REST API. Example:
fetch_emails -> extract_data -> check_crm -> create_task -> send_notification
→ For more on nohup Experts (script structure, {{placeholders}} syntax, no return statement, manual include): Section 8, subsection 8.4.
→ Pattern for parallel worker execution + synchronization via wait_tasks: Section 7.
4. INTEGRATION — Technology Wrappers
| Subtype | Examples |
|---|---|
| CLI wrapper | ffmpeg, ImageMagick, pandoc, git |
| Library wrapper | Pillow, pandas, BeautifulSoup |
| External API | Telegram, OpenAI, Notion, Jira |
| Database | SQLite, PostgreSQL |
4.2 Expert Structure: 5 Required Elements
Expert template:
$extens("include.py")
include("import requests", ["extella-pip install requests"])

def expert_name(param1: str = "", param2: int = 0) -> dict:
    import requests
    if not param1:
        return {"status": "error", "message": "param1 required"}
    try:
        # ... logic ...
        return {"status": "success", "result": "..."}
    except Exception as e:
        return {"status": "error", "message": str(e)}
5 required elements for every Expert:
- 1. Directive: $extens("include.py") — first line, mandatory
- 2. Dependencies: include(..., ["extella-pip install ..."])
- 3. Signature: def name(param: str = "") -> dict — explicit types, defaults, no *args/**kwargs
- 4. Validation: check inputs, return early on error
- 5. Return: always a dict with a status field
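The signature requirement (element 3) can be checked mechanically with Python's standard inspect module. The sketch below is one possible local check, not an Extella tool; check_expert_signature and the two sample functions are hypothetical.

```python
import inspect


def check_expert_signature(func) -> list:
    """Return a list of problems; an empty list means the signature passes."""
    problems = []
    signature = inspect.signature(func)
    for name, param in signature.parameters.items():
        if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            problems.append(f"{name}: *args/**kwargs are not allowed")
            continue
        if param.annotation is inspect.Parameter.empty:
            problems.append(f"{name}: missing type annotation")
        if param.default is inspect.Parameter.empty:
            problems.append(f"{name}: missing default value")
    if signature.return_annotation is not dict:
        problems.append("return annotation should be dict")
    return problems


def good_expert(param1: str = "", param2: int = 0) -> dict:
    return {"status": "success"}


def bad_expert(path, *args, **kwargs):
    return None


print(check_expert_signature(good_expert))
print(check_expert_signature(bad_expert))
```

Running a check like this before saving an Expert surfaces signature problems early, while the function is still easy to change.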
4.3 Description: A Search Index, Not a Comment
When you save an Expert, the backend generates an embedding from the name + description fields. This embedding powers search_blocks — the semantic search across your library. A poor description means the Expert won't be found when you need it.
| ❌ Bad — not searchable | ✅ Good — semantically searchable |
|---|---|
| description="" | description="Sends message to Telegram. Parameters: chat_id — chat ID; message — text; bot_token_key — token key in KV Store" |
| description="utility" | description="Converts PDF to text via pdfplumber. Parameters: file_path — path to PDF; max_pages — page limit (0=all)" |
Rule: description = one sentence describing what the Expert does + a list of all parameters with their purpose.
4.4 Names: snake_case. Saving with an Existing Name Overwrites the Expert
The Expert name is a unique key in the library. Requirements:
- snake_case only: send_telegram_message, convert_pdf_to_text, get_server_metrics
- No spaces, hyphens, or Cyrillic characters
- Saving with a name that's already taken → the previous version is overwritten without warning
- For versioning, use suffixes: analyze_document_v2, or explicitly delete the old version
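The naming requirements above can be approximated with a regular expression. The exact rule the backend enforces is not documented here, so treat this pattern as one reasonable reading (lowercase ASCII letters and digits, single underscores between words); is_valid_expert_name is a hypothetical helper.

```python
import re

# Lowercase ASCII words separated by single underscores; must start with a letter.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_valid_expert_name(name: str) -> bool:
    return SNAKE_CASE.fullmatch(name) is not None


for candidate in ["send_telegram_message", "analyze_document_v2", "Send-Message", "my expert"]:
    print(candidate, is_valid_expert_name(candidate))
```

Because saving under an existing name overwrites silently, a pre-save check pairs well with listing the library to confirm the name is not already in use.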
4.5 The isolated=True Parameter: Run in a Clean Environment
You can call run_expert with isolated=True — the Expert will run in a fresh venv without interference from other Experts' dependencies:
run_expert('my_expert', {'param': 'value'}, isolated=True)
When to use: dependency conflicts between Experts, non-standard library versions, reproducibility during debugging.
4.6 extella-pip install: Mandatory Rule
include("from pdfplumber import open as pdf_open", ["extella-pip install pdfplumber"])
Always use extella-pip install, not pip install or pip3 install. This ensures packages are installed in the correct Expert virtual environment.
Multiple dependencies:
include("import pandas", ["extella-pip install pandas", "extella-pip install openpyxl"])
4.7 Generalization Principle: Avoid Hardcoding
Bad — hardcoded:
def process_invoice():
    file_path = "/Users/ivan/Downloads/invoice.pdf"  # Only works on one machine!
Good — parameterized:
def process_invoice(file_path: str = "", output_dir: str = "") -> dict:
    if not file_path:
        return {"status": "error", "message": "file_path required"}
Four absolute prohibitions:
- ❌ Hardcoding paths, keys, IDs — pass everything through parameters
- ❌ *args/**kwargs in signatures — use only explicit named parameters
- ❌ Returning binary data — return only file paths
- ❌ Fourth prohibition: Experts must never access KV Store directly
An Expert must not call /api/kv/get or any other Extella API from within its code to retrieve credentials. This violates isolation and creates a hidden dependency on the cloud.
| ❌ Forbidden — expert pulls KV itself | ✅ Correct — the agent injects via params |
|---|---|
| def send_msg(chat_id): r = requests.get('/api/kv/get') token = r.json()['value'] # ... uses token | def send_msg(chat_id, bot_token=''): if not bot_token: return {'status': 'error'} # uses bot_token directly |
Correct pattern: the agent retrieves credentials from KV via MCP (kv_get/kv_search) and passes values to the Expert as parameters at runtime:
# Agent (outside Expert code):
token = kv_get('telegram_bot_token')['value']  # agent decrypts via PIN
run_expert('send_telegram', {'chat_id': chat_id, 'bot_token': token})  # injects the decrypted token
Expert = pure logic with no calls back to Extella for credentials. Data and credentials = parameters from the agent.
4.8 CLI Wrappers: 5 Lines Instead of 50
Example: ImageMagick via subprocess:
$extens("include.py")
include("import subprocess", [])

def resize_image(input_path: str = "", output_path: str = "", width: int = 800, height: int = 600) -> dict:
    import subprocess
    # Validate both paths before invoking the CLI
    if not input_path:
        return {"status": "error", "message": "input_path required"}
    if not output_path:
        return {"status": "error", "message": "output_path required"}
    size = f"{width}x{height}"
    # "convert" is the ImageMagick 6 command; ImageMagick 7 prefers "magick"
    cmd = ["convert", input_path, "-resize", size, output_path]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        return {"status": "error", "stderr": result.stderr}
    return {"status": "success", "output": output_path}
ffmpeg, pandoc, git, docker, rsync — any of these can become an Expert in 5-10 lines.
4.9 Cron Jobs: Scheduled Automation
A Cron job is a nohup Expert that runs on a schedule. Create one with a single phrase:
"Create a job: every morning at 9:00 AM, generate a server metrics summary"
The agent creates a background process — no crontab files, no YAML. Important:
- Cron runs through Listener — the device must be powered on
- Logs: /tmp/nohup_<name>.log
- To stop: "Stop Cron job <name>"
- After reboot — requires manual restart
How to Technically Stop a Cron Job
A Cron job is an OS nohup process. It runs independently of chat and Listener. Three ways to stop it:
| Method | Command / action | When |
|---|---|---|
| Via agent (recommended) | "Stop Cron task <name>" — agent finds PID from .pidfile and sends SIGTERM | Primary method |
| Via Listener UI | Listener tab → find process → Cancel button | If agent is unavailable |
| Manually via terminal | kill $(cat /tmp/nohup_<name>.pid) or: kill <PID> | Emergency stop |
Diagnostics — find the PID and logs of a running job:
cat /tmp/nohup_<name>.pid       # process PID
tail -f /tmp/nohup_<name>.log   # real-time logs
ps aux | grep <name>            # check if process is alive
When the device reboots, the nohup process terminates. The PID file remains but references a non-existent process. For auto-start on reboot — add Expert launch to launchd (macOS) or systemd (Linux).
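The stale PID file described above can be detected from a script. The sketch below is POSIX-only (on Windows, os.kill behaves differently and should not be used this way) and assumes the PID file contains a single integer; is_job_alive is a hypothetical helper. A PID can also be reused by an unrelated process, so treat a positive result as a hint rather than proof.

```python
import os
import tempfile


def is_job_alive(pid_file: str) -> bool:
    """Return True if the PID stored in pid_file belongs to a running process."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False  # no PID file, or its content is not a number
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one job
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False  # stale PID file: the process is gone
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


# Demo: a temporary PID file pointing at this very process.
with tempfile.NamedTemporaryFile("w", suffix=".pid", delete=False) as tmp:
    tmp.write(str(os.getpid()))
demo_pid_file = tmp.name

alive_now = is_job_alive(demo_pid_file)
os.remove(demo_pid_file)
after_removal = is_job_alive(demo_pid_file)
print(alive_now, after_removal)
```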
| Pattern | Command | Cron | What it does |
|---|---|---|---|
| Monitoring | "Every 5 minutes check availability of example.com" | */5 * * * * | Ping, KV-record, notification on failure |
| Daily summary | "At 9:00 AM — server metrics summary" | 0 9 * * * | Metrics, report, concept of the day |
| Weekly analysis | "On Sunday at 6:00 PM — weekly error analysis" | 0 18 * * 0 | Log aggregation, pattern concepts |
| Monthly audit | "On the 1st at 10:00 AM — token audit" | 0 10 1 * * | Token scanning, report |
Self-improving loop: Cron agent writes Concepts. Next week, it uses them as context. After a year — an Expert on your system's problem history.
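To make the cron expressions in the table concrete, here is a minimal matcher that checks whether a given moment satisfies a five-field expression. It is a teaching sketch, not a full cron parser: it handles only "*", "*/n", and plain numbers, and it requires all five fields to match (standard cron treats day-of-month and day-of-week specially when both are restricted). The function names are hypothetical.

```python
from datetime import datetime


def field_matches(field: str, value: int, start: int = 0) -> bool:
    """Match one cron field; start is the lowest legal value for that field."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return (value - start) % int(field[2:]) == 0
    return int(field) == value


def cron_matches(expression: str, moment: datetime) -> bool:
    minute, hour, day, month, weekday = expression.split()
    cron_weekday = moment.isoweekday() % 7  # cron counts Sunday as 0
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day, start=1)
        and field_matches(month, moment.month, start=1)
        and field_matches(weekday, cron_weekday)
    )


sunday_evening = datetime(2025, 1, 5, 18, 0)  # 5 January 2025 is a Sunday
print(cron_matches("0 18 * * 0", sunday_evening))
print(cron_matches("*/5 * * * *", datetime(2025, 1, 6, 10, 36)))
```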
5. Agents & Teams
Now that you understand Experts, Concepts, KV Store, and Rules, it's time to explore how they come together in agents and teams. This section covers the power of multi-agent orchestration, the full agent customization options available on the Pro plan, and how to build your own team of AI specialists.
5.1 What is an Agent
An agent is an AI specialist with a specific role, its own memory, and a set of tools. Unlike a standard chatbot, an agent is a configured entity: it knows its specialization, remembers interaction history, and can perform real actions. An agent consists of:
- Model — Claude, Gemini, Qwen, Llama, GPT — you choose
- System instructions — specialization, working style, and boundaries
- Tool set — MCP functions: concept_add, run_expert, web_search, kv_get, etc.
- Profile — an isolated workspace the agent belongs to
- Memory — Concepts (knowledge), KV Store (data and keys), Rules (behavioral guidelines)
A single agent can handle hundreds of tasks. Multiple agents in a team form an entire specialized department. Each agent is a precision instrument for a specific domain, not a universal Swiss army knife.
5.2 Creating and Configuring an Agent
On Plus and Flex plans, the default agent (or multiple agents) comes preconfigured with the system prompt and all parameters already set—no user action required. The only difference between Plus and Flex is that Flex users can use their own LLM provider API key instead of paying credits to use Extella's key.
On the Pro plan, you get full control over every agent parameter. This is a fundamental shift: instead of accepting an out-of-the-box agent, you become its architect. You can configure your own agent in the right slide-out panel of the chat interface.
| Parameter | Configuration | Example usage |
|---|---|---|
| Model | GPT-4o, Claude Opus, Gemini Pro, Mistral, local via Ollama, etc. | Claude Opus for deep document analysis |
| System prompt | Precise instruction: role, style, boundaries, specialization | "You are a financial analyst. Respond only in JSON." |
| Temperature | Balance accuracy <-> creativity (0.0 = predictable, 1.0 = creative) | 0.2 for analytics; 0.9 for copywriting |
| Top-P / Top-K | Managing token probability distribution | Top-P 0.9 for diverse text generation |
| Max tokens | Maximum response length | 4096 for documents; 512 for brief responses |
| Tools & MCP | Which tools are available to the agent | Financial tools only for finance specialists |
| Memory settings | Which memory type it uses: concepts, rules, KV | Long-term memory + thematic concepts |
| Rules setting | Which rules apply in which situations | "When uncertain — clarify before acting" |
| Response format | Output structure: text, JSON, markdown, schema | Strict JSON for integration with CRM system |
Examples of Specialized Agents
- Analyst agent: Claude Opus, temperature 0.2, JSON-only output, financial tools only
- Copywriter agent: GPT-4o, temperature 0.9, no tools, detailed brand voice in the prompt
- Code reviewer agent: Mistral, minimal context, security checklist in system prompt
- Research agent: Gemini, web_search + concept_add, maximum context, high recursion_limit
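The example agents above can be pictured as plain configuration records. The sketch below is illustrative only: the `AgentConfig` class, its field names, and the `financial_tools` tool name are hypothetical and do not reflect Extella's actual storage format. The values are taken from the list and table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical record mirroring the parameters from the table above."""
    name: str
    model: str
    temperature: float
    system_prompt: str
    tools: tuple[str, ...] = ()
    response_format: str = "text"

    def __post_init__(self) -> None:
        # The guide describes temperature on a 0.0 to 1.0 scale.
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")


analyst = AgentConfig(
    name="Analyst",
    model="Claude Opus",
    temperature=0.2,
    system_prompt="You are a financial analyst. Respond only in JSON.",
    tools=("financial_tools",),  # placeholder tool name
    response_format="json",
)

copywriter = AgentConfig(
    name="Copywriter",
    model="GPT-4o",
    temperature=0.9,
    system_prompt="Write in the brand voice described in this prompt.",
)

# A library keyed by name, similar in spirit to the agent dropdown.
library = {agent.name: agent for agent in (analyst, copywriter)}
print(library["Analyst"].temperature)
```

Keeping each specialist as a separate, immutable record matches the idea that an agent is configured once and then picked from a library.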
Reusability: configure once, use forever
Configured agents are saved to your library and accessible from a dropdown menu at any time. No need to paste prompts or select parameters each time—the agent already knows who it is and what it can do. This changes your workflow: instead of "find an agent for this task," you think "pick the right one from my library."
Agent limits on the Pro plan
The Pro plan does not limit the number of agents a user can create.
5.3Three Agent Interaction Patterns
1. Recursion — agent calls itself iteratively
The agent processes data in chunks, calling itself with refined parameters. Each iteration starts with a clean context. Protection against infinite loops is provided by the recursion_limit parameter when creating the agent (recommended: 5–15).
Example: A CTO checks 47 API endpoints for test coverage, processing roughly 15 per iteration. Iteration 1: endpoints 1–15 (12 covered, 3 missing). Iteration 2: endpoints 16–30 (14 covered, 1 missing). Iteration 3: endpoints 31–47 (13 covered, 4 missing). Final report: 39/47 covered (83%), 8 require tests.
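A minimal sketch of the recursion pattern, assuming a plain Python function stands in for the agent calling itself. The function name `process_in_batches`, the `check` callback, and the toy endpoint list are hypothetical. Only the idea of a `recursion_limit` guard and batch-wise processing comes from the text above.

```python
from typing import Callable, Sequence


def process_in_batches(
    items: Sequence[str],
    check: Callable[[str], bool],
    batch_size: int = 15,
    recursion_limit: int = 10,
    depth: int = 0,
) -> dict[str, int]:
    """Check one batch, then call itself on the remainder."""
    if not items:
        return {"covered": 0, "missing": 0, "iterations": 0}
    if depth >= recursion_limit:
        raise RecursionError(f"recursion_limit of {recursion_limit} reached")

    batch, rest = items[:batch_size], items[batch_size:]
    covered = sum(1 for item in batch if check(item))

    # Only compact totals are carried between iterations, not batch contents.
    tail = process_in_batches(rest, check, batch_size, recursion_limit, depth + 1)
    return {
        "covered": covered + tail["covered"],
        "missing": len(batch) - covered + tail["missing"],
        "iterations": 1 + tail["iterations"],
    }


def has_tests(path: str) -> bool:
    """Toy rule: every fifth endpoint lacks tests."""
    return int(path.rsplit("/", 1)[1]) % 5 != 0


endpoints = [f"/api/v1/resource/{i}" for i in range(1, 41)]
report = process_in_batches(endpoints, has_tests)
print(report)
```

With 40 toy endpoints and a batch size of 15, the call finishes in three iterations. Lowering `recursion_limit` to 2 makes the same call raise instead of looping further.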
2. Escalation — sub-agent signals the orchestrator
When a specialized agent encounters a situation outside its competence, it escalates back to the orchestrator. The orchestrator then decides whether to reassign the task, bring in another specialist, or adjust the delegation parameters.
Example: A CCO analyzing competitors discovers news about a competitor's $50M Series B round. This is a strategic factor—the CCO escalates to the orchestrator: "New competitive factor detected. Recommend revisiting positioning." The orchestrator brings in the Corporate Director.
3. Cross-calls — peer-level agents
Agents communicate directly with each other without an intermediary—when a task requires data from an adjacent domain. Technically, this works through the MCP tool agent_run with the target agent's known agent_id. All calls are logged.
Example: A CTO directly asks the Corporate Director: "Estimate infrastructure costs: 3 microservices, GPU A10G, 10K requests/day." Receives a $2,400/month estimate and includes it in the architecture document without extra hops through the orchestrator.
5.4Teams — Multi-Agent Systems with Delegation
In traditional platforms, a single agent "tries to be everything to everyone." Its context window quickly fills with irrelevant data. The longer the session, the less accurate the responses become.
Extella solves this differently—by creating a Team that collaborates on a single project: an orchestrator agent receives a task, breaks it into subtasks, and delegates each to a specialized agent within a clean, fully relevant context. Each specialist focuses solely on their domain.
Example: the task "analyze the product, prepare competitive analysis, financial model, and pitch for Seed round":
| Step | Agent | Clean context |
|---|---|---|
| Competitive analysis | CCO | Market data, G2/Gartner, competitor pricing models |
| Technical feasibility | CTO | Platform architecture, ML models, integrations, cost |
| Financial model | Corporate Director | CTO assessment + inference costs + CAC/LTV benchmarks |
| Pitch deck | CCO | Competitive analysis (step 1) + financial model (step 3) |
| Synthesis | Orchestrator | All results: competitive analysis + technical + financials + pitch |
Each agent has its own Experts, KV Store, Concepts, and Rules. Each focuses on their expertise. The result is not a generic blurred response, but a structured document where each section is prepared by a specialist.
Key advantage: a specialist agent with a clean 32K token context makes more accurate decisions than a generalist agent with an overloaded 200K token context.
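The dependencies between the steps in the table above can be written down as a small graph. The sketch below uses Python's standard `graphlib` module to derive which steps could run together. The step identifiers are hypothetical labels for the table rows, and this is not a description of how Extella's orchestrator is implemented.

```python
from graphlib import TopologicalSorter

# step -> steps whose output it needs (taken from the table above)
dependencies = {
    "competitive_analysis": set(),
    "technical_feasibility": set(),
    "financial_model": {"technical_feasibility"},
    "pitch_deck": {"competitive_analysis", "financial_model"},
    "synthesis": {
        "competitive_analysis",
        "technical_feasibility",
        "financial_model",
        "pitch_deck",
    },
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # steps whose inputs are all available
    waves.append(ready)
    sorter.done(*ready)

for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The first wave contains the two steps with no inputs, so the CCO and CTO could work at the same time. Every later step waits only for the results it actually consumes.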
A Team in the Extella platform is a multi-agent system that operates as a single unit. You send a task to the Team, and it determines which agent to route it to—or how to split it among several.
What each Team includes
- Goal and context — what it was created for, what it does well
- Members with roles — each agent knows its role (Research, Writing, Review, Execution, etc.)
- Dedicated Concepts — a knowledge base specific to this Team only
- Dedicated Rules — behavioral rules applied within the system
- Orchestration prompt — delegation logic: the criteria for routing tasks to agents
How a Team makes decisions
A Team operates in auto mode: upon receiving a task, the orchestrator analyzes it, matches it against member roles and delegation rules, then routes it to the appropriate agent (or several in parallel). No human involvement in this distribution.
In the future, the delegation mechanism will be enhanced with a trained RL model that will make faster and more accurate decisions based on the Team's accumulated experience.
One agent — in multiple Teams simultaneously
A Team doesn't duplicate agents — it references them. An agent can formally be a member of multiple Teams. However, each agent carries its own fixed configuration: system prompt, Rules, and Concepts do not change based on which Team the agent belongs to. This means the same agent cannot be used for fundamentally different purposes simply by assigning it to different Teams — choose agents whose configuration matches their intended role within each Team.
Team examples
| Team | Participants (Roles) | Purpose |
|---|---|---|
| Content Studio | Research -> Writer -> Editor -> SEO-reviewer | Creating content materials from research to publication |
| Due Diligence | Financial Analyst + Legal Reviewer + Market Researcher | Collecting and synthesizing company data for investment decisions |
| Product Sprint | PM + Tech Lead + UX Reviewer | Breaking down tasks and developing technical specifications |
| Personal Knowledge Base | Collector + Summarizer + Tagger | Structuring incoming information into a personal knowledge base |
Team limits on the Pro plan
You can create up to 3 Teams, and each Team can contain up to 5 agents.
5.5Interface: My Agents and My Teams
On the Pro plan, the agent dropdown in the upper left panel of the interface has the following structure:
▼ Default Agents
• Extella Claude Sonnet (preconfigured)
• Extella GPT-5 (preconfigured)
▼ My Agents
• Financial Analyst (your agent)
• Copywriter (your agent)
• Code Reviewer (your agent)
▼ My Teams
• Content Studio (your team)
• Due Diligence (your team)
• Product Sprint (your team)
Each group is expandable. Any agent or Team is clickable—selecting one assigns it to the current chat. Note: you can select either an entire Team or an individual agent.
Creating a new agent
Open the right settings panel—the Agent Builder section contains agent configuration fields: name, model, provider, system prompt, temperature, and tools. You can also create and configure an agent through chat—describe the agent you want to Extella, and she will handle everything automatically.
Creating a new Team
Open the right settings panel—the Team Builder section. Here you can create a team, assign it a name, and add available agents to it, designating a master (or orchestrating) agent responsible for Team management. Team configuration and populating it with Rules and Concepts happens directly through chat with Extella.
5.6Creating and configuring a Team through interaction with Extella
As an alternative to interface configuration, you can create and configure a Team through chatbot interaction. The user describes in words what they want to achieve. Extella uses reasoning to form a Team object and saves it.
What the user describes
- Team purpose — what tasks it's created for, what it should do well
- Composition — which agents to include (from existing configured agents or new ones)
- Roles — who is responsible for what within the system
- Rules — how the Team should behave, what to consider when delegating
- Knowledge — specific context that system agents need
The user can describe everything at once or answer Extella's clarifying questions.
What Extella does when creating a Team
| Step | What Happens Automatically |
|---|---|
| Defines composition | Selects suitable agents from the library. If none available — suggests creating or using Extella Agents |
| Assigns roles | Creates role descriptions for each agent within the Team |
| Creates orchestration prompt | Instructions for the orchestrator: how to distribute tasks, what criteria to consider |
| Creates Team Concepts | Stores specific context and knowledge provided by the user |
| Creates Team Rules | Records behavioral rules for the system |
| Stores Team | The object appears in the My Teams list in the dropdown menu |
Example Team creation dialog
User:
"Create a content marketing team. I need: a researcher (gathers topic data), a writer (creates content), an editor (reviews style), and an SEO specialist (optimizes). They work sequentially. Rule: the final text must contain at least 3 keywords from the brief."
Extella:
"Creating Team Content Studio with four agents. Setting up the chain: Researcher -> Writer -> Editor -> SEO. Adding the keyword rule to Team Rules. Done — Content Studio now appears in My Teams."
Editing a Team via chat
Message Extella in the chat at any time:
- "Add a social media agent to Content Studio" — Extella adds an agent with the Social Media role
- "Change Researcher's role — now they handle statistics research" — Extella updates the role
- "Add a rule: verify facts through two sources" — Extella adds the rule to Team Rules
- "Remove Editor from Due Diligence" — Extella removes the agent from the Team (the agent itself is preserved)
5.7Data isolation: two levels
On the Pro plan, agent data (Concepts, Rules, KV Store, Experts) is isolated by default and never mixes with other agents' data. Visibility is controlled by the global parameter, available in most MCP tools and REST endpoints.
| Level | What It Covers | Parameter | Default |
|---|---|---|---|
| Agent | Current agent only | global=false | ✅ yes |
| Profile | All agents within one profile | global=true | — |
How this works in practice:
- global=false (default): the agent sees only its own Concepts, Rules, KV pairs, and Experts. Other agents' data remains inaccessible, even within the same profile.
- global=true: the agent sees data from all agents in the profile. Use this when you need to share a common knowledge base across a team.
Key principle: isolation is not hierarchy. Data doesn't automatically "leak" up or down. The developer or agent explicitly controls scope through the global parameter with each call.
Example:
# Add concept only for current agent (default)
concept_add(text="...", global=False)
# Find concept in any agent of the profile
concept_search(query="...", global=True)
Note: Profile (Team) is a container for agents, not a separate storage level. There is no dedicated "Team storage" for Concepts or Rules.
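To make the two visibility levels concrete, here is a toy in-memory model of a profile. It is a simplification under stated assumptions: `ProfileStore` is hypothetical, every record is owned by the agent that wrote it, and the flag is spelled `global_` because `global` is a reserved word in Python. It does not describe Extella's real storage.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileStore:
    """Toy model of one profile; not Extella's real storage."""
    concepts: list[tuple[str, str]] = field(default_factory=list)  # (agent_id, text)

    def concept_add(self, agent_id: str, text: str) -> None:
        self.concepts.append((agent_id, text))

    def concept_search(self, agent_id: str, query: str, global_: bool = False) -> list[str]:
        # global_=False: only the caller's own records are visible.
        # global_=True: records from every agent in the profile are visible.
        return [
            text
            for owner, text in self.concepts
            if query.lower() in text.lower() and (global_ or owner == agent_id)
        ]


store = ProfileStore()
store.concept_add("cto", "Migrations use Alembic")
store.concept_add("cco", "Competitor pricing uses per-seat tiers")

print(store.concept_search("cto", "pricing"))                # agent scope
print(store.concept_search("cto", "pricing", global_=True))  # profile scope
```

The same query returns nothing at agent scope and one hit at profile scope, which is the behaviour the table above describes: scope is chosen per call, not inherited.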
5.8Production Agents in Extella
Below are actual agents running in the Extella system. Each specializes in its domain with its own configuration, tools, and model.
| Agent Name | Model | Specialization |
|---|---|---|
| Extella (CEO) | Claude Sonnet 4.6 | Main orchestrator: delegation, strategy, results synthesis |
| CCO | Gemini 2.5 Flash | B2B sales, GTM strategy, competitive analysis, pitches, pricing |
| CTO | Gemini 2.5 Flash | Platform architecture, CSPL Experts, API, security, infrastructure |
| Corporate Director | Qwen 3.6 Plus | Finance, legal, compliance, capital structure, investors |
| Extella Architect | Qwen 3.6 Plus (NVIDIA) | Complex reasoning, deep architectural analysis |
| Llama 4 Maverick | Llama 4 Maverick | 1M context, multimodal, parallel processing of large volumes |
| Step 3.5 Flash | Step 3.5 Flash | 196B MoE, 262K context — batch processing, writing sections |
| Llama 3.3 70B | Llama 3.3 70B (Groq) | Ultra-fast inference (300+ tokens/sec) — urgent tasks |
| Architect R1 | DeepSeek R1 | Chain-of-thought reasoning, complex logical tasks, planning |
| Auto Router | Auto Router (OpenRouter) | Automatic selection of optimal model + fallback chain |
Workers (Llama, Step, DeepSeek, Qwen) are used by the orchestrator for parallel processing: writing 10 sections simultaneously, analyzing 5 competitors, generating 20 pitch variations. These are the system's "computational muscle."
5.9Agent Growth Over Time
An agent in Extella is not a static tool. It grows like a real employee: accumulating knowledge in Concepts, guidelines in Rules, and patterns in KV Store. With each week of work, it becomes smarter and more precise.
| Stage | Agent state | What happens | Example |
|---|---|---|---|
| Day 1 | Empty profile | Competent as a good LLM, but doesn't know your business | CTO writes Python, but doesn't know your architecture |
| Week 2 | 15-30 concepts, 5-10 rules | Knows code style, typical mistakes, team preferences | "The project uses FastAPI + PostgreSQL + Redis. Migrations use Alembic." |
| Month 3 | 100-300 concepts, Cron patterns | Suggests solutions from project history, predicts problems | "Add caching" -> immediately suggests Redis pattern you already use |
| Month 6 | 500+ concepts, autonomous operation | Project expert. Makes decisions without additional context | Autonomously alerts about degradation based on historical data |
Path to autonomy:
- Day 1: "What do you use for the database?"
- Week 2: "I recommend PostgreSQL, since you already use it"
- Month 3: "I'll add Redis cache, similar to the auth module"
- Month 6: The agent proposes changes, implements them, and reports back—without additional context
Exporting Agent Conversations
After 6 months of work, you can export all conversations:
POST https://api.extella.ai/api/agent/export/chats
Authorization: Bearer <your-token>
// By agent:
{"by": "agent", "id": "agent_xyz..."}
// By profile (all agents):
{"by": "profile", "id": "team_abc..."}
The resulting JSON containing thousands of QA pairs and discussions serves as a valuable dataset for quality analysis or fine-tuning. The actual training process is performed outside the Extella platform—on your own infrastructure or through external services.
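The export call above could be issued from a script roughly as follows. This is a sketch using only the standard library. The helper name `build_export_request` is hypothetical, and the `Content-Type` header is an assumption that the guide does not state. The request is built but not sent, so the snippet runs without network access.

```python
import json
import urllib.request

EXPORT_URL = "https://api.extella.ai/api/agent/export/chats"


def build_export_request(token: str, by: str, target_id: str) -> urllib.request.Request:
    """Build (but do not send) the export request shown above."""
    if by not in ("agent", "profile"):
        raise ValueError('by must be "agent" or "profile"')
    body = json.dumps({"by": by, "id": target_id}).encode("utf-8")
    return urllib.request.Request(
        EXPORT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed, not stated in the guide
        },
    )


request = build_export_request("<your-token>", "agent", "agent_xyz...")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Replace the placeholder token and ID with real values before sending. The response format is not documented here, so inspect it before feeding it into an analysis or fine-tuning pipeline.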
6.Local Models & Tunnels
6.1Why Do You Need a Tunnel?
In Section 5, you learned that you can connect any language model to an agent—including a local one running on your laptop or home server. This opens up significant possibilities: complete data privacy, zero inference cost, any uncensored models, and offline operation.
However, there's a technical detail: local LLM servers (Ollama, LM Studio, llama.cpp, etc.) listen only on localhost by default—an address accessible only from the same device. Extella operates as a cloud service and physically cannot reach your localhost directly.
The solution is a tunnel. This is a program that creates an encrypted "bridge" between your device and a public HTTPS address on the internet. Extella connects to the public address, and the tunnel transparently forwards requests to your localhost. To Extella, it looks like a standard cloud-based API server.
| Scenario | Without tunnel | With tunnel |
|---|---|---|
| Extella + local model | ❌ Unreachable | ✅ Works via public URL |
| Access from phone | ❌ Localhost only | ✅ Any device |
| Demo to a colleague | ❌ Requires VPN or presence | ✅ Just share the URL |
| CI/CD integration | ❌ No public address | ✅ webhook + tunnel |
💡 A tunnel doesn't slow down your model—it adds only 10–50ms of network latency. For text generation, this is imperceptible.
6.2Ports and Base API URL
Each LLM server listens on its own port. This is important when creating a tunnel—you need to tunnel the specific port where your model is running.
| Server | Port | Base API URL (local) |
|---|---|---|
| LM Studio | 1234 | http://localhost:1234/v1/ |
| llama.cpp server | 8080 | http://localhost:8080/v1/ |
| Ollama | 11434 | http://localhost:11434/v1/ |
| Jan | 1337 | http://localhost:1337/v1/ |
| KoboldCPP | 5001 | http://localhost:5001/v1/ |
| LocalAI | 8080 | http://localhost:8080/v1/ |
⚠️ Golden rule: in the baseURL field in Extella, always specify the URL up to and including /v1/—and nothing beyond. Extella automatically appends the required path (/chat/completions, /embeddings, etc.).
✅ Correct: https://abc123.ngrok-free.app/v1/
❌ Incorrect: https://abc123.ngrok-free.app/v1/chat/completions
Including an extra endpoint is the most common reason a local model "doesn't respond" in Extella. Check this first.
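Because this mistake is so common, it can help to normalize the URL before pasting it. The helper below is a small sketch, not part of Extella. The name `normalize_base_url` is hypothetical, and it assumes the server exposes its API under a `/v1/` path, as the servers in the table above do.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_base_url(url: str) -> str:
    """Trim an OpenAI-style URL so it ends exactly at /v1/."""
    parts = urlsplit(url.strip())
    segments = [s for s in parts.path.split("/") if s]
    if "v1" in segments:
        # Drop everything after the first "v1" segment.
        segments = segments[: segments.index("v1") + 1]
    else:
        segments.append("v1")
    path = "/" + "/".join(segments) + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


for candidate in (
    "https://abc123.ngrok-free.app",
    "https://abc123.ngrok-free.app/v1",
    "https://abc123.ngrok-free.app/v1/chat/completions",
):
    print(normalize_base_url(candidate))
```

All three inputs produce the same value, `https://abc123.ngrok-free.app/v1/`, which is the form the baseURL field expects.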
6.3Three Tunneling Methods: Which to Choose
Several tools are available. Here are recommendations for choosing:
| Method | Free | Persistent URL | Complexity | Best for |
|---|---|---|---|---|
| ngrok | Limited* | Paid / static | 🟢 Easy | Quick start, testing |
| Cloudflare Tunnel | ✅ Fully | ✅ Custom domain | 🟡 Medium | Permanent operation |
| LocalTunnel | ✅ Fully | Partial | 🟢 Easy | Quick testing without registration |
| Tailscale Funnel | ✅ Free | ✅ Stable | 🟡 Medium | If already using Tailscale |
* ngrok provides one free tunnel with a random URL. A static domain is free with registration (one domain).
💡 For most Extella users, a sensible path is to start with ngrok (5 minutes to results), then switch to Cloudflare Tunnel for regular use: it's free, reliable, and supports your own domain.
6.4Method 1: ngrok — Fastest Way to Get Started
ngrok is the simplest way to get a working public URL in 2–3 minutes. It's an excellent choice for your first exposure to the topic or for occasional tasks.
Installation
| OS | Command |
|---|---|
| macOS (Homebrew) | brew install ngrok/ngrok/ngrok |
| Linux (Debian/Ubuntu) | sudo apt install ngrok (after adding the repository — see below) |
| Windows | winget install ngrok |
Linux:
# Linux: full installation
curl -sSL https://ngrok-agent.s3.amazonaws.com/ngrok.asc \
  | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null
echo "deb https://ngrok-agent.s3.amazonaws.com buster main" \
  | sudo tee /etc/apt/sources.list.d/ngrok.list
sudo apt update && sudo apt install ngrok
Registration (one-time)
ngrok requires free registration at ngrok.com. After registering, add your authtoken (copy it from Dashboard → Your Authtoken):
ngrok config add-authtoken YOUR_TOKEN_HERE
💡 Add the token before the first run: current ngrok agent versions require an authtoken to start a tunnel, and free accounts remain subject to usage limits.
Starting the Tunnel
One command and your tunnel is ready:
ngrok http 1234    # LM Studio
ngrok http 8080    # llama.cpp
ngrok http 11434   # Ollama
After starting, the terminal displays a line like:
Forwarding https://abc123.ngrok-free.app -> http://localhost:1234
Your base API URL for Extella: https://abc123.ngrok-free.app/v1/
Static URL (free)
Random URLs change with every restart, which is inconvenient if the URL is saved in Extella. The solution is a static domain:
ngrok http --domain=your-static-name.ngrok-free.app 1234
One static domain is free. You can register it in Dashboard → Domains.
Password protection (recommended)
A public URL without protection means anyone on the internet can send requests to your model. Add basic authentication:
ngrok http --basic-auth="user:strongpassword" 1234
⚠️ Don't leave a tunnel open without authentication for long periods. Bots actively scan known ngrok domains.
6.5Method 2: Cloudflare Tunnel — for permanent operation
Cloudflare Tunnel (cloudflared) is an enterprise-grade solution from Cloudflare. Completely free with no traffic limits, and supports custom domain binding. Ideal if you plan to use a local model with Extella regularly.
Key advantage over ngrok: the tunnel doesn't depend on open router ports and works even behind double NAT (corporate networks, mobile internet, etc.).
Installing cloudflared
macOS / Windows:
brew install cloudflare/cloudflare/cloudflared   # macOS
winget install Cloudflare.cloudflared            # Windows
Linux (Debian/Ubuntu):
curl -L --output cloudflared.deb \
  https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
sudo dpkg -i cloudflared.deb
Method A: Quick tunnel without registration
For a quick test, run without an account:
cloudflared tunnel --url http://localhost:1234    # LM Studio
cloudflared tunnel --url http://localhost:11434   # Ollama
The URL will look like: https://some-random-name.trycloudflare.com/v1/
⚠️ The URL changes with each restart. Method B provides a permanent address.
Method B: Permanent tunnel with your own domain
For permanent operation, you need to register at cloudflare.com (free) and have your own domain (or subdomain).
Step 1 — Authentication:
cloudflared tunnel login
Step 2 — Create the tunnel:
cloudflared tunnel create my-llm-tunnel
Step 3 — Create the ~/.cloudflared/config.yml file:
tunnel: <TUNNEL_ID>
credentials-file: /Users/YOUR_USER/.cloudflared/<TUNNEL_ID>.json
ingress:
  - hostname: llm.yourdomain.com
    service: http://localhost:1234
  - service: http_status:404
Step 4 — Configure DNS:
cloudflared tunnel route dns my-llm-tunnel llm.yourdomain.com
Step 5 — Start the tunnel:
cloudflared tunnel run my-llm-tunnel
Your permanent URL: https://llm.yourdomain.com/v1/
Auto-start on system boot
sudo cloudflared service install
sudo systemctl start cloudflared   # Linux
# macOS: launchd service is created automatically
💡 Once auto-start is configured, the tunnel launches on every system boot. You can set the URL in Extella once and forget about it.
6.6Method 3: Alternatives
LocalTunnel — As simple as it gets (Node.js)
If ngrok feels like overkill and you already have Node.js installed:
npm install -g localtunnel
lt --port 1234 --subdomain my-llm
URL: https://my-llm.loca.lt/v1/ — the subdomain persists if the name is available.
⚠️ LocalTunnel is less stable than ngrok or Cloudflare. Best suited for one-off tests.
Tailscale Funnel
If you're already using Tailscale for VPN networking, Funnel exposes your server to the internet with a single command:
tailscale funnel 1234
The URL is generated from your device name in Tailscale. Very convenient if you already have Tailscale set up.
6.7Connecting to Extella: Step-by-Step
Once your tunnel is running and you have a public URL, add the model to Extella as a custom provider. This is done in agent settings (Section 5).
| Field in Extella | What to enter | Example |
|---|---|---|
| provider | custom | custom |
| baseURL | Tunnel URL + /v1/ | https://abc123.ngrok-free.app/v1/ |
| apiKey | Any string (model doesn't validate) | lm-studio or ollama |
| model | Model name on server | llama-3.2-3b-instruct |
💡 The model name in the model field must match exactly how the model is named on the server. In LM Studio, this is the model filename. In Ollama, check the output of ollama list.
After saving, the agent will use your local model for all requests. Concepts, KV Store, Rules, and all Experts work exactly the same: they're stored in Extella's cloud, and only inference (text generation) goes through your model.
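Before saving the settings, it can be useful to confirm that the tunnel URL answers an OpenAI-style chat request. The sketch below only builds such a request with the standard library. The helper name `build_chat_request` is hypothetical, the field values are the examples from the table above, and it assumes the local server exposes the usual `/v1/chat/completions` endpoint.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against the tunnel URL."""
    if not base_url.endswith("/v1/"):
        raise ValueError("base_url must end with /v1/")
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 32,
    }
    return urllib.request.Request(
        base_url + "chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_chat_request(
    "https://abc123.ngrok-free.app/v1/", "lm-studio", "llama-3.2-3b-instruct", "Say hello."
)
print(request.full_url)
# With the tunnel running: urllib.request.urlopen(request, timeout=60)
```

If the manual request succeeds but the agent still gets no answer, the model name or the baseURL value in the agent settings is the most likely mismatch.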
6.8Local Server Configuration
Before creating a tunnel, make sure your server accepts external connections. By default, most servers listen only on localhost and will reject requests coming through the tunnel.
LM Studio
- Open the Local Server tab
- Enable ✓ Enable CORS
- Enable ✓ Allow connections from network
- Click Start Server
💡 Without Enable CORS, requests from Extella will be rejected by the browser with a CORS policy error. This is a common pitfall.
llama.cpp — Server Launch Parameters
Launch with external connections enabled:
./llama-server \
  -m ./models/your-model.gguf \
  --port 8080 \
  --host 0.0.0.0 \
  -c 4096 \
  -ngl 35
Key parameter: --host 0.0.0.0 (accept connections from all sources, not just localhost). -ngl 35 specifies the number of layers on GPU (adjust based on your graphics card).
Ollama — enabling external connections
By default, Ollama only accepts localhost connections. You need to modify the environment variable:
macOS / Linux (temporary):
OLLAMA_HOST=0.0.0.0 ollama serve
Linux (permanent via systemd):
sudo systemctl edit ollama
# Add to the [Service] section:
Environment="OLLAMA_HOST=0.0.0.0"
⚠️ After changing OLLAMA_HOST, restart the service: sudo systemctl restart ollama
6.9Security and performance
Security — required reading
A public URL without protection means anyone can use your model—free of charge at your CPU/GPU expense. This isn't a theoretical threat: bots actively scan for open LLM endpoints.
| Threat | Solution |
|---|---|
| Unauthorized model access | ngrok --basic-auth or Cloudflare Access |
| Data interception | Tunnels use HTTPS — data is encrypted |
| Prompt leakage | Don't tunnel the model unnecessarily, use VPN networks (Tailscale) |
| DDoS on model | Rate limiting in Cloudflare or ngrok Pro |
💡 The simplest protection option is ngrok --basic-auth. Extella supports basic authentication: specify a user:password string in Base64 format in the apiKey field.
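Producing the Base64 form of the credentials takes a few lines of Python. The helper name `basic_auth_value` is hypothetical. Whether the apiKey field expects the bare Base64 string, as the tip above suggests, or a value with a `Basic ` prefix is not specified here, so treat the exact format as an assumption to verify.

```python
import base64


def basic_auth_value(user: str, password: str) -> str:
    """Return the Base64 form of "user:password"."""
    raw = f"{user}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


encoded = basic_auth_value("user", "strongpassword")
print(encoded)
# Round trip, to confirm nothing was lost:
print(base64.b64decode(encoded).decode("utf-8"))
```

Use the same user and password that were passed to `ngrok http --basic-auth`. Base64 is an encoding, not encryption, so the value should be handled like the password itself.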
Performance
- Tunneling adds 10–50ms latency. This is negligible for text generation.
- For long responses, use streaming—users see text as it generates rather than waiting for completion.
- Ensure the LLM server is running before starting the tunnel—otherwise the tunnel will be created, but requests will fail.
- GPU acceleration remains on your machine—the tunnel doesn't affect inference speed.
6.10Quick reference
| Tool | Command | URL Type | Recommendation |
|---|---|---|---|
| ngrok | ngrok http 1234 | Random / static | Best start |
| cloudflared (quick) | cloudflared tunnel --url http://localhost:1234 | Random | Testing without registration |
| cloudflared (persistent) | via config.yml + custom domain | Permanent | For regular use |
| LocalTunnel | lt --port 1234 --subdomain my-llm | Random / subdomain | Quick one-time test |
| Tailscale Funnel | tailscale funnel 1234 | Permanent | If Tailscale is already installed |
Bottom line: run ngrok http <port>, grab the URL from the output, append /v1/ to the end, paste it into the baseURL field in your Extella agent settings—and your local model is ready to go.
7.Parallel Execution
7.1The Physics of Parallelism
The formula is simple:
Sequential: T = T1 + T2 + T3 + ... + TN
Parallel: T = max(T1, T2, T3, ..., TN) + ~1 sec polling
| Scenario | Sequential | parallel_task | Speedup |
|---|---|---|---|
| 3 tasks x 15 sec | 45 sec | 16 sec | 2.8x |
| 5 tasks x 20 sec | 100 sec | 21 sec | 4.8x |
| 10 tasks x 30 sec | 300 sec (5 min) | 31 sec | 9.7x |
| 20 tasks x 30 sec | 600 sec (10 min) | 31 sec | 19.4x |
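The rows of the table follow directly from the two formulas. The short sketch below reproduces them. The function name `estimate` is hypothetical, and the one-second polling overhead is the approximate figure quoted in the formula above.

```python
def estimate(task_seconds: list[float], polling_overhead: float = 1.0) -> dict[str, float]:
    """Compare total time for sequential and parallel execution."""
    sequential = sum(task_seconds)                    # T1 + T2 + ... + TN
    parallel = max(task_seconds) + polling_overhead   # max(T1..TN) + polling
    return {
        "sequential": sequential,
        "parallel": parallel,
        "speedup": round(sequential / parallel, 1),
    }


for count, seconds in [(3, 15), (5, 20), (10, 30), (20, 30)]:
    print(count, seconds, estimate([seconds] * count))
```

Note that the parallel time depends only on the slowest task, so adding more tasks of the same length raises the speedup almost linearly, provided the device has capacity to run them at once.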
This isn't optimization — it's a formula change. Claude Code thinks like an LLM — sequentially. Extella parallel_task thinks like hardware — in parallel. Modern CPUs have multiple cores, each executing independent tasks simultaneously.
7.2Five Problems with Synchronous Mode
Problem 1: Timeout — Results Lost
Most synchronous LLM agents have a hard timeout of ~5 minutes. Processing 1000 files? Training? Deep analysis? — results vanish without warning. With parallel_task, each worker is independent of the LLM connection. Even if the connection drops, the OS process keeps running.
Problem 2: Linear Time Accumulation
| Tasks | Synchronous | Parallel | Lost time |
|---|---|---|---|
| 2 x 30s | 60s | 31s | 29s |
| 5 x 30s | 150s | 31s | 119s |
| 10 x 30s | 300s | 31s | 269s |
Problem 3: No Way to Cancel
In synchronous mode, there's no Cancel button. Spotted an error at second 3 of 300 — you still wait. In Task Registry, each task has a ✕ button (SIGTERM by PID) — instant termination.
Problem 4: No Visibility
In synchronous mode, a running task is a black box: there is no way to check its state until it finishes. With parallel_task, each task writes its status to /tmp/pt_{uuid}.json. The agent can read the file at any time to check task state: running, complete, error. For visual monitoring, you can optionally deploy task_registry_server, a custom Flask-based Expert with an HTML dashboard (see section 7.3 for details).
Problem 5: Lost Traceback on Error
When a synchronous process crashes — you get a generic message without context. In Extella, each parallel_task job writes the full traceback to /tmp/pt_{uuid}.json (error field). If task_registry_server is running, the traceback is also available via GET /tasks/<uuid>.
7.3Task Statuses and Diagnostics
The state of each parallel_task is stored in a /tmp/pt_{uuid}.json file on the device. This is the primary and only guaranteed tracking mechanism—it works without any additional components.
| Field in file | Value | Description |
|---|---|---|
| status | "running" | Task is running |
| status | "complete" | Task completed successfully |
| status | "error" | Task failed with error |
| result | dict from worker | Result (only on complete) |
| error | traceback string | Error details (only on error) |
Example: reading task status manually:
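import json
from pathlib import Path
# Placeholder: use the UUID returned when the task was launched
task_uuid = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890'
data = json.loads(Path(f'/tmp/pt_{task_uuid}.json').read_text())
print(data['status'])      # 'running' / 'complete' / 'error'
print(data.get('result'))  # worker result if complete
print(data.get('error'))   # traceback string if the task failed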
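Reading the file once gives a snapshot. Waiting for completion means reading it repeatedly. The sketch below shows one way to poll a status file with a timeout. The function name `wait_for_task` is hypothetical, and treating a missing or half-written file as "still running" is an assumption rather than documented behaviour. A temporary file stands in for /tmp/pt_{uuid}.json so the example runs on its own.

```python
import json
import tempfile
import time
from pathlib import Path


def wait_for_task(status_file: Path, timeout: float = 120.0, poll_interval: float = 1.0) -> dict:
    """Poll a status file until the task leaves the "running" state."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            data = json.loads(status_file.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            data = {"status": "running"}  # file not written yet or mid-write
        if data.get("status") in ("complete", "error"):
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{status_file} still running after {timeout}s")
        time.sleep(poll_interval)


# Self-contained demo: a temporary file stands in for the real status file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "pt_demo.json"
    demo.write_text(json.dumps({"status": "complete", "result": {"pages": 3}}))
    print(wait_for_task(demo, timeout=5, poll_interval=0.1)["result"])
```

A monotonic clock is used for the deadline so that system clock adjustments cannot shorten or extend the wait.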
7.3.1. Optional Visual Dashboard — task_registry_server
task_registry_server is not a built-in platform interface but a separate custom Expert (Flask application) that you can run for visual task monitoring in a browser. It is not required: parallel_task and wait_tasks work without it.
⚠️ task_registry_server is a custom component. If it hasn't been created by the agent in your account, ask the agent: "Create task_registry_server for monitoring parallel tasks."
When task_registry_server is running, it provides:
| Feature | Description |
|---|---|
| HTML UI | Browser page at http://localhost:7755 with an auto-refreshing task list |
| GET /tasks | JSON with all tasks |
| GET /tasks/<uuid> | Details of specific task + logs |
| DELETE /cancel/<uuid> | SIGTERM by task PID → status cancelled |
| POST /clear | Clear all records |
| GET /health | {ok: true, port: 7755, tasks: N} |
Starting and managing (if the Expert exists):
# Start:
run_expert('task_registry_server')
# -> {"status": "success", "url": "http://localhost:7755/", "port": 7755}
# If already running:
# {"status": "already_running", "port": 7755, "tasks": 3}
# Force restart:
run_expert('task_registry_server', {'force_restart': '1'})
Persistence: task_registry_server stores the state of all tasks in /tmp/extella_task_registry.json. When the server restarts, the file is re-read and all records are restored. Without task_registry_server, state is stored only in /tmp/pt_{uuid}.json.
7.4UUID vs PID: A Fundamental Design Decision
Why not just use the process PID?
The OS reuses PIDs. A process terminates with PID 12345—a second later, a new process may receive the same PID. When canceling by PID, you might kill not your task but a completely different process.
UUID v4 is globally unique. Never reused. Independent of OS, containers, or reboots. Format: a1b2c3d4-e5f6-7890-abcd-ef1234567890.
All registry operations use UUID. PID is stored only for SIGTERM during cancellation.
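The division of labour between the two identifiers can be shown in a few lines. This is an illustrative sketch only: the record layout and the `registry` dictionary are hypothetical and are not the format used by parallel_task or task_registry_server.

```python
import os
import uuid

task_id = str(uuid.uuid4())  # registry key: random, effectively never repeated
record = {
    "uuid": task_id,
    "pid": os.getpid(),      # kept only so a cancel request can send SIGTERM
    "status": "running",
}

registry = {record["uuid"]: record}  # look-ups always go through the UUID
print(len(task_id), uuid.UUID(task_id).version)
```

The PID is stored as an attribute of the record, never as its key, so a reused PID can at worst affect one cancellation attempt and cannot make two tasks collide in the registry.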
7.5The __api_token__ Parameter (Required with task_registry_server)
Without __api_token__, workers cannot register with the Task Registry or report results.
Three reserved parameters are relevant when using task_registry_server:
- __registry_url__: Registry URL (default: http://localhost:7755)
- __description__: Human-readable task description for the UI
- __api_token__: Extella API token for registering the task in the registry
Without task_registry_server, the __api_token__ parameter is not needed: parallel_task operates via /tmp/pt_{uuid}.json without server calls.
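One way to avoid repeating the reserved keys in every call is a small helper that merges them into the worker's own parameters. The helper name `with_registry_params` is hypothetical, and the token value in the example is a placeholder; in practice it would come from kv_get('extella_api_token')['value'].

```python
def with_registry_params(params: dict, api_token: str, description: str,
                         registry_url: str = "http://localhost:7755") -> dict:
    """Return a copy of params with the three reserved registry keys added."""
    merged = dict(params)  # do not mutate the caller's dict
    merged["__api_token__"] = api_token
    merged["__description__"] = description
    merged["__registry_url__"] = registry_url
    return merged

kwargs = with_registry_params(
    {"file_path": "/tmp/doc1.pdf"},
    api_token="token-from-kv-store",  # placeholder value
    description="Analysis: doc1.pdf",
)
print(sorted(kwargs))
# ['__api_token__', '__description__', '__registry_url__', 'file_path']
```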
7.6The 4-Step Pattern: Complete Example
Step 0: Start Registry (must be first!)
registry = run_expert('task_registry_server')
print(registry) # {"status": "success", "url": "http://localhost:7755/"}
Step 1: Get API token
API_TOKEN = kv_get('extella_api_token')['value']
Step 2: Launch workers in parallel
# Each call returns a UUID immediately (~0.5 sec)
# Worker runs in background as a separate OS process
r1 = run_expert('analyze_document', {
    'file_path': '/tmp/doc1.pdf',
    '__api_token__': API_TOKEN,
    '__description__': 'Analysis: doc1.pdf'
})
r2 = run_expert('analyze_document', {
    'file_path': '/tmp/doc2.pdf',
    '__api_token__': API_TOKEN,
    '__description__': 'Analysis: doc2.pdf'
})
r3 = run_expert('analyze_document', {
    'file_path': '/tmp/doc3.pdf',
    '__api_token__': API_TOKEN,
    '__description__': 'Analysis: doc3.pdf'
})
uuid1 = r1['uuid']
uuid2 = r2['uuid']
uuid3 = r3['uuid']
# All three launched in ~1.5 sec total
Step 3: Wait for all to complete
import json
results = run_expert('demo_wait_tasks', {
    'uuids': json.dumps([uuid1, uuid2, uuid3]),
    'timeout': 120,
    'poll_interval': 2
})
# Polls http://localhost:7755/tasks every 2 sec
# Returns when ALL tasks complete or timeout
# -> {
# "status": "complete",
# "summary": {"total": 3, "complete": 3, "error": 0},
# "elapsed_seconds": 31.2,
# "results": {uuid1: {...}, uuid2: {...}, uuid3: {...}}
# }
Processing results
if results['summary']['error'] > 0:
    # Handle failed tasks
    for uuid, result in results['results'].items():
        if result.get('status') == 'error':
            print(f'Task {uuid} failed: {result.get("error")}')
            # Retry or log
else:
    print(f'All {results["summary"]["total"]} tasks completed')
    print(f'Time: {results["elapsed_seconds"]}s')
    for uuid, result in results['results'].items():
        print(f'{uuid[:8]}...: {result["result"]}')
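For the "retry or log" branch, it can help to collect the failed UUIDs first. The sketch below assumes the result shape shown in Step 3; the helper name `failed_uuids` and the sample payload are illustrative only.

```python
def failed_uuids(results: dict) -> list:
    """Return the UUIDs whose per-task status is 'error'."""
    return [task_id
            for task_id, item in results.get("results", {}).items()
            if item.get("status") == "error"]

# Illustrative payload in the shape shown in Step 3 (values are made up).
sample = {
    "summary": {"total": 2, "complete": 1, "error": 1},
    "results": {
        "uuid-a": {"status": "complete", "result": {"pages": 3}},
        "uuid-b": {"status": "error", "error": "file not found"},
    },
}
print(failed_uuids(sample))  # ['uuid-b']
```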
7.7Comparison with synchronous mode
| Characteristic | Synchronous | parallel_task |
|---|---|---|
| Time for N tasks | N x T | max(T) + ~1s |
| Task IDs | No | UUID v4 (globally unique) |
| Cancellation | No | ✅ Cancel (SIGTERM) |
| Visibility | No | ✅ /tmp/pt_{uuid}.json; optionally — UI :7755 (if task_registry_server is running) |
| Traceback on error | Lost | ✅ Saved to registry |
| Timeout | ~5 min, result lost | Configurable |
| LLM dependency | Full | Process independent |
| Persistence | No | JSON in /tmp, survives restart |
7.8When to use parallel_task
| Condition | Choice |
|---|---|
| Task A is required for Task B (dependency) | Synchronous mode |
| Single task | Synchronous (overhead not worth it) |
| Each task < 5 sec | Synchronous (registration overhead > benefit) |
| 2+ independent tasks > 5 sec | parallel_task |
| Task > 1 minute | parallel_task (timeout protection) |
| Need cancellation support | parallel_task |
| Progress visibility needed | parallel_task |
Practical parallel_task examples: scraping 10 websites, analyzing a batch of 50 CSVs, generating reports across different metrics, checking API endpoints for test coverage.
7.9Critical rules
task_registry_server is an optional component. parallel_task works without it (state stored in /tmp/pt_{uuid}.json). If task_registry_server is used, start it BEFORE workers; otherwise POST /register will get ConnectionRefused.
- UUID, not PID — UUID is globally unique and not reused by the OS
- /tmp/extella_task_registry.json — single source of truth, survives restarts
- Worker ALWAYS calls /update — even on crash (try/except -> POST error status with traceback)
- __api_token__ = kv_get('extella_api_token')['value'] — avoid hardcoding tokens in your code
- Pass uuids as a JSON string: json.dumps([uuid1, uuid2]) — not a Python list
- One task at a time for heavy workloads — avoid running more than N workers simultaneously
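The last rule (capping the number of simultaneous workers) can be sketched as a batching loop. This is an illustration under assumptions: `launch` and `wait` are hypothetical callables that would wrap run_expert for the worker and for demo_wait_tasks, and `wait` is assumed to return a dict keyed by UUID.

```python
import json

def batches(items: list, size: int) -> list:
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]

def run_in_batches(paths, launch, wait, max_workers: int = 3) -> dict:
    """Launch at most max_workers tasks at a time, waiting between batches."""
    all_results = {}
    for batch in batches(list(paths), max_workers):
        uuids = [launch(path) for path in batch]
        # uuids are passed as a JSON string, not a Python list
        all_results.update(wait(json.dumps(uuids)))
    return all_results

print(batches([1, 2, 3, 4], 3))  # [[1, 2, 3], [4]]
```

With real workers, `launch` might be `lambda p: run_expert('analyze_document', {'file_path': p})['uuid']`, and `wait` a wrapper that calls demo_wait_tasks and returns its results mapping.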
8.CSPL
CSPL (Container Specific Programming Language) is Extella's paradigm for building automations: instead of having the LLM generate all the code, the LLM writes a compact description and a deterministic handler generates the actual code from it.
In Section 7, you worked with parallel_task and nohup — two CSPL modes. Now let's examine the complete CSPL architecture and why it fundamentally changes how complex automations are built.
8.1The Problem: LLMs Struggle with Large-Scale Precise Code
A real experiment — creating Godot Level 3 (a complete scene with 193 nodes):
| Tool | Tokens | Errors | Retries | Outcome |
|---|---|---|---|---|
| CSPL | ~1 000 | 0 | 1 | Perfect |
| fython (LLM generates all Python) | ~8 000 | 7 | 4 | Many revisions |
| Claude Code | ~15 000 | 12 | 6 | Very slow |
LLMs excel at planning — describing architecture, breaking down tasks. But token-by-token generation with probabilistic sampling is fundamentally unsuited for large-scale syntactically precise code. A single typo breaks the entire project. Every error means rerunning, thousands of tokens, minutes of your time.
The solution: shift the paradigm. Instead of "LLM writes all the code" → "LLM writes a compact description, deterministic handler generates the code."
8.2The WHAT vs HOW Principle
- LLM (WHAT): generates a compact JSON description of the structure (~200 tokens for a 193-node scene). This is a declarative description: what objects exist, how they connect, what parameters they have.
- Handler (HOW): a Python module that takes JSON and deterministically generates complete code. Same input — always same output. Zero hallucinations.
Example: an Expert with cspl=godot_level_3 contains not Python but a JSON scene description in its body. The handler generates .tscn files and GDScript. The LLM wrote 200 tokens of JSON instead of 8000 tokens of GDScript. Errors — zero.
cspl=godot_level_3:
# Expert body — not Python, but JSON description:
{
"scene": "main_level",
"nodes": [
{"id": 1, "type": "Node2D", "name": "Player", "pos": [100, 200]},
{"id": 2, "type": "Area2D", "name": "Hitbox", "parent": 1},
{"id": 3, "type": "Sprite2D", "name": "Sprite", "parent": 1}
],
"signals": [{"from": 2, "signal": "body_entered", "to": 1, "method": "on_hit"}]
}
# Handler godot_level_3 generates complete .tscn + GDScript from this
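To make the WHAT/HOW split concrete, here is a toy handler sketch. It does not produce real .tscn files; it only turns a JSON description like the one above into an indented text outline, which is enough to show the key property: the same input always yields the same output. The function name and the outline format are hypothetical.

```python
import json

def render_scene_outline(description: str) -> str:
    """Deterministically turn a JSON scene description into an indented outline."""
    scene = json.loads(description)
    children = {}
    for node in scene["nodes"]:
        # Nodes without a "parent" key are grouped under None (scene root).
        children.setdefault(node.get("parent"), []).append(node)

    lines = [f"scene {scene['scene']}"]

    def walk(parent_id, depth):
        for node in children.get(parent_id, []):
            lines.append("  " * depth + f"{node['type']} {node['name']}")
            walk(node["id"], depth + 1)

    walk(None, 1)
    return "\n".join(lines)

body = json.dumps({
    "scene": "main_level",
    "nodes": [
        {"id": 1, "type": "Node2D", "name": "Player", "pos": [100, 200]},
        {"id": 2, "type": "Area2D", "name": "Hitbox", "parent": 1},
        {"id": 3, "type": "Sprite2D", "name": "Sprite", "parent": 1},
    ],
})
outline = render_scene_outline(body)
print(outline)
```

Calling `render_scene_outline(body)` a second time returns an identical string, which is the determinism the handler side of CSPL relies on.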
8.3Complete Table of CSPL Modes
| Mode | Body type | $extens | Returns | Synchronicity | When to use |
|---|---|---|---|---|---|
| fython (default) | Python def fn() | + | dict from function | Synchronous | Regular Experts (Section 4) |
| nohup | Python script (no def) | - | {pid, log_file} | Detached process | Orchestrators, ETL, long-running tasks |
| parallel_task | Python def fn() | + | {uuid} | Asynchronous, /tmp/pt_{uuid}.json | Parallel Tasks (Section 7) |
| shell | Bash commands | - | {stdout, returncode} | Synchronous | CLI wrappers: git, docker, ffmpeg |
| interpreter | Code in any language | - | Depends on language | Synchronous | Go, R, SQL, Node.js, Julia |
| cspl_builder_code | Python handler | + | — | Synchronous | Creating a new CSPL mode |
8.4nohup Mode — Complete Specification
nohup differs fundamentally from fython. The body is a pure Python script that executes from start to finish. The Listener writes it to a temporary file and launches it via subprocess.Popen(start_new_session=True) — the process detaches and runs independently.
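The launch mechanism described above can be approximated as follows. This is a sketch under assumptions, not the Listener's actual code: the function name, the temporary-file handling, and the returned dict are hypothetical, and `start_new_session=True` only has an effect on POSIX systems.

```python
import subprocess
import sys
import tempfile

def launch_detached(script_text: str, log_path: str) -> dict:
    """Write script_text to a temp file and start it in its own session."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script_text)
        script_path = f.name
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            [sys.executable, script_path],
            stdout=log,                 # stdout and stderr both go to the log file
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            start_new_session=True,     # detach from the caller's session (POSIX)
        )
    # The child keeps its own handle on the log file after ours is closed.
    return {"pid": proc.pid, "log_file": log_path}
```

The caller returns immediately with the PID and log path, while the script keeps running and writes its output to the log.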
1. No def fn() — pure script, top to bottom
# fython (regular expert):
def my_expert(param: str = '') -> dict:
    # ... logic
    return {"status": "success"}
# nohup (script without function):
import os, datetime
log_path = '/tmp/nohup_test.txt'
with open(log_path, 'w') as f:
    f.write(f'ran at {datetime.datetime.now()}\n')
    f.write(f'cwd: {os.getcwd()}\n')
# No return: the script simply executes and exits
2. No $extens() — manual include() (optional)
In nohup mode, the $extens() directive is not processed (no fython wrapper). You can install dependencies and import them just like in regular Python — using pip or any other standard method. Alternatively, implement include() directly at the beginning of your script:
import sys, subprocess
def include(module, commands):
    # Try the import first; install only if it fails
    try:
        exec(module, globals())
        return True
    except Exception:
        for cmd in commands:
            parts = cmd.split()
            if parts[0] in ('extella-pip', 'pip', 'pip3'):
                # Run pip through the current interpreter
                subprocess.run([sys.executable, '-m', 'pip'] + parts[1:])
        # Retry the import after installation
        try:
            exec(module, globals())
            return True
        except Exception:
            return False
include('import pandas', ['extella-pip install pandas'])
include('import requests', ['extella-pip install requests'])
# pandas and requests are now available
3. Parameters via {{placeholders}}
Kwargs are substituted into the script text BEFORE execution. Use {{parameter_name}} in your code:
# Parameters: api_token='abc123', file_path='/tmp/data.csv', output_dir='/tmp'
import pandas as pd
api_token = '{{api_token}}' # <- will be replaced with 'abc123'
file_path = '{{file_path}}' # <- will be replaced with '/tmp/data.csv'
output_dir = '{{output_dir}}' # <- will be replaced with '/tmp'
df = pd.read_csv(file_path)
result = df.groupby('category').sum()
result.to_csv(f'{output_dir}/output.csv', index=False)
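The substitution step itself can be pictured as a plain text replacement. The sketch below is an assumption about how such rendering might work, not the Listener's implementation; `render_placeholders` is a hypothetical name, and leaving unknown placeholders untouched is a design choice made here for illustration.

```python
import re

def render_placeholders(script_text: str, kwargs: dict) -> str:
    """Replace each {{name}} with str(kwargs[name]); unknown names are left as-is."""
    def substitute(match):
        name = match.group(1)
        return str(kwargs[name]) if name in kwargs else match.group(0)
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, script_text)

template = "file_path = '{{file_path}}'\noutput_dir = '{{output_dir}}'"
rendered = render_placeholders(
    template, {"file_path": "/tmp/data.csv", "output_dir": "/tmp"}
)
print(rendered)
# file_path = '/tmp/data.csv'
# output_dir = '/tmp'
```

Because the replacement is textual, a value that itself contains a quote character would break the surrounding string literal, so parameter values are best kept free of quotes or validated before launch.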
4. No return — result via file
import json
from pathlib import Path
# ... perform work ...
result = {
'status': 'success',
'processed_rows': 15000,
'errors': 3,
'output_file': '/tmp/result.csv'
}
# Must write result:
Path('/tmp/nohup_my_script_result.json').write_text(
json.dumps(result, ensure_ascii=False)
)
5. Logs and management
stdout/stderr → /tmp/nohup_<name>.log. The agent receives an immediate response: {pid, log_file, pid_file}. Monitor by reading the log file. On completion, read result.json.
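Monitoring a nohup run from another Expert could look like the sketch below. Both helper names are hypothetical; the PID check relies on POSIX signal semantics, and the result path is whatever your script chose to write (for example /tmp/nohup_my_script_result.json above).

```python
import json
import os
from pathlib import Path

def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def read_result(result_path: str):
    """Return the parsed result JSON, or None if it has not been written yet."""
    path = Path(result_path)
    if not path.exists():
        return None
    return json.loads(path.read_text())
```

A typical loop would poll `pid_is_alive(pid)` and, once it returns False, call `read_result(...)`; a None result at that point suggests the script exited before writing its result, and the log file is the place to look. Keep in mind that PIDs can be reused (Section 7.4), so this check is only a heuristic.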
8.4.1. wait_tasks mode — synchronization barrier
wait_tasks is a CSPL mode paired with parallel_task: it accepts a list of UUIDs for running tasks and waits for all to complete (or until timeout). It polls /tmp/pt_{uuid}.json every 0.3–2 seconds.
| Parameter | Type | Default | Description |
|---|---|---|---|
| uuids | str (JSON) | required | JSON array of UUIDs: json.dumps([uuid1, uuid2]) — strictly a string, not a Python list |
| timeout | int | 120 | Maximum wait time in seconds |
| poll_interval | float | 2 | File polling interval (seconds) |
What demo_wait_tasks returns:
{
"results": {
"uuid-1...": {"status": "complete", "result": {...}},
"uuid-2...": {"status": "complete", "result": {...}}
},
"summary": "2/2 completed",
"elapsed_seconds": 31.2
}
Entry point: bridge expert demo_wait_tasks (saved with cspl=wait_tasks). This is what you call via run_expert — the wait_tasks CSPL mode is not called directly.
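The barrier logic can be sketched as a polling loop over the per-task state files. This is an illustration, not the built-in implementation: the function name, the `state_dir` parameter, the set of terminal statuses, and the `timed_out` key are assumptions, and the state file is assumed to be a JSON object with a `status` field.

```python
import json
import time
from pathlib import Path

def wait_for_tasks(uuids_json: str, state_dir: str = "/tmp",
                   timeout: float = 120, poll_interval: float = 2) -> dict:
    """Poll pt_{uuid}.json files until every task finishes or the timeout expires."""
    pending = set(json.loads(uuids_json))  # uuids arrive as a JSON string
    results = {}
    deadline = time.monotonic() + timeout
    while pending:
        for task_id in list(pending):
            path = Path(state_dir) / f"pt_{task_id}.json"
            if not path.exists():
                continue
            try:
                state = json.loads(path.read_text())
            except json.JSONDecodeError:
                continue  # file is probably mid-write; try again next round
            # Assumed terminal statuses; the real file layout may differ.
            if state.get("status") in ("complete", "error", "cancelled"):
                results[task_id] = state
                pending.discard(task_id)
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(poll_interval)
    return {"results": results, "timed_out": sorted(pending)}
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock is adjusted while waiting.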
8.4.2. shell and interpreter modes
Two additional modes for CLI tools and code in other languages. Both support {{placeholders}} for kwargs.
shell — built-in bash runner
The Expert body consists of bash commands. No function, no $extens. The Listener executes via subprocess and returns {stdout, stderr, returncode}.
# cspl='shell' — video conversion via ffmpeg:
ffmpeg -i {{input_path}} -vf scale=1280:720 -c:a copy {{output_path}}
# cspl='shell' — git pull:
git -C {{repo_path}} fetch origin
git -C {{repo_path}} pull --rebase
| Use shell for | Examples |
|---|---|
| Media Processing | ffmpeg, ImageMagick convert, sox |
| Documents | pandoc, wkhtmltopdf, libreoffice --headless |
| Git Operations | git fetch, git pull, git tag, git log |
| System utilities | rsync, tar, curl, wget, find |
| Containers and orchestration | docker build/run, kubectl apply |
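This guide does not say whether placeholder values are quoted before they reach bash. Assuming they are inserted verbatim, a path containing a space would split into two arguments, so a caller may want to quote values first. The sketch below shows the idea with the standard library's shlex.quote; `render_shell` is a hypothetical helper, not part of Extella.

```python
import shlex

def render_shell(template: str, kwargs: dict) -> str:
    """Substitute {{name}} placeholders with shell-quoted values."""
    command = template
    for name, value in kwargs.items():
        command = command.replace("{{" + name + "}}", shlex.quote(str(value)))
    return command

cmd = render_shell(
    "ffmpeg -i {{input_path}} -vf scale=1280:720 -c:a copy {{output_path}}",
    {"input_path": "/tmp/my video.mp4", "output_path": "/tmp/out.mp4"},
)
print(cmd)
# ffmpeg -i '/tmp/my video.mp4' -vf scale=1280:720 -c:a copy /tmp/out.mp4
```

If the platform already quotes values, pre-quoting would double-quote them, so it is worth testing with a path that contains a space before relying on either behavior.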
interpreter — code in any installed language
The Expert body is code in any language. The handler compiles/interprets it on the device. Kwargs are accessible via {{placeholders}} as in nohup.
# cspl='interpreter' — Go code:
package main
import "fmt"
func main() {
data := "{{input}}"
fmt.Println("Processed:", data)
}
| Language | When to Use |
|---|---|
| Go | High-performance data processing, binary operations |
| R | Statistical analysis, ML models, ggplot visualizations |
| SQL | Analytical queries to local databases |
| Node.js | JSON processing, working with the npm ecosystem |
| Julia | Scientific and matrix computations |
| Ruby | System administration, Rakefile scenarios |
8.5DSL: Domain-Specific Languages
CSPL enables creating compact languages for specific domains. Instead of 400 lines of HTML/CSS/JS — 40 lines of JSON, and the handler generates a complete website.
| Domain | CSPL Mode | Generates | Token Savings |
|---|---|---|---|
| Web API | api_dsl | FastAPI + Pydantic + OpenAPI | 10x |
| Database | schema_dsl | SQL DDL + Alembic migrations | 8x |
| CI/CD pipeline | pipeline_dsl | GitHub Actions YAML | 12x |
| Godot levels | godot_level_3 | .tscn + GDScript | 15x |
| HDL schematics | hdl_dsl | Verilog / VHDL | 20x |
| Tests | test_dsl | pytest fixtures + test cases | 6x |
| Markdown reports | mini_report_dsl | HTML or Markdown | 8x |
Example DSL for Web API (6 lines instead of hundreds):
# Expert body with cspl='api_dsl':
API UserService BASE /api/users AUTH bearer
GET / -> list[User] CACHE 60
POST / -> User BODY {name: str, email: str}
GET /:id -> User
DELETE /:id -> void
# Handler generates a complete FastAPI router, Pydantic models, and OpenAPI documentation
⚠️ The DSL modes listed in the table above (api_dsl, schema_dsl, godot_level_3, etc.) are examples of custom handlers created via cspl_builder_code. They are not included in the standard Extella distribution. Only the following are built-in (available out of the box): fython, nohup, parallel_task, wait_tasks, shell, interpreter, cspl_builder_code.
8.6cspl_builder_code: Creating Your Own CSPL
A meta-mode: you create a new CSPL type directly from the chat, without modifying platform code. The architecture is extensible on the fly.
Process:
- 1. Describe the desired handler: "Create a CSPL for FastAPI from a JSON schema"
- 2. The agent writes the Python handler code: a function that takes the code body and generates an artifact
- 3. The handler is registered as a new cspl type in the system
- 4. The new mode is available immediately: cspl='fastapi_generator'
# Example of a simple DSL handler:
from pathlib import Path

def my_report_dsl(filtered_source_code='', func_name='', kwargs=None, **extra):
    lines = filtered_source_code.strip().split('\n')
    html_parts = []
    for line in lines:
        if line.startswith('TITLE'):
            html_parts.append(f'<h1>{line[6:]}</h1>')
        elif line.startswith('SECTION'):
            html_parts.append(f'<h2>{line[8:]}</h2>')
        elif line.startswith('> '):
            html_parts.append(f'<p>{line[2:]}</p>')
    html = '<html><body>' + ''.join(html_parts) + '</body></html>'
    output = Path('/tmp/report.html')
    output.write_text(html)
    return {'status': 'success', 'output': str(output), 'sections': len(html_parts)}
8.7The Recursive Nature of CSPL: No Ceiling
Each handler can use other handlers. There is no upper limit:
- Level 1: fython with JSON description → handler generates Python classes
- Level 2: interpreter with Go code → handler compiles Go binary
- Level 3: Go uses C library → handler generates ctypes wrapper
- Level 4: C on ARM64 → handler generates inline assembly for optimization
- Level N: ...
CSPL is a bridge between declarative description (what LLMs do well) and imperative implementation (what LLMs do poorly). This bridge is built from Python, Go, C, bash, SQL, GDScript, Terraform, Dockerfile.
8.8When NOT to Use CSPL
In practice: when in doubt, use fython. CSPL pays off only for recurring task classes where token and error savings are significant.
Three Conditions: When CSPL Is Justified
| Condition | Validation Question | If NO → |
|---|---|---|
| 1. Class of repetitive tasks | Will this be used multiple times, not just once? | fython: CSPL requires investment in a handler |
| 2. Logic is more complex than data | Does the output format have internal dependencies that require computation? | fython: the LLM can handle it directly |
| 3. Data and logic are separated | Is it clear what changes each time and what stays the same? | fython: the boundary is fuzzy, so CSPL won't provide a benefit |
CSPL makes sense only when all three conditions are met simultaneously. If even one is violated, use fython.
Examples of applying the rule:
| Task | Cond.1 | Cond.2 | Cond.3 | Output |
|---|---|---|---|---|
| Generate 50 similar reports with different data | ✅ | ✅ | ✅ | CSPL ✅ |
| A one-off CSV parsing script | ❌ | n/a | n/a | fython |
| Godot levels (recurring pattern) | ✅ | ✅ | ✅ | CSPL ✅ |
| Calling OpenAI API with different prompts | ✅ | ❌ | n/a | fython (logic is simpler than data) |
| FastAPI routers from JSON schemas (10+ items) | ✅ | ✅ | ✅ | CSPL ✅ |
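The three-condition rule is a simple conjunction, which the following sketch spells out. The function and parameter names are illustrative; the calls mirror rows from the table above.

```python
def choose_mode(repetitive: bool, logic_more_complex_than_data: bool,
                data_and_logic_separated: bool) -> str:
    """Return 'CSPL' only when all three conditions hold, otherwise 'fython'."""
    if repetitive and logic_more_complex_than_data and data_and_logic_separated:
        return "CSPL"
    return "fython"

print(choose_mode(True, True, True))    # CSPL   (50 similar reports)
print(choose_mode(False, True, True))   # fython (one-off CSV parsing script)
print(choose_mode(True, False, True))   # fython (OpenAI calls with different prompts)
```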
Situations Where CSPL Is Overkill
| Situation | Recommendation |
|---|---|
| Task up to 100 lines of code | fython: the LLM will write it without errors |
| Unique one-time task | fython: CSPL requires a repetitive pattern |
| Need result immediately | fython or shell: nohup is asynchronous |
| Handler more complex than the task itself | fython: a handler must generate 10x more code to be worth it |
| No ready-made handler for the domain | First create the handler using cspl_builder_code |
9.REST API
9.1Three Scenarios: Why You Need the API
Scenario 1: Embedding in Your Product
You're building a CRM, ERP, chatbot, or any platform. Instead of training a model from scratch, call a ready-made Extella agent via API. Your backend sends a prompt — Extella returns a response. Your product's end user never knows Extella is working under the hood.
Scenario 2: Automating Background Tasks
Every night a script pulls new documents, runs the agent, gets the analysis, and writes the results. CI/CD pipelines use Experts to generate documentation, validate code, and process logs. The API supports async mode: async=true → task_id → polling /api/task/check. Ideal for background tasks that don't block the main process.
Scenario 3: Exporting Data for Analytics and Fine-Tuning
/api/agent/export/chats — complete conversation history (a valuable dataset for fine-tuning). /api/agent/export/calls — call logs: model, latency_ms, prompt_tokens, completion_tokens, created_at. Valuable for AI cost analysis, prompt optimization, and fine-tuning local models.
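As an example of the cost analysis mentioned in Scenario 3, call-log records can be aggregated per model. The sketch assumes the records are already available as a list of dicts with the fields named above; the response envelope of export/calls is not shown in this guide, and the sample values and model names below are made up.

```python
from collections import defaultdict

def summarize_calls(calls: list) -> dict:
    """Aggregate token counts and average latency per model."""
    totals = defaultdict(lambda: {"calls": 0, "prompt_tokens": 0,
                                  "completion_tokens": 0, "latency_ms": 0})
    for call in calls:
        row = totals[call["model"]]
        row["calls"] += 1
        row["prompt_tokens"] += call["prompt_tokens"]
        row["completion_tokens"] += call["completion_tokens"]
        row["latency_ms"] += call["latency_ms"]
    return {
        model: {
            "calls": row["calls"],
            "prompt_tokens": row["prompt_tokens"],
            "completion_tokens": row["completion_tokens"],
            "avg_latency_ms": row["latency_ms"] / row["calls"],
        }
        for model, row in totals.items()
    }

# Illustrative records only; real logs come from /api/agent/export/calls.
sample_calls = [
    {"model": "model-a", "latency_ms": 100, "prompt_tokens": 10, "completion_tokens": 5},
    {"model": "model-a", "latency_ms": 200, "prompt_tokens": 20, "completion_tokens": 7},
    {"model": "model-b", "latency_ms": 50, "prompt_tokens": 3, "completion_tokens": 1},
]
summary = summarize_calls(sample_calls)
print(summary["model-a"]["avg_latency_ms"])  # 150.0
```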
9.2Base URL and the Sneaky 405 Error
⚠️ API Version: docs.extella.ai describes v0.7.0 — 48 endpoints, 11 sections. Primary authentication header: X-Auth-Token. The Authorization: Bearer method is accepted as an alternative.
The most common cause of HTTP 405 (Method Not Allowed) is sending an API request to the wrong URL.
| URL | Purpose | Rule |
|---|---|---|
| https://api.extella.ai/api/agent/* | Agents API | ALL requests to agents |
| https://api.extella.ai/api/expert/* | Experts API | ALL Expert requests |
| https://api.extella.ai/api/concept/* | Concepts API | Working with Concepts |
| https://api.extella.ai/api/kv/* | KV Store API | Key-Value |
| https://api.extella.ai/api/rules/* | Rules API | Rules |
| https://api.extella.ai/api/token/* | Tokens API | Token Management |
| https://api.extella.ai/api/profile/* | Profiles API | Profile Management |
Rule: for EVERYTHING starting with /api/ — use https://api.extella.ai
9.3Authentication: Two Equivalent Methods
# Method 1 (preferred, Bearer standard):
Authorization: Bearer <your-token>
# Method 2 (Extella-specific):
X-Auth-Token: <your-token>
# For Database Services (/api/concept/*, /api/kv/*)
# passing user_id in the request body also works,
# but the authorization header is preferred
Getting your first token via the agent: type "Generate an API token" — you'll get one instantly. Via API:
POST https://api.extella.ai/api/token/generate
Authorization: Bearer <existing_token>
Content-Type: application/json
{"name": "Production API"}
# -> {"token": "a1b2c3...", "user_id": "user_abc", "name": "Production API"}
Validation (rate limit: 30 requests/min — validate once at startup, not before each request):
POST https://api.extella.ai/api/token/validate
{"token": "your-token"}
# -> {"valid": true, "user_id": "user_abc"}
9.4OpenAI-Compatible Mode
If your application already works with OpenAI, switching to Extella requires minimal changes:
from openai import OpenAI
client = OpenAI(
    api_key="your_extella_token",
    base_url="https://api.extella.ai/v1",
)
response = client.chat.completions.create(
    model="gpt-4",  # ignored: the agent's model is used
    messages=[
        {"role": "user", "content": "What is REST API?"}
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
/api/agent/run modes:
- sync (default) — blocking call, waits for complete response
- stream — Server-Sent Events, tokens delivered as generated (Accept: text/event-stream)
- async — immediately returns task_id, retrieve result via /api/task/check
Note (docs.extella.ai): agents are launched via POST /api/agent/run with agent_id passed in the request body (json={"agent_id": "agent_...", "input": "..."}). The X-Agent-Id header is also accepted as an alternative.
9.5Key Endpoints Reference
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/agent/run | Run agent (sync/stream/async) |
| POST | /api/agent/get | Get agent config |
| POST | /api/agent/create | Create agent (requires Pro) |
| POST | /api/agent/update | Update agent |
| POST | /api/agent/list | List agents |
| POST | /api/agent/export/chats | Export conversation history |
| POST | /api/agent/export/calls | Call log with metrics (parameters in request body) |
| POST | /api/profile/create | Create profile |
| POST | /api/profile/add_agent | Add agent to profile |
| POST | /api/profile/delete | Delete profile (agents remain) |
| POST | /api/profile/list | List profiles |
| POST | /api/expert/run | Run Expert |
| POST | /api/expert/save | Save Expert |
| GET | /api/expert/get/<name> | Get Expert by name |
| DELETE | /api/expert/delete/<name> | Delete Expert |
| POST | /api/blocks/search | Semantic search for Experts |
| POST | /api/task/check (or /api/tasks/check) | Async task status |
| POST | /api/concept/add | Add Concept |
| POST | /api/concept/search | Semantic search for Concepts |
| POST | /api/concept/update | Update Concept |
| POST | /api/concept/remove | Delete Concept |
| POST | /api/concept/list | List Concepts |
| POST | /api/kv/set | Set KV pair |
| POST | /api/kv/get | Get KV pair |
| POST | /api/kv/search | Semantic search in KV |
| POST | /api/kv/list | List KV pairs |
| POST | /api/rules/add | Add rule |
| POST | /api/rules/list | List rules |
| POST | /api/rules/update | Update rule |
| POST | /api/rules/remove | Delete rule |
| POST | /api/token/generate | Create token |
| POST | /api/token/validate | Validate token |
| POST | /api/token/revoke | Revoke token |
| POST | /api/token/list | List tokens |
| POST | /api/defaults/set_target | Set default device |
| POST | /api/defaults/get_target | Get default device |
Rate limits: 60 req/min per IP, 20 req/min for /api/agent/run. HTTP 429 — check Retry-After header, use exponential backoff.
Endpoints not listed in the table above (complete list at docs.extella.ai):
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/health | Health Check — server status |
| POST | /api/agent/delete | Delete an agent (agent_id in request body) |
| POST | /api/kv/remove | Delete KV pair (key in body) |
| POST | /api/targets/add | Add device (target, description) |
| POST | /api/targets/list | List devices |
| POST | /api/targets/search | Semantic device search |
| POST | /api/targets/update | Update device (id required) |
| POST | /api/targets/remove | Delete device (id required) |
| POST | /api/experts_db/list | List experts from DB (metadata) |
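The rate-limit advice in this section (honor Retry-After on HTTP 429, otherwise back off exponentially) can be sketched independently of any HTTP library. The function names, the default base and cap, and the `send` callable are assumptions; `send` is expected to return an object with `status_code` and `headers`, such as a requests.Response.

```python
import time

def backoff_delay(attempt: int, retry_after=None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next try; `attempt` is 0-based."""
    if retry_after is not None:
        try:
            return float(retry_after)  # the server's hint wins
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to backoff
    return min(cap, base * 2 ** attempt)  # 1, 2, 4, 8, ... capped

def call_with_retry(send, max_attempts: int = 5, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out."""
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429 or attempt == max_attempts - 1:
            return response
        sleep(backoff_delay(attempt, response.headers.get("Retry-After")))
```

With requests, `send` could be `lambda: requests.post(url, headers=HEADERS, json=payload)`. The `sleep` parameter exists so the waiting can be replaced in tests.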
9.6Field Name Pitfalls
Pitfall 1: blocks/search returns matches, not results
data = response.json()
# WRONG:
for r in data['results']:  # KeyError!
    print(r['similarity'])
# CORRECT:
for block in data['matches']:  # 'matches'
    print(block['score'])  # 'score', not 'similarity'
Pitfall 2: expert/get uses expert_-prefixed field names and a camelCase createdAt
expert = response.json()
# CORRECT field names:
code = expert['expert_code']      # not 'code'
params = expert['expert_params']  # not 'kwargs'
name = expert['expert_name']      # not 'name'
created = expert['createdAt']     # camelCase!
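To keep these names from leaking into the rest of your code, the response can be mapped once at the boundary. `normalize_expert` is a hypothetical helper, and the sample payload contains made-up values; only the four field names come from this guide.

```python
def normalize_expert(expert: dict) -> dict:
    """Map expert/get field names to shorter local names."""
    return {
        "name": expert["expert_name"],
        "code": expert["expert_code"],
        "params": expert["expert_params"],
        "created_at": expert["createdAt"],
    }

# Illustrative payload; the values are invented for the example.
sample_expert = {
    "expert_name": "send_report",
    "expert_code": "def send_report():\n    return {}",
    "expert_params": {},
    "createdAt": "2026-01-01T00:00:00Z",
}
print(normalize_expert(sample_expert)["name"])  # send_report
```

A missing field raises KeyError immediately, which surfaces an API change at the boundary instead of deep inside application code.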
Pitfall 3: export/calls — parameters also go in body (POST), same as export/chats
# export/chats: parameters in body:
requests.post(BASE+'/api/agent/export/chats',
              headers=HEADERS,
              json={'by': 'agent', 'id': 'agent_...'})
# export/calls: parameters also in body (POST, not GET):
requests.post(BASE+'/api/agent/export/calls',
              headers=HEADERS,
              json={'by': 'agent', 'id': 'agent_...',
                    'limit': 200, 'from': '2026-01-01T00:00:00Z'})
Pitfall 4: No _id, only id
API responses don't use MongoDB-style _id. They use plain id. There's also no __v field (document version). This is a REST API, not Mongoose.
9.7Complete Working Python Example
Complete working example (save as extella_client.py):
import os, time, requests
BASE = "https://api.extella.ai"
TOKEN = os.environ["EXTELLA_API_TOKEN"]  # store in .env, never in code
HEADERS = {"X-Auth-Token": TOKEN, "Content-Type": "application/json"}
# 1. Validate token (once at startup, not before every request)
r = requests.post(f"{BASE}/api/token/validate", json={"token": TOKEN})
assert r.json()["valid"], f"Invalid token: {r.text}"
# 2. Create agent (Pro plan required; on Free plan use an existing agent_id)
r = requests.post(f"{BASE}/api/agent/create", headers=HEADERS, json={
    "name": "My API Agent",
    "instructions": "You are a helpful assistant. Respond concisely.",
    "provider": "anthropic",
    "model": "claude-haiku-4-5-20251001"
})
if r.status_code == 403:
    raise SystemExit("Pro plan required for /api/agent/create; use an existing agent_id.")
agent_id = r.json()["agent_id"]
print(f"Agent: {agent_id}")
# 3. Synchronous call: response.output is a list of items
r = requests.post(f"{BASE}/api/agent/run", headers=HEADERS,
                  json={"agent_id": agent_id, "input": "What is 2 + 2?"})
text = next((c["text"] for item in r.json()["output"] if item["type"] == "message"
             for c in item["content"] if c["type"] == "output_text"), "")
print("Sync:", text)
# 4. Async call: for tasks > 60 sec
r = requests.post(f"{BASE}/api/agent/run", headers=HEADERS,
                  json={"agent_id": agent_id,
                        "input": "Summarize AI trends in 2025.",
                        "async": True})
task_id = r.json()["task_id"]
for _ in range(30):  # poll up to 60 sec
    r = requests.post(f"{BASE}/api/task/check", headers=HEADERS,
                      json={"task_id": task_id})
    status = r.json()["status"]
    if status == "complete":
        print("Async:", r.json()["output"])
        break
    elif status == "error":
        print("Error:", r.json().get("error"))
        break
    time.sleep(2)
# 5. Semantic search across Experts
r = requests.post(f"{BASE}/api/blocks/search", headers=HEADERS,
                  json={"agent_id": agent_id, "query": "send telegram message"})
for block in r.json()["matches"]:  # 'matches', not 'results'
    print(block["expert_name"], block["score"])  # 'score', not 'similarity'
9.8Secure Integration Checklist
- Store token in environment variables: os.environ['EXTELLA_API_TOKEN'], not in code
- Base URL: https://api.extella.ai for all /api/ requests
- Rate limits: catch HTTP 429, read Retry-After, use exponential backoff
- Save agent_id after creation — you can't run an agent without it
- async=true for tasks > 60 sec — don't block the main thread
- stream=True for UX — use Accept: text/event-stream when users expect real-time responses
- store=False during debugging — avoid polluting chat history
- global=True when searching Concepts — otherwise you only search the current agent's memory
- blocks/search: field is matches, not results; score, not similarity
- expert/get: expert_code, expert_params, createdAt (camelCase)
- export/calls: POST with parameters in body: {by, id, limit, from}
- Pro plan required for /api/agent/create
- Validate token once at startup — not before every request (rate limit 30/min)
9.9Typical Workflow from Zero to Integration
1. Get token: POST /api/token/generate -> save to .env
2. Create agent: POST /api/agent/create -> agent_id
3. Create profile: POST /api/profile/create -> profile_id
4. Add to profile: POST /api/profile/add_agent -> {profile_id, agent_id}
5. Run synchronously: POST /api/agent/run + X-Agent-Id: agent_id
Run async: POST /api/agent/run + {async: true} -> task_id
Check status: POST /api/task/check {task_id: ...}
6. Export chats: POST /api/agent/export/chats -> dataset for fine-tuning
7. Export calls: POST /api/agent/export/calls -> analytics
This pipeline covers 95% of integration scenarios. For complex cases (parallel Experts, semantic search, KV Store), refer to sections 3, 4, and 7.