Guide — Extella

1.What is Extella

1.1Why an Agent, Not a Chatbot

You've probably used ChatGPT, Claude, or other large language models. Every conversation starts from scratch. The model doesn't remember yesterday. It can't open your file, send an email, run a script, or save a report. At best, it generates code that you then copy and run yourself.

Extella is not a chatbot. It's an AI agent that executes tasks rather than advising you how to do them.

The formula: AI chat + automation + persistent memory + execution on your device + personal toolkit — all in one place.

You describe a need. Extella creates an Expert (an executable module), saves it to your library, and runs it. The result — an actual file, a sent message, processed data — stays on your device. This isn't text in a chat. It's an object that persists after the session ends.

1.2Extella vs Standard LLMs — Fundamental Differences

ChatGPT tells you what to do. Extella does it for you.

Feature	ChatGPT / Standard LLMs	Extella
Primary Purpose	Text generation	Real action execution
Memory	Current chat only. New session — clean slate	Persistent: Concepts, Rules, Experts are saved permanently
Code Execution	Generates code. You run it yourself	Runs automatically via Experts on your device
Reusability	Each request — from scratch	Created Expert runs repeatedly with any parameters
Security	Data goes to OpenAI/Anthropic	Files are processed locally and never leave your device
Personalization	System prompt is fixed	Rules — dynamic prompt, changes during execution
Outcome	Text in chat	Files, data, reports, automated processes
Integrations	Plugins (limited set)	Any: Telegram, email, API, file system

1.3Architecture: How It Works

Extella uses a client-server architecture with two components:

Server component — the AI brain: language model, knowledge base, Expert and agent management.
Client component — Listener: a background process on your device that receives tasks from the agent and runs them locally.

Listener is the executor. When the agent says "create a PDF," Listener runs the corresponding Expert on your machine with full access to your files.

By default, Experts run directly in the Listener environment. When strict dependency isolation is needed, use the isolated=true parameter — the Expert then runs in a clean Python venv. This isn't Docker: no heavy virtualization, no root access required. Full access to the user's file system is preserved.

1.4Data Security

Anthropic Claude is the primary model for chat and code generation in Extella. It processes text requests. The Pro plan allows connecting your own LLM providers and local models.
OpenAI API is used only for vector embeddings (semantic search of Concepts, KV Store, Rules, and Experts). Data is vectorized but not stored by OpenAI.
Corporate files and API keys are NEVER sent to providers. Files are processed locally. Keys are stored in an encrypted KV Store on your device.

1.5Key Terms: Glossary

Before moving forward, it's important to understand the seven core entities of the platform. Each will be explained in detail in the following sections.

Entity	What It Is	Analogy
Expert	Saved executable module — a function that does one specific thing	Tool on your shelf
Agent	AI specialist with a model, tools, instructions, and memory	Employee with a job title
Profile	Group of agents under one project/client	A department in a company
Concept	Unit of technical knowledge with semantic search	Note in knowledge base
KV Store	Encrypted key-value store: API keys, tokens, data	Safe with deposit boxes
Rule	Behavioral instruction embedded in the agent's system prompt	Job description
Team	Group of agents working on one project with shared Concepts, Rules, and an orchestrating agent	Department or project team

1.6The Compounding Effect: Why Extella Gets Stronger Every Day

Typical AI tools don't accumulate value. Every new ChatGPT conversation starts from scratch. Extella works fundamentally differently.

Experts:

Day 1: Created an Expert "read Excel spreadsheet" → saved
Day 30: 15 Experts — a library of tools
Day 90: 50+ Experts — you're not "asking AI" anymore — you're "running tools"

Concepts:

First time solving a PDF issue → saved as a Concept
A pattern for working with a specific API → saved
Each Concept makes the system smarter. This is institutional memory

Rules:

First: "always ask for confirmation before deleting"
Then: "save files to ~/Documents/Extella/"
Then: "if task > 1 step — describe the plan first"

With each Rule, the agent becomes more precise. After 30 days of accumulated Rules, it follows your preferences without being reminded.

Metaphor: A Stone Bridge

Each task you solve with Extella is a stone in the foundation. One stone changes nothing. A hundred stones build a bridge to automation of any complexity. In a year, you'll have a personal system that knows your context, tools, and preferences—one that grows stronger every day.

2.Quick Start

A step-by-step guide from downloading the application to completing your first task. Follow the steps in order—each builds on the previous one.

2.1Step 1: Installing the Application and Creating an Account

Download Extella Desktop from www.extella.ai for your OS (macOS, Linux, Windows).
Install it like any standard application.
Create an account and sign in.

Immediately after signing in, Listener starts in the background and performs initial registration:

→ Creates a device record in the system
→ Retrieves a unique Device ID (Target UUID)
→ Establishes a connection with the Extella server

System tray status: "Connected" — everything is working.

2.2Step 2: Understanding Device ID

Device ID is your device's unique identifier. It looks like: 09f7d600-996c-4c9f-a19e-f5bfe433da0e.

Why you need it:

The agent knows WHERE to execute tasks. "Read my file" — the system understands which machine the file is on.
If you have multiple devices (Mac Studio at home, MacBook at the office) — each has its own Device ID. You choose where to run the task.

Where to find Device ID:

In the Extella Desktop interface — bottom section of the application.
Via agent: "show my devices" — the agent returns a list with UUIDs and descriptions.
Via API: POST https://api.extella.ai/api/defaults/get_target

Default Target — the device where Experts run by default. Changed via set_default_target.
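Because a Device ID has the shape of a standard UUID, a script that accepts one can check the format before using it. The following is a minimal sketch using only the Python standard library; the helper name is_valid_device_id is hypothetical and not part of Extella.

```python
import uuid


def is_valid_device_id(value: str) -> bool:
    """Return True if value is a canonical, hyphenated UUID string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces, a "urn:uuid:" prefix and missing hyphens,
    # so compare with the canonical form to accept only the hyphenated shape.
    return str(parsed) == value.lower()


# The example Device ID shown in this section.
EXAMPLE_DEVICE_ID = "09f7d600-996c-4c9f-a19e-f5bfe433da0e"
```

A check like this catches a truncated or mistyped ID before a task is sent to a device that does not exist.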

2.3Step 3: Getting an API Token

API token — a key string that verifies your identity in the system. Required for Listener authentication and programmatic calls.

Via agent (easiest method):

Type in chat: "Generate an API token for me" — the agent creates a token instantly. Optionally specify a name, e.g., "Mac Studio listener".

Copy and save the token — it's used for Listener configuration.

Managing tokens via agent:

"Show my tokens" — list of all active tokens
"Revoke token [name]" — instant deactivation

2.4Step 4: Your First Agent Request

Open the Extella Desktop interface. Type your first natural language request:

"Create a 3-slide PDF presentation about our product. Save it to Downloads."

What happens:

The agent analyzes the request
Creates an Expert (an executable module — e.g., using ReportLab)
Runs it on your device via the Listener
Within seconds, a PDF file appears in ~/Downloads/

This is your first Extella result.

2.5Step 5: Expert saved to library

The created Expert is automatically saved to your personal library with a name (e.g., generate_product_presentation_pdf). Now you can:

Run it again with different parameters (different text, different title)
Modify it: "change the background color to blue"
Use it as part of a more complex workflow

This is the key difference from chatbots: solve a task once—the tool remains forever.

2.6Step 6: First Rule

A Rule is an instruction that applies to every interaction. Add your first rule:

"Add a rule: always respond in English"

Or other useful rules:

"Save all files to ~/Documents/Extella/"
"If a task takes more than one step—describe the plan first"
"Always ask for confirmation before deleting data"

Now every time the agent generates a response or creates a file, these rules are applied automatically—no reminders needed.

2.7Step 7: First Concepts

Concepts are the agent's long-term memory. They accumulate automatically. After the first completed task, the agent saves:

Which library worked best for PDF generation
How to handle errors from a specific API
Which approach worked for your task

You can also add them manually:

"Remember: I prefer pandas over openpyxl for working with tables"

The more tasks you solve, the smarter your agent becomes.

2.8Checklist: Ready to Go

#	Action	Status
1	Extella Desktop installed and running	⬜
2	Listener shows Connected in system tray	⬜
3	Device ID registered (visible in interface)	⬜
4	API token obtained and saved	⬜
5	First request sent and response received	⬜
6	First Expert created and visible in library	⬜
7	First Rule added	⬜
8	First Concept saved	⬜

If all 8 items are checked, your platform is configured. Now the real fun begins: scaling the system.

3.KV Store, Concepts, Rules

Section 1 provided a brief glossary. Section 2 covered the practical quickstart. This section examines each of the three components in detail: how they work, what they can do, when to use them, and what not to store.

Note: the default agent comes with a set of pre-installed basic Rules and Concepts. You can view and modify them at any time.

3.1Why Three Storage Systems Instead of One

Example	Solution	Why this approach
Look up an API key by exact name	KV Store — exact search by key	Data is encrypted, searched exactly and quickly
Recall context from a past PDF session	Concepts — semantic search	Semantic search, not keyword matching
Agent always responds in English	Rules — auto-loaded into every prompt	No need to search — always active

KV Store is a "vault"—you store items with an exact name and retrieve them by that name. Concepts is a "knowledge search index"—it finds information by query meaning, even when the wording differs. Rules are "reflexes"—they load automatically with every user message, so the agent operates with them from the first word of the conversation.

3.2KV Store — Encrypted Data Storage

Each KV entry has three fields: a unique key name, a value (text or JSON, up to 1 GB), and a description used for semantic search.

Encryption and PIN

All values are encrypted with your PIN. The agent decrypts them automatically via kv_get. This protects credentials from leaking in logs and exports.

Important when running an Expert on a different device: if the PIN on that device differs, decryption returns garbage—you'll see an 'invalid decimal literal' error. Solution: pass the pin explicitly when calling: run_expert('name', {}, pin='your_pin').

What KV Store Contains

Typical data categories:

Category	Key examples
Service API keys	telegram_bot_token, anthropic_api_key, openai_api_key, tavily_api_key
Device Target UUIDs	mac_studio_target, ubuntu_vm_target, macbook_target
URLs and endpoints	aios_backend_url, webhook_slack, api_crm_url
Session data	session_history, cache_results (JSON arrays)
Configurations	typefully_social_set_id, redis_url, redis_token

KV Store holds more than just short strings. A value can be a complete JSON array with session data history—KV becomes a fast key-value cache for agents.

Semantic Search in KV Store

Each record has an embedding (OpenAI text-embedding-3-small) generated from key + description. This enables semantic search:

# Forgot the exact key name?
kv_search("telegram bot token")
# Finds: telegram_bot_token, telegram_bot_token_taskboard
# Works even if description is in Russian and query in English, or vice versa

Writing Good Descriptions

A description isn't a comment—it's a semantic search index. The more precise and informative, the more reliably agents will find the key.

# Good description:
kv_set(key="anthropic_key", value="sk-...",
       description="Anthropic Claude API key (main production, updated 2025-03)")

# Bad description — search won't help:
kv_set(key="k1", value="sk-...", description="key")

Agent Auto-Search Algorithm — The Golden Rule

Agents NEVER ask for credentials first. They follow a strict algorithm:

1. Need a key? → kv_search("<service> key token")
2. Found it? → kv_get(key) — automatic decryption
3. Not found? → only then ask the user
4. User provided it? → kv_set + permanent storage

If you saved tavily_api_key with the description "Tavily web search API key" once, the next time you request web search, the agent finds it automatically—without asking a single question.
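The four steps above can be sketched in plain Python. This is an illustration only: the dictionary stands in for the encrypted KV Store, word overlap stands in for semantic search, and resolve_credential, ask_user, and the "<service>_api_key" naming are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Set, Tuple

# Stand-in for the KV Store: key -> (value, description). Values are kept in
# plain text here; the real store encrypts them and kv_get decrypts them.
KVStore = Dict[str, Tuple[str, str]]


def _tokens(text: str) -> Set[str]:
    return set(text.lower().replace("_", " ").split())


def kv_search(store: KVStore, query: str, min_overlap: int = 2) -> List[str]:
    """Toy search: rank keys by how many query words they share."""
    wanted = _tokens(query)
    scored = []
    for key, (_value, description) in store.items():
        overlap = len(wanted & (_tokens(key) | _tokens(description)))
        if overlap >= min_overlap:
            scored.append((overlap, key))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [key for _overlap, key in scored]


def kv_get(store: KVStore, key: str) -> str:
    return store[key][0]


def kv_set(store: KVStore, key: str, value: str, description: str) -> None:
    store[key] = (value, description)


def resolve_credential(store: KVStore, service: str,
                       ask_user: Callable[[str], str]) -> str:
    hits = kv_search(store, f"{service} key token")   # 1. search first
    if hits:
        return kv_get(store, hits[0])                 # 2. found: read it
    value = ask_user(service)                         # 3. only now ask
    kv_set(store, f"{service}_api_key", value,        # 4. store for next time
           f"{service} API key")
    return value
```

The second request for the same service never reaches ask_user, which is the behavior the Golden Rule describes.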

Core Principle: Experts Never Access KV Directly

This is a fundamental architectural security principle. An Expert is a pure function. Agents inject credentials via params. The Expert receives the already-decrypted value as a parameter—and knows nothing about KV Store at all.

# WRONG: Expert accesses KV directly
def send_telegram(text: str) -> dict:
    import requests
    # Value is encrypted ($enc:...) — Expert can't decrypt it!
    r = requests.post("https://api.extella.ai/api/kv/get", ...)
    token = r.json()["value"]  # gets garbage

# RIGHT: Agent decrypts and injects
def send_telegram(text: str, bot_token: str = "") -> dict:
    import requests
    # bot_token already decrypted by agent and passed via params
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    # ... rest of the logic

This ensures: security (credentials not in code), reusability (one Expert, different tokens), testability (any input data without KV dependency).
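For reference, a fuller version of the injected-token pattern might look like the sketch below. It makes a few assumptions that are not in the text above: a chat_id parameter is added because the Telegram Bot API's sendMessage method requires one, the standard library's urllib is used instead of requests so the example has no dependencies, and the Extella-specific $extens and include lines are left out so it runs as plain Python.

```python
import json
import urllib.parse
import urllib.request


def send_telegram(text: str = "", chat_id: str = "", bot_token: str = "") -> dict:
    """Send a message with a token that the caller has already decrypted."""
    # Validate inputs and return early; no lookup of secrets happens here.
    for name, value in (("text", text), ("chat_id", chat_id), ("bot_token", bot_token)):
        if not value:
            return {"status": "error", "message": f"{name} required"}

    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    payload = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    try:
        with urllib.request.urlopen(url, data=payload, timeout=10) as response:
            body = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # OSError covers URLError, HTTPError and timeouts; ValueError covers bad JSON.
        return {"status": "error", "message": str(exc)}

    if not body.get("ok"):
        return {"status": "error", "message": str(body.get("description", "unknown error"))}
    return {"status": "success", "message_id": body["result"]["message_id"]}
```

Because the token is an ordinary parameter, the function can be tested with any value and reused with different bots.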

3.3Concepts — Semantic Knowledge Memory

A Concept is a text fragment (knowledge, pattern, solution) stored with semantic search. Concepts use vector search: the meaning of your query is matched against saved knowledge, not just keywords.

When an agent saves a concept like "For PDF generation in Docker, use ReportLab — not wkhtmltopdf, which requires X11," the system immediately sends the text to OpenAI, receives a 1536-dimensional vector, and stores it alongside the text. When searching for "create PDF in container," the query is also converted to a vector, and the system finds the nearest match. Even though the query contains neither "ReportLab" nor "X11," the semantic distance is minimal.
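The nearest-match step can be illustrated with cosine similarity, a common way to compare embedding vectors. This is a sketch under stated assumptions: the text does not say which distance metric Extella uses, and the three-dimensional vectors below are made up by hand to stand in for real 1536-dimensional embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_concept(query_vector: Sequence[float],
                    concepts: List[Tuple[str, Sequence[float]]]) -> Tuple[str, float]:
    """Return (text, similarity) for the stored concept closest to the query."""
    scored = [(text, cosine_similarity(query_vector, vector)) for text, vector in concepts]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical embeddings: the first two axes loosely mean "PDF" and
# "container", the third "messaging".
CONCEPTS = [
    ("For PDF generation in Docker, use ReportLab", [0.9, 0.8, 0.1]),
    ("Telegram getUpdates: offset = last update_id + 1", [0.1, 0.0, 0.9]),
]
QUERY_VECTOR = [0.8, 0.9, 0.2]  # stands in for "create PDF in container"
```

The query shares no words with the stored text, yet its vector points in nearly the same direction, so the PDF concept wins.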

Concept examples

"Extella execution environment: Experts run on the local device via the Listener. Python venv is an optional parameter (isolated=true). Not Docker."
"For reading .docx files: python-docx. Installation: extella-pip install python-docx"
"PDF on Linux: if wkhtmltopdf is unavailable — use ReportLab directly. Installation: extella-pip install reportlab"
"Telegram bot getUpdates: offset = last update_id + 1. Otherwise the same messages will be received again."

What to store vs what not to store

Store in Concepts	Do NOT store in Concepts
Patterns and problem solutions	API keys and tokens (→ KV Store)
Library installation instructions	JSON session data and caches
Architectural decisions of the project	Specific file paths
Business requirements and specifications	Personal data
Insights from work experience	Configurations containing passwords or secrets
Technical limitations and workarounds	Temporary data that becomes outdated

Why you should never store credentials in Concepts: concepts are found by meaning. If you save an API key, semantic search will find it when queried for "need a key" — and it will appear in the context as plain text, without encryption. This is a security violation.

Correct pattern: generalized insight from experience

Agent encounters an error → resolves it → extracts generalized knowledge → saves it:

# After the agent solved the PDF problem:
concept_add(
    "For PDF generation in headless environments (Docker, server without X11)"
    " use ReportLab. wkhtmltopdf requires a graphical display"
    " and doesn't work in containers without Xvfb."
)

# In the future, it will be automatically found when querying:
concept_search("PDF in Docker")
# -> Finds with high similarity, even though 'wkhtmltopdf' isn't in the query

A Concept is a generalized insight from experience, not raw data.

Concept Operations

Operation	MCP tool	Description
Create	concept_add	Text → embedding → save
Find	concept_search	Semantic search by meaning (not by keywords)
Update	concept_update	Edit text + regenerate embedding
Delete	concept_remove	Delete by ID
List	concept_list	All concepts of an agent or profile

The global=true parameter enables searching Concepts across all agents in the profile, so knowledge saved by one agent becomes accessible to the others. Without global=true, an agent can only access its own Concepts.

3.4Rules — Dynamic System Prompt

Loading Mechanism

A Rule is a behavioral instruction loaded with EVERY user message via rules_list. Here's how it works:

1. User sends a message
2. System calls rules_list() -> retrieves all active Rules
3. Rules are embedded into the system prompt BEFORE processing the request
4. Agent generates a response considering all Rules

This cycle occurs automatically on every conversation turn. A Rule is not a memory query—it's part of the agent's "personality."

The agent doesn't "recall" Rules: it operates with them from the first word of the conversation. This is the fundamental difference from Concepts, which must be explicitly searched.
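The loading cycle can be sketched as a small function that runs on every turn. This is an illustration, not Extella's implementation: the prompt layout, build_system_prompt, and handle_message are hypothetical, and rules_list is passed in as a plain callable.

```python
from typing import Callable, Dict, List


def build_system_prompt(base_prompt: str, rules: List[str]) -> str:
    """Append the active Rules to the base prompt as a numbered list."""
    if not rules:
        return base_prompt
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return f"{base_prompt}\n\nRules (always apply):\n{numbered}"


def handle_message(user_message: str, base_prompt: str,
                   rules_list: Callable[[], List[str]]) -> Dict[str, str]:
    """One conversation turn: reload Rules, then assemble the model input."""
    rules = rules_list()                                     # fetched fresh every turn
    system_prompt = build_system_prompt(base_prompt, rules)  # embedded before processing
    return {"system": system_prompt, "user": user_message}
```

Since the Rules are fetched inside handle_message, a Rule added between two messages takes effect on the very next turn.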

Example Rules to Get Started

"Always respond in English"
"Always ask for confirmation before deleting files or data"
"If a task requires more than one step—describe the plan first, then execute"
"When a task is complete—briefly explain what was done"
"Save all created files to ~/Documents/Extella/"
"Never store credentials in Concepts—use KV Store"

Limits and Restrictions

Maximum length per Rule: 4,000 characters. This is sufficient for detailed instructions. For extensive technical knowledge, use Concepts.

Rules are independent: you can have 50 Rules, and all will apply simultaneously. Application order is not guaranteed—write Rules so they don't conflict with each other.

global=true for Rules

Rules with global=true are visible to all agents in the profile. This lets you define common behavior guidelines for all agents in one profile without configuring each one individually.

Difference between Rules and Concepts

Parameter	Rules	Concepts
Loading	Automatically with each message	Only on explicit search concept_search
Impact	Always in system prompt	Only when found and added to context
Search	None — all are loaded	Semantic search by meaning
Data type	Instructions, constraints, style	Knowledge, facts, solution patterns
Size limit	4000 characters per rule	Unlimited (TEXT in PostgreSQL)
Example	"Always respond in Russian"	"For PDF in Docker use ReportLab"

Mnemonic: if the behavior should ALWAYS apply—it's a Rule. If the knowledge might be needed SOMETIMES—it's a Concept.

Operations with Rules

Operation	MCP tool	Description
Create	rules_add	New rule (rule_id is generated automatically)
Update	rules_update	Edit the text of an existing rule
Delete	rules_remove	Delete a rule by rule_id
List	rules_list	Get all rules (called automatically with each message)

3.5Comparison table of three storage types

Characteristic	KV Store	Concepts	Rules
Data type	Key-value + description	Semantic knowledge (text)	Behavioral instruction
Encryption	✅ User PIN	❌ No	❌ No
Search	Exact by key + semantic	Semantic only	No — all are loaded
Auto-loading	❌	❌	✅ On every message
Embeddings	pgvector (from key + description)	pgvector (from concept text)	N/A
Embedding model	text-embedding-3-small	text-embedding-3-small	N/A
Value limit	TEXT (up to 1 GB)	TEXT	4000 characters
What it stores	API keys, UUIDs, URLs, JSON, session data	Knowledge, patterns, solutions, insights	Constraints, protocols, response style
global flag	✅	✅	✅
Isolation	By agent_id / profile_id	By agent_id / profile_id	By agent_id / profile_id

3.6Data isolation: three levels

All core tables (KV, Concepts, Rules, Targets, Experts) include agent_id and profile_id columns. The three-level isolation model:

Level	Analogy	Description
user_id	Building owner	Global user identifier
profile_id	Floor (department)	Group of agents for a single project/client
agent_id	Room on the floor	Specific agent within a profile

How the global flag works

global=false (default) — "I see only my office": the agent sees only its own data (filtered by agent_id). The "Researcher" agent cannot see concepts belonging to the "Writer" agent.
global=true — "I see the entire floor": the agent sees data from all agents in the profile (filtered by profile_id). Researcher + Writer + Analyst — all three agents share one profile.

INSERT is always yours

When creating a new record (concept_add, kv_set, rules_add), the system ALWAYS uses the current agent_id and profile_id. You cannot create a record "for another agent." This ensures data belongs to whoever created it.

When reading/updating/deleting (concept_search, kv_get, rules_list), results are filtered by the global flag. Without global=true — only your data. With global=true — data from all agents in the profile.
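The read and write behavior described in this section can be modeled in a few lines. The sketch below is an in-memory illustration only; Record, add_record, and visible_records are hypothetical names, and the flag is called global_scope because global is a reserved word in Python.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Record:
    text: str
    agent_id: str
    profile_id: str


def add_record(records: List[Record], text: str,
               current_agent_id: str, current_profile_id: str) -> Record:
    """Writes always carry the caller's own agent_id and profile_id."""
    record = Record(text, current_agent_id, current_profile_id)
    records.append(record)
    return record


def visible_records(records: List[Record], agent_id: str, profile_id: str,
                    global_scope: bool = False) -> List[Record]:
    """Reads are filtered by agent_id, or by profile_id when the flag is set."""
    if global_scope:
        return [r for r in records if r.profile_id == profile_id]
    return [r for r in records if r.agent_id == agent_id]
```

With the flag off, the Researcher sees only its own records; with it on, it also sees the Writer's records from the same profile, but never another profile's data.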

4.Experts & Automations

An Expert is an atomic automation that persists forever. Create it once — it runs indefinitely. This section covers everything from Expert types to creating scheduled automated tasks.

4.1Four Types of Experts

1. SIMPLE — Single-Task Building Blocks

Expert	What it does	API key?
convert_pdf_to_text	Extracts text from PDF	No
send_telegram_message	Sends message	Yes (bot_token)
excel_query	SQL query to .xlsx	No
word_generate	Generates .docx from JSON	No

2. COMPLEX — Multi-Stage Pipelines

decompile_binary_to_pseudocode: file → disassembly → graph → pseudocode
generate_3d_model_from_photo: photo → depth map → 3D mesh → .obj

3. NESTED — Orchestrators (cspl=nohup)

Call other Experts via REST API. Example:

fetch_emails -> extract_data -> check_crm -> create_task -> send_notification

→ For more on nohup Experts (script structure, {{placeholders}} syntax, no return statement, manual include): Section 8, subsection 8.4.

→ Pattern for parallel worker execution + synchronization via wait_tasks: Section 7.

4. INTEGRATION — Technology Wrappers

Subtype	Examples
CLI wrapper	ffmpeg, ImageMagick, pandoc, git
Library wrapper	Pillow, pandas, BeautifulSoup
External API	Telegram, OpenAI, Notion, Jira
Database	SQLite, PostgreSQL

4.2Expert Structure: 5 Required Elements

Expert template:

$extens("include.py")
include("import requests", ["extella-pip install requests"])

def expert_name(param1: str = "", param2: int = 0) -> dict:
    import requests
    if not param1:
        return {"status": "error", "message": "param1 required"}
    try:
        # ... logic ...
        return {"status": "success", "result": "..."}
    except Exception as e:
        return {"status": "error", "message": str(e)}

5 required elements for every Expert:

1. Directive: $extens("include.py") — first line, mandatory
2. Dependencies: include(..., ["extella-pip install ..."])
3. Signature: def name(param: str = "") -> dict — explicit types, defaults, no *args/**kwargs
4. Validation: check inputs, return early on error
5. Return: always a dict with a status field
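As a concrete instance of the template, here is a small Expert body that counts words in a text file. It is a sketch: count_words is a hypothetical example, and the $extens directive and include(...) call are left out so that the code runs as plain Python outside Extella (it needs only the standard library).

```python
import os


def count_words(file_path: str = "", encoding: str = "utf-8") -> dict:
    """Count words and lines in a text file."""
    # Validation: check inputs and return early on error.
    if not file_path:
        return {"status": "error", "message": "file_path required"}
    if not os.path.isfile(file_path):
        return {"status": "error", "message": f"file not found: {file_path}"}
    try:
        with open(file_path, "r", encoding=encoding) as handle:
            text = handle.read()
        # Return: always a dict with a status field.
        return {
            "status": "success",
            "words": len(text.split()),
            "lines": len(text.splitlines()),
        }
    except Exception as e:
        return {"status": "error", "message": str(e)}
```

Every path through the function, including the failure paths, returns a dict with a status field, so the caller never has to handle an exception.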

4.3Description — a search index, not a comment

When you save an Expert, the backend generates an embedding from the name + description fields. This embedding powers search_blocks — the semantic search across your library. A poor description means the Expert won't be found when you need it.

❌ Bad — not searchable	✅ Good — semantically searchable
description=""	description="Sends a message to Telegram. Parameters: chat_id (chat ID), message (text), bot_token (bot token injected by the agent)"
description="utility"	description="Converts PDF to text via pdfplumber. Parameters: file_path — path to PDF; max_pages — page limit (0=all)"

Rule: description = one sentence describing what the Expert does + a list of all parameters with their purpose.

4.4Names — snake_case. Saving with an existing name overwrites the Expert

The Expert name is a unique key in the library. Requirements:

• snake_case only: send_telegram_message, convert_pdf_to_text, get_server_metrics

• No spaces, hyphens, or Cyrillic characters

• Saving with a name that's already taken → the previous version is overwritten without warning

• For versioning, use suffixes: analyze_document_v2, or explicitly delete the old version
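Since saving under an existing name overwrites silently, it can help to check a name before saving. The helpers below are a hypothetical sketch, not part of Extella: one validates the snake_case rule with a regular expression, the other proposes the next free versioned name given the names already in the library.

```python
import re
from typing import Iterable

# Lowercase ASCII letters and digits, words joined by single underscores.
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_valid_expert_name(name: str) -> bool:
    """True for names such as send_telegram_message or analyze_document_v2."""
    return bool(_SNAKE_CASE.fullmatch(name))


def next_free_name(name: str, existing: Iterable[str]) -> str:
    """Return name if unused, otherwise the first unused <base>_vN with N >= 2."""
    taken = set(existing)
    if name not in taken:
        return name
    base = re.sub(r"_v\d+$", "", name)  # drop an existing version suffix
    version = 2
    while f"{base}_v{version}" in taken:
        version += 1
    return f"{base}_v{version}"
```

For example, with analyze_document and analyze_document_v2 already saved, the helper proposes analyze_document_v3 instead of overwriting either one.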

4.5The isolated=True parameter — run in a clean environment

You can call run_expert with isolated=True — the Expert will run in a fresh venv without interference from other Experts' dependencies:

run_expert('my_expert', {'param': 'value'}, isolated=True)

When to use: dependency conflicts between Experts, non-standard library versions, reproducibility during debugging.
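To make the idea of a fresh venv concrete, the sketch below creates a throwaway virtual environment with Python's standard venv module and runs a snippet inside it. How Extella builds its isolated environment is not described here, so treat this purely as an illustration of the concept; run_in_clean_venv is a hypothetical helper.

```python
import os
import subprocess
import tempfile
import venv


def run_in_clean_venv(code: str) -> dict:
    """Run a Python snippet inside a temporary, empty virtual environment."""
    with tempfile.TemporaryDirectory() as env_dir:
        # with_pip=False keeps creation fast; the environment has no packages.
        venv.EnvBuilder(with_pip=False).create(env_dir)
        if os.name == "nt":
            python = os.path.join(env_dir, "Scripts", "python.exe")
        else:
            python = os.path.join(env_dir, "bin", "python")
        result = subprocess.run([python, "-c", code], capture_output=True, text=True)
    if result.returncode != 0:
        return {"status": "error", "stderr": result.stderr}
    return {"status": "success", "stdout": result.stdout.strip()}
```

Packages installed for other work are not importable inside such an environment, which is what makes dependency conflicts reproducible and debuggable.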

4.6extella-pip install — mandatory rule

include("from pdfplumber import open as pdf_open", ["extella-pip install pdfplumber"])

Always use extella-pip install, not pip install or pip3 install. This ensures packages are installed in the correct Expert virtual environment.

Multiple dependencies:

include("import pandas", ["extella-pip install pandas", "extella-pip install openpyxl"])

4.7Generalization Principle: Avoid Hardcoding

Bad — hardcoded:

def process_invoice():
    file_path = "/Users/ivan/Downloads/invoice.pdf"  # Only works on one machine!

Good — parameterized:

def process_invoice(file_path: str = "", output_dir: str = "") -> dict:
    if not file_path:
        return {"status": "error", "message": "file_path required"}

Four absolute prohibitions:

❌ Hardcoding paths, keys, IDs — pass everything through parameters
❌ *args/**kwargs in signatures — use only explicit named parameters
❌ Returning binary data — return only file paths
❌ Accessing KV Store from Expert code (credentials arrive as parameters)

An Expert must not call /api/kv/get or any other Extella API from within its code to retrieve credentials. This violates isolation and creates a hidden dependency on the cloud.

❌ Forbidden: the Expert pulls from KV itself

def send_msg(chat_id):
    r = requests.get('/api/kv/get')
    token = r.json()['value']
    # ... uses token

✅ Correct: the agent injects via params

def send_msg(chat_id, bot_token=''):
    if not bot_token:
        return {'status': 'error'}
    # uses bot_token directly

Correct pattern: the agent retrieves credentials from KV via MCP (kv_get/kv_search) and passes values to the Expert as parameters at runtime:

# Agent (outside Expert code):
token = kv_get('telegram_bot_token')['value']   # agent decrypts via PIN
run_expert('send_telegram', {'chat_id': id, 'bot_token': token})  # injects

Expert = pure logic with no calls back to Extella for secrets. Data and credentials = parameters from the agent.
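The signature rules in this section (explicit types, defaults, no *args/**kwargs, a dict return) can be checked mechanically. The following is a hypothetical lint helper built on the standard inspect module; it is not an Extella tool, only a sketch of how such a check could work.

```python
import inspect
from typing import Callable, List


def signature_problems(func: Callable) -> List[str]:
    """List the ways a function signature departs from the Expert rules."""
    problems = []
    signature = inspect.signature(func)
    for name, param in signature.parameters.items():
        if param.kind in (inspect.Parameter.VAR_POSITIONAL,
                          inspect.Parameter.VAR_KEYWORD):
            problems.append(f"{name}: *args/**kwargs are not allowed")
            continue
        if param.annotation is inspect.Parameter.empty:
            problems.append(f"{name}: missing type annotation")
        if param.default is inspect.Parameter.empty:
            problems.append(f"{name}: missing default value")
    if signature.return_annotation is not dict:
        problems.append("return annotation should be dict")
    return problems
```

An empty list means the signature passes; anything else names the parameter and the rule it breaks.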

4.8CLI Wrappers: 5 Lines Instead of 50

Example: ImageMagick via subprocess:

$extens("include.py")
include("import subprocess", [])

def resize_image(input_path: str="", output_path: str="", width: int=800, height: int=600) -> dict:
    import subprocess
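    if not input_path or not output_path:
        return {"status": "error", "message": "input_path and output_path required"}
    size = f"{width}x{height}"
    # "convert" is the ImageMagick 6 command name; ImageMagick 7 prefers "magick"
    cmd = ["convert", input_path, "-resize", size, output_path]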
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        return {"status": "error", "stderr": result.stderr}
    return {"status": "success", "output": output_path}

ffmpeg, pandoc, git, docker, rsync — any of these can become an Expert in 5-10 lines.
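The same shape works for pandoc. The sketch below also checks that the binary is installed, since subprocess.run raises FileNotFoundError when the command is missing. The function name and the pandoc_bin parameter are hypothetical, and the Extella-specific $extens and include lines are omitted so the code runs as plain Python.

```python
import os
import shutil
import subprocess


def convert_document(input_path: str = "", output_path: str = "",
                     pandoc_bin: str = "pandoc") -> dict:
    """Convert a document with pandoc; formats are inferred from extensions."""
    if not input_path or not output_path:
        return {"status": "error", "message": "input_path and output_path required"}
    # Fail with a clear message instead of an exception if pandoc is absent.
    if shutil.which(pandoc_bin) is None:
        return {"status": "error", "message": f"{pandoc_bin} not found on PATH"}
    if not os.path.isfile(input_path):
        return {"status": "error", "message": f"file not found: {input_path}"}
    cmd = [pandoc_bin, input_path, "-o", output_path]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        return {"status": "error", "stderr": result.stderr}
    return {"status": "success", "output": output_path}
```

Passing the command as a list rather than a shell string means file names with spaces or special characters need no quoting.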

4.9Cron Jobs: Scheduled Automation

A Cron job is a nohup Expert that runs on a schedule. Create one with a single phrase:

"Create a job: every morning at 9:00 AM, generate a server metrics summary"

The agent creates a background process — no crontab files, no YAML. Important:

Cron runs through Listener — the device must be powered on
Logs: /tmp/nohup_<name>.log
To stop: "Stop Cron job <name>"
After reboot — requires manual restart

How to Technically Stop a Cron Job

A Cron job is an OS nohup process. It runs independently of chat and Listener. Three ways to stop it:

Method	Command / action	When
Via agent (recommended)	"Stop Cron job <name>": the agent reads the PID from /tmp/nohup_<name>.pid and sends SIGTERM	Primary method
Via Listener UI	Listener tab → find process → Cancel button	If agent is unavailable
Manually via terminal	kill $(cat /tmp/nohup_<name>.pid) or: kill <PID>	Emergency stop

Diagnostics — find the PID and logs of a running job:

cat /tmp/nohup_<name>.pid            # process PID
tail -f /tmp/nohup_<name>.log        # real-time logs
ps aux | grep <name>                 # check if process is alive

When the device reboots, the nohup process terminates. The PID file remains but references a non-existent process. For auto-start on reboot — add Expert launch to launchd (macOS) or systemd (Linux).
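A stale PID file can be detected from a script. The sketch below reads the PID file at the path pattern given above and probes the process with signal 0, which checks existence without delivering anything. It is a hypothetical helper for macOS and Linux only; do not use os.kill this way on Windows, where it behaves differently.

```python
import os


def cron_job_status(name: str = "", pid_dir: str = "/tmp") -> dict:
    """Report whether the process recorded in nohup_<name>.pid is alive."""
    if not name:
        return {"status": "error", "message": "name required"}
    pid_file = os.path.join(pid_dir, f"nohup_{name}.pid")
    try:
        with open(pid_file, "r", encoding="utf-8") as handle:
            pid = int(handle.read().strip())
    except FileNotFoundError:
        return {"status": "success", "state": "no_pid_file", "pid": None}
    except ValueError:
        return {"status": "error", "message": f"{pid_file} does not contain a PID"}
    if pid <= 0:
        return {"status": "error", "message": f"invalid PID in {pid_file}"}
    try:
        os.kill(pid, 0)  # signal 0: existence check only
    except ProcessLookupError:
        # The file survived (for example after a reboot) but the process is gone.
        return {"status": "success", "state": "stale", "pid": pid}
    except PermissionError:
        # The process exists but belongs to another user.
        return {"status": "success", "state": "running", "pid": pid}
    return {"status": "success", "state": "running", "pid": pid}
```

A "stale" result after a reboot is the signal to restart the job and let it write a new PID file.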

Pattern	Command	Cron	What it does
Monitoring	"Every 5 minutes check availability of example.com"	*/5 * * * *	Ping, KV record, notification on failure
Daily summary	"At 9:00 AM — server metrics summary"	0 9 * * *	Metrics, report, concept of the day
Weekly analysis	"On Sunday at 6:00 PM — weekly error analysis"	0 18 * * 0	Log aggregation, pattern concepts
Monthly audit	"On the 1st at 10:00 AM — token audit"	0 10 1 * *	Token scanning, report
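A standard cron expression has five fields: minute, hour, day of month, month, and day of week (0 = Sunday). To show how such an expression maps to a moment in time, here is a deliberately small matcher. It is a sketch, not Extella's scheduler: it supports only *, */n, single numbers, and comma lists, and it requires all five fields to match, which is simpler than real cron's handling of day-of-month combined with day-of-week.

```python
from datetime import datetime


def _field_matches(field: str, value: int, start: int) -> bool:
    """Match one field; start is the field's lowest legal value (0 or 1)."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            step = int(part[2:])
            if (value - start) % step == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expression: str, moment: datetime) -> bool:
    """True if the 5-field cron expression fires at the given minute."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    minute, hour, day, month, weekday = fields
    cron_weekday = moment.isoweekday() % 7  # isoweekday: Mon=1..Sun=7; cron: Sun=0
    return (
        _field_matches(minute, moment.minute, 0)
        and _field_matches(hour, moment.hour, 0)
        and _field_matches(day, moment.day, 1)
        and _field_matches(month, moment.month, 1)
        and _field_matches(weekday, cron_weekday, 0)
    )
```

For instance, "0 18 * * 0" matches 18:00 on a Sunday, and an expression with fewer than five fields is rejected rather than silently misread.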

Self-improving loop: the Cron agent writes Concepts. The following week, it uses them as context. After a year, the agent is an expert on your system's problem history.

5.Agents & Teams

Now that you understand Experts, Concepts, KV Store, and Rules, it's time to explore how they come together in agents and teams. This section covers the power of multi-agent orchestration, the full agent customization options available on the Pro plan, and how to build your own team of AI specialists.

5.1What is an Agent

An agent is an AI specialist with a specific role, its own memory, and a set of tools. Unlike a standard chatbot, an agent is a configured entity: it knows its specialization, remembers interaction history, and can perform real actions. An agent consists of:

Model — Claude, Gemini, Qwen, Llama, GPT — you choose
System instructions — specialization, working style, and boundaries
Tool set — MCP functions: concept_add, run_expert, web_search, kv_get, etc.
Profile — an isolated workspace the agent belongs to
Memory — Concepts (knowledge), KV Store (data and keys), Rules (behavioral guidelines)

A single agent can handle hundreds of tasks. Multiple agents in a team form an entire specialized department. Each agent is a precision instrument for a specific domain, not a universal Swiss army knife.

5.2Creating and Configuring an Agent

On Plus and Flex plans, the default agent (or multiple agents) comes preconfigured with the system prompt and all parameters already set—no user action required. The only difference between Plus and Flex is that Flex users can use their own LLM provider API key instead of paying credits to use Extella's key.

On the Pro plan, you get full control over every agent parameter. This is a fundamental shift: instead of accepting an out-of-the-box agent, you become its architect. You can configure your own agent in the right slide-out panel of the chatbot interface.

Parameter	Configuration	Example usage
Model	GPT-4o, Claude Opus, Gemini Pro, Mistral, local via Ollama, etc.	Claude Opus for deep document analysis
System prompt	Precise instruction: role, style, boundaries, specialization	"You are a financial analyst. Respond only in JSON."
Temperature	Balance accuracy <-> creativity (0.0 = predictable, 1.0 = creative)	0.2 for analytics; 0.9 for copywriting
Top-P / Top-K	Managing token probability distribution	Top-P 0.9 for diverse text generation
Max tokens	Maximum response length	4096 for documents; 512 for brief responses
Tools & MCP	Which tools are available to the agent	Financial tools only for finance specialists
Memory settings	Which memory type it uses: concepts, rules, KV	Long-term memory + thematic concepts
Rules setting	Which rules apply in which situations	"When uncertain — clarify before acting"
Response format	Output structure: text, JSON, markdown, schema	Strict JSON for integration with CRM system

Examples of Specialized Agents

Analyst agent: Claude Opus, temperature 0.2, JSON-only output, financial tools only
Copywriter agent: GPT-4o, temperature 0.9, no tools, detailed brand voice in the prompt
Code reviewer agent: Mistral, minimal context, security checklist in system prompt
Research agent: Gemini, web_search + concept_add, maximum context, high recursion_limit
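
The parameter table and the example agents above can be captured as plain data. The sketch below is illustrative only: the AgentConfig class and its field names are hypothetical and are not Extella's actual configuration schema, and the tool name is a placeholder. The values mirror the Analyst and Copywriter examples.

```python
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    """Hypothetical container for the agent parameters listed above."""
    name: str
    model: str
    temperature: float = 0.7
    max_tokens: int = 4096
    tools: list[str] = field(default_factory=list)
    response_format: str = "text"

    def __post_init__(self) -> None:
        # The guide describes temperature on a 0.0 (predictable) to 1.0 (creative) scale.
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError(f"temperature out of range: {self.temperature}")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")


analyst = AgentConfig(
    name="Analyst",
    model="Claude Opus",
    temperature=0.2,
    tools=["financial_tools"],  # placeholder tool name
    response_format="json",
)
copywriter = AgentConfig(name="Copywriter", model="GPT-4o", temperature=0.9)

print(analyst)
print(copywriter)
```

Validating values at construction time catches a mistyped temperature before the agent is ever used.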

Reusability: configure once, use forever

Configured agents are saved to your library and accessible from a dropdown menu at any time. No need to paste prompts or select parameters each time—the agent already knows who it is and what it can do. This changes your workflow: instead of "find an agent for this task," you think "pick the right one from my library."

Agent limits on the Pro plan

The Pro plan does not limit the number of agents a user can create.

5.3Three Agent Interaction Patterns

1. Recursion — agent calls itself iteratively

The agent processes data in chunks, calling itself with refined parameters. Each iteration starts with a clean context. Protection against infinite loops is provided by the recursion_limit parameter when creating the agent (recommended: 5–15).

Example: A CTO checks 47 API endpoints for test coverage in three batches of roughly 15. Iteration 1: endpoints 1–15 (12 covered, 3 missing). Iteration 2: 16–30 (14 covered, 1 missing). Iteration 3: 31–47 (13 covered, 4 missing). Final report: 39/47 covered (83%), 8 require tests.
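
The recursion pattern can be sketched as a function that handles one batch and then calls itself on the remainder, stopping if the depth limit is reached. This is a simplified local model, not Extella's implementation: check_in_batches, is_covered, and the endpoint names are hypothetical, and the fixed batch size of 15 splits 47 endpoints slightly differently from the example while producing the same totals.

```python
def check_in_batches(endpoints, is_covered, batch_size=15, recursion_limit=10, depth=0):
    """Process endpoints batch by batch; each call only looks at its own batch."""
    if not endpoints:
        return {"covered": 0, "missing": []}
    if depth >= recursion_limit:
        raise RecursionError("recursion_limit reached before all endpoints were checked")
    batch, rest = endpoints[:batch_size], endpoints[batch_size:]
    missing = [e for e in batch if not is_covered(e)]
    tail = check_in_batches(rest, is_covered, batch_size, recursion_limit, depth + 1)
    return {
        "covered": len(batch) - len(missing) + tail["covered"],
        "missing": missing + tail["missing"],
    }


# 47 made-up endpoints, 8 of which have no tests.
endpoints = [f"/api/resource{i}" for i in range(1, 48)]
untested = {endpoints[i] for i in (2, 7, 11, 20, 33, 38, 41, 46)}

report = check_in_batches(endpoints, lambda e: e not in untested)
percent = round(100 * report["covered"] / len(endpoints))
print(f'{report["covered"]}/{len(endpoints)} covered ({percent}%), '
      f'{len(report["missing"])} require tests')
```

With a limit that is too low for the amount of data (for example recursion_limit=2 here), the function raises instead of looping indefinitely, which is the role the recursion_limit parameter plays for an agent.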

2. Escalation — sub-agent signals the orchestrator

When a specialized agent encounters a situation outside its competence, it escalates back to the orchestrator. The orchestrator then decides whether to reassign the task, bring in another specialist, or adjust the delegation parameters.

Example: A CCO analyzing competitors discovers news about a competitor's $50M Series B round. This is a strategic factor—the CCO escalates to the orchestrator: "New competitive factor detected. Recommend revisiting positioning." The orchestrator brings in the Corporate Director.

3. Cross-calls — peer-level agents

Agents communicate directly with each other without an intermediary—when a task requires data from an adjacent domain. Technically, this works through the MCP tool agent_run with the target agent's known agent_id. All calls are logged.

Example: A CTO directly asks the Corporate Director: "Estimate infrastructure costs: 3 microservices, GPU A10G, 10K requests/day." Receives a $2,400/month estimate and includes it in the architecture document without extra hops through the orchestrator.

5.4Teams — Multi-Agent Systems with Delegation

In traditional platforms, a single agent "tries to be everything to everyone." Its context window quickly fills with irrelevant data. The longer the session, the less accurate the responses become.

Extella solves this differently—by creating a Team that collaborates on a single project: an orchestrator agent receives a task, breaks it into subtasks, and delegates each to a specialized agent within a clean, fully relevant context. Each specialist focuses solely on their domain.

Example: the task "analyze the product, prepare competitive analysis, financial model, and pitch for Seed round":

Step	Agent	Clean context
Competitive analysis	CCO	Market data, G2/Gartner, competitor pricing models
Technical feasibility	CTO	Platform architecture, ML models, integrations, cost
Financial model	Corporate Director	CTO assessment + inference costs + CAC/LTV benchmarks
Pitch deck	CCO	Competitive analysis (step 1) + financial model (step 3)
Synthesis	Orchestrator	All results: competitive analysis + technical + financials + pitch

Each agent has its own Experts, KV Store, Concepts, and Rules. Each focuses on their expertise. The result is not a generic blurred response, but a structured document where each section is prepared by a specialist.
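
One way to picture the delegation table is as a small dependency graph: each step names its agent and the earlier steps whose output it needs, and the orchestrator hands every step only those outputs. The sketch below uses Python's standard graphlib module; the step names, the steps dictionary, and the placeholder "agent call" are illustrative assumptions, not Extella's internal format.

```python
from graphlib import TopologicalSorter

steps = {
    "competitive_analysis": {"agent": "CCO", "needs": []},
    "technical_feasibility": {"agent": "CTO", "needs": []},
    "financial_model": {"agent": "Corporate Director",
                        "needs": ["technical_feasibility"]},
    "pitch_deck": {"agent": "CCO",
                   "needs": ["competitive_analysis", "financial_model"]},
    "synthesis": {"agent": "Orchestrator",
                  "needs": ["competitive_analysis", "technical_feasibility",
                            "financial_model", "pitch_deck"]},
}


def delegation_order(steps):
    """Return step names so that every step comes after the steps it needs."""
    graph = {name: set(spec["needs"]) for name, spec in steps.items()}
    return list(TopologicalSorter(graph).static_order())


def build_context(step, steps, results):
    """Give a step only the outputs it depends on, i.e. a clean context."""
    return {dep: results[dep] for dep in steps[step]["needs"]}


results = {}
for step in delegation_order(steps):
    context = build_context(step, steps, results)
    # Stand-in for the real agent call: record who ran and what they saw.
    results[step] = f"{steps[step]['agent']} output (inputs: {sorted(context)})"

for step, output in results.items():
    print(f"{step}: {output}")
```

The financial model sees only the technical assessment, and the pitch deck sees only the competitive analysis and the financial model, which is the "clean context" idea in the table.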

Key advantage: a specialist agent with a clean 32K token context makes more accurate decisions than a generalist agent with an overloaded 200K token context.

A Team in the Extella platform is a multi-agent system that operates as a single unit. You send a task to the Team, and it determines which agent to route it to—or how to split it among several.

What each Team includes

Goal and context — what it was created for, what it does well
Members with roles — each agent knows its role (Research, Writing, Review, Execution, etc.)
Dedicated Concepts — a knowledge base specific to this Team only
Dedicated Rules — behavioral rules applied within the system
Orchestration prompt — delegation logic: the criteria for routing tasks to agents

How a Team makes decisions

A Team operates in auto mode: upon receiving a task, the orchestrator analyzes it, matches it against member roles and delegation rules, then routes it to the appropriate agent (or several in parallel). No human involvement in this distribution.

In the future, the delegation mechanism will be enhanced with a trained RL model that will make faster and more accurate decisions based on the Team's accumulated experience.

One agent — in multiple Teams simultaneously

A Team doesn't duplicate agents — it references them. An agent can formally be a member of multiple Teams. However, each agent carries its own fixed configuration: system prompt, Rules, and Concepts do not change based on which Team the agent belongs to. This means the same agent cannot be used for fundamentally different purposes simply by assigning it to different Teams — choose agents whose configuration matches their intended role within each Team.

Team examples

Team	Participants (Roles)	Purpose
Content Studio	Research -> Writer -> Editor -> SEO-reviewer	Creating content materials from research to publication
Due Diligence	Financial Analyst + Legal Reviewer + Market Researcher	Collecting and synthesizing company data for investment decisions
Product Sprint	PM + Tech Lead + UX Reviewer	Task breakdown and technical specification development
Personal Knowledge Base	Collector + Summarizer + Tagger	Structuring incoming information into a personal knowledge base

Team limits on the Pro plan

You can create up to 3 Teams, and each Team can contain up to 5 agents.

5.5Interface: My Agents and My Teams

On the Pro plan, the agent dropdown in the upper left panel of the interface has the following structure:

▼ Default Agents

• Extella Claude Sonnet (preconfigured)

• Extella GPT-5 (preconfigured)

▼ My Agents

• Financial Analyst (your agent)

• Copywriter (your agent)

• Code Reviewer (your agent)

▼ My Teams

• Content Studio (your team)

• Due Diligence (your team)

• Product Sprint (your team)

Each group is expandable. Any agent or Team is clickable—selecting one assigns it to the current chat. Note: you can select either an entire Team or an individual agent.

Creating a new agent

Open the right settings panel—the Agent Builder section contains agent configuration fields: name, model, provider, system prompt, temperature, and tools. You can also create and configure an agent through chat—describe the agent you want to Extella, and she will handle everything automatically.

Creating a new Team

Open the right settings panel—the Team Builder section. Here you can create a team, assign it a name, and add available agents to it, designating a master (or orchestrating) agent responsible for Team management. Team configuration and populating it with Rules and Concepts happens directly through chat with Extella.

5.6Creating and configuring a Team through interaction with Extella

As an alternative to interface configuration, you can create and configure a Team through chatbot interaction. The user describes in words what they want to achieve. Extella uses reasoning to form a Team object and saves it.

What the user describes

Team purpose — what tasks it's created for, what it should do well
Composition — which agents to include (from existing configured agents or new ones)
Roles — who is responsible for what within the system
Rules — how the Team should behave, what to consider when delegating
Knowledge — specific context that system agents need

The user can describe everything at once or answer Extella's clarifying questions.

What Extella does when creating a Team

Step	What Happens Automatically
Defines composition	Selects suitable agents from the library. If none available — suggests creating or using Extella Agents
Assigns roles	Creates role descriptions for each agent within the Team
Creates orchestration prompt	Instructions for the orchestrator: how to distribute tasks, what criteria to consider
Creates Team Concepts	Stores specific context and knowledge provided by the user
Creates Team Rules	Records behavioral rules for the system
Stores Team	The object appears in the My Teams list in the dropdown menu

Example Team creation dialog

User:

"Create a content marketing team. I need: a researcher (gathers topic data), a writer (creates content), an editor (reviews style), and an SEO specialist (optimizes). They work sequentially. Rule: the final text must contain at least 3 keywords from the brief."

Extella:

"Creating Team Content Studio with four agents. Setting up the chain: Researcher -> Writer -> Editor -> SEO. Adding the keyword rule to Team Rules. Done — Content Studio now appears in My Teams."

Editing a Team via chat

Message Extella in the chat at any time:

"Add a social media agent to Content Studio" — Extella adds an agent with the Social Media role
"Change Researcher's role — now they handle statistics research" — Extella updates the role
"Add a rule: verify facts through two sources" — Extella adds the rule to Team Rules
"Remove Editor from Due Diligence" — Extella removes the agent from the Team (the agent itself is preserved)

5.7Data isolation: two levels

On the Pro plan, agent data (Concepts, Rules, KV Store, Experts) is isolated by default and never mixes with other agents' data. Visibility is controlled by the global parameter, available in most MCP tools and REST endpoints.

Level	What It Covers	Parameter	Default
Agent	Current agent only	global=false	✅ yes
Profile	All agents within one profile	global=true	—

How this works in practice:

global=false (default) — the agent sees only its own Concepts, Rules, KV pairs, and Experts. Other agents' data remains inaccessible, even within the same profile.
global=true — the agent sees data from all agents in the profile. Use this when you need to share a common knowledge base across a team.

Key principle: isolation is not hierarchy. Data doesn't automatically "leak" up or down. The developer or agent explicitly controls scope through the global parameter with each call.

Example:

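# MCP tool-call notation. Note for Python clients: global is a reserved word
# in Python, so there it would be passed as **{"global": False}.

# Add concept only for current agent (default)
concept_add(text="...", global=False)

# Find concept in any agent of the profile
concept_search(query="...", global=True)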

Note: Profile (Team) is a container for agents, not a separate storage level. There is no dedicated "Team storage" for Concepts or Rules.
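
The two visibility levels can be modelled with a toy in-memory store. This is only a sketch of the scoping rule, not Extella's storage layer: the ProfileStore class is hypothetical, and the parameter is spelled global_ because global is a reserved word in Python.

```python
from collections import defaultdict


class ProfileStore:
    """Toy model of agent-level vs profile-level visibility within one profile."""

    def __init__(self):
        self._concepts = defaultdict(list)  # agent_id -> list of concept texts

    def concept_add(self, agent_id, text):
        # Every concept is stored under the agent that created it.
        self._concepts[agent_id].append(text)

    def concept_search(self, agent_id, query, global_=False):
        # global_=False: search only the calling agent's concepts.
        # global_=True: search the concepts of every agent in the profile.
        owners = list(self._concepts) if global_ else [agent_id]
        needle = query.lower()
        return [
            text
            for owner in owners
            for text in self._concepts.get(owner, [])
            if needle in text.lower()
        ]


store = ProfileStore()
store.concept_add("cto", "Redis cache pattern used in the auth module")
store.concept_add("cco", "Competitor pricing for managed Redis")

print(store.concept_search("cto", "redis"))                # CTO's own concepts only
print(store.concept_search("cto", "redis", global_=True))  # all agents in the profile
```

Scope is chosen per call, so nothing is shared unless the caller asks for it.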

5.8Production Agents in Extella

Below are actual agents running in the Extella system. Each specializes in its domain with its own configuration, tools, and model.

Agent Name	Model	Specialization
Extella (CEO)	Claude Sonnet 4.6	Main orchestrator: delegation, strategy, results synthesis
CCO	Gemini 2.5 Flash	B2B sales, GTM strategy, competitive analysis, pitches, pricing
CTO	Gemini 2.5 Flash	Platform architecture, CSPL Experts, API, security, infrastructure
Corporate Director	Qwen 3.6 Plus	Finance, legal, compliance, capital structure, investors
Extella Architect	Qwen 3.6 Plus (NVIDIA)	Complex reasoning, deep architectural analysis
Llama 4 Maverick	Llama 4 Maverick	1M context, multimodal, parallel processing of large volumes
Step 3.5 Flash	Step 3.5 Flash	196B MoE, 262K context — batch processing, writing sections
Llama 3.3 70B	Llama 3.3 70B (Groq)	Ultra-fast inference (300+ tokens/sec) — urgent tasks
Architect R1	DeepSeek R1	Chain-of-thought reasoning, complex logical tasks, planning
Auto Router	Auto Router (OpenRouter)	Automatic selection of optimal model + fallback chain

Workers (Llama, Step, DeepSeek, Qwen) are used by the orchestrator for parallel processing: writing 10 sections simultaneously, analyzing 5 competitors, generating 20 pitch variations. These are the system's "computational muscle."

5.9Agent Growth Over Time

An agent in Extella is not a static tool. It grows like a real employee: accumulating knowledge in Concepts, guidelines in Rules, and patterns in KV Store. With each week of work, it becomes smarter and more precise.

Stage	Agent state	What happens	Example
Day 1	Empty profile	Competent as a good LLM, but doesn't know your business	CTO writes Python, but doesn't know your architecture
Week 2	15-30 concepts, 5-10 rules	Knows code style, typical mistakes, team preferences	"In the project: FastAPI+PostgreSQL+Redis. Migrations use Alembic."
Month 3	100-300 concepts, Cron patterns	Suggests solutions from project history, predicts problems	"Add caching" -> immediately suggests Redis pattern you already use
Month 6	500+ concepts, autonomous operation	Project expert. Makes decisions without additional context	Autonomously alerts about degradation based on historical data

Path to autonomy:

Day 1: "What do you use for the database?"
Week 2: "I recommend PostgreSQL, as you already use it"
Month 3: "I'll add Redis cache, similar to the auth module"
Month 6: The agent proposes changes, implements them, and reports back—without additional context

Exporting Agent Conversations

After 6 months of work, you can export all conversations:

POST https://api.extella.ai/api/agent/export/chats
Authorization: Bearer <your-token>

// By agent:
{"by": "agent", "id": "agent_xyz..."}

// By profile (all agents):
{"by": "profile", "id": "team_abc..."}

The resulting JSON containing thousands of QA pairs and discussions serves as a valuable dataset for quality analysis or fine-tuning. The actual training process is performed outside the Extella platform—on your own infrastructure or through external services.
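
For scripted exports, the request above could be assembled with Python's standard library as follows. This is a minimal sketch: build_export_request is a hypothetical helper, the Content-Type header is an assumption (the guide only shows the URL, the Authorization header, and the JSON body), and the response format is not specified here, so the code builds the request without sending it.

```python
import json
import urllib.request

EXPORT_URL = "https://api.extella.ai/api/agent/export/chats"


def build_export_request(token: str, by: str, target_id: str) -> urllib.request.Request:
    """Build the export request; 'by' selects a single agent or a whole profile."""
    if by not in ("agent", "profile"):
        raise ValueError("by must be 'agent' or 'profile'")
    body = json.dumps({"by": by, "id": target_id}).encode("utf-8")
    return urllib.request.Request(
        EXPORT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption: JSON request body
        },
    )


req = build_export_request("<your-token>", "agent", "agent_xyz...")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it: urllib.request.urlopen(req), then save the returned JSON to a file.
```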

6.Local Models & Tunnels

6.1Why Do You Need a Tunnel?

In Section 5, you learned that you can connect any language model to an agent—including a local one running on your laptop or home server. This opens up significant possibilities: complete data privacy, zero inference cost, any uncensored models, and offline operation.

However, there's a technical detail: local LLM servers (Ollama, LM Studio, llama.cpp, etc.) listen only on localhost by default—an address accessible only from the same device. Extella operates as a cloud service and physically cannot reach your localhost directly.

The solution is a tunnel. This is a program that creates an encrypted "bridge" between your device and a public HTTPS address on the internet. Extella connects to the public address, and the tunnel transparently forwards requests to your localhost. To Extella, it looks like a standard cloud-based API server.

Scenario	Without tunnel	With tunnel
Extella + local model	❌ Unreachable	✅ Works via public URL
Access from phone	❌ Localhost only	✅ Any device
Demo to a colleague	❌ Requires VPN or presence	✅ Just share the URL
CI/CD integration	❌ No public address	✅ webhook + tunnel

💡 A tunnel doesn't slow down your model—it adds only 10–50ms of network latency. For text generation, this is imperceptible.

6.2Ports and Base API URL

Each LLM server listens on its own port. This is important when creating a tunnel—you need to tunnel the specific port where your model is running.

Server	Port	Base API URL (local)
LM Studio	1234	http://localhost:1234/v1/
llama.cpp server	8080	http://localhost:8080/v1/
Ollama	11434	http://localhost:11434/v1/
Jan	1337	http://localhost:1337/v1/
KoboldCPP	5001	http://localhost:5001/v1/
LocalAI	8080	http://localhost:8080/v1/

⚠️ Golden rule: in the baseURL field in Extella, always specify the URL up to and including /v1/—and nothing beyond. Extella automatically appends the required path (/chat/completions, /embeddings, etc.).

✅ Correct: https://abc123.ngrok-free.app/v1/
❌ Incorrect: https://abc123.ngrok-free.app/v1/chat/completions

Including an extra endpoint is the most common reason a local model "doesn't respond" in Extella. Check this first.
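
If the URL is stored in a script or config file, the golden rule can be enforced with a small helper that trims everything after /v1 and guarantees the trailing slash. normalize_base_url is a hypothetical utility written for this guide, not part of Extella.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_base_url(url: str) -> str:
    """Return the URL cut off right after /v1 and ending in /v1/."""
    parts = urlsplit(url.strip())
    path = parts.path
    marker = "/v1"
    idx = path.find(marker)
    end = idx + len(marker)
    # Accept /v1 only as a whole path segment (followed by "/" or end of path).
    if idx != -1 and path[end:end + 1] in ("", "/"):
        path = path[:end] + "/"
    else:
        path = path.rstrip("/") + "/v1/"
    # Query string and fragment are dropped: the base URL should not carry them.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


for raw in (
    "https://abc123.ngrok-free.app/v1/chat/completions",
    "https://abc123.ngrok-free.app",
    "http://localhost:1234/v1",
):
    print(raw, "->", normalize_base_url(raw))
```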

6.3Three Tunneling Methods: Which to Choose

Several tools are available. Here are recommendations for choosing:

Method	Free	Persistent URL	Complexity	Best for
ngrok	Limited*	Paid / static	🟢 Easy	Quick start, testing
Cloudflare Tunnel	✅ Fully	✅ Custom domain	🟡 Medium	Permanent operation
LocalTunnel	✅ Fully	Partial	🟢 Easy	Quick testing without registration
Tailscale Funnel	✅ Free	✅ Stable	🟡 Medium	If already using Tailscale

* ngrok provides one free tunnel with a random URL. A static domain is free with registration (one domain).

💡 For most Extella users, I recommend starting with ngrok (5 minutes to results), then switching to Cloudflare Tunnel for regular use—it's free, reliable, and supports your own domain.

6.4Method 1: ngrok — Fastest Way to Get Started

ngrok is the simplest way to get a working public URL in 2–3 minutes. It's an excellent choice for your first exposure to the topic or for occasional tasks.

Installation

OS	Command
macOS (Homebrew)	brew install ngrok/ngrok/ngrok
Linux (Debian/Ubuntu)	sudo apt install ngrok (after adding the repository — see below)
Windows	winget install ngrok

Linux:

# Linux: full installation
curl -sSL https://ngrok-agent.s3.amazonaws.com/ngrok.asc \
  | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null
echo "deb https://ngrok-agent.s3.amazonaws.com buster main" \
  | sudo tee /etc/apt/sources.list.d/ngrok.list
sudo apt update && sudo apt install ngrok

Registration (one-time)

ngrok requires free registration at ngrok.com. After registering:

Add your token (copy from Dashboard → Your Authtoken):

ngrok config add-authtoken YOUR_TOKEN_HERE

💡 Add the token before your first run. Recent ngrok versions refuse to start a tunnel without an authtoken; older versions allowed anonymous sessions with time limits.

Starting the Tunnel

One command and your tunnel is ready:

ngrok http 1234     # LM Studio
ngrok http 8080     # llama.cpp
ngrok http 11434    # Ollama

After starting, the terminal displays a line like:

Forwarding https://abc123.ngrok-free.app -> http://localhost:1234

Your base API URL for Extella: https://abc123.ngrok-free.app/v1/

Static URL (free)

Random URLs change with every restart, which is inconvenient if the URL is saved in Extella. The solution is a static domain:

ngrok http --domain=your-static-name.ngrok-free.app 1234

One static domain is free. You can register it in Dashboard → Domains.

Password protection (recommended)

A public URL without protection means anyone on the internet can send requests to your model. Add basic authentication:

ngrok http --basic-auth="user:strongpassword" 1234

⚠️ Don't leave a tunnel open without authentication for long periods. Bots actively scan known ngrok domains.

6.5Method 2: Cloudflare Tunnel — for permanent operation

Cloudflare Tunnel (cloudflared) is an enterprise-grade solution from Cloudflare. Completely free with no traffic limits, and supports custom domain binding. Ideal if you plan to use a local model with Extella regularly.

Key advantage over ngrok: the tunnel doesn't depend on open router ports and works even behind double NAT (corporate networks, mobile internet, etc.).

Installing cloudflared

macOS / Windows:

brew install cloudflare/cloudflare/cloudflared   # macOS
winget install Cloudflare.cloudflared            # Windows

Linux (Debian/Ubuntu):

curl -L --output cloudflared.deb \
  https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
sudo dpkg -i cloudflared.deb

Method A: Quick tunnel without registration

For a quick test, run without an account:

cloudflared tunnel --url http://localhost:1234    # LM Studio
cloudflared tunnel --url http://localhost:11434   # Ollama

cloudflared prints a random trycloudflare.com hostname. Your base API URL is that hostname plus /v1/, for example: https://some-random-name.trycloudflare.com/v1/

⚠️ The URL changes with each restart. Method B provides a permanent address.

Method B: Permanent tunnel with your own domain

For permanent operation, you need to register at cloudflare.com (free) and have your own domain (or subdomain).

Step 1 — Authentication:

cloudflared tunnel login

Step 2 — Create the tunnel:

cloudflared tunnel create my-llm-tunnel

Step 3 — Create the ~/.cloudflared/config.yml file:

tunnel: <TUNNEL_ID>
credentials-file: /Users/YOUR_USER/.cloudflared/<TUNNEL_ID>.json
ingress:
  - hostname: llm.yourdomain.com
    service: http://localhost:1234
  - service: http_status:404

Step 4 — Configure DNS:

cloudflared tunnel route dns my-llm-tunnel llm.yourdomain.com

Step 5 — Start the tunnel:

cloudflared tunnel run my-llm-tunnel

Your permanent URL: https://llm.yourdomain.com/v1/

Auto-start on system boot

sudo cloudflared service install
sudo systemctl start cloudflared   # Linux
# macOS: launchd service is created automatically

💡 Once auto-start is configured, the tunnel launches on every system boot. You can set the URL in Extella once and forget about it.

6.6Method 3: Alternatives

LocalTunnel — As simple as it gets (Node.js)

If ngrok feels like overkill and you already have Node.js installed:

npm install -g localtunnel
lt --port 1234 --subdomain my-llm

URL: https://my-llm.loca.lt/v1/ — the subdomain persists if the name is available.

⚠️ LocalTunnel is less stable than ngrok or Cloudflare. Best suited for one-off tests.

Tailscale Funnel

If you're already using Tailscale for VPN networking, Funnel exposes your server to the internet with a single command:

tailscale funnel 1234

The URL is generated from your device name in Tailscale. Very convenient if you already have Tailscale set up.

6.7Connecting to Extella: Step-by-Step

Once your tunnel is running and you have a public URL, add the model to Extella as a custom provider. This is done in agent settings (Section 5).

Field in Extella	What to enter	Example
provider	custom	custom
baseURL	Tunnel URL + /v1/	https://abc123.ngrok-free.app/v1/
apiKey	Any string (model doesn't validate)	lm-studio or ollama
model	Model name on server	llama-3.2-3b-instruct

💡 The model name in the model field must match exactly how the model is named on the server. In LM Studio, this is the model filename. In Ollama, check the output of ollama list.

After saving, the agent will use your local model for all requests. Concepts, KV Store, Rules, and all Experts work exactly the same—they're stored in Extella's cloud, only inference (text generation) goes through your model.
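
Before saving the settings, it can help to confirm that the tunnel answers and to see the exact model names the server reports. Servers that follow the OpenAI-compatible API generally expose a model list at GET <baseURL>models; whether your particular server does is an assumption to verify. The list_models helper and the stand-in opener below are illustrative, and the demonstration runs offline.

```python
import io
import json
import urllib.request


def list_models(base_url: str, api_key: str = "lm-studio", opener=urllib.request.urlopen):
    """Return the model ids reported by GET {base_url}models."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with opener(req, timeout=10) as resp:
        payload = json.load(resp)
    return [m["id"] for m in payload.get("data", [])]


# Offline demonstration with a stand-in opener (no network needed).
def fake_opener(req, timeout=10):
    body = {"object": "list", "data": [{"id": "llama-3.2-3b-instruct"}]}
    return io.BytesIO(json.dumps(body).encode("utf-8"))


print(list_models("https://abc123.ngrok-free.app/v1/", opener=fake_opener))
```

Against a real tunnel, call list_models with your own base URL and no opener argument, then copy one of the returned ids into the model field.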

6.8Local Server Configuration

Before creating a tunnel, make sure your server accepts external connections. By default, most servers listen only on localhost and will reject requests coming through the tunnel.

LM Studio

Open the Local Server tab
Enable ✓ Enable CORS
Enable ✓ Allow connections from network
Click Start Server

💡 Without Enable CORS, requests from Extella will be rejected by the browser with a CORS policy error. This is a common pitfall.

llama.cpp — Server Launch Parameters

Launch with external connections enabled:

./llama-server \
  -m ./models/your-model.gguf \
  --port 8080 \
  --host 0.0.0.0 \
  -c 4096 \
  -ngl 35

Key parameter: --host 0.0.0.0 (accept connections from all sources, not just localhost). -ngl 35 specifies the number of layers on GPU (adjust based on your graphics card).

Ollama — enabling external connections

By default, Ollama only accepts localhost connections. You need to modify the environment variable:

macOS / Linux (temporary):

OLLAMA_HOST=0.0.0.0 ollama serve

Linux (permanent via systemd):

sudo systemctl edit ollama
# Add to the [Service] section:
Environment="OLLAMA_HOST=0.0.0.0"

⚠️ After changing OLLAMA_HOST, restart the service: sudo systemctl restart ollama

6.9Security and performance

Security — required reading

A public URL without protection means anyone can use your model—free of charge at your CPU/GPU expense. This isn't a theoretical threat: bots actively scan for open LLM endpoints.

Threat	Solution
Unauthorized model access	ngrok --basic-auth or Cloudflare Access
Data interception	Tunnels use HTTPS — data is encrypted
Prompt leakage	Don't tunnel the model unnecessarily, use VPN networks (Tailscale)
DDoS on model	Rate limiting in Cloudflare or ngrok Pro

💡 The simplest protection option is ngrok --basic-auth. Extella supports basic authentication: specify a user:password string in Base64 format in the apiKey field.
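
The Base64 string for the apiKey field can be produced with a few lines of Python. This sketch assumes the value is the standard Base64 encoding of user:password, as used by HTTP basic authentication; the credentials are the placeholder ones from the ngrok example.

```python
import base64


def basic_auth_value(user: str, password: str) -> str:
    """Base64-encode 'user:password' for use as a basic-auth credential."""
    return base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")


api_key = basic_auth_value("user", "strongpassword")
print(api_key)
```

Base64 is an encoding, not encryption: anyone who sees the string can decode it, so treat it like the password itself.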

Performance

Tunneling adds 10–50ms latency. This is negligible for text generation.
For long responses, use streaming—users see text as it generates rather than waiting for completion.
Ensure the LLM server is running before starting the tunnel—otherwise the tunnel will be created, but requests will fail.
GPU acceleration remains on your machine—the tunnel doesn't affect inference speed.

6.10Quick reference

Tool	Command	URL Type	Recommendation
ngrok	ngrok http 1234	Random / static	Best start
cloudflared (quick)	cloudflared tunnel --url http://localhost:1234	Random	Testing without registration
cloudflared (persistent)	via config.yml + custom domain	Permanent	For regular use
LocalTunnel	lt --port 1234 --subdomain my-llm	Random / subdomain	Quick one-time test
Tailscale Funnel	tailscale funnel 1234	Permanent	If Tailscale is already installed

Bottom line: run ngrok http <port>, grab the URL from the output, append /v1/ to the end, paste it into the baseURL field in your Extella agent settings—and your local model is ready to go.

7.Parallel Execution

7.1The Physics of Parallelism

The formula is simple:

Sequential: T = T1 + T2 + T3 + ... + TN

Parallel: T = max(T1, T2, T3, ..., TN) + ~1 sec polling

Scenario	Sequential	parallel_task	Speedup
3 tasks x 15 sec	45 sec	16 sec	2.8x
5 tasks x 20 sec	100 sec	21 sec	4.8x
10 tasks x 30 sec	300 sec (5 min)	31 sec	9.7x
20 tasks x 30 sec	600 sec (10 min)	31 sec	19.4x

This isn't optimization — it's a formula change. Claude Code thinks like an LLM — sequentially. Extella parallel_task thinks like hardware — in parallel. Modern CPUs have multiple cores, each executing independent tasks simultaneously.
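
The table rows follow directly from the two formulas. A short sketch that recomputes them, assuming every task in a scenario takes the same time and polling adds about one second:

```python
def sequential_time(durations):
    """Total time when tasks run one after another."""
    return sum(durations)


def parallel_time(durations, polling_overhead=1.0):
    """Total time when tasks run side by side: the slowest task plus polling."""
    return max(durations) + polling_overhead


for count, seconds in [(3, 15), (5, 20), (10, 30), (20, 30)]:
    durations = [seconds] * count
    seq = sequential_time(durations)
    par = parallel_time(durations)
    print(f"{count} tasks x {seconds} sec: {seq} sec vs {par:.0f} sec, "
          f"speedup {seq / par:.1f}x")
```

The speedup approaches the number of tasks as individual tasks get longer, because the fixed polling overhead matters less.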

7.2Five Problems with Synchronous Mode

Problem 1: Timeout — Results Lost

Most synchronous LLM agents have a hard timeout of ~5 minutes. If processing 1000 files, a training run, or a deep analysis takes longer, the results vanish without warning. With parallel_task, each worker is independent of the LLM connection: even if the connection drops, the OS process keeps running.

Problem 2: Linear Time Accumulation

Tasks	Synchronous	Parallel	Lost time
2 x 30s	60s	31s	29s
5 x 30s	150s	31s	119s
10 x 30s	300s	31s	269s

Problem 3: No Way to Cancel

In synchronous mode, there's no Cancel button. Spotted an error at second 3 of 300 — you still wait. In Task Registry, each task has a ✕ button (SIGTERM by PID) — instant termination.
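
Cancelling a background worker by sending SIGTERM can be sketched with the standard library. This is a generic illustration rather than the Task Registry's code: cancel_worker is a hypothetical helper, and the worker is a dummy process that just sleeps.

```python
import signal
import subprocess
import sys


def cancel_worker(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Ask the process to stop with SIGTERM; force-kill if it ignores the request."""
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


# Dummy long-running worker: a separate OS process that would sleep for 5 minutes.
worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(300)"])
exit_code = cancel_worker(worker)
print("worker stopped, exit code:", exit_code)
```

On Linux and macOS a process stopped by a signal reports a negative exit code; on Windows, Python maps SIGTERM to a forced termination instead.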

Problem 4: No Visibility

With parallel_task, each task writes its status to /tmp/pt_{uuid}.json. The agent can read the file at any time to check task state: running, complete, error. For visual monitoring, you can optionally deploy task_registry_server — a custom Flask-based Expert with an HTML dashboard (see section 7.3 for details).

Problem 5: Lost Traceback on Error

When a synchronous process crashes — you get a generic message without context. In Extella, each parallel_task job writes the full traceback to /tmp/pt_{uuid}.json (error field). If task_registry_server is running, the traceback is also available via GET /tasks/<uuid>.

7.3Task Statuses and Diagnostics

The state of each parallel_task is stored in a /tmp/pt_{uuid}.json file on the device. This is the primary and only guaranteed tracking mechanism—it works without any additional components.

Field in file	Value	Description
status	"running"	Task is running
status	"complete"	Task completed successfully
status	"error"	Task failed with error
result	dict from worker	Result (only on complete)
error	traceback string	Error details (only on error)

Example: reading task status manually:

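import json
from pathlib import Path

# UUID returned when the task was launched (example value)
task_uuid = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890'

data = json.loads(Path(f'/tmp/pt_{task_uuid}.json').read_text())
print(data['status'])      # 'running' / 'complete' / 'error'
print(data.get('result'))  # worker result if complete
print(data.get('error'))   # traceback string if error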
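
Reading the file once gives a snapshot; waiting for a task means reading it repeatedly until the status leaves "running". The helper below is a minimal sketch of that loop based only on the file layout described above. wait_for_task and its parameters are hypothetical names, and the demonstration writes to a temporary directory rather than /tmp.

```python
import json
import tempfile
import time
from pathlib import Path


def wait_for_task(task_uuid, status_dir="/tmp", timeout=120.0, poll_interval=2.0):
    """Poll pt_{uuid}.json until the task is complete or failed, or time runs out."""
    path = Path(status_dir) / f"pt_{task_uuid}.json"
    deadline = time.monotonic() + timeout
    while True:
        try:
            data = json.loads(path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            data = None  # file not written yet, or read in the middle of a write
        if data and data.get("status") in ("complete", "error"):
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_uuid} still running after {timeout}s")
        time.sleep(poll_interval)


# Demonstration against a temporary directory instead of /tmp.
with tempfile.TemporaryDirectory() as demo_dir:
    status_file = Path(demo_dir) / "pt_demo.json"
    status_file.write_text(json.dumps({"status": "complete", "result": {"pages": 3}}))
    demo = wait_for_task("demo", status_dir=demo_dir, timeout=5, poll_interval=0.1)

print(demo["status"], demo["result"])
```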

7.3.1. Optional Visual Dashboard — task_registry_server

task_registry_server is not a built-in platform interface but a separate custom Expert (Flask application) that you can run for visual task monitoring in a browser. It is not required: parallel_task and wait_tasks work without it.

⚠️ task_registry_server is a custom component. If it hasn't been created by the agent in your account, ask the agent: "Create task_registry_server for monitoring parallel tasks."

When task_registry_server is running, it provides:

Feature	Description
HTML UI	Browser page at http://localhost:7755 with an auto-refreshing task list
GET /tasks	JSON with all tasks
GET /tasks/<uuid>	Details of specific task + logs
DELETE /cancel/<uuid>	SIGTERM by task PID → status cancelled
POST /clear	Clear all records
GET /health	{ok: true, port: 7755, tasks: N}

Starting and managing (if the Expert exists):

# Start:
run_expert('task_registry_server')
# -> {"status": "success", "url": "http://localhost:7755/", "port": 7755}

# If already running:
# {"status": "already_running", "port": 7755, "tasks": 3}

# Force restart:
run_expert('task_registry_server', {'force_restart': '1'})

Persistence: task_registry_server stores the state of all tasks in /tmp/extella_task_registry.json. When the server restarts, the file is re-read and all records are restored. Without task_registry_server, state is stored only in /tmp/pt_{uuid}.json.

7.4UUID vs PID: A Fundamental Design Decision

Why not just use the process PID?

The OS reuses PIDs. A process terminates with PID 12345—a second later, a new process may receive the same PID. When canceling by PID, you might kill not your task but a completely different process.

UUID v4 is globally unique. Never reused. Independent of OS, containers, or reboots. Format: a1b2c3d4-e5f6-7890-abcd-ef1234567890.

All registry operations use UUID. PID is stored only for SIGTERM during cancellation.
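
A minimal sketch of that design: the registry is keyed by a freshly generated UUID v4, and the PID is just one field in the record. The registry dictionary and register_task function are illustrative stand-ins, not the Task Registry's actual data structures.

```python
import os
import uuid

registry = {}  # task UUID -> task record


def register_task(pid: int, description: str) -> str:
    """Create a registry entry keyed by a new UUID v4 and return that UUID."""
    task_id = str(uuid.uuid4())
    registry[task_id] = {"pid": pid, "description": description, "status": "running"}
    return task_id


# Two tasks that happen to share a PID (as can occur after PID reuse)
# still get distinct, unambiguous identifiers.
first = register_task(os.getpid(), "Analysis: doc1.pdf")
second = register_task(os.getpid(), "Analysis: doc2.pdf")

print(first)
print(second)
```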

7.5The __api_token__ Parameter (Required with task_registry_server)

Without __api_token__, workers cannot register with the Task Registry or report results.

Three reserved parameters are relevant when using task_registry_server:

__registry_url__  — Registry URL (default: http://localhost:7755)
__description__   — Human-readable task description for UI
__api_token__     — Extella API token for registering the task in the registry

Without task_registry_server, the __api_token__ parameter is not needed: parallel_task operates via /tmp/pt_{uuid}.json without server calls.

7.6The 4-Step Pattern: Complete Example

Step 0: Start Registry (must be first!)

registry = run_expert('task_registry_server')
print(registry)  # {"status": "success", "url": "http://localhost:7755/"}

Step 1: Get API token

API_TOKEN = kv_get('extella_api_token')['value']

Step 2: Launch workers in parallel

# Each call returns a UUID immediately (~0.5 sec)
# Worker runs in background as a separate OS process
r1 = run_expert('analyze_document', {
    'file_path': '/tmp/doc1.pdf',
    '__api_token__': API_TOKEN,
    '__description__': 'Analysis: doc1.pdf'
})
r2 = run_expert('analyze_document', {
    'file_path': '/tmp/doc2.pdf',
    '__api_token__': API_TOKEN,
    '__description__': 'Analysis: doc2.pdf'
})
r3 = run_expert('analyze_document', {
    'file_path': '/tmp/doc3.pdf',
    '__api_token__': API_TOKEN,
    '__description__': 'Analysis: doc3.pdf'
})
uuid1 = r1['uuid']
uuid2 = r2['uuid']
uuid3 = r3['uuid']
# All three launched in ~1.5 sec total

Step 3: Wait for all to complete

import json
results = run_expert('demo_wait_tasks', {
    'uuids': json.dumps([uuid1, uuid2, uuid3]),
    'timeout': 120,
    'poll_interval': 2
})
# Polls http://localhost:7755/tasks every 2 sec
# Returns when ALL tasks complete or timeout
# -> {
#   "status": "complete",
#   "summary": {"total": 3, "complete": 3, "error": 0},
#   "elapsed_seconds": 31.2,
#   "results": {uuid1: {...}, uuid2: {...}, uuid3: {...}}
# }

Processing results

if results['summary']['error'] > 0:
    # Handle failed tasks
    for uuid, result in results['results'].items():
        if result.get('status') == 'error':
            print(f'Task {uuid} failed: {result.get("error")}')
            # Retry or log
else:
    print(f'All {results["summary"]["total"]} tasks completed')
    print(f'Time: {results["elapsed_seconds"]}s')
    for uuid, result in results['results'].items():
        print(f'{uuid[:8]}...: {result["result"]}')

7.7Comparison with synchronous mode

Characteristic	Synchronous	parallel_task
Time for N tasks	N x T	max(T) + ~1s
Task IDs	No	UUID v4 (globally unique)
Cancellation	No	✅ Cancel (SIGTERM)
Visibility	No	✅ /tmp/pt_{uuid}.json; optionally — UI :7755 (if task_registry_server is running)
Traceback on error	Lost	✅ Saved to registry
Timeout	~5 min, result lost	Configurable
LLM dependency	Full	Process independent
Persistence	No	JSON in /tmp, survives restart
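
To make the first row concrete, here is a small worked estimate of both timing formulas. The per-task durations are made-up numbers, and the one-second overhead is the approximate figure from the table, so treat the result as an order-of-magnitude estimate that assumes the device can actually run all workers at once.

```python
def sync_time(durations):
    """Synchronous mode: tasks run one after another, so times add up."""
    return sum(durations)


def parallel_time(durations, overhead=1.0):
    """parallel_task: the slowest task dominates, plus launch overhead."""
    return max(durations) + overhead if durations else 0.0


durations = [30.0, 25.0, 28.0]  # hypothetical per-document analysis times, seconds
print(sync_time(durations))      # 83.0
print(parallel_time(durations))  # 31.0
```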

7.8When to use parallel_task

Condition	Choice
Task A is required for Task B (dependency)	Synchronous mode
Single task	Synchronous (overhead not worth it)
Each task < 5 sec	Synchronous (registration overhead > benefit)
2+ independent tasks > 5 sec	parallel_task
Task > 1 minute	parallel_task (timeout protection)
Need cancellation support	parallel_task
Progress visibility needed	parallel_task

Practical parallel_task examples: scraping 10 websites, analyzing a batch of 50 CSVs, generating reports across different metrics, checking API endpoints for test coverage.
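
The decision table can be read as a short function. This is one possible encoding, not an Extella API: choose_mode is a hypothetical name, and the order in which the rows are checked (dependency first, then cancellation and progress needs, then duration) is an assumption made to resolve rows that overlap, such as a single task that runs longer than a minute.

```python
def choose_mode(task_count, est_seconds, has_dependency=False,
                need_cancel=False, need_progress=False):
    """Pick an execution mode from the conditions in the table above."""
    if has_dependency:
        return "synchronous"       # Task A is required for Task B
    if need_cancel or need_progress:
        return "parallel_task"     # only parallel_task offers these
    if est_seconds > 60:
        return "parallel_task"     # timeout protection, even for one task
    if task_count >= 2 and est_seconds > 5:
        return "parallel_task"     # independent tasks worth the overhead
    return "synchronous"           # single or very short tasks


print(choose_mode(task_count=10, est_seconds=20))  # parallel_task
print(choose_mode(task_count=1, est_seconds=3))    # synchronous
```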

7.9Critical rules

task_registry_server is an optional component. parallel_task works without it (state stored in /tmp/pt_{uuid}.json). If task_registry_server is used, start it BEFORE workers; otherwise POST /register will get ConnectionRefused.

UUID, not PID — UUID is globally unique and not reused by the OS
/tmp/extella_task_registry.json — single source of truth, survives restarts
Worker ALWAYS calls /update — even on crash (try/except -> POST error status with traceback)
__api_token__ = kv_get('extella_api_token')['value'] — avoid hardcoding tokens in your code
Pass uuids as a JSON string: json.dumps([uuid1, uuid2]) — not a Python list
One task at a time for heavy workloads — avoid running more than N workers simultaneously
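
The last two rules (uuids as a JSON string, a cap on simultaneous workers) can be combined in a batching loop. The sketch below is an assumption about how you might structure this yourself: run_in_batches is a hypothetical helper, and launch and wait are injected callables standing in for run_expert calls so the example runs outside Extella.

```python
import json


def run_in_batches(jobs, launch, wait, max_workers=3):
    """Launch jobs in groups of at most max_workers and collect the results.

    launch(job) -> uuid string; wait(uuids_json) -> {"results": {...}}.
    """
    if max_workers < 1:
        raise ValueError("max_workers must be >= 1")
    all_results = {}
    for start in range(0, len(jobs), max_workers):
        batch = jobs[start:start + max_workers]
        uuids = [launch(job) for job in batch]
        # The uuids are handed over as a JSON string, not a Python list.
        outcome = wait(json.dumps(uuids))
        all_results.update(outcome["results"])
    return all_results


# Stand-ins for the real run_expert calls
def fake_launch(job):
    return f"uuid-{job}"


def fake_wait(uuids_json):
    return {"results": {u: {"status": "complete"} for u in json.loads(uuids_json)}}


results = run_in_batches(["a", "b", "c", "d", "e"], fake_launch, fake_wait,
                         max_workers=2)
print(len(results))  # 5
```

In a real orchestrator, launch would wrap run_expert('analyze_document', {...})['uuid'] and wait would wrap run_expert('demo_wait_tasks', {'uuids': ...}).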

8.CSPL

CSPL (Container Specific Programming Language) is Extella's paradigm for building automations: instead of having the LLM generate all the code, the LLM writes a compact description and a deterministic handler generates the actual code from it.

In Section 7, you worked with parallel_task and nohup — two CSPL modes. Now let's examine the complete CSPL architecture and why it fundamentally changes how complex automations are built.

8.1The Problem: LLMs Struggle with Large-Scale Precise Code

A real experiment — creating Godot Level 3 (a complete scene with 193 nodes):

Tool	Tokens	Errors	Retries	Outcome
CSPL	~1 000	0	1	Perfect
fython (LLM generates all Python)	~8 000	7	4	Many revisions
Claude Code	~15 000	12	6	Very slow

LLMs excel at planning — describing architecture, breaking down tasks. But token-by-token generation with probabilistic sampling is fundamentally unsuited for large-scale syntactically precise code. A single typo breaks the entire project. Every error means rerunning, thousands of tokens, minutes of your time.

The solution: shift the paradigm. Instead of "LLM writes all the code" → "LLM writes a compact description, deterministic handler generates the code."

8.2The WHAT vs HOW Principle

LLM (WHAT): generates a compact JSON description of the structure (~200 tokens for a 193-node scene). This is a declarative description: what objects exist, how they connect, what parameters they have.
Handler (HOW): a Python module that takes JSON and deterministically generates complete code. Same input — always same output. Zero hallucinations.

Example: an Expert with cspl=godot_level_3 contains not Python but a JSON scene description in its body. The handler generates .tscn files and GDScript. The LLM wrote 200 tokens of JSON instead of 8000 tokens of GDScript. Errors — zero.

cspl=godot_level_3:

# Expert body — not Python, but JSON description:
{
  "scene": "main_level",
  "nodes": [
    {"id": 1, "type": "Node2D", "name": "Player", "pos": [100, 200]},
    {"id": 2, "type": "Area2D", "name": "Hitbox", "parent": 1},
    {"id": 3, "type": "Sprite2D", "name": "Sprite", "parent": 1}
  ],
  "signals": [{"from": 2, "signal": "body_entered", "to": 1, "method": "on_hit"}]
}
# Handler godot_level_3 generates complete .tscn + GDScript from this
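
To illustrate the HOW side, here is a toy handler for the JSON description above. It is a sketch only: the real godot_level_3 handler is not shown in this guide, render_scene is a hypothetical name, and the output is a simplified tscn-style text that has not been validated against Godot. The point is the determinism: the same JSON always yields the same text.

```python
import json


def render_scene(description_json):
    """Deterministically turn a JSON scene description into tscn-style text."""
    scene = json.loads(description_json)
    by_id = {node["id"]: node for node in scene["nodes"]}

    def path_of(node_id):
        # Path relative to the root node; the root itself is ".".
        node = by_id[node_id]
        parent_id = node.get("parent")
        if parent_id is None:
            return "."
        parent_path = path_of(parent_id)
        return node["name"] if parent_path == "." else f"{parent_path}/{node['name']}"

    lines = ["[gd_scene format=3]", ""]
    for node in scene["nodes"]:
        header = f'[node name="{node["name"]}" type="{node["type"]}"'
        if node.get("parent") is not None:
            header += f' parent="{path_of(node["parent"])}"'
        lines.append(header + "]")
        if "pos" in node:
            x, y = node["pos"]
            lines.append(f"position = Vector2({x}, {y})")
        lines.append("")
    for sig in scene.get("signals", []):
        lines.append(
            f'[connection signal="{sig["signal"]}" from="{path_of(sig["from"])}" '
            f'to="{path_of(sig["to"])}" method="{sig["method"]}"]'
        )
    return "\n".join(lines)


description = json.dumps({
    "scene": "main_level",
    "nodes": [
        {"id": 1, "type": "Node2D", "name": "Player", "pos": [100, 200]},
        {"id": 2, "type": "Area2D", "name": "Hitbox", "parent": 1},
        {"id": 3, "type": "Sprite2D", "name": "Sprite", "parent": 1},
    ],
    "signals": [{"from": 2, "signal": "body_entered", "to": 1, "method": "on_hit"}],
})
scene_text = render_scene(description)
print(scene_text)
```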

8.3Complete Table of CSPL Modes

Mode	Body type	$extens	Returns	Synchronicity	When to use
fython (default)	Python def fn()	+	dict from function	Synchronous	Regular Experts (Section 4)
nohup	Python script (no def)	-	{pid, log_file}	Detached process	Orchestrators, ETL, long-running tasks
parallel_task	Python def fn()	+	{uuid}	Asynchronous, /tmp/pt_{uuid}.json	Parallel Tasks (Section 7)
shell	Bash commands	-	{stdout, returncode}	Synchronous	CLI wrappers: git, docker, ffmpeg
interpreter	Code in any language	-	Depends on language	Synchronous	Go, R, SQL, Node.js, Julia
cspl_builder_code	Python handler	+	—	Synchronous	Creating a new CSPL mode

8.4nohup Mode — Complete Specification

nohup differs fundamentally from fython. The body is a pure Python script that executes from start to finish. The Listener writes it to a temporary file and launches it via subprocess.Popen(start_new_session=True) — the process detaches and runs independently.

1. No def fn() — pure script, top to bottom

# fython (regular expert):
def my_expert(param: str = '') -> dict:
    # ... logic
    return {"status": "success"}

# nohup (script without function):
import os, datetime

log_path = '/tmp/nohup_test.txt'
with open(log_path, 'w') as f:
    f.write(f'ran at {datetime.datetime.now()}\n')
    f.write(f'cwd: {os.getcwd()}\n')
# No return — script simply executes and exits

2. No $extens() — manual include() (optional)

In nohup mode, the $extens() directive is not processed (no fython wrapper). You can install dependencies and import them just like in regular Python — using pip or any other standard method. Alternatively, implement include() directly at the beginning of your script:

import sys, subprocess

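def include(module, commands):
    """Run an import statement; on failure, run the install commands and retry."""
    try:
        exec(module, globals())
        return True
    except Exception:  # not a bare except: Ctrl+C must still interrupt the script
        for cmd in commands:
            parts = cmd.split()
            # extella-pip / pip / pip3 all map onto the current interpreter's pip
            if parts and parts[0] in ('extella-pip', 'pip', 'pip3'):
                subprocess.run([sys.executable, '-m', 'pip'] + parts[1:])
        try:
            exec(module, globals())
            return True
        except Exception:
            return False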

include('import pandas', ['extella-pip install pandas'])
include('import requests', ['extella-pip install requests'])
# pandas and requests are now available

3. Parameters via {{placeholders}}

Kwargs are substituted into the script text BEFORE execution. Use {{parameter_name}} in your code:

# Parameters: api_token='abc123', file_path='/tmp/data.csv', output_dir='/tmp'

import pandas as pd

api_token = '{{api_token}}'   # <- will be replaced with 'abc123'
file_path = '{{file_path}}'   # <- will be replaced with '/tmp/data.csv'
output_dir = '{{output_dir}}' # <- will be replaced with '/tmp'

df = pd.read_csv(file_path)
# as_index=False keeps 'category' as a column; otherwise index=False would drop it
result = df.groupby('category', as_index=False).sum()
result.to_csv(f'{output_dir}/output.csv', index=False)

4. No return — result via file

import json
from pathlib import Path

# ... perform work ...

result = {
    'status': 'success',
    'processed_rows': 15000,
    'errors': 3,
    'output_file': '/tmp/result.csv'
}
# Must write result:
Path('/tmp/nohup_my_script_result.json').write_text(
    json.dumps(result, ensure_ascii=False)
)

5. Logs and management

stdout/stderr → /tmp/nohup_<name>.log. The agent receives an immediate response: {pid, log_file, pid_file}. Monitor by reading the log file. On completion, read result.json.
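
Taken together, the five points above suggest roughly the following launch sequence. This is a guess at the mechanics, not the Listener's actual code: launch_detached is a hypothetical name, the placeholder substitution is a plain text replace, and the file names only mirror the /tmp/nohup_<name>.log convention mentioned above. start_new_session is POSIX-specific.

```python
import os
import subprocess
import sys
import tempfile


def launch_detached(script_template, params, name):
    """Substitute {{placeholders}}, write the script to disk, start it detached."""
    script = script_template
    for key, value in params.items():
        script = script.replace("{{" + key + "}}", str(value))

    tmp = tempfile.gettempdir()
    script_path = os.path.join(tmp, f"nohup_{name}.py")
    log_path = os.path.join(tmp, f"nohup_{name}.log")
    pid_path = os.path.join(tmp, f"nohup_{name}.pid")
    with open(script_path, "w", encoding="utf-8") as f:
        f.write(script)

    # stdout and stderr both go to the log file; the child keeps its own handle.
    with open(log_path, "w", encoding="utf-8") as log:
        proc = subprocess.Popen(
            [sys.executable, script_path],
            stdout=log,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            start_new_session=True,  # detach from the caller's session (POSIX)
        )
    with open(pid_path, "w", encoding="utf-8") as f:
        f.write(str(proc.pid))
    # Returned immediately, without waiting for the script to finish.
    return {"pid": proc.pid, "log_file": log_path, "pid_file": pid_path}


info = launch_detached("print('hello {{who}}')", {"who": "nohup"}, "demo")
print(info["log_file"])
```

Because the call returns before the script ends, the caller learns the outcome only by reading the log or the result file the script writes.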

8.4.1. wait_tasks mode — synchronization barrier

wait_tasks is a CSPL mode paired with parallel_task: it accepts a list of UUIDs for running tasks and waits for all to complete (or until timeout). It polls /tmp/pt_{uuid}.json every 0.3–2 seconds.

Parameter	Type	Default	Description
uuids	str (JSON)	required	JSON array of UUIDs: json.dumps([uuid1, uuid2]) — strictly a string, not a Python list
timeout	int	120	Maximum wait time in seconds
poll_interval	float	2	File polling interval (seconds)

What demo_wait_tasks returns:

{
  "results": {
    "uuid-1...": {"status": "complete", "result": {...}},
    "uuid-2...": {"status": "complete", "result": {...}}
  },
  "summary": {"total": 2, "complete": 2, "error": 0},
  "elapsed_seconds": 31.2
}

Entry point: bridge expert demo_wait_tasks (saved with cspl=wait_tasks). This is what you call via run_expert — the wait_tasks CSPL mode is not called directly.
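
The barrier described above can be approximated in a few lines. The sketch assumes each /tmp/pt_{uuid}.json file holds a JSON object with a status field that ends up as "complete" or "error"; the exact file schema is not given in this guide, and wait_for_tasks, state_dir, and the "timeout" status value are hypothetical. The summary dict follows the shape shown in Section 7.6.

```python
import json
import os
import tempfile
import time


def wait_for_tasks(uuids_json, timeout=120, poll_interval=2.0, state_dir="/tmp"):
    """Block until every task file reports a terminal status or timeout expires."""
    pending = set(json.loads(uuids_json))  # a Python list here raises TypeError
    results = {}
    started = time.monotonic()
    while True:
        for task_id in sorted(pending):
            path = os.path.join(state_dir, f"pt_{task_id}.json")
            try:
                with open(path, encoding="utf-8") as f:
                    state = json.load(f)
            except (FileNotFoundError, json.JSONDecodeError):
                continue  # not written yet, or caught mid-write
            if state.get("status") in ("complete", "error"):
                results[task_id] = state
                pending.discard(task_id)
        elapsed = time.monotonic() - started
        if not pending or elapsed >= timeout:
            break
        time.sleep(poll_interval)
    errors = sum(1 for s in results.values() if s["status"] == "error")
    return {
        "status": "complete" if not pending else "timeout",
        "summary": {"total": len(results) + len(pending),
                    "complete": len(results) - errors, "error": errors},
        "elapsed_seconds": round(elapsed, 1),
        "results": results,
    }


# Demo: two finished tasks written to a scratch directory
demo_dir = tempfile.mkdtemp()
for task_id in ("uuid-1", "uuid-2"):
    with open(os.path.join(demo_dir, f"pt_{task_id}.json"), "w", encoding="utf-8") as f:
        json.dump({"status": "complete", "result": {"ok": True}}, f)

outcome = wait_for_tasks(json.dumps(["uuid-1", "uuid-2"]), timeout=5,
                         poll_interval=0.1, state_dir=demo_dir)
print(outcome["summary"])
```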

8.4.2. shell and interpreter modes

Two additional modes for CLI tools and code in other languages. Both support {{placeholders}} for kwargs.

shell — built-in bash runner

The Expert body consists of bash commands. No function, no $extens. The Listener executes via subprocess and returns {stdout, stderr, returncode}.

# cspl='shell' — video conversion via ffmpeg:
ffmpeg -i {{input_path}} -vf scale=1280:720 -c:a copy {{output_path}}

# cspl='shell' — git pull:
git -C {{repo_path}} fetch origin
git -C {{repo_path}} pull --rebase

Use shell for	Examples
Media Processing	ffmpeg, ImageMagick convert, sox
Documents	pandoc, wkhtmltopdf, libreoffice --headless
Git Operations	git fetch, git pull, git tag, git log
System utilities	rsync, tar, curl, wget, find
Containers and orchestration	docker build/run, kubectl apply
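
This guide does not say whether the shell runner quotes substituted values. If substitution is a plain text replace, a path with spaces or shell metacharacters would break the command, so it may be worth quoting values before they reach the template. The sketch below shows the idea with the standard shlex.quote; render_shell is a hypothetical helper, not an Extella function.

```python
import shlex


def render_shell(template, params):
    """Fill {{placeholders}} in a shell template, quoting each value."""
    command = template
    for key, value in params.items():
        command = command.replace("{{" + key + "}}", shlex.quote(str(value)))
    return command


cmd = render_shell(
    "ffmpeg -i {{input_path}} -vf scale=1280:720 -c:a copy {{output_path}}",
    {"input_path": "/tmp/my video.mp4", "output_path": "/tmp/out.mp4"},
)
print(cmd)
# ffmpeg -i '/tmp/my video.mp4' -vf scale=1280:720 -c:a copy /tmp/out.mp4
```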

interpreter — code in any installed language

The Expert body is code in any language. The handler compiles/interprets it on the device. Kwargs are accessible via {{placeholders}} as in nohup.

# cspl='interpreter' — Go code:
package main
import "fmt"
func main() {
    data := "{{input}}"
    fmt.Println("Processed:", data)
}

Language	When to Use
Go	High-performance data processing, binary operations
R	Statistical analysis, ML models, ggplot visualizations
SQL	Analytical queries to local databases
Node.js	JSON processing, working with the npm ecosystem
Julia	Scientific and matrix computations
Ruby	System administration, Rakefile scenarios

8.5DSL: Domain-Specific Languages

CSPL enables creating compact languages for specific domains. Instead of 400 lines of HTML/CSS/JS — 40 lines of JSON, and the handler generates a complete website.

Domain	CSPL Mode	Generates	Token Savings
Web API	api_dsl	FastAPI + Pydantic + OpenAPI	10x
Database	schema_dsl	SQL DDL + Alembic migrations	8x
CI/CD pipeline	pipeline_dsl	GitHub Actions YAML	12x
Godot levels	godot_level_3	.tscn + GDScript	15x
HDL schematics	hdl_dsl	Verilog / VHDL	20x
Tests	test_dsl	pytest fixtures + test cases	6x
Markdown reports	mini_report_dsl	HTML or Markdown	8x

Example DSL for a Web API (5 lines instead of hundreds):

# Expert body with cspl='api_dsl':
API UserService BASE /api/users AUTH bearer
GET  /      -> list[User]  CACHE 60
POST /      -> User        BODY {name: str, email: str}
GET  /:id   -> User
DELETE /:id -> void
# Handler generates a complete FastAPI router, Pydantic models, and OpenAPI documentation

⚠️ The DSL modes listed in the table above (api_dsl, schema_dsl, godot_level_3, etc.) are examples of custom handlers created via cspl_builder_code. They are not included in the standard Extella distribution. Only the following are built-in (available out of the box): fython, nohup, parallel_task, wait_tasks, shell, interpreter, cspl_builder_code.
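
Since api_dsl is a custom handler, its first stage would be a parser for the DSL text. The sketch below infers a grammar from the Web API example above and nothing else, so every rule in it is an assumption; parse_api_dsl is a hypothetical name. A full handler would go on to emit FastAPI code from the returned structure.

```python
import re

# METHOD  path  ->  return type  [KEYWORD value]
ROUTE_RE = re.compile(
    r"^(GET|POST|PUT|PATCH|DELETE)\s+(\S+)\s*->\s*(\S+)(?:\s+(.*))?$"
)


def parse_api_dsl(source):
    """Parse the DSL text into a plain dict describing the service."""
    spec = {"routes": []}
    for raw in source.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("API "):
            # e.g. "API UserService BASE /api/users AUTH bearer"
            tokens = line.split()
            spec["name"] = tokens[1]
            options = dict(zip(tokens[2::2], tokens[3::2]))
            spec["base"] = options.get("BASE", "")
            spec["auth"] = options.get("AUTH")
            continue
        match = ROUTE_RE.match(line)
        if not match:
            raise ValueError(f"Unrecognized DSL line: {line!r}")
        method, path, returns, rest = match.groups()
        route = {"method": method, "path": path, "returns": returns}
        if rest:
            # Trailing option such as "CACHE 60" or "BODY {...}"
            keyword, _, value = rest.partition(" ")
            route[keyword.lower()] = value.strip()
        spec["routes"].append(route)
    return spec


dsl = """
API UserService BASE /api/users AUTH bearer
GET  /      -> list[User]  CACHE 60
POST /      -> User        BODY {name: str, email: str}
GET  /:id   -> User
DELETE /:id -> void
"""
spec = parse_api_dsl(dsl)
print(spec["name"], len(spec["routes"]))
```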

8.6cspl_builder_code: Creating Your Own CSPL

A meta-mode: you create a new CSPL type directly from the chat, without modifying platform code. The architecture is extensible on the fly.

Process:

1. Describe the desired handler: "Create a CSPL for FastAPI from a JSON schema"
2. The agent writes the Python handler code: a function that takes the code body and generates an artifact
3. The handler is registered as a new cspl type in the system
4. The new mode is available immediately: cspl='fastapi_generator'

# Example of a simple DSL handler:
from pathlib import Path  # needed for Path('/tmp/report.html') below

def my_report_dsl(filtered_source_code='', func_name='', kwargs=None, **extra):
    lines = filtered_source_code.strip().split('\n')
    html_parts = []
    for line in lines:
        if line.startswith('TITLE'):
            html_parts.append(f'<h1>{line[6:]}</h1>')
        elif line.startswith('SECTION'):
            html_parts.append(f'<h2>{line[8:]}</h2>')
        elif line.startswith('> '):
            html_parts.append(f'<p>{line[2:]}</p>')
    html = '<html><body>' + ''.join(html_parts) + '</body></html>'
    output = Path('/tmp/report.html')
    output.write_text(html)
    return {'status': 'success', 'output': str(output), 'sections': len(html_parts)}

8.7The Recursive Nature of CSPL: No Ceiling

Each handler can use other handlers. There is no upper limit:

Level 1: fython with JSON description → handler generates Python classes
Level 2: interpreter with Go code → handler compiles Go binary
Level 3: Go uses C library → handler generates ctypes wrapper
Level 4: C on ARM64 → handler generates inline assembly for optimization
Level N: ...

CSPL is a bridge between declarative description (what LLMs do well) and imperative implementation (what LLMs do poorly). This bridge is built from Python, Go, C, bash, SQL, GDScript, Terraform, Dockerfile.

8.8When NOT to Use CSPL

In practice: when in doubt, use fython. CSPL pays off only for recurring task classes where token and error savings are significant.

Three Conditions: When CSPL Is Justified

Condition	Validation Question	If NO →
1. Class of repetitive tasks	Will this be used multiple times, not just once?	fython: CSPL requires investing in a handler
2. Logic is more complex than the data	Does the output format have internal dependencies that require computation?	fython: the LLM can handle it directly
3. Data and logic are separated	Is it clear what changes each time and what stays the same?	fython: the boundary is fuzzy, so CSPL won't provide a benefit

CSPL makes sense only when all three conditions are met simultaneously. If even one is violated, use fython.

Examples of applying the rule:

Task	Cond.1	Cond.2	Cond.3	Choice
Generate 50 similar reports with different data	✅	✅	✅	CSPL ✅
A one-off CSV parsing script	❌	n/a	n/a	fython
Godot levels (recurring pattern)	✅	✅	✅	CSPL ✅
Calling OpenAI API with different prompts	✅	❌	n/a	fython: logic is simpler than the data
FastAPI routers from JSON schemas (10+ items)	✅	✅	✅	CSPL ✅

Situations Where CSPL Is Overkill

Situation	Recommendation
Task up to 100 lines of code	fython: the LLM can usually write this without errors
Unique one-time task	fython: CSPL requires a repetitive pattern
Need result immediately	fython or shell: nohup is asynchronous
Handler more complex than the task itself	fython: to pay off, a handler must generate 10x more code
No ready-made handler for the domain	First create the handler using cspl_builder_code

9.REST API

9.1Three Scenarios: Why You Need the API

Scenario 1: Embedding in Your Product

You're building a CRM, ERP, chatbot, or any platform. Instead of training a model from scratch, call a ready-made Extella agent via API. Your backend sends a prompt — Extella returns a response. Your product's end user never knows Extella is working under the hood.

Scenario 2: Automating Background Tasks

Every night a script pulls new documents, runs the agent, gets the analysis, and writes the results. CI/CD pipelines use Experts to generate documentation, validate code, and process logs. The API supports async mode: async=true → task_id → polling /api/task/check. Ideal for background tasks that don't block the main process.

Scenario 3: Exporting Data for Analytics and Fine-Tuning

/api/agent/export/chats — complete conversation history (a valuable dataset for fine-tuning). /api/agent/export/calls — call logs: model, latency_ms, prompt_tokens, completion_tokens, created_at. Valuable for AI cost analysis, prompt optimization, and fine-tuning local models.
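
As an example of the cost analysis mentioned above, the sketch below aggregates call-log records per model. It assumes each record is a dict carrying the fields listed for /api/agent/export/calls; the response envelope is not described here, so the function takes a plain list. The sample records and model names are invented, and summarize_calls is a hypothetical helper.

```python
from collections import defaultdict


def summarize_calls(calls):
    """Aggregate token usage and average latency per model."""
    totals = defaultdict(lambda: {"calls": 0, "prompt_tokens": 0,
                                  "completion_tokens": 0, "latency_ms": 0})
    for call in calls:
        bucket = totals[call["model"]]
        bucket["calls"] += 1
        bucket["prompt_tokens"] += call["prompt_tokens"]
        bucket["completion_tokens"] += call["completion_tokens"]
        bucket["latency_ms"] += call["latency_ms"]
    return {
        model: {
            "calls": b["calls"],
            "prompt_tokens": b["prompt_tokens"],
            "completion_tokens": b["completion_tokens"],
            "avg_latency_ms": b["latency_ms"] / b["calls"],
        }
        for model, b in totals.items()
    }


# Invented sample records using the documented field names
sample_calls = [
    {"model": "model-a", "latency_ms": 800, "prompt_tokens": 120,
     "completion_tokens": 40, "created_at": "2026-01-01T10:00:00Z"},
    {"model": "model-a", "latency_ms": 1200, "prompt_tokens": 200,
     "completion_tokens": 60, "created_at": "2026-01-01T10:05:00Z"},
    {"model": "model-b", "latency_ms": 300, "prompt_tokens": 50,
     "completion_tokens": 10, "created_at": "2026-01-01T10:06:00Z"},
]
usage = summarize_calls(sample_calls)
print(usage["model-a"])
```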

9.2Base URL and the Sneaky 405 Error

⚠️ API Version: docs.extella.ai describes v0.7.0 — 48 endpoints, 11 sections. Primary authentication header: X-Auth-Token. The Authorization: Bearer method is accepted as an alternative.

The most common cause of HTTP 405 (Method Not Allowed) is sending an API request to the wrong URL.

URL	Purpose	Rule
https://api.extella.ai/api/agent/*	Agents API	ALL requests to agents
https://api.extella.ai/api/expert/*	Experts API	ALL Expert requests
https://api.extella.ai/api/concept/*	Concepts API	Working with Concepts
https://api.extella.ai/api/kv/*	KV Store API	Key-Value
https://api.extella.ai/api/rules/*	Rules API	Rules
https://api.extella.ai/api/token/*	Tokens API	Token Management
https://api.extella.ai/api/profile/*	Profiles API	Profile Management

Rule: for EVERYTHING starting with /api/ — use https://api.extella.ai

9.3Authentication: Two Equivalent Methods

# Method 1 (Bearer standard):
Authorization: Bearer <your-token>

# Method 2 (Extella-specific; listed as the primary header in the v0.7.0 docs):
X-Auth-Token: <your-token>

# For Database Services (/api/concept/*, /api/kv/*)
# passing user_id in the request body also works,
# but the authorization header is preferred

Getting your first token via the agent: type "Generate an API token" — you'll get one instantly. Via API:

POST https://api.extella.ai/api/token/generate
Authorization: Bearer <existing_token>
Content-Type: application/json

{"name": "Production API"}
# -> {"token": "a1b2c3...", "user_id": "user_abc", "name": "Production API"}

Validation (rate limit: 30 requests/min — validate once at startup, not before each request):

POST https://api.extella.ai/api/token/validate
{"token": "your-token"}
# -> {"valid": true, "user_id": "user_abc"}
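
Either header can be produced by one small helper, which keeps the choice of method in a single place. This is a convenience sketch: auth_headers and its scheme argument are hypothetical names, and the token is a placeholder.

```python
def auth_headers(token, scheme="x-auth-token"):
    """Build request headers for one of the two authentication methods."""
    if not token:
        raise ValueError("token must not be empty")
    headers = {"Content-Type": "application/json"}
    if scheme == "x-auth-token":
        headers["X-Auth-Token"] = token
    elif scheme == "bearer":
        headers["Authorization"] = f"Bearer {token}"
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return headers


print(auth_headers("my-token"))
print(auth_headers("my-token", scheme="bearer"))
```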

9.4OpenAI-Compatible Mode

If your application already works with OpenAI, switching to Extella requires minimal changes:

from openai import OpenAI

client = OpenAI(
    api_key="your_extella_token",
    base_url="https://api.extella.ai/v1",
)

response = client.chat.completions.create(
    model="gpt-4",  # ignored — agent's model is used
    messages=[
        {"role": "user", "content": "What is REST API?"}
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)

/api/agent/run modes:

sync (default) — blocking call, waits for complete response
stream — Server-Sent Events, tokens delivered as generated (Accept: text/event-stream)
async — immediately returns task_id, retrieve result via /api/task/check

Note (docs.extella.ai): agents are launched via POST /api/agent/run with agent_id passed in the request body (json={"agent_id": "agent_...", "input": "..."}). The X-Agent-Id header is also accepted as an alternative.
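
For stream mode, the client has to split the Server-Sent Events stream into events. The sketch below implements only the generic SSE framing (data lines, blank-line separators, comment lines); the payload inside each event is not specified in this guide, so it is returned as raw text. iter_sse_data is a hypothetical name.

```python
def iter_sse_data(lines):
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one optional space after the colon
        if field == "data":
            buffer.append(value)
    # An event without a closing blank line is incomplete and is dropped.


stream = ["data: Hel", "", ": ping", "data: lo", "data: world", "",
          "event: done", "data: [END]", ""]
events = list(iter_sse_data(stream))
print(events)  # ['Hel', 'lo\nworld', '[END]']
```

With the requests library, the lines could come from a response opened with stream=True and read via iter_lines(decode_unicode=True).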

9.5Key Endpoints Reference

Method	Endpoint	Description
POST	/api/agent/run	Run agent (sync/stream/async)
POST	/api/agent/get	Get agent config
POST	/api/agent/create	Create agent (requires Pro)
POST	/api/agent/update	Update agent
POST	/api/agent/list	List agents
POST	/api/agent/export/chats	Export conversation history
POST	/api/agent/export/calls	Call log with metrics (parameters in request body)
POST	/api/profile/create	Create profile
POST	/api/profile/add_agent	Add agent to profile
POST	/api/profile/delete	Delete profile (agents remain)
POST	/api/profile/list	List profiles
POST	/api/expert/run	Run Expert
POST	/api/expert/save	Save Expert
GET	/api/expert/get/<name>	Get Expert by name
DELETE	/api/expert/delete/<name>	Delete Expert
POST	/api/blocks/search	Semantic search for Experts
POST	/api/task/check (or /api/tasks/check)	Async task status
POST	/api/concept/add	Add Concept
POST	/api/concept/search	Semantic search for Concepts
POST	/api/concept/update	Update Concept
POST	/api/concept/remove	Delete Concept
POST	/api/concept/list	List Concepts
POST	/api/kv/set	Set KV pair
POST	/api/kv/get	Get KV pair
POST	/api/kv/search	Semantic search in KV
POST	/api/kv/list	List KV pairs
POST	/api/rules/add	Add rule
POST	/api/rules/list	List rules
POST	/api/rules/update	Update rule
POST	/api/rules/remove	Delete rule
POST	/api/token/generate	Create token
POST	/api/token/validate	Validate token
POST	/api/token/revoke	Revoke token
POST	/api/token/list	List tokens
POST	/api/defaults/set_target	Set default device
POST	/api/defaults/get_target	Get default device

Rate limits: 60 req/min per IP, 20 req/min for /api/agent/run. HTTP 429 — check Retry-After header, use exponential backoff.
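
The 429 handling described above can be sketched as a retry wrapper. It honors a numeric Retry-After header and otherwise falls back to exponential backoff with a little jitter. The wrapper is transport-agnostic: send is any callable returning (status_code, headers, body), and all names and default delays here are assumptions, not documented Extella values.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() until it stops returning HTTP 429 or retries run out."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        try:
            delay = float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            # Header missing or not a number of seconds: back off exponentially.
            delay = min(max_delay, base_delay * 2 ** attempt) + random.uniform(0, 0.5)
        sleep(delay)
    raise RuntimeError(f"still rate-limited after {max_retries} retries")


# Demo with a scripted fake instead of a real HTTP call
scripted = iter([
    (429, {"Retry-After": "2"}, None),
    (429, {}, None),
    (200, {}, {"ok": True}),
])
waits = []
status, body = call_with_backoff(lambda: next(scripted), sleep=waits.append)
print(status, body, len(waits))
```

With requests, send could wrap a requests.post call and return its status_code, headers, and parsed JSON.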

Endpoints not listed in the table above (complete list at docs.extella.ai):

Method	Endpoint	Description
GET	/api/health	Health Check — server status
POST	/api/agent/delete	Delete an agent (agent_id in request body)
POST	/api/kv/remove	Delete KV pair (key in body)
POST	/api/targets/add	Add device (target, description)
POST	/api/targets/list	List devices
POST	/api/targets/search	Semantic device search
POST	/api/targets/update	Update device (id required)
POST	/api/targets/remove	Delete device (id required)
POST	/api/experts_db/list	List experts from DB (metadata)

9.6Field Name Pitfalls

Pitfall 1: blocks/search returns matches, not results

data = response.json()
# WRONG:
for r in data['results']:     # KeyError!
    print(r['similarity'])

# CORRECT:
for block in data['matches']:  # 'matches'
    print(block['score'])      # 'score', not 'similarity'

Pitfall 2: expert/get mixes snake_case and camelCase

expert = response.json()
# CORRECT field names:
code   = expert['expert_code']      # not 'code'
params = expert['expert_params']    # not 'kwargs'
name   = expert['expert_name']      # not 'name'
created = expert['createdAt']       # camelCase!

Pitfall 3: export/calls — parameters also go in body (POST), same as export/chats

# export/chats — parameters in body:
requests.post(BASE+'/api/agent/export/chats',
              headers=HEADERS,
              json={'by': 'agent', 'id': 'agent_...'})

# export/calls — parameters also in body (POST, not GET):
requests.post(BASE+'/api/agent/export/calls',
              headers=HEADERS,
              json={'by': 'agent', 'id': 'agent_...',
                    'limit': 200, 'from': '2026-01-01T00:00:00Z'})

Pitfall 4: No _id, only id

API responses don't use MongoDB-style _id. They use plain id. There's also no __v field (document version). This is a REST API, not Mongoose.

9.7Complete Working Python Example

Complete working example (save as extella_client.py):

import os, time, requests

BASE = "https://api.extella.ai"
TOKEN = os.environ["EXTELLA_API_TOKEN"]  # store in .env, never in code
HEADERS = {"X-Auth-Token": TOKEN, "Content-Type": "application/json"}

# 1. Validate token (once at startup, not before every request)
r = requests.post(f"{BASE}/api/token/validate", json={"token": TOKEN})
assert r.json()["valid"], f"Invalid token: {r.text}"

# 2. Create agent (Pro plan required; on Free plan use an existing agent_id)
r = requests.post(f"{BASE}/api/agent/create", headers=HEADERS, json={
    "name": "My API Agent",
    "instructions": "You are a helpful assistant. Respond concisely.",
    "provider": "anthropic",
    "model": "claude-haiku-4-5-20251001"
})
if r.status_code == 403:
    raise SystemExit("Pro plan required for /api/agent/create — use an existing agent_id.")
agent_id = r.json()["agent_id"]
print(f"Agent: {agent_id}")

# 3. Synchronous call — response.output is a list of items
r = requests.post(f"{BASE}/api/agent/run", headers=HEADERS,
                  json={"agent_id": agent_id, "input": "What is 2 + 2?"})
text = next((c["text"] for item in r.json()["output"] if item["type"] == "message"
             for c in item["content"] if c["type"] == "output_text"), "")
print("Sync:", text)

# 4. Async call — for tasks > 60 sec
r = requests.post(f"{BASE}/api/agent/run", headers=HEADERS,
                  json={"agent_id": agent_id,
                        "input": "Summarize AI trends in 2025.",
                        "async": True})
task_id = r.json()["task_id"]

for _ in range(30):          # poll up to ~60 sec; raise the count for longer tasks
    r = requests.post(f"{BASE}/api/task/check", headers=HEADERS,
                      json={"task_id": task_id})
    status = r.json()["status"]
    if status == "complete":
        print("Async:", r.json()["output"])
        break
    elif status == "error":
        print("Error:", r.json().get("error"))
        break
    time.sleep(2)
else:
    # Loop finished without a break: the task is still running
    print("Timed out waiting for task", task_id)

# 5. Semantic search across Experts
r = requests.post(f"{BASE}/api/blocks/search", headers=HEADERS,
                  json={"agent_id": agent_id, "query": "send telegram message"})
for block in r.json()["matches"]:         # 'matches', not 'results'
    print(block["expert_name"], block["score"])   # 'score', not 'similarity'

9.8Secure Integration Checklist

Store token in environment variables: os.environ['EXTELLA_API_TOKEN'], not in code
Base URL: https://api.extella.ai for all /api/ requests
Rate limits: catch HTTP 429, read Retry-After, use exponential backoff
Save agent_id after creation — you can't run an agent without it
async=true for tasks > 60 sec — don't block the main thread
stream=True for UX — use Accept: text/event-stream when users expect real-time responses
store=False during debugging — avoid polluting chat history
global=True when searching Concepts — otherwise you only search the current agent's memory
blocks/search: field is matches, not results; score, not similarity
expert/get: expert_code, expert_params, expert_name (snake_case); createdAt (camelCase)
export/calls: POST with parameters in body: {by, id, limit, from}
Pro plan required for /api/agent/create
Validate token once at startup — not before every request (rate limit 30/min)

9.9Typical Workflow from Zero to Integration

1. Get token:          POST /api/token/generate       -> save to .env
2. Create agent:       POST /api/agent/create         -> agent_id
3. Create profile:     POST /api/profile/create       -> profile_id
4. Add to profile:     POST /api/profile/add_agent    -> {profile_id, agent_id}
5. Run synchronously:  POST /api/agent/run + X-Agent-Id: agent_id
   Run async:          POST /api/agent/run + {async: true} -> task_id
   Check status:       POST /api/task/check  {task_id: ...}
6. Export chats:       POST /api/agent/export/chats   -> dataset for fine-tuning
7. Export calls:       POST /api/agent/export/calls    -> analytics

This pipeline covers 95% of integration scenarios. For complex cases (parallel Experts, semantic search, KV Store), refer to sections 3, 4, and 7.
