1.What is Extella

1.1Why an Agent, Not a Chatbot

You've probably used ChatGPT, Claude, or other large language models. Every conversation starts from scratch. The model doesn't remember yesterday. It can't open your file, send an email, run a script, or save a report. At best, it generates code that you then copy and run yourself.

Extella is not a chatbot. It's an AI agent that executes tasks rather than advising you on how to do them.

The formula: AI chat + automation + persistent memory + execution on your device + personal toolkit — all in one place.

You describe a need. Extella creates an Expert (an executable module), saves it to your library, and runs it. The result — an actual file, a sent message, processed data — stays on your device. This isn't text in a chat. It's an object that persists after the session ends.

1.2Extella vs Standard LLMs — Fundamental Differences

ChatGPT tells you what to do. Extella does it for you.

Feature | ChatGPT / Standard LLMs | Extella
Primary Purpose | Text generation | Real action execution
Memory | Current chat only; a new session starts from a clean slate | Persistent: Concepts, Rules, and Experts are saved permanently
Code Execution | Generates code that you run yourself | Runs automatically via Experts on your device
Reusability | Each request starts from scratch | A created Expert runs repeatedly with any parameters
Security | Data goes to OpenAI/Anthropic | Files are processed locally and never leave your device
Personalization | System prompt is fixed | Rules: a dynamic prompt that changes during execution
Outcome | Text in chat | Files, data, reports, automated processes
Integrations | Plugins (limited set) | Any: Telegram, email, API, file system

1.3Architecture: How It Works

Extella uses a client-server architecture with two components:

  • Server component — the AI brain: language model, knowledge base, Expert and agent management.
  • Client component — Listener: a background process on your device that receives tasks from the agent and runs them locally.

Listener is the executor. When the agent says "create a PDF," Listener runs the corresponding Expert on your machine with full access to your files.

By default, Experts run directly in the Listener environment. When strict dependency isolation is needed, use the isolated=true parameter — the Expert then runs in a clean Python venv. This isn't Docker: no heavy virtualization, no root access required. Full access to the user's file system is preserved.

1.4Data Security

  • Anthropic Claude is the primary model for chat and code generation in Extella. It processes text requests. The Pro plan allows connecting your own LLM providers and local models.
  • OpenAI API is used only for vector embeddings (semantic search of Concepts, KV Store, Rules, and Experts). Data is vectorized but not stored by OpenAI.
  • Corporate files and API keys are NEVER sent to providers. Files are processed locally. Keys are stored in an encrypted KV Store on your device.

1.5Key Terms: Glossary

Before moving forward, it's important to understand the seven core entities of the platform. Each will be explained in detail in the following sections.

Entity | What It Is | Analogy
Expert | Saved executable module: a function that does one specific thing | Tool on your shelf
Agent | AI specialist with a model, tools, instructions, and memory | Employee with a job title
Profile | Group of agents under one project/client | A department in a company
Concept | Unit of technical knowledge with semantic search | Note in a knowledge base
KV Store | Encrypted key-value store: API keys, tokens, data | Safe with deposit boxes
Rule | Behavioral instruction embedded in the agent's system prompt | Job description
Team | Group of agents working on one project with shared Concepts, Rules, and an orchestrating agent | Department or project team

1.6The Compounding Effect: Why Extella Gets Stronger Every Day

Typical AI tools don't accumulate value. Every new ChatGPT conversation starts from scratch. Extella works fundamentally differently.

Experts:

  • Day 1: Created an Expert "read Excel spreadsheet" → saved
  • Day 30: 15 Experts — a library of tools
  • Day 90: 50+ Experts — you're not "asking AI" anymore — you're "running tools"

Concepts:

  • First time solving a PDF issue → saved as a Concept
  • A pattern for working with a specific API → saved
  • Each Concept makes the system smarter. This is institutional memory

Rules:

  • First: "always ask for confirmation before deleting"
  • Then: "save files to ~/Documents/Extella/"
  • Then: "if task > 1 step — describe the plan first"

With each Rule, the agent becomes more precise. After 30 days, Extella understands you better than ever.

Metaphor: A Stone Bridge

Each task you solve with Extella is a stone in the foundation. One stone changes nothing. A hundred stones build a bridge to automation of any complexity. In a year, you'll have a personal system that knows your context, tools, and preferences—one that grows stronger every day.

2.Quick Start

A step-by-step guide from downloading the application to completing your first task. Follow the steps in order—each builds on the previous one.

2.1Step 1: Installing the Application and Creating an Account

  • Download Extella Desktop from www.extella.ai for your OS (macOS, Linux, Windows).
  • Install it like any standard application.
  • Create an account and sign in.

Immediately after signing in, Listener starts in the background and performs initial registration:

  • → Creates a device record in the system
  • → Retrieves a unique Device ID (Target UUID)
  • → Establishes a connection with the Extella server

System tray status: "Connected" — everything is working.

2.2Step 2: Understanding Device ID

Device ID is your device's unique identifier. It looks like: 09f7d600-996c-4c9f-a19e-f5bfe433da0e.

Why you need it:

  • The agent knows WHERE to execute tasks. "Read my file" — the system understands which machine the file is on.
  • If you have multiple devices (Mac Studio at home, MacBook at the office) — each has its own Device ID. You choose where to run the task.

Where to find Device ID:

  • In the Extella Desktop interface — bottom section of the application.
  • Via agent: "show my devices" — the agent returns a list with UUIDs and descriptions.
  • Via API: POST https://api.extella.ai/api/defaults/get_target

Default Target — the device where Experts run by default. Changed via set_default_target.
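
If you copy Device IDs between scripts or store them in KV Store, it can help to check that a string really is a UUID before using it. The sketch below relies only on Python's standard uuid module; the helper name is_valid_device_id is hypothetical and not part of Extella.

```python
import uuid

def is_valid_device_id(value: str) -> bool:
    """Return True if value is a UUID in the canonical 8-4-4-4-12 form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces and bare hex; require the hyphenated form
    return str(parsed) == value.lower()

print(is_valid_device_id("09f7d600-996c-4c9f-a19e-f5bfe433da0e"))
print(is_valid_device_id("not-a-device-id"))
```

The check is purely syntactic: it confirms the format, not that the device is registered in your account.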

2.3Step 3: Getting an API Token

API token — a key string that verifies your identity in the system. Required for Listener authentication and programmatic calls.

Via agent (easiest method):

Type in chat: "Generate an API token for me" — the agent creates a token instantly. Optionally specify a name, e.g., "Mac Studio listener".

Copy and save the token — it's used for Listener configuration.

Managing tokens via agent:

  • "Show my tokens" — list of all active tokens
  • "Revoke token [name]" — instant deactivation

2.4Step 4: Your First Agent Request

Open the Extella Desktop interface. Type your first natural language request:

"Create a 3-slide PDF presentation about our product. Save it to Downloads."

What happens:

  • The agent analyzes the request
  • Creates an Expert (an executable module — e.g., using ReportLab)
  • Runs it on your device via the Listener
  • Within seconds, a PDF file appears in ~/Downloads/

This is your first Extella result.

2.5Step 5: Expert saved to library

The created Expert is automatically saved to your personal library with a name (e.g., generate_product_presentation_pdf). Now you can:

  • Run it again with different parameters (different text, different title)
  • Modify it: "change the background color to blue"
  • Use it as part of a more complex workflow

This is the key difference from chatbots: solve a task once—the tool remains forever.

2.6Step 6: First Rule

A Rule is an instruction that applies to every interaction. Add your first rule:

"Add a rule: always respond in English"

Or other useful rules:

  • "Save all files to ~/Documents/Extella/"
  • "If a task takes more than one step—describe the plan first"
  • "Always ask for confirmation before deleting data"

Now every time the agent generates a response or creates a file, these rules are applied automatically—no reminders needed.

2.7Step 7: First Concepts

Concepts are the agent's long-term memory. They accumulate automatically. After the first completed task, the agent saves:

  • Which library worked best for PDF generation
  • How to handle errors from a specific API
  • Which approach worked for your task

You can also add them manually:

"Remember: I prefer pandas over openpyxl for working with tables"

The more tasks you solve, the smarter your agent becomes.

2.8Checklist: Ready to Go

# | Action | Status
1 | Extella Desktop installed and running | [ ]
2 | Listener shows Connected in system tray | [ ]
3 | Device ID registered (visible in interface) | [ ]
4 | API token obtained and saved | [ ]
5 | First request sent and response received | [ ]
6 | First Expert created and visible in library | [ ]
7 | First Rule added | [ ]
8 | First Concept saved | [ ]

If all 8 items are checked, your platform is configured. Now the real fun begins: scaling the system.

3.KV Store, Concepts, Rules

Section 1 provided a brief glossary. Section 2 covered the practical quickstart. This section examines each of the three components in detail: how they work, what they can do, when to use them, and what not to store.

Note: the default agent comes with a set of pre-installed basic Rules and Concepts. You can view and modify them at any time.

3.1Why Three Storage Systems Instead of One

Example | Solution | Why this approach
Look up an API key by exact name | KV Store: exact search by key | Data is encrypted and looked up quickly by exact key
Recall context from a past PDF session | Concepts: semantic search | Semantic search, not keyword matching
Agent always responds in English | Rules: auto-loaded into every prompt | No need to search; always active

KV Store is a "vault"—you store items with an exact name and retrieve them by that name. Concepts is a "knowledge search index"—it finds information by query meaning, even when the wording differs. Rules are "reflexes"—they load automatically with every user message, so the agent operates with them from the first word of the conversation.

3.2KV Store — Encrypted Data Storage

Each KV entry has three fields: a unique key name, a value (text or JSON, up to 1 GB), and a description used for semantic search.

Encryption and PIN

All values are encrypted with your PIN. The agent decrypts them automatically via kv_get. This protects credentials from leaking in logs and exports.

Important when running an Expert on a different device: if the PIN on that device differs, decryption returns garbage—you'll see an 'invalid decimal literal' error. Solution: pass the pin explicitly when calling: run_expert('name', {}, pin='your_pin').

What KV Store Contains

Typical data categories:

Category | Key examples
Service API keys | telegram_bot_token, anthropic_api_key, openai_api_key, tavily_api_key
Device Target UUIDs | mac_studio_target, ubuntu_vm_target, macbook_target
URLs and endpoints | aios_backend_url, webhook_slack, api_crm_url
Session data | session_history, cache_results (JSON arrays)
Configurations | typefully_social_set_id, redis_url, redis_token

KV Store holds more than just short strings. A value can be a complete JSON array with session data history—KV becomes a fast key-value cache for agents.

Semantic Search in KV Store

Each record has an embedding (OpenAI text-embedding-3-small) generated from key + description. This enables semantic search:

# Forgot the exact key name?
kv_search("telegram bot token")
# Finds: telegram_bot_token, telegram_bot_token_taskboard
# Works even if description is in Russian and query in English, or vice versa

Writing Good Descriptions

A description isn't a comment—it's a semantic search index. The more precise and informative, the more reliably agents will find the key.

# Good description:
kv_set(key="anthropic_key", value="sk-...",
       description="Anthropic Claude API key (main production, updated 2025-03)")

# Bad description — search won't help:
kv_set(key="k1", value="sk-...", description="key")

Agent Auto-Search Algorithm — The Golden Rule

Agents NEVER ask for credentials first. They follow a strict algorithm:

  • 1. Need a key? → kv_search("<service> key token")
  • 2. Found it? → kv_get(key) — automatic decryption
  • 3. Not found? → only then ask the user
  • 4. User provided it? → kv_set + permanent storage

If you saved tavily_api_key with the description "Tavily web search API key" once, the next time you request web search, the agent finds it automatically—without asking a single question.
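
The four steps above can be sketched as a single lookup routine. This is an illustration only: the kv_set, kv_get, and kv_search functions below are in-memory stand-ins written for this example (the real ones are agent tools), encryption is not modeled, and resolve_credential, the stop-word list, and the "<service>_api_key" naming are all assumptions.

```python
# In-memory stand-ins for the kv_set / kv_get / kv_search tools (no encryption)
_store: dict[str, dict] = {}
_GENERIC_WORDS = {"key", "token", "api"}

def kv_set(key: str, value: str, description: str = "") -> None:
    _store[key] = {"value": value, "description": description}

def kv_get(key: str) -> dict:
    return {"value": _store[key]["value"]}

def kv_search(query: str) -> list[str]:
    # Crude word-overlap match standing in for semantic search
    words = set(query.lower().split()) - _GENERIC_WORDS
    hits = []
    for key, entry in _store.items():
        text = key.replace("_", " ") + " " + entry["description"]
        if words & set(text.lower().split()):
            hits.append(key)
    return hits

def resolve_credential(service: str, ask_user) -> str:
    """Search first, ask the user only on a miss, then persist the answer."""
    hits = kv_search(f"{service} key token")                    # step 1
    if hits:
        return kv_get(hits[0])["value"]                         # step 2
    value = ask_user(service)                                   # step 3
    kv_set(f"{service}_api_key", value, f"{service} API key")   # step 4
    return value

asked = []

def ask_user(service: str) -> str:
    asked.append(service)
    return "tvly-example"

first = resolve_credential("tavily", ask_user)    # miss: asks once, then stores
second = resolve_credential("tavily", ask_user)   # hit: found without asking
print(first == second, asked)
```

The user is asked exactly once; every later request for the same service is served from the store.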

Core Principle: Experts Never Access KV Directly

This is a fundamental architectural security principle. An Expert is a pure function. Agents inject credentials via params. The Expert receives the already-decrypted value as a parameter—and knows nothing about KV Store at all.

# WRONG: Expert accesses KV directly
def send_telegram(text: str) -> dict:
    import requests
    # Value is encrypted ($enc:...) — Expert can't decrypt it!
    r = requests.post("https://api.extella.ai/api/kv/get", ...)
    token = r.json()["value"]  # gets garbage

# RIGHT: Agent decrypts and injects
def send_telegram(text: str, bot_token: str = "") -> dict:
    import requests
    # bot_token already decrypted by agent and passed via params
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    # ... rest of the logic

This ensures: security (credentials not in code), reusability (one Expert, different tokens), testability (any input data without KV dependency).
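
To illustrate the testability point, here is a minimal sketch of a function that takes the token as a plain parameter and can therefore be exercised with a dummy value, with no KV Store and no network. The name build_send_message_request is hypothetical; the URL shape and the chat_id and text fields follow the public Telegram Bot API sendMessage method.

```python
def build_send_message_request(chat_id: str = "", text: str = "", bot_token: str = "") -> dict:
    """Build (but do not send) a Telegram sendMessage request from injected values."""
    if not bot_token:
        return {"status": "error", "message": "bot_token required"}
    if not chat_id or not text:
        return {"status": "error", "message": "chat_id and text required"}
    return {
        "status": "success",
        "url": f"https://api.telegram.org/bot{bot_token}/sendMessage",
        "payload": {"chat_id": chat_id, "text": text},
    }

# A dummy token is enough to test the logic
print(build_send_message_request("42", "hello", "123:TEST"))
print(build_send_message_request("42", "hello"))
```

Because nothing here reads from storage, the same function works unchanged with a production token, a staging token, or a test value.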

3.3Concepts — Semantic Knowledge Memory

A Concept is a text fragment (knowledge, pattern, solution) stored with semantic search. Concepts use vector search: the meaning of your query is matched against saved knowledge, not just keywords.

When an agent saves a concept like "For PDF generation in Docker, use ReportLab — not wkhtmltopdf, which requires X11," the system immediately sends the text to OpenAI, receives a 1536-dimensional vector, and stores it alongside the text. When searching for "create PDF in container," the query is also converted to a vector, and the system finds the nearest match. Even though the query contains neither "ReportLab" nor "X11," the semantic distance is minimal.
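
The nearest-match step can be illustrated with toy numbers. The vectors below are made up and have three dimensions instead of 1536, and cosine similarity is an assumption: this guide does not say which distance metric the backend uses.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings for two saved Concepts
concepts = {
    "For PDF generation in Docker, use ReportLab": [0.9, 0.1, 0.2],
    "Telegram getUpdates: offset = last update_id + 1": [0.1, 0.9, 0.1],
}
query_vector = [0.8, 0.2, 0.3]  # stands in for the query "create PDF in container"

# Pick the Concept whose vector points in the most similar direction
best = max(concepts, key=lambda text: cosine_similarity(concepts[text], query_vector))
print(best)
```

The query shares no words with the stored text in this picture; only the direction of the vectors decides the match.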

Concept examples

  • "Extella execution environment: Experts run on the local device via the Listener. Python venv is an optional parameter (isolated=true). Not Docker."
  • "For reading .docx files: python-docx. Installation: extella-pip install python-docx"
  • "PDF on Linux: if wkhtmltopdf is unavailable — use ReportLab directly. Installation: extella-pip install reportlab"
  • "Telegram bot getUpdates: offset = last update_id + 1. Otherwise the same messages will be received again."

What to store vs what not to store

Store in Concepts | Do NOT store in Concepts
Patterns and problem solutions | API keys and tokens (→ KV Store)
Library installation instructions | JSON session data and caches
Architectural decisions of the project | Specific file paths
Business requirements and specifications | Personal data
Insights from work experience | Configurations containing passwords or secrets
Technical limitations and workarounds | Temporary data that becomes outdated

Why you should never store credentials in Concepts: concepts are found by meaning. If you save an API key, semantic search will find it when queried for "need a key" — and it will appear in the context as plain text, without encryption. This is a security violation.

Correct pattern: generalized insight from experience

Agent encounters an error → resolves it → extracts generalized knowledge → saves it:

# After the agent solved the PDF problem:
concept_add(
    "For PDF generation in headless environments (Docker, server without X11)"
    " use ReportLab. wkhtmltopdf requires a graphical display"
    " and doesn't work in containers without Xvfb."
)

# In the future, it will be automatically found when querying:
concept_search("PDF in Docker")
# -> Finds with high similarity, even though 'wkhtmltopdf' isn't in the query

A Concept is a generalized insight from experience, not raw data.

Concept Operations

Operation | MCP tool | Description
Create | concept_add | Text → embedding → save
Find | concept_search | Semantic search by meaning (not by keywords)
Update | concept_update | Edit text + regenerate embedding
Delete | concept_remove | Delete by ID
List | concept_list | All concepts of an agent or profile

The global=true parameter enables searching Concepts across all agents in the profile, so knowledge saved by one agent becomes accessible to the others. Without global=true, an agent can only access its own Concepts.

3.4Rules — Dynamic System Prompt

Loading Mechanism

A Rule is a behavioral instruction loaded with EVERY user message via rules_list. Here's how it works:

1. User sends a message
2. System calls rules_list() -> retrieves all active Rules
3. Rules are embedded into the system prompt BEFORE processing the request
4. Agent generates a response considering all Rules

This cycle occurs automatically on every conversation turn. A Rule is not a memory query—it's part of the agent's "personality."

The agent doesn't "recall" Rules; it operates with them from the first word of the conversation. This is the fundamental difference from Concepts, which must be explicitly searched.
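
The loading cycle can be pictured as a small function that rebuilds the system prompt on every turn. This is a sketch under assumptions: rules_list here is a stand-in that returns plain strings, and build_system_prompt, handle_message, and the prompt layout are invented for illustration.

```python
def rules_list() -> list[str]:
    # Stand-in for the rules_list tool: returns the active Rules as plain strings
    return [
        "Always respond in English",
        "Always ask for confirmation before deleting files or data",
    ]

def build_system_prompt(base_prompt: str, rules: list[str]) -> str:
    """Append every active Rule to the base prompt."""
    if not rules:
        return base_prompt
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return f"{base_prompt}\n\nRules:\n{rule_lines}"

def handle_message(user_message: str, base_prompt: str = "You are a helpful agent.") -> dict:
    # Rules are re-read on every turn, so a new Rule applies to the very next message
    system_prompt = build_system_prompt(base_prompt, rules_list())
    return {"system": system_prompt, "user": user_message}

request = handle_message("Delete the old reports")
print(request["system"])
```

No search happens anywhere in this flow, which is the practical difference from Concepts.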

Example Rules to Get Started

  • "Always respond in English"
  • "Always ask for confirmation before deleting files or data"
  • "If a task requires more than one step—describe the plan first, then execute"
  • "When a task is complete—briefly explain what was done"
  • "Save all created files to ~/Documents/Extella/"
  • "Never store credentials in Concepts—use KV Store"

Limits and Restrictions

Maximum length per Rule: 4,000 characters. This is sufficient for detailed instructions. For extensive technical knowledge, use Concepts.

Rules are independent: you can have 50 Rules, and all will apply simultaneously. Application order is not guaranteed—write Rules so they don't conflict with each other.

global=true for Rules

Rules with global=true are visible to all agents in the profile. This lets you define common behavior guidelines for all agents in one profile without configuring each one individually.

Difference between Rules and Concepts

Parameter | Rules | Concepts
Loading | Automatically with each message | Only on explicit search via concept_search
Impact | Always in system prompt | Only when found and added to context
Search | None; all are loaded | Semantic search by meaning
Data type | Instructions, constraints, style | Knowledge, facts, solution patterns
Size limit | 4,000 characters per rule | Unlimited (TEXT in PostgreSQL)
Example | "Always respond in Russian" | "For PDF in Docker use ReportLab"

Mnemonic: if the behavior should ALWAYS apply—it's a Rule. If the knowledge might be needed SOMETIMES—it's a Concept.

Operations with Rules

Operation | MCP tool | Description
Create | rules_add | New rule (rule_id is generated automatically)
Update | rules_update | Edit the text of an existing rule
Delete | rules_remove | Delete a rule by rule_id
List | rules_list | Get all rules (called automatically with each message)

3.5Comparison table of three storage types

Characteristic | KV Store | Concepts | Rules
Data type | Key-value + description | Semantic knowledge (text) | Behavioral instruction
Encryption | ✅ User PIN | ❌ No | ❌ No
Search | Exact by key + semantic | Semantic only | None; all are loaded
Auto-loading | ❌ No | ❌ No | ✅ On every message
Embeddings | pgvector (from key + description) | pgvector (from concept text) | N/A
Embedding model | text-embedding-3-small | text-embedding-3-small | N/A
Value limit | TEXT (up to 1 GB) | TEXT | 4,000 characters
What it stores | API keys, UUIDs, URLs, JSON, session data | Knowledge, patterns, solutions, insights | Constraints, protocols, response style
global flag | ✅ | ✅ | ✅
Isolation | By agent_id / profile_id | By agent_id / profile_id | By agent_id / profile_id

3.6Data isolation: three levels

All core tables (KV, Concepts, Rules, Targets, Experts) include agent_id and profile_id columns. The three-level isolation model:

Level | Analogy | Description
user_id | Building owner | Global user identifier
profile_id | Floor (department) | Group of agents for a single project/client
agent_id | Room on the floor | Specific agent within a profile

How the global flag works

global=false (default) — "I see only my office": the agent sees only its own data (filtered by agent_id). The "Researcher" agent cannot see concepts belonging to the "Writer" agent.
global=true — "I see the entire floor": the agent sees data from all agents in the profile (filtered by profile_id). Researcher + Writer + Analyst — all three agents share one profile.

INSERT is always yours

When creating a new record (concept_add, kv_set, rules_add), the system ALWAYS uses the current agent_id and profile_id. You cannot create a record "for another agent." This ensures data belongs to whoever created it.

When reading/updating/deleting (concept_search, kv_get, rules_list), results are filtered by the global flag. Without global=true — only your data. With global=true — data from all agents in the profile.
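
The read-side filtering can be sketched in a few lines. The Record class and the visible_records helper are hypothetical and only mirror the behavior described above; the parameter is spelled global_ because global is a reserved word in Python.

```python
from dataclasses import dataclass

@dataclass
class Record:
    text: str
    agent_id: str
    profile_id: str

def visible_records(records: list, agent_id: str, profile_id: str, global_: bool = False) -> list:
    """Filter by agent_id by default, or by profile_id when global_ is set."""
    if global_:
        return [r for r in records if r.profile_id == profile_id]
    return [r for r in records if r.agent_id == agent_id]

records = [
    Record("Competitor pricing notes", agent_id="researcher", profile_id="acme"),
    Record("Brand voice guidelines", agent_id="writer", profile_id="acme"),
    Record("Unrelated client data", agent_id="analyst", profile_id="other"),
]

own = visible_records(records, "researcher", "acme")
shared = visible_records(records, "researcher", "acme", global_=True)
print([r.text for r in own])
print([r.text for r in shared])
```

The record from the other profile never appears in either result, whatever the flag is set to.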

4.Experts & Automations

An Expert is an atomic automation that persists forever. Create it once — it runs indefinitely. This section covers everything from Expert types to creating scheduled automated tasks.

4.1Four Types of Experts

1. SIMPLE — Single-Task Building Blocks

Expert | What it does | API key?
convert_pdf_to_text | Extracts text from PDF | No
send_telegram_message | Sends a message | Yes (bot_token)
excel_query | SQL query against .xlsx | No
word_generate | Generates .docx from JSON | No

2. COMPLEX — Multi-Stage Pipelines

  • decompile_binary_to_pseudocode: file → disassembly → graph → pseudocode
  • generate_3d_model_from_photo: photo → depth map → 3D mesh → .obj

3. NESTED — Orchestrators (cspl=nohup)

Call other Experts via REST API. Example:

fetch_emails -> extract_data -> check_crm -> create_task -> send_notification

→ For more on nohup Experts (script structure, {{placeholders}} syntax, no return statement, manual include): Section 8, subsection 8.4.

→ Pattern for parallel worker execution + synchronization via wait_tasks: Section 7.

4. INTEGRATION — Technology Wrappers

Subtype | Examples
CLI wrapper | ffmpeg, ImageMagick, pandoc, git
Library wrapper | Pillow, pandas, BeautifulSoup
External API | Telegram, OpenAI, Notion, Jira
Database | SQLite, PostgreSQL

4.2Expert Structure: 5 Required Elements

Expert template:

$extens("include.py")
include("import requests", ["extella-pip install requests"])

def expert_name(param1: str = "", param2: int = 0) -> dict:
    import requests
    if not param1:
        return {"status": "error", "message": "param1 required"}
    try:
        # ... logic ...
        return {"status": "success", "result": "..."}
    except Exception as e:
        return {"status": "error", "message": str(e)}

5 required elements for every Expert:

  • 1. Directive: $extens("include.py") — first line, mandatory
  • 2. Dependencies: include(..., ["extella-pip install ..."])
  • 3. Signature: def name(param: str = "") -> dict — explicit types, defaults, no *args/**kwargs
  • 4. Validation: check inputs, return early on error
  • 5. Return: always a dict with a status field
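
Element 3 (the signature) is easy to check mechanically. The sketch below uses Python's standard inspect module; check_expert_signature is a hypothetical helper, not an Extella tool, and it inspects a plain Python function without the $extens directive or include call.

```python
import inspect

def check_expert_signature(func) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    sig = inspect.signature(func)
    for name, param in sig.parameters.items():
        if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            problems.append(f"{name}: *args/**kwargs are not allowed")
            continue
        if param.annotation is inspect.Parameter.empty:
            problems.append(f"{name}: missing type annotation")
        if param.default is inspect.Parameter.empty:
            problems.append(f"{name}: missing default value")
    if sig.return_annotation is not dict:
        problems.append("return annotation should be dict")
    return problems

def convert_pdf_to_text(file_path: str = "", max_pages: int = 0) -> dict:
    return {"status": "success", "result": ""}

def sloppy(path, *args, **kwargs):
    return None

print(check_expert_signature(convert_pdf_to_text))
print(check_expert_signature(sloppy))
```

The check covers only the signature. Input validation and the status field in the returned dict still need a test run or a code review.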

4.3Description — a search index, not a comment

When you save an Expert, the backend generates an embedding from the name + description fields. This embedding powers search_blocks — the semantic search across your library. A poor description means the Expert won't be found when you need it.

❌ Bad (not searchable) | ✅ Good (semantically searchable)
description="" | description="Sends message to Telegram. Parameters: chat_id (chat ID); message (text); bot_token_key (token key in KV Store)"
description="utility" | description="Converts PDF to text via pdfplumber. Parameters: file_path (path to PDF); max_pages (page limit, 0=all)"

Rule: description = one sentence describing what the Expert does + a list of all parameters with their purpose.

4.4Names — snake_case. Saving with an existing name overwrites the Expert

The Expert name is a unique key in the library. Requirements:

  • snake_case only: send_telegram_message, convert_pdf_to_text, get_server_metrics
  • No spaces, hyphens, or Cyrillic characters
  • Saving with a name that's already taken → the previous version is overwritten without warning
  • For versioning, use suffixes such as analyze_document_v2, or explicitly delete the old version
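
A name can be checked against these requirements before saving. The regular expression below is one reasonable reading of "snake_case only" (lowercase ASCII letters and digits, single underscores between words, starting with a letter). The exact rules the backend enforces are not stated here, so treat the pattern and the helper name as assumptions.

```python
import re

# Lowercase ASCII letters and digits, words joined by single underscores
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")

def is_valid_expert_name(name: str) -> bool:
    return SNAKE_CASE.fullmatch(name) is not None

for candidate in ["send_telegram_message", "analyze_document_v2", "send-telegram", "Send Message"]:
    print(candidate, is_valid_expert_name(candidate))
```

A passing name says nothing about whether it is already taken, so the overwrite warning above still applies.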

4.5The isolated=True parameter — run in a clean environment

You can call run_expert with isolated=True — the Expert will run in a fresh venv without interference from other Experts' dependencies:

run_expert('my_expert', {'param': 'value'}, isolated=True)

When to use: dependency conflicts between Experts, non-standard library versions, reproducibility during debugging.

4.6extella-pip install — mandatory rule

include("from pdfplumber import open as pdf_open", ["extella-pip install pdfplumber"])

Always use extella-pip install, not pip install or pip3 install. This ensures packages are installed in the correct Expert virtual environment.

Multiple dependencies:

include("import pandas", ["extella-pip install pandas", "extella-pip install openpyxl"])

4.7Generalization Principle: Avoid Hardcoding

Bad — hardcoded:

def process_invoice():
    file_path = "/Users/ivan/Downloads/invoice.pdf"  # Only works on one machine!

Good — parameterized:

def process_invoice(file_path: str = "", output_dir: str = "") -> dict:
    if not file_path:
        return {"status": "error", "message": "file_path required"}

Four absolute prohibitions:

  • ❌ Hardcoding paths, keys, IDs — pass everything through parameters
  • ❌ *args/**kwargs in signatures — use only explicit named parameters
  • ❌ Returning binary data — return only file paths
  • ❌ Accessing KV Store directly from Expert code

An Expert must not call /api/kv/get or any other Extella API from within its code to retrieve credentials. This violates isolation and creates a hidden dependency on the cloud.

❌ Forbidden: the Expert pulls from KV itself

def send_msg(chat_id):
    r = requests.get('/api/kv/get')
    token = r.json()['value']
    # ... uses token

✅ Correct: the agent injects via params

def send_msg(chat_id, bot_token=''):
    if not bot_token:
        return {'status': 'error'}
    # uses bot_token directly

Correct pattern: the agent retrieves credentials from KV via MCP (kv_get/kv_search) and passes values to the Expert as parameters at runtime:

# Agent (outside Expert code); chat_id comes from the task context:
token = kv_get('telegram_bot_token')['value']   # agent decrypts via PIN
run_expert('send_telegram', {'chat_id': chat_id, 'bot_token': token})  # injects

Expert = pure logic with no calls back to Extella for credentials. Data and credentials = parameters from the agent.

4.8CLI Wrappers: 5 Lines Instead of 50

Example: ImageMagick via subprocess:

$extens("include.py")
include("import subprocess", [])

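def resize_image(input_path: str="", output_path: str="", width: int=800, height: int=600) -> dict:
    import subprocess
    # Validate both paths before invoking the CLI
    if not input_path or not output_path:
        return {"status": "error", "message": "input_path and output_path required"}
    size = f"{width}x{height}"
    # "convert" is the ImageMagick entry point assumed to be on PATH
    cmd = ["convert", input_path, "-resize", size, output_path]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        # Keep the "always return a dict" contract when ImageMagick is missing
        return {"status": "error", "message": "ImageMagick 'convert' not found"}
    if result.returncode != 0:
        return {"status": "error", "stderr": result.stderr}
    return {"status": "success", "output": output_path}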

ffmpeg, pandoc, git, docker, rsync — any of these can become an Expert in 5-10 lines.
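
The same shape generalizes to any command-line tool. Below is a minimal sketch of a reusable helper; run_cli is a hypothetical name, and the $extens directive and include call are left out so the snippet runs as plain Python. The Python interpreter stands in for a real tool so the example works on any machine.

```python
import subprocess
import sys

def run_cli(args: list, timeout: int = 60) -> dict:
    """Run a command without a shell and report the outcome as a status dict."""
    if not args:
        return {"status": "error", "message": "args required"}
    try:
        result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return {"status": "error", "message": f"command not found: {args[0]}"}
    except subprocess.TimeoutExpired:
        return {"status": "error", "message": f"timed out after {timeout}s"}
    if result.returncode != 0:
        return {"status": "error", "message": result.stderr.strip()}
    return {"status": "success", "stdout": result.stdout.strip()}

print(run_cli([sys.executable, "-c", "print('ok')"]))
```

Passing the command as a list rather than a single shell string avoids quoting problems with paths that contain spaces.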

4.9Cron Jobs: Scheduled Automation

A Cron job is a nohup Expert that runs on a schedule. Create one with a single phrase:

"Create a job: every morning at 9:00 AM, generate a server metrics summary"

The agent creates a background process — no crontab files, no YAML. Important:

  • Cron runs through Listener — the device must be powered on
  • Logs: /tmp/nohup_<name>.log
  • To stop: "Stop Cron job <name>"
  • After reboot — requires manual restart

How to Technically Stop a Cron Job

A Cron job is an OS nohup process. It runs independently of chat and Listener. Three ways to stop it:

Method | Command / action | When
Via agent (recommended) | "Stop Cron job <name>": the agent finds the PID in the .pid file and sends SIGTERM | Primary method
Via Listener UI | Listener tab → find process → Cancel button | If agent is unavailable
Manually via terminal | kill $(cat /tmp/nohup_<name>.pid) or kill <PID> | Emergency stop

Diagnostics — find the PID and logs of a running job:

cat /tmp/nohup_<name>.pid            # process PID
tail -f /tmp/nohup_<name>.log        # real-time logs
ps aux | grep <name>                 # check if process is alive

When the device reboots, the nohup process terminates. The PID file remains but references a non-existent process. For auto-start on reboot — add Expert launch to launchd (macOS) or systemd (Linux).
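
A stale PID file can be detected from a script. This sketch is for macOS and Linux only and uses the standard os.kill call with signal 0, which checks that a process exists without sending it anything. The job_status helper and the job name in the usage line are hypothetical.

```python
import os

def job_status(pid_file: str) -> str:
    """Classify a job as 'running', 'stale', or 'missing' from its PID file (POSIX only)."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return "missing"
    except ValueError:
        return "stale"    # file exists but does not contain a PID
    try:
        os.kill(pid, 0)   # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return "stale"    # PID file points at a process that no longer exists
    except PermissionError:
        return "running"  # process exists but belongs to another user
    return "running"

print(job_status("/tmp/nohup_daily_metrics.pid"))
```

One caveat: after a reboot the operating system may hand the old PID to an unrelated process, so "running" means only that some process has that PID. Checking the process name with ps, as shown above, confirms it.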

Pattern | Command | Cron | What it does
Monitoring | "Every 5 minutes check availability of example.com" | */5 * * * * | Ping, KV record, notification on failure
Daily summary | "At 9:00 AM: server metrics summary" | 0 9 * * * | Metrics, report, concept of the day
Weekly analysis | "On Sunday at 6:00 PM: weekly error analysis" | 0 18 * * 0 | Log aggregation, pattern concepts
Monthly audit | "On the 1st at 10:00 AM: token audit" | 0 10 1 * * | Token scanning, report
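
To make the Cron column easier to read, here is a small matcher for the five fields (minute, hour, day of month, month, day of week). It is a teaching sketch that supports only *, */n, and plain integers, which is enough for the four expressions in the table; it is not how Extella schedules jobs, and real cron supports lists, ranges, and names as well.

```python
from datetime import datetime

def _field_matches(field: str, value: int, start: int = 0) -> bool:
    """Match one cron field; start is the lowest legal value for that field."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return (value - start) % int(field[2:]) == 0
    return value == int(field)

def cron_matches(expr: str, when: datetime) -> bool:
    """Check a five-field cron expression against a datetime."""
    minute, hour, day, month, weekday = expr.split()
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(day, when.day, start=1)
        and _field_matches(month, when.month, start=1)
        # cron numbers Sunday as 0, while isoweekday() numbers it as 7
        and _field_matches(weekday, when.isoweekday() % 7)
    )

# 2025-01-05 is a Sunday
print(cron_matches("0 18 * * 0", datetime(2025, 1, 5, 18, 0)))
print(cron_matches("*/5 * * * *", datetime(2025, 1, 6, 12, 7)))
```

Reading left to right, "0 10 1 * *" means minute 0, hour 10, day 1, any month, any weekday.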

Self-improving loop: the Cron job writes Concepts, and the following week it uses them as context. After a year, you have an expert on your system's problem history.

5.Agents & Teams

Now that you understand Experts, Concepts, KV Store, and Rules, it's time to explore how they come together in agents and teams. This section covers the power of multi-agent orchestration, the full agent customization options available on the Pro plan, and how to build your own team of AI specialists.

5.1What is an Agent

An agent is an AI specialist with a specific role, its own memory, and a set of tools. Unlike a standard chatbot, an agent is a configured entity: it knows its specialization, remembers interaction history, and can perform real actions. An agent consists of:

  • Model — Claude, Gemini, Qwen, Llama, GPT — you choose
  • System instructions — specialization, working style, and boundaries
  • Tool set — MCP functions: concept_add, run_expert, web_search, kv_get, etc.
  • Profile — an isolated workspace the agent belongs to
  • Memory — Concepts (knowledge), KV Store (data and keys), Rules (behavioral guidelines)

A single agent can handle hundreds of tasks. Multiple agents in a team form an entire specialized department. Each agent is a precision instrument for a specific domain, not a universal Swiss army knife.

5.2Creating and Configuring an Agent

On Plus and Flex plans, the default agent (or multiple agents) comes preconfigured with the system prompt and all parameters already set—no user action required. The only difference between Plus and Flex is that Flex users can use their own LLM provider API key instead of paying credits to use Extella's key.

On the Pro plan, you get full control over every agent parameter. This is a fundamental shift: instead of accepting an out-of-the-box agent, you become its architect. You can configure your own agent in the right slide-out panel of the chatbot interface.

Parameter | Configuration | Example usage
Model | GPT-4o, Claude Opus, Gemini Pro, Mistral, local via Ollama, etc. | Claude Opus for deep document analysis
System prompt | Precise instruction: role, style, boundaries, specialization | "You are a financial analyst. Respond only in JSON."
Temperature | Balance accuracy <-> creativity (0.0 = predictable, 1.0 = creative) | 0.2 for analytics; 0.9 for copywriting
Top-P / Top-K | Managing token probability distribution | Top-P 0.9 for diverse text generation
Max tokens | Maximum response length | 4096 for documents; 512 for brief responses
Tools & MCP | Which tools are available to the agent | Financial tools only for finance specialists
Memory settings | Which memory type it uses: concepts, rules, KV | Long-term memory + thematic concepts
Rules setting | Which rules apply in which situations | "When uncertain, clarify before acting"
Response format | Output structure: text, JSON, markdown, schema | Strict JSON for integration with CRM system

Examples of Specialized Agents

  • Analyst agent: Claude Opus, temperature 0.2, JSON-only output, financial tools only
  • Copywriter agent: GPT-4o, temperature 0.9, no tools, detailed brand voice in the prompt
  • Code reviewer agent: Mistral, minimal context, security checklist in system prompt
  • Research agent: Gemini, web_search + concept_add, maximum context, high recursion_limit
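
The presets above can be written down as plain data. The sketch below is illustrative only: the AgentConfig class and its field names are assumptions made for this example and are not Extella's actual configuration schema. The values come from the Analyst and Copywriter examples, and the tool name is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical container for the parameters listed in the table above."""

    name: str
    model: str
    temperature: float
    system_prompt: str
    tools: tuple[str, ...] = ()
    response_format: str = "text"

    def __post_init__(self) -> None:
        # The guide describes temperature on a 0.0 (predictable) to 1.0 (creative) scale.
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")


analyst = AgentConfig(
    name="Analyst",
    model="Claude Opus",
    temperature=0.2,
    system_prompt="You are a financial analyst. Respond only in JSON.",
    tools=("financial_tools",),  # placeholder tool name
    response_format="json",
)

copywriter = AgentConfig(
    name="Copywriter",
    model="GPT-4o",
    temperature=0.9,
    system_prompt="Write in the brand voice described below.",  # placeholder prompt
)

print(analyst.model, analyst.temperature, analyst.tools)
print(copywriter.model, copywriter.temperature, copywriter.tools)
```

Keeping the configuration immutable mirrors the idea that an agent's identity is set once and reused.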

Reusability: configure once, use forever

Configured agents are saved to your library and accessible from a dropdown menu at any time. No need to paste prompts or select parameters each time—the agent already knows who it is and what it can do. This changes your workflow: instead of "find an agent for this task," you think "pick the right one from my library."

Agent limits on the Pro plan

The Pro plan does not limit the number of agents a user can create.

5.3Three Agent Interaction Patterns

1. Recursion — agent calls itself iteratively

The agent processes data in chunks, calling itself with refined parameters. Each iteration starts with a clean context. Protection against infinite loops is provided by the recursion_limit parameter when creating the agent (recommended: 5–15).

Example: A CTO checks 47 API endpoints for test coverage, processing roughly 15 per iteration. Iteration 1: endpoints 1–15 (12 covered, 3 missing). Iteration 2: 16–30 (14 covered, 1 missing). Iteration 3: 31–47, with the two leftover endpoints folded into the last batch (13 covered, 4 missing). Final report: 39/47 covered (83%), 8 require tests.
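
The recursion pattern can be sketched in a few lines of Python. This is a simplified model, not Extella's implementation: the function name, the is_covered callback, and the returned dictionary are assumptions for illustration. With a strict batch size of 15, 47 items take four calls (15, 15, 15, 2), and the recursion_limit guard stops a runaway chain.

```python
from typing import Callable, Sequence


def check_in_batches(
    items: Sequence[int],
    is_covered: Callable[[int], bool],
    batch_size: int = 15,
    recursion_limit: int = 10,
    depth: int = 0,
) -> dict:
    """Process one batch, then call itself on the remaining items."""
    if not items:
        return {"covered": 0, "missing": 0, "iterations": depth}
    if depth >= recursion_limit:
        # Guard against infinite loops, as recursion_limit does for agents.
        raise RuntimeError(f"recursion_limit of {recursion_limit} reached")

    batch, rest = items[:batch_size], items[batch_size:]
    covered = sum(1 for item in batch if is_covered(item))
    tail = check_in_batches(rest, is_covered, batch_size, recursion_limit, depth + 1)
    return {
        "covered": covered + tail["covered"],
        "missing": len(batch) - covered + tail["missing"],
        "iterations": tail["iterations"],
    }


endpoints = list(range(1, 48))  # 47 hypothetical endpoint ids
# Assumed coverage data: endpoints 1..39 have tests, 40..47 do not.
report = check_in_batches(endpoints, is_covered=lambda n: n <= 39)
print(report)
```

Each call only ever sees its own batch, which is the code-level analogue of starting every iteration with a clean context.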

2. Escalation — sub-agent signals the orchestrator

When a specialized agent encounters a situation outside its competence, it escalates back to the orchestrator. The orchestrator then decides whether to reassign the task, bring in another specialist, or adjust the delegation parameters.

Example: A CCO analyzing competitors discovers news about a competitor's $50M Series B round. This is a strategic factor—the CCO escalates to the orchestrator: "New competitive factor detected. Recommend revisiting positioning." The orchestrator brings in the Corporate Director.

3. Cross-calls — peer-level agents

Agents communicate directly with each other without an intermediary—when a task requires data from an adjacent domain. Technically, this works through the MCP tool agent_run with the target agent's known agent_id. All calls are logged.

Example: A CTO directly asks the Corporate Director: "Estimate infrastructure costs: 3 microservices, GPU A10G, 10K requests/day." Receives a $2,400/month estimate and includes it in the architecture document without extra hops through the orchestrator.

5.4Teams — Multi-Agent Systems with Delegation

In traditional platforms, a single agent "tries to be everything to everyone." Its context window quickly fills with irrelevant data. The longer the session, the less accurate the responses become.

Extella solves this differently—by creating a Team that collaborates on a single project: an orchestrator agent receives a task, breaks it into subtasks, and delegates each to a specialized agent within a clean, fully relevant context. Each specialist focuses solely on their domain.

Example: the task "analyze the product, prepare competitive analysis, financial model, and pitch for Seed round":

Step | Agent | Clean context
Competitive analysis | CCO | Market data, G2/Gartner, competitor pricing models
Technical feasibility | CTO | Platform architecture, ML models, integrations, cost
Financial model | Corporate Director | CTO assessment + inference costs + CAC/LTV benchmarks
Pitch deck | CCO | Competitive analysis (step 1) + financial model (step 3)
Synthesis | Orchestrator | All results: competitive analysis + technical + financials + pitch

Each agent has its own Experts, KV Store, Concepts, and Rules. Each focuses on their expertise. The result is not a generic blurred response, but a structured document where each section is prepared by a specialist.

Key advantage: a specialist agent with a clean 32K token context makes more accurate decisions than a generalist agent with an overloaded 200K token context.
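
The delegation table above can be modelled as a small dependency plan in which each step receives only the outputs it needs. This is a sketch under stated assumptions: PLAN, run_plan, and fake_agent are hypothetical names, and the fake agent stands in for a real model call.

```python
from typing import Callable

# step name -> (agent, earlier steps whose output this step is allowed to see)
PLAN = {
    "competitive_analysis": ("CCO", []),
    "technical_feasibility": ("CTO", []),
    "financial_model": ("Corporate Director", ["technical_feasibility"]),
    "pitch_deck": ("CCO", ["competitive_analysis", "financial_model"]),
    "synthesis": (
        "Orchestrator",
        ["competitive_analysis", "technical_feasibility", "financial_model", "pitch_deck"],
    ),
}


def run_plan(plan: dict, call_agent: Callable[[str, str, dict], str]) -> dict:
    """Run steps in order, passing each agent a context built only from its inputs."""
    results: dict = {}
    for step, (agent, needs) in plan.items():
        context = {name: results[name] for name in needs}  # the "clean context"
        results[step] = call_agent(agent, step, context)
    return results


def fake_agent(agent: str, step: str, context: dict) -> str:
    # Stand-in for a real agent call; reports which inputs it was given.
    return f"{agent} did {step} using {sorted(context)}"


results = run_plan(PLAN, fake_agent)
for step, output in results.items():
    print(f"{step}: {output}")
```

The pitch deck step never sees the raw technical assessment, only the analysis and the financial model it was built from.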

A Team in the Extella platform is a multi-agent system that operates as a single unit. You send a task to the Team, and it determines which agent to route it to—or how to split it among several.

What each Team includes

  • Goal and context — what it was created for, what it does well
  • Members with roles — each agent knows its role (Research, Writing, Review, Execution, etc.)
  • Dedicated Concepts — a knowledge base specific to this Team only
  • Dedicated Rules — behavioral rules applied within the system
  • Orchestration prompt — delegation logic: the criteria for routing tasks to agents

How a Team makes decisions

A Team operates in auto mode: upon receiving a task, the orchestrator analyzes it, matches it against member roles and delegation rules, then routes it to the appropriate agent (or several in parallel). No human involvement in this distribution.

In the future, the delegation mechanism will be enhanced with a trained RL model that will make faster and more accurate decisions based on the Team's accumulated experience.

One agent — in multiple Teams simultaneously

A Team doesn't duplicate agents — it references them. An agent can formally be a member of multiple Teams. However, each agent carries its own fixed configuration: system prompt, Rules, and Concepts do not change based on which Team the agent belongs to. This means the same agent cannot be used for fundamentally different purposes simply by assigning it to different Teams — choose agents whose configuration matches their intended role within each Team.

Team examples

Team | Participants (Roles) | Purpose
Content Studio | Research -> Writer -> Editor -> SEO-reviewer | Creating content materials from research to publication
Due Diligence | Financial Analyst + Legal Reviewer + Market Researcher | Collecting and synthesizing company data for investment decisions
Product Sprint | PM + Tech Lead + UX Reviewer | Task breakdown and technical specification development
Personal Knowledge Base | Collector + Summarizer + Tagger | Structuring incoming information into a personal knowledge base

Team limits on the Pro plan

You can create up to 3 Teams, and each Team can contain up to 5 agents.

5.5Interface: My Agents and My Teams

On the Pro plan, the agent dropdown in the upper left panel of the interface has the following structure:

▼ Default Agents

• Extella Claude Sonnet (preconfigured)

• Extella GPT-5 (preconfigured)

▼ My Agents

• Financial Analyst (your agent)

• Copywriter (your agent)

• Code Reviewer (your agent)

▼ My Teams

• Content Studio (your team)

• Due Diligence (your team)

• Product Sprint (your team)

Each group is expandable. Any agent or Team is clickable—selecting one assigns it to the current chat. Note: you can select either an entire Team or an individual agent.

Creating a new agent

Open the right settings panel. The Agent Builder section contains the agent configuration fields: name, model, provider, system prompt, temperature, and tools. You can also create and configure an agent through chat: describe the agent you want, and Extella will handle everything automatically.

Creating a new Team

Open the right settings panel—the Team Builder section. Here you can create a team, assign it a name, and add available agents to it, designating a master (or orchestrating) agent responsible for Team management. Team configuration and populating it with Rules and Concepts happens directly through chat with Extella.

5.6Creating and configuring a Team through interaction with Extella

As an alternative to interface configuration, you can create and configure a Team through chatbot interaction. The user describes in words what they want to achieve. Extella uses reasoning to form a Team object and saves it.

What the user describes

  • Team purpose — what tasks it's created for, what it should do well
  • Composition — which agents to include (from existing configured agents or new ones)
  • Roles — who is responsible for what within the system
  • Rules — how the Team should behave, what to consider when delegating
  • Knowledge — specific context that system agents need

The user can describe everything at once or answer Extella's clarifying questions.

What Extella does when creating a Team

Step | What happens automatically
Defines composition | Selects suitable agents from the library. If none are available, suggests creating new ones or using Extella Agents
Assigns roles | Creates role descriptions for each agent within the Team
Creates orchestration prompt | Instructions for the orchestrator: how to distribute tasks, what criteria to consider
Creates Team Concepts | Stores specific context and knowledge provided by the user
Creates Team Rules | Records behavioral rules for the system
Stores Team | The object appears in the My Teams list in the dropdown menu

Example Team creation dialog

User:

"Create a content marketing team. I need: a researcher (gathers topic data), a writer (creates content), an editor (reviews style), and an SEO specialist (optimizes). They work sequentially. Rule: the final text must contain at least 3 keywords from the brief."

Extella:

"Creating Team Content Studio with four agents. Setting up the chain: Researcher -> Writer -> Editor -> SEO. Adding the keyword rule to Team Rules. Done — Content Studio now appears in My Teams."

Editing a Team via chat

Message Extella in the chat at any time:

  • "Add a social media agent to Content Studio" — Extella adds an agent with the Social Media role
  • "Change Researcher's role — now they handle statistics research" — Extella updates the role
  • "Add a rule: verify facts through two sources" — Extella adds the rule to Team Rules
  • "Remove Editor from Due Diligence" — Extella removes the agent from the Team (the agent itself is preserved)

5.7Data isolation: two levels

On the Pro plan, agent data (Concepts, Rules, KV Store, Experts) is isolated by default and never mixes with other agents' data. Visibility is controlled by the global parameter, available in most MCP tools and REST endpoints.

Level | What it covers | Parameter | Default
Agent | Current agent only | global=false | ✅ yes
Profile | All agents within one profile | global=true | no

How this works in practice:

global=false (default) — the agent sees only its own Concepts, Rules, KV pairs, and Experts. Other agents' data remains inaccessible, even within the same profile.
global=true — the agent sees data from all agents in the profile. Use this when you need to share a common knowledge base across a team.

Key principle: isolation is not hierarchy. Data doesn't automatically "leak" up or down. The developer or agent explicitly controls scope through the global parameter with each call.

Example:

# Add concept only for current agent (default)
concept_add(text="...", global=False)

# Find concept in any agent of the profile
concept_search(query="...", global=True)

Note: Profile (Team) is a container for agents, not a separate storage level. There is no dedicated "Team storage" for Concepts or Rules.
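
To make the two scopes concrete, here is a toy in-memory model of how the global flag changes what a search can see. It is not Extella's storage layer: the ConceptStore class and its methods are assumptions for illustration, and it models search only. Because global is a reserved word in Python, the sketch names its parameter global_.

```python
from collections import defaultdict


class ConceptStore:
    """Toy model of one profile's concepts, grouped by the agent that owns them."""

    def __init__(self) -> None:
        self._by_agent: dict[str, list[str]] = defaultdict(list)

    def add(self, agent_id: str, text: str) -> None:
        self._by_agent[agent_id].append(text)

    def search(self, agent_id: str, query: str, global_: bool = False) -> list[str]:
        if global_:
            # Profile level: every agent's concepts are searched.
            pool = [text for texts in self._by_agent.values() for text in texts]
        else:
            # Agent level (default): only the caller's own concepts.
            pool = self._by_agent.get(agent_id, [])
        return [text for text in pool if query.lower() in text.lower()]


store = ConceptStore()
store.add("cto", "We use Redis for caching")
store.add("cco", "Competitor pricing: tiered")

print(store.search("cto", "pricing"))                # nothing visible at agent level
print(store.search("cto", "pricing", global_=True))  # visible at profile level
```

The scope is chosen per call, which matches the principle that nothing leaks between agents unless the caller asks for it.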

5.8Production Agents in Extella

Below are actual agents running in the Extella system. Each specializes in its domain with its own configuration, tools, and model.

Agent name | Model | Specialization
Extella (CEO) | Claude Sonnet 4.6 | Main orchestrator: delegation, strategy, results synthesis
CCO | Gemini 2.5 Flash | B2B sales, GTM strategy, competitive analysis, pitches, pricing
CTO | Gemini 2.5 Flash | Platform architecture, CSPL Experts, API, security, infrastructure
Corporate Director | Qwen 3.6 Plus | Finance, legal, compliance, capital structure, investors
Extella Architect | Qwen 3.6 Plus (NVIDIA) | Complex reasoning, deep architectural analysis
Llama 4 Maverick | Llama 4 Maverick | 1M context, multimodal, parallel processing of large volumes
Step 3.5 Flash | Step 3.5 Flash | 196B MoE, 262K context; batch processing, writing sections
Llama 3.3 70B | Llama 3.3 70B (Groq) | Ultra-fast inference (300+ tokens/sec) for urgent tasks
Architect R1 | DeepSeek R1 | Chain-of-thought reasoning, complex logical tasks, planning
Auto Router | Auto Router (OpenRouter) | Automatic selection of optimal model + fallback chain

Workers (Llama, Step, DeepSeek, Qwen) are used by the orchestrator for parallel processing: writing 10 sections simultaneously, analyzing 5 competitors, generating 20 pitch variations. These are the system's "computational muscle."

5.9Agent Growth Over Time

An agent in Extella is not a static tool. It grows like a real employee: accumulating knowledge in Concepts, guidelines in Rules, and patterns in KV Store. With each week of work, it becomes smarter and more precise.

Stage | Agent state | What happens | Example
Day 1 | Empty profile | Competent as a good LLM, but doesn't know your business | CTO writes Python, but doesn't know your architecture
Week 2 | 15-30 concepts, 5-10 rules | Knows code style, typical mistakes, team preferences | "In project: FastAPI+PostgreSQL+Redis. Migrations use Alembic."
Month 3 | 100-300 concepts, Cron patterns | Suggests solutions from project history, predicts problems | "Add caching" -> immediately suggests Redis pattern you already use
Month 6 | 500+ concepts, autonomous operation | Project expert. Makes decisions without additional context | Autonomously alerts about degradation based on historical data

Path to autonomy:

  • Day 1: "What do you use for the database?"
  • Week 2: "I recommend PostgreSQL, as you already use it"
  • Month 3: "I'll add Redis cache, similar to the auth module"
  • Month 6: The agent proposes changes, implements them, and reports back—without additional context

Exporting Agent Conversations

After 6 months of work, you can export all conversations:

POST https://api.extella.ai/api/agent/export/chats
Authorization: Bearer <your-token>

// By agent:
{"by": "agent", "id": "agent_xyz..."}

// By profile (all agents):
{"by": "profile", "id": "team_abc..."}

The resulting JSON containing thousands of QA pairs and discussions serves as a valuable dataset for quality analysis or fine-tuning. The actual training process is performed outside the Extella platform—on your own infrastructure or through external services.
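
As a rough illustration, the export call could be prepared from Python with the standard library. The endpoint, the Bearer header, and the by/id body fields come from the request shown above; the helper name, the Content-Type header, and the placeholder id are assumptions. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

EXPORT_URL = "https://api.extella.ai/api/agent/export/chats"


def build_export_request(token: str, by: str, target_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a conversation export."""
    if by not in ("agent", "profile"):
        raise ValueError("by must be 'agent' or 'profile'")
    body = json.dumps({"by": by, "id": target_id}).encode("utf-8")
    return urllib.request.Request(
        EXPORT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed; not stated in the guide
        },
    )


# Placeholder token and id; substitute your own values.
request = build_export_request("<your-token>", "agent", "agent_xyz")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it: urllib.request.urlopen(request), then json.load() the response.
```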

6.Local Models & Tunnels

6.1Why Do You Need a Tunnel?

In Section 5, you learned that you can connect any language model to an agent—including a local one running on your laptop or home server. This opens up significant possibilities: complete data privacy, zero inference cost, any uncensored models, and offline operation.

However, there's a technical detail: local LLM servers (Ollama, LM Studio, llama.cpp, etc.) listen only on localhost by default—an address accessible only from the same device. Extella operates as a cloud service and physically cannot reach your localhost directly.

The solution is a tunnel. This is a program that creates an encrypted "bridge" between your device and a public HTTPS address on the internet. Extella connects to the public address, and the tunnel transparently forwards requests to your localhost. To Extella, it looks like a standard cloud-based API server.

Scenario | Without tunnel | With tunnel
Extella + local model | ❌ Unreachable | ✅ Works via public URL
Access from phone | ❌ Localhost only | ✅ Any device
Demo to a colleague | ❌ Requires VPN or presence | ✅ Just share the URL
CI/CD integration | ❌ No public address | ✅ webhook + tunnel

💡 A tunnel doesn't slow down your model—it adds only 10–50ms of network latency. For text generation, this is imperceptible.

6.2Ports and Base API URL

Each LLM server listens on its own port. This is important when creating a tunnel—you need to tunnel the specific port where your model is running.

Server | Port | Base API URL (local)
LM Studio | 1234 | http://localhost:1234/v1/
llama.cpp server | 8080 | http://localhost:8080/v1/
Ollama | 11434 | http://localhost:11434/v1/
Jan | 1337 | http://localhost:1337/v1/
KoboldCPP | 5001 | http://localhost:5001/v1/
LocalAI | 8080 | http://localhost:8080/v1/

⚠️ Golden rule: in the baseURL field in Extella, always specify the URL up to and including /v1/—and nothing beyond. Extella automatically appends the required path (/chat/completions, /embeddings, etc.).

✅ Correct: https://abc123.ngrok-free.app/v1/
❌ Incorrect: https://abc123.ngrok-free.app/v1/chat/completions

Including an extra endpoint is the most common reason a local model "doesn't respond" in Extella. Check this first.
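
If you paste URLs from terminal output or API docs, a small helper can enforce the golden rule before the value reaches the baseURL field. This is a convenience sketch, not part of Extella; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_base_url(url: str) -> str:
    """Cut the path after its 'v1' segment and make sure it ends with '/v1/'."""
    parts = urlsplit(url.strip())
    segments = [segment for segment in parts.path.split("/") if segment]
    if "v1" not in segments:
        raise ValueError("expected a /v1/ segment in the URL path")
    keep = segments[: segments.index("v1") + 1]
    # Drop any query string or fragment along with the extra endpoint.
    return urlunsplit((parts.scheme, parts.netloc, "/" + "/".join(keep) + "/", "", ""))


print(normalize_base_url("https://abc123.ngrok-free.app/v1/chat/completions"))
print(normalize_base_url("http://localhost:11434/v1"))
```

Parsing the URL instead of searching the raw string avoids false matches on hostnames that happen to contain v1.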

6.3Three Tunneling Methods: Which to Choose

Several tools are available. Here are recommendations for choosing:

Method | Free | Persistent URL | Complexity | Best for
ngrok | Limited* | Paid / static | 🟢 Easy | Quick start, testing
Cloudflare Tunnel | ✅ Fully | ✅ Custom domain | 🟡 Medium | Permanent operation
LocalTunnel | ✅ Fully | Partial | 🟢 Easy | Quick testing without registration
Tailscale Funnel | ✅ Free | ✅ Stable | 🟡 Medium | If already using Tailscale

* ngrok provides one free tunnel with a random URL. A static domain is free with registration (one domain).

💡 For most Extella users, I recommend starting with ngrok (5 minutes to results), then switching to Cloudflare Tunnel for regular use—it's free, reliable, and supports your own domain.

6.4Method 1: ngrok — Fastest Way to Get Started

ngrok is the simplest way to get a working public URL in 2–3 minutes. It's an excellent choice for your first exposure to the topic or for occasional tasks.

Installation

OS | Command
macOS (Homebrew) | brew install ngrok/ngrok/ngrok
Linux (Debian/Ubuntu) | sudo apt install ngrok (after adding the repository; see below)
Windows | winget install ngrok

Linux:

# Linux: full installation
curl -sSL https://ngrok-agent.s3.amazonaws.com/ngrok.asc \
  | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null
echo "deb https://ngrok-agent.s3.amazonaws.com buster main" \
  | sudo tee /etc/apt/sources.list.d/ngrok.list
sudo apt update && sudo apt install ngrok

Registration (one-time)

ngrok requires free registration at ngrok.com. After registering:

Add your token (copy from Dashboard → Your Authtoken):

ngrok config add-authtoken YOUR_TOKEN_HERE

💡 Recent ngrok versions will not start a tunnel until an authtoken is configured, so complete this step before running ngrok http. The free plan remains subject to ngrok's usage limits.

Starting the Tunnel

One command and your tunnel is ready:

ngrok http 1234     # LM Studio
ngrok http 8080     # llama.cpp
ngrok http 11434    # Ollama

After starting, the terminal displays a line like:

Forwarding https://abc123.ngrok-free.app -> http://localhost:1234

Your base API URL for Extella: https://abc123.ngrok-free.app/v1/

Static URL (free)

Random URLs change with every restart, which is inconvenient if the URL is saved in Extella. The solution is a static domain:

ngrok http --domain=your-static-name.ngrok-free.app 1234

One static domain is free. You can register it in Dashboard → Domains.

Password protection (recommended)

A public URL without protection means anyone on the internet can send requests to your model. Add basic authentication:

ngrok http --basic-auth="user:strongpassword" 1234

⚠️ Don't leave a tunnel open without authentication for long periods. Bots actively scan known ngrok domains.

6.5Method 2: Cloudflare Tunnel — for permanent operation

Cloudflare Tunnel (cloudflared) is an enterprise-grade solution from Cloudflare. Completely free with no traffic limits, and supports custom domain binding. Ideal if you plan to use a local model with Extella regularly.

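Like ngrok, cloudflared makes only outbound connections, so the tunnel doesn't depend on open router ports and works even behind double NAT (corporate networks, mobile internet, etc.). Its advantage over ngrok is a permanent address on your own domain at no cost.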

Installing cloudflared

macOS / Windows:

brew install cloudflare/cloudflare/cloudflared   # macOS
winget install Cloudflare.cloudflared            # Windows

Linux (Debian/Ubuntu):

curl -L --output cloudflared.deb \
  https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
sudo dpkg -i cloudflared.deb

Method A: Quick tunnel without registration

For a quick test, run without an account:

cloudflared tunnel --url http://localhost:1234    # LM Studio
cloudflared tunnel --url http://localhost:11434   # Ollama

The URL will look like: https://some-random-name.trycloudflare.com/v1/

⚠️ The URL changes with each restart. Method B provides a permanent address.

Method B: Permanent tunnel with your own domain

For permanent operation, you need to register at cloudflare.com (free) and have your own domain (or subdomain).

Step 1 — Authentication:

cloudflared tunnel login

Step 2 — Create the tunnel:

cloudflared tunnel create my-llm-tunnel

Step 3 — Create the ~/.cloudflared/config.yml file:

tunnel: <TUNNEL_ID>
credentials-file: /Users/YOUR_USER/.cloudflared/<TUNNEL_ID>.json
ingress:
  - hostname: llm.yourdomain.com
    service: http://localhost:1234
  - service: http_status:404

Step 4 — Configure DNS:

cloudflared tunnel route dns my-llm-tunnel llm.yourdomain.com

Step 5 — Start the tunnel:

cloudflared tunnel run my-llm-tunnel

Your permanent URL: https://llm.yourdomain.com/v1/

Auto-start on system boot

sudo cloudflared service install
sudo systemctl start cloudflared   # Linux
# macOS: launchd service is created automatically

💡 Once auto-start is configured, the tunnel launches on every system boot. You can set the URL in Extella once and forget about it.

6.6Method 3: Alternatives

LocalTunnel — As simple as it gets (Node.js)

If ngrok feels like overkill and you already have Node.js installed:

npm install -g localtunnel
lt --port 1234 --subdomain my-llm

URL: https://my-llm.loca.lt/v1/ — the subdomain persists if the name is available.

⚠️ LocalTunnel is less stable than ngrok or Cloudflare. Best suited for one-off tests.

Tailscale Funnel

If you're already using Tailscale for VPN networking, Funnel exposes your server to the internet with a single command:

tailscale funnel 1234

The URL is generated from your device name in Tailscale. Very convenient if you already have Tailscale set up.

6.7Connecting to Extella: Step-by-Step

Once your tunnel is running and you have a public URL, add the model to Extella as a custom provider. This is done in agent settings (Section 5).

Field in Extella | What to enter | Example
provider | custom | custom
baseURL | Tunnel URL + /v1/ | https://abc123.ngrok-free.app/v1/
apiKey | Any string (model doesn't validate) | lm-studio or ollama
model | Model name on server | llama-3.2-3b-instruct

💡 The model name in the model field must match exactly how the model is named on the server. In LM Studio, this is the model filename. In Ollama, check the output of ollama list.

After saving, the agent will use your local model for all requests. Concepts, KV Store, Rules, and all Experts work exactly the same. They remain stored in Extella's cloud; only inference (text generation) goes through your model.
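
Before saving the settings, you may want to confirm that the tunnel answers an OpenAI-style chat request. The sketch below builds such a request from the same four field values. It assumes the server exposes the usual /chat/completions route under /v1/ and accepts a Bearer header; the helper names are hypothetical, and the actual headers Extella sends may differ. The send function is defined but not called, so nothing here touches the network.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a minimal chat completion request against an OpenAI-compatible server."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and parse the JSON reply (requires a running tunnel)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_chat_request(
    base_url="https://abc123.ngrok-free.app/v1/",
    api_key="lm-studio",
    model="llama-3.2-3b-instruct",
    prompt="Say hello",
)
print(request.full_url)
```

If send(request) returns a reply here, the same baseURL and model values should work in the agent settings.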

6.8Local Server Configuration

Before creating a tunnel, make sure your server accepts external connections. By default, most servers listen only on localhost and will reject requests coming through the tunnel.

LM Studio

  • Open the Local Server tab
  • Enable ✓ Enable CORS
  • Enable ✓ Allow connections from network
  • Click Start Server

💡 Without Enable CORS, requests from Extella will be rejected by the browser with a CORS policy error. This is a common pitfall.

llama.cpp — Server Launch Parameters

Launch with external connections enabled:

./llama-server \
  -m ./models/your-model.gguf \
  --port 8080 \
  --host 0.0.0.0 \
  -c 4096 \
  -ngl 35

Key parameter: --host 0.0.0.0 (accept connections from all sources, not just localhost). -ngl 35 specifies the number of layers on GPU (adjust based on your graphics card).

Ollama — enabling external connections

By default, Ollama only accepts localhost connections. You need to modify the environment variable:

macOS / Linux (temporary):

OLLAMA_HOST=0.0.0.0 ollama serve

Linux (permanent via systemd):

sudo systemctl edit ollama
# Add to the [Service] section:
Environment="OLLAMA_HOST=0.0.0.0"

⚠️ After changing OLLAMA_HOST, restart the service: sudo systemctl restart ollama

6.9Security and performance

Security — required reading

A public URL without protection means anyone can use your model—free of charge at your CPU/GPU expense. This isn't a theoretical threat: bots actively scan for open LLM endpoints.

Threat | Solution
Unauthorized model access | ngrok --basic-auth or Cloudflare Access
Data interception | Tunnels use HTTPS, so data is encrypted in transit
Prompt leakage | Don't tunnel the model unnecessarily; use VPN networks (Tailscale)
DDoS on model | Rate limiting in Cloudflare or ngrok Pro

💡 The simplest protection option is ngrok --basic-auth. Extella supports basic authentication: specify a user:password string in Base64 format in the apiKey field.
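
The Base64 string for the apiKey field can be produced with Python's standard library. This minimal sketch uses the example credentials from the ngrok command above; the helper name is hypothetical.

```python
import base64


def basic_auth_value(user: str, password: str) -> str:
    """Return 'user:password' encoded as Base64, as used by HTTP basic authentication."""
    raw = f"{user}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


api_key = basic_auth_value("user", "strongpassword")
print(api_key)
```

Base64 is an encoding, not encryption, so treat the resulting string with the same care as the password itself.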

Performance

  • Tunneling adds 10–50ms latency. This is negligible for text generation.
  • For long responses, use streaming—users see text as it generates rather than waiting for completion.
  • Ensure the LLM server is running before starting the tunnel—otherwise the tunnel will be created, but requests will fail.
  • GPU acceleration remains on your machine—the tunnel doesn't affect inference speed.

6.10Quick reference

Tool | Command | URL type | Recommendation
ngrok | ngrok http 1234 | Random / static | Best start
cloudflared (quick) | cloudflared tunnel --url http://localhost:1234 | Random | Testing without registration
cloudflared (persistent) | via config.yml + custom domain | Permanent | For regular use
LocalTunnel | lt --port 1234 --subdomain my-llm | Random / subdomain | Quick one-time test
Tailscale Funnel | tailscale funnel 1234 | Permanent | If Tailscale is already installed

Bottom line: run ngrok http <port>, grab the URL from the output, append /v1/ to the end, paste it into the baseURL field in your Extella agent settings—and your local model is ready to go.

7.Parallel Execution

7.1The Physics of Parallelism

The formula is simple:

Sequential: T = T1 + T2 + T3 + ... + TN

Parallel: T = max(T1, T2, T3, ..., TN) + ~1 sec polling

Scenario | Sequential | parallel_task | Speedup
3 tasks x 15 sec | 45 sec | 16 sec | 2.8x
5 tasks x 20 sec | 100 sec | 21 sec | 4.8x
10 tasks x 30 sec | 300 sec (5 min) | 31 sec | 9.7x
20 tasks x 30 sec | 600 sec (10 min) | 31 sec | 19.4x
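
The figures in the table follow directly from the two formulas. The short script below recomputes them, assuming the roughly 1 second polling overhead quoted above; the function names are chosen for this example.

```python
def sequential_time(durations: list[float]) -> float:
    """T = T1 + T2 + ... + TN"""
    return sum(durations)


def parallel_time(durations: list[float], polling_overhead: float = 1.0) -> float:
    """T = max(T1, ..., TN) + polling overhead"""
    return max(durations) + polling_overhead


def speedup(durations: list[float], polling_overhead: float = 1.0) -> float:
    return sequential_time(durations) / parallel_time(durations, polling_overhead)


for count, seconds in [(3, 15), (5, 20), (10, 30), (20, 30)]:
    tasks = [float(seconds)] * count
    print(
        f"{count} tasks x {seconds} sec: "
        f"sequential {sequential_time(tasks):.0f} sec, "
        f"parallel {parallel_time(tasks):.0f} sec, "
        f"speedup {speedup(tasks):.1f}x"
    )
```

With equal task durations the speedup approaches N as the tasks get longer, because the fixed polling overhead matters less.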

This isn't optimization — it's a formula change. Claude Code thinks like an LLM — sequentially. Extella parallel_task thinks like hardware — in parallel. Modern CPUs have multiple cores, each executing independent tasks simultaneously.

7.2Five Problems with Synchronous Mode

Problem 1: Timeout — Results Lost

Most synchronous LLM agents have a hard timeout of ~5 minutes. Processing 1000 files? Training? Deep analysis? — results vanish without warning. With parallel_task, each worker is independent of the LLM connection. Even if the connection drops, the OS process keeps running.

Problem 2: Linear Time Accumulation

Tasks | Synchronous | Parallel | Lost time
2 x 30s | 60s | 31s | 29s
5 x 30s | 150s | 31s | 119s
10 x 30s | 300s | 31s | 269s

Problem 3: No Way to Cancel

In synchronous mode, there's no Cancel button. Spotted an error at second 3 of 300 — you still wait. In Task Registry, each task has a ✕ button (SIGTERM by PID) — instant termination.

Problem 4: No Visibility

With parallel_task, each task writes its status to /tmp/pt_{uuid}.json. The agent can read the file at any time to check task state: running, complete, error. For visual monitoring, you can optionally deploy task_registry_server — a custom Flask-based Expert with an HTML dashboard (see section 7.3 for details).

Problem 5: Lost Traceback on Error

When a synchronous process crashes — you get a generic message without context. In Extella, each parallel_task job writes the full traceback to /tmp/pt_{uuid}.json (error field). If task_registry_server is running, the traceback is also available via GET /tasks/<uuid>.
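
The idea of never losing a traceback can be sketched as a wrapper that always writes a final status record. This is an illustration of the pattern, not Extella's worker code: run_tracked is a hypothetical helper, it runs the worker in the same process for simplicity (a real parallel_task worker is a separate OS process), and only the status, result, and error fields are taken from this guide.

```python
import json
import tempfile
import traceback
import uuid
from pathlib import Path
from typing import Callable


def run_tracked(worker: Callable[[], dict], status_dir: str = "/tmp") -> Path:
    """Run worker() and record its outcome in pt_{uuid}.json, even if it raises."""
    task_id = str(uuid.uuid4())
    status_file = Path(status_dir) / f"pt_{task_id}.json"
    status_file.write_text(json.dumps({"status": "running"}))
    try:
        record = {"status": "complete", "result": worker()}
    except Exception:
        # Keep the full traceback instead of a generic failure message.
        record = {"status": "error", "error": traceback.format_exc()}
    status_file.write_text(json.dumps(record))
    return status_file


def failing_worker() -> dict:
    raise ValueError("bad input file")


# Use a throwaway directory for the demo rather than the real /tmp files.
path = run_tracked(failing_worker, status_dir=tempfile.mkdtemp())
data = json.loads(path.read_text())
print(data["status"])
print(data["error"])
```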

7.3Task Statuses and Diagnostics

The state of each parallel_task is stored in a /tmp/pt_{uuid}.json file on the device. This is the primary and only guaranteed tracking mechanism—it works without any additional components.

Field in file | Value | Description
status | "running" | Task is running
status | "complete" | Task completed successfully
status | "error" | Task failed with error
result | dict from worker | Result (only on complete)
error | traceback string | Error details (only on error)

Example: reading task status manually:

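import json
from pathlib import Path

# UUID returned when the task was launched (placeholder value)
task_uuid = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890'

data = json.loads(Path(f'/tmp/pt_{task_uuid}.json').read_text())
print(data['status'])      # 'running' / 'complete' / 'error'
print(data.get('result'))  # worker result if complete
print(data.get('error'))   # traceback string if error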
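
Reading the file once gives a snapshot. To wait for a task, the same read can be wrapped in a polling loop. The helper below is a sketch, not the platform's wait_tasks implementation: its name and parameters are assumptions, and the demo uses a temporary file in place of a real /tmp/pt_{uuid}.json.

```python
import json
import tempfile
import time
from pathlib import Path


def wait_for_task(status_file: Path, timeout: float = 120.0, poll_interval: float = 2.0) -> dict:
    """Poll a status file until the task is complete or failed, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            data = json.loads(status_file.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            # File not created yet, or caught in the middle of a write.
            data = {"status": "running"}
        if data.get("status") in ("complete", "error"):
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still running after {timeout} seconds")
        time.sleep(poll_interval)


# Demo with a throwaway status file.
demo_dir = Path(tempfile.mkdtemp())
demo_file = demo_dir / "pt_demo.json"
demo_file.write_text(json.dumps({"status": "complete", "result": {"pages": 12}}))

final = wait_for_task(demo_file, timeout=5, poll_interval=0.1)
print(final["status"], final["result"])
```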

7.3.1. Optional Visual Dashboard — task_registry_server

task_registry_server is not a built-in platform interface but a separate custom Expert (Flask application) that you can run for visual task monitoring in a browser. It is not required: parallel_task and wait_tasks work without it.

⚠️ task_registry_server is a custom component. If it hasn't been created by the agent in your account, ask the agent: "Create task_registry_server for monitoring parallel tasks."

When task_registry_server is running, it provides:

Feature | Description
HTML UI | Browser page: http://localhost:7755 (task list with auto-refresh)
GET /tasks | JSON with all tasks
GET /tasks/<uuid> | Details of specific task + logs
DELETE /cancel/<uuid> | SIGTERM by task PID → status cancelled
POST /clear | Clear all records
GET /health | {ok: true, port: 7755, tasks: N}

Starting and managing (if the Expert exists):

# Start:
run_expert('task_registry_server')
# -> {"status": "success", "url": "http://localhost:7755/", "port": 7755}

# If already running:
# {"status": "already_running", "port": 7755, "tasks": 3}

# Force restart:
run_expert('task_registry_server', {'force_restart': '1'})

Persistence: task_registry_server stores the state of all tasks in /tmp/extella_task_registry.json. When the server restarts, the file is re-read and all records are restored. Without task_registry_server, state is stored only in /tmp/pt_{uuid}.json.

7.4UUID vs PID: A Fundamental Design Decision

Why not just use the process PID?

The OS reuses PIDs. A process terminates with PID 12345—a second later, a new process may receive the same PID. When canceling by PID, you might kill not your task but a completely different process.

UUID v4 is globally unique. Never reused. Independent of OS, containers, or reboots. Format: a1b2c3d4-e5f6-7890-abcd-ef1234567890.

All registry operations use UUID. PID is stored only for SIGTERM during cancellation.
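
A minimal sketch of that design: tasks are keyed by a UUID v4, and the PID is just a stored attribute. The registry dictionary and register_task function are hypothetical stand-ins for the real Task Registry.

```python
import os
import uuid

# task UUID -> task record; the PID is kept only so the process can be signalled later
registry: dict[str, dict] = {}


def register_task(pid: int) -> str:
    task_id = str(uuid.uuid4())
    registry[task_id] = {"pid": pid, "status": "running"}
    return task_id


# Two tasks that report the same PID (as can happen after the OS reuses it)
# still get distinct identifiers.
first = register_task(os.getpid())
second = register_task(os.getpid())

print(first != second)
print(registry[first]["pid"] == registry[second]["pid"])
```

Looking a task up by UUID first, and only then reading its PID, narrows the window in which a stale PID could point at an unrelated process.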

7.5The __api_token__ Parameter (Required with task_registry_server)

Without __api_token__, workers cannot register with the Task Registry or report results.

Three reserved parameters are relevant when using task_registry_server:

__registry_url__  — Registry URL (default: http://localhost:7755)
__description__   — Human-readable task description for UI
__api_token__     — Extella API token for registering the task in the registry

Without task_registry_server, the __api_token__ parameter is not needed: parallel_task operates via /tmp/pt_{uuid}.json without server calls.

7.6The 4-Step Pattern: Complete Example

Step 0: Start Registry (must be first!)

registry = run_expert('task_registry_server')
print(registry)  # {"status": "success", "url": "http://localhost:7755/"}

Step 1: Get API token

API_TOKEN = kv_get('extella_api_token')['value']

Step 2: Launch workers in parallel

# Each call returns a UUID immediately (~0.5 sec)
# Worker runs in background as a separate OS process
r1 = run_expert('analyze_document', {
    'file_path': '/tmp/doc1.pdf',
    '__api_token__': API_TOKEN,
    '__description__': 'Analysis: doc1.pdf'
})
r2 = run_expert('analyze_document', {
    'file_path': '/tmp/doc2.pdf',
    '__api_token__': API_TOKEN,
    '__description__': 'Analysis: doc2.pdf'
})
r3 = run_expert('analyze_document', {
    'file_path': '/tmp/doc3.pdf',
    '__api_token__': API_TOKEN,
    '__description__': 'Analysis: doc3.pdf'
})
uuid1 = r1['uuid']
uuid2 = r2['uuid']
uuid3 = r3['uuid']
# All three launched in ~1.5 sec total

Step 3: Wait for all to complete

import json
results = run_expert('demo_wait_tasks', {
    'uuids': json.dumps([uuid1, uuid2, uuid3]),
    'timeout': 120,
    'poll_interval': 2
})
# Polls http://localhost:7755/tasks every 2 sec
# Returns when ALL tasks complete or timeout
# -> {
#   "status": "complete",
#   "summary": {"total": 3, "complete": 3, "error": 0},
#   "elapsed_seconds": 31.2,
#   "results": {uuid1: {...}, uuid2: {...}, uuid3: {...}}
# }

Processing results

if results['summary']['error'] > 0:
    # Handle failed tasks
    for uuid, result in results['results'].items():
        if result.get('status') == 'error':
            print(f'Task {uuid} failed: {result.get("error")}')
            # Retry or log
else:
    print(f'All {results["summary"]["total"]} tasks completed')
    print(f'Time: {results["elapsed_seconds"]}s')
    for uuid, result in results['results'].items():
        print(f'{uuid[:8]}...: {result["result"]}')
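
The three hand-written calls above generalize to a loop. This sketch assumes run_expert behaves as shown in this section (a worker call returns a dict with a 'uuid' key, and demo_wait_tasks accepts a JSON string of UUIDs). The names launch_batch, wait_for_batch, and fake_run_expert are hypothetical; fake_run_expert exists only so the example runs outside Extella.

```python
import json
import os
import uuid


def launch_batch(run_expert, expert_name, file_paths, api_token):
    """Start one worker per file and collect the returned UUIDs."""
    uuids = []
    for path in file_paths:
        reply = run_expert(expert_name, {
            'file_path': path,
            '__api_token__': api_token,
            '__description__': f'Analysis: {os.path.basename(path)}',
        })
        uuids.append(reply['uuid'])
    return uuids


def wait_for_batch(run_expert, uuids, timeout=120, poll_interval=2):
    """Block until all tasks finish; uuids must be sent as a JSON string."""
    return run_expert('demo_wait_tasks', {
        'uuids': json.dumps(uuids),
        'timeout': timeout,
        'poll_interval': poll_interval,
    })


def fake_run_expert(name, params):
    """Stand-in for run_expert so the sketch can run without a Listener."""
    if name == 'demo_wait_tasks':
        ids = json.loads(params['uuids'])
        return {
            'status': 'complete',
            'summary': {'total': len(ids), 'complete': len(ids), 'error': 0},
            'results': {task_id: {'status': 'complete'} for task_id in ids},
        }
    return {'uuid': str(uuid.uuid4())}


batch = launch_batch(fake_run_expert, 'analyze_document',
                     ['/tmp/doc1.pdf', '/tmp/doc2.pdf', '/tmp/doc3.pdf'], 'abc123')
outcome = wait_for_batch(fake_run_expert, batch)
```

Inside an Expert you would pass the real run_expert instead of the stand-in.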

7.7Comparison with synchronous mode

Characteristic | Synchronous | parallel_task
Time for N tasks | N × T | max(T) + ~1 s
Task IDs | No | UUID v4 (globally unique)
Cancellation | No | ✅ Cancel (SIGTERM)
Visibility | No | ✅ /tmp/pt_{uuid}.json; optionally the UI on :7755 (if task_registry_server is running)
Traceback on error | Lost | ✅ Saved to registry
Timeout | ~5 min, result lost | Configurable
LLM dependency | Full | Process independent
Persistence | No | JSON in /tmp, survives restart

7.8When to use parallel_task

Condition | Choice
Task A is required for Task B (dependency) | Synchronous mode
Single task | Synchronous (overhead not worth it)
Each task < 5 sec | Synchronous (registration overhead > benefit)
2+ independent tasks > 5 sec | parallel_task
Task > 1 minute | parallel_task (timeout protection)
Need cancellation support | parallel_task
Progress visibility needed | parallel_task

Practical parallel_task examples: scraping 10 websites, analyzing a batch of 50 CSVs, generating reports across different metrics, checking API endpoints for test coverage.
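
The decision table can be written down as a small function. This is a sketch only: the function name choose_mode and the order in which the rules are checked are assumptions, since the table does not say which condition wins when several apply.

```python
def choose_mode(task_count, seconds_per_task, independent=True,
                need_cancel=False, need_progress=False):
    """Map the conditions from the table to 'synchronous' or 'parallel_task'.

    Assumed priority: cancellation/progress needs first, then long tasks,
    then dependencies and task count, then the 5-second threshold.
    """
    if need_cancel or need_progress:
        return 'parallel_task'
    if seconds_per_task > 60:
        return 'parallel_task'      # timeout protection for long tasks
    if not independent or task_count < 2:
        return 'synchronous'        # dependency chain or a single task
    if seconds_per_task < 5:
        return 'synchronous'        # registration overhead outweighs the benefit
    return 'parallel_task'          # 2+ independent tasks of 5 sec or more


print(choose_mode(task_count=10, seconds_per_task=30))   # scraping 10 websites
print(choose_mode(task_count=1, seconds_per_task=2))     # one quick task
```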

7.9Critical rules

task_registry_server is an optional component. parallel_task works without it (state stored in /tmp/pt_{uuid}.json). If task_registry_server is used, start it BEFORE workers; otherwise POST /register will get ConnectionRefused.

  • UUID, not PID — UUID is globally unique and not reused by the OS
  • /tmp/extella_task_registry.json — single source of truth, survives restarts
  • Worker ALWAYS calls /update — even on crash (try/except -> POST error status with traceback)
  • __api_token__ = kv_get('extella_api_token')['value'] — avoid hardcoding tokens in your code
  • Pass uuids as a JSON string: json.dumps([uuid1, uuid2]) — not a Python list
  • Limit concurrency for heavy workloads: run them one at a time, or cap the number of simultaneous workers at a fixed N that the machine can handle
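
The rule that a worker always reports, even on a crash, can be sketched as a wrapper. The names run_and_report and report are hypothetical: report stands in for whatever sends the POST /update request, and the payload fields are an assumption modeled on the result shapes shown in this section.

```python
import traceback


def run_and_report(task_uuid, work, report):
    """Run work() and always send a final status, including on failure."""
    try:
        payload = {'uuid': task_uuid, 'status': 'complete', 'result': work()}
    except Exception as exc:
        # Capture the traceback so it is not lost with the process
        payload = {
            'uuid': task_uuid,
            'status': 'error',
            'error': str(exc),
            'traceback': traceback.format_exc(),
        }
    report(payload)
    return payload


# Demo: collect reports in a list instead of sending HTTP requests
sent = []
run_and_report('uuid-ok', lambda: {'rows': 3}, sent.append)
run_and_report('uuid-bad', lambda: 1 / 0, sent.append)
```

Because report is called outside the try block, a failure in the work itself never prevents the status update from being sent.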

8.CSPL

CSPL (Container Specific Programming Language) is Extella's paradigm for building automations: instead of having the LLM generate all the code, the LLM writes a compact description and a deterministic handler generates the actual code from it.

In Section 7, you worked with parallel_task and nohup — two CSPL modes. Now let's examine the complete CSPL architecture and why it fundamentally changes how complex automations are built.

8.1The Problem: LLMs Struggle with Large-Scale Precise Code

A real experiment — creating Godot Level 3 (a complete scene with 193 nodes):

Tool | Tokens | Errors | Retries | Total
CSPL | ~1 000 | 0 | 1 | Perfect
fython (LLM generates all Python) | ~8 000 | 7 | 4 | Many revisions
Claude Code | ~15 000 | 12 | 6 | Very slow

LLMs excel at planning — describing architecture, breaking down tasks. But token-by-token generation with probabilistic sampling is fundamentally unsuited for large-scale syntactically precise code. A single typo breaks the entire project. Every error means rerunning, thousands of tokens, minutes of your time.

The solution: shift the paradigm. Instead of "LLM writes all the code" → "LLM writes a compact description, deterministic handler generates the code."

8.2The WHAT vs HOW Principle

  • LLM (WHAT): generates a compact JSON description of the structure (~200 tokens for a 193-node scene). This is a declarative description: what objects exist, how they connect, what parameters they have.
  • Handler (HOW): a Python module that takes JSON and deterministically generates complete code. Same input — always same output. Zero hallucinations.

Example: an Expert with cspl=godot_level_3 contains not Python but a JSON scene description in its body. The handler generates .tscn files and GDScript. The LLM wrote 200 tokens of JSON instead of 8000 tokens of GDScript. Errors — zero.

cspl=godot_level_3:

# Expert body — not Python, but JSON description:
{
  "scene": "main_level",
  "nodes": [
    {"id": 1, "type": "Node2D", "name": "Player", "pos": [100, 200]},
    {"id": 2, "type": "Area2D", "name": "Hitbox", "parent": 1},
    {"id": 3, "type": "Sprite2D", "name": "Sprite", "parent": 1}
  ],
  "signals": [{"from": 2, "signal": "body_entered", "to": 1, "method": "on_hit"}]
}
# Handler godot_level_3 generates complete .tscn + GDScript from this
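
To make the WHAT vs HOW split concrete, here is a toy deterministic handler. It is not the godot_level_3 handler and does not emit real .tscn output. It only turns a JSON description like the one above into an indented node outline, which is enough to show that the same input always yields the same output. The function name render_scene_outline is hypothetical.

```python
import json

DESCRIPTION = '''{
  "scene": "main_level",
  "nodes": [
    {"id": 1, "type": "Node2D", "name": "Player", "pos": [100, 200]},
    {"id": 2, "type": "Area2D", "name": "Hitbox", "parent": 1},
    {"id": 3, "type": "Sprite2D", "name": "Sprite", "parent": 1}
  ]
}'''


def render_scene_outline(description):
    """Deterministically expand a JSON scene description into a text outline."""
    scene = json.loads(description)

    # Group nodes by parent id; root nodes have no 'parent' key (None)
    children = {}
    for node in scene['nodes']:
        children.setdefault(node.get('parent'), []).append(node)

    lines = [f"scene {scene['scene']}"]

    def walk(parent_id, depth):
        for node in children.get(parent_id, []):
            lines.append(f"{'  ' * depth}{node['name']} ({node['type']})")
            walk(node['id'], depth + 1)

    walk(None, 1)
    return '\n'.join(lines)


outline = render_scene_outline(DESCRIPTION)
print(outline)
```

A real handler would replace the outline formatting with code generation, but the structure is the same: parse the compact description, then expand it with ordinary, testable Python.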

8.3Complete Table of CSPL Modes

Mode | Body type | $extens | Returns | Synchronicity | When to use
fython (default) | Python def fn() | + | dict from function | Synchronous | Regular Experts (Section 4)
nohup | Python script (no def) | - | {pid, log_file} | Detached process | Orchestrators, ETL, long-running tasks
parallel_task | Python def fn() | + | {uuid} | Asynchronous, /tmp/pt_{uuid}.json | Parallel Tasks (Section 7)
shell | Bash commands | - | {stdout, stderr, returncode} | Synchronous | CLI wrappers: git, docker, ffmpeg
interpreter | Code in any language | - | Depends on language | Synchronous | Go, R, SQL, Node.js, Julia
cspl_builder_code | Python handler | + |  | Synchronous | Creating a new CSPL mode

8.4nohup Mode — Complete Specification

nohup differs fundamentally from fython. The body is a pure Python script that executes from start to finish. The Listener writes it to a temporary file and launches it via subprocess.Popen(start_new_session=True) — the process detaches and runs independently.
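
The launch mechanism described above can be approximated as follows. This is a sketch of the general technique, not the Listener's actual code: the function name launch_detached and the returned dict are assumptions, and start_new_session only detaches the process on POSIX systems.

```python
import subprocess
import sys
import tempfile


def launch_detached(script_text, log_path):
    """Write a script to a temp file and start it in its own session."""
    with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as handle:
        handle.write(script_text)
        script_path = handle.name

    # stdout and stderr both go to the log file
    with open(log_path, 'w') as log:
        process = subprocess.Popen(
            [sys.executable, script_path],
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,   # detach from the caller's session (POSIX)
        )
    # Return immediately; the script keeps running on its own
    return {'pid': process.pid, 'log_file': log_path}
```

The caller gets a PID and a log path right away and never waits on the process, which is why a nohup Expert cannot return a value directly.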

1. No def fn() — pure script, top to bottom

# fython (regular expert):
def my_expert(param: str = '') -> dict:
    # ... logic
    return {"status": "success"}

# nohup (script without function):
import os, datetime

log_path = '/tmp/nohup_test.txt'
with open(log_path, 'w') as f:
    f.write(f'ran at {datetime.datetime.now()}\n')
    f.write(f'cwd: {os.getcwd()}\n')
# No return — script simply executes and exits

2. No $extens() — manual include() (optional)

In nohup mode, the $extens() directive is not processed (no fython wrapper). You can install dependencies and import them just like in regular Python — using pip or any other standard method. Alternatively, implement include() directly at the beginning of your script:

import sys, subprocess

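def include(module, commands):
    """Run an import statement; on failure, pip-install and retry once."""
    try:
        exec(module, globals())
        return True
    except ImportError:
        # Only pip-style commands are honored; anything else is skipped
        for cmd in commands:
            parts = cmd.split()
            if parts and parts[0] in ('extella-pip', 'pip', 'pip3'):
                subprocess.run([sys.executable, '-m', 'pip'] + parts[1:])
        try:
            exec(module, globals())
            return True
        except ImportError:
            return False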

include('import pandas', ['extella-pip install pandas'])
include('import requests', ['extella-pip install requests'])
# pandas and requests are now available

3. Parameters via {{placeholders}}

Kwargs are substituted into the script text BEFORE execution. Use {{parameter_name}} in your code:

# Parameters: api_token='abc123', file_path='/tmp/data.csv', output_dir='/tmp'

import pandas as pd

api_token = '{{api_token}}'   # <- will be replaced with 'abc123'
file_path = '{{file_path}}'   # <- will be replaced with '/tmp/data.csv'
output_dir = '{{output_dir}}' # <- will be replaced with '/tmp'

df = pd.read_csv(file_path)
result = df.groupby('category').sum(numeric_only=True)
# Keep the index: after groupby it holds the category labels
result.to_csv(f'{output_dir}/output.csv')
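
Substitution before execution is plain text replacement. The sketch below shows one way it could work; the actual Listener implementation is not documented here, and the name render_placeholders and the choice to raise on a missing parameter are assumptions.

```python
import re

# Matches {{name}} where name is letters, digits, or underscores
PLACEHOLDER = re.compile(r'\{\{(\w+)\}\}')


def render_placeholders(script_text, kwargs):
    """Replace every {{name}} in script_text with str(kwargs[name])."""
    def replace(match):
        name = match.group(1)
        if name not in kwargs:
            raise KeyError(f'missing parameter: {name}')
        return str(kwargs[name])

    return PLACEHOLDER.sub(replace, script_text)


template = "file_path = '{{file_path}}'\noutput_dir = '{{output_dir}}'"
rendered = render_placeholders(
    template, {'file_path': '/tmp/data.csv', 'output_dir': '/tmp'})
print(rendered)
```

Because the values are pasted into source code as text, a value containing a quote character would break the surrounding string literal, so it is worth keeping parameter values simple or validating them first.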

4. No return — result via file

import json
from pathlib import Path

# ... perform work ...

result = {
    'status': 'success',
    'processed_rows': 15000,
    'errors': 3,
    'output_file': '/tmp/result.csv'
}
# Must write result:
Path('/tmp/nohup_my_script_result.json').write_text(
    json.dumps(result, ensure_ascii=False)
)

5. Logs and management

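stdout and stderr are redirected to /tmp/nohup_<name>.log. The agent receives an immediate response: {pid, log_file, pid_file}. Monitor progress by reading the log file. On completion, read the result file your script wrote (in the example above, /tmp/nohup_my_script_result.json).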

8.4.1. wait_tasks mode — synchronization barrier

wait_tasks is a CSPL mode paired with parallel_task: it accepts a list of UUIDs for running tasks and waits for all to complete (or until timeout). It polls /tmp/pt_{uuid}.json every 0.3–2 seconds.

Parameter | Type | Default | Description
uuids | str (JSON) | required | JSON array of UUIDs: json.dumps([uuid1, uuid2]). Must be a string, not a Python list
timeout | int | 120 | Maximum wait time in seconds
poll_interval | float | 2 | File polling interval in seconds

What demo_wait_tasks returns:

{
  "results": {
    "uuid-1...": {"status": "complete", "result": {...}},
    "uuid-2...": {"status": "complete", "result": {...}}
  },
  "summary": {"total": 2, "complete": 2, "error": 0},
  "elapsed_seconds": 31.2
}

Entry point: bridge expert demo_wait_tasks (saved with cspl=wait_tasks). This is what you call via run_expert — the wait_tasks CSPL mode is not called directly.
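
The barrier logic can be pictured as a polling loop over the per-task state files. This is a sketch of the idea, not the wait_tasks implementation: the function name wait_for_files, the 'timeout' status value, and the assumption that each file carries a 'status' field of 'complete' or 'error' are all illustrative.

```python
import json
import time
from pathlib import Path


def wait_for_files(uuids, state_dir='/tmp', timeout=120, poll_interval=2):
    """Poll pt_{uuid}.json files until every task finishes or time runs out."""
    start = time.monotonic()
    pending = set(uuids)
    results = {}

    while pending:
        for task_uuid in sorted(pending):
            path = Path(state_dir) / f'pt_{task_uuid}.json'
            try:
                state = json.loads(path.read_text())
            except (FileNotFoundError, json.JSONDecodeError):
                continue  # not written yet, or caught mid-write
            if state.get('status') in ('complete', 'error'):
                results[task_uuid] = state
                pending.discard(task_uuid)
        if not pending or time.monotonic() - start >= timeout:
            break
        time.sleep(poll_interval)

    return {
        'status': 'complete' if not pending else 'timeout',
        'results': results,
        'elapsed_seconds': round(time.monotonic() - start, 1),
    }
```

Tasks that have not finished by the deadline are simply absent from results, so the caller can tell which UUIDs to retry or cancel.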

8.4.2. shell and interpreter modes

Two additional modes for CLI tools and code in other languages. Both support {{placeholders}} for kwargs.

shell — built-in bash runner

The Expert body consists of bash commands. No function, no $extens. The Listener executes via subprocess and returns {stdout, stderr, returncode}.

# cspl='shell': video conversion via ffmpeg (quotes keep paths with spaces intact)
ffmpeg -i "{{input_path}}" -vf scale=1280:720 -c:a copy "{{output_path}}"

# cspl='shell': git pull
git -C "{{repo_path}}" fetch origin
git -C "{{repo_path}}" pull --rebase

Use shell for | Examples
Media processing | ffmpeg, ImageMagick convert, sox
Documents | pandoc, wkhtmltopdf, libreoffice --headless
Git operations | git fetch, git pull, git tag, git log
System utilities | rsync, tar, curl, wget, find
Containers and orchestration | docker build/run, kubectl apply

interpreter — code in any installed language

The Expert body is code in any language. The handler compiles/interprets it on the device. Kwargs are accessible via {{placeholders}} as in nohup.

# cspl='interpreter' — Go code:
package main
import "fmt"
func main() {
    data := "{{input}}"
    fmt.Println("Processed:", data)
}

Language | When to use
Go | High-performance data processing, binary operations
R | Statistical analysis, ML models, ggplot visualizations
SQL | Analytical queries to local databases
Node.js | JSON processing, working with the npm ecosystem
Julia | Scientific and matrix computations
Ruby | System administration, Rakefile scripts

8.5DSL: Domain-Specific Languages

CSPL enables creating compact languages for specific domains. Instead of 400 lines of HTML/CSS/JS — 40 lines of JSON, and the handler generates a complete website.

Domain | CSPL mode | Generates | Token savings
Web API | api_dsl | FastAPI + Pydantic + OpenAPI | 10x
Database | schema_dsl | SQL DDL + Alembic migrations | 8x
CI/CD pipeline | pipeline_dsl | GitHub Actions YAML | 12x
Godot levels | godot_level_3 | .tscn + GDScript | 15x
HDL schematics | hdl_dsl | Verilog / VHDL | 20x
Tests | test_dsl | pytest fixtures + test cases | 6x
Markdown reports | mini_report_dsl | HTML or Markdown | 8x

Example DSL for Web API (6 lines instead of hundreds):

# Expert body with cspl='api_dsl':
API UserService BASE /api/users AUTH bearer
GET  /      -> list[User]  CACHE 60
POST /      -> User        BODY {name: str, email: str}
GET  /:id   -> User
DELETE /:id -> void
# Handler generates a complete FastAPI router, Pydantic models, and OpenAPI documentation

⚠️ The DSL modes listed in the table above (api_dsl, schema_dsl, godot_level_3, etc.) are examples of custom handlers created via cspl_builder_code. They are not included in the standard Extella distribution. Only the following are built-in (available out of the box): fython, nohup, parallel_task, wait_tasks, shell, interpreter, cspl_builder_code.
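
Since api_dsl is a custom handler rather than a built-in, it may help to see what the first half of such a handler could look like. The sketch below only parses the example DSL into a structured form; generating FastAPI code from that structure is left out. The function name parse_api_dsl, the output layout, and the assumption that the API line always follows the pattern "API <name> BASE <path> AUTH <scheme>" are all illustrative.

```python
import re

# METHOD  path  ->  return_type  [options...]
ROUTE = re.compile(r'^(GET|POST|PUT|PATCH|DELETE)\s+(\S+)\s*->\s*(\S+)\s*(.*)$')


def parse_api_dsl(body):
    """Parse the compact API DSL into a service header and a list of routes."""
    service = {}
    routes = []
    for raw in body.strip().splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        if line.startswith('API '):
            tokens = line.split()
            # Assumed fixed layout: API <name> BASE <path> AUTH <scheme>
            service = {'name': tokens[1], 'base': tokens[3], 'auth': tokens[5]}
            continue
        match = ROUTE.match(line)
        if match is None:
            raise ValueError(f'unrecognized line: {line!r}')
        method, path, returns, options = match.groups()
        routes.append({
            'method': method,
            'path': path,
            'returns': returns,
            'options': options.strip(),
        })
    return {'service': service, 'routes': routes}


EXAMPLE = """
API UserService BASE /api/users AUTH bearer
GET  /      -> list[User]  CACHE 60
POST /      -> User        BODY {name: str, email: str}
GET  /:id   -> User
DELETE /:id -> void
"""

parsed = parse_api_dsl(EXAMPLE)
```

Rejecting unrecognized lines with an error, instead of skipping them, keeps the handler deterministic: a malformed description fails loudly rather than producing partial code.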

8.6cspl_builder_code: Creating Your Own CSPL

A meta-mode: you create a new CSPL type directly from the chat, without modifying platform code. The architecture is extensible on the fly.

Process:

  1. Describe the desired handler: "Create a CSPL for FastAPI from a JSON schema"
  2. The agent writes the Python handler code: a function that takes the Expert body and generates an artifact
  3. The handler is registered as a new cspl type in the system
  4. The new mode is available immediately: cspl='fastapi_generator'

# Example of a simple DSL handler:
from pathlib import Path

def my_report_dsl(filtered_source_code='', func_name='', kwargs=None, **extra):
    # filtered_source_code is the Expert body written in the DSL
    lines = filtered_source_code.strip().split('\n')
    html_parts = []
    for line in lines:
        if line.startswith('TITLE'):
            html_parts.append(f'<h1>{line[6:]}</h1>')   # text after 'TITLE '
        elif line.startswith('SECTION'):
            html_parts.append(f'<h2>{line[8:]}</h2>')   # text after 'SECTION '
        elif line.startswith('> '):
            html_parts.append(f'<p>{line[2:]}</p>')     # paragraph text
    html = '<html><body>' + ''.join(html_parts) + '</body></html>'
    output = Path('/tmp/report.html')
    output.write_text(html)
    return {'status': 'success', 'output': str(output), 'sections': len(html_parts)}

8.7The Recursive Nature of CSPL: No Ceiling

Each handler can use other handlers. There is no upper limit:

  • Level 1: fython with JSON description → handler generates Python classes
  • Level 2: interpreter with Go code → handler compiles Go binary
  • Level 3: Go uses C library → handler generates ctypes wrapper
  • Level 4: C on ARM64 → handler generates inline assembly for optimization
  • Level N: ...

CSPL is a bridge between declarative description (what LLMs do well) and imperative implementation (what LLMs do poorly). This bridge is built from Python, Go, C, bash, SQL, GDScript, Terraform, Dockerfile.

8.8When NOT to Use CSPL

In practice: when in doubt, use fython. CSPL pays off only for recurring task classes where token and error savings are significant.

Three Conditions: When CSPL Is Justified

Condition | Validation question | If NO →
1. Class of repetitive tasks | Will this be used multiple times, not just once? | fython: CSPL requires investment in a handler
2. Logic is more complex than the data | Does the output format have internal dependencies that require computation? | fython: the LLM can handle it directly
3. Data and logic are separated | Is it clear what changes each time and what stays the same? | fython: the boundary is fuzzy, so CSPL won't provide a benefit

CSPL makes sense only when all three conditions are met simultaneously. If even one is violated, use fython.

Examples of applying the rule:

Task | Cond. 1 | Cond. 2 | Cond. 3 | Output
Generate 50 similar reports with different data |  |  |  | CSPL ✅
A one-off CSV parsing script |  |  |  | fython
Godot levels (recurring pattern) |  |  |  | CSPL ✅
Calling OpenAI API with different prompts |  |  |  | fython: the logic is simpler than the data
FastAPI routers from JSON schemas (10+ items) |  |  |  | CSPL ✅

Situations Where CSPL Is Overkill

Situation | Recommendation
Task up to 100 lines of code | fython: the LLM can usually write it without errors
Unique one-time task | fython: CSPL requires a repetitive pattern
Need the result immediately | fython or shell: nohup is asynchronous
Handler is more complex than the task itself | Skip CSPL: a handler should generate about 10x more code than it contains
No ready-made handler for the domain | First create the handler using cspl_builder_code

9.REST API

9.1Three Scenarios: Why You Need the API

Scenario 1: Embedding in Your Product

You're building a CRM, ERP, chatbot, or any platform. Instead of training a model from scratch, call a ready-made Extella agent via API. Your backend sends a prompt — Extella returns a response. Your product's end user never knows Extella is working under the hood.

Scenario 2: Automating Background Tasks

Every night a script pulls new documents, runs the agent, gets the analysis, and writes the results. CI/CD pipelines use Experts to generate documentation, validate code, and process logs. The API supports async mode: async=true → task_id → polling /api/task/check. Ideal for background tasks that don't block the main process.

Scenario 3: Exporting Data for Analytics and Fine-Tuning

/api/agent/export/chats — complete conversation history (a valuable dataset for fine-tuning). /api/agent/export/calls — call logs: model, latency_ms, prompt_tokens, completion_tokens, created_at. Valuable for AI cost analysis, prompt optimization, and fine-tuning local models.

9.2Base URL and the Sneaky 405 Error

⚠️ API Version: docs.extella.ai describes v0.7.0 — 48 endpoints, 11 sections. Primary authentication header: X-Auth-Token. The Authorization: Bearer method is accepted as an alternative.

The most common cause of HTTP 405 (Method Not Allowed) is sending an API request to the wrong URL.

URL | Purpose | Rule
https://api.extella.ai/api/agent/* | Agents API | ALL requests to agents
https://api.extella.ai/api/expert/* | Experts API | ALL Expert requests
https://api.extella.ai/api/concept/* | Concepts API | Working with Concepts
https://api.extella.ai/api/kv/* | KV Store API | Key-Value
https://api.extella.ai/api/rules/* | Rules API | Rules
https://api.extella.ai/api/token/* | Tokens API | Token Management
https://api.extella.ai/api/profile/* | Profiles API | Profile Management

Rule: for EVERYTHING starting with /api/ — use https://api.extella.ai

9.3Authentication: Two Equivalent Methods

# Method 1 (standard Bearer scheme):
Authorization: Bearer <your-token>

# Method 2 (Extella-specific; listed as the primary header in the v0.7.0 docs):
X-Auth-Token: <your-token>

# For Database Services (/api/concept/*, /api/kv/*)
# passing user_id in the request body also works,
# but the authorization header is preferred
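
Either header can be produced by one small helper so the rest of a client does not care which scheme is in use. This is a minimal sketch; the function name auth_headers and its scheme argument are hypothetical, and only the two header formats themselves come from this section.

```python
def auth_headers(token, scheme='x-auth-token'):
    """Build request headers for one of the two documented auth methods."""
    headers = {'Content-Type': 'application/json'}
    if scheme == 'bearer':
        headers['Authorization'] = f'Bearer {token}'
    elif scheme == 'x-auth-token':
        headers['X-Auth-Token'] = token
    else:
        raise ValueError(f'unknown auth scheme: {scheme}')
    return headers


extella_style = auth_headers('abc123')
bearer_style = auth_headers('abc123', scheme='bearer')
```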

Getting your first token via the agent: type "Generate an API token" — you'll get one instantly. Via API:

POST https://api.extella.ai/api/token/generate
Authorization: Bearer <existing_token>
Content-Type: application/json

{"name": "Production API"}
# -> {"token": "a1b2c3...", "user_id": "user_abc", "name": "Production API"}

Validation (rate limit: 30 requests/min — validate once at startup, not before each request):

POST https://api.extella.ai/api/token/validate
{"token": "your-token"}
# -> {"valid": true, "user_id": "user_abc"}

9.4OpenAI-Compatible Mode

If your application already works with OpenAI, switching to Extella requires minimal changes:

from openai import OpenAI

client = OpenAI(
    api_key="your_extella_token",
    base_url="https://api.extella.ai/v1",
)

response = client.chat.completions.create(
    model="gpt-4",  # ignored — agent's model is used
    messages=[
        {"role": "user", "content": "What is REST API?"}
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)

/api/agent/run modes:

  • sync (default) — blocking call, waits for complete response
  • stream — Server-Sent Events, tokens delivered as generated (Accept: text/event-stream)
  • async — immediately returns task_id, retrieve result via /api/task/check

Note (docs.extella.ai): agents are launched via POST /api/agent/run with agent_id passed in the request body (json={"agent_id": "agent_...", "input": "..."}). The X-Agent-Id header is also accepted as an alternative.

9.5Key Endpoints Reference

Method | Endpoint | Description
POST | /api/agent/run | Run agent (sync/stream/async)
POST | /api/agent/get | Get agent config
POST | /api/agent/create | Create agent (requires Pro)
POST | /api/agent/update | Update agent
POST | /api/agent/list | List agents
POST | /api/agent/export/chats | Export conversation history
POST | /api/agent/export/calls | Call log with metrics (parameters in request body)
POST | /api/profile/create | Create profile
POST | /api/profile/add_agent | Add agent to profile
POST | /api/profile/delete | Delete profile (agents remain)
POST | /api/profile/list | List profiles
POST | /api/expert/run | Run Expert
POST | /api/expert/save | Save Expert
GET | /api/expert/get/<name> | Get Expert by name
DELETE | /api/expert/delete/<name> | Delete Expert
POST | /api/blocks/search | Semantic search for Experts
POST | /api/task/check (or /api/tasks/check) | Async task status
POST | /api/concept/add | Add Concept
POST | /api/concept/search | Semantic search for Concepts
POST | /api/concept/update | Update Concept
POST | /api/concept/remove | Delete Concept
POST | /api/concept/list | List Concepts
POST | /api/kv/set | Set KV pair
POST | /api/kv/get | Get KV pair
POST | /api/kv/search | Semantic search in KV
POST | /api/kv/list | List KV pairs
POST | /api/rules/add | Add rule
POST | /api/rules/list | List rules
POST | /api/rules/update | Update rule
POST | /api/rules/remove | Delete rule
POST | /api/token/generate | Create token
POST | /api/token/validate | Validate token
POST | /api/token/revoke | Revoke token
POST | /api/token/list | List tokens
POST | /api/defaults/set_target | Set default device
POST | /api/defaults/get_target | Get default device

Rate limits: 60 req/min per IP, 20 req/min for /api/agent/run. HTTP 429 — check Retry-After header, use exponential backoff.
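
Handling HTTP 429 as described can be sketched like this. The function name call_with_backoff is hypothetical, and send is any zero-argument callable that performs the request and returns an object with status_code and headers, as a requests.Response does. When Retry-After is missing or is not a plain number of seconds, the sketch falls back to exponential backoff.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on HTTP 429, honoring Retry-After when present."""
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt == max_attempts - 1:
            break  # out of attempts: hand the last 429 back to the caller
        retry_after = response.headers.get('Retry-After')
        try:
            delay = float(retry_after)
        except (TypeError, ValueError):
            # Header absent or not numeric: 1s, 2s, 4s, ... by default
            delay = base_delay * (2 ** attempt)
        sleep(delay)
    return response


# Usage with requests (not executed here):
#   call_with_backoff(lambda: requests.post(url, headers=HEADERS, json=payload))
```

Passing sleep as a parameter keeps the helper easy to test, because a test can record the delays instead of actually waiting.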

Endpoints not listed in the table above (complete list at docs.extella.ai):

Method | Endpoint | Description
GET | /api/health | Health check: server status
POST | /api/agent/delete | Delete an agent (agent_id in request body)
POST | /api/kv/remove | Delete KV pair (key in body)
POST | /api/targets/add | Add device (target, description)
POST | /api/targets/list | List devices
POST | /api/targets/search | Semantic device search
POST | /api/targets/update | Update device (id required)
POST | /api/targets/remove | Delete device (id required)
POST | /api/experts_db/list | List experts from DB (metadata)

9.6Field Name Pitfalls

Pitfall 1: blocks/search returns matches, not results

data = response.json()
# WRONG:
for r in data['results']:     # KeyError!
    print(r['similarity'])

# CORRECT:
for block in data['matches']:  # 'matches'
    print(block['score'])      # 'score', not 'similarity'

Pitfall 2: expert/get mixes snake_case and camelCase

expert = response.json()
# CORRECT field names:
code   = expert['expert_code']      # not 'code'
params = expert['expert_params']    # not 'kwargs'
name   = expert['expert_name']      # not 'name'
created = expert['createdAt']       # the only camelCase field here, not 'created_at'

Pitfall 3: export/calls — parameters also go in body (POST), same as export/chats

# export/chats: parameters in body
requests.post(BASE+'/api/agent/export/chats',
              headers=HEADERS,
              json={'by': 'agent', 'id': 'agent_...'})

# export/calls — parameters also in body (POST, not GET):
requests.post(BASE+'/api/agent/export/calls',
              headers=HEADERS,
              json={'by': 'agent', 'id': 'agent_...',
                    'limit': 200, 'from': '2026-01-01T00:00:00Z'})

Pitfall 4: No _id, only id

API responses don't use MongoDB-style _id. They use plain id. There's also no __v field (document version). This is a REST API, not Mongoose.

9.7Complete Working Python Example

Complete working example (save as extella_client.py):

import os, time, requests

BASE = "https://api.extella.ai"
TOKEN = os.environ["EXTELLA_API_TOKEN"]  # store in .env, never in code
HEADERS = {"X-Auth-Token": TOKEN, "Content-Type": "application/json"}

# 1. Validate token (once at startup, not before every request)
r = requests.post(f"{BASE}/api/token/validate", json={"token": TOKEN})
assert r.json()["valid"], f"Invalid token: {r.text}"

# 2. Create agent (Pro plan required; on Free plan use an existing agent_id)
r = requests.post(f"{BASE}/api/agent/create", headers=HEADERS, json={
    "name": "My API Agent",
    "instructions": "You are a helpful assistant. Respond concisely.",
    "provider": "anthropic",
    "model": "claude-haiku-4-5-20251001"
})
if r.status_code == 403:
    raise SystemExit("Pro plan required for /api/agent/create — use an existing agent_id.")
agent_id = r.json()["agent_id"]
print(f"Agent: {agent_id}")

# 3. Synchronous call — response.output is a list of items
r = requests.post(f"{BASE}/api/agent/run", headers=HEADERS,
                  json={"agent_id": agent_id, "input": "What is 2 + 2?"})
text = next((c["text"] for item in r.json()["output"] if item["type"] == "message"
             for c in item["content"] if c["type"] == "output_text"), "")
print("Sync:", text)

# 4. Async call — for tasks > 60 sec
r = requests.post(f"{BASE}/api/agent/run", headers=HEADERS,
                  json={"agent_id": agent_id,
                        "input": "Summarize AI trends in 2025.",
                        "async": True})
task_id = r.json()["task_id"]

for _ in range(30):          # poll up to 60 sec
    r = requests.post(f"{BASE}/api/task/check", headers=HEADERS,
                      json={"task_id": task_id})
    body = r.json()
    if body["status"] == "complete":
        print("Async:", body["output"])
        break
    if body["status"] == "error":
        print("Error:", body.get("error"))
        break
    time.sleep(2)
else:
    # Loop ended without break: the task did not finish within the polling window
    print("Async: still running after 60 sec")

# 5. Semantic search across Experts
r = requests.post(f"{BASE}/api/blocks/search", headers=HEADERS,
                  json={"agent_id": agent_id, "query": "send telegram message"})
for block in r.json()["matches"]:         # 'matches', not 'results'
    print(block["expert_name"], block["score"])   # 'score', not 'similarity'

9.8Secure Integration Checklist

  • Store token in environment variables: os.environ['EXTELLA_API_TOKEN'], not in code
  • Base URL: https://api.extella.ai for all /api/ requests
  • Rate limits: catch HTTP 429, read Retry-After, use exponential backoff
  • Save agent_id after creation — you can't run an agent without it
  • async=true for tasks > 60 sec — don't block the main thread
  • stream=True for UX — use Accept: text/event-stream when users expect real-time responses
  • store=False during debugging — avoid polluting chat history
  • global=True when searching Concepts — otherwise you only search the current agent's memory
  • blocks/search: field is matches, not results; score, not similarity
  • expert/get: expert_code, expert_params, expert_name (snake_case), but createdAt (camelCase)
  • export/calls: POST with parameters in body: {by, id, limit, from}
  • Pro plan required for /api/agent/create
  • Validate token once at startup — not before every request (rate limit 30/min)

9.9Typical Workflow from Zero to Integration

1. Get token:          POST /api/token/generate       -> save to .env
2. Create agent:       POST /api/agent/create         -> agent_id
3. Create profile:     POST /api/profile/create       -> profile_id
4. Add to profile:     POST /api/profile/add_agent    -> {profile_id, agent_id}
5. Run synchronously:  POST /api/agent/run + X-Agent-Id: agent_id
   Run async:          POST /api/agent/run + {async: true} -> task_id
   Check status:       POST /api/task/check  {task_id: ...}
6. Export chats:       POST /api/agent/export/chats   -> dataset for fine-tuning
7. Export calls:       POST /api/agent/export/calls    -> analytics

This pipeline covers 95% of integration scenarios. For complex cases (parallel Experts, semantic search, KV Store), refer to sections 3, 4, and 7.