
Model Context Protocol

Serving a model is one half of the generative AI story; the other half is giving models access to your world: the functions they can call, the data they can read, and the prompts they can reuse. The Model Context Protocol (MCP) is the open standard for exactly that, and Flama provides native, first-class support for building MCP servers. This page shows how to expose tools, resources, and prompts to AI clients, and how to use the advanced extensions for background tasks, interactive input, and embedded user interfaces.

What is MCP?

The Model Context Protocol is an open standard that lets AI applications connect to external capabilities through a uniform interface. An MCP server advertises three kinds of capability: tools (functions the model can invoke), resources (data the model can read), and prompts (reusable prompt templates). Clients such as AI assistants discover these capabilities and call them over JSON-RPC (a lightweight remote-procedure-call protocol that exchanges JSON messages).

In Flama, an MCP server is a named registry that you mount on your application and then populate with decorated Python functions. The framework implements the stateless 2026-07-28 revision of the protocol: rather than negotiating a session once through an initialize handshake, every request is self-contained, carrying its protocol version and capabilities in a _meta object and its routing data in Mcp-Method / Mcp-Name headers. This makes MCP servers trivial to scale horizontally, since no per-client state is held between calls.

Why is it important?

Interoperability: Any MCP-capable client can use your tools without bespoke integration code.
Reuse: The same Python functions that power your API can be exposed to AI agents with a single decorator.
Type safety: Flama derives each tool's input and output JSON Schema from the handler's type hints, so clients receive accurate, self-contained contracts.
Statelessness: The 2026-07-28 protocol holds no session state, so servers scale horizontally without sticky sessions.
Extensibility: Optional extensions add background Tasks, interactive Elicitation, and embeddable MCP Apps user interfaces.

The main virtue MCP brings is a single, type-safe contract that turns the functions already in your codebase into capabilities any AI client can discover and call.

Building an MCP server

You register an MCP server on your application by name through the mcp module. The add_server method both creates the server and mounts it at a URL path, so a single application can host several servers, each under its own path:

import flama

app = flama.Flama()
app.mcp.add_server("/mcp/tools/", "tools", version="2.0.0", instructions="Flama demo MCP tools server")

This registers a server named tools, reachable at /mcp/tools/. With the server in place, you populate it by name: every tool, resource, and prompt decorator takes an mcp argument identifying the server the capability belongs to.

Tools

A tool is a function the model can invoke. Declare one with the tool decorator, naming the target server through mcp; Flama infers the tool's input and output schema from the handler's type hints:

@app.mcp.tool("add", description="Add two integers", mcp="tools")
def add(a: int, b: int) -> int:
    return a + b


@app.mcp.tool(description="Greet someone by name", mcp="tools")
async def greet(name: str) -> str:
    return f"Hello, {name}!"

Tools may be synchronous or asynchronous functions. When you omit the name, the function's own name is used; when you omit the description, its docstring is used. The parameters and return annotation become the tool's inputSchema and outputSchema, advertised to clients verbatim.
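To make the idea of deriving a schema from type hints concrete, here is a minimal sketch using only the standard library. It is not Flama's implementation: the helper names input_schema and output_schema and the JSON_TYPES mapping are assumptions made for illustration, and only a handful of primitive types are handled.

```python
import inspect
from typing import get_type_hints

# Hypothetical mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def input_schema(func):
    """Build a JSON Schema object describing the parameters of ``func``."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": JSON_TYPES[hints[name]]}
        # Parameters without a default value must be supplied by the caller.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}


def output_schema(func):
    """Build a JSON Schema for the return annotation of ``func``."""
    return {"type": JSON_TYPES[get_type_hints(func)["return"]]}


def add(a: int, b: int) -> int:
    return a + b


print(input_schema(add))
print(output_schema(add))
```

The point of the sketch is that the function signature is the single source of truth: changing an annotation changes the advertised contract without any separate schema file to keep in sync.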

Resources

A resource is readable data addressed by a URI. The resource decorator registers one on the named server:

import json

@app.mcp.resource("config://app", name="config", description="Application configuration",
                  mime_type="application/json", mcp="tools")
def config():
    return json.dumps({"debug": True, "name": "flama-mcp"})

Resources are listed and read by their URI, so a client fetches the configuration above by requesting config://app.

Prompts

A prompt is a named, reusable prompt template. The prompt decorator registers one on the named server, deriving its arguments from the handler's parameters:

@app.mcp.prompt("summarise", description="Summarise the given text", mcp="tools")
def summarise(text: str):
    return f"Summarise the following:\n\n{text}"

Prompts are listed by name and rendered with arguments supplied by the client; here text becomes the single required argument.
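As a rough illustration of how a prompt's arguments can be read off a handler's parameters, the sketch below inspects the signature and marks every parameter without a default as required. The helper prompt_arguments is hypothetical and the output shape is an assumption; Flama's own derivation may differ.

```python
import inspect


def prompt_arguments(func):
    """List a prompt handler's arguments, marking those without defaults as required."""
    return [
        {"name": name, "required": param.default is inspect.Parameter.empty}
        for name, param in inspect.signature(func).parameters.items()
    ]


def summarise(text: str):
    return f"Summarise the following:\n\n{text}"


print(prompt_arguments(summarise))
```

Giving a parameter a default value in this sketch would turn it into an optional argument, which is one plausible way to express optional prompt inputs.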

Calling a tool

Clients interact with a mounted server by POSTing a JSON-RPC request to its path, with the routing headers that identify the method and target. To invoke the add tool defined above:

curl --request POST \
  --url http://127.0.0.1:8000/mcp/tools/ \
  --header 'Content-Type: application/json' \
  --header 'Mcp-Method: tools/call' \
  --header 'Mcp-Name: add' \
  --header 'MCP-Protocol-Version: 2026-07-28' \
  --data '{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {"name": "add", "arguments": {"a": 2, "b": 3}}
}'
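The same request can be assembled from Python with the standard library alone. This is a sketch mirroring the curl command: the helper name build_tool_call is hypothetical, and the request is only built, not sent, so it can be inspected without a running server.

```python
import json
import urllib.request


def build_tool_call(url, name, arguments, request_id=1):
    """Build (without sending) the POST request for a tools/call invocation."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # urllib normalises header capitalisation; HTTP header names are case-insensitive.
        headers={
            "Content-Type": "application/json",
            "Mcp-Method": "tools/call",
            "Mcp-Name": name,
            "MCP-Protocol-Version": "2026-07-28",
        },
        method="POST",
    )


request = build_tool_call("http://127.0.0.1:8000/mcp/tools/", "add", {"a": 2, "b": 3})

# Sending it requires the server to be running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping the tool name in both the Mcp-Name header and the JSON body matches the curl example, where the headers carry routing data and the body carries the call itself.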

The server replies with a JSON-RPC result carrying the rendered content and, because add declares a return type, a structured representation of the result:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [{"type": "text", "text": "5"}],
    "structuredContent": 5
  }
}
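On the client side, a response of this shape might be unpacked as follows. This is a sketch, assuming the fields shown in the reply above plus the standard JSON-RPC 2.0 error member; the helper name tool_result is hypothetical.

```python
import json


def tool_result(raw):
    """Extract a tool's result from a JSON-RPC response body."""
    message = json.loads(raw)
    if "error" in message:
        # JSON-RPC 2.0 reports failures in an "error" member instead of "result".
        error = message["error"]
        raise RuntimeError(f"{error.get('code')}: {error.get('message')}")
    result = message["result"]
    # Prefer the structured value; fall back to joining the text content blocks.
    if "structuredContent" in result:
        return result["structuredContent"]
    return "".join(
        block["text"] for block in result["content"] if block.get("type") == "text"
    )


reply = (
    '{"jsonrpc": "2.0", "id": 1, "result": '
    '{"content": [{"type": "text", "text": "5"}], "structuredContent": 5}}'
)
print(tool_result(reply))
```

Preferring the structured value keeps the original type (an integer here), whereas the text block only carries its string rendering.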

Clients discover the available tools the same way, by sending a tools/list request, so an AI assistant can enumerate your capabilities before deciding which to call.

Advanced extensions

The 2026-07-28 protocol defines optional extensions, all supported natively. A server advertises the extensions it uses in its discovery capabilities, so clients negotiate them per request.

Tasks

Long-running tools can run as background Tasks rather than blocking the call. Pass task=True and the server returns a task handle the client can poll:

@app.mcp.tool("square", task=True, description="Square a number as a background task", mcp="tools")
async def square(x: int) -> int:
    return x * x
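From the client's perspective, polling a task handle amounts to a loop like the one below. This page does not describe the wire format for querying a task, so the sketch abstracts it behind a hypothetical fetch_status callable, and the "working" and "completed" status names are assumptions for illustration only.

```python
import time


def wait_for_task(fetch_status, interval=0.5, timeout=30.0, sleep=time.sleep):
    """Poll ``fetch_status`` until the task leaves the hypothetical "working" state."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_status()
        if state["status"] != "working":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish in time")
        sleep(interval)


# Simulated server replies standing in for real polling requests.
replies = iter([
    {"status": "working"},
    {"status": "working"},
    {"status": "completed", "result": 9},
])
final = wait_for_task(lambda: next(replies), sleep=lambda _: None)
print(final)
```

Injecting the sleep function keeps the loop testable without real delays; a production client would likely also add backoff between polls.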

Elicitation

A tool can pause mid-call to elicit further input from the user. The handler declares a parameter annotated with Elicitation to read the answers gathered so far, and returns Elicit.require(...) to request more:

from flama.mcp.data_structures import Elicit, Elicitation

@app.mcp.tool("confirm", description="Confirm an action through an elicitation round-trip", mcp="tools")
def confirm(elicitation: Elicitation) -> str:
    if "confirm" not in elicitation:
        return Elicit.require("Are you sure?", {"type": "boolean"}, name="confirm")
    return f"confirmed={elicitation['confirm']}"

The elicitation parameter is supplied by the server and excluded from the tool's input schema, so it never appears as a tool argument. Because the protocol is stateless, the answers gathered so far are round-tripped through an opaque continuation token the client echoes back on the retry.
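One way to picture a continuation token is as the gathered answers serialized and signed, so that the server can trust them when the client sends them back. The sketch below illustrates that idea only; the actual token format is not described on this page, and SECRET, encode_token, and decode_token are hypothetical names.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"replace-me"  # Hypothetical server-side signing key.


def encode_token(answers):
    """Pack the answers gathered so far into a signed, URL-safe token."""
    body = base64.urlsafe_b64encode(json.dumps(answers, sort_keys=True).encode("utf-8"))
    signature = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode("ascii")
    return (body + b"." + signature).decode("ascii")


def decode_token(token):
    """Verify the signature and recover the answers; reject tampered tokens."""
    body, _, signature = token.encode("ascii").rpartition(b".")
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode("ascii")
    if not hmac.compare_digest(signature, expected):
        raise ValueError("invalid continuation token")
    return json.loads(base64.urlsafe_b64decode(body))


token = encode_token({"confirm": True})
print(decode_token(token))
```

Because everything needed to resume the call travels inside the token, any server instance holding the key can handle the retry, which is what keeps the round-trip stateless.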

MCP Apps

A tool can declare a prefetchable user-interface template (an MCP App) that host applications render alongside its result. Register the template with app_template and point the tool at it with ui_template:

@app.mcp.app_template("ui://widget", name="widget", description="A small UI widget", mcp="tools")
def widget():
    return "<html><body><h1>Flama widget</h1></body></html>"


@app.mcp.tool("with_ui", description="A tool that declares a prefetchable UI template",
              ui_template="ui://widget", mcp="tools")
def with_ui() -> str:
    return "rendered"

Example

The following application registers two servers by name on a single application, then populates each through the decorators. The mcp argument keeps every capability bound to the right server, so the tools server gathers a synchronous tool, a background task, an elicitation round-trip, a resource, a prompt, and a tool with a UI template, while the math server hosts its own tool:

# examples/mcp.py
import json

import flama
from flama import Flama
from flama.mcp.data_structures import Elicit, Elicitation

app = Flama(
    openapi={
        "info": {
            "title": "Generative AI API",
            "version": "1.0.0",
            "description": "Model Context Protocol servers with Flama 🔥",
        },
    },
)

app.mcp.add_server("/mcp/tools/", "tools", version="2.0.0", instructions="Flama demo MCP tools server")
app.mcp.add_server("/mcp/math/", "math", version="2.0.0")


@app.mcp.tool("add", description="Add two integers", mcp="tools")
def add(a: int, b: int) -> int:
    return a + b


@app.mcp.tool("square", task=True, description="Square a number as a background task", mcp="tools")
async def square(x: int) -> int:
    return x * x


@app.mcp.tool("confirm", description="Confirm an action through an elicitation round-trip", mcp="tools")
def confirm(elicitation: Elicitation) -> str:
    if "confirm" not in elicitation:
        return Elicit.require("Are you sure?", {"type": "boolean"}, name="confirm")
    return f"confirmed={elicitation['confirm']}"


@app.mcp.resource("config://app", name="config", description="Application configuration",
                  mime_type="application/json", mcp="tools")
def config():
    return json.dumps({"debug": True, "name": "flama-mcp"})


@app.mcp.prompt("summarise", description="Summarise the given text", mcp="tools")
def summarise(text: str):
    return f"Summarise the following:\n\n{text}"


@app.mcp.app_template("ui://widget", name="widget", description="A small UI widget", mcp="tools")
def widget():
    return "<html><body><h1>Flama widget</h1></body></html>"


@app.mcp.tool("with_ui", description="A tool that declares a prefetchable UI template",
              ui_template="ui://widget", mcp="tools")
def with_ui() -> str:
    return "rendered"


@app.mcp.tool("multiply", description="Multiply two integers", mcp="math")
def multiply(a: int, b: int) -> int:
    return a * b


if __name__ == "__main__":
    flama.run(flama_app=app, server_host="0.0.0.0", server_port=8000)

With this, your application serves models, a chat interface, and a set of agent-ready tools from a single codebase. To revisit how those models are served, return to Serving LLMs; to learn how Flama structures larger applications, continue to the Domain-driven design section.
