
The Model Context Protocol (MCP) is fast becoming a foundational standard in the AI / agent ecosystem. It defines a uniform way for large language models (LLMs) or AI agents to access external tools, resources, and prompts via a well-defined protocol. MCP servers act as bridges between agents and the outside world, exposing context in a secure and structured manner.
If you’re building agentic systems (e.g. chatbot + code execution + data querying), implementing your own MCP server gives you full control over which tools are exposed, how security is enforced, how state is tracked, and how context flows. In this article, we’ll walk you through the full process:
A quick MCP primer (concepts, transports, primitives)
Architectural design choices (stateful vs stateless, transport modes, tool structure)
Step-by-step implementation plan
Example code sketches
Security, monitoring, scaling, and maintenance
Common pitfalls and mitigations
Next steps
Let’s dive in.
Before building, you must understand what MCP servers are meant to do, and how clients/servers interact under the specification.
MCP (Model Context Protocol) is an open protocol that standardizes how AI applications (hosts or clients) connect with external data sources, tools, and contexts.
Think of MCP as the “USB-C port for AI applications”: instead of hand-coding each model ↔ tool integration, MCP gives a common interface.
The protocol uses JSON-RPC 2.0 as its messaging layer.
According to the MCP specification (2025 revision) and server documentation, an MCP server supports three core primitives: tools, resources, and prompts.
| Primitive | Purpose | Example |
|---|---|---|
| Tools | Executable interfaces (functions, API endpoints) that the agent can call. | `search_web(query)`, `run_calculation(expr)`, `send_email(to, body)` |
| Resources | Contextual data or structured content the model can query, e.g. documents, databases, JSON blobs. | A knowledge base, file store, or vector DB |
| Prompts | Prewritten templates or instructions that the client can ask the server to supply (with parameterization). | "You are an assistant with domain X: get context, etc." |
The MCP server must respond to client messages such as `list_tools()`, `call_tool(tool_name, args)`, `get_resource()`, and `list_prompts()`.
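To make the wire format concrete, here is a hedged sketch of a single JSON-RPC 2.0 exchange, using this article's simplified method names (the official spec uses namespaced names such as `tools/call`, but the envelope is the same):

```python
# One JSON-RPC 2.0 request/response pair, shown as Python dicts.
# Method names follow this article's simplified convention.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "call_tool",
    "params": {"tool_name": "add", "args": {"a": 2, "b": 3}},
}

response = {
    "jsonrpc": "2.0",
    "id": 1,  # must echo the request id
    "result": {"result": 5},
}

print(json.dumps(request, indent=2))
```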
The MCP spec supports several transport types between client and server.
STDIO / Subprocess: The MCP server runs as a local subprocess; communication is over stdin/stdout pipes. Useful when client and server run on the same host.
HTTP + SSE (Server-Sent Events): The server is reachable over HTTP; initial request opens SSE, allowing streaming or push.
Streamable HTTP: More modern transport mode allowing clients to receive asynchronous messages (progress, tool results, subscription updates) via HTTP streaming.
Which transport to choose depends on your latency needs, environment, and whether the server will push unsolicited messages (tool updates, notifications).
Stateless server: each request is independent; no persistent session context. Easier to scale horizontally.
Stateful server: the server tracks session IDs (`mcp-session-id`) or conversational context so that clients can rely on follow-ups or tool chaining over multiple steps. The spec allows both modes.
If you plan multi-step workflows or server-initiated messages, you might need stateful design.
MCP servers are now considered OAuth Resource Servers, requiring clients to use scoped access tokens and resource indicators to avoid token misuse.
You must implement tool-level permissions, rate-limiting, input validation, and session timeouts to prevent abuse.
Recent safety audits have exposed risks like cross-tool exfiltration, tool poisoning, and malicious server behavior.
Given this foundation, let’s plan your own MCP server.
Before writing code, make high-level design decisions:
If your server will be co-deployed with the client (e.g. part of a desktop agent), consider STDIO mode.
For networked agents (web clients, distributed systems), prefer Streamable HTTP or HTTP + SSE.
If you need streaming tool responses or push notifications (tool list changes), pick Streamable HTTP.
Stateless: simpler, good for many tool calls that are independent
Stateful: needed if tools have side-effects or rely on multi-step context
If stateful, you’ll need a session store (Redis, in-memory cache, DB) keyed by the `mcp-session-id`.
List the tools you need to expose. For each tool:
Name / identifier
Input schema (types, required vs optional)
Output schema
Side-effect policy (does this tool modify state or external systems?)
Permission level / roles
Also define resource endpoints and prompt templates if needed. A sketch of a tool descriptor follows.
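As one way to capture this checklist in code (the `ToolSpec` dataclass and its fields are illustrative, not part of the MCP spec):

```python
# Hypothetical tool descriptor capturing the checklist above.
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class ToolSpec:
    name: str
    input_schema: Dict[str, Any]    # JSON Schema for arguments
    output_schema: Dict[str, Any]   # JSON Schema for results
    has_side_effects: bool = False  # does it modify external state?
    allowed_roles: List[str] = field(default_factory=lambda: ["user"])

send_email = ToolSpec(
    name="send_email",
    input_schema={
        "type": "object",
        "properties": {"to": {"type": "string"}, "body": {"type": "string"}},
        "required": ["to"],
    },
    output_schema={"type": "object", "properties": {"sent": {"type": "boolean"}}},
    has_side_effects=True,
    allowed_roles=["admin"],
)
```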
You’ll likely want the ability to:
Filter tools per client or session
Dynamically enable/disable tools
Version tools over time
Include monitoring for:
`list_tools` calls
`call_tool` latency, success/fail counts
Session count, memory usage
Security events, invalid calls
If stateless, you can horizontally scale multiple server instances behind load balancers.
For stateful, shard sessions or use sticky sessions / shared session store.
Consider health checks, circuit breakers for tool failures, retry logic.
Here’s an ordered plan to build your MCP server from scratch.
Pick your language/framework (Python, Node.js, Go, TypeScript). For this example, we’ll assume Python + FastAPI with an async approach.
Stack suggestion:
FastAPI (or Starlette) for HTTP
uvicorn as the ASGI server
JSON-RPC 2.0 handling (use `jsonrpcserver` or a custom dispatcher)
Async patterns for streaming
Optional session store (Redis)
You need RPC methods like:
`initialize(params…)`
`list_tools()`
`call_tool(tool_name, params)`
`get_resource(name, params)`
`list_prompts()`
`get_prompt(name, params)`
`subscribe_notifications()` (if supporting push)
You can reuse or adapt these method definitions from the MCP spec.
Create a registry that maps tool names to Python callables (or async functions).
Implement code to validate the input schema and convert JSON params to typed arguments.
When `call_tool` is invoked, dispatch to the registered function, capture results or exceptions, and return structured JSON.
Optionally wrap tool calls in sandboxes or timeouts for security, as sketched below.
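For the timeout piece, a minimal wrapper using `asyncio.wait_for` might look like this (the `call_with_timeout` helper and its error shape are assumptions, not spec-defined):

```python
# A minimal timeout wrapper for tool dispatch (assumes async tools).
import asyncio
from typing import Any, Awaitable, Callable, Dict

async def call_with_timeout(
    fn: Callable[[Dict[str, Any]], Awaitable[Any]],
    params: Dict[str, Any],
    timeout_s: float = 10.0,
) -> Any:
    try:
        return await asyncio.wait_for(fn(params), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Surface a structured error instead of hanging the RPC call.
        return {"error": f"tool timed out after {timeout_s}s"}
```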
`get_resource` returns structured content (e.g. JSON documents).
`list_prompts` / `get_prompt` return predefined instruction templates.
You can store these in JSON files, a DB, or in code, depending on complexity.
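A minimal in-code prompt registry might look like the sketch below; the `PROMPTS` dict and `render_prompt` helper are illustrative placeholders you would swap for JSON files or a DB:

```python
# Hypothetical in-code prompt store; replace with files or a DB as needed.
PROMPTS = {
    "domain_expert": {
        "description": "Frame the assistant as a domain expert.",
        "template": (
            "You are an assistant with deep knowledge of {domain}. "
            "Use the provided context before answering."
        ),
    },
}

def render_prompt(name: str, params: dict) -> str:
    entry = PROMPTS.get(name)
    if entry is None:
        raise KeyError(f"unknown prompt: {name}")
    return entry["template"].format(**params)

# e.g. render_prompt("domain_expert", {"domain": "billing"})
```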
If a tool takes a long time (e.g. web crawling, large model inference), use streaming:
For request-scoped streams, open a streaming HTTP response and push partial results as they arrive.
The client should support streaming updates for `call_tool` (progress, interim results).
Respect flow control: the client expects partial JSON chunks in an agreed framing.
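Here is one way to stream progress with FastAPI's `StreamingResponse`; the endpoint path and newline-delimited JSON framing are assumptions for illustration:

```python
import asyncio
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/rpc/stream")  # hypothetical endpoint for long-running tools
async def stream_tool():
    async def progress():
        for pct in (25, 50, 75, 100):
            await asyncio.sleep(0.5)  # stand-in for real tool work
            yield json.dumps({"progress": pct}) + "\n"
        yield json.dumps({"result": "done"}) + "\n"
    return StreamingResponse(progress(), media_type="application/x-ndjson")
```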
Maintain a session context (in memory or Redis) keyed by `mcp-session-id`.
Store context, intermediate tool outputs, prompts, etc.
Tie repeated tool calls together as part of the same session. A Redis-backed sketch follows.
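Key naming and TTL below are assumptions; this requires the `redis` package with asyncio support:

```python
# Hedged sketch of a Redis-backed session store keyed by mcp-session-id.
import json
import redis.asyncio as redis

r = redis.Redis()  # assumes a local Redis instance

SESSION_TTL_S = 1800  # expire idle sessions after 30 minutes

async def save_session(session_id: str, context: dict) -> None:
    await r.set(f"mcp:session:{session_id}", json.dumps(context), ex=SESSION_TTL_S)

async def load_session(session_id: str) -> dict:
    raw = await r.get(f"mcp:session:{session_id}")
    return json.loads(raw) if raw else {}
```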
Require client to present an access token (Bearer token or JWT) on each request.
Validate the token, its scope, and the resource indicator (the updated spec requires resource indicators to avoid token misuse; see Auth0). A minimal token check is sketched after this list.
On tool registration, enforce per-tool permission (who can call which tool).
Sanitize input to avoid injection or path traversal.
Timeout or cancel long-running function calls if they take too long.
Log malicious or invalid requests.
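As a starting point, a FastAPI dependency can enforce the bearer-token requirement; real deployments should verify JWT signatures, scopes, and resource indicators against an authorization server rather than the placeholder token set used here:

```python
from fastapi import Depends, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

bearer = HTTPBearer()
VALID_TOKENS = {"dev-token-123"}  # placeholder; use real JWT validation

async def require_token(
    creds: HTTPAuthorizationCredentials = Depends(bearer),
) -> str:
    if creds.credentials not in VALID_TOKENS:
        raise HTTPException(status_code=401, detail="Invalid or missing token")
    return creds.credentials

# Attach to the RPC route:
# @app.post("/rpc", dependencies=[Depends(require_token)])
```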
You may allow clients to see only a subset of tools, as in the sketch after this list:
Per-session tool filter (based on client identity)
Dynamic enable/disable flags
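A per-session filter can be as simple as intersecting the registry with an allowlist; the session IDs and allowlist structure below are hypothetical:

```python
from typing import Dict, List, Optional, Set

# None means "all tools"; a set restricts visibility. Session ids are made up.
SESSION_TOOL_ALLOWLIST: Dict[str, Optional[Set[str]]] = {
    "session-abc": {"echo", "add"},
    "session-admin": None,
}

def visible_tools(session_id: str, registry: Dict[str, object]) -> List[str]:
    if session_id not in SESSION_TOOL_ALLOWLIST:
        return []  # unknown sessions see nothing
    allowed = SESSION_TOOL_ALLOWLIST[session_id]
    if allowed is None:
        return list(registry)
    return [name for name in registry if name in allowed]
```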
Unit test each RPC method, including edge cases, erroneous input, and exception handling.
Use a reference MCP client (SDK) to connect, list tools, make calls, simulate streaming.
Test concurrent sessions, tool invocation under load.
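Against the code sketch later in this article, an integration test with FastAPI's `TestClient` might look like this (assumes the server module is named `mcp_server` and pytest as the runner):

```python
from fastapi.testclient import TestClient
from mcp_server import app  # assumes the sketch below lives in mcp_server.py

client = TestClient(app)

def test_list_tools():
    resp = client.post("/rpc", json={"jsonrpc": "2.0", "id": 1, "method": "list_tools"})
    assert resp.status_code == 200
    assert "echo" in resp.json()["result"]

def test_call_tool_add():
    payload = {
        "jsonrpc": "2.0",
        "id": 2,
        "method": "call_tool",
        "params": {"tool_name": "add", "args": {"a": 2, "b": 3}},
    }
    assert client.post("/rpc", json=payload).json()["result"] == {"result": 5}
```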
Dockerize your server.
Use load balancer or API gateway.
Add health/liveness endpoints.
Capture metrics (Prometheus / OpenTelemetry).
Plan for rolling restarts, session persistence if needed.
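Health endpoints are a few lines in FastAPI; the route names below are conventional choices, not mandated by MCP, and in practice you would add them to the same app as your RPC route:

```python
from fastapi import FastAPI

app = FastAPI()  # in practice, reuse your main app instance

@app.get("/healthz")
async def healthz():
    return {"status": "ok"}

@app.get("/readyz")
async def readyz():
    # Probe downstream dependencies (session store, tool backends) here.
    return {"status": "ready"}
```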
Below is a simplified code sketch: not production-grade, but enough to illustrate the architecture.
```python
# mcp_server.py
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import StreamingResponse  # used once streaming is added
import asyncio
import json
from typing import Any, Dict, Callable

app = FastAPI()

# A simple registry of tools
tool_registry: Dict[str, Callable[..., Any]] = {}

def register_tool(name: str):
    """Decorator that registers an async tool function under a name."""
    def decorator(fn):
        tool_registry[name] = fn
        return fn
    return decorator

@register_tool("echo")
async def tool_echo(params: Dict[str, Any]):
    message = params.get("message", "")
    await asyncio.sleep(0.1)  # simulate a little work
    return {"echo": message}

@register_tool("add")
async def tool_add(params: Dict[str, Any]):
    a = params.get("a", 0)
    b = params.get("b", 0)
    return {"result": a + b}

async def handle_call_tool(tool_name: str, params: Dict[str, Any]):
    if tool_name not in tool_registry:
        raise HTTPException(status_code=400, detail="Tool not found")
    fn = tool_registry[tool_name]
    return await fn(params)

@app.post("/rpc")
async def rpc_endpoint(req: Request):
    payload = await req.json()
    method = payload.get("method")
    params = payload.get("params", {})
    id_ = payload.get("id")

    if method == "list_tools":
        result = list(tool_registry.keys())
    elif method == "call_tool":
        tn = params.get("tool_name")
        tparams = params.get("args", {})
        result = await handle_call_tool(tn, tparams)
    else:
        # Per JSON-RPC 2.0, unknown methods get an in-band error object.
        return {
            "jsonrpc": "2.0",
            "id": id_,
            "error": {"code": -32601, "message": "Method not found"},
        }

    return {
        "jsonrpc": "2.0",
        "id": id_,
        "result": result,
    }
```
This minimal server supports `list_tools` and `call_tool`. To extend:
Add streaming support
Add resource & prompt endpoints
Add authorization middleware
Add session handling
Add tool filtering
Building MCP servers introduces unique risks. Based on academic audits and security research:
Tool Poisoning / Tool Misuse: A malicious client or compromised session can invoke unintended tools or chain tools to exfiltrate data.
Cross-Server Attacks: In ecosystems where multiple MCP servers are connected, a “Trojan” server might exploit trust relationships and leak data across servers.
Malicious Server Behavior: A compromised server might deliver malicious prompts or perform side effects secretly.
Token Mis-redemption: Without using resource indicators, access tokens might be misused across servers. The spec now requires resource indicators to scope tokens.
Session Hijacking / Identity Fragmentation: Fragmented or poorly managed identities across systems can lead to privilege escalation.
Use cryptographically safe token systems and require resource indicators
Validate and sanitize all inputs
Enforce least privilege: each client/session gets only required tools
Rate-limit and throttle calls (a simple limiter is sketched after this list)
Use timeout / cancellation for tool calls
Sandbox or isolate tool execution (docker, VM, restricted environment)
Monitor logs and anomaly detection
Use security audits (e.g. MCPSafetyScanner) to detect vulnerabilities in your server implementation
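For the rate-limiting item above, a naive sliding-window limiter illustrates the idea; it is in-memory and single-process only (use Redis or an API gateway for multi-instance deployments), and the limits are arbitrary:

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict

WINDOW_S = 60   # sliding window length in seconds
MAX_CALLS = 30  # max calls per client per window
_calls: Dict[str, Deque[float]] = defaultdict(deque)

def allow_call(client_id: str) -> bool:
    now = time.monotonic()
    q = _calls[client_id]
    while q and now - q[0] > WINDOW_S:
        q.popleft()  # drop timestamps outside the window
    if len(q) >= MAX_CALLS:
        return False
    q.append(now)
    return True
```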
If your server is stateless (no session memory), you can scale multiple instances. Use a load balancer to distribute RPC calls.
For stateful behavior, either:
Use a shared session store (Redis, database)
Shard sessions by client/user
Ensure sticky routing or session affinity
Timeout tool execution, fallback strategies
Graceful shutdown, draining new requests
Circuit breakers if downstream tool fails
Maintain backward compatibility for tool APIs
Version your tool schemas
Support dynamic registration/unregistration of tools
Expose metrics (Prometheus) for:
RPC method latencies
Tool call durations/failures
Session count
Authorization event counts
Also enable structured logs for incident tracing. A minimal metrics setup follows.
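With `prometheus_client`, the metrics above take only a few lines; the metric names and scrape port are illustrative:

```python
from prometheus_client import Counter, Histogram, start_http_server

TOOL_CALLS = Counter(
    "mcp_tool_calls_total", "Tool invocations", ["tool", "status"]
)
TOOL_LATENCY = Histogram(
    "mcp_tool_latency_seconds", "Tool call duration", ["tool"]
)

def record_tool_call(tool: str, duration_s: float, ok: bool) -> None:
    TOOL_CALLS.labels(tool=tool, status="ok" if ok else "error").inc()
    TOOL_LATENCY.labels(tool=tool).observe(duration_s)

start_http_server(9100)  # expose /metrics on :9100
```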
Update tool implementations (hot reload)
Migrate schemas carefully
Monitor dependency updates for security
Here are a few agent workflows where your MCP server shines:
Agent with document search + QA + action tools
MCP server provides tools: `search_docs`, `open_doc_section`, `summarize_section`, `update_record`
Client (agent) lists tools, then picks which to call
Multi-agent orchestration
One MCP server per domain (e.g. CRM, Billing, Analytics)
Agent queries across servers
Prompt-driven dynamic tool selection
The client requests `list_prompts()` or `get_prompt()` to adjust agent behavior
Tool updates / new features rollout
Servers push updated tool lists or deprecations via push mode
These use cases demonstrate the flexibility of your MCP server as the “brain interface” for agent systems.
Once your base MCP server is working, you can extend:
Approval flows: client must request permission before calling sensitive tools
Subscription / notifications: push server messages to client (tool list changes, updates)
Caching & local agent proxies: reduce latency on repeated `list_tools()` calls
Connector chaining or "bridge servers" like MCP Bridge (a proxy architecture) that unify multiple MCP servers behind one façade (arXiv)
Graph orchestration layers (LangGraph integration) so agents plan tool sequences
Access control UI / dashboards to manage which clients get which tools
Developing an MCP Server from scratch is a powerful way to enable agentic AI systems that access external tools and data in a structured, secure way. You now have a blueprint:
Understand MCP concepts, primitives, transports
Design architectural trade-offs (state vs stateless, streaming, tool filtering)
Implement message dispatch, tool registry, security, session handling
Add monitoring, scaling, resilience
Mitigate known security risks (tool poisoning, cross-server attacks, token misuse)
If you’re building an AI agent or platform and want a robust MCP server backend (for FastAPI, Django, or your custom ecosystem), reach out to us for a quote or consultation. We can architect, build, secure, and scale your MCP infrastructure end-to-end.