AI · 9 min read · 2026-05-03

Model Context Protocol (MCP): 0-to-Hero (No Fluff)

MCP collapses the N×M integration problem between AI tools and models. Architecture, primitives, transport, security, and a 10-module learning path — explained simply.

MCP · AI · LLMs · Protocols · Integrations


If you've ever connected an LLM to Slack, GitHub, or Postgres and thought "why am I writing this glue code again?" — this one's for you.


What Even Is MCP?

You have N tools (Slack, GitHub, Postgres, Jira…) and M AI models (Claude, ChatGPT, Cursor, your own agent). Connecting them naively means N × M custom adapters. Maintenance nightmare.

MCP is the USB-C for AI tools. One plug, every device.

Without MCP:                    With MCP:
N tools × M models              N tools + M models
= explosion of glue code        = one shared protocol

Open standard. Created at Anthropic by David Soria Parra and Justin Spahr-Summers and released in November 2024. OpenAI adopted it in March 2025. Now the de facto industry default.


The Three Roles 🧩

Every MCP setup has exactly three players:

┌──────────────────────────────────────────┐
│                  HOST                    │
│   (Claude Desktop, Cursor, ChatGPT…)     │
│                                          │
│    ┌──────────┐    ┌──────────┐          │
│    │ CLIENT 1 │    │ CLIENT 2 │  …       │
│    └────┬─────┘    └────┬─────┘          │
└─────────┼───────────────┼────────────────┘
          │               │
          ▼               ▼
    ┌──────────┐    ┌────────────┐
    │ SERVER A │    │  SERVER B  │
    │ (GitHub) │    │ (Postgres) │
    └──────────┘    └────────────┘

Host — the AI app. Owns the user, the model, permissions.

Client — lives inside the host. 1:1 with one server. Handles session, reconnect, auth.

Server — the actual integration. Talks to GitHub, Postgres, your filesystem, whatever. Exposes capabilities.

Mental model: Host = browser. Client = a browser tab. Server = a website. One browser, many tabs, many sites.


Under the Hood — JSON-RPC 2.0

MCP messages travel as JSON-RPC 2.0. Three message types:

| Type | Needs reply? |
|------|--------------|
| Request | Yes |
| Response | It IS the reply |
| Notification | No (fire-and-forget) |

Bi-directional: both sides can send requests. Transport-agnostic. (JSON-RPC 2.0 supports batching, but MCP's 2025-06-18 revision removed batch support, so don't rely on it.)
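The three message types differ only in which fields they carry. A sketch in Python (method names follow MCP's conventions; the payloads are simplified):

```python
import json

# Request: carries an "id" and expects a matching Response.
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}

# Response: echoes the Request's "id"; carries "result" or "error", never both.
response = {"jsonrpc": "2.0", "id": 1, "result": {"tools": []}}

# Notification: no "id". Fire-and-forget, no reply expected.
notification = {"jsonrpc": "2.0", "method": "notifications/progress",
                "params": {"progress": 0.5}}

wire = json.dumps(request)  # what actually crosses the transport
```

The presence or absence of `id` is the whole distinction: it's how each side matches replies to questions.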


Transport — How Bytes Move

Two flavors:

stdio (local)

Host spawns server as a child process. Talks over stdin/stdout.

Host process ──spawns──► Server process
       │   stdin / stdout pipe   │
       └────────────◄────────────┘
  • Lowest latency
  • No network
  • Best for: Claude Desktop, Cursor, local dev
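A toy version of that spawn-and-pipe loop, with a one-liner echo server standing in for a real MCP server (purely illustrative; a real server would handle `initialize`, `tools/list`, and friends over the same pipe):

```python
import json
import subprocess
import sys

# Stand-in "server": reads one JSON-RPC request per line on stdin
# and writes a matching response to stdout.
SERVER = (
    "import sys, json\n"
    "req = json.loads(sys.stdin.readline())\n"
    "print(json.dumps({'jsonrpc': '2.0', 'id': req['id'],"
    " 'result': {'ok': True}}))\n"
)

# Host spawns the server as a child process and talks over stdin/stdout.
proc = subprocess.Popen(
    [sys.executable, "-c", SERVER],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
request = {"jsonrpc": "2.0", "id": 1, "method": "ping"}
out, _ = proc.communicate(json.dumps(request) + "\n")
response = json.loads(out)
```

No sockets, no TLS, no ports: the process boundary is the transport. That's why stdio wins for local dev.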

Streamable HTTP (remote)

Host ──HTTPS──► Cloud Server
       streamed responses
  • Original 2024-11-05 spec used HTTP + SSE
  • Newer 2025-06-18 spec uses Streamable HTTP (cleaner)
  • Best for: SaaS, multi-tenant, prod deployments

⚠️ Always check the spec version before you build. Transport changed between revisions.


The Four Primitives

A server can expose four things. Learn these — they're 90% of MCP:

1. Resources 📄

Read-only context the model can pull in. Files, DB rows, API responses.

Server says: "Here's a list of files. Each has a URI you can fetch."
Model:       reads the ones it cares about.
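Roughly what that exchange looks like on the wire (URIs and fields are illustrative):

```python
# Server's answer to resources/list: each resource carries a URI
# the client can later fetch via resources/read.
resources = [
    {"uri": "file:///project/README.md", "name": "README",
     "mimeType": "text/markdown"},
    {"uri": "file:///project/schema.sql", "name": "DB schema",
     "mimeType": "text/plain"},
]

# Client picks one it cares about and reads it:
read_request = {"jsonrpc": "2.0", "id": 2, "method": "resources/read",
                "params": {"uri": resources[0]["uri"]}}
```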

2. Tools 🔧

Actions the model can invoke. SQL queries, API calls, shell commands, writes.

Tool: run_sql(query)
Tool: create_github_issue(repo, title, body)
Tool: send_slack_message(channel, text)

The model decides when to call. Probabilistic. Plan for misfires.
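The core shape of a tool — a name, a JSON Schema for its inputs, a handler — can be sketched without any SDK. This registry illustrates the pattern, not the official Python SDK's API:

```python
# Minimal tool registry: each tool advertises a name plus a JSON Schema
# for its inputs, and maps to a plain Python handler.
TOOLS = {}

def tool(name, schema):
    def register(fn):
        TOOLS[name] = {"inputSchema": schema, "handler": fn}
        return fn
    return register

@tool("run_sql", {"type": "object",
                  "properties": {"query": {"type": "string"}},
                  "required": ["query"]})
def run_sql(query):
    # A real server would validate the query and run it against Postgres.
    return f"executed: {query}"

def call_tool(name, arguments):
    # The host's model chose the tool; the server only dispatches.
    return TOOLS[name]["handler"](**arguments)

result = call_tool("run_sql", {"query": "SELECT 1"})
```

The schema is what the model "sees" when deciding whether and how to call — which is exactly why misfires are possible: the schema constrains the arguments, not the decision.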

3. Prompts 📝

Reusable prompt templates the server ships. Domain knowledge baked in.

Prompt: "review_this_pr" → loads a battle-tested review checklist

4. Sampling 🪞 (the unique one)

The server asks the host's LLM to generate text. Inverted control.

Server: "Hey host, can the model summarize this for me?"
Host:   asks user to approve → runs model → returns text

This is what makes MCP more than function-calling. Servers can orchestrate model calls. Powerful + new attack surface — see security section.
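On the wire, sampling is just a JSON-RPC request flowing the "wrong" way, server to client. A sketch following the spec's `sampling/createMessage` shape, trimmed down:

```python
# Server → client: please have the host's model generate a completion.
sampling_request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {"role": "user",
             "content": {"type": "text",
                         "text": "Summarize this changelog in 3 bullets."}}
        ],
        "maxTokens": 200,
    },
}
# The host shows an approval prompt, runs the model, and answers with
# a normal JSON-RPC response carrying the generated text.
```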


Connection Lifecycle

1. Host connects/spawns server
       ↓
2. Capability negotiation (what do you support?)
       ↓
3. Client asks: list tools, resources, prompts
       ↓
4. Model calls tools → client forwards → server runs → result back
       ↓
5. Session ends or reconnects on drop
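Step 2, capability negotiation, is a single `initialize` round-trip. A simplified sketch (the `protocolVersion` and capability keys follow the spec; the names and versions are made up):

```python
# Client → server: who I am, which spec revision, what I can do.
initialize = {
    "jsonrpc": "2.0", "id": 0, "method": "initialize",
    "params": {
        "protocolVersion": "2025-06-18",
        "capabilities": {"sampling": {}},
        "clientInfo": {"name": "my-host", "version": "0.1.0"},
    },
}

# Server → client: what I support in return.
initialized = {
    "jsonrpc": "2.0", "id": 0,
    "result": {
        "protocolVersion": "2025-06-18",
        "capabilities": {"tools": {}, "resources": {}},
        "serverInfo": {"name": "github-server", "version": "1.2.0"},
    },
}
```

Everything after this handshake (step 3 onward) is scoped by what each side declared here.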

When MCP Shines ✨

💻 IDE assistants (Cursor, Claude Desktop) One server → many editors. Slack + GitHub + tests in one place.

🗄️ Live DB access Dynamic queries beat stale snapshots.

🏢 Enterprise tooling (Jira, Linear, Confluence) Build integration once. Every AI client uses it.

🤖 Multi-agent pipelines Real-time grounding without per-model adapters.

🔄 Vendor independence Swap Claude ↔ ChatGPT without rewriting integrations.

MCP does NOT replace RAG. It's the pipe to your vector DB. You still need embeddings, indexing, retrieval logic. MCP solves connectivity, not retrieval strategy.

When NOT to Use MCP 🚫

  • Sub-millisecond critical paths → use direct in-process SDK
  • Single model, single tool, never going to grow → just call the API
  • Simple chat app with no external systems → you don't need this

Pros & Cons — Honest Take

✅ Wins

🌐 Portability Works across any compliant host. No LLM lock-in.

🔌 Build once, plug many One server → Claude + ChatGPT + Cursor + custom agents.

🔍 Dynamic discovery Clients learn capabilities at runtime. No compile-time coupling.

🧱 Clean separation Model layer ≠ business layer. Easier to test, easier to secure.

🪞 Sampling Server-driven LLM workflows. Function-calling can't do this.


❌ Costs

⚙️ Operational weight Server = real distributed system. Monitoring, rate-limits, scaling — not a thin wrapper.

🌱 Ecosystem immaturity Lots of community servers. Few have SLAs, audits, or maintenance commitments.

🔐 Auth complexity OAuth 2.1, token refresh, secret rotation. Not trivial to get right.

🎲 Non-determinism Model picks when to call. Plan for hallucinated or skipped tool calls.

🐢 Latency overhead Protocol layer adds round-trips. Bad fit for sub-millisecond paths.

🚨 Sampling = new attack surface Compromised server can trigger model calls inside host context. Log + rate-limit.


MCP vs. Alternatives ⚖️

Same job, different trade-offs. Pick by use case, not hype.

🟢 MCP

  • Standard across models ✅
  • Runtime discovery ✅
  • Multi-client support ✅
  • Server-initiated LLM (sampling) ✅
  • Latency: Medium
  • Cost to build: High
  • Best for: Multi-model fleets, shared integration layers

⚡ Direct Function Calling

  • Standard across models ❌ (model-specific SDK)
  • Runtime discovery ❌ (hardcoded)
  • Multi-client support ❌
  • Sampling ❌
  • Latency: Lowest 🏆
  • Cost to build: Low
  • Best for: Hot path, single model, single integration

🔗 LangChain Tools

  • Standard across models ❌ (framework-locked)
  • Runtime discovery: Partial
  • Multi-client support ❌
  • Sampling ❌
  • Latency: Medium
  • Cost to build: Medium
  • Best for: Apps already built on LangChain

📚 RAG (Vector DB)

  • Standard across models: N/A (not a tool layer)
  • Runtime discovery ❌
  • Multi-client support ❌
  • Sampling ❌
  • Latency: Medium
  • Cost to build: Medium
  • Best for: Static, pre-indexed knowledge retrieval

Quick rule: One model + one tool = direct call. Many models + many tools = MCP. Static docs lookup = RAG. Already on LangChain = LangChain tools.


Security — Wider Than You Think 🔒

Tool calling has a small attack surface. MCP has a bigger one. Plan for it.

  • OAuth 2.1 — auth without leaking creds to the model
  • Sampling abuse — log + rate-limit every server-initiated model call
  • Approval gates — human-in-the-loop for writes; design UX so people don't disable it
  • Open-source server vetting — read the code before prod use. Many have zero review.
  • Transport — TLS 1.3+, request signing for HTTP servers
  • Secrets — vaults (Vault, AWS Secrets Manager), not env vars in source
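Rate-limiting server-initiated model calls (the sampling bullet above) can start as simple as a sliding-window counter per server. A sketch, not production code:

```python
import time
from collections import defaultdict, deque

class SamplingLimiter:
    """Allow at most `limit` sampling calls per server per `window` seconds."""

    def __init__(self, limit=5, window=60.0):
        self.limit = limit
        self.window = window
        self.calls = defaultdict(deque)  # server_id -> call timestamps

    def allow(self, server_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.calls[server_id]
        while q and now - q[0] > self.window:
            q.popleft()          # drop timestamps outside the window
        if len(q) >= self.limit:
            return False         # over budget: reject (and log it)
        q.append(now)
        return True

limiter = SamplingLimiter(limit=2, window=60.0)
```

Pair this with audit logging so a compromised server shows up as a spike, not a mystery bill.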

The 10-Module Path 🎓

A practical learning ladder. Pick a weekend per module.

Module 1 — Foundations (2–3h)

What MCP is. The N×M problem. USB-C analogy. Hosts/clients/servers.

Module 2 — JSON-RPC + Transport (3–4h)

Request/response/notification. stdio vs. Streamable HTTP. Hands-on: a tiny JSON-RPC client.

Module 3 — Primitives Deep Dive (3–4h)

Resources, Tools, Prompts, Sampling. Map each to a real use case.

Module 4 — Build Your First Server (Python) (4–5h)

  • Install Python MCP SDK
  • Tool: read a local file
  • Resource: list a directory
  • Wire to Claude Desktop via stdio
  • Watch the model call your code 🎉
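Wiring into Claude Desktop is a config entry rather than code. A sketch of the `claude_desktop_config.json` shape, assuming your server lives at a path like the hypothetical one below:

```json
{
  "mcpServers": {
    "my-first-server": {
      "command": "python",
      "args": ["/absolute/path/to/server.py"]
    }
  }
}
```

On restart, Claude Desktop spawns the server over stdio and your tools appear in the client.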

Module 5 — Same Thing, Node.js (4–5h)

  • TypeScript MCP SDK
  • Add a Prompt primitive
  • Compare DX with Python

Module 6 — Remote Server (5–6h)

  • Streamable HTTP transport
  • CORS, sessions, keepalive
  • Streaming partial results
  • Deploy to Vercel / AWS / GCP
  • Exponential backoff retries
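The retry bullet above as a small helper (illustrative; "full jitter" is a common choice, not mandated by MCP):

```python
import random
import time

def with_backoff(fn, max_attempts=5, base=0.5, cap=30.0):
    """Retry fn() on ConnectionError with exponential backoff + full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                      # out of attempts: re-raise
            delay = random.uniform(0, min(cap, base * 2 ** attempt))
            time.sleep(delay)              # back off before retrying

# Demo: fails twice, then succeeds.
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

result = with_backoff(flaky_call, base=0.01, cap=0.05)
```

Jitter matters for remote servers: without it, every reconnecting client hammers the server in lockstep after an outage.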

Module 7 — Auth + Approval + Security (5–6h)

  • OAuth 2.1 flow end-to-end
  • Approval UX that survives daily use
  • Audit logging
  • Vault integration

Module 8 — Testing + Observability (4–5h)

  • Mock clients for unit tests
  • End-to-end integration tests
  • Structured logs with correlation IDs
  • Distributed tracing

Module 9 — Production Patterns (5–6h)

  • Caching idempotent reads
  • Circuit breakers
  • Per-user / per-tool rate limits
  • Capability versioning
  • Health checks + alerting

Module 10 — Capstone (8–10h)

Pick one. Or all three.

A — Docs Retrieval Bot 📚 Server wraps Pinecone/Weaviate. Claude asks → semantic search → top-K back.

B — SQL Analyst 🗄️ Server exposes SQL tools over Postgres. Schema introspection, query validation, row caps, approval on writes.

C — GitHub Automation 🤖 Server exposes repo + CI tools. Claude reviews PRs, files issues, runs workflows. Wire it into Cursor.


TL;DR — The 5 Things to Remember

  1. MCP is a protocol, not a framework. Your business logic still lives in your server.
  2. Build once, plug many. One server → every AI client.
  3. Sampling is the differentiator. Server-initiated LLM calls = workflows function-calling can't express. Also new attack surface.
  4. Treat servers like prod services. Auth, monitoring, rate-limits, versioning. Not toys.
  5. Pick the right tool. MCP wins for multi-client. Direct SDK wins for single-model hot paths.

Action item: Build a 50-line MCP server today. Wire it to Claude Desktop. Watch the model call your code. That's the fastest path from zero to fluent.

Written after enough MCP server reviews to develop strong opinions. Every analogy had to pass the "would I have understood this on day one?" test.


Anoop Singh

Tech Lead & AI Architect