Building the Quality Layer for MCP: MCP Playground & mcpx

The Model Context Protocol went from Anthropic-only to industry standard in under six months. OpenAI, Google DeepMind, Microsoft, Stripe, Linear, Notion, Vercel, Supabase — they all support it now. There are 692 servers in the official registry as I write this, and more ship every week.

But there was a problem nobody had solved: no good way to test them, and no quality bar for what "good" even looks like.

This is the story of how I built MCP Playground and mcpx to fix that.

The Problem

When I first started exploring the MCP ecosystem, the workflow for testing any server was painful:

Find a server in the registry
Clone the repo
Install dependencies locally
Wire up Claude Desktop or write a throwaway client script
Hope the schema is documented somewhere
Give up if it requires auth or the docs are missing

Most developers gave up at step 4, wiring up a client. And about 95% of servers in the registry are npm packages with no remote endpoint, meaning you had to install them locally just to see what tools they expose.

I wanted the Postman moment for MCP. Paste a URL, click connect, start testing in seconds. So I built it.

MCP Playground — Test Any MCP Server in Your Browser

MCP Playground is a browser-based tool for discovering, connecting to, and testing MCP servers. No installation. No local setup. You paste a URL and you're in.

It supports three remote transports: Streamable HTTP, SSE, and WebSocket. It auto-generates input forms from each tool's JSON Schema, executes tool calls, shows structured results, and logs every JSON-RPC message in a traffic inspector so you can debug protocol-level issues visually.

For the registry's many npm-only servers, I built something more ambitious.

The In-Browser Sandbox

The sandbox runs any npm MCP server entirely in your browser using WebContainers, a WebAssembly-based Node.js runtime that executes inside the browser tab with no access to your machine's filesystem or processes. Click a button and it installs the package, boots the process, and connects over stdio. The server runs in your tab, not on our backend.

This sounds straightforward. It wasn't.

Problem 1: Echo filtering. WebContainers echo stdin back to stdout. MCP's stdio transport communicates over stdin/stdout, and the SDK reads stdout to parse JSON-RPC messages. The echoed input was being parsed as responses, which completely broke the protocol. I had to build a custom transport layer that strips echoed input before passing stdout to the SDK.
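A minimal sketch of that filtering step follows. It makes two assumptions. First, each echo arrives as a complete line identical to what was written, since MCP's stdio transport frames each JSON-RPC message as one newline-terminated line. Second, an echo can be matched by exact text. `EchoFilter`, `recordWrite`, and `push` are hypothetical names, not the playground's actual transport.

```typescript
// Hypothetical echo filter for a stdio transport running inside a WebContainer.
// Assumption: every echoed line is textually identical to a line we wrote to stdin.
class EchoFilter {
  // Messages written to stdin that we still expect to see echoed back.
  private pending: string[] = [];
  // Holds a partial line until its terminating newline arrives.
  private buffer = "";

  // Call with every serialized JSON-RPC message sent to the server's stdin.
  recordWrite(message: string): void {
    this.pending.push(message.trim());
  }

  // Feed raw stdout chunks; returns only lines that are genuine server output.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last piece is an incomplete line ("" if the chunk ended with "\n").
    this.buffer = lines.pop() ?? "";
    const out: string[] = [];
    for (const raw of lines) {
      const line = raw.trim(); // also strips a trailing "\r" from terminal-style echoes
      if (line === "") continue;
      const idx = this.pending.indexOf(line);
      if (idx !== -1) {
        // Our own input echoed back: drop it and stop expecting it.
        this.pending.splice(idx, 1);
        continue;
      }
      out.push(line);
    }
    return out;
  }
}
```

The filter matches against a queue of pending writes instead of dropping every line that looks like a request. That keeps genuine server-to-client requests, such as sampling, intact.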

Problem 2: Transport vs. process lifecycle. The MCP SDK calls transport.close() internally when initialization fails. My transport's close() method killed the underlying WebContainer process — making any retry attempt impossible because the process was already dead. The fix was separating transport lifecycle from process lifecycle entirely: closing the transport no longer touches the process, so you can reconnect without restarting.
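The separation can be sketched like this. The method names mirror the `start`/`send`/`close` methods of the MCP TypeScript SDK's `Transport` interface. The callbacks that interface also defines (`onmessage`, `onclose`, `onerror`) are omitted. `ProcessHandle` and `ServerSession` are hypothetical stand-ins for the WebContainer process wrapper.

```typescript
// Hypothetical handle to a long-lived server process (e.g. one spawned in a WebContainer).
interface ProcessHandle {
  write(data: string): void;
  kill(): void;
  readonly alive: boolean;
}

// The transport owns only its own open/closed state; it never kills the process.
class ProcessBackedTransport {
  private open = false;
  private readonly proc: ProcessHandle;

  constructor(proc: ProcessHandle) {
    this.proc = proc;
  }

  async start(): Promise<void> {
    if (!this.proc.alive) throw new Error("server process is not running");
    this.open = true;
  }

  async send(message: object): Promise<void> {
    if (!this.open) throw new Error("transport is closed");
    // stdio framing: one JSON-RPC message per line.
    this.proc.write(JSON.stringify(message) + "\n");
  }

  // The SDK may call this when initialization fails; the process survives.
  async close(): Promise<void> {
    this.open = false;
  }
}

// A separate owner controls the process, so a retry only needs a fresh transport.
class ServerSession {
  private readonly proc: ProcessHandle;

  constructor(proc: ProcessHandle) {
    this.proc = proc;
  }

  newTransport(): ProcessBackedTransport {
    return new ProcessBackedTransport(this.proc);
  }

  // The only place the process is ever killed.
  shutdown(): void {
    this.proc.kill();
  }
}
```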

Problem 3: React Strict Mode double-mount. In development, React 18's Strict Mode mounts, unmounts, and immediately remounts components to help surface side effects. A simple boolean abort flag (let aborted = false) doesn't survive this: the second mount resets it, so the first mount's cleanup had no effect and two WebContainer instances would spin up simultaneously. I switched to a version counter pattern: each mount gets an incrementing ID, and every async operation checks whether its ID still matches the current one before doing anything.
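Here is the version-counter guard in isolation, written outside React so it runs standalone. In a component, the counter would typically live in a `useRef` and `mount` would be the body of a `useEffect`. `bootSandbox` is a hypothetical stand-in for booting a WebContainer.

```typescript
// Shared counter; in React this would typically be a useRef value.
let currentVersion = 0;

// Hypothetical stand-in for booting a WebContainer; resolves on a later tick.
function bootSandbox(label: string): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve(label), 0));
}

const booted: string[] = [];

// Plays the role of a useEffect body; returns its cleanup function.
function mount(label: string): () => void {
  const myVersion = ++currentVersion; // each mount claims a new ID
  bootSandbox(label).then((instance) => {
    // A newer mount or an unmount has moved the counter on: discard this result.
    if (myVersion !== currentVersion) return;
    booted.push(instance);
  });
  // Cleanup invalidates in-flight work, but only if this mount is still current.
  return () => {
    if (myVersion === currentVersion) currentVersion++;
  };
}
```

To simulate Strict Mode, call `mount`, run its cleanup, and call `mount` again. Only the second boot's result is kept, even though both promises resolve.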

These were the kinds of bugs you only find by building the thing.

What Else It Does

Beyond the core playground and sandbox, MCP Playground includes:

Registry browser — explore all 692 servers from the official MCP Registry with filters, search, and category pills
Server detail pages — every server at /server/[id] with registry metadata, install commands, and a live quality scan
Shareable links — deep-link to a specific server, tool, and pre-filled arguments with autorun=1 to auto-execute on open
Embed support — drop a live playground into your docs site via iframe, or add a "Try in Playground" badge to your README
Add to IDE — one-click config generation for Claude Desktop, Cursor, and Claude Code CLI
Auth header support — bring your own API keys; stored in sessionStorage only, never sent to our backend
Traffic inspector — every JSON-RPC message between client and server, live
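As an illustration of the "Add to IDE" idea, this hypothetical helper builds the `mcpServers` JSON that Claude Desktop reads from `claude_desktop_config.json` to launch an npm-published server over stdio. Cursor's `mcp.json` uses the same top-level key. The playground's actual generator and its Claude Code CLI output are not shown here.

```typescript
// One stdio server entry in the mcpServers map.
interface StdioServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

// Hypothetical helper: builds a config snippet that runs an npm package via npx.
function claudeDesktopConfig(
  name: string,
  npmPackage: string,
  extraArgs: string[] = [],
  env?: Record<string, string>,
): string {
  const entry: StdioServerEntry = {
    command: "npx",
    args: ["-y", npmPackage, ...extraArgs],
  };
  // Only include env (e.g. API keys) when the caller supplies some.
  if (env && Object.keys(env).length > 0) entry.env = env;
  return JSON.stringify({ mcpServers: { [name]: entry } }, null, 2);
}
```

The `-y` flag lets `npx` install the package without an interactive prompt. That matters because the client launches the command non-interactively.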

The Quality Problem

After building the playground, I kept running into a different problem. You connect to a server and have no idea if it's any good. Tools with no descriptions. Parameters with no types. Schemas that burn thousands of tokens before a single call is made.

This is a real cost. Every token in a server's schema occupies the agent's context window. A poorly designed MCP server can consume 10,000+ tokens in tool definitions alone, before any work is done. This is part of why the MCP vs. CLI debate is heating up: agents using MCP often pay a massive "context tax" that bash commands simply don't have.
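To make the "context tax" concrete, here is a rough estimator. It assumes the common rule of thumb of about four characters per token. Real tokenizers vary by model, so treat the result as an order-of-magnitude figure, not what the linter actually computes. Under that assumption, 10,000 tokens corresponds to roughly 40,000 characters of serialized tool definitions.

```typescript
// Minimal shape of a tool definition as returned by an MCP tools/list call.
interface ToolDefinition {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>;
}

// Assumed heuristic: ~4 characters of English/JSON per token.
const CHARS_PER_TOKEN = 4;

// Estimates how many context tokens the tool definitions occupy.
function estimateSchemaTokens(tools: ToolDefinition[]): number {
  const serialized = JSON.stringify(tools);
  return Math.ceil(serialized.length / CHARS_PER_TOKEN);
}
```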

But here's what I think the debate is getting wrong: it's not the protocol. It's the servers.

So I built the quality layer.

The Schema Linter

The linter at mcpplayground.tech/lint runs 19 rules against any MCP server and grades it A–F:

Are tool descriptions present and meaningful (not just 5 words)?
Are all parameters typed with JSON Schema?
Are required fields declared?
Is the total token footprint under control?
Are resource URIs well-structured?
Does the server expose any tools at all?
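A few of these checks can be sketched as a function over one tool definition. The 20-character minimum and the error/warning split come from the sample mcpx output later in this post. The following are assumptions:

- the 400-character maximum
- the severity for untyped parameters
- all names in the sketch

This covers a handful of rules, not the linter's full set of 19.

```typescript
type Severity = "error" | "warning";

interface Finding {
  severity: Severity;
  tool: string;
  message: string;
}

interface JsonSchemaProperty {
  type?: string | string[];
  description?: string;
}

interface Tool {
  name: string;
  description?: string;
  inputSchema: {
    type?: string;
    properties?: Record<string, JsonSchemaProperty>;
    required?: string[];
  };
}

const MIN_DESCRIPTION_CHARS = 20; // from the sample output
const MAX_DESCRIPTION_CHARS = 400; // assumption

function lintTool(tool: Tool): Finding[] {
  const findings: Finding[] = [];
  const add = (severity: Severity, message: string) =>
    findings.push({ severity, tool: tool.name, message });

  // Tool description: present, meaningful, but not bloated.
  const desc = (tool.description ?? "").trim();
  if (desc.length < MIN_DESCRIPTION_CHARS) {
    add("error", `description too short (${desc.length} chars, min ${MIN_DESCRIPTION_CHARS})`);
  } else if (desc.length > MAX_DESCRIPTION_CHARS) {
    add("warning", `Description is ${desc.length} chars, consider trimming`);
  }

  // Every parameter should be typed and described.
  const props = tool.inputSchema.properties ?? {};
  for (const [key, prop] of Object.entries(props)) {
    if (prop.type === undefined) add("error", `Property "${key}" has no type`); // severity assumed
    if (!prop.description) add("warning", `Property "${key}" has no description`);
  }

  // Tools that take parameters should declare which ones are required.
  if (Object.keys(props).length > 0 && tool.inputSchema.required === undefined) {
    add("warning", "missing required fields declaration");
  }
  return findings;
}
```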

Scores start at 100. Each error deducts 15 points. Each warning deducts 5. A server with zero tools, resources, and prompts automatically fails regardless of score.

Grade	Score
A	90–100
B	75–89
C	60–74
D	40–59
F	0–39
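The scoring and grading rules translate directly into code. This sketch follows the numbers above. Clamping the score at 0 is an assumption suggested by the F band starting at 0, and the type names are hypothetical.

```typescript
type Grade = "A" | "B" | "C" | "D" | "F";

interface LintSummary {
  errors: number;
  warnings: number;
  tools: number;
  resources: number;
  prompts: number;
}

// Start at 100, subtract 15 per error and 5 per warning (clamped at 0: assumption).
function score(s: LintSummary): number {
  return Math.max(0, 100 - 15 * s.errors - 5 * s.warnings);
}

function grade(s: LintSummary): Grade {
  // A server that exposes nothing fails regardless of its score.
  if (s.tools + s.resources + s.prompts === 0) return "F";
  const n = score(s);
  if (n >= 90) return "A";
  if (n >= 75) return "B";
  if (n >= 60) return "C";
  if (n >= 40) return "D";
  return "F";
}
```

For example, one error and two warnings give 100 − 15 − 10 = 75, the bottom of the B band.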

The Quality Dashboard — Real Data

I built a registry-wide quality dashboard at mcpplayground.tech/quality that scans every live MCP server and grades them all. Here are the results from 641 servers scanned:

127 publicly reachable (18%)
301 completely unreachable
213 require authentication
58% of reachable servers graded A

The most surprising finding: 4 out of 5 MCP servers in the official registry can't be reached without credentials or are simply offline. If your AI agent tries to discover and use MCP servers dynamically, it fails 82% of the time before it even gets to evaluate the schema quality.

The servers that are public and reachable are actually reasonably well built (58% A grade). The ecosystem's problem isn't that developers write bad schemas — it's that most servers were never designed to be publicly testable.

mcpx — Lint in the Terminal, Enforce in CI

A browser linter is useful, but you can't run a browser tab in a CI pipeline.

I built @samsec/mcpx to bring the same linting engine to the terminal and CI systems. It's a thin wrapper over the MCP Playground public API — no backend to set up, no infrastructure to manage.

# No install needed
npx @samsec/mcpx lint https://your-server.com/mcp

MCP Playground — Schema Linter
─────────────────────────────────────────
Server : My MCP Server v1.0.0
Tools : 12 · Resources: 0 · Prompts: 0
Grade : B · Score: 85/100
Tokens : ~1,240 estimated

ERRORS (1)
  ✗ tool "create_payment" — description too short (8 chars, min 20)

WARNINGS (3)
  ⚠ tool "refund" — missing required fields declaration
  ⚠ tool "search" — Property "q" has no description
  ⚠ tool "search" — Description is 420 chars — consider trimming

The Exit Code Design

One decision I thought hard about: exit codes. Most CLI tools use two states — 0 for success, 1 for failure. I use three:

0 — passed
1 — failed (errors found, grade below threshold, budget exceeded)
2 — unreachable (server is down, requires auth, or timed out)

The distinction between 1 and 2 matters in CI. "I linted your server and it's bad" is a different problem from "I couldn't reach your server at all." A deployment pipeline should handle those differently — one is a schema quality failure, the other is an infrastructure failure.
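On the consuming side, a pipeline step might branch on the three codes as follows. The code-to-meaning mapping is the one documented above. The `ciAction` policy and all names are hypothetical.

```typescript
// Outcomes for the three exit codes documented for mcpx, plus a catch-all.
type LintOutcome = "passed" | "quality-failure" | "infrastructure-failure" | "unexpected";

function classifyMcpxExit(code: number): LintOutcome {
  switch (code) {
    case 0:
      return "passed";
    case 1:
      return "quality-failure"; // errors, grade below threshold, budget exceeded
    case 2:
      return "infrastructure-failure"; // server down, requires auth, or timed out
    default:
      return "unexpected"; // e.g. a crash in the CLI itself
  }
}

// Hypothetical pipeline policy built on that distinction.
type CiAction = "continue" | "block-merge" | "retry-later";

function ciAction(outcome: LintOutcome): CiAction {
  switch (outcome) {
    case "passed":
      return "continue";
    case "quality-failure":
      return "block-merge"; // the schema itself needs fixing
    case "infrastructure-failure":
      return "retry-later"; // says nothing about schema quality yet
    case "unexpected":
      return "block-merge"; // fail closed when the result can't be interpreted
  }
}
```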

Enforcing Standards in CI

# Fail CI if grade drops below B
mcpx lint https://your-server.com/mcp --min-grade B

# Fail CI if token footprint exceeds budget
mcpx lint https://your-server.com/mcp --token-budget 5000

# Fail on any warnings, not just errors
mcpx lint https://your-server.com/mcp --fail-on warnings

# Machine-readable output
mcpx lint https://your-server.com/mcp --format json | jq '.grade'

Catching Regressions with mcpx diff

The diff command compares two server versions — useful for staging vs. production checks:

mcpx diff --base https://prod.com/mcp --head https://staging.com/mcp

Grade : A → B ✖ dropped
Score : 95 → 82 (−13)
Tokens : ~800 → ~1,240 (+55.0%)

If your staging server's token footprint ballooned or the grade dropped, the diff exits non-zero and blocks the merge. You can configure thresholds: --score-drop 10 fails if score drops more than 10 points, --token-threshold 20 fails if tokens increase more than 20%.
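The threshold logic behind those flags can be sketched as follows. It reproduces the arithmetic in the example above: a 13-point drop, and (1,240 − 800) / 800 = 55% token growth. Treating any grade drop as a failure follows the sentence above. How mcpx behaves when the threshold flags are omitted isn't specified, so the sketch skips those checks in that case. All names are hypothetical.

```typescript
// Best to worst; a higher index is a worse grade.
const GRADE_ORDER = ["A", "B", "C", "D", "F"];

interface Snapshot {
  grade: string;
  score: number;
  tokens: number;
}

interface DiffThresholds {
  scoreDrop?: number; // mirrors --score-drop: max allowed drop, in points
  tokenThreshold?: number; // mirrors --token-threshold: max allowed growth, in percent
}

// Returns reasons to fail; an empty array means the diff passes.
function diffFailures(base: Snapshot, head: Snapshot, t: DiffThresholds): string[] {
  const reasons: string[] = [];

  if (GRADE_ORDER.indexOf(head.grade) > GRADE_ORDER.indexOf(base.grade)) {
    reasons.push(`grade dropped ${base.grade} → ${head.grade}`);
  }

  const drop = base.score - head.score;
  if (t.scoreDrop !== undefined && drop > t.scoreDrop) {
    reasons.push(`score dropped ${drop} points (limit ${t.scoreDrop})`);
  }

  if (t.tokenThreshold !== undefined && base.tokens > 0) {
    // Multiply before dividing so round percentages stay exact.
    const growthPct = ((head.tokens - base.tokens) * 100) / base.tokens;
    if (growthPct > t.tokenThreshold) {
      reasons.push(`tokens grew ${growthPct.toFixed(1)}% (limit ${t.tokenThreshold}%)`);
    }
  }
  return reasons;
}
```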

The GitHub Action

- uses: sameenchand/mcpx@v1
  with:
    url: ${{ secrets.MCP_SERVER_URL }}
    fail-on: errors
    min-grade: B
    token-budget: 5000

The action surfaces warnings and errors as native GitHub annotations — they appear inline in the PR diff without any extra configuration. Four outputs are available: grade, score, errors, warnings.
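GitHub turns specially formatted lines on a step's stdout (workflow commands such as `::warning::` and `::error::`) into annotations. The `warning()` and `error()` helpers in `@actions/core` emit the same commands. Whether the mcpx action uses these helpers is an assumption, and the `Finding` shape is hypothetical. GitHub attaches an annotation to a specific file and line only when `file` and `line` properties are given; this sketch sets only `title`.

```typescript
interface Finding {
  severity: "error" | "warning";
  tool: string;
  message: string;
}

// Escaping for the message part of a workflow command.
function escapeData(s: string): string {
  return s.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
}

// Property values (like title) additionally escape ":" and ",".
function escapeProperty(s: string): string {
  return escapeData(s).replace(/:/g, "%3A").replace(/,/g, "%2C");
}

// Formats one finding as a GitHub Actions annotation command.
function toAnnotation(f: Finding): string {
  const title = escapeProperty(`mcpx: ${f.tool}`);
  return `::${f.severity} title=${title}::${escapeData(f.message)}`;
}
```

Printing each finding with `console.log(toAnnotation(finding))` inside the action step is enough for GitHub to pick it up. Escaping `%` and line breaks keeps multi-line messages from being cut off.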

The Public REST API

Everything the playground and mcpx do is available as a free, CORS-enabled REST API:

# Health check
curl "https://mcpplayground.tech/api/v1/health?url=https://mcp.example.com/mcp"

# Full inspection
curl "https://mcpplayground.tech/api/v1/inspect?url=https://mcp.example.com/mcp"

# Schema lint
curl "https://mcpplayground.tech/api/v1/lint?url=https://mcp.example.com/mcp"

# Registry search
curl "https://mcpplayground.tech/api/v1/registry/servers?q=filesystem&limit=5"

Rate limited per IP. No API key required. Use it in your own pipelines.
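From TypeScript, the same endpoints can be called with `fetch`. This sketch percent-encodes the target URL with `URLSearchParams`, which matters if the server URL carries its own query string. The response body is left as `unknown` because its shape isn't documented here. Treating HTTP 429 as the rate-limit signal is an assumption.

```typescript
const API_BASE = "https://mcpplayground.tech/api/v1";

type ServerEndpoint = "health" | "inspect" | "lint";

// Builds an endpoint URL with the target server URL safely percent-encoded.
function apiUrl(endpoint: ServerEndpoint, serverUrl: string): string {
  const u = new URL(`${API_BASE}/${endpoint}`);
  u.searchParams.set("url", serverUrl);
  return u.toString();
}

function registrySearchUrl(query: string, limit: number): string {
  const u = new URL(`${API_BASE}/registry/servers`);
  u.searchParams.set("q", query);
  u.searchParams.set("limit", String(limit));
  return u.toString();
}

// Calls the lint endpoint; the result shape is not assumed.
async function lintServer(serverUrl: string): Promise<unknown> {
  const res = await fetch(apiUrl("lint", serverUrl));
  if (res.status === 429) throw new Error("rate limited; retry later"); // assumed status
  if (!res.ok) throw new Error(`lint request failed: HTTP ${res.status}`);
  return res.json();
}
```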

The Full Loop

Before MCP Playground and mcpx, testing an MCP server required local setup. Linting didn't exist. CI quality gates didn't exist.

Now the full loop exists:

Test in the browser → Lint in the terminal → Enforce in CI

Every MCP server author can paste their URL into the linter and get a grade in 10 seconds. Every team shipping an MCP server can add one CI step and know immediately when their schema quality regresses.

What's Next

Server monitoring — track uptime over time, alert authors when their server goes down
Verified badge — servers that maintain grade A or B get a certification badge for their README
User accounts — saved servers, personal dashboards, team workspaces
MCP Server Hosting — the big one. Host and serve MCP servers on managed infrastructure. Authors build the server, we run it. Nobody is doing this yet.

Try It

Playground
Lint your server
Quality Dashboard
Docs
GitHub (Playground)
GitHub (mcpx)

Everything is open source. If you're building an MCP server, run the linter on it. I'm curious what grade yours gets.

Questions or feedback? Reach me by email or find me on LinkedIn.