MCP Basics
In my recent post on how to improve LLMs, I introduced a few common notions. What I did not talk about was MCP (Model Context Protocol). It doesn’t quite fit into the mould, but it’s a concept that has generated a lot of buzz. So let’s talk about what it is and when it’s useful.
The basic scenario
Recall that an AI agent, in the most basic sense, is an LLM that can use tools. It runs in a loop until some specified task is done. Now, how do we hook up an LLM like ChatGPT to a tool we’d like it to use? If you are the maintainer of the LLM, you can simply integrate the capabilities directly into your system. Ask ChatGPT for a picture of something, and it will access its image generation tool. But what about all the other third-party tools?
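To make that loop concrete, here is a minimal sketch of an agent loop in Python. The `call_llm` function is a stub standing in for a real model API, so the example is self-contained; a real agent would send the conversation plus tool schemas to an actual LLM.

```python
# A minimal agent-loop sketch. `call_llm` is a stub standing in for a real
# model API; it always asks to run the "echo" tool once, then declares the
# task done.

def call_llm(messages, tools):
    if any(m["role"] == "tool" for m in messages):
        return {"type": "final", "content": "done"}
    return {"type": "tool_call", "name": "echo", "arguments": {"text": "hi"}}

def run_agent(task, tools):
    messages = [{"role": "user", "content": task}]
    while True:  # loop until the model says the task is done
        reply = call_llm(messages, tools)
        if reply["type"] == "final":
            return reply["content"]
        # The model asked to use a tool: run it and feed the result back.
        result = tools[reply["name"]](**reply["arguments"])
        messages.append({"role": "tool", "content": result})

tools = {"echo": lambda text: text.upper()}
print(run_agent("say hi", tools))
```

The whole trick of an agent is in that `while` loop: the model decides, the harness executes, and the result goes back into the conversation.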
Enter MCP. It’s a protocol, a standardized way for extending an AI agent’s capabilities with those of another tool. Skipping over the technical details, the idea is that the third-party tool provider has an MCP Server running that you can point your AI tool toward. From that server, the AI tool gets, in plain language, a list of capabilities and how to invoke them.
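Concretely, the discovery step looks roughly like this. MCP runs over JSON-RPC, and the client asks the server for its tool list; the field names below follow the shape of the MCP spec’s `tools/list` exchange, but the `create_form` tool itself is an invented example, not any real server’s schema.

```python
# A sketch of the MCP discovery exchange (JSON-RPC messages shown as Python
# dicts). The AI tool asks "what can you do?" and gets back, for each tool,
# a plain-language description plus a schema for how to invoke it.

list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                # An invented example tool, not a real server's schema.
                "name": "create_form",
                "description": "Create a new, empty online form.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"title": {"type": "string"}},
                    "required": ["title"],
                },
            }
        ]
    },
}

# The AI tool reads each description to decide *when* to call the tool,
# and uses inputSchema to build valid arguments for the call.
tool = list_response["result"]["tools"][0]
print(tool["name"], "-", tool["description"])
```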
This probably sounds a tad esoteric, so let’s make it extremely concrete, with an example.
The other day, I needed to generate an online survey form, with some text fields, some multiple-choice fields, etc. I had the outline for it written in a Google Doc, and was now facing the task of clicking together and configuring the fields in the amazing tally.so platform. Then I noticed that they now have an MCP server. So all I had to do was:
Authorize the connection and configure permissions (basically, which actions Claude should perform with/without double-checking with me)
Post the survey plan into Claude and tell it to make me a form in tally.so
And off it went, with an amazing result that was almost instantly usable, with just a few more tweaks on my end.
Behind the scenes, the MCP protocol provides a shared language for how a tool like Tally can tell an AI like Claude what it’s capable of: “Hey, I’m Tally, and if you ask me nicely, I can make a multiple-choice field, as long as you tell me what the options are, together with numerous other capabilities.”
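That plain-language pitch maps directly onto a tool definition. Here is a sketch of what such a capability could look like as an MCP tool schema; the names are illustrative and are not Tally’s actual API.

```python
# "I can make a multiple-choice field, as long as you tell me what the
# options are" expressed as an MCP-style tool definition. Illustrative
# only; not Tally's real schema.

multiple_choice_tool = {
    "name": "create_multiple_choice_field",
    "description": "Add a multiple-choice field to a form.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "label": {"type": "string", "description": "The question text"},
            "options": {
                "type": "array",
                "items": {"type": "string"},
                "description": "The answer options to offer",
            },
        },
        # "as long as you tell me what the options are":
        "required": ["label", "options"],
    },
}
```

The description fields are doing the heavy lifting here: they are what the LLM actually reads when deciding whether and how to use the tool.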
The reason MCP created so much buzz is that it instantly simplified the question of how we could make the vast universe of tools available to LLMs.
Questions remain
The first question is, of course, who should be responsible for running the MCP server. In an ideal world, it would be the provider of the tool: much like they provide API integration via REST APIs today, they should provide AI integration via MCP. But there can be issues around incentives: some tools want to hoard your data and not give it up easily via MCP. Slack and Salesforce come to mind.
Another issue is the quality of an MCP server. There is a very lazy way to create one: just take your existing REST API and slap the MCP layer around it. If the only reason you’re creating an MCP server is to tick a box along the “yeah boss, we have an AI strategy” line, then fine. If you want the MCP server to be genuinely useful, though, you’re better off crafting skills around the “job to be done”: the capabilities exposed by a classic REST API are very basic, whereas the jobs a user would like the agent to perform might be more complex.
Digging a bit into the MCP server for Todoist (my favourite to-do app), for example, we see that it comes with a get-overview skill. According to its description (which gets passed to the AI tool), it generates a nicely formatted overview of a project. This requires several calls to the REST API: getting the list of sub-projects, the project’s sections, and the tasks in that project. You can either hope that the AI agent will figure out and correctly perform these steps when a user says “Hey Claude, give me an overview of what’s on my plate in Todoist”, or you can give the AI a huge leg up by implementing get-overview as a complete skill.
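To illustrate the difference, here is a sketch of what a job-sized skill like get-overview might look like under the hood. The `fetch_*` helpers and their data are hypothetical stand-ins for Todoist’s actual REST endpoints; the point is that the skill does the sequencing and formatting, so the agent doesn’t have to.

```python
# A sketch of a get-overview style skill: one call that composes several
# basic REST calls into the "job to be done". The fetch_* helpers below
# return canned data standing in for real API responses.

def fetch_sections(project_id):
    return [{"id": "s1", "name": "Backlog"}, {"id": "s2", "name": "In progress"}]

def fetch_tasks(project_id):
    return [
        {"content": "Write draft", "section_id": "s1"},
        {"content": "Review post", "section_id": "s2"},
    ]

def get_overview(project_id):
    # One skill call does what would otherwise take the agent several
    # correctly-sequenced API calls plus formatting work.
    sections = fetch_sections(project_id)
    tasks = fetch_tasks(project_id)
    lines = []
    for section in sections:
        lines.append(f"## {section['name']}")
        for task in tasks:
            if task["section_id"] == section["id"]:
                lines.append(f"- {task['content']}")
    return "\n".join(lines)

print(get_overview("inbox"))
```

Exposing this as a single tool means the agent makes one well-described call instead of improvising a multi-step plan over low-level endpoints.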
One final issue with MCP in its current form: because each MCP tool adds a lot of information to the AI tool’s context, you can quickly use up the available context window, leaving little room for actual instructions or extended reasoning.
When does your SaaS Product need an MCP Server?
It might seem like a no-brainer. Of course you want your tool to be accessible by ChatGPT, Claude, and co. And I’d argue that a solid MCP server is a low-cost way of attaching whatever you built to the crazy train that is generative AI. So the more pointed question to ask is: when should you not bother with an MCP server? I’d say you don’t want to expose your tool via MCP if you have strong business reasons to have your own AI agent sitting inside your tool, and to then beef up that agent via MCP. (Even then, you could arguably expose the higher-level capabilities of your tool via MCP, which then in the background does more work, possibly using more MCP…)
So, MCP all the way, and if you feel strongly that you need one for your tech stack but don’t know where to start, let’s talk 🙂
PS: More on Claude’s new shiny thing (”Skills”) in another post.
