I Vibe-Coded an Agent and Didn't Know What It Couldn't Do
Sebastian Sosa · March 2026 · noemica.io
I told Claude to give my agent "calendar read and write capabilities." It did. What it didn't give me was the ability to invite anyone to a meeting. I found out a couple of days later, by accident, while testing something else entirely.
This is a story about the gap between what you think you built and what actually got built. If you're shipping code you didn't write line by line (which, increasingly, is all of us), this applies to you.
What I Built
The agent was straightforward. FastAPI server running the OpenAI Agents SDK, with googleapiclient calling the Google Calendar v3 API under the hood. Two calendar tools:
```
read_calendar(time_min, time_max, calendar_id, max_results)
write_calendar(summary, start, end, description, location, calendar_id)
```

Look at `write_calendar` carefully. Summary. Start time. End time. Description. Location. Calendar ID.
No attendees. No cc. No bcc. Nada.
The Google Calendar API's events.insert endpoint accepts an attendees array; it's one of the most commonly used parameters. My agent simply never exposed it. And I had no idea, because I didn't write the tool. I described what I wanted, and an AI interpreted that description.
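For illustration, here is roughly what the underlying call needs to contain for invites to work, following the Calendar v3 events.insert schema. The times and email are placeholders, and building the authenticated `service` object is omitted:

```python
# Sketch: the event body my tool would have needed to send for invites to work.
# Field names follow the Google Calendar v3 events.insert schema.
event = {
    "summary": "Team standup",
    "start": {"dateTime": "2026-03-10T10:00:00-07:00"},
    "end": {"dateTime": "2026-03-10T10:30:00-07:00"},
    "attendees": [{"email": "persona_b@test.com"}],  # the part my tool never exposed
}

# With an authenticated googleapiclient `service` object, the call would be:
# service.events().insert(
#     calendarId="primary", body=event, sendUpdates="all"
# ).execute()
```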
How I Found Out
I wasn't looking for bugs in the agent. I was stress-testing Noemica, a system I built for testing AI agents against realistic user behavior. I needed a multi-user scenario, so I set up two synthetic personas:
- Persona A: Schedule a team standup at 10 AM Tuesday. Invite Persona B.
- Persona B: Watch the calendar. When a new meeting shows up, block the same time slot.
Kind of an asshole move from Persona B, but I wasn't running a workplace dynamics seminar; I just needed two personas responding to each other's actions.
Both personas shared a calendar. The run completed in about four minutes. Here's the moment that mattered:
Persona A explicitly asked to invite someone. The agent called `write_calendar` with no attendees, because the parameter doesn't exist. Then it reported success without mentioning that the invite had been silently dropped.
No error. No warning. Half the request just vanished.
40 Minutes of Debugging the Wrong Thing
Naturally, I asked Claude to fix it. I constrained it to only modify the agent's prompts and configuration β not the underlying tool code. The scenario was obviously off-limits too, since that was the thing I was validating. Seemed reasonable. You shouldn't need to rewrite your tools to use them correctly.
| # | What Claude Tried | Time | What Happened |
|---|---|---|---|
| 1 | Added "always include attendees when scheduling" to system prompt | ~9 min | Event created. No attendees. Tool doesn't accept them. |
| 2 | Added "pass attendees in the write_calendar call" | ~11 min | Same result. The parameter doesn't exist. |
| 3 | Added "confirm attendee emails before creating event" | ~8 min | Agent asks persona for confirmation. Creates event. Still no attendees. |
| 4 | Instructed agent to "construct attendees array in the API call" | ~10 min | Tool schema validation rejects the unknown parameter. |
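The failure in run 4 is easy to reproduce in miniature. If the tool is ultimately a Python function with a fixed signature (the function below is a stand-in, not the actual generated code), no amount of instructing the model to "construct an attendees array" gets past the call boundary:

```python
def write_calendar(summary, start, end, description=None, location=None,
                   calendar_id="primary"):
    """Stand-in for the generated tool: note there is no attendees parameter."""
    return {"summary": summary, "start": start, "end": end}

# What the prompt told the model to pass:
args = {
    "summary": "Team standup",
    "start": "2026-03-10T10:00",
    "end": "2026-03-10T10:30",
    "attendees": [{"email": "persona_b@test.com"}],
}

try:
    write_calendar(**args)
except TypeError as exc:
    error = str(exc)  # unexpected keyword argument 'attendees'
```

A real tool layer rejects the extra field at schema validation rather than at the Python call site, but the effect is the same: the constraint lives in code, and prompts can't reach it.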
Four cycles. About 10 minutes each, because Claude has a gift for waiting significantly longer than a process actually takes to complete. Total: ~40 minutes of watching an AI rearrange deck chairs on the Titanic.
At the same time, I was trying to fix the problem from the testing-infrastructure side, convinced the bug might live there. In effect, I was giving Claude three different wrong surfaces to edit, and it happily tried all of them without ever questioning the premise.
The Diagnosis
I gave up on the iterative approach. Instead, I dumped all four sets of run traces plus the agent's source code into a single context window and asked: "Look at everything. What is actually wrong?"
> ...across all four runs, the agent receives the user's request to invite `persona_b@test.com`, but the `write_calendar` tool does not accept an `attendees` parameter. The underlying Google Calendar API supports this via `events.insert`, but the capability was never surfaced in the agent's tool interface. The prompt modifications in runs 2–4 have no effect because the constraint is at the code level, not the instruction level...
The report was longer, but that was the crux of it. The problem wasn't the prompt, wasn't my testing infrastructure, wasn't the scenario. The feature simply did not exist.
Why Claude Never Said Anything
This is the part that still bothers me.
Four cycles. ~40 minutes. At no point did Claude say: "The attendees parameter you're trying to use doesn't exist in the tool definition. We can't fix this through prompting."
Instead, it accepted an impossible constraint and optimized within it. Silently. For 40 minutes.
This is a pattern I keep running into. Agents will overengineer a workaround for an impossible problem rather than push back on the premise. The default behavior is to satisfy at any cost, not to be right. I've observed this across models and contexts: the agent overfits to pleasing you over doing what is actually correct.
If you squint, it's a mild version of the alignment problem: a pathological eagerness to please that wastes your afternoon and teaches you nothing.
The Vibe-Coding Knowledge Gap
Here's why this matters beyond my one agent and its one missing parameter.
When you write code by hand, the implementation is the specification. If you type create_event(summary, start, end), you know, because you typed it, that there's no attendees parameter. The act of writing is the act of knowing.
When you vibe-code, that link breaks. You describe what you want and an AI interprets it. "Calendar read and write capabilities" maps cleanly to "list events and create events." Not "invite attendees." Not "update events." Not "delete events." Read. And write.
The result is a gap between what you intended and what was built. And nothing in the typical workflow catches it. Code review? You didn't write the code, and if you're being honest, you didn't read it closely either. Manual testing? You used the agent yourself and never happened to invite anyone. Unit tests? Come on. This was vibe-coded.
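One cheap check can live at exactly this seam. As a sketch, assuming the tool is an importable function (the `write_calendar` below is a stand-in for the generated tool), a contract test that compares the tool's signature against the capabilities you believe you shipped fails immediately:

```python
import inspect

def write_calendar(summary, start, end, description=None, location=None,
                   calendar_id="primary"):
    """Stand-in for the vibe-coded tool; the body doesn't matter here."""

# The capabilities I *thought* I asked for, spelled out as parameter names:
expected = {"summary", "start", "end", "attendees"}
actual = set(inspect.signature(write_calendar).parameters)

missing = expected - actual  # anything I believe exists but doesn't
```

It only works if you write down your expectations somewhere other than the code the AI generated, which is the whole point.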
This isn't hypothetical. This is the default outcome of a growing mode of building software. You can end up marketing a product with capabilities that don't exist: not because you're dishonest, but because you genuinely don't know what you built.
Testing From the Outside
The thing that caught this wasn't smarter code review or better prompts. It was testing from a user's perspective, with no knowledge of the implementation.
The scenario said "invite Persona B." The persona tried. The system couldn't. That's a finding.
The key insight is about information control. The synthetic users only knew what the scenario told them: not what tools existed, not what parameters were available, not what was possible or impossible. They interacted with the system the way a real user would, with expectations based on what the product should do, not what it can do.
When those expectations collide with reality, you learn something. In my case, I learned that my agent's most collaborative feature, the ability to include other people in an event, had never been built.
This is what I've been building with Noemica. You describe how users should interact with your system. Synthetic personas execute those descriptions against your real system in a controlled environment, with external dependencies mocked out. The personas try to use your product the way humans would, and when they can't, you find out.
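To make that concrete, here is a hypothetical shape for the scenario from earlier. The field names are illustrative only, not Noemica's actual API; the point is that personas carry goals, never implementation details:

```python
# Hypothetical scenario definition: field names are illustrative, not a real API.
scenario = {
    "personas": [
        {
            "name": "persona_a",
            "goal": "Schedule a team standup at 10 AM Tuesday. Invite persona_b.",
        },
        {
            "name": "persona_b",
            "goal": "Watch the calendar. When a new meeting appears, "
                    "block the same time slot.",
        },
    ],
    # Personas see only their goals -- never the tool schemas or parameters.
    "shared_resources": ["calendar"],
}
```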
The fix isn't "don't vibe-code." That ship has sailed. The fix is validating what you built from the outside, with the same level of ignorance your users will have.
If this resonates (you're building with AI and want to continuously validate how it holds up against real user interactions), I'd love to chat, whether you're interested in using Noemica or in building something like it in-house.