
2 posts tagged with "ai"


Elisha Sterngold · 8 min read


When Experience Becomes a Liability: Programming in the Age of AI Agents

The Comfortable Lie About AI and Seniority

At the beginning of the AI‑agent era, the industry converged on a reassuring belief: AI would transform software development, but not its hierarchy. Juniors would be displaced first. Seniors would become more valuable than ever. Someone would need to supervise the AI agents, to judge their output, to understand when a confident answer was actually dangerous. Only engineers with real scars—production outages, failed rewrites, architectural dead ends—could possibly do that job.

This belief made sense at the time. Early agents were impressive but shallow. They wrote code fluently but reasoned poorly at the system level. Supervision meant knowing where they hallucinated, where abstractions leaked, where reality diverged from elegance. Architecture was still heavy, expensive to change, and deeply shaped by historical accidents. Experience mattered because memory mattered.

But that belief quietly depends on an assumption that no longer holds: that AI agents will remain weaker than humans at global reasoning, and that supervision will always mean catching the model when it is wrong. Once agents become strong enough to ingest entire codebases, replay years of architectural evolution, simulate migrations, and explore alternative designs, supervision itself changes meaning. With foundation models at the level of Opus‑4.5, GPT-5.2-Codex, and Gemini 3 Pro as raw reasoning engines—and with agent layers built on top of them such as Claude Code, GitHub Copilot, Cursor, Antigravity, and high‑autonomy agent frameworks—we are no longer dealing with assistants. We are dealing with machines that can explore architectural space directly. And this direction is only accelerating as models improve and agent layers grow more autonomous, more stateful, and more deeply embedded in real development workflows.

The Collapse of Architectural Permanence

Architecture, in this world, stops being sacred. For decades it was heavy because changing it was dangerous. Decisions hardened not because they were optimal, but because revisiting them was too costly. Seniority emerged as authority because it carried memory: memory of why something was split, why it failed, why it was glued back together, and why certain areas were never to be touched again. Architecture was history embodied in code.

AI agents dissolve this monopoly on memory. When the past becomes machine‑readable—every commit, every incident, every failed experiment—history stops living only in human heads. It becomes something you can query, simulate, and challenge. Architecture becomes a version, not a monument. It can be branched, stress‑tested, rewritten, and rolled back. Once the cost of change collapses, experience as stored trauma loses its central power.

Experience as Emotional Debt

This is where the uncomfortable reversal begins. Senior engineers carry not only knowledge, but standards—deeply internalized ideas about how code should look, how systems should be structured, and what "good engineering" means. These standards were earned through hard experience: migrations that nearly killed the company, rewrites that went nowhere, abstractions that promised clarity and delivered outages. But the need to ensure that new code conforms to these standards slows everything down. Every change must be carefully shaped, reviewed, and aligned with an existing mental model of correctness.

That caution is wisdom, but it is also friction. It turns supervision into enforcement and progress into negotiation. Seniors are not only guarding the system from breaking; they are guarding it from deviating. And in a world where AI agents can rewrite, test, and validate entire systems quickly, this insistence on conformity becomes a form of inertia. It encodes a sense of what is too dangerous—or simply too unfamiliar—to try, even when the tools that once made deviation risky no longer exist.

The Junior Advantage

Juniors carry none of this. They have no architectural nostalgia, no sunk cost, no identity tied to the current shape of the system. They look at a codebase not as a legacy to be preserved, but as something provisional. In the pre‑agent world, this made them naïve. In the agent world, it can make them powerful.

Because now the agent carries the memory. Claude Code can reconstruct why a refactor failed five years ago. Codex can explore alternative implementations without touching production. Cursor can let you navigate, rewrite, and validate entire systems in hours instead of months. The junior no longer needs to remember the past—the system can replay it. What the junior brings instead is a willingness to ask whether the past should continue to define the future.

Vibe‑Coding in a Serious World

This is also where vibe‑coding stops being a joke. In the early days, vibe‑coding—iterating quickly with AI, trusting intuition, caring more about flow than formal design—looked irresponsible. But once you combine it with strong AI agents, deep test generation, and fast rollback, vibe‑coding becomes a way to explore reality rather than speculate about it. It is not anti‑discipline; it is discipline shifted from up‑front certainty to rapid validation.

Judgment Over Experience

The old claim was that only seniors could supervise AI agents. But as agents become capable of self‑critique, simulation, and multi‑path reasoning, supervision stops being about catching mistakes and starts being about choosing directions. The problem is no longer that the AI cannot reason deeply enough. It is that it can reason in too many directions at once. The scarce resource becomes judgment, not experience.

Judgment is not the same as experience. Experience is backward‑looking; it encodes what failed. Judgment is forward‑looking; it decides what is worth risking now. Experience says, “This was a disaster once.” Judgment asks, “Are the conditions still the same?” And judgment does not scale linearly with years of writing code. It scales with clarity of thought.

High‑Leverage Agent Tools and the End of the Apprenticeship

This is where high‑leverage agent tools matter. Anything that removes the weight of change—instant refactors, cheap rewrites, aggressive simulation, reversible decisions—changes who gets to participate in architectural decisions. When change becomes cheap, the right to propose change expands. The apprenticeship model, where you slowly earn permission to question the system, begins to crack.

The unsettling possibility is that in a mature AI‑agent world, some of the most effective system designers will be people with very little attachment to how things have always been done. Not because they know more, but because they are willing to dismantle and rebuild more. Not because they are reckless, but because the environment has shifted from one where mistakes were catastrophic to one where they are increasingly simulated, contained, and reversible.

The New Role of Seniors

This does not make seniors obsolete. It changes their role. Their value moves away from being living archives of architectural pain and toward defining invariants: what must never break, what constraints are non‑negotiable, what risks are existential regardless of tooling. But the monopoly on exploration dissolves.

What this looks like in practice: Seniors become guardians of boundaries rather than gatekeepers of change. They define the security model that cannot be compromised. They identify the data consistency guarantees that must hold across any refactor. They articulate the performance thresholds below which the product fails its users. They specify the regulatory and compliance rails that no amount of clever architecture can bypass. Crucially, they enforce the ground truth mechanisms that make rapid iteration safe: comprehensive unit tests, meaningful logging, observability pipelines, and monitoring that catches what code review cannot. These are the things AI agents cannot infer from code alone—they require understanding of the business, the users, and the consequences of failure that extend beyond the codebase.

How seniors must adapt: The shift requires letting go of ownership over how things are built and holding tightly to what must remain true. This means resisting the instinct to enforce stylistic preferences, to mandate familiar patterns, or to reject approaches simply because they feel foreign. It means learning to trust validation over intuition—if the tests pass, the system holds, and the invariants are preserved, the unfamiliar path may be the better one. It means becoming comfortable with code that looks nothing like what you would have written, because you did not write it. The senior who thrives in this world is not the one who insists on reviewing every line, but the one who defines the constraints so clearly that review becomes verification rather than negotiation.

The belief that only seniors can supervise AI agents belongs to a world where agents were weak and architecture was rigid. As agents grow stronger and architecture becomes fluid, that belief starts to look like a historical artifact. The hierarchy built on the cost of change erodes as that cost approaches zero.

Who Shapes the Future

The programmer who will shape the next decade of software may not be the one who remembers the most failures, but the one most willing to ask whether those failures should still define what is possible. Armed with Cursor, Claude Code, Codex, and a willingness to iterate fast, they treat architecture as something to be questioned rather than preserved. They are not reckless—they simply operate in a world where the cost of exploration has collapsed.

And that person may not be senior at all.

Elisha Sterngold · 10 min read

Logs in the Age of AI Agents

Software developers have always relied on logs as a fundamental tool for understanding what happens inside running systems. Logs capture reality: the sequence of events, the state of the system at a given moment, errors that occurred, and the context in which everything happened.

As AI agents increasingly participate in writing, modifying, and maintaining code, it may be tempting to think that logs will become less important — or even obsolete. In practice, the opposite is true. Logs are becoming more critical than ever. The difference is who will primarily consume them, and how they need to be structured.

This post explores why logs remain essential in the age of AI agents, how the nature of logging is likely to change, and what this means for modern development platforms.

AI Makes Mistakes — Just Like Humans

To understand the future role of logs, we need to start with a realistic understanding of how large language models (LLMs) work.

LLMs do not reason about code in the same way compilers, interpreters, or formal verification systems do. They generate output by predicting the most likely next token based on vast amounts of training data. This makes them extremely powerful pattern generators — but not infallible problem solvers.

As a result:

  • LLMs make mistakes, sometimes obvious and sometimes subtle.
  • They can produce code that looks correct but fails under real-world conditions.
  • They are prone to hallucinations — confidently generating incorrect logic, APIs that don’t exist, or assumptions that are not grounded in reality.
  • They often lack awareness of runtime behavior, concurrency issues, environmental differences, or system-specific edge cases.

Let’s look at a few concrete examples.

Example 1: Android Logging Gone Wrong

Imagine an AI agent generating Android code to log network responses:

Log.d("Network", "Response: " + response.body().string())

At first glance, this looks fine. But in practice, calling response.body().string() consumes the response stream. If the same response is later needed for parsing JSON, the app will crash or behave unpredictably. Both human developers and AI models can overlook this subtle side effect during implementation or testing.

Proper logging would look like this:

val bodyString = response.peekBody(Long.MAX_VALUE).string()
Log.d("Network", "Response: $bodyString")

However, even this fixed version can run into memory problems: peekBody(Long.MAX_VALUE) buffers a full copy of the response body, so extremely large responses can exhaust memory. When a truncated log entry is acceptable, capping the peek (for example, response.peekBody(64 * 1024)) is the safer choice.

This is exactly the kind of subtlety that makes code hard to get right. Without logs surfacing the crash or the missing data, the AI agent would have no feedback that its generated code caused a runtime issue.

Example 2: iOS Threading Issues

Consider an AI model generating Swift code for updating the UI after a background network call:

URLSession.shared.dataTask(with: url) { data, response, error in
    if let data = data {
        self.statusLabel.text = "Loaded \(data.count) bytes"
    }
}.resume()

This code compiles and may even work sometimes, but it violates UIKit’s rule that UI updates must occur on the main thread. The result could be random crashes or UI glitches.

A correct version would wrap the UI update in a DispatchQueue.main.async block:

DispatchQueue.main.async {
    self.statusLabel.text = "Loaded \(data.count) bytes"
}

Logs capturing the crash or warning from the runtime would be the only reliable signal for the AI agent to detect and correct this mistake.

Example 3: Hallucinated APIs

LLMs sometimes invent REST APIs that don’t exist — something that might only become apparent once the product is running in production. For example, an AI might generate code that calls a fictional endpoint:

val response = Http.post("https://api.myapp.com/v2/user/trackEvent", event.toJson())
if (response.isSuccessful) {
    Log.i("Analytics", "Event sent successfully")
}

If that /v2/user/trackEvent endpoint was never implemented, the code will compile and even deploy, but the system will start logging 404 errors or timeouts once it’s live. Those logs are the only signal — for both humans and AI agents — that the generated API was imaginary and needs correction.

These limitations are not bugs; they are intrinsic to how current models operate. Even as models improve, it is unrealistic to expect near-future AI-generated code to be consistently perfect in production environments.

This is precisely where logs remain indispensable.

Logs as the Source of Truth

When something goes wrong in production, developers don’t rely on intentions, comments, or assumptions — they rely on evidence. Logs provide that evidence.

The same applies to AI agents.

Regardless of whether code is written by a human or generated by an AI, runtime behavior is the ultimate arbiter of correctness. Logs record what actually happened, not what was expected to happen.

They answer questions such as:

  • What sequence of events led to this state?
  • What inputs did the system receive?
  • Which branch of logic was executed?
  • What errors occurred, and in what context?
  • How did external dependencies respond?

Without logs, both humans and AI agents are left guessing.

Why AI Agents Need Logs

As AI agents increasingly participate in development workflows — generating code, refactoring systems, fixing bugs, and even deploying changes — logs become a critical feedback mechanism.

Logs Close the Feedback Loop

AI agents operate on predictions. Logs provide feedback from reality.

By analyzing logs, an AI agent can:

  • Validate whether generated code behaved as intended
  • Detect mismatches between expected and actual outcomes
  • Identify patterns that indicate bugs or regressions
  • Learn from failures in real production conditions

Without logs, AI agents have no reliable way to distinguish correct behavior from silent failure.
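As a rough sketch of that feedback loop (the log format, the `endpointLooksBroken` helper, and the failure patterns are illustrative assumptions, not a real agent API), an agent could scan recent log lines for evidence that code it just generated is failing in production, such as repeated 404s from the hallucinated endpoint in Example 3:

```java
import java.util.List;

// Illustrative sketch: an AI agent checking runtime logs for evidence that a
// generated endpoint is failing. The log format and thresholds are assumptions.
public class FeedbackCheck {

    // True when the logs show at least `threshold` failed calls to the endpoint.
    static boolean endpointLooksBroken(List<String> logLines, String endpoint, int threshold) {
        long failures = logLines.stream()
                .filter(line -> line.contains(endpoint))
                .filter(line -> line.contains(" 404 ") || line.contains("timeout"))
                .count();
        return failures >= threshold;
    }

    public static void main(String[] args) {
        List<String> logs = List.of(
                "2025-01-10T12:00:01Z INFO POST /v2/user/trackEvent 404 (12ms)",
                "2025-01-10T12:00:02Z INFO GET /v2/user/profile 200 (8ms)",
                "2025-01-10T12:00:03Z INFO POST /v2/user/trackEvent 404 (11ms)");
        // Two 404s against the generated endpoint: strong evidence it never existed.
        System.out.println(endpointLooksBroken(logs, "/v2/user/trackEvent", 2)); // prints "true"
    }
}
```

A real agent would of course query a log backend rather than a list of strings, but the shape of the loop is the same: generate, observe, compare against expectation, correct.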

Logs Enable Root Cause Analysis

When failures occur, understanding why they happened requires context. Logs provide structured breadcrumbs that allow both humans and AI to trace causality across components, services, and time.

As AI systems take on more responsibility, automated root cause analysis will increasingly depend on rich, well-structured logs.

How Code Will Be Built in the Future

The rise of AI agents is not just changing how code is written — it is changing how software systems evolve over time.

We are moving toward a world where:

  • AI agents generate and modify large portions of code
  • Humans supervise, review, and guide rather than author every line
  • Systems are continuously adjusted based on runtime feedback
  • Debugging and remediation are increasingly automated

In such an environment, logs are no longer just a debugging tool. They become a primary interface between running systems and intelligent agents.

Think of logs as a bidirectional communication channel: they allow AI agents to observe system behavior in real-time, understand what's happening across distributed components, and make informed decisions about modifications and fixes. Just as APIs define how different services communicate, logs define how AI agents perceive and interact with running software. An AI agent monitoring logs can detect anomalies, correlate events across services, identify patterns that indicate potential issues, and even trigger automated responses — all without direct human intervention. This transforms logs from passive historical records into an active, queryable representation of system state that enables autonomous decision-making.

Logs May No Longer Be Written for Humans

One of the most significant shifts ahead is that logs may no longer need to be primarily human-readable.

Historically, logs were formatted for developers reading them line by line: timestamps, severity levels, free-form text messages, and stack traces.

But if the primary consumer of logs is an AI agent, the requirements shift fundamentally. Instead of human-readable prose, logs must become structured data streams that machines can parse, analyze, and reason about. Rather than a developer scanning through text messages, an AI agent needs programmatic access to event data with clear schemas, explicit context, and traceable relationships between events. The format matters less than the ability for algorithms to extract meaning, detect patterns, and make decisions based on what actually happened — not what a human thought was worth writing down.

Human readability becomes secondary. Humans may still access logs — but often through AI-generated summaries, explanations, and insights rather than raw log lines.
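As a minimal sketch of the difference (the field names and schema here are illustrative, not a Shipbook format), a machine-first log event carries stable keys and explicit context instead of free-form prose:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative machine-first log event: stable keys and explicit context that a
// parser or an AI agent can query directly. The schema is an assumption.
public class StructuredLog {

    static String event(String level, String name, String traceId, Map<String, Object> attrs) {
        StringBuilder json = new StringBuilder();
        json.append("{\"level\":\"").append(level).append('"');
        json.append(",\"event\":\"").append(name).append('"');
        json.append(",\"trace_id\":\"").append(traceId).append('"');
        for (Map.Entry<String, Object> e : attrs.entrySet()) {
            Object v = e.getValue();
            json.append(",\"").append(e.getKey()).append("\":");
            json.append(v instanceof Number ? v.toString() : "\"" + v + "\"");
        }
        return json.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> attrs = new LinkedHashMap<>();
        attrs.put("endpoint", "/v2/user/trackEvent");
        attrs.put("status", 404);
        attrs.put("latency_ms", 12);
        // An agent can key on "status" directly, instead of pattern-matching
        // prose such as "Event sent successfully".
        System.out.println(event("ERROR", "http_response", "abc123", attrs));
    }
}
```

A production pipeline would use a proper JSON library and add timestamps and session identifiers; the point is that every field is queryable rather than buried in a sentence a human once thought was worth writing.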

Logs as Active Participants in Autonomous Systems

As observability evolves, logs will increasingly move from passive storage to active participation in system behavior. Today, logs are primarily archives — repositories of what happened, consulted after the fact. Tomorrow, they will be real-time inputs that directly influence system actions.

We can already see early signs of this transformation:

  • Logs triggering alerts and automated workflows — when an error pattern appears, systems can automatically scale resources, restart services, or notify teams
  • Logs feeding anomaly detection systems — machine learning models analyze log streams to identify deviations from normal behavior before they escalate
  • Logs being correlated with metrics and traces — combining different signals to build comprehensive views of system health
  • Logs used to gate deployments or rollbacks — automated systems evaluate log patterns to decide whether a new release should proceed or be reverted

In AI-driven systems, this trend accelerates dramatically. Logs become the substrate on which autonomous decision-making is built. Instead of simply reacting to alerts, AI agents can proactively analyze log patterns, predict issues before they manifest, and autonomously implement fixes. An AI agent might notice subtle degradation patterns in logs, correlate them with recent code changes, generate a targeted fix, test it against historical log data, and deploy it — all without human intervention. In this model, logs aren't just records of the past; they're the sensory input that enables systems to observe, reason, and act autonomously.

Shipbook and the AI Age of Logging

At Shipbook, we believe logs are not going away — they are evolving.

Shipbook was built to give developers deep visibility into real-world application behavior, with features like:

  • Powerful search and filtering
  • Session-based log grouping
  • Proactive classification with Loglytics

But we take this even further with our Shipbook MCP Server. By implementing the Model Context Protocol, we allow AI agents to directly connect to your Shipbook account. This means your AI assistant can now search, filter, and analyze real-time production logs to help you debug issues faster and more accurately.

But we are also looking ahead.

We are actively developing capabilities that prepare logs for the AI age: logs that are easier for machines to interpret, analyze, and reason about — while still remaining useful for human developers.

As AI agents become first-class participants in software development, logs won't just be a debugging tool — they'll be the trusted interface that enables intelligent systems to understand, learn from, and improve the code they generate. That's the future we're building at Shipbook: logs that power both human insight and AI autonomy.


Ready to prepare your logging infrastructure for the AI age? Shipbook gives you the power to remotely gather, search, and analyze your user logs and exceptions in the cloud, on a per-user & session basis. Start building logs that work for both your team today and the AI agents of tomorrow.