Journalist/developer. Storytelling developer @ USA Today Network. Builder of @HomicideWatch. Sinophile for fun. Past: @frontlinepbs @WBUR, @NPR, @NewsHour.

How Ruby Went Off the Rails


What happened to RubyGems and Bundler, and the open source drama behind infrastructure the internet depends on.


90% | Armin Ronacher's Thoughts and Writings


written on September 29, 2025

“I think we will be there in three to six months, where AI is writing 90% of the code. And then, in 12 months, we may be in a world where AI is writing essentially all of the code”

Dario Amodei

Three months ago I said that AI changes everything. I came to that after plenty of skepticism. There are still good reasons to doubt that AI will write all code, but my current reality is close.

For the infrastructure component I started at my new company, I’m probably north of 90% AI-written code. I don’t want to convince you, just to share what I learned, in part because I approached this project differently from my first experiments with AI-assisted coding.

The service is written in Go with few dependencies and an OpenAPI-compatible REST API. At its core, it sends and receives emails. I also generated SDKs for Python and TypeScript with a custom SDK generator. In total: about 40,000 lines, including Go, YAML, Pulumi, and some custom SDK glue.

I set a high bar, especially for being able to operate the service reliably. I’ve run similar systems before and knew what I wanted.

Setting it in Context

Some startups are already near 100% AI-generated. I know, because many build in the open and you can see their code. Whether that works long-term remains to be seen. I still treat every line as my responsibility, judged as if I wrote it myself. AI doesn’t change that.

There are no weird files that shouldn’t belong there, no duplicate implementations, and no emojis all over the place. The comments still follow the style I want and, crucially, often aren’t there. I pay close attention to the fundamentals of system architecture, code layout, and database interaction. I’m incredibly opinionated. As a result, there are certain things I don’t let the AI do. I know it won’t reach the point where I could sign off on a commit. That’s why it’s not 100%.

As contrast: another quick prototype we built is a mess of unclear database tables, markdown file clutter in the repo, and boatloads of unwanted emojis. It served its purpose — validate an idea — but wasn’t built to last, and we never expected it to.

Foundation Building

I began in the traditional way: system design, schema, architecture. At this stage I don’t let the AI write code, but I loop it in as a kind of rubber duck. The back-and-forth helps me see mistakes, even if I don’t need or trust the answers.

I did get the foundation wrong once. I initially argued myself into a more complex setup than I wanted. That’s the part I later had the LLM redo and clean up while the project was still young.

For AI-generated or AI-supported code, I now end up with a stack that looks like something I often wanted but found too hard to do by hand:

  • Raw SQL: This is probably the biggest change to how I used to write code. I really like using an ORM, but I don’t like some of its effects. In particular, once you approach the ORM’s limits, you’re forced to switch to handwritten SQL. That mapping is often tedious because you lose some of the powers the ORM gives you. Another consequence is that it’s very hard to find the underlying queries, which makes debugging harder. Seeing the actual SQL in your code and in the database log is powerful. You always lose that with an ORM.

    The fact that I no longer have to write SQL because the AI does it for me is a game changer.

    I also use raw SQL for migrations now. (A brief sketch of this style follows the list.)

  • OpenAPI first: I tried various approaches here. There are many frameworks you can use. I ended up first generating the OpenAPI specification and then using code generation from there to the interface layer. This approach works better with AI-generated code. The OpenAPI specification is now the canonical source from which both the clients and the server shim are generated.
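
To make the raw-SQL point above concrete, here is a minimal sketch of the style (illustrative only; the real service is Go, and the table and columns here are made up). The query lives in the source as plain SQL, so the exact same text shows up in the Postgres log:

    # Illustrative sketch of the raw-SQL style, not code from the project
    # (which is Go). The outgoing_messages table and its columns are made up.
    import psycopg

    FIND_STUCK_MESSAGES = """
        SELECT id, recipient, created_at
        FROM outgoing_messages
        WHERE state = 'queued'
          AND created_at < now() - interval '15 minutes'
        ORDER BY created_at
        LIMIT %(limit)s
    """

    def find_stuck_messages(conn: psycopg.Connection, limit: int = 100):
        # The SQL above is exactly what appears in the database log,
        # which makes slow or wrong queries easy to spot.
        with conn.cursor() as cur:
            cur.execute(FIND_STUCK_MESSAGES, {"limit": limit})
            return cur.fetchall()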

Iteration

Today I use Claude Code and Codex. Each has strengths, but the constant is Codex for code review after PRs. It’s very good at that. Claude is still indispensable when debugging and when a lot of tool access is needed (e.g. why do I have a deadlock, why is there corrupted data in the database). The two working together is where it’s most magical. Claude might find the data; Codex might understand it better.

I cannot stress enough how bad the code from these agents can be if you’re not careful. While they understand system architecture and how to build something, they can’t keep the whole picture in scope. They will recreate things that already exist. They create abstractions that are completely inappropriate for the scale of the problem.

You constantly need to learn how to bring the right information into the context. For me, this means pointing the AI to existing implementations and giving it very specific instructions on how to follow along.

I generally create PR-sized chunks that I can review. There are two paths to this:

  1. Agent loop with finishing touches: Prompt until the result is close, then clean up.

  2. Lockstep loop: Earlier I went edit by edit. Now I lean on the first method most of the time, keeping a todo list for cleanups before merge.

It requires intuition to know which approach is more likely to lead to the right result. Familiarity with the agent also helps you recognize when a task will not go anywhere, so you avoid wasted cycles.

Where It Fails

The most important piece of working with an agent is the same as regular software engineering: you need to understand your state machines, how the system behaves at any point in time, and your database.

It is easy to create systems that appear to behave correctly but have unclear runtime behavior when relying on agents. For instance, the AI doesn’t fully comprehend threading or goroutines. If you don’t keep the bad decisions at bay early on, you won’t be able to operate the system in a stable manner later.

Here’s an example: I asked it to build a rate limiter. It “worked” but lacked jitter and made poor storage choices. Easy to fix if you know rate limiters, dangerous if you don’t.
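
To illustrate the kind of fix involved (my sketch, not the code the agent produced): a simple fixed-window limiter where the retry delay gets jitter, so rejected clients don’t all come back at the same instant.

    # Sketch only: a fixed-window rate limiter with jittered retry-after.
    # A real implementation would use a shared store (Redis/Postgres) and
    # evict old buckets; the in-memory dict is just for illustration.
    import random
    import time

    class RateLimiter:
        def __init__(self, limit: int, window_seconds: float):
            self.limit = limit
            self.window = window_seconds
            self.counts: dict[tuple[str, int], int] = {}

        def check(self, key: str) -> tuple[bool, float]:
            """Return (allowed, retry_after_seconds)."""
            bucket = int(time.time() // self.window)
            used = self.counts.get((key, bucket), 0)
            if used >= self.limit:
                remaining = (bucket + 1) * self.window - time.time()
                # Jitter keeps every rejected caller from retrying in lockstep.
                return False, remaining + random.uniform(0, self.window * 0.1)
            self.counts[(key, bucket)] = used + 1
            return True, 0.0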

Agents also operate on conventional wisdom from the internet and in turn do things I would never do myself. They love to pull in dependencies (particularly outdated ones). They love to swallow errors and strip away tracebacks. I’d rather uphold strong invariants and let code crash loudly when they fail than hide problems. If you don’t fight this, you end up with opaque, unobservable systems.
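
A small sketch of the difference (the function and table names are made up; db is assumed to be a psycopg connection):

    # What the agents tend to produce: the error, and its traceback, vanish.
    def record_delivery_swallowed(db, message_id):
        try:
            db.execute("UPDATE messages SET state = 'delivered' WHERE id = %s", (message_id,))
        except Exception:
            pass  # the problem is now invisible

    # Upholding the invariant and crashing loudly instead.
    def record_delivery(db, message_id):
        cur = db.execute(
            "UPDATE messages SET state = 'delivered' WHERE id = %s AND state = 'sent'",
            (message_id,),
        )
        if cur.rowcount != 1:
            raise RuntimeError(f"message {message_id} was not in state 'sent'")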

Where It Shines

For me, this has reached the point where I can’t imagine working any other way. Yes, I could probably have done it without AI. But I would have built a different system in parts because I would have made different trade-offs. This way of working unlocks paths I’d normally skip or defer.

Here are some of the things I enjoyed a lot on this project:

  • Research + code, instead of research and code later: Some things that would have taken me a day or two to figure out now take 10 to 15 minutes.
    It allows me to directly play with one or two implementations of a problem, moving me from abstract contemplation to hands-on evaluation.

  • Trying out things: I tried three different OpenAPI implementations and approaches in a day.

  • Constant refactoring: The code looks more organized than it would otherwise have been because the cost of refactoring is quite low. You need to know what you’re doing, but with a good setup, refactoring becomes easy.

  • Infrastructure: Claude got me through AWS and Pulumi. Work I generally dislike became a few days instead of weeks. It also debugged the setup issues as it was going through them. I barely had to read the docs.

  • Adopting new patterns: While they suck at writing tests, the agents turned out great at setting up test infrastructure I didn’t know I needed. I got a recommendation on Twitter to use testcontainers for testing against Postgres. The approach runs migrations once and then creates database clones per test, which turns out to be super useful (a sketch follows this list). Migrating to it would have been quite an involved project; Claude did it in an hour for all tests.

  • SQL quality: It writes solid SQL that I could never remember how to write myself. I just need to review it, which I can do. To this day I struggle to recall MERGE and WITH when writing SQL by hand.
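
Here is roughly what the clone-per-test setup looks like, as a sketch rather than my actual code: pytest plus testcontainers, with run_migrations() standing in for whatever migration runner you use (the "test" credentials and database name are testcontainers’ defaults).

    # Sketch of "migrate once, clone per test" with testcontainers and Postgres
    # template databases. run_migrations() is a hypothetical placeholder.
    import uuid

    import psycopg
    import pytest
    from testcontainers.postgres import PostgresContainer

    def dsn(pg, dbname):
        host = pg.get_container_host_ip()
        port = pg.get_exposed_port(5432)
        return f"host={host} port={port} user=test password=test dbname={dbname}"

    @pytest.fixture(scope="session")
    def template_db():
        with PostgresContainer("postgres:16") as pg:
            # Run migrations once against the default "test" database.
            run_migrations(dsn(pg, "test"))  # hypothetical migration runner
            yield pg

    @pytest.fixture
    def db(template_db):
        # Each test gets a fresh database cloned cheaply from the migrated template.
        clone = f"test_{uuid.uuid4().hex}"
        with psycopg.connect(dsn(template_db, "postgres"), autocommit=True) as admin:
            admin.execute(f'CREATE DATABASE "{clone}" TEMPLATE "test"')
        with psycopg.connect(dsn(template_db, clone)) as conn:
            yield conn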

What does it mean?

Is 90% of code going to be written by AI? I don’t know. What I do know is that for me, on this project, the answer is already yes. I’m part of that growing subset of developers who are building real systems this way.

At the same time, for me, AI doesn’t own the code. I still review every line, shape the architecture, and carry the responsibility for how it runs in production. But the sheer volume of what I now let an agent generate would have been unthinkable even six months ago.

That’s why I’m convinced this isn’t some far-off prediction. It’s already here — just unevenly distributed — and the number of developers working like this is only going to grow.

That said, none of this removes the need to actually be a good engineer. If you let the AI take over without judgment, you’ll end up with brittle systems and painful surprises (data loss, security holes, unscalable software). The tools are powerful, but they don’t absolve you of responsibility.

This entry was tagged ai and thoughts


I think "agent" may finally have a widely enough agreed upon definition to be useful jargon now


I've noticed something interesting over the past few weeks: I've started using the term "agent" in conversations where I don't feel the need to then define it, roll my eyes or wrap it in scare quotes.

This is a big piece of personal character development for me!

Moving forward, when I talk about agents I'm going to use this:

An LLM agent runs tools in a loop to achieve a goal.

I've been very hesitant to use the term "agent" for meaningful communication over the last couple of years. It felt to me like the ultimate in buzzword bingo - everyone was talking about agents, but if you quizzed them everyone seemed to hold a different mental model of what they actually were.

I even started collecting definitions in my agent-definitions tag, including crowdsourcing 211 definitions on Twitter and attempting to summarize and group them with Gemini (I got 13 groups).

Jargon terms are only useful if you can be confident that the people you are talking to share the same definition! If they don't then communication becomes less effective - you can waste time passionately discussing entirely different concepts.

It turns out this is not a new problem. In 1994's Intelligent Agents: Theory and Practice Michael Wooldridge wrote:

Carl Hewitt recently remarked that the question what is an agent? is embarrassing for the agent-based computing community in just the same way that the question what is intelligence? is embarrassing for the mainstream AI community. The problem is that although the term is widely used, by many people working in closely related areas, it defies attempts to produce a single universally accepted definition.

So long as agents lack a commonly shared definition, using the term reduces rather than increases the clarity of a conversation.

In the AI engineering space I think we may finally have settled on a widely enough accepted definition that we can now have productive conversations about them.

Tools in a loop to achieve a goal

An LLM agent runs tools in a loop to achieve a goal. Let's break that down.

The "tools in a loop" definition has been popular for a while - Anthropic in particular have settled on that one. This is the pattern baked into many LLM APIs as tools or function calls - the LLM is given the ability to request actions to be executed by its harness, and the outcome of those tools is fed back into the model so it can continue to reason through and solve the given problem.

"To achieve a goal" reflects that these are not infinite loops - there is a stopping condition.

I debated whether to specify "... a goal set by a user". I decided that's not a necessary part of this definition: we already have sub-agent patterns where another LLM sets the goal (see Claude Code and Claude Research).

There remains an almost unlimited set of alternative definitions: if you talk to people outside of the technical field of building with LLMs you're still likely to encounter travel agent analogies or employee replacements or excitable use of the word "autonomous". In those contexts it's important to clarify the definition they are using in order to have a productive conversation.

But from now on, if a technical implementer tells me they are building an "agent" I'm going to assume they mean they are wiring up tools to an LLM in order to achieve goals using those tools in a bounded loop.

Some people might insist that agents have a memory. The "tools in a loop" model has a fundamental form of memory baked in: those tool calls are constructed as part of a conversation with the model, and the previous steps in that conversation provide short-term memory that's essential for achieving the current specified goal.

If you want long-term memory the most promising way to implement it is with an extra set of tools!
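
Continuing the toy sketch above, long-term memory can be nothing more than two extra tools the model may choose to call (the schema here is purely illustrative):

    # Long-term memory as "an extra set of tools": the notes outlive any one
    # conversation, and the model decides when to save or search them.
    import sqlite3

    memory = sqlite3.connect("memory.db")
    memory.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, text TEXT)")

    def save_note(text: str) -> str:
        memory.execute("INSERT INTO notes (text) VALUES (?)", (text,))
        memory.commit()
        return "saved"

    def search_notes(query: str) -> str:
        rows = memory.execute(
            "SELECT text FROM notes WHERE text LIKE ?", (f"%{query}%",)
        ).fetchall()
        return "\n".join(r[0] for r in rows) or "no matches"

    TOOLS.update({"save_note": save_note, "search_notes": search_notes})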

Agents as human replacements is my least favorite definition

If you talk to non-technical business folk you may encounter a depressingly common alternative definition: agents as replacements for human staff. This often takes the form of "customer support agents", but you'll also see cases where people assume that there should be marketing agents, sales agents, accounting agents and more.

If someone surveys Fortune 500s about their "agent strategy" there's a good chance that's what is being implied. Good luck getting a clear, distinct answer from them to the question "what is an agent?" though!

This category of agent remains science fiction. If your agent strategy is to replace your human staff with some fuzzily defined AI system (most likely a system prompt and a collection of tools under the hood) you're going to end up sorely disappointed.

That's because there's one key feature that remains unique to human staff: accountability. A human can take responsibility for its action and learn from its mistakes. Putting an AI agent on a performance improvement plan makes no sense at all!

Amusingly enough, humans also have agency. They can form their own goals and intentions and act autonomously to achieve them - while taking accountability for those decisions. Despite the name, AI agents can do nothing of the sort.

This legendary 1979 IBM training slide says everything we need to know:

A computer can never be held accountable. Therefore a computer must never make a management decision

OpenAI need to get their story straight

The single biggest source of agent definition confusion I'm aware of is OpenAI themselves.

OpenAI CEO Sam Altman is fond of calling agents "AI systems that can do work for you independently".

Back in July OpenAI launched a product feature called "ChatGPT agent" which is actually a browser automation system - toggle that option on in ChatGPT and it can launch a real web browser and use it to interact with web pages directly.

And in March OpenAI launched an Agents SDK with libraries in Python (openai-agents) and JavaScript (@openai/agents). This one is a much closer fit to the "tools in a loop" idea.

It may be too late for OpenAI to unify their definitions at this point. I'm going to ignore their various other definitions and stick with tools in a loop!

There's already a meme for this

Josh Bickett tweeted this in November 2023:

What is an AI agent?

[Meme: a bell curve with IQ on the x-axis. Both the low end and the high end are labeled "An LLM in a loop with an objective"; the anxious middle is buried under a complex "AGENT" flowchart of Critic, Learning element, Problem Generator, Sensors, Performance element, Effectors, Percepts, Environment and actions.]

I guess I've climbed my way from the left side of that curve to the right.



Slack is extorting us with a $195k/yr bill increase


An open letter, or something

For nearly 11 years, Hack Club - a nonprofit that provides coding education and community to teenagers worldwide - has used Slack as the tool for communication. We weren’t freeloaders. A few years ago, when Slack transitioned us from their free nonprofit plan to a $5,000/year arrangement, we happily paid. It was reasonable, and we valued the service they provided to our community.

However, two days ago, Slack reached out to us and said that if we don’t agree to pay an extra $50k this week and $200k a year, they’ll deactivate our Slack workspace and delete all of our message history.

One could argue that Slack is free to stop offering us the nonprofit plan at any time, but in my opinion a six-month grace period is the bare minimum for a massive hike like this, if not more. Essentially, Salesforce (a $230 billion company) is strong-arming a small nonprofit for teens by giving us less than a week to pony up a pretty massive sum of money or risk having all our communications cut off. That’s absurd.

The impact

The short notice has also been catastrophic for the programs we run. Dozens of our staff and volunteers are now scrambling to update systems, rebuild integrations and migrate years of institutional knowledge. The opportunity cost of this forced migration is simply staggering.


Anyway, we’re moving to Mattermost. This experience has taught us that owning your data is incredibly important, and if you’re a small business especially, then I’d advise you move away too.


This post was rushed out because, well, this has been a shock! If you’d like any additional details then feel free to send me an email.


Is your mayor using ChatGPT? Here’s how to FOIA around and find out - Poynter


Last year, Nate Sanford filed a “silly story” for Spokane’s alt-weekly Inlander about a state senator getting into a Twitter argument with an AI porn spambot. The bot was eventually suspended after Spokane’s mayor reported the account.

But a city employee mentioned to Sanford, now a reporter at KNKX and Cascade PBS, that they’d been testing AI tools at work. That offhand comment sparked Sanford’s curiosity about how local governments were actually using generative artificial intelligence and led to a series of investigations that revealed how chatbots are quietly embedding into the machinery of local government.

Sanford used extensive public records requests for ChatGPT and Microsoft Copilot logs from city employees to show, among other things, that the city of Bellingham’s draft AI policy was written with the help of ChatGPT.

I get excited when I see an intriguing use of FOIA to demystify local government. And since I lead Poynter’s AI work, you can imagine how I geeked out when I saw Sanford’s investigations.

Here is how they start:

When the Lummi Nation applied for funding to hire a crime victims coordinator last year, Bellingham Mayor Kim Lund sent a letter encouraging the Washington Department of Commerce to award the nation a state grant.

“The Lummi Nation has a strong history of community leadership and a deep commitment to the well-being of its members,” the letter read. “The addition of a Coordinator will enhance the Lummi Nation’s capacity to address violence and support victims in a meaningful and culturally appropriate manner.”

But the mayor didn’t write those words herself. ChatGPT did.

Records show Lund’s assistant fed the Commerce Department’s request for proposals into the artificial intelligence chatbot and asked it to write the letter for her. “Please include some facts about violence in native communities in the United States or Washington state in particular,” she added in her prompt.

The stories highlight the need for AI literacy as the technology embeds deeper into our lives, even in ways we may never see. So, I reached out to Sanford in an email conversation to find out why and how he used the Freedom of Information Act to obtain ChatGPT logs, and what they say about where we’re heading.

This conversation has been edited for length and clarity.

Alex Mahadevan: So, why did you think to FOIA for chatbot logs? Did you get a tip? Just curious?

Nate Sanford: After the porn spambot story, I did some research online to see if it was possible to use records requests to get more info on local government AI use. I found a post from someone on MuckRock who had tried requesting AI records from their local police department. I was inspired to try something similar to see what would turn up with Washington city government leaders.

I ended up filing records requests seeking chatbot records from almost a dozen cities in Washington. I was mainly just testing the system to see if it was even possible.

Mahadevan: What was the custodian’s reaction? Did it take a lot of back and forth to get what you wanted?

Sanford: It varied by city. Many have required a fair amount of back and forth. It was clear that most jurisdictions had never dealt with this type of request before.

I got a call from one records officer who wanted to know more about what I was looking for and how they could help. They said it was the first time they’d dealt with a request of that type, and they weren’t really sure how to process it. I’ve had similar questions from several records officers.

Mahadevan: Were you surprised they complied? Surprised they even kept the records? 

Sanford: I really wasn’t sure what to expect.

The story I published ended up focusing on two Washington cities: Bellingham and Everett. We ended up focusing on those cities because they were the fastest and most responsive to my records request. They aren’t necessarily outliers in their use of AI.

Bellingham and Everett both deserve a lot of credit for acting in good faith and doing their best to provide a comprehensive response to my (very time-consuming) records request. Some cities haven’t been as cooperative or transparent. I’m aware that this type of request is expansive and a big lift for records officers and respondents. But I also think it’s important for transparency. Citizens have a right to know how their representatives are using these tools.

Mahadevan: Were you surprised by the widespread use of ChatGPT you found?

Sanford: I knew the technology was widespread in the private sector. I expected that it would also be present in government, but I really didn’t expect it to be this widespread. I hadn’t heard any public communication from governments about how or if they’d be using it.

Mahadevan: What are some tips you’d give to another local reporter looking to do the same thing?

Sanford: Request records from Copilot as well as ChatGPT: When I first tried filing these records requests, I tried asking for chat logs from every AI chatbot city staff have used. Records officers told me that was too vague and expansive. For simplicity, I ended up limiting the requests to ChatGPT, the world’s most popular chatbot.

Requesting ChatGPT logs was fruitful, but going forward, I think requesting Copilot chatlogs will be even more valuable. Microsoft made its chatbot available to government clients earlier this year, and many jurisdictions are now instructing staff to only use Copilot. I’d recommend that reporters look for records from Copilot as well as ChatGPT. (Depending on how much time you have, it could also be worth filing additional requests for records from Claude, Grok, etc.)

Provide detailed instructions: When I first started filing requests, some city employees responded by simply taking screenshots of every ChatGPT conversation they’d had — sometimes on their mobile phone. This was incredibly chaotic and difficult to sort through. It also meant that I couldn’t see the date the messages were sent or the order they were supposed to be in.

To make things easier, I started asking records officers to send city staff instructions for how to export their ChatGPT histories into a zipped folder. The .ZIP file format is ideal because it gives you:

  • An easily readable HTML file of the chats in chronological order.
  • A JSON file of the email the user signed up for ChatGPT with.
  • A JSON file of the chat history that includes timestamps.
  • Copies of any files the user uploaded into ChatGPT, and copies of any images ChatGPT generated in response to their requests.

The timestamps are in Unix time, so you’ll need a free online converter (or a short script like the one sketched below) to decipher them.
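
If you’d rather script it, a few lines of Python will do the conversion. Treat the field names here ("create_time", "title") as assumptions about the export layout, which can change:

    # Sketch: print each conversation's date and title from a ChatGPT export.
    # The "create_time" and "title" fields are assumptions about the layout.
    import json
    from datetime import datetime, timezone

    with open("conversations.json") as f:
        conversations = json.load(f)

    for convo in conversations:
        created = datetime.fromtimestamp(convo["create_time"], tz=timezone.utc)
        print(created.isoformat(), convo.get("title", "(untitled)"))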

Call records officers: If the request is taking a while, I would absolutely recommend calling records officers to explain what you’re looking for and ask how you can help make their job easier. The cities that have been most responsive to my request so far — Bellingham and Everett — responded by sending an email to literally every single city employee asking them to turn over their ChatGPT history. It took about five months for them to close out the request.

Figure out a good file management system: The volume of records returned in response to my requests was massive. I’d recommend that reporters figure out a file management system that works for them early on so they don’t lose track of documents. I organized things by taking a screenshot of every interesting message I came across and saving those screenshots to a group of desktop folders organized by city/topic. Most of the chat logs came back as HTML files that let you search to find keywords.

Mahadevan: What did your requests look like?

Sanford: Here’s a template. I’d recommend narrowing the scope a bit if you’re looking for something specific and hoping to get a faster response.

Pursuant to the Washington Public Records Act, I am requesting the following records:

Chat histories of all ChatGPT sessions conducted by city employees on city-owned devices or used in job-related functions in the following departments: City Council, Mayor’s Office, Police, City Attorney, Public Works, Information Technology, TKTKTK and TKTKT.

The timeframe for this request is 1/6/2023 to the date this request is processed. The requested documents will be made available to the general public and this request is not being made for commercial purposes. Please make records available in installments as they are ready to release.

If it’s helpful, please share with respondents the following instructions for exporting ChatGPT histories:

  • Click on your name or profile icon (bottom-left corner of the ChatGPT interface).
  • Select "Settings".
  • Go to the "Data Controls" tab.
  • Click "Export data".
  • A pop-up will appear; click "Confirm export".
  • OpenAI will email you a download link with a .zip file containing your chat history in JSON format (and HTML for easy viewing).

Mahadevan: Any interesting chat logs that didn’t make it into the story?

Sanford: There were so many!

I think the original draft I turned in was almost 10,000 words. I’m thankful to my editors for helping me trim it.

There was lots of small, silly stuff. There were also lots of really interesting examples that shed light on how city leaders are thinking about various policy questions. It was illuminating to see which topics popped up most frequently. (Washington has a huge housing crisis, and there were numerous examples of officials asking ChatGPT for advice on how to increase housing affordability.)

A lot of the chats had sensitive personal information that was really interesting, but not necessarily newsworthy enough for us to publish.

There are a few chat logs that we’re holding on to because they raise legal questions and require more reporting before we can publish.

Mahadevan: What kind of reception have you gotten from the community?

Sanford: The reception has been really positive! It’s clear that most people had no idea that their local government leaders were using AI this way. The story prompted newspaper editorials in both Bellingham and Everett calling for city leaders to approach AI with more caution.

Generative AI is such a new technology that there’s no real consensus on what the norms should be. Does it matter that the mayor’s assistant used ChatGPT to write a letter to a congressman? Or that communications staff used it to respond to emails from constituents? We’ve heard from a lot of readers who are upset about that, but we’ve also heard from people who say they don’t care. I think both perspectives are valid. It’s really interesting seeing people grappling with where the line should be.

It’s clear that local governments have been experimenting with this technology for a while, but there hasn’t been much public discussion about it. I’m glad to see that the story has sparked a really robust debate.

I’ve also heard from lots of reporters in newsrooms across the country who are planning to copy the records request in their respective jurisdictions.

Mahadevan: Got any other follow-ups planned?

Sanford: I have several follow-ups planned. I’m continuing to regularly receive new installments from other Washington jurisdictions. There are a few specific chat records we’ve obtained that require more reporting before we can publish.

Mahadevan: Do you personally use generative AI for anything?

Sanford: It isn’t technically generative AI, but I use Otter.ai every day for transcribing interviews. It’s incredibly helpful.

I’ve experimented with ChatGPT for generating headline ideas, but I haven’t been super impressed with any of its suggestions. I’ve found it helpful for a few computer/coding related questions, but I don’t feel comfortable using it for writing.

I think there probably are ways that generative AI can be helpful for newsrooms, but I’m still pretty wary of it. I’m worried about accuracy, public trust and plagiarism.


The climate of fear is self-imposed
