Hallucinations in code are the least dangerous form of LLM mistakes


A surprisingly common complaint I see from developers who have tried using LLMs for code is that they encountered a hallucination - usually the LLM inventing a method or even a full software library that doesn't exist - and it crashed their confidence in LLMs as a tool for writing code. How could anyone productively use these things if they invent methods that don't exist?

Hallucinations in code are the least harmful hallucinations you can encounter from a model.

(When I talk about hallucinations here I mean instances where an LLM invents a completely untrue fact, or in this case outputs code references which don't exist at all. I see these as a separate issue from bugs and other mistakes, which are the topic of the rest of this post.)

The real risk from using LLMs for code is that they'll make mistakes that aren't instantly caught by the language compiler or interpreter. And these happen all the time!

The moment you run LLM generated code, any hallucinated methods will be instantly obvious: you'll get an error. You can fix that yourself or you can feed the error back into the LLM and watch it correct itself.
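Here's a minimal illustration (the hallucinated method is invented for this example - Python's datetime module has no from_unix constructor, though a model might confidently produce one):

    import datetime

    # An LLM might hallucinate a constructor like this one:
    ts = datetime.datetime.from_unix(1700000000)
    # AttributeError: type object 'datetime.datetime' has no
    # attribute 'from_unix'

Paste that traceback back into the model and it will usually swap in the real datetime.datetime.fromtimestamp() call.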

Compare this to hallucinations in regular prose, where you need a critical eye, strong intuitions and well-developed fact-checking skills to avoid sharing information that's incorrect and directly harmful to your reputation.

With code you get a powerful form of fact checking for free. Run the code, see if it works.

In some setups - ChatGPT Code Interpreter, Claude Code, any of the growing number of "agentic" code systems that write and then execute code in a loop - the LLM system itself will spot the error and automatically correct itself.
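Here's a rough sketch of what that loop looks like, assuming a hypothetical ask_llm() helper standing in for whichever model API you use (it's a placeholder, not a real library call):

    import subprocess
    import sys

    def ask_llm(prompt: str) -> str:
        """Placeholder for your actual model API client."""
        raise NotImplementedError

    def write_and_run(task: str, max_attempts: int = 3) -> str:
        code = ask_llm(f"Write a Python script that does: {task}")
        for _ in range(max_attempts):
            result = subprocess.run(
                [sys.executable, "-c", code],
                capture_output=True, text=True,
            )
            if result.returncode == 0:
                return result.stdout  # ran cleanly - still needs manual QA!
            # Feed the traceback back in and ask for a corrected version.
            code = ask_llm(
                f"This code:\n{code}\nfailed with:\n{result.stderr}\nFix it."
            )
        raise RuntimeError("still failing after several attempts")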

If you're using an LLM to write code without even running it yourself, what are you doing?

Hallucinated methods are such a tiny roadblock that when people complain about them I assume they've spent minimal time learning how to effectively use these systems - they dropped them at the first hurdle.

My cynical side suspects they may have been looking for a reason to dismiss the technology and jumped at the first one they found.

My less cynical side assumes that nobody ever warned them that you have to put a lot of work in to learn how to get good results out of these systems. I've been exploring their applications for writing code for over two years now and I'm still learning new tricks (and new strengths and weaknesses) almost every day.

Manually testing code is essential

Just because code looks good and runs without errors doesn't mean it's actually doing the right thing. No amount of meticulous code review - or even comprehensive automated tests - will demonstrably prove that code actually does the right thing. You have to run it yourself!

Proving to yourself that the code works is your job. This is one of the many reasons I don't think LLMs are going to put software professionals out of work.

LLM code will usually look fantastic: good variable names, convincing comments, clear type annotations and a logical structure. This can lull you into a false sense of security, in the same way that a grammatically correct and confident answer from ChatGPT might tempt you to skip fact checking or applying a skeptical eye.

The way to avoid those problems is the same as how you avoid problems in code by other humans that you are reviewing, or code that you've written yourself: you need to actively exercise that code. You need to have great manual QA skills.

A general rule for programming is that you should never trust any piece of code until you've seen it work with your own eyes - or, even better, seen it fail and then fixed it.

Across my entire career, almost every time I've assumed some code works without actively executing it - some branch condition that rarely gets hit, or an error message that I don't expect to occur - I've later come to regret that assumption.
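For example (parse_port() here is invented purely to illustrate the habit), force the failure branch to run at least once instead of assuming it works:

    def parse_port(value: str) -> int:
        # Invented example: parse a TCP port, rejecting out-of-range values.
        port = int(value)
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
        return port

    print(parse_port("8080"))    # happy path: 8080
    try:
        parse_port("99999")      # trigger the error branch on purpose
    except ValueError as e:
        print(e)                 # port out of range: 99999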

Tips for reducing hallucinations

If you really are seeing a deluge of hallucinated details in the code LLMs are producing for you, there are a bunch of things you can do about it.

  • Try different models. It might be that another model has better training data for your chosen platform. As a Python and JavaScript programmer my favorite models right now are Claude 3.7 Sonnet with thinking turned on, OpenAI's o3-mini-high and GPT-4o with Code Interpreter (for Python).
  • Learn how to use the context. If an LLM doesn't know a particular library you can often fix this by dumping in a few dozen lines of example code. LLMs are incredibly good at imitating things, and at rapidly picking up patterns from very limited examples (there's a sketch of this just after the list). Modern models have increasingly large context windows - I've recently started using Claude's new GitHub integration to dump entire repositories into the context and it's been working extremely well for me.
  • Choose boring technology. I genuinely find myself picking libraries that have been around for a while partly because that way it's much more likely that LLMs will be able to use them.
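
Here's a minimal sketch of that context-stuffing technique (the examples/ directory and the ask_llm() helper are placeholders for your own files and model client):

    from pathlib import Path

    # Gather working example code for a library the model keeps
    # getting wrong, and paste it in as a pattern to imitate.
    examples = "\n\n".join(
        p.read_text() for p in Path("examples").glob("*.py")
    )

    prompt = (
        "Here is working example code for the library I'm using:\n\n"
        f"{examples}\n\n"
        "Following these patterns exactly, write a script that ..."
    )
    # response = ask_llm(prompt)  # send with your real model client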

I'll finish this rant with a related observation: I keep seeing people say "if I have to review every line of code an LLM writes, it would have been faster to write it myself!"

Those people are loudly declaring that they have under-invested in the crucial skills of reading, understanding and reviewing code written by other people. I suggest getting some more practice in. Reviewing code written for you by LLMs is a great way to do that.


Bonus section: I asked Claude 3.7 Sonnet "extended thinking mode" to review an earlier draft of this post - "Review my rant of a blog entry. I want to know if the argument is convincing, small changes I can make to improve it, if there are things I've missed.". It was quite helpful, especially in providing tips to make that first draft a little less confrontational! Since you can share Claude chats now here's that transcript.


New England weather was colder years ago


Blue Hills


18F: We are dedicated to the American public and we're not done yet


March 1, 2025

A letter to the American People:

For over 11 years, 18F has been proudly serving you to make government technology work better. We are non-partisan civil servants. 18F has worked on hundreds of projects, all designed to make government technology not just efficient but effective, and to save money for American taxpayers.

However, all employees at 18F – a group that the Trump Administration GSA Technology Transformation Services Director called "the gold standard" of civic tech – were terminated today at midnight ET.

18F was doing exactly the type of work that DOGE claims to want – yet we were eliminated.

When former Tesla engineer Thomas Shedd took the position of TTS director and met with TTS including 18F on February 3, 2025, he acknowledged that the group is the “gold standard” of civic technologists and that “you guys have been doing this far longer than I’ve been even aware that your group exists.” He repeatedly emphasized the importance of the work, and the value of the talent that the teams bring to government.

Despite that skill and knowledge, at midnight ET on March 1, the entirety of 18F received notice that our positions had been eliminated.

The letter said that 18F "has been identified as part of this phase of GSA’s Reduction in Force (RIF) as non-critical”.

"This decision was made with explicit direction from the top levels of leadership within both the Administration and GSA," Shedd said in an email shortly after we were given notice.

This was a surprise to all 18F staff and our agency partners. Just yesterday we were working on important projects, including improving access to weather data with NOAA, making it easier and faster to get a passport with the Department of State, supporting free tax filing with the IRS, and other critical projects with organizations at the federal and state levels.

All of that work has now abruptly come to a halt. Since the entire staff was also placed on administrative leave, we have been locked out of our computers, and have no chance to assist in an orderly transition in our work. We don’t even have access to our personal employment data. We’re supposed to return our equipment, but can’t use our email to find out how or where.

Dismantling 18F follows the gutting of the original US Digital Service. These cuts are just the latest in a sledgehammer approach to the critical US teams supporting IT infrastructure.

Before today’s RIF, DOGE members and GSA political appointees demanded and took access to IT systems that hold sensitive information. They ignored security precautions. Some who pushed back on this questionable behavior resigned rather than grant access. Others were met with reprisals like being booted from work communication channels.

We’re still absorbing what has happened. We’re wrestling with what it will mean for ourselves and our families, as well as the impact on our partners and the American people.

But we came to the government to fix things. And we’re not done with this work yet.

More to come.


In Trump’s Washington, a Moscow-Like Chill Takes Hold


More than 80% of new California properties are in high fire-risk areas


The Los Angeles wildfires last month destroyed thousands of homes, killed dozens of people and left a city reeling. They also raised serious questions about the region’s future – and where Americans choose to build.

A rapidly increasing share of US homes are built in areas that are at risk of fire. In 1990, about 13% of new homes were built in places at high risk of fire. By 2020, that number had more than doubled to 31%. The numbers come from ClimateCheck, a for-profit research company that compiles risk by studying trends including rainfall, wind and temperature. But the climate crisis is just one of the reasons that more homes are unsafe.

The Covid pandemic changed the way people work and live. Some workers who were able to continue their jobs remotely wanted out of dense cities. New York alone has lost half a million residents since 2020. Many of these domestic migrants relocated to smaller cities and rural areas, often choosing homes that were at greater risk of wildfire.

Maximilian Stiefel, a risk expert at ClimateCheck, explains: “In California, especially, there’s a difficulty in building dense housing, so a lot of people are forced into single unit housing, and that just accelerates this spread into the wildland urban interface, because you can’t build up another unit on your property if you have a single family house. And then there’s a lot of nimbyism in California, where people don’t want denser developments being built.” Over 80% of properties built in California between 2020 and 2022 were in high fire risk areas, compared with just 28% of the properties built between 1920 and 1929.

Insurance companies are well aware of these trends. Experts study data to build “catastrophe models”, predicting how disasters will affect profits. And lately, the numbers haven’t looked good. Two giants of the industry, Allstate and State Farm, announced in 2022 and 2023 that they wouldn’t accept any new home insurance applications in California, in large part because of the high risk of wildfire. Meanwhile, the state, which also has its own last-resort insurance program, has engaged in a complex back-and-forth with private companies. LA residents who lost everything – and who are fortunate enough to have insurance – now must grapple with these companies to avert financial disaster.

Even for the most rigorous of insurance experts, wildfires are hard to model statistically. Unlike earthquakes, hurricanes or flooding, they are much more dependent on humans. Humans can start them (wildfires spike on the Fourth of July each year) and humans can stop them with better policies such as better funding for fire departments. Despite the key role that humans play in individual wildfires, the rising overall trend in their frequency is clearly attributable to the climate crisis – long periods of drought, strong winds and high temperatures mean that fires spread faster, last longer and cause far more destruction. Even before the latest LA fires, expected to be the costliest ever, nine of the 10 most expensive wildfires in US history had taken place since 2017.

Stiefel explains his own personal experience with these numbers: “I lived in Santa Barbara during the Thomas fire, which traveled 30 miles up the coast over three weeks, and we were evacuated.” He says he eventually left California in part because there were “too many hazards there with climate change”.

Of course, people have greater mobility when they don’t have dependents, aren’t disabled and don’t have a longstanding attachment to a city.

The answer, according to experts like Varun Sivaram, a fellow at the Council on Foreign Relations, is “more stringent building codes and regulations”. Stiefel agrees, noting: “It’s not just about zoning and land use. It’s also about resource sharing and emergency planning, collaboration among all these different levels of government, and educating the community about fire risk.”


Moving on from 18F. — ethanmarcotte.com
