LLMs Are Only Great at Tasks Where It's OK to Make Mistakes
I was just asked a question that revealed a hidden assumption I think a lot of people have internalized without realizing it: that computers don't make mistakes. I think this assumption underpins a lot of faulty thinking about the AI revolution.
If computers can't make mistakes, why does AI make mistakes?
People make mistakes. We're squishy and organic and have brain farts. During the Industrial Revolution, that was frustrating: companies wanted a person to repeat the same process over and over without making mistakes. We did our best. Then one day... computers!
Electronics don't make mistakes; they're completely literal. Given the same input, you should always get the same output (deterministic). This is ideal for repeating "tedious and sometimes impossible tasks". But there's one problem: explaining to the computer exactly what we want it to do is hard, because humans are bad at thinking through every possible input. Software engineers aren't just people who can "talk" computer; they also have experience designing systems that account for edge cases. This is frustrating, because human brains carry lots of context that a machine has to be given explicitly.
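A toy sketch of that literalness (the function and inputs are mine, purely for illustration): the program gives the same answer on every single run, right up until it meets an input nobody thought to handle.

```python
def parse_price(text: str) -> float:
    # Completely literal: handles exactly the inputs the programmer
    # anticipated, and nothing else.
    return float(text.strip().lstrip("$"))

print(parse_price("$4.99"))    # 4.99, the same output on every run

try:
    parse_price("four bucks")  # a human gets it; the machine has no context
except ValueError as err:
    print("no guessing, just failure:", err)
```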
I can tell any idiot in my town "go get me some ice cream," and that's enough instruction. They might forget half my order or take the slow route there, but most of the time I'll get some ice cream at some point. But to build an automatic ice cream delivery service with a computer, you have to start with a list of every ice cream store in town, their hours, their GPS coordinates, and so on. That's far more time-consuming, but done correctly, it would succeed exactly the same way, 100% of the time.
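Here's a minimal sketch of what "start with a list of everything" means in practice (the shops, coordinates, and hours are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class IceCreamShop:
    name: str
    lat: float
    lon: float
    open_hour: int   # 24-hour clock
    close_hour: int

# Every shop in town, spelled out up front. This is the tedious part.
SHOPS = [
    IceCreamShop("Scoops Ahoy", 44.97, -93.26, 11, 21),
    IceCreamShop("Cone Zone",   44.95, -93.29, 12, 22),
]

def open_shops(hour: int) -> list[IceCreamShop]:
    # If the data is right, this answer is right, 100% of the time.
    return [s for s in SHOPS if s.open_hour <= hour < s.close_hour]

print([s.name for s in open_shops(hour=20)])  # ['Scoops Ahoy', 'Cone Zone']
print([s.name for s in open_shops(hour=23)])  # [] (closed; no improvising)
```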
With AI, what we've done is find a way to talk to a computer the way we talk to a person. This is amazing, because computers finally understand the context of a command. They kinda get what you mean without you having to start with a definition of ice cream.
The problem is that they don't act like computers anymore; they act like people: squishy, mistake-prone, and the same input ≠ the same output every time (non-deterministic). Sometimes that's fine, sometimes it's emphatically not, and it depends entirely on the use case.
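You can watch the non-determinism directly by feeding in the same prompt over and over. A sketch using the OpenAI Python client (the model name is illustrative, and note that even temperature=0 only makes repeated outputs likely, not guaranteed):

```python
from openai import OpenAI  # pip install openai; expects OPENAI_API_KEY in the env

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any chat model shows the effect
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,      # sampling: same input, possibly different output
    )
    return resp.choices[0].message.content

# Three identical inputs, three independent samples. Don't expect three
# identical outputs.
for _ in range(3):
    print(ask("Name one ice cream flavor, one word only."))
```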
Are they useful for "repeating tedious and sometimes impossible tasks"? Sure, as long as it's okay for them to fail sometimes.
I assume this is why companies like OpenAI are frantically building hybrid systems: let the AI drive, but have deterministic rules govern the actual outcome.
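One common shape for such a hybrid, sketched under my own assumptions (this is the general pattern, not OpenAI's actual internals; the schema and rules are invented): the model proposes a structured answer, and deterministic code gets the final say.

```python
import json

ALLOWED_FLAVORS = {"vanilla", "chocolate", "strawberry"}

def validate_order(raw: str) -> dict:
    # The deterministic gate: the model's text must parse and must pass
    # hard rules, or the order is rejected outright.
    order = json.loads(raw)  # raises ValueError on malformed output
    if order["flavor"] not in ALLOWED_FLAVORS:
        raise ValueError(f"unknown flavor: {order['flavor']!r}")
    if not 1 <= int(order["scoops"]) <= 3:
        raise ValueError("scoops must be between 1 and 3")
    return order

llm_output = '{"flavor": "vanilla", "scoops": 2}'  # pretend a model wrote this
print(validate_order(llm_output))  # {'flavor': 'vanilla', 'scoops': 2}
```

The squishy part drafts; the literal part decides. A bad sample becomes a rejection you can retry or escalate, instead of a mistake that quietly ships.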