For instance, if an AI model could complete a one-hour task with 50% success, it only had a 25% chance of successfully completing a two-hour task. This indicates that for 99% reliability, task duration must be reduced by a factor of 70.
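(To see where the 70 comes from: the arithmetic in the quote implies an assumption that success decays exponentially with task length, i.e. a constant per-hour failure rate. That is a modeling assumption, not a measured law. A quick sketch:)

```python
import math

# Assumption: success probability decays exponentially with task length,
# p(t) = 0.5 ** (t / T_half), where T_half is the duration the model
# completes with 50% success (one hour in the quote above).
T_half = 1.0  # hours

def success_prob(t, t_half=T_half):
    return 0.5 ** (t / t_half)

print(success_prob(2.0))  # 0.25 -- the two-hour figure above

# Task duration at which success rises to 99%:
t_99 = T_half * math.log(0.99) / math.log(0.5)  # ~0.0145 hours
print(T_half / t_99)      # ~69, i.e. the "factor of 70"
```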
This is interesting. I have noticed this myself. Generally, when an LLM boosts productivity, it shoots back a solution very quickly, and after a quick sanity check, I can accept it and move on. When it has trouble, that’s something of a red flag. You might get there eventually by probing it more and more, but there is good reason for pessimism if it’s taking too long.
In the worst case scenario where you ask it a coding problem for which there is no solution—it’s just not possible to do what you’re asking—it may nevertheless engage you indefinitely until you eventually realize it’s running you around in circles. I’ve wasted a whole afternoon with that nonsense.
Anyway, I worry that companies are no longer hiring junior devs. Today’s juniors are tomorrow’s elites and there is going to be a talent gap in a decade that LLMs—in their current state at least—seem unlikely to fill.
In the worst case scenario where you ask it a coding problem for which there is no solution—it’s just not possible to do what you’re asking—it may nevertheless engage you indefinitely until you eventually realize it’s running you around in circles.
Exactly this, and it’s frustrating as a Jr dev to be fed this bs when you’re learning. I’ve had multiple scenarios where it blatantly told me wrong things. Like using string interpolation in a Terraform file to try to set a dynamic module source - what it was giving me looked totally viable. It wasn’t until I dug around some more that I found out that terraform init can’t use variables in the source field.
On the positive side, it helps give me some direction when I don’t know where to start. I use it with a highly pessimistic and cautious approach. I understand that today is the worst it’s going to be, and that I will be required to use it as a tool in my job going forward, so I’m making an effort to get to grips with it.
The claims that AI will be surpassing humans in programming are pretty ridiculous. But let’s be honest - most programming is rather mundane.
Well, this kind of AI won’t ever be useful as a programmer. It doesn’t think. It doesn’t reason. It cannot make decisions besides using a ton of computational power and enormous deep neural networks to shit out a series of words that seem like they should follow your prompt. An LLM is just a really, really good next-word guesser.
So when you ask it to solve the Tower of Hanoi problem, great, it can do that, because it saw someone else’s answer. But if you ask it to solve it for a tower that is 20 disks high, it will fail, because no one ever talks about going that far and it flounders. It’s not actually reasoning to solve the problem - it’s regurgitating answers it has ingested from stolen internet conversations. It’s not even attempting to solve the general case, because it’s not trying to solve the problem, it’s responding to your prompt.
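For reference, the general solution is a three-line recursion; a rough Python sketch (peg names are arbitrary):

```python
def hanoi(n, src="A", dst="C", aux="B", moves=None):
    """Return the full move list for an n-disk Tower of Hanoi."""
    if moves is None:
        moves = []
    if n > 0:
        hanoi(n - 1, src, aux, dst, moves)  # park the n-1 smaller disks
        moves.append((src, dst))            # move the largest disk
        hanoi(n - 1, aux, dst, src, moves)  # restack the smaller disks
    return moves

print(len(hanoi(20)))  # 1048575, i.e. 2**20 - 1 moves
```

At 20 disks that’s over a million moves, which is exactly why reciting memorized move lists falls over while the recursion doesn’t care how tall the tower is.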
That said - an LLM is also great as an interface to allow natural language and code as prompts for other tools. This is where the actually productive advancements will be made. Those tools are garbage today but they’ll certainly improve.
Well, this kind of AI won’t ever be useful as a programmer
It already is.
You mean useful to a programmer, or as useful as a programmer?
Ah - yeah I read that wrong. It’s useful to a programmer.
Never have I had to implement any kind of ridiculous algorithm to pass tests over huge amounts of data in the least amount of memory, the way competitive-programming sites showcase.
It has been mostly about:
- Finding the correct library for a job and understanding it well, to prevent footguns and avoid blocking future features
- Design patterns for better build times
- Making sane UI options and deciding resource alloc/dealloc points that would match user interaction expectations
- cmake
But then again, I haven’t worked in FinTech or Big Data companies, nor have I built an SQL server.
Because actually writing code is the least important part of programming.
I mean, not the least important, it is an important part. But way less important than most people think.
My productivity has at least tripled since I started using Cursor. People are actually underestimating the effects that AI will have in the industry
It means the AI is very helpful to you. This also means you are as good as 1/3 of an AI in coding skills…
Which is not great news for you, mate.
Ah knock it off. Jesus you sound like people in the '90s mocking “intellisense” in the IDE as somehow making programmers “less real programmers”.
It’s all needless gatekeeping and purity test BS. Use tools that are useful. Don’t worry if it makes you less of a man.
It’s not gatekeeping, it’s true. I know devs who say AI tools are useful, but all the ones who say it makes them multiples more productive are actually doing negative work, because I have to deal with their terrible code that they don’t even understand.
I literally don’t write code anymore, I write detailed specs, invest a lot of time into my guardrails and integrations, and review changes from my agents. My code quality has not fallen, in fact we’ve been able to be much more strict about our style guidelines.
My job has changed completely, but the results are the same - simply much, much faster. And to be clear, this is in code bases that are hundreds of thousands of lines deep, across multiple massive monorepos, and using context from several different documentation sites - both internal and external.
If anything, people are understating the effects this will have over the next year, let alone further. The entry-level IC dev is dead. If you aren’t producing at least twice as fast as you used to, you’re going to be left behind. I cannot possibly suggest strongly enough that you start learning how to use it.
Sure, Jan
In the ‘Medium’ difficulty category, OpenAI’s o4-mini-high model scored the highest at 53.5%.
This fits my observation of such models. o4-mini-high is able to help me with 80-90% of the problems at work. For the remaining problems, it would come up with a nonsensical solution, and no matter how much I prompt it, it would tunnel-vision on that specific approach. It could never second-guess itself, realise that its initial solution is completely off the mark, and try an entirely different approach. That’s where I usually step in and do the work myself.
It still saves me time with the trivial stuff though.
I can’t say the same for the rest of the LLMs. They are simply no good at coding and just waste my time.
I didn’t see Claude 4 Sonnet in the tests, and that’s the one I use. From my experience it looks to be in about the same category as o4-mini.
It is a nice tool to have in my belt. But these LLM-based agents are still very far from being able to do advanced and hard tasks. To me it is probably more important to communicate and learn about the limitations of these tools, so as not to lose time instead of gaining it.
In fact, I am not even sure they are good enough to really generate production-ready code. But they are nice for pre-reviewing, building simple scripts that don’t need to be highly reliable, analysing a project, asking specific questions, etc… The game changer for me was to use Clojure-MCP. Having a REPL at its disposal really enhances the quality of most answers.
Please babe! Just one more parameter, then it will be AGI!
About all they are good for is generating boilerplate code. Just far less efficiently than a snippet library.
Just far less efficiently than a snippet library.
Your snippet library can convert a large JSON file to a Java class using Java property naming conventions and including annotations for Jackson where the names differ from the JSON?
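For concreteness, here’s roughly what that task amounts to in the trivially flat case; a toy Python sketch where the field naming and the JSON-to-Java type mapping are simplifications I’m making up:

```python
import json

def camel(name):
    """Convert a snake_case JSON key to a Java camelCase field name."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)

def java_type(value):
    """Very rough JSON-value -> Java-type mapping (illustrative only)."""
    if isinstance(value, bool):   # bool must be checked before int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "String"

def to_java_class(class_name, obj):
    lines = [f"public class {class_name} {{"]
    for key, value in obj.items():
        field = camel(key)
        if field != key:
            # Jackson annotation where the Java name differs from the JSON key
            lines.append(f'    @JsonProperty("{key}")')
        lines.append(f"    public {java_type(value)} {field};")
    lines.append("}")
    return "\n".join(lines)

sample = json.loads('{"user_id": 7, "display_name": "Ada", "active": true}')
print(to_java_class("User", sample))
```

Nesting, lists, nullable fields, and dates are exactly where a script like this gets painful, which is the point: the messy real-world files are where the LLM earns its keep.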
Thanks for posting. People really out here believing that the most basic scripts are Johnny 5.
I mean - the AI models do a lot more than just that one task you know. Or are you a “real programmer” who doesn’t need AI tools because “real men” don’t need help?
The keyword is efficiently.
I’m not anti-AI. I’m anti burning down the planet to solve problems that were already solved years ago with a few scripts.
Also: correctly, without inventing or garbling data.
Using an LLM for this task will require intensive proof-reading, as it might have changed the data somewhere.
ai is basically just the worst answer on stackexchange
It’s literally the most common answer on stackexchange.
It’s a rubber ducky that talks back. If you don’t take it too seriously, it can reach a level of usefulness just above that of a wheezing piece of yellow rubber.
They aren’t as cute as actual rubber ducks, though.
Actual rubber ducks don’t randomly spew bullshit either
The bullshit is good: it triggers Cunningham’s Law in my brain.
Sometimes it’s easier to come up with a solution correcting something blatantly wrong than doing it from scratch.