Category Archives: AI

Why ChatGPT Works Better for Newbies than StackOverflow

On StackOverflow, it’s common for newbie questions to get downvoted and closed, and to draw comments making fun of the questioner. To be fair, many of these questions are not good and show that the asker didn’t follow the guidelines. Even so, StackOverflow is often hostile to new programmers. To be honest, I’m surprised that ChatGPT didn’t somehow learn this bad behavior.

ChatGPT answers are sometimes totally wrong, and they will be even more wrong given the way newbies ask questions. If they weren’t, StackOverflow wouldn’t have had to ban answers generated from chatbots. But, I still think ChatGPT is a better experience because it’s fast and synchronous. This allows the interaction to be iterative. Of course, this doesn’t help if the suggested code can’t be used.

If I were StackOverflow, I might consider how LLMs could be used to help newbies ask a better question that gets answered by humans if the LLM can’t answer it. Let the user iterate privately, and then have the LLM propose a question based on a system prompt that understands StackOverflow guidelines. Normally, I’d expect the LLM to be able to answer at this point, but I just ran into a problem yesterday where it kept hallucinating an API that was slightly wrong. This kind of thing happens to me often with ChatGPT. In a lot of cases, I could guess the real API or search for it in the docs, but a newer programmer might not be able to do that.
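Here is a minimal sketch of that question-drafting flow, assuming the openai Python package and an API key in the environment; the guideline text is paraphrased (not StackOverflow’s actual rules) and the model name is just a placeholder:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Paraphrased guidelines, not StackOverflow's actual text.
    GUIDELINES = (
        "You help new programmers turn a rough problem description into a "
        "question that would be well received on StackOverflow: a specific "
        "title, a minimal reproducible example, the exact error message, and "
        "what they already tried. If anything is missing, ask for it instead "
        "of drafting the question."
    )

    def draft_question(rough_description: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system", "content": GUIDELINES},
                {"role": "user", "content": rough_description},
            ],
        )
        return resp.choices[0].message.content

    print(draft_question("my python code crashes when I read the csv file"))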

Evaluating the Evaluators

I was a member of Toastmasters during most of 2023 and 2024. Most people know that Toastmasters is a place where you go to get more comfortable with public speaking. A lesser-known aspect is their approach to evaluation.

If you give a speech at Toastmasters, it will be evaluated. This is something semi-formal, meaning that there is a format and a rubric. That makes sense and is probably what you would think it is. What was unexpected to me was that an evaluation is treated like another speech and is evaluated as well (but we stop there—it’s not an infinite game). The evaluation of the evaluation is less formal. It’s usually a few lines during the general evaluation, which needs to cover the entire meeting. When I had to do it, I would try to pick out a line from the evaluation that was worth emulating, to underscore it.

I thought of this while going down a rabbit hole trying to learn about LLM evaluations, a field that also has the concept of evaluating the evaluators. I don’t have much more to say, but just want to leave a link to Hamel Husain’s excellent post: Creating a LLM-as-a-Judge That Drives Business Results, which was the best thing I found on how to improve LLM-based features in a product.

How to Avoid Being Replaced by AI

Right now, I am an independent software developer, and so I own all of the code I create. So, of course, I want it to be as efficient as possible to write that code, and I use AI to do that. I get all of the benefit. I am not replaced—I am freed up to do other things.

All of my consulting income comes from work that originated from my network, cultivated from over 30 years of delivering on my commitments. Some of it is because I am one of only a few people who understand a codebase (that I helped create and partially own). There are very few people that could replace me in this work, even with AI. Even if AI were able to 100% reproduce everything I say and write, the client wouldn’t know how to judge it because their judgement of me is almost entirely based on past experience.

It’s not just AI—there are many people who are smarter than me, with more experience, who work harder, and have better judgement. But if they are completely unknown to my clients, they couldn’t replace me.

Of course, I realize that this isn’t something that one could just replicate immediately, but if you are building a software engineering career for the next few decades, I recommend cultivating a network that trusts you based on your past performance and owning as much of your own work as possible. When you do that, all of the benefits of efficiency flow to you.

Can non-programmers make applications with AI?

Of course! Non-programmers have been making applications for decades, well before there was anything like AI helping them. In my experience, the people who really want to make applications learn what they need to learn. AI closes that gap. No-code tools do that too. So does having things like npm, htmx, React, Pandas, SQLite, AWS, etc. But, the motivated can close bigger gaps if they have to.

My first job was in 1992, working for a company that made FX Options Pricing software. That company was founded by an FX Options trader who was a “non-programmer” until he wasn’t. He taught himself enough C to make simple programs to help him price options at work, and just kept making them better until he was able to sell copies to FX Options brokers and traders. And then he used the money to hire programmers, but he still worked on it for at least the first five years of the company.

A couple of jobs later, I got a job at a startup where the first versions were written by the founder, a mechanical engineer, who probably took programming courses, but was not a “professional programmer”. His first two hires were also self-taught. They went from hacking around with scanners and simple websites to building the web-based scanning and Ajax document viewers that were the basis of our acquisition.

At both places, programmers (like me) were brought in. We added some “professionalism” to the codebase and processes. We helped the teams scale, the products be more reliable, and the delivery be more predictable. But bringing us in was just another tool they used to close the gap.

Use AI to Code Review Yourself, Not Others

In What to do about excessive (good) PR comments I wrote that a PR with a lot of good, actionable comments might be a sign that the author didn’t do enough beforehand to post an acceptable PR. It depends on many factors. For example, a new hire or junior developer might still need training, and so you would expect their PRs to need some work.

But, there are some things that should never make it to review. We have linters and automatic code formatters, so we should never be correcting indentation in a review. If you use Rust, you are never finding a dangling pointer bug. There are automated tools for finding accessibility issues and security issues—which aren’t perfect, but they are probably better (and definitely faster) than most humans. All of these tools are somewhat mechanical, though, built on parser technology. We can do “better” with AI.

AI writes code good enough to be a starting point, and it can also review code. Since I think that the point of code reviews is information sharing, we wouldn’t want to replace them with AI. But AI is the perfect thing to use on yourself to make sure you don’t open PRs with obvious problems that slow down the review.
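As a sketch of what that self-review could look like, here’s one way to send your branch’s diff to a model before opening the PR. This assumes the openai Python package, an API key in the environment, and a branch that will merge into origin/main; the model name is just a placeholder:

    import subprocess
    from openai import OpenAI

    # Collect the branch's changes, the same diff a human reviewer would see.
    diff = subprocess.run(
        ["git", "diff", "origin/main...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    review = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": (
                "You are a strict code reviewer. Point out likely bugs, missing "
                "tests, unclear names, and anything else a human reviewer would "
                "flag. Be terse."
            )},
            {"role": "user", "content": diff},
        ],
    )
    print(review.choices[0].message.content)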

How to Reduce Legacy Code

There are two books that I read that try to define legacy code: Working Effectively with Legacy Code [affiliate link] by Michael Feathers and Software Design X-Rays [affiliate link] by Adam Tornhill. Briefly, Feathers says it’s code without tests and Tornhill adds that it’s code we didn’t write. In both cases, there’s an assumption that the code is also low quality.

To deal with the first problem, you should, of course, add tests, and Feathers gives plenty of tips for how to do that. I would add that you should use a tool like diff-cover to make that happen more often. Diff-cover takes your unit test coverage reports and filters them down so that you can see the code changed in the current branch that isn’t covered. Often, the lines you changed sit inside existing functions, so covering your changes also reduces the legacy code you touched.
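A rough sketch of that workflow, assuming pytest with the pytest-cov plugin and the diff-cover CLI are installed, and that the branch will merge into origin/main:

    import subprocess

    # Run the test suite and write a Cobertura-style coverage report (coverage.xml).
    subprocess.run(["pytest", "--cov=.", "--cov-report=xml"], check=True)

    # Report only the lines changed on this branch that the tests don't cover.
    subprocess.run(
        ["diff-cover", "coverage.xml", "--compare-branch", "origin/main"],
        check=True,
    )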

Counting changed code as yours makes sense because once you have changed code, you really can’t say that you didn’t write it. Tornhill’s book describes software he sells that identifies legacy code using the repository history. It highlights code that no current engineer has touched. Your edit would remove that code from the list.

But, the author of the PR isn’t the only person who should be considered to know this code—the code reviewer should be counted too. If they have never seen this part of the codebase before, they should learn it enough to review the code. This is why I don’t like the idea of replacing human reviewers with AI. It has a role in reviews, but you can’t increase the number of developers that know a portion of a codebase by letting AI read it for you.

Every part of the codebase that is edited or reviewed by a new person reduces its “legacyness”. So does every test. If your gut tells you an area of the code is legacy, check the repo history and coverage. If you have a budget for engineering-led initiatives, assign a dev who hasn’t made edits to that area to write some tests.

LLMs Talk Too Much

The biggest tell that you are chatting with an AI and not a human is that they talk too much.

Yesterday, to start off a session with ChatGPT, I asked a yes/no question and it responded with a “Yes” and 325 more words. No person would do this.

Another version of this same behavior is when you ask an LLM a vague question—it answers it! No questions. Just an answer. Again, people don’t usually behave this way.

It’s odd because in the online forums that were used to train these models, vague questions are often called out. It’s nice that the LLM isn’t a jerk, but asking a clarifying question is basic “intelligent” behavior and they haven’t mastered it yet.

LLMs Can’t Turn You Into a Writer

In Art & Fear, the authors write:

To all viewers but yourself, what matters is the product, the finished artwork. To you and you alone, what matters is the process, the experience of shaping that artwork. The viewers’ concerns are not your concerns. Their job is whatever it is, to be moved by art, to be entertained by it, whatever.

Your job is to learn to work on your work.

If you use LLMs to write for you, you will end up with writing. You’ll pass your class, get by at work, or get some likes on a post. But you will not become a better writer.

It might even be hard to become a better judge of writing. If you do, it won’t be by reading what LLMs bleat out. It won’t be by reading their summaries of great work. “Your job is to learn to work on your work” and to do that you need to do your own writing and limit your reading to those that do the same.

Humans Have Been Upgraded by LLMs

If, twenty years ago, you took the most sophisticated AI thinkers in the world and set up a Turing test with a modern AI chatbot, they would all think they were chatting with a human. Today, that same chatbot would have a hard time fooling most people who have played around with these tools.

In a sense, humans have been upgraded by the existence of LLMs. If we had no idea that they existed, we would be fooled by them. But, now that we’ve seen them, we’ve collectively learned how they come up short.

Gaming ChatBot Recommendations

I just had a conversation with ChatGPT that ended with it making product recommendations. I was asking about documentation, so it suggested Confluence and Notion.

Those are good options and do match what I was asking about. But, if ChatGPT is going to make product recommendations, then companies will try to game this like they do for search engines.

Before Google, search engines ranked pages on the count of keywords in the document. That was easy to game by just stuffing the keywords. Google solved this by ranking the search results by backlinks. They reasoned that pages with the most incoming links must be better because a backlink was like a vote of confidence. Unfortunately, that was also easy to game. The last 20+ years has been a struggle between Google trying to rank results correctly and SEO experts trying to rank higher than they deserve. There is an enormous amount of money fueling both sides.

We are just starting with chatbots. I don’t know why it specifically chose Confluence and Notion, but since it’s a probability engine, more text in the training set would be a factor. It understands sentiment, so it can discern good reviews from bad reviews and knows that recommendations should offer good options.

But does it understand authority, bias, expertise, etc.? If I stuff the internet with thousands of good reviews of my product (even if they would never be read by a human), would ChatGPT then start recommending me? The ChatBot equivalents of SEO experts will be trying to figure this out.

Since ChatGPT is currently stuck in January 2022 (about a year ago), you might not be able to influence its suggestions for a year or so. The learning cycle is currently very long. But, that also means that it probably can’t recover from being gamed quickly either. Hopefully, models can be developed that resist gaming.