Category Archives: AI

Observations on the MIT Study on GitHub Copilot

I just saw this study on GitHub Copilot from February. Here is the abstract:

Generative AI tools hold promise to increase human productivity. This paper presents results from a controlled experiment with GitHub Copilot, an AI pair programmer. Recruited software developers were asked to implement an HTTP server in JavaScript as quickly as possible. The treatment group, with access to the AI pair programmer, completed the task 55.8% faster than the control group. Observed heterogenous effects show promise for AI pair programmers to help people transition into software development careers.

The researchers report benefits to less experienced developers, which is at odds with this other study I wrote about and with my own intuition. However, all of the participants were experienced JavaScript developers, not people literally learning to program, which is where I think the more detrimental effect would show up.

Approaching Infinity

Moore’s Law predicts that the number of transistors on a chip doubles roughly every two years (it’s often quoted as eighteen months). But, it has always been understood to be a statement about system capability as well. Speed, memory—we’re even getting advancements in power consumption now with Apple Silicon.

The doubling results in an exponential curve, but at the start, doubling a tiny number doesn’t get you much. My first computer had 4K of memory, but it was already an old model when I got it. By the next year, I had a Commodore 64 with 64K, then a Commodore 128 (with 128K) a few years later. My C64 ran at 1 MHz in 1984. In 1992, my first work computer was a 16 MHz 386 with 1 MB of memory. Nice growth, but from a very low base, so still very underpowered in absolute terms.

But, just like in personal finance, compounding eventually has enormous impact. It’s not just speed and power. We’re feeling it across all industries. Ubiquitous software copilots, the Vision Pro, new vaccines, technology-enabled sports analytics, pervasive remote work—all enabled by the last few doublings.

A doubling means that you get the equivalent impact of the entire industry’s progress, all the way back to the UNIVAC, compressed into a single doubling period (since 1 + 2 + 4 + … + 2^(n−1) = 2^n − 1, each doubling slightly exceeds everything that came before it, combined). And the next period doubles that.

I know this is nothing new. Ray Kurzweil described this in The Singularity Is Near in 2005. I’m more pointing out that here we are, and it seems like an inflection is happening where we’re now doubling big numbers.

In my 30+ year career as a developer, I experienced a steady stream of big industry shifts. In the ’90s, it was the web; then, in the 2000s, it was Web 2.0 and the advent of smartphones. The 2010s were driven by XaaS (platform, infrastructure, etc.) technologies. I could learn these as they happened. There wasn’t instantaneous adoption—you could keep up.

Now these waves are coming very fast, and I wonder if this is what it feels like when you start to approach infinity.

LUIs Give You User Intent

Language User Interfaces (LUIs) take natural language prompts, which an LLM can translate into commands for your command-based application. Even if the LUI makes mistakes, the prompts themselves are a treasure trove of user intent.

Right now, we broadly have two ways to get user data: analytics and user research. Analytics are easy to scale and are useful, but they cannot give you user intent. They can tell you what the user did, but not why. User research is aimed squarely at uncovering causal and intent data, but it’s hard to scale.

A LUI gives you the best of both worlds because it asks the user to express what they want in their own words and can easily be deployed to all users.

As an example, consider a dashboard configuration GUI for a B2B SaaS app. Almost every enterprise application has something like this—in this case, let’s consider Salesforce.

Using a GUI, a user might tap on “New Dashboard” and then “Add bar chart” and then use some filters to set it up. And then, they “Add pie chart” and set that up. They put in another chart, then quickly delete it. They add, delete, reorder, and configure for an hour until they seem to be satisfied. In an analytics dataset, you’d have rows for all of these actions. You would have no idea what the user was trying to do.
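In that dataset, the rows might look something like this (the event schema is invented for illustration):

    09:02:11  u_123  dashboard.create
    09:02:45  u_123  chart.add        type=bar
    09:04:10  u_123  chart.configure  filters=region,quarter
    09:05:02  u_123  chart.add        type=pie
    09:06:13  u_123  chart.delete
    ...an hour of rows like this...

Every action, no why.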

In a LUI, the user might start with “I have a 1:1 with my manager on Thursday. What are some of the things I excel at that would be good to highlight?” Then, “Ok, make a dashboard showing my demo-to-close ratio and my pipeline velocity.” And then, “Add in standard personal sales data that a sales manager would expect.”

This is something you could find out in user research, but it’s quite expensive to get that data. Some kind of LUI, even if it wasn’t great, would start to help you collect that data at scale.
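A minimal sketch of what that collection might look like, in JavaScript (logIntent and runLuiRequest are hypothetical stand-ins for your analytics pipeline and your LLM call):

    // Sketch: store every LUI prompt as first-class intent data.
    // logIntent() and runLuiRequest() are hypothetical stand-ins.
    async function handlePrompt(userId, promptText) {
      await logIntent({
        userId,
        prompt: promptText, // the user's own words -- the intent
        timestamp: Date.now(),
      });
      return runLuiRequest(promptText); // pass through to the LLM as usual
    }

The prompt log becomes a dataset you can mine later, right alongside your regular analytics events.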

You might find a new Job to be Done (1:1 meetings with sales managers) that you could directly support.

ChatGPT Can Add a LUI to a Terminal Command-Based UI

Before GUIs took off, there were more command-driven applications. The program would respond with text answers like a specialized chat. As I said yesterday, this is not a Language User Interface (LUI), which would use natural language, not specialized commands.

One benefit that command-driven systems have over modern GUI systems is that ChatGPT could probably drive them. Large language models seem to have no problem learning programming languages, even niche ones. You can even teach them new ones in your prompt.

To take advantage of this, we should be adding well-specified mini-languages to our applications so that our users can get help from ChatGPT. Here’s a simple example based on a fictional airline flights query language I made up:
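The spec is only a few lines, something like this (the FIND command and its syntax are entirely invented for illustration):

    Spec given to ChatGPT:

      FIND <from> <to> <date> [BEFORE|AFTER <time>]
        <from>, <to>   three-letter airport codes
        <date>         like 13JUN
        <time>         like 5P or 11A

    Me:      I need a flight from Chicago O'Hare to LaGuardia
             on June 13th, leaving around 5pm.
    ChatGPT: FIND ORD LGA 13JUN AFTER 5P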

In your application, you would offer a command window that has already primed ChatGPT with the specification of the language and many examples. I barely had to do anything to get it to learn this simple command. I went on in the chat to ask for more complicated queries, even a series of queries to find out about connecting flights, and it had no problems.

In your application, you only need to parse the terminal-like commands, which is a lot easier than implementing a natural language parser, even for a constrained topic like airline booking.
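For the made-up FIND command above, a sketch of that parser is just a regular expression (a real one would validate airport codes, dates, and times):

    // Sketch of a parser for the invented FIND command above.
    // A real one would validate airport codes, dates, and times.
    function parseFind(command) {
      const m = command.trim().match(
        /^FIND (\w{3}) (\w{3}) (\d{1,2}[A-Z]{3})(?: (BEFORE|AFTER) (\d{1,2}[AP]))?$/
      );
      if (!m) return null;
      const [, from, to, date, direction, time] = m;
      return { from, to, date, direction: direction || null, time: time || null };
    }

    parseFind("FIND ORD LGA 13JUN AFTER 5P");
    // => { from: "ORD", to: "LGA", date: "13JUN", direction: "AFTER", time: "5P" }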

I’m sure ChatGPT could build the command parser for you too if you wanted.

LUI LUI

I go by Lou, but my entire family calls me Louie, so I smiled when I found out that there is such a thing as a Language User Interface, which uses natural language to drive an application, and that it’s called a LUI.

In a LUI, you use natural language. So this is not the same as a keyword search or a terminal-style UI that uses terse commands, like the SABRE airline booking system.

In this video, it outputs responses on a printer, but the display terminal version was not that different. I worked on software that interfaced with SABRE in 1992, and this 1960s version is very recognizable to me.

But, this is not a LUI. A LUI does not make you remember a list of accepted commands and their parameters. You give it requests in just the way you would a person, with regular language.

In SABRE, a command might look like this:

    113JUNORDLGA5P
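That’s (roughly) five fields run together:

    1      availability display
    13JUN  the date (June 13)
    ORD    from Chicago O'Hare
    LGA    to LaGuardia
    5P     around 5:00 p.m.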

But, in a SABRE LUI, you’d say “What flights are leaving Chicago O’Hare for LaGuardia at 5pm today?” That may be more learnable, but a trained airline representative would be a lot faster with the arcane commands.

But with a more advanced version that understood “Rebook Lou Franco from his flight from here to New Orleans to NYC instead”, which requires many underlying queries and commands (and an understanding of context), the LUI would also be a lot faster.

This would have seemed far-fetched, but with ChatGPT and other LLM systems, it feels very much within reach today.

Morning Pages Make Me Feel Like ChatGPT

In the first episode of my podcast, I said that I do morning pages to train myself to write on demand. I followed that up in Episode 3, where I explained that I use the momentum from morning pages to write a first draft of something.

While doing my morning pages last week, I thought about how writing them is kind of like how ChatGPT generates text. It’s just statistically picking the next word based on all of the words so far: the prompt plus what it has already generated.

I am also doing something like that in my morning pages. I am writing, writing, writing, and I use the words so far to guide my next ones.

My mind strays as I write, and a phrase might trigger a new thread, which I follow for a bit, and then another and another. ChatGPT’s results are a lot more coherent than my morning pages. It has an uncanny ability to stay on topic because it considers all of the text, and I don’t.

First drafts are different. When I switch to writing a first draft, I do consider the entire text. I’m not as fast, because I am constantly looking at what I have so far. I also start with a prompt in the form of a simple message that I hope to convey, which I use as the working title.

I know I could get a first draft faster from ChatGPT, but it would not be as good (I think), or at least not specific to me. More importantly, I would not have improved as a writer.

[NOTE: While writing a draft of this post, I thought of a way to make my morning pages more directed and made a podcast about it.]

Large Language Models are a Sustaining Innovation

In The Innovator’s Dilemma, Clay Christensen made a distinction between “disruptive” and “sustaining” innovations. Disruptive innovations favor new entrants and new categories of products because incumbents are harmed by adopting the innovation, and so they resist it. Sustaining innovations favor incumbents because they improve incumbents’ products and margins, and incumbents have the resources and incentives to adopt them.

With this in mind, I think that Large Language Models will be readily adopted by incumbents, who will be successful with them. To be clear, I’m not talking about OpenAI vs. Google, but about their customers: mostly B2B SaaS companies that will use LLMs to enhance their own software, not provide LLMs to others.

There are two advantages that incumbents have that will be hard to overcome.

The first is that LLMs readily embed into established software. GitHub Copilot is the model. The “copilot” experience is being extended to Microsoft’s Office suite, and I think it fits well in almost any kind of software.

The second advantage is access to proprietary data. Incumbents already have customers and their data and can generate better content by using that data in their training sets. A new entrant would be stuck with just public sources, which is “ok” for some categories, but in the long tail of B2B SaaS, would be anemic at best.

This is playing out for VSCode right now. Microsoft controls proprietary data (the private code in GitHub) and has the best content-creating software. Its first iteration of enhancing that with LLMs is just a better VSCode. I use VSCode every day, and adopting GitHub Copilot was seamless. It took seconds to install it, see the benefit, and hand over my credit card.

The case for a disruptive innovation is easier to make with things like Webflow, which obsolete the editor and establish a new proprietary data source (their customers’ projects). This might happen to VSCode, but not to Microsoft, since it has its own no-code solutions (the Power Platform). So even this might not be disruptive.

Magnets for Innovation Needles in a Haystack

Each department in a business collects and spreads performance information throughout the organization. They know their own numbers well and are incentivized to improve them. For successful businesses, all of that data is telling them to keep doing what they are doing.

But, new kinds of information aren’t as easy to collect and spread, which is a problem because new information drives innovation.

The beginning of a new market regime, where your products don’t match the market, starts out insignificantly small. You might have a customer satisfaction score of 97%, but the innovation might be living in that other 3%. Or that 3% might be pointing to an uninteresting niche that you should ignore. It’s hard to know whether outliers are noise or a new signal.

If this problem is like finding a needle in a haystack, then we need a metal detector or, even better, a magnet.

In Noticing Opportunities Using an AI Agent, I wrote that AI’s ability to consume and synthesize a lot of disparate information can be used to bring opportunities to you. I gave two examples from job searches where I did this, once consciously and once unconsciously.

For this to work, the organization needs to pour a lot of its internal data into a training set. They should start with sales conversation transcripts and customer support tickets. If there are any user communities, all of their posts should be put in as well. If their competition has public forums, bring those in. Each interaction in the dataset may be an anecdote, but if you have enough of them, at some point you need to accept that there are clusters worth exploring.
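As a sketch of how you might start looking for those clusters, here is the shape of it in JavaScript, using OpenAI’s embeddings endpoint (the model name is current as of this writing, and the 0.9 grouping threshold is a placeholder, not a recommendation):

    // Sketch: embed each interaction, then group the ones that sit close
    // together in embedding space.
    async function embed(texts) {
      const res = await fetch("https://api.openai.com/v1/embeddings", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        },
        body: JSON.stringify({ model: "text-embedding-ada-002", input: texts }),
      });
      const json = await res.json();
      return json.data.map((d) => d.embedding);
    }

    function cosine(a, b) {
      let dot = 0, na = 0, nb = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
      }
      return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Greedy grouping: each interaction joins the first group whose
    // representative is similar enough; otherwise it starts a new group.
    async function groupInteractions(interactions) {
      const vectors = await embed(interactions);
      const groups = [];
      vectors.forEach((v, i) => {
        const g = groups.find((g) => cosine(g.vector, v) > 0.9);
        if (g) g.members.push(interactions[i]);
        else groups.push({ vector: v, members: [interactions[i]] });
      });
      // Big groups are the haystack; small, coherent ones may be needles.
      return groups.sort((a, b) => a.members.length - b.members.length);
    }

A real system would use a proper clustering library, but even naive grouping like this surfaces the small, coherent clusters that deserve a human look.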

Humans still need to vet what the AI finds, so I’d try to use this to generate experiments that can validate (or invalidate) a hypothesis.

One thing you’ll want to know is how big the potential new market is. Since these might not be your customers yet, the experiment might need to be a content strategy rather than a product one.

It’s fitting, then, that the concept of a Lead Magnet already exists. The normal use is to get a lead to start a sales process with. For an innovation, use lead magnets to research and validate a new market.

More on Copilot for Learners

Yesterday, I wrote that you shouldn’t use GitHub Copilot if you are a new programmer:

But, if you are still learning to program, then I am worried that Copilot would inhibit that. The main way it does this is by not letting you practice remembering.

I just saw a paper called “GitHub Copilot AI pair programmer: Asset or Liability?” that studied this issue and concludes:

Based on our findings, if Copilot is used by expert developers in software projects, it can become an asset since its suggestions could be comparable to humans’ contributions in terms of quality. However, Copilot can become a liability if it is used by novice developers who may fail to filter its buggy or non-optimal solutions due to a lack of expertise.

So, check that paper out if you want to see actual data supporting this belief. One of the core findings is that Copilot generates wrong code that is easy for experts to fix, which was my intuition based on using it.

Use GitHub Copilot When You Aren’t Learning

I’m trying to wrap my head around whether I think junior developers should be using GitHub Copilot or something similar. I am thoroughly convinced that experts who are mostly concerned with their output and productivity, and not with learning, should be using it. The more your coding is a performance of your abilities, the more I think you should use it.

But, if you are still learning to program, then I am worried that Copilot would inhibit that. The main way it does this is by not letting you practice remembering.

According to Make It Stick, the key to learning something is to practice remembering it. One way to do this is to try to solve problems without help, and learning to code by writing code is a great way to do that. But, if Copilot is constantly helping you, then even coding novel programs becomes the kind of puppeting that tutorials do.

Now, it may turn out that composing code completely from scratch is a dying skill and won’t be the way we program in the (near) future. I use Copilot exclusively, and I certainly feel like I’m guiding/collaborating with the AI, and coding feels very different to me now. But, I also think that my skill at seeing code is helping me do that. That skill was developed by writing and reading lots of code.

So, right now, in April 2023, if you are learning to code, I’d limit usage of Copilot to the times when you are not learning. If you are building something that is well within your ability, then I would use it wholeheartedly. There are Copilot skills to develop on top of your coding skill.