I’ve read two books that try to define legacy code: Working Effectively with Legacy Code [affiliate link] by Michael Feathers and Software Design X-Rays [affiliate link] by Adam Tornhill. Briefly, Feathers says it’s code without tests, and Tornhill adds that it’s code we didn’t write. In both cases, there’s an assumption that the code is also low quality.
To deal with the first problem, you should, of course, add tests, and Feathers gives plenty of tips for how to do that. I would add that a tool like diff-cover helps make that happen more often. Diff-cover takes your unit test code coverage reports and filters them down to the lines changed in the current branch, so you can see which of your changes aren’t covered. Often, those changes sit inside existing functions, so covering your own lines also covers some of the legacy code you touched.
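To make the filtering idea concrete, here is a minimal sketch in Python. It is not diff-cover’s implementation, just the set arithmetic behind it. The function names and the input shape (a dict of file path to line numbers) are my own assumptions for illustration; a real tool also has to parse the coverage report and the git diff, and to skip lines that aren’t executable.

```python
def uncovered_changed_lines(changed, covered):
    """Return, per file, the changed lines that no test executed.

    changed: dict of file path -> set of line numbers edited on this branch
    covered: dict of file path -> set of line numbers executed by the tests
    (Both shapes are hypothetical; real tools derive them from a git diff
    and a coverage report.)
    """
    report = {}
    for path, lines in changed.items():
        # A file missing from the coverage data counts as fully uncovered.
        missing = sorted(lines - covered.get(path, set()))
        if missing:
            report[path] = missing
    return report


def diff_coverage_percent(changed, covered):
    """Percentage of changed lines that the tests executed."""
    total = sum(len(lines) for lines in changed.values())
    if total == 0:
        return 100.0  # nothing changed, so nothing is uncovered
    missing = sum(
        len(lines) for lines in uncovered_changed_lines(changed, covered).values()
    )
    return 100.0 * (total - missing) / total


if __name__ == "__main__":
    changed = {"billing.py": {10, 11, 12, 40}, "util.py": {5}}
    covered = {"billing.py": {10, 11, 40, 41}}
    print(uncovered_changed_lines(changed, covered))
    print(diff_coverage_percent(changed, covered))
```

The useful part is that the report only ever mentions lines you changed, so the list stays short enough to act on in a single PR.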
Counting the code you touched as no longer legacy makes sense because once you have changed code, you really can’t say that you didn’t write it. Tornhill’s book describes software he sells that identifies legacy code using the repository history. It highlights code that no current engineer has touched. Your edit would remove that code from the list.
But the author of the PR isn’t the only person who should be considered to know this code. The code reviewer should be counted too. If they have never seen this part of the codebase before, they should learn it well enough to review the change. This is why I don’t like the idea of replacing human reviewers with AI. AI has a role in reviews, but you can’t increase the number of developers who know a portion of a codebase by letting AI read it for you.
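Here is a rough sketch of that bookkeeping in Python, counting both authors and reviewers. This is my own simplification, not how Tornhill’s software works. The `touches` input (pairs of person and file path) and the function names are hypothetical; you would have to build that list yourself from your commit history and your code review tool.

```python
def known_by(touches, current_engineers):
    """Map each file to the current engineers who have authored or reviewed it.

    touches: iterable of (person, path) pairs, one per commit authored or
             PR reviewed (a hypothetical input shape for this sketch)
    current_engineers: set of people still on the team
    """
    knowers = {}
    for person, path in touches:
        people = knowers.setdefault(path, set())
        # People who have left no longer count toward knowing the file.
        if person in current_engineers:
            people.add(person)
    return knowers


def legacy_candidates(touches, current_engineers):
    """Files that nobody on the current team has edited or reviewed."""
    knowers = known_by(touches, current_engineers)
    return sorted(path for path, people in knowers.items() if not people)


if __name__ == "__main__":
    touches = [
        ("ana", "a.py"),  # ana authored a change to a.py
        ("bob", "a.py"),  # bob reviewed it
        ("zed", "b.py"),  # zed has left the team
        ("zed", "c.py"),
        ("ana", "c.py"),
    ]
    print(legacy_candidates(touches, {"ana", "bob"}))
```

A review by an AI tool adds no pair to `touches`, which is the point: the file is exactly as unknown to the team after the review as it was before.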
Every part of the codebase that is edited or reviewed by a new person becomes a little less “legacy”. So does every part that gains a test. If your gut tells you an area of the code is legacy, check the repo history and coverage. If you have a budget for engineering-led initiatives, assign a dev who hasn’t made edits to that area to write some tests.