Use AI for your tooling rather than your code
AI is undeniably all the rage right now. Much has been written and said about using it to improve both your coding process and your codebase. Many are measuring how it helps developers ramp up quickly in unfamiliar codebases. I decided to take AI for a spin in a different direction.
Tech Debt can be paid in various ways
Like any code that has been in production for longer than a week, our codebase carries a certain level of technical debt. Some of it comes from changing paradigms and architectural frameworks; other parts come from the boundaries between large groups of people and the communication challenges that come with them.
A common approach seen throughout the industry is to apply an LLM agent, well equipped with contextual knowledge, directly to the codebase and have it rewrite the code using the combined power of well-defined guidelines and a rule-abiding machine. Instead, I decided to explore another approach.
Don’t refactor, build refactorers
I began exploring the use of LLMs and well-known foundation tools to craft specialized, single-purpose, debt-paying tools. Debt returns: sometimes it can be fended off with linting tools and code reviews, but that all eventually fails and some slips back in.
If one manages to identify a single behavior to be corrected, a tool can be crafted to fix it again and again, making the return on investment of using AI even better.
A good sample case
For our sample case I went for a very simple kind of debt: feature flags are sometimes referenced as literal strings instead of centralized constants. As a synthetic example:
```python
do_foo = is_feature_flag_on("ff_foo", bar=baz)
```
This call would be used to know whether the ff_foo feature flag is set (that is not our actual convention either; I made it up for clarity). The quick check is not a big deal if used in only one place, but it quickly becomes prone to typos and redundant feature flags when spread across a large code base.
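To make the target state concrete, this is roughly what the fix should leave behind; the constant name and the constants.py location follow the convention described further down and are illustrative:

```python
# constants.py -- central registry of feature flag names (illustrative)
FF_FOO = "ff_foo"

# The call site after the rewrite: the literal is gone, the constant is imported.
from constants import FF_FOO

do_foo = is_feature_flag_on(FF_FOO, bar=baz)
```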
A simple case of this could be fixed with ast-grep and some creativity, but as the cases grow and the variants diverge from each other, one becomes a bit more wary of magic scripting and wants a deterministic fix.
A good initial definition would be:

For calls to the function is_feature_flag_on, find all calls that have a string literal as their first parameter and replace said literal with a variable:
* All in capitals
* With a valid Python name
* Replace: initial _ with FF_, . with _dot_, "," with _comma_ ... [snipped for brevity, I had quite a few rules]
* Add the variable definition to a constants.py file
* Add an import for the variable in the file where the replacement was made.
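As a sketch of what the naming rules above amount to in code (abbreviated; the real list of substitutions was longer):

```python
# Sketch of the naming rules above; the substitution list is abbreviated.
import keyword

def constant_name(flag: str) -> str:
    """Turn a feature flag literal like "ff_foo" into a constant name like FF_FOO."""
    name = flag.upper().replace(".", "_DOT_").replace(",", "_COMMA_")
    if name.startswith("_"):
        name = "FF" + name  # replace an initial _ with FF_
    if not name.isidentifier() or keyword.iskeyword(name):
        raise ValueError(f"cannot derive a constant name for {flag!r}")
    return name
```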
This produced a base tool, quite well structured, in Python, using regular expressions… it was a disaster.
Augmenting AI with Natural Intelligence
We now know our basic definition is right, and this is where the programmer's expertise needs to come in. How can we improve it? By making choices based on something other than statistics.
I followed some simple rules to instruct the LLM better. If:
- You are interpreting code: Use AST to interpret the code
- You are rewriting code, where an AST round-trip would lose a lot of information: Use the AST information to replace and insert at positions in the file, correcting offsets after each replacement (see the sketch below).
- You are modifying imports: Ask it to take into account where the imports end, but only at the top of the file (Python supports importing anywhere).
- You have a code style: Ask it to run the style tool (we use ruff) after generating code.
- Your code base is big: Ask it to make the tool transactional so changes can be undone in case of failure.
- You want to reuse this: Make as much as possible configurable.
Expanding on that last point, after instructing all of the above, I asked to be able to specify which parameter, be it positional or named, I wanted checked, what the function name was (more functions used feature flag names), and where to put the resulting centralized constants.
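To make that concrete, here is a minimal sketch of the position-based, configurable rewrite described above, assuming the constant_name helper from the earlier sketch; the Config fields are illustrative, and the real tool (not published) also handles the constants file, the imports, ruff, and transactional rollback on top of this:

```python
# Minimal sketch: rewrite one file using AST positions rather than AST unparsing.
import ast
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Config:
    func_name: str = "is_feature_flag_on"  # function whose calls are inspected
    arg_index: int = 0                     # positional argument to replace...
    arg_keyword: str | None = None         # ...or a keyword argument instead

def literal_argument(call: ast.Call, cfg: Config):
    """Return the selected argument if it is a string literal, else None."""
    if cfg.arg_keyword is not None:
        candidates = [kw.value for kw in call.keywords if kw.arg == cfg.arg_keyword]
    else:
        candidates = call.args[cfg.arg_index:cfg.arg_index + 1]
    for node in candidates:
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            return node
    return None

def rewrite_file(path: str, cfg: Config) -> dict[str, str]:
    """Replace selected string literals with constant names, editing by position."""
    source = Path(path).read_text(encoding="utf-8")
    lines = source.splitlines(keepends=True)
    edits = []                         # (lineno, col, end_col, constant name)
    replacements: dict[str, str] = {}  # constant name -> original literal
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)  # bare-name calls only, for brevity
                and node.func.id == cfg.func_name):
            lit = literal_argument(node, cfg)
            if lit is not None and lit.lineno == lit.end_lineno:  # single-line literals only
                const = constant_name(lit.value)  # helper from the earlier sketch
                replacements[const] = lit.value
                edits.append((lit.lineno, lit.col_offset, lit.end_col_offset, const))
    # Apply edits from the end of the file backwards so earlier offsets stay valid.
    for lineno, col, end_col, const in sorted(edits, reverse=True):
        line = lines[lineno - 1]
        lines[lineno - 1] = line[:col] + const + line[end_col:]
    new_source = "".join(lines)
    if new_source != source:
        Path(path).write_text(new_source, encoding="utf-8")
    return replacements  # feed these into constants.py and the import rewriting
```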
The whole thing took about half a day of experimentation; I used our actual code base and git diff for sanity checks. By the end, I had a tool that could be used to pay this piece of tech debt again and again, idempotently.
Give back to the people
Running LLMs is, like running any other piece of software, energy intensive to some degree (processors give off heat, no matter what you do with them), so by sharing the result you can save other people some time in the future, and some heat too.
I had a bit more of a conversation with the LLM and instructed it to help me port the script to a more generic and reusable form, using Astral’s amazing Rust libraries for Python. The end result is constantify, which lives on my personal GitHub because it was built from scratch, in my free time, as a more general form of the idea. This version covers, with some degree of configuration, the same case as the original tool (not published here because it contains internal information about our codebase) and more.
This way we can pay that and other tech debts with code generated once.
Disclaimer
The tool used to generate the code was Sourcegraph’s Cody. The originally generated Rust code was not all that functional; I had to modify it to get it to compile. The LLM offered some help, but I dismissed most of it in favor of my common sense, as the options it gave were not ideal and led to functional yet repetitive and “ugly” code. The choice of Rust was made only to leverage Astral’s existing crates, which are very efficient and produce results consistent with ruff, a tool I already use; there was no other technical reason considered.