Using AI to debug
It’s 2024, and we have a lot of AI tools available.
I’ve spent some time trying a variety of tools: Gemma 2 (a smaller open model from Google, built from the same research as Gemini), TabbyML, and Microsoft Copilot.
Can we debug with them? My short answer is: not meaningfully.
Most LLM tools are confined to text chat interfaces. I haven’t found any that plug into the debugger or other analysis tools.
I tried pasting in code snippets and asking, “Where is the bug?” The C++ code was similar to what I described in my Chaos of Memory Corruption article. Most tools failed to find the issue; only Microsoft Copilot spotted it.
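To give a flavour of the kind of snippet involved, here is a minimal sketch of one such C++ memory-corruption bug. This is an illustrative example, not the actual code from that article: a reference into a std::vector that dangles after the vector grows.

```cpp
#include <iostream>
#include <vector>

int main() {
    std::vector<int> values = {1, 2, 3};

    // Take a reference to the first element...
    int& first = values.front();

    // ...then grow the vector. If this push_back triggers a
    // reallocation, every existing reference into the vector is
    // invalidated -- including `first`.
    values.push_back(4);

    // Undefined behaviour: `first` may now dangle and read freed memory.
    std::cout << first << '\n';
    return 0;
}
```

No single line here looks wrong in isolation; the defect only emerges from the interaction between the reference’s lifetime and the vector’s reallocation, which may be why chat-based tools have trouble spotting this class of bug.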
I tried TabbyML, which has a VS Code extension, but even attaching files to the chat and asking questions didn’t work well. Normally I might use a debugger to figure out the behaviour of some code, so instead I tried asking the chat what the code was doing. The answers I got were often inaccurate or completely wrong.
Debugging often involves seeking out many different data sources: logs, crash reports, code in compiled libraries, code in other repositories, live testing in a debugger, reports from static analysis tools, and compilation/linking warnings and errors. Current AI tools can’t go and fetch these sources themselves, and they seem to struggle to accurately comprehend even individual code files.
Lastly, while there may be common errors and recognizable patterns of bugs, the really challenging ones are unique and new. So far, LLM-based AI tools are trained on existing datasets, so they have no clue what to do in a new, unique situation.
Without getting into the limitations of LLMs in general, my conclusion is that we still need debugging skills to figure out software errors and malfunctions.