
Rethinking CLI interfaces for AI

We need to augment our command line tools and design our APIs so they can be better used by LLM agents. As they stand, the designs are inadequate for LLMs – especially if you’re constrained by the tiny context windows available with local models.

Agent APIs

Like many developers, I’ve been dipping my toes into LLM agents. I’ve done my fair share of vibe coding, but I’ve also been playing around with using LLMs to automate reverse engineering tasks, mostly using mrexodia’s IDA Pro MCP, including extending it.

Developing an MCP interface is an interesting process. You need to walk the line between providing too much information, which fills the context window, and providing too little, which forces extra tool calls. Some of our APIs are better than others, like get_global_variable_at, which takes an address, identifies the type, and returns the best string representation of that value based on that type. However, the function can fail, so we provide a second set of accessor methods (data_read_dword, data_read_word, read_memory_bytes, etc.). These accessor methods are fine, but they ignore type information – so we don’t want the LLM to reach for them first.

To mitigate this problem, we added some guidance into the docstrings:

@jsonrpc
@idaread
def data_read_byte(
    address: Annotated[str, "Address to get 1 byte value from"],
) -> int:
    """
    Read the 1 byte value at the specified address.

    Only use this function if `get_global_variable_at` failed.
    """
    ea = parse_address(address)
    return ida_bytes.get_wide_byte(ea)

This seems to have mostly worked, but these sorts of problems exist across all the APIs: we have the nice convenience function alongside the gnarlier but more complete one, and we want the LLM to use the convenience one first.

I like to work with offline LLMs, which have much smaller context windows, so having better APIs matters a lot.

Command Line Tools

These problems exist for command line tools too. If you watch Claude Code, you’ll see that it often uses head -n100 to limit the results a priori. It also loses track of which directory it’s in, and it will frustratingly flail around trying to run commands in different directories until it finds the right one.

To keep Claude Code in line on my projects, I’ve relied heavily on linters, build scripts, formatters, and git commit hooks. It’s pretty easy to get Claude Code to commit often by including that in your CLAUDE.md, but it often likes to ignore other instructions like “make sure the build doesn’t fail” and “fix any failing tests”. All my projects have a .git/hooks/pre-commit script that enforces project standards, and the hook works really well to keep things in line.
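As a rough sketch of what such a hook can look like – this one assumes a Rust project checked with cargo fmt, clippy, and the test suite, so swap in whatever your stack uses:

#!/bin/sh
# .git/hooks/pre-commit – refuse the commit if formatting, lints, or tests fail.
set -e

echo "pre-commit: checking formatting..."
cargo fmt --all -- --check

echo "pre-commit: running lints..."
cargo clippy --all-targets -- -D warnings

echo "pre-commit: running tests..."
cargo test --quiet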

However, on really difficult changes it doesn’t like to acknowledge that it broke a test. I often see it in this loop:

  • Make a change
  • Build the project: passes
  • Run the tests: fail
  • Attempt to fix the test
  • Fail at fixing the test
  • Say “this test was failing beforehand, going to commit with --no-verify”

Then it commits, bypassing the hooks! It was doing this so often that I now have a git wrapper in my PATH which blocks git commit --no-verify, and its error message contains some prompting to actually fix the errors.

$ git commit --no-verify
------------------------------------------------------------------
❌ ERROR: Commit Rejected.
The use of the '--no-verify' flag is disabled for this repository.
------------------------------------------------------------------

🤖 GUIDANCE FOR THE AI AGENT:
You have attempted to bypass the required pre-commit verification 
steps. All code must pass quality checks (formatting, linting, and
tests) before it can be committed.

DO NOT BYPASS THE CHECKS. YOU MUST FIX THE UNDERLYING ERRORS.

The pre-commit hook is likely failing. Diagnose and fix the issues. 
Search for advice if you get stuck.

After all commands complete successfully, attempt the commit again 
*without* the '--no-verify' flag.
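The wrapper itself is nothing fancy. A minimal sketch, assuming the real binary lives at /usr/bin/git and the wrapper sits earlier in your PATH:

#!/bin/sh
# git – wrapper that rejects `git commit --no-verify` (and its -n shorthand)
# and forwards every other invocation to the real binary.
REAL_GIT=/usr/bin/git

if [ "$1" = "commit" ]; then
    for arg in "$@"; do
        if [ "$arg" = "--no-verify" ] || [ "$arg" = "-n" ]; then
            echo "ERROR: Commit rejected. '--no-verify' is disabled for this repository." >&2
            echo "Fix the underlying formatting/lint/test failures, then commit again without the flag." >&2
            exit 1
        fi
    done
fi

exec "$REAL_GIT" "$@"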

This started a game of whack-a-mole where the LLM would then attempt to change the pre-commit hook itself! I had to fix that by denying Edit(.git/hooks/pre-commit) in my project’s .claude/settings.json. I look forward to its next lazy innovation.
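For reference, the deny rule is just a permissions entry in the project settings – something like the following (written as a shell heredoc here; check the Claude Code permissions docs for the exact schema):

# Sketch: forbid Claude Code from editing the pre-commit hook.
cat > .claude/settings.json <<'EOF'
{
  "permissions": {
    "deny": [
      "Edit(.git/hooks/pre-commit)"
    ]
  }
}
EOF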

Information Architecture for LLMs

The field of user experience has a concept called “Information Architecture”. Information Architecture focuses on how and what information is presented to users to provide the best possible user experience. You rarely notice a good IA, but you notice the lack of one.

[Image: Bulk Rename Utility, notorious for having a terrible user experience. You have all the options you could ever want! Perhaps you'd prefer a second example.]

I think watching agents get confused and lost while using our existing command line utilities is a strong indicator that the information architecture of those utilities is inadequate.

LLMs are trained on our existing CLI tools, so I think we need to augment those tools with context that is useful to the LLM and maybe adapt their output to be better consumed by agents.

Going back to the head example from before: often, the agent will run my build with cargo build | head -n100. I think there are a few problems here:

  • the agent will have to repeat the build to get further output/errors,
  • the agent has no idea how many lines are remaining, and
  • re-running the build is resource intensive.

Perhaps we could replace head with a wrapper which caches the output, converts it into a more structured form, and informs the agent how many lines are remaining.
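A rough sketch of the idea as a small shell script – the name ai-head and the cache location are made up for illustration:

#!/bin/sh
# ai-head – like `head`, but keeps the full output around and tells the agent
# how much was cut off. Usage: some_command 2>&1 | ai-head [limit]
LIMIT="${1:-100}"
CACHE="$(mktemp /tmp/ai-head.XXXXXX)"

# Save the complete output so the agent can page through the rest later
# instead of re-running an expensive command like a build.
cat > "$CACHE"

TOTAL=$(wc -l < "$CACHE" | tr -d ' ')
head -n "$LIMIT" "$CACHE"

if [ "$TOTAL" -gt "$LIMIT" ]; then
    REMAINING=$((TOTAL - LIMIT))
    echo "[ai-head] showed $LIMIT of $TOTAL lines; $REMAINING remaining."
    echo "[ai-head] full output cached at $CACHE"
fi

The agent would then run cargo build 2>&1 | ai-head 100 once and read the cached file for anything past the first hundred lines, instead of rebuilding.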

Similarly, when the agent fails to run a command because it’s in the wrong directory, we could give it a little extra help with a shell hook:

command_not_found_handler() {
    echo "zsh: command not found: '$1'"
    echo "zsh: current directory is $PWD"
    return 127  # Keep standard behavior (127 = command not found)
}

$ sdfsdf
zsh: command not found: 'sdfsdf'
zsh: current directory is /Users/ryan

It could even be a bit fuzzier, checking the recent or parent directories for the command and suggesting:

$ sdfsdf
zsh: command not found: 'sdfsdf'
zsh: current directory is /Users/ryan
zsh: Perhaps you meant to run: cd agent_directory; sdfsdf
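A rough zsh sketch of that fuzzier version – this one only looks in the parent directory and immediate subdirectories for an executable with the same name; checking recently used directories would take a bit more state:

command_not_found_handler() {
    echo "zsh: command not found: '$1'"
    echo "zsh: current directory is $PWD"
    # Look one level up and one level down for an executable with the same
    # name, and suggest the cd needed to reach it.
    local dir
    for dir in ../ */(N); do
        if [ -x "${dir}$1" ]; then
            echo "zsh: Perhaps you meant to run: cd ${dir%/}; ./$1"
            break
        fi
    done
    return 127  # Keep standard behavior (127 = command not found)
}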

Conclusion

Basically every CLI tool can be improved in some way to provide extra context to LLMs. Doing so would reduce tool calls and make better use of context windows.

The agents may also benefit from some training on the tools available in their environment. That will certainly help with the majority of general CLI tools, but there are bespoke tools that could benefit from adapting to LLMs as well.

It seems a bit silly to suggest, but perhaps we need a whole set of LLM-enhanced CLI tools or a custom LLM shell? The user experience (UX) field could even branch into AI experience and provide us with a whole new information architecture.