My exploration of how to use LLMs as Agents continues. In this post, I’m looking at ways to improve the outcomes by using better approaches to planning. For more context, check out the previous posts:
- Minimalist Approach: I kicked off my exploration by making two agents using bare-minimum dependencies. I wanted to learn the concepts before using a do-it-all framework.
- Taking Action: I decided to focus on the `tool_agent` and add some read/write tools for the agent to use. I also discovered the need to send clear and consistent context in prompts.
Take a look at the repository to see the code.
Planning & Reasoning
I have seen a lot of people talking about “planning” and “reasoning” as crucial parts of an LLM agent system, so I started to read and learn more about the different strategies. There is obviously a ton written about LLMs and Agents all over the internet, but I stumbled upon a useful website that collects a lot of the information and includes references to where the ideas originated.
Prompt Engineering Guide, not surprisingly, focuses on many different prompting strategies. It turns out that most planning and reasoning strategies are built on prompting strategies. Here are some of the strategies I progressed through:
- Zero-shot: I think of this strategy as “hoping the LLM was trained well enough to just know the answer”. Hope is not a strategy, but you’d be surprised how many people believe the LLM can just figure everything out on its own.
- Few-shot: This strategy enables in-context learning by adding some examples to your prompt to help guide the model to better outcomes. My initial `tool_agent` used this approach before I even knew what it was called.
- ReAct (Reasoning + Acting): This strategy combines reasoning and acting with LLMs in the prompt. ReAct prompts the LLM to generate both reasoning and actions for a task. Asking the LLM to “think about the step” is enough to create significant improvements in the outcomes.
Funny thing: I thought I wanted to try something called Chain of Thought (CoT), which is another prompting strategy for getting the LLM to think through the steps. CoT ended up being a pain to implement in a simple way. I ended up wanting to merge the “thought” (reasoning) and “action” parts into the same response, which turns out to basically be ReAct.
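To make that concrete, here is a minimal sketch of what that merged format might look like. The prompt wording, the field names (`thought`, `tool`, `args`), and the `get_zipcode` tool are illustrative assumptions, not necessarily what `tool_agent` actually uses:

```python
import json

# A ReAct-style instruction asking the model to reason and act in one
# JSON response. The field names here are illustrative assumptions.
SYSTEM_PROMPT = """You have access to a set of tools.
For every turn, respond with a single JSON object:
{"thought": "<your reasoning about the next step>",
 "tool": "<tool name, or 'final_answer'>",
 "args": {<arguments for the tool>}}"""

# An example of what the model might send back for a weather question.
response_text = """{"thought": "I need the zipcode for Beverly Hills before I can get the weather.",
 "tool": "get_zipcode",
 "args": {"city": "Beverly Hills"}}"""

action = json.loads(response_text)
print(action["thought"])               # the reasoning step
print(action["tool"], action["args"])  # the action to execute
```

The nice property of this shape is that every response carries its own reasoning, which is where the “Thought:” lines in the flows below come from.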
Example Flows
I honestly don’t know if the ReAct strategy is yielding better outcomes than the Few-shot approach I was using. I guess I need to set up benchmarks next? The flows are promising though, and the ReAct approach didn’t add much to the size of the code.
You: What's the weather in Beverly Hills?
Thought: The user wants to know the weather in Beverly Hills. First, I need to get the zipcode for Beverly Hills, then I can use that to get the weather.
Thought: Now that I have the zipcode for Beverly Hills, I can get the weather.
Thought: I have the weather information. Now I need to present it to the user.
Agent: The weather in Beverly Hills is 75 F and Sunny.
Here’s a longer one that uses the appointment scheduling tools:
You: I want to book a haircut
Thought: The user wants to book a haircut. I need to find the appropriate specialty to book an appointment for. I should first check the available specialties.
Thought: The available specialties are hair, dentist, and vision. Since the user wants a haircut, the relevant specialty is 'hair'. Now I need to find available appointments for 'hair'.
Thought: I have found the available appointments for a haircut. Now I need to present them to the user and ask them to choose one.
Agent: Here are the available appointments for a haircut:
* ID: 10, Date: 2025-03-09, Time: 10:30 AM
* ID: 11, Date: 2025-03-10, Time: 2:00 PM
* ID: 12, Date: 2025-03-16, Time: 11:00 AM
* ID: 13, Date: 2025-03-16, Time: 3:00 PM
Please tell me the ID of the appointment you would like to book.
You: 12
Thought: The user has selected appointment ID 12. Now I need to book this appointment.
Thought: The appointment has been successfully booked. I need to inform the user.
Agent: Great! Your appointment has been booked successfully for March 16, 2025, at 11:00 AM. The appointment ID is 12.
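Behind each “Thought:” line in these flows, the agent is running a simple loop: ask the model for a thought plus an action, execute the requested tool, feed the result back as an observation, and repeat until the model produces a final answer. Here is a bare-bones sketch of that loop; the `llm` callable, the `tools` mapping, and the `final_answer` convention are stand-ins, not the actual `tool_agent` API:

```python
import json

def run_agent(llm, tools, user_message, max_steps=10):
    """A bare-bones ReAct loop: think, act, observe, repeat.

    `llm` is assumed to be a callable that takes a message list and
    returns the model's text; `tools` maps tool names to functions.
    Both are stand-ins, not the actual tool_agent API.
    """
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        action = json.loads(llm(messages))
        print("Thought:", action["thought"])
        if action["tool"] == "final_answer":
            return action["args"]["text"]
        # Execute the requested tool and feed the observation back.
        result = tools[action["tool"]](**action["args"])
        messages.append({"role": "assistant", "content": json.dumps(action)})
        messages.append({"role": "user", "content": f"Observation: {result}"})
    return "Stopped: too many steps without a final answer."
```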
Learnings
Reflection and Correction
There are other ways we can improve the outcomes of the LLM agent beyond planning and reasoning. Reflection and correction is a way to help the model learn from its mistakes. I already had some basic checks in place to catch invalid JSON and unknown tools, but I made some additions to the code to help:
- Validation of the JSON actions for expected fields.
- Validation loops that tell the LLM about any validation failures and allow it to try again, as sketched below.
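Here is a rough sketch of that validation loop, reusing the illustrative action format from earlier; the specific checks and error messages are simplified stand-ins:

```python
import json

REQUIRED_FIELDS = ("thought", "tool", "args")

def validate_action(text, tools):
    """Return (action, None) on success or (None, error_message) on failure."""
    try:
        action = json.loads(text)
    except json.JSONDecodeError as e:
        return None, f"Invalid JSON: {e}"
    missing = [f for f in REQUIRED_FIELDS if f not in action]
    if missing:
        return None, f"Missing fields: {', '.join(missing)}"
    if action["tool"] != "final_answer" and action["tool"] not in tools:
        return None, f"Unknown tool: {action['tool']}"
    return action, None

def get_valid_action(llm, messages, tools, max_retries=3):
    """Ask the LLM for an action, sending validation errors back so it can retry."""
    for _ in range(max_retries):
        text = llm(messages)
        action, error = validate_action(text, tools)
        if action is not None:
            return action
        # Reflection step: tell the model what was wrong and let it try again.
        messages.append({"role": "user",
                         "content": f"Your last response was invalid: {error}. Please try again."})
    raise RuntimeError("LLM failed validation too many times")
```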
Prompt & Tool Consistency
I also realized, once again, that I was confusing the LLM with some of the ways I framed the prompt and tools.
- I referred to both “tools” and “functions” in the initial prompt and subsequent conversational prompts. I decided to only use “tool” and changed all “function” references the LLM saw to “tool”.
- I made some of the tool names ambiguous, and the LLM would try to use tools that were not appropriate. An easy example is `get_specialties`, which returns the list of specialties used for appointment booking. The LLM would sometimes avoid using that tool to figure out what types of appointments could be booked, and would try other tools or guess instead. I changed the tool name to `get_appointment_specialties` and the LLM always used it correctly, as illustrated below.
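As an illustration of that naming fix, a tool definition can pair the more specific name with a description that reinforces it. The schema shape below is an assumption, not the exact format in the repository; the specialty values come from the booking flow above:

```python
# An illustrative tool definition: the name and description both make it
# clear this tool is scoped to appointment booking, so the LLM is less
# likely to guess or reach for an unrelated tool.
GET_APPOINTMENT_SPECIALTIES = {
    "name": "get_appointment_specialties",
    "description": "Returns the list of specialties (e.g. hair, dentist, "
                   "vision) that appointments can be booked for.",
    "parameters": {},  # takes no arguments
}

def get_appointment_specialties():
    """Stand-in implementation returning the bookable specialties."""
    return ["hair", "dentist", "vision"]
```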
What’s Next?
I want to add more tools: tools that use secondary LLMs, and tools that allow me to use the agent for my own tasks.
I want to explore different UX around agents. Chat is an obvious choice, and I will likely expose `tool_agent` through a simple web chat system so I can play with it more and try to use it for actual tasks.
As mentioned above, I may look into benchmarks too.