Exploring LLMs as Agents: Taking Action

I’m still exploring how to use LLMs to build agents. In the first post, I described some of the motivation and approach. I started working on two agents:

  • web_agent: A basic agent that completes a web-based task using browser automation actions.
  • tool_agent: A basic agent that completes a task using a set of supplied tools or methods.

I’ve been focused more on tool_agent, trying to expand the types of tools I can provide. I got a suggestion to add some read/write-style tools and scenarios, so I decided to try that out.

Taking Action

“An agent is characterized by the environment it operates in and the set of actions it can perform.” — Chip Huyen, Agents

I want to create a more realistic set of actions for the agent to use. Scheduling appointments seemed like a straightforward scenario with some interesting possible flows. I added a few tool methods and a mocked appointment data store.

  • get_specialties(): Retrieves a list of available specialties for scheduling appointments.
  • get_available_appointments(specialty): Retrieves available appointments for a given specialty.
  • book_appointment(booking_data): Books an appointment based on provided booking data. The agent can pass in an appointment ID or a combination of a date and time.
  • get_my_appointments(): Retrieves the list of appointments booked by the user.
  • cancel_appointment(appointment_id): Cancels a booked appointment based on the provided appointment ID.

Check out the tool_agent.py code for more detail on how the tools work and how they are added to the system prompt; a rough sketch of the pattern follows.
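
To make the flows below concrete, here is a minimal sketch of the pattern, assuming a mocked data store and a dict-based tool registry. The names APPOINTMENTS, TOOLS, and build_system_prompt are hypothetical, not the actual tool_agent.py code:

```python
# Mocked appointment data store (hypothetical shape).
APPOINTMENTS = {
    "hair": [
        {"id": "10", "date": "2025-03-01", "time": "10:30 AM", "specialty": "hair"},
        {"id": "11", "date": "2025-03-02", "time": "2:00 PM", "specialty": "hair"},
    ],
}

# Tool registry: each entry pairs a description (for the system prompt)
# with a handler (for dispatching the agent's tool calls).
TOOLS = {
    "get_specialties": {
        "description": "Retrieves a list of available specialties for scheduling appointments.",
        "handler": lambda param=None: list(APPOINTMENTS.keys()),
    },
    "get_available_appointments": {
        "description": "Retrieves available appointments for a given specialty.",
        "handler": lambda specialty: APPOINTMENTS.get(specialty, []),
    },
}

def build_system_prompt() -> str:
    """List every tool in the system prompt so the LLM knows what it can call."""
    lines = ["You can call these tools by replying with a JSON action:"]
    for name, tool in TOOLS.items():
        lines.append(f"- {name}: {tool['description']}")
    return "\n".join(lines)
```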

Example Flows

I was pleased with how well this rudimentary system worked. Without many tweaks (more on that below), I was able to get some useful exchanges.

You: what kind of appts can I book?
Action raw: {"type": "call_function", "tool": "get_specialties"}
Function result: ["dentist", "vision", "hair"]
Action raw: {"type": "output", "value": "You can book appointments for dentist, vision, and hair."}
Agent: You can book appointments for dentist, vision, and hair.

The LLM did a good job of working through some ambiguity (“appts” and “book”) to figure out that it needed to use get_specialties to answer the question.
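
The loop that drives these exchanges is small. Here is a rough sketch rather than the exact tool_agent.py code; it assumes the hypothetical TOOLS registry above, uses the llm library's conversation API, and the model ID is just an example:

```python
import json

import llm  # https://llm.datasette.io

model = llm.get_model("gemini-2.0-flash")  # example model ID; any llm-supported model works
conversation = model.conversation()

def run_turn(user_input: str, system_prompt: str) -> str:
    """Let the model call tools until it emits an output action."""
    response = conversation.prompt(user_input, system=system_prompt)
    while True:
        # The system prompt asks the model to reply with exactly one JSON action.
        action = json.loads(response.text())
        if action["type"] == "output":
            return action["value"]
        # Otherwise it is a tool call: dispatch it and feed the result back in.
        handler = TOOLS[action["tool"]]["handler"]
        result = handler(action.get("param"))
        response = conversation.prompt(f"Function result: {json.dumps(result)}")
```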

You: I want to get a haircut
Action raw: {"type": "call_function", "tool": "get_specialties"}
Function result: ["vision", "dentist", "hair"]
Action raw: {"type": "call_function", "tool": "get_available_appointments", "param": "hair"}
Function result: [... list of appointments ...]
Action raw: {"type": "output", "value": "... output ..."}
Agent: Okay, here are the available hair appointments:

* ID: 10, Date: 2025-03-01, Time: 10:30 AM
* ID: 11, Date: 2025-03-02, Time: 2:00 PM
* ID: 12, Date: 2025-03-08, Time: 11:00 AM
* ID: 13, Date: 2025-03-08, Time: 3:00 PM

In this exchange, the agent figured out that it needed to display “hair” appointments. Interestingly, the agent came up with the output display format on its own. When I saw the agent using this format, I added the ability to reference appointments by ID in the tool methods (sketched below).
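
To support both forms, book_appointment has to branch on what the agent supplies. Here is a hedged sketch that reuses the mock APPOINTMENTS store from above; the field names ("id", "date", "time") are my guesses, not necessarily what tool_agent.py uses:

```python
MY_APPOINTMENTS: list[dict] = []  # mock store of the user's booked appointments

def book_appointment(booking_data: dict) -> dict:
    """Book by appointment ID if one is given, otherwise by date + time."""
    available = [a for appts in APPOINTMENTS.values() for a in appts]
    if "id" in booking_data:
        match = next((a for a in available if a["id"] == booking_data["id"]), None)
    else:
        match = next(
            (a for a in available
             if a["date"] == booking_data.get("date")
             and a["time"] == booking_data.get("time")),
            None,
        )
    if match is None:
        return {"status": "error", "message": "No matching appointment found."}
    MY_APPOINTMENTS.append(match)
    return {"status": "success", "message": "Appointment booked successfully."}
```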

You: show me my appointments
Action raw: {"type": "call_function", "tool": "get_my_appointments"}
Function result: [{"id": "3", "date": "2025-03-02", "time": "11:00 AM", "specialty": "dentist"}]
Action raw: {"type": "output", "value": "... output ..."}
Agent: Here are your booked appointments:

* ID: 3, Date: 2025-03-02, Time: 11:00 AM, Specialty: dentist

You: cancel the dentist appointment
Action raw: {"type": "call_function", "tool": "cancel_appointment", "param": "3"}
Function result: {"status": "success", "message": "Appointment canceled successfully."}
Action raw: {"type": "output", "value": "... output ..."}
Agent: Your appointment has been canceled successfully.

I thought I could easily confuse the agent by trying to cancel an appointment by its type (specialty) rather than its ID, but the agent passed the correct appointment ID when cancelling via the tool method.

Learning

At one point, the agent started returning invalid JSON. I assumed I needed to be more strict in my system prompt, but then I discovered an error on my side. Since I was surrounding my Python strings with double quotes, the inner quotes were single quotes, so the JSON examples in my prompt used single quotes, which is illegal in JSON. The LLM was relying on those examples, so I was causing it to emit invalid JSON.
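
The failure is easy to reproduce: single quotes make a valid Python literal but not valid JSON.

```python
import json

print(json.loads('{"type": "output"}'))  # double quotes: valid JSON
try:
    json.loads("{'type': 'output'}")     # single quotes: what my examples showed
except json.JSONDecodeError as err:
    print("invalid JSON:", err)          # the parser rejects it
```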

As I was looking through the code for similar issues, I noticed I was sending Python-formatted object output back into the conversation. The LLM was handling it well enough, but I decided to change the output to be legal JSON:

-  response = conversation.prompt(f"Function result: {function_result}")
+  function_result_json = json.dumps(function_result)
+  response = conversation.prompt(f"Function result: {function_result_json}")
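
The difference is visible in the quoting: interpolating a dict into an f-string uses Python's repr, which single-quotes strings, while json.dumps produces double-quoted output a strict parser can read back.

```python
import json

result = {"status": "success", "message": "Appointment canceled successfully."}
print(f"Function result: {result}")              # {'status': 'success', ...} - repr, not JSON
print(f"Function result: {json.dumps(result)}")  # {"status": "success", ...} - valid JSON
```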

What’s Next?

Most of Chip Huyen’s post on Agents talks about “planning”, but I have not added any planning-specific code to tool_agent yet. Right now, I am getting by with whatever amount of planning the LLM can come up with on its own.

I want to learn more about planning and how to add a little code to help the agent deal with even more complicated scenarios.

One Reply to “Exploring LLMs as Agents: Taking Action”

  1. I think the important piece here is to use the model’s available tooling to generate structured output. For example, since you are using Gemini, the [documentation](https://ai.google.dev/gemini-api/docs/structured-output?lang=rest#json-schemas) explains how it works under the hood.

    That being said, there are a few misunderstandings in your code regarding which tools to use for your goals. Since you are using the LLM library, it has some abstractions we can take advantage of. Let me list a few of the improvements you can make to your code to better use the model’s capabilities:

    In your code you are initializing the system prompt as a user prompt:
    ```
    conversation.prompt(initial_prompt)
    ```
    LLM’s [documentation](https://llm.datasette.io/en/stable/python-api.html#system-prompts) explains that system prompts should be passed in the `system` parameter of the `prompt` function:
    ```
    response = conversation.prompt(user_input, system=initial_prompt)
    ```
    Setting up the system prompt per the model’s specification ensures that the model treats it as a rule rather than a request, and that it never leaves the context window, so the model always considers the system prompt before generating its answers.

    For structured output (JSON) the LLM [documentation](https://llm.datasette.io/en/stable/python-api.html#schemas) explains that the desired output schema should be passed to the prompt in the `schema` parameter. From the docs:
    ```
    response = model.prompt("Describe a nice dog", schema={
        "properties": {
            "name": {"title": "Name", "type": "string"},
            "age": {"title": "Age", "type": "integer"},
        },
        "required": ["name", "age"],
        "title": "Dog",
        "type": "object",
    })
    ```

    Now, for the real downer: the LLM library does not yet support proper function calling; the feature is still under development, as seen in this GitHub issue:
    https://github.com/simonw/llm/issues/607

    This means the library is ill-suited to your goals and, at least from my point of view, will not allow you to explore the full capabilities of modern LLMs. I would recommend using a different library, or even Google’s own genai library since you are using Gemini:
    https://github.com/googleapis/python-genai
