These are my learnings from doing the AI agent course for buildclub.ai. Check out the whole course here.
We are both going to draw a house, so pick a paper and a pen and get to drawing.
Take 5-10 seconds and scroll down
This is the house that I made.
I can bet $1,000 that our two houses don't look the same. @ebaad, request me on Venmo.
Here are houses drawn by others—there's a castle, a banana-shaped house, and a skinny house. Why is each drawing so different despite identical instructions?
Even though the instructions didn’t specify details, everyone made unique decisions about window placement, size, and style. We often overlook how flexible humans are at interpreting minimal instructions.
Now, look at the house I made by instructing a computer. Guess how many instructions it took? Over 200!
Since each house starts with a square, let's begin with the instructions for that.
import turtle
turtle.forward(20)
turtle.left(90)
turtle.forward(20)
turtle.left(90)
turtle.forward(20)
turtle.left(90)
turtle.forward(20)
turtle.left(90)
Try it here.
Notice how the instructions never explicitly mention a square—just lines, angles, and movements.
Now let's change one thing in those instructions: instead of a 90-degree turn, let's make it a 50-degree turn.
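Here's a minimal sketch of that change if you want to run it yourself (the same commands as before, only the angle is different; the loop just keeps it short):

import turtle

# the same four "sides" as before, but turning 50 degrees instead of 90
for _ in range(4):
    turtle.forward(20)
    turtle.left(50)

turtle.done()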
What happened to our beloved square? NOOOOOOO! Computers can perform billions of calculations per second and send billions of emails in minutes, yet they fail at a simple task like recognizing a square. That's because they operate on low-level instructions executed one by one by the CPU, where each step might be as fundamental as coloring a single pixel in video memory.
When you look at the world like that, it is hard to know what a house is, much less a square. So you have to tell the computer exactly what to do.
I once asked a student to find the number 7 in an array using a loop. After demonstrating, he innocently asked:
"Why use a loop? Can't the computer see the number right there at position five?"
This captures precisely the problem: computers don't perceive screens or arrays visually. They see one memory location at a time.
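To make that concrete, here's the kind of loop I showed the student, as a rough sketch (the array and positions are made up for illustration):

numbers = [3, 9, 7, 1, 5]

# the computer can't "see" the whole array at once;
# it has to check one memory location at a time
for index, value in enumerate(numbers):
    if value == 7:
        print(f"Found 7 at position {index}")
        break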
What if we looked at the world like computers do?
For us humans, it would be like trying to find Waldo through a peephole, or reading a book one word at a time.
Could you play soccer if you couldn't see your feet and the ball simultaneously?
The human brain thrives in ambiguity and can easily handle vague instructions, such as "draw a house." Computers historically couldn't—until recently.
This flexibility challenge led to the rise of Large Language Models (LLMs), inspired by the human brain’s adaptability. LLMs have enabled computer programs, called AI agents, to tackle vague problems—like drawing houses, creating unspecified games, or planning trips without detailed instructions.
The anatomy of a problem
“I got 99 problems, but all are non-deterministic” — some ai agent
Understanding Problem Flexibility
Problems vary greatly in specificity:
- Draw something.
- Draw a house.
- Draw a house with two windows.
- Draw a house with two windows and a red door.
The first instruction is extremely flexible, leaving countless possibilities open. In that flexibility lies the need—and freedom—to make decisions.
Humans cope with such ambiguity effortlessly; computers, by contrast, demand painstaking detail—exact coordinates, angles, colors, and a precise sequence of operations.
Early plotters and drawing robots had to be programmed step by step, for example MOVE 0,0; LINE 0,0,50,0; TURN 90; LINE 50; …, just to sketch a simple square. They could only reproduce exactly what the programmer described, nothing more.
So how do we tell computers to make decisions?
Not every task in programming is a straight line like drawing a house; in real programs there are decisions to be made. That means choosing whether to run one part of the code or the other. If a person is logged in, they will see a different part of the program than if they're not logged in.
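A tiny sketch of that kind of branching, with made-up pages for illustration:

def render_page(user_is_logged_in):
    # the programmer decides ahead of time which branch runs
    if user_is_logged_in:
        return "personalised dashboard"
    return "login screen"

print(render_page(True))   # personalised dashboard
print(render_page(False))  # login screen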
When a program crashes, it has usually ended up in a situation where it doesn't know how to make a decision, because it was never told how to. A computer has no mind of its own. Everything it does has been programmed by someone. If you tell it to kill itself, it will do exactly that.
sudo dd if=/dev/zero of=/dev/sdX bs=4M status=progress (try it... actually, please don't: this overwrites the target disk with zeros)
At its very core, a computer doesn't select which lines to run on its own; the programmer spells out every decision as control logic.
Take sorting an array, for example: the control logic is a very basic rule of swapping two numbers if one is greater than the other.
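Written out, that control logic is only a few lines; here's a rough bubble-sort sketch:

def sort(numbers):
    # control logic: compare neighbours and swap if they're out of order
    for i in range(len(numbers)):
        for j in range(len(numbers) - 1 - i):
            if numbers[j] > numbers[j + 1]:
                numbers[j], numbers[j + 1] = numbers[j + 1], numbers[j]
    return numbers

print(sort([5, 2, 9, 1]))  # [1, 2, 5, 9]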
The control logic for diagnosing an issue and taking action might be quite a bit more complicated.
As problem complexity and flexibility increase, traditional logic spirals into "control logic hell," becoming nearly unmanageable.
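To see why, here's a hedged sketch of hand-written control logic for the kind of ticket-diagnosis problem we'll meet below; all the thresholds and action names are made up, and every new symptom multiplies the branches:

def handle_ticket(disk_free_pct, high_cpu_procs, failed_logins, service_up):
    # every combination of symptoms needs its own hand-written branch
    if disk_free_pct < 10 and not high_cpu_procs:
        return "clean_temp_files"
    elif disk_free_pct < 10 and high_cpu_procs:
        return "kill_runaway_process_then_clean"
    elif not service_up and failed_logins > 100:
        return "block_ips_and_restart_service"
    elif not service_up:
        return "restart_service"
    # ...and so on, for every situation the programmer can anticipate
    return "escalate_to_human"

print(handle_ticket(disk_free_pct=7, high_cpu_procs=False, failed_logins=0, service_up=True))
# clean_temp_files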
Enter LLMs
Instead of rigid control logic, LLMs introduce dynamic logic:
This is the same problem we tried to solve with control logic. As the complexity of the problem space increases, we'd have to program millions and millions of lines of code. Instead, we pass the state of the ticket and ask the LLM to give us an action to run.
input ➔ Current ticket state:
– Free disk space: 7%
– High-CPU procs present: False
What should I do next?
LLM ➔ Action: clean_temp_files
LLMs bring dynamic logic to the program using their general intelligence. This is the start of agency, of becoming agentic.
Once the text is returned you can use existing tools or functions to take action.
This is popularly called function or tool calling.
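A minimal sketch of the idea: map the action name the LLM returned onto a real Python function and call it (the tool functions here are just placeholders):

def clean_temp_files():
    print("Cleaning temp files...")      # placeholder implementation

def restart_service():
    print("Restarting the service...")   # placeholder implementation

# the LLM only returns text; this dictionary maps that text to actual code
tools = {
    "clean_temp_files": clean_temp_files,
    "restart_service": restart_service,
}

llm_output = "Action: clean_temp_files"             # pretend this came from the LLM
action = llm_output.replace("Action:", "").strip()  # parse out the action name
tools[action]()                                     # tool / function calling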
AI Agent Components
A large part of an AI agent is dynamic logic control and function calling. But that's not all. This is a Windsurf coding AI agent making a turtle game. What other things do you think are needed to make this possible?
The agent does not run all the steps at once. It makes an initial decision, assesses the result, and then makes the next one. The decisions it makes with a blank directory are going to be completely different from those it makes when there's already code in the files. This process repeats again and again until the task is done, much like a loop in programming. All loops have a termination condition, like i > len(array). In the case of AI agents, the termination condition is very vague, like "when the game is made", and is also provided by the LLM after we give it all the information about the project.
while True:
    prompt = previous_steps + "if trip is booked return done"
    status = LLM.eval(prompt)  # pseudocode: ask the LLM for the current status
    if status == "done":  # This can be passed to a regular if condition that terminates the loop.
        break
Previous steps
Storing information about previous steps is another essential component of an agent; it is important to have an understanding of what has occurred previously in order to make informed decisions or terminate the agent. This is a snapshot of the process, commonly referred to as the state, much like the memory component of the brain. Imagine if you forgot things as you learned them.
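One simple way to keep that state, sketched here with made-up field names, is to append every step to a list and feed the whole history into the next prompt (the messages list in the agent we build below plays the same role):

state = []  # the agent's "memory" of previous steps

def record_step(thought, action, observation):
    state.append({"thought": thought, "action": action, "observation": observation})

def build_prompt(task):
    # the next decision is informed by everything that happened before
    history = "\n".join(str(step) for step in state)
    return f"Task: {task}\nPrevious steps:\n{history}\nWhat should I do next?"

record_step("Need TSLA price", "get_stock_price('TSLA')", "325.31")
print(build_prompt("Calculate my portfolio value"))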
AI agent is like a classical program and an LLM having a baby.
Side Quest: Revisiting decision hell
When agents handle multiple tools/actions, the complexity explodes exponentially:
Each agent is given access to a few tools that it can use to take actions. The Windsurf IDE gives its agent access to tools such as writing files and running code. Other agents are given access to generic tools such as web search and a calculator.
Let's look at six tools and a sequence of five steps. It will look something like this.
create_file > append_to_file > fetch_dependencies > run_tests > commit_changes
or
create_file > create_file > create_file > create_file > create_file
Formal definition:
For N = 6 tools and a sequence of k steps, the total number of permutations with repetition is N^k. With k = 5, that's 6^5 = 7,776 possible sequences.
What happens when the number of tools increases? Try it out here.
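If you'd rather see the numbers directly, here's a small sketch that prints how the count grows with the number of tools and steps:

def num_sequences(num_tools, num_steps):
    # permutations with repetition: N^k
    return num_tools ** num_steps

for n in (6, 10, 20):
    for k in (5, 10):
        print(f"{n} tools, {k} steps -> {num_sequences(n, k):,} possible sequences")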
Building an agent from scratch
Prerequisites: basic programming, plus some idea of object-oriented programming.
Let's build an AI agent that calculates the value of our stock portfolio.
V0, basic setup
import os
from dotenv import load_dotenv
from openai import OpenAI

class Agent:
    def __init__(self):
        load_dotenv()
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        self.system_msg = "You are a helpful assistant."
        self.messages = []
        self.messages.append({"role": "system", "content": self.system_msg})

    def send_message(self, message):
        self.messages.append({"role": "user", "content": str(message)})
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=self.messages
        )
        return response.choices[0].message.content
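A quick usage sketch, assuming an OPENAI_API_KEY is available in a .env file:

agent = Agent()
print(agent.send_message("Say hello in one short sentence."))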
V 1.1 Tools
Stock price tool
import yfinance as yf

def get_stock_price(symbol):
    """
    Fetches the latest stock price for a given symbol from Yahoo Finance.

    This function retrieves real-time or most recent closing price data for
    the specified stock symbol using the Yahoo Finance API.

    Args:
        symbol (str): The stock ticker symbol (e.g., "AAPL", "MSFT", "GOOG")

    Returns:
        float: The current or most recent stock price

    Examples:
        >>> get_stock_price("AAPL")
        191.45
        >>> get_stock_price("MSFT")
        337.22
    """
    try:
        stock = yf.Ticker(symbol)
        # Get latest price (simplified)
        latest_data = stock.history(period='1d')
        if not latest_data.empty and 'Close' in latest_data.columns:
            return float(latest_data['Close'].iloc[-1])
        else:
            # Simple fallback
            return float(stock.info.get('regularMarketPrice', 0))
    except Exception as e:
        return f"Error fetching price for {symbol}: {str(e)}"
Calculator tool
def calculator(math_expression):
    """
    Evaluates a mathematical expression provided as a string.

    WARNING: Using eval() on arbitrary input is dangerous as it can execute malicious code.
    This function should only be used with trusted input in a controlled environment.

    Args:
        math_expression (str): A string containing a mathematical expression (e.g. "2 + 2", "5 * 3")

    Returns:
        The numerical result of evaluating the expression

    Examples:
        >>> calculator("2 + 2")
        4
        >>> calculator("5 * 3")
        15
    """
    return eval(math_expression)
V 1.2 Prompt.txt
You are a helpful AI assistant with access to the following tools:
1. Calculator Tool:
– Purpose: Evaluates mathematical expressions.
– Tool Name: calculate
– Input: A string representing the mathematical expression.
– Example:
User: What is 5 plus 7 divided by 2?
Assistant:
Action: calculate
Action Input: (5 + 7) / 2
2. Stock Price Tool:
– Purpose: Fetches the latest stock price for a given stock symbol.
– Tool Name: get_stock_price
– Input: A string representing the stock symbol (e.g., "AAPL", "GOOG").
– Example:
User: What's the current price of Google stock?
Assistant:
Action: get_stock_price
Action Input: GOOG
PAUSE ←
The pause is used as a delimiter here
When you need to use a tool, respond strictly in the following format:
Question: [The user's question]
Thought: [Your thought process on which tool to use and why]
Action: [tool_name]
Action Input: [input to the tool]
If you do not need to use a tool, respond with:
Action: None
Final Answer: [Your direct answer to the user]
After an action is performed and an observation is provided, you will continue the thought process:
Thought: [Your thought process after receiving the observation]
Action: [Next action or None]
Final Answer: [Your final answer to the user if no more actions are needed]
Only use the tools provided. Do not make up tools.
If the user's query does not require a tool, provide a direct answer.
In the agent class
def __init__(self):
    load_dotenv()
    self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    self.system_msg = open("prompt.txt", "r").read()
    self.messages = []
    self.messages.append({"role": "system", "content": self.system_msg})
Pass all the past messages into the prompt, to manage state:

def send_message(self, message):
    self.messages.append({"role": "user", "content": str(message)})
    response = self.client.chat.completions.create(
        model="gpt-4o",
        messages=self.messages  # here: the full history goes in on every call
    )
    return response.choices[0].message.content
Running the code manually, apologies for that handsome face that is getting in the way.
Me manually running the next function.
Thought: I have obtained the current price of Tesla stock. Now I need to get the price for Amazon.
Action: get_stock_price
Action Input: AMZN
This process of reasoning and action, commonly referred to as a ReAct (Reasoning and Acting) agent, allows for a sequence of thoughts and actions that can be described as a chain of thought.
# Thought 1: I need to calculate total value of 3 TSLA and 3 AMZN stocks
# Action: get_stock_price("TSLA") → 325.31
# Thought 2: I got TSLA price, now I need AMZN price
# Action: get_stock_price("AMZN") → 212.75
# Thought 3: I have both prices, time to calculate total value
# Action: calculator("(3 * 325.31) + (3 * 212.75)") → 1614.18
# Thought 4: Done. Report result.
# Final Answer: "The total value of your 3 Tesla and 3 Amazon stocks is approximately $1,614.18."
V2.1 Parsing
Now let's make the acting automatic as well.
known_tools = {
    "calculate": calculator,
    "get_stock_price": get_stock_price
}

def extract_answer(response):
    # Pull the "Action:" and "Action Input:" lines out of the LLM's response text
    lines = response.split("\n")
    action = None
    action_input = None
    for line in lines:
        if "Action: " in line:
            action = line.split("Action: ")[1].strip()
            if action == "None":  # "Action: None" means no tool is needed
                action = None
        if "Action Input: " in line:
            action_input = line.split("Action Input: ")[1].strip()
    return action, action_input
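A quick sanity check of the parser on a made-up response:

sample = """Thought: I need the price of Google stock.
Action: get_stock_price
Action Input: GOOG"""

print(extract_answer(sample))  # ('get_stock_price', 'GOOG')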
V2.2 Get answer with loop
def get_the_answer(user_input):
agent = Agent()
iteration = 0
while True and iteration < 10:
print(f"\n--- Iteration {iteration} ---")
task = user_input
response = agent.send_message(task)
print(response)
print("\n")
action, input = extract_answer(response)
if action:
result = known_tools[action](input)
user_input = str(result)
iteration += 1
else: # when None is returned
return extract_answer(response)
"""
If you do not need to use a tool, respond with:
Action: None
Final Answer: [Your direct answer to the user]
"""
There we go, that was an AI agent.
AI agent is like a classical program and an LLM having a baby.
We brought in the power of LLMs to enhance decision logic and termination conditions. We also converted natural language into actionable function calls using LLMs, allowing programs to interact with other tools flexibly and navigate problem spaces that regular programs had a hard time with.
Exercise the mind.
Here are some vague problems that AI agents could be used to solve. Think about what tools each agent would need and what its chain of thought would look like.
- Calculate the total price of construction cost from a blueprint.
- Communicate with a colleague and find a time to schedule in a meeting.
- Debug this code and get rid of the error.
Beyond Theory
Setting up an AI agent by yourself can be really challenging. There are frameworks that help you with state, evaluation, and tools so you can focus on shipping code. One of the companies that does a great job at that is CrewAI.
So now you know, more or less, what an AI agent is. You can understand it as a tool rather than a cultural phenomenon, and with that understanding you can build countless agents.