Two "R"s in "strawberry"?!

It is no secret that there is a massive AI boom happening right now. There are incredible tools that exist today, like Perplexity for search, or ChatGPT's Pro models for reasoning. But have you ever wondered, given these incredible boundary-pushing PhD-level reasoning capabilities, why they (sometimes) cannot get some of the most basic information right? The answer seems to lie in the way you prompt them. Below I attempt to break this down into the main moving parts (we will come back to this point later).

How modern LLMs "think"

These models split the input context into small chunks called tokens, assign meaning to each of these tokens, and try to understand them as a whole within the context. This is all done via a process called tokenization, utilizing attention to relate the tokens to one another. It is good to note that this can happen mostly in parallel.
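To make this concrete, here is a small Python sketch using the tiktoken library (an assumption on my part for illustration; Llama models ship their own tokenizer, so the exact splits will differ) to show how a word like "strawberry" gets chopped into tokens that do not line up with individual letters:

import tiktoken

# Load a generic BPE tokenizer (cl100k_base). This is only an example;
# each model family uses its own tokenizer with different splits.
enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("strawberry")
pieces = [enc.decode([i]) for i in ids]

# The model sees token chunks, not individual letters.
print(pieces)  # e.g. something like ['str', 'aw', 'berry'] -- exact split depends on the tokenizer

Because the model operates on these chunks rather than on characters, questions about individual letters are less trivial than they look.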

How do these models "respond" to inquiries?

Without diving into the complicated inner workings of LLMs, these models run in a loop of generating the next token, one at a time. As they generate each token, they look back at what has been produced so far, and then probabilistically generate the next token. This process is repeated until the model emits a special end-of-sequence (stop) token.
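A minimal sketch of that loop might look like the following (the next_token_distribution helper and the sampling details are hypothetical placeholders for illustration, not a real API):

import random

def generate(model, prompt_tokens, eos_token, max_new_tokens=256):
    # The context starts as the prompt; generated tokens are appended to it.
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Hypothetical helper: returns {token: probability} given everything so far.
        probs = model.next_token_distribution(tokens)
        # Sample the next token from that distribution (greedy vs. temperature details omitted).
        next_token = random.choices(list(probs), weights=list(probs.values()))[0]
        tokens.append(next_token)
        # Stop once the special end-of-sequence token is produced.
        if next_token == eos_token:
            break
    return tokens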

So, why do they struggle with basic questions?

Let's demonstrate this using the infamous "strawberry" example. For this test, I picked a 3B Llama model. Here is how one might naively prompt the LLM:

$ ollama run llama3.2:3b
>>> How many "r" letters are there in "strawberry"?
There are 2 "r"s and no other instances of the letter "r".

Now compare that to the following prompt:

$ ollama run llama3.2:3b
>>> How many "r" letters are there in "strawberry"? Break this down \
into smaller steps. Spell out the word, and then go one by one and \
count the number of "r" letters.

Let's break it down step by step:

S-T-R-A-W-B-E-R-R-Y

Starting from the first letter, let's count the "R"s:

1. S (no "R")
2. T (no "R")
3. R (1 "R")
4. A (no "R")
5. W (no "R")
6. B (no "R")
7. E (no "R")
8. R (2 "R"s)
9. R (3 "R"s)
10. Y (no "R")

There are 3 "R" letters in the word "strawberry".

The solution

As you can see, we have to let these models "think" by breaking the problem down into smaller chunks. In the LLM world, each token is generated with a roughly fixed amount of compute, so there is not a lot of room for long calculations within a single step. Thus, if you prompt a model in a way that allows it to "think" by generating smaller intermediate steps, it can do a much better job of reaching the right final answer. This is obviously an oversimplified example to demonstrate the point, but you can scale this up to much more complex use cases.
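As a sketch of how you might apply this programmatically, the snippet below sends the same "break it down" style prompt to a locally running Ollama server over its HTTP API (assuming Ollama's default port 11434 and the llama3.2:3b model pulled as above; adjust if your setup differs):

import json
import urllib.request

prompt = (
    'How many "r" letters are there in "strawberry"? '
    "Break this down into smaller steps. Spell out the word, "
    'and then go one by one and count the number of "r" letters.'
)

# Assumes a local Ollama server; with stream=False the /api/generate endpoint
# returns a JSON body whose "response" field holds the model's full reply.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({"model": "llama3.2:3b", "prompt": prompt, "stream": False}).encode(),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])

The same idea scales up: the more you nudge the model to lay out intermediate steps, the more tokens (and therefore compute) it spends on the path to the answer.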

Continue the conversation

I hope this helps you get one step closer to getting the most out of these incredible tools.
Found this interesting, or have any questions? Feel free to reach out on X or LinkedIn!
