The Wall
Where the table runs out of room
The Q-table worked. In a 7×7 maze, it filled with values, the agent learned, and we could watch every step of the process. That transparency was the point: Q-learning is an X-RAY tool precisely because it makes knowledge visible.
Now try something simple: make the maze bigger. Drag the slider below. The maze grows. The Q-table grows with it: one row per cell, four values per row. At 7×7 you have 49 states and 196 Q-values. At 10×10 you have 100 states and 400 Q-values. At 15×15 you have 225 states and 900 Q-values. Watch the policy map. The arrows become smaller, harder to read, eventually meaningless as symbols. The table is still there, but you can no longer see through it.
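If you want to check the arithmetic yourself, here is a minimal sketch. It assumes what the text describes, a square grid with one row per cell and four actions per row; the function name `q_table_size` is made up for this illustration.

```python
def q_table_size(side: int, n_actions: int = 4) -> tuple[int, int]:
    """Return (states, q_values) for a side x side grid with one row per cell."""
    states = side * side
    return states, states * n_actions


# Print the table size for a few grid sizes, including the three from the text.
for side in (7, 10, 15, 50, 100):
    states, q_values = q_table_size(side)
    print(f"{side}x{side}: {states} states, {q_values} Q-values")
```

The growth is quadratic in the side length, which is still gentle. The real trouble starts when the state is no longer a single cell on a grid but a combination of many things at once.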
This is the wall. Not a conceptual limit, but a physical one. A game of chess has somewhere between 10⁴⁰ and 10⁵⁰ legal positions. The observable universe contains roughly 10⁸⁰ atoms. A table with one row per chess position cannot exist anywhere in reality. Q-learning, as we have seen it, simply stops being a viable approach.
Watch what changes as the grid grows: the agent still learns, but slower. With more states to explore, it takes longer to visit each one enough times to get reliable estimates. The convergence bar fills from the goal outward — that warmth has to travel further and further before it reaches the start. The wall is not that learning stops. The wall is that learning slows beyond use.
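To see why the warmth takes longer to arrive, here is a deliberately idealized sketch. It is not the Q-learning agent from this chapter: instead of learning from sampled experience, it updates every cell at once in synchronous sweeps, on an open grid with no inner walls, with the start in one corner and the goal in the opposite one. Those choices, the reward of 1 for stepping into the goal, and the name `sweeps_until_start_learns` are all assumptions made for illustration.

```python
def sweeps_until_start_learns(side: int, gamma: float = 0.9) -> int:
    """Count synchronous backup sweeps until the start cell's value is nonzero.

    Open side x side grid, start at (0, 0), goal at (side-1, side-1),
    reward 1 for stepping into the goal, 0 otherwise.
    """
    if side < 2:
        raise ValueError("side must be at least 2 so start and goal differ")
    goal = (side - 1, side - 1)
    values = {(r, c): 0.0 for r in range(side) for c in range(side)}
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    sweeps = 0
    while values[(0, 0)] == 0.0:
        new_values = {}
        for (r, c) in values:
            if (r, c) == goal:
                new_values[(r, c)] = 0.0  # terminal: nothing left to earn
                continue
            best = 0.0
            for dr, dc in moves:
                nxt = (r + dr, c + dc)
                if nxt not in values:
                    nxt = (r, c)  # bumping the outer wall leaves the agent in place
                reward = 1.0 if nxt == goal else 0.0
                best = max(best, reward + gamma * values[nxt])
            new_values[(r, c)] = best
        values = new_values  # synchronous: every cell used the previous sweep's values
        sweeps += 1
    return sweeps


for side in (7, 10, 15):
    print(f"{side}x{side}: {sweeps_until_start_learns(side)} sweeps")
```

Each sweep pushes value information exactly one cell further from the goal, so the count equals the distance from start to goal. A real agent does worse than this, because it does not get to update every cell per step; it has to wander into each one first.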
At some point, the approach breaks entirely. Not because the algorithm is wrong, but because the representation is wrong. A table assumes you can enumerate every possible situation in advance and give it its own row. Reality doesn't allow this.
What we need is something that can generalize — that can say: this situation is similar to that one, so its value should be similar too. A function, not a table. Something that takes a state as input and outputs a value estimate, without needing to have seen that exact state before. That something has a name.
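The difference can be sketched in a few lines. This is a contrived example, not the method of the next chapter: it assumes a hypothetical target value equal to minus the number of steps to the goal, which happens to be exactly linear in row and column, and it fits three weights by plain gradient descent. The names `true_value`, `features` and `predict` are invented here. The agent is only shown cells where row plus column is even, then asked about a cell it never saw.

```python
SIDE = 7
GOAL = (SIDE - 1, SIDE - 1)


def true_value(state):
    """Hypothetical target: minus the number of steps to the goal in an open grid."""
    r, c = state
    return -float(abs(GOAL[0] - r) + abs(GOAL[1] - c))


def features(state):
    """Describe a state by its scaled row, scaled column, and a bias term."""
    r, c = state
    return [r / (SIDE - 1), c / (SIDE - 1), 1.0]


# The agent has only "seen" cells where row + column is even.
seen = [(r, c) for r in range(SIDE) for c in range(SIDE) if (r + c) % 2 == 0]
unseen = (3, 4)

# Table: one entry per visited state, nothing for anything else.
table = {s: true_value(s) for s in seen}

# Function: three weights fitted by batch gradient descent on the seen states.
weights = [0.0, 0.0, 0.0]
learning_rate = 0.1
for _ in range(5000):
    grad = [0.0, 0.0, 0.0]
    for s in seen:
        x = features(s)
        error = sum(w * xi for w, xi in zip(weights, x)) - true_value(s)
        for i in range(3):
            grad[i] += error * x[i] / len(seen)
    weights = [w - learning_rate * g for w, g in zip(weights, grad)]


def predict(state):
    """Estimate the value of any state, visited or not, from the fitted weights."""
    return sum(w * xi for w, xi in zip(weights, features(state)))


print("table   :", table.get(unseen))          # None: the table never saw this state
print("function:", round(predict(unseen), 3))  # close to true_value(unseen)
```

The table has nothing to say about the unseen cell. The function answers anyway, because it learned a pattern rather than a list. Real value functions are rarely this tidy, so three weights will not be enough for long.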
"A table can only know what it has seen. A function can know what it hasn't."
The next chapter replaces the table with a neural network. The agent stops memorizing states and starts learning patterns. The Q-table, our transparent, inspectable, X-RAY-friendly structure, will disappear. What takes its place is faster, more powerful, and considerably harder to see through. That trade-off is the subject of everything that follows.