Summary: Understanding and controlling a maze-solving policy network analyzed a maze-solving agent's behavior. We isolated four maze properties which seemed to predict whether the mouse goes towards the cheese or towards the top-right corner:

In this post, we conduct a more thorough statistical analysis, addressing issues of multicollinearity. We show strong evidence that (2) and (3) above are real influences on the agent's decision-making, and weak evidence that (1) is also a real influence. As we speculated in the original post,^[1] (4) falls away as a statistical artifact.

Peli did the stats work and drafted the post, while Alex provided feedback, expanded the visualizations, and ran additional tests for multicollinearity. Some of the work completed in Team Shard under SERI MATS 3.0.

Impressions from trajectory videos

Watching videos Langosco et al.'s experiment, we developed a few central intuitions about how the agent behaves. In particular, we tried predicting what the agent does at decision squares. From Understanding and controlling a maze-solving policy network:

Some mazes are easy to predict, because the cheese is on the way to the top-right corner. There's no decision square where the agent has to make the hard choice between the paths to the cheese and to the top-right corner:
At the decision square, the agent must choose between two paths—cheese, and top-right.

Here are four central intuitions which we developed:

Closeness between the mouse and the cheese makes cheese-getting more likely
Closeness between the mouse or cheese and the top-right makes cheese-getting more likely
The effect of closeness is smooth
Both ‘spatial’ distances and ‘legal steps’ distances matter when computing closeness in each case

The videos we studied are hard to interpret without quantitative tools, so we regard these intuitions as theoretically-motivated impressions rather than as observations. We wanted to precisify and statistically test these impressions, with an eye to their potential theoretical significance.

We suspect that the agent’s conditions for pursuing cheese generalize properties of historically reinforced cheese-directed moves in a very “soft” way. Consider that movements can be "directed" on paths towards the cheese, the top-right corner, both, or neither. In the training environment, unambiguously cheese-directed movements are towa...

Interpreting a Maze-Solving Network