LLMs Are a Dead End
Published on: October 02, 2025
Tags: #llm #ai #reinforcement-learning
Mimicry vs. True Understanding
graph TD subgraph "Large Language Models (LLMs): The Mimicry Loop" direction TB A[Internet Text Dataset] --> B{LLM Training}; B --> C[Predict Next Word]; C --> D((Generate Text)); style A fill:#f9f,stroke:#333,stroke-width:2px end subgraph "Reinforcement Learning (RL): The Experiential Loop" direction TB E[Environment] -- Sensation/State --> F{RL Agent}; F -- Action --> E; E -- Reward/Feedback --> F; style E fill:#ccf,stroke:#333,stroke-width:2px end subgraph "Sutton's Core Argument" direction TB G[Mimicking People] --x H(Lacks World Model & Goals); I[Learning from Experience] --> J(Develops World Model & Achieves Goals); end style G fill:#FFDDC1 style H fill:#FFDDC1 style I fill:#D4E4F7 style J fill:#D4E4F7
Revisiting "The Bitter Lesson"
```mermaid
graph LR
    subgraph "Current LLM Approach"
        A[Massive Compute] & B["Massive Human Knowledge<br/>(Internet Text)"] --> C(Powerful LLMs);
    end
    subgraph "Sutton's Predicted Future (The Bitter Lesson Applied)"
        D[Massive Compute] & E["Learning from Raw Experience<br/>(Interaction, Trial & Error)"] --> F(More Scalable & Capable AI);
    end
    C -- "Will be Superseded by" --> F;
    style A fill:#D4E4F7
    style B fill:#FFDDC1
    style E fill:#D4E4F7
```
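One way to read the scaling claim is as a statement about data supply: a curated corpus is finite, however large, while interaction keeps producing new data for as long as the agent keeps acting, so extra compute buys extra experience. The snippet below is a toy illustration of that difference under invented names and numbers, not an argument lifted from the talk.

```python
import random

# A fixed corpus: large, but finite. More compute just revisits the same data.
corpus = [f"sentence_{i}" for i in range(1_000)]

# An environment: every interaction yields a sample the agent caused itself.
def environment(action: float) -> float:
    return action + random.gauss(0.0, 1.0)      # toy dynamics, standing in for the world

budget = 10_000                                  # "compute", measured in samples drawn

corpus_view = {random.choice(corpus) for _ in range(budget)}
experience = {environment(random.random()) for _ in range(budget)}

print("distinct corpus samples seen:", len(corpus_view))   # capped at 1,000
print("distinct experience samples:", len(experience))     # grows with the budget
```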
The Continual Learning Agent
graph TD subgraph "The Experiential Paradigm (Refined)" direction LR A(Agent) B(Environment) A -- "Action" --> B; B -- "Sensation & Reward" --> A; subgraph "Agent's Internal Components" C["Policy
(What to do)"] D["Value Function
(How well it's going)"] E["World Model
(Predicts consequences)"] end A --> C A --> D A --> E B -- "Updates" --> E D -- "Guides" --> C %% --- New Connections --- E -- "Enables Planning to Refine" --> C; E -- "Provides Simulated Experience to Improve" --> D; end style B fill:#ccf,stroke:#333,stroke-width:2px style A fill:#f9f,stroke:#333,stroke-width:2px
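This diagram maps closely onto Sutton's Dyna architecture, so a minimal tabular Dyna-Q sketch is a reasonable way to show the pieces working together (again, a toy with assumed names and hyperparameters, not the implementation he describes). The Q table plays the role of the value function, the epsilon-greedy rule over it is the policy, and the learned (state, action) -> (reward, next state) table is the world model that supplies simulated experience for planning.

```python
import random
from collections import defaultdict

# A tiny deterministic corridor: states 0..5, actions -1/+1, reward 1 at the goal.
GOAL = 5
ACTIONS = (-1, +1)

def env_step(state: int, action: int) -> tuple[int, float, bool]:
    """The real environment: the agent only learns about it by acting in it."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

Q = defaultdict(float)   # value function Q(s, a); the policy is epsilon-greedy over it
model = {}               # world model: (s, a) -> (reward, next state, done), learned from experience
alpha, gamma, epsilon, planning_steps = 0.1, 0.95, 0.1, 20

def choose_action(state: int) -> int:
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for episode in range(50):
    state, done = 0, False
    while not done:
        action = choose_action(state)
        next_state, reward, done = env_step(state, action)            # real experience
        # Direct RL: update the value function from real experience.
        target = reward + (0.0 if done else gamma * max(Q[(next_state, a)] for a in ACTIONS))
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        # Model learning: remember what the world just did.
        model[(state, action)] = (reward, next_state, done)
        # Planning: replay simulated experience from the learned model
        # to improve the value function (and hence the greedy policy) further.
        for _ in range(planning_steps):
            s, a = random.choice(list(model))
            r, s2, d = model[(s, a)]
            t = r + (0.0 if d else gamma * max(Q[(s2, b)] for b in ACTIONS))
            Q[(s, a)] += alpha * (t - Q[(s, a)])
        state = next_state

print("greedy action per state:",
      {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)})
```

Note how the planning loop is just more value-function updates, driven by the learned model instead of the real environment; that is the sense in which the world model refines the policy without requiring extra real interaction.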
Source: Richard Sutton – Father of RL thinks LLMs are a dead end