LLM

Published on: September 10, 2025

Tags: #llm #ai #mathematics


The Core Process: How an LLM Generates Text

graph TD
    subgraph "User Input"
        A[Raw Text: The cat sat on the...]
    end

    subgraph "LLM Core Processing"
        B(Tokenization) --> C(Embeddings);

        subgraph "Inside the Transformer Block"
            direction TB
            D["Input Embeddings
+
Positional Encoding
(to understand word order)"]; E["Self-Attention Mechanism
(weighs word importance & context)"]; F["Neural Network Layers
(for deep processing)"]; D --> E --> F; end C --> D; F --> G(Probability Calculation); end subgraph "Model Output" G --> H[Output Token: mat]; end A --> B; style A fill:#f9f,stroke:#333,stroke-width:2px style H fill:#ccf,stroke:#333,stroke-width:2px
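
To make the flow above concrete, here is a minimal, self-contained Python sketch of a single generation step: tokenize the input, add positional encoding to the embeddings, run one pass of self-attention, and turn the result into a probability distribution over the vocabulary. The tiny vocabulary, random weights, and single attention layer are illustrative assumptions, not the internals of any real model.

import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and tokenizer (assumption: whitespace tokens, six-word vocab).
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
token_to_id = {t: i for i, t in enumerate(vocab)}

def tokenize(text):
    return [token_to_id[w] for w in text.lower().split()]

d_model = 8  # embedding width, chosen arbitrarily for the sketch

# Randomly initialised tables stand in for learned parameters.
embedding = rng.normal(size=(len(vocab), d_model))
positional = rng.normal(size=(32, d_model))      # one vector per position
w_q = rng.normal(size=(d_model, d_model))
w_k = rng.normal(size=(d_model, d_model))
w_v = rng.normal(size=(d_model, d_model))
w_out = rng.normal(size=(d_model, len(vocab)))   # projection to vocabulary logits

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def next_token_probs(text):
    ids = tokenize(text)
    # 1. Embeddings + positional encoding (so word order matters).
    x = embedding[ids] + positional[: len(ids)]
    # 2. Self-attention: each token weighs every other token in the context.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.T / np.sqrt(d_model))
    x = attn @ v
    # 3. Probability calculation over the vocabulary for the next token.
    logits = x[-1] @ w_out
    return softmax(logits)

probs = next_token_probs("the cat sat on the")
print(vocab[int(np.argmax(probs))])  # most likely next token under random weights

A production transformer stacks many attention blocks with feed-forward layers and learned weights, and usually samples from the probability distribution rather than always taking the argmax, but the tokenize, embed, attend, softmax pipeline is the same.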

The Learning Cycle: How an LLM Improves

graph TD
    A(Start With a Base Model) --> B["Make a Prediction"];
    B --> C["Calculate Error
(Cost Function)"]; C --> D["Adjust Model Parameters
(Gradient Descent &
Backpropagation)"]; D --> B; subgraph "Guiding the Learning Process" E(Regularization Techniques
e.g., Dropout) -- Prevents Overfitting --> D; F(Optimizers
e.g., Adam) -- Improves Efficiency --> D; end style A fill:#f9f,stroke:#333,stroke-width:2px
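
The loop in the diagram (predict, measure the error with a cost function, adjust the parameters, repeat) can be illustrated with a deliberately tiny example: learning a single weight by gradient descent on a squared-error cost. The data points and learning rate below are invented for illustration; in a real network, backpropagation computes these gradients automatically through every layer via the chain rule.

# Minimal gradient-descent loop: learn w so that prediction = w * x fits the data.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]   # (input, target) pairs, made up
w = 0.0                                        # start with an untrained parameter
learning_rate = 0.05

for step in range(200):
    # 1. Gradient of the mean squared cost with respect to w
    #    (this is what backpropagation automates in a deep network).
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    # 2. Adjust the parameter a small step against the gradient.
    w -= learning_rate * grad

final_cost = sum((w * x - y) ** 2 for x, y in data) / len(data)
print(f"learned w = {w:.3f}, cost = {final_cost:.4f}")  # w approaches ~2.0

Optimizers such as Adam adapt the step size per parameter to make this loop more efficient, while regularization such as dropout randomly disables units during training so the model cannot simply memorise the training set.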

From General Knowledge to Aligned Specialist

graph TD
    subgraph "Phase 1: Foundation Building"
        A["Pretraining
Model learns general
language patterns,
grammar, and facts from a
massive, diverse dataset
(the entire internet, books,
etc.)."]; end subgraph "Phase 2: Specialization" B["Fine-Tuning/Transfer Learning
The foundational model is
adapted for a specific task
(e.g., medical analysis,
legal summaries) using a
smaller, domain-specific
dataset."]; end subgraph "Phase 3: Alignment" C["Reinforcement Learning
from Human Feedback
(RLHF)

Humans rank model outputs
for helpfulness and safety.
This feedback trains the
model to align its behavior
with human values and
expectations."]; end subgraph "Result" D["A Specialized,
Helpful, and Aligned LLM
(e.g., ChatGPT, Gemini)"]; end A --> B --> C --> D;
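
All three phases reuse the same basic update machinery; what changes is the data that drives it. The toy sketch below makes that point with a bigram-counting stand-in for a model, where the corpora and preference scores are invented purely for illustration and play the roles of web-scale text, a domain dataset, and human rankings.

from collections import Counter, defaultdict

# Toy stand-in for an LLM: a table counting which word tends to follow which.

def train(model, corpus):
    """Shared update rule: count next-word occurrences (the 'parameters')."""
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model

# Phase 1: pretraining on broad, general text.
general_corpus = ["the cat sat on the mat", "the dog sat on the rug"]
model = train(defaultdict(Counter), general_corpus)

# Phase 2: fine-tuning on a smaller, domain-specific dataset.
domain_corpus = ["the patient sat on the exam table"]
model = train(model, domain_corpus)

# Phase 3 (RLHF, drastically simplified): human feedback ranks candidate
# continuations, and the model's preferences are nudged toward the winner.
human_scores = {"mat": 1.0, "rug": 0.2}          # "mat" judged more helpful
for word, score in human_scores.items():
    model["the"][word] += int(5 * score)

print(model["the"].most_common(3))

Real RLHF trains a separate reward model on the human rankings and then optimises the LLM against it, but the shape of the pipeline (general data, then domain data, then a human preference signal) is the one shown in the diagram.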

Timeline of Key Breakthroughs

timeline

    1940s-1960s : Early Foundations
        : Claude Shannon's Information Theory (Language as probability)
        : ELIZA Chatbot (Pattern matching)

    1980s : Statistical Approaches
        : N-gram Models (Predicting words based on the last few words)

    2013 : The Meaning Revolution
        : Word2Vec (Representing word meaning as vectors/embeddings)

    2017 : The Architecture Breakthrough
        : "Attention Is All You Need" paper introduces the Transformer Architecture

    2020s-Present : The Age of Scale & Multimodality
        : Massive models (GPT series, Gemini) with trillions of parameters
        : Integration of text, images, and audio (Multimodality)

Source: Mathematics of LLMs in Everyday Language
