LLM
Published on: September 10, 2025
Tags: #llm #ai #mathematics
The Core Process: How an LLM Generates Text
graph TD subgraph "User Input" A[Raw Text: The cat sat on the...] end subgraph "LLM Core Processing" B(Tokenization) --> C(Embeddings); subgraph "Inside the Transformer Block" direction TB D["Input Embeddings
+
Positional Encoding
(to understand word order)"]; E["Self-Attention Mechanism
(weighs word importance & context)"]; F["Neural Network Layers
(for deep processing)"]; D --> E --> F; end C --> D; F --> G(Probability Calculation); end subgraph "Model Output" G --> H[Output Token: mat]; end A --> B; style A fill:#f9f,stroke:#333,stroke-width:2px style H fill:#ccf,stroke:#333,stroke-width:2px
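To make the pipeline above concrete, here is a minimal NumPy sketch of a single forward pass. The token ids, random embedding table, positional vectors, and attention weight matrices are all made-up stand-ins for what a real model learns during training; the sizes are toy values chosen only so the shapes line up.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy sizes (real models use thousands of dimensions and ~50k+ tokens)
vocab_size, d_model, seq_len = 10, 8, 5
rng = np.random.default_rng(0)

# 1. Tokenization: assume the prompt "The cat sat on the" maps to these token ids
token_ids = np.array([1, 4, 2, 7, 3])

# 2. Embeddings + positional encoding
embedding_table = rng.normal(size=(vocab_size, d_model))
positions = rng.normal(size=(seq_len, d_model))   # stand-in for sinusoidal/learned PE
x = embedding_table[token_ids] + positions

# 3. Self-attention: every position weighs every other position
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d_model)   # scaled dot-product (a real decoder also applies a causal mask)
attn = softmax(scores, axis=-1)       # attention weights sum to 1 per position
context = attn @ V

# 4. Probability calculation: project the last position onto the vocabulary
W_out = rng.normal(size=(d_model, vocab_size))
logits = context[-1] @ W_out
probs = softmax(logits)
next_token = int(probs.argmax())      # greedy pick of the next token (e.g. "mat")
print(next_token, probs.round(3))
```

With trained weights instead of random ones, the highest-probability token would be the model's continuation; sampling from `probs` instead of taking the argmax is what makes generation non-deterministic.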
The Learning Cycle: How an LLM Improves
```mermaid
graph TD
    A(Start With a Base Model) --> B["Make a Prediction"]
    B --> C["Calculate Error<br/>(Cost Function)"]
    C --> D["Adjust Model Parameters<br/>(Gradient Descent &<br/>Backpropagation)"]
    D --> B
    subgraph "Guiding the Learning Process"
        E("Regularization Techniques<br/>e.g., Dropout") -- Prevents Overfitting --> D
        F("Optimizers<br/>e.g., Adam") -- Improves Efficiency --> D
    end
    style A fill:#f9f,stroke:#333,stroke-width:2px
```
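The loop in the diagram maps directly onto a standard training step. Below is a PyTorch sketch, using an assumed toy next-token classifier and random data: it makes a prediction, measures the error with a cross-entropy cost function, backpropagates the gradients, and lets the Adam optimizer adjust the parameters, with dropout as the regularizer.

```python
import torch
import torch.nn as nn

# Toy next-token predictor: embeddings -> hidden layer -> vocabulary logits
vocab_size, d_model, ctx_len = 100, 32, 4
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Flatten(),                      # (batch, ctx, d_model) -> (batch, ctx*d_model)
    nn.Linear(ctx_len * d_model, 64),
    nn.ReLU(),
    nn.Dropout(p=0.1),                 # regularization: randomly zeroes activations
    nn.Linear(64, vocab_size),
)

loss_fn = nn.CrossEntropyLoss()                             # the cost function
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # the optimizer

# Fake training data: contexts of 4 token ids, each paired with a target next token
contexts = torch.randint(0, vocab_size, (256, ctx_len))
targets = torch.randint(0, vocab_size, (256,))

for step in range(100):
    logits = model(contexts)           # 1. make a prediction
    loss = loss_fn(logits, targets)    # 2. calculate the error
    optimizer.zero_grad()
    loss.backward()                    # 3. backpropagation computes gradients
    optimizer.step()                   # 4. gradient descent adjusts parameters
    if step % 20 == 0:
        print(f"step {step:3d}  loss {loss.item():.3f}")
```

Real LLM training runs the same cycle, just with a Transformer instead of this toy network and with billions of examples streamed through it.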
From General Knowledge to Aligned Specialist
graph TD subgraph "Phase 1: Foundation Building" A["Pretraining
Model learns general
language patterns,
grammar, and facts from a
massive, diverse dataset
(the entire internet, books,
etc.)."]; end subgraph "Phase 2: Specialization" B["Fine-Tuning/Transfer Learning
The foundational model is
adapted for a specific task
(e.g., medical analysis,
legal summaries) using a
smaller, domain-specific
dataset."]; end subgraph "Phase 3: Alignment" C["Reinforcement Learning
from Human Feedback
(RLHF)
Humans rank model outputs
for helpfulness and safety.
This feedback trains the
model to align its behavior
with human values and
expectations."]; end subgraph "Result" D["A Specialized,
Helpful, and Aligned LLM
(e.g., ChatGPT, Gemini)"]; end A --> B --> C --> D;
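One common way to implement the Phase 2 step is transfer learning: keep the general-purpose weights learned in pretraining frozen and train only a small task-specific head on the domain dataset. The PyTorch sketch below uses an assumed stand-in module for the pretrained model and made-up label counts; it illustrates the idea rather than any particular model's fine-tuning recipe.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained foundation model (in practice loaded from a checkpoint)
pretrained_body = nn.Sequential(
    nn.Embedding(50_000, 256),
    nn.Linear(256, 256),
    nn.ReLU(),
)

# Freeze the general-purpose weights learned during pretraining
for param in pretrained_body.parameters():
    param.requires_grad = False

# New, small head for the specialized task (e.g. 3 hypothetical medical label classes)
task_head = nn.Linear(256, 3)
model = nn.Sequential(pretrained_body, task_head)

# Only the task head's parameters are updated during fine-tuning
optimizer = torch.optim.Adam(task_head.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Toy domain-specific batch: single-token examples with task labels
tokens = torch.randint(0, 50_000, (32,))
labels = torch.randint(0, 3, (32,))

logits = model(tokens)                # predictions from frozen body + new head
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Phase 3 (RLHF) builds on the same machinery but optimizes against a learned reward model trained on human preference rankings rather than a fixed labeled dataset.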
Timeline of Key Breakthroughs
```mermaid
timeline
    1940s-1960s : Early Foundations
                : Claude Shannon's Information Theory (Language as probability)
                : ELIZA Chatbot (Pattern matching)
    1980s : Statistical Approaches
          : N-gram Models (Predicting words based on the last few words)
    2013 : The Meaning Revolution
         : Word2Vec (Representing word meaning as vectors/embeddings)
    2017 : The Architecture Breakthrough
         : "Attention Is All You Need" paper introduces the Transformer Architecture
    2020s-Present : The Age of Scale & Multimodality
                  : Massive models (GPT series, Gemini) with trillions of parameters
                  : Integration of text, images, and audio (Multimodality)
```