LLM
Published on: 10 September 2025
Tags: #llm #ai #mathematics
The Core Process: How an LLM Generates Text
```mermaid
graph TD
    subgraph "User Input"
        A["Raw Text: The cat sat on the..."]
    end
    subgraph "LLM Core Processing"
        B(Tokenization) --> C(Embeddings);
        subgraph "Inside the Transformer Block"
            direction TB
            D["Input Embeddings + Positional Encoding<br/>(to understand word order)"];
            E["Self-Attention Mechanism<br/>(weighs word importance & context)"];
            F["Neural Network Layers<br/>(for deep processing)"];
            D --> E --> F;
        end
        C --> D;
        F --> G(Probability Calculation);
    end
    subgraph "Model Output"
        G --> H["Output Token: mat"];
    end
    A --> B;
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style H fill:#ccf,stroke:#333,stroke-width:2px
```
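To make the flow above concrete, here is a toy NumPy sketch of a single generation step: tokenize the prompt, embed it with positional encoding, run one head of self-attention, and turn the result into a probability distribution over a tiny vocabulary. Everything here is an illustrative assumption (the six-word vocabulary, the dimensions, names like `embedding_table`), and the weights are random rather than trained, so the "predicted" token is meaningless; only the data flow mirrors the diagram.

```python
# Toy sketch of one generation step: tokenize -> embed (+ position) ->
# self-attention -> project to vocabulary logits -> softmax -> next token.
# Weights are random, so only the data flow is meaningful.
import numpy as np

rng = np.random.default_rng(0)

# 1. Tokenization: map words to integer IDs (a real model uses subword tokens).
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
token_to_id = {tok: i for i, tok in enumerate(vocab)}
prompt = "the cat sat on the".split()
ids = np.array([token_to_id[t] for t in prompt])           # shape: (seq_len,)

d_model = 16
seq_len = len(ids)

# 2. Embeddings + positional encoding (sinusoidal, as in the original Transformer).
embedding_table = rng.normal(size=(len(vocab), d_model))
pos = np.arange(seq_len)[:, None]
dims = np.arange(d_model)[None, :]
angle = pos / np.power(10000, (2 * (dims // 2)) / d_model)
pos_enc = np.where(dims % 2 == 0, np.sin(angle), np.cos(angle))
x = embedding_table[ids] + pos_enc                          # (seq_len, d_model)

# 3. Single-head self-attention: every position attends to every other
# (a real decoder also masks future positions; omitted here for brevity).
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d_model)                         # (seq_len, seq_len)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)              # row-wise softmax
attended = weights @ V                                      # (seq_len, d_model)

# 4. Probability calculation: project the last position onto the vocabulary.
W_out = rng.normal(size=(d_model, len(vocab)))
logits = attended[-1] @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()

next_token = vocab[int(np.argmax(probs))]
print(dict(zip(vocab, probs.round(3))), "->", next_token)
```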
The Learning Cycle: How an LLM Improves
```mermaid
graph TD
    A(Start With a Base Model) --> B["Make a Prediction"];
    B --> C["Calculate Error<br/>(Cost Function)"];
    C --> D["Adjust Model Parameters<br/>(Gradient Descent & Backpropagation)"];
    D --> B;
    subgraph "Guiding the Learning Process"
        E["Regularization Techniques<br/>(e.g., Dropout)"] -- Prevents Overfitting --> D;
        F["Optimizers<br/>(e.g., Adam)"] -- Improves Efficiency --> D;
    end
    style A fill:#f9f,stroke:#333,stroke-width:2px
```
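The loop in this diagram maps almost line-for-line onto a standard training step. Below is a minimal PyTorch sketch with a toy model and random data standing in for a real LLM and corpus: make a prediction, compute the error with a cost function, backpropagate the gradients, and let the Adam optimizer adjust the parameters, with dropout included as the regularization technique.

```python
# Minimal PyTorch sketch of the loop above: predict -> measure error (cost
# function) -> backpropagate gradients -> let the optimizer adjust parameters.
# The model and data are toy stand-ins; only the cycle itself is the point.
import torch
from torch import nn

torch.manual_seed(0)

# Toy "next-token" data: 256 random contexts, each labelled with the ID of
# the token that should come next (vocabulary of 50 tokens).
inputs = torch.randn(256, 32)               # 256 examples, 32 features each
targets = torch.randint(0, 50, (256,))      # correct next-token IDs

model = nn.Sequential(
    nn.Linear(32, 64),
    nn.ReLU(),
    nn.Dropout(p=0.1),                      # regularization: randomly zero activations
    nn.Linear(64, 50),                      # logits over the 50-token vocabulary
)
cost_fn = nn.CrossEntropyLoss()                              # the cost function
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)    # Adam optimizer

for step in range(200):
    logits = model(inputs)                  # 1. make a prediction
    loss = cost_fn(logits, targets)         # 2. calculate the error
    optimizer.zero_grad()
    loss.backward()                         # 3. backpropagation computes gradients
    optimizer.step()                        # 4. gradient descent step adjusts parameters
    if step % 50 == 0:
        print(f"step {step:3d}  loss {loss.item():.3f}")
```

In a real LLM the model is a Transformer with billions of parameters and the targets come from the training corpus, but the predict-error-adjust cycle is the same.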
From General Knowledge to Aligned Specialist
```mermaid
graph TD
    subgraph "Phase 1: Foundation Building"
        A["Pretraining<br/>Model learns general language patterns, grammar, and facts<br/>from a massive, diverse dataset (the entire internet, books, etc.)."];
    end
    subgraph "Phase 2: Specialization"
        B["Fine-Tuning / Transfer Learning<br/>The foundational model is adapted for a specific task<br/>(e.g., medical analysis, legal summaries) using a smaller, domain-specific dataset."];
    end
    subgraph "Phase 3: Alignment"
        C["Reinforcement Learning from Human Feedback (RLHF)<br/>Humans rank model outputs for helpfulness and safety.<br/>This feedback trains the model to align its behavior with human values and expectations."];
    end
    subgraph "Result"
        D["A Specialized, Helpful, and Aligned LLM<br/>(e.g., ChatGPT, Gemini)"];
    end
    A --> B --> C --> D;
```
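Phase 1 and Phase 3 require large-scale infrastructure, but the idea behind Phase 2 can be sketched in a few lines. The snippet below is a toy PyTorch illustration of fine-tuning / transfer learning: a stand-in "pretrained" network is frozen and only a new task head is trained on a small domain-specific dataset with a modest learning rate. The layer sizes, the dataset, and names like `pretrained_body` are assumptions for illustration, not any particular model's API, and pretraining and RLHF are omitted entirely.

```python
# Minimal sketch of Phase 2 (fine-tuning / transfer learning) in PyTorch:
# reuse a "pretrained" model, freeze its early layers, and train only a new
# head on a small domain-specific dataset. Sizes and data are toy stand-ins.
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the foundation model produced by pretraining (Phase 1).
pretrained_body = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
)
new_task_head = nn.Linear(64, 5)            # 5 labels for the new domain task

# Freeze the general-purpose body so only the task head is updated.
for param in pretrained_body.parameters():
    param.requires_grad = False

model = nn.Sequential(pretrained_body, new_task_head)

# A small, domain-specific dataset (e.g., a few hundred labelled examples).
domain_inputs = torch.randn(200, 32)
domain_labels = torch.randint(0, 5, (200,))

# Only the head's parameters are optimized; a lower learning rate is typical
# when adapting a pretrained model.
optimizer = torch.optim.Adam(new_task_head.parameters(), lr=1e-4)
cost_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    logits = model(domain_inputs)
    loss = cost_fn(logits, domain_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(f"fine-tuning loss: {loss.item():.3f}")
```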
Timeline of Key Breakthroughs
```mermaid
timeline
    1940s-1960s : Early Foundations
                : Claude Shannon's Information Theory (Language as probability)
                : ELIZA Chatbot (Pattern matching)
    1980s : Statistical Approaches
          : N-gram Models (Predicting words based on the last few words)
    2013 : The Meaning Revolution
         : Word2Vec (Representing word meaning as vectors/embeddings)
    2017 : The Architecture Breakthrough
         : "Attention Is All You Need" paper introduces the Transformer Architecture
    2020s-Present : The Age of Scale & Multimodality
                  : Massive models (GPT series, Gemini) with trillions of parameters
                  : Integration of text, images, and audio (Multimodality)
```