Database Indexes & Performance Optimization: B-Tree, Hash, Fulltext & Clustered
This article is a comprehensive guide to database indexes and performance optimization – including B-Tree, Hash, Fulltext, and Clustered indexes with practical examples.
In a Nutshell
Database indexes significantly accelerate queries by enabling fast access to data. Different index types are optimized for different use cases.
Compact Technical Description
Database indexes are specialized data structures that improve the speed of data retrieval operations in database tables. They work similarly to an index in a book.
Index types and their characteristics:
B-Tree Index (Balanced Tree)
- Structure: Balanced tree with sorted values
- Usage: Equality, range search, sorting
- Performance: O(log n) for search operations
- Examples: Primary keys, foreign keys, standard indexes
Hash Index
- Structure: Hash table with direct addresses
- Usage: Equality searches only (=)
- Performance: O(1) for exact matches
- Examples: Memory tables, exact searches
Fulltext Index
- Structure: Inverted index for text search
- Usage: Full-text search, keyword search
- Performance: Optimized text search algorithms
- Examples: Document search, content search
Clustered Index
- Structure: Physical sorting of the table
- Usage: Primary key, frequent range searches
- Performance: Fast for primary key access
- Examples: Time series, log data
Exam-Relevant Key Points
- B-Tree Index: Balanced tree for equality and range search
- Hash Index: Direct addressing for exact matches
- Fulltext Index: Text search with word forms and relevance
- Clustered Index: Physical data storage by index
- Performance: Query optimization through indexes
- Trade-offs: Storage space vs. speed
- IHK-relevant: Important for database administration and optimization
Core Components
- Index Structure: Tree, Hash, Inverted Index
- Index Type: B-Tree, Hash, Fulltext, Clustered
- Query Type: Equality, Range, Fulltext
- Performance Metrics: Read time, Write time, Storage
- Index Strategy: Single-Column, Multi-Column, Covering
- Optimization: EXPLAIN analysis, Index Tuning
- Maintenance: Rebuild, Fragmentation, Statistics
Practical Examples
1. B-Tree Index Examples
-- MySQL B-Tree Index erstellen
CREATE INDEX idx_kunden_name ON kunden(name);
CREATE INDEX idx_bestellungen_datum ON bestellungen(bestelldatum);
-- Multi-Column B-Tree Index
CREATE INDEX idx_kunden_stadt_name ON kunden(stadt, name);
-- Unique B-Tree Index
CREATE UNIQUE INDEX idx_email_unique ON kunden(email);
-- Abfrage mit B-Tree Index Nutzung
EXPLAIN SELECT * FROM kunden WHERE name = 'Mustermann';
EXPLAIN SELECT * FROM bestellungen WHERE bestelldatum BETWEEN '2024-01-01' AND '2024-12-31';
-- Performance-Vergleich
-- Ohne Index: Full Table Scan
SELECT * FROM grosse_tabelle WHERE spalte_x = 'wert';
-- Mit B-Tree Index: Index Seek
SELECT * FROM grosse_tabelle WHERE spalte_x = 'wert';
2. Hash Index Examples
-- MySQL Hash Index (nur Memory Engine)
CREATE TABLE user_sessions (
session_id VARCHAR(255) PRIMARY KEY,
user_id INT,
created_at TIMESTAMP,
data TEXT,
INDEX ((session_id)) USING HASH
) ENGINE=MEMORY;
-- PostgreSQL Hash Index
CREATE INDEX idx_hash_email ON benutzer USING HASH (email);
-- Abfrage mit Hash Index (nur Gleichheit)
SELECT * FROM benutzer WHERE email = 'user@example.com';
-- Hash Index wird NICHT verwendet für:
SELECT * FROM benutzer WHERE email LIKE 'user%'; -- Bereichssuche
SELECT * FROM benutzer WHERE email > 'a'; -- Vergleichsoperation
3. Fulltext Index Examples
-- MySQL Fulltext Index
CREATE TABLE artikel (
id INT PRIMARY KEY,
titel VARCHAR(255),
inhalt TEXT,
FULLTEXT KEY ft_inhalt (titel, inhalt)
);
-- Fulltext Suche
SELECT titel, inhalt
FROM artikel
WHERE MATCH(titel, inhalt) AGAINST('datenbank performance' IN NATURAL LANGUAGE MODE);
-- Mit Relevanz-Score
SELECT titel,
MATCH(titel, inhalt) AGAINST('datenbank performance' IN NATURAL LANGUAGE MODE) AS score
FROM artikel
WHERE MATCH(titel, inhalt) AGAINST('datenbank performance' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC;
-- Boolean Mode für komplexe Suchen
SELECT titel, inhalt
FROM artikel
WHERE MATCH(titel, inhalt) AGAINST('+datenbank +performance -mysql' IN BOOLEAN MODE);
-- PostgreSQL Fulltext Index
CREATE TABLE dokumente (
id SERIAL PRIMARY KEY,
titel VARCHAR(255),
inhalt TEXT
);
-- Fulltext Index mit tsvector
ALTER TABLE dokumente ADD COLUMN searchable_text tsvector;
UPDATE dokumente SET searchable_text = to_tsvector('german', titel || ' ' || inhalt);
CREATE INDEX idx_fulltext ON dokumente USING GIN(searchable_text);
-- Fulltext Suche in PostgreSQL
SELECT titel, ts_rank(searchable_text, plainto_tsquery('german', 'datenbank performance')) AS rank
FROM dokumente
WHERE searchable_text @@ plainto_tsquery('german', 'datenbank performance')
ORDER BY rank DESC;
4. Clustered Index Examples
-- SQL Server Clustered Index
CREATE TABLE log_daten (
log_id INT IDENTITY(1,1),
zeitstempel DATETIME NOT NULL,
nachricht VARCHAR(1000),
level VARCHAR(10)
);
-- Clustered Index auf Zeitstempel (physische Sortierung)
CREATE CLUSTERED INDEX idx_log_zeitstempel ON log_daten(zeitstempel);
-- PostgreSQL Clustered Index (CLUSTER)
CREATE TABLE messwerte (
id SERIAL PRIMARY KEY,
sensor_id INT,
zeitpunkt TIMESTAMP NOT NULL,
wert DECIMAL(10,4)
);
-- Clustered Index auf Primärschlüssel (Standard)
CREATE INDEX idx_messwerte_zeitpunkt ON messwerte(zeitpunkt);
-- Tabelle nach Index clustern
CLUSTER messwerte USING idx_messwerte_zeitpunkt;
-- MySQL InnoDB Clustered Index (automatisch auf Primärschlüssel)
CREATE TABLE transaktionen (
id BIGINT PRIMARY KEY, -- Automatisch clustered
konto_id INT,
betrag DECIMAL(12,2),
datum DATETIME,
INDEX idx_konto_datum (konto_id, datum) -- Secondary Index
);
5. Performance Analysis with EXPLAIN
-- MySQL EXPLAIN Analyse
EXPLAIN FORMAT=JSON
SELECT k.name, b.bestelldatum, b.gesamtbetrag
FROM kunden k
JOIN bestellungen b ON k.id = b.kunden_id
WHERE k.stadt = 'Berlin'
AND b.bestelldatum >= '2024-01-01'
ORDER BY b.gesamtbetrag DESC;
-- PostgreSQL EXPLAIN ANALYZE
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT k.name, b.bestelldatum, b.gesamtbetrag
FROM kunden k
JOIN bestellungen b ON k.id = b.kunden_id
WHERE k.stadt = 'Berlin'
AND b.bestelldatum >= '2024-01-01'
ORDER BY b.gesamtbetrag DESC;
-- Index-Nutzung überprüfen
-- MySQL: SHOW INDEX FROM tabelle;
SHOW INDEX FROM kunden;
-- PostgreSQL: \d tabelle
-- oder pg_indexes view
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'kunden';
6. Index Optimization and Best Practices
-- Covering Index (alle benötigten Spalten im Index)
CREATE INDEX idx_kunden_covering ON kunden(stadt, name, id);
-- Abfrage nutzt nur den Index (Index-Only Scan)
SELECT id, name FROM kunden WHERE stadt = 'Berlin';
-- Partial Index (nur für Teil der Daten)
CREATE INDEX idx_aktive_kunden ON kunden(id) WHERE status = 'aktiv';
-- Function-Based Index (PostgreSQL)
CREATE INDEX idx_kunden_name_lower ON kunden(LOWER(name));
-- Abfrage nutzt Function-Based Index
SELECT * FROM kunden WHERE LOWER(name) = 'mustermann';
-- Composite Index mit richtiger Spaltenreihenfolge
CREATE INDEX idx_bestellungen_optimal ON bestellungen(kunden_id, bestelldatum, status);
-- Gute Abfrage (nutzt Index vollständig)
SELECT * FROM bestellungen
WHERE kunden_id = 123
AND bestelldatum >= '2024-01-01'
AND status = 'completed';
-- Schlechte Abfrage (kann Index nicht nutzen)
SELECT * FROM bestellungen
WHERE status = 'completed'
AND bestelldatum >= '2024-01-01'; -- Führende Spalte nicht in WHERE
7. Index Maintenance and Monitoring
-- Update index statistics
-- MySQL
ANALYZE TABLE kunden;
-- PostgreSQL
ANALYZE kunden;
-- Check index fragmentation
-- SQL Server
SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('kunden'), NULL, NULL, 'DETAILED');
-- PostgreSQL
SELECT schemaname, tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'kunden';
-- Index rebuild for fragmentation
-- SQL Server
ALTER INDEX ALL ON kunden REBUILD;
-- MySQL (InnoDB)
ANALYZE TABLE kunden; -- Update statistics
OPTIMIZE TABLE kunden; -- Optimize table
-- PostgreSQL
REINDEX INDEX idx_kunden_name;
Performance Comparison Index Types
| Index Type | Search Type | Performance | Storage | Usage |
|---|---|---|---|---|
| B-Tree | Equality, Range | O(log n) | Medium | Standard Indexes |
| Hash | Equality Only | O(1) | Low | Memory Tables |
| Fulltext | Full Text | Variable | High | Text Search |
| Clustered | Primary Key | O(log n) | Table | Frequent Access |
Index Strategies
Single-Column Index
-- Simple index on one column
CREATE INDEX idx_kunden_email ON kunden(email);
Multi-Column Index
-- Composite index with optimal column order
CREATE INDEX idx_kunden_stadt_name ON kunden(stadt, name);
Covering Index
-- Index contains all required columns
CREATE INDEX idx_bestellungen_covering ON bestellungen(kunden_id, bestelldatum, gesamtbetrag);
Partial Index
-- Index only for part of the data
CREATE INDEX idx_aktive_kunden ON kunden(id) WHERE status = 'aktiv';
Function-Based Index
-- Index on calculated values
CREATE INDEX idx_kunden_name_lower ON kunden(LOWER(name));
Query Optimization
Maximize Index Usage
-- GOOD Queries (use indexes)
SELECT * FROM kunden WHERE stadt = 'Berlin'; -- Single-Column Index
SELECT * FROM bestellungen WHERE kunden_id = 123 AND datum > '2024-01-01'; -- Composite Index
SELECT * FROM artikel WHERE MATCH(titel) AGAINST('suchbegriff'); -- Fulltext Index
-- BAD Queries (don't use indexes)
SELECT * FROM kunden WHERE LOWER(name) = 'mustermann'; -- No Function-Based Index
SELECT * FROM bestellungen WHERE YEAR(datum) = 2024; -- Function on column
SELECT * FROM kunden WHERE name LIKE '%mustermann%'; -- Leading Wildcard
EXPLAIN Analysis
-- MySQL
EXPLAIN SELECT * FROM kunden WHERE stadt = 'Berlin';
-- Important columns:
-- type: ALL (bad), ref, range, index (good)
-- key: Used index
-- rows: Estimated row count
-- Extra: Using index (good), Using filesort (bad)
-- PostgreSQL
EXPLAIN ANALYZE SELECT * FROM kunden WHERE stadt = 'Berlin';
-- Important information:
-- Seq Scan vs Index Scan
-- Index-Only Scan (very good)
-- Actual execution time
Advantages and Disadvantages
Advantages of Indexes
- Performance: Dramatically faster queries
- Sorting: ORDER BY without additional sorting
- Uniqueness: UNIQUE indexes guarantee data integrity
- Joins: Foreign key indexes accelerate JOINs
Disadvantages
- Storage Space: Indexes require additional storage
- Write Performance: INSERT/UPDATE/DELETE become slower
- Maintenance: Indexes must be maintained
- Overhead: Too many indexes can be counterproductive
Best Practices
When to Create Indexes
- Frequent WHERE clauses
- JOIN conditions
- ORDER BY clauses
- GROUP BY clauses
- UNIQUE constraints
When to Avoid Indexes
- Rarely used columns
- Tables with few rows
- Columns with high cardinality and low selectivity
- Frequent UPDATE/DELETE operations
Index Design
- Column Order: Selectivity descending
- Index Width: As narrow as possible
- Covering Index: Include all required columns
- Partial Index: Only for relevant data
Frequently Asked Questions
-
What is the difference between B-Tree and Hash Index? B-Tree supports equality and range search, Hash only equality search.
-
When do you use a Fulltext Index? For text search in large text fields with stemming and relevance scoring.
-
Explain Clustered vs Non-Clustered Index! Clustered: Physical data storage, Non-Clustered: Separate index structure.
-
How do you check index usage? With EXPLAIN/EXPLAIN ANALYZE to analyze the query plan.
Most Important Sources
- https://dev.mysql.com/doc/refman/8.0/en/mysql-indexes.html
- https://www.postgresql.org/docs/current/indexes.html
- https://docs.microsoft.com/en-us/sql/relational-databases/indexes/indexes
Recommended Reading: Databases
Keine Bücher für Kategorie "datenbanken" gefunden.