Machine Learning Fundamentals: Supervised, Unsupervised & Reinforcement Learning with Algorithms
This article introduces the fundamentals of Machine Learning: Supervised, Unsupervised, and Reinforcement Learning, with their typical algorithms and practical Python examples.
In a Nutshell
Machine Learning enables computers to learn from data. Supervised Learning learns from labeled data, Unsupervised Learning finds patterns in unlabeled data, and Reinforcement Learning learns through rewards.
Compact Technical Description
Machine Learning is a subfield of artificial intelligence in which algorithms learn from data without being explicitly programmed.
Learning Categories:
Supervised Learning
- Concept: Learning with labeled training data
- Goal: Make predictions for new, unseen data
- Types: Classification (discrete values), Regression (continuous values)
- Algorithms: Linear Regression, Decision Trees, Random Forest, SVM, Neural Networks
Unsupervised Learning
- Concept: Learning without labeled data
- Goal: Discover structures and patterns in data
- Types: Clustering, Dimensionality Reduction, Association
- Algorithms: K-Means, Hierarchical Clustering, PCA, Apriori
Reinforcement Learning
- Concept: Learning through interaction with environment
- Goal: Maximization of cumulative reward
- Types: Model-based, Model-free, Multi-agent
- Algorithms: Q-Learning, Deep Q-Networks, Policy Gradients
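The Q-Learning algorithm named in this list is commonly written as a temporal-difference update. The symbols below are standard notation and are assumptions, not taken from this article: $\alpha$ is the learning rate, $\gamma$ the discount factor, $r$ the reward received after taking action $a$ in state $s$, and $s'$ the resulting next state.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

The bracketed term is the difference between the new estimate of the return and the current Q-value, so the update moves $Q(s, a)$ a fraction $\alpha$ of the way toward that estimate.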
Exam-Relevant Key Points
- Machine Learning: Automatic learning from data
- Supervised Learning: Learning with labeled data (Classification, Regression)
- Unsupervised Learning: Learning without labels (Clustering, Pattern Recognition)
- Reinforcement Learning: Learning through rewards (Agent, Environment, Actions)
- Training/Testing: Data splitting for model validation
- Overfitting/Underfitting: Model fits the training data too closely (overfitting) or not closely enough (underfitting)
- Feature Engineering: Data preparation and transformation
- Chamber of Commerce exam relevance: Modern AI technologies and applications
Core Components
- Data: Training, Validation, Test data
- Features: Input variables and characteristics
- Models: Mathematical functions and algorithms
- Training: Adjustment of model parameters
- Evaluation: Performance measurement and validation
- Prediction: Forecasts for new data
- Optimization: Hyperparameter tuning
- Deployment: Integration into production systems
Practical Examples
1. Supervised Learning with Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, mean_squared_error, classification_report
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification, make_regression
# Supervised Learning Demo
class SupervisedLearningDemo:
def __init__(self):
self.models = {}
self.results = {}
# Linear Regression
def linear_regression_demo(self):
print("=== Linear Regression Demo ===")
# Create synthetic data
np.random.seed(42)
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)
# Evaluation
train_mse = mean_squared_error(y_train, y_train_pred)
test_mse = mean_squared_error(y_test, y_test_pred)
print(f"Training MSE: {train_mse:.2f}")
print(f"Test MSE: {test_mse:.2f}")
print(f"Coefficient: {model.coef_[0]:.2f}")
print(f"Intercept: {model.intercept_:.2f}")
# Store results
self.models['linear_regression'] = model
self.results['linear_regression'] = {
'train_mse': train_mse,
'test_mse': test_mse,
'r2_score': model.score(X_test, y_test)
}
return X_train, X_test, y_train, y_test, y_test_pred
# Logistic Regression (Classification)
def logistic_regression_demo(self):
print("\n=== Logistic Regression Demo ===")
# Create classification data
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
n_informative=2, random_state=42, n_clusters_per_class=1)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train model
model = LogisticRegression(random_state=42)
model.fit(X_train_scaled, y_train)
# Make predictions
y_train_pred = model.predict(X_train_scaled)
y_test_pred = model.predict(X_test_scaled)
# Evaluation
train_accuracy = accuracy_score(y_train, y_train_pred)
test_accuracy = accuracy_score(y_test, y_test_pred)
print(f"Training Accuracy: {train_accuracy:.3f}")
print(f"Test Accuracy: {test_accuracy:.3f}")
print("Test Classification Report:")
print(classification_report(y_test, y_test_pred))
# Store results
self.models['logistic_regression'] = model
self.results['logistic_regression'] = {
'train_accuracy': train_accuracy,
'test_accuracy': test_accuracy
}
return X_train_scaled, X_test_scaled, y_train, y_test, y_test_pred
# Decision Tree Classifier
def decision_tree_demo(self):
print("\n=== Decision Tree Demo ===")
# More complex classification data
X, y = make_classification(n_samples=300, n_features=4, n_redundant=1,
n_informative=3, random_state=42, n_clusters_per_class=2)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Decision Tree with different depths
depths = [3, 5, 10, None]
for depth in depths:
model = DecisionTreeClassifier(max_depth=depth, random_state=42)
model.fit(X_train, y_train)
# Make predictions
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)
# Evaluation
train_accuracy = accuracy_score(y_train, y_train_pred)
test_accuracy = accuracy_score(y_test, y_test_pred)
print(f"Max Depth {depth if depth else 'None'}:")
print(f" Training Accuracy: {train_accuracy:.3f}")
print(f" Test Accuracy: {test_accuracy:.3f}")
# Detect overfitting
overfitting = train_accuracy - test_accuracy
if overfitting > 0.1:
print(f" ⚠️ Overfitting detected (diff: {overfitting:.3f})")
# Store best model
best_model = DecisionTreeClassifier(max_depth=5, random_state=42)
best_model.fit(X_train, y_train)
self.models['decision_tree'] = best_model
return X_train, X_test, y_train, y_test
# Random Forest
def random_forest_demo(self):
print("\n=== Random Forest Demo ===")
# High-dimensional data
X, y = make_classification(n_samples=500, n_features=10, n_redundant=3,
n_informative=7, random_state=42, n_clusters_per_class=2)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Random Forest with different numbers of trees
n_estimators_list = [10, 50, 100, 200]
best_accuracy = 0
best_model = None
for n_estimators in n_estimators_list:
model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
model.fit(X_train, y_train)
# Make predictions
y_test_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_test_pred)
print(f"Trees: {n_estimators}, Test Accuracy: {accuracy:.3f}")
if accuracy > best_accuracy:
best_accuracy = accuracy
best_model = model
print(f"\nBest Random Forest Accuracy: {best_accuracy:.3f}")
# Feature Importance
feature_importance = best_model.feature_importances_
print("Top 5 Feature Importances:")
for i, importance in sorted(enumerate(feature_importance), key=lambda x: x[1], reverse=True)[:5]:
print(f" Feature {i}: {importance:.3f}")
self.models['random_forest'] = best_model
self.results['random_forest'] = {'test_accuracy': best_accuracy}
return X_train, X_test, y_train, y_test
# Model Comparison
def compare_models(self):
print("\n=== Model Comparison ===")
# Comparison table
comparison_data = []
for model_name, results in self.results.items():
if 'test_accuracy' in results:
comparison_data.append({
'Model': model_name,
'Test Accuracy': f"{results['test_accuracy']:.3f}"
})
elif 'test_mse' in results:
comparison_data.append({
'Model': model_name,
'Test MSE': f"{results['test_mse']:.2f}",
'R² Score': f"{results['r2_score']:.3f}"
})
df = pd.DataFrame(comparison_data)
print(df.to_string(index=False))
return df
# Run demo
def supervised_learning_demo():
demo = SupervisedLearningDemo()
# Linear Regression
X_lr_train, X_lr_test, y_lr_train, y_lr_test, y_lr_pred = demo.linear_regression_demo()
# Logistic Regression
X_log_train, X_log_test, y_log_train, y_log_test, y_log_pred = demo.logistic_regression_demo()
# Decision Tree
X_dt_train, X_dt_test, y_dt_train, y_dt_test = demo.decision_tree_demo()
# Random Forest
X_rf_train, X_rf_test, y_rf_train, y_rf_test = demo.random_forest_demo()
# Compare models
comparison = demo.compare_models()
return demo, comparison
if __name__ == "__main__":
demo, comparison = supervised_learning_demo()
2. Unsupervised Learning with Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE  # TSNE lives in sklearn.manifold, not sklearn.decomposition
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
from sklearn.datasets import make_blobs, make_moons, load_iris
# Unsupervised Learning Demo
class UnsupervisedLearningDemo:
def __init__(self):
self.models = {}
self.results = {}
# K-Means Clustering
def kmeans_demo(self):
print("=== K-Means Clustering Demo ===")
# Synthetic cluster data
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)
# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# K-Means with different cluster numbers
cluster_range = range(2, 8)
silhouette_scores = []
inertias = []
for k in cluster_range:
kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
cluster_labels = kmeans.fit_predict(X_scaled)
# Silhouette Score
silhouette_avg = silhouette_score(X_scaled, cluster_labels)
silhouette_scores.append(silhouette_avg)
# Inertia (Within-cluster sum of squares)
inertias.append(kmeans.inertia_)
print(f"K={k}: Silhouette Score={silhouette_avg:.3f}, Inertia={kmeans.inertia_:.1f}")
# Optimal K value based on Silhouette Score
optimal_k = cluster_range[np.argmax(silhouette_scores)]
print(f"\nOptimal K based on Silhouette: {optimal_k}")
# Final K-Means with optimal K
final_kmeans = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
final_labels = final_kmeans.fit_predict(X_scaled)
# Save results
self.models['kmeans'] = final_kmeans
self.results['kmeans'] = {
'optimal_k': optimal_k,
'silhouette_score': max(silhouette_scores),
'inertia': final_kmeans.inertia_
}
return X_scaled, final_labels, y_true
# DBSCAN Clustering
def dbscan_demo(self):
print("\n=== DBSCAN Clustering Demo ===")
# Non-spherical data
X, y_true = make_moons(n_samples=200, noise=0.1, random_state=42)
# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# DBSCAN with different eps values
eps_values = [0.2, 0.3, 0.4, 0.5]
min_samples = 5
for eps in eps_values:
dbscan = DBSCAN(eps=eps, min_samples=min_samples)
cluster_labels = dbscan.fit_predict(X_scaled)
# Number of clusters (ignores noise)
n_clusters = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)
n_noise = list(cluster_labels).count(-1)
if n_clusters > 1:
silhouette_avg = silhouette_score(X_scaled, cluster_labels)
else:
silhouette_avg = -1
print(f"eps={eps}: Clusters={n_clusters}, Noise={n_noise}, Silhouette={silhouette_avg:.3f}")
# Best DBSCAN
best_dbscan = DBSCAN(eps=0.3, min_samples=min_samples)
best_labels = best_dbscan.fit_predict(X_scaled)
self.models['dbscan'] = best_dbscan
self.results['dbscan'] = {
'n_clusters': len(set(best_labels)) - (1 if -1 in best_labels else 0),
'n_noise': list(best_labels).count(-1)
}
return X_scaled, best_labels, y_true
# Hierarchical Clustering
def hierarchical_clustering_demo(self):
print("\n=== Hierarchical Clustering Demo ===")
# Iris Dataset
iris = load_iris()
X = iris.data
y_true = iris.target
# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Agglomerative Clustering with different linkage methods
linkage_methods = ['ward', 'complete', 'average', 'single']
for linkage in linkage_methods:
clustering = AgglomerativeClustering(n_clusters=3, linkage=linkage)
cluster_labels = clustering.fit_predict(X_scaled)
silhouette_avg = silhouette_score(X_scaled, cluster_labels)
print(f"Linkage={linkage}: Silhouette Score={silhouette_avg:.3f}")
# Best linkage
best_clustering = AgglomerativeClustering(n_clusters=3, linkage='ward')
best_labels = best_clustering.fit_predict(X_scaled)
self.models['hierarchical'] = best_clustering
self.results['hierarchical'] = {
'silhouette_score': silhouette_score(X_scaled, best_labels)
}
return X_scaled, best_labels, y_true
# PCA (Principal Component Analysis)
def pca_demo(self):
print("\n=== PCA Demo ===")
# High-dimensional data
np.random.seed(42)
X = np.random.randn(100, 10)
# Generate correlations
X[:, 1] = X[:, 0] * 0.8 + np.random.randn(100) * 0.2
X[:, 2] = X[:, 0] * 0.6 + np.random.randn(100) * 0.4
X[:, 3] = X[:, 1] * 0.7 + np.random.randn(100) * 0.3
# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# PCA with different numbers of components
n_components_range = range(2, 11)
explained_variances = []
for n in n_components_range:
pca = PCA(n_components=n)
X_pca = pca.fit_transform(X_scaled)
total_explained_variance = np.sum(pca.explained_variance_ratio_)
explained_variances.append(total_explained_variance)
print(f"Components={n}: Explained Variance={total_explained_variance:.3f}")
# Optimal number based on 95% variance
optimal_components = next(n for n, var in zip(n_components_range, explained_variances)
if var >= 0.95)
print(f"\nOptimal components for 95% variance: {optimal_components}")
# Final PCA
final_pca = PCA(n_components=optimal_components)
X_pca_final = final_pca.fit_transform(X_scaled)
# Feature Contributions
print("Top contributing features for first component:")
feature_contributions = np.abs(final_pca.components_[0])
top_features = np.argsort(feature_contributions)[-3:][::-1]
for i, feature_idx in enumerate(top_features):
print(f" Feature {feature_idx}: {feature_contributions[feature_idx]:.3f}")
self.models['pca'] = final_pca
self.results['pca'] = {
'optimal_components': optimal_components,
'explained_variance': np.sum(final_pca.explained_variance_ratio_)
}
return X_scaled, X_pca_final
# t-SNE for visualization
def tsne_demo(self):
print("\n=== t-SNE Demo ===")
# Iris Dataset for visualization
iris = load_iris()
X = iris.data
y = iris.target
# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# t-SNE with different perplexity values
perplexity_values = [5, 15, 30, 50]
for perplexity in perplexity_values:
tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)
X_tsne = tsne.fit_transform(X_scaled)
print(f"Perplexity={perplexity}: KL Divergence={tsne.kl_divergence_:.3f}")
# Best t-SNE
best_tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne_final = best_tsne.fit_transform(X_scaled)
self.models['tsne'] = best_tsne
return X_scaled, X_tsne_final, y
# Clustering Evaluation
def evaluate_clustering(self, X, labels, true_labels=None):
print("\n=== Clustering Evaluation ===")
# Silhouette Score
if len(set(labels)) > 1:
silhouette_avg = silhouette_score(X, labels)
print(f"Silhouette Score: {silhouette_avg:.3f}")
else:
print("Silhouette Score: N/A (only one cluster)")
# Cluster statistics
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = list(labels).count(-1)
print(f"Number of clusters: {n_clusters}")
print(f"Number of noise points: {n_noise}")
# Cluster sizes
if n_clusters > 0:
cluster_sizes = [np.sum(labels == i) for i in range(n_clusters)]
print(f"Cluster sizes: {cluster_sizes}")
print(f"Average cluster size: {np.mean(cluster_sizes):.1f}")
return {
'silhouette_score': silhouette_avg if len(set(labels)) > 1 else None,
'n_clusters': n_clusters,
'n_noise': n_noise
}
# Model Comparison
def compare_clustering_models(self):
print("\n=== Clustering Models Comparison ===")
comparison_data = []
for model_name, results in self.results.items():
# Check the more specific key first: the K-Means results contain both keys
if 'optimal_k' in results:
comparison_data.append({
'Model': model_name,
'Optimal K': results['optimal_k'],
'Silhouette Score': f"{results['silhouette_score']:.3f}"
})
elif 'silhouette_score' in results:
comparison_data.append({
'Model': model_name,
'Silhouette Score': f"{results['silhouette_score']:.3f}"
})
df = pd.DataFrame(comparison_data)
print(df.to_string(index=False))
return df
# Run demo
def unsupervised_learning_demo():
demo = UnsupervisedLearningDemo()
# K-Means
X_km, labels_km, true_km = demo.kmeans_demo()
demo.evaluate_clustering(X_km, labels_km, true_km)
# DBSCAN
X_db, labels_db, true_db = demo.dbscan_demo()
demo.evaluate_clustering(X_db, labels_db, true_db)
# Hierarchical Clustering
X_hc, labels_hc, true_hc = demo.hierarchical_clustering_demo()
demo.evaluate_clustering(X_hc, labels_hc, true_hc)
# PCA
X_pca, X_pca_transformed = demo.pca_demo()
# t-SNE
X_tsne, X_tsne_transformed, y_tsne = demo.tsne_demo()
# Compare models
comparison = demo.compare_clustering_models()
return demo, comparison
if __name__ == "__main__":
demo, comparison = unsupervised_learning_demo()
3. Reinforcement Learning with Python
import numpy as np
import pandas as pd  # needed by compare_rl_models (pd.DataFrame)
import random
import matplotlib.pyplot as plt
from collections import defaultdict
# Reinforcement Learning Demo
class ReinforcementLearningDemo:
def __init__(self):
self.environments = {}
self.agents = {}
self.results = {}
# Grid World Environment
class GridWorld:
def __init__(self, width=4, height=4):
self.width = width
self.height = height
self.state = (0, 0) # Starting position
self.goal = (width-1, height-1) # Goal position
self.obstacles = [(1, 1), (2, 2)] # Obstacles
self.terminal_states = [self.goal]
def reset(self):
self.state = (0, 0)
return self.state
def step(self, action):
x, y = self.state
# Execute actions
if action == 0: # Up
new_state = (x, max(0, y - 1))
elif action == 1: # Down
new_state = (x, min(self.height - 1, y + 1))
elif action == 2: # Left
new_state = (max(0, x - 1), y)
elif action == 3: # Right
new_state = (min(self.width - 1, x + 1), y)
else:
new_state = self.state
# Check obstacles
if new_state in self.obstacles:
new_state = self.state
# Calculate reward
if new_state == self.goal:
reward = 10
done = True
else:
reward = -1 # Small penalty for each step
done = False
self.state = new_state
return new_state, reward, done
def get_valid_actions(self):
return [0, 1, 2, 3] # Up, Down, Left, Right
def render(self):
grid = np.zeros((self.height, self.width))
# Mark obstacles
for obs in self.obstacles:
grid[obs[1], obs[0]] = -1
# Mark goal
grid[self.goal[1], self.goal[0]] = 10
# Mark current position
grid[self.state[1], self.state[0]] = 1
print("Grid World:")
print(grid)
print(f"Position: {self.state}, Goal: {self.goal}")
# Q-Learning Agent
class QLearningAgent:
def __init__(self, state_space_size, action_space_size, learning_rate=0.1,
discount_factor=0.9, epsilon=0.1):
self.state_space_size = state_space_size
self.action_space_size = action_space_size
self.learning_rate = learning_rate
self.discount_factor = discount_factor
self.epsilon = epsilon
# Initialize Q-table
self.q_table = defaultdict(lambda: np.zeros(action_space_size))
def get_state_index(self, state):
# Convert 2D coordinates to 1D index
x, y = state
return y * 4 + x
def choose_action(self, state, valid_actions):
state_idx = self.get_state_index(state)
# Epsilon-greedy strategy
if random.random() < self.epsilon:
return random.choice(valid_actions)
else:
q_values = self.q_table[state_idx]
valid_q_values = [q_values[action] for action in valid_actions]
max_q = max(valid_q_values)
# Choose randomly if Q-values are equal
best_actions = [action for action in valid_actions
if q_values[action] == max_q]
return random.choice(best_actions)
def update_q_value(self, state, action, reward, next_state, valid_next_actions):
state_idx = self.get_state_index(state)
next_state_idx = self.get_state_index(next_state)
# Update Q-value
current_q = self.q_table[state_idx][action]
if len(valid_next_actions) > 0:
max_next_q = max([self.q_table[next_state_idx][a] for a in valid_next_actions])
else:
max_next_q = 0
new_q = current_q + self.learning_rate * (
reward + self.discount_factor * max_next_q - current_q
)
self.q_table[state_idx][action] = new_q
def get_policy(self):
policy = {}
for state_idx in self.q_table.keys():
y = state_idx // 4
x = state_idx % 4
state = (x, y)
valid_actions = [0, 1, 2, 3] # All actions are valid
q_values = self.q_table[state_idx]
best_action = np.argmax(q_values)
policy[state] = best_action
return policy
# Q-Learning Demo
def q_learning_demo(self):
print("=== Q-Learning Demo ===")
# Create environment and agent
env = self.GridWorld(width=4, height=4)
agent = self.QLearningAgent(state_space_size=16, action_space_size=4)
# Training parameters
episodes = 1000
max_steps_per_episode = 100
# Training
episode_rewards = []
for episode in range(episodes):
state = env.reset()
total_reward = 0
done = False
steps = 0
while not done and steps < max_steps_per_episode:
valid_actions = env.get_valid_actions()
action = agent.choose_action(state, valid_actions)
next_state, reward, done = env.step(action)
valid_next_actions = env.get_valid_actions()
# Update Q-value
agent.update_q_value(state, action, reward, next_state, valid_next_actions)
state = next_state
total_reward += reward
steps += 1
episode_rewards.append(total_reward)
if episode % 100 == 0:
avg_reward = np.mean(episode_rewards[-100:])
print(f"Episode {episode}: Average Reward (last 100): {avg_reward:.2f}")
# Analyze results
final_policy = agent.get_policy()
print(f"\nFinal Policy:")
for state, action in final_policy.items():
action_names = {0: 'Up', 1: 'Down', 2: 'Left', 3: 'Right'}
print(f"State {state}: {action_names[action]}")
# Display Q-table
print(f"\nQ-Table (selected states):")
for state_idx in [0, 5, 10, 15]: # Diagonal states; (1, 1) and (2, 2) are obstacles and (3, 3) is the goal, so their Q-values stay at zero
y = state_idx // 4
x = state_idx % 4
state = (x, y)
q_values = agent.q_table[state_idx]
print(f"State {state}: {q_values}")
self.environments['gridworld'] = env
self.agents['qlearning'] = agent
self.results['qlearning'] = {
'episodes': episodes,
'final_avg_reward': np.mean(episode_rewards[-100:]),
'q_table_size': len(agent.q_table)
}
return episode_rewards
# Simple CartPole-like Environment
class CartPoleSimple:
def __init__(self):
self.angle = 0 # Angle of the pole
self.angular_velocity = 0 # Angular velocity
self.gravity = 9.8
self.pole_length = 1.0
self.dt = 0.1
def reset(self):
self.angle = random.uniform(-0.1, 0.1)
self.angular_velocity = 0
return self.get_state()
def get_state(self):
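# Return an array: the policy gradient update negates and scales the state, which fails for a tuple
return np.array([self.angle, self.angular_velocity])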
def step(self, action):
# Actions: 0 = Left, 1 = Right
force = -10 if action == 0 else 10
# Physics update (simplified)
angular_acceleration = (self.gravity / self.pole_length) * np.sin(self.angle) + force
self.angular_velocity += angular_acceleration * self.dt
self.angle += self.angular_velocity * self.dt
# Reward and done condition
if abs(self.angle) > np.pi / 4: # Pole falls over
reward = -10
done = True
else:
reward = 1 # Reward for balancing
done = False
return self.get_state(), reward, done
def render(self):
print(f"Angle: {self.angle:.3f} rad ({np.degrees(self.angle):.1f}°), "
f"Angular Velocity: {self.angular_velocity:.3f}")
# Policy Gradient Agent (simplified)
class PolicyGradientAgent:
def __init__(self, state_dim=2, action_dim=2, learning_rate=0.01):
self.state_dim = state_dim
self.action_dim = action_dim
self.learning_rate = learning_rate
# Simple linear policy
self.weights = np.random.randn(state_dim, action_dim) * 0.1
def get_action_probabilities(self, state):
# Softmax over linear combination
logits = np.dot(state, self.weights)
exp_logits = np.exp(logits - np.max(logits))
return exp_logits / np.sum(exp_logits)
def choose_action(self, state):
action_probs = self.get_action_probabilities(state)
return np.random.choice(self.action_dim, p=action_probs)
def update_policy(self, states, actions, rewards):
# Simplified policy gradient update
for state, action, reward in zip(states, actions, rewards):
action_probs = self.get_action_probabilities(state)
# Calculate gradient
grad = np.zeros_like(self.weights)
for a in range(self.action_dim):
if a == action:
grad[:, a] = state * (1 - action_probs[a])
else:
grad[:, a] = -state * action_probs[a]
# Update
self.weights += self.learning_rate * reward * grad
# Policy Gradient Demo
def policy_gradient_demo(self):
print("\n=== Policy Gradient Demo ===")
env = self.CartPoleSimple()
agent = self.PolicyGradientAgent()
episodes = 500
episode_rewards = []
for episode in range(episodes):
state = env.reset()
states, actions, rewards = [], [], []
total_reward = 0
done = False
steps = 0
max_steps = 100
while not done and steps < max_steps:
action = agent.choose_action(state)
next_state, reward, done = env.step(action)
states.append(state)
actions.append(action)
rewards.append(reward)
state = next_state
total_reward += reward
steps += 1
# Policy update
agent.update_policy(states, actions, rewards)
episode_rewards.append(total_reward)
if episode % 50 == 0:
avg_reward = np.mean(episode_rewards[-50:])
print(f"Episode {episode}: Average Reward (last 50): {avg_reward:.2f}")
# Final evaluation
print(f"\nFinal Evaluation:")
state = env.reset()
for step in range(20):
action_probs = agent.get_action_probabilities(state)
action = np.argmax(action_probs)
state, reward, done = env.step(action)
env.render()
if done:
print("Episode finished!")
break
self.environments['cartpole'] = env
self.agents['policy_gradient'] = agent
self.results['policy_gradient'] = {
'episodes': episodes,
'final_avg_reward': np.mean(episode_rewards[-50:])
}
return episode_rewards
# Model Comparison
def compare_rl_models(self):
print("\n=== Reinforcement Learning Models Comparison ===")
comparison_data = []
for model_name, results in self.results.items():
comparison_data.append({
'Model': model_name,
'Episodes': results['episodes'],
'Final Avg Reward': f"{results['final_avg_reward']:.2f}"
})
df = pd.DataFrame(comparison_data)
print(df.to_string(index=False))
return df
# Run demo
def reinforcement_learning_demo():
demo = ReinforcementLearningDemo()
# Q-Learning
q_rewards = demo.q_learning_demo()
# Policy Gradient
pg_rewards = demo.policy_gradient_demo()
# Compare models
comparison = demo.compare_rl_models()
# Visualize rewards
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(q_rewards)
plt.title('Q-Learning Rewards')
plt.xlabel('Episode')
plt.ylabel('Total Reward')
plt.subplot(1, 2, 2)
plt.plot(pg_rewards)
plt.title('Policy Gradient Rewards')
plt.xlabel('Episode')
plt.ylabel('Total Reward')
plt.tight_layout()
plt.show()
return demo, comparison
if __name__ == "__main__":
demo, comparison = reinforcement_learning_demo()
Machine Learning Types Overview
| Type | Data | Goal | Examples | Algorithms |
|---|---|---|---|---|
| Supervised | Labeled | Prediction | Classification, Regression | Linear Regression, Decision Trees |
| Unsupervised | Unlabeled | Find patterns | Clustering, Dimensionality Reduction | K-Means, PCA |
| Reinforcement | Environment | Maximum reward | Game-Playing, Robotics | Q-Learning, Policy Gradients |
Algorithms Comparison
Supervised Learning
| Algorithm | Type | Complexity | Advantages | Disadvantages |
|---|---|---|---|---|
| Linear Regression | Regression | O(n) | Interpretable | Only linear relationships |
| Logistic Regression | Classification | O(n) | Fast, interpretable | Linearity |
| Decision Trees | Both | O(n log n) | Interpretable | Overfitting |
| Random Forest | Both | O(n log n) | Robust, accurate | Complex |
| SVM | Both | O(n²) | High accuracy | Scales poorly |
Unsupervised Learning
| Algorithm | Type | Complexity | Advantages | Disadvantages |
|---|---|---|---|---|
| K-Means | Clustering | O(n k i) | Fast | Only spherical clusters |
| DBSCAN | Clustering | O(n log n) | Arbitrary shapes | Parameter-sensitive |
| PCA | Dimensionality | O(n d²) | Reduces dimensions | Linearity |
| t-SNE | Visualization | O(n²) | Non-linear | Slow |
Reinforcement Learning
| Algorithm | Type | Complexity | Advantages | Disadvantages |
|---|---|---|---|---|
| Q-Learning | Model-free | O(s a) | Simple | Limited to discrete state and action spaces |
| Deep Q-Network | Model-free | O(n) | Handles continuous state spaces | Unstable training |
| Policy Gradients | Model-free | O(n) | Learns stochastic policies | High variance |
ML Workflow
1. Data Collection
# Identify data sources
# Ensure quality
# Consider ethics and privacy
2. Data Preparation
# Cleaning: Handle missing values
# Feature Engineering: Create new features
# Scaling: Normalization/Standardization
# Splitting: Train/Validation/Test
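The splitting and scaling steps can be sketched with the standard library alone. This is a minimal illustration, not the article's method: the helper names `split_indices`, `fit_standardizer`, and `standardize`, the 60/20/20 ratios, and the sample values are all assumptions. The point it shows is that the scaling statistics come from the training part only and are then reused for the other parts, which is what `StandardScaler.fit_transform` followed by `transform` does in the scikit-learn examples above.

```python
import random
from statistics import mean, pstdev


def split_indices(n_samples, val_ratio=0.2, test_ratio=0.2, seed=42):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_ratio)
    n_val = int(n_samples * val_ratio)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


def fit_standardizer(values):
    """Learn mean and standard deviation from the training values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 for a constant feature
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply previously learned statistics to any split."""
    return [(v - mu) / sigma for v in values]


# Hypothetical single feature with 10 samples
feature = [12.0, 15.0, 9.0, 20.0, 18.0, 11.0, 14.0, 16.0, 10.0, 13.0]
train_idx, val_idx, test_idx = split_indices(len(feature))

mu, sigma = fit_standardizer([feature[i] for i in train_idx])
train_scaled = standardize([feature[i] for i in train_idx], mu, sigma)
val_scaled = standardize([feature[i] for i in val_idx], mu, sigma)
test_scaled = standardize([feature[i] for i in test_idx], mu, sigma)

print(len(train_idx), len(val_idx), len(test_idx))
```

Fitting the scaler on all data before splitting would leak information about the validation and test samples into training, which tends to make the evaluation look better than it is.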
3. Model Selection
# Identify problem type
# Create baseline model
# Test multiple algorithms
# Optimize hyperparameters
4. Training
# Use cross-validation
# Avoid overfitting
# Implement early stopping
# Monitor metrics
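Cross-validation, the first item of the training step, can be illustrated by generating the index sets for each fold. The sketch below is an assumption-laden example: `k_fold_indices` is a hypothetical helper, it does not shuffle, and it leaves the actual model fitting out. In scikit-learn the same idea is available through `KFold` and `cross_val_score` in `sklearn.model_selection`.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


folds = list(k_fold_indices(10, k=5))
for fold_number, (train, val) in enumerate(folds, start=1):
    print(f"Fold {fold_number}: train={train}, validation={val}")
```

Each sample is used for validation exactly once and for training in the remaining folds, so the averaged validation score depends less on one particular split.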
5. Evaluation
# Measure performance
# Analyze errors
# Test robustness
# Assess business value
Evaluation Metrics
Classification
- Accuracy: Correct predictions / Total predictions
- Precision: TP / (TP + FP), where TP = true positives and FP = false positives
- Recall: TP / (TP + FN), where FN = false negatives
- F1-Score: Harmonic mean of precision and recall
- ROC-AUC: Area Under ROC Curve
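As a worked example of the formulas in this list, the following sketch computes accuracy, precision, recall, and F1 from raw labels for a binary problem. The function name `classification_metrics` and the label lists are made up for illustration; ROC-AUC is left out because it needs predicted scores rather than hard labels. scikit-learn offers the same quantities through `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` in `sklearn.metrics`.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute basic binary classification metrics from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Return 0.0 when a denominator is zero (no positive predictions or labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 3 TP, 1 FP, 1 FN, 3 TN
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With 3 true positives, 1 false positive, and 1 false negative, precision and recall are both 3/4, so their harmonic mean (F1) is 0.75 as well.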
Regression
- MSE: Mean Squared Error
- RMSE: Root Mean Squared Error
- MAE: Mean Absolute Error
- R²: Coefficient of determination
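The four regression metrics can be computed directly from their definitions. The sketch below is illustrative only: `regression_metrics` is a hypothetical helper and the numbers are invented. It assumes the true values are not all identical, since R² is undefined in that case.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R² from true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n

    # R² = 1 - (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # assumed non-zero
    r2 = 1 - ss_res / ss_tot
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


# Hypothetical values: errors are 1, 0, -1, 0
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
reg = regression_metrics(y_true, y_pred)
print(reg)
```

Here the squared errors sum to 2, so MSE is 0.5 and MAE is 0.5; the total sum of squares around the mean of 6 is 20, which gives R² = 1 - 2/20 = 0.9.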
Clustering
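- Silhouette Score: Cohesion within clusters versus separation between clusters (range -1 to 1, higher is better)
- Davies-Bouldin Index: Average similarity of each cluster to its most similar cluster (lower is better)
- Calinski-Harabasz Index: Ratio of between-cluster to within-cluster dispersion (higher is better)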
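To make the Silhouette Score concrete, here is a small from-scratch sketch using Euclidean distance. It is a simplified illustration under stated assumptions: `silhouette_score_simple` is a hypothetical name, singleton clusters are scored as 0, and the points are assumed not to be all identical. For real data the `silhouette_score` function from `sklearn.metrics`, used in the clustering examples above, is the practical choice.

```python
import math


def silhouette_score_simple(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance of the point to itself is 0, hence len(own) - 1)
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Two well-separated pairs of points (hypothetical data)
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score_simple(points, [0, 0, 1, 1])
bad = silhouette_score_simple(points, [0, 1, 0, 1])
print(f"good grouping: {good:.3f}, bad grouping: {bad:.3f}")
```

Grouping the nearby points together yields a score close to 1, while pairing each point with a distant one yields a negative score, which matches the interpretation of the metric.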
Overfitting vs Underfitting
Overfitting
- Symptoms: High training accuracy, low test accuracy
- Causes: Model too complex, insufficient data
- Solutions: Regularization, more data, simpler model
Underfitting
- Symptoms: Low accuracy on both training and test data
- Causes: Model too simple, too few features
- Solutions: More complex model, feature engineering
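The symptoms above can be turned into a rough rule of thumb that compares training and test scores. The sketch below is only a heuristic: `diagnose_fit` is a hypothetical helper, the gap threshold of 0.1 mirrors the check in the decision tree demo earlier in this article, and the `low_score` cutoff of 0.7 is an arbitrary assumption that depends on the task.

```python
def diagnose_fit(train_score, test_score, gap_threshold=0.1, low_score=0.7):
    """Classify a model as overfitting, underfitting, or ok from two scores."""
    # Large gap between training and test performance: the model memorizes
    if train_score - test_score > gap_threshold:
        return "overfitting"
    # Poor performance on both splits: the model is too simple
    if train_score < low_score and test_score < low_score:
        return "underfitting"
    return "ok"


# Hypothetical accuracy pairs (training, test)
for train_acc, test_acc in [(1.00, 0.78), (0.62, 0.60), (0.91, 0.88)]:
    print(train_acc, test_acc, diagnose_fit(train_acc, test_acc))
```

Such thresholds are not universal; in practice they are best read alongside learning curves or cross-validation results.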
Feature Engineering
Techniques
# Polynomial Features
# Interaction Terms
# Binning/Discretization
# Log-Transformation
# One-Hot Encoding
# Target Encoding
# Feature Selection
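A few of the techniques listed above are simple enough to sketch without any library. The helpers below (`one_hot`, `log_transform`, `bin_values`, `polynomial_features`) and the sample values are hypothetical and serve only as an illustration; scikit-learn provides production-ready counterparts such as `OneHotEncoder`, `KBinsDiscretizer`, and `PolynomialFeatures` in `sklearn.preprocessing`.

```python
import math


def one_hot(values):
    """One-Hot Encoding: one 0/1 column per category (sorted alphabetically)."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def log_transform(values):
    """Log-Transformation with log(1 + v); assumes non-negative values."""
    return [math.log1p(v) for v in values]


def bin_values(values, edges):
    """Binning: the bin index is the number of edges that are <= the value."""
    return [sum(1 for e in edges if v >= e) for v in values]


def polynomial_features(x, degree=2):
    """Polynomial Features: [x, x^2, ..., x^degree] for a single value."""
    return [x ** d for d in range(1, degree + 1)]


categories, encoded = one_hot(["red", "green", "red", "blue"])
print(categories)                                   # ['blue', 'green', 'red']
print(encoded)                                      # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(bin_values([5, 17, 18, 42, 70], [18, 65]))    # [0, 0, 1, 1, 2]
print(polynomial_features(3, degree=3))             # [3, 9, 27]
print(log_transform([0, 9, 99]))
```

The log transform compresses large values (here 0, 9, and 99 map to roughly 0, 2.3, and 4.6), which can help models that are sensitive to skewed feature distributions.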
Automation
# AutoML Tools
# Feature Importance Analysis
# Recursive Feature Elimination
# Genetic Algorithms
Advantages and Disadvantages
Advantages of Machine Learning
- Automation: Reduce manual work
- Pattern Recognition: Find complex relationships
- Scalability: Process large amounts of data
- Adaptivity: Adapt to new data
Disadvantages
- Data Dependency: Result quality depends on data
- Complexity: Black-box problem
- Computational Costs: Training can be expensive
- Ethics: Consider bias and fairness
Common Exam Questions
- What is the difference between Supervised and Unsupervised Learning? Supervised Learning uses labeled data to make predictions; Unsupervised Learning finds patterns in unlabeled data.
- Explain overfitting and how to avoid it. Overfitting is excessive adaptation to the training data, so the model performs well on the training set but poorly on new data. Avoid it through regularization, more data, and cross-validation.
- When do you use Reinforcement Learning? When an agent should learn through interaction with an environment to maximize its cumulative reward.
- What is the difference between Classification and Regression? Classification predicts discrete classes; Regression predicts continuous values.
Most Important Sources
- https://scikit-learn.org/stable/
- https://www.coursera.org/learn/machine-learning
- https://www.deeplearning.ai/