
Machine Learning Fundamentals: Supervised, Unsupervised & Reinforcement

Master Machine Learning basics: Supervised, Unsupervised & Reinforcement Learning. Algorithms, concepts, applications with Python & Java examples.

schutzgeist


Machine Learning Fundamentals: Supervised, Unsupervised & Reinforcement Learning with Algorithms

This article introduces the fundamentals of machine learning: supervised, unsupervised, and reinforcement learning, with their main algorithms and practical examples.

In a Nutshell

Machine Learning enables computers to learn from data. Supervised Learning learns from labeled data, Unsupervised Learning finds patterns in unlabeled data, and Reinforcement Learning learns from rewards received while interacting with an environment.

Compact Technical Description

Machine Learning is a subfield of artificial intelligence in which algorithms learn from data without being explicitly programmed.

Learning Categories:

Supervised Learning

  • Concept: Learning with labeled training data
  • Goal: Make predictions for new, unseen data
  • Types: Classification (discrete values), Regression (continuous values)
  • Algorithms: Linear Regression, Decision Trees, Random Forest, SVM, Neural Networks
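The supervised idea above (learn from labeled pairs, then predict for unseen inputs) can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a single feature and ordinary least squares; the function name `fit_line` and the toy data are made up for this example and do not come from any library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Labeled training data (hypothetical): inputs with known target values
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

slope, intercept = fit_line(train_x, train_y)

# Prediction for a new, unseen input
prediction = slope * 10.0 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={prediction:.2f}")
```

Because the target is a continuous value, this is a regression task. A classifier would map the same kind of input to a discrete label instead.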

Unsupervised Learning

  • Concept: Learning without labeled data
  • Goal: Discover structures and patterns in data
  • Types: Clustering, Dimensionality Reduction, Association
  • Algorithms: K-Means, Hierarchical Clustering, PCA, Apriori
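Association is the one type above that the clustering demos further down do not cover, so here is a small sketch of its core quantity, the support of an item pair. It counts pairs by brute force, which is only the counting step of Apriori without its candidate pruning. The function name `frequent_pairs` and the baskets are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions) is >= min_support."""
    pair_counts = Counter()
    for basket in transactions:
        # Count every unordered pair of distinct items in the basket once
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    n = len(transactions)
    return {pair: count / n for pair, count in pair_counts.items()
            if count / n >= min_support}


# Hypothetical shopping baskets; no labels are involved
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]

pairs = frequent_pairs(baskets, min_support=0.5)
print(pairs)
```

With these baskets only the pair (bread, butter) reaches the 50% threshold, since it appears in three of the four transactions.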

Reinforcement Learning

  • Concept: Learning through interaction with environment
  • Goal: Maximization of cumulative reward
  • Types: Model-based, Model-free, Multi-agent
  • Algorithms: Q-Learning, Deep Q-Networks, Policy Gradients
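The "cumulative reward" named as the goal is usually a discounted sum, in which later rewards count less than earlier ones. A minimal sketch, assuming a discount factor `gamma` between 0 and 1; the function name and the example episode are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    total = 0.0
    # Iterate backwards so each step applies G_t = r_t + gamma * G_{t+1}
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: three step penalties, then a goal reward
episode_rewards = [-1, -1, -1, 10]
print(discounted_return(episode_rewards, gamma=0.9))
print(discounted_return(episode_rewards, gamma=1.0))  # undiscounted sum
```

A smaller `gamma` makes the agent short-sighted, while `gamma=1.0` reduces the return to the plain sum of rewards.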

Exam-Relevant Key Points

  • Machine Learning: Automatic learning from data
  • Supervised Learning: Learning with labeled data (Classification, Regression)
  • Unsupervised Learning: Learning without labels (Clustering, Pattern Recognition)
  • Reinforcement Learning: Learning through rewards (Agent, Environment, Actions)
  • Training/Testing: Data splitting for model validation
  • Overfitting/Underfitting: The model fits the training data too closely and generalizes poorly, or is too simple to capture the pattern
  • Feature Engineering: Data preparation and transformation
  • Chamber of Commerce exam relevance: Modern AI technologies and applications
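The Training/Testing and Overfitting points can be made concrete with a toy comparison. The sketch below assumes a tiny one-dimensional dataset with two deliberately wrong labels. It compares a model that memorizes the training set with a simple threshold rule that is fixed by hand rather than learned. All names and data are hypothetical.

```python
def accuracy(predict, xs, ys):
    """Fraction of examples for which predict(x) equals the label."""
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: the true rule is "label 1 if x >= 10", with two noisy labels
xs = list(range(20))
ys = [1 if x >= 10 else 0 for x in xs]
ys[4], ys[14] = 1, 0  # label noise

# Data splitting: even positions for training, odd positions for testing
train_x, train_y = xs[0::2], ys[0::2]
test_x, test_y = xs[1::2], ys[1::2]


def memorizer(x):
    """1-nearest-neighbour lookup: reproduces every training label, noise included."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def threshold_rule(x):
    """Simple rule, fixed by hand here to keep the sketch short."""
    return 1 if x >= 10 else 0


for name, model in [("memorizer", memorizer), ("threshold", threshold_rule)]:
    train_acc = accuracy(model, train_x, train_y)
    test_acc = accuracy(model, test_x, test_y)
    print(f"{name}: train={train_acc:.2f}, test={test_acc:.2f}")
```

The memorizer is perfect on the training data but drops to 0.80 on the test data because it has also learned the two noisy labels. The threshold rule scores 0.80 on training and 1.00 on test. A large gap between training and test accuracy is the typical symptom of overfitting.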

Core Components

  1. Data: Training, Validation, Test data
  2. Features: Input variables and characteristics
  3. Models: Mathematical functions and algorithms
  4. Training: Adjustment of model parameters
  5. Evaluation: Performance measurement and validation
  6. Prediction: Forecasts for new data
  7. Optimization: Hyperparameter tuning
  8. Deployment: Integration into production systems
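The first component, the split into training, validation, and test data, can be sketched with the standard library alone. Under the usual convention, the validation part is used for hyperparameter tuning (item 7) and the test part only for the final evaluation (item 5). The function name `split_dataset` and the 60/20/20 proportions are assumptions chosen for illustration.

```python
import random


def split_dataset(items, val_fraction=0.2, test_fraction=0.2, seed=42):
    """Shuffle a copy of items and split it into train, validation and test lists."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


samples = list(range(100))  # stand-in for 100 data records
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))
```

The three parts are disjoint and together cover the full dataset, so no record used for tuning or final evaluation has been seen during training.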

Practical Examples

1. Supervised Learning with Python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, mean_squared_error, classification_report
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification, make_regression

# Supervised Learning Demo
class SupervisedLearningDemo:
    
    def __init__(self):
        self.models = {}
        self.results = {}
    
    # Linear Regression
    def linear_regression_demo(self):
        print("=== Linear Regression Demo ===")
        
        # Create synthetic data
        np.random.seed(42)
        X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)
        
        # Split data
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        
        # Train model
        model = LinearRegression()
        model.fit(X_train, y_train)
        
        # Make predictions
        y_train_pred = model.predict(X_train)
        y_test_pred = model.predict(X_test)
        
        # Evaluation
        train_mse = mean_squared_error(y_train, y_train_pred)
        test_mse = mean_squared_error(y_test, y_test_pred)
        
        print(f"Training MSE: {train_mse:.2f}")
        print(f"Test MSE: {test_mse:.2f}")
        print(f"Coefficient: {model.coef_[0]:.2f}")
        print(f"Intercept: {model.intercept_:.2f}")
        
        # Store results
        self.models['linear_regression'] = model
        self.results['linear_regression'] = {
            'train_mse': train_mse,
            'test_mse': test_mse,
            'r2_score': model.score(X_test, y_test)
        }
        
        return X_train, X_test, y_train, y_test, y_test_pred
    
    # Logistic Regression (Classification)
    def logistic_regression_demo(self):
        print("\n=== Logistic Regression Demo ===")
        
        # Create classification data
        X, y = make_classification(n_samples=200, n_features=2, n_redundant=0, 
                                  n_informative=2, random_state=42, n_clusters_per_class=1)
        
        # Split data
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        
        # Scale features
        scaler = StandardScaler()
        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)
        
        # Train model
        model = LogisticRegression(random_state=42)
        model.fit(X_train_scaled, y_train)
        
        # Make predictions
        y_train_pred = model.predict(X_train_scaled)
        y_test_pred = model.predict(X_test_scaled)
        
        # Evaluation
        train_accuracy = accuracy_score(y_train, y_train_pred)
        test_accuracy = accuracy_score(y_test, y_test_pred)
        
        print(f"Training Accuracy: {train_accuracy:.3f}")
        print(f"Test Accuracy: {test_accuracy:.3f}")
        print("Test Classification Report:")
        print(classification_report(y_test, y_test_pred))
        
        # Store results
        self.models['logistic_regression'] = model
        self.results['logistic_regression'] = {
            'train_accuracy': train_accuracy,
            'test_accuracy': test_accuracy
        }
        
        return X_train_scaled, X_test_scaled, y_train, y_test, y_test_pred
    
    # Decision Tree Classifier
    def decision_tree_demo(self):
        print("\n=== Decision Tree Demo ===")
        
        # More complex classification data
        X, y = make_classification(n_samples=300, n_features=4, n_redundant=1, 
                                  n_informative=3, random_state=42, n_clusters_per_class=2)
        
        # Split data
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        
        # Decision Tree with different depths
        depths = [3, 5, 10, None]
        
        for depth in depths:
            model = DecisionTreeClassifier(max_depth=depth, random_state=42)
            model.fit(X_train, y_train)
            
            # Make predictions
            y_train_pred = model.predict(X_train)
            y_test_pred = model.predict(X_test)
            
            # Evaluation
            train_accuracy = accuracy_score(y_train, y_train_pred)
            test_accuracy = accuracy_score(y_test, y_test_pred)
            
            print(f"Max Depth {depth if depth else 'None'}:")
            print(f"  Training Accuracy: {train_accuracy:.3f}")
            print(f"  Test Accuracy: {test_accuracy:.3f}")
            
            # Detect overfitting
            overfitting = train_accuracy - test_accuracy
            if overfitting > 0.1:
                print(f"  ⚠️  Overfitting detected (diff: {overfitting:.3f})")
        
        # Store a depth-5 tree (fixed choice, not selected from the loop above)
        best_model = DecisionTreeClassifier(max_depth=5, random_state=42)
        best_model.fit(X_train, y_train)
        self.models['decision_tree'] = best_model
        
        return X_train, X_test, y_train, y_test
    
    # Random Forest
    def random_forest_demo(self):
        print("\n=== Random Forest Demo ===")
        
        # High-dimensional data
        X, y = make_classification(n_samples=500, n_features=10, n_redundant=3, 
                                  n_informative=7, random_state=42, n_clusters_per_class=2)
        
        # Split data
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        
        # Random Forest with different numbers of trees
        n_estimators_list = [10, 50, 100, 200]
        
        best_accuracy = 0
        best_model = None
        
        for n_estimators in n_estimators_list:
            model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
            model.fit(X_train, y_train)
            
            # Make predictions
            y_test_pred = model.predict(X_test)
            accuracy = accuracy_score(y_test, y_test_pred)
            
            print(f"Trees: {n_estimators}, Test Accuracy: {accuracy:.3f}")
            
            if accuracy > best_accuracy:
                best_accuracy = accuracy
                best_model = model
        
        print(f"\nBest Random Forest Accuracy: {best_accuracy:.3f}")
        
        # Feature Importance
        feature_importance = best_model.feature_importances_
        print("Top 5 Feature Importances:")
        for i, importance in sorted(enumerate(feature_importance), key=lambda x: x[1], reverse=True)[:5]:
            print(f"  Feature {i}: {importance:.3f}")
        
        self.models['random_forest'] = best_model
        self.results['random_forest'] = {'test_accuracy': best_accuracy}
        
        return X_train, X_test, y_train, y_test
    
    # Model Comparison
    def compare_models(self):
        print("\n=== Model Comparison ===")
        
        # Comparison table
        comparison_data = []
        
        for model_name, results in self.results.items():
            if 'test_accuracy' in results:
                comparison_data.append({
                    'Model': model_name,
                    'Test Accuracy': f"{results['test_accuracy']:.3f}"
                })
            elif 'test_mse' in results:
                comparison_data.append({
                    'Model': model_name,
                    'Test MSE': f"{results['test_mse']:.2f}",
                    'R² Score': f"{results['r2_score']:.3f}"
                })
        
        df = pd.DataFrame(comparison_data)
        print(df.to_string(index=False))
        
        return df

# Run demo
def supervised_learning_demo():
    demo = SupervisedLearningDemo()
    
    # Linear Regression
    X_lr_train, X_lr_test, y_lr_train, y_lr_test, y_lr_pred = demo.linear_regression_demo()
    
    # Logistic Regression
    X_log_train, X_log_test, y_log_train, y_log_test, y_log_pred = demo.logistic_regression_demo()
    
    # Decision Tree
    X_dt_train, X_dt_test, y_dt_train, y_dt_test = demo.decision_tree_demo()
    
    # Random Forest
    X_rf_train, X_rf_test, y_rf_train, y_rf_test = demo.random_forest_demo()
    
    # Compare models
    comparison = demo.compare_models()
    
    return demo, comparison

if __name__ == "__main__":
    demo, comparison = supervised_learning_demo()

2. Unsupervised Learning with Python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE  # TSNE lives in sklearn.manifold, not sklearn.decomposition
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
from sklearn.datasets import make_blobs, make_moons, load_iris

# Unsupervised Learning Demo
class UnsupervisedLearningDemo:
    
    def __init__(self):
        self.models = {}
        self.results = {}
    
    # K-Means Clustering
    def kmeans_demo(self):
        print("=== K-Means Clustering Demo ===")
        
        # Synthetic cluster data
        X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)
        
        # Scale features
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        
        # K-Means with different cluster numbers
        cluster_range = range(2, 8)
        silhouette_scores = []
        inertias = []
        
        for k in cluster_range:
            kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
            cluster_labels = kmeans.fit_predict(X_scaled)
            
            # Silhouette Score
            silhouette_avg = silhouette_score(X_scaled, cluster_labels)
            silhouette_scores.append(silhouette_avg)
            
            # Inertia (Within-cluster sum of squares)
            inertias.append(kmeans.inertia_)
            
            print(f"K={k}: Silhouette Score={silhouette_avg:.3f}, Inertia={kmeans.inertia_:.1f}")
        
        # Optimal K value based on Silhouette Score
        optimal_k = cluster_range[np.argmax(silhouette_scores)]
        print(f"\nOptimal K based on Silhouette: {optimal_k}")
        
        # Final K-Means with optimal K
        final_kmeans = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
        final_labels = final_kmeans.fit_predict(X_scaled)
        
        # Save results
        self.models['kmeans'] = final_kmeans
        self.results['kmeans'] = {
            'optimal_k': optimal_k,
            'silhouette_score': max(silhouette_scores),
            'inertia': final_kmeans.inertia_
        }
        
        return X_scaled, final_labels, y_true
    
    # DBSCAN Clustering
    def dbscan_demo(self):
        print("\n=== DBSCAN Clustering Demo ===")
        
        # Non-spherical data
        X, y_true = make_moons(n_samples=200, noise=0.1, random_state=42)
        
        # Scale features
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        
        # DBSCAN with different eps values
        eps_values = [0.2, 0.3, 0.4, 0.5]
        min_samples = 5
        
        for eps in eps_values:
            dbscan = DBSCAN(eps=eps, min_samples=min_samples)
            cluster_labels = dbscan.fit_predict(X_scaled)
            
            # Number of clusters (ignores noise)
            n_clusters = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)
            n_noise = list(cluster_labels).count(-1)
            
            if n_clusters > 1:
                silhouette_avg = silhouette_score(X_scaled, cluster_labels)
            else:
                silhouette_avg = -1
            
            print(f"eps={eps}: Clusters={n_clusters}, Noise={n_noise}, Silhouette={silhouette_avg:.3f}")
        
        # Final DBSCAN with eps=0.3 (fixed choice, not selected from the loop above)
        best_dbscan = DBSCAN(eps=0.3, min_samples=min_samples)
        best_labels = best_dbscan.fit_predict(X_scaled)
        
        self.models['dbscan'] = best_dbscan
        self.results['dbscan'] = {
            'n_clusters': len(set(best_labels)) - (1 if -1 in best_labels else 0),
            'n_noise': list(best_labels).count(-1)
        }
        
        return X_scaled, best_labels, y_true
    
    # Hierarchical Clustering
    def hierarchical_clustering_demo(self):
        print("\n=== Hierarchical Clustering Demo ===")
        
        # Iris Dataset
        iris = load_iris()
        X = iris.data
        y_true = iris.target
        
        # Scale features
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        
        # Agglomerative Clustering with different linkage methods
        linkage_methods = ['ward', 'complete', 'average', 'single']
        
        for linkage in linkage_methods:
            clustering = AgglomerativeClustering(n_clusters=3, linkage=linkage)
            cluster_labels = clustering.fit_predict(X_scaled)
            
            silhouette_avg = silhouette_score(X_scaled, cluster_labels)
            
            print(f"Linkage={linkage}: Silhouette Score={silhouette_avg:.3f}")
        
        # Final model with Ward linkage (fixed choice, not selected from the loop above)
        best_clustering = AgglomerativeClustering(n_clusters=3, linkage='ward')
        best_labels = best_clustering.fit_predict(X_scaled)
        
        self.models['hierarchical'] = best_clustering
        self.results['hierarchical'] = {
            'silhouette_score': silhouette_score(X_scaled, best_labels)
        }
        
        return X_scaled, best_labels, y_true
    
    # PCA (Principal Component Analysis)
    def pca_demo(self):
        print("\n=== PCA Demo ===")
        
        # High-dimensional data
        np.random.seed(42)
        X = np.random.randn(100, 10)
        
        # Generate correlations
        X[:, 1] = X[:, 0] * 0.8 + np.random.randn(100) * 0.2
        X[:, 2] = X[:, 0] * 0.6 + np.random.randn(100) * 0.4
        X[:, 3] = X[:, 1] * 0.7 + np.random.randn(100) * 0.3
        
        # Scale features
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        
        # PCA with different numbers of components
        n_components_range = range(2, 11)
        explained_variances = []
        
        for n in n_components_range:
            pca = PCA(n_components=n)
            X_pca = pca.fit_transform(X_scaled)
            
            total_explained_variance = np.sum(pca.explained_variance_ratio_)
            explained_variances.append(total_explained_variance)
            
            print(f"Components={n}: Explained Variance={total_explained_variance:.3f}")
        
        # Optimal number based on 95% variance
        optimal_components = next(n for n, var in zip(n_components_range, explained_variances) 
                               if var >= 0.95)
        print(f"\nOptimal components for 95% variance: {optimal_components}")
        
        # Final PCA
        final_pca = PCA(n_components=optimal_components)
        X_pca_final = final_pca.fit_transform(X_scaled)
        
        # Feature Contributions
        print("Top contributing features for first component:")
        feature_contributions = np.abs(final_pca.components_[0])
        top_features = np.argsort(feature_contributions)[-3:][::-1]
        
        for i, feature_idx in enumerate(top_features):
            print(f"  Feature {feature_idx}: {feature_contributions[feature_idx]:.3f}")
        
        self.models['pca'] = final_pca
        self.results['pca'] = {
            'optimal_components': optimal_components,
            'explained_variance': np.sum(final_pca.explained_variance_ratio_)
        }
        
        return X_scaled, X_pca_final
    
    # t-SNE for visualization
    def tsne_demo(self):
        print("\n=== t-SNE Demo ===")
        
        # Iris Dataset for visualization
        iris = load_iris()
        X = iris.data
        y = iris.target
        
        # Scale features
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        
        # t-SNE with different perplexity values
        perplexity_values = [5, 15, 30, 50]
        
        for perplexity in perplexity_values:
            tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)
            X_tsne = tsne.fit_transform(X_scaled)
            
            print(f"Perplexity={perplexity}: KL Divergence={tsne.kl_divergence_:.3f}")
        
        # Final t-SNE with perplexity=30 (fixed choice, not selected from the loop above)
        best_tsne = TSNE(n_components=2, perplexity=30, random_state=42)
        X_tsne_final = best_tsne.fit_transform(X_scaled)
        
        self.models['tsne'] = best_tsne
        
        return X_scaled, X_tsne_final, y
    
    # Clustering Evaluation
    def evaluate_clustering(self, X, labels, true_labels=None):
        print("\n=== Clustering Evaluation ===")
        
        # Silhouette Score
        if len(set(labels)) > 1:
            silhouette_avg = silhouette_score(X, labels)
            print(f"Silhouette Score: {silhouette_avg:.3f}")
        else:
            print("Silhouette Score: N/A (only one cluster)")
        
        # Cluster statistics
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        n_noise = list(labels).count(-1)
        
        print(f"Number of clusters: {n_clusters}")
        print(f"Number of noise points: {n_noise}")
        
        # Cluster sizes
        if n_clusters > 0:
            cluster_sizes = [np.sum(labels == i) for i in range(n_clusters)]
            print(f"Cluster sizes: {cluster_sizes}")
            print(f"Average cluster size: {np.mean(cluster_sizes):.1f}")
        
        return {
            'silhouette_score': silhouette_avg if len(set(labels)) > 1 else None,
            'n_clusters': n_clusters,
            'n_noise': n_noise
        }
    
    # Model Comparison
    def compare_clustering_models(self):
        print("\n=== Clustering Models Comparison ===")
        
        comparison_data = []
        
        for model_name, results in self.results.items():
            # Check 'optimal_k' first: the K-Means entry holds both keys, so testing
            # 'silhouette_score' first would leave this branch unreachable
            if 'optimal_k' in results:
                comparison_data.append({
                    'Model': model_name,
                    'Optimal K': results['optimal_k'],
                    'Silhouette Score': f"{results['silhouette_score']:.3f}"
                })
            elif 'silhouette_score' in results:
                comparison_data.append({
                    'Model': model_name,
                    'Silhouette Score': f"{results['silhouette_score']:.3f}"
                })
        
        df = pd.DataFrame(comparison_data)
        print(df.to_string(index=False))
        
        return df

# Run demo
def unsupervised_learning_demo():
    demo = UnsupervisedLearningDemo()
    
    # K-Means
    X_km, labels_km, true_km = demo.kmeans_demo()
    demo.evaluate_clustering(X_km, labels_km, true_km)
    
    # DBSCAN
    X_db, labels_db, true_db = demo.dbscan_demo()
    demo.evaluate_clustering(X_db, labels_db, true_db)
    
    # Hierarchical Clustering
    X_hc, labels_hc, true_hc = demo.hierarchical_clustering_demo()
    demo.evaluate_clustering(X_hc, labels_hc, true_hc)
    
    # PCA
    X_pca, X_pca_transformed = demo.pca_demo()
    
    # t-SNE
    X_tsne, X_tsne_transformed, y_tsne = demo.tsne_demo()
    
    # Compare models
    comparison = demo.compare_clustering_models()
    
    return demo, comparison

if __name__ == "__main__":
    demo, comparison = unsupervised_learning_demo()

3. Reinforcement Learning with Python

import numpy as np
import random
import matplotlib.pyplot as plt
from collections import defaultdict

# Reinforcement Learning Demo
class ReinforcementLearningDemo:
    
    def __init__(self):
        self.environments = {}
        self.agents = {}
        self.results = {}
    
    # Grid World Environment
    class GridWorld:
        def __init__(self, width=4, height=4):
            self.width = width
            self.height = height
            self.state = (0, 0)  # Starting position
            self.goal = (width-1, height-1)  # Goal position
            self.obstacles = [(1, 1), (2, 2)]  # Obstacles
            self.terminal_states = [self.goal]
            
        def reset(self):
            self.state = (0, 0)
            return self.state
        
        def step(self, action):
            x, y = self.state
            
            # Execute actions
            if action == 0:  # Up
                new_state = (x, max(0, y - 1))
            elif action == 1:  # Down
                new_state = (x, min(self.height - 1, y + 1))
            elif action == 2:  # Left
                new_state = (max(0, x - 1), y)
            elif action == 3:  # Right
                new_state = (min(self.width - 1, x + 1), y)
            else:
                new_state = self.state
            
            # Check obstacles
            if new_state in self.obstacles:
                new_state = self.state
            
            # Calculate reward
            if new_state == self.goal:
                reward = 10
                done = True
            else:
                reward = -1  # Small penalty for each step
                done = False
            
            self.state = new_state
            return new_state, reward, done
        
        def get_valid_actions(self):
            return [0, 1, 2, 3]  # Up, Down, Left, Right
        
        def render(self):
            grid = np.zeros((self.height, self.width))
            
            # Mark obstacles
            for obs in self.obstacles:
                grid[obs[1], obs[0]] = -1
            
            # Mark goal
            grid[self.goal[1], self.goal[0]] = 10
            
            # Mark current position
            grid[self.state[1], self.state[0]] = 1
            
            print("Grid World:")
            print(grid)
            print(f"Position: {self.state}, Goal: {self.goal}")
    
    # Q-Learning Agent
    class QLearningAgent:
        def __init__(self, state_space_size, action_space_size, learning_rate=0.1, 
                     discount_factor=0.9, epsilon=0.1):
            self.state_space_size = state_space_size
            self.action_space_size = action_space_size
            self.learning_rate = learning_rate
            self.discount_factor = discount_factor
            self.epsilon = epsilon
            
            # Initialize Q-table
            self.q_table = defaultdict(lambda: np.zeros(action_space_size))
            
        def get_state_index(self, state):
            # Convert 2D coordinates to 1D index (assumes a grid width of 4, as in the demo)
            x, y = state
            return y * 4 + x
        
        def choose_action(self, state, valid_actions):
            state_idx = self.get_state_index(state)
            
            # Epsilon-greedy strategy
            if random.random() < self.epsilon:
                return random.choice(valid_actions)
            else:
                q_values = self.q_table[state_idx]
                valid_q_values = [q_values[action] for action in valid_actions]
                max_q = max(valid_q_values)
                # Choose randomly if Q-values are equal
                best_actions = [action for action in valid_actions 
                              if q_values[action] == max_q]
                return random.choice(best_actions)
        
        def update_q_value(self, state, action, reward, next_state, valid_next_actions):
            state_idx = self.get_state_index(state)
            next_state_idx = self.get_state_index(next_state)
            
            # Update Q-value
            current_q = self.q_table[state_idx][action]
            
            if len(valid_next_actions) > 0:
                max_next_q = max([self.q_table[next_state_idx][a] for a in valid_next_actions])
            else:
                max_next_q = 0
            
            new_q = current_q + self.learning_rate * (
                reward + self.discount_factor * max_next_q - current_q
            )
            
            self.q_table[state_idx][action] = new_q
        
        def get_policy(self):
            policy = {}
            for state_idx in self.q_table.keys():
                y = state_idx // 4
                x = state_idx % 4
                state = (x, y)
                
                valid_actions = [0, 1, 2, 3]  # All actions are valid
                q_values = self.q_table[state_idx]
                best_action = np.argmax(q_values)
                
                policy[state] = best_action
            
            return policy
    
    # Q-Learning Demo
    def q_learning_demo(self):
        print("=== Q-Learning Demo ===")
        
        # Create environment and agent
        env = self.GridWorld(width=4, height=4)
        agent = self.QLearningAgent(state_space_size=16, action_space_size=4)
        
        # Training parameters
        episodes = 1000
        max_steps_per_episode = 100
        
        # Training
        episode_rewards = []
        
        for episode in range(episodes):
            state = env.reset()
            total_reward = 0
            done = False
            steps = 0
            
            while not done and steps < max_steps_per_episode:
                valid_actions = env.get_valid_actions()
                action = agent.choose_action(state, valid_actions)
                
                next_state, reward, done = env.step(action)
                valid_next_actions = env.get_valid_actions()
                
                # Update Q-value
                agent.update_q_value(state, action, reward, next_state, valid_next_actions)
                
                state = next_state
                total_reward += reward
                steps += 1
            
            episode_rewards.append(total_reward)
            
            if episode % 100 == 0:
                avg_reward = np.mean(episode_rewards[-100:])
                print(f"Episode {episode}: Average Reward (last 100): {avg_reward:.2f}")
        
        # Analyze results
        final_policy = agent.get_policy()
        
        print(f"\nFinal Policy:")
        for state, action in final_policy.items():
            action_names = {0: 'Up', 1: 'Down', 2: 'Left', 3: 'Right'}
            print(f"State {state}: {action_names[action]}")
        
        # Display Q-table
        print(f"\nQ-Table (selected states):")
        for state_idx in [0, 5, 10, 15]:  # Main diagonal; (1, 1) and (2, 2) are obstacles, so their Q-values stay zero
            y = state_idx // 4
            x = state_idx % 4
            state = (x, y)
            q_values = agent.q_table[state_idx]
            print(f"State {state}: {q_values}")
        
        self.environments['gridworld'] = env
        self.agents['qlearning'] = agent
        self.results['qlearning'] = {
            'episodes': episodes,
            'final_avg_reward': np.mean(episode_rewards[-100:]),
            'q_table_size': len(agent.q_table)
        }
        
        return episode_rewards
    
    # Simple CartPole-like Environment
    class CartPoleSimple:
        def __init__(self):
            self.angle = 0  # Angle of the pole
            self.angular_velocity = 0  # Angular velocity
            self.gravity = 9.8
            self.pole_length = 1.0
            self.dt = 0.1
            
        def reset(self):
            self.angle = random.uniform(-0.1, 0.1)
            self.angular_velocity = 0
            return self.get_state()
        
        def get_state(self):
            return (self.angle, self.angular_velocity)
        
        def step(self, action):
            # Actions: 0 = Left, 1 = Right
            force = -10 if action == 0 else 10
            
            # Physics update (simplified)
            angular_acceleration = (self.gravity / self.pole_length) * np.sin(self.angle) + force
            
            self.angular_velocity += angular_acceleration * self.dt
            self.angle += self.angular_velocity * self.dt
            
            # Reward and done condition
            if abs(self.angle) > np.pi / 4:  # Pole falls over
                reward = -10
                done = True
            else:
                reward = 1  # Reward for balancing
                done = False
            
            return self.get_state(), reward, done
        
        def render(self):
            print(f"Angle: {self.angle:.3f} rad ({np.degrees(self.angle):.1f}°), "
                  f"Angular Velocity: {self.angular_velocity:.3f}")
    
    # Policy Gradient Agent (simplified)
    class PolicyGradientAgent:
        def __init__(self, state_dim=2, action_dim=2, learning_rate=0.01):
            self.state_dim = state_dim
            self.action_dim = action_dim
            self.learning_rate = learning_rate
            
            # Simple linear policy
            self.weights = np.random.randn(state_dim, action_dim) * 0.1
            
        def get_action_probabilities(self, state):
            # Softmax over linear combination
            logits = np.dot(state, self.weights)
            exp_logits = np.exp(logits - np.max(logits))
            return exp_logits / np.sum(exp_logits)
        
        def choose_action(self, state):
            action_probs = self.get_action_probabilities(state)
            return np.random.choice(self.action_dim, p=action_probs)
        
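        def update_policy(self, states, actions, rewards):
            # Simplified policy gradient update: each step is weighted by its
            # immediate reward, not by the discounted return of full REINFORCE
            for state, action, reward in zip(states, actions, rewards):
                # States arrive as tuples; convert them so the arithmetic
                # below is element-wise instead of raising a TypeError
                state = np.asarray(state, dtype=float)
                action_probs = self.get_action_probabilities(state)
                
                # Gradient of log pi(action | state) for a linear softmax policy:
                # state * (1 - p[a]) for the chosen action, -state * p[a] otherwise
                grad = -np.outer(state, action_probs)
                grad[:, action] += state
                
                # Gradient ascent step
                self.weights += self.learning_rate * reward * grad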
    
    # Policy Gradient Demo
    def policy_gradient_demo(self):
        print("\n=== Policy Gradient Demo ===")
        
        env = self.CartPoleSimple()
        agent = self.PolicyGradientAgent()
        
        episodes = 500
        episode_rewards = []
        
        for episode in range(episodes):
            state = env.reset()
            states, actions, rewards = [], [], []
            total_reward = 0
            done = False
            steps = 0
            max_steps = 100
            
            while not done and steps < max_steps:
                action = agent.choose_action(state)
                next_state, reward, done = env.step(action)
                
                states.append(state)
                actions.append(action)
                rewards.append(reward)
                
                state = next_state
                total_reward += reward
                steps += 1
            
            # Policy update
            agent.update_policy(states, actions, rewards)
            episode_rewards.append(total_reward)
            
            if episode % 50 == 0:
                avg_reward = np.mean(episode_rewards[-50:])
                print(f"Episode {episode}: Average Reward (last 50): {avg_reward:.2f}")
        
        # Final evaluation
        print("\nFinal Evaluation:")
        state = env.reset()
        for step in range(20):
            action_probs = agent.get_action_probabilities(state)
            action = np.argmax(action_probs)
            state, reward, done = env.step(action)
            env.render()
            
            if done:
                print("Episode finished!")
                break
        
        self.environments['cartpole'] = env
        self.agents['policy_gradient'] = agent
        self.results['policy_gradient'] = {
            'episodes': episodes,
            'final_avg_reward': np.mean(episode_rewards[-50:])
        }
        
        return episode_rewards
    
    # Model Comparison
    def compare_rl_models(self):
        print("\n=== Reinforcement Learning Models Comparison ===")
        
        comparison_data = []
        
        for model_name, results in self.results.items():
            comparison_data.append({
                'Model': model_name,
                'Episodes': results['episodes'],
                'Final Avg Reward': f"{results['final_avg_reward']:.2f}"
            })
        
        df = pd.DataFrame(comparison_data)
        print(df.to_string(index=False))
        
        return df

# Run demo
def reinforcement_learning_demo():
    demo = ReinforcementLearningDemo()
    
    # Q-Learning
    q_rewards = demo.q_learning_demo()
    
    # Policy Gradient
    pg_rewards = demo.policy_gradient_demo()
    
    # Compare models
    comparison = demo.compare_rl_models()
    
    # Visualize rewards
    plt.figure(figsize=(12, 4))
    
    plt.subplot(1, 2, 1)
    plt.plot(q_rewards)
    plt.title('Q-Learning Rewards')
    plt.xlabel('Episode')
    plt.ylabel('Total Reward')
    
    plt.subplot(1, 2, 2)
    plt.plot(pg_rewards)
    plt.title('Policy Gradient Rewards')
    plt.xlabel('Episode')
    plt.ylabel('Total Reward')
    
    plt.tight_layout()
    plt.show()
    
    return demo, comparison

if __name__ == "__main__":
    demo, comparison = reinforcement_learning_demo()

Machine Learning Types Overview

| Type | Data | Goal | Examples | Algorithms |
| --- | --- | --- | --- | --- |
| Supervised | Labeled | Prediction | Classification, Regression | Linear Regression, Decision Trees |
| Unsupervised | Unlabeled | Find patterns | Clustering, Dimensionality Reduction | K-Means, PCA |
| Reinforcement | Environment | Maximum reward | Game-Playing, Robotics | Q-Learning, Policy Gradients |

Algorithms Comparison

Supervised Learning

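| Algorithm | Type | Complexity | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| Linear Regression | Regression | O(n) | Interpretable | Only linear relationships |
| Logistic Regression | Classification | O(n) | Fast, interpretable | Assumes a linear decision boundary |
| Decision Trees | Both | O(n log n) | Interpretable | Prone to overfitting |
| Random Forest | Both | O(n log n) | Robust, accurate | Harder to interpret, more expensive to train |
| SVM | Both | O(n²) | High accuracy | Scales poorly to large datasets |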

Unsupervised Learning

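| Algorithm | Type | Complexity | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| K-Means | Clustering | O(n k i) | Fast | Assumes roughly spherical clusters |
| DBSCAN | Clustering | O(n log n) | Arbitrary cluster shapes | Parameter-sensitive |
| PCA | Dimensionality Reduction | O(n d²) | Reduces dimensions | Captures only linear structure |
| t-SNE | Visualization | O(n²) | Non-linear | Slow |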

Reinforcement Learning

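| Algorithm | Type | Complexity | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| Q-Learning | Model-free | O(s a) | Simple | Tabular form needs discrete states and actions |
| Deep Q-Network | Model-free | O(n) | Handles continuous state spaces | Unstable training |
| Policy Gradients | Model-free | O(n) | Learns stochastic policies | High variance |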

ML Workflow

1. Data Collection

# Identify data sources
# Ensure quality
# Consider ethics and privacy

2. Data Preparation

# Cleaning: Handle missing values
# Feature Engineering: Create new features
# Scaling: Normalization/Standardization
# Splitting: Train/Validation/Test
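
The preparation steps above can be sketched in plain Python. This is a minimal illustration under assumptions: the helper names (`train_val_test_split`, `impute`, `standardize`), the 60/20/20 split, and the toy values are hypothetical, and real projects typically use a library such as scikit-learn for this.

```python
import random
import statistics

def train_val_test_split(rows, val_ratio=0.2, test_ratio=0.2, seed=42):
    """Shuffle the rows and cut them into train/validation/test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_ratio)
    n_val = int(len(rows) * val_ratio)
    return rows[n_test + n_val:], rows[n_test:n_test + n_val], rows[:n_test]

def impute(values, fill_value):
    """Cleaning: replace missing entries (None) with fill_value."""
    return [fill_value if v is None else v for v in values]

def standardize(values, mean, std):
    """Scaling: shift and scale to zero mean and unit variance."""
    return [(v - mean) / std for v in values]

# Hypothetical single feature with one missing value
raw = [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

# Split first so that no statistic below sees validation or test data
train, val, test = train_val_test_split(raw)

# Fit the imputation value and the scaling parameters on the training part only
train_median = statistics.median([v for v in train if v is not None])
train, val, test = (impute(part, train_median) for part in (train, val, test))
mean = statistics.fmean(train)
std = statistics.pstdev(train) or 1.0  # guard against a constant feature

train_scaled, val_scaled, test_scaled = (
    standardize(part, mean, std) for part in (train, val, test)
)
```

Fitting the median, mean, and standard deviation on the training part and reusing them for validation and test data avoids leaking information from the held-out sets into the model.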

3. Model Selection

# Identify problem type
# Create baseline model
# Test multiple algorithms
# Optimize hyperparameters
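
A baseline gives every later model a number to beat. The sketch below assumes a classification problem and uses a majority-class predictor; the function names and the spam/ham labels are made up for illustration.

```python
from collections import Counter

def fit_majority_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    majority_label = Counter(train_labels).most_common(1)[0][0]
    return lambda features: majority_label

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels; the baseline ignores the features entirely
train_labels = ["ham", "ham", "spam", "ham", "spam", "ham"]
test_features = ["msg 1", "msg 2", "msg 3", "msg 4"]
test_labels = ["ham", "spam", "ham", "ham"]

baseline = fit_majority_baseline(train_labels)
baseline_accuracy = accuracy(test_labels, [baseline(x) for x in test_features])
print(f"Baseline accuracy to beat: {baseline_accuracy:.2f}")
```

A candidate algorithm that does not clearly outperform this number is probably not learning anything useful from the features.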

4. Training

# Use cross-validation
# Avoid overfitting
# Implement early stopping
# Monitor metrics
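
Two of the training steps above, cross-validation and early stopping, can be sketched without any library. The function names, the fold layout (contiguous folds on data assumed to be shuffled already), and the example loss values are assumptions for illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for k contiguous folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread the remainder
        val_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

def early_stopping(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the loss has not improved for `patience` epochs;
    the weights from best_epoch would be the ones to keep.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, val_idx in folds:
    print(f"train on {len(train_idx)} samples, validate on {val_idx}")

# Hypothetical validation losses, one per epoch
best_epoch, stop_epoch = early_stopping([0.9, 0.7, 0.6, 0.65, 0.7, 0.5])
print(f"best epoch: {best_epoch}, stopped at epoch: {stop_epoch}")
```

In this example training stops before the final, lower loss is reached, which shows the trade-off behind the `patience` setting: a small value saves time but can stop too early.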

5. Evaluation

# Measure performance
# Analyze errors
# Test robustness
# Assess business value

Evaluation Metrics

Classification

  • Accuracy: Correct predictions / Total
  • Precision: True Positives / (TP + FP)
  • Recall: True Positives / (TP + FN)
  • F1-Score: Harmonic mean of precision and recall
  • ROC-AUC: Area Under ROC Curve
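
The classification metrics listed above follow directly from the confusion-matrix counts. The sketch below is a plain-Python illustration for the binary case; the function names and the example labels are assumptions, and ROC-AUC is computed here as the share of positive/negative pairs that the scores rank correctly.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predictions
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(metrics, auc)
```

With 3 true positives, 2 false positives, 1 false negative and 2 true negatives, precision (0.6) and recall (0.75) differ noticeably, which accuracy alone (0.625) would hide.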

Regression

  • MSE: Mean Squared Error
  • RMSE: Root Mean Squared Error
  • MAE: Mean Absolute Error
  • R²: Coefficient of determination
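
The regression metrics above can be computed in a few lines. This is a minimal sketch with a hypothetical helper name and example values; R² is calculated as 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
import math

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE and the coefficient of determination (R²)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,  # undefined if all targets are identical
    }

# Hypothetical targets and predictions
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
reg = regression_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in reg.items()})
```

RMSE is in the same unit as the target, which makes it easier to interpret than MSE, while MAE is less sensitive to single large errors.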

Clustering

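  • Silhouette Score: Cohesion versus separation per sample, from -1 to 1 (higher is better)
  • Davies-Bouldin Index: Average similarity of each cluster to its most similar cluster (lower is better)
  • Calinski-Harabasz Index: Ratio of between-cluster to within-cluster dispersion (higher is better)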
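
As an illustration of how such a score is built, here is a plain-Python sketch of the silhouette score with Euclidean distance. The function name and the four example points are assumptions, and the sketch expects at least two clusters.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    def mean_distance(point, others):
        return sum(math.dist(point, o) for o in others) / len(others)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, o) for o in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(mean_distance(point, members)
                for other, members in clusters.items() if other != label)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Two tight groups far apart from each other
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(f"good clustering: {good:.3f}, bad clustering: {bad:.3f}")
```

The labeling that matches the two groups scores close to 1, while the labeling that mixes them scores below 0.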

Overfitting vs Underfitting

Overfitting

  • Symptoms: High training accuracy, low test accuracy
  • Causes: Model too complex, insufficient data
  • Solutions: Regularization, more data, simpler model

Underfitting

  • Symptoms: Low accuracy on both datasets
  • Causes: Model too simple, too few features
  • Solutions: More complex model, feature engineering
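
Both failure modes can be reproduced with one small experiment. The sketch below is an illustration under assumptions: it uses k-nearest-neighbor regression on synthetic noisy sine data, where k controls model complexity (k = 1 memorizes the training set, k = n always predicts the training mean). The data, seed, and function names are made up for the example.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30
x_train = [2 * math.pi * i / n for i in range(n)]
x_test = [2 * math.pi * (i + 0.5) / n for i in range(n)]
y_train = [math.sin(x) + rng.gauss(0, 0.3) for x in x_train]
y_test = [math.sin(x) + rng.gauss(0, 0.3) for x in x_test]

def knn_predict(x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(n), key=lambda i: abs(x_train[i] - x))[:k]
    return sum(y_train[i] for i in nearest) / k

def mse(xs, ys, k):
    """Mean squared error of the k-NN predictor on the given data."""
    return sum((y - knn_predict(x, k)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# k=1: very flexible model, k=5: moderate, k=n: constant prediction
results = {k: (mse(x_train, y_train, k), mse(x_test, y_test, k))
           for k in (1, 5, n)}
for k, (train_err, test_err) in results.items():
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

With k = 1 the training error is exactly zero because every training point is its own nearest neighbor, yet the test error is not: the overfitting symptom. With k = n the model ignores the input and the error stays high on both sets: the underfitting symptom.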

Feature Engineering

Techniques

# Polynomial Features
# Interaction Terms
# Binning/Discretization
# Log-Transformation
# One-Hot Encoding
# Target Encoding
# Feature Selection
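
Several of these techniques are simple transformations of a raw value. The following sketch shows one-hot encoding, polynomial features, an interaction term, binning, and a log transformation in plain Python; all function names, bin edges, and example values are hypothetical.

```python
import math

def one_hot(values):
    """Map each category to a 0/1 indicator vector (categories sorted)."""
    categories = sorted(set(values))
    encoded = [[int(v == c) for c in categories] for v in values]
    return categories, encoded

def polynomial_features(x, degree):
    """Powers of x from 1 up to the given degree."""
    return [x ** d for d in range(1, degree + 1)]

def bin_index(x, edges):
    """Index of the bin that x falls into, given ascending bin edges."""
    return sum(x >= edge for edge in edges)

categories, encoded = one_hot(["red", "green", "red"])
powers = polynomial_features(3, degree=3)     # x, x^2, x^3
interaction = 2.0 * 5.0                       # e.g. width * height
age_bin = bin_index(35, edges=[18, 30, 50])   # bins: <18, 18-29, 30-49, >=50
log_income = math.log1p(999.0)                # log(1 + x) handles x = 0 safely

print(categories, encoded, powers, interaction, age_bin, round(log_income, 3))
```

Sorting the categories keeps the column order of the one-hot vectors stable between runs, which matters when the same encoding has to be applied to new data later.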

Automation

# AutoML Tools
# Feature Importance Analysis
# Recursive Feature Elimination
# Genetic Algorithms

Advantages and Disadvantages

Advantages of Machine Learning

  • Automation: Reduce manual work
  • Pattern Recognition: Find complex relationships
  • Scalability: Process large amounts of data
  • Adaptivity: Adapt to new data

Disadvantages

  • Data Dependency: Result quality depends on data
  • Complexity: Black-box problem
  • Computational Costs: Training can be expensive
  • Ethics: Consider bias and fairness

Common Exam Questions

  1. What is the difference between Supervised and Unsupervised Learning? Supervised Learning trains on labeled data to make predictions; Unsupervised Learning finds patterns in unlabeled data.

  2. Explain overfitting and how to avoid it. Overfitting means the model adapts too closely to the training data, including its noise, and therefore generalizes poorly to new data. Counter it with regularization, more training data, or a simpler model; cross-validation helps to detect it.

  3. When do you use Reinforcement Learning? When an agent has to learn by interacting with an environment in order to maximize its cumulative reward.

  4. What is the difference between Classification and Regression? Classification predicts discrete classes; Regression predicts continuous values.

Most Important Sources

  1. https://scikit-learn.org/stable/
  2. https://www.coursera.org/learn/machine-learning
  3. https://www.deeplearning.ai/