Published on

How Do Machines Learn? A Beginner-Friendly Breakdown

Authors
  • Nguyen Phuc Cuong

Introduction

In my previous post, I shared an overview of AI and how to start learning AI in 2025. Today, as we continue this series, let's dive deeper into the AI world and explore a fundamental concept.

Have you ever wondered why your phone's camera can recognize your face, or how Gmail automatically sorts spam emails? The answer lies in machine learning – but what does that actually mean?

Think of it like teaching a child to distinguish between cats and dogs. Parents show real-world examples, pointing out the differences until the child learns to identify them independently. Machine learning works similarly: we show the system many examples during training so that it can make predictions about data it has never seen before.

A classic example is handwriting recognition, where machines learn to read different handwriting styles by analyzing thousands of writing samples.

"Machine Learning is the science (and art) of programming computers so they can learn from data."

— Aurélien Géron

How do machines learn?

Types of Machine Learning

Let's break down the three main types in simple terms:

Supervised Learning

The machine learns from labeled data, like teaching it to identify "spam" or "not spam" emails by showing thousands of pre-labeled examples.


Examples: Email spam detection, medical diagnosis systems, image classification
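To make the idea concrete, here is a minimal sketch of supervised learning on the spam example. The six emails and their labels are made up for illustration, and the choice of CountVectorizer with MultinomialNB from scikit-learn is an assumption; real spam filters are trained on far more data and often use other techniques.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical, hand-written training emails with their labels
emails = [
    "win a free prize now",
    "free money click now",
    "claim your free prize",
    "meeting agenda for monday",
    "lunch with the project team",
    "monday project status report",
]
labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

# Turn each email into word counts, then learn from the labeled examples
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
classifier = MultinomialNB()
classifier.fit(X, labels)

# Ask the trained model about two emails it has not seen before
new_emails = ["free prize inside", "agenda for the team meeting"]
predicted = classifier.predict(vectorizer.transform(new_emails))
for text, label in zip(new_emails, predicted):
    print(f"{text!r} -> {label}")
```

The key point is that every training email comes with the right answer attached. That label is what makes the learning "supervised".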

Unsupervised Learning

The system finds hidden patterns in data without being given specific labels, like discovering customer groups based on purchasing behavior.


Examples: Customer segmentation, recommendation systems, market research analysis
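Here is a small sketch of customer grouping without labels. The six customers are invented for illustration, and KMeans from scikit-learn is just one common clustering algorithm, picked here as an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [orders per year, average order value in dollars]
customers = np.array([
    [2, 20], [3, 25], [1, 15],        # occasional, low-spend shoppers
    [40, 200], [45, 220], [38, 210],  # frequent, high-spend shoppers
])

# No labels are given; KMeans groups similar rows on its own
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
groups = kmeans.fit_predict(customers)

# Each customer gets a group number (which group is 0 or 1 is arbitrary)
print(groups)
```

Notice that we never told the algorithm which customers belong together. We only asked for two groups, and it found them from the numbers alone.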

Reinforcement Learning

The machine learns through trial and error, receiving rewards for correct actions and penalties for mistakes – similar to training a pet or learning to play a game.


Examples: Game-playing AI (like AlphaGo), autonomous vehicles, chatbot optimization
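Trial and error can be sketched in a few lines of plain Python. The two-action "environment" below and its reward probabilities are hypothetical, and the epsilon-greedy strategy is only one simple approach. For simplicity, a mistake earns a reward of 0 rather than an explicit penalty.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical environment: two actions with different chances of a reward
reward_probability = {"left": 0.2, "right": 0.8}

estimates = {"left": 0.0, "right": 0.0}  # the agent's current value guesses
counts = {"left": 0, "right": 0}         # how often each action was tried
epsilon = 0.1                            # explore 10% of the time

for step in range(2000):
    if random.random() < epsilon:
        action = random.choice(list(estimates))     # explore: try something random
    else:
        action = max(estimates, key=estimates.get)  # exploit: use the best guess so far
    reward = 1.0 if random.random() < reward_probability[action] else 0.0
    counts[action] += 1
    # Incremental average: nudge the estimate toward the observed reward
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)
print(counts)
```

Nobody tells the agent which action is better. It works that out from the rewards it collects while trying both.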

Real-World Machine Learning Applications

You interact with machine learning more often than you might think:

  • Social media feeds: Algorithms decide which posts you see first
  • Autonomous vehicles: Self-driving cars navigate using ML algorithms
  • Email filtering: Automatic spam detection and organization
  • Voice assistants: Siri, Alexa, and Google Assistant understand your commands
  • Streaming services: Netflix and Spotify recommendations
  • E-commerce: Product suggestions on Amazon and other platforms

Let's dive deeper with an example

Challenge: Predicting Home Values

Step 1: Import Libraries

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt

What's happening here?

  • pandas - Think of this as Excel for Python. It helps us organize data in tables
  • LinearRegression - This is our AI "brain" that will learn to predict house prices
  • train_test_split - A helper that splits our data into "study material" and "exam questions"
  • matplotlib - For creating charts and graphs (like making a presentation). It isn't needed for the prediction itself

Step 2: Create Sample Data (Our House Examples)

# Sample house data
data = {
    'bedrooms': [2, 3, 4, 2, 3, 4, 5, 3, 2, 4],
    'bathrooms': [1, 2, 3, 1, 2, 2, 3, 2, 1, 3],
    'sqft': [1000, 1500, 2000, 900, 1200, 1800, 2500, 1400, 800, 2200],
    'price': [200000, 300000, 400000, 180000, 250000, 350000, 500000, 280000, 150000, 420000]
}

What's happening here?

  • We're creating a dictionary (like a filing cabinet) with 4 categories
  • Each list contains 10 examples of houses with their features
  • bedrooms - How many bedrooms each house has
  • bathrooms - How many bathrooms each house has
  • sqft - Square footage (house size)
  • price - What each house actually sold for
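Before training anything, it can help to look at the data. The sketch below uses matplotlib to plot house size against price straight from the dictionary. The Agg backend and the output filename sqft_vs_price.png are assumptions chosen so the script runs without opening a window; they are not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render to a file instead of opening a window
from matplotlib import pyplot as plt

# Same sqft and price values as the sample data above
data = {
    'sqft': [1000, 1500, 2000, 900, 1200, 1800, 2500, 1400, 800, 2200],
    'price': [200000, 300000, 400000, 180000, 250000, 350000, 500000, 280000, 150000, 420000]
}

# One dot per house: size on the x-axis, sale price on the y-axis
fig, ax = plt.subplots()
ax.scatter(data['sqft'], data['price'])
ax.set_xlabel("Square footage")
ax.set_ylabel("Price ($)")
ax.set_title("House size vs. price")
fig.savefig("sqft_vs_price.png")  # hypothetical output filename
```

If the dots fall roughly along a straight line, a linear model is a reasonable first choice.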

Step 3: Convert Data to DataFrame (Organize Like a Spreadsheet)

df = pd.DataFrame(data)

What it looks like:

   bedrooms  bathrooms  sqft   price
0         2          1  1000  200000
1         3          2  1500  300000
2         4          3  2000  400000
...
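Once the data is in a DataFrame, a few built-in pandas methods give a quick overview. This is an optional sketch, not a required step of the walkthrough.

```python
import pandas as pd

data = {
    'bedrooms': [2, 3, 4, 2, 3, 4, 5, 3, 2, 4],
    'bathrooms': [1, 2, 3, 1, 2, 2, 3, 2, 1, 3],
    'sqft': [1000, 1500, 2000, 900, 1200, 1800, 2500, 1400, 800, 2200],
    'price': [200000, 300000, 400000, 180000, 250000, 350000, 500000, 280000, 150000, 420000]
}
df = pd.DataFrame(data)

print(df.shape)       # (number of rows, number of columns)
print(df.head(3))     # the first three rows
print(df.describe())  # count, mean, min, max and more for each column

# How strongly each column moves together with price (1.0 = perfectly in step)
price_correlation = df.corr()['price']
print(price_correlation)
```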

Step 4: Separate Features and Target (Input vs Output)

# Features (input) and target (output)
X = df[['bedrooms', 'bathrooms', 'sqft']]
y = df['price']

What's happening here?

  • X (features) = What we KNOW about a house (bedrooms, bathrooms, size)
  • y (target) = What we want to PREDICT (the price)

Real-world analogy:

  • X = The house description you show to a real estate agent
  • y = The price estimate they give you back

Why this separation?

  • The machine needs to learn: "When I see these features (X), the price should be (y)"
  • It's like teaching a child: "When you see these clues, this is the answer"

Step 5: Split Data for Training and Testing (Study vs Exam)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

What's happening here?

  • test_size=0.2 means 20% for testing, 80% for training. With our 10 houses, that's 8 to learn from and 2 held back for the "exam"
  • random_state=42 ensures we get the same split every time (for consistency)
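To see what the split actually produced, you can check the sizes of each part and confirm that the same random_state gives the same split again. This is a small self-contained sketch that repeats the earlier steps.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

data = {
    'bedrooms': [2, 3, 4, 2, 3, 4, 5, 3, 2, 4],
    'bathrooms': [1, 2, 3, 1, 2, 2, 3, 2, 1, 3],
    'sqft': [1000, 1500, 2000, 900, 1200, 1800, 2500, 1400, 800, 2200],
    'price': [200000, 300000, 400000, 180000, 250000, 350000, 500000, 280000, 150000, 420000]
}
df = pd.DataFrame(data)
X = df[['bedrooms', 'bathrooms', 'sqft']]
y = df['price']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training houses: {len(X_train)}, test houses: {len(X_test)}")
print("Rows held back for testing:", list(X_test.index))

# Splitting again with the same random_state picks the same rows
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print("Same split both times:", X_test.index.equals(X_test_again.index))
```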

Step 6: Create the AI Model (Build the Brain)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

What's happening here?

  • LinearRegression() creates an empty "brain" that can learn patterns
  • model.fit() is the "learning" phase - it studies the training data and works out how much each feature (bedrooms, bathrooms, sqft) contributes to the price
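What the model "learned" is not hidden. A linear regression stores one weight per feature plus a starting value, so that price ≈ intercept + w1 × bedrooms + w2 × bathrooms + w3 × sqft. The sketch below repeats the earlier steps and prints those numbers through the coef_ and intercept_ attributes of the fitted model.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = {
    'bedrooms': [2, 3, 4, 2, 3, 4, 5, 3, 2, 4],
    'bathrooms': [1, 2, 3, 1, 2, 2, 3, 2, 1, 3],
    'sqft': [1000, 1500, 2000, 900, 1200, 1800, 2500, 1400, 800, 2200],
    'price': [200000, 300000, 400000, 180000, 250000, 350000, 500000, 280000, 150000, 420000]
}
df = pd.DataFrame(data)
X = df[['bedrooms', 'bathrooms', 'sqft']]
y = df['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

# One learned weight per feature, in the same order as the columns of X
for name, weight in zip(X.columns, model.coef_):
    print(f"{name}: {weight:,.2f}")
print(f"intercept: {model.intercept_:,.2f}")
```

With only 8 training houses and features that rise and fall together, individual weights can look surprising (even negative), so read them with caution.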

Step 7: Make a Prediction (Test the AI)

# Make predictions
new_house = pd.DataFrame([[3, 2, 1600]], columns=['bedrooms', 'bathrooms', 'sqft'])
predictions = model.predict(new_house)

What's happening here?

  • We create a new house with 3 bedrooms, 2 bathrooms, and 1600 sqft
  • model.predict() asks our trained AI: "What do you think this house costs?"
  • The AI uses what it learned to give us an estimated price
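There is nothing mysterious inside model.predict(). For a linear regression, it multiplies each feature by its learned weight and adds the intercept. The sketch below repeats the earlier steps and checks that a calculation "by hand" gives the same number as the model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = {
    'bedrooms': [2, 3, 4, 2, 3, 4, 5, 3, 2, 4],
    'bathrooms': [1, 2, 3, 1, 2, 2, 3, 2, 1, 3],
    'sqft': [1000, 1500, 2000, 900, 1200, 1800, 2500, 1400, 800, 2200],
    'price': [200000, 300000, 400000, 180000, 250000, 350000, 500000, 280000, 150000, 420000]
}
df = pd.DataFrame(data)
X = df[['bedrooms', 'bathrooms', 'sqft']]
y = df['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)

new_house = pd.DataFrame([[3, 2, 1600]], columns=['bedrooms', 'bathrooms', 'sqft'])
by_model = model.predict(new_house)[0]

# The same prediction by hand: intercept + sum of (weight * feature value)
by_hand = model.intercept_ + np.dot(model.coef_, [3, 2, 1600])

print(f"model.predict(): ${by_model:,.2f}")
print(f"by hand:         ${by_hand:,.2f}")
```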

Step 8: Display the Results (Show the Answer)

print("House with 3 bedrooms, 2 bathrooms, 1600 sqft")
print(f"Predicted price: ${predictions[0]:,.2f}")

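Step 9: Check the Model's Score (How Good Is Our AI?)

# score() returns R² (the coefficient of determination) for a regression model
r2 = model.score(X_test, y_test)
print(f"Model R² score: {r2:.2f}")

What's happening here?

  • For regression, model.score() reports R², not classification accuracy. A value of 1.0 is a perfect fit, and 0.0 is no better than always guessing the average price
  • Our test set holds only 2 houses (20% of 10), so treat this number as a rough signal rather than a reliable measurement

What we got

Run python your-code.py and you should see a result like this:

House with 3 bedrooms, 2 bathrooms, 1600 sqft
Predicted price: $311,844.96
Model R² score: 0.98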
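The number returned by model.score() is the R² score, which compares the model's errors with the errors of always guessing the average price. The sketch below repeats the earlier steps, recomputes R² by hand, and adds the mean absolute error (the average miss in dollars) as a second, easier-to-read measure. Using mean_absolute_error here is a suggestion, not part of the original walkthrough.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

data = {
    'bedrooms': [2, 3, 4, 2, 3, 4, 5, 3, 2, 4],
    'bathrooms': [1, 2, 3, 1, 2, 2, 3, 2, 1, 3],
    'sqft': [1000, 1500, 2000, 900, 1200, 1800, 2500, 1400, 800, 2200],
    'price': [200000, 300000, 400000, 180000, 250000, 350000, 500000, 280000, 150000, 420000]
}
df = pd.DataFrame(data)
X = df[['bedrooms', 'bathrooms', 'sqft']]
y = df['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)

y_pred = model.predict(X_test)

# R² by hand: 1 - (model's squared error / squared error of guessing the mean)
ss_res = ((y_test - y_pred) ** 2).sum()
ss_tot = ((y_test - y_test.mean()) ** 2).sum()
r2_by_hand = 1 - ss_res / ss_tot

r2 = r2_score(y_test, y_pred)              # same value as model.score(X_test, y_test)
mae = mean_absolute_error(y_test, y_pred)  # average miss, in dollars

print(f"R² (sklearn): {r2:.2f}")
print(f"R² (by hand): {r2_by_hand:.2f}")
print(f"Mean absolute error: ${mae:,.2f}")
```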

Final thoughts

  • Machine learning isn't magic – it's all about data. More high-quality data generally leads to more accurate results
  • Understanding these fundamentals is the best foundation for diving deeper into your AI journey
  • Every AI application you use daily relies on these core machine learning principles

Ready to explore more AI concepts? Follow along as we continue this journey into the fascinating world of artificial intelligence!

Last updated: Monday, July 21, 2025