Friday, November 14, 2025


Reinforcement Learning: An Introduction with a Simple Game



Reinforcement Learning (RL) is a branch of machine learning where an agent learns to make decisions by interacting with an environment. Instead of being told the correct answers (as in supervised learning), the agent learns through trial and error, guided by rewards or penalties.


1. What Is Reinforcement Learning?


In RL, a system learns by doing:


The agent takes an action.


The environment responds with a new state.


The agent receives a reward (positive or negative).


Over time, the agent learns which actions lead to the greatest total reward.


RL focuses on learning a policy—a strategy that tells the agent the best action to take in each situation.


2. Key Concepts in Reinforcement Learning

a. Agent


The learner or decision-maker (e.g., a robot or a game-playing program).


b. Environment


The world the agent interacts with.


c. State


A description of the current situation.


d. Action


What the agent chooses to do.


e. Reward


A numerical signal from the environment indicating how good the action's immediate outcome was.


f. Policy


A mapping from states to actions (the agent’s strategy).


g. Value Function


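An estimate of the total reward the agent can expect in the long run, starting from a given state (or taking a given action in that state) and following its policy afterwards.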


3. A Simple Game Example: The 1D Treasure Hunt


Let's introduce reinforcement learning with a very simple game:


Game Description


The environment is a 1-dimensional line with 5 positions.


The agent starts at position 1.


The treasure is at position 5.


The agent can move Left or Right.


If the agent reaches position 5, it gets a reward of +10.


Each move costs -1 (to encourage efficient behavior).


Board Layout

[1] - [2] - [3] - [4] - [5]
Start                   Treasure


Goal


Learn the quickest path to the treasure.
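The game is small enough to write down directly. Below is a minimal Python sketch of the environment. It assumes two details the rules above leave open: moving Left from position 1 keeps the agent at position 1, and the move that lands on the treasure earns the -1 step cost plus the +10 bonus (+9 in total). The class name `TreasureHunt` and its `reset`/`step` methods are illustrative names, not part of any library.

```python
from typing import Tuple

START, GOAL = 1, 5
ACTIONS = ("Left", "Right")


class TreasureHunt:
    """A 1D line of positions 1..5 with the treasure at position 5.

    Assumptions (not stated in the post): Left at position 1 leaves the
    agent at position 1, and reaching the treasure pays -1 + 10 = +9.
    """

    def __init__(self) -> None:
        self.position = START

    def reset(self) -> int:
        """Start a new episode at position 1 and return the state."""
        self.position = START
        return self.position

    def step(self, action: str) -> Tuple[int, int, bool]:
        """Apply one move and return (next_state, reward, done)."""
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        move = 1 if action == "Right" else -1
        # Clamp so the agent never leaves positions 1..5.
        self.position = min(max(self.position + move, START), GOAL)
        reward = -1  # every move costs -1
        done = self.position == GOAL
        if done:
            reward += 10  # treasure bonus
        return self.position, reward, done


# Walk straight to the treasure and print each transition.
env = TreasureHunt()
env.reset()
done = False
while not done:
    state, reward, done = env.step("Right")
    print(state, reward, done)
```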


4. How Reinforcement Learning Works in This Game


The agent explores by trying different actions:


At first, it moves Left or Right at random.


Receives –1 for each step.


Receives +10 when it reaches the treasure.


Over many episodes, it learns that moving Right consistently leads to better long-term rewards.
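To see what pure exploration looks like, the sketch below plays a few episodes in which the agent picks Left or Right uniformly at random. It assumes the rules spelled out in its comments (Left at position 1 stays put; reaching position 5 pays -1 + 10 = +9) and caps each episode at 100 moves; `step` and `random_episode` are hypothetical helper names.

```python
import random
from typing import List, Tuple

START, GOAL = 1, 5


def step(position: int, action: str) -> Tuple[int, int, bool]:
    """Return (next_position, reward, done) for one move on the 1..5 line.

    Assumptions: Left at position 1 stays at 1; every move costs -1 and
    landing on the treasure adds +10.
    """
    move = 1 if action == "Right" else -1
    nxt = min(max(position + move, START), GOAL)
    done = nxt == GOAL
    reward = -1 + (10 if done else 0)
    return nxt, reward, done


def random_episode(rng: random.Random, max_steps: int = 100) -> Tuple[List[int], int]:
    """Move Left or Right uniformly at random until the treasure is found."""
    position, total, path = START, 0, [START]
    for _ in range(max_steps):
        position, reward, done = step(position, rng.choice(["Left", "Right"]))
        total += reward
        path.append(position)
        if done:
            break
    return path, total


rng = random.Random(0)  # fixed seed so runs are repeatable
for episode in range(3):
    path, total = random_episode(rng)
    print(f"episode {episode}: {len(path) - 1} moves, total reward {total}")
```

Under these assumptions the best possible total is +6 (three moves at -1, then +9 for the fourth). Random wandering usually takes longer and scores less, and that gap is what learning closes.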


This illustrates the trade-off between exploration and exploitation:


Exploration → trying new actions


Exploitation → using what it has learned to get more reward
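One common way to balance the two, not named in the post, is epsilon-greedy selection: explore with a small probability epsilon, exploit otherwise. The sketch below assumes the Q-values for one state are stored as a dict such as `{"Left": 0.5, "Right": 2.0}`; the function name `epsilon_greedy` is hypothetical.

```python
import random
from typing import Dict

ACTIONS = ("Left", "Right")


def epsilon_greedy(q_row: Dict[str, float], epsilon: float, rng: random.Random) -> str:
    """Pick an action for one state.

    With probability epsilon, explore: choose an action uniformly at random.
    Otherwise, exploit: choose the action with the highest Q-value, breaking
    ties at random so equally valued actions all get tried.
    """
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row[a] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


rng = random.Random(42)
row = {"Left": 0.5, "Right": 2.0}
picks = [epsilon_greedy(row, 0.2, rng) for _ in range(1000)]
# Mostly "Right" (exploitation), with occasional "Left" (exploration).
print(picks.count("Right"), picks.count("Left"))
```

A larger epsilon explores more; a common refinement is to shrink epsilon over time so the agent exploits more once its estimates are reliable.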


5. Using Q-Learning (A Basic RL Algorithm)


Q-learning estimates, for each state, how valuable each action is; the best action in a state is the one with the highest estimate.


Q-Table


We can create a table where:


Rows = states (1 to 5)


Columns = actions (Left, Right)


Cell value = Q-value: the expected total future reward for taking that action in that state and acting well afterwards


Initially, all Q-values start at zero.
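In Python, this Q-table could be sketched as a nested dictionary keyed first by state and then by action. The layout and the helper `print_q_table` are illustrative choices; a 2D array indexed by state and action would work just as well.

```python
from typing import Dict

STATES = range(1, 6)          # positions 1..5
ACTIONS = ("Left", "Right")

# Rows = states, columns = actions; every Q-value starts at zero.
q_table: Dict[int, Dict[str, float]] = {
    s: {a: 0.0 for a in ACTIONS} for s in STATES
}


def print_q_table(q: Dict[int, Dict[str, float]]) -> None:
    """Print one row per state with a column per action."""
    print(f"{'State':>5} " + " ".join(f"{a:>7}" for a in ACTIONS))
    for s, row in q.items():
        print(f"{s:>5} " + " ".join(f"{row[a]:>7.2f}" for a in ACTIONS))


print_q_table(q_table)
```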


Learning Formula (Simplified)


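When the agent takes action A in state S, receives reward r, and moves to state S', update:

$$Q(S, A) \leftarrow r + \max_{a'} Q(S', a')$$

In words: the new value is the immediate reward plus the best Q-value available in the next state (taken as zero when S' is the treasure, because the episode ends there). This is the standard Q-learning update with learning rate α = 1 and discount factor γ = 1; the general form is:

$$Q(S, A) \leftarrow Q(S, A) + \alpha \left[ r + \gamma \max_{a'} Q(S', a') - Q(S, A) \right]$$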



Over many episodes, the Q-values approach the true long-term value of each action, and the best decision in each state is simply the action with the highest Q-value.
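The simplified update can be written as a small function. This is a sketch: `q_update` is a hypothetical name, it assumes a nested-dict Q-table (state → action → value), and it treats the treasure as a terminal state whose future value is zero. The optional `alpha` (learning rate) and `gamma` (discount factor) come from standard Q-learning rather than from the post; both default to 1, which reproduces the simplified rule. The example assumes the final move onto the treasure is worth +9 (the -1 step cost plus the +10 bonus).

```python
from typing import Dict

QTable = Dict[int, Dict[str, float]]


def q_update(q: QTable, s: int, a: str, reward: float, s_next: int,
             done: bool, alpha: float = 1.0, gamma: float = 1.0) -> None:
    """Apply one Q-learning update to q[s][a] in place.

    With alpha = gamma = 1 this is the simplified rule
    Q(S, A) = reward + max over a' of Q(S', a').
    A terminal next state (the treasure) contributes no future value.
    """
    future = 0.0 if done else max(q[s_next].values())
    target = reward + gamma * future
    q[s][a] += alpha * (target - q[s][a])


q = {s: {"Left": 0.0, "Right": 0.0} for s in range(1, 6)}
q_update(q, 4, "Right", 9, 5, done=True)    # last move onto the treasure (assumed +9)
q_update(q, 3, "Right", -1, 4, done=False)  # value flows back one step
print(q[4]["Right"], q[3]["Right"])  # 9.0 8.0
```

This shows how value propagates backwards from the treasure: once position 4 knows that Right is worth 9, position 3 learns that Right is worth -1 + 9 = 8.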


6. Final Learned Behavior


After enough learning, the agent develops this policy:


State   Best Action
1       Right
2       Right
3       Right
4       Right
5       None (goal reached)


The agent has discovered that always moving Right is the fastest way to get the treasure.
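Putting the pieces together, the sketch below trains a Q-table for the treasure hunt and reads off the greedy policy. It is a minimal illustration under stated assumptions, not a reference implementation: Left at position 1 stays put, the move onto the treasure pays +9 (-1 + 10), actions are chosen epsilon-greedily (a common exploration scheme the post does not name), and the hyperparameters (500 episodes, epsilon = 0.2, alpha = gamma = 1) are arbitrary choices.

```python
import random
from typing import Dict, Tuple

START, GOAL = 1, 5
ACTIONS = ("Left", "Right")
QTable = Dict[int, Dict[str, float]]


def step(position: int, action: str) -> Tuple[int, int, bool]:
    """One move. Assumptions: Left at 1 stays at 1; reaching 5 pays -1 + 10."""
    move = 1 if action == "Right" else -1
    nxt = min(max(position + move, START), GOAL)
    done = nxt == GOAL
    return nxt, -1 + (10 if done else 0), done


def train(episodes: int = 500, epsilon: float = 0.2, alpha: float = 1.0,
          gamma: float = 1.0, seed: int = 0) -> QTable:
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q: QTable = {s: {a: 0.0 for a in ACTIONS} for s in range(START, GOAL + 1)}
    for _ in range(episodes):
        s, done = START, False
        while not done:
            # Explore with probability epsilon, otherwise exploit (random tie-break).
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[s].values())
                a = rng.choice([x for x in ACTIONS if q[s][x] == best])
            s_next, r, done = step(s, a)
            # Q-learning update; the treasure is terminal, so it has no future value.
            future = 0.0 if done else max(q[s_next].values())
            q[s][a] += alpha * (r + gamma * future - q[s][a])
            s = s_next
    return q


q = train()
for s in range(START, GOAL):
    print(s, max(ACTIONS, key=lambda a: q[s][a]), q[s])
print(GOAL, "None (goal reached)")
```

Because the game is deterministic and every move costs -1, the learned values for Right should settle at 6, 7, 8 and 9 for positions 1 to 4, so the greedy policy matches the table above.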


7. Why Reinforcement Learning Is Powerful


RL can solve complex problems such as:


Playing chess, Go, or video games


Controlling robots


Optimizing factory processes


Dynamic pricing and recommendation systems


Autonomous driving


Smart energy management


The treasure-hunt example is small, but the same principles power advanced RL systems.


8. Summary


Reinforcement Learning is about learning by doing.

In our simple game:


The agent interacts with an environment.


It learns from rewards and penalties.


It gradually discovers the optimal strategy.


This simple treasure-hunt example demonstrates the foundation of RL algorithms used in real-world applications.

