Getting Started with Reinforcement Learning

This post is an introduction to reinforcement learning (RL) and how you can get started with it. The main takeaways are listed below.

Main takeaways

  1. Chapter 1: What is Reinforcement Learning
  2. Chapter 2: Why we use Reinforcement Learning and why you should learn about it
  3. Chapter 3: Basic terminology
  4. Chapter 4: Prerequisites
  5. Chapter 5: Resources to Learn

Chapter 1: What is Reinforcement Learning

Reinforcement learning is a branch of machine learning that deals with training autonomous agents to take actions in their environments so that the rewards they receive are maximized.

It is mostly used in gaming and robotics, but can be adapted to other areas such as financial markets, medicine, arts or communications.

Some examples are:

1. Training an agent to play a game of chess

2. Training a robot to pick apples from a tree

Chapter 2: Why Reinforcement Learning

In situations where we want the agent to learn on its own, without a pre-collected training dataset, reinforcement learning provides a viable way to train an autonomous agent.

Models like GPT also use a variant of reinforcement learning called RLHF, short for Reinforcement Learning from Human Feedback, in which the chatbot's responses are rated by human trainers and those ratings are fed back into training the model.

Sometimes it is hard to capture every scenario in the training data for a machine learning model. Reinforcement learning addresses this limitation by letting the agent explore many different scenarios, or states, of a given environment through its own interaction. It can even get into situations that are very rare and difficult to capture in real-world training data. Hence this technique provides an effective way of training an agent to act smartly in a given environment.

One of the most notable uses of RL came when DeepMind, now a subsidiary of Google, launched AlphaGo, a program trained to play the game of Go using RL. It was so good that it defeated Lee Sedol, a multiple-time world champion and one of the top players of that time. A Netflix documentary of the same name, AlphaGo, was a key contributor to my interest in this field.

AlphaGo and its later variants were trained using RL techniques in which an agent learns by playing millions of games against itself.

I hope this is enough motivation for you to get started in this very interesting field of Reinforcement Learning.

Let's move on to Chapter 3 to learn some basic terms.

Chapter 3: Basic Terminology

Some of the basic terms that are used in the field of RL are:

  1. Agent
  2. Environment
  3. States
  4. Episode
  5. Rewards
  6. Policy
  7. Policy Parameters
  8. Policy Search
  9. Policy Space

1. Agent: 

The agent is the autonomous actor or player that takes an action at each time step in an environment. It moves through the environment's states and tries to choose, in each one, the action that maximizes the rewards it gets.

2. Environment:

The environment is the overall world in which the agent takes actions. For a game-playing agent, the environment is the game it is learning to play. It is one of the key components of, and a prerequisite for, training an agent: you need an environment before you can start training.

3. States:

A state is a distinct configuration the environment can be in at a particular point in time. In a game, the state typically changes at each time step as the player progresses. In each state the agent chooses an action. An environment can have a finite or an infinite number of states, and the set of all possible states is referred to as the state space.
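To make the idea of a state space concrete, here is a minimal sketch. It assumes a hypothetical 3x3 grid world in which the state is simply the agent's (row, column) position; the grid size, the action names and the `next_state` helper are illustrative assumptions, not part of any library.

```python
from itertools import product

# Hypothetical 3x3 grid world: a state is the agent's (row, col) position.
GRID_SIZE = 3
ACTIONS = ["up", "down", "left", "right"]

# A finite state space can be enumerated explicitly.
state_space = list(product(range(GRID_SIZE), repeat=2))

def next_state(state, action):
    """Return the state reached by taking `action`; walls keep the agent in place."""
    row, col = state
    if action == "up":
        row = max(row - 1, 0)
    elif action == "down":
        row = min(row + 1, GRID_SIZE - 1)
    elif action == "left":
        col = max(col - 1, 0)
    elif action == "right":
        col = min(col + 1, GRID_SIZE - 1)
    return (row, col)

print(len(state_space))              # 9 distinct states
print(next_state((0, 0), "right"))   # (0, 1)
```

A continuous quantity, such as the joint angle of a robot arm, would instead give an infinite state space that cannot be listed this way.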

4. Episode:

For an agent to learn optimal actions, it needs to experience the environment multiple times, so that each time it can use past knowledge to improve its actions. Each play-through of an environment, from start to finish, is called an episode.

5. Rewards:

These are the points an agent gets from its environment each time it takes an action. A reward can be positive or negative, for example the score points in a game of Pac-Man.

The goal of an agent is to maximize the total reward it collects over each episode.
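The terms so far (agent, environment, state, episode, reward) fit together in a simple loop. The sketch below is one way to picture it, assuming a made-up corridor environment where the agent starts at cell 0 and must reach cell 4. The class `CorridorEnv`, its reward values and the two example agents are hypothetical and chosen only for illustration.

```python
import random

class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the goal at cell 4."""

    GOAL = 4
    MAX_STEPS = 20

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position              # the initial state

    def step(self, action):
        # action is -1 (left) or +1 (right); the left wall blocks movement
        self.position = max(0, self.position + action)
        self.steps += 1
        reached_goal = self.position == self.GOAL
        reward = 10 if reached_goal else -1   # positive at the goal, negative otherwise
        done = reached_goal or self.steps >= self.MAX_STEPS
        return self.position, reward, done

def run_episode(env, choose_action):
    """Play one episode and return the total reward collected."""
    state = env.reset()
    total_reward = 0
    done = False
    while not done:
        action = choose_action(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward

def random_agent(state):
    return random.choice([-1, 1])

def always_right_agent(state):
    return 1

env = CorridorEnv()
print(run_episode(env, always_right_agent))   # 7: three steps at -1, then +10 at the goal
for episode in range(3):
    print(episode, run_episode(env, random_agent))
```

The random agent usually collects a lower total reward than the agent that always moves right, which is the gap that learning is meant to close.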

The next few terms are related to the policy.

6. Policy:

A policy is the rule the agent uses to choose an action in each state so as to maximize the rewards. It can be deterministic or stochastic, and it is often represented by a neural network.
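As a rough illustration of the difference between a deterministic and a stochastic policy, consider the sketch below. It assumes a state that is a single integer position and two actions; the threshold of 4 and the probabilities are arbitrary values picked for the example.

```python
import random

ACTIONS = ["left", "right"]

def deterministic_policy(state):
    """Always maps the same state to the same action."""
    return "right" if state < 4 else "left"

def stochastic_policy(state, rng=random):
    """Samples an action from a probability distribution that depends on the state."""
    p_right = 0.9 if state < 4 else 0.1
    return rng.choices(ACTIONS, weights=[1 - p_right, p_right])[0]

print(deterministic_policy(2))                    # same answer on every call
print([stochastic_policy(2) for _ in range(5)])   # mostly "right", occasionally "left"
```

A neural network policy typically plays the role of `stochastic_policy` here: the network outputs the action probabilities instead of a hand-written rule.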

7. Policy Parameters: 

Policy parameters are the values you can tweak to optimize the policy in order to maximize the rewards.

For example, in the case of a neural network policy, these are the weights and biases of the network.
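To show what weights and biases look like as policy parameters, here is a minimal sketch of a one-neuron "network" written in plain Python. The two-feature state, the parameter values and the function names are assumptions made for the example; a real policy network would have many more parameters.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def probability_of_right(state, weights, bias):
    """One-neuron policy: dot product of weights and state features, plus a bias."""
    z = sum(w * s for w, s in zip(weights, state)) + bias
    return sigmoid(z)

state = [0.5, -1.0]   # hypothetical two-feature state

# Two different settings of the policy parameters give two different behaviours.
print(probability_of_right(state, weights=[0.0, 0.0], bias=0.0))    # 0.5: no preference
print(probability_of_right(state, weights=[2.0, -1.0], bias=0.5))   # leans towards "right"
```

Changing `weights` and `bias` changes how the agent behaves in the same state, which is exactly what training adjusts.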

8. Policy Search:

Policy search is the mechanism used to tweak the policy parameters and find an optimal policy for the agent, for example by training a neural network.
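Neural network training is too long to show here, so the sketch below swaps in a much simpler form of policy search: random-perturbation hill climbing. It assumes a toy task in which the agent starts at position 0 and is penalized at every step by its distance from a target position. The task, the linear policy and all names are hypothetical.

```python
import random

TARGET = 3
HORIZON = 10

def policy(state, params):
    """Linear policy: step right (+1) if w * state + b > 0, else step left (-1)."""
    w, b = params
    return 1 if w * state + b > 0 else -1

def evaluate(params):
    """Total reward of one episode: each step costs the distance to the target."""
    state, total_reward = 0, 0
    for _ in range(HORIZON):
        state += policy(state, params)
        total_reward -= abs(state - TARGET)
    return total_reward

def hill_climb(iterations=200, step_size=0.5, seed=0):
    """Perturb the parameters at random and keep a change only if it scores better."""
    rng = random.Random(seed)
    best_params = (0.0, 0.0)
    best_score = evaluate(best_params)
    for _ in range(iterations):
        candidate = tuple(p + rng.gauss(0, step_size) for p in best_params)
        score = evaluate(candidate)
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score

print(evaluate((0.0, 0.0)))   # score of the untrained starting policy
print(hill_climb())           # parameters and score after the search
```

Gradient-based training of a neural network follows the same outline (evaluate, adjust parameters, repeat) but uses gradients rather than random perturbations to decide how to adjust.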

9. Policy Space: 

The set of all values the policy parameters can take is the policy space of the agent. It is usually too large to try out every possible value by brute force, which is why neural networks are widely used these days to learn a good policy.
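A quick way to see why brute force breaks down is to count policies in a very small setting. The sketch below assumes a tabular, deterministic policy, where the "parameters" are simply the action chosen in each state, and two possible actions. This setup is an illustrative assumption.

```python
from itertools import product

ACTIONS = ["left", "right"]

def all_deterministic_policies(num_states):
    """Yield every table that assigns one action to each state."""
    for choice in product(ACTIONS, repeat=num_states):
        yield dict(enumerate(choice))

# With 3 states and 2 actions there are 2 ** 3 = 8 possible policies.
policies = list(all_deterministic_policies(3))
print(len(policies))

# The count grows exponentially with the number of states.
for num_states in (10, 100):
    print(num_states, len(ACTIONS) ** num_states)
```

Even with only two actions, 100 states already give far more policies than could ever be tried one by one.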

There are more terms you will need to know, but these are the ones that will get you started and help you learn more as you move along your learning journey.

Now let’s move on to Chapter 4: the prerequisites.

Chapter 4: Prerequisites 

Essential concepts that will help you grasp RL theory much faster are:

  1. Linear Algebra
    1. Especially matrices, matrix multiplication, dot products, etc.
  2. Probability and Statistics
    1. Fundamental to modeling uncertainty in any area of machine learning
  3. Implementation of Neural Networks and Deep Learning using TensorFlow or PyTorch
    1. Even if you don't completely understand neural networks in depth yet, you must know how to implement deep neural networks with multiple layers using a framework like TensorFlow or PyTorch
  4. Python
    1. You need to know a programming language in order to implement your policy search algorithms
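As a small taste of how these prerequisites connect, the sketch below computes one fully connected neural network layer using nothing but dot products in plain Python. The function names and the numbers are made up for illustration; in practice a framework such as TensorFlow or PyTorch does this work for you.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector using one dot product per row."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def dense_layer(inputs, weights, biases):
    """One fully connected layer with a ReLU activation: relu(W x + b)."""
    pre_activation = [z + b for z, b in zip(matvec(weights, inputs), biases)]
    return [max(0.0, z) for z in pre_activation]

x = [1.0, 2.0]                                # a two-feature input
W = [[1.0, 0.0], [0.5, -1.0], [2.0, 1.0]]     # 3 x 2 weight matrix
b = [0.0, 0.0, -1.0]

print(dense_layer(x, W, b))   # [1.0, 0.0, 3.0]
```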

In the next chapter we will cover the resources you can use to get started.

Chapter 5: Resources to get started

In my opinion, one of the best ways to learn anything is through books, so the first few resources I am going to suggest are the following books.

Books

  1. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron
  2. Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
  3. Algorithms for Reinforcement Learning by Csaba Szepesvári

YouTube videos

David Silver – Reinforcement Learning Course

Andrej Karpathy

Two Minute Papers

Courses

Coursera – Reinforcement Learning Specialization

Udacity – Deep Reinforcement Learning Nanodegree

Blogs

OpenAI – Spinning Up in Deep RL

Research papers

arXiv

NeurIPS

Journal of Machine Learning Research (JMLR)

Conclusion

Reinforcement learning is, to me, one of the most interesting areas of AI. It is a key ingredient in training ChatGPT, and it was the key behind AlphaGo defeating Lee Sedol in the game of Go. DeepMind, the same lab, also built AlphaFold to predict how proteins fold, although AlphaFold relies mainly on supervised deep learning rather than RL. There are numerous ways RL can be used, and it is a powerful approach for creating autonomous agents. I hope this post was helpful in piquing your interest in this field. I am still learning, so not all the information presented above may be accurate. I will keep creating more posts as my learning progresses.
