What algorithms are considered reinforcement learning algorithms?
What areas belong to reinforcement learning?
TD(0), Q-learning and SARSA are all temporal-difference algorithms, which belong to the reinforcement learning area, but is there more to it?
Are the dynamic programming algorithms policy iteration and value iteration considered part of reinforcement learning? Or are they just the basis for the temporal-difference algorithms, which would then be the only RL algorithms?
reinforcement-learning terminology definitions
edited Apr 30 at 14:58 by nbro
asked Apr 30 at 14:26 by Miguel Saraiva
2 Answers
The dynamic programming algorithms (like policy iteration and value iteration) are often presented in the context of reinforcement learning (in particular, in the book Reinforcement Learning: An Introduction by Sutton and Barto) because they are closely related to reinforcement learning algorithms like $Q$-learning. They all assume that the environment can be modelled as an MDP.
However, dynamic programming algorithms require that the underlying MDP (that is, the associated transition and reward functions) is known. Hence, they are often referred to as "planning" algorithms: they find a policy (which can be thought of as a "plan") given the "dynamics" of the environment, which are represented by the MDP. They simply exploit the given "physical rules" of the environment to compute a policy, without ever interacting with it, and this kind of computation is what is meant by planning.
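To make the "planning" view concrete, here is a minimal sketch of value iteration. It assumes a hypothetical 4-state chain MDP (move left or right, reward 1 for reaching the terminal state 3, discount 0.9); `build_model`, `backup` and `value_iteration` are illustrative names, not from any library. Note that the algorithm never calls an environment: every backup reads probabilities and rewards directly from the given model `P`.

```python
# Value iteration on a hypothetical 4-state chain MDP (illustrative, not from any library).
# The model P is handed to the algorithm up front: this is planning, not learning.

GAMMA = 0.9
N_STATES = 4          # states 0..3; state 3 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def build_model():
    """Return P[s][a] = list of (probability, next_state, reward, done) tuples."""
    P = {}
    for s in range(N_STATES):
        P[s] = {}
        for a in ACTIONS:
            if s == N_STATES - 1:
                # Terminal state: absorbing, no further reward.
                P[s][a] = [(1.0, s, 0.0, True)]
                continue
            s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
            done = s_next == N_STATES - 1
            P[s][a] = [(1.0, s_next, 1.0 if done else 0.0, done)]
    return P


def backup(P, V, s, a, gamma):
    """Expected one-step return of action a in state s, computed from the known model."""
    return sum(p * (r + (0.0 if done else gamma * V[s2])) for p, s2, r, done in P[s][a])


def value_iteration(P, gamma=GAMMA, tol=1e-10):
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            best = max(backup(P, V, s, a, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    # Extract the greedy policy from the converged value function.
    policy = [max(ACTIONS, key=lambda a: backup(P, V, s, a, gamma)) for s in range(N_STATES)]
    return V, policy


V, policy = value_iteration(build_model())
print([round(v, 3) for v in V])  # [0.81, 0.9, 1.0, 0.0]
print(policy[:3])                # [1, 1, 1]: always move right
```

The in-place update is a design choice that usually converges in fewer sweeps than keeping a separate copy of `V`; both variants converge to the same values here.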
On the other hand, $Q$-learning and similar algorithms do not require the MDP to be known. They attempt to find a policy (or value function) by interacting with the environment, so the "dynamics" of the underlying MDP enter only through experience (that is, the interaction with the environment). Model-free methods like $Q$-learning never build an explicit model of those dynamics; the dynamics are reflected implicitly in the learned value estimates.
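For contrast, here is a minimal tabular $Q$-learning sketch on the same kind of hypothetical chain. The function `step` stands in for an environment the agent can call but not inspect (it is an assumption for illustration, not a library API), and the hyperparameters (`alpha=0.1`, `epsilon=0.1`, 500 episodes) are arbitrary choices. The agent only ever sees sampled transitions $(s, a, r, s')$.

```python
import random

# Tabular Q-learning on a hypothetical 4-state chain. The agent can only call step();
# it never sees transition probabilities or the reward function.

N_STATES = 4          # states 0..3; state 3 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(s, a):
    """Hypothetical environment simulator: returns (next_state, reward, done)."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy behaviour policy; ties are broken at random.
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.choice(ACTIONS)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next, r, done = step(s, a)   # learn from a sampled transition only
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q


Q = q_learning()
# Greedy action in each non-terminal state after training (expected: move right everywhere).
print([0 if Q[s][0] > Q[s][1] else 1 for s in range(N_STATES - 1)])
```

The greedy policy it learns matches the one value iteration computes from the model, but here it is obtained purely from interaction.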
If the MDP is not given, the problem is often referred to as the (full) "reinforcement learning problem". So algorithms like $Q$-learning or SARSA are usually considered reinforcement learning algorithms. The dynamic programming algorithms (like policy iteration) do not solve the full RL problem, so they are not always considered RL algorithms, but rather just planning algorithms.
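The contrast can be written compactly. Both families rest on the Bellman optimality equation for the action-value function (notation as in Sutton and Barto, where $p(s', r \mid s, a)$ is the dynamics function of the MDP):

$$q_*(s,a) = \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma \max_{a'} q_*(s', a')\right]$$

Value iteration (written here in its action-value form) turns this equation directly into an update, which is only possible if $p$ is known:

$$q_{k+1}(s,a) = \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma \max_{a'} q_k(s', a')\right]$$

$Q$-learning drops the sum over $p$ and instead moves its estimate toward a single sampled transition $(S_t, A_t, R_{t+1}, S_{t+1})$, with step size $\alpha$:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\left[R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t)\right]$$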
There are several categories of RL algorithms: temporal-difference, Monte Carlo, actor-critic, model-free, model-based, on-policy, off-policy, prediction, control, policy-based and value-based algorithms. These categories can overlap. For example, $Q$-learning is a temporal-difference (TD), model-free, off-policy, control and value-based algorithm: it is based on a temporal-difference update rule (TD), it doesn't use a model of the environment (model-free), it follows a behaviour policy that is different from the policy it learns about (off-policy), it is used to find a policy (control), and it approximates a value function rather than the policy directly (value-based).
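The off-policy label is easiest to see by comparing the TD target used by $Q$-learning with the one used by SARSA, its on-policy counterpart. This is a minimal sketch assuming a tabular action-value function stored as `Q[state][action]`; the function names and the numbers in the example are hypothetical.

```python
# Compare the TD targets of Q-learning (off-policy) and SARSA (on-policy).
# Q is a hypothetical tabular action-value function: Q[state][action].


def q_learning_target(Q, r, s_next, done, gamma=0.9):
    """Off-policy: bootstrap from the greedy action in s_next, whatever is done next."""
    return r if done else r + gamma * max(Q[s_next])


def sarsa_target(Q, r, s_next, a_next, done, gamma=0.9):
    """On-policy: bootstrap from the action a_next the behaviour policy actually picked."""
    return r if done else r + gamma * Q[s_next][a_next]


Q = [[0.0, 0.0], [0.2, 1.0]]
# Reward 0 on the way into state 1; an exploratory step then selects action 0 there.
print(q_learning_target(Q, 0.0, 1, False))   # 0.9   (uses the max over actions in state 1)
print(sarsa_target(Q, 0.0, 1, 0, False))     # ~0.18 (uses the exploratory action 0)
```

Because $Q$-learning ignores which action the behaviour policy takes next, it learns about the greedy policy while exploring; SARSA's target depends on the exploratory choice, so it learns about the policy it is actually following.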
answered Apr 30 at 14:54 by nbro (edited Apr 30 at 15:25)
Can you further explain your last sentence? Are you talking about variations like TD(λ) or policy gradient methods, or something else? Also, do you have any recommendations on where I can read further about RL, given that I have already explored the Sutton and Barto book? – Miguel Saraiva, Apr 30 at 14:59
@MiguelSaraiva I have updated my answer. I would recommend that you read that book again, more carefully, and that you start implementing some of those algorithms to fully understand the concepts. It is a decent book; however, RL comprises a lot of concepts and dense terminology, which often confuse beginners. Also have a look at this question: ai.stackexchange.com/q/6997/2444, in particular my answer: ai.stackexchange.com/a/7005/2444. – nbro, Apr 30 at 15:19
I have already seen some of the videos in that series by Silver, and they are indeed good. Thank you for your help. – Miguel Saraiva, Apr 30 at 15:54
In Reinforcement Learning: An Introduction, the authors suggest that the topic of reinforcement learning covers the analysis and solution of problems that can be framed in this way:
"Reinforcement learning, like many topics whose names end with “ing,” such as machine learning and mountaineering, is simultaneously a problem, a class of solution methods that work well on the problem, and the field that studies this problem and its solution methods. It is convenient to use a single name for all three things, but at the same time essential to keep the three conceptually separate. In particular, the distinction between problems and solution methods is very important in reinforcement learning; failing to make this distinction is the source of many confusions."
And:
"Markov decision processes are intended to include just these three aspects—sensation, action, and goal—in their simplest possible forms without trivializing any of them. Any method that is well suited to solving such problems we consider to be a reinforcement learning method."
So, to answer your question, the simplest take on this is: yes, there is more (much more) to RL than the classic value-based optimal control methods SARSA and Q-learning.
Including DP and other "RL-related" algorithms in the book allows the authors to show how closely related the concepts are. For example, in practice there is little that differentiates Dyna-Q (a planning algorithm closely related to Q-learning) from experience replay. Calling one strictly "planning" and the other "reinforcement learning", and treating them as separate, can reduce insight into the topic. In many cases there are hybrid methods, or even a continuum between what you might initially think of as RL and "not RL" approaches. Understanding this gives you a toolkit to modify and invent algorithms.
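A toy comparison illustrates the Dyna-Q point. This sketch shows only the planning loop of Dyna-Q (full Dyna-Q also performs direct Q-learning updates and learns the model online), using a table model that stores the last observed outcome of each state-action pair, next to an experience-replay loop over a buffer of stored transitions. The transitions, function names and seed are hypothetical.

```python
import random

# Toy comparison of a Dyna-Q planning loop and experience replay (illustrative sketch).
# Both re-apply the same Q-learning update to transitions the agent has already seen.


def q_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def dyna_q_planning(Q, model, n, rng):
    """Dyna-Q planning: sample a previously seen (s, a) and query a learned table model."""
    keys = list(model)
    for _ in range(n):
        s, a = rng.choice(keys)
        r, s_next, done = model[(s, a)]
        q_update(Q, s, a, r, s_next, done)


def experience_replay(Q, buffer, n, rng):
    """Experience replay: sample a stored real transition directly."""
    for _ in range(n):
        s, a, r, s_next, done = rng.choice(buffer)
        q_update(Q, s, a, r, s_next, done)


# Hypothetical transitions from a deterministic 4-state chain: (s, a, r, s_next, done).
transitions = [(0, 1, 0.0, 1, False), (1, 1, 0.0, 2, False), (2, 1, 1.0, 3, True)]
model = {(s, a): (r, s2, d) for s, a, r, s2, d in transitions}

Q_dyna = [[0.0, 0.0] for _ in range(4)]
Q_replay = [[0.0, 0.0] for _ in range(4)]
dyna_q_planning(Q_dyna, model, 1000, random.Random(42))
experience_replay(Q_replay, transitions, 1000, random.Random(42))
print(Q_dyna == Q_replay)  # True: same samples, same updates
```

The two loops are exactly equivalent here only because the environment is deterministic, the model simply memorises observed outcomes, and the buffer holds one copy of each transition. With stochastic dynamics, a model that generalises, or a buffer containing duplicates, the sampling schemes (and results) diverge, which is where the "continuum" between planning and learning becomes interesting.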
Having said that, the book is not the sole arbiter of what is and isn't reinforcement learning. Ultimately this is just a classification issue, and it only matters if you are communicating with someone and there is a chance for misunderstanding. If you name which algorithm you are using, it doesn't really matter whether the person you are talking to thinks it is RL or not RL. It matters what the problem is and how you propose to solve it.
$endgroup$
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "658"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Miguel Saraiva is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fai.stackexchange.com%2fquestions%2f12065%2fwhat-algorithms-are-considered-reinforcement-learning-algorithms%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
The dynamic programming algorithms (like policy iteration and value iteration) are often presented in the context of reinforcement learning (in particular, in the book Reinforcement Learning: An Introduction by Barto and Sutton) because they are very related to reinforcement learning algorithms, like $Q$-learning. They are all based on the assumption that the environment can be modelled as an MDP.
However, dynamic programming algorithms require that the underlying MDP (that is, the associated transition and reward functions) is known. Hence, they are often referred to as "planning" algorithms, because they can be used to find a policy (which can be thought of as "plan") given the "dynamics" of the environment (which is represented by the MDP). They just exploit the given "physical rules" of the environment, in order to find a policy. This "exploitation" is referred to as a "planning algorithm".
On the other hand, $Q$-learning and similar algorithms do not require that the MDP is known. They attempt to find a policy (or value function) by interacting with the environment. They eventually infer the "dynamics" of the underlying MDP from experience (that is, the interaction with the environment).
If the MDP is not given, the problem is often referred to as (full) "reinforcement learning problem". So, algorithms like $Q$-learning or SARSA are often considered reinforcement learning algorithms. The dynamic programming algorithms (like policy iteration) do not solve the "full RL problem", hence they are not always considered RL algorithms, but just planning algorithms.
There are several categories of RL algorithms. There are temporal-difference, Monte-Carlo, actor-critic, model-free, model-based, on-policy, off-policy, prediction, control, policy-based or value-based algorithms. These categories can overlap. For example, $Q$-learning is a temporal-difference (TD), model-free, off-policy, control and value-based algorithm: it is based on an "temporal-difference" update rule (TD), it doesn't use a model of the environment (model-free), it uses a behavioural policy that is different than the policy it learns (off-policy), it is used to find a policy (control) and it attempts to approximate a value function rather than directly the policy (value-based).
$endgroup$
$begingroup$
Can you further explain your last sentence ? are you talking about variations like TD[lambda] or policy gradient methods? or something else? Also do you have any recommendation of where I can further read about RL, knowing I already explored the Barto and Sutton book?
$endgroup$
– Miguel Saraiva
Apr 30 at 14:59
1
$begingroup$
@MiguelSaraiva I have updated my answer. I would recommend that you read that book again and more carefully (and that you start implementing some of those algorithms to get full understanding of those concepts). This is a decent book. However, RL comprises a lot of concepts and dense terminology, which often confuse beginners. Have also a look at this question: ai.stackexchange.com/q/6997/2444, in particular, my answer: ai.stackexchange.com/a/7005/2444.
$endgroup$
– nbro
Apr 30 at 15:19
$begingroup$
I have seen some of the videos in that series by Silver already and they are indeed good. Thank you for your help.
$endgroup$
– Miguel Saraiva
Apr 30 at 15:54
add a comment |
$begingroup$
The dynamic programming algorithms (like policy iteration and value iteration) are often presented in the context of reinforcement learning (in particular, in the book Reinforcement Learning: An Introduction by Barto and Sutton) because they are very related to reinforcement learning algorithms, like $Q$-learning. They are all based on the assumption that the environment can be modelled as an MDP.
However, dynamic programming algorithms require that the underlying MDP (that is, the associated transition and reward functions) is known. Hence, they are often referred to as "planning" algorithms, because they can be used to find a policy (which can be thought of as "plan") given the "dynamics" of the environment (which is represented by the MDP). They just exploit the given "physical rules" of the environment, in order to find a policy. This "exploitation" is referred to as a "planning algorithm".
On the other hand, $Q$-learning and similar algorithms do not require that the MDP is known. They attempt to find a policy (or value function) by interacting with the environment. They eventually infer the "dynamics" of the underlying MDP from experience (that is, the interaction with the environment).
If the MDP is not given, the problem is often referred to as (full) "reinforcement learning problem". So, algorithms like $Q$-learning or SARSA are often considered reinforcement learning algorithms. The dynamic programming algorithms (like policy iteration) do not solve the "full RL problem", hence they are not always considered RL algorithms, but just planning algorithms.
There are several categories of RL algorithms. There are temporal-difference, Monte-Carlo, actor-critic, model-free, model-based, on-policy, off-policy, prediction, control, policy-based or value-based algorithms. These categories can overlap. For example, $Q$-learning is a temporal-difference (TD), model-free, off-policy, control and value-based algorithm: it is based on an "temporal-difference" update rule (TD), it doesn't use a model of the environment (model-free), it uses a behavioural policy that is different than the policy it learns (off-policy), it is used to find a policy (control) and it attempts to approximate a value function rather than directly the policy (value-based).
$endgroup$
$begingroup$
Can you further explain your last sentence ? are you talking about variations like TD[lambda] or policy gradient methods? or something else? Also do you have any recommendation of where I can further read about RL, knowing I already explored the Barto and Sutton book?
$endgroup$
– Miguel Saraiva
Apr 30 at 14:59
1
$begingroup$
@MiguelSaraiva I have updated my answer. I would recommend that you read that book again and more carefully (and that you start implementing some of those algorithms to get full understanding of those concepts). This is a decent book. However, RL comprises a lot of concepts and dense terminology, which often confuse beginners. Have also a look at this question: ai.stackexchange.com/q/6997/2444, in particular, my answer: ai.stackexchange.com/a/7005/2444.
$endgroup$
– nbro
Apr 30 at 15:19
$begingroup$
I have seen some of the videos in that series by Silver already and they are indeed good. Thank you for your help.
$endgroup$
– Miguel Saraiva
Apr 30 at 15:54
add a comment |
$begingroup$
The dynamic programming algorithms (like policy iteration and value iteration) are often presented in the context of reinforcement learning (in particular, in the book Reinforcement Learning: An Introduction by Barto and Sutton) because they are very related to reinforcement learning algorithms, like $Q$-learning. They are all based on the assumption that the environment can be modelled as an MDP.
However, dynamic programming algorithms require that the underlying MDP (that is, the associated transition and reward functions) is known. Hence, they are often referred to as "planning" algorithms, because they can be used to find a policy (which can be thought of as "plan") given the "dynamics" of the environment (which is represented by the MDP). They just exploit the given "physical rules" of the environment, in order to find a policy. This "exploitation" is referred to as a "planning algorithm".
On the other hand, $Q$-learning and similar algorithms do not require that the MDP is known. They attempt to find a policy (or value function) by interacting with the environment. They eventually infer the "dynamics" of the underlying MDP from experience (that is, the interaction with the environment).
If the MDP is not given, the problem is often referred to as (full) "reinforcement learning problem". So, algorithms like $Q$-learning or SARSA are often considered reinforcement learning algorithms. The dynamic programming algorithms (like policy iteration) do not solve the "full RL problem", hence they are not always considered RL algorithms, but just planning algorithms.
There are several categories of RL algorithms. There are temporal-difference, Monte-Carlo, actor-critic, model-free, model-based, on-policy, off-policy, prediction, control, policy-based or value-based algorithms. These categories can overlap. For example, $Q$-learning is a temporal-difference (TD), model-free, off-policy, control and value-based algorithm: it is based on an "temporal-difference" update rule (TD), it doesn't use a model of the environment (model-free), it uses a behavioural policy that is different than the policy it learns (off-policy), it is used to find a policy (control) and it attempts to approximate a value function rather than directly the policy (value-based).
$endgroup$
The dynamic programming algorithms (like policy iteration and value iteration) are often presented in the context of reinforcement learning (in particular, in the book Reinforcement Learning: An Introduction by Barto and Sutton) because they are very related to reinforcement learning algorithms, like $Q$-learning. They are all based on the assumption that the environment can be modelled as an MDP.
However, dynamic programming algorithms require that the underlying MDP (that is, the associated transition and reward functions) is known. Hence, they are often referred to as "planning" algorithms, because they can be used to find a policy (which can be thought of as "plan") given the "dynamics" of the environment (which is represented by the MDP). They just exploit the given "physical rules" of the environment, in order to find a policy. This "exploitation" is referred to as a "planning algorithm".
On the other hand, $Q$-learning and similar algorithms do not require that the MDP is known. They attempt to find a policy (or value function) by interacting with the environment. They eventually infer the "dynamics" of the underlying MDP from experience (that is, the interaction with the environment).
If the MDP is not given, the problem is often referred to as (full) "reinforcement learning problem". So, algorithms like $Q$-learning or SARSA are often considered reinforcement learning algorithms. The dynamic programming algorithms (like policy iteration) do not solve the "full RL problem", hence they are not always considered RL algorithms, but just planning algorithms.
There are several categories of RL algorithms. There are temporal-difference, Monte-Carlo, actor-critic, model-free, model-based, on-policy, off-policy, prediction, control, policy-based or value-based algorithms. These categories can overlap. For example, $Q$-learning is a temporal-difference (TD), model-free, off-policy, control and value-based algorithm: it is based on an "temporal-difference" update rule (TD), it doesn't use a model of the environment (model-free), it uses a behavioural policy that is different than the policy it learns (off-policy), it is used to find a policy (control) and it attempts to approximate a value function rather than directly the policy (value-based).
edited Apr 30 at 15:25
answered Apr 30 at 14:54
nbronbro
2,7701726
2,7701726
$begingroup$
Can you further explain your last sentence ? are you talking about variations like TD[lambda] or policy gradient methods? or something else? Also do you have any recommendation of where I can further read about RL, knowing I already explored the Barto and Sutton book?
$endgroup$
– Miguel Saraiva
Apr 30 at 14:59
1
$begingroup$
@MiguelSaraiva I have updated my answer. I would recommend that you read that book again and more carefully (and that you start implementing some of those algorithms to get full understanding of those concepts). This is a decent book. However, RL comprises a lot of concepts and dense terminology, which often confuse beginners. Have also a look at this question: ai.stackexchange.com/q/6997/2444, in particular, my answer: ai.stackexchange.com/a/7005/2444.
$endgroup$
– nbro
Apr 30 at 15:19
$begingroup$
I have seen some of the videos in that series by Silver already and they are indeed good. Thank you for your help.
$endgroup$
– Miguel Saraiva
Apr 30 at 15:54
add a comment |
$begingroup$
Can you further explain your last sentence ? are you talking about variations like TD[lambda] or policy gradient methods? or something else? Also do you have any recommendation of where I can further read about RL, knowing I already explored the Barto and Sutton book?
$endgroup$
– Miguel Saraiva
Apr 30 at 14:59
1
$begingroup$
@MiguelSaraiva I have updated my answer. I would recommend that you read that book again and more carefully (and that you start implementing some of those algorithms to get full understanding of those concepts). This is a decent book. However, RL comprises a lot of concepts and dense terminology, which often confuse beginners. Have also a look at this question: ai.stackexchange.com/q/6997/2444, in particular, my answer: ai.stackexchange.com/a/7005/2444.
$endgroup$
– nbro
Apr 30 at 15:19
$begingroup$
I have seen some of the videos in that series by Silver already and they are indeed good. Thank you for your help.
$endgroup$
– Miguel Saraiva
Apr 30 at 15:54
$begingroup$
Can you further explain your last sentence ? are you talking about variations like TD[lambda] or policy gradient methods? or something else? Also do you have any recommendation of where I can further read about RL, knowing I already explored the Barto and Sutton book?
$endgroup$
– Miguel Saraiva
Apr 30 at 14:59
$begingroup$
Can you further explain your last sentence ? are you talking about variations like TD[lambda] or policy gradient methods? or something else? Also do you have any recommendation of where I can further read about RL, knowing I already explored the Barto and Sutton book?
$endgroup$
– Miguel Saraiva
Apr 30 at 14:59
1
1
$begingroup$
@MiguelSaraiva I have updated my answer. I would recommend that you read that book again and more carefully (and that you start implementing some of those algorithms to get full understanding of those concepts). This is a decent book. However, RL comprises a lot of concepts and dense terminology, which often confuse beginners. Have also a look at this question: ai.stackexchange.com/q/6997/2444, in particular, my answer: ai.stackexchange.com/a/7005/2444.
$endgroup$
– nbro
Apr 30 at 15:19
$begingroup$
@MiguelSaraiva I have updated my answer. I would recommend that you read that book again and more carefully (and that you start implementing some of those algorithms to get full understanding of those concepts). This is a decent book. However, RL comprises a lot of concepts and dense terminology, which often confuse beginners. Have also a look at this question: ai.stackexchange.com/q/6997/2444, in particular, my answer: ai.stackexchange.com/a/7005/2444.
$endgroup$
– nbro
Apr 30 at 15:19
$begingroup$
I have seen some of the videos in that series by Silver already and they are indeed good. Thank you for your help.
$endgroup$
– Miguel Saraiva
Apr 30 at 15:54
$begingroup$
I have seen some of the videos in that series by Silver already and they are indeed good. Thank you for your help.
$endgroup$
– Miguel Saraiva
Apr 30 at 15:54
add a comment |
$begingroup$
In Reinforcement Learning: An Introduction the authors suggest that the topic of reinforcement learning covers analysis and solutions to problems that can be framed in this way:
Reinforcement learning, like many topics whose names end with “ing,” such as machine
learning and mountaineering, is simultaneously a problem, a class of solution methods
that work well on the problem, and the field that studies this problem and its solution
methods. It is convenient to use a single name for all three things, but at the same time
essential to keep the three conceptually separate. In particular, the distinction between
problems and solution methods is very important in reinforcement learning; failing to
make this distinction is the source of many confusions.
And:
Markov decision processes are intended to include just
these three aspects—sensation, action, and goal—in their simplest possible forms without
trivializing any of them. Any method that is well suited to solving such problems we
consider to be a reinforcement learning method.
So, to answer your questions, the simplest take on this is yes there is more (much more) to RL than the classic value-based optimal control methods of SARSA and Q-learning.
Including DP and other "RL-related" algorithms in the book allows the author to show how closely related the concepts are. For example, there is little in practice that differentiates Dyna-Q (a planning algorithm closely related to Q-learning) from experience replay. Calling one strictly "planning" and the other "reinforcement learning" and treating them as separate can reduce insight into the topic. In many cases there are hybrid methods or even a continuum between what you may initially think of as RL and "not RL" approaches. Understanding this gives you a toolkit to modify and invent algorithms.
Having said that, the book is not the sole arbiter of what is and isn't reinforcement learning. Ultimately this is just a classification issue, and it only matters if you are communicating with someone and there is a chance for misunderstanding. If you name which algorithm you are using, it doesn't really matter whether the person you are talking to thinks it is RL or not RL. It matters what the problem is and how you propose to solve it.
answered Apr 30 at 16:06
Neil Slater
Thanks for contributing an answer to Artificial Intelligence Stack Exchange!