What algorithms are considered reinforcement learning algorithms?


Which areas belong to reinforcement learning?

TD(0), Q-learning and SARSA are all temporal-difference algorithms, and they belong to reinforcement learning. But is there more to it?

Are the dynamic programming algorithms policy iteration and value iteration considered part of reinforcement learning? Or are they just the basis for the temporal-difference algorithms, which are the only RL algorithms?


















      reinforcement-learning terminology definitions






asked Apr 30 at 14:26 by Miguel Saraiva; edited Apr 30 at 14:58 by nbro




          2 Answers
Dynamic programming algorithms (such as policy iteration and value iteration) are often presented in the context of reinforcement learning (in particular, in the book Reinforcement Learning: An Introduction by Sutton and Barto) because they are closely related to reinforcement learning algorithms such as $Q$-learning. Both families assume that the environment can be modelled as a Markov decision process (MDP).

However, dynamic programming algorithms require the underlying MDP (that is, its transition and reward functions) to be known. Hence, they are often referred to as "planning" algorithms: given the "dynamics" of the environment (which the MDP represents), they can be used to find a policy (which can be thought of as a "plan"). They simply exploit the given "physical rules" of the environment to find a policy, and this use of a known model is what "planning" refers to.

On the other hand, $Q$-learning and similar algorithms do not require the MDP to be known. They attempt to find a policy (or value function) by interacting with the environment, so whatever they learn about the "dynamics" of the underlying MDP comes from experience (that is, from the observed transitions and rewards) rather than from a given model.

If the MDP is not given, the problem is often referred to as the (full) "reinforcement learning problem". So, algorithms like $Q$-learning or SARSA are often considered reinforcement learning algorithms. The dynamic programming algorithms (like policy iteration) do not solve the "full RL problem", hence they are not always considered RL algorithms, but just planning algorithms.

There are several categories of RL algorithms: temporal-difference, Monte Carlo, actor-critic, model-free, model-based, on-policy, off-policy, prediction, control, policy-based and value-based algorithms. These categories can overlap. For example, $Q$-learning is a temporal-difference (TD), model-free, off-policy, control and value-based algorithm: it is based on a temporal-difference update rule (TD), it doesn't use a model of the environment (model-free), it can follow a behaviour policy that is different from the policy it learns (off-policy), it is used to find a policy (control), and it attempts to approximate a value function rather than the policy directly (value-based).
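To make the planning versus learning distinction in this answer concrete, here is a minimal sketch on a hypothetical four-state chain. The chain, its reward of 1 for reaching the last state, the hyperparameters, and the names `step`, `value_iteration`, `q_learning` and `greedy` are all assumptions made for illustration; none of them come from the answer or the book. Value iteration is allowed to query `step` for every state and action as a known model, whereas $Q$-learning only sees the transitions it happens to sample while acting at random.

```python
import random

N_STATES, TERMINAL, GAMMA = 4, 3, 0.9   # hypothetical chain: states 0..3, state 3 is terminal
ACTIONS = (0, 1)                        # 0 = move left, 1 = move right


def step(s, a):
    """Deterministic chain dynamics: returns (next_state, reward)."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    return s2, (1.0 if s2 == TERMINAL else 0.0)


def value_iteration(sweeps=100):
    """Planning: uses step() as a known model for every (state, action)."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(TERMINAL):
            # Bellman optimality backup over all actions, using the model.
            v[s] = max(r + GAMMA * v[s2]
                       for s2, r in (step(s, a) for a in ACTIONS))
    return v


def q_learning(episodes=500, alpha=0.5, seed=0):
    """Learning: only sees sampled transitions; behaviour policy is uniformly random."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            a = rng.choice(ACTIONS)
            s2, r = step(s, a)          # here step() plays the role of the environment
            # Off-policy TD target: greedy value of the next state.
            target = r + GAMMA * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


def greedy(q):
    """Greedy action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(TERMINAL)]
```

On this toy chain both routes arrive at the same "always move right" policy; the difference is only in what each one is allowed to know about the environment.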












• Can you explain your last sentence further? Are you talking about variations like TD(λ) or policy gradient methods, or something else? Also, can you recommend further reading on RL, given that I have already explored the Sutton and Barto book? – Miguel Saraiva, Apr 30 at 14:59

• @MiguelSaraiva I have updated my answer. I would recommend that you read that book again, more carefully, and that you start implementing some of those algorithms to fully understand the concepts. It is a decent book. However, RL comprises a lot of concepts and dense terminology, which often confuse beginners. Also have a look at this question: ai.stackexchange.com/q/6997/2444, and in particular my answer: ai.stackexchange.com/a/7005/2444. – nbro, Apr 30 at 15:19

• I have already seen some of the videos in that series by Silver, and they are indeed good. Thank you for your help. – Miguel Saraiva, Apr 30 at 15:54


















In Reinforcement Learning: An Introduction, the authors suggest that the topic of reinforcement learning covers the analysis of, and solutions to, problems that can be framed in this way:

> Reinforcement learning, like many topics whose names end with “ing,” such as machine learning and mountaineering, is simultaneously a problem, a class of solution methods that work well on the problem, and the field that studies this problem and its solution methods. It is convenient to use a single name for all three things, but at the same time essential to keep the three conceptually separate. In particular, the distinction between problems and solution methods is very important in reinforcement learning; failing to make this distinction is the source of many confusions.

And:

> Markov decision processes are intended to include just these three aspects—sensation, action, and goal—in their simplest possible forms without trivializing any of them. Any method that is well suited to solving such problems we consider to be a reinforcement learning method.

So, to answer your questions, the simplest take is: yes, there is more (much more) to RL than the classic value-based optimal control methods of SARSA and Q-learning.

Including DP and other "RL-related" algorithms in the book allows the authors to show how closely related the concepts are. For example, there is little in practice that differentiates Dyna-Q (a planning algorithm closely related to Q-learning) from experience replay. Calling one strictly "planning" and the other "reinforcement learning", and treating them as separate, can reduce insight into the topic. In many cases there are hybrid methods, or even a continuum, between what you may initially think of as RL and "not RL" approaches. Understanding this gives you a toolkit to modify and invent algorithms.

Having said that, the book is not the sole arbiter of what is and isn't reinforcement learning. Ultimately this is just a classification issue, and it only matters if you are communicating with someone and there is a chance of misunderstanding. If you name the algorithm you are using, it doesn't really matter whether the person you are talking to thinks it is RL or not. What matters is what the problem is and how you propose to solve it.
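The remark about Dyna-Q and experience replay can be illustrated with a small sketch. It assumes a deterministic environment, a tabular model stored as a dictionary, and a replay buffer holding each observed transition once; the function names (`q_update`, `replay_step`, `dyna_planning_step`, `demo`), the two example transitions and the hyperparameters are hypothetical choices for illustration, not a reference implementation of either method. Under those assumptions both approaches feed remembered transitions into the same Q-learning backup.

```python
import random

GAMMA, ALPHA = 0.9, 0.5   # illustrative discount factor and step size


def q_update(q, s, a, r, s2, actions):
    """One Q-learning backup on a single transition (real or remembered)."""
    best_next = max(q.get((s2, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)


def replay_step(q, buffer, actions, rng):
    """Experience replay: re-apply the backup to a stored transition."""
    s, a, r, s2 = rng.choice(buffer)
    q_update(q, s, a, r, s2, actions)


def dyna_planning_step(q, model, actions, rng):
    """Dyna-Q style planning: query a learned model at a previously seen (s, a)."""
    s, a = rng.choice(sorted(model))
    r, s2 = model[(s, a)]
    q_update(q, s, a, r, s2, actions)


def demo(steps=20, seed=0):
    """Run both variants on the same remembered experience with the same seed."""
    actions = (0, 1)
    buffer = [(0, 1, 0.0, 1), (1, 1, 1.0, 2)]             # (s, a, r, s2) tuples
    model = {(s, a): (r, s2) for s, a, r, s2 in buffer}   # deterministic tabular model
    q_replay, q_dyna = {}, {}
    rng_a, rng_b = random.Random(seed), random.Random(seed)
    for _ in range(steps):
        replay_step(q_replay, buffer, actions, rng_a)
        dyna_planning_step(q_dyna, model, actions, rng_b)
    return q_replay, q_dyna
```

With these assumptions `demo()` returns two identical Q-tables. The two methods start to differ once the environment is stochastic, the buffer holds repeated or stale transitions, or the model generalises beyond what was observed.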













First answer: answered Apr 30 at 14:54 by nbro; edited Apr 30 at 15:25















            $begingroup$
            I have seen some of the videos in that series by Silver already and they are indeed good. Thank you for your help.
            $endgroup$
            – Miguel Saraiva
            Apr 30 at 15:54













            1












            $begingroup$

            In Reinforcement Learning: An Introduction, the authors suggest that the topic of reinforcement learning covers analysis and solutions to problems that can be framed in this way:




            Reinforcement learning, like many topics whose names end with “ing,” such as machine
            learning and mountaineering, is simultaneously a problem, a class of solution methods
            that work well on the problem, and the field that studies this problem and its solution
            methods. It is convenient to use a single name for all three things, but at the same time
            essential to keep the three conceptually separate. In particular, the distinction between
            problems and solution methods is very important in reinforcement learning; failing to
            make this distinction is the source of many confusions.




            And:




            Markov decision processes are intended to include just
            these three aspects—sensation, action, and goal—in their simplest possible forms without
            trivializing any of them. Any method that is well suited to solving such problems we
            consider to be a reinforcement learning method.




            So, to answer your questions, the simplest take is: yes, there is more (much more) to RL than the classic value-based optimal control methods of SARSA and Q-learning.
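For reference, the two classic value-based updates mentioned above can be sketched in a few lines of tabular code. This is only a minimal illustration: the function names, the dictionary-based Q-table, and the values of `alpha` and `gamma` are assumptions chosen for the example, not something taken from the book or the question.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD control: bootstrap from the action actually taken next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy TD control: bootstrap from the greedy action in s_next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Hypothetical two-action toy setting, used only to exercise the updates.
ACTIONS = ("left", "right")

def fresh_table():
    Q = defaultdict(float)
    Q[("s1", "left")] = 1.0
    Q[("s1", "right")] = 2.0
    return Q

Q_sarsa = fresh_table()
sarsa_update(Q_sarsa, "s0", "left", 0.0, "s1", "left")

Q_qlearn = fresh_table()
q_learning_update(Q_qlearn, "s0", "left", 0.0, "s1", ACTIONS)
```

The only difference between the two functions is the bootstrap term: SARSA uses the value of the next action that was actually selected, while Q-learning uses the maximum over the next actions.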



            Including DP and other "RL-related" algorithms in the book allows the authors to show how closely related the concepts are. For example, there is little in practice that differentiates Dyna-Q (a planning algorithm closely related to Q-learning) from experience replay. Calling one strictly "planning" and the other "reinforcement learning" and treating them as separate can reduce insight into the topic. In many cases there are hybrid methods or even a continuum between what you may initially think of as RL and "not RL" approaches. Understanding this gives you a toolkit to modify and invent algorithms.
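To make the Dyna-Q versus experience replay comparison concrete, here is a minimal tabular sketch. It assumes a deterministic toy environment, a learned model that simply stores the last observed outcome for each state-action pair, and uniform sampling over stored pairs (a simplification). All names (`q_update`, `dyna_q_planning`, `experience_replay`, the states and actions) are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "right")

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning backup, shared by both approaches."""
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def dyna_q_planning(Q, model, n_steps, rng):
    """Planning: sample a seen (s, a), ask the model for (r, s_next), back up."""
    for _ in range(n_steps):
        (s, a), (r, s_next) = rng.choice(list(model.items()))
        q_update(Q, s, a, r, s_next)

def experience_replay(Q, buffer, n_steps, rng):
    """Replay: sample a stored (s, a, r, s_next) transition, back up."""
    for _ in range(n_steps):
        s, a, r, s_next = rng.choice(buffer)
        q_update(Q, s, a, r, s_next)

# Two transitions observed from real experience in the assumed toy task.
transitions = [("s0", "right", 0.0, "s1"), ("s1", "right", 1.0, "end")]
model = {(s, a): (r, s_next) for s, a, r, s_next in transitions}
buffer = list(transitions)

Q_dyna = defaultdict(float)
Q_replay = defaultdict(float)
dyna_q_planning(Q_dyna, model, 50, random.Random(0))
experience_replay(Q_replay, buffer, 50, random.Random(0))
```

Under these assumptions the two loops perform exactly the same sequence of backups, so the resulting Q-tables match. The approaches start to differ once the environment is stochastic, the model generalises beyond stored outcomes, or the buffer holds repeated or stale transitions.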



            Having said that, the book is not the sole arbiter of what is and isn't reinforcement learning. Ultimately this is just a classification issue, and it only matters if you are communicating with someone and there is a chance for misunderstanding. If you name which algorithm you are using, it doesn't really matter whether the person you are talking to thinks it is RL or not RL. It matters what the problem is and how you propose to solve it.















            $endgroup$



























                answered Apr 30 at 16:06









                Neil Slater




















