How Does AlphaGo Zero Implement Reinforcement Learning?


AlphaGo Zero (https://deepmind.com/blog/alphago-zero-learning-scratch/) has several key components that contribute to its success:

  1. A Monte Carlo Tree Search (MCTS) algorithm that allows it to better search and learn from the state space of Go.

  2. A deep neural network architecture that learns the value and policies of given states, to better inform the MCTS.

My question is: how is this reinforcement learning? Or rather, what aspects of this algorithm specifically make it a reinforcement learning problem? Couldn't this just be considered a supervised learning problem?
Tags: reinforcement-learning, monte-carlo-tree-search, supervised-learning, alphago-zero, go






asked Jun 7 at 16:56 by SeeDerekEngineer




















          1 Answer
If you learn a policy or a value function from experience (that is, from interaction with an environment), that's RL. In the case of AlphaGo Zero, the MCTS is used to acquire that experience.

RL can in fact be viewed as a form of supervised learning (SL) or, more specifically, self-supervised learning, in which the experience plays the role of the labels in SL. This view is especially natural nowadays with techniques like experience replay, where stored experience is sampled for training much like a labelled dataset.
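To make the "experience as labels" idea concrete, here is a minimal sketch, not AlphaGo Zero's actual implementation. It assumes a hypothetical toy game (a pile of stones, remove 1 or 2, taking the last stone wins), uses plain random rollouts as a stand-in for MCTS, and omits the neural network entirely. All function names are illustrative. What it shows is the shape of the data: self-play produces (state, search policy, game outcome) tuples, which go into a replay buffer and are later sampled like supervised training examples.

```python
import random
from collections import deque

ACTIONS = (1, 2)  # hypothetical toy game: remove 1 or 2 stones; taking the last stone wins


def legal_actions(stones):
    return [a for a in ACTIONS if a <= stones]


def rollout_value(stones, rng):
    """Play random moves to the end; +1 if the player to move at `stones` wins, else -1."""
    player = 0  # 0 = the player to move when the rollout starts
    while stones > 0:
        stones -= rng.choice(legal_actions(stones))
        if stones == 0:
            return 1 if player == 0 else -1
        player = 1 - player
    return -1  # no stones left: the opponent already took the last one


def search_policy(stones, rng, n_rollouts=30):
    """Stand-in for MCTS (assumption): score each move by random rollouts, then normalise."""
    scores = {}
    for a in legal_actions(stones):
        # The rollout value is from the opponent's point of view, so negate it.
        total = sum(-rollout_value(stones - a, rng) for _ in range(n_rollouts))
        scores[a] = (total / n_rollouts + 1) / 2 + 1e-3  # map [-1, 1] into (0, 1]
    norm = sum(scores.values())
    return {a: s / norm for a, s in scores.items()}


def self_play_game(rng, stones=7):
    """Interact with the environment; return (state, policy target, value target) tuples."""
    history, player = [], 0
    while True:
        pi = search_policy(stones, rng)
        history.append((stones, pi, player))
        action = rng.choices(list(pi), weights=list(pi.values()))[0]
        stones -= action
        if stones == 0:
            winner = player
            break
        player = 1 - player
    # The "labels" come from the agent's own experience: the search output and the result.
    return [(s, target, 1 if p == winner else -1) for s, target, p in history]


rng = random.Random(0)
buffer = deque(maxlen=1000)  # experience replay: a bounded store of past experience
for _ in range(20):
    buffer.extend(self_play_game(rng))

# A training step would sample a minibatch and fit a model to these targets,
# exactly as one would with a labelled dataset in supervised learning.
batch = rng.sample(list(buffer), 8)
print(len(buffer), batch[0])
```

The RL part is where the targets come from (the agent's own play), while the fitting step looks like ordinary supervised learning. In a fuller version, a network trained on such batches would in turn guide the search that generates the next round of games.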













edited Jun 7 at 17:28

answered Jun 7 at 17:21 by nbro


























