Number of states in taxi environment (Dietterich 2000)Different action spaces for different states of the environmentFeeding a Q-learning algorithms a greater fraction of terminal statesHow to define states in reinforcement learning?3D environment for RL research in AcademiaInput for the Env.step() in the 'Pendulum-v0' environmentReinforcement Learning with more actions than statesInform policy learning of environment constantsCatastrophic Forgetting on Pong Environment using DQNHow should I deal with vector states in reinforcement learning?DQN in stochastic environment

How to unload a Mathematica package?

What alternatives exist to at-will employment?

If a player tries to persuade somebody what should that creature roll not to be persuaded?

Concentration in the Planes

Could I use a Greatsword and a Longsword in one turn with Two-weapon fighting and dual wielding feat?

Why does the Earth have a z-component at the start of the J2000 epoch?

What made Windows ME so crash-prone?

Problem with interpolating function returned by NDEigensystem

Clarification on defining FFT bin sizes

Why did Spider-Man take a detour to Dorset?

why run a service as a system user?

I gave my characters names that are exactly like another book. Is it a problem?

Why is "dark" an adverb in this sentence?

Can a continent naturally split into two distant parts within a week?

Draw a line nicely around notes

Accidentally deleted python and yum is not working in centos7

I quit, and boss offered me 3 month "grace period" where I could still come back

How to honestly answer questions from a girlfriend like "How did you find this place" without giving the impression I'm always talking about my exes?

Construct a pentagon avoiding compass use

What are the arguments for California’s nonpartisan blanket primaries?

I have accepted an internship offer. Should I inform companies I have applied to that have not gotten back to me yet?

Manually select/unselect lines before forwarding to stdout

Three-dimensional light bulbs

Can you perfectly wrap a cube with this blocky shape?



Number of states in taxi environment (Dietterich 2000)


Different action spaces for different states of the environmentFeeding a Q-learning algorithms a greater fraction of terminal statesHow to define states in reinforcement learning?3D environment for RL research in AcademiaInput for the Env.step() in the 'Pendulum-v0' environmentReinforcement Learning with more actions than statesInform policy learning of environment constantsCatastrophic Forgetting on Pong Environment using DQNHow should I deal with vector states in reinforcement learning?DQN in stochastic environment






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;








1












$begingroup$


Dietterich, who introduced the taxi environment (see p. 9), states the following: In total there “are 500 [distinct] possible states: 25 squares, 5 locations for the passenger (counting the four starting locations and the taxi), and 4 destinations” (Dietterich, 2000, p. 9).



However, in my opinion there are only 25 (grid) * 4 (locations) * 2 (passenger in car) = 200 different states, because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off. Only the action at the destination is different which would be stored binary (passenger in car or not)



Why does Dietterich come up with 500 states?










share|improve this question











$endgroup$


















    1












    $begingroup$


    Dietterich, who introduced the taxi environment (see p. 9), states the following: In total there “are 500 [distinct] possible states: 25 squares, 5 locations for the passenger (counting the four starting locations and the taxi), and 4 destinations” (Dietterich, 2000, p. 9).



    However, in my opinion there are only 25 (grid) * 4 (locations) * 2 (passenger in car) = 200 different states, because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off. Only the action at the destination is different which would be stored binary (passenger in car or not)



    Why does Dietterich come up with 500 states?










    share|improve this question











    $endgroup$














      1












      1








      1





      $begingroup$


      Dietterich, who introduced the taxi environment (see p. 9), states the following: In total there “are 500 [distinct] possible states: 25 squares, 5 locations for the passenger (counting the four starting locations and the taxi), and 4 destinations” (Dietterich, 2000, p. 9).



      However, in my opinion there are only 25 (grid) * 4 (locations) * 2 (passenger in car) = 200 different states, because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off. Only the action at the destination is different which would be stored binary (passenger in car or not)



      Why does Dietterich come up with 500 states?










      share|improve this question











      $endgroup$




      Dietterich, who introduced the taxi environment (see p. 9), states the following: In total there “are 500 [distinct] possible states: 25 squares, 5 locations for the passenger (counting the four starting locations and the taxi), and 4 destinations” (Dietterich, 2000, p. 9).



      However, in my opinion there are only 25 (grid) * 4 (locations) * 2 (passenger in car) = 200 different states, because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off. Only the action at the destination is different which would be stored binary (passenger in car or not)



      Why does Dietterich come up with 500 states?







      reinforcement-learning combinatorics






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Jul 7 at 11:01







      F.M.F.

















      asked Jul 7 at 9:52









      F.M.F.F.M.F.

      1424 bronze badges




      1424 bronze badges




















          2 Answers
          2






          active

          oldest

          votes


















          2












          $begingroup$

          This is more of a combinatorics than AI question but regradless, the full state information for the environment is:



          $(taxi space position, passenger space position, destination space position)$



          There are 25 possible taxi positions, 5 passenger positions and 4 destination positions making it $25 cdot 5 cdot 4 = 500$, so the paper is correct.



          You are also correct but you divided 1 objective into 2 objectives and you have 2 separate policies, a pickup policy and a dropoff policy. So your state information would be for each policy:



          $(taxi space position, destination space position)$



          There are 25 possible taxi positions and 4 possible destination positions making it $25 cdot 4 = 100$. You have 2 policies so you have $200$ states.



          EDIT



          Actually in the second case, I think you could get away with only 1 policy where you would simply change the destination position once you pick up the passenger so you dont need 2 separate policies and you would have only $100$ states






          share|improve this answer











          $endgroup$




















            2












            $begingroup$

            This . . .




            because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off




            . . . might seem logical/intuitive to a person understanding the task, but it is not mathematically correct. The agent cannot "merge" states because they involve the same behaviour. It must count differences in state as the combinations are presented. Critically, heading towards the passenger location or heading towards the goal location are not in any way similar to the agent, unless you manipulate the state to make them so*.



            Eventually the taxi will learn very similar navigation behaviour for picking up and dropping off a passenger. However, using a basic RL agent it learns these very much separately, and must re-learn the navigation rules independently for each combination of passenger and goal location.



            An agent that learned navigation within the environment, and then combined it into different tasks might be an example of hierarchical reinforcement learning, transfer learning, or curriculum learning. These are more sophisticated learning approaches, but it is quite interesting that even very basic RL problems can demonstrate a use for higher level abstractions. Most agents used on the taxi problem don't do this though, as 500 states is really very easy to "brute force" using the simplest algorithms.




            * You could modify the state representation to rationalise the task and make it have less states, similar to your suggestion. For instance, have one "target" location which could either be pickup or drop off, and a boolean "carrying passenger" state component. That would indeed reduce the number of states. However, that has involved you as the problem designer simplifying the problem to make it easier for the agent. Given that this is a toy problem designed as a benchmark to see how different agents perform, by doing that you subvert the purpose of the environment. If you were creating an agent to work on a harder real world problem though, it might be a very good idea to look for symmetries and ways to simplify state representation which would speed up learning.






            share|improve this answer











            $endgroup$












            • $begingroup$
              Thanks for being informative. keep up the spirit
              $endgroup$
              – quintumnia
              Jul 7 at 10:24










            • $begingroup$
              Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
              $endgroup$
              – F.M.F.
              Jul 7 at 11:08













            Your Answer








            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "658"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader:
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            ,
            noCode: true, onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );













            draft saved

            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fai.stackexchange.com%2fquestions%2f13241%2fnumber-of-states-in-taxi-environment-dietterich-2000%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            2












            $begingroup$

            This is more of a combinatorics than AI question but regradless, the full state information for the environment is:



            $(taxi space position, passenger space position, destination space position)$



            There are 25 possible taxi positions, 5 passenger positions and 4 destination positions making it $25 cdot 5 cdot 4 = 500$, so the paper is correct.



            You are also correct but you divided 1 objective into 2 objectives and you have 2 separate policies, a pickup policy and a dropoff policy. So your state information would be for each policy:



            $(taxi space position, destination space position)$



            There are 25 possible taxi positions and 4 possible destination positions making it $25 cdot 4 = 100$. You have 2 policies so you have $200$ states.



            EDIT



            Actually in the second case, I think you could get away with only 1 policy where you would simply change the destination position once you pick up the passenger so you dont need 2 separate policies and you would have only $100$ states






            share|improve this answer











            $endgroup$

















              2












              $begingroup$

              This is more of a combinatorics than AI question but regradless, the full state information for the environment is:



              $(taxi space position, passenger space position, destination space position)$



              There are 25 possible taxi positions, 5 passenger positions and 4 destination positions making it $25 cdot 5 cdot 4 = 500$, so the paper is correct.



              You are also correct but you divided 1 objective into 2 objectives and you have 2 separate policies, a pickup policy and a dropoff policy. So your state information would be for each policy:



              $(taxi space position, destination space position)$



              There are 25 possible taxi positions and 4 possible destination positions making it $25 cdot 4 = 100$. You have 2 policies so you have $200$ states.



              EDIT



              Actually in the second case, I think you could get away with only 1 policy where you would simply change the destination position once you pick up the passenger so you dont need 2 separate policies and you would have only $100$ states






              share|improve this answer











              $endgroup$















                2












                2








                2





                $begingroup$

                This is more of a combinatorics than AI question but regradless, the full state information for the environment is:



                $(taxi space position, passenger space position, destination space position)$



                There are 25 possible taxi positions, 5 passenger positions and 4 destination positions making it $25 cdot 5 cdot 4 = 500$, so the paper is correct.



                You are also correct but you divided 1 objective into 2 objectives and you have 2 separate policies, a pickup policy and a dropoff policy. So your state information would be for each policy:



                $(taxi space position, destination space position)$



                There are 25 possible taxi positions and 4 possible destination positions making it $25 cdot 4 = 100$. You have 2 policies so you have $200$ states.



                EDIT



                Actually in the second case, I think you could get away with only 1 policy where you would simply change the destination position once you pick up the passenger so you dont need 2 separate policies and you would have only $100$ states






                share|improve this answer











                $endgroup$



                This is more of a combinatorics than AI question but regradless, the full state information for the environment is:



                $(taxi space position, passenger space position, destination space position)$



                There are 25 possible taxi positions, 5 passenger positions and 4 destination positions making it $25 cdot 5 cdot 4 = 500$, so the paper is correct.



                You are also correct but you divided 1 objective into 2 objectives and you have 2 separate policies, a pickup policy and a dropoff policy. So your state information would be for each policy:



                $(taxi space position, destination space position)$



                There are 25 possible taxi positions and 4 possible destination positions making it $25 cdot 4 = 100$. You have 2 policies so you have $200$ states.



                EDIT



                Actually in the second case, I think you could get away with only 1 policy where you would simply change the destination position once you pick up the passenger so you dont need 2 separate policies and you would have only $100$ states







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Jul 7 at 10:19

























                answered Jul 7 at 10:13









                Brale_Brale_

                7461 silver badge7 bronze badges




                7461 silver badge7 bronze badges























                    2












                    $begingroup$

                    This . . .




                    because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off




                    . . . might seem logical/intuitive to a person understanding the task, but it is not mathematically correct. The agent cannot "merge" states because they involve the same behaviour. It must count differences in state as the combinations are presented. Critically, heading towards the passenger location or heading towards the goal location are not in any way similar to the agent, unless you manipulate the state to make them so*.



                    Eventually the taxi will learn very similar navigation behaviour for picking up and dropping off a passenger. However, using a basic RL agent it learns these very much separately, and must re-learn the navigation rules independently for each combination of passenger and goal location.



                    An agent that learned navigation within the environment, and then combined it into different tasks might be an example of hierarchical reinforcement learning, transfer learning, or curriculum learning. These are more sophisticated learning approaches, but it is quite interesting that even very basic RL problems can demonstrate a use for higher level abstractions. Most agents used on the taxi problem don't do this though, as 500 states is really very easy to "brute force" using the simplest algorithms.




                    * You could modify the state representation to rationalise the task and make it have less states, similar to your suggestion. For instance, have one "target" location which could either be pickup or drop off, and a boolean "carrying passenger" state component. That would indeed reduce the number of states. However, that has involved you as the problem designer simplifying the problem to make it easier for the agent. Given that this is a toy problem designed as a benchmark to see how different agents perform, by doing that you subvert the purpose of the environment. If you were creating an agent to work on a harder real world problem though, it might be a very good idea to look for symmetries and ways to simplify state representation which would speed up learning.






                    share|improve this answer











                    $endgroup$












                    • $begingroup$
                      Thanks for being informative. keep up the spirit
                      $endgroup$
                      – quintumnia
                      Jul 7 at 10:24










                    • $begingroup$
                      Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
                      $endgroup$
                      – F.M.F.
                      Jul 7 at 11:08















                    2












                    $begingroup$

                    This . . .




                    because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off




                    . . . might seem logical/intuitive to a person understanding the task, but it is not mathematically correct. The agent cannot "merge" states because they involve the same behaviour. It must count differences in state as the combinations are presented. Critically, heading towards the passenger location or heading towards the goal location are not in any way similar to the agent, unless you manipulate the state to make them so*.



                    Eventually the taxi will learn very similar navigation behaviour for picking up and dropping off a passenger. However, using a basic RL agent it learns these very much separately, and must re-learn the navigation rules independently for each combination of passenger and goal location.



                    An agent that learned navigation within the environment, and then combined it into different tasks might be an example of hierarchical reinforcement learning, transfer learning, or curriculum learning. These are more sophisticated learning approaches, but it is quite interesting that even very basic RL problems can demonstrate a use for higher level abstractions. Most agents used on the taxi problem don't do this though, as 500 states is really very easy to "brute force" using the simplest algorithms.




                    * You could modify the state representation to rationalise the task and make it have less states, similar to your suggestion. For instance, have one "target" location which could either be pickup or drop off, and a boolean "carrying passenger" state component. That would indeed reduce the number of states. However, that has involved you as the problem designer simplifying the problem to make it easier for the agent. Given that this is a toy problem designed as a benchmark to see how different agents perform, by doing that you subvert the purpose of the environment. If you were creating an agent to work on a harder real world problem though, it might be a very good idea to look for symmetries and ways to simplify state representation which would speed up learning.






                    share|improve this answer











                    $endgroup$












                    • $begingroup$
                      Thanks for being informative. keep up the spirit
                      $endgroup$
                      – quintumnia
                      Jul 7 at 10:24










                    • $begingroup$
                      Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
                      $endgroup$
                      – F.M.F.
                      Jul 7 at 11:08













                    2












                    2








                    2





                    $begingroup$

                    This . . .




                    because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off




                    . . . might seem logical/intuitive to a person understanding the task, but it is not mathematically correct. The agent cannot "merge" states because they involve the same behaviour. It must count differences in state as the combinations are presented. Critically, heading towards the passenger location or heading towards the goal location are not in any way similar to the agent, unless you manipulate the state to make them so*.



                    Eventually the taxi will learn very similar navigation behaviour for picking up and dropping off a passenger. However, using a basic RL agent it learns these very much separately, and must re-learn the navigation rules independently for each combination of passenger and goal location.



                    An agent that learned navigation within the environment, and then combined it into different tasks might be an example of hierarchical reinforcement learning, transfer learning, or curriculum learning. These are more sophisticated learning approaches, but it is quite interesting that even very basic RL problems can demonstrate a use for higher level abstractions. Most agents used on the taxi problem don't do this though, as 500 states is really very easy to "brute force" using the simplest algorithms.




                    * You could modify the state representation to rationalise the task and make it have less states, similar to your suggestion. For instance, have one "target" location which could either be pickup or drop off, and a boolean "carrying passenger" state component. That would indeed reduce the number of states. However, that has involved you as the problem designer simplifying the problem to make it easier for the agent. Given that this is a toy problem designed as a benchmark to see how different agents perform, by doing that you subvert the purpose of the environment. If you were creating an agent to work on a harder real world problem though, it might be a very good idea to look for symmetries and ways to simplify state representation which would speed up learning.






                    share|improve this answer











                    $endgroup$



                    This . . .




                    because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off




                    . . . might seem logical/intuitive to a person understanding the task, but it is not mathematically correct. The agent cannot "merge" states because they involve the same behaviour. It must count differences in state as the combinations are presented. Critically, heading towards the passenger location or heading towards the goal location are not in any way similar to the agent, unless you manipulate the state to make them so*.



                    Eventually the taxi will learn very similar navigation behaviour for picking up and dropping off a passenger. However, using a basic RL agent it learns these very much separately, and must re-learn the navigation rules independently for each combination of passenger and goal location.



                    An agent that learned navigation within the environment, and then combined it into different tasks might be an example of hierarchical reinforcement learning, transfer learning, or curriculum learning. These are more sophisticated learning approaches, but it is quite interesting that even very basic RL problems can demonstrate a use for higher level abstractions. Most agents used on the taxi problem don't do this though, as 500 states is really very easy to "brute force" using the simplest algorithms.




                    * You could modify the state representation to rationalise the task and make it have less states, similar to your suggestion. For instance, have one "target" location which could either be pickup or drop off, and a boolean "carrying passenger" state component. That would indeed reduce the number of states. However, that has involved you as the problem designer simplifying the problem to make it easier for the agent. Given that this is a toy problem designed as a benchmark to see how different agents perform, by doing that you subvert the purpose of the environment. If you were creating an agent to work on a harder real world problem though, it might be a very good idea to look for symmetries and ways to simplify state representation which would speed up learning.







                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited Jul 7 at 10:31

























                    answered Jul 7 at 10:20









                    Neil SlaterNeil Slater

                    7,6931 gold badge6 silver badges22 bronze badges




                    7,6931 gold badge6 silver badges22 bronze badges











                    • $begingroup$
                      Thanks for being informative. keep up the spirit
                      $endgroup$
                      – quintumnia
                      Jul 7 at 10:24










                    • $begingroup$
                      Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
                      $endgroup$
                      – F.M.F.
                      Jul 7 at 11:08
















                    • $begingroup$
                      Thanks for being informative. keep up the spirit
                      $endgroup$
                      – quintumnia
                      Jul 7 at 10:24










                    • $begingroup$
                      Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
                      $endgroup$
                      – F.M.F.
                      Jul 7 at 11:08















                    $begingroup$
                    Thanks for being informative. keep up the spirit
                    $endgroup$
                    – quintumnia
                    Jul 7 at 10:24




                    $begingroup$
                    Thanks for being informative. keep up the spirit
                    $endgroup$
                    – quintumnia
                    Jul 7 at 10:24












                    $begingroup$
                    Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
                    $endgroup$
                    – F.M.F.
                    Jul 7 at 11:08




                    $begingroup$
                    Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
                    $endgroup$
                    – F.M.F.
                    Jul 7 at 11:08

















                    draft saved

                    draft discarded
















































                    Thanks for contributing an answer to Artificial Intelligence Stack Exchange!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid


                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.

                    Use MathJax to format equations. MathJax reference.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function ()
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fai.stackexchange.com%2fquestions%2f13241%2fnumber-of-states-in-taxi-environment-dietterich-2000%23new-answer', 'question_page');

                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown