Number of states in taxi environment (Dietterich 2000)Different action spaces for different states of the environmentFeeding a Q-learning algorithms a greater fraction of terminal statesHow to define states in reinforcement learning?3D environment for RL research in AcademiaInput for the Env.step() in the 'Pendulum-v0' environmentReinforcement Learning with more actions than statesInform policy learning of environment constantsCatastrophic Forgetting on Pong Environment using DQNHow should I deal with vector states in reinforcement learning?DQN in stochastic environment
How to unload a Mathematica package?
What alternatives exist to at-will employment?
If a player tries to persuade somebody what should that creature roll not to be persuaded?
Concentration in the Planes
Could I use a Greatsword and a Longsword in one turn with Two-weapon fighting and dual wielding feat?
Why does the Earth have a z-component at the start of the J2000 epoch?
What made Windows ME so crash-prone?
Problem with interpolating function returned by NDEigensystem
Clarification on defining FFT bin sizes
Why did Spider-Man take a detour to Dorset?
why run a service as a system user?
I gave my characters names that are exactly like another book. Is it a problem?
Why is "dark" an adverb in this sentence?
Can a continent naturally split into two distant parts within a week?
Draw a line nicely around notes
Accidentally deleted python and yum is not working in centos7
I quit, and boss offered me 3 month "grace period" where I could still come back
How to honestly answer questions from a girlfriend like "How did you find this place" without giving the impression I'm always talking about my exes?
Construct a pentagon avoiding compass use
What are the arguments for California’s nonpartisan blanket primaries?
I have accepted an internship offer. Should I inform companies I have applied to that have not gotten back to me yet?
Manually select/unselect lines before forwarding to stdout
Three-dimensional light bulbs
Can you perfectly wrap a cube with this blocky shape?
Number of states in taxi environment (Dietterich 2000)
Different action spaces for different states of the environmentFeeding a Q-learning algorithms a greater fraction of terminal statesHow to define states in reinforcement learning?3D environment for RL research in AcademiaInput for the Env.step() in the 'Pendulum-v0' environmentReinforcement Learning with more actions than statesInform policy learning of environment constantsCatastrophic Forgetting on Pong Environment using DQNHow should I deal with vector states in reinforcement learning?DQN in stochastic environment
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
$begingroup$
Dietterich, who introduced the taxi environment (see p. 9), states the following: In total there “are 500 [distinct] possible states: 25 squares, 5 locations for the passenger (counting the four starting locations and the taxi), and 4 destinations” (Dietterich, 2000, p. 9).
However, in my opinion there are only 25 (grid) * 4 (locations) * 2 (passenger in car) = 200 different states, because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off. Only the action at the destination is different which would be stored binary (passenger in car or not)
Why does Dietterich come up with 500 states?
reinforcement-learning combinatorics
$endgroup$
add a comment |
$begingroup$
Dietterich, who introduced the taxi environment (see p. 9), states the following: In total there “are 500 [distinct] possible states: 25 squares, 5 locations for the passenger (counting the four starting locations and the taxi), and 4 destinations” (Dietterich, 2000, p. 9).
However, in my opinion there are only 25 (grid) * 4 (locations) * 2 (passenger in car) = 200 different states, because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off. Only the action at the destination is different which would be stored binary (passenger in car or not)
Why does Dietterich come up with 500 states?
reinforcement-learning combinatorics
$endgroup$
add a comment |
$begingroup$
Dietterich, who introduced the taxi environment (see p. 9), states the following: In total there “are 500 [distinct] possible states: 25 squares, 5 locations for the passenger (counting the four starting locations and the taxi), and 4 destinations” (Dietterich, 2000, p. 9).
However, in my opinion there are only 25 (grid) * 4 (locations) * 2 (passenger in car) = 200 different states, because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off. Only the action at the destination is different which would be stored binary (passenger in car or not)
Why does Dietterich come up with 500 states?
reinforcement-learning combinatorics
$endgroup$
Dietterich, who introduced the taxi environment (see p. 9), states the following: In total there “are 500 [distinct] possible states: 25 squares, 5 locations for the passenger (counting the four starting locations and the taxi), and 4 destinations” (Dietterich, 2000, p. 9).
However, in my opinion there are only 25 (grid) * 4 (locations) * 2 (passenger in car) = 200 different states, because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off. Only the action at the destination is different which would be stored binary (passenger in car or not)
Why does Dietterich come up with 500 states?
reinforcement-learning combinatorics
reinforcement-learning combinatorics
edited Jul 7 at 11:01
F.M.F.
asked Jul 7 at 9:52
F.M.F.F.M.F.
1424 bronze badges
1424 bronze badges
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
$begingroup$
This is more of a combinatorics than AI question but regradless, the full state information for the environment is:
$(taxi space position, passenger space position, destination space position)$
There are 25 possible taxi positions, 5 passenger positions and 4 destination positions making it $25 cdot 5 cdot 4 = 500$, so the paper is correct.
You are also correct but you divided 1 objective into 2 objectives and you have 2 separate policies, a pickup policy and a dropoff policy. So your state information would be for each policy:
$(taxi space position, destination space position)$
There are 25 possible taxi positions and 4 possible destination positions making it $25 cdot 4 = 100$. You have 2 policies so you have $200$ states.
EDIT
Actually in the second case, I think you could get away with only 1 policy where you would simply change the destination position once you pick up the passenger so you dont need 2 separate policies and you would have only $100$ states
$endgroup$
add a comment |
$begingroup$
This . . .
because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off
. . . might seem logical/intuitive to a person understanding the task, but it is not mathematically correct. The agent cannot "merge" states because they involve the same behaviour. It must count differences in state as the combinations are presented. Critically, heading towards the passenger location or heading towards the goal location are not in any way similar to the agent, unless you manipulate the state to make them so*.
Eventually the taxi will learn very similar navigation behaviour for picking up and dropping off a passenger. However, using a basic RL agent it learns these very much separately, and must re-learn the navigation rules independently for each combination of passenger and goal location.
An agent that learned navigation within the environment, and then combined it into different tasks might be an example of hierarchical reinforcement learning, transfer learning, or curriculum learning. These are more sophisticated learning approaches, but it is quite interesting that even very basic RL problems can demonstrate a use for higher level abstractions. Most agents used on the taxi problem don't do this though, as 500 states is really very easy to "brute force" using the simplest algorithms.
* You could modify the state representation to rationalise the task and make it have less states, similar to your suggestion. For instance, have one "target" location which could either be pickup or drop off, and a boolean "carrying passenger" state component. That would indeed reduce the number of states. However, that has involved you as the problem designer simplifying the problem to make it easier for the agent. Given that this is a toy problem designed as a benchmark to see how different agents perform, by doing that you subvert the purpose of the environment. If you were creating an agent to work on a harder real world problem though, it might be a very good idea to look for symmetries and ways to simplify state representation which would speed up learning.
$endgroup$
$begingroup$
Thanks for being informative. keep up the spirit
$endgroup$
– quintumnia
Jul 7 at 10:24
$begingroup$
Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
$endgroup$
– F.M.F.
Jul 7 at 11:08
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "658"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fai.stackexchange.com%2fquestions%2f13241%2fnumber-of-states-in-taxi-environment-dietterich-2000%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
This is more of a combinatorics than AI question but regradless, the full state information for the environment is:
$(taxi space position, passenger space position, destination space position)$
There are 25 possible taxi positions, 5 passenger positions and 4 destination positions making it $25 cdot 5 cdot 4 = 500$, so the paper is correct.
You are also correct but you divided 1 objective into 2 objectives and you have 2 separate policies, a pickup policy and a dropoff policy. So your state information would be for each policy:
$(taxi space position, destination space position)$
There are 25 possible taxi positions and 4 possible destination positions making it $25 cdot 4 = 100$. You have 2 policies so you have $200$ states.
EDIT
Actually in the second case, I think you could get away with only 1 policy where you would simply change the destination position once you pick up the passenger so you dont need 2 separate policies and you would have only $100$ states
$endgroup$
add a comment |
$begingroup$
This is more of a combinatorics than AI question but regradless, the full state information for the environment is:
$(taxi space position, passenger space position, destination space position)$
There are 25 possible taxi positions, 5 passenger positions and 4 destination positions making it $25 cdot 5 cdot 4 = 500$, so the paper is correct.
You are also correct but you divided 1 objective into 2 objectives and you have 2 separate policies, a pickup policy and a dropoff policy. So your state information would be for each policy:
$(taxi space position, destination space position)$
There are 25 possible taxi positions and 4 possible destination positions making it $25 cdot 4 = 100$. You have 2 policies so you have $200$ states.
EDIT
Actually in the second case, I think you could get away with only 1 policy where you would simply change the destination position once you pick up the passenger so you dont need 2 separate policies and you would have only $100$ states
$endgroup$
add a comment |
$begingroup$
This is more of a combinatorics than AI question but regradless, the full state information for the environment is:
$(taxi space position, passenger space position, destination space position)$
There are 25 possible taxi positions, 5 passenger positions and 4 destination positions making it $25 cdot 5 cdot 4 = 500$, so the paper is correct.
You are also correct but you divided 1 objective into 2 objectives and you have 2 separate policies, a pickup policy and a dropoff policy. So your state information would be for each policy:
$(taxi space position, destination space position)$
There are 25 possible taxi positions and 4 possible destination positions making it $25 cdot 4 = 100$. You have 2 policies so you have $200$ states.
EDIT
Actually in the second case, I think you could get away with only 1 policy where you would simply change the destination position once you pick up the passenger so you dont need 2 separate policies and you would have only $100$ states
$endgroup$
This is more of a combinatorics than AI question but regradless, the full state information for the environment is:
$(taxi space position, passenger space position, destination space position)$
There are 25 possible taxi positions, 5 passenger positions and 4 destination positions making it $25 cdot 5 cdot 4 = 500$, so the paper is correct.
You are also correct but you divided 1 objective into 2 objectives and you have 2 separate policies, a pickup policy and a dropoff policy. So your state information would be for each policy:
$(taxi space position, destination space position)$
There are 25 possible taxi positions and 4 possible destination positions making it $25 cdot 4 = 100$. You have 2 policies so you have $200$ states.
EDIT
Actually in the second case, I think you could get away with only 1 policy where you would simply change the destination position once you pick up the passenger so you dont need 2 separate policies and you would have only $100$ states
edited Jul 7 at 10:19
answered Jul 7 at 10:13
Brale_Brale_
7461 silver badge7 bronze badges
7461 silver badge7 bronze badges
add a comment |
add a comment |
$begingroup$
This . . .
because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off
. . . might seem logical/intuitive to a person understanding the task, but it is not mathematically correct. The agent cannot "merge" states because they involve the same behaviour. It must count differences in state as the combinations are presented. Critically, heading towards the passenger location or heading towards the goal location are not in any way similar to the agent, unless you manipulate the state to make them so*.
Eventually the taxi will learn very similar navigation behaviour for picking up and dropping off a passenger. However, using a basic RL agent it learns these very much separately, and must re-learn the navigation rules independently for each combination of passenger and goal location.
An agent that learned navigation within the environment, and then combined it into different tasks might be an example of hierarchical reinforcement learning, transfer learning, or curriculum learning. These are more sophisticated learning approaches, but it is quite interesting that even very basic RL problems can demonstrate a use for higher level abstractions. Most agents used on the taxi problem don't do this though, as 500 states is really very easy to "brute force" using the simplest algorithms.
* You could modify the state representation to rationalise the task and make it have less states, similar to your suggestion. For instance, have one "target" location which could either be pickup or drop off, and a boolean "carrying passenger" state component. That would indeed reduce the number of states. However, that has involved you as the problem designer simplifying the problem to make it easier for the agent. Given that this is a toy problem designed as a benchmark to see how different agents perform, by doing that you subvert the purpose of the environment. If you were creating an agent to work on a harder real world problem though, it might be a very good idea to look for symmetries and ways to simplify state representation which would speed up learning.
$endgroup$
$begingroup$
Thanks for being informative. keep up the spirit
$endgroup$
– quintumnia
Jul 7 at 10:24
$begingroup$
Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
$endgroup$
– F.M.F.
Jul 7 at 11:08
add a comment |
$begingroup$
This . . .
because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off
. . . might seem logical/intuitive to a person understanding the task, but it is not mathematically correct. The agent cannot "merge" states because they involve the same behaviour. It must count differences in state as the combinations are presented. Critically, heading towards the passenger location or heading towards the goal location are not in any way similar to the agent, unless you manipulate the state to make them so*.
Eventually the taxi will learn very similar navigation behaviour for picking up and dropping off a passenger. However, using a basic RL agent it learns these very much separately, and must re-learn the navigation rules independently for each combination of passenger and goal location.
An agent that learned navigation within the environment, and then combined it into different tasks might be an example of hierarchical reinforcement learning, transfer learning, or curriculum learning. These are more sophisticated learning approaches, but it is quite interesting that even very basic RL problems can demonstrate a use for higher level abstractions. Most agents used on the taxi problem don't do this though, as 500 states is really very easy to "brute force" using the simplest algorithms.
* You could modify the state representation to rationalise the task and make it have less states, similar to your suggestion. For instance, have one "target" location which could either be pickup or drop off, and a boolean "carrying passenger" state component. That would indeed reduce the number of states. However, that has involved you as the problem designer simplifying the problem to make it easier for the agent. Given that this is a toy problem designed as a benchmark to see how different agents perform, by doing that you subvert the purpose of the environment. If you were creating an agent to work on a harder real world problem though, it might be a very good idea to look for symmetries and ways to simplify state representation which would speed up learning.
$endgroup$
$begingroup$
Thanks for being informative. keep up the spirit
$endgroup$
– quintumnia
Jul 7 at 10:24
$begingroup$
Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
$endgroup$
– F.M.F.
Jul 7 at 11:08
add a comment |
$begingroup$
This . . .
because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off
. . . might seem logical/intuitive to a person understanding the task, but it is not mathematically correct. The agent cannot "merge" states because they involve the same behaviour. It must count differences in state as the combinations are presented. Critically, heading towards the passenger location or heading towards the goal location are not in any way similar to the agent, unless you manipulate the state to make them so*.
Eventually the taxi will learn very similar navigation behaviour for picking up and dropping off a passenger. However, using a basic RL agent it learns these very much separately, and must re-learn the navigation rules independently for each combination of passenger and goal location.
An agent that learned navigation within the environment, and then combined it into different tasks might be an example of hierarchical reinforcement learning, transfer learning, or curriculum learning. These are more sophisticated learning approaches, but it is quite interesting that even very basic RL problems can demonstrate a use for higher level abstractions. Most agents used on the taxi problem don't do this though, as 500 states is really very easy to "brute force" using the simplest algorithms.
* You could modify the state representation to rationalise the task and make it have less states, similar to your suggestion. For instance, have one "target" location which could either be pickup or drop off, and a boolean "carrying passenger" state component. That would indeed reduce the number of states. However, that has involved you as the problem designer simplifying the problem to make it easier for the agent. Given that this is a toy problem designed as a benchmark to see how different agents perform, by doing that you subvert the purpose of the environment. If you were creating an agent to work on a harder real world problem though, it might be a very good idea to look for symmetries and ways to simplify state representation which would speed up learning.
$endgroup$
This . . .
because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off
. . . might seem logical/intuitive to a person understanding the task, but it is not mathematically correct. The agent cannot "merge" states because they involve the same behaviour. It must count differences in state as the combinations are presented. Critically, heading towards the passenger location or heading towards the goal location are not in any way similar to the agent, unless you manipulate the state to make them so*.
Eventually the taxi will learn very similar navigation behaviour for picking up and dropping off a passenger. However, using a basic RL agent it learns these very much separately, and must re-learn the navigation rules independently for each combination of passenger and goal location.
An agent that learned navigation within the environment, and then combined it into different tasks might be an example of hierarchical reinforcement learning, transfer learning, or curriculum learning. These are more sophisticated learning approaches, but it is quite interesting that even very basic RL problems can demonstrate a use for higher level abstractions. Most agents used on the taxi problem don't do this though, as 500 states is really very easy to "brute force" using the simplest algorithms.
* You could modify the state representation to rationalise the task and make it have less states, similar to your suggestion. For instance, have one "target" location which could either be pickup or drop off, and a boolean "carrying passenger" state component. That would indeed reduce the number of states. However, that has involved you as the problem designer simplifying the problem to make it easier for the agent. Given that this is a toy problem designed as a benchmark to see how different agents perform, by doing that you subvert the purpose of the environment. If you were creating an agent to work on a harder real world problem though, it might be a very good idea to look for symmetries and ways to simplify state representation which would speed up learning.
edited Jul 7 at 10:31
answered Jul 7 at 10:20
Neil SlaterNeil Slater
7,6931 gold badge6 silver badges22 bronze badges
7,6931 gold badge6 silver badges22 bronze badges
$begingroup$
Thanks for being informative. keep up the spirit
$endgroup$
– quintumnia
Jul 7 at 10:24
$begingroup$
Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
$endgroup$
– F.M.F.
Jul 7 at 11:08
add a comment |
$begingroup$
Thanks for being informative. keep up the spirit
$endgroup$
– quintumnia
Jul 7 at 10:24
$begingroup$
Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
$endgroup$
– F.M.F.
Jul 7 at 11:08
$begingroup$
Thanks for being informative. keep up the spirit
$endgroup$
– quintumnia
Jul 7 at 10:24
$begingroup$
Thanks for being informative. keep up the spirit
$endgroup$
– quintumnia
Jul 7 at 10:24
$begingroup$
Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
$endgroup$
– F.M.F.
Jul 7 at 11:08
$begingroup$
Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
$endgroup$
– F.M.F.
Jul 7 at 11:08
add a comment |
Thanks for contributing an answer to Artificial Intelligence Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fai.stackexchange.com%2fquestions%2f13241%2fnumber-of-states-in-taxi-environment-dietterich-2000%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown