Number of states in taxi environment (Dietterich 2000)Different action spaces for different states of the environmentFeeding a Q-learning algorithms a greater fraction of terminal statesHow to define states in reinforcement learning?3D environment for RL research in AcademiaInput for the Env.step() in the 'Pendulum-v0' environmentReinforcement Learning with more actions than statesInform policy learning of environment constantsCatastrophic Forgetting on Pong Environment using DQNHow should I deal with vector states in reinforcement learning?DQN in stochastic environment

How to unload a Mathematica package?

What alternatives exist to at-will employment?

If a player tries to persuade somebody what should that creature roll not to be persuaded?

Concentration in the Planes

Could I use a Greatsword and a Longsword in one turn with Two-weapon fighting and dual wielding feat?

Why does the Earth have a z-component at the start of the J2000 epoch?

What made Windows ME so crash-prone?

Problem with interpolating function returned by NDEigensystem

Clarification on defining FFT bin sizes

Why did Spider-Man take a detour to Dorset?

why run a service as a system user?

I gave my characters names that are exactly like another book. Is it a problem?

Why is "dark" an adverb in this sentence?

Can a continent naturally split into two distant parts within a week?

Draw a line nicely around notes

Accidentally deleted python and yum is not working in centos7

I quit, and boss offered me 3 month "grace period" where I could still come back

How to honestly answer questions from a girlfriend like "How did you find this place" without giving the impression I'm always talking about my exes?

Construct a pentagon avoiding compass use

What are the arguments for California’s nonpartisan blanket primaries?

I have accepted an internship offer. Should I inform companies I have applied to that have not gotten back to me yet?

Manually select/unselect lines before forwarding to stdout

Three-dimensional light bulbs

Can you perfectly wrap a cube with this blocky shape?

Number of states in taxi environment (Dietterich 2000)

Different action spaces for different states of the environmentFeeding a Q-learning algorithms a greater fraction of terminal statesHow to define states in reinforcement learning?3D environment for RL research in AcademiaInput for the Env.step() in the 'Pendulum-v0' environmentReinforcement Learning with more actions than statesInform policy learning of environment constantsCatastrophic Forgetting on Pong Environment using DQNHow should I deal with vector states in reinforcement learning?DQN in stochastic environment

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;

Dietterich, who introduced the taxi environment (see p. 9), states the following: In total there “are 500 [distinct] possible states: 25 squares, 5 locations for the passenger (counting the four starting locations and the taxi), and 4 destinations” (Dietterich, 2000, p. 9).

However, in my opinion there are only 25 (grid) * 4 (locations) * 2 (passenger in car) = 200 different states, because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off. Only the action at the destination is different which would be stored binary (passenger in car or not)

Why does Dietterich come up with 500 states?

edited Jul 7 at 11:01

asked Jul 7 at 9:52

F.M.F.

1424 bronze badges

add a comment |

Why does Dietterich come up with 500 states?

edited Jul 7 at 11:01

asked Jul 7 at 9:52

F.M.F.

1424 bronze badges

add a comment |

Why does Dietterich come up with 500 states?

edited Jul 7 at 11:01

asked Jul 7 at 9:52

F.M.F.

1424 bronze badges

Why does Dietterich come up with 500 states?

reinforcement-learning combinatorics

edited Jul 7 at 11:01

asked Jul 7 at 9:52

F.M.F.

1424 bronze badges

edited Jul 7 at 11:01

asked Jul 7 at 9:52

F.M.F.

1424 bronze badges

edited Jul 7 at 11:01

asked Jul 7 at 9:52

F.M.F.

1424 bronze badges

asked Jul 7 at 9:52

F.M.F.

1424 bronze badges

asked Jul 7 at 9:52

F.M.F.

1424 bronze badges

add a comment |

2 Answers
2

active

oldest

votes

This is more of a combinatorics than AI question but regradless, the full state information for the environment is:

$(taxi space position, passenger space position, destination space position)$

There are 25 possible taxi positions, 5 passenger positions and 4 destination positions making it $25 cdot 5 cdot 4 = 500$, so the paper is correct.

You are also correct but you divided 1 objective into 2 objectives and you have 2 separate policies, a pickup policy and a dropoff policy. So your state information would be for each policy:

$(taxi space position, destination space position)$

There are 25 possible taxi positions and 4 possible destination positions making it $25 cdot 4 = 100$. You have 2 policies so you have $200$ states.

EDIT

Actually in the second case, I think you could get away with only 1 policy where you would simply change the destination position once you pick up the passenger so you dont need 2 separate policies and you would have only $100$ states

edited Jul 7 at 10:19

answered Jul 7 at 10:13

Brale_

7461 silver badge7 bronze badges

add a comment |

This . . .

because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off

. . . might seem logical/intuitive to a person understanding the task, but it is not mathematically correct. The agent cannot "merge" states because they involve the same behaviour. It must count differences in state as the combinations are presented. Critically, heading towards the passenger location or heading towards the goal location are not in any way similar to the agent, unless you manipulate the state to make them so*.

Eventually the taxi will learn very similar navigation behaviour for picking up and dropping off a passenger. However, using a basic RL agent it learns these very much separately, and must re-learn the navigation rules independently for each combination of passenger and goal location.

An agent that learned navigation within the environment, and then combined it into different tasks might be an example of hierarchical reinforcement learning, transfer learning, or curriculum learning. These are more sophisticated learning approaches, but it is quite interesting that even very basic RL problems can demonstrate a use for higher level abstractions. Most agents used on the taxi problem don't do this though, as 500 states is really very easy to "brute force" using the simplest algorithms.

* You could modify the state representation to rationalise the task and make it have less states, similar to your suggestion. For instance, have one "target" location which could either be pickup or drop off, and a boolean "carrying passenger" state component. That would indeed reduce the number of states. However, that has involved you as the problem designer simplifying the problem to make it easier for the agent. Given that this is a toy problem designed as a benchmark to see how different agents perform, by doing that you subvert the purpose of the environment. If you were creating an agent to work on a harder real world problem though, it might be a very good idea to look for symmetries and ways to simplify state representation which would speed up learning.

edited Jul 7 at 10:31

answered Jul 7 at 10:20

Neil Slater

7,6931 gold badge6 silver badges22 bronze badges

$begingroup$
Thanks for being informative. keep up the spirit
$endgroup$
– quintumnia
Jul 7 at 10:24

$begingroup$
Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
$endgroup$
– F.M.F.
Jul 7 at 11:08

add a comment |

Your Answer

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "658"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fai.stackexchange.com%2fquestions%2f13241%2fnumber-of-states-in-taxi-environment-dietterich-2000%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

This is more of a combinatorics than AI question but regradless, the full state information for the environment is:

$(taxi space position, passenger space position, destination space position)$

There are 25 possible taxi positions, 5 passenger positions and 4 destination positions making it $25 cdot 5 cdot 4 = 500$, so the paper is correct.

You are also correct but you divided 1 objective into 2 objectives and you have 2 separate policies, a pickup policy and a dropoff policy. So your state information would be for each policy:

$(taxi space position, destination space position)$

There are 25 possible taxi positions and 4 possible destination positions making it $25 cdot 4 = 100$. You have 2 policies so you have $200$ states.

EDIT

edited Jul 7 at 10:19

answered Jul 7 at 10:13

Brale_

7461 silver badge7 bronze badges

add a comment |

This is more of a combinatorics than AI question but regradless, the full state information for the environment is:

$(taxi space position, passenger space position, destination space position)$

There are 25 possible taxi positions, 5 passenger positions and 4 destination positions making it $25 cdot 5 cdot 4 = 500$, so the paper is correct.

You are also correct but you divided 1 objective into 2 objectives and you have 2 separate policies, a pickup policy and a dropoff policy. So your state information would be for each policy:

$(taxi space position, destination space position)$

There are 25 possible taxi positions and 4 possible destination positions making it $25 cdot 4 = 100$. You have 2 policies so you have $200$ states.

EDIT

edited Jul 7 at 10:19

answered Jul 7 at 10:13

Brale_

7461 silver badge7 bronze badges

add a comment |

This is more of a combinatorics than AI question but regradless, the full state information for the environment is:

$(taxi space position, passenger space position, destination space position)$

There are 25 possible taxi positions, 5 passenger positions and 4 destination positions making it $25 cdot 5 cdot 4 = 500$, so the paper is correct.

You are also correct but you divided 1 objective into 2 objectives and you have 2 separate policies, a pickup policy and a dropoff policy. So your state information would be for each policy:

$(taxi space position, destination space position)$

There are 25 possible taxi positions and 4 possible destination positions making it $25 cdot 4 = 100$. You have 2 policies so you have $200$ states.

EDIT

edited Jul 7 at 10:19

answered Jul 7 at 10:13

Brale_

7461 silver badge7 bronze badges

This is more of a combinatorics than AI question but regradless, the full state information for the environment is:

$(taxi space position, passenger space position, destination space position)$

There are 25 possible taxi positions, 5 passenger positions and 4 destination positions making it $25 cdot 5 cdot 4 = 500$, so the paper is correct.

You are also correct but you divided 1 objective into 2 objectives and you have 2 separate policies, a pickup policy and a dropoff policy. So your state information would be for each policy:

$(taxi space position, destination space position)$

There are 25 possible taxi positions and 4 possible destination positions making it $25 cdot 4 = 100$. You have 2 policies so you have $200$ states.

EDIT

edited Jul 7 at 10:19

answered Jul 7 at 10:13

Brale_

7461 silver badge7 bronze badges

edited Jul 7 at 10:19

answered Jul 7 at 10:13

Brale_

7461 silver badge7 bronze badges

answered Jul 7 at 10:13

Brale_

7461 silver badge7 bronze badges

answered Jul 7 at 10:13

Brale_

7461 silver badge7 bronze badges

add a comment |

This . . .

because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off

edited Jul 7 at 10:31

answered Jul 7 at 10:20

Neil Slater

7,6931 gold badge6 silver badges22 bronze badges

$begingroup$
Thanks for being informative. keep up the spirit
$endgroup$
– quintumnia
Jul 7 at 10:24

$begingroup$
Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
$endgroup$
– F.M.F.
Jul 7 at 11:08

add a comment |

This . . .

because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off

edited Jul 7 at 10:31

answered Jul 7 at 10:20

Neil Slater

7,6931 gold badge6 silver badges22 bronze badges

$begingroup$
Thanks for being informative. keep up the spirit
$endgroup$
– quintumnia
Jul 7 at 10:24

$begingroup$
Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
$endgroup$
– F.M.F.
Jul 7 at 11:08

add a comment |

This . . .

because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off

edited Jul 7 at 10:31

answered Jul 7 at 10:20

Neil Slater

7,6931 gold badge6 silver badges22 bronze badges

This . . .

because for the agent it should be the same task to go to a certain point, regardless of whether it's on its way to pick up or to drop-off

edited Jul 7 at 10:31

answered Jul 7 at 10:20

Neil Slater

7,6931 gold badge6 silver badges22 bronze badges

edited Jul 7 at 10:31

answered Jul 7 at 10:20

Neil Slater

7,6931 gold badge6 silver badges22 bronze badges

answered Jul 7 at 10:20

Neil Slater

7,6931 gold badge6 silver badges22 bronze badges

answered Jul 7 at 10:20

Neil Slater

7,6931 gold badge6 silver badges22 bronze badges

$begingroup$
Thanks for being informative. keep up the spirit
$endgroup$
– quintumnia
Jul 7 at 10:24

$begingroup$
Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
$endgroup$
– F.M.F.
Jul 7 at 11:08

add a comment |

$begingroup$
Thanks for being informative. keep up the spirit
$endgroup$
– quintumnia
Jul 7 at 10:24

$begingroup$
Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!
$endgroup$
– F.M.F.
Jul 7 at 11:08

Thanks for being informative. keep up the spirit

– quintumnia
Jul 7 at 10:24

Both answers are very informative. Accepted the first one for giving a short, specific answer but thank you for your abstraction of the underlying problem!

– F.M.F.
Jul 7 at 11:08

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Artificial Intelligence Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Ttdfjt

2 Answers
2

Your Answer

Post as a guest

2 Answers
2

2 Answers
2

Post as a guest

Popular posts from this blog

Grendel Contents Story Scholarship Depictions Notes References Navigation menu10.1093/notesj/gjn112Berserkeree

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

2 Answers 2

2 Answers 2

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Grendel Contents Story Scholarship Depictions Notes References Navigation menu10.1093/notesj/gjn112Berserkeree

2 Answers
2

2 Answers
2

2 Answers
2