tabular-sarsa

Version 1.0.6 (public package)

Tabular Expected SARSA Agent

This package contains an agent that learns to maximize reward through reinforcement learning. The agent builds a table that predicts the expected value of every possible action in every possible state. Exploration follows an epsilon-greedy policy.
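To make the idea of a value table with epsilon-greedy exploration concrete, here is a minimal standalone sketch. It is not the package's internal code: the names `createQTable`, `greedyAction`, and `chooseAction` are hypothetical and exist only for illustration.

```javascript
// Hypothetical sketch: a Q table stored as one row of action values per state.
function createQTable(stateCount, actionCount) {
    var table = [];
    for (var s = 0; s < stateCount; s++) {
        table.push(new Array(actionCount).fill(0));
    }
    return table;
}

// Index of the highest-valued action in a row (the first one wins ties).
function greedyAction(actionValues) {
    var best = 0;
    for (var a = 1; a < actionValues.length; a++) {
        if (actionValues[a] > actionValues[best]) {
            best = a;
        }
    }
    return best;
}

// Epsilon-greedy: explore with probability epsilon, otherwise exploit.
// `random` is injectable so the behaviour can be checked deterministically.
function chooseAction(qTable, state, epsilon, random) {
    random = random || Math.random;
    var actionValues = qTable[state];
    if (random() < epsilon) {
        return Math.floor(random() * actionValues.length);
    }
    return greedyAction(actionValues);
}
```

With `epsilon` set to 0 the sketch always picks the highest-valued action for the state; with `epsilon` set to 1 it always picks uniformly at random.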

Because this uses a table-based Q function, it only works in environments with a discrete set of states and actions. You must be able to convert all states and actions to integers to use this agent.
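As an illustration of converting states to integers, a position on a rectangular grid can be flattened into a single index. This is only a sketch under the assumption of a grid world; `encodeState` and `decodeState` are hypothetical helpers, not part of the package.

```javascript
// Hypothetical helper: flatten an (x, y) grid position into one integer state.
// For a grid of gridWidth * gridHeight cells, states run from 0 to
// gridWidth * gridHeight - 1, so that product is the number of possible states.
function encodeState(x, y, gridWidth) {
    return y * gridWidth + x;
}

// Inverse mapping, useful when debugging what a state index refers to.
function decodeState(state, gridWidth) {
    return {
        x: state % gridWidth,
        y: Math.floor(state / gridWidth)
    };
}
```

Any other one-to-one mapping from environment states to integers would work just as well, as long as every state receives a unique index below the state count passed to the agent.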

Demo:

Select the "Tabular Sarsa" agent here: http://rodmcnew.github.io/reinforcement-learning-agent-tester-js/

Installation:

npm install tabular-sarsa

Usage:

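var tabularSarsa = require('tabular-sarsa'); // assumes the package's CommonJS export exposes Agent
var agent = new tabularSarsa.Agent(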
    numberOfPossibleStates,
    numberOfPossibleActions
);
var lastReward = null;
 
function tick() {
    /*
     * Tell the agent about the current environment state and
     * have it choose an action to take.
     */
    var action = agent.decide(
        lastReward,
        environment.getCurrentState()
    );
 
    /*
     * Take the action inside the environment and find out how
     * rewarding the action was.
     */
    lastReward = environment.takeAction(action);
}
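The usage example above leaves `environment` undefined. The following is one possible toy environment with the two methods that example calls, `getCurrentState()` and `takeAction(action)`. Everything here is an assumption made for illustration: the corridor task, the reward scheme, and the name `createCorridorEnvironment` are hypothetical.

```javascript
// Hypothetical toy environment: a corridor of `length` cells.
// Action 0 moves left, action 1 moves right. Reaching the last cell
// yields a reward of 1 and resets the position to the first cell.
function createCorridorEnvironment(length) {
    var position = 0;
    return {
        numberOfPossibleStates: length,
        numberOfPossibleActions: 2,
        getCurrentState: function () {
            return position;
        },
        takeAction: function (action) {
            var step = action === 1 ? 1 : -1;
            // Clamp so the agent cannot walk off either end.
            position = Math.max(0, Math.min(length - 1, position + step));
            if (position === length - 1) {
                position = 0; // start a new episode
                return 1;
            }
            return 0;
        }
    };
}
```

An environment built this way could be passed to the `tick()` loop by creating it with `var environment = createCorridorEnvironment(5);` and constructing the agent with the environment's state and action counts.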

Saving trained agents for later:

//Saving an agent
var agentA = new tabularSarsa.Agent(100, 4);
var savedAgentData = agentA.saveToJson();
 
//Loading an agent
var agentB = new tabularSarsa.Agent(100, 4);
agentB.loadFromJson(savedAgentData);
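To keep a trained agent across process restarts, the saved data can be written to disk. The sketch below assumes Node.js and does not know whether `saveToJson()` returns a string or a plain object, so it wraps the value in `JSON.stringify` and `JSON.parse`, which round-trips either form. `saveAgentToFile` and `loadAgentFromFile` are hypothetical helpers, not part of the package.

```javascript
var fs = require('fs');

// Hypothetical helper: persist whatever saveToJson() returns.
// JSON.stringify round-trips both strings and plain objects.
function saveAgentToFile(agent, filePath) {
    fs.writeFileSync(filePath, JSON.stringify(agent.saveToJson()), 'utf8');
}

// Hypothetical helper: restore previously saved data into an agent that
// was constructed with the same state and action counts.
function loadAgentFromFile(agent, filePath) {
    var savedAgentData = JSON.parse(fs.readFileSync(filePath, 'utf8'));
    agent.loadFromJson(savedAgentData);
}
```

As in the example above, the agent that loads the data should be created with the same number of states and actions as the agent that saved it.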

Extra options:

var agent = new tabularSarsa.Agent(
    100, // Number of possible states
    4, // Number of possible actions
    {
        learningEnabled: true, // set to false to disable all learning for faster execution
        learningRate: 0.1, // alpha - how much new experiences overwrite previous ones
        explorationProbability: 0.05, // epsilon - the probability of taking a random action under the epsilon-greedy policy
        discountFactor: 0.9, // gamma - future rewards are multiplied by this
    }
);
 

Optimizations beyond plain SARSA that speed up learning:

  • Uses "Expected SARSA" rather than plain SARSA
  • Uses the first seen reward for each state-action pair as its initial Q value
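The two optimizations listed above can be sketched as a single update step. This is an interpretation of the list, not the package's actual implementation: it assumes unseen table entries are stored as `null`, that the first-seen reward replaces the normal update for that step, and that the policy is epsilon-greedy. The function names are hypothetical; the option names mirror the ones shown under "Extra options".

```javascript
// Expected value of a state's actions under an epsilon-greedy policy:
// with probability epsilon a uniformly random action is taken,
// otherwise the greedy one. Unseen (null) entries count as 0 here.
function expectedValue(actionValues, epsilon) {
    var sum = 0;
    var best = -Infinity;
    for (var a = 0; a < actionValues.length; a++) {
        var value = actionValues[a] === null ? 0 : actionValues[a];
        sum += value;
        if (value > best) {
            best = value;
        }
    }
    return epsilon * (sum / actionValues.length) + (1 - epsilon) * best;
}

// One Expected SARSA update for the transition (state, action) -> nextState.
function updateQ(qTable, state, action, reward, nextState, options) {
    var current = qTable[state][action];
    if (current === null) {
        // Assumption: the first reward seen for a pair seeds its Q value.
        qTable[state][action] = reward;
        return;
    }
    var target = reward + options.discountFactor *
        expectedValue(qTable[nextState], options.explorationProbability);
    qTable[state][action] = current + options.learningRate * (target - current);
}
```

Plain SARSA would use the value of the single next action actually sampled as the bootstrap term. Expected SARSA averages over the policy's action probabilities instead, which removes the variance caused by that random choice.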

More info about the Expected-SARSA algorithm: http://www.cs.ox.ac.uk/people/shimon.whiteson/pubs/vanseijenadprl09.pdf

License: MIT
