com.mapr.stats.bandit

## Class BanditTrainer

• public class BanditTrainer
extends Object
Simulate a two-armed bandit playing against a beta-Bayesian model.

The output indicates the quantiles of the distribution of regret relative to the optimal pick. The regret distribution is estimated by picking two random conversion probabilities and then running the beta-Bayesian model for a number of steps. The regret is computed by taking the expectation for the optimal choice and subtracting it from the actual percentage of conversions achieved. On average, this should be somewhat negative since the model has to spend some effort examining the sub-optimal choice. The median, 25th, and 75th percentile marks all scale downward fairly precisely with the square root of the number of trials, which is to be expected from theoretical considerations.

The beta-Bayesian model works by keeping an estimate of the posterior distribution of the conversion probability for each of the bandits. We take a uniform distribution as the prior, so the posterior is a beta distribution. The model samples probabilities from the two posterior distributions and chooses the bandit whose sample is larger. As data is collected for the two bandits, the better of the bandits will quickly have a fairly narrow posterior distribution, and the lesser bandit will rarely have a sampled probability higher than the better bandit's. This means that we will stop getting data from the lesser bandit, but only when there is essentially no chance that it is better.
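The scheme above can be sketched in a few lines of Java. This is a hypothetical illustration, not the actual BanditTrainer code: the class name, the order-statistic beta sampler, and the example conversion rates are assumptions made here for clarity.

```java
import java.util.Arrays;
import java.util.Random;

// Minimal sketch of the beta-Bayesian (Thompson sampling) scheme described
// above. Class and method names are illustrative only, not the real API.
public class ThompsonSketch {
    // Sample from Beta(a, b) for integer a, b >= 1 by taking the a-th
    // smallest of a + b - 1 uniform draws. This works here because the
    // uniform prior plus integer win/loss counts keeps both parameters
    // integral; it is simple rather than fast.
    static double sampleBeta(int a, int b, Random rand) {
        double[] u = new double[a + b - 1];
        for (int i = 0; i < u.length; i++) u[i] = rand.nextDouble();
        Arrays.sort(u);
        return u[a - 1];
    }

    // Run n steps of a two-armed bandit with true conversion rates p[0]
    // and p[1]; return the per-trial regret (achieved conversion rate
    // minus the best possible rate).
    static double run(int n, double[] p, long seed) {
        Random rand = new Random(seed);
        int[] wins = new int[2];
        int[] losses = new int[2];
        int conversions = 0;
        for (int step = 0; step < n; step++) {
            // Sample a conversion probability from each arm's posterior
            // and play the arm whose sample is larger.
            int k = sampleBeta(wins[0] + 1, losses[0] + 1, rand)
                    > sampleBeta(wins[1] + 1, losses[1] + 1, rand) ? 0 : 1;
            if (rand.nextDouble() < p[k]) {
                wins[k]++;
                conversions++;
            } else {
                losses[k]++;
            }
        }
        return (double) conversions / n - Math.max(p[0], p[1]);
    }

    public static void main(String[] args) {
        // The regret is typically slightly negative: some trials are
        // inevitably spent exploring the worse arm.
        System.out.println("per-trial regret = "
                + run(2000, new double[]{0.12, 0.07}, 42));
    }
}
```

As data accumulates, the posterior of the better arm narrows and the lesser arm's samples almost never win the comparison, which is exactly the behavior the paragraph above describes.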

• ### Constructor Summary

Constructors
Constructor and Description
BanditTrainer()
• ### Method Summary

Methods
Modifier and Type Method and Description
static double averageRegret(String outputFile, int[] sizes, int replications, int bandits)
Computes average regret relative to perfect knowledge given uniform random probabilities.
static double commitTime(String outputFile, int n, double p1, double p2, int cutoff)
Records which bandit was chosen for many runs of the same scenario.
static void main(String[] args)
static double totalRegret(String cumulativeOutput, String perTurnOutput, int replications, int bandits, int maxSteps, BanditFactory modelFactory, DistributionGenerator refSampler)
Computes average regret relative to perfect knowledge given uniform random probabilities.
• ### Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
• ### Constructor Detail

• #### BanditTrainer

public BanditTrainer()
• ### Method Detail

• #### main

public static void main(String[] args)
throws FileNotFoundException,
NoSuchMethodException,
InvocationTargetException,
InstantiationException,
IllegalAccessException,
InterruptedException
Throws:
FileNotFoundException
NoSuchMethodException
InvocationTargetException
InstantiationException
IllegalAccessException
InterruptedException
• #### commitTime

public static double commitTime(String outputFile,
int n,
double p1,
double p2,
int cutoff)
throws FileNotFoundException
Records which bandit was chosen for many runs of the same scenario. This output is kind of big and hard to digest visually. As such, it is probably better to reduce it using a mean. In R, this can be done like this:
    plot(tapply(z$k, floor(z$i/10), mean), type='l')

Parameters:
outputFile - Where to write results
n - How many steps to follow
p1 - First probability of reward
p2 - Second probability of reward
cutoff - Only keep results after this many steps
Returns:
Average number of correct choices.
Throws:
FileNotFoundException - If the directory holding the output file doesn't exist.
• #### averageRegret

public static double averageRegret(String outputFile,
int[] sizes,
int replications,
int bandits)
throws FileNotFoundException
Computes average regret relative to perfect knowledge given uniform random probabilities. The output contains the quartiles for different numbers of trials. The quartiles are computed by running many experiments for each specified number of trials.

This can be plotted pretty much directly in R:

 > x=read.delim(file='~/Apache/storm-aggregator/regret.tsv')
 > bxp(list(stats=t(as.matrix(x[,2:6])), n=rep(1000,times=8), names=x$n))

Parameters:
outputFile - Where to put the output
sizes - The different size experiments to use
replications - Number of times to repeat the experiment
bandits - How many bandits to simulate
Returns:
The average regret per trial
Throws:
FileNotFoundException - If the output file can't be opened due to a missing directory.
• #### totalRegret

public static double totalRegret(String cumulativeOutput,
String perTurnOutput,
int replications,
int bandits,
int maxSteps,
BanditFactory modelFactory,
DistributionGenerator refSampler)
throws FileNotFoundException
Computes average regret relative to perfect knowledge given uniform random probabilities. The output contains the quartiles for different numbers of trials. The quartiles are computed by running many experiments for each specified number of trials.

This can be plotted pretty much directly in R:

 > x=read.delim(file='~/Apache/storm-aggregator/regret.tsv')
 > bxp(list(stats=t(as.matrix(x[,2:6])), n=rep(1000,times=8), names=x$n))

Parameters:
cumulativeOutput - Where to write the cumulative regret results
perTurnOutput - Where to write the per step regret results
replications - How many times to replicate the experiment
bandits - How many bandits to emulate
maxSteps - Maximum number of trials to run per experiment
modelFactory - How to construct the solver.
refSampler - How to get reward distributions for bandits
Returns:
An estimate of the average final cumulative regret
Throws:
FileNotFoundException - If the output file can't be opened due to a missing directory.