BanditTrainer (bandit-ranking 1.0-SNAPSHOT API)

java.lang.Object
- com.mapr.stats.bandit.BanditTrainer

```
public class BanditTrainer
extends Object
```
Simulates a two-armed bandit played by a beta-Bayesian model.
The output gives the quantiles of the distribution of regret relative to the optimal pick. The regret distribution is estimated by picking two random conversion probabilities and then running the beta-Bayesian model for a number of steps. The regret is computed by subtracting the expected conversion rate of the optimal choice from the conversion rate actually achieved. On average, this should be somewhat negative, since the model has to spend some effort examining the sub-optimal choice. The median and the 25th and 75th percentiles all scale downward fairly precisely with the square root of the number of trials, as expected from theoretical considerations.
The beta-Bayesian model works by keeping an estimate of the posterior distribution of the conversion probability for each of the bandits. We take a uniform distribution as the prior, so the posterior is a beta distribution. The model samples a probability from each of the two posterior distributions and chooses the bandit whose sample is larger. As data is collected for the two bandits, the better bandit quickly acquires a narrow posterior distribution, and the lesser bandit rarely has a sampled probability higher than that of the better bandit. This means that we stop getting data from the lesser bandit, but only when there is essentially no chance that it is better.
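The sampling scheme described above can be sketched in a few lines of Java. This is an illustrative sketch, not the code behind `BanditTrainer`: the class name `BetaBanditSketch` and all of its methods are hypothetical. It assumes a uniform prior, so each posterior is Beta(wins + 1, losses + 1), and it draws beta samples from two gamma samples built as sums of exponentials, which is simple but costs time proportional to the number of observations.

```java
import java.util.Random;

/** Hypothetical sketch of a beta-Bayesian bandit; not the library's implementation. */
final class BetaBanditSketch {
    private final int[] wins;    // observed conversions per bandit
    private final int[] losses;  // observed non-conversions per bandit
    private final Random rng;

    BetaBanditSketch(int bandits, Random rng) {
        this.wins = new int[bandits];
        this.losses = new int[bandits];
        this.rng = rng;
    }

    /** Gamma(k, 1) sample for integer k >= 1, as a sum of k exponential samples. */
    private double gamma(int k) {
        double sum = 0;
        for (int i = 0; i < k; i++) {
            sum -= Math.log(1.0 - rng.nextDouble()); // 1 - U lies in (0, 1], so log is finite
        }
        return sum;
    }

    /** Beta(a, b) sample for integer a, b >= 1, as X / (X + Y) with gamma X and Y. */
    private double beta(int a, int b) {
        double x = gamma(a);
        double y = gamma(b);
        return x / (x + y);
    }

    /** Samples each posterior Beta(wins + 1, losses + 1) and returns the bandit with the largest sample. */
    int select() {
        int best = 0;
        double bestSample = -1;
        for (int i = 0; i < wins.length; i++) {
            double s = beta(wins[i] + 1, losses[i] + 1);
            if (s > bestSample) {
                bestSample = s;
                best = i;
            }
        }
        return best;
    }

    /** Records the outcome of one trial of the given bandit. */
    void update(int bandit, boolean converted) {
        if (converted) {
            wins[bandit]++;
        } else {
            losses[bandit]++;
        }
    }

    /** Runs one experiment and returns the achieved conversion rate minus the best probability. */
    static double regret(double[] p, int steps, Random rng) {
        BetaBanditSketch model = new BetaBanditSketch(p.length, rng);
        double bestP = 0;
        for (double v : p) {
            bestP = Math.max(bestP, v);
        }
        int conversions = 0;
        for (int t = 0; t < steps; t++) {
            int k = model.select();
            boolean converted = rng.nextDouble() < p[k];
            model.update(k, converted);
            if (converted) {
                conversions++;
            }
        }
        return (double) conversions / steps - bestP;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        // Two random conversion probabilities, as in the description above.
        double[] p = {rng.nextDouble(), rng.nextDouble()};
        System.out.printf("p = %.3f, %.3f; regret = %.4f%n", p[0], p[1], regret(p, 2000, rng));
    }
}
```

Repeating `regret` over many random probability pairs and collecting the results would give an empirical regret distribution of the kind whose quantiles are reported.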

Constructor Summary

Constructors
Constructor and Description
`BanditTrainer()`

Method Summary

Methods
Modifier and Type	Method and Description
`static double`	`averageRegret(String outputFile, int[] sizes, int replications, int bandits)` Computes average regret relative to perfect knowledge given uniform random probabilities.
`static double`	`commitTime(String outputFile, int n, double p1, double p2, int cutoff)` Records which bandit was chosen for many runs of the same scenario.
`static void`	`main(String[] args)`
`static double`	`totalRegret(String cumulativeOutput, String perTurnOutput, int replications, int bandits, int maxSteps, BanditFactory modelFactory, DistributionGenerator refSampler)` Computes average regret relative to perfect knowledge given uniform random probabilities.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - BanditTrainer
```
public BanditTrainer()
```
- Method Detail
  - main
```
public static void main(String[] args)
                 throws FileNotFoundException,
                        NoSuchMethodException,
                        InvocationTargetException,
                        InstantiationException,
                        IllegalAccessException,
                        InterruptedException
```
    Throws:
    FileNotFoundException
    NoSuchMethodException
    InvocationTargetException
    InstantiationException
    IllegalAccessException
    InterruptedException
  - commitTime
```
public static double commitTime(String outputFile,
                int n,
                double p1,
                double p2,
                int cutoff)
                         throws FileNotFoundException
```
    Records which bandit was chosen for many runs of the same scenario. This output is large and hard to digest visually, so it is probably better to reduce it using a mean. In R, this can be done like this:
```
# z holds the recorded output with columns i and k; average k over blocks of 10 values of i
plot(tapply(z$k, floor(z$i/10), mean), type='l')
```
    Parameters:
    outputFile - Where to write results
    n - How many steps to follow
    p1 - First probability of reward
    p2 - Second probability of reward
    cutoff - Only keep results after this many steps
    
    Returns:
    Average number of correct choices.
    
    Throws:
    
    FileNotFoundException - If the directory holding the output file doesn't exist.
  - averageRegret
```
public static double averageRegret(String outputFile,
                   int[] sizes,
                   int replications,
                   int bandits)
                            throws FileNotFoundException
```
    Computes average regret relative to perfect knowledge given uniform random probabilities. The output contains the quartiles for different numbers of trials. The quartiles are computed by running many experiments for each specified number of trials.
    The output can be plotted almost directly in R:
```
x <- read.delim(file='~/Apache/storm-aggregator/regret.tsv')
# bxp() reads the box statistics from a list component named 'stats'
bxp(list(stats=t(as.matrix(x[,2:6])), n=rep(1000,times=8), names=x$n))
```
    Parameters:
    outputFile - Where to put the output
    sizes - The different size experiments to use
    replications - Number of times to repeat the experiment
    bandits - How many bandits to simulate
    
    Returns:
    The average regret per trial.
    
    Throws:
    
    FileNotFoundException - If the output file can't be opened due to a missing directory.
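
    The quartile summary described for averageRegret can be illustrated with a small Java sketch. It is only an assumption about the shape of the computation, not the library's code: the class `QuartileSketch`, its methods, the choice of a five-number summary (suggested by the five columns the R snippet reads), and the linear interpolation rule are all hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: summarizes replicated regret values by their quartiles. */
final class QuartileSketch {
    /** Quantile q in [0, 1] of sorted data, interpolating linearly between order statistics. */
    static double quantile(double[] sorted, double q) {
        double h = (sorted.length - 1) * q;
        int lo = (int) Math.floor(h);
        int hi = Math.min(lo + 1, sorted.length - 1);
        return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
    }

    /** Returns {min, 25%, median, 75%, max} of the regret values from many replications. */
    static double[] summarize(double[] regrets) {
        double[] sorted = regrets.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        return new double[] {
            quantile(sorted, 0.0),
            quantile(sorted, 0.25),
            quantile(sorted, 0.5),
            quantile(sorted, 0.75),
            quantile(sorted, 1.0)
        };
    }

    public static void main(String[] args) {
        // Made-up regret values standing in for the results of five replications.
        double[] regrets = {-0.05, -0.01, -0.04, -0.02, -0.03};
        System.out.println(Arrays.toString(summarize(regrets)));
    }
}
```

    One such summary per entry of sizes would give one row of quantiles per experiment size.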
  - totalRegret
```
public static double totalRegret(String cumulativeOutput,
                 String perTurnOutput,
                 int replications,
                 int bandits,
                 int maxSteps,
                 BanditFactory modelFactory,
                 DistributionGenerator refSampler)
                          throws FileNotFoundException
```
    Computes average regret relative to perfect knowledge given uniform random probabilities. The output contains the quartiles for different numbers of trials. The quartiles are computed by running many experiments for each specified number of trials.
    The output can be plotted almost directly in R:
```
x <- read.delim(file='~/Apache/storm-aggregator/regret.tsv')
# bxp() reads the box statistics from a list component named 'stats'
bxp(list(stats=t(as.matrix(x[,2:6])), n=rep(1000,times=8), names=x$n))
```
    Parameters:
    cumulativeOutput - Where to write the cumulative regret results
    perTurnOutput - Where to write the per step regret results
    replications - How many times to replicate the experiment
    bandits - How many bandits to emulate
    maxSteps - Maximum number of trials to run per experiment
    modelFactory - How to construct the solver.
    refSampler - How to get reward distributions for bandits
    
    Returns:
    An estimate of the average final cumulative regret
    
    Throws:
    
    FileNotFoundException - If the output file can't be opened due to a missing directory.
