OBP Overview
OBP Tutorial: The focus is on offline experimentation with four different components (a minimal end-to-end sketch follows this list):
- The dataset component: the logged bandit feedback the pipeline works on.
- The learner or policy component: the policy being trained or evaluated.
- The simulator component: runs the policy over the logged data.
- The evaluation component: estimates whether the policy compares favorably with another policy or with the standard (logging) policy.
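A minimal sketch of how the four components fit together, assuming obp's synthetic dataset, IPWLearner, and OPE classes; the hyperparameter values here are illustrative, not taken from the tutorial:

```python
from sklearn.linear_model import LogisticRegression

from obp.dataset import SyntheticBanditDataset
from obp.policy import IPWLearner
from obp.ope import OffPolicyEvaluation, InverseProbabilityWeighting

# (1) Dataset component: synthetic logged bandit feedback.
dataset = SyntheticBanditDataset(
    n_actions=10, dim_context=5, reward_type="binary", random_state=12345
)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=10000)

# (2) Learner / policy component: an off-policy learner wrapping a classifier.
ipw_learner = IPWLearner(
    n_actions=dataset.n_actions, base_classifier=LogisticRegression()
)
ipw_learner.fit(
    context=bandit_feedback["context"],
    action=bandit_feedback["action"],
    reward=bandit_feedback["reward"],
    pscore=bandit_feedback["pscore"],
)

# (3) Run the learned policy over the logged contexts to get its action choices.
action_dist = ipw_learner.predict(context=bandit_feedback["context"])

# (4) Evaluation component: estimate the new policy's value from the logged data.
ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[InverseProbabilityWeighting()],
)
print(ope.estimate_policy_values(action_dist=action_dist))  # e.g. {'ipw': ...}
```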
OBP Process
From OBP Project
Data Management
- Datasets
- Bandit Feedback (similar to historical data in SCRUF-D terms).
- Dictionary storing logged data
- Action_context: Context vectors characterizing actions (i.e., a vector representation or an embedding of each action).
- OBP extension (Slate): slate-style bandit feedback, for comparison with the standard bandit feedback (the standard feedback dictionary is sketched after this list).
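A sketch of how the bandit feedback dictionary above is obtained through obp's OpenBanditDataset class; when no data path is given, a small sample of the Open Bandit Dataset bundled with the package is loaded, and the behavior_policy and campaign values shown are simply the ones commonly used in OBP examples:

```python
from obp.dataset import OpenBanditDataset

# Load logged bandit feedback; with no data_path, a bundled sample of the
# Open Bandit Dataset is used.
dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
bandit_feedback = dataset.obtain_batch_bandit_feedback()

# bandit_feedback is a dictionary storing the logged data, with keys such as:
#   n_rounds       - number of logged rounds
#   n_actions      - number of candidate actions
#   context        - context vector observed at each round
#   action         - action chosen by the behavior (logging) policy
#   position       - recommendation slot of the chosen action
#   reward         - observed reward (e.g., click)
#   pscore         - propensity score of the chosen action
#   action_context - context vectors (embeddings) characterizing each action
print(sorted(bandit_feedback.keys()))
print(bandit_feedback["action_context"].shape)  # (n_actions, dim_action_context)
```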
Off-Policy Learner
OBP Off-Policy Learner Notebook
- Class wrapper for ML model
- Example: IPWLearner
- Off-policy learner based on Inverse Probability Weighting and Supervised Classification.
- Outputs
- predictions
- action probability distributions (valid only when len_list = 1; see the sketch after this list)
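A sketch of the two outputs listed above, assuming an IPWLearner fit on synthetic bandit feedback as in the earlier pipeline sketch; in obp both outputs have shape (n_rounds, n_actions, len_list):

```python
from sklearn.linear_model import LogisticRegression

from obp.dataset import SyntheticBanditDataset
from obp.policy import IPWLearner

# Logged feedback to train on (synthetic, for illustration).
dataset = SyntheticBanditDataset(n_actions=10, dim_context=5, random_state=12345)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=5000)

# IPWLearner is a class wrapper around an ordinary ML classifier, trained
# on inverse-probability-weighted samples.
learner = IPWLearner(n_actions=dataset.n_actions, base_classifier=LogisticRegression())
learner.fit(
    context=bandit_feedback["context"],
    action=bandit_feedback["action"],
    reward=bandit_feedback["reward"],
    pscore=bandit_feedback["pscore"],
)

# Output 1: predictions, i.e., a deterministic (one-hot) action choice per round.
action_dist = learner.predict(context=bandit_feedback["context"])
print(action_dist.shape)  # (n_rounds, n_actions, len_list) with len_list = 1

# Output 2: action probability distributions (valid only when len_list = 1).
action_proba = learner.predict_proba(context=bandit_feedback["context"])
print(action_proba.sum(axis=1)[:3])  # probabilities over actions sum to 1
```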
Simulation
Mainly based on online bandit policies; the simulator replays the logged data to run an online policy (a sketch follows).
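A sketch of running an online bandit policy through the simulator, assuming the run_bandit_simulation helper exposed by obp.simulator in earlier OBP releases (newer releases reorganize the simulator module, so the exact import may differ):

```python
from obp.dataset import SyntheticBanditDataset
from obp.policy import EpsilonGreedy
from obp.simulator import run_bandit_simulation

# Logged bandit feedback to replay.
dataset = SyntheticBanditDataset(n_actions=10, dim_context=5, random_state=12345)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=5000)

# An online (context-free) policy that updates its estimates as rewards arrive.
policy = EpsilonGreedy(n_actions=dataset.n_actions, epsilon=0.1, random_state=12345)

# Replay the logged rounds through the online policy; the result is the
# policy's action distribution over the log, usable as input to OPE.
action_dist = run_bandit_simulation(bandit_feedback=bandit_feedback, policy=policy)
print(action_dist.shape)  # (n_rounds, n_actions, len_list)
```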
Off-Policy Evaluation (OPE) Estimators
- Estimate policy values (metrics) based on the observed rewards.
- Estimated values fall within [0, 1] (when rewards are bounded in [0, 1]).
- Estimate the performance of a new policy from the log history collected by the behavior policy (see the sketch after this list).
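A sketch of estimating a policy's value from the log history with several standard obp estimators (IPW, Direct Method, Doubly Robust); a uniform-random evaluation policy stands in here for a learned policy, and the model-based estimators additionally need rewards estimated by a regression model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

from obp.dataset import SyntheticBanditDataset
from obp.ope import (
    OffPolicyEvaluation,
    RegressionModel,
    InverseProbabilityWeighting,
    DirectMethod,
    DoublyRobust,
)

# Logged bandit feedback collected by the behavior policy.
dataset = SyntheticBanditDataset(n_actions=10, dim_context=5, random_state=12345)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=10000)

# Evaluation policy: a uniform-random policy used as a simple stand-in
# (any action distribution of shape (n_rounds, n_actions, len_list) works).
n_rounds, n_actions = bandit_feedback["n_rounds"], bandit_feedback["n_actions"]
action_dist = np.full((n_rounds, n_actions, 1), 1.0 / n_actions)

# Reward regression model required by the model-based estimators (DM, DR).
regression_model = RegressionModel(
    n_actions=n_actions, base_model=LogisticRegression()
)
estimated_rewards = regression_model.fit_predict(
    context=bandit_feedback["context"],
    action=bandit_feedback["action"],
    reward=bandit_feedback["reward"],
    random_state=12345,
)

# Estimate the evaluation policy's value from the log history.
ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[InverseProbabilityWeighting(), DirectMethod(), DoublyRobust()],
)
estimated_values = ope.estimate_policy_values(
    action_dist=action_dist,
    estimated_rewards_by_reg_model=estimated_rewards,
)
print(estimated_values)  # {'ipw': ..., 'dm': ..., 'dr': ...}, each in [0, 1] for binary rewards
```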
Additional Notes
OBP Summary Diagram