
OBP Overview

OBP Tutorial: the focus is on offline experimentation with four components:

  1. The dataset component: the logged data fed into the pipeline.
  2. The learner (policy) component: the policy itself.
  3. The simulator component: runs the policy.
  4. The evaluation component: analyzes whether the policy compares well with another type of policy or with the standard policy.

OBP PROCESS

Figure: OBP process diagram (from the OBP project)

Data Management

  • Datasets
  • Bandit Feedback (similar to historical data in SCRUF-D terms).
    1. Dictionary storing logged data
    2. Action_context: Context vectors characterizing actions (i.e., a vector representation or an embedding of each action).
    3. OBP Extension (Slate): Comparison of bandit feedback
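To make the bandit feedback dictionary concrete, here is a minimal sketch built with NumPy alone rather than with OBP's dataset classes. The key names follow the ones OBP commonly uses for logged data, but the sizes, the uniform-random logging policy, and the 0.3 reward rate are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rounds, n_actions, dim_context = 5, 3, 2  # illustrative sizes (assumed)

# Assumption: the logging policy picks actions uniformly at random.
action = rng.integers(n_actions, size=n_rounds)

bandit_feedback = {
    "n_rounds": n_rounds,
    "n_actions": n_actions,
    "context": rng.normal(size=(n_rounds, dim_context)),  # one row per round
    "action_context": np.eye(n_actions),  # one-hot vector describing each action
    "action": action,  # action chosen by the logging policy in each round
    "position": np.zeros(n_rounds, dtype=int),  # single slot, so always 0
    "reward": rng.binomial(1, 0.3, size=n_rounds),  # observed binary reward
    "pscore": np.full(n_rounds, 1.0 / n_actions),  # logging-policy propensity
}
```

Every per-round array has `n_rounds` entries, while `action_context` has one row per action.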

Off-Policy Learner

OBP Off-Policy Learner Notebook

  • Class wrapper for ML model
    1. Example: IPWLearner
    2. Off-policy learner based on Inverse Probability Weighting and Supervised Classification.
  • Outputs
    1. predictions
    2. action probability distributions (action_dist, where len_list = 1)
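The two ideas above can be sketched in NumPy without OBP itself. An IPW-based learner treats policy learning as weighted classification: the logged action is the label, and each round is weighted by reward divided by propensity. Its deterministic predictions can then be written as a one-hot action distribution whose last axis has size 1. The helper names `ipw_sample_weights` and `to_action_dist` are hypothetical, not OBP functions.

```python
import numpy as np

def ipw_sample_weights(reward, pscore):
    """Weight each logged round by reward / propensity (hypothetical helper)."""
    return np.asarray(reward, dtype=float) / np.asarray(pscore, dtype=float)

def to_action_dist(predicted_actions, n_actions):
    """One-hot encode deterministic predictions as (n_rounds, n_actions, 1)."""
    predicted_actions = np.asarray(predicted_actions)
    n_rounds = predicted_actions.shape[0]
    action_dist = np.zeros((n_rounds, n_actions, 1))
    action_dist[np.arange(n_rounds), predicted_actions, 0] = 1.0
    return action_dist

# Weights that a classifier would receive as per-sample weights.
weights = ipw_sample_weights(reward=[1, 0, 1], pscore=[0.5, 0.25, 0.2])

# Predictions for three rounds over three actions.
action_dist = to_action_dist(predicted_actions=[2, 0, 1], n_actions=3)
```

Rounds with zero reward get zero weight, so the classifier only learns from actions that paid off, with rarely logged actions weighted more heavily.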

Simulation

Simulation applies mainly to online policies.
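An online policy updates itself after every round, so simulating it means replaying rounds one at a time. The following is a minimal sketch of that loop in NumPy, not OBP's simulator: the epsilon-greedy policy, the Bernoulli reward model, and the function name `simulate_epsilon_greedy` are all assumptions made for illustration.

```python
import numpy as np

def simulate_epsilon_greedy(true_means, n_rounds, epsilon=0.1, seed=0):
    """Run an epsilon-greedy policy against Bernoulli rewards (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_actions = len(true_means)
    counts = np.zeros(n_actions)       # times each action was played
    reward_sums = np.zeros(n_actions)  # total reward collected per action
    actions = np.empty(n_rounds, dtype=int)
    rewards = np.empty(n_rounds, dtype=int)
    for t in range(n_rounds):
        if counts.sum() == 0 or rng.random() < epsilon:
            # Explore: pick an action uniformly at random.
            a = int(rng.integers(n_actions))
        else:
            # Exploit: pick the action with the best observed mean reward.
            a = int(np.argmax(reward_sums / np.maximum(counts, 1)))
        r = int(rng.binomial(1, true_means[a]))
        counts[a] += 1
        reward_sums[a] += r
        actions[t], rewards[t] = a, r
    return actions, rewards

actions, rewards = simulate_epsilon_greedy([0.2, 0.8], n_rounds=1000)
```

The returned arrays record which action was played and which reward was observed in each round, which is the same kind of log an offline evaluation would later consume.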

Off-Policy Estimators (OPE)

  • Policy values (metrics) are based on the observed rewards
  • Return values within [0, 1]
  • Estimate the performance of a policy from logged history
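As a worked example of estimating a policy value from logged history, here is a minimal NumPy sketch of the standard Inverse Probability Weighting estimator, which averages reward × π_e(a|x) / pscore over the logged rounds. The function name `ipw_policy_value` and the toy data are assumptions for illustration; this is not OBP's own estimator class.

```python
import numpy as np

def ipw_policy_value(reward, action, pscore, action_dist):
    """IPW estimate of a policy's value from logged data (hypothetical helper)."""
    reward = np.asarray(reward, dtype=float)
    action = np.asarray(action)
    pscore = np.asarray(pscore, dtype=float)
    # Probability the evaluation policy assigns to each logged action.
    pi_e = action_dist[np.arange(action.shape[0]), action, 0]
    return float(np.mean(reward * pi_e / pscore))

# Toy log: two actions, logged uniformly at random (pscore = 0.5).
reward = np.array([1, 0, 1, 0])
action = np.array([0, 1, 1, 0])
pscore = np.full(4, 0.5)

# Evaluation policy that always picks action 1.
action_dist = np.zeros((4, 2, 1))
action_dist[:, 1, 0] = 1.0

value = ipw_policy_value(reward, action, pscore, action_dist)  # 0.5
```

Only rounds where the logged action matches what the evaluation policy would choose contribute. With binary rewards the true policy value lies in [0, 1], although an IPW estimate computed from a small log can fall outside that range.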

Additional Notes

OBP Summary Diagram

Figure: OBP summary diagram