Can You Best The Mysterious Man In The Trench Coat?

Expanding on the Riddler's Problem

in r

March 20, 2016

The latest Riddler puzzle on FiveThirtyEight:

A man in a trench coat approaches you and pulls an envelope from his pocket. He tells you that it contains a sum of money in bills, anywhere from $1 up to $1,000. He says that if you can guess the exact amount, you can keep the money. After each of your guesses he will tell you if your guess is too high, or too low. But! You only get nine tries. What should your first guess be to maximize your expected winnings?

My solution is based on a basic, yet elegant, strategy. The first guess can be selected arbitrarily between $1 and $1000 - let’s say here that my first guess is $ 500. If my guess is correct, then I win (yay!). But since I have just a 1 in 1000 probability of guessing correctly on the first try, I’m probably not done. So if the trenchcoat man says the actual value is higher, my next guess will be the midway point between my first guess and the maximum possible value. Initially, this will be $1000. If the trenchcoat man says the actual value is lower, my next guess will be the midway point between my first guess and the minimum possible value ($1).

So let’s say my guess is too low and the actual value is higher. My second guess would be $750. If I’m correct, I win. If the actual amount is lower, my next guess will be the midpoint between $500 and $750 - remember that I now know it must be within this range.

I can iterate through this process with up to 9 guesses. At that point, if I still have not guessed the amount, I lose.

To simulate this process in R, I wrote the following function

library(dplyr)
library(ggplot2)
library(ggrepel)

theme_set(theme_minimal(base_size = 16))
set.seed(123)

# function to guess money amount using strategy
guess_money <- function(actual, initial, n_tries = 9,
                        min_val = 1, max_val = 1000,
                        print_guess = FALSE) {
  # set iterator
  i <- 1

  # while i is less than the max number of guesses, find the median value
  # within the possible range. if guess is not correct, reset min_val or max_val
  # depending on info trenchcoat man provides
  while (i <= n_tries) {
    if (i == 1) {
      guess <- initial
    } else {
      guess <- round(mean(c(min_val, max_val)))
    }

    # print the guess if print_guess is TRUE
    if (print_guess) cat(paste0("Guess Number ", i, ": $", guess), sep = "\n")

    # if guess is correct, immediately exit the loop and return true
    # if guess is not correct:
    ## if actual is higher than guess, change min_val to guess
    ## if actual is lower than guess, change max_val to guess
    if (actual == guess) {
      return(c(win = TRUE, round = i))
    } else if (actual > guess) {
      min_val <- guess
    } else if (actual < guess) {
      max_val <- guess
    }

    # iterate to next round if guess was incorrect
    i <- i + 1
  }

  # at this point still have not guessed the money amount, so lose
  # correct i since we didn't really guess the i-th time
  return(c(win = FALSE, round = i - 1))
}

As an example, let’s say the actual amount of money is $736 and my first guess is $500. Here’s how that would play out:

guess_money(actual = 736, initial = 500, print_guess = TRUE)

## Guess Number 1: $500
## Guess Number 2: $750
## Guess Number 3: $625
## Guess Number 4: $688
## Guess Number 5: $719
## Guess Number 6: $734
## Guess Number 7: $742
## Guess Number 8: $738
## Guess Number 9: $736

##   win round 
##     1     9

This tells me the different guesses, as well as the fact that I eventually won (win = 1) in the ninth round.

Of course, there is no reason why I have to choose $500 for my initial guess. What if I instead started at $1?

guess_money(actual = 736, initial = 1, print_guess = TRUE)

## Guess Number 1: $1
## Guess Number 2: $500
## Guess Number 3: $750
## Guess Number 4: $625
## Guess Number 5: $688
## Guess Number 6: $719
## Guess Number 7: $734
## Guess Number 8: $742
## Guess Number 9: $738

##   win round 
##     0     9

Clearly not the best initial guess. I wasted my first guess and ended up not winning the money. But how do we know which initial guess provides the highest expected value? That is, the initial guess that maximizes my potential winnings regardless of the actual amount of money held by the trenchcoat man?

To answer that question, I calculate the results for every potential initial guess (each integer between 1 and 1000) and every potential actual amount of money (again, each integer between 1 and 1000). This results in 1,000,000 different potential game states. From there, we can calculate the average winnings for each initial guess. These average winnings are the expected value, or what we might expect to win if we always use that amount for the initial guess.

In order to do this in R, I use the Vectorize function to expand my original function to work with multiple game states.

min_val <- 1
max_val <- 1000
actual_vals <- min_val:max_val
guess_vals <- min_val:max_val

data <- expand.grid(actual = actual_vals, guess = guess_vals) %>%
  tbl_df()

## Warning: `tbl_df()` was deprecated in dplyr 1.0.0.
## Please use `tibble::as_tibble()` instead.

data

## # A tibble: 1,000,000 x 2
##    actual guess
##     <int> <int>
##  1      1     1
##  2      2     1
##  3      3     1
##  4      4     1
##  5      5     1
##  6      6     1
##  7      7     1
##  8      8     1
##  9      9     1
## 10     10     1
## # … with 999,990 more rows

result <- with(data, Vectorize(guess_money)(actual = actual,
  initial = guess,
  min_val = min_val,
  max_val = max_val))

both <- bind_cols(data, t(result) %>%
  as.data.frame())
both

## # A tibble: 1,000,000 x 4
##    actual guess   win round
##     <int> <int> <dbl> <dbl>
##  1      1     1     1     1
##  2      2     1     0     9
##  3      3     1     0     9
##  4      4     1     1     9
##  5      5     1     0     9
##  6      6     1     0     9
##  7      7     1     0     9
##  8      8     1     1     8
##  9      9     1     0     9
## 10     10     1     0     9
## # … with 999,990 more rows

Now that we have all the potential outcomes of the game, I can calculate the expected winnings for each initial guess and find the best starting point.

exp_val <- both %>%
  group_by(guess) %>%
  summarize(
    win_rate = mean(win),
    exp_val = mean(actual * win)
  ) %>%
  ungroup()
exp_val

## # A tibble: 1,000 x 3
##    guess win_rate exp_val
##    <int>    <dbl>   <dbl>
##  1     1    0.256    128.
##  2     2    0.256    128.
##  3     3    0.257    128.
##  4     4    0.258    128.
##  5     5    0.259    128.
##  6     6    0.26     128.
##  7     7    0.261    129.
##  8     8    0.262    129.
##  9     9    0.263    129.
## 10    10    0.264    129.
## # … with 990 more rows

exp_val_max <- exp_val %>%
  filter(exp_val == max(exp_val))

ggplot(exp_val, aes(guess, exp_val)) +
  geom_line() +
  geom_point(data = exp_val_max) +
  geom_text(
    data = exp_val_max, aes(label = paste0("$", guess)),
    hjust = -.25
  ) +
  scale_x_continuous(labels = scales::dollar) +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    x = "Initial Guess",
    y = "Average Winnings"
  )

So if you get up to nine guesses, your first guess should be $744. Why is it not $500? Shouldn’t that be optimal, since it minimizes the potential range of values for which you’ll need to initially account? Well, not quite.

There are a range of initial guesses that provide you the same overall win rate.

both %>%
  group_by(guess) %>%
  summarize(win_rate = mean(win)) %>%
  ggplot(aes(guess, win_rate)) +
  geom_line() +
  scale_x_continuous(labels = scales::dollar) +
  scale_y_continuous(labels = scales::percent) +
  labs(
    x = "Initial Guess",
    y = "Win Rate"
  )

The win rate for initially guessing $300 is the same as for initially guessing $600 - 51.1%. However the expected value for initially guessing $300 is just $204, compared to initially guessing $600 ($281). Which actual values can you win before you run out of attempts?

both %>%
  filter(guess == 300 | guess == 600) %>%
  mutate(
    win = factor(win, levels = 0:1, labels = c("Lose", "Win")),
    guess = factor(guess, labels = c(
      "Initial Guess: $300",
      "Initial Guess: $600"
    ))
  ) %>%
  ggplot(aes(x = actual, color = win)) +
  facet_wrap(~guess) +
  geom_vline(aes(xintercept = actual, color = win)) +
  scale_color_brewer(type = "qual") +
  labs(
    x = "Actual Amount of Money",
    color = NULL
  ) +
  scale_x_continuous(labels = scales::dollar) +
  theme(legend.position = "bottom") +
  guides(color = guide_legend(override.aes = list(size = 3)))

This is the crux: lower starting guesses allow you to win at the same rate, but the value of each set of winnings is lower.

More (or Fewer) Guesses

But what if we modify the game rules so that you get fewer guesses? Or more guesses? How does the number of attempts change the optimal starting guess?

Here I do the same thing as before, but I vary the number of tries the player gets for each set of simulations.

guess_money_mult <- function(n_tries = 1, min_val = 1, max_val = 1000) {
  actual_vals <- min_val:max_val
  guess_vals <- min_val:max_val

  data <- expand.grid(actual = actual_vals, guess = guess_vals) %>%
    tbl_df()

  result <- with(data, Vectorize(guess_money)(actual = actual,
    initial = guess,
    n_tries = n_tries,
    min_val = min_val,
    max_val = max_val))

  both <- bind_cols(data, t(result) %>%
    as.data.frame()) %>%
    mutate(n_tries = n_tries)

  return(both)
}

tries_all <- lapply(1:11, function(x) guess_money_mult(n_tries = x)) %>%
  bind_rows()

tries_all_exp <- tries_all %>%
  mutate(n_tries = factor(n_tries)) %>%
  group_by(guess, n_tries) %>%
  summarize(
    win_rate = mean(win),
    exp_val = mean(actual * win)
  )

## `summarise()` has grouped output by 'guess'. You can override using the `.groups` argument.

tries_all_exp_max <- tries_all_exp %>%
  group_by(n_tries) %>%
  filter(exp_val == max(exp_val)) %>%
  arrange(-exp_val) %>%
  slice(1)

ggplot(tries_all_exp, aes(guess, exp_val,
  group = n_tries, color = n_tries
)) +
  geom_line() +
  geom_point(data = tries_all_exp_max) +
  geom_label_repel(
    data = tries_all_exp_max,
    aes(label = paste0("$", guess)),
    show.legend = FALSE
  ) +
  scale_x_continuous(labels = scales::dollar) +
  scale_y_continuous(labels = scales::dollar) +
  scale_color_brewer(type = "seq", guide = guide_legend(reverse = TRUE)) +
  labs(
    x = "Initial Guess",
    y = "Expected Value",
    color = "Number of\nGuesses",
    group = "Number of\nGuesses"
  )

## Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Blues is 9
## Returning the palette you asked for with that many colors

## Warning: Removed 2000 row(s) containing missing values (geom_path).

## Warning: Removed 2 rows containing missing values (geom_point).

The fewer guesses you receive, the higher your initial guess must be to maximize your expected winnings. If you had 12 11 or more guesses, it simply does not matter what your initial guess is: you can always win using my proposed strategy.

Update: Only Need 11 Guesses

ggplot(tries_all_exp, aes(guess, win_rate,
  group = n_tries, color = n_tries
)) +
  geom_line() +
  scale_x_continuous(labels = scales::dollar) +
  scale_y_continuous(labels = scales::percent) +
  scale_color_brewer(type = "seq", guide = guide_legend(reverse = TRUE)) +
  labs(
    x = "Initial Guess",
    y = "Win Rate",
    color = "Number of\nGuesses",
    group = "Number of\nGuesses"
  )

## Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Blues is 9
## Returning the palette you asked for with that many colors

## Warning: Removed 2000 row(s) containing missing values (geom_path).

11 is the minimum number of guesses needed to guarantee victory.

Update 2: $744 or $745?

Others have found the optimal starting guess to be $745. This discrepancy is based on how you round each guess. The default R approach to rounding is complicated, but adheres to international standards.

Original rounding method

min_val <- 1
max_val <- 1000
actual_vals <- min_val:max_val
guess_vals <- min_val:max_val

data <- expand.grid(actual = actual_vals, guess = 744:745) %>%
  tbl_df()

result <- with(data, Vectorize(guess_money)(actual = actual,
  initial = guess,
  min_val = min_val,
  max_val = max_val))

bind_cols(data, t(result) %>%
  as.data.frame()) %>%
  group_by(guess) %>%
  summarize(
    win_rate = mean(win),
    exp_val = mean(actual * win)
  ) %>%
  ungroup() %>%
  filter(guess == 744 | guess == 745)

## # A tibble: 2 x 3
##   guess win_rate exp_val
##   <int>    <dbl>   <dbl>
## 1   744    0.511    318.
## 2   745    0.51     317.

Always round down

guess_money_floor <- function(actual, initial, n_tries = 9,
                              min_val = 1, max_val = 1000,
                              print_guess = FALSE) {
  # set iterator
  i <- 1

  # while i is less than the max number of guesses, find the median value
  # within the possible range. if guess is not correct, reset min_val or max_val
  # depending on info trenchcoat man provides
  while (i <= n_tries) {
    if (i == 1) {
      guess <- initial
    } else {
      guess <- floor(mean(c(min_val, max_val)))
    }

    # print the guess if print_guess is TRUE
    if (print_guess) cat(paste0("Guess Number ", i, ": $", guess), sep = "\n")

    # if guess is correct, immediately exit the loop and return true
    # if guess is not correct:
    ## if actual is higher than guess, change min_val to guess
    ## if actual is lower than guess, change max_val to guess
    if (actual == guess) {
      return(c(win = TRUE, round = i))
    } else if (actual > guess) {
      min_val <- guess
    } else if (actual < guess) {
      max_val <- guess
    }

    # iterate to next round if guess was incorrect
    i <- i + 1
  }

  # at this point still have not guessed the money amount, so lose
  # correct i since we didn't really guess the i-th time
  return(c(win = FALSE, round = i - 1))
}

result <- with(data, Vectorize(guess_money_floor)(actual = actual,
  initial = guess,
  min_val = min_val,
  max_val = max_val))

bind_cols(data, t(result) %>%
  as.data.frame()) %>%
  group_by(guess) %>%
  summarize(
    win_rate = mean(win),
    exp_val = mean(actual * win)
  ) %>%
  ungroup() %>%
  filter(guess == 744 | guess == 745)

## # A tibble: 2 x 3
##   guess win_rate exp_val
##   <int>    <dbl>   <dbl>
## 1   744    0.511    318.
## 2   745    0.51     317.

Always round up

guess_money_ceiling <- function(actual, initial, n_tries = 9,
                                min_val = 1, max_val = 1000,
                                print_guess = FALSE) {
  # set iterator
  i <- 1

  # while i is less than the max number of guesses, find the median value
  # within the possible range. if guess is not correct, reset min_val or max_val
  # depending on info trenchcoat man provides
  while (i <= n_tries) {
    if (i == 1) {
      guess <- initial
    } else {
      guess <- ceiling(mean(c(min_val, max_val)))
    }

    # print the guess if print_guess is TRUE
    if (print_guess) cat(paste0("Guess Number ", i, ": $", guess), sep = "\n")

    # if guess is correct, immediately exit the loop and return true
    # if guess is not correct:
    ## if actual is higher than guess, change min_val to guess
    ## if actual is lower than guess, change max_val to guess
    if (actual == guess) {
      return(c(win = TRUE, round = i))
    } else if (actual > guess) {
      min_val <- guess
    } else if (actual < guess) {
      max_val <- guess
    }

    # iterate to next round if guess was incorrect
    i <- i + 1
  }

  # at this point still have not guessed the money amount, so lose
  # correct i since we didn't really guess the i-th time
  return(c(win = FALSE, round = i - 1))
}

result <- with(data, Vectorize(guess_money_ceiling)(actual = actual,
  initial = guess,
  min_val = min_val,
  max_val = max_val))

bind_cols(data, t(result) %>%
  as.data.frame()) %>%
  group_by(guess) %>%
  summarize(
    win_rate = mean(win),
    exp_val = mean(actual * win)
  ) %>%
  ungroup() %>%
  filter(guess == 744 | guess == 745)

## # A tibble: 2 x 3
##   guess win_rate exp_val
##   <int>    <dbl>   <dbl>
## 1   744    0.511    318.
## 2   745    0.511    319.

Session Info

devtools::session_info()

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.0.4 (2021-02-15)
##  os       macOS Big Sur 10.16         
##  system   x86_64, darwin17.0          
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       America/Chicago             
##  date     2021-06-01                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date       lib source        
##  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
##  blogdown      1.3     2021-04-14 [1] CRAN (R 4.0.2)
##  bookdown      0.22    2021-04-22 [1] CRAN (R 4.0.2)
##  bslib         0.2.5   2021-05-12 [1] CRAN (R 4.0.4)
##  cachem        1.0.5   2021-05-15 [1] CRAN (R 4.0.2)
##  callr         3.7.0   2021-04-20 [1] CRAN (R 4.0.2)
##  cli           2.5.0   2021-04-26 [1] CRAN (R 4.0.2)
##  colorspace    2.0-1   2021-05-04 [1] CRAN (R 4.0.2)
##  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.0.2)
##  DBI           1.1.1   2021-01-15 [1] CRAN (R 4.0.2)
##  desc          1.3.0   2021-03-05 [1] CRAN (R 4.0.2)
##  devtools      2.4.1   2021-05-05 [1] CRAN (R 4.0.2)
##  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.2)
##  dplyr       * 1.0.6   2021-05-05 [1] CRAN (R 4.0.2)
##  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.0.2)
##  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.0)
##  fansi         0.4.2   2021-01-15 [1] CRAN (R 4.0.2)
##  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.0.2)
##  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
##  generics      0.1.0   2020-10-31 [1] CRAN (R 4.0.2)
##  ggplot2     * 3.3.3   2020-12-30 [1] CRAN (R 4.0.2)
##  ggrepel     * 0.9.1   2021-01-15 [1] CRAN (R 4.0.2)
##  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
##  gtable        0.3.0   2019-03-25 [1] CRAN (R 4.0.0)
##  here          1.0.1   2020-12-13 [1] CRAN (R 4.0.2)
##  htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.0.2)
##  jsonlite      1.7.2   2020-12-09 [1] CRAN (R 4.0.2)
##  knitr         1.33    2021-04-24 [1] CRAN (R 4.0.2)
##  lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.0.2)
##  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.2)
##  memoise       2.0.0   2021-01-26 [1] CRAN (R 4.0.2)
##  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.0.0)
##  pillar        1.6.1   2021-05-16 [1] CRAN (R 4.0.4)
##  pkgbuild      1.2.0   2020-12-15 [1] CRAN (R 4.0.2)
##  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.0)
##  pkgload       1.2.1   2021-04-06 [1] CRAN (R 4.0.2)
##  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.0)
##  processx      3.5.2   2021-04-30 [1] CRAN (R 4.0.2)
##  ps            1.6.0   2021-02-28 [1] CRAN (R 4.0.2)
##  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.0)
##  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.2)
##  Rcpp          1.0.6   2021-01-15 [1] CRAN (R 4.0.2)
##  remotes       2.3.0   2021-04-01 [1] CRAN (R 4.0.2)
##  rlang         0.4.11  2021-04-30 [1] CRAN (R 4.0.2)
##  rmarkdown     2.8     2021-05-07 [1] CRAN (R 4.0.2)
##  rprojroot     2.0.2   2020-11-15 [1] CRAN (R 4.0.2)
##  sass          0.4.0   2021-05-12 [1] CRAN (R 4.0.2)
##  scales        1.1.1   2020-05-11 [1] CRAN (R 4.0.0)
##  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
##  stringi       1.6.1   2021-05-10 [1] CRAN (R 4.0.2)
##  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
##  testthat      3.0.2   2021-02-14 [1] CRAN (R 4.0.2)
##  tibble        3.1.1   2021-04-18 [1] CRAN (R 4.0.2)
##  tidyselect    1.1.1   2021-04-30 [1] CRAN (R 4.0.2)
##  usethis       2.0.1   2021-02-10 [1] CRAN (R 4.0.2)
##  utf8          1.2.1   2021-03-12 [1] CRAN (R 4.0.2)
##  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.0.2)
##  withr         2.4.2   2021-04-18 [1] CRAN (R 4.0.2)
##  xfun          0.23    2021-05-15 [1] CRAN (R 4.0.2)
##  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
## 
## [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

Posted on:: March 20, 2016

Length:: 15 minute read, 3170 words

Categories:: r

Tags:: r riddler

See Also:: Can You Defeat the Detroit Lines?; More Menorah Math!; Time travel for the lottery