A superstitious person may believe they are more likely to win the lottery by playing their lucky numbers. They believe there is a greater chance that their chosen numbers will be drawn.
Of course, unless the lottery is rigged, that is not the case. Most will realize that every number and every combination of numbers has the same likelihood. However – and this is the exciting part – not every combination has the same expected value.
Our idea is this: Because some people are superstitious, and humans are bad at picking numbers randomly, there will be numbers that gamblers play much more frequently than other numbers. If we can play the numbers that other people are least likely to play, we expect to win more money because if we do actually win, the pot will be split among fewer people.
So, it is true that some numbers will make more money in the long run. By making more money, we mean you’d lose money at a slower rate than the average lottery player; you’d still lose money.
Full source code can be found here: https://github.com/retowyss/swisslos-r-griddb/
$ git clone https://github.com/retowyss/swisslos-r-griddb.git
Data Overview and Analysis Tools
We analyze approximately seven years of lottery drawings (n = 720, start = 2013-01-12, end = 2019-12-04) from “Swisslos,” the national lottery of Switzerland, using the statistical programming language R, and we use GridDB for our data storage.
# Required packages RJDBC and tidyverse
# griddb is the connection object to our GridDB
drv <- JDBC(
driverClass = "com.toshiba.mwcloud.gs.sql.Driver",
# Point this to your gridstore jar
classPath = "/jdbc/bin/gridstore-jdbc.jar"
)
# IP and port depend on your setup
griddb <- dbConnect(
drv,
"jdbc:gs://172.20.0.42:20001/dockerGridDB/public",
"admin",
"admin"
)
# vectorized insert function
dbInsertTable <- function(conn, name, df, append = TRUE) {
for (i in seq_len(nrow(df))) {
dbWriteTable(conn, name, df[i, ], append = append)
}
}
dbSendUpdate(griddb, paste(
"CREATE TABLE IF NOT EXISTS swisslos_jackpots",
"(date STRING, jackpot INTEGER);"
))
dbInsertTable(griddb, "swisslos_jackpots", read_csv("data/swisslos_jackpots.csv"))
dbSendUpdate(griddb, paste(
"CREATE TABLE IF NOT EXISTS swisslos_payouts",
"(combination STRING, winners INTEGER, prize FLOAT, date STRING);"
))
dbInsertTable(griddb, "swisslos_payouts", read_csv("data/swisslos_payouts.csv"))
dbSendUpdate(griddb, paste(
"CREATE TABLE IF NOT EXISTS swisslos_numbers",
"(type STRING, number INTEGER, date STRING);"
))
dbInsertTable(griddb, "swisslos_numbers", read_csv("data/swisslos_numbers.csv"))
dbListTables(griddb)
## [1] "swisslos_jackpots" "swisslos_numbers" "swisslos_payouts"
- Jackpot sizes (swisslos_jackpots)
- date
- jackpot: maximum payout (CHF) for 6 + 1
- Drawn numbers (swisslos_numbers)
- type (normal, lucky, replay)
- number
- date
- Payout per category correct (swisslos_payouts)
- combination: normal + lucky (for example, 3 + 1 => three regular correct and lucky number correct)
- winners: number of winning tickets
- prize: payout (CHF) per winner
- date
# stringr::str_interp is a handy function to parameterize SQL queries from R
# just be careful; SQL injections happen.
show_date <- function(conn, table, date = "2013-02-13") {
dbGetQuery(conn, str_interp("SELECT * FROM ${table} WHERE date = '${date}';"))
}
#only show swisslos_ tables
map(keep(dbListTables(griddb), ~ str_detect(., "swisslos_")), ~ show_date(griddb, .))
## [[1]]
## date jackpot
## 1 2013-02-13 8600000
##
## [[2]]
## type number date
## 1 normal 13 2013-02-13
## 2 normal 21 2013-02-13
## 3 normal 25 2013-02-13
## 4 normal 26 2013-02-13
## 5 normal 32 2013-02-13
## 6 normal 40 2013-02-13
## 7 lucky 1 2013-02-13
## 8 replay 13 2013-02-13
##
## [[3]]
## combination winners prize date
## 1 6 + 1 0 0.00 2013-02-13
## 2 6 0 0.00 2013-02-13
## 3 5 + 1 6 7570.15 2013-02-13
## 4 5 36 1000.00 2013-02-13
## 5 4 + 1 283 208.90 2013-02-13
## 6 4 1690 87.35 2013-02-13
## 7 3 + 1 4681 31.85 2013-02-13
## 8 3 28264 10.55 2013-02-13
Swisslos rules and probabilities
To play Swisslos, you choose six numbers between 1 and 42 and a single
Lucky Number between 1 and 6.
regular_numbers <- 42
regular_draws <- 6
lucky_numbers <- 6
lucky_draws <- 1
We can compute the number of possible combinations as follows. Of course, the
Lucky Number increases the number of combinations by a factor of six.
# Binomial coefficient function
# Bin(a, b)
bin <- function(a, b) {
map2_dbl(a, b, function(.a, .b) {
if (.b == 0 | .a == .b) {
1
} else {
.c <- .a - .b + 1
prod(.c:.a) / prod(1:.b)
}
})
}
# 42 choose 6
swisslos_regular_combos <- bin(regular_numbers, regular_draws)
There are 5245786 ways to choose six from 42, factoring in the
Lucky Number there are 31474716 combinations. Similarly, we can calculate the combinations of three, four, and five with or without the lucky number. So, we can calculate the probabilities.
# probability to get n correct
swisslos_prob <- function(n) {
n_match <- bin(regular_draws, n)
n_miss <- bin(regular_numbers - regular_draws, regular_draws - n)
n_miss * n_match / swisslos_regular_combos
}
# We can check correctness with sum(swisslos_prob(0:6)) == 1, which yield true
tibble(n_correct = 0:6) %>%
mutate(
prob_base = swisslos_prob(n_correct),
prob_lucky = prob_base / 6,
prob_not_lucky = prob_base - prob_lucky
) %>%
knitr::kable(digits = 8)
n_correct | prob_base | prob_lucky | prob_not_lucky |
---|---|---|---|
0 | 0.37130603 | 0.06188434 | 0.30942170 |
1 | 0.43119411 | 0.07186568 | 0.35932842 |
2 | 0.16843520 | 0.02807253 | 0.14036266 |
3 | 0.02722185 | 0.00453698 | 0.02268488 |
4 | 0.00180145 | 0.00030024 | 0.00150120 |
5 | 0.00004118 | 0.00000686 | 0.00003431 |
6 | 0.00000019 | 0.00000003 | 0.00000016 |
Note: It’s more likely to get one right (~43%) than none (~37%). It’s counter-intuitve and parallel to the
Analysis
How many people play Swisslos every week?
Given that chance to get three out of six is 2.72%, we can estimate the number of people that play Swisslos (actually, we estimate the number of played tickets).
# To retrieve the data from GridDB we type a SQL query
# Computing the result with SQL makes it unnecessary to pull the entire table
# into R
# paste makes it easy to type long SQL and break it up into multiple lines
# we make n a parameter because you could use 4 instead of 3 but 3 gets greater
# counts so we will stick with that
get_n_correct <- function(conn, n) {
q <- paste(
"SELECT SUM(winners) AS winners, date",
"FROM swisslos_payouts",
"WHERE combination = '${n}' OR combination = '${n} + 1'",
"GROUP BY date",
"ORDER BY date;"
)
dbGetQuery(conn, str_interp(q))
}
three_correct <- get_n_correct(griddb, 3) %>%
mutate(tickets_played = winners / swisslos_prob(3)) %>%
as_tibble()
three_correct %>% head(5)
## # A tibble: 5 × 3
## winners date tickets_played
## <dbl> <chr> <dbl>
## 1 60705 2013-01-12 2230010.
## 2 33745 2013-01-16 1239629.
## 3 43457 2013-01-19 1596401.
## 4 35013 2013-01-23 1286209.
## 5 48120 2013-01-26 1767698.
We can now plot the number of tickets that have been played overtime,
three_correct %>%
ggplot(aes(x = lubridate::as_date(date), y = tickets_played)) +
geom_col()
but maybe more interestingly, we can plot the number of tickets against the jackpot.
# You can test your SQL here, but because we need to get the entire
# swisslos_jackpots table, we might just as well join it in R
three_correct_jp <- three_correct %>%
left_join(dbGetQuery(griddb, "SELECT * FROM swisslos_jackpots;"), by = "date")
three_correct_jp %>%
ggplot(aes(x = tickets_played, y = jackpot)) +
geom_point()
We can clearly see a positive correlation between jackpot size and the number of tickets played.
cor.test(three_correct_jp$tickets_played, three_correct_jp$jackpot)
##
## Pearson's product-moment correlation
##
## data: three_correct_jp$tickets_played and three_correct_jp$jackpot
## t = 33.147, df = 718, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7470597 0.8050008
## sample estimates:
## cor
## 0.7776764