I recently got a comment on a post I published one year ago, here, about the fact that in September 2009, on the 6th and the 10th, the same six numbers came out in the Bulgarian lottery (but I do not understand the question: the author of the comment asks about the order in which the numbers came out…)

Xi’an also published a post on that topic, there, since the same thing happened in Israel last week.

All that reminded me of a discussion I had with a colleague about another post (here) where I mentioned that I had found a strange distribution of numbers in the French lottery (the old one actually). For those who want to check, all historical draws are here, in a zip file. My colleague was wondering whether I had found the martingale to win the lottery…

First, I do not like that term, since *martingale* means something different from a mathematical point of view… Second, let us see whether it would have been possible to make some money… (a free lunch?)

    > loto=read.table("D:\\loto.csv",dec=",",header=TRUE,sep=";")
    > ntirage=nrow(loto)
    > # keep rows 51 onward; the first 50 rows (the last 50 draws) are set aside for later
    > loto=loto[51:ntirage,]
    > ntirage=nrow(loto)
    > N=as.matrix(loto[,c("boule_1","boule_2","boule_3","boule_4","boule_5","boule_6")])
    > n=as.vector(N)
    > length(n)
    [1] 28848
    > # how many times each number came out
    > (TN=table(n))
    n
      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
    607 576 571 618 579 598 608 582 588 590 562 577 577 580 591 630 558 567 594 608
     21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
    578 562 579 583 574 589 602 572 550 598 604 582 545 646 597 618 599 636 609 588
     41  42  43  44  45  46  47  48  49
    576 589 577 585 618 596 560 571 604

So, it might look nice, but we have to compare that distribution with the one we should have with “*independent*” draws. It is not possible to simply look at a discrete uniform distribution: the six numbers of a draw are not independent. Each day, the 49 balls are back in the urn, but within a day, we do not have independent draws (it is a sample of balls without replacement). Hence, with 4808 lottery draws, a given number cannot come out more than 4808 times. So, let us use Monte Carlo techniques to look at the *theoretical* distribution:

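    > # 1000 simulated histories of ntirage draws; each column holds the sorted counts
    > M=matrix(NA,49,1000)
    > for(s in 1:1000){
    + B=NA
    + for(i in 1:ntirage){B=c(B,sample(1:49,size=6,replace=FALSE))}
    + B=B[-1]
    + M[,s]=sort(table(B))
    + }
    > # (assumed) open an empty plot first; the original listing does not show this call
    > plot(1:49,as.vector(sort(TN)),type="n",ylim=range(M,TN))
    > q50=function(x){quantile(x,.5)}
    > Q50=apply(M,1,q50)
    > lines(1:49,Q50,col="red",lwd=2)
    > q10=function(x){quantile(x,.1)}
    > Q10=apply(M,1,q10)
    > q90=function(x){quantile(x,.9)}
    > Q90=apply(M,1,q90)
    > # 10%-90% band of the simulated sorted counts, then the observed sorted counts
    > polygon(c(1:49,49:1),c(Q10,rev(Q90)),col="light blue",border=NA)
    > lines(1:49,Q10,col="red",lty=2)
    > lines(1:49,Q90,col="red",lty=2)
    > lines(1:49,Q50,col="red",lwd=2)
    > points(1:49,sort(TN),pch=19,type="b")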
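As a rough cross-check on the simulated band, here is a minimal sketch that assumes only what is stated above: 6 balls drawn without replacement out of 49, and 4808 draws. A given number is in one draw with probability 6/49, independently from one draw to the next, so its count is binomial. The names `n_draws`, `p`, `expected` and `sdev` are mine, not from the original code.

```r
# Count of one given number over many draws: Binomial(n_draws, 6/49)
n_draws  <- 4808                          # draws kept above (28848 / 6)
p        <- 6 / 49                        # chance a given number is in one draw
expected <- n_draws * p                   # mean count for one number
sdev     <- sqrt(n_draws * p * (1 - p))   # standard deviation of that count
round(c(expected = expected, sd = sdev), 1)

# Central 80% range for a single number chosen in advance
qbinom(c(0.1, 0.9), size = n_draws, prob = p)
```

This range is for one number fixed beforehand. The graph shows *sorted* counts, and the 49 counts are tied together (they sum to 28848), which is why the band on the graph comes from simulation.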

Looking at the graph, it *looks* like some numbers appeared *too* frequently, especially the least frequent ones (bottom left), which still came out more often than the simulation suggests. So, since I have removed the last 50 draws, let us see if we could have used that information, somehow…

    > nb=names(sort(TN))
    > loto=read.table("D:\\loto.csv",dec=",",header=TRUE,sep=";")
    > loto=loto[1:50,]
    > N=as.matrix(loto[,c("boule_1","boule_2","boule_3","boule_4","boule_5","boule_6")])
    > n=as.vector(N)
    > TN=table(n)
    > TN[nb]
    > barplot(TN[nb])

Unfortunately, numbers that came out *too* frequently over 4800 draws did not appear that frequently over the last 50. Playing the top numbers might not have been a great strategy.

(numbers that came out frequently are on the right, while those we did not see much are on the left)… What about the worst numbers: if I had decided to play the 6 that did not come out very frequently (we’ve seen earlier that they should have appeared even less, actually), would it have been worthwhile? As we can see, our top 2 numbers over the last 50 draws were numbers that did not appear frequently earlier (29 and 47 appear respectively 10 and 11 times over 50 draws)….

Over 50 draws of 6 balls, the expected total number of appearances of 6 given numbers is around 36.7:

    > S=rep(NA,10000)
    > for(s in 1:10000){
    + B=NA
    + for(i in 1:50){B=c(B,sample(1:49,size=6,replace=FALSE))}
    + B=B[-1]
    + S[s]=sum(B%in%(1:6))
    + }
    > mean(S)
    [1] 36.7694
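The simulated mean can also be checked by hand. Assuming only the setup above (50 draws, each of 6 balls without replacement out of 49), each of the 6 chosen numbers is in a given draw with probability 6/49, so by linearity of expectation the total count, written $S$ here as in the code, satisfies

```latex
\[
\mathbb{E}[S] \;=\; 50 \times 6 \times \frac{6}{49} \;=\; \frac{1800}{49} \;\approx\; 36.73 .
\]
```

This agrees with the Monte Carlo estimate up to simulation noise.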

But here for the top 6, we have

    > z=TN[nb]
    > sum(rev(z)[1:6])
    [1] 29

i.e. the top 6 appeared 29 times over 50 draws of 6 balls (which looks low) and for the worst 6, it is a bit higher,

    > sum(z[1:6])
    [1] 38

If we look at the *theoretical* density of the number of appearances of 6 given numbers, we have

i.e. our worst 6 is a nice average (in green) while the top 6 did not appear frequently this time (here in blue)! So we could not have used that information….
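The density mentioned above can be redrawn with a short simulation. This is a sketch, not the original plotting code: it swaps the explicit `sample()` loop for `rhyper()` (the number of hits among 6 given numbers in one draw is hypergeometric), and the names `hits_in_50` and `sim` are mine. The vertical lines use the two totals computed earlier, 38 for the worst 6 (green) and 29 for the top 6 (blue).

```r
set.seed(2010)  # arbitrary seed, only for reproducibility

# Total appearances of 6 given numbers over 50 draws of 6 balls out of 49.
# One draw: hypergeometric with 6 "good" balls, 43 others, 6 balls drawn.
hits_in_50 <- function() sum(rhyper(50, m = 6, n = 43, k = 6))
sim <- replicate(10000, hits_in_50())

# Simulated density with the two observed totals marked
plot(density(sim), main = "Appearances of 6 given numbers over 50 draws")
abline(v = 38, col = "green", lwd = 2)  # worst 6 over the last 50 draws
abline(v = 29, col = "blue", lwd = 2)   # top 6 over the last 50 draws

mean(sim)        # should be close to 1800/49
mean(sim <= 29)  # simulated share of totals as low as the top 6
```

Since both groups of 6 were picked from the earlier draws only, comparing their totals over the last 50 draws with this density is an out-of-sample check.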

Anyway, if some of you are interested in using statistics to get a free lunch: with the *nouveau loto*, I did not see any strange pattern (data can be downloaded here).

I am terribly sorry, but I cannot help anyone win at the French lottery….