Exploring how Lichess' players spend their playing time, Part 1 of 2

@Anfield said in #12:

> If so I think it would be interesting to create the same matrix with B only including players who play a certain TC the most (measured by no idea what, maybe just number of games) and looking at how many of them also play other TCs. That way each player only shows up in one square and you can maybe see what TC leads to the most variety and which one is the most specialized.

Great question, thanks for asking! I've actually recomputed the matrix with your idea, but using time spent playing instead of number of games:
github.com/kraktus/lichess-time-spent/assets/56031107/e9d0a8da-7b76-41a6-b5e6-8a13f581ca77

To explain using ultrabullet and bullet as an example: the 6.8% figure means that 6.8% of ultrabullet players (players with at least one game) have also played at least one game of bullet, **but** for less time than they have played ultrabullet (hence the primary TC on the y-axis and the secondary TC on the x-axis).
And 4.6% of bullet players (players with at least one game) have also played at least one game of ultrabullet, **but** for less time than they have played bullet, etc.
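
The definition above can be sketched in a few lines of Python. This is only an illustration on made-up data, not the code used for the chart: the `time_spent` table, the player names and the `primary_secondary_share` helper are all hypothetical, and "at least one game" is approximated by "more than zero seconds played".

```python
from itertools import permutations

# Hypothetical toy data: seconds spent per time control, per player.
time_spent = {
    "alice": {"ultrabullet": 600, "bullet": 300},
    "bob": {"ultrabullet": 100, "bullet": 900, "blitz": 50},
    "carol": {"bullet": 400},
    "dave": {"ultrabullet": 200},
}


def primary_secondary_share(time_spent, primary, secondary):
    """Share of `primary` players who also played `secondary`, but for less time."""
    # Players of the primary TC: anyone with some time recorded in it.
    players = [t for t in time_spent.values() if t.get(primary, 0) > 0]
    if not players:
        return 0.0
    # Among them, those who played the secondary TC for a shorter total time.
    both = [t for t in players if 0 < t.get(secondary, 0) < t[primary]]
    return len(both) / len(players)


tcs = ["ultrabullet", "bullet", "blitz"]
# One cell per ordered pair (primary on the y-axis, secondary on the x-axis).
matrix = {
    (p, s): primary_secondary_share(time_spent, p, s)
    for p, s in permutations(tcs, 2)
}

for (p, s), share in matrix.items():
    print(f"{p:>12} -> {s:<12} {share:.1%}")
```

Because a player's time in one TC is either larger or smaller than in another, each pair of TCs counts a given player in at most one of the two mirrored cells.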

@NoseKnowsAll said in #13:
> I hope part 3 compares ratings across time controls. Would be interesting to quantify the average rating in rapid of a 1700 blitz player, for instance. Or the average rating in bullet of a 2000 classical player.
>
> If you want to go a lot deeper, I think it would be super interesting to see how players do in 3+0 vs 3+2 or 15+10 vs 10+0. I could imagine the different pools show different distributions even within one "category." Showing that 2000 classical players do better at 5+3 than 3+2 than 3+0 for instance would be enlightening if true.

Time control sub-pools are definitely something interesting to analyse! I don't have any data to produce charts easily, since I only exported the time control category, but as anecdotal evidence, I see a rating change of more than 100 points when switching from 1+0 to 2+1.

@hoover2 said in #16:
> "Players of a time control are defined as all players that have played at least one game in that specific TC."
>
> Without being an expert in statistics I felt that selection may bring a substantial bias to the data. This may be supported by one of the graphs presented (no. of players vs. no. of games): A substantial amount may just have tried one time control once or twice, but that does not essentially mean they are regularly active in this TC, no?
>
> Therefore I wondered why the data were selected as presented and what the results would look like with a more stringent selection. Would it make sense to use a higher "min number of games"-cutoff to have a more robust baseline?
>
> Any feedback on my thoughts will be appreciated.

Thanks for the compliment! Very accurate remark. How to properly define an "(ultra)bullet/blitz/rapid/classical" player is a crucial question, and unfortunately there was no clear cutoff in either the number of games or the time spent. So I decided, somewhat arbitrarily, to cut at 1 game. I've reproduced figure 3, this time with a minimum cutoff of 10 games (which is almost the median number of games in every TC) for the TC on the y-axis, and the numbers do not change significantly:
user-images.githubusercontent.com/56031107/279203895-eadf70b7-3297-408c-994b-987b2ee627c3.png
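
To make the effect of such a cutoff concrete, here is a small Python sketch on made-up data. It does not reproduce figure 3: the `games_played` table and the helpers `players_of` and `share_also_playing` are hypothetical, and the only thing taken from the description above is that the cutoff applies to the TC on the y-axis.

```python
# Hypothetical toy data: number of games per time control, per player.
games_played = {
    "alice": {"blitz": 250, "rapid": 12},
    "bob": {"blitz": 3, "rapid": 40},
    "carol": {"blitz": 15},
    "dave": {"blitz": 1, "rapid": 2},
}


def players_of(games_played, tc, min_games=1):
    """Players counted as `tc` players under a minimum-games cutoff."""
    return {
        name
        for name, games in games_played.items()
        if games.get(tc, 0) >= min_games
    }


def share_also_playing(games_played, primary, other, min_games=1):
    """Share of `primary` players (cutoff applied) with at least one `other` game."""
    base = players_of(games_played, primary, min_games)
    if not base:
        return 0.0
    also = base & players_of(games_played, other, min_games=1)
    return len(also) / len(base)


for cutoff in (1, 10):
    share = share_also_playing(games_played, "blitz", "rapid", min_games=cutoff)
    print(f"cutoff={cutoff:>2}: {share:.0%} of blitz players also played rapid")
```

On this toy table the share moves a lot between the two cutoffs because there are only four players; it says nothing about how the real figures behave.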

@Bobbylon said in #17:
> I did not see two simple figures which are of interest to me, and the reason why I read this post:
>
> 1. How much time does the average player spend playing for a time period (month/year).
>
> 2. How many games does the average player complete for a time period (month/year).

Here are the results (with some extra measurements). Note that the average is not very representative given the super active players, and that the median would be more appropriate. For example, in blitz over one month, the median is 15 games, or 1h17, whereas the average is 138 games and 6h10.
github.com/kraktus/lichess-time-spent/assets/56031107/c5122d43-14d3-4de3-8a55-a8b8a9716e5d
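
A tiny Python illustration of why the two measures diverge so much. The numbers below are invented purely to show the shape of the effect (a few very active players pulling the average up) and are not taken from the dataset.

```python
from statistics import mean, median

# Hypothetical monthly game counts: mostly casual players, one very active one.
games_per_player = [1, 3, 8, 15, 15, 22, 60, 900]

avg = mean(games_per_player)    # pulled up by the single heavy player
mid = median(games_per_player)  # unaffected by how extreme the top value is

print(f"average: {avg:.1f} games, median: {mid:.1f} games")
```

Replacing 900 by 9000 would multiply the average several times over while leaving the median untouched, which is why the median describes the typical player better here.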

@Solal35 said in #22:
> Great question, thanks for asking! I've actually recomputed the matrix with your idea, but using time spent playing instead of number of games:
> github.com/kraktus/lichess-time-spent/assets/56031107/e9d0a8da-7b76-41a6-b5e6-8a13f581ca77
>
> To explain using ultrabullet and bullet as an example: the 6.8% figure means that 6.8% of ultrabullet players (players with at least one game) have also played at least one game of bullet, **but** for less time than they have played ultrabullet (hence the primary TC on the y-axis and the secondary TC on the x-axis).
> And 4.6% of bullet players (players with at least one game) have also played at least one game of ultrabullet, **but** for less time than they have played bullet, etc.

Thank you very much for this! Interesting to see that players playing faster TCs generally seem to not care too much for the slower ones, myself included. That also might be further proof that patience correlates with openness, as for example analyzed here: files.eric.ed.gov/fulltext/EJ1170094.pdf

Great work, @Solal35!

@fisherslo said in #18:
> Can you let us know what the ratio is, among all games, between time + 0 seconds vs. time + X seconds? Meaning, how many games are played with increment?
>
> 3+0 vs 3+2
> 5+0 vs 5+3
> 10+0 vs 10+5
>
> Thank you.

That would be a nice analysis. Unfortunately, it's not easy to get from my data, as I only kept the time control category.

@CrucibleCupid said in #19:
> May I know How you got this data, is there any API to access lichess database?

Hey, I've explained it at the very top of the post, with a link to the Lichess database.

@OneAndAHalph said in #21:
> Awesome job, congrats!
>
> What steps did you follow, starting from the download of the raw data until obtaining the clean dataframe for which you showed a few rows? I looked into the GitHub repo and failed to find it–apologies!
>
> I think that writing about these steps could inspire others to squeeze a little bit more these data :D

Thanks! I've described the steps in much more detail in the README; I hope it will be enough for you: github.com/kraktus/lichess-time-spent. You still need some basic knowledge of the command line to get it working!

Thanks. Doesn't Lichess record, for every game, whether the time control is 300 + 0 or 300 + 3, meaning that every game with a positive number after the + is a game with time increment?

From there, you would only need to count all games with 300 + 0 vs. all games with 300 + 3 to get the ratio?

@fisherslo said in #25:
> 0 + 0 or 300 + 3, meaning that every game with a positive number after the + is a game with time increment?

Yes, but considering a month of data is 300Gb, it's not that easy to count, and needs specialised tools. I could extract those numbers but it would take me some time I do not currently have!
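
For anyone who wants to try, the counting step itself can be sketched as below. This is an illustration rather than code from the repository: it assumes the PGN export carries a header of the form `[TimeControl "300+3"]` for each game, the `count_time_controls` helper and the `sample` games are hypothetical, and decompressing and streaming a full month of data is left out entirely.

```python
import re
from collections import Counter

# Matches headers such as [TimeControl "300+3"]; games without a clock
# (written as "-") are deliberately not matched.
TIME_CONTROL = re.compile(r'^\[TimeControl "(\d+)\+(\d+)"\]')


def count_time_controls(lines):
    """Count games per (base seconds, increment seconds), one line at a time."""
    counts = Counter()
    for line in lines:
        match = TIME_CONTROL.match(line)
        if match is None:
            continue  # other headers, move text, blank lines
        base, increment = int(match.group(1)), int(match.group(2))
        counts[(base, increment)] += 1
    return counts


# Hypothetical three-game sample standing in for the real export.
sample = '''[Event "Rated Blitz game"]
[TimeControl "300+0"]

1. e4 e5 1-0

[Event "Rated Blitz game"]
[TimeControl "300+3"]

1. d4 d5 0-1

[Event "Rated Blitz game"]
[TimeControl "300+0"]

1. c4 c5 1/2-1/2
'''.splitlines()

counts = count_time_controls(sample)
with_increment = sum(n for (_, inc), n in counts.items() if inc > 0)
without_increment = sum(n for (_, inc), n in counts.items() if inc == 0)

print(counts[(300, 0)], "games at 300+0 vs", counts[(300, 3)], "at 300+3")
print(with_increment, "with increment,", without_increment, "without")
```

Since the function only iterates over lines, it never needs the whole file in memory; on real data one would pass it a stream of decompressed lines instead of `sample`, and the run time would be dominated by reading the data.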

Ohh, I thought you had already extracted the data for all the reports you did in your analysis.