If the two were to play a match, say over 10 games, then the better of the two would be expected to score, say, 60% of the points, i.e. 6 out of 10. Should the result be otherwise, one would conclude that one of them has improved, the other has deteriorated, or both, and would adjust the playing strengths accordingly.
To give an example: player 1 scores 7 instead of 6. Then he was apparently better than initially thought, and his playing strength would be adjusted upwards. Conversely, player 2 may have been overrated, and his playing strength would be adjusted downwards. The players then play another match. Because of the corrected playing strengths, player 1 would now be expected to score 6.5 out of 10. If he actually achieves this, there is no need for a correction. But if he now scores only 5.5, he would have to be corrected downwards again and player 2 upwards.
Because you already have an estimate for many players, you assume that the numbers will eventually settle so that they reflect the playing strengths more or less exactly, and that you can then make a reasonably good forecast of the winning chances even when two players meet who have never played each other.
However, if this forecast is not correct, no harm is done. The playing strengths are simply corrected again. The problem here is that incorrect forecasts cannot be checked in the form of bets. An incorrect forecast based on the Elo system only leads to a correction of the playing strengths; no one wins or loses money with it.
By the way, I myself offered bets on chess games very early on, in 1981, together with my friend Christian Maier at a tournament in San Bernardino. It was a very informal tournament. We offered odds on all the games of the tournament in 1-X-2 form, and every participant bet, one to two D-marks at a time. I can no longer judge how good our odds were. But we made a good profit at the end of the tournament. Of course, that could also have been luck.
1) Suitability as a prediction system
So, if the Elo system is to be used as a prediction system, some things would have to be improved. First of all, the statement “Player 1 scores 60% of the points against player 2” is not very suitable as a prediction. Over 10 games, 60% can be 6 wins and 4 losses, or 2 wins and 8 draws, or anything in between. How the score is made up is a question that does not exist in the Elo system.
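To make the ambiguity concrete, here is a small sketch that lists every way of reaching 6 points out of 10 games. The function name `score_splits` is made up for this illustration and is not part of any rating system.

```python
def score_splits(games, points):
    """List every (wins, draws, losses) split that yields `points` from `games`."""
    splits = []
    for wins in range(games + 1):
        for draws in range(games - wins + 1):
            losses = games - wins - draws
            # A win counts 1 point, a draw half a point, a loss nothing.
            # Compare doubled values so that half points stay exact.
            if 2 * wins + draws == 2 * points:
                splits.append((wins, draws, losses))
    return splits


for wins, draws, losses in score_splits(10, 6):
    print(f"{wins} wins, {draws} draws, {losses} losses")
```

All of these splits look the same to a system that only forecasts the percentage of points, although they are very different outcomes for anyone betting on 1-X-2.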
Obviously, the probability of a draw depends on both strength and character. Weaker players are generally less likely to draw. The reason is quite obvious: their mistakes constantly produce large swings in favour of one side or the other. Why, then, should the last swing be cancelled out so that the two end up with a draw? (For proof, please take another look at the chapter “Game Developments”; there you can see this confirmed in the diagram.)
In addition, however, there is the criterion of a player’s character. So there are also weaker players who are rather cautious and anxious, which inevitably increases their probability of a draw (as proof it is enough to consider that they are more inclined both to offer a draw and to accept one that is offered). In the same way, there are also players among grandmasters who are not very peaceful and always play to win, which inevitably involves risks that can also result in a higher probability of losing.
The probability of a draw therefore depends partly on the individual and partly on the general level of playing strength. This would first have to be introduced as one or more additional parameters. Then it is quite obvious that the player who moves first, i.e. the player with the white pieces, has an advantage. This is also called the first-move advantage. Databases show that about 70% of all decisive games are won by White.
iv) The Pauli system
The system I propose for calculating the probability distribution is, of course, the so-called “Pauli system”. I have already derived the formula in the chapter on tennis. Each player is given a playing strength in the form of a number between 0 and 1 (i.e. between 0% and 100%). The two playing strength numbers are combined in the same way as in tennis. In the end, this results in the percentage distribution for this game, exactly comparable to Elo, but more straightforward, easier to calculate and also reliable and exact.
So, if we let two players compete against each other, one with a playing strength of 0.82 (=82%), the other with a playing strength of 0.64 (=64%), then we first divide each strength by its complementary probability. 0.82/0.18 = 4.56. Player 1 is therefore 4.56 times as strong as the average player. Player 2 has a value of 0.64/0.36 = 1.78. So he is only 1.78 times as strong as the average player.
The ratio of player 1 to player 2 is, as the name says, a ratio. Ratios are, mathematically speaking, quotients (i.e. fractions; but don’t break now, and especially not because of this, with your partner, please!). So we divide 4.56/1.78 and get the winning ratio of player 1 to player 2 as 2.56. So player 1 wins against player 2 2.56 times as often as player 2 wins against player 1. We have to convert that back into a percentage, so we divide 2.56/3.56 and get the expected point yield for player 1 in this game as 71.93% (calculated with the unrounded intermediate values). I say “point yield” here only because in chess exactly 1 point is awarded per game. So 1 in total, just as the probabilities for an event must add up to 1. And the point yield is made up of a certain proportion from draws and another from victories. How large these are in each case is an as yet unanswered question.
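The arithmetic of this example can be retraced in a few lines. This is only a sketch of the steps described above; the function names `odds` and `expected_score` are chosen for the illustration and do not come from the text.

```python
def odds(strength):
    """Divide a strength by its complementary probability, e.g. 0.82 / 0.18."""
    return strength / (1.0 - strength)


def expected_score(strength_a, strength_b):
    """Expected point yield of player A in one game against player B."""
    ratio = odds(strength_a) / odds(strength_b)  # winning ratio A : B
    return ratio / (1.0 + ratio)                 # back to a share of 1 point


print(round(odds(0.82), 2))                  # 4.56
print(round(odds(0.64), 2))                  # 1.78
print(round(expected_score(0.82, 0.64), 4))  # 0.7193
```

The two expected scores of a pairing always add up to 1, which mirrors the fact that exactly one point is shared out per game.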
2) The reaction time
In the Elo system, the playing strengths are simply adjusted according to a predefined pattern. That a good result brings an improvement in the number and a bad one a deterioration seems obvious. But the question arises here, as in football: how strongly (or how quickly) do you have to react to results in order to get the best possible forecast for the next game, the next match? This question actually has an objective answer. Only where the question is not asked at all, or is not considered interesting, is it simply ignored.
The participants, in this case chess players, simply resign themselves to the given reaction time. “How many points did I win in this tournament, how many did I lose in that one?” That is the only relevant question. And it is answered by the system according to its specifications. The question “How many points would I have had to win or lose, objectively, in order to get the best possible assessment of my chances in each individual game of the next tournament?” is not asked, or is treated as subordinate or irrelevant.
However, I maintain that the quality and suitability of a system for measuring playing strength is greatest when the forecasts it produces are as good as possible. The Elo system works and it is used; everyone has either come to terms with its shortcomings, does not know about them, or considers them irrelevant.
With the Pauli system I have solutions for all these problems, of course (somehow I always vacillate between megalomania and insanity, modesty, which can also be wrong, and absolute cluelessness, if only I knew the noun of “submissive”, I don’t know the passive subjunctive perfect of “know”, if there is such a thing at all. But then it would have to be invented, so why not me). So good old Pauli has once again thought (a few) thoughts (too many).
Well, actually I only transferred old ideas. But still. So you have an expectation for a game. This is, as in the example above, 71.93% of the possible points for player 1. Then you have a result in the game. Say it is a draw. Then you have a deviation from the forecast result, here 0.7193 - 0.5 = 0.2193 points. Player 1, the favourite, has scored too few points; player 2 has exceeded his expectation by the same number of points. So you should now correct the playing strengths of both players in the right direction (and what can be wrong with a DIRECTION? Oh yes, the direction!). The best way to do this is to divide the deviation by a factor, which you thus place in the denominator, making it a quotient. I’m just occasionally puzzling over whether I should always use a different font for the lousy puns in the text. Your view? In football, for example, I calculate with a playing strength update factor of 30. That is the “best” value determined over years.
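One possible reading of this correction step is sketched below. The text only says that the deviation is divided by the update factor; adding the resulting quotient directly to the 0-to-1 playing strength is an assumption made for this illustration, as is the use of the football value 30 for chess. The function names are hypothetical.

```python
def expected_score(strength_a, strength_b):
    """Expected point yield of player A, as in the worked example above."""
    ratio = (strength_a / (1.0 - strength_a)) / (strength_b / (1.0 - strength_b))
    return ratio / (1.0 + ratio)


def update_strengths(strength_a, strength_b, result_a, update_factor=30.0):
    """Correct both strengths after one game.

    result_a is 1.0 for a win of player A, 0.5 for a draw, 0.0 for a loss.
    ASSUMPTION: the correction is added directly to the strength; the text
    only states that the deviation is divided by the update factor.
    """
    deviation = result_a - expected_score(strength_a, strength_b)
    step = deviation / update_factor
    # Whatever player A gains, player B loses, and vice versa.
    return strength_a + step, strength_b - step, deviation


new_a, new_b, deviation = update_strengths(0.82, 0.64, 0.5)  # the game is drawn
print(round(deviation, 4))  # -0.2193: the favourite scored too little
print(new_a < 0.82, new_b > 0.64)
```

The sketch does not guard against a strength leaving the range between 0 and 1; a real implementation would need some rule for that.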
With chess, of course, you could similarly create an optimisation function that calculates the best possible “update factor” for chess. In analogy to football, one tries out all plausible update factors on a set of known results, which must be in chronological order, and takes the one that has produced the smallest deviation from the forecast result over all results. The order matters because each update factor changes the playing strengths differently after every game, and with them the forecasts for the following games. So you will get a different total deviation per update factor.
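Such a search could look roughly like the following sketch. Several details are assumptions that the text leaves open: the deviation is summed as a squared error, every player starts at a strength of 0.5, strengths are clamped to stay strictly between 0 and 1, and the correction is added directly to the strength. The results list is invented purely for illustration.

```python
def expected_score(strength_a, strength_b):
    """Expected point yield of player A against player B."""
    ratio = (strength_a / (1.0 - strength_a)) / (strength_b / (1.0 - strength_b))
    return ratio / (1.0 + ratio)


def clamp(strength, low=0.01, high=0.99):
    """Keep a strength strictly between 0 and 1 (assumed safeguard)."""
    return max(low, min(high, strength))


def total_deviation(results, update_factor, start_strength=0.5):
    """Replay chronologically ordered results and sum the squared deviations."""
    strengths = {}
    total = 0.0
    for player_a, player_b, result_a in results:
        a = strengths.get(player_a, start_strength)
        b = strengths.get(player_b, start_strength)
        deviation = result_a - expected_score(a, b)
        total += deviation ** 2
        # The changed strengths feed into the forecast for the next game.
        step = deviation / update_factor
        strengths[player_a] = clamp(a + step)
        strengths[player_b] = clamp(b - step)
    return total


def best_update_factor(results, candidates):
    """Pick the candidate factor with the smallest total deviation."""
    return min(candidates, key=lambda factor: total_deviation(results, factor))


# Invented results: (player A, player B, score of A), oldest game first.
results = [("X", "Y", 1.0), ("Y", "Z", 0.5), ("X", "Z", 1.0),
           ("Y", "X", 0.0), ("Z", "Y", 0.5), ("Z", "X", 0.5)]
print(best_update_factor(results, [5, 10, 20, 30, 50, 100]))
```

With a handful of games the outcome says nothing; the idea only becomes meaningful on a large, chronologically ordered database of real results.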
However, even this would not be quite sufficient. Obviously, there are players who should react more quickly and players who should react more slowly. However, this is not a purely individual problem, but is determined by the number of games played. A 40-year-old, of whom I already have 1000 games in the database, is less set back by two defeats in a row than a 17-year-old with 5 games so far. So far, that makes sense.
So age and the number of games should (and must) be taken into account. This would already be a small challenge for the optimisation programme, as it would require a fair amount of artificial intelligence to adjust several parameters in order to optimise both at the same time. But it would still have to be done and one would definitely not do worse than with the Elo system used so far.
However, another question remains unanswered: should the parameter “playing strength update factor” also be individually designed or allowed? This raises two problems: First, it seems relatively obvious that there are different characters of players. There is the so-called “solid” player who shies away from risk anyway and also plays very consistently. That’s just the way it is. And there is the risk-taker, who is also often enough exposed to large fluctuations in playing strength. On the one hand, this can be due to the risks he takes, which then occasionally “backfire”, but also due to the basic character itself, which carries him along in a winning streak and makes him keep winning, but unfortunately also in a losing streak.
The second problem, and the bigger one, would be acceptance. Imagine that two players of the same level beat the same opponent one after the other in a tournament, and one of them gains more than the other. “Yes, that’s because you play too consistently. You need to build bigger fluctuations into your results.” A somewhat weak rationale. However, to reassure: this would happen anyway once age and number of games are taken into account. Any change in playing strength after a result would be individual.
Of course, I also have suggestions to make for the calculation of the draw probability. Just this much in advance: this kind of prophecy would be pure gimmickry. It is not decisive for the feasibility of the system. But I have to mention it here in order to do justice to the claim of the “suitability of Pauli as a prediction system”.
Draw frequencies obviously depend on both the character and the level of a person’s playing strength. Generally speaking, the weaker the players, the fewer draws occur; the stronger the players, the more. But here, too, there are individual differences. If these are to be captured, this parameter would have to be kept individually for each player as well.
Of course, all these parameters would have to be maintained and serviced. So a player who was previously a risk-taker and suddenly becomes solid (due to age) would have to experience an increase in his individual draw factor. Likewise, a player who has played rather consistently up to now and suddenly allows greater fluctuations to occur would be “rewarded” individually with a greater factor for the reaction.
Likewise, the general parameters must be maintained and serviced. So the average draw value, for example, can continue to rise or fall again.
3) The errors in the system