See the code on GitHub

I spent a lot of time this past Thanksgiving playing Settlers of Catan with my family. I lost a lot of games, so I wanted to see if I could use data to come up with a better game strategy.

Skip this if you’re familiar with the game

For those who aren’t familiar, Settlers of Catan is a board game that focuses on resource control. Players collect resources and use them to spread their settlements across the game board, with the end goal of reaching a specific level of expansion and development (collecting 10 victory points).

The game is governed by five resources:

  • Clay/Brick
  • Wood
  • Sheep/Wool
  • Wheat
  • Ore

Players can spend resources to build/acquire the following:

  • roads
  • settlements (worth 1 victory point)
  • cities (worth 2 victory points)
  • development cards (you might get victory points here)

Resource production is determined by dice rolls (the sum of 2 six-sided dice), and players are free to trade with one another.
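
Since everything in the game hinges on the sum of two dice, it helps to see how uneven those sums are. Here is a minimal sketch that enumerates all 36 outcomes; the function name `two_dice_distribution` is mine, chosen just for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution():
    """Return {sum: probability} for the sum of two fair six-sided dice."""
    # Enumerate all 36 equally likely (die1, die2) outcomes and count each sum.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}


dist = two_dice_distribution()
for total, p in dist.items():
    print(f"{total:>2}: {p} ({float(p):.3f})")
```

A 7 is the most likely roll (6 of 36 outcomes), followed by 6 and 8 (5 of 36 each), while 2 and 12 come up only 1 time in 36.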

Building a predictive model

This dataset records 50 four-player games of Settlers of Catan.

Using this dataset, I built predictive models to evaluate the effect of starting settlement selection on overall game performance. I treated this as both a classification problem (predicting whether a given selection will result in a win) and a regression problem (predicting the number of victory points).

Key features extracted from data:

  • turn order
  • probability of collecting specific resources given starting settlements
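
To make the second feature concrete, here is a rough sketch of how a per-resource probability could be computed from a player's starting settlements. This is an illustration under my own assumptions, not necessarily the exact feature definition used in the analysis: each settlement is represented as a list of hypothetical `(resource, number)` tiles, the robber is ignored, and the probability is the chance that a single roll produces at least one card of that resource.

```python
from fractions import Fraction


def roll_probability(number):
    """Probability that two fair six-sided dice sum to `number` (2..12)."""
    ways = 6 - abs(7 - number)  # 1 way for 2 or 12, up to 6 ways for 7
    return Fraction(ways, 36)


def resource_probabilities(settlements):
    """Per-roll probability of collecting at least one card of each resource.

    `settlements` is a list of settlements, each a list of
    (resource, number) tuples for the adjacent tiles.
    """
    numbers_by_resource = {}
    for tiles in settlements:
        for resource, number in tiles:
            # A set avoids double counting when two adjacent tiles share a number.
            numbers_by_resource.setdefault(resource, set()).add(number)
    # Different dice sums are mutually exclusive, so their probabilities add.
    return {
        resource: sum((roll_probability(n) for n in numbers), Fraction(0))
        for resource, numbers in numbers_by_resource.items()
    }


# Made-up example: two starting settlements, three tiles each.
start = [
    [("wheat", 6), ("ore", 5), ("sheep", 9)],
    [("wheat", 8), ("wood", 4), ("brick", 10)],
]
probs = resource_probabilities(start)
for resource, p in probs.items():
    print(f"{resource}: {p} ({float(p):.3f})")
```

In this made-up example, wheat sits on a 6 and an 8, so it pays out on 10 of the 36 possible rolls, while wood on a lone 4 pays out on only 3 of 36.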

Unfortunately, the results were not the best. On the classification side, using a Random Forest model, the test AUC was roughly 0.51, which is basically equivalent to guessing randomly.
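
To see why an AUC near 0.5 means "no better than guessing", here is a small pure-Python sketch of the metric. It uses the pairwise definition of AUC (the fraction of winner/non-winner pairs where the winner gets the higher score, with ties counted as half). The helper `roc_auc` and the scores below are made up for illustration; they are not the model or data from this analysis.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # winner ranked above non-winner
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# One hypothetical 4-player game: a single winner and three non-winners.
labels = [1, 0, 0, 0]

uninformed = roc_auc(labels, [0.25, 0.25, 0.25, 0.25])  # same score for everyone
perfect = roc_auc(labels, [0.9, 0.1, 0.2, 0.3])         # winner scored highest
print(uninformed, perfect)
```

A model that gives every player the same score lands at exactly 0.5, and one that always ranks the winner first lands at 1.0. Because AUC only looks at rankings, the fact that just 1 in 4 players wins does not shift the chance baseline away from 0.5.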

On the regression side, using a Ridge regression model, the results were no better, with an R² score of 0.01.
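
For context on how low that is, here is a short sketch of the R² calculation. The victory point totals are made up for illustration and do not come from the dataset; the helper name `r_squared` is also mine.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1 - ss_res / ss_tot


# Hypothetical final victory point totals for one 4-player game.
points = [10, 7, 5, 8]
mean_points = sum(points) / len(points)

baseline = r_squared(points, [mean_points] * len(points))  # always predict the average
perfect = r_squared(points, points)                        # predict every total exactly
print(baseline, perfect)
```

Always predicting the average score gives an R² of 0, and a perfect prediction gives 1. A score of 0.01 means the model explains almost none of the variation in final victory points beyond what the average already captures.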

On the bright side

There were some interesting observations outside of the predictive models. When looking at the probabilities of collecting specific resources from starting settlements, wheat had the highest average probability across all games. This makes sense, as wheat is required for 3 out of the 4 development options in the game (settlements, cities and development cards). These are also the only development options which yield victory points, making wheat the most important resource in the game.

On the flip side, ore and brick had the lowest average probabilities. In a typical game of Catan, ore and brick have 3 hex tiles, while every other resource gets 4 tiles. Also, ore is generally more valuable late-game for building cities. As such, early access to it is not a large priority. It might be that many players prefer to expand to a new settlement for ore access, or to obtain it through ports and trading.

It was quite rare to see a starting settlement on a port. Resource ports allow players to trade 2 cards of a specific resource for 1 resource card of their choice. However, port locations are adjacent to at most 2 resource tiles, while many settlement locations are adjacent to 3. It seems that, in most games, players will not sacrifice early resource access for port control. Ports are most useful once a player has a monopoly on, or at least consistent access to, a particular resource. In my personal experience, ports are frequently seen as possible expansion opportunities, and starting settlements are often placed very close to ports.

Also, out of the 5 resource ports, the sheep port was the one most frequently chosen for a starting settlement. In all but one of those games, the player also had reliable access to sheep (at least 1/6 probability). Sheep is arguably one of the most abundant resources in the game, so it seems reasonable that a good strategy would revolve around controlling sheep access.


Settlers of Catan is a complex game that requires a lot of strategy and long-term planning. While settlement selection is important, it is far from being the deciding factor in most games.

Settlers of Catan is an interactive game with a lot of tradeoffs, so one player’s choices will naturally affect the decisions and strategies of the other players. This model, however, does not account for interactions between player strategies. Another thing to note is that the resource tile placement is random and changes between games, so a strategy that works well in one game will not necessarily work well in another. From one game to the next, a particular resource could go from abundant to nearly nonexistent, or vice versa.

Some important questions to consider for future strategies:

  • Are two players in competition for any settlement points?
  • What expansion opportunities does a particular settlement yield?
  • How is the initial game board set up? Are there particular resources that are abundant/scarce?