Dr. Dad, PhD 🔄🔼◀️🔽▶️@GarrettPetersen
Some of you may have played my puzzle game, Connecdoku. You start with a 4x4 grid of phrases, and you need to rearrange them such that all the rows and all the columns fit into distinct (and hidden!) categories. It's like NYT Connections meets Sudoku, thus the name.
It is extremely difficult to make valid Connecdoku puzzles. I have a word-to-category mapping JSON with 3603 words mapping to 2618 categories. An example is the word "Titanic" which maps to categories like "Shipwrecks" and "Synonyms for Big" and "Movies directed by James Cameron". If you just sit down and try to think of 4 James Cameron movies and 4 shipwrecks, you'll quickly paint yourself into a corner. Maybe you think of Bismarck as a shipwreck, and then you add a German Politicians category, and now you need to find categories to connect Angela Merkel with Terminator 2: Judgment Day. Impossible.
A valid puzzle must have 2 sets of 4 categories (rows and columns, though the user could transpose them when solving), such that the intersection of each row and each column has at least one valid word that is NOT in any of the other 6 categories. Trying to brute force all category combinations would take billions of years, so I needed to be efficient.
To build an efficient puzzle-generating algorithm, I used a bit of graph theory. Imagine creating a graph with all categories as nodes and shared words as edges. For two categories to share a puzzle in the same dimension (both rows or both columns) there need to be at least 4 other categories that share words with both of them. Or, in terms of the graph, you need to be able to take at least 4 separate 2-step routes between the two categories for them to be in the same dimension of the same puzzle.
Finding and counting all these 2-step routes is quite quick: If you have a matrix with 1s for distinct categories that share a word and 0s for categories that don't, you can multiply that matrix by itself to get the number of 2-step routes between categories. My laptop does it in a few seconds, even with a big matrix.
Then, having found all valid pairs of categories, you can search for 4-cliques of categories that all have the potential to share one dimension of a puzzle. Having found all possible sets of 4 categories, we can brute force the rest: Check every set of 4 categories against every other set of 4 categories to see if they form a valid puzzle. This takes about 48 hours on my laptop and saves every valid category combination to a SQLite database for later use.
Not all valid puzzles are interesting puzzles. So I built a curation process where I searched through the database for high-quality puzzles that don't share categories with any recent puzzles. Then I chose all 16 words manually, trying to choose the best intersecting word in cases where there were multiple options. Finally, I removed myself from the loop entirely and had an AI agent go through the puzzle curation workflow.
And that's how new Connecdoku puzzles are made.