Tackling Advent of Code
By Ewan Cahen
Advent of Code 2024 has started this week! If you are not familiar with Advent of Code, it’s an annual coding challenge created by Eric Wastl. It’s like an advent calendar for coding challenges containing 25 daily programming puzzles, released once a day between December 1–25. You can still join this year’s edition, and even join our Dutch research community leaderboard when you sign up here.
Whether you’re new to Advent of Code or if you want to brush up on your programming skills, below you can find a list of useful tricks, data structures and algorithms that are often needed when solving the challenges. Some of these are accompanied with links on how to use them in a few popular programming languages.
Note, the list is rather large. We recommend only picking a few items where you think your knowledge is lacking.
Parsing the input
In almost every exercise, you are given input, presented as plain text, that you have to parse (i.e. transform it into some structure that is useful for solving the problem). There are a few techniques you can use to accomplish this:
- read the input line by line into a list (Python, R, Java)
- split a string, useful if data is delimited by e.g. whitespace or a comma (Python, R, Java)
- parsing a string to an integer (Python, R, Java)
- (Java only) use Java’s Scanner to easily parse various types of data
- (advanced) use regular expressions to parse the input (Python, R, Java)
Integer division and modular arithmetic
In Advent of Code, you’ll often have to use integer division (where e.g. 11 / 4 = 2
instead of 2.75
). For Python, have a look at the double slash operator (//
) and for R use %/%
(see R operators).
You’ll also have to use modular arithmetic, where you want to know the remainder after integer division (e.g. 11 % 4 = 3
, because 4 fits 2 times into 11 and then you have 3 remaining). You’ll use these when, for example, you have to “wrap around” an array, i.e., when you reach the end of an array, you have to return to the start of the array. For Python, use the modulo operator %
, for R use %%
and for Java use%
. Also look up how your language behaves when any of the numbers is negative.
Working with large integers
Sometimes, you need to handle large integers (especially when multiplying numbers). In some languages, where there are several integer types of several sizes, you need to prevent integer overflow. This is sometimes a problem when working with 32 bit integers. Usually, using 64 bit integers (keyword: long
) is sufficient for Advent of Code. In Python, this is not needed, as it supports arbitrary large integers. For R, have a look at this package for 64 bit integers. For Java, use the long type or, if that is not sufficient, use the BigInteger class.
Data structures
Using the right data structure is crucial to solving the problems. These are the most commonly used ones:
- array: An array is a fixed-size, ordered data structure, consisting of multiple entries of the same type in a row. You can save and retrieve data in an array by using an index (usually a number from
0
ton — 1
(inclusive) if the array has length n. While Python’s standard library doesn’t have built-in arrays, the NumPy library provides a powerful array implementation that’s widely considered a de-facto standard for numerical computing in Python. In R, one-dimensional arrays are referred to as vectors, and are indexed starting from1
(so you can index a number from1 to n
). Used when storing data without further requirements or when the order of the data is important. (R, Java) - list (vector): A data structure that stores multiple entries of (usually) the same type in a row, with dynamic size that grows automatically as needed. While lists offer flexibility, arrays generally provide better performance due to their fixed size and contiguous memory allocation. Arrays are particularly advantageous in Python when using NumPy, as they enable efficient vectorized operations that can significantly speed up numerical computations. Choose arrays when performance and vectorization are priorities, and lists when frequent size changes are required. (Python, R, Java)
- dictionary/(hash)map: A data structure that stores key-value pairs. If you want to store/retrieve data by something more complex than an index (as with arrays and lists), like a string, this is the data structure to use. While R does not technically have a dictionary data structure, you can often use a named vector as a quick-and-dirty replacement (Python, R, Java)
- (hash)set: An unordered data structure that cannot contain duplicates of an element. Useful when you often need to check if some element is present in a data structure (often used in graph traversal algorithms, see below). (Python, R has external libraries, such as r2r, Java)
- queue: A first in, first out (FIFO) data structure where elements are added to the end and removed from the front. While Python lists can be used as queues, this is inefficient due to their underlying array implementation — removing from the front requires shifting all remaining elements. For better performance, use Python’s
collections.deque
which is optimized for both front and back operations. Lists are better suited as stacks (last in, first out). Queues are commonly used in breadth-first search algorithms in graphs (see section on graphs below). (for Python and R, you can use the list as a queue, or you can use Python’s deque, Java) - stack: A last in, first out data structure, meaning you can add and/or remove elements to the front of the stack only. Used in depth-first search algorithms (see below). (see queue for Python and R, Java)
- (advanced) priority queue/heap: Similar to a queue, except that the elements in the queue have a priority, and the element with the highest priority will always be served first when retrieving/removing an element, independent of the order in which the elements where added. Used in Dijkstra’s algorithm (see below). (Python, look for external packages for R, Java)
Algorithms
Many problems can be solved with (a variation of) a well known algorithm. Below are listed some commonly needed algorithms for Advent of Code. This is by no means an exhaustive list. Furthermore, you are encouraged to do more research on these algorithms.
- sorting an array/list: You don’t have to implement your own sorting algorithm, but you have to know how to call the built-in sorting functionality of your language, sometimes using a custom sort function/comparator. (Python, R, Java arrays, Java lists)
- breadth-first search: An algorithm for finding a node in a graph with a certain property. Used for example when looking for the shortest path between two nodes in a graph, when all edge weights have the same value. This uses a queue.
- depth-first search: An algorithm for finding a node in a graph with a certain property. Used for example when the node(s) you’re looking for in a graph are far away from the starting point. This uses a stack.
- Dijkstra’s algorithm: An algorithm for finding a shortest path from a fixed starting point to every other node in a graph, when the edge costs have varying values.
- memoization: Not an algorithm, but rather a technique, in which you store (cache) intermediate results so that you don’t have to recompute these over and over again. These intermediate results are usually stored in a dictionary/(hash)map.
❄️ Closing words ❄️
I hope this overview is useful to you. I’m not that well-versed in the Python or R ecosystem, so if you know of better resources or techniques on any of the topics presented, please let me know.
Is your favourite technique/algorithm/programming language missing? Feel free to add it below!
Good luck this year!
Thanks to Raoul Schram and Bjørn Bartholdy for comments