
Rodrigo Ribeiro, Ph.D.
Professor
IMPA Tech
Sala 02
Writing the TBRW R package
Created: April 13, 2025 17:10
updated: April 14, 2025 05:17
The Tree Builder Random Walk (TBRW) is a stochastic process in which a walker moves on a dynamically growing tree, adding new vertices at its current position according to a given probability distribution over $\mathbb{N}$. This process is a good example of a random walk interacting with and influencing its environment in a non-independent manner, presenting numerous technical challenges. Beyond theoretical interest, the TBRW has practical applications, notably modeling chemical reactions where the walker acts as a catalyst in polymer formation, see [2]. For more details on the theoretical aspects and implications of this model, see [1].
In a model like this, simulation can play an important role. Although it does not constitute a proper mathematical proof of asymptotic behavior, it can give us a lot of insights about possible phenomena the model could undergo. For that reason, I decided to write an R package name tbrw to simulate the process for different types of distributions. When I started writing it, my initial approach was straightforward: I stored the entire evolving tree structure using the igraph package. It seemed natural and intuitive—after all, igraph conveniently handles graphs and offers extensive functionality out of the box.
In a model like this, simulation can play an important role. Although it does not constitute a proper mathematical proof of asymptotic behavior, it can give us a lot of insights about possible phenomena the model could undergo. For that reason, I decided to write an R package name tbrw to simulate the process for different types of distributions. When I started writing it, my initial approach was straightforward: I stored the entire evolving tree structure using the igraph package. It seemed natural and intuitive—after all, igraph conveniently handles graphs and offers extensive functionality out of the box.
However, reality quickly set in when I tested this method with larger simulations. Running just 10,000 steps of the random walk took hours, clearly indicating significant performance issues. It was evident that while igraph is powerful, it was overkill for this particular use-case. So, I realized a simpler approach might be more efficient. Instead of maintaining a full-featured graph structure, I switched to a basic adjacency list representation. Specifically, I used a straightforward list where each index represents a vertex, and each element in the list stores only the identifier of the vertex to which the new vertex is connected. For instance, suppose we start from an edge with vertices 0 and -1:
$$
[-1,0,2]
$$
means that the first vertex added was added to vertex -1, then the second one was added to vertex 0 and finally the third one was added to the vertex labeled 2. This is enough to reconstruct the whole graph. This simple yet effective change dramatically improved performance. By adding a new vertex as a single, constant-time operation (simply appending an element to the end of the list), I significantly reduced the computational overhead. This adjacency list structure efficiently captures the essential properties of the TBRW, allowing simulations that previously took hours to run in just a matter of seconds.
$$
[-1,0,2]
$$
means that the first vertex added was added to vertex -1, then the second one was added to vertex 0 and finally the third one was added to the vertex labeled 2. This is enough to reconstruct the whole graph. This simple yet effective change dramatically improved performance. By adding a new vertex as a single, constant-time operation (simply appending an element to the end of the list), I significantly reduced the computational overhead. This adjacency list structure efficiently captures the essential properties of the TBRW, allowing simulations that previously took hours to run in just a matter of seconds.
The lesson here was clear: when working with algorithms—especially for processes with a dynamically changing structure—complexity can be a silent performance killer. Choosing the right data structure, even if it's simpler or seemingly less sophisticated, can have enormous impacts on efficiency. This experience reinforced to me the importance of thoughtful algorithm design and continuous reassessment of chosen tools, particularly in computational mathematics and algorithm-intensive simulations.
References
References
- Giulio Iacobelli, Rodrigo Ribeiro, Glauco Valle, Leonel Zuaznábar. "Tree builder random walk: Recurrence, transience and ballisticity." Bernoulli. 28 (1) 2022.
- Dockhorn R, Sommer JU. Theory of chain walking catalysis: From disordered dendrimers to dendritic bottle-brushes. J Chem Phys. 2022 Jul 28;157(4):044902.