Study and Implementation of a Chess Engine

International Hellenic University — School of Engineering Department of Information and Electronic Engineering Diploma Thesis (Code: 25331)
Student: Konstantinos Despoinidis (ID: 2021035) Supervisor: Professor Panagiotis Tzekis Completed: 2 June 2026

Dedicated to my parents.
Prologue
Section titled “Prologue”I chose this thesis topic because it allowed me to combine a personal
passion for chess with my interest in low-level systems programming.
Implementing a chess engine from scratch in C was a unique challenge
that required me to think critically about performance and hardware
efficiency.
Through this project, I gained hands-on experience with the fundamentals of Artificial Intelligence, specifically in search algorithms and heuristic design. I also learned the importance of code optimization; when dealing with millions of potential moves, every microsecond and byte of memory matters. While building a chess engine is a notoriously difficult task, the process has significantly improved my problem-solving skills and my confidence in handling complex, high-performance programming projects.
Abstract
Section titled “Abstract”This thesis presents Irida, a competitive chess engine written in C that adheres to the Universal Chess Interface (UCI) protocol. Its architecture separates board-state mechanics (Castro) from search logic, combining hybrid bitboard/grid representation, magic bitboards, Zobrist hashing, and full rule support (castling, en passant, and draw conditions). Correctness and efficiency are validated through perft verification and throughput microbenchmarks.
The search module is based on negamax with alpha—beta pruning and iterative deepening, extended with quiescence search, transposition tables, null move pruning, late move reductions (LMR), aspiration windows, and principal variation search (PVS). Evaluation supports both a handcrafted tapered function and NNUE integration. Experimental assessment uses SPRT, Elo estimation, and staged ablation under two controls (fixed movetime and fixed nominal depth). Results show that quiescence, transposition tables, and LMR provide the largest practical gains under short time budgets, while aspiration/PVS are time-control-sensitive and do not consistently improve strength. The final reference match against strength-limited Stockfish 18 yields an estimated rating around 2110 Elo. Overall, the thesis contributes a reproducible, evidence-driven workflow for developing and validating chess-engine search stacks.
Περίληψη
Section titled “Περίληψη”Η παρούσα διατριβή παρουσιάζει την Irida, μια ανταγωνιστική μηχανή σκακιού γραμμένη σε C η οποία είναι συμβατή με το πρωτόκολλο Universal Chess Interface (UCI). Η αρχιτεκτονική του συστήματος βασίζεται στο Castro, μια εξειδικευμένη βιβλιοθήκη που διασφαλίζει την ορθότητα της αναπαράστασης του σκακιού μέσω μιας υβριδικής προσέγγισης που συνδυάζει bitboard και grid-state. Η βάση αυτή ενσωματώνει magic bitboards για τις ολισθαίνουσες επιθέσεις, Zobrist hashing για αναγνώριση θέσεων και πλήρη υλοποίηση κανόνων (ροκέ, en passant και συνθήκες ισοπαλίας). Με τον διαχωρισμό της μηχανικής σκακιού από τη λογική αναζήτησης, επιτυγχάνεται υψηλή απόδοση παραγωγής κινήσεων, η οποία επαληθεύεται με perft και διαγνωστικά μικρο-benchmarks.
Ο πυρήνας αναζήτησης βασίζεται σε negamax με κλάδεμα άλφα-βήτα και επαναληπτική εμβάθυνση, εμπλουτισμένη με quiescence search, πίνακες μεταθέσεων, null move pruning, late move reductions (LMR), aspiration windows και principal variation search (PVS). Η αξιολόγηση υποστηρίζει τόσο χειροποίητη tapered συνάρτηση όσο και ενσωμάτωση NNUE. Η πειραματική αποτίμηση γίνεται με SPRT, εκτίμηση Elo και σταδιακές μελέτες αφαίρεσης σε δύο καθεστώτα ελέγχου (σταθερός χρόνος ανά κίνηση και σταθερό ονομαστικό βάθος). Τα αποτελέσματα δείχνουν ότι το quiescence, οι πίνακες μεταθέσεων και το LMR προσφέρουν τα μεγαλύτερα πρακτικά οφέλη σε μικρό χρόνο σκέψης, ενώ τα aspiration/PVS παρουσιάζουν εξάρτηση από το time control και δεν βελτιώνουν σταθερά τη δύναμη. Στον τελικό αγώνα αναφοράς έναντι περιορισμένου Stockfish 18, η Irida εκτιμάται περίπου στα 2110 Elo. Συνολικά, η διατριβή προτείνει μια αναπαραγώγιμη και τεκμηριωμένη μεθοδολογία για την ανάπτυξη και αξιολόγηση μεθόδων αναζήτησης σε μηχανές σκακιού.
Acknowledgments
Section titled “Acknowledgments”I would like to express my sincere gratitude to my supervisor, Professor Panagiotis Tzekis, for their support throughout the duration of this project. I am especially grateful for the academic autonomy they afforded me. Their trust gave me the freedom to shape this work personally and develop the confidence to conduct research on my own terms. I am also profoundly indebted to my family for their unwavering love and belief in my potential. To my parents, thank you for being my constant foundation and for the countless ways you supported me, both seen and unseen, during my education. Finally, to my friends, thank you for sticking by me even when I became a person who only talks about their thesis. I am grateful for the breaks, the laughs, and for simply being there when I needed to get my head out of my work.
Abbreviations
Section titled “Abbreviations”| Abbreviation | Meaning |
|---|---|
| PVS | Principal Variation Search |
| NMP | Null Move Pruning |
| LMR | Late Move Reductions |
| TT | Transposition Table |
| NNUE | Efficiently Updatable Neural Network |
| UCI | Universal Chess Interface |
| NPS | Nodes Per Second |
| LLR | Log Likelihood Ratio |
| SPRT | Sequential Probability Ratio Test |
| EBF | Effective Branching Factor |
| CCRL | Computer Chess Ranking Lists |
| TCEC | Top Chess Engine Championship |
| LC0 | Leela Chess Zero |
| CPU | Central Processing Unit |
| GPU | Graphics Processing Unit |
| PST | Piece-Square Table |
| MCTS | Monte-Carlo Tree Search |
| MVV-LVA | Most Valuable Victim - Least Valuable Attacker |
| SMP | Symmetric Multiprocessing |
| ep | en-passant |
| CI | Confidence Interval |
| TC | Time Control |
| i.i.d. | independent and identically distributed |
Introduction
Section titled “Introduction”Motivation
Section titled “Motivation”Chess has long been regarded as a benchmark problem in computer science due to its immense combinatorial complexity. Despite having simple and well-defined rules, the number of possible positions grows exponentially, making exhaustive search infeasible in practice. While the game has been partially solved in restricted cases, most notably with endgame tablebases covering up to seven pieces, the complete solution of chess remains computationally out of reach with current technology. This gap between theoretical solvability and practical limitations makes chess an ideal domain for exploring efficient algorithms and intelligent decision-making.
From a programming perspective, developing a chess engine presents a challenging and rewarding problem. Performance, memory management, and algorithmic efficiency, particularly when working in a low-level language such as C, must be taken seriously. Implementing core components such as move generation, evaluation functions, and search algorithms, force the developer to deep dive in fundamental concepts in systems programming and optimization.
Another motivating factor is the evolving landscape of computer chess, especially with the introduction of neural-network-based evaluation methods such as NNUE (Efficiently Updatable Neural Networks). These hybrid approaches combine traditional handcrafted evaluation techniques with machine-learned models, resulting in highly performant engines. However, they also introduce some uncertainty, often referred to as the “black box” problem, where the reasoning behind certain evaluations becomes less interpretable.
Finally, building a chess engine provides lots of educational value. It serves as a practical application in search theory, including algorithms such as minimax, iterative deepening and alpha-beta pruning. At the same time, it showcases the importance of low-level programming, since efficiency in memory management and execution is critical for a successful chess engine.
Objectives and Scope
Section titled “Objectives and Scope”The primary objective of this thesis is to design and implement a competitive chess engine in C that is fully compliant with the Universal Chess Interface (UCI) protocol.
The scope of the project includes the following:
-
Move Generation: Implementation of a complete and efficient move generator capable of handling all legal positions, including special moves such as castling, en passant, and promotions.
-
Search Algorithms and Optimizations: Development of a high-performance search framework based on Negamax with Alpha-Beta pruning, enhanced with common optimizations such as iterative deepening, move ordering, and pruning techniques.
-
Evaluation Function: Design of a handcrafted evaluation function to assess positional strength, along with integration of NNUE (Efficiently Updatable Neural Networks) to improve evaluation quality.
-
UCI Protocol Implementation: Full support for the UCI protocol to ensure compatibility with existing graphical user interfaces and benchmarking tools.
-
Ablation Studies and Analysis: Systematic evaluation of individual components and optimizations to measure their impact on engine strength and performance.
The following aspects are explicitly out of the scope of this thesis:
-
Custom NNUE Training: The project utilizes pre-trained NNUE networks and does not involve the training or generation of new neural models.
-
Evaluation Function Tuning: The evaluation function is based on hand-crafted heuristics and fixed parameter values, rather than systematically tuned or automatically optimized weights.
This scope ensures a focused exploration of the fundamental techniques behind modern chess engines, while avoiding areas that would significantly expand the project beyond a manageable size.
Structure of the Thesis
Section titled “Structure of the Thesis”This thesis is structured as follows. First, the necessary background on chess and computer chess programming is introduced to establish the required theoretical foundation. Next, the overall system design and architecture are presented, including the analysis of the UCI protocol. This is followed by a detailed examination of the implementation of Castro and Irida, covering key components such as board representation, move generation, search algorithms, and evaluation techniques. Subsequently, the testing methodology and experimental environment are described, providing the basis for performance assessment. The results are then presented and analyzed, including ablation studies and measured Elo improvements. Finally, the thesis ends with a discussion of the main findings, the contributions of this work, and possible directions for future improvements.
Background and Literature Review
Section titled “Background and Literature Review”Rules of Chess
Section titled “Rules of Chess”Chess is a two-player board game played between White and Black on an $8 \times 8$ grid. The horizontal coordinates are labeled files (a—h), while the vertical coordinates are labeled ranks (1—8). Each player begins with sixteen pieces: eight pawns, two knights, two bishops, two rooks, one queen, and one king. The standard starting position is shown below.
Starting position
Movement
Section titled “Movement”Each piece type in chess follows distinct movement rules.
Players alternate turns, with White making the first move. A move consists of transferring a piece from one square to another according to its movement rules. A capture occurs when a piece moves to a square occupied by an opponent’s piece, removing it from the board.
Pawns move forward one square at a time along the file on which they are placed. On their first move, they may advance two squares forward, provided that both squares in front of them are unoccupied. Pawns capture diagonally forward one square. When a pawn reaches the final rank (rank 8 for White or rank 1 for Black), it must be promoted to a queen, rook, bishop, or knight.
A special pawn capture called en passant (French for “in passing”) may occur when a pawn advances two squares from its starting position and lands adjacent to an opposing pawn. In this case, the opposing pawn may capture it as if it had moved only one square forward. This capture must be performed immediately on the following move or the opportunity is lost.
Knights
Section titled “Knights”Knights move in an L-shape: two squares in one direction (horizontal or vertical), followed by one square perpendicular to that direction. They are the only pieces that can jump over other pieces.
Bishops
Section titled “Bishops”Bishops move diagonally any number of squares, provided that no pieces block their path. Each bishop is confined to squares of a single color throughout the game, since diagonal movement does not allow them to change color complexes.
Rooks move horizontally or vertically across the board for any number of squares, subject only to blocking pieces. They play a central role in controlling open files and ranks, particularly in the endgame.
The queen
Section titled “The queen”The queen combines the movement capabilities of both the rook and the bishop. It may move any number of squares horizontally, vertically, or diagonally, making it the most powerful piece in terms of mobility.
The king
Section titled “The king”The king moves one square in any direction: horizontally, vertically, or diagonally. Although its mobility is limited, it is the most important piece, as the objective of the game is to checkmate the opponent’s king.
A king is said to be in check when it is under attack by one or more opposing pieces. In such a situation, the player must make a move that removes the threat, either by moving the king, capturing the attacking piece, or blocking the attack. A fundamental rule of chess is that no move may leave or place one’s own king in check; any such move is considered illegal.
Since a king attacks by controlling all adjacent squares, two kings may never occupy adjacent squares or directly attack each other.
0 1 King movement kingmovement
The king is also subject to special rules, one of the most important being castling. Castling is a move in which the king moves two squares towards a rook on the same rank, while the rook is simultaneously transferred to the square over which the king has crossed. Castling is the only move in chess that allows two pieces to move at the same time.
In order to castle, the following conditions must be satisfied:
-
The king and the chosen rook must not have previously moved.
-
There must be no pieces between the king and the rook.
-
The king must not be in check, and it must not pass through or land on a square that is under attack.
Results
Section titled “Results”A chess game can end in several distinct ways.
A win is typically achieved by checkmate, although in practical play it may also occur through resignation or time forfeiture. Checkmate is a position in which the king is in check and no legal move exists that can remove this threat. This may occur either by moving the king to a safe square, capturing the attacking piece, or interposing another piece to block the attack. If none of these options are available, the position is checkmate.
A draw, on the other hand, can be reached in multiple ways. The simplest form is stalemate, a position in which the side to move has no legal moves, yet its king is not in check.
Stalemate example
In addition, a draw may occur under the following conditions:
-
Threefold repetition: The same position occurs three times with the same player to move and identical legal possibilities
-
Fifty-move rule: Fifty consecutive moves are played by both sides without any pawn movement or capture.
-
Insufficient material: Neither side has sufficient material to force a checkmate, such as king versus king, king and bishop versus king, or king and knight versus king.
-
Agreement: The players mutually agree to a draw.
A deep understanding of the rules described above is essential for implementing correct move generation and handling all edge cases. For a complete and authoritative specification, refer to the official FIDE Laws of Chess.
Fundamental Principles of Computer Chess
Section titled “Fundamental Principles of Computer Chess”Computer chess models the game as a deterministic, perfect-information, zero-sum system. This means that both players have complete knowledge of the game state, and there is no element of chance. The zero-sum property implies that any advantage gained by one player corresponds to an equivalent disadvantage for the opponent. Formally, if a position $s$ is assigned a value $V(s)$, then from the opposing player’s perspective the value is $-V(s)$.
The decision-making process in computer chess is typically formulated using the minimax paradigm. Given a game tree rooted at position $s$, the optimal value of the position can be defined recursively as:
$$ V(s) = \begin{cases} | \max\limits_{a \in A(s)} V(s_a), | \text{if } s \text{ is a maximizing (White) node} \ | | \min\limits_{a \in A(s)} V(s_a), | \text{if } s \text{ is a minimizing (Black) node} | \end{cases}
$$
where $A(s)$ denotes the set of legal moves in position $s$, and $s_a$ is the resulting position after applying move $a$.
Negamax formulation
Section titled “Negamax formulation”The alternating $\max$ and $\min$ in The [eq:minimax-recursion] is often implemented as a single recursive routine negamax: one always evaluates positions from the side to move’s perspective and negates scores when crossing the edge to the opponent. If $s$ is a terminal or leaf-under-depth position with static score $f(s)$ from the side to move’s viewpoint, then $V(s)=f(s)$; otherwise $$ V(s) = \max_{a \in A(s)} \bigl(, -V(s_a) ,\bigr),
$$ because $V(s_a)$ is defined in the child’s frame and must be negated to express the parent’s utility. This is algebraically the same minimax recursion as above, but every node is a “max for the mover” node and the implementation does not branch on player colour.
Game tree fragment and negamax view
Section titled “Game tree fragment and negamax view”shows a shallow fragment of a game tree. Panel (a) highlights the alternating maximizer (White-to-move) and minimizer (Black-to-move) layers in the classical view. Panel (b) redraws the same fragment in the negamax view: each internal node is a maximization for the side to move, and returning a score to the parent negates it, exactly as in The [eq:negamax].
Classical minimax layers; dashed edge indicates a branch that alpha-beta can skip once a refutation is known.
Negamax view of the same fragment: every internal node is a $max$ for the side to move; edges to children negate returned scores.
Search windows and cutoff semantics
Section titled “Search windows and cutoff semantics”Depth-first alpha-beta maintains a window $(\alpha,\beta)$ of plausible scores (in negamax coordinates, both from the side to move’s perspective at the current node). While searching a child, any return value $v \le \alpha$ cannot improve the node’s value and yields an alpha cutoff (sometimes called a fail low at the parent: the child establishes only an upper bound on how good that line was). Symmetrically, if $v \ge \beta$, the line is strong enough to force a beta cutoff (fail high: the child proves a lower bound at least $\beta$, so remaining siblings need not be searched). When neither bound bites, the search returns an exact value inside the window.
These terms also describe the root outcome under a narrow window, as used by aspiration search: if the true minimax value lies below the left end of the window, the iteration fails low (the search returns an upper-bound-style outcome relative to the window); if it lies above the right end, it fails high. The engine then widens the window and re-searches until the value lies inside or the window is widened to the full real line.
In practice, the full game tree cannot be explored due to its enormous size. Instead, the search is truncated at a fixed depth $d$, and a heuristic evaluation function $f(s)$ is used to approximate the value of non-terminal positions:
$$V(s) \approx f(s) \quad \text{for leaf nodes at depth } d$$
The evaluation function typically encodes domain-specific knowledge, such as material balance, piece activity, king safety, and pawn structure.
The complexity of chess arises from its large branching factor and depth. The average branching factor $b$ is approximately 35, meaning that each position has, on average, 35 legal moves. A naive minimax search to depth $d$ therefore requires examining:
$$O(b^d)$$
nodes. For example, a depth-6 search would involve on the order of $35^6 \approx 1.8 \times 10^9$ positions, which is computationally expensive.
To address this, various optimizations are employed. The most fundamental is alpha-beta pruning, which reduces the number of nodes that must be evaluated by eliminating branches that cannot influence the final decision (). In the best case, alpha-beta pruning reduces the effective complexity to:
$$O(b^{d/2})$$
effectively doubling the achievable search depth for the same computational cost, although in practice performance depends heavily on move ordering quality.
Schematic beta cutoff: the third child is never visited because an earlier line already refutes the current window.
Historically, Shannon distinguished between two primary approaches to computer chess. Type A strategies rely on exhaustive search with minimal domain knowledge, exploring the game tree uniformly. In contrast, Type B strategies use selective search, focusing on promising moves while pruning less relevant branches based on heuristics. Modern chess engines combine these approaches, performing deep search while incorporating selective extensions, pruning techniques, and sophisticated evaluation functions.
Despite these optimizations, the state space of chess is extraordinarily large, often estimated to be on the order of $10^{43}$ possible positions, with a game-tree complexity around $10^{120}$ (order-of-magnitude figures popularised in early computer-chess discussions of search limits). As a result, computer chess fundamentally relies on approximations, heuristics, and efficient search strategies to make strong decisions within practical time constraints.
Classical Engine Architecture
Section titled “Classical Engine Architecture”A classical chess engine follows a clear sequence of steps to choose the best move in a position. In general, this process consists of move generation, move ordering, search, and evaluation.
The first step is to generate all legal moves. This step must be correct, as even a single missing legal move or an illegal one can lead to incorrect results. Move generation therefore has to fully respect the rules of chess, including checks, pins, castling, en passant, and promotions. It must also ensure that no move leaves the king in check.
Once all legal moves are generated, they are ordered. Some moves are more promising than others, so it makes sense to examine them first. Good move ordering improves search efficiency, especially when using alpha-beta pruning, where strong moves considered early can significantly reduce the number of positions that need to be explored. Common techniques include prioritizing captures, promotions, killer moves, and moves that have performed well in similar positions.
Next comes the search. The engine explores possible moves and their consequences by building a game tree, typically using a depth-limited minimax algorithm with alpha-beta pruning. In practice, this is often implemented using the single-routine negamax formulation of The [eq:negamax]; see for a comparison with the classical minimax view. The search is further improved with techniques such as iterative deepening, which increases depth gradually, and quiescence search, which extends the search in unstable positions to avoid missing important tactical sequences.
When the search reaches its limit, the engine evaluates the resulting positions using a static evaluation function. This function assigns a score based on factors such as material balance, piece activity, king safety, and pawn structure. The scores are then propagated back through the tree to determine the best move.
Modern engines include additional features to improve performance. Transposition tables store previously evaluated positions to avoid redundant work, while time management controls how long the engine searches based on the situation. Together, these components allow the engine to search efficiently and make strong decisions.
Modern Trends: Neural Networks and Alpha-Beta Refinements
Section titled “Modern Trends: Neural Networks and Alpha-Beta Refinements”Modern chess engines have increasingly adopted neural networks, most notably NNUE (Efficiently Updatable Neural Networks), as part of their evaluation pipeline. These networks either replace traditional hand-crafted evaluation functions entirely or augment them to improve accuracy, particularly in complex or tactically sensitive positions.
At the same time, the search process has been heavily optimized through advanced pruning and reduction techniques. These methods aggressively limit the number of positions explored, allowing engines to search deeper while maintaining efficiency. Although not all such techniques are implemented in Irida, a subset is used to reduce the search space effectively.
In addition, modern engines take advantage of parallelism by distributing the search across multiple threads. This significantly accelerates computation, enabling strong performance even on relatively modest hardware.
Finally, some engines adopt a fundamentally different approach based on reinforcement learning and Monte-Carlo Tree Search (MCTS). This paradigm is less common, because it demands substantially more computation than typical alpha—beta engines. In those systems, a neural network proposes promising moves (policy) and evaluates positions (value); MCTS then simulates many possible continuations and grows a search tree biased toward stronger moves. Rather than exhaustive width-first expansion like alpha—beta, MCTS performs selective, statistically guided exploration.
Existing Solutions
Section titled “Existing Solutions”Among contemporary chess engines, Stockfish has been the most dominant over the past decade. According to CCRL ratings, it has achieved an Elo exceeding 3600, placing it well beyond human competitive levels. For comparison, the current best player Magnus Carlsen has a classical rating of approximately 2840. While such comparisons are not directly equivalent due to differing rating pools, they provide a useful illustration of the performance gap between top engines and human players.
Stockfish relies on highly optimized alpha-beta search, capable of evaluating hundreds of millions of positions per second, combined with an NNUE for fast and accurate position evaluation. This hybrid approach preserves the strengths of classical search while incorporating neural network guidance.
In contrast, Leela Chess Zero (LC0) follows a fundamentally different paradigm. Inspired by reinforcement learning systems such as AlphaZero, LC0 replaces alpha-beta search with Monte-Carlo Tree Search (MCTS) and relies on a deep neural network trained through self-play. This approach typically leverages GPU acceleration to efficiently evaluate positions.
System Design
Section titled “System Design”Design Philosophy
Section titled “Design Philosophy”Irida is organized, with performance and modularity in mind. Board representation and move generation and other correctness-critical mechanics (rules, move generation, make/unmake, repetition and draw state) are handled by castro, while the engine core consentrates on search, ordering, evaluation and tablebases. This seperation of concerns ensures, the thinking modules can never corrupt the board state, while also allowing them to evolve without rewritting low-level move logic.
The codebased is also shaped for iterative optimization: features are
grouped into coherent compilation units (search, quiescence,
transposition table, move ordering, evaluation, UCI front-end) with a
smaill composition root (thin-main pattern). The main binary wires the
engine together through function pointers (SearchFn, EvalFn,
OrderFn), so alternate search modes (e.g., benchmarking or simplified
drivers) or evaluation backends (classical PeSTO, NNUE, material-only
debugging) can be selected without changing the negamax loop’s contract.
Optional subsystems (NNUE via an external probe library, Syzygy via
Fathom) integrate at the edges rather than
spreading conditionals through the tree.
Architectural Overview: Module Analysis
Section titled “Architectural Overview: Module Analysis”At the center of the architecture is an Engine aggregate
(include/core.h): metadata (name, author), the active chess Board
(provided by the castro library),
and three function pointers, search, evaluation, and move ordering, that
define the engine’s behavior at runtime. The UCI entry point initializes
this structure and dispatches protocol traffic; when a search is
requested, the driver invokes the configured search implementation
with the current board and the injected eval and order callbacks.
Move generation and board representation
Section titled “Move generation and board representation”live outside Irida’s core sources, in castro: legal and pseudo-legal
move generation, captures vs. quiet flags, null moves, hashing, and FEN
import/export. The search layer treats castro as the single source of
truth for position legality and state updates; Irida consumes a stable C
API (deps/include/castro.h) rather than embedding its own generator.
Search
Section titled “Search”(src/search/) implements the iterative-deepening driver, recursive
negamax with alpha-beta and optional PVS, quiescence, null-move and LMR
policies, aspiration windows at the root, and integration hooks for
Syzygy WDL probes.
The transposition table (src/search/tt.c, include/tt.h) is a
self-contained subsystem keyed by the board hash,
with generation-based invalidation between searches.
Move ordering
Section titled “Move ordering”(src/moveordering.c) is deliberately isolated: the search routine
never hard-codes ordering heuristics; it calls the OrderFn on
generated move lists and updates killer/history state on cutoffs. This
keeps ordering experiments localized.
Evaluation
Section titled “Evaluation”(src/eval/) provides classical scoring (PeSTO-style tapered eval plus supplementary terms and optional
material-only paths) and an optional NNUE adapter (src/eval/nnue.c)
that delegates inference to a linked probe. Evaluation is
invoked only through the EvalFn indirection, matching the same
boundary used by quiescence stand-pat.
Interface and services
Section titled “Interface and services”: the UCI module (src/uci/) parses commands, maps options (hash
size, tablebase paths, eval file, search flags) into a SearchConfig,
synchronizes with a search thread where applicable, and reports info
lines. Syzygy support (src/syzygy.c) wraps the
Fathom probe layer for root and in-tree WDL. Smaller applications under apps/
reuse the same engine wiring for benchmarks, validation, or ad-hoc
tools; tests under test/ exercise symmetry, search, and evaluation
in isolation.
In sum, the architecture is a thin orchestration layer (UCI +
Engine) around three replaceable pillars, search, ordering,
and evaluation, all operating on a shared board abstraction
owned by castro, with tablebase and NNUE components attached as
optional services rather than entangled with the core tree search.
summarizes the major dependencies at a glance.
High-level module layout: UCI drives the Irida core (search, ordering, TT, evaluation) on a castro board; NNUE and Syzygy attach at the edges rather than inside move generation.
The Universal Chess Interface (UCI) Protocol
Section titled “The Universal Chess Interface (UCI) Protocol”The Universal Chess Interface was first developed in 2000 by Rudolf Huber and Stefan Meyer-Kahlen. It specifies a text-based command channel between a chess engine and a separate graphical user interface or tournament manager: the GUI owns the board display and clock, while the engine receives positions and search constraints and returns principal variations, scores, and best moves. Irida follows this contract so it can plug into standard arenas (Cutechess, Banksia, Lichess bot bridges, and similar tools) without bespoke wiring.
Architectural model
Section titled “Architectural model”UCI adopts a client—server style: the GUI drives the session; the
engine remains passive until commanded. The engine boots quickly and
waits for commands; heavy initialization is deferred until after the
initial handshake, so the GUI may query engine identity and options and
then issue quit without forcing a full setup. The engine must accept
input from standard input even while searching, and must remain in
forced mode: it never begins calculation (including pondering) without
an explicit go.
Lexical and syntactic conventions
Section titled “Lexical and syntactic conventions”Commands are newline-terminated text lines; line endings may vary by
platform (0x0a, 0x0d0a, etc.), which matters when mixing OS
boundaries. Arbitrary whitespace may appear between tokens; unknown
commands or tokens are ignored where possible, and misplaced commands
(e.g. stop when not searching) are also ignored. Before any search,
the current position is always communicated with a position command.
Command Role
uci Enable UCI mode; triggers id, option, uciok
debug on|off Verbose debugging output
isready Synchronization; answer readyok
setoption name …[value …] Change engine parameters
register … Registration flow for commercial engines
ucinewgame Next search is a new game/context
position … Set board and move list
go … Start search with optional constraints
stop Stop search; still emit bestmove
ponderhit Opponent played expected ponder move
quit Terminate engine
: Selected UCI commands (GUI $\rightarrow$ engine)
Move representation
Section titled “Move representation”Moves use long algebraic notation: from-square and to-square (four
characters), with a promotion suffix where needed. Castling is encoded
as the king’s move (e.g. e1g1 for short castling in normal chess). The
engine signals a null move to the GUI as 0000.
Handshake and configuration
Section titled “Handshake and configuration”After startup, the GUI sends uci once. The engine replies with
id name … and id author …, advertises configurable parameters via
option lines, and finishes with uciok. If uciok does not arrive
within a GUI-defined timeout, the GUI may terminate the engine.
The isready command synchronizes the engine with the GUI: the engine
must answer readyok after processing pending input. It is required
before the first search so initialization can complete, and may be sent
during search, in that case the engine answers immediately without
stopping the search.
setoption name id[valuex] changes engine-internal
settings; option names and values are case-insensitive; the substrings
name and value should be avoided inside identifiers to keep parsing
unambiguous. setoption is only sent while the engine is waiting (not
mid-search).
Game lifecycle
Section titled “Game lifecycle”ucinewgame marks that the next position/go sequence starts a new
analytical or competitive context; engines must not rely on receiving
it, because older GUIs might omit it. GUIs should send isready after
ucinewgame if the engine’s response may be slow.
position [fen <fen> | startpos] moves <move_1> ... <move_i> sets the internal board from FEN or the standard start position,
then applies the move list. If the position belongs to a different game
than the previous one, a ucinewgame should have been sent in between.
Search control
Section titled “Search control”go begins analysis on the current position. Optional parameters in the
same line restrict or bound the search, including: searchmoves,
ponder, clock data (wtime, btime, winc, binc, movestogo),
depth/node/time limits (depth, nodes, mate, movetime), and
infinite. In ponder mode the last move in the position line is the
ponder move; the engine must not exit search on its own even if mate is
found; ponderhit switches from pondering to normal search when the
expected move occurs. stop requests termination as soon as possible;
every go must eventually be answered with bestmove.
Engine-to-GUI feedback
Section titled “Engine-to-GUI feedback”During search, the engine may emit info lines with structured tokens
(depth, seldepth, time, nodes, pv, multipv, score,
currmove, hashfull, nps, tablebase hits, CPU load, free-form
string, etc.). PV-related fields should be kept consistent; the
specification suggests throttling verbose diagnostics (currmove,
currmovenumber, currline, refutation) until after one second of
search to limit traffic.
When the search ends, the engine sends
bestmove m[ponderp], optionally with a ponder
suggestion; a final info line should precede bestmove so the GUI has
complete statistics. Optional commercial features appear as
copyprotection and registration exchanges; the GUI may need to send
register after certain engine messages.
Configurable options (option)
Section titled “Configurable options (option)”Each option describes a user-tunable parameter: name, type
(check, spin, combo, button, string), and optionally
default, min, max, and repeated var for combo boxes. Several
names have standardized semantics (Hash, NalimovPath, Ponder,
OwnBook, MultiPV, and many UCI_* options); GUIs may hide unknown
UCI_* options.
Chess960 extension
Section titled “Chess960 extension”The specification states that UCI can be extended to Chess960 via option name UCI_Chess960 type check default false.
In Chess960, castling is encoded as the king “capturing” its own rook
(e.g. e1h1); FEN castling rights use file letters of the castling
rooks because kq / KQ alone are insufficient when multiple rooks
exist on one side.
A simple UCI example can be found in Appendix [uci-usage].
Technical Stack: Languages and Tooling
Section titled “Technical Stack: Languages and Tooling”The project is implemented in C, adhering to modern standards while
maintaining high portability. The development environment utilizes a
cross-platform compilation strategy: gcc is employed
for Linux environments, while clang is used for
macOS.
Unlike many modern C projects, the build process relies on custom Makefiles rather than CMake to provide more granular control over compilation flags and dependencies. Version control is managed via Git, with releases and changelogs automated through changelogger to ensure consistency.
For quality assurance and documentation, the project integrates:
-
Documentation: Generated using
tinydocsfor a lightweight, source-integrated manual. -
Testing: Data-driven unit testing is handled by
test.h, a minimalist header-only framework.
Finally, custom tooling is written in both bash and python.
Implementation: The Engine Core
Section titled “Implementation: The Engine Core”Move Generation
Section titled “Move Generation”Board Representation: Bitboards and Magic Bitboards
Section titled “Board Representation: Bitboards and Magic Bitboards”The board uses a hybrid representation: twelve uint64 bitboards (one
per piece kind and colour), together with an $8\times 8$ character grid
for PieceAt-style queries, and cached aggregates white, black, and
empty recomputed whenever occupancy changes. Square-centric loops use
poplsb / lsb to iterate set bits, which maps cleanly to 64-bit word
operations on typical architectures. The hybrid design trades a small
amount of redundant state (bitboards plus a square grid plus cached
colour/occupancy aggregates) for simpler hot paths: PieceAt and other
square-indexed logic avoid reconstructing piece type from twelve
bitboards on every access, while move generation still benefits from
bit-parallel attacks and poplsb walks. The aggregates keep common
occupancy tests $O(1)$ without repeatedly OR-ing all piece bitboards.
This is a standard memory-for-clarity pattern in bitboard engines; a
pure bitboard design would save bytes but often increase branches in
evaluation and make/unmake.
Sliding pieces (bishops, rooks; queens combine both) use magic
bitboards.
At initialisation, for each square a mask of relevant blockers is built;
a random 64-bit magic multiplier is searched so that all occupancy
subsets of that mask map injectively into a small precomputed attack
table (rook tables are larger than bishop tables, reflecting different
blocker counts). At runtime, attacks are
| $$\texttt{table}[s]\bigl[((\texttt{occ} \mathbin{\ | } \texttt{mask}[s]) \cdot |
\texttt{magic}[s]) \gg \texttt{shift}\bigr],$$ giving $O(1)$ lookup
given occupied squares occ. If no satisfactory magic is found for a
square, the implementation falls back to classical on-the-fly ray
casting (AttacksFromOccupancy). Non-sliding pieces use precomputed
jump or step masks (knight, king) and rank/file/diagonal masks for
pawns. Magic bitboards pay a one-time initialisation cost (magic search
and table fill) for attack tables that are read-only at runtime; rook
tables are larger than bishop tables because more blocker combinations
arise on ranks and files. The asymptotic attack query remains $O(1)$ per
square after setup. The ray-casting fallback for a stubborn square adds
a second implementation path, so that path should remain cold; it
preserves correctness if a magic is not found without blocking engine
startup. Other sliding-attack schemes (e.g. masked occupancies with
different decompositions) offer different size/speed tradeoffs; magic
tables are a common choice when a few hundred kilobytes of precomputed
data is acceptable.
Game Logic and Draw Rules
Section titled “Game Logic and Draw Rules”Moves are applied with MakeMove and reversed with UnmakeMove. Before
mutating the board, an Undo record captures castling rights and en
passant square before the move, the halfmove clock before the move,
the encoded move, and any captured piece type. Castling, en passant, and
promotion are handled as distinct paths: the king-rook rearrangement
updates both bitboards and grid; en passant removes the captured pawn on
its true square and resets the halfmove clock; promotion replaces the
pawn bitboard with the chosen promoted piece and updates hash
contributions accordingly. After the piece update, castling rights and
the en passant target are recomputed from the move (including rook or
king moves that forfeit rights, and captures on corner squares). The
side to move and fullmove number follow FEN-style conventions.
The fifty-move rule is implemented as a halfmove clock (halfmove):
it is reset to zero when a pawn moves or when any capture occurs (the
latter detected by comparing total piece counts before and after the
move); otherwise it is incremented once per halfmove. A draw is declared
when this counter reaches $100$ halfmoves (i.e. fifty full moves without
a pawn move or capture).
Separately, a repetition structure keyed by the position hash maintains occurrence counts along the current line; unmaking decrements the entry for the child position. The draw-by-repetition predicate follows the usual threefold repetition idea: a position that has already appeared twice on the line from the root to the current node (i.e. the third occurrence) is treated as a draw, under the same encoding of full game state (side to move, castling, en passant, and piece placement) that the Zobrist key represents. As with any 64-bit hash, accidental collisions are exceedingly rare but theoretically possible; the implementation assumes identical keys imply identical rule-relevant state. Insufficient material is detected by a dedicated bitmap-based material test. Together, these checks feed the general result / draw logic used after move generation.
Zobrist Hashing
Section titled “Zobrist Hashing”Positions are identified by a 64-bit Zobrist key. The scheme follows the Polyglot layout
of fixed pseudorandom words Random64[781]: one
word per piece type and square for the twelve piece classes, four words
for the individual castling rights, eight for en passant files, and
one word toggled when Black is to move. The en passant contribution is
not keyed solely from FEN: a file’s random word is XORed in only when
a pawn of the side to move could capture en passant on that file
(mirroring Polyglot’s rule).
The full hash can be recomputed from scratch by XORing every occupied
piece key and the meta-keys. For performance, MakeMove and
UnmakeMove update board->hash incrementally: piece movements XOR out
the old square and XOR in the new (and captured) pieces; castling and en
passant components are removed with the pre-move meta, then re-XORed
after rights and the ep target are updated; the side-to-move word is
flipped every move. This incremental update is kept consistent with the
full recomputation (verified in tests). The same key drives opening-book
lookup in Polyglot binary files. Incremental Zobrist updates are an
invariant: after every MakeMove/UnmakeMove, the stored hash must
equal the xor of the same key material that a from-scratch xor would
use. The test suite’s full recomputation check catches any drift in
castling, en passant, or captured pieces. The TT and Polyglot book
therefore see the same position identifier the evaluation and repetition
logic see, which is prerequisite for correct transposition-table use and
for opening preparation that matches the engine’s own rule encoding.
Pseudo-legal Generation, Legality, and the Public API
Section titled “Pseudo-legal Generation, Legality, and the Public API”GenerateMoves dispatches to pseudo-legal move generation (all moves
that ignore check), legal generation, or legal captures only.
Pseudo-legal targets are built per piece from bitboards: pawns (pushes,
captures, double push, promotions), knights, bishops, rooks, queens, and
kings, with sliding attacks obtained via magic bitboards masked by empty
and enemy squares.
Legal moves reuse a precomputed LegalityContext: enemy sliders and
pieces on the same rank, file, or diagonal as the king yield either a
directed check (with a check_mask of squares that block or capture the
checker) or a pin (per-source pin_masks intersected when multiple
pinners apply). Knight and pawn checks are merged into the same mask
semantics. A fast filter rejects destinations that violate check evasion
or pin geometry; pinned knights cannot move. King moves and some pawn
cases (en passant, promotions) still use MakeMove plus an in-check
test on a board copy, because discovered attacks and ep special-cases
are not fully captured by the bit mask alone. The legality
precomputation turns most moves into cheap mask intersection and
filtering, which is $O(1)$ per generated move aside from building the
context once per node. The remaining cases (king moves, some pawn moves
including en passant and certain promotions) fall back to a copy,
MakeMove, and in-check detection because discovered check or ep is
awkward to fold into a single set of static masks. That hybrid gives a
predictable fast path in tactical positions (many check evasions
filtered by masks) while still guaranteeing correctness. Worst-case cost
rises where many moves need the slow path, but such nodes are a minority
relative to nodes that fail fast in the bit filter.
Search Algorithm
Section titled “Search Algorithm”Negamax, Alpha-Beta, and Iterative Deepening
Section titled “Negamax, Alpha-Beta, and Iterative Deepening”The recursive search core is a negamax formulation of minimax. At each node the side to move chooses a legal move that maximizes the backed-up score, where every child search returns a value in the child’s frame; the parent negates that return so both White and Black plies use the same comparison direction and one routine suffices (The [eq:negamax] and in ). Irida keeps scores in centipawns from White’s perspective at the root, but the negamax layer only cares about the sign flip across plies.
Alpha-beta pruning bounds the search with parameters $\alpha$ and $\beta$ in negamax coordinates at the current node. Whenever a child proves a value $\le \alpha$, remaining siblings cannot raise the node’s value (alpha cutoff); whenever a child reaches $\ge \beta$, the parent can stop searching further siblings (beta cutoff). Pruning is correct because those branches cannot change the minimax outcome once $\alpha \geq \beta$.
The root driver implements iterative deepening: the engine repeats a full-width search for increasing nominal depths, reusing transposition-table information from earlier iterations to improve move ordering and time management. For numerical stability with the underlying board representation, the root position is re-initialized from a FEN snapshot at the start of each depth iteration before searching again. In theory, with perfect move ordering, alpha—beta reduces the effective branching factor from $b$ toward $\sqrt{b}$ in many models, so doubling nominal depth is far cheaper than the naive $b^d$ count. In practice ordering is imperfect, so the gain is less than the ideal but still large enough that late improvements (TT, killers, history) have outsized impact. Iterative deepening recycles the TT and ordering hints from depth $d-1$ when starting depth $d$, so later iterations are not “from scratch” despite the re-search. Re-initialising the board from a FEN snapshot at each new nominal depth is a safety trade: it costs a few synchronous state writes but rules out hard-to-debug drift between iterations in long test runs and tournament games.
Quiescence Search
Section titled “Quiescence Search”When the main search reaches depth zero, evaluating the position statically would often stop in the middle of volatile tactical sequences (the horizon effect). Irida therefore extends the search with quiescence search: a secondary recursion that applies stand-pat logic using the same static evaluator as the main tree, then considers capture-only continuations. Captures are generated explicitly, ordered with the same ordering pipeline as the main search (without a transposition-table move hint), and searched with negated bounds in the usual alpha-beta style. Stand-pat returns immediately if the static score is at least $\beta$, a local fail high that proves the quiet position already refutes the current window without expanding captures; otherwise $\alpha$ may be raised before captures are tried. A maximum ply cap guards pathological capture chains; at that limit the routine returns a full static evaluation. Quiescence nodes are accounted for separately from main-tree nodes for diagnostics and time polling. The implementation deliberately extends only captures in quiescence, not all checks or all noisy quiet moves, which is a node-count trade: adding non-captures (even checks) would better resolve some tactical sequences but can explode the quiescence tree. Stand-pat with a static evaluation is the standard compromise: it allows a cutoff when a quiet position is already provably good enough ($\ge\beta$) before paying for more capture plies. The hard ply cap is a backstop against infinite or near-infinite capture—recapture lines; at the cap, returning a full static eval accepts a small risk of horizon error in pathological cases but keeps time bounds manageable.
Transposition Tables
Section titled “Transposition Tables”Irida uses a transposition table (TT) keyed by the position hash. Each entry stores depth, score bound type (exact, lower, or upper), mate-distance, adjusted scores, the best move from the node when known, and a generation tag so entries from previous searches are not reused incorrectly. Probes may return a cutoff score only when the stored depth is at least the current remaining depth, but the stored best move is always exposed for ordering even when the score is too shallow to cut off. Replacement favors empty slots, updates to the same key, deeper results, or stale generations. A TT hit is a dual artifact: the score is only trusted to cut off the search when the stored search depth is at least the depth still needed at this node, otherwise a shallow value might mis-rank a quiet line (mate-distance and exact-value handling mitigate gross errors). The best move field is useful even from shallow entries, because it seeds search ordering: playing that move first improves alpha—beta’s pruning rate even when the score is not deep enough to prune. The generation tag implements aging: entries from a previous search pass are not confused with the current pass, so replacement and probe semantics stay aligned with iterative-depth cycles. The replacement policy (prefer empty, same key, deeper, or stale) is a practical compromise between retaining deep results and letting the table reflect new subtrees; alternative schemes (always-replace depth, two-tier buckets) are mainly tuning choices.
Move Ordering
Section titled “Move Ordering”Move ordering combines several layers. Quiet and tactical moves receive a composite integer score and are sorted in descending order. The TT move is ranked highest when present. Captures use an MVV-LVA (most valuable victim / least valuable attacker) table with dedicated treatment for en-passant and promotion bonuses. Among non-captures, quiet queen promotions, then killer moves (two slots per ply: recent quiet cutoffs at the same depth), then a history heuristic indexed by side, from-square, and to-square. On beta cutoffs caused by quiet moves, Irida stores killers and increases history for the cutoff move while decreasing history for other quiet moves tried earlier at the same node, scaled by the square of the remaining depth and clamped to fixed limits. The order of heuristics mirrors how often they fire: the transposition best move, when present, is a near-optimal first try because it encodes the conclusion of a previous search of the same or parent position. MVV-LVA is cheap to compute and good at ranking obvious tactical tries among captures. Killers and history are complementary: killers are depth-local, whereas history summarises long-run success of (side, from, to) over many nodes. On quiet beta cutoffs, boosting the cutoff and damping other tried quiets (scaled by depth squared) concentrates credit on moves that actually caused a cutoff at this depth band.
Null Move Pruning
Section titled “Null Move Pruning”Null-move pruning passes the turn with a “null” ply when it is safe: not at the root, sufficient remaining depth, not in check, the side retains non-pawn material, and the position is not too sparse in pieces. A fast null-window child search uses a fixed reduction beyond the usual decrement. If the preliminary result does not justify a cutoff (including mate-score guards), no pruning occurs; otherwise a verification search at reduced but less aggressive depth can confirm the cutoff before returning a beta-bound score. Null-move pruning is unsound in zugzwang-heavy endgames where “passing” would be a legal win for the opponent; the non-pawn-material and piece-sparsity guards reduce the risk of pruning in positions where the side to move might need a null move to lose. The reduced-depth null search is a fast signal: if even that lazy line refutes the current $\beta$ window, the real line is likely at least as strong. Mate-score checks avoid turning a spurious null score into a false mate claim. Optional verification (reduced, less aggressive than the null probe) is a second line of defence: it trades a few extra nodes to avoid a rare spurious beta cutoff that would be wrong in exceptional fortresses. Together, preconditions, reduction, and verification are the standard engineering response to the fact that NMP is a heuristic, not a theorem.
Late Move Reductions
Section titled “Late Move Reductions”Late move reductions shrink the search depth for late-ordered moves that are unlikely to be best: beyond a minimum depth and move index, on non-PV nodes, when the parent is not in check, the move is neither a capture nor a check-giver, and the move is not the TT move, Irida subtracts a reduction that grows with depth and move index, clamping so the child still receives at least one ply. If the reduced search returns a score above $\alpha$, a full-depth re-search restores correctness. LMR formalises the observation that after good move ordering, late moves are rarely best. The reduction is therefore applied only when the move is neither obviously tactical (capture, check) nor the TT’s favourite, and the parent is not in check (where every legal move can defend a narrow eval margin). A reduced search that still exceeds $\alpha$ is a fail high in spirit: the quiet move is better than a lazy search suggested, so a full-width or full-window follow-up re-establishes correctness. LMR is intentionally coupled to PVS: many late moves are first touched as zero-window scouts at reduced depth, so the cost of the reduction is paid only when the move orders badly or surprises the scout.
Aspiration Windows
Section titled “Aspiration Windows”From moderate depths onward, Irida narrows the root window around the previous iteration’s score using an aspiration offset proportional to that score’s magnitude. Let $v_{d-1}$ be the fully searched value at the previous nominal depth. The next iteration starts with a window $[\alpha,\beta]$ straddling $v_{d-1}$. If the true root value lies below $\alpha$, the search stops with an outcome that is only an upper bound relative to the attempted window, a fail low. If it lies above $\beta$, the returned score is only a lower bound, a fail high. Either case means the narrow window missed the minimax value; Irida then widens the bounds multiplicatively and re-runs the same depth until the value lands inside the window or the guard triggers a fallback to the full-line window $(-\infty, +\infty)$ for robustness. Aspiration windows are most valuable when the root value changes only slowly between depths; they shrink the search space by causing quick cutoffs on fail-high or fail-low transpositions. When the tactical character of the position shifts (a sacrifice appears, a fortress cracks), the previous-iteration value can be a poor centre for the next window, so the search may fail low or high repeatedly until the window widens enough. Multiplicative widening is a simple policy that recovers from that instability; the fall-back to an infinite window guarantees termination of the same depth with a numerically complete minimax value under time control. Tuning the offset as a function of score magnitude addresses the observation that centipawn scale is not uniform across game phases: large material swings tolerate wider bands than dead-equal endings.
Principal Variation Search
Section titled “Principal Variation Search”Principal Variation Search (PVS) searches the
first successfully played legal move (the PV candidate after ordering)
with a full $(\alpha, \beta)$ window. Subsequent moves are probed with a
zero window $(\alpha, \alpha{+}1)$ in negamax coordinates; only when a
scout proves unexpectedly strong does Irida re-search with the full
window. The implementation couples PVS with LMR: a reduced-depth scout
that becomes interesting triggers verification at full depth, and
zero-window re-search is guarded so that obvious fail-highs do not
explode the node count.
sketches the high-level control
flow for these three mechanisms and how they interact at a node. PVS is
the outer ordering of move trials: the first move after
TT+killers+sorting is treated as the putative principal variation; it
receives a full window so the true minimax value of that subtree is
known. Siblings are probed with a null window; only a result strictly
better than $\alpha$ (a surprise) forces a re-search with the parent’s
real window. LMR typically applies to those late siblings, so the
“scout” is often a reduced-depth null-window pass first; a high score
there triggers a full-depth confirmation, not an immediate wide window
on every move. That stacking is what keeps node count near the practical
optimum of alpha—beta while retaining soundness. The same structure
applies in the quiescence search for capture ordering, though without a
TT move hint the first capture is a weaker stand-in for a true PV.
Pass turn when preconditions hold; a threatening null score may trigger verification before a cutoff.
Quiet, late-ordered moves use a reduced depth; a strong reply forces a full-depth confirmation.
After the PV candidate, scouts use a narrow window; failures require a full-window re-search.
The three panels are not a strict left-to-right pipeline: null-move applies at most once per node before child generation (when its preconditions hold), whereas LMR and PVS apply per move child. A single child can be both LMR-reduced and probed with a zero window under PVS; a score that fails high for that stacked probe is what triggers full-depth or full-window re-search. NMP is therefore a node-level early exit, while LMR and PVS shape the cost of exploring the move list.
Hybrid Evaluation
Section titled “Hybrid Evaluation”Hand-Crafted Evaluation (HCE)
Section titled “Hand-Crafted Evaluation (HCE)”The primary hand-crafted evaluator follows the PeSTO paradigm: separate midgame and endgame piece values and piece-square tables, accumulated per square for both colors. A discrete game phase in $[0,24]$ (from material composition) tapers the midgame and endgame contributions into a single blended material-PST score. On top of this kernel, Irida adds structured heuristics, including pawn structure, mobility, king safety, piece activity, spatial pressure, tactical threats, specialized endgame handling, and rook activity, and combines them into a total score expressed in centipawns from the side to move’s perspective. A separate material-only function exists for debugging and tooling, and the PeSTO tables can be retuned externally (e.g., Texel-style optimization) by reloading tuned midgame/endgame piece weights. The discrete phase in $[0,24]$ implements a tapered eval: material and piece—square tables contribute both a midgame and an endgame component, and the phase weighting (roughly, “more endgame phase as material comes off”) blends them so that the same position does not use opening king safety heuristics in a pure pawn endgame. The additional terms (mobility, structure, threats, king safety, specialised endgames) are additive on top of that kernel; their relative scale is a design choice that Texel-style tuning can adjust by optimising against game databases. The important analysis point is separation of concerns: PeSTO provides a smooth material surface, while the hand-tuned terms inject strategic and tactical shape that pure PSTs under-express.
NNUE Integration
Section titled “NNUE Integration”Irida can replace or augment classical evaluation with a NNUE network
loaded at runtime from a binary .nnue file (UCI EvalFile).
Integration is implemented as a thin adapter around an external probe
library: the board is exported to FEN, the network returns a score in
centipawns for the side to move, and the engine negates appropriately
when Black is to move. If no network is loaded, the FEN fails structural
validation, or probing fails, evaluation transparently falls back to the
PeSTO-based hand-crafted function, yielding a practical hybrid
pipeline without duplicating search logic.
The deployed NNUE path evaluates the full position on each call
through the probe rather than maintaining a custom incremental
accumulator inside Irida; efficient NNUE inference and any low-level
incremental updates are delegated to the linked probe implementation.
Training and conversion tooling in the repository can produce compatible
networks and optional diagnostics (e.g., evaluation breakdown logging
remains centered on the classical evaluator). Exporting a full FEN and
invoking the external probe on every evaluation call is simple and keeps
Irida’s core free of accumulator state, but it implies per-node string
formatting and a full forward pass through the network (handled inside
the library) instead of incremental feature updates as pieces move, as
in high-performance Stockfish-style integrators. The trade is
implementation complexity versus raw nodes per second: the present path
favours a maintainable adapter and correct side-to-move semantics at the
cost of higher per-eval overhead. The binary .nnue layout is that of
the loaded probe (typically UCI-compatible EvalFile networks in the
nnue training ecosystem); the engine does not re-implement the format,
it passes bytes to the library, so any format discussion belongs with
the probe or training tooling documentation. When loading fails,
transparent fallback to HCE keeps search and time control well-defined
for development and for users who omit a network file.
Experimental Methodology
Section titled “Experimental Methodology”Test Environment
Section titled “Test Environment”To ensure the reproducibility of the results, all experiments were conducted in a controlled environment. The hardware and software specifications are as follows:
-
CPU: AMD Ryzen 5 4000
-
RAM: 24GB DDR4
-
OS: Void Linux
-
Compiler: GCC version 14.2.1 with -O3 and -march=native flags
-
Engine Settings: Hash size was fixed at 64MB, and all tests were conducted on a single thread to eliminate SMP (Symmetric Multiprocessing) variance.
Performance Metrics
Section titled “Performance Metrics”Reported nodes per second (NPS) and search depth (iterative deepening under a wall-clock or depth limit) summarise engine throughput and foresight; interprets them alongside the measurements in this thesis. Fine-grained empirical tree growth on the starting position, including the geometric effective branching factor $\text{EBF}_{\text{geom}}$, incremental work between deepening steps, and the single-number proxy $\bar{b}=N^{1/D}$, is defined operationally in ; those diagnostics quantify pruning efficiency for the staged binaries there and should not be confused with informal uses of “branching factor” elsewhere.
Competitive Evaluation
Section titled “Competitive Evaluation”The Elo rating system, developed by Arpad Elo, was introduced to replace the Harkness system used in zero-sum games. Although originally designed to rate human chess players, it is now widely used to evaluate chess engines as well. Elo is a relative rating system: the difference in rating between two players determines the expected outcome of a match. When a player wins, their rating increases, while the opponent’s rating decreases. In the case of a draw, rating changes are small, especially when the players have similar ratings.
The expected score of a player is given by:
$$E = \frac{1}{1 + 10^{-D/400}}$$
where $D$ is the rating difference between the two players. For example, a player rated 100 Elo points higher than their opponent is expected to win approximately 64% of the time.
Statistical significance was evaluated using the Sequential Probability Ratio Test (SPRT), a method introduced by Abraham Wald for sequential hypothesis testing. Unlike fixed-sample tests, SPRT evaluates data as it is collected and allows early termination once sufficient evidence has been accumulated.
The test compares two competing hypotheses: the null hypothesis $H_0$, typically representing no performance difference, and the alternative hypothesis $H_1$, representing a meaningful improvement. After each observation, a Log-Likelihood Ratio (LLR) is computed, measuring how much more likely the observed results are under $H_1$ than under $H_0$.
The LLR is compared against two decision thresholds. If it exceeds the upper bound, $H_1$ is accepted; if it falls below the lower bound, $H_0$ is accepted. Otherwise, testing continues. These thresholds are determined by predefined Type I and Type II error probabilities, controlling the risk of incorrect conclusions.
In the context of chess engine evaluation, game outcomes are typically modeled using probabilistic frameworks such as the Elo system, where the expected score depends on the rating difference between players. Because match results are inherently noisy, SPRT provides an efficient and statistically grounded way to determine whether observed performance differences are significant.
More generally, parameter optimisation in chess engines can be viewed as a noisy black-box optimisation problem, where improvements must be validated under uncertainty.
Head-to-Head Match Protocol
Section titled “Head-to-Head Match Protocol”Strength comparisons between staged Irida builds use paired games with balanced openings unless a later chapter specifies otherwise. Resource limits follow : $64$ MB hash and one thread. Each ablation pairing in uses the game counts stated in each ablation table (movetime control: $5{,}000$ games per pairing, except $2{,}180$ games for the aspiration-vs.-PVS pairing; fixed-depth control: $100$, $1{,}000$, or $2{,}030$ games depending on the stage). Those figures are the realised totals from each run: significance is assessed online with SPRT (), so a pairing can stop early as soon as the log-likelihood ratio crosses an acceptance or rejection boundary, without playing out a larger nominal schedule (e.g. $10{,}000$ games) to the end whenever the evidence is already conclusive. Two scheduling regimes are observed in parallel: a fixed per-move time of $0.1$ s, and a fixed nominal iterative-deepening depth of $4$ main-line plies (both sides stopped at the same outer depth). Outcomes are mapped to logistic $\Delta$Elo with confidence intervals; sequential testing uses SPRT and reported LLR values (). Other competitive matches, notably the reference games against an external engine in , use the opponent, sample size, and controls stated there instead of this template.
Results and Analysis
Section titled “Results and Analysis”Move Generation Benchmarking
Section titled “Move Generation Benchmarking”Move-generation correctness was checked with perft counts (legal move enumeration to fixed depth). The pseudo-legal version of the Perft test is shown in the appendix.
As demonstrated by the benchmarks in Table [tab:movegen_benchmarks], the move generator delivers competitive performance, exceeding two million nodes per second (MN/s) across all tested positions. Notably, in the notoriously complex “Kiwipete” position, the engine achieves a throughput of 6.15 MN/s for depth 3.
| Test Position | Nodes/Cycles | Time (s) | MN/s (MCy/s) |
|---|---|---|---|
| Start Pos (d4) | 197281 | 0.085 | 2.33 |
| Start Pos (d5) | 4865609 | 1.814 | 2.68 |
| Kiwipete (d3) | 97862 | 0.016 | 6.15 |
| Kiwipete (d4) | 4085603 | 0.747 | 5.47 |
| Endgame (d5) | 674624 | 0.221 | 3.06 |
| Pos5 (d4) | 2103487 | 0.484 | 4.35 |
| Starting Pos | 4000000 | 1.912 | 2.09 |
| Kiwipete | 9600000 | 1.560 | 6.15 |
| Middlegame | 7200000 | 1.741 | 4.14 |
| Starting Pos | 10000000 | 7.158 | 1.40 |
| Kiwipete | 24000000 | 16.418 | 1.46 |
| Middlegame | 18000000 | 13.457 | 1.34 |
a fixed per-move time of $0.1$ s,
Ablation chain: pooled logical Elo advantage of $E_2$ and SPRT
| Metric | Value |
|---|---|
| $E_1$ (baseline) | Base |
| $E_2$ (candidate) | Quiescence |
| Total games (paired) | 5000 (2500 pairs) |
| Outcome distribution^*^ | [192, 242, 1548, 817, 2201] |
| Mean score per game^*^ | 0.7297 |
| Elo difference $\Delta$ ($E_2$ vs. $E_1$, logistic) | +172.48 |
| 95% confidence interval for $\Delta$ | [+164.81, +180.14] |
| Log-likelihood ratio (LLR) | 101.4659 |
| SPRT conclusion | $H_1$ accepted |
Performance impact of adding Quiescence Search to the baseline.
Table [tab:quiescence] shows the largest single jump in the chain ($\Delta$Elo $=+172.48$, $H_1$ accepted): extending tactical lines is decisive relative to the plain horizon-limited search. This indicates that this advantage is not a narrow artifact of one panel; instead, it reflects a broad reduction in tactical blunders that the fixed horizon baseline cannot repair in time.
Transposition table.
Section titled “Transposition table.”| Metric | Value |
|---|---|
| $E_1$ (baseline) | Q |
| $E_2$ (candidate) | TT |
| Total games (paired) | 5000 (2500 pairs) |
| Outcome distribution^*^ | [85, 505, 1155, 1025, 2230] |
| Mean score per game^*^ | 0.7405 |
| Elo difference $\Delta$ ($E_2$ vs. $E_1$, logistic) | +182.16 |
| 95% confidence interval for $\Delta$ | [+164.79, +199.52] |
| Log-likelihood ratio (LLR) | 22.1268 |
| SPRT conclusion | $H_1$ accepted |
| Metric | Value |
|---|---|
| $E_1$ (baseline) | TT |
| $E_2$ (candidate) | NMP |
| Total games (paired) | 5000 (2500 pairs) |
| Outcome distribution^*^ | [714, 152, 2860, 208, 1066] |
| Mean score per game^*^ | 0.5380 |
| Elo difference $\Delta$ ($E_2$ vs. $E_1$, logistic) | +26.46 |
| 95% confidence interval for $\Delta$ | [+19.63, +33.29] |
| Log-likelihood ratio (LLR) | 13.4445 |
| SPRT conclusion | $H_1$ accepted |
Evaluation of NMP performance against a TT baseline.
NMP (Table [tab:nmp]) yields a clear but smaller increase ($\Delta$Elo $=+26.46$, $H_1$ accepted). The direction matches the EBF diagnostics: pruning lowers search work, but its standalone strength effect is limited compared to quiescence, TT, or LMR in this staged pipeline.
Late move reductions.
Section titled “Late move reductions.”| Metric | Value |
|---|---|
| $E_1$ (baseline) | NMP |
| $E_2$ (candidate) | LMR |
| Total games (paired) | 5000 (2500 pairs) |
| Outcome distribution^*^ | [346, 148, 2064, 315, |
| Mean score per game^*^ | 0.5310 |
| Elo difference $\Delta$ ($E_2$ vs. $E_1$, logistic) | +136.12 |
| 95% confidence interval for $\Delta$ | [+128.78, +143.46] |
| Log-likelihood ratio (LLR) | 69.4826 |
| SPRT conclusion | $H_1$ accepted |
| Metric | Value |
|---|---|
| $E_1$ (baseline) | LMR |
| $E_2$ (candidate) | AW |
| Total games (paired) | 5000 (2500 pairs) |
| Outcome distribution^*^ | [983, 306, 2547, 262, 902] |
| Mean score per game^*^ | 0.4897 |
| Elo difference $\Delta$ ($E_2$ vs. $E_1$, logistic) | -7.16 |
| 95% confidence interval for $\Delta$ | [-13.97, -0.35] |
| Log-likelihood ratio (LLR) | -4.9383 |
| SPRT conclusion | $H_0$ accepted |
Performance delta: Aspiration Windows vs. LMR.
The aspiration row (Table [tab:aw]) is the clearest negative result in the movetime chain ($\Delta$Elo $=-7.16$, CI entirely below zero, $H_0$ accepted): at 0.1 s, the extra fail-high/fail-low recovery cost outweighs nominal windowing savings. aligns with this clock-driven interpretation and motivates the fixed-depth cross-check in .
Principal Variation search.
Section titled “Principal Variation search.”| Metric | Value |
|---|---|
| $E_1$ (baseline) | AW |
| $E_2$ (candidate) | PVS |
| Total games (paired) | 2180 (1090 pairs) |
| Outcome distribution^*^ | [442, 138, 1064, 119, |
| Mean score per game^*^ | 0.4921 |
| Elo difference $\Delta$ ($E_2$ vs. $E_1$, logistic) | -5.50 |
| 95% confidence interval for $\Delta$ | [-15.81, +4.82] |
| Log-likelihood ratio (LLR) | -1.7062 |
| SPRT conclusion | $H_0$ accepted |
Ablation chain (fixed nominal depth): pooled logical Elo advantage
| Metric | Value |
|---|---|
| $E_1$ (baseline) | Base |
| $E_2$ (candidate) | Quiescence |
| Total games (paired) | 100 (50 pairs) |
| Outcome distribution^*^ | [0, 0, 0, 4, 96] |
| Mean score per game^*^ | 0.9900 |
| Elo difference $\Delta$ ($E_2$ vs. $E_1$, logistic) | +798.25 |
| 95% confidence interval for $\Delta$ | [+556.28, +1040.23] |
| Log-likelihood ratio (LLR) | 145.8211 |
| SPRT conclusion | $H_1$ accepted |
Fixed-depth control: performance impact of adding Quiescence Search to the baseline.
Quiescence remains overwhelmingly favourable at fixed depth (Table [tab:quiescence-depth], $H_1$ accepted). This confirms that its benefit is not merely “more plies under time”, but higher tactical stability even when both engines are constrained to the same nominal horizon.
Transposition table.
Section titled “Transposition table.”| Metric | Value |
|---|---|
| $E_1$ (baseline) | Q |
| $E_2$ (candidate) | TT |
| Total games (paired) | 1000 (500 pairs) |
| Outcome distribution^*^ | [3, 81, 854, 59, 3] |
| Mean score per game^*^ | 0.4945 |
| Elo difference $\Delta$ ($E_2$ vs. $E_1$, logistic) | -3.82 |
| 95% confidence interval for $\Delta$ | [-19.05, +11.41] |
| Log-likelihood ratio (LLR) | -6.4050 |
| SPRT conclusion | $H_0$ accepted |
| Metric | Value |
|---|---|
| $E_1$ (baseline) | TT |
| $E_2$ (candidate) | NMP |
| Total games (paired) | 1000 (500 pairs) |
| Outcome distribution^*^ | [14, 140, 721, 108, 17] |
| Mean score per game^*^ | 0.4935 |
| Elo difference $\Delta$ ($E_2$ vs. $E_1$, logistic) | -4.52 |
| 95% confidence interval for $\Delta$ | [-19.75, +10.71] |
| Log-likelihood ratio (LLR) | -3.1305 |
| SPRT conclusion | $H_0$ accepted |
Fixed-depth control: NMP performance against a TT baseline.
NMP is likewise inconclusive at fixed depth (Table [tab:nmp-depth], $H_0$ accepted). The positive movetime gain therefore appears schedule-sensitive: NMP helps when saved nodes can be reinvested into extra completed iterations, but yields little measurable advantage when both sides are locked to the same nominal depth.
Late move reductions.
Section titled “Late move reductions.”| Metric | Value |
|---|---|
| $E_1$ (baseline) | NMP |
| $E_2$ (candidate) | LMR |
| Total games (paired) | 1000 (500 pairs) |
| Outcome distribution^*^ | [66, 275, 416, 191, 52] |
| Mean score per game^*^ | 0.4720 |
| Elo difference $\Delta$ ($E_2$ vs. $E_1$, logistic) | -19.48 |
| 95% confidence interval for $\Delta$ | [-34.73, -4.23] |
| Log-likelihood ratio (LLR) | -3.9305 |
| SPRT conclusion | $H_0$ accepted |
| Metric | Value |
|---|---|
| $E_1$ (baseline) | LMR |
| $E_2$ (candidate) | AW |
| Total games (paired) | 1000 (500 pairs) |
| Outcome distribution^*^ | [33, 203, 529, 187, 48] |
| Mean score per game^*^ | 0.5035 |
| Elo difference $\Delta$ ($E_2$ vs. $E_1$, logistic) | +2.43 |
| 95% confidence interval for $\Delta$ | [-12.80, +17.66] |
| Log-likelihood ratio (LLR) | -0.0157 |
| SPRT conclusion | $H_1$ accepted |
Fixed-depth control: aspiration windows vs. LMR.
Aspiration is close to neutral at fixed depth (Table [tab:aw-depth]): the point estimate is slightly positive, but uncertainty is wide. Compared with the clear movetime regression, this supports the interpretation that aspiration’s main failure mode here is clock overhead from re-searches rather than a systematic fixed-horizon search defect.
Principal Variation search.
Section titled “Principal Variation search.”| Metric | Value |
|---|---|
| $E_1$ (baseline) | AW |
| $E_2$ (candidate) | PVS |
| Total games (paired) | 2030 (1015 pairs) |
| Outcome distribution* | 87, 358, 1098, 412, 75 |
| Mean score per game* | 0.5043 |
| Elo difference $\Delta$ ($E_2$ vs. $E_1$, logistic) | +3.01 |
| 95% confidence interval for $\Delta$ | [-7.80, +13.75] |
| Log-likelihood ratio (LLR) | +2.10 |
| SPRT conclusion | $H_1$ accepted |
Fixed-depth control: principal variation search (PVS) vs. aspiration windows.
PVS follows the same pattern as aspiration: near-neutral to slightly positive at fixed depth (the table above), but not decisively separated from zero. In combination with the movetime row, the practical conclusion is that late-stage windowing benefits are contingent on robust move ordering and enough time per move to absorb occasional re-search bursts.
Empirical Analysis of Search Growth and Branching Metrics
Section titled “Empirical Analysis of Search Growth and Branching Metrics”This subsection evaluates the impact of incremental search optimizations on the Irida engine’s search efficiency. By measuring the effective branching factor (EBF) and node growth across staged binary builds, we quantify the reduction in the state-space search achieved by specific heuristic implementations.
Methodology and Metrics
Section titled “Methodology and Metrics”The evaluation was conducted using a standardized automated harness on
the engine’s starting position (startpos). To isolate the effects of
search heuristics, the transposition table (TT) size was fixed at 16 MB,
and the search depth was capped at $D=8$. Nodes are sampled from the UCI
info stream, representing cumulative counts inclusive of quiescence
search nodes.
To characterize tree growth, we define the following metrics:
-
Incremental Work ($\Delta_d$): The number of nodes explored at depth $d$ that were not explored during the previous iterative deepening step, defined as $\Delta_d = N_d - N_{d-1}$.
-
Growth Ratio ($R_d$): The ratio of incremental work between successive depths, $R_d = \Delta_d / \Delta_{d-1}$.
-
Effective Branching Factor ($\text{EBF}_{\text{geom}}$): The geometric mean of $R_d$ for $d \in {2, \dots, D}$. This provides a measure of the multiplicative growth rate of the search tree.
-
Average Branching Factor ($\bar{b}$): A simplified metric calculated as $N^{1/D}$, providing a global approximation of the tree’s expansion.
It is critical to distinguish these empirical diagnostics from theoretical branching factors. Heuristics such as Late Move Reductions (LMR) and Alpha-Beta pruning dynamically reshape the search tree, meaning these values reflect the efficiency of the implementation rather than the absolute complexity of the game state.
Experimental Setup: Build Configurations
Section titled “Experimental Setup: Build Configurations”A feature matrix was constructed to isolate the contributions of specific search algorithms. Each binary build, summarized in , represents a cumulative addition of features following the standard Irida optimization path.
Binary Configuration Quiescence TT NMP LMR Asp. PVS Syzygy
irida (Default) ---
irida-plain --- --- --- --- --- --- ---
irida-q --- --- --- --- --- ---
irida-q-tt --- --- --- --- ---
irida-q-tt-nmp --- --- --- ---
irida-q-tt-nmp-lmr --- --- ---
irida-q-tt-nmp-lmr-aw --- ---
irida-q-tt-nmp-lmr-aw-pvs ---
: Feature composition of evaluated engine binaries.
Results
Section titled “Results”The results of the depth-8 search benchmarks are presented in . Analysis of these metrics reveals several key trends regarding search efficiency.
Binary Cumulative Nodes ($N$) $\bar{b}$ ($N^{1/D}$) $\text{EBF}_{\text{geom}}$
irida 362,823 4.954 3.166
irida-plain 1,297,709 5.810 4.634
irida-q 5,731,889 6.995 5.273
irida-q-tt 1,639,226 5.982 4.330
irida-q-tt-nmp 2,261,276 6.227 4.607
irida-q-tt-nmp-lmr 389,612 4.998 3.513
irida-q-tt-nmp-lmr-aw 362,823 4.954 3.166
irida-q-tt-nmp-lmr-aw-pvs 335,834 4.906 3.138
: Node growth and EBF metrics at depth 8 (initial position).
Comparison of search metrics across engine iterations.
Elo Estimation
Section titled “Elo Estimation”To estimate Irida’s Elo rating, a match of 2,000 paired games was conducted against Stockfish 18, which was configured to a fixed strength of 2000 Elo using the UCI_LimitStrength and UCI_Elo UCI options.
Statistical analysis of the match results using SPRT () yielded an Elo difference ($\Delta \text{Elo}$) of +110.94. Under the assumption of independent and identically distributed (i.i.d.) games, the 95% confidence interval (CI) for this $\Delta \text{Elo}$ is [+79.48, +142.40]. Based on these findings, Irida’s performance results in an estimated rating of 2110 Elo.
Discussion
Section titled “Discussion”Interpreting the Results: Search, Strength, and Measurements
Section titled “Interpreting the Results: Search, Strength, and Measurements”Ablation Observations
Section titled “Ablation Observations”The LMR $\to$ aspiration and aspiration $\to$ PVS pairings are the only rows in this chain where $E_2$ is estimated below $E_1$ in the pooled tables (Tables [tab:aw]—[tab:pvs]). The multi-panel figures for these matchups (Figures 22—23) make the mechanism clear: the 0.1 s per-move control is where the win-rate and side-performance panels show a relative drop for the augmented build, whereas the fixed depth-$4$ control is much closer because both engines are forced to the same nominal horizon.
All three effects below are time-budget phenomena; they largely disappear when depth, not the clock, is what is held fixed.
Work versus depth under a wall clock.
Section titled “Work versus depth under a wall clock.”Aspiration windows and PVS do not change the rules of chess; they change how many nodes are visited and in what order, by inserting small-window probes and, on failure, re-searches (root restarts with a widened window for aspiration; full-window re-searches for PVS when a zero-width scout is too good). When the only budget is wall-clock time, any extra pass that does not end in an immediate cutoff subtracts from how many iterative-deepening cycles complete before the UCI time handler stops the move. A build that prunes more in principle can therefore reach a shallower completed depth in 0.1 s than a simpler build that does fewer “meta” retries. In the fixed-depth control, by contrast, both programs may search the same horizon; the comparison is then about efficiency at that depth, not about how many plies fit in the per-move time budget.
Aspiration and noisy priors at short time.
Section titled “Aspiration and noisy priors at short time.”Narrow root windows are anchored to the value of the previous iteration. At 0.1 s, iterative deepening may stop after only a few outer iterations, so the previous score is a noisier centre for the next window. That raises the rate of fail low and fail high root outcomes and forces expensive full-window re-runs of the same nominal depth, which is exactly the overhead aspiration is meant to avoid when the prior is well-calibrated.
PVS and move ordering at short time.
Section titled “PVS and move ordering at short time.”Principal variation search pays off when the first move after ordering is usually best, so most siblings are dismissed after a cheap null-window try. With very short thinking time, the transposition table and history heuristics are less fully populated: more scouts fail into full-window re-searches, and the intended node savings of PVS erode. shows that, on a single diagnostic position, the AW and default profiles already converge in node count; PVS nudges nodes down only slightly, so the match is won or lost in scheduling and ordering noise more than in gross per-position tree size. In the pooled ablation row (Table [tab:pvs]) this shows up as a muddled $\Delta$Elo: the $95%$ interval straddles zero, unlike the aspiration case (Table [tab:aw], entirely below zero), so should be read as a clear loss for added aspiration, but an inconclusive (and schedule-driven) PVS comparison at this sample size.
Heuristic Impact Analysis
Section titled “Heuristic Impact Analysis”The staged EBF results () and dual-control ablations (Tables 1 and 2) show that each mechanism contributes in a different way, and raw node reduction alone is not a sufficient predictor of strength:
-
Quiescence is quality-first, not speed-first. In the EBF harness, quiescence increases node count relative to
irida-plainbecause tactical leaves are expanded further, yet it is the strongest gain in both ablation regimes. The interpretation is that better terminal stability dominates extra search work. -
TT mostly converts efficiency into depth under time caps. TT sharply lowers node growth after quiescence in the fixed-position harness, and it is strongly positive in the movetime chain. Under fixed depth, however, TT is near-neutral, indicating that much of its competitive value appears when saved work can be reinvested into more completed iterations before timeout.
-
NMP and LMR are schedule-dependent. NMP is a moderate gain in movetime but inconclusive at fixed depth. LMR is strongly positive in movetime yet negative at fixed depth. Together, these rows indicate that selective pruning/reduction is excellent for short-time efficiency, but can trade off fixed-horizon search fidelity.
-
AW and PVS are marginal in this budget regime. Node counts in show only a small late-stage difference between AW and PVS builds. Correspondingly, both are neutral-to-negative in movetime and near-neutral in fixed depth, which is consistent with sensitivity to ordering quality and re-search overhead.
The convergence of irida-q-tt-nmp-lmr-aw and the default irida
profile suggests that Syzygy tablebase integration does not affect node
counts at the early-game start position, as no tablebase probes are
triggered.
Design Trade-offs: Throughput, Depth, and Heuristic Overhead
Section titled “Design Trade-offs: Throughput, Depth, and Heuristic Overhead”The results of reinforce a key engineering distinction: throughput-oriented optimisations and decision-quality optimisations are not interchangeable, and each is mediated by the active time control.
Move-generation and lifecycle benchmarks (Table [tab:movegen_benchmarks]) establish the local performance envelope: raw move generation is fast, but end-to-end search throughput is materially lower once evaluation, state updates, and table lookups are included. Consequently, search strength cannot be inferred from MN/s alone. What matters in play is how efficiently that throughput is converted into completed iterative-deepening layers before the clock stops the search ().
The ablation chain then quantifies where that conversion works. Quiescence, TT, and LMR provide the largest practical gains under 0.1 s movetime, indicating that tactical stability, transposition reuse, and aggressive late-move control collectively improve strength-per-second. However, fixed-depth results expose their limits: TT and NMP become statistically neutral, and LMR can turn negative. This is evidence that some gains are primarily time-management gains (more effective depth for the same wall clock), not unconditional improvements in equal-horizon move quality.
The late-stage AW/PVS rows sharpen this trade-off. Their theoretical purpose is to reduce work via tighter windows, but in short time controls the cost of fail-high/fail-low recovery and scout re-searches competes directly with iterative deepening. In other words, reducing nodes conditional on stable bounds is not sufficient when bounds are noisy and the remaining time is small. This motivates practical tuning priorities for student engines: robust ordering, conservative window management at short TC, and parameter sets conditioned on the expected playing schedule rather than a single “global best” profile.
Finally, the Elo estimate against strength-limited Stockfish situates these trade-offs in an external reference frame. The net system is competitive and benefits from the chosen search stack, but the internal ablations show that “add all standard heuristics” is not automatically optimal unless each component is validated under the same time model as intended deployment.
Lessons for Building a Student Engine
Section titled “Lessons for Building a Student Engine”Three methodological lessons are especially transferable to future student projects.
1) Evaluate in stages, not only final-vs-final.
Section titled “1) Evaluate in stages, not only final-vs-final.”The cumulative chain design makes causal interpretation possible.
Comparing irida-plain directly with the final build would show a large
total gain, but would hide that some late additions are neutral or
adverse under specific controls. Stage-wise ablation therefore functions
as both a research tool and a debugging method for search regressions.
2) Use at least two complementary test regimes.
Section titled “2) Use at least two complementary test regimes.”Movetime and fixed-depth controls answer different questions. The first reflects tournament-like constraints (strength per second), while the second probes equal-horizon quality. Running both prevents overfitting to one protocol and helps separate algorithmic merit from time-allocation artifacts.
3) Combine statistical outcomes with search diagnostics.
Section titled “3) Combine statistical outcomes with search diagnostics.”Elo/SPRT outputs tell whether a change is likely beneficial in match play; EBF and node-growth diagnostics help explain why. Interpreting them together improves tuning decisions, for example identifying LMR as time-efficient but fixed-depth-fragile, or identifying AW/PVS as sensitive to re-search overhead despite small node-count gains.
Conclusions and Future Work
Section titled “Conclusions and Future Work”What Was Built and What Was Shown
Section titled “What Was Built and What Was Shown”This thesis presented the design, implementation, and empirical evaluation of a UCI-compatible chess engine in C, with emphasis on transparent search engineering rather than opaque end-to-end learning. The resulting system, Irida, combines legal move generation, handcrafted evaluation with NNUE support, and a staged alpha—beta search stack including quiescence, TT, NMP, LMR, aspiration windows, and PVS.
The main empirical contribution is not only the final engine strength, but the measurement process used to justify each component. Chapter 6 showed that the largest practical gains under short per-move time come from quiescence, transposition reuse, and LMR, while late-stage windowing heuristics (aspiration and PVS) are sensitive to time budget and do not automatically improve results under every control. The paired movetime/fixed-depth protocol, together with EBF diagnostics and SPRT-based decisions, made this trade-off visible and reproducible.
In summary, the thesis demonstrates that competitive strength in a student engine emerges from the interaction of implementation efficiency, time-control-aware search design, and statistically grounded evaluation, rather than from adding heuristics in isolation.
Limits of the Study
Section titled “Limits of the Study”The conclusions should be interpreted with the following boundaries:
-
Hardware and runtime scope. All experiments were conducted on a single machine, single thread, and fixed hash settings (Chapter 5). Results are therefore internally consistent but not a full cross-hardware performance map.
-
Protocol scope. The ablation study uses two controls (0.1 s movetime and fixed nominal depth), which intentionally capture two complementary viewpoints. Additional controls (increment, sudden death, longer TCs) may shift the relative value of some heuristics.
-
Statistical uncertainty. Although sample sizes are large and SPRT/CI reporting is used throughout, engine match outcomes remain noisy and opening-dependent. Borderline rows, especially among late-stage heuristics, should be treated as context-specific rather than universal.
-
Evaluation and learning scope. The project does not include custom NNUE training or large-scale automated parameter tuning. This keeps the implementation focused and interpretable, but likely leaves strength on the table relative to modern auto-tuned pipelines.
-
External Elo anchoring. The external Elo estimate is tied to the chosen reference setup and should be read as a practical benchmark, not an absolute rating transferable to all pools.
Future Work
Section titled “Future Work”Several extensions follow naturally from the observed results:
-
Time-control-aware tuning. Introduce parameter profiles for short and long controls (e.g., aspiration width, reduction tables, null move margins), selected dynamically at runtime.
-
Evaluation refinement. Expand and tune handcrafted terms, then perform controlled comparisons with stronger NNUE integration paths (network choice, update policy, fallback logic).
-
Parallel search support. Add SMP search (e.g., Lazy SMP) and repeat ablations under multi-thread settings, where TT dynamics and move-order quality can differ substantially.
-
Broader benchmark matrix. Extend testing to additional time controls, opening suites, and external opponents to improve generalisability and reduce schedule-specific bias.
-
Automated tuning workflow. Integrate black-box optimization (for search and evaluation parameters) with SPRT gating, so candidate configurations are accepted or rejected automatically.
-
Tooling and reproducibility. Package scripts, configuration presets, and experiment manifests for repeatable runs and easier educational reuse by other students.
-
Accumulator—based NNUE. NNUE that updates as the game progresses to avoid full re-evaluation of the position.
Taken together, these steps would move the project from a strong single-machine research prototype to a more broadly tuned and deployable chess engine framework, while preserving the thesis emphasis on explainability and evidence-based development.
Usage Guide
Section titled “Usage Guide”Castro Library
Section titled “Castro Library”Castro is a high-performance move generation library. To build the
library from source, clone the repository and use the provided
Makefile:
| git clone https://github.com/KDesp73/castro | | cd castro |make all -j2C API Example
Section titled “C API Example”The following example (example.c) demonstrates how to initialize the
global lookup tables, load a board from a FEN string, and generate legal
moves. Note that we include castro.h, which contains the core board
structures and function prototypes.
#include <stdio.h>#include "castro.h"
int main(void){ // Initialize global lookup tables (Zobrist, Bitmasks, Magic Bitboards) castro_InitZobrist(); castro_InitMasks(); castro_InitMagic();
const char* fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"; Board board = {0};| castro_BoardInitFen( | board, fen); |
// Generate all legal moves for the current position| Moves moves = castro_GenerateMoves( | board, MOVE_LEGAL); |
printf("Legal moves in the starting position:\n"); for (size_t i = 0; i < moves.count; ++i) { char move_str[8]; castro_MoveToString(moves.list[i], move_str); printf("%s\n", move_str); }
| castro_BoardFree( | board); | return 0;}Compilation and Execution
Section titled “Compilation and Execution”To compile your application, link against the static library generated
in the previous step. Ensure the include path (-I) points to the
directory containing castro.h.
gcc example.c -o example -L. -l:libcastro.a -I src/./exampleExpected Output:
Legal moves in the starting position:a2a3a2a4b2b3b2b4c2c3c2c4d2d3d2d4e2e3e2e4f2f3f2f4g2g3g2g4h2h3h2h4b1a3b1c3g1f3g1h3Irida Engine
Section titled “Irida Engine”Irida is a UCI-compliant chess engine. It can be used via any standard Chess GUI (such as Arena or Cutechess) or directly through the terminal for debugging and testing.
Building from Source
Section titled “Building from Source”To build the engine, clone the repository and run the make command:
| git clone https://github.com/KDesp73/irida | | cd irida |make irida./iridaManual UCI Usage
Section titled “Manual UCI Usage”Once the engine is running, you can communicate with it using the UCI
protocol. In the example below, > denotes user input and < denotes
engine output.
[]
< irida v0.9.0 by Konstantinos Despoinidis (KDesp73)> uci< id name irida< id author Konstantinos Despoinidis (KDesp73)< id version 0.9.0< ... [options omitted for brevity] ...< uciok
# Command the engine to search the current position to depth 6> go depth 6< info depth 1 seldepth 1 score cp 162 nodes 40 nps 0 pv g1f3< info depth 2 seldepth 3 score cp 0 nodes 197 nps 0 pv g1f3 g8f6< info depth 3 seldepth 6 score cp 144 nodes 1732 nps 0 pv g1f3 g8f6 b1c3< info depth 4 seldepth 7 score cp 0 nodes 4009 nps 0 pv b1c3 g8f6 g1f3 b8c6< info depth 5 seldepth 9 score cp 140 nodes 14982 nps 2 pv e2e3 e7e6 d1f3 d8f6 f1d3< info depth 6 seldepth 12 score cp 0 nodes 46862 nps 8 pv b1c3 b8c6 c3b5 c6b4 g1f3 g8f6< bestmove b1c3Games and Positions
Section titled “Games and Positions”Kiwipete
Endgame
Position 5
Middlegame
Irida 0.10.0 delivering mate with white against Stockfish 18. Stockfish was limited to 2000 Elo with 0.2s per move for both engines.
Irida 0.10.0 delivering mate with black against Stockfish 18. Stockfish was limited to 2000 Elo with 0.2s per move for both engines.
Die Schachspieler painting position. Irida evaluates this position as $−23.61$ so the devil is winning the game.
Figures
Section titled “Figures”Layered structure of the evaluation function. A PeSTO-based material and piece-square table core is augmented with positional heuristics and tactical/dynamic terms before producing the final score.
High-level structure of the search system: a single negamax–alpha-beta core with grouped enhancements (schematic, not a call graph).
Code Snippets
Section titled “Code Snippets”#include "castro.h"
typedef unsigned long long u64;
u64 castro_PerftPseudoLegal(Board* board, int depth){ if (depth <= 0) return 1;
Moves moves = castro_GenerateMoves(board, MOVE_PSEUDO); u64 total = 0;
for (size_t i = 0; i < moves.count; i++) { if (!castro_MakeMove(board, moves.list[i])) continue;
// After MakeMove, board->turn is the opponent; // the side that just moved is !board->turn if (!castro_IsInCheckColor(board, (PieceColor)(!board->turn))) { if (depth == 1) total++; else total += castro_PerftPseudoLegal(board, depth - 1); } castro_UnmakeMove(board); } return total;}// @type Board// @desc Core board structure combining bitboards and grid for performance and simplicity.typedef struct { Bitboard bitboards[PIECE_TYPES]; ///< One bitboard per piece type (white/black) char grid[8][8]; ///< ASCII piece grid for quick access Bitboard white; ///< Cached: all white pieces Bitboard black; ///< Cached: all black pieces Bitboard empty; ///< Cached: empty squares (~(white|black))
// Game state Square enpassant_square; ///< En passant target square, if any bool turn; ///< true = white to move, false = black uint8_t castling_rights; ///< Castling rights bitfield size_t halfmove; ///< Halfmove clock for 50-move rule size_t fullmove; ///< Fullmove number (starts at 1)
History history; ///< Move history uint64_t hash; ///< Zobrist hash of current position
struct { bool turn; int halfmoveClock; int fullmoveNumber; Square epSquare; } null_move_state;} Board;// @struct Engine// @desc Engine descriptor: name, author, function pointers (search, eval, order), and board.typedef struct { char name[128]; char author[128];
SearchFn search; EvalFn eval; OrderFn order;
Board board;} Engine;