# Design

(algorithm-goals)=
## Goals

The core reduction algorithm has three goals:

1. Go fast
2. Produce small test cases
3. Produce readable test cases

Unlike more theoretical work in this space, the algorithm does *not* attempt to
minimize the number of "oracle queries", that is, invocations of the
user-provided "interestingness test".

## Design Statement

The aims of `treereduce` are more pragmatic than academic. It is based on
similar principles to those discussed in the [AFL whitepaper][afl]:

> [treereduce] does its best not to focus on any singular principle of operation
> and not be a proof-of-concept for any specific theory. [...] The only true
> governing principles are speed, reliability, and ease of use.

(algorithm-assumptions)=
## Assumptions

These assumptions strongly inform the algorithm design:

1. The interestingness will be comparatively slow---it will generally involve
   spinning up a new process, disk I/O, parsing, type-checking, etc.
2. Smaller inputs lead to faster interestingness tests.

These assumptions hold for the use-case of reducing programs that cause compiler
crashes.

## High-Level Design

Due to [Assumption (1)](algorithm-assumptions), it's essential that `treereduce`
execute several interestingness tests in parallel. Luckily, for the same reason,
lock contention is unlikely to be a major issue---so long as executing the
interestingness test doesn't require holding a lock, most threads will spend the
majority of their time executing the interestingness test, rather than waiting
for locks on shared data. (This claim has been validated by {ref}`profiling
<profiling>` several {doc}`benchmarks <benchmarks>`.)

The recent paper "[PARDIS][pardis] : Priority Aware Test Case Reduction"
highlights the importance of *prioritization* of reductions. Greedy removal of
the *largest* subtrees leads to greatly increased performance due to faster
interestingness tests ([Assumption (2)](algorithm-assumptions)).

With these ideas in mind, the overall algorithm design involves spinning up some
number of threads, which share three pieces of data:

1. The original text and syntax tree of the target (read-only)
2. A prioritized max-heap of [*reduction tasks*](reduction-strategies)
   (read-write)
3. A set of *edits* to the syntax tree (read-write)

where an edit is either the deletion of a subtree, or a replacement of one
subtree by another. Each thread executes the following loop:

1. Pop a reduction task off the heap.
2. Create a local copy of the edits. Add edits according to the current task.
3. Execute the interestingness test on the reduced program, i.e., the target
   as altered by the local set of edits.
4. If the reduced program was still interesting, try replacing the global edits.
   If the global edits were changed by another thread, try this task again with
   the new edits, that is, go to (2).
5. Push any new tasks onto the task queue.
6. Go to (1).

If lock contention does become an issue, it may be beneficial for each thread to
maintain a local task heap in addition to the global one, or even to attempt
multiple tasks before replacing the global edits.

(reduction-strategies)=
## Reduction Strategies

`treereduce` uses several strategies during program minimization:

- *Deletion*: When a child is optional, `treereduce` attempts to delete it. For
  example, `treereduce` might delete the `const` in `const int x;`.
- *Delta debugging* (TODO([#2][#2])): When a node has a list of children,
  `treereduce` uses *delta debugging* to delete as many as possible in an efficient
  way.
- *Hoisting* (TODO([#3][#3])): Nodes with a recursive structure may be replaced
  by their descendants, e.g. replacing `5 + (3 * y)` with just `y`.

(bib)=
## Bibliography

TODO(#16): BibTeX

- Gharachorlu, G. and Sumner, N., 2019, April. : Priority Aware Test Case
  Reduction. In International Conference on Fundamental Approaches to Software
  Engineering (pp. 409-426). Springer, Cham.
- Sun, C., Li, Y., Zhang, Q., Gu, T. and Su, Z., 2018, May. Perses:
  Syntax-guided program reduction. In Proceedings of the 40th International
  Conference on Software Engineering (pp. 361-371).
- Hodován, R. and Kiss, Á., 2016, July. Practical Improvements to the Minimizing
  Delta Debugging Algorithm. In ICSOFT-EA (pp. 241-248).
- Hodován, R. and Kiss, Á., 2016, November. Modernizing hierarchical delta
  debugging. In Proceedings of the 7th International Workshop on Automating Test
  Case Design, Selection, and Evaluation (pp. 31-37).
- Vince, D., Hodován, R., Bársony, D. and Kiss, Á., 2021, May. Extending
  Hierarchical Delta Debugging with Hoisting. In 2021 IEEE/ACM International
  Conference on Automation of Software Test (AST) (pp. 60-69). IEEE.
- Kiss, Á., Hodován, R. and Gyimóthy, T., 2018, November. HDDr: a recursive
  variant of the hierarchical delta debugging algorithm. In Proceedings of the
  9th ACM SIGSOFT International Workshop on Automating TEST Case Design,
  Selection, and Evaluation (pp. 16-22).
- Hodován, R., Kiss, Á. and Gyimóthy, T., 2017, September. Coarse hierarchical
  delta debugging. In 2017 IEEE international conference on software maintenance
  and evolution (ICSME) (pp. 194-203). IEEE.
- https://blog.sigplan.org/2021/03/30/an-overview-of-test-case-reduction/
- https://blog.trailofbits.com/2019/11/11/test-case-reduction/
- https://www.drmaciver.com/2017/06/adaptive-delta-debugging/
- https://www.drmaciver.com/2019/01/notes-on-test-case-reduction/

[#1]: https://github.com/langston-barrett/treereduce/issues/1
[#2]: https://github.com/langston-barrett/treereduce/issues/2
[#3]: https://github.com/langston-barrett/treereduce/issues/3
[#16]: https://github.com/langston-barrett/treereduce/issues/16
[afl]: https://lcamtuf.coredump.cx/afl/technical_details.txt
[pardis]: https://github.com/golnazgh/PARDIS