Index

Researcher with experience in functional programming, formal methods, programming languages, and security. I work at Galois.

The statements and views posted here are my own and do not reflect those of my employer.

Projects

Highlighted Projects

  • Lesshand: A shorthand for the 21st century
  • treereduce: A fast, parallel, syntax-aware test case reducer based on tree-sitter grammars
  • tree-splicer: Simple grammar-based test case generator (black-box fuzzer)

Datalog

  • cclyzer++: a precise and scalable global pointer analysis for LLVM code
  • dlsubst: Experimental bottom-up Datalog engine based on explicit substitutions
  • Duckalog: Datalog engine based on DuckDB
  • rdf-star-souffle: Toolkit for importing and reasoning about RDF-star data in Soufflé
  • souffle-lint: A linter for Soufflé Datalog
  • treeedb: Generate Soufflé Datalog types, relations, and facts that represent ASTs from a variety of programming languages.
  • tree-sitter-souffle: A tree-sitter grammar for Soufflé Datalog

Fuzzing

  • czz: Whole-program, Scheme-scriptable, multi-language, coverage-guided fuzzer
  • radamsa-sys: Rust bindings for Radamsa
  • treereduce: A fast, parallel, syntax-aware test case reducer based on tree-sitter grammars
  • tree-crasher: Easy-to-use grammar-based black-box fuzzer
  • tree-splicer: Simple grammar-based test case generator

Tiny tools

  • drice: Dr. Ice diagnoses internal compiler errors (ICEs) in rustc
  • Icemelter: Icemelter automates steps in debugging rustc internal compiler errors (ICEs)
  • linkup: Automatically add links to Markdown files
  • Marki: Generate Anki cards from Markdown notes
  • mdlynx: Small, fast tool to find broken file links in Markdown documents
  • ttlint: tiny text linter

Tiny libraries

  • fin-part-ord: Crate for representing finite partial orders
  • tree-sitter-edit: A crate for printing modified tree-sitter parse trees, intended for use in multi-language code refactoring, linting, or modification (codemod) tools
  • twjsrs: Crate for (de)serializing TiddlyWiki tiddlers to and from their JSON format

Other

  • anki-wikidata: Create high-quality Anki cards with data from Wikidata
  • contract.el: Racket-style higher-order contracts for Emacs Lisp
  • coq-big-o: A formalization of Big O, Big Theta, and more based on normed vector spaces
  • nixpkgs-blight: Instrument the builds of tens of thousands of open-source packages
  • Spacelix: Spacemacs-like configuration for Helix
  • sql-cli: Dynamically generate CLIs from SQL databases that support CRUD operations
  • write-yourself-a-scheme-in-agda: Like “Write Yourself a Scheme in 48 Hours”, but in Agda
  • zsh-contextual-abbrevs: Supercharged context-dependent aliases for ZSH that save you keystrokes!

Writing

Survey of static reasoning in Haskell

This post surveys a few techniques for achieving high confidence in Haskell code at type-checking time. (It will not cover any aspects of testing.) To compare the approaches, I'll implement a single interface with each of them.

The surveyed techniques are fairly complicated. I'll try to explain a bit about each, but please refer to their documentation for additional background. This post will also assume familiarity with a lot of Haskell: existential quantification (-XExistentialQuantification, -XGADTs, or -XRank2Types), existential quantification with higher-rank polymorphism (-XRankNTypes), safe coercions (Data.Coerce), phantom types, constraint kinds (-XConstraintKinds), GADTs, and data kinds (-XDataKinds and -XPolyKinds).

This post probably should have included hs-to-coq, but I'm unfamiliar with it and perfection is the enemy of completion.

The Specification

The interface in question is for lists of key/value pairs, also known as association lists. For the sake of brevity, we'll only require four operations:

  • empty: An empty association list
  • insert: Takes a key and value and inserts the pair into the list
  • lookup: Takes a key and returns a value associated with it
  • fmap: The association list type must be a functor over the value type

The point is to do a broad-strokes comparison, so we'll adopt a very minimal specification, only requiring that

forall k v assoc. lookup k (insert k v assoc) == v

along with the usual functor laws. The asymptotics should be:

  • empty: O(1)
  • insert: O(1)
  • lookup: O(n)
  • fmap f: O(n) if f takes constant time

Vanilla Haskell

First, let's see what an implementation in "plain Haskell" might look like:

{-# LANGUAGE DeriveFunctor #-}

module Vanilla
  ( Assoc
  , empty
  , insert
  , lookup
  , lookupMaybe
  ) where

import Prelude hiding (lookup)

newtype Assoc k v = Assoc [(k, v)]
  deriving Functor

empty :: Assoc k v
empty = Assoc []

insert :: k -> v -> Assoc k v -> Assoc k v
insert k v (Assoc assoc) = Assoc ((k, v):assoc)

lookup :: Eq k => k -> Assoc k v -> v
lookup k (Assoc assoc) =
  case assoc of
    [] -> error "Failure!"
    ((k', v):rest) ->
      if k == k'
      then v
      else lookup k (Assoc rest)

Short and sweet! Using the API is pretty simple, too:

test :: String
test = lookup 2 (insert 2 "" empty)

But there's one problem: lookup calls error, which can cause the program to fail at run-time. How can we gain more assurance that a program written using this interface won't unexpectedly fail? A common answer would be to reify the partiality in the return type, e.g., with Maybe or Either:

lookupMaybe :: Eq k => k -> Assoc k v -> Maybe v
lookupMaybe k (Assoc assoc) =
  case assoc of
    [] -> Nothing
    ((k', v):rest) ->
      if k == k'
      then Just v
      else lookupMaybe k (Assoc rest)

This amounts to passing the buck. Every caller of lookupMaybe has to decide how to handle the Nothing case; they can choose to "infect" their own return type with partiality, or to risk a run-time error by using a partial function like fromJust :: Maybe a -> a.

In general, API authors choose between four options to deal with partiality:1

  1. Run-time failure, as in lookup
  2. Returning a "default" value, as in a division function that returns zero when given a zero divisor
  3. Reifying the partiality in the return type, as in lookupMaybe
  4. Restricting the function's domain, e.g., head :: NonEmpty a -> a

What if we want to avoid the runtime failure of (1) and the ambiguity of (2), and give callers more flexibility than (3)? (In the lookup example, it's not even clear that (4) is applicable at all.) Well, hold on to your hats, because there are at least four different ways to do it, and they all involve quite a bit of type-system tomfoolery.

Smart Constructors and the Existential Trick

Types are the main tool that Haskell programmers use to get static guarantees about their programs. But type signatures like Int -> Bool encode statements about sets of values (in this case, this is a function that will take any Int as an input), whereas the precondition of lookup is a relationship between the particular values of the key and association list. lookup requires that this key is present in this specific association list. We need some way to attach unique type-level names to each association list, and to tie keys to particular names.

First, we'll add a type variable name to both parameters that acts as a unique identifier for the association list. Assoc name k v is an association list with name name, and Key name k is a key into the association list with name name.

{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE ExistentialQuantification #-}

module Smart
  ( Assoc
  , Key
  , SomeAssoc(..)
  , empty
  , insert
  , isKey
  , lookup
  ) where

import Prelude hiding (lookup)

newtype Assoc name k v = Assoc [(k, v)]
  deriving Functor

newtype Key name k = Key k

The question is now: How do we make sure each new Assoc has a fresh name and can't be confused with any other Assoc? The trick that makes this work is existential quantification. Consider the type Something:

data Something = forall a. Something a

Say that I give you two Something values, and you pattern match on them:

you :: Something -> Something -> Int
you (Something x) (Something y) = _

me :: Int
me = you (Something ()) (Something "foo")

What can you do with the data I've given you? In short, pretty much nothing. You can't use x and y in any meaningful way: they have completely unknown and unrelated types (GHC will name their types something like a and a1). In other words, we've given x and y unique type-level labels. The same trick will work for Assoc!

data SomeAssoc k v = forall name. SomeAssoc (Assoc name k v)

empty :: SomeAssoc k v
empty = SomeAssoc (Assoc [])

Since we didn't export the constructor of Assoc but we did export that of SomeAssoc, the only way to make a new association list is to pattern match on the existentially-quantified empty value, which invents a new type-level name for each Assoc.

insert and lookup work much the same as before, modulo slightly changed types:

insert :: k -> v -> Assoc name k v -> Assoc name k v
insert k v (Assoc assoc) = Assoc ((k, v):assoc)

lookup :: Eq k => Key name k -> Assoc name k v -> v
lookup (Key k) (Assoc assoc) =
  case assoc of
    [] -> error "Impossible"
    ((k', v):rest) ->
      if k == k'
      then v
      else lookup (Key k) (Assoc rest)

It looks like lookup is still partial, but by carefully controlling how clients get a Key, we can ensure that it's total. And the only way we'll let them do so is with isKey (note that the constructor of Key, like that of Assoc, is not exported):

isKey :: Eq k => k -> Assoc name k v -> Maybe (Key name k)
isKey k (Assoc assoc) =
  case assoc of
    [] -> Nothing
    ((k', _):rest) ->
      if k == k'
      then Just (Key k)
      else isKey k (Assoc rest)

It's basically identical to lookup: it searches through the list and only yields a Key if k really is a key in the list. Since this is the only way to get a Key, every lookup is guaranteed to succeed.

Here's an example usage:

test :: String
test =
  case empty of
    SomeAssoc emp ->
      let assoc = insert 2 "" emp
      in case isKey 2 assoc of
           Just k -> lookup k assoc
           Nothing -> error "not a key"

Note that while isKey introduces a Maybe, the actual lookup operation doesn't. You can, e.g., store keys in a list and look them up later, without introducing partiality at the lookup. This still isn't ideal; one way to work around it in some cases would be to have insert return a Key.

GADTs, or Dependent Types

Dependent types are (roughly speaking) types that can mention values. Haskell doesn't have "full" dependent types, but GADTs make the type system quite expressive. Let's see what we can do with them!

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TypeOperators #-}

module GADT
  ( Assoc
  , NatRepr(..)
  , Key
  , empty
  , insert
  , isKey
  , lookup
  ) where

import Prelude hiding (lookup)
import Data.Kind (Type)
import Data.Type.Equality (TestEquality(testEquality), (:~:)(Refl))
import GHC.TypeLits (Nat, type (+))

For the sake of concreteness, we'll fix the keys to be natural numbers:

data NatRepr (n :: Nat) where
  Zero :: NatRepr 0
  Succ :: NatRepr n -> NatRepr (n + 1)

NatRepr is a singleton type, meaning that each value has a unique type. For example, Zero :: NatRepr 0 and Succ Zero :: NatRepr 1.

Just as lookup had an Eq constraint above, NatRepr needs a TestEquality instance, which is similar but for singletons:

instance TestEquality NatRepr where
  testEquality n m =
    case (n, m) of
      (Succ _, Zero) -> Nothing
      (Zero, Succ _) -> Nothing
      (Zero, Zero) -> Just Refl
      (Succ n', Succ m') ->
        case testEquality n' m' of
          Just Refl -> Just Refl
          Nothing -> Nothing

The Assoc type carries a type-level list of its keys:

data Assoc (keys :: [Nat]) (a :: Type) where
  Nil :: Assoc '[] a
  Cons :: NatRepr n -> a -> Assoc keys a -> Assoc (n ': keys) a

Because this is a GADT, we can no longer derive Functor; we have to write it ourselves:

instance Functor (Assoc keys) where
  fmap f =
    \case
      Nil -> Nil
      Cons n a rest -> Cons n (f a) (fmap f rest)

empty has an empty type-level list of keys:

empty :: Assoc '[] a
empty = Nil

insert adds one key to the front of the type-level list. In fact, insert is just Cons:

insert :: NatRepr n -> a -> Assoc keys a -> Assoc (n ': keys) a
insert = Cons

As in the smart constructor approach, we use a new type Key to track the relationship between a key and an association list. Unlike the smart constructor approach, we don't have to trust the implementation of isKey; the types force it to be correct.

data Key (n :: Nat) (keys :: [Nat]) where
  KeyHere :: Key n (n ': l)
  KeyThere :: Key n keys -> Key n (m ': keys)

isKey :: NatRepr n -> Assoc keys a -> Maybe (Key n keys)
isKey n assoc =
  case assoc of
    Nil -> Nothing
    Cons m _ rest ->
      case testEquality n m of
        Just Refl -> Just KeyHere
        Nothing ->
          case isKey n rest of
            Nothing -> Nothing
            Just key -> Just (KeyThere key)

In lookup, GHC's pattern match checker can tell that if there's a Key n keys, then the type-level list keys can't be empty, so there's no need to handle the Nil case:

lookup :: Key n keys -> Assoc keys a -> a
lookup key assoc =
  case (key, assoc) of
    (KeyHere, Cons _ a _) -> a
    (KeyThere key', Cons _ _ rest) -> lookup key' rest

Here's an example usage:

test :: String
test =
  case isKey two assoc of
    Just k -> lookup k assoc
    Nothing -> error "not a key"
  where
    two = Succ (Succ Zero)
    assoc = insert two "" empty

Ghosts of Departed Proofs

The Ghosts of Departed Proofs (GDP) paper (PDF) outlined several insights that supercharge the smart constructor/existential trick above.

It's worth noting that I'm not an expert in GDP.

GDP Infrastructure

The following introduction to the core GDP infrastructure is fairly cursory; please check out the paper for more information. I'm defining all of this here to make the post self-contained, but the combinators in this module are available in the gdp package on Hackage.

{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE DefaultSignatures #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TypeOperators #-}

module GDP.Named
  ( Named
  , name
  , the
  , SuchThat
  , suchThat
  , Defn
  , Defining
  , defn
  , Proof
  , axiom
  , type (-->)
  , implElim
  ) where

import Prelude hiding (lookup)
import Data.Coerce (Coercible, coerce)

First, the Named type and name function generalize the existential introduction of type-level names for values, using an equivalence between rank-2 types (the forall in the second parameter to name) and existential types:

-- | Called @~~@ in GDP.
newtype Named name a = Named a
  deriving Functor

name :: a -> (forall name. Named name a -> t) -> t
name x k = coerce k x
{-# INLINE name #-}

To extract a Named value, clients use the:

class The x a | x -> a where
  the :: x -> a
  default the :: Coercible x a => x -> a
  the = coerce

instance The (Named name a) a where

Modules can give type-level names to functions using Defn and defn (demonstrated later):

data Defn = Defn

type Defining name = (Coercible Defn name, Coercible name Defn)

-- | Attach a type-level name to a value; cf. @defn@ in the gdp package.
defn :: Defining name => a -> Named name a
defn = coerce

Type-level propositions are called Proof, and can be manipulated with axioms like implElim:

data Proof a = Proof  -- called QED in GDP

axiom :: Proof a
axiom = Proof

data p --> q

implElim :: Proof (p --> q) -> Proof p -> Proof q
implElim _ _ = axiom

Type-level propositions can be attached to values with SuchThat (which can also be cast away with the):

-- | Called @:::@ in GDP.
newtype SuchThat p a = SuchThat a
  deriving Functor

instance The (SuchThat p a) a where

suchThat :: a -> Proof p -> SuchThat p a
suchThat x _ = coerce x

GDP Implementation of Association Lists

Now let's take a look at using those tools to provide more assurance for the Assoc API. Happily, we can directly reuse Assoc and empty from the vanilla Haskell implementation!

{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE ExplicitNamespaces #-}
{-# LANGUAGE RoleAnnotations #-}
{-# LANGUAGE TypeOperators #-}

module GDP
  ( Assoc
  , Vanilla.empty
  , IsKey
  , Insert
  , insert
  , insertIsKey
  , Lookup
  , lookup
  , LookupMaybe
  , lookupMaybe
  , lookupMaybeKey
  ) where

import           Prelude hiding (lookup)

import qualified GDP.Named as GDP
import           GDP.Named (type (-->))
import           GDP.Maybe (IsJust)
import qualified GDP.Maybe as GDP
import qualified Vanilla as Vanilla
import           Vanilla (Assoc)

Instead of a separate Key type, we create a type-level proposition IsKey. A GDP.Proof (IsKey kn assocn) means that the key of type GDP.Named kn k is present in the association list of type GDP.Named assocn (Assoc k v).

-- | @kn@ is a key of association list @assocn@
data IsKey kn assocn
type role IsKey nominal nominal

lookup takes a key with an additional IsKey proof as a parameter and then just shells out to the underlying vanilla Haskell lookup implementation. The return type and the Lookup newtype will be explained in just a moment.

-- | Type-level name of 'lookup' for use in lemmas.
newtype Lookup kn assocn = Lookup GDP.Defn
type role Lookup nominal nominal

lookup ::
  Eq k =>
  GDP.SuchThat (IsKey kn assocn) (GDP.Named kn k) ->
  GDP.Named assocn (Assoc k v) ->
  GDP.Named (Lookup kn assocn) v
lookup k assoc =
  GDP.defn (Vanilla.lookup (GDP.the (GDP.the k)) (GDP.the assoc))

insert works much the same way, but it doesn't have any preconditions and so doesn't use SuchThat:

-- | Type-level name of 'insert' for use in lemmas.
newtype Insert kn vn assocn = Insert GDP.Defn
type role Insert nominal nominal nominal

insert ::
  GDP.Named kn k ->
  GDP.Named vn v ->
  GDP.Named assocn (Assoc k v) ->
  GDP.Named (Insert kn vn assocn) (Assoc k v)
insert k v assoc =
  GDP.defn (Vanilla.insert (GDP.the k) (GDP.the v) (GDP.the assoc))

So how do clients construct a GDP.SuchThat (IsKey kn assocn) (GDP.Named kn k) to pass to lookup? One of the nice parts of GDP is that library authors can determine how clients reason about propositions they define, like IsKey. In particular, library authors can introduce type-level names for their API functions like Lookup and Insert, and export axioms that describe how the API relates to the propositions.

One way to know that a key is in a map is if we just inserted that key. We can express this fact by exporting an axiom:

insertIsKey :: GDP.Proof (IsKey kn (Insert kn vn assocn))
insertIsKey = GDP.axiom

With that lemma, here's an example use of this safe API:

test :: String
test =
  GDP.name Vanilla.empty $ \emp ->
    GDP.name (2 :: Int) $ \k ->
      GDP.name "" $ \v ->
        GDP.the $
          lookup
            (k `GDP.suchThat` insertIsKey)
            (insert k v emp)

We can also conclude IsKey when lookupMaybe returns a Just value, kind of like what isKey did in previous examples:

-- | Type-level name of 'lookupMaybe' for use in lemmas.
newtype LookupMaybe kn assocn = LookupMaybe GDP.Defn

lookupMaybe ::
  Eq k =>
  GDP.Named kn k ->
  GDP.Named assocn (Assoc k v) ->
  GDP.Named (LookupMaybe kn assocn) (Maybe v)
lookupMaybe k assoc =
  GDP.defn (Vanilla.lookupMaybe (GDP.the k) (GDP.the assoc))

lookupMaybeKey ::
  GDP.Proof (IsJust (LookupMaybe kn assocn) --> IsKey kn assocn)
lookupMaybeKey = GDP.axiom

test2 :: String
test2 =
  GDP.name Vanilla.empty $ \emp ->
    GDP.name (2 :: Int) $ \k ->
      GDP.name "" $ \v ->
        GDP.the $
          let assoc = insert k v emp
          in case GDP.maybeCase (lookupMaybe k assoc) of
               GDP.IsNothing _ -> error "impossible"
               GDP.IsJust isJustPf _ ->
                 lookup
                   (k `GDP.suchThat` GDP.implElim lookupMaybeKey isJustPf)
                   assoc

where IsJust and maybeCase are defined in GDP.Maybe:

{-# LANGUAGE ExplicitNamespaces #-}
{-# LANGUAGE TypeOperators #-}

module GDP.Maybe
  ( IsNothing
  , IsJust
  , MaybeCase(IsNothing, IsJust)
  , maybeCase
  ) where

import qualified GDP.Named as GDP

data IsJust name
data IsNothing name
data MaybeCase name a
  = IsNothing (GDP.Proof (IsNothing name))
  | IsJust (GDP.Proof (IsJust name)) a

-- | Called @classify@ in GDP.
maybeCase :: GDP.Named name (Maybe a) -> MaybeCase name a
maybeCase x =
  case GDP.the x of
    Nothing -> IsNothing GDP.axiom
    Just a -> IsJust GDP.axiom a

Such code could clearly be defined once and for all in a library somewhere, and probably even generated by Template Haskell.

Liquid Haskell

Liquid Haskell adds refinement types to Haskell, which are kind of like dependent types. It operates as a GHC type-checking plugin, and doesn't affect the runtime semantics of Haskell. I'm no Liquid Haskell expert, but luckily one of the case studies in the documentation is on association lists. What follows is a slight adaptation.

{-# LANGUAGE LambdaCase #-}

module Liquid
  ( Assoc
  , empty
  , insert
  , lookup
  ) where

import Prelude hiding (lookup)

import           Data.Set (Set)
import qualified Data.Set as Set

The Assoc type is isomorphic to the vanilla implementation:

data Assoc k v
  = Nil
  | Cons k v (Assoc k v)

keys retrieves the set of keys in the association list. It's only used at the type level. Using keys, we can refine the type of empty to state that empty has no keys. Types like { v : T | C v } should be read like "values of type T that additionally satisfy constraint C". The base type T can almost always be omitted, as it can be inferred from the vanilla-Haskell type signature.

{-@ measure keys @-}
keys :: Ord k => Assoc k v -> Set k
keys =
  \case
    Nil -> Set.empty
    Cons k _ rest -> Set.union (Set.singleton k) (keys rest)

{-@ empty :: { v : _ | keys v == Set.empty } @-}
empty :: Assoc k v
empty = Nil

Similarly, the function addKey is used to refine the type of insert:

{-@ inline addKey @-}
addKey :: Ord k => k -> Assoc k v -> Set k
addKey k kvs = Set.union (Set.singleton k) (keys kvs)

{-@ insert :: k : _ -> _ -> assoc : _ -> { v : _ | keys v = addKey k assoc } @-}
insert :: k -> v -> Assoc k v -> Assoc k v
insert = Cons

Finally, has is used to express the precondition on lookup:

{-@ inline has @-}
has :: Ord k => k -> Assoc k v -> Bool
has k assoc = Set.member k (keys assoc)

{-@ lookup :: assoc : _ -> k : { key : _ | has key assoc } -> v @-}
lookup :: Eq k => Assoc k v -> k -> v
lookup assoc k =
  case assoc of
    Nil -> error "Impossible"
    Cons k' v rest ->
      if k == k'
      then v
      else lookup rest k

Note that the order of parameters to lookup has to be flipped so that assoc is in scope in the refinement type for k.

Using this API looks almost exactly like it does for the vanilla implementation:

test :: String
test = lookup (insert 2 "" empty) 2

The type-checker will catch invalid calls to lookup:

src/Liquid.hs:58:36-45: error:
    Liquid Type Mismatch
    .
    The inferred type
      VV : {v : GHC.Types.Int | v == 3}
    .
    is not a subtype of the required type
      VV : {VV : GHC.Types.Int | Set_mem VV (Liquid.keys ?d)}
    .
    in the context
      ?b : {?b : (Liquid.Assoc GHC.Types.Int [GHC.Types.Char]) | Liquid.keys ?b == Set_empty 0}

      ?d : {?d : (Liquid.Assoc GHC.Types.Int [GHC.Types.Char]) | Liquid.keys ?d == Set_cup (Set_sng 2) (Liquid.keys ?b)}
   |
58 | test2 = lookup (insert 2 "" empty) (3 :: Int)
   |                                    ^^^^^^^^^^

Conclusions

Smart Constructors and the Existential Trick

Smart constructors and the existential trick are likely the most accessible of all of the above techniques. They require only a single fairly conservative extension to Haskell 2010 (-XExistentialQuantification or -XRank2Types). However, the safety argument relies on the correct implementation of the newtype abstraction by the library author; in this case it would be easy enough to accidentally export the Key constructor or provide some other way of constructing a Key that violates the intended invariant. Also, a new newtype and smart constructor generally have to be custom-built for each desired safety property.

GADTs, or Dependent Types

Dependent types significantly reduce the possibility of programmer error. Once appropriate definitions for Assoc and Key are found, it is impossible for the library author to mis-implement isKey. Like smart constructors and the vanilla Haskell code, they require no external tooling or libraries. However, programming with dependently-typed techniques has its downsides:

  • It requires knowledge of lots of complex GHC extensions like GADTs, type families (-XTypeFamilies), data kinds/polykinds, and rank-n types.
  • Since types carry proof-relevant information (such as the keys type index on the Assoc type), you can end up with a ton of slight variations on a single interface depending on the properties you need to prove.
  • It's hard to write dependently-typed code, so in practice you can end up with inefficient code due to the difficulty of proving efficient code correct. This is pretty much why this post focused on association lists instead of a more efficient map structure: I think it would probably be pretty hairy to write one up with GADTs. Consider also that a Key is linear in the size of the association list.
  • GADTs are not well-supported by existing infrastructure, so we can't, e.g., derive TestEquality or Functor instances; we have to write them ourselves. It's also generally impossible even to hand-write instances of hugely useful typeclasses like Generic and Data. In fact, many important typeclasses relating to singletons aren't even defined in base (see, e.g., the typeclasses in parameterized-utils). Libraries like singletons and parameterized-utils do have Template Haskell facilities for generating instances of certain of these classes.
  • Error messages involving dependently typed Haskell extensions continue a venerable tradition of flummoxing learners.

I'm very familiar with dependent types, which likely partially accounts for why this list is so extensive.

GDP

The GDP approach is remarkably flexible. The ability to define arbitrary lemmas about an API gives the API consumers a huge amount of power. For example, it was possible to provide two ways to produce an IsKey proof without changing the actual implementation. Furthermore, the GDP approach was able to directly reuse the vanilla Haskell implementation by simply providing wrappers with more interesting type signatures.

On the other hand, understanding the GDP approach requires a good deal of study. GDP also requires trusting the library author to provide sound reasoning principles. The GDP paper outlines a few possible approaches (including use of Liquid Haskell) to improve confidence in the exported lemmas.

Liquid Haskell

Liquid Haskell was a bit difficult to install: some of the project's packages had too-restrictive bounds (resulting in Cabal being unable to find a build plan), whereas others had too-permissive bounds (resulting in compilation failures when Cabal did find a plan). Generally, Liquid Haskell is the unfortunate combination of an impressive amount of infrastructure and code that's maintained mostly by academics. They've done a great job of making it fairly easy to get started, but I wouldn't rely on it for a commercial project, for mundane reasons like not being able to upgrade to the latest GHC until the Liquid Haskell team releases a corresponding version of liquid-base.

Liquid Haskell sometimes requires you to write code in a particular way, rejecting semantically equivalent (and arguably, clearer) rewrites. For instance, I originally tried to write keys like so:

{-@ measure keys @-}
keys :: Ord k => Assoc k v -> Set k
keys =
  \case
    Nil -> Set.empty
    Cons k _ rest -> Set.insert k (keys rest)

That didn't work; Liquid Haskell said Unbound symbol Data.Set.Internal.insert. So I tried to factor the insert-via-union into a where binding for the sake of clarity, like so:

{-@ measure keys @-}
keys :: Ord k => Assoc k v -> Set k
keys =
  \case
    Nil -> Set.empty
    Cons k _ rest -> setInsert k (keys rest)
  where setInsert x = Set.union (Set.singleton x)

This resulted in Uh oh. Cannot create measure 'Liquid.keys': Does not have a case-of at the top-level. If I understand correctly, this is probably because Liquid Haskell analyzes code at the level of GHC Core, so it's sensitive to changes that look similar at the source level but compile to different Core. Such errors are annoying, but I was able to find workarounds whenever they came up. Just as with "dependent Haskell", Liquid Haskell error messages leave a lot to be desired.

Like dependently typed Haskell and GDP, actually understanding Liquid Haskell would take at least several hours if not days of study. That being said, I was able to get started with this example in maybe an hour or so using their documentation.

Aside from the type signatures, the Liquid Haskell implementation looks pretty much exactly like the "vanilla" implementation. Remarkably, the API is also almost identical, and users could choose whether or not to use (and understand!) Liquid Haskell when consuming the API, unlike in the other techniques.

As in the dependently-typed approach, less trust in the library author is required.

As with GDP, I believe that Liquid Haskell provides the ability to export lemmas for consumers to use in constructing proofs. Unlike in GDP, I believe all such lemmas can be machine-checked.

Comparison and Takeaways

The following table summarizes the above remarks. A table can't carry much nuance, and I'm not an expert in GDP or Liquid Haskell, so the ratings should be taken with a hefty grain of salt.

Key:

  • Trust: How much you need to trust the library author, i.e., to what degree are the claims of API safety machine-checked? (Lower is better.)
  • Knowledge: How much do you need to learn to effectively apply this technique? (Lower is better.)
  • Flexibility: How much flexibility do library consumers have to prove that they're using the API safely? (Higher is better.)
  • Infrastructure: How much infrastructure is necessary to make the approach work? (Lower is better.)
  • Reuse: How reusable are the components, e.g., if you needed to prove a different property or construct a larger API? (Higher is better.)

| Approach | Trust | Knowledge | Flexibility | Infrastructure | Reuse |
|----------|-------|-----------|-------------|----------------|-------|
| Vanilla  | n/a   | Low       | n/a         | None           | n/a   |
| Smart    | High  | Med       | Low         | Low            | Low   |
| GADTs    | Low   | High      | Med         | Med            | Med   |
| GDP      | High  | High      | High        | High           | High  |
| Liquid   | Low   | High      | High        | Very high      | High  |

My overall take is that all of these techniques are under-utilized and under-explored. GADTs/dependent types, GDP, and Liquid Haskell would all be much easier to write if there were more widely-available libraries for programming with them. Given the attention that dependent types have received in the Haskell community, I think GDP and Liquid Haskell are especially neglected.

As the table shows at a glance, these techniques offer a wide range of benefits and drawbacks, which indicates that they're likely appropriate for different use-cases. Broad adoption of several would enable Haskell programmers to use techniques because they're appropriate to their goals, rather than because they have a paucity of options.

Footnotes

1

These are taken from the Ghosts of Departed Proofs paper.

Fun with mypy: reifying runtime relations on types

I'm a big fan of type checkers for Python, like Mypy. One fun aspect of these tools is that their type systems tend to have features that reflect the dynamic nature of Python, like type narrowing. This post describes another, less-documented way to use runtime evidence of type information.

First, consider the cast operation in the typing module. The docs say:

typing.cast(ty, val)

Cast a value to a type.

This returns the value unchanged. To the type checker this signals that the return value has the designated type, but at runtime we intentionally don't check anything (we want this to be as fast as possible).

In other words, it's up to the programmer to ensure that the runtime type of the value argument really is a subtype of the type argument. What if we could make a version of cast, perhaps safe_cast, that makes sure that's the case? It would need to meet two requirements:

  1. Performance: No overhead at calls to safe_cast, just like cast. No assertions, no conditionals.
  2. Safety: It should be the case that every call to safe_cast is guaranteed (up to bugs in Mypy or programmers doing something obviously tricky) to succeed.

Well, there's good news! We can:

from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, Type, TypeVar, final

Sub = TypeVar("Sub")
Super = TypeVar("Super")


@final
class Subclass_(Generic[Sub, Super]):
    def __init__(self, sub: Type[Sub], sup: Type[Super]) -> None:
        assert issubclass(sub, sup)


def safe_cast(evidence: Subclass_[Sub, Super], value: Sub) -> Super:
    return value  # type: ignore[return-value]

Here's what it looks like at the REPL:

>>> Subclass_(bool, int)
Subclass_()
>>> Subclass_(bool, str)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/blog/reify.py", line 13, in __init__
    assert issubclass(sub, sup)
AssertionError
>>> safe_cast(Subclass_(bool, int), True) + 2
3

So how does this work? Well, if you've got a value of type Subclass_[Sub, Super] in scope, then (unless you've done something really tricky) it must be the case that at some point, the assertion written in Subclass_.__init__ passed. This means that Sub is a subclass of Super, and so by the Liskov substitution principle you can use any value of type Sub in a context where a Super is expected.

To understand the technique in generality, let's refine this example and take a look at a few more.

Subclass

For the sake of usability, let's extend the code with two additional features:

  • Throw a custom exception instead of using assertions
  • Add the types as fields (for reasons explained below)

@dataclass(frozen=True)
class NotASubclass(Generic[Sub, Super], Exception):
    sub: Type[Sub]
    sup: Type[Super]

    def __str__(self) -> str:
        return f"{self.sub} is not a subclass of {self.sup}"


@final
@dataclass
class Subclass(Generic[Sub, Super]):

    sub: Type[Sub]
    sup: Type[Super]

    def __init__(self, sub: Type[Sub], sup: Type[Super]) -> None:
        self.sub = sub
        self.sup = sup
        if not issubclass(sub, sup):
            raise NotASubclass(sub, sup)

So why add the types as fields? It's not necessary, but it does extend the reach of our reasoning by enabling us to add "lemmas" like this:

T1 = TypeVar("T1")
T2 = TypeVar("T2")
T3 = TypeVar("T3")

def subclass_transitive(
        evidence1: Subclass[T1, T2],
        evidence2: Subclass[T2, T3]) -> Subclass[T1, T3]:
    return Subclass(evidence1.sub, evidence2.sup)

If you don't like the memory overhead, you can instead "axiomatize" lemmas, by returning a sentinel value such as Subclass[object, object] with a # type: ignore[return-value] comment.
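
For instance, here is a sketch of that axiomatized style (the helper name subclass_transitive_axiom and the module-level sentinel are mine):

_SENTINEL: Subclass[object, object] = Subclass(object, object)


def subclass_transitive_axiom(
        evidence1: Subclass[T1, T2],
        evidence2: Subclass[T2, T3]) -> Subclass[T1, T3]:
    # Trusted rather than checked: issubclass is transitive.
    return _SENTINEL  # type: ignore[return-value]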

SameType

Subclassing is a partial order on types, and in particular, it's antisymmetric. If two types are both subclasses of each other, they're the same type:

@final
@dataclass
class SameType(Generic[T1, T2]):

    ty: Type[T1]

    def __init__(self, evidence1: Subclass[T1, T2],
                 evidence2: Subclass[T2, T1]) -> None:
        self.ty = evidence1.sub

    # These would be necessary for writing any lemmas:

    def get1(self) -> Type[T1]:
        return self.ty

    def get2(self) -> Type[T2]:
        # Could also save the evidence and use safe_cast here
        return self.ty  # type: ignore[return-value]

Here, we don't need a custom exception type nor any assertion in the constructor, since both of these are "inherited" from Subclass.

Some ideas for lemmas:

  • SameType[T1, T2] implies Subclass[T1, T2] (see the sketch after this list)
  • SameType is symmetric
  • SameType is transitive
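
As a sketch of the first idea (the helper name is mine), the stored type lets us produce checked, rather than axiomatized, evidence:

def same_type_subclass(evidence: SameType[T1, T2]) -> Subclass[T1, T2]:
    # The issubclass check in Subclass.__init__ re-runs, and trivially passes.
    return Subclass(evidence.get1(), evidence.get2())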

InTuple

Now for something a bit more involved. Recall PEP 585: Type Hinting Generics In Standard Collections. This PEP obviates typing.Dict, typing.List, typing.Tuple, and more by allowing us to write the corresponding types directly inside type signatures:

def repeat(i: int, n: int) -> list[int]: pass

Old style (equivalent):

from typing import List

def repeat_old(i: int, n: int) -> List[int]: pass

Crucially, types like dict[int, str] are also values:

>>> dict[int, str]
dict[int, str]

Moreover, their type arguments are available at runtime:

>>> dict[int, str].__args__
(<class 'int'>, <class 'str'>)

So what can we do with this? Here's a generic class InTuple[T, Tup]. It has two type parameters, the second of which is bounded above by Tuple[Any, ...]. If you can construct one, it means that the type T is one of the types in the tuple type Tup.

from typing import Any, Tuple

# For some reason, tuple[Any, ...] doesn't work here, even though it is valid as
# a value at the REPL.
Tup = TypeVar("Tup", bound=Tuple[Any, ...])
T = TypeVar("T")


@dataclass(frozen=True)
class NotInTuple(Generic[T, Tup], Exception):
    ty: Type[T]
    tup: Type[Tup]

    def __str__(self) -> str:
        return f"Type {self.ty} is not in tuple type {self.tup}"


@final
@dataclass
class InTuple(Generic[T, Tup]):
    """Witness that ``T`` is in ``Tup``."""

    ty: Type[T]
    tup: Type[Tup]

    def __init__(self, ty: Type[T], tup: Type[Tup]) -> None:
        self.ty = ty
        self.tup = tup
        # Mypy version 0.910 doesn't acknowledge that this attribute is defined,
        # but it is.
        for tup_ty in tup.__args__:  # type: ignore[attr-defined]
            if ty == tup_ty:
                return
        raise NotInTuple(ty, tup)

Let's try it out:

>>> InTuple(int, tuple[int, str])
InTuple(ty=<class 'int'>, tup=tuple[int, str])

>>> InTuple(int, tuple[str, list[str], bool])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/blog/reify.py", line 44, in __init__
    raise NotInTuple(ty, tup)
__main__.NotInTuple: Type <class 'int'> is not in tuple type tuple[str, list[str], bool]

So what can we do with this? Well, probably lots of things! We can write a function that gets the first value of a given type out of a tuple, regardless of its index:

def first(evidence: InTuple[T, Tup], tup: Tup) -> T:
    for elem in tup:
        if isinstance(elem, evidence.ty):
            return elem
    assert False, f"Impossible: Bad 'InTuple' value {evidence=} {tup=}"

>>> first(InTuple(int, tuple[str, int, bool, int]), ("string", 5, False, 10))
5

This isn't so impressive when we construct a literal InTuple value as the first argument, but might be helpful in more polymorphic scenarios.

It's also worth noting that this function can be a bit unintuitive when combined with certain subclass relationships:

>>> issubclass(bool, int)
True
>>> first(InTuple(int, tuple[bool, int, str, int]), (False, 5, "string", 10))
False

Any of these predicates or relationships could be negated, simply by inverting the conditional and renaming the class (see the sketch below). You could probably have a lemma that the negation of InTuple is reflexive, but I can't think of how you'd use it.
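
For instance, a sketch of the negation of InTuple (both names here are mine):

@dataclass(frozen=True)
class UnexpectedlyInTuple(Generic[T, Tup], Exception):
    ty: Type[T]
    tup: Type[Tup]


@final
@dataclass
class NotInTupleEvidence(Generic[T, Tup]):
    ty: Type[T]
    tup: Type[Tup]

    def __init__(self, ty: Type[T], tup: Type[Tup]) -> None:
        self.ty = ty
        self.tup = tup
        for tup_ty in tup.__args__:  # type: ignore[attr-defined]
            if ty == tup_ty:  # the conditional from InTuple, inverted
                raise UnexpectedlyInTuple(ty, tup)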

IsGenericArgument

In fact, parameters to all generic types are available at runtime:

>>> from typing import Mapping, Generic, TypeVar

>>> Mapping[int, str].__args__
(<class 'int'>, <class 'str'>)

>>> T = TypeVar("T")
>>> class G(Generic[T]):
...     pass
...
>>> G[int].__args__
(<class 'int'>,)

So we can handily generalize the previous example just by creating a new exception type, replacing the Tup variable with something unconstrained, and renaming the class to IsGenericArgument.
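
Here is a hedged sketch of that generalization (the names are mine):

GenericTy = TypeVar("GenericTy")


@dataclass(frozen=True)
class NotAGenericArgument(Generic[T, GenericTy], Exception):
    ty: Type[T]
    generic: Type[GenericTy]

    def __str__(self) -> str:
        return f"Type {self.ty} is not an argument of {self.generic}"


@final
@dataclass
class IsGenericArgument(Generic[T, GenericTy]):
    ty: Type[T]
    generic: Type[GenericTy]

    def __init__(self, ty: Type[T], generic: Type[GenericTy]) -> None:
        self.ty = ty
        self.generic = generic
        for arg in generic.__args__:  # type: ignore[attr-defined]
            if ty == arg:
                return
        raise NotAGenericArgument(ty, generic)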

NewtypeOf

How can we tell if a given type is just a NewType wrapper around another one? Well, let's investigate:

>>> from typing import NewType
>>> Pos = NewType("Pos", int)
>>> Pos
<function NewType.<locals>.new_type at 0x7fa08729b4c0>
>>> dir(Pos)
[..., '__supertype__']
>>> Pos.__supertype__
<class 'int'>

Ah, that's just what we need!

from typing import Callable


@dataclass(frozen=True)
class NotANewtypeOf(Generic[Sub, Super], Exception):
    sub: Callable[[Super], Sub]
    sup: Type[Super]

    def __str__(self) -> str:
        return f'{getattr(self.sub, "__name__", "<unknown>")} is not a newtype of {self.sup}'


@final
@dataclass
class NewtypeOf(Generic[Sub, Super]):

    sub: Callable[[Super], Sub]
    sup: Type[Super]

    def __init__(self, sub: Callable[[Super], Sub], sup: Type[Super]) -> None:
        # Mypy has trouble with Callable fields...
        self.sub = sub  # type: ignore[assignment]
        self.sup = sup
        if getattr(sub, "__supertype__", None) is not sup:
            raise NotANewtypeOf(sub, sup)

Some ideas for lemmas:

  • Is this transitive?
  • NewtypeOf implies Subclass (sketched below)
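
A sketch of the second idea (the helper name is mine); since a NewType is a function rather than a runtime class, this lemma has to be axiomatized rather than checked:

def newtype_subclass(evidence: NewtypeOf[Sub, Super]) -> Subclass[Sub, Super]:
    # Trusted rather than checked: values of the newtype really are values
    # of the supertype, so the claimed evidence is sound.
    return Subclass(evidence.sup, evidence.sup)  # type: ignore[return-value]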

Newtype

The above example also leads naturally to our first predicate on types (a property of a single type, rather than a relationship between types): namely, whether or not a given type is in fact a NewType.

@dataclass(frozen=True)
class NotANewtype(Generic[Sub, Super], Exception):
    sub: Callable[[Super], Sub]

    def __str__(self) -> str:
        return f'{getattr(self.sub, "__name__", "<unknown>")} is not a newtype'


@final
@dataclass
class Newtype(Generic[Sub, Super]):

    sub: Callable[[Super], Sub]

    def __init__(self, sub: Callable[[Super], Sub]) -> None:
        self.sub = sub  # type: ignore[assignment]
        if getattr(sub, "__supertype__", None) is None:
            raise NotANewtype(sub)

More Ideas

What other predicates on and relations between types do we care about? A few ideas:

  • Subclassable: Some types aren't! For example, <class 'function'> (try type(lambda: None) in the REPL).

  • IsAbstract, or NotAbstract: I don't know if it's possible to check this at runtime, but it would be cool if it were!

  • HasMetaclass: Check that a class has a given metaclass (see the sketch below).
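
Here is a hedged sketch of that last idea (the names are mine); isinstance on a class checks its metaclass, including inherited metaclasses:

Meta = TypeVar("Meta")


@final
@dataclass
class HasMetaclass(Generic[T, Meta]):
    cls: Type[T]
    meta: Type[Meta]

    def __init__(self, cls: Type[T], meta: Type[Meta]) -> None:
        self.cls = cls
        self.meta = meta
        if not isinstance(cls, meta):
            raise TypeError(f"{cls} does not have metaclass {meta}")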

Do you have other ideas or use-cases? Drop me a line!

A quick tutorial on setoids in Rocq

A setoid is a type T and an equivalence relation (=) : T -> T -> Prop. Coq has some built-in automation facilities for working with setoids that can make your life a lot easier.

Building a setoid

Let's suppose we wanted to implement a quick-and-dirty version of finite sets on some type A. One option is to use list A, but lists have two properties that get in the way: ordering and duplication. Building a Setoid allows us to abstract away these features, and consider two lists equivalent if they have the same elements.

Context {A : Type}.

Definition equiv_list (l1 l2 : list A) : Prop :=
  ∀ a : A, In a l1 <-> In a l2.

To register our new equivalence with Coq and reap the benefits of automation, we make list A an instance of the Setoid typeclass:

Instance ListSetoid : Setoid (list A) :=
  { equiv := equiv_list }.
Proof.
  split.

At this point, we're dropped into proof mode and our goal is to prove that equiv_list is an equivalence relation. To see the full proof, see the Coq source for this post (we use some setoid-related automation to prove this goal).

Immediately, we gain access to a few tactics that work for normal equality, like reflexivity, symmetry (plus symmetry in [...]), and transitivity.

setoid_rewrite

The #1 benefit to using setoids is access to the setoid_rewrite tactic. Rewriting is one of the most powerful features of the tactic language, and using setoids allows us to expand its reach.

Suppose we have l1 l2 : list A. If we have l1 = l2 (where = is Coq's built-in equality), we can replace l1 with l2 in all contexts:

Lemma hd_eq (l1 l2 : list A) : l1 = l2 -> hd l1 = hd l2.
Proof.
  intros l1eql2.
  now rewrite l1eql2.
Qed.

This isn't true if we just know l1 == l2 (where == is notation for equiv, i.e., equiv_list). However, there are certain contexts in which they are interchangeable. Suppose further that A has decidable equality, and consider the following remove function:

Context {deceq : ∀ x y : A, {x = y} + {x ≠ y}}.

Fixpoint remove (x : A) (lst : list A) : list A :=
  match lst with
  | nil => nil
  | cons y ys =>
    match deceq x y with
    | left  _ => remove x ys
    | right _ => y :: remove x ys
    end
  end.

Since this removes all occurrences of x in lst, we should be able to prove

∀ x l1 l2, l1 == l2 -> remove x l1 == remove x l2

We state this lemma a bit oddly:

Instance removeProper : Proper (eq ==> equiv ==> equiv) remove.

This concisely states "given two equal inputs x y : A and two equivalent lists l1 l2 : list A, remove x l1 is equivalent to remove y l2". In other words, remove respects equality on A (trivially -- every Coq function respects equality) and the equivalence relation == (AKA equiv) on list A.

Once we have an Instance of Proper, we can use setoid_rewrite to replace remove x l1 with remove x l2, even when they appear nested in the goal statement:

Instance revProper : Proper (equiv ==> equiv) (@rev A).
Proof.
  [...]
Qed.

Lemma remove_rev_remove {a b : A} {l1 l2 : list A} :
  l1 == l2 -> rev (remove a l1) == rev (remove a l2).
Proof.
  intros H.
  now setoid_rewrite H.
Qed.

We can even compose Proper instances automatically:

Require Import Coq.Program.Basics.
Local Open Scope program_scope.

Instance rev_remove_Proper : ∀ a, Proper (equiv ==> equiv) (@rev A ∘ remove a).
Proof.
  apply _.
Qed.

The benefits of setoid rewriting continue to increase with the complexity of the goal.

About my notes

These are my working notes. They are not written with an audience in mind, but might be useful to some!

Amd64 System V ABI

Function call sequence

Registers

The first six integer or pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, R9.

Stack layout

On x86_64 with the SysV ABI, the stack grows "downwards" from high addresses to low. The end of the stack is initialized with the ELF auxiliary vector, and functions expect the following data to be available above their stack frame (i.e., just above the address in rsp), from high addresses to low:

  • Padding (if necessary)
  • Their stack-spilled arguments
  • The return address

(The end of the stack-spilled argument list must be 16-byte aligned, which may necessitate the use of padding, depending on the number of spilled arguments and the layout of the caller’s stack frame.)

The following diagram summarizes the stack frame layout:

High addresses

|---------------------|
| Caller's frame      |
|---------------------|
| Padding (if needed) |
|---------------------|
| Spilled argument n  |
|---------------------|
| ...                 |
|---------------------|
| Spilled argument 2  |
|---------------------|
| Spilled argument 1  |
|---------------------| <- %rsp + 8 (16-byte aligned)
| Return address      |
|---------------------| <- %rsp
| Callee's frame      |
|---------------------|

Low addresses

Build systems

See also Make.

Build systems I have enjoyed

Build systems I gave up on

  • tup: can’t install via Nix
  • redo: interesting, but unmaintained

What to look for

Allow or require out-of-tree builds

Justification: Out-of-tree builds are occasionally necessary, for instance if your source tree is mounted on a read-only filesystem. More realistically, out-of-tree builds ensure that you and your build system are aware of all the files that get created and read during the build process, which aids in maintainability, source tree cleanliness, and accurate dependency tracking.

Do only necessary work

As much as your build system supports it, you should use dependency tracking to ensure that you don’t redo any build steps or other tasks.

Example: If package or file B depends on A and B changes, don’t re-compile, re-lint, or re-test A.

Pin external dependencies as precisely as possible

It’s nice to work with a range of versions of dependencies, but ensure that there is at least one known-good, fully-specified list of dependencies. This usually means a specific commit hash or tarball hash for each dependency.

Justification: This aids in reproducibility.

Example: Poetry and Cargo provide lock files that fully specify dependency versions. CMake and Pip do not. Cabal can be configured to provide and use a “freeze” file.

Use ecosystem-appropriate tooling

To the extent that it’s reasonable, use tooling that’s common in the ecosystem you’re targeting.

Justification: This aids in maintainability by making it easier for likely contributors (that is, people familiar with the ecosystem) to contribute. It can also aid in debugging, in that others may have seen the same issues you’re seeing before.

Example: When building C++, use Make, CMake, or Meson. Don’t use something super obscure.

Example: It’s not always reasonable to use pip or setuptools for Python, because they have notable flaws (e.g. pip install does extraneous work when the package hasn’t changed since the last call to pip install). When using Python on a project, it may be appropriate to consider a different build/packaging solution.

Utilize content-addressing for build products where possible

Justification: Content-addressing provides an extensional view of when dependencies change, leading to fewer rebuilds, i.e. speed.

Example: If package B depends on A and the README of A changes, but doesn’t affect the build products that B uses, don’t rebuild B.

Don’t modify the user’s system

The build system should only modify files inside the build directory. In particular, it shouldn’t install system-wide packages. There are certain exceptions, for instance it’s OK for Cabal’s Nix-style builds to install packages in ~/.cabal because they are hermetically built.

Justification: This aids portability, because your assumptions about developers’ (and CI systems’) machines will break.

Provide a source distribution

The build system should provide a target that packages all necessary source files into an archive so that that archive can be unpacked and built with just the files present plus those downloaded and managed by the build system.

Justification: This ensures all dependencies are accurately accounted for. For instance, a common unstated dependency is a bunch of git submodules. It also makes it easier to back up your sources.

Datalog

Overview

Datalog is a declarative logic programming language. A Datalog program is a collection of rules specifying how to deduce relations among elements of some domain from some input relations. For example, the following program specifies the relation reachable in terms of another relation edge (which could be an input to the program):

reachable(X, Y) :- edge(X, Y).
reachable(X, Y) :- edge(X, Z), reachable(Z, Y).

Operationally, a Datalog interpreter might take a CSV file or database table which specifies the edge relation, perform the computation/deductions specified by the program, and output the reachable relation as another CSV file or database table.
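
Here is a minimal Python sketch (with a made-up edge relation) of the naive bottom-up evaluation such an interpreter might perform:

edge = {(1, 2), (2, 3), (3, 4)}

# reachable(X, Y) :- edge(X, Y).
reachable = set(edge)

# reachable(X, Y) :- edge(X, Z), reachable(Z, Y).
while True:
    derived = {(x, y) for (x, z) in edge for (z2, y) in reachable if z == z2}
    if derived <= reachable:
        break  # fixpoint reached
    reachable |= derived

# reachable is now {(1, 2), (2, 3), (3, 4), (1, 3), (2, 4), (1, 4)}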

Datalog is related to Prolog, which is sort of a less declarative and also Turing-complete version of Datalog.

Join order

The join order is the order in which the many tables involved in a rule are traversed. For example, ignoring the presence of indices, the rule

r(a, c) :- p(a, b), q(b, c).

will have to traverse the tables containing relations p and q to compute r.

The join order of a rule can have a sizable impact on its performance. For Soufflé, the join order is determined (mostly!) by the syntactic order of the clauses. Consider the following example in Soufflé:

r(a, c) :- p(a, b), s(b), q(b, c).

If p has many tuples, but q has fewer and s has fewer still, then a better ordering would be

r(a, c) :- s(b), q(b, c), p(a, b).

because (again, ignoring indices) the above rule would essentially be evaluated like the following nested for-loop:

for b in s:
    for (b0, c) in q:
        if b == b0:
            for (a, b1) in p:
                if b == b1:
                    r.insert(a, c)

Essentially, you want the most selective conditions to appear first, so that the innermost loops are executed the fewest number of times.

To get a feel for join order, it can be helpful to examine Soufflé’s imperative RAM IR.

Debugging

Tips

  • Try set -x in Bash
  • Use strace

Tools

creduce

creduce

apt-get update && apt-get install -y creduce
creduce --sllooww --timeout 60 --n $(nproc) reduce.sh file.c

halfempty

halfempty

git clone --branch=master --depth=1 --single-branch https://github.com/googleprojectzero/halfempty
cd halfempty
apt-get update -q
apt-get install -q -y libglib2.0-dev
make -j
#!/usr/bin/env bash

set -eux

result=1
trap 'exit ${result}' EXIT TERM ALRM

if grep interesting /dev/stdin; then
    result=0 # We want this input
else
    result=1 # We don't want this input
fi
exit "${result}"
./halfempty/halfempty --stable --gen-intermediate --output=halfempty.out test.sh input-file

Further reading

Error handling

Make

More generally, see Build systems.

Snippets

Adding a suffix to a list of files

BINS := $(SRC:=.bin)

Depending on the Makefile

# All rules should depend on the Makefile itself
THIS = $(abspath $(lastword $(MAKEFILE_LIST)))

Listing (phony) targets

See above for $(THIS).

.PHONY: help
help:
 @printf 'Targets:\n'
 @grep '^\.PHONY: ' $(THIS) | cut -d ' ' -f2

Setting the shell

SHELL = bash
.SHELLFLAGS = -euo pipefail -c

Useful functions

  • addprefix
  • addsuffix
  • dir: Like Bash’s dirname
  • notdir: Like Bash’s basename
  • subst

Automatic variables

$@

Q. In GNU Make, what does the $@ variable denote?

A. The target.

$<

Q. In GNU Make, what does the $< variable denote?

A. The first prerequisite.

$?

Q. In GNU Make, what does the $? variable denote?

A. All prerequisites that are newer than the target.

$^

Q. In GNU Make, what does the $^ variable denote?

A. All the prerequisites.

Markdown

Previewing Markdown

redcarpet --parse fenced_code_blocks my-file.md > my-file.html
# Continuously:
echo "my-file.md" | entr -c -s "redcarpet --parse fenced_code_blocks my-file.md > my-file.html"

Spoiler text

<details>
<summary>Spoiler warning</summary>

Spoiler text. Note that it's important to have a blank line after the summary tag.

```sh
echo "code block"
```

</details>

Tools

Python

Options and environment variables

  • -O tells CPython to disable assertions.
  • PYTHONDEVMODE tells CPython to enable some special checks
  • PYTHONWARNINGS enables different warning controls.
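
For example, given a file assert_demo.py (a made-up name) containing only the two lines below, python3 assert_demo.py fails with an AssertionError, while python3 -O assert_demo.py prints the final line:

# assert_demo.py
assert False, "this assertion is stripped under -O"
print("assertions were disabled")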

Snippets

Script template

#!/usr/bin/env python3

"""Description goes here"""

from argparse import ArgumentParser
from pathlib import Path
from sys import exit, stderr


def eprint(*args, **kwargs):
    print(*args, file=stderr, **kwargs)


def die(msg: str, /) -> None:
    eprint(msg)
    exit(1)


def go(paths: list[Path], /, *, flag: bool = False):
    pass


parser = ArgumentParser(description=__doc__)
parser.add_argument("--flag", action="store_true")
parser.add_argument("paths", nargs="+", type=Path)
args = parser.parse_args()
go(args.paths, flag=args.flag)

Using Enum with argparse

from enum import Enum, unique


@unique
class Format(str, Enum):
    """Export format"""

    TEXT = "text"
    CSV = "csv"

    def __str__(self) -> str:
        """For inclusion in --help"""
        return self.value

parser.add_argument('--format', type=Format, help=Format.__doc__, choices=list(Format), default=Format.TEXT)

Rust

Performance

Allocators

Build Configuration - The Rust Performance Book

cargo add mimalloc
#[cfg(not(miri))]
#[global_allocator]
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;

Cargo profiles

Profiles - The Cargo Book

# https://nnethercote.github.io/perf-book/build-configuration.html
[profile.release]
codegen-units = 1
lto = "fat"

# https://github.com/mstange/samply#turn-on-debug-info-for-full-stacks
[profile.profiling]
inherits = "release"
debug = true

Tools

See also this section of the Rust Performance Book.

dhat

dhat

Quickstart:

cargo add --optional dhat

In Cargo.toml:

[features]
dhat = ["dep:dhat"]

In main.rs:

#[cfg(feature = "dhat-heap")]
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

Then:

cargo run --profile=profiling --features dhat -- your args here

Safety

cargo-careful

cargo-careful

cargo careful is a tool to run your Rust code extra carefully – opting into a bunch of nightly-only extra checks that help detect Undefined Behavior, and using a standard library with debug assertions.

cargo +nightly install cargo-careful
cargo +nightly fetch
cargo +nightly careful build -Zcareful-sanitizer=address --target=x86_64-unknown-linux-gnu --tests --frozen
cargo +nightly careful test -Zcareful-sanitizer=address --target=x86_64-unknown-linux-gnu --frozen

Scudo

Scudo

cargo add scudo
#[cfg(all(not(miri), debug_assertions))]
#[global_allocator]
static ALLOC: scudo::GlobalScudoAllocator = scudo::GlobalScudoAllocator;

Software engineering

See also

Notes

Basic things

https://matklad.github.io/2024/03/22/basic-things.html

After working on the initial stages of several largish projects, I accumulated a list of things that share the following three properties:

  • they are irrelevant while the project is small,
  • they are a productivity multiplier when the project is large,
  • they are much harder to introduce down the line.

The list:

  • READMEs
  • Developer docs
    • Process docs (branching, code review)
    • Style guide
  • User docs/website
  • Internal website (metrics, coverage, etc.)
  • Build and CI
  • Testing
  • Benchmarking
  • Fuzzing
  • Ad-hoc automation
  • Releases

Testing

DSL-based integration tests

My favorite kind of test.

Property-based testing

  • Really stupid properties:
    • Is the output nontrivial (e.g. nonempty)?
    • Does the program not crash (i.e., fuzzing)?
    • Does the program satisfy its assertions/invariants/contracts?
  • Property-preserving transformations: reversing a list doesn’t change the max (see the sketch after this list).
  • Be inspired by math:
    • Test commutativity/intertwining of functions/methods
    • Test inverse relationships/functions
    • Test idempotency
    • Test structurally inductive properties
  • Differential testing: equivalence with a simplified implementation or model
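
A minimal sketch of the property-preserving-transformation idea (see the marked item above), using the hypothesis library:

from hypothesis import given
from hypothesis import strategies as st


@given(st.lists(st.integers(), min_size=1))
def test_reverse_preserves_max(xs):
    # Reversing a list doesn't change its max.
    assert max(reversed(xs)) == max(xs)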

Obscure types of testing

Fault injection

Fault injection is a software testing technique that involves inducing failures (“faults”) in the functions called by a program. If the callee has failed to perform proper error checking and handling, these faults can result in unreliable application behavior or exploitable vulnerabilities.

https://github.com/trailofbits/krf

Metamorphic testing

Metamorphic testing relates multiple executions or input/output pairs of the same code. For example, doubling all the elements of a list also doubles the mode. It is useful in combination with property testing.
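
For instance, a self-contained check of that doubling relation (assuming the mode is unique):

from statistics import mode

def check_double_doubles_mode(xs: list[int]) -> None:
    assert mode([2 * x for x in xs]) == 2 * mode(xs)

check_double_doubles_mode([1, 2, 2, 3])  # mode 2 becomes mode 4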

Mutation testing

Making automated changes to the code to see if the test suite will catch them. “Testing tests” as it were.

Particular types of system under test

REST

Metamorphic Testing of RESTful Web APIs

Abstract

Web Application Programming Interfaces (APIs) allow systems to interact with each other over the network. Modern Web APIs often adhere to the REST architectural style, being referred to as RESTful Web APIs. RESTful Web APIs are decomposed into multiple resources (e.g., a video in the YouTube API) that clients can manipulate through HTTP interactions. Testing Web APIs is critical but challenging due to the difficulty to assess the correctness of API responses, i.e., the oracle problem. Metamorphic testing alleviates the oracle problem by exploiting relations (so-called metamorphic relations) among multiple executions of the program under test. In this paper, we present a metamorphic testing approach for the detection of faults in RESTful Web APIs. We first propose six abstract relations that capture the shape of many of the metamorphic relations found in RESTful Web APIs, we call these Metamorphic Relation Output Patterns (MROPs). Each MROP can then be instantiated into one or more concrete metamorphic relations. The approach was evaluated using both automatically seeded and real faults in six subject Web APIs. Among other results, we identified 60 metamorphic relations (instances of the proposed MROPs) in the Web APIs of Spotify and YouTube. Each metamorphic relation was implemented using both random and manual test data, running over 4.7K automated tests. As a result, 11 issues were detected (3 in Spotify and 8 in YouTube), 10 of them confirmed by the API developers or reproduced by other users, supporting the effectiveness of the approach.

Tools

Further reading

Tools

Inspired by https://jyn.dev/tools/.

Basics

I use these every day.

Custom

Little utilities I wrote.

  • copy and paste: copy and paste from the shell, on Linux or macOS
  • notify: send a notification on Linux or macOS
  • pick: pick something at the terminal (just some fzf flags)

Debugging

See also Debugging.

zsh