Index
Researcher with experience in functional programming, formal methods, programming languages, and security. I work at Galois.
- Email: langston.barrett at gmail
- GitHub: langston-barrett
The statements and views posted here are my own and do not reflect those of my employer.
Projects
Highlighted Projects
- Lesshand: A shorthand for the 21st century
- treereduce: A fast, parallel, syntax-aware test case reducer based on tree-sitter grammars
- tree-splicer: Simple grammar-based test case generator (black-box fuzzer)
Datalog
- cclyzer++: a precise and scalable global pointer analysis for LLVM code
- dlsubst: Experimental bottom-up Datalog engine based on explicit substitutions
- Duckalog: Datalog engine based on DuckDB
- rdf-star-souffle: Toolkit for importing and reasoning about RDF-star data in Soufflé
- souffle-lint: A linter for Soufflé Datalog
- treeedb: Generate Soufflé Datalog types, relations, and facts that represent ASTs from a variety of programming languages.
- tree-sitter-souffle: A tree-sitter grammar for Soufflé Datalog
Fuzzing
- czz: Whole-program, Scheme-scriptable, multi-language, coverage-guided fuzzer
- radamsa-sys: Rust bindings for Radamsa
- treereduce: A fast, parallel, syntax-aware test case reducer based on tree-sitter grammars
- tree-crasher: Easy-to-use grammar-based black-box fuzzer
- tree-splicer: Simple grammar-based test case generator
Tiny tools
- drice: Dr. Ice diagnoses internal compiler errors (ICEs) in rustc
- Icemelter: Icemelter automates steps in debugging rustc internal compiler errors (ICEs)
- linkup: Automatically add links to Markdown files
- Marki: Generate Anki cards from Markdown notes
- mdlynx: Small, fast tool to find broken file links in Markdown documents
- ttlint: tiny text linter
Tiny libraries
- fin-part-ord: Crate for representing finite partial orders
- tree-sitter-edit: A crate for printing modified tree-sitter parse trees, intended for use in multi-language code refactoring, linting, or modification (codemod) tools
- twjsrs: Crate for (de)serializing TiddlyWiki tiddlers to and from their JSON format
Other
- anki-wikidata: Create high-quality Anki cards with data from Wikidata
- contract.el: Racket-style higher-order contracts for Emacs Lisp
- coq-big-o: A formalization of Big O, Big Theta, and more based on normed vector spaces
- nixpkgs-blight: Instrument the builds of tens of thousands of open-source packages
- Spacelix: Spacemacs-like configuration for Helix
- sql-cli: Dynamically generate CLIs from SQL databases that support CRUD operations
- write-yourself-a-scheme-in-agda: Like “Write Yourself a Scheme in 48 Hours”, but in Agda
- zsh-contextual-abbrevs: Supercharged context-dependent aliases for ZSH that save you keystrokes!
Writing
- Publications:
  - Bring Your Own Data Structures to Datalog (OOPSLA ’23, distinguished paper award)
- Galois Blog:
  - MATE: Interactive Program Analysis with Code Property Graphs
  - A Visual Guide to Pointer Analysis with cclyzer++: Part 1
  - cclyzer++: Scalable and Precise Pointer Analysis for LLVM
  - Under-Constrained Symbolic Execution with Crucible
  - Introducing GREASE: An Open-Source Tool for Uncovering Hidden Vulnerabilities in Binary Code
- Personal Blog:
Survey of static reasoning in Haskell
This post surveys a few techniques for achieving high confidence in Haskell code at type-checking time. (It will not cover any aspects of testing.) To compare the approaches, I'll implement a single interface with each of them.
The surveyed techniques are fairly complicated. I'll try to explain a
bit about each, but please refer to their documentation for additional
background. This post will also assume familiarity with a lot of
Haskell: existential quantification
(-XExistentialQuantification, -XGADTs, or
-XRank2Types), existential quantification with higher-rank
polymorphism (-XRankNTypes), safe coercions
(Data.Coerce), phantom types, constraint kinds
(-XConstraintKinds), GADTs, and data kinds
(-XDataKinds and -XPolyKinds).
This post probably should have included hs-to-coq, but I'm unfamiliar with it and perfection is the enemy of completion.
The Specification
The interface in question is for lists of key/value pairs, also known as association lists. For the sake of brevity, we'll only require four operations:
- empty: An empty association list
- insert: Takes a key and value and inserts the pair into the list
- lookup: Takes a key and returns a value associated with it
- fmap: The association list type must be a functor over the value type
The point is to do a broad-strokes comparison, so we'll adopt a very minimal specification, only requiring that
forall k v assoc. lookup k (insert k v assoc) == v
along with the usual functor laws. The asymptotics should be:
- empty: O(1)
- insert: O(1)
- lookup: O(n)
- fmap f: O(n) if f takes constant time
Vanilla Haskell
First, let's see what an implementation in "plain Haskell" might look like:
{-# LANGUAGE DeriveFunctor #-}
module Vanilla
( Assoc
, empty
, insert
, lookup
, lookupMaybe
) where
import Prelude hiding (lookup)
newtype Assoc k v = Assoc [(k, v)]
deriving Functor
empty :: Assoc k v
empty = Assoc []
insert :: k -> v -> Assoc k v -> Assoc k v
insert k v (Assoc assoc) = Assoc ((k, v):assoc)
lookup :: Eq k => k -> Assoc k v -> v
lookup k (Assoc assoc) =
case assoc of
[] -> error "Failure!"
((k', v):rest) ->
if k == k'
then v
else lookup k (Assoc rest)
Short and sweet! Using the API is pretty simple, too:
test :: String
test = lookup 2 (insert 2 "" empty)
But there's one problem: lookup calls error,
which can cause the program to fail at run-time. How can we gain more
assurance that a program written using this interface won't
unexpectedly fail? A common answer would be to reify the partiality in
the return type, e.g., with Maybe or Either:
lookupMaybe :: Eq k => k -> Assoc k v -> Maybe v
lookupMaybe k (Assoc assoc) =
case assoc of
[] -> Nothing
((k', v):rest) ->
if k == k'
then Just v
else lookupMaybe k (Assoc rest)
This amounts to passing the buck. Every caller of
lookupMaybe has to decide how to handle the
Nothing case; they can choose to "infect" their own
return type with partiality, or to risk a run-time error by using a
partial function like fromJust :: Maybe a -> a.
In general, API authors choose between four options to deal with partiality:¹
1. Run-time failure, as in lookup
2. Returning a "default" value, as in a division function that returns zero when given a zero divisor
3. Reifying the partiality in the return type, as in lookupMaybe
4. Restricting the function's domain, e.g., head :: NonEmpty a -> a
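As a refresher on option (4), restricting the domain makes the partial case unrepresentable. A standalone sketch (base's Data.List.NonEmpty already ships an equivalent head):

```haskell
import Data.List.NonEmpty (NonEmpty ((:|)))

-- Total by construction: a NonEmpty list always has a first element,
-- so there is no empty-list case to fail on.
headNE :: NonEmpty a -> a
headNE (x :| _) = x
```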
What if we want to avoid the runtime failure of (1) and the ambiguity of
(2), and give callers more flexibility than (3)? (In the
lookup example, it's not even clear that (4) is at all
applicable.) Well, hold on to your hats, because there are at least four
different ways to do it, and they all involve quite a bit of type-system
tomfoolery.
Smart Constructors and the Existential Trick
Types are the main tool that Haskell programmers use to get static
guarantees about their programs. But type signatures like
Int -> Bool encode statements about sets of values (in
this case, this is a function that will take any Int as
an input), whereas the precondition of lookup is a
relationship between the particular values of the key and association
list. lookup requires that this key is present in this
specific association list. We need some way to attach unique type-level
names to each association list, and to tie keys to particular names.
First, we'll add a type variable name to both parameters
that acts as a unique identifier for the association list.
Assoc name k v is an association list with name
name, and Key name k is a key into the
association list with name name.
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE ExistentialQuantification #-}
module Smart
( Assoc
, Key
, SomeAssoc(..)
, empty
, insert
, isKey
, lookup
) where
import Prelude hiding (lookup)
newtype Assoc name k v = Assoc [(k, v)]
deriving Functor
newtype Key name k = Key k
The question is now: How do we make sure each new Assoc has
a fresh name and can't be confused with any other
Assoc? The trick that makes this work is existential
quantification. Consider the type Something:
data Something = forall a. Something a
Say that I give you two Something values, and you pattern
match on them:
you :: Something -> Something -> Int
you (Something x) (Something y) = _
me :: Int
me = you (Something ()) (Something "foo")
What can you do with the data I've given you? In short, pretty much
nothing. You can't use x and y in any
meaningful way; they have completely unknown and unrelated types (GHC
will name their types something like a and
a1). In other words, we've given x and
y unique type-level labels. The same trick will work for
Assoc!
data SomeAssoc k v = forall name. SomeAssoc (Assoc name k v)
empty :: SomeAssoc k v
empty = SomeAssoc (Assoc [])
Since we didn't export the constructor of Assoc but we
did export that of SomeAssoc, the only way to make a new
association list is to pattern match on the existentially-quantified
empty value, which invents a new type-level
name for each Assoc.
insert and lookup work much the same as before, modulo
slightly changed types:
insert :: k -> v -> Assoc name k v -> Assoc name k v
insert k v (Assoc assoc) = Assoc ((k, v):assoc)
lookup :: Eq k => Key name k -> Assoc name k v -> v
lookup (Key k) (Assoc assoc) =
case assoc of
[] -> error "Impossible"
((k', v):rest) ->
if k == k'
then v
else lookup (Key k) (Assoc rest)
It looks like lookup is still partial, but by carefully
controlling how clients get a Key, we can ensure that it's
total. And the only way we'll let them do so is with isKey
(note that the constructor of Key, like that of
Assoc, is not exported):
isKey :: Eq k => k -> Assoc name k v -> Maybe (Key name k)
isKey k (Assoc assoc) =
case assoc of
[] -> Nothing
((k', _):rest) ->
if k == k'
then Just (Key k)
else isKey k (Assoc rest)
It's basically identical to lookup: it searches through
the list and only yields a Key if k really is
a key in the list. Since this is the only way to get a Key,
every lookup is guaranteed to succeed.
Here's an example usage:
test :: String
test =
case empty of
SomeAssoc emp ->
let assoc = insert 2 "" emp
in case isKey 2 assoc of
Just k -> lookup k assoc
Nothing -> error "not a key"
Note that while isKey introduces a Maybe, the
actual lookup operation doesn't. You can, e.g., store keys
in a list and look them up later, without introducing partiality at the
lookup. Needing isKey at all still isn't ideal; one way to avoid it
in some cases would be to have insert return a Key.
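That last idea can be sketched as a hypothetical library-side addition (it reuses the unexported Key and Assoc constructors, so only this module could define it):

```haskell
-- Hypothetical variant of insert: since it just put the key into the
-- list, it can return a Key as evidence, with no Maybe in sight.
insert' :: k -> v -> Assoc name k v -> (Key name k, Assoc name k v)
insert' k v (Assoc assoc) = (Key k, Assoc ((k, v) : assoc))
```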
GADTs, or Dependent Types
Dependent types are (roughly speaking) types that can mention values. Haskell doesn't have "full" dependent types, but GADTs make the type system quite expressive. Let's see what we can do with them!
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TypeOperators #-}
module GADT
( Assoc
, NatRepr(..)
, Key
, empty
, insert
, isKey
, lookup
) where
import Prelude hiding (lookup)
import Data.Kind (Type)
import Data.Type.Equality (TestEquality(testEquality), (:~:)(Refl))
import GHC.TypeLits (Nat, type (+))
For the sake of concreteness, we'll fix the keys to be natural numbers:
data NatRepr (n :: Nat) where
Zero :: NatRepr 0
Succ :: NatRepr n -> NatRepr (n + 1)
NatRepr is a singleton type, meaning that each value has a
unique type. For example, Zero :: NatRepr 0 and
Succ Zero :: NatRepr 1.
Just as lookup had an Eq constraint above,
NatRepr needs a TestEquality instance, which
is similar but for
singletons:
instance TestEquality NatRepr where
testEquality n m =
case (n, m) of
(Succ _, Zero) -> Nothing
(Zero, Succ _) -> Nothing
(Zero, Zero) -> Just Refl
(Succ n', Succ m') ->
case testEquality n' m' of
Just Refl -> Just Refl
Nothing -> Nothing
The Assoc type carries a type-level list of its keys:
data Assoc (keys :: [Nat]) (a :: Type) where
Nil :: Assoc '[] a
Cons :: NatRepr n -> a -> Assoc keys a -> Assoc (n ': keys) a
Because this is a GADT, we can no longer derive Functor; we
have to write it ourselves:
instance Functor (Assoc keys) where
fmap f =
\case
Nil -> Nil
Cons n a rest -> Cons n (f a) (fmap f rest)
empty has an empty type-level list of keys:
empty :: Assoc '[] a
empty = Nil
insert adds one key to the front of the type-level list. In
fact, insert is just Cons:
insert :: NatRepr n -> a -> Assoc keys a -> Assoc (n ': keys) a
insert = Cons
As in the smart constructor approach, we use a new type Key
to track the relationship between a key and an association list. Unlike
the smart constructor approach, we don't have to trust the
implementation of isKey; the types force it to be correct.
data Key (n :: Nat) (keys :: [Nat]) where
KeyHere :: Key n (n ': l)
KeyThere :: Key n keys -> Key n (m ': keys)
isKey :: NatRepr n -> Assoc keys a -> Maybe (Key n keys)
isKey n assoc =
case assoc of
Nil -> Nothing
Cons m _ rest ->
case testEquality n m of
Just Refl -> Just KeyHere
Nothing ->
case isKey n rest of
Nothing -> Nothing
Just key -> Just (KeyThere key)
In lookup, GHC's pattern match checker can tell that if
there's a Key n keys, then the type-level list
keys can't be empty, so there's no need to handle the
Nil case:
lookup :: Key n keys -> Assoc keys a -> a
lookup key assoc =
case (key, assoc) of
(KeyHere, Cons _ a _) -> a
(KeyThere key', Cons _ _ rest) -> lookup key' rest
Here's an example usage:
test :: String
test =
case isKey two assoc of
Just k -> lookup k assoc
Nothing -> error "not a key"
where
two = Succ (Succ Zero)
assoc = insert two "" empty
Ghosts of Departed Proofs
The Ghosts of Departed Proofs (GDP) paper (PDF) outlined several insights that supercharge the smart constructor/existential trick above.
It's worth noting that I'm not an expert in GDP.
GDP Infrastructure
The following introduction to the core GDP infrastructure is fairly
cursory; please check out the paper for more information. I'm defining
all this here to make this post self-contained, but the combinators in
this module are available in the gdp package on
Hackage.
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE DefaultSignatures #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TypeOperators #-}
module GDP.Named
( Named
, name
, the
, SuchThat
, suchThat
, Defn
, Defining
, defn
, Proof
, axiom
, type (-->)
, implElim
) where
import Prelude hiding (lookup)
import Data.Coerce (Coercible, coerce)
First, the Named type and name function
generalize the existential introduction of type-level names for values,
using an equivalence between rank-2 types (the forall in
the second parameter to name) and existential types:
-- | Called @~~@ in GDP.
newtype Named name a = Named a
deriving Functor
name :: a -> (forall name. Named name a -> t) -> t
name x k = coerce k x
{-# INLINE name #-}
To extract a Named value, clients use the function the:
class The x a | x -> a where
the :: x -> a
default the :: Coercible x a => x -> a
the = coerce
instance The (Named name a) a where
Modules can give type-level names to functions using Defn
(demonstrated later):
data Defn = Defn
type Defining name = (Coercible Defn name, Coercible name Defn)

-- | Wrap a function's result in its type-level name; used below as
-- GDP.defn when defining 'lookup' and 'insert'.
defn :: Defining name => a -> Named name a
defn = coerce
Type-level propositions are called Proof, and can be
manipulated with axioms like implElim:
data Proof a = Proof -- called QED in GDP
axiom :: Proof a
axiom = Proof
data p --> q
implElim :: Proof (p --> q) -> Proof p -> Proof q
implElim _ _ = axiom
Type-level propositions can be attached to values with
SuchThat (which can also be cast away with
the):
-- | Called @:::@ in GDP.
newtype SuchThat p a = SuchThat a
deriving Functor
instance The (SuchThat p a) a where
suchThat :: a -> Proof p -> SuchThat p a
suchThat x _ = coerce x
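Before moving on, a quick sanity check of how name and the fit together (a usage sketch, not part of the module's API):

```haskell
-- 'name' gives 5 a fresh type-level name; the callback can unwrap it
-- with 'the', but cannot let the fresh name itself escape.
example :: Int
example = name (5 :: Int) (\n -> the n + 1)  -- evaluates to 6
```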
GDP Implementation of Association Lists
Now let's take a look at using those tools to provide more assurance
for the Assoc API. Happily, we can directly reuse
Assoc and empty from the vanilla Haskell
implementation!
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE ExplicitNamespaces #-}
{-# LANGUAGE RoleAnnotations #-}
{-# LANGUAGE TypeOperators #-}
module GDP
( Assoc
, Vanilla.empty
, IsKey
, Insert
, insert
, insertIsKey
, Lookup
, lookup
, LookupMaybe
, lookupMaybe
, lookupMaybeKey
) where
import Prelude hiding (lookup)
import qualified GDP.Named as GDP
import GDP.Named (type (-->))
import GDP.Maybe (IsJust)
import qualified GDP.Maybe as GDP
import qualified Vanilla as Vanilla
import Vanilla (Assoc)
Instead of a separate Key type, we create a type-level
proposition IsKey. A
GDP.Proof (IsKey kn assocn) means that the key of type
GDP.Named kn k is present in the association list of type
GDP.Named assocn (Assoc k v).
-- | @kn@ is a key of association list @assocn@
data IsKey kn assocn
type role IsKey nominal nominal
lookup takes a key with an additional IsKey
proof as a parameter and then just shells out to the underlying vanilla
Haskell lookup implementation. The return type and the
Lookup newtype will be explained in just a moment.
-- | Type-level name of 'lookup' for use in lemmas.
newtype Lookup kn assocn = Lookup GDP.Defn
type role Lookup nominal nominal
lookup ::
Eq k =>
GDP.SuchThat (IsKey kn assocn) (GDP.Named kn k) ->
GDP.Named assocn (Assoc k v) ->
GDP.Named (Lookup kn assocn) v
lookup k assoc =
GDP.defn (Vanilla.lookup (GDP.the (GDP.the k)) (GDP.the assoc))
insert works much the same way, but it doesn't have any
preconditions and so doesn't use SuchThat:
-- | Type-level name of 'insert' for use in lemmas.
newtype Insert kn vn assocn = Insert GDP.Defn
type role Insert nominal nominal nominal
insert ::
GDP.Named kn k ->
GDP.Named vn v ->
GDP.Named assocn (Assoc k v) ->
GDP.Named (Insert kn vn assocn) (Assoc k v)
insert k v assoc =
GDP.defn (Vanilla.insert (GDP.the k) (GDP.the v) (GDP.the assoc))
So how do clients construct a
GDP.SuchThat (IsKey kn assocn) (GDP.Named kn k) to pass to
lookup? One of the nice parts of GDP is that library
authors can determine how clients reason about propositions they define,
like IsKey. In particular, library authors can introduce
type-level names for their API functions like Lookup and
Insert, and export axioms that describe how the API relates
to the propositions.
One way to know that a key is in an association list is that we just inserted that key. We can express this fact by exporting an axiom:
insertIsKey :: GDP.Proof (IsKey kn (Insert kn vn assocn))
insertIsKey = GDP.axiom
With that lemma, here's an example use of this safe API:
test :: String
test =
GDP.name Vanilla.empty $ \emp ->
GDP.name (2 :: Int) $ \k ->
GDP.name "" $ \v ->
GDP.the $
lookup
(k `GDP.suchThat` insertIsKey)
(insert k v emp)
We can also conclude IsKey when lookupMaybe
returns a Just value, kind of like what isKey
did in previous examples:
-- | Type-level name of 'lookupMaybe' for use in lemmas.
newtype LookupMaybe kn assocn = LookupMaybe GDP.Defn
lookupMaybe ::
Eq k =>
GDP.Named kn k ->
GDP.Named assocn (Assoc k v) ->
GDP.Named (LookupMaybe kn assocn) (Maybe v)
lookupMaybe k assoc =
GDP.defn (Vanilla.lookupMaybe (GDP.the k) (GDP.the assoc))
lookupMaybeKey ::
GDP.Proof (IsJust (LookupMaybe kn assocn) --> IsKey kn assocn)
lookupMaybeKey = GDP.axiom
test2 :: String
test2 =
GDP.name Vanilla.empty $ \emp ->
GDP.name (2 :: Int) $ \k ->
GDP.name "" $ \v ->
GDP.the $
let assoc = insert k v emp
in case GDP.maybeCase (lookupMaybe k assoc) of
GDP.IsNothing _ -> error "impossible"
GDP.IsJust isJustPf _ ->
lookup
(k `GDP.suchThat` GDP.implElim lookupMaybeKey isJustPf)
assoc
where IsJust and maybeCase are defined in
GDP.Maybe:
{-# LANGUAGE ExplicitNamespaces #-}
{-# LANGUAGE TypeOperators #-}
module GDP.Maybe
( IsNothing
, IsJust
, MaybeCase(IsNothing, IsJust)
, maybeCase
) where
import qualified GDP.Named as GDP
data IsJust name
data IsNothing name
data MaybeCase name a
= IsNothing (GDP.Proof (IsNothing name))
| IsJust (GDP.Proof (IsJust name)) a
-- | Called @classify@ in GDP.
maybeCase :: GDP.Named name (Maybe a) -> MaybeCase name a
maybeCase x =
case GDP.the x of
Nothing -> IsNothing GDP.axiom
Just a -> IsJust GDP.axiom a
Such code could clearly be defined once and for all in a library somewhere, and probably even generated by Template Haskell.
Liquid Haskell
Liquid Haskell adds refinement types to Haskell, which are kind of like dependent types. It operates as a GHC type-checking plugin, and doesn't affect the runtime semantics of Haskell. I'm no Liquid Haskell expert, but luckily one of the case studies in the documentation is on association lists. What follows is a slight adaptation.
{-# LANGUAGE LambdaCase #-}
module Liquid
( Assoc
, empty
, insert
, lookup
) where
import Prelude hiding (lookup)
import Data.Set (Set)
import qualified Data.Set as Set
The Assoc type is isomorphic to the vanilla implementation:
data Assoc k v
= Nil
| Cons k v (Assoc k v)
keys retrieves the set of keys in the association list.
It's only used at the type level. Using keys, we can
refine the type of empty to state that empty
has no keys. Types like { v : T | C v } should be read like
"values of type T that additionally satisfy constraint
C". The base type T can almost always be
omitted, as it can be inferred from the vanilla-Haskell type signature.
{-@ measure keys @-}
keys :: Ord k => Assoc k v -> Set k
keys =
\case
Nil -> Set.empty
Cons k _ rest -> Set.union (Set.singleton k) (keys rest)
{-@ empty :: { v : _ | keys v == Set.empty } @-}
empty :: Assoc k v
empty = Nil
Similarly, the function addKey is used to refine the type
of insert:
{-@ inline addKey @-}
addKey :: Ord k => k -> Assoc k v -> Set k
addKey k kvs = Set.union (Set.singleton k) (keys kvs)
{-@ insert :: k : _ -> _ -> assoc : _ -> { v : _ | keys v = addKey k assoc } @-}
insert :: k -> v -> Assoc k v -> Assoc k v
insert = Cons
Finally, has is used to express the precondition on
lookup:
{-@ inline has @-}
has :: Ord k => k -> Assoc k v -> Bool
has k assoc = Set.member k (keys assoc)
{-@ lookup :: assoc : _ -> k : { key : _ | has key assoc } -> v @-}
lookup :: Eq k => Assoc k v -> k -> v
lookup assoc k =
case assoc of
Nil -> error "Impossible"
Cons k' v rest ->
if k == k'
then v
else lookup rest k
Note that the order of parameters to lookup has to be
flipped so that assoc is in scope in the refinement type
for k.
Using this API looks almost exactly like it does for the vanilla implementation:
test :: String
test = lookup (insert 2 "" empty) 2
The type-checker will catch invalid calls to lookup:
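For example, looking up a key that was never inserted (this call is reconstructed from the error message below, which reports it at line 58):

```haskell
test2 :: String
test2 = lookup (insert 2 "" empty) (3 :: Int)
```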
src/Liquid.hs:58:36-45: error:
Liquid Type Mismatch
.
The inferred type
VV : {v : GHC.Types.Int | v == 3}
.
is not a subtype of the required type
VV : {VV : GHC.Types.Int | Set_mem VV (Liquid.keys ?d)}
.
in the context
?b : {?b : (Liquid.Assoc GHC.Types.Int [GHC.Types.Char]) | Liquid.keys ?b == Set_empty 0}
?d : {?d : (Liquid.Assoc GHC.Types.Int [GHC.Types.Char]) | Liquid.keys ?d == Set_cup (Set_sng 2) (Liquid.keys ?b)}
|
58 | test2 = lookup (insert 2 "" empty) (3 :: Int)
| ^^^^^^^^^^
Conclusions
Smart Constructors and the Existential Trick
Smart constructors and the existential trick are likely the most
accessible of all of the above techniques. They require only a single
fairly conservative extension to Haskell 2010
(-XExistentialQuantification or -XRank2Types).
However, the safety argument relies on the correct implementation of the
newtype abstraction by the library author; in this case it
would be easy enough to accidentally export the Key
constructor or provide some other way of constructing a Key
that violates the intended invariant. Also, a new newtype
and smart constructor generally have to be custom-built for each desired
safety property.
GADTs, or Dependent Types
Dependent types significantly reduce the possibility of programmer
error. Once appropriate definitions for Assoc and
Key are found, it is impossible for the library author to
mis-implement isKey. Like smart constructors and the
vanilla Haskell code, they require no external tooling nor libraries.
However, programming with dependently-typed techniques has its
downsides:
- It requires knowledge of lots of complex GHC extensions like GADTs, type families (-XTypeFamilies), data kinds/polykinds, and rank-n types.
- Since types carry proof-relevant information (such as the keys type index on the Assoc type), you can end up with a ton of slight variations on a single interface depending on the properties you need to prove.
- It's hard to write dependently-typed code, so in practice you can end up with inefficient code due to the difficulty of proving efficient code correct. This is pretty much why this post focused on association lists instead of a more efficient map structure: I think it would probably be pretty hairy to write one up with GADTs. Consider also that Key is linearly sized in the size of the association map.
- GADTs are not well-supported by existing infrastructure, so we can't e.g. derive TestEquality nor Functor instances; we have to write them ourselves. It's also generally impossible even to hand-write instances for hugely useful typeclasses like Generic and Data. In fact, many important type classes relating to singletons aren't even defined in base (see, e.g., the typeclasses in parameterized-utils). Libraries like singletons and parameterized-utils do have Template Haskell techniques for generating instances for certain of these classes.
- Error messages involving dependently typed Haskell extensions continue a venerable tradition of flummoxing learners.
I'm very familiar with dependent types, which likely partially accounts for why this list is so extensive.
GDP
The GDP approach is remarkably flexible. The ability to define arbitrary
lemmas about an API gives the API consumers a huge amount of power. For
example, it was possible to provide two ways to produce an
IsKey proof without changing the actual implementation.
Furthermore, the GDP approach was able to directly reuse the vanilla
Haskell implementation by simply providing wrappers with more
interesting type signatures.
On the other hand, understanding the GDP approach requires a good deal of study. GDP also requires trusting the library author to provide sound reasoning principles. The GDP paper outlines a few possible approaches (including use of Liquid Haskell) to improve confidence in the exported lemmas.
Liquid Haskell
Liquid Haskell was a bit difficult to install: some of the project's
packages had too-restrictive bounds (resulting in Cabal being unable to
find a build plan), whereas others had too-permissive bounds (resulting
in compilation failures when Cabal did find a plan). Generally, Liquid
Haskell is the unfortunate combination of an impressive amount of
infrastructure and code that's maintained mostly by academics. They've
done a great job of making it fairly easy to get started, but I
wouldn't rely on it for a commercial project, for mundane reasons like
not being able to upgrade to the latest GHC until the Liquid Haskell
team releases a corresponding version of liquid-base.
Liquid Haskell sometimes requires you to write code in a particular way,
rejecting semantically equivalent (and arguably, clearer) rewrites. For
instance, I originally tried to write keys like so:
{-@ measure keys @-}
keys :: Ord k => Assoc k v -> Set k
keys =
\case
Nil -> Set.empty
Cons k _ rest -> Set.insert k (keys rest)
That didn't work; Liquid Haskell said
Unbound symbol Data.Set.Internal.insert. So I tried to
factor the insert-via-union into a where binding for the
sake of clarity, like so:
{-@ measure keys @-}
keys :: Ord k => Assoc k v -> Set k
keys =
\case
Nil -> Set.empty
Cons k _ rest -> setInsert k (keys rest)
where setInsert x = Set.union (Set.singleton x)
This resulted in
Uh oh. Cannot create measure 'Liquid.keys': Does not have a case-of at the top-level. If I understand correctly, this is
probably because Liquid Haskell analyzes code at the GHC Core level, so
it's sensitive to changes that look similar at the source level but
compile to different Core. Such errors are annoying, but I was able to
find workarounds whenever they came up. Just like "dependent Haskell",
Liquid Haskell error messages leave a lot to be desired.
Like dependently typed Haskell and GDP, actually understanding Liquid Haskell would take at least several hours if not days of study. That being said, I was able to get started with this example in maybe an hour or so using their documentation.
Aside from the type signatures, the Liquid Haskell implementation looks pretty much exactly like the "vanilla" implementation. Remarkably, the API is also almost identical, and users could choose whether or not to use (and understand!) Liquid Haskell when consuming the API, unlike in the other techniques.
As in the dependently-typed approach, less trust in the library author is required.
Like GDP, I believe that Liquid Haskell provides the ability to export lemmas for consumers to use in constructing proofs. Unlike GDP, I believe all such lemmas can be machine-checked.
Comparison and Takeaways
The following table summarizes the above remarks. A table can't have much nuance and I'm not an expert in GDP nor Liquid Haskell, so the ratings should be taken with a hefty grain of salt.
Key:
- Trust: How much you need to trust the library author, i.e., to what degree are the claims of API safety machine-checked? (Lower is better.)
- Knowledge: How much do you need to learn to effectively apply this technique? (Lower is better.)
- Flexibility: How much flexibility do library consumers have to prove that they're using the API safely? (Higher is better.)
- Infrastructure: How much infrastructure is necessary to make the approach work? (Lower is better.)
- Reuse: How reusable are the components, e.g., if you needed to prove a different property or construct a larger API? (Higher is better.)
Approach  Trust  Knowledge  Flexibility  Infrastructure  Reuse
Vanilla   n/a    Low        n/a          None            n/a
Smart     High   Med        Low          Low             Low
GADTs     Low    High       Med          Med             Med
GDP       High   High       High         High            High
Liquid    Low    High       High         Very high       High
My overall take is that all of these techniques are under-utilized and under-explored. GADTs/dependent types, GDP, and Liquid Haskell would all be much easier to write if there were more widely-available libraries for programming with them. Given the attention that dependent types have received in the Haskell community, I think GDP and Liquid Haskell are especially neglected.
As the table shows at a glance, these techniques offer a wide range of benefits and drawbacks, which indicates that they're likely appropriate for different use-cases. Broad adoption of several would enable Haskell programmers to use techniques because they're appropriate to their goals, rather than because they have a paucity of options.
Footnotes
1. These are taken from the Ghosts of Departed Proofs paper.
Fun with mypy: reifying runtime relations on types
I'm a big fan of type checkers for Python, like Mypy. One fun aspect of these tools is that their type systems tend to have features that reflect the dynamic nature of Python, like type narrowing. This post describes another, less-documented way to use runtime evidence of type information.
First, consider the cast operation in the
typing module. The
docs say:
typing.cast(ty, val): Cast a value to a type.
This returns the value unchanged. To the type checker this signals that the return value has the designated type, but at runtime we intentionally don't check anything (we want this to be as fast as possible).
In other words, it's up to the programmer to ensure that the runtime
type of the value argument really is a subtype of the type argument.
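To make the danger concrete, here's a small sketch (the values are made up for illustration):

```python
from typing import cast

# cast returns its argument unchanged; the "int" claim is a lie that
# the type checker believes and the runtime never verifies.
x = cast(int, "definitely not an int")
print(type(x))  # still <class 'str'> at runtime
```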
What if we could make a version of cast, perhaps
safe_cast, that makes sure that's the case? It would need
to meet two requirements:
- Performance: No overhead at calls to safe_cast, just like cast. No assertions, no conditionals.
- Safety: It should be the case that every call to safe_cast is guaranteed (up to bugs in Mypy or programmers doing something obviously tricky) to succeed.
Well, there's good news! We can:
from __future__ import annotations
from dataclasses import dataclass
from typing import Generic, Type, TypeVar, final
Sub = TypeVar("Sub")
Super = TypeVar("Super")
@final
class Subclass_(Generic[Sub, Super]):
def __init__(self, sub: Type[Sub], sup: Type[Super]) -> None:
assert issubclass(sub, sup)
def safe_cast(evidence: Subclass_[Sub, Super], value: Sub) -> Super:
return value # type: ignore[return-value]
Here's what it looks like at the REPL:
>>> Subclass_(bool, int)
Subclass_()
>>> Subclass_(bool, str)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/blog/reify.py", line 13, in __init__
assert issubclass(sub, sup)
AssertionError
>>> safe_cast(Subclass_(bool, int), True) + 2
3
So how does this work? Well, if you've got a value of type
Subclass_[Sub, Super] in scope, then (unless you've done something
really tricky) it must be the case that at some point, the assertion
written in Subclass_.__init__ passed. This means that
Sub is a subclass of Super, and so by the
Liskov substitution principle you can use any value of type
Sub in a context where a Super is expected.
To understand the technique in generality, let's refine this example and take a look at a few more.
Subclass
For the sake of usability, let's extend the code with two additional features:
- Throw a custom exception instead of using assertions
- Add the types as fields (for reasons explained below)
@dataclass(frozen=True)
class NotASubclass(Generic[Sub, Super], Exception):
sub: Type[Sub]
sup: Type[Super]
def __str__(self) -> str:
return f"{self.sub} is not a subclass of {self.sup}"
@final
@dataclass
class Subclass(Generic[Sub, Super]):
sub: Type[Sub]
sup: Type[Super]
def __init__(self, sub: Type[Sub], sup: Type[Super]) -> None:
self.sub = sub
self.sup = sup
if not issubclass(sub, sup):
raise NotASubclass(sub, sup)
So why add the types as fields? It's not necessary, but it does extend the reach of our reasoning by enabling us to add "lemmas" like this:
T1 = TypeVar("T1")
T2 = TypeVar("T2")
T3 = TypeVar("T3")
def subclass_transitive(
evidence1: Subclass[T1, T2],
evidence2: Subclass[T2, T3]) -> Subclass[T1, T3]:
return Subclass(evidence1.sub, evidence2.sup)
If you don't like the memory overhead, you can instead "axiomatize"
lemmas, by returning a sentinel value such as
Subclass[object, object] with a
# type: ignore[return-value] comment.
SameType
Subclassing is a partial order on types, and in particular, it's antisymmetric. If two types are both subclasses of each other, they're the same type:
@final
@dataclass
class SameType(Generic[T1, T2]):
ty: Type[T1]
def __init__(self, evidence1: Subclass[T1, T2],
evidence2: Subclass[T2, T1]) -> None:
self.ty = evidence1.sub
# These would be necessary for writing any lemmas:
def get1(self) -> Type[T1]:
return self.ty
def get2(self) -> Type[T2]:
# Could also save the evidence and use safe_cast here
return self.ty # type: ignore[return-value]
Here, we don't need a custom exception type nor any assertion in the
constructor, since both of these are "inherited" from
Subclass.
Some ideas for lemmas:
- SameType[T1, T2] implies Subclass[T1, T2]
- SameType is symmetric
- SameType is transitive
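One of these can be sketched concretely. Here's a hypothetical same_type_symmetric lemma, restating minimal versions of Subclass and SameType from above so the snippet stands alone (an assert stands in for the custom exception):

```python
from dataclasses import dataclass
from typing import Generic, Type, TypeVar, final

T1 = TypeVar("T1")
T2 = TypeVar("T2")


@final
@dataclass
class Subclass(Generic[T1, T2]):
    sub: Type[T1]
    sup: Type[T2]

    def __init__(self, sub: Type[T1], sup: Type[T2]) -> None:
        self.sub = sub
        self.sup = sup
        assert issubclass(sub, sup)


@final
@dataclass
class SameType(Generic[T1, T2]):
    ty: Type[T1]

    def __init__(self, evidence1: Subclass[T1, T2],
                 evidence2: Subclass[T2, T1]) -> None:
        self.ty = evidence1.sub

    def get1(self) -> Type[T1]:
        return self.ty

    def get2(self) -> Type[T2]:
        return self.ty  # type: ignore[return-value]


def same_type_symmetric(ev: SameType[T1, T2]) -> SameType[T2, T1]:
    # Rebuild both Subclass witnesses from the stored type: since the
    # original SameType was constructible, both directions must hold.
    return SameType(Subclass(ev.get2(), ev.get1()),
                    Subclass(ev.get1(), ev.get2()))
```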
InTuple
Now for something a bit more involved. Recall PEP 585: Type Hinting
Generics In Standard
Collections. This PEP
obviates typing.Dict, typing.List,
typing.Tuple, and more by allowing us to write the
corresponding types directly inside type signatures:
def repeat(i: int, n: int) -> list[int]:
pass
Old style (equivalent):
from typing import List
def repeat_old(i: int, n: int) -> List[int]: pass
Crucially, types like `dict[int, str]` are also values:
>>> dict[int, str]
dict[int, str]
Moreover, their type arguments are available at runtime:
>>> dict[int, str].__args__
(<class 'int'>, <class 'str'>)
So what can we do with this? Here's a generic class
InTuple[T, Tup]. It has two type parameters, the second of
which is bounded above by Tuple[Any, ...]. If you can
construct one, it means that the type T is one of the types
in the tuple type Tup.
from typing import Any, Tuple
# For some reason, tuple[Any, ...] doesn't work here, even though it is valid as
# a value at the REPL.
Tup = TypeVar("Tup", bound=Tuple[Any, ...])
T = TypeVar("T")
@dataclass(frozen=True)
class NotInTuple(Generic[T, Tup], Exception):
ty: Type[T]
tup: Type[Tup]
def __str__(self) -> str:
return f"Type {self.ty} is not in tuple type {self.tup}"
@final
@dataclass
class InTuple(Generic[T, Tup]):
"""Witness that ``T`` is in ``Tup``."""
ty: Type[T]
tup: Type[Tup]
def __init__(self, ty: Type[T], tup: Type[Tup]) -> None:
self.ty = ty
self.tup = tup
# Mypy version 0.910 doesn't acknowledge that this attribute is defined,
# but it is.
for tup_ty in tup.__args__: # type: ignore[attr-defined]
if ty == tup_ty:
return
raise NotInTuple(ty, tup)
Let's try it out:
>>> InTuple(int, tuple[int, str])
InTuple(ty=<class 'int'>, tup=tuple[int, str])
>>> InTuple(int, tuple[str, list[str], bool])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/blog/reify.py", line 44, in __init__
raise NotInTuple(ty, tup)
__main__.NotInTuple: Type <class 'int'> is not in tuple type tuple[str, list[str], bool]
So what can we do with this? Well, probably lots of things! We can write a function that gets the first value of a given type out of a tuple, regardless of its index:
def first(evidence: InTuple[T, Tup], tup: Tup) -> T:
    for elem in tup:
        if isinstance(elem, evidence.ty):
            return elem
    assert False, f"Impossible: Bad 'InTuple' value {evidence=} {tup=}"
>>> first(InTuple(int, tuple[str, int, bool, int]), ("string", 5, False, 10))
5
This isn't so impressive when we construct a literal
InTuple value as the first argument, but might be helpful
in more polymorphic scenarios.
It's also worth noting that this function can be a bit unintuitive when combined with certain subclass relationships:
>>> issubclass(bool, int)
True
>>> first(InTuple(int, tuple[bool, int, str, int]), (False, 5, "string", 10))
False
It's worth noting that any of these predicates or relationships could
be negated, simply by inverting the conditional and renaming the class.
You could probably have a lemma that the negation of
InTuple is reflexive, but I can't think of how you'd use
it.
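For instance, here's the InTuple example negated by inverting the conditional; the class and exception names are invented for illustration:

```python
from typing import Any, Generic, Tuple, Type, TypeVar, final

T = TypeVar("T")
Tup = TypeVar("Tup", bound=Tuple[Any, ...])


class InTupleAfterAll(Exception):
    """Raised when the type unexpectedly *is* in the tuple type."""


@final
class NotInTupleWitness(Generic[T, Tup]):
    """Witness that ``T`` is *not* among the element types of ``Tup``."""

    def __init__(self, ty: Type[T], tup: Type[Tup]) -> None:
        self.ty = ty
        self.tup = tup
        # Inverted conditional: constructing this witness fails if ``ty``
        # appears in ``tup``.
        for tup_ty in tup.__args__:  # type: ignore[attr-defined]
            if ty == tup_ty:
                raise InTupleAfterAll(f"{ty} is in {tup}")
```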
IsGenericArgument
In fact, parameters to all generic types are available at runtime:
>>> from typing import Mapping, Generic, TypeVar
>>> Mapping[int, str].__args__
(<class 'int'>, <class 'str'>)
>>> T = TypeVar("T")
>>> class G(Generic[T]):
... pass
...
>>> G[int].__args__
(<class 'int'>,)
So we can handily generalize the previous example just by creating a new
exception type, replacing the Tup variable with something
unconstrained, and renaming the class to IsGenericArgument.
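A hypothetical sketch of the result (the names NotAGenericArgument and IsGenericArgument are my own; the witness accepts any generic alias exposing __args__):

```python
from dataclasses import dataclass
from typing import Any, Generic, Mapping, Type, TypeVar, final

T = TypeVar("T")
G = TypeVar("G")


@dataclass(frozen=True)
class NotAGenericArgument(Generic[T, G], Exception):
    ty: Type[T]
    generic: Any

    def __str__(self) -> str:
        return f"Type {self.ty} is not an argument of {self.generic}"


@final
class IsGenericArgument(Generic[T, G]):
    """Witness that ``T`` appears among the type arguments of ``G``."""

    def __init__(self, ty: Type[T], generic: Any) -> None:
        self.ty = ty
        self.generic = generic
        # Parameters to generic aliases are available via __args__
        if ty not in getattr(generic, "__args__", ()):
            raise NotAGenericArgument(ty, generic)


# For example:
assert IsGenericArgument(int, Mapping[int, str]).ty is int
```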
NewtypeOf
How can we tell if a given type is just a NewType wrapper
around another one? Well, let's investigate:
>>> from typing import NewType
>>> Pos = NewType("Pos", int)
>>> Pos
<function NewType.<locals>.new_type at 0x7fa08729b4c0>
>>> dir(Pos)
[..., '__supertype__']
>>> Pos.__supertype__
<class 'int'>
Ah, that's just what we need!
from typing import Callable
@dataclass(frozen=True)
class NotANewtypeOf(Generic[Sub, Super], Exception):
sub: Callable[[Super], Sub]
sup: Type[Super]
def __str__(self) -> str:
return f'{getattr(self.sub, "__name__", "<unknown>")} is not a newtype of {self.sup}'
@final
@dataclass
class NewtypeOf(Generic[Sub, Super]):
sub: Callable[[Super], Sub]
sup: Type[Super]
def __init__(self, sub: Callable[[Super], Sub], sup: Type[Super]) -> None:
# Mypy has trouble with Callable fields...
self.sub = sub # type: ignore[assignment]
self.sup = sup
if getattr(sub, "__supertype__", None) != sup:
raise NotANewtypeOf(sub, sup)
Some ideas for lemmas:
- Is this transitive?
NewtypeOfimpliesSubtype
Newtype
The above example also leads naturally to our first predicate or
property of interest on types, rather than a relationship between
types, namely, whether or not a given type is in fact a
NewType.
@dataclass(frozen=True)
class NotANewtype(Generic[Sub, Super], Exception):
sub: Callable[[Super], Sub]
def __str__(self) -> str:
return f'{getattr(self.sub, "__name__", "<unknown>")} is not a newtype'
@final
@dataclass
class Newtype(Generic[Sub, Super]):
sub: Callable[[Super], Sub]
def __init__(self, sub: Callable[[Super], Sub]) -> None:
self.sub = sub # type: ignore[assignment]
if getattr(sub, "__supertype__", None) is None:
raise NotANewtype(sub)
More Ideas
What other predicates on and relations between types do we care about? A few ideas:
- Subclassable: Some types aren't! For example, <class 'function'> (try type(lambda: None) in the REPL).
- IsAbstract, or NotAbstract: I don't know if it's possible to check this at runtime, but it would be cool if it were!
- HasMetaclass: Check that a class has a given metaclass.
Do you have other ideas or use-cases? Drop me a line!
A quick tutorial on setoids in Rocq
A setoid is a type T and an
equivalence
relation
(==) : T -> T -> Prop. Coq has some built-in
facilities for working with setoids that can make your life a lot
easier.
Building a setoid
Let's suppose we wanted to implement a quick-and-dirty version of
finite sets on some type A. One option is to use
list A, but lists have two properties that get in the way:
ordering and duplication. Building a Setoid allows us to
abstract away these features, and consider two lists equivalent if they
have the same elements.
Context {A : Type}.
Definition equiv_list (l1 l2 : list A) : Prop :=
∀ a : A, In a l1 <-> In a l2.
To register our new equivalence with Coq and reap the benefits of
automation, we make list A an instance of the
Setoid
typeclass:
Instance ListSetoid : Setoid (list A) :=
{ equiv := equiv_list }.
Proof.
split.
At this point, we're dropped into proof mode and our goal is to prove
that equiv_list is an equivalence relation. To see the full
proof, see the Coq source for this
post
(we use some setoid-related automation to prove this goal).
Immediately, we gain access to a few tactics that work for normal
equality, like
reflexivity,
symmetry
(plus symmetry in [...]), and
transitivity.
setoid_rewrite
The #1 benefit to using setoids is access to the
setoid_rewrite tactic. Rewriting is one of the most
powerful features of the tactic language, and using setoids allows us to
expand its reach.
Suppose we have l1 l2 : list A. If we have
l1 = l2 (where = is Coq's built-in equality),
we can replace l1 with l2 in all contexts:
Lemma hd_eq (l1 l2 : list A) : l1 = l2 -> hd l1 = hd l2.
Proof.
intros l1eql2.
now rewrite l1eql2.
Qed.
This isn't true if we just know l1 == l2 (where
== is
notation
for equiv_list). However, there are certain contexts in
which they are interchangeable. Suppose further that A has
decidable equality, and consider the following remove
function:
Context {deceq : ∀ x y : A, {x = y} + {x ≠ y}}.
Fixpoint remove (x : A) (lst : list A) : list A :=
match lst with
| nil => nil
| cons y ys =>
match deceq x y with
| left _ => remove x ys
| right _ => y :: remove x ys
end
end.
Since this removes all occurrences of x in
lst, we should be able to prove
∀ x l1 l2, l1 == l2 -> remove x l1 == remove x l2
We state this lemma a bit oddly:
Instance removeProper : Proper (eq ==> equiv ==> equiv) remove.
This concisely states "given two equal inputs x y : A and
two equivalent lists l1 l2 : list A,
remove x l1 is equivalent to remove y l2". In
other words, remove respects equality on A
(trivially -- every Coq function respects equality) and the equivalence
relation == (AKA equiv) on
list A.
Once we have an Instance of Proper, we can use
setoid_rewrite to replace remove x l1
with remove x l2, even when they appear nested in the goal
statement:
Instance revProper : Proper (equiv ==> equiv) (@rev A).
Proof.
[...]
Qed.
Lemma remove_rev_remove {a b : A} {l1 l2 : list A} :
l1 == l2 -> rev (remove a l1) == rev (remove a l2).
Proof.
intros H.
now setoid_rewrite H.
Qed.
We can even compose Proper instances automatically:
Require Import Coq.Program.Basics.
Local Open Scope program_scope.
Instance rev_remove_Proper : ∀ a, Proper (equiv ==> equiv) (@rev A ∘ remove a).
Proof.
apply _.
Qed.
The benefits of setoid rewriting continue to increase with the complexity of the goal.
About my notes
These are my working notes. They are not written with an audience in mind, but might be useful to some!
Amd64 System V ABI
Function call sequence
Registers
The first six integer or pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, R9.
Stack layout
On x86_64 with the SysV ABI, the stack grows "downwards" from high addresses to low. The end of the stack is initialized with the ELF auxiliary vector, and functions expect the following data to be available above their stack frame (i.e., just above the address in rsp), from high addresses to low:
- Padding (if necessary)
- Their stack-spilled arguments
- The return address
(The end of the stack-spilled argument list must be 16-byte aligned, which may necessitate the use of padding, depending on the number of spilled arguments and the layout of the caller’s stack frame.)
The following diagram summarizes the stack frame layout:
High addresses
|---------------------|
| Caller's frame |
|---------------------|
| Padding (if needed) |
|---------------------|
| Spilled argument n |
|---------------------|
| ... |
|---------------------|
| Spilled argument 2 |
|---------------------|
| Spilled argument 1 |
|---------------------| <- %rsp + 8 (16-byte aligned)
| Return address |
|---------------------| <- %rsp
| Callee's frame |
|---------------------|
Low addresses
Helpful links:
- https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64
- https://wiki.osdev.org/System_V_ABI#x86-64
- https://refspecs.linuxfoundation.org/elf/x86_64-abi-0.99.pdf
Build systems
See also Make.
Build systems I have enjoyed
Build systems I gave up on
What to look for
Allow or require out-of-tree builds
Justification: Out-of-tree builds are occasionally necessary, for instance if your source tree is mounted on a read-only filesystem. More realistically, out-of-tree builds ensure that you and your build system are aware of all the files that get created and read during the build process, which aids in maintainability, source tree cleanliness, and accurate dependency tracking.
Do only necessary work
As much as your build system supports it, you should use dependency tracking to ensure that you don’t redo any build steps or other tasks.
Example: If package or file B depends on A and B changes, don’t
re-compile, re-lint, or re-test A.
Pin external dependencies as precisely as possible
It’s nice to work with a range of versions of dependencies, but ensure that there is at least one known-good, fully-specified list of dependencies. This usually means a specific commit hash or tarball hash for each dependency.
Justification: This aids in reproducibility.
Example: Poetry and Cargo provide lock files that fully specify dependency versions. CMake and Pip do not. Cabal can be configured to provide and use a “freeze” file.
Use ecosystem-appropriate tooling
To the extent that it’s reasonable, use tooling that’s common in the ecosystem you’re targeting.
Justification: This aids in maintainability by making it easier for likely contributors (that is, people familiar with the ecosystem) to contribute. It can also aid in debugging, in that others may have seen the same issues you’re seeing before.
Example: When building C++, use Make, CMake, or Meson. Don’t use something super obscure.
Example: It’s not always reasonable to use pip or setuptools for Python,
because they have notable flaws (e.g. pip install does extraneous work
when the package hasn’t changed since the last call to pip install). When
using Python on a project, it may be appropriate to consider a different
build/packaging solution.
Utilize content-addressing for build products where possible
Justification: Content-addressing provides an extensional view of when dependencies change, leading to fewer rebuilds, i.e. speed.
Example: If package B depends on A and the README of A changes, but
doesn’t affect the build products that B uses, don’t rebuild B.
Don’t modify the user’s system
The build system should only modify files inside the build directory. In
particular, it shouldn’t install system-wide packages. There are certain
exceptions, for instance it’s OK for Cabal’s Nix-style builds to install
packages in ~/.cabal because they are hermetically built.
Justification: This aids portability: if the build modifies the user’s system, your assumptions about developers’ (and CI systems’) machines will eventually break.
Provide a source distribution
The build system should provide a target that packages all necessary source files into an archive, so that the archive can be unpacked and built with just the files present plus those downloaded and managed by the build system.
Justification: This ensures all dependencies are accurately accounted for. For instance, a common unstated dependency is a bunch of git submodules. It also makes it easier to back up your sources.
Links
Datalog
Overview
Datalog is a declarative logic programming language. A Datalog program is a collection of rules specifying how to deduce relations among elements of some domain from some input relations. For example, the following program specifies the relation reachable in terms of another relation edge (which could be an input to the program):
reachable(X, Y) :- edge(X, Y).
reachable(X, Y) :- edge(X, Z), reachable(Z, Y).
Operationally, a Datalog interpreter might take a CSV file or database table which specifies the edge relation, perform the computation/deductions specified by the program, and output the reachable relation as another CSV file or database table.
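To make "operationally" concrete, a naive bottom-up interpreter for the reachable program might look like this (the edge facts are made up):

```python
# Naive bottom-up evaluation of:
#   reachable(X, Y) :- edge(X, Y).
#   reachable(X, Y) :- edge(X, Z), reachable(Z, Y).
edge = {(1, 2), (2, 3), (3, 4)}

reachable = set(edge)  # first rule: every edge is reachable
while True:
    # Second rule: join edge with reachable on the middle variable Z
    new = {(x, y) for (x, z) in edge for (z2, y) in reachable if z == z2}
    if new <= reachable:  # fixed point: no new tuples were deduced
        break
    reachable |= new

assert (1, 4) in reachable
```

Real engines use semi-naive evaluation (only joining against newly-derived tuples each round), but the fixed-point structure is the same.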
Datalog is related to Prolog, which is sort of a less declarative and also Turing-complete version of Datalog.
Join order
The join order is the order in which the many tables involved in a rule are traversed. For example, ignoring the presence of indices, the rule
r(a, c) :- p(a, b), q(b, c)
will have to traverse the tables containing relations p and q to compute r.
The join order of a rule can have a sizable impact on its performance. For Soufflé, the join order is determined (mostly!) by the syntactic order of the clauses. Consider the following example in Soufflé:
r(a, c) :- p(a, b), s(b), q(b, c)
If p has many tuples, but q has fewer and s has fewer still, then a better ordering would be
r(a, c) :- s(b), q(b, c), p(a, b).
because (again, ignoring indices) the above rule would essentially be evaluated like the following nested for-loop:
for b in s:
for (b0, c) in q:
if b == b0:
for (a, b1) in p:
if b == b1:
r.insert(a, c)
Essentially, you want the most selective conditions to appear first, so that the innermost loops are executed the fewest number of times.
To get a feel for join order, it can be helpful to examine Soufflé’s imperative RAM IR.
Debugging
Tips
- Try
set -xin Bash - Use
strace
Tools
creduce
apt-get update && apt-get install -y creduce
creduce --sllooww --timeout 60 --n $(nproc) reduce.sh file.c
halfempty
git clone --branch=master --depth=1 --single-branch https://github.com/googleprojectzero/halfempty
cd halfempty
apt-get update -q
apt-get install -q -y libglib2.0-dev
make -j
#!/usr/bin/env bash
set -eux
result=1
trap 'exit ${result}' EXIT TERM ALRM
if grep interesting /dev/stdin; then
result=0 # We want this input
else
result=1 # We don't want this input
fi
exit "${result}"
./halfempty/halfempty --stable --gen-intermediate --output=halfempty.out test.sh input-file
Further reading
Error handling
Links
Make
More generally, see Build systems.
Snippets
Adding a suffix to a list of files
BINS := $(SRC:=.bin)
Depending on the Makefile
# All rules should depend on the Makefile itself
THIS = $(abspath $(lastword $(MAKEFILE_LIST)))
Listing (phony) targets
See above for $(THIS).
.PHONY: help
help:
@printf 'Targets:\n'
@grep '^\.PHONY: ' $(THIS) | cut -d ' ' -f2
Setting the shell
SHELL = bash
.SHELLFLAGS = -euo pipefail
Useful functions
- addprefix
- addsuffix
- dir: Like Bash’s dirname
- notdir: Like Bash’s basename
- subst
Automatic variables
$@
Q. In GNU Make, what does the $@ variable denote?
A. The target.
$<
Q. In GNU Make, what does the $< variable denote?
A. The first prerequisite.
$?
Q. In GNU Make, what does the $? variable denote?
A. All prerequisites that are newer than the target.
$^
Q. In GNU Make, what does the $^ variable denote?
A. All the prerequisites.
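A minimal pattern rule showing $@ and $< in context (hypothetical file names; rules like this are how C sources are typically compiled):

```make
# Compile each .c file to a .o file; $@ is the target (foo.o),
# $< is the first prerequisite (foo.c)
%.o: %.c
	$(CC) -c -o $@ $<
```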
Markdown
Previewing Markdown
redcarpet --parse fenced_code_blocks my-file.md > my-file.html
# Continuously:
echo "my-file.md" | entr -c -s "redcarpet --parse fenced_code_blocks my-file.md > my-file.html"
Spoiler text
<details>
<summary>Spoiler warning</summary>

Spoiler text. Note that it's important to have a space after the summary tag.

```sh
echo "code block"
```

</details>
Tools
- markdownlint
- mdlynx
- redcarpet
- typos
Python
Options and environment variables
- -O tells CPython to disable assertions.
- PYTHONDEVMODE tells CPython to enable some special checks.
- PYTHONWARNINGS enables different warning controls.
Snippets
Script template
#!/usr/bin/env python3
"""Description goes here"""
from argparse import ArgumentParser
from pathlib import Path
from sys import exit, stderr
def eprint(*args, **kwargs):
print(*args, file=stderr, **kwargs)
def die(msg: str, /) -> None:
eprint(msg)
exit(1)
def go(paths: list[Path], /, *, flag: bool = False):
pass
parser = ArgumentParser(description=__doc__)
parser.add_argument("--flag", action="store_true")
parser.add_argument("paths", nargs="+", type=Path)
args = parser.parse_args()
go(args.paths, flag=args.flag)
Using Enum with argparse
from enum import Enum, unique

@unique
class Format(str, Enum):
"""Export format"""
TEXT = "text"
CSV = "csv"
def __str__(self) -> str:
"""For inclusion in --help"""
return self.value
parser.add_argument('--format', type=Format, help=Format.__doc__, choices=list(Format), default=Format.TEXT)
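A self-contained sketch of how this parses (Format and the flag come from the snippet above; parse_args is given an explicit argv for illustration):

```python
from argparse import ArgumentParser
from enum import Enum, unique


@unique
class Format(str, Enum):
    """Export format"""
    TEXT = "text"
    CSV = "csv"

    def __str__(self) -> str:
        return self.value


parser = ArgumentParser()
parser.add_argument("--format", type=Format, help=Format.__doc__,
                    choices=list(Format), default=Format.TEXT)

# argparse calls Format("csv"), and Enums construct from their values
args = parser.parse_args(["--format", "csv"])
assert args.format is Format.CSV
assert parser.parse_args([]).format is Format.TEXT
```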
Rust
Performance
Allocators
Build Configuration - The Rust Performance Book
cargo add mimalloc
#[cfg(not(miri))]
#[global_allocator]
static ALLOC: mimalloc::MiMalloc = mimalloc::MiMalloc;
Cargo profiles
# https://nnethercote.github.io/perf-book/build-configuration.html
[profile.release]
codegen-units = 1
lto = "fat"
# https://github.com/mstange/samply#turn-on-debug-info-for-full-stacks
[profile.profiling]
inherits = "release"
debug = true
Tools
See also this section of the Rust Performance Book.
dhat
Quickstart:
cargo add --optional dhat
In Cargo.toml:
[features]
dhat = ["dep:dhat"]
In main.rs:
#[cfg(feature = "dhat")]
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

// dhat's heap profiler must also be live; at the top of main():
// #[cfg(feature = "dhat")]
// let _profiler = dhat::Profiler::new_heap();
Then:
cargo run --profile=profiling --features dhat -- your args here
Safety
cargo-careful
cargo careful is a tool to run your Rust code extra carefully – opting into a bunch of nightly-only extra checks that help detect Undefined Behavior, and using a standard library with debug assertions.
cargo +nightly install cargo-careful
cargo +nightly fetch
cargo +nightly careful build -Zcareful-sanitizer=address --target=x86_64-unknown-linux-gnu --tests --frozen
cargo +nightly careful test -Zcareful-sanitizer=address --target=x86_64-unknown-linux-gnu --frozen
Scudo
cargo add scudo
#[cfg(all(not(miri), debug_assertions))]
#[global_allocator]
static ALLOC: scudo::GlobalScudoAllocator = scudo::GlobalScudoAllocator;
Links
Software engineering
See also
Notes
Basic things
https://matklad.github.io/2024/03/22/basic-things.html
After working on the initial stages of several largish projects, I accumulated a list of things that share the following three properties:
- they are irrelevant while the project is small,
- they are a productivity multiplier when the project is large,
- they are much harder to introduce down the line.
- READMEs
- Developer docs
- Process docs (branching, code review)
- Style guide
- User docs/website
- Internal website (metrics, coverage, etc.)
- Build and CI
- Testing
- Benchmarking
- Fuzzing
- Ad-hoc automation
- Releases
Links
Testing
DSL-based integration tests
My favorite kind of test.
Some examples:
Property-based testing
- Really stupid properties:
- Is the output nontrivial (e.g. nonempty)?
- Does the program not crash (i.e., fuzzing)?
- Does the program satisfy its assertions/invariants/contracts?
- Property-preserving transformations: reversing a list doesn’t change the max.
- Be inspired by math:
- Test commutativity/intertwining of functions/methods
- Test inverse relationships/functions
- Test idempotency
- Test structurally inductive properties
- Differential testing: equivalence with a simplified implementation or model
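For instance, the "reversing a list doesn't change the max" property can be checked with a hand-rolled random-input loop (a library like Hypothesis would also generate and shrink inputs for you; this stdlib-only sketch just shows the shape):

```python
import random


def prop_reverse_preserves_max(xs):
    """Property: reversing a list does not change its max."""
    return max(list(reversed(xs))) == max(xs)


random.seed(0)  # deterministic runs
for _ in range(100):
    # Random input generator: 1-50 ints in [-1000, 1000]
    xs = [random.randint(-1000, 1000) for _ in range(random.randint(1, 50))]
    assert prop_reverse_preserves_max(xs)
```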
See:
Obscure types of testing
Fault injection
Fault injection is a software testing technique that involves inducing failures (“faults”) in the functions called by a program. If the callee has failed to perform proper error checking and handling, these faults can result in unreliable application behavior or exploitable vulnerabilities.
https://github.com/trailofbits/krf
Metamorphic testing
Metamorphic testing relates multiple executions or input/output pairs of the same code. For example, doubling all the elements of a list also doubles the mode. It is useful in combination with property testing.
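A sketch of that example as a test (statistics.mode breaks ties by first occurrence, and doubling preserves both counts and order, so the relation holds):

```python
import random
import statistics

random.seed(0)
for _ in range(100):
    xs = [random.randint(0, 9) for _ in range(random.randint(1, 30))]
    # Metamorphic relation: doubling every element doubles the mode
    assert statistics.mode([2 * x for x in xs]) == 2 * statistics.mode(xs)
```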
Mutation testing
Making automated changes to the code to see if the test suite will catch them. “Testing tests” as it were.
Particular types of system under test
REST
Metamorphic Testing of RESTful Web APIs
Abstract
Web Application Programming Interfaces (APIs) allow systems to interact with each other over the network. Modern Web APIs often adhere to the REST architectural style, being referred to as RESTful Web APIs. RESTful Web APIs are decomposed into multiple resources (e.g., a video in the YouTube API) that clients can manipulate through HTTP interactions. Testing Web APIs is critical but challenging due to the difficulty to assess the correctness of API responses, i.e., the oracle problem. Metamorphic testing alleviates the oracle problem by exploiting relations (so-called metamorphic relations) among multiple executions of the program under test. In this paper, we present a metamorphic testing approach for the detection of faults in RESTful Web APIs. We first propose six abstract relations that capture the shape of many of the metamorphic relations found in RESTful Web APIs, we call these Metamorphic Relation Output Patterns (MROPs). Each MROP can then be instantiated into one or more concrete metamorphic relations. The approach was evaluated using both automatically seeded and real faults in six subject Web APIs. Among other results, we identified 60 metamorphic relations (instances of the proposed MROPs) in the Web APIs of Spotify and YouTube. Each metamorphic relation was implemented using both random and manual test data, running over 4.7K automated tests. As a result, 11 issues were detected (3 in Spotify and 8 in YouTube), 10 of them confirmed by the API developers or reproduced by other users, supporting the effectiveness of the approach.
Tools
Further reading
- How to Fuzz an ADT Implementation
- Just a Whole Bunch of Different Tests
- rustc-dev-guide: Best practices for writing tests
Tools
Inspired by https://jyn.dev/tools/.
Basics
I use these every day.
- bat: cat replacement with highlighting
- delta: diff replacement
- entr: watch files
- fd: find replacement
- fzf: responsive UIs for everything
- gh: GitHub CLI
- ghcid: Haskell dev tool
- glab: GitLab CLI
- helix: editor, used with custom keybindings
- jq: JSON processing
- kmonad: keyboard manager
- rg: grep replacement
- taskwarrior
- tealdeer: man, but brief
- zoxide
- zsh: shell, see below
Custom
Little utilities I wrote.
- copyandpaste: copy and paste from the shell, on Linux or macOS
- notify: send a notification on Linux or macOS
- pick: pick something at the terminal (just some fzf flags)
Debugging
See also Debugging.
zsh
- fzf-tab
- spacezle: Spacemacs-like keybindings for zsh
- zbr: auto-expanding abbreviations, e.g. gco --> git checkout
- zsh-autosuggestions