K-LLVM

A semantics of LLVM in K.

LLVM is designed for the compile-time, link-time and run-time optimization of programs written in various programming languages. The language supported by LLVM targeted by modern compilers is LLVM IR. In this paper we define K-LLVM, a reference semantics for LLVM IR. To the best of our knowledge, K-LLVM is the most complete formal LLVM IR semantics to date, including all LLVM IR instructions, intrinsic functions in the LLVM documentation and Standard-C library functions that are necessary to execute many LLVM IR programs. Additionally, K-LLVM formulates an abstract machine that executes all LLVM IR instructions. The machine allows to describe our formal semantics in terms of simulating a conceptual virtual machine that runs LLVM IR programs, including non-deterministic programs. Even though the K-LLVM memory model in this paper is assumed to be a sequentially consistent memory model and does not include all LLVM concurrency memory behaviors, the design of K-LLVM’s data layout allows the K-LLVM abstract machine to execute some LLVM IR programs that previous semantics did not cover, such as the full range of LLVM IR behaviors for the interaction among LLVM IR casting, pointer arithmetic, memory operations and some memory flags (e.g. readonly) of function headers. Additionally, the memory model is modularized in a manner that supports investigating other memory models. To validate K-LLVM, we have implemented it in K [41], which generated an interpreter for LLVM IR. Using this, we ran tests including 1,385 unit test programs and 2,156 concrete LLVM IR programs, and K-LLVM passed all of them.