MLIR

Overview

The Multi-Level Intermediate Representation, an LLVM project. The goal is to support many dialects of IR (LLVM IR, IRs in the spirit of Rust's MIR and Swift's SIL, and so on) that share common infrastructure: datatypes, parsing and printing, location tracking, pass management, etc. The key idea is that these dialects can live at different levels of abstraction: near the source code as high-level IRs, lower down as optimizing IRs, or even as hardware description languages.

Ops, Blocks, and Regions

Dialects

Dialects can have custom syntax, with custom printers and parsers.
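
For example, the same function can be written in the dialects' custom syntax and then printed back in the uniform generic form. A minimal sketch using the Python bindings, assuming the upstream mlir package is importable and the standard dialects are registered on the Context (exact syntax varies slightly across MLIR versions):

  # Sketch: custom (dialect-defined) syntax vs. the uniform generic form.
  # Assumes the upstream `mlir` Python bindings; details vary across versions.
  from mlir.ir import Context, Module

  IR = """
  func.func @add(%a: i32, %b: i32) -> i32 {
    %0 = arith.addi %a, %b : i32    // custom syntax defined by the arith dialect
    return %0 : i32
  }
  """

  with Context():
      m = Module.parse(IR)
      print(m)                                                 # custom syntax
      print(m.operation.get_asm(print_generic_op_form=True))   # generic form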

  1. LLVM

    The LLVM dialect embeds LLVM IR into MLIR, with a few notable changes:

    • Constants become Ops
    • Phi instructions become block arguments
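
    For example, both changes are visible in this small hand-written function in the LLVM dialect (a sketch, parsed here via the Python bindings; the exact textual syntax varies slightly across MLIR versions):

      # Sketch: a constant is an ordinary op, and a block argument replaces a phi.
      # Assumes the upstream `mlir` Python bindings with the LLVM dialect registered.
      from mlir.ir import Context, Module

      IR = """
      llvm.func @abs(%x: i64) -> i64 {
        %zero = llvm.mlir.constant(0 : i64) : i64    // constant as an op
        %isneg = llvm.icmp "slt" %x, %zero : i64
        llvm.cond_br %isneg, ^negate, ^join(%x : i64)
      ^negate:
        %neg = llvm.sub %zero, %x : i64
        llvm.br ^join(%neg : i64)
      ^join(%r: i64):                                // block argument, not a phi
        llvm.return %r : i64
      }
      """

      with Context():
          print(Module.parse(IR))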

    As of June 2021, there are no analysis passes for LLVM that live in MLIR.

  2. SCF: Structured Control Flow

    The SCF dialect contains Ops for structured control flow like for- and while-loops, and if-statements.
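
    For example, a loop with a loop-carried value written with scf.for (a sketch, parsed via the Python bindings; scf.if and scf.while follow the same pattern):

      # Sketch: a structured loop; bounds, step, and the loop-carried value are
      # explicit, and the body is a nested region rather than a CFG.
      # Assumes the upstream `mlir` Python bindings.
      from mlir.ir import Context, Module

      IR = """
      func.func @sum_to(%n: index) -> index {
        %c0 = arith.constant 0 : index
        %c1 = arith.constant 1 : index
        %sum = scf.for %i = %c0 to %n step %c1 iter_args(%acc = %c0) -> (index) {
          %next = arith.addi %acc, %i : index
          scf.yield %next : index
        }
        return %sum : index
      }
      """

      with Context():
          print(Module.parse(IR))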

Foreign APIs

There are C and Python bindings. As of June 2021, the C bindings have no specific stability guarantee.
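
A minimal sketch of the Python API, assuming the upstream mlir package is built and importable (the module layout has shifted across releases; e.g. FuncOp and its from_py_func helper moved from mlir.dialects.builtin to mlir.dialects.func):

  # Sketch: build a function programmatically instead of parsing text.
  # Assumes the upstream `mlir` Python bindings; helper names and builder
  # signatures have changed between releases.
  from mlir.ir import Context, Location, Module, InsertionPoint, IntegerType
  from mlir.dialects import arith, func

  with Context(), Location.unknown():
      module = Module.create()
      i32 = IntegerType.get_signless(32)
      with InsertionPoint(module.body):
          # Builds a func.func from a Python callable; the callable is traced
          # once with the block arguments to construct the body.
          @func.FuncOp.from_py_func(i32, i32)
          def add(a, b):
              return arith.AddIOp(a, b).result

      print(module)  # round-trips through the textual format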

Literature

MLIR: A Compiler Infrastructure for the End of Moore's Law

  1. Abstract

    This work presents MLIR, a novel approach to building reusable and extensible compiler infrastructure. MLIR aims to address software fragmentation, improve compilation for heterogeneous hardware, significantly reduce the cost of building domain specific compilers, and aid in connecting existing compilers together. MLIR facilitates the design and implementation of code generators, translators and optimizers at different levels of abstraction and also across application domains, hardware targets and execution environments. The contribution of this work includes (1) discussion of MLIR as a research artifact, built for extension and evolution, and identifying the challenges and opportunities posed by this novel design point in design, semantics, optimization specification, system, and engineering. (2) evaluation of MLIR as a generalized infrastructure that reduces the cost of building compilers, describing diverse use-cases to show research and educational opportunities for future programming languages, compilers, execution environments, and computer architecture. The paper also presents the rationale for MLIR, its original design principles, structures and semantics.

  2. Design Principles

    • Little builtin, everything customizable
      • The three core concepts are Op, block, and region
      • There are a handful of other core constructs, like types and attributes
    • SSA with nested regions
    • Progressive lowering
    • Maintaining high-level semantic information
    • IR validation should be workable
    • Declarative rewriting
    • Source location tracking and traceability
  3. Design Details

    1. Operations

      An operation (Op) is the unit of semantic information. Ops might represent opcodes in some ISA, or functions, modules, or variables.
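
      For instance, an op can be assembled generically from just these pieces (a sketch using the Python bindings; notes.thing is a made-up op in an unregistered dialect):

        # Sketch: an Op is a name plus operands, results, attributes, regions,
        # and successors. "notes.thing" is a made-up, unregistered op.
        # Assumes the upstream `mlir` Python bindings.
        from mlir.ir import (Context, Location, Module, InsertionPoint,
                             Operation, IntegerType, IntegerAttr)

        with Context() as ctx, Location.unknown():
            ctx.allow_unregistered_dialects = True
            module = Module.create()
            i32 = IntegerType.get_signless(32)
            with InsertionPoint(module.body):
                Operation.create(
                    "notes.thing",                                   # op name
                    results=[i32],                                   # result types
                    operands=[],                                     # SSA operands
                    attributes={"value": IntegerAttr.get(i32, 42)},  # constant data
                    regions=0)                                       # nested regions
            print(module)  # %0 = "notes.thing"() {value = 42 : i32} : () -> i32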

    2. Attributes

      Attributes attach compile-time constant information to ops; each op carries a dictionary of named attribute values.
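
      A sketch of reading and writing an op's attribute dictionary via the Python bindings (notes.reviewed is a made-up, dialect-prefixed attribute; the function-type attribute was named type in older releases):

        # Sketch: attributes hold compile-time constants keyed by name on an op.
        # Assumes the upstream `mlir` Python bindings.
        from mlir.ir import Context, Module, StringAttr

        with Context():
            m = Module.parse("func.func private @hypot(f32, f32) -> f32")
            fn = m.body.operations[0]
            fn = getattr(fn, "operation", fn)       # normalize OpView vs. Operation
            print(fn.attributes["sym_name"])        # "hypot"
            print(fn.attributes["function_type"])   # (f32, f32) -> f32
            # A made-up, dialect-prefixed ("discardable") attribute:
            fn.attributes["notes.reviewed"] = StringAttr.get("June 2021")
            print(m)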

    3. Location information

    4. Regions and blocks

      Regions contain blocks, which contain ops, which in turn may contain regions. Each block ends with a terminator op, which may list successor blocks. Blocks have arguments, which replace phi nodes.
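
      A sketch that walks this nesting with the Python bindings and shows a block argument standing in for a phi node:

        # Sketch: op -> regions -> blocks -> ops, plus a block argument (^ret)
        # where LLVM IR would use a phi. Assumes the upstream `mlir` bindings.
        from mlir.ir import Context, Module

        IR = """
        func.func @max(%a: i32, %b: i32) -> i32 {
          %cond = arith.cmpi sgt, %a, %b : i32
          cf.cond_br %cond, ^ret(%a : i32), ^other
        ^other:
          cf.br ^ret(%b : i32)
        ^ret(%m: i32):
          return %m : i32
        }
        """

        def walk(op, depth=0):
            op = getattr(op, "operation", op)   # normalize OpView vs. Operation
            print("  " * depth + op.name)
            for region in op.regions:
                for block in region.blocks:
                    for child in block.operations:
                        walk(child, depth + 1)

        with Context():
            m = Module.parse(IR)
            walk(m.operation)
            # builtin.module > func.func > arith.cmpi, cf.cond_br, cf.br, func.return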

    5. Symbol tables

      Symbol tables provide a mechanism for non-lexical reference.
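
      A sketch of a symbol-based reference, using the Python bindings (whether and how a SymbolTable helper is exposed depends on the bindings version):

        # Sketch: func.call refers to @inc by symbol name, not by an SSA value,
        # so the reference is non-lexical. Assumes the upstream `mlir` bindings.
        from mlir.ir import Context, Module, SymbolTable

        IR = """
        func.func @inc(%x: i32) -> i32 {
          %c1 = arith.constant 1 : i32
          %r = arith.addi %x, %c1 : i32
          return %r : i32
        }
        func.func @twice(%x: i32) -> i32 {
          %a = call @inc(%x) : (i32) -> i32    // symbol reference, not an SSA use
          %b = call @inc(%a) : (i32) -> i32
          return %b : i32
        }
        """

        with Context():
            m = Module.parse(IR)
            symbols = SymbolTable(m.operation)   # the module op owns the symbol table
            print(symbols["inc"])                # resolve the reference by name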

    6. Dialects

      Dialects group related ops, attributes, and types. Dialects may have custom syntax.

    7. Types

      Types represent compile-time information/static semantics.
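
      A sketch of building and comparing types with the Python bindings:

        # Sketch: types are compile-time values, built programmatically or parsed
        # from the textual format, and uniqued within a Context.
        # Assumes the upstream `mlir` Python bindings.
        from mlir.ir import Context, Type, IntegerType, F32Type, RankedTensorType

        with Context():
            i32 = IntegerType.get_signless(32)
            f32 = F32Type.get()
            tensor = RankedTensorType.get([4, 8], f32)
            print(i32, f32, tensor)          # i32 f32 tensor<4x8xf32>
            # Structurally equal types compare equal because they are uniqued.
            print(i32 == Type.parse("i32"))  # True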

  4. IR Infrastructure

    1. TableGen

      MLIR uses LLVM's TableGen to declaratively describe ops (via the Operation Definition Specification, ODS).

    2. Declarative rewrites

      Pass authors may specify pattern-based declarative rewrites in a TableGen-hosted DSL called DRR (Declarative Rewrite Rules).

    3. Pass manager

      There is a (parallelizable) pass manager that schedules passes at op granularity.
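
      A sketch of running a pipeline anchored on func.func via the Python bindings (older releases spell the pipeline string and pm.run() slightly differently):

        # Sketch: the nesting spec anchors passes on an op type; here
        # canonicalize and cse run on every func.func inside the module.
        # Assumes the upstream `mlir` Python bindings.
        from mlir.ir import Context, Module
        from mlir.passmanager import PassManager

        IR = """
        func.func @fold() -> i32 {
          %a = arith.constant 2 : i32
          %b = arith.constant 3 : i32
          %c = arith.addi %a, %b : i32
          return %c : i32
        }
        """

        with Context():
            m = Module.parse(IR)
            pm = PassManager.parse("builtin.module(func.func(canonicalize,cse))")
            pm.run(m.operation)
            print(m)  # the add of constants folds to a single arith.constant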

    4. Textual format

      All dialects have a textual IR that is isomorphic to their in-memory representation.

  5. Evaluation: Applications

    1. TensorFlow

    2. Polyhedral optimization

    3. Fortran IR

    4. Domain-specific compilers

  6. Related work

    Calls out:

    • IRs: LLVM, ONNX
    • Languages for heterogeneous computing: OpenMP, StarSs, OpenACC, C++ AMP, HCC, SYCL, Lightweight Modular Staging/Delite
    • Machine learning compilers: XLA, Glow, TVM, PolyMage

Polygeist: Affine C in MLIR

  1. Abstract

    We present Polygeist, a new tool that reroutes polyhedral compilation flows to use the representation available in the recent MLIR compilation infrastructure. It consists of two parts: a C and C++ frontend capable of converting a wide variety of existing codes into MLIR suitable for polyhedral transformation, and a bi-directional conversion between MLIR's polyhedral representation and existing polyhedral exchange formats. We demonstrate the flow by converting the entire Polybench/C benchmark suite into MLIR, and by performing an IR-to-IR optimization leveraging an existing polyhedral compiler (Pluto). Our flow produces results comparable to the state-of-the-art compiler, enabling direct comparison of source-to-source and IR-to-binary compilers. We believe Polygeist can improve the interoperation between MLIR and the existing polyhedral tooling ultimately benefiting both the research and the production compiler communities.

  2. Notes

    • The tool comes with an experimental/partial translation from C into MLIR, specifically the affine dialect with embedded LLVM.

Progressive Raising in Multi-level IR

  1. Abstract

    Multi-level intermediate representations (IR) show great promise for lowering the design costs for domain-specific compilers by providing a reusable, extensible, and non-opinionated framework for expressing domain-specific and high-level abstractions directly in the IR. But, while such frameworks support the progressive lowering of high-level representations to low-level IR, they do not raise in the opposite direction. Thus, the entry point into the compilation pipeline defines the highest level of abstraction for all subsequent transformations, limiting the set of applicable optimizations, in particular for general-purpose languages that are not semantically rich enough to model the required abstractions. We propose Progressive Raising, a complementary approach to the progressive lowering in multi-level IRs that raises from lower to higher-level abstractions to leverage domain-specific transformations for low-level representations. We further introduce Multilevel Tactics, our declarative approach for progressive raising, implemented on top of the MLIR framework, and demonstrate the progressive raising from affine loop nests specified in a general-purpose language to high-level linear algebra operations. Our raising paths leverage subsequent high-level domain-specific transformations with significant performance improvements.

  2. Notes

    • This paper is not the most clearly written - lots of typos, etc. (at least in the PDF I found).
    • While the idea in the abstract is fairly general, the majority of this paper focuses on the application to loop nests.
    • The transformation starts at the bottom with the SCF dialect.

ScaleHLS: Achieving Scalable High-Level Synthesis through MLIR