Glean on aarch64 on Apple Silicon: part 1

Get a working aarch64 box

This post shows how to get a working aarch64 environment for Haskell on the MacBook Air (M1).

I’m working on the road at the moment, so I picked up a MacBook Air with the M1 chip to travel light. I wanted to use it as a development environment for Glean (cf. what is Glean), the code search system I work on. But Glean is Linux/x86_64 only at the moment, due to its use of some fancy AVX extensions deep down in the runtime. Let’s fix that.

Motivation: getting Glean working on Apple ARM chips could be useful for a few reasons. Apple Silicon is becoming really common, and a lot of devs have MacBooks as their primary development environment (essentially expensive dumb terminals to run VS Code). Glean is/could be the core of a lot of developer environments, as it indexes source code and serves up queries extremely efficiently, so it could be killer as a local language server backend for your Mac IDE. (e.g. a common backend for all your languages, with unified search, jump-to-def, find-refs etc).

Set up UTM

Glean is still very Linux-focused, so we need a VM. I’m building on an M1 MacBook Air (ARM64), so I install UTM from the App Store or the internet. This will be our fancy iOS QEMU virtualization layer.

Configure the OS image as per https://medium.com/@lizrice/linux-vms-on-an-m1-based-mac-with-vscode-and-utm-d73e7cb06133 for aarch64 Debian, using https://mac.getutm.app/gallery/ubuntu-20-04 for the basic configuration.

In particular, I set up the following.

  • Information -> Style: Operating system 
  • System -> Hardware -> Architecture: aarch64
  • System -> Memory -> 16G (compiling stuff!)
  • Drives -> VirtIO, at least 20G: this will hold the install and the build artifacts
  • Drives -> Removable USB, for the installation .iso
  • Display -> console only (we’ll use ssh)
  • Network -> Mode: Emulated VLAN
VM disk configuration

I’ll point VS Code and other things at this VM, so I’m going to forward port 2200 on the Mac to port 22 on the Debian VM.

Network settings for the VM

Choose OS installer and boot

Set the CD/DVD to the Debian ISO file path. I used the arm64 netinst iso for Debian 11 from https://cdimage.debian.org/debian-cd/current/arm64/iso-cd/

Boot the machine and run the Debian install. It’s like 1999 here. (So much nostalgia when I used to scavenge x86 boxes from dumpsters in the Sydney CBD 20 years ago to put Linux on them).

Yeah!

Boot the image and log in. Now we have a working Linux aarch64 box on the M1, running very close to native speed (arm on arm virtualization).

You can ssh into this from the macOS side (on port 2200), or set it up as a remote host for VS Code, which is shockingly convenient.
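To save typing the port each time, the forward can be recorded in ~/.ssh/config. This is a minimal sketch, assuming the 2200 -> 22 forward configured earlier. The alias glean-vm and the user name are hypothetical placeholders, not something from this setup.

```shell
# Print an ssh config stanza for the VM behind the 2200 -> 22 port forward.
# The alias and user name passed in are placeholders; use your own.
vm_ssh_stanza() {
    alias_name="$1"   # hypothetical alias, e.g. glean-vm
    vm_user="$2"      # the account created during the Debian install
    printf 'Host %s\n' "$alias_name"
    printf '    HostName localhost\n'
    printf '    Port 2200\n'
    printf '    User %s\n' "$vm_user"
}

# Review the output, then append it to ~/.ssh/config by hand.
vm_ssh_stanza glean-vm youruser
```

With an entry like that in place, ssh glean-vm should behave like ssh -p 2200 youruser@localhost, and VS Code's Remote-SSH extension should be able to pick up the same alias.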

Install the dev env

This is a really basic Debian image, so you need a couple of things to get started with a barebones Haskell env:

apt install sudo curl cabal-install

We have a basic dev env now.

$ uname -msr
Linux 5.10.0-10-arm64 aarch64

$ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/  :? for help
Prelude> System.Info.arch
"aarch64"
Prelude> let s = 1 : 1 : zipWith (+) s (tail s) in take 20 s
[1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765]

To build Glean a la https://glean.software/docs/building/ we need to update Cabal to 3.6.x or greater, as Glean uses some fancy Cabal configuration features.

Update cabal

We need cabal >= 3.6.x, which isn’t in Debian stable, so I’ll just use the pre-built binary from https://www.haskell.org/cabal/download.html

Choose: Binary download for Debian 10 (aarch64, requires glibc 2.12 or later): cabal-install-3.6.0.0-aarch64-linux-deb10.tar.xz

Unpack that. You’ll also need to apt-get install libnuma-dev if you use that binary.

$ tar xvfJ cabal-install-3.6.0.0-aarch64-linux-deb10.tar.xz
$ ./cabal --version
cabal-install version 3.6.0.0
compiled using version 3.6.1.0 of the Cabal library

I just copy that over the system cabal for great good. It’s a good idea now to sync the package list for Hackage with a cabal update, before we start trying to build anything Haskell.
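Since the Glean build needs Cabal 3.6 or later, it can be worth confirming which cabal the shell actually picks up after the copy. This is a minimal sketch, assuming GNU sort with its -V (version sort) option, as shipped on Debian. The function names are illustrative.

```shell
# Succeeds when dotted version $1 is >= dotted version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the cabal on the PATH is at least 3.6.
check_cabal() {
    if ! command -v cabal >/dev/null 2>&1; then
        echo "cabal not found on PATH"
        return 1
    fi
    found="$(cabal --numeric-version)"
    if version_ge "$found" 3.6; then
        echo "cabal $found is new enough"
    else
        echo "cabal $found is too old, need 3.6 or later"
        return 1
    fi
}
```

Running check_cabal after the copy should report the 3.6.0.0 binary if it is the one on the PATH. If it reports an older version, the system cabal is probably still the one being found.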

Install the Glean dependencies

To build Glean we need a bunch of C++ things. Glean itself will bootstrap the Haskell parts. The Debian packages needed are nearly identical to those for Ubuntu in the Glean install instructions (https://glean.software/docs/building/#ubuntu), except that you might see “Package ‘libmysqlclient-dev’ has no installation candidate”. We will instead need default-libmysqlclient-dev. We also need libfmt-dev.

So the full set of Debian Glean dependencies is:

> apt install g++ \
    cmake \
    bison flex \
    git \
    libzstd-dev \
    libboost-all-dev \
    libevent-dev \
    libdouble-conversion-dev \
    libgoogle-glog-dev \
    libgflags-dev \
    libiberty-dev \
    liblz4-dev \
    liblzma-dev \
    libsnappy-dev \
    make \
    zlib1g-dev \
    binutils-dev \
    libjemalloc-dev \
    default-libmysqlclient-dev \
    libssl-dev \
    pkg-config \
    libunwind-dev \
    libsodium-dev \
    curl \
    libpcre3-dev \
    libfftw3-dev \
    librocksdb-dev \
    libxxhash-dev \
    libfmt-dev
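
Before kicking off a long build it can help to confirm that nothing from the list was skipped. This is a rough sketch using dpkg-query. The missing_packages helper is illustrative, and only a handful of the package names above are passed in here.

```shell
# Print every package named in the arguments that dpkg does not report
# as fully installed. Prints nothing when everything is present.
missing_packages() {
    for pkg in "$@"; do
        status="$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null)"
        case "$status" in
            "install ok installed") ;;   # present, nothing to report
            *) echo "$pkg" ;;            # absent, removed, or unknown to dpkg
        esac
    done
}

# Check a few of the packages from the list above; extend as needed.
missing_packages g++ cmake libfmt-dev default-libmysqlclient-dev librocksdb-dev
```

Any names it prints can be passed straight back to apt install.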

Now we have a machine ready to build Glean. We’ll do the ARM port of Glean in the next post and get something running.

Bootstrapping a community via hackathons

I recently gave an interview to Jasper Van der Jeugt as part of the Haskell Zurich Meetup, on the history of hackathons in the Haskell community, and how we intentionally tried to bootstrap and grow an open source tooling and infra team for Haskell, via hackathons, in the 2005-2010 period.

Prior to the launch of cabal and hackage the Haskell development experience was “choose a compiler” and “use fptools” as the core library. There were very few 3rd party libraries (< 20?), only a barebones package system and no centralized distribution of packages.

It was really clear by 2005 that we needed to invest in tooling: build system, package management and package distribution. But without corporate funding for infrastructure, who would do the work? We needed to bootstrap an open source package infrastructure team. Enter the hackathons.

In 2007 we met in Oxford to hack for 3 days to launch Hackage, and make it possible to upload and share packages for Haskell. To do this we wanted to link the build system (cabal) to the package management, upload and download (hackage), leading to the modern world of packages for Haskell, which rapidly accelerated into tens of thousands of libraries.

The first Haskell infrastructure hackathon team that launched Hackage back in 2007.

Looking back, this was a pivotal moment: after Hackage, the open source community rapidly became the primary producer of new Haskell code. Corporate sponsorship of the community increased and a wave of corporate adoption was enabled due to the package distribution system. A research community became a (much larger) open source and then commercial software engineering community. And the key steps were Hackage and Cabal, and some polished core libraries that worked together.

You can see the lessons learned echoed now in systems like Rust’s cargo and crate system. Good languages become sustainable when they become viable open source communities around packages.

You can listen to the interview here: https://youtu.be/Dho7XXoakvY?t=1053