Google Colab: Enhancing Usability

August 1, 2019

Categories: Technical Tags: Network Emacs

What is Google Colab

Colab is Google's free-tier cloud computation offering, accessed through a Jupyter notebook frontend. It's somewhat baffling why and how it's free, but it's plausible that at any given time there is lots of unused hardware lying around in GCP. Regardless, massive props to the team that turned that fact into such a neat and free service. The coolest thing is that you get to use a very good GPU (recently upgraded from K80 to T4), which is practically essential for Deep Learning these days. So if you are just starting out and not a gamer, you don't need to shell out a hefty sum for a capable GPU (the market for which is terrible anyway, thanks cryptominers and NVIDIA).

Now, in terms of interface, the notebooks aren't half bad, but my sense of usability needs to involve Emacs somehow. To do that we need to gain (secure) shell access to the underlying VM first, so this procedure might be of interest to everyone. Jupyter can already act as a rudimentary shell: if you prepend "!" to a command, it's executed in the shell context. But this obviously can't replace the interactive shell that you love, and being reduced to using only non-interactive commands becomes an annoyance really fast.

Start SSH server

So to begin with, spin up the VM (or, container surely?) by connecting through the web-based notebook in the browser. Let's understand the system:

! ls -l /proc/1/exe
! uname -a
! head -2 /etc/os-release

This shows:

lrwxrwxrwx 1 root root 0 Aug  1 21:19 /proc/1/exe -> /bin/bash
Linux d884627b5d1d 4.14.79+ #1 SMP Wed Dec 19 21:19:13 PST 2018 x86_64 x86_64 x86_64 GNU/Linux
VERSION="18.04.2 LTS (Bionic Beaver)"

Okay, so just your usual Ubuntu LTS, but spawned inside a container with bash as init (and with GNU coreutils). On one hand, no systemd, so yay. However, we should use some sort of process supervisor (maybe even alongside autossh). Hope for the best, prepare for the worst, basically. Apparently s6 is not in the repo, so I will go with runit.
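If you would rather do the same inspection from Python (say, to branch on the distro version later in the notebook), here is a minimal sketch. The helper names parse_os_release and init_executable are mine, not part of any library, and the code assumes a Linux-style /proc and /etc/os-release.

```python
import os
from pathlib import Path


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def init_executable(pid=1):
    """Return the binary behind a PID, or None if it cannot be read."""
    try:
        return os.readlink(f"/proc/{pid}/exe")
    except OSError:
        return None


if __name__ == "__main__":
    release = Path("/etc/os-release")
    if release.exists():
        print(parse_os_release(release.read_text()).get("VERSION"))
    print(init_executable())
```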

Assuming you got the update/upgrade underway, let's install the necessary packages quietly:

! apt-get install -qq -o=Dpkg::Use-Pty=0 openssh-server openssh-client autossh runit
! mkdir -p ~/.ssh/
! curl -s https://path/to/ > ~/.ssh/authorized_keys
! cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak
! echo 'ClientAliveInterval 120' >> /etc/ssh/sshd_config

Now it's time to start the sshd server. We only need to create a service file; runit will watch over the directory and do all the necessary supervision.

! mkdir -p /var/service/sshd /var/run/sshd
! printf '#!/bin/sh\n%s\n' 'exec /usr/sbin/sshd -D' > /var/service/sshd/run
! chmod +x /var/service/sshd/run
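Since the same three steps (make the directory, write the run file, mark it executable) are repeated for every service, they can be wrapped in a small Python helper. This is only a sketch: write_run_script is a hypothetical name of my own, and it assumes the run file needs nothing beyond a single exec line.

```python
import stat
from pathlib import Path


def write_run_script(service_root, name, command):
    """Create <service_root>/<name>/run, a /bin/sh script that execs `command`."""
    service_dir = Path(service_root) / name
    service_dir.mkdir(parents=True, exist_ok=True)
    run_file = service_dir / "run"
    # `exec` replaces the shell, so the supervisor watches the real process.
    run_file.write_text(f"#!/bin/sh\nexec {command}\n")
    # Equivalent of `chmod +x`: add execute bits without touching the others.
    mode = run_file.stat().st_mode
    run_file.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return run_file
```

In a notebook cell this would be called as write_run_script("/var/service", "sshd", "/usr/sbin/sshd -D").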

Finally, we need a bit of IPython machinery to spawn a process in the background rather than waiting for it to exit. Note that this is a Python call, so it goes into the cell without the "!" prefix.

get_ipython().system_raw('runsvdir /var/service/ &')
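Because the supervisor is started in the background, nothing tells you whether sshd actually came up. A quick way to check from a cell is to try connecting to the port. The sketch below uses only the standard library; port_open and wait_for_port are names I made up, and the retry counts are arbitrary.

```python
import socket
import time


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port, attempts=10, delay=0.5):
    """Poll until the port accepts connections or the attempts run out."""
    for _ in range(attempts):
        if port_open(host, port):
            return True
        time.sleep(delay)
    return False
```

Running wait_for_port("127.0.0.1", 22) in the notebook should return True once sshd is listening.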

Creating a Tunnel

We need to establish a (reverse) tunnel because the container resides inside a private network. If you have a public IP yourself, it is recommended that you create the tunnel to your own host, because network latency can otherwise become a deterrent to a good interactive experience. Otherwise, you need a relay/jump host to which the container opens a reverse tunnel. I am using the free serveo relay for demonstration purposes.

! mkdir -p /var/service/tunnel /var/run/tunnel
! printf '#!/bin/sh\n%s\n' 'exec autossh -M 0 -o ServerAliveInterval=120 -o StrictHostKeyChecking=no -NR mycolab:22:localhost:22' > /var/service/tunnel/run
! chmod +x /var/service/tunnel/run
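The autossh line packs a lot into one string, so here is a sketch that assembles it piece by piece. reverse_tunnel_command is a hypothetical helper, and relay_host is a placeholder for whichever relay or jump host you use; substitute your own.

```python
import shlex


def reverse_tunnel_command(relay_host, alias="mycolab", remote_port=22,
                           local_port=22, alive_interval=120):
    """Build an autossh command line for a reverse tunnel via relay_host."""
    args = [
        "autossh",
        "-M", "0",  # no autossh monitor port; rely on ssh keepalives
        "-o", f"ServerAliveInterval={alive_interval}",
        "-o", "StrictHostKeyChecking=no",
        "-N",  # no remote command, only forwarding
        "-R", f"{alias}:{remote_port}:localhost:{local_port}",
        relay_host,  # placeholder: the relay/jump host to connect to
    ]
    return " ".join(shlex.quote(arg) for arg in args)
```

The returned string can be dropped straight into the exec line of a runit run file.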

This starts the tunnel; all that's left is to run the following from your computer:

ssh -i ~/.ssh/colab -J root@mycolab

You could make this more convenient by adding this to ~/.ssh/config:

### First jumphost. Directly reachable
Host serveo
  StrictHostKeyChecking no

### Google colab via jumphost
Host mycolab
  ServerAliveInterval 120
  StrictHostKeyChecking no
  User root
  HostName mycolab
  ProxyJump serveo
  IdentityFile ~/.ssh/colab

So the command above now only needs to be:

ssh mycolab

Bridging Emacs and Jupyter

There are multiple options available on both the Emacs and the Jupyter side. At a high level, communication with the Jupyter kernel happens through ZeroMQ sockets. At this level, the appropriate interface is the emacs-jupyter package. Note that you would further need to forward the ports specified by the relevant kernel.json file. See this cookbook regarding connecting to a remote kernel via ssh.
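To give an idea of what forwarding the kernel ports involves, here is a sketch that reads a kernel's JSON file and builds the matching ssh -L arguments. It assumes the file follows the usual Jupyter connection-file layout with five *_port keys; forward_args is a made-up helper, and the host alias passed in is whatever you configured for ssh.

```python
import json

# The five ZeroMQ channels listed in a Jupyter connection file.
PORT_KEYS = ("shell_port", "iopub_port", "stdin_port", "control_port", "hb_port")


def forward_args(connection_json, ssh_host):
    """Return an ssh argument list forwarding every kernel port to localhost."""
    info = json.loads(connection_json)
    args = ["ssh", "-N"]
    for key in PORT_KEYS:
        port = int(info[key])
        # Same port number on both ends keeps the connection file usable locally.
        args += ["-L", f"{port}:127.0.0.1:{port}"]
    args.append(ssh_host)
    return args
```

The resulting list can be handed to subprocess.Popen on your local machine, after which a local copy of the connection file points at the forwarded ports.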

On the other hand, the Jupyter notebook is one more layer of abstraction: a server that sits atop the lower-level kernel, plus a REST API through which browsers talk to it. So to use it, you need to expose (forward) only one port. The appropriate interface in this case would be EIN, though emacs-jupyter might grow to be capable here soon.
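As a small illustration of that REST layer, the sketch below asks a notebook server for its running kernels over a forwarded port. It assumes token authentication via the Authorization header; api_request and list_kernels are names I chose, and the base URL and token depend entirely on your setup.

```python
import json
import urllib.request


def api_request(base_url, token, path="/api/kernels"):
    """Build an authenticated request against the notebook server's REST API."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, headers={"Authorization": f"token {token}"})


def list_kernels(base_url, token, timeout=5):
    """Return the server's list of running kernels as parsed JSON."""
    request = api_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the port forwarded, list_kernels("http://localhost:8888", "<token>") would return a list of kernel descriptions; 8888 here is just the common Jupyter default, not necessarily what your server uses.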

I would be remiss not to mention that you might entertain the idea of running Emacs completely on the server side. That might save trouble with port forwarding and such, and could potentially be even snappier due to less network latency. Well, my configs are nearly unusable in a terminal. Besides, I wouldn't want to miss rich content support (plots mainly). So that leaves us with the possibility of running graphical Emacs on the server side with X forwarding. I actually tried that too with X2Go (which uses the more efficient NX compression). It worked fine, though my use of xcape messed with the keyboard layout. But local Emacs already works fine for me, so I didn't delve deeper.

As an addendum, even though the container is good for 12 hours, it could terminate early if the front-end is inactive with no running task. So you might want to keep a tab open and keep it occupied with some sort of busywork.
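One form the busywork could take is a cell that simply prints a heartbeat at intervals. This is a sketch only: keep_busy is a hypothetical helper, and I can't promise that a running cell alone is enough to prevent early termination.

```python
import time


def keep_busy(total_seconds, interval=60.0, tick=print):
    """Emit a heartbeat every `interval` seconds until `total_seconds` elapse."""
    beats = 0
    deadline = time.monotonic() + total_seconds
    while time.monotonic() < deadline:
        beats += 1
        tick(f"heartbeat {beats}")
        # Sleep for the interval, but never past the deadline.
        remaining = deadline - time.monotonic()
        if remaining > 0:
            time.sleep(min(interval, remaining))
    return beats
```

Running keep_busy(6 * 3600) in a spare cell keeps that cell occupied for six hours; interrupt the cell when you are done.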