
Machine Learning in Scala with Google Colaboratory

Shadaj Laddad · March 12, 2019

Last week was the TensorFlow Dev Summit, where the latest updates across the TensorFlow libraries were announced. Almost all the demos in these talks ran in Google Colaboratory (Colab), a free product from Google that gives you access to a Jupyter notebook running in the cloud, with the option to connect to powerful GPUs to accelerate machine learning operations!

Inspired by the power of these interactive demos, I set out to make it possible to run Scala code inside Colab. In this blog post, we’ll see how to set up Scala to run inside Google Colab, and then look at a few examples of machine learning (including GPU acceleration) with Scala inside Colab notebooks.

Getting Started

Scala is not currently a built-in language for Colab (only Python and Swift are officially supported), so before we start writing code there’s a short step needed to install Scala support into your environment. Because the Colab filesystem is reset after some inactivity, you’ll need to re-run this installer whenever you return to a notebook after some time.

If you want to follow along with the examples in this post, head over to the notebook, connect to the environment you want to use for the later notebooks (CPU or GPU, under Runtime > Change runtime type), and hit Runtime > Run All.

The installer loads Almond, a Jupyter kernel that enables Scala support. In addition, it slightly tweaks the kernel definition to preload the Python native libraries, which makes it possible to use NumPy and TensorFlow through ScalaPy as we’ll see in a bit!

Hello World in Colab

Now that we’ve installed Scala support, it’s time to get started with our first Scala notebook! First, create a new notebook by making a copy of the Scala in Colab template (because Scala is not officially supported, there’s no way to directly create a new Scala notebook).

Because the Scala kernel is installed, Colab automatically knows to run all the code blocks in the template with it, so when you hit the run button next to the first code block you should see “Hello, world!” printed below!

Before we jump into machine learning, let’s try out the different features offered by Almond. First, we can import a library. Almond internally uses Ammonite, so we get a nice import syntax for loading libraries in our notebook.

For example, we can import Circe, a library for manipulating JSON in Scala, with the Ammonite import syntax:

import $ivy.`io.circe::circe-core:0.10.0`, $ivy.`io.circe::circe-generic:0.10.0`, $ivy.`io.circe::circe-parser:0.10.0`

import io.circe._, io.circe.generic.auto._, io.circe.parser._, io.circe.syntax._

With this loaded, we can then convert a Scala object to JSON. As you’re typing this in, you’ll see that the notebook has full code-completion support. For example, hit Ctrl-Space after typing Qux(13, Some(14.0)). and you’ll see code completions that include the asJson method.

case class Qux(i: Int, d: Option[Double])

val json = Qux(13, Some(14.0)).asJson.spaces2

Here’s a complete notebook that uses Circe to convert a Scala object to JSON.
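To round out the Circe example, here is a minimal sketch of going the other way: encoding a value to a JSON string and decoding it back. It assumes the same Circe 0.10.0 artifacts as above and is meant to run as a cell in an Almond notebook (the `$ivy` imports are Ammonite syntax, not plain Scala). The names original, encoded, and decoded are just illustrative.

```scala
// Ammonite/Almond dependency imports, same versions as earlier in the post
import $ivy.`io.circe::circe-core:0.10.0`, $ivy.`io.circe::circe-generic:0.10.0`, $ivy.`io.circe::circe-parser:0.10.0`

import io.circe.generic.auto._, io.circe.parser.decode, io.circe.syntax._

case class Qux(i: Int, d: Option[Double])

// Encode to a compact, single-line JSON string
val original = Qux(13, Some(14.0))
val encoded: String = original.asJson.noSpaces

// Decoding returns an Either, so malformed input shows up as a Left
// value rather than as a thrown exception
val decoded: Either[io.circe.Error, Qux] = decode[Qux](encoded)

decoded match {
  case Right(qux) => println(s"Decoded: $qux")
  case Left(err)  => println(s"Could not decode: ${err.getMessage}")
}
```

Because the decoder is derived from the case class, changing the fields of Qux changes what JSON is accepted, with no parsing code to update by hand.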

Thanks to Almond, we get a natural experience for writing Scala code inside the notebook. Almond also offers more advanced features, such as displaying custom HTML, which can be used to plot charts. Take a look at the Almond docs for more information.

Using TensorFlow

Now that we have Scala code running in our notebook, it’s time to do some machine learning! Colab is an excellent environment for experimenting with and training models, especially because it offers access to high-performance GPUs and TPUs for free!

Using TensorFlow from Scala is easy with ScalaPy, a library that enables seamless interop between Scala code and Python libraries. ScalaPy is designed so that any Python library can be used, even one with native dependencies. Libraries like TensorFlow and NumPy are no exception: we can use them from ScalaPy while still having access to high-performance native computations. To learn more about ScalaPy, check out this blog post.

One of the primary features of Scala is its rich type system, and ScalaPy makes it possible to write type safe code even when working with Python libraries. When using TensorFlow, we can simply add .as[TensorFlow] and all later usages will be checked against the type definitions in the scalapy-tensorflow library.

First, let’s start with a simple example of performing linear regression with TensorFlow. The following notebook starts by installing ScalaPy, then loads up TensorFlow, and finally performs some linear regression.

Linear regression isn’t exactly the most complex machine learning application, especially when we’re using a library as powerful as TensorFlow, but it does a good job of demonstrating what the experience of writing Scala code for TensorFlow is like!
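If you’d like to see what linear regression is computing before opening the notebook, here is a rough sketch in plain Scala, with no TensorFlow or ScalaPy involved. This is not the notebook’s code: the object name LinearRegressionSketch, the synthetic data (points on the line y = 3x + 2), the learning rate, and the step count are all assumptions picked for illustration. It fits a slope and intercept by gradient descent on the mean squared error, which is the same kind of optimization TensorFlow automates.

```scala
object LinearRegressionSketch {
  // Fit y ≈ w * x + b by gradient descent on the mean squared error.
  def fit(xs: Vector[Double], ys: Vector[Double],
          learningRate: Double, steps: Int): (Double, Double) = {
    require(xs.nonEmpty && xs.length == ys.length,
      "xs and ys must be non-empty and have the same length")
    val n = xs.length.toDouble
    var w = 0.0
    var b = 0.0
    for (_ <- 0 until steps) {
      // Prediction error for each data point under the current w and b
      val errors = xs.zip(ys).map { case (x, y) => (w * x + b) - y }
      // Partial derivatives of the mean squared error with respect to w and b
      val gradW = 2.0 / n * xs.zip(errors).map { case (x, e) => x * e }.sum
      val gradB = 2.0 / n * errors.sum
      // Step against the gradient
      w -= learningRate * gradW
      b -= learningRate * gradB
    }
    (w, b)
  }
}

// Synthetic, noise-free data on the line y = 3x + 2 (illustrative values)
val xs = Vector(0.0, 1.0, 2.0, 3.0, 4.0)
val ys = xs.map(x => 3.0 * x + 2.0)

val (w, b) = LinearRegressionSketch.fit(xs, ys, learningRate = 0.05, steps = 5000)
println(f"w = $w%.4f, b = $b%.4f")
```

With TensorFlow, the gradients are derived automatically and the arithmetic runs in native code, so only the model and the loss need to be written by hand.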

Learning Faster with GPUs

Now that we have TensorFlow set up, let’s do some more advanced machine learning and train an image classifier! Normally, training such a model would be very slow on a CPU, but with Colab we have access to GPUs that can make the training process a lot faster!

In this example, we train a CNN-based classifier on the MNIST dataset (which contains images of handwritten digits), achieving a test accuracy of over 99% with just 5 epochs of training! All the code to load the dataset, set up the model, and perform training is written in Scala, but through ScalaPy we are able to use TensorFlow to perform the training on a GPU.

With the power of GPUs, training this relatively complex model is super fast: it took 1 minute and 23 seconds on a GPU versus over 6 minutes on a CPU, a speedup of more than 4x. Pretty impressive!

And More!

This is just the tip of the iceberg of what is possible with machine learning in Scala. With the powerful compute resources in Colab and the awesome experience with Almond, the possibilities for experimenting with new machine learning techniques are endless. And because ScalaPy is designed to handle any type of Python library, it is also compatible with the latest features in TensorFlow, including TensorFlow 2.0 Alpha!

To learn more about the tools used in this blog, check out:

  • Almond -- the kernel implementation that lets us write Scala code in Colab
  • ScalaPy -- the bridge that made it possible to use TensorFlow in our Scala code
  • ScalaPy TensorFlow -- static typings that make using TensorFlow natural in Scala