This is an introductory tutorial on using Theano, the Python library. I’m going to start from scratch and assume no previous knowledge of Theano. However, understanding how neural networks work will be useful once we get to the code examples towards the end.
The plan for the tutorial is as follows:
- Give a basic introduction to Theano and explain the important concepts.
- Go over the main operations that we have available in Theano.
- Look at working code examples.
I recently gave this tutorial as a talk at the University of Cambridge and it turned out to be way more popular than expected. In order to give more people access to the material, I’m now writing it up as a blog post.
I do not claim to know everything about Theano, and I constantly learn new things myself. If you find any errors or have suggestions on how to improve this tutorial, do let me know.
The code examples can be found in the GitHub repository: https://github.com/marekrei/theano-tutorial
1. What is Theano?
Theano is a Python library for efficiently handling mathematical expressions involving multi-dimensional arrays (also known as tensors). It is a common choice for implementing neural network models. Theano has been developed at the University of Montreal, in a group led by Yoshua Bengio, since 2008.
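To make this concrete, here is a minimal sketch of the typical workflow (the variable names are just for illustration): you declare symbolic variables, build a mathematical expression out of them, and compile that expression into a callable function. Actual values are only passed in when the compiled function is called.

```python
import theano
import theano.tensor as T

# Declare two symbolic matrices (no values are attached yet)
a = T.dmatrix('a')
b = T.dmatrix('b')

# Build a symbolic expression from them
c = a + b

# Compile the expression into a callable function
f = theano.function(inputs=[a, b], outputs=c)

# Only now do we pass in actual values
print(f([[1.0, 2.0]], [[3.0, 4.0]]))  # prints [[4. 6.]]
```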
Some of the features include:
- automatic differentiation – you only have to implement the forward (prediction) part of the model, and Theano will automatically figure out how to calculate the gradients at various points, allowing you to perform gradient descent for model training (see the short sketch after this list).
- transparent use of a GPU – you can write the same code and run it either on CPU or GPU. More specifically, Theano will figure out which parts of the computation should be moved to the GPU.
- speed and stability optimisations – Theano will internally reorganise and optimise your computations, in order to make them run faster and be more numerically stable. It will also try to compile some operations into C code, in order to speed up the computation.
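As a small illustration of automatic differentiation, here is a sketch (again, the variable names are my own): we only define the forward expression, and ask Theano for its gradient without ever writing the derivative by hand.

```python
import theano
import theano.tensor as T

# A symbolic scalar input
x = T.dscalar('x')

# Forward part only: y = x^2
y = x ** 2

# Theano derives the symbolic gradient dy/dx = 2*x for us
dy_dx = theano.grad(y, x)

# Compile the expression and its gradient into one function
f = theano.function([x], [y, dy_dx])

print(f(3.0))  # y = 9.0, dy/dx = 6.0
```

Regarding the GPU point above: in my experience, the same script can be moved to a GPU by setting Theano configuration flags when launching it (something along the lines of THEANO_FLAGS=device=gpu,floatX=float32 python script.py, although the exact device name depends on your Theano version and backend), without changing the code itself.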