Time to try TensorFlow
Needless to say, switching from Mac to Ubuntu is quite an experience. Every keyboard shortcut you're used to not only doesn't work, but likely closes the window or undoes something useful. Well done, $(whoever is responsible)! Anyway, back to TensorFlow.
The installation experience
TensorFlow requires the CUDA toolkit, which you can easily find here. Don't use the "runfile (local)" option; use the "deb (local)" one instead, because "runfile (local)" is more complex and it freezes instantly after you install the driver. You can find the process nicely described here. TensorFlow also requires cuDNN, which you can find here, but it requires you to register for the Accelerated Computing Developer Program. To install it smoothly you can follow the instructions in the relevant TensorFlow section.
After you install CUDA, I suggest you also install the CUDA samples (see "CUDA SDK Samples" here) and build and run deviceQuery. It's the "hello world" of CUDA and it's nice to see what a nice piece of hardware you own:
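For reference, building and running it looks roughly like this (a sketch: the samples path below assumes the CUDA 7.5 samples were installed in your home directory - adjust the version and location to match your setup):

# Build and run deviceQuery from the CUDA samples.
# NOTE: the path is an assumption for CUDA 7.5 samples installed in $HOME.
cd ~/NVIDIA_CUDA-7.5_Samples/1_Utilities/deviceQuery
make
./deviceQuery    # prints the GPU name, compute capability, memory, etc.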
Back to TensorFlow - I used the pip installation and it all went fine. Don't forget to choose the GPU edition and Python 3 (obviously!)
Hello world stuff...
Likely you will end up running the TensorFlow hello world next. Nice and simple. You open a session and run an addition on the GPU:
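Something like this minimal sketch (using the tf.Session API of the time; log_device_placement is optional, it just prints which device ran each op):

import tensorflow as tf

# Pin the ops to the first GPU.
with tf.device('/gpu:0'):
    a = tf.constant(2)
    b = tf.constant(3)
    c = tf.add(a, b)

# log_device_placement=True logs which device executed each op.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))    # prints 5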
Nice, clear diagnostics and general happiness. Well done! Before you realise it, you're called to choose between the "blue pill" and the "red pill" here. I chose the "blue pill" but it didn't last for long. Here it is:
A few lines of code and you have an MNIST classifier with 92% accuracy, and many statements on how embarrassingly bad this result is. Soon you end up with the "red pill": with a few extra lines you build a multilayer convolutional network and you use dropout to avoid overfitting. It's nice to see that it's relatively simple to set something like this up. It makes you want to try alternatives and learn. Well done!
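For reference, the "blue pill" model is essentially softmax regression. Here's a hedged sketch using the old tf.Session/placeholder API (the MNIST helper's import path varied across early releases, and the variable names are mine, not necessarily the tutorial's):

import tensorflow as tf
# The import path of this helper moved around in early releases.
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('/tmp/mnist_data', one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])    # flattened 28x28 images
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)         # predicted class probabilities

y_ = tf.placeholder(tf.float32, [None, 10])    # one-hot labels
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())    # old-API initializer
    for _ in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

    correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    print(accuracy.eval(feed_dict={x: mnist.test.images,
                                   y_: mnist.test.labels}))    # ~0.92

The "red pill" version replaces this single layer with stacked convolution/pooling layers and adds tf.nn.dropout, at the cost of a few dozen more lines.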
While I was training this DNN with 20k iterations, I got my first Zen moment: I could hear the fan spinning like crazy and feel the heat, but as the "System Monitor" confirmed... it wasn't the CPU. That was nice!
Nothing more or less than Issue 136, which simply means "you don't have enough GPU RAM". Here's the solution, but who bothers re-compiling - I trust them that it works. I also trust that the DNN gives 99.2% accuracy and that it's "not state of the art, but respectable." Awesome!
Where is the IDE?
More or less at this point I run out of tutorials, but I still have this feeling there's something missing... Where is this IDE I was seeing in the videos, the one with the nice graphs etc.? Aha! Here it is... and it's called TensorBoard... and it's not an IDE! :) I mean, obviously my bad, but I thought that you could use it to write code by drawing and clicking. What TensorBoard actually does is visualise several aspects of your NN's training (or whatever flow you're running). You still write Python, but you use a SummaryWriter class to log data to a directory (see example here, and the sketch below). Then you use tensorboard --logdir=/tmp/mnist_logs (or whatever path) to visualise and get a better understanding of what happened. Don't get me wrong - awesome job! But maybe the videos/marketing oversold this feature a little bit. That said - excellent visualisations!
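Here's a minimal sketch of that logging pattern, using the tf.train.SummaryWriter API of the time (these summary ops were later renamed; the toy "loss" metric below is just a placeholder of mine):

import tensorflow as tf

x = tf.placeholder(tf.float32, name='x')
loss = tf.square(10.0 - x)               # a toy "metric" to log

tf.scalar_summary('loss', loss)          # register a scalar summary
merged = tf.merge_all_summaries()

with tf.Session() as sess:
    # The writer dumps event files that TensorBoard reads from this directory.
    writer = tf.train.SummaryWriter('/tmp/mnist_logs', sess.graph_def)
    for step in range(100):
        summary = sess.run(merged, feed_dict={x: float(step)})
        writer.add_summary(summary, step)
    writer.close()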
So I do tensorboard --logdir=/tmp/mnist_logs, I open TensorBoard in the browser and I get nothing (!) It's because of Issue 1421, which is fixed here and merged on the latest trunk, but not in a pip-released version yet. No problem, but I really want to see this, so I end up compiling from source.
Compiling from source
I now have to install Bazel as described here, which means that I have to install Java 8 (which I should do anyway). Then while compiling TensorBoard (2 steps described here) I hit Issue 605, which reminds me that I have to follow the exact process described here, since unfortunately I will have to build a large part of TensorFlow as well (with GPU support, to play safe). The compilation takes some time, ~10 min, much of which is spent on protocol buffer compilation (why do I need C# bindings to run TensorFlow?), and I get a chance to see that they use gRPC... nice!
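For the record, the GPU build boils down to something like this (a sketch from memory; the exact flags and targets are in the build instructions linked above):

# Configure the build (asks about CUDA paths, compute capability, etc.)
./configure

# Build the pip package with GPU support...
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

# ...then package it and install the resulting wheel.
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/*.whl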
The TensorBoard
After this process, here's TensorBoard giving some coarse metrics on overall DNN performance during training, which shows it's converging nicely... and some more detailed histograms per layer. Awesome!
Finally the Graph tab shows you what you're actually running:
When you click on a block (in the image above I clicked on layer 1) you see its structure. In the example above the graph shows the operations those layers use. I can clearly see how TensorBoard could help someone debug non-trivial problems and communicate complex network topologies to peers.
Closing remarks
TensorFlow, despite the little bugs here and there (which, by the way, get fixed very quickly), is a very mature product. It gets the level of abstraction right - way better than other NN software packages. It completely hides the fact that you're running on a GPU and allows you to write code that follows the mathematics quite closely. Of course these were just "hello world" problems, but they are indicative of some clear design decisions that I'm sure the entire package follows.
So well done. It's a nice piece of software that encourages you to experiment and explore a field that used to be way less accessible!