So the story starts out like most stories in the technology industry these days: some cool new piece of software comes along that makes things easier to do, everyone jumps on board, and within a day there are hundreds of blog articles and GitHub repositories showing the amazing ways people are playing with this new technology.
For me, that piece of software happened to be TensorFlow Serving. I had been meaning to play around with some Big Data tools for a while, so this gave me the perfect excuse, and I wanted to see what all the hype was about. I have added some Extra Tutorials, Guides & Architectures and Distributed TensorFlow tidbits at the bottom of this article, as there is always more than one way to crack an egg, and some of these other ways might be better for your use case.
Setup
I started out like anyone would by going to the installation page, but couldn’t find any instructions that actually helped me, as it only talks about building from source. After hours of compiling files with no real idea what I was doing, I ran into this issue here. I looked around on the internet and solved it by looking into this issue, but ultimately ran into a lot of other issues. What I should have done was look at issue #1, where I would have seen that there is no support for OS X and that the recommended route is to run it up using Docker. I thought about this, but figured I would see if I could add some Big Data support to Apollo while I was at it, and as Apollo is based on Mesos, I thought my best bet would be to use the Kubernetes tutorial.
I had also been working on adding Rackspace support to Apollo, so I thought: what better way to get that ticked off the list too. Plus I could take advantage of Rackspace Bare Metal servers for the Spark and TensorFlow servers, which is the perfect use case.
So I have my Mesos cluster running on Rackspace, and using the DCOS CLI I can install Spark by running: dcos package install spark
or by going to the ‘group_vars/all’ file in the Apollo project and changing the spark_enabled variable to true. This will automatically install the Spark framework and handle everything for you.
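If you prefer the Apollo route, the change is a one-line edit. A minimal sketch, assuming the file contains a default spark_enabled: false entry (the stand-in file created below is just so the snippet is runnable on its own; in a real checkout the file already exists):

```shell
# Stand-in for the real group_vars/all file in an Apollo checkout.
mkdir -p group_vars
printf 'spark_enabled: false\n' > group_vars/all

# Flip the Spark toggle (variable name from the article; the rest of the
# file's layout is an assumption).
sed -i 's/spark_enabled: false/spark_enabled: true/' group_vars/all
grep 'spark_enabled' group_vars/all
```

Re-provisioning the cluster after this change is what actually deploys the framework.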
This will install the Spark Mesos framework. Now, while Spark starts up, I will explain why I have diverged and included Spark when this article is about TensorFlow. Originally I had planned to run only TensorFlow, but then I started reading a couple of blogs about hyperparameter tuning/optimisation. Basically, they use Spark to drive distributed neural network training, which leads to a massive reduction in training time and lower error rates. This is all covered here, in a great post from the team at Databricks. Of course Spark isn’t the only way we can do this, but I believe it is the most powerful, as it allows us to build more complicated pipelines. Distributed TensorFlow explains in depth how to use Spark, AWS EC2 and Distributed TensorFlow together.
So now to deploy TensorFlow. Here is a gist of the TensorFlow Inception Marathon file. Deployment will take a while as the Docker image is quite large. In the meantime, go read up on some other demos people have tried in the Extra Tutorials, Guides & Architectures section below.
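I won’t reproduce the gist here, but for a sense of the shape, here is a hedged sketch of what a Marathon app definition for the Inception image can look like (the resource figures and port mapping are assumptions, not the gist’s actual values):

```shell
# Hypothetical Marathon app definition for TensorFlow Serving (Inception).
cat > inception.json <<'EOF'
{
  "id": "/tensorflow-inception",
  "cpus": 2,
  "mem": 4096,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "gcr.io/tensorflow-serving/inception",
      "network": "BRIDGE",
      "portMappings": [{"containerPort": 9000, "hostPort": 9000}]
    }
  }
}
EOF
# Sanity-check the JSON; submit it with: dcos marathon app add inception.json
python3 -m json.tool inception.json > /dev/null && echo "valid JSON"
```

Port 9000 is the gRPC port the inception_client talks to later on.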
Wait until you have the green health check in the Marathon UI. The IP address ‘52.50.8.153’ listed here is the server running your TensorFlow Serving instance. You can get this by clicking on the app in the Marathon UI, which lists the server IP it is running on.
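If clicking around the UI is not your thing, Marathon’s REST API exposes the same information via its /v2/apps/&lt;app-id&gt;/tasks endpoint. A sketch, where the echoed JSON is a canned sample standing in for the real reply (in practice, replace the echo with something like curl -s http://your-marathon-host:8080/v2/apps/tensorflow-inception/tasks, with the app id matching whatever is in your Marathon file):

```shell
# Pull the host and port of a running task out of a Marathon tasks response.
echo '{"tasks": [{"host": "52.50.8.153", "ports": [9000]}]}' |
python3 -c '
import json, sys
for t in json.load(sys.stdin)["tasks"]:
    print(t["host"], t["ports"][0])
'
```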
You can now query TensorFlow Serving with the path to the image you want analysed. Your output should be the classes of things seen in the image.
docker run --name=inception_container -it gcr.io/tensorflow-serving/inception \
  /serving/bazel-bin/tensorflow_serving/example/inception_client \
  --server=52.50.8.153:9000 --image=/path/to/my_image.jpg
Update
Recently I discovered Keras, a highly modular neural networks library that runs on top of TensorFlow or Theano. It was developed with a focus on enabling fast experimentation, which suits me perfectly for trying different methods and models of machine learning.
I then went down a rabbit hole investigating what I could do with Keras and how it abstracts away some of the work on top of those two tensor libraries, when I came across the Deep Learning Robot, which is built for advanced research in robotics and artificial intelligence (deep learning) and comes pre-installed with TensorFlow, Robot Operating System, Caffe, Torch, Theano, CUDA, and cuDNN. This was a game changer for me, as I can now run any GPU-intensive AI on a machine next to me, cutting all my cloud costs. WIN!
Bonuses
If you also want to try running this on GPU machines: Mesos does have support for GPUs, but isolation support has not landed yet. In the meantime, you can head over to the TensorFlow Mesos framework being built, which looks promising and even has some early support for GPUs.
Last month Google Cloud Vision also graduated to General Availability. For those not wishing to experiment on their own, I recommend taking a look at it, as it looks quite nice.
Distributed TensorFlow
- Deep Learning with Spark and TensorFlow
- Large Scale Deep Learning With TensorFlow on EC2 Spot Instances
- Scaling Google's Deep Learning Library on Spark
- Distributed TensorFlow