Creating Your First Machine Learning Model with Vertex AI

Emilie Robichaud
4 min read · Feb 9, 2023

When I registered for BackyardHacks in 2020, I knew I wanted to explore computer vision. Like many others, I bought a lot of plants during the COVID-19 lockdown. What’s the science behind that phenomenon, anyway? Did the added greenery make people feel less confined? Or did having to nurture a living organism give life a new purpose when things felt so bleak? Whatever the reason, I was one of those people! Combining that with my desire to learn about computer vision, I came up with the idea for Florafy: an Android app that lets you take a picture of a plant and identifies the plant type for you! This blog focuses on the machine learning side of this type of project: how the model decides what it’s “looking” at, and how you can create your own model with Vertex AI by Google!

A Brief Overview

If you’ve seen Silicon Valley, you probably know about the Not Hotdog app. This app tells you whether an object is a hotdog or not. While that may not have many practical uses, it’s a great (and fun) example for the purposes of this blog. Now, let’s get started! For our model to understand what a hotdog is, we must give it a large amount of training data. 🌭

hotdog (left), also hotdog (right)

While it may seem obvious to us when we see a hotdog, our model has no knowledge of the wonderful world of hotdogs! If we only feed it images like the one on the left, it will only recognize hotdogs with Wonder Bread buns and mustard… but hotdogs can look many other ways. We need a variety of toppings, buns, sizes, and positions for the model to grasp the ethos of hotdogs.

Gathering Data

Gathering data can be the hardest part of training a machine learning model, especially if you are doing complex object classification. Where you find data depends on what you’re training your model to “see”, but Google Open Images is a good place to start: it contains ~9 million images with corresponding labels. For our example, I found a hotdog / not-hotdog dataset on Kaggle, which contains 249 images of hotdogs and 249 images of not hotdogs.

From the Vertex AI console, simply upload your images and label them accordingly. Then, let your model train! How long this takes depends on your data: how many images there are, and how many labels. My model took 2.5 hours, but I’ve trained some for 8+ hours!
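If labeling hundreds of images by hand in the console sounds tedious, Vertex AI can also import a CSV file in which each row pairs a Cloud Storage image path with its label. The snippet below is only a rough sketch of how such a file could be generated. The bucket name, folder layout, and file names are made up for illustration, and the helper functions are my own, not part of any Google library. Check the current Vertex AI documentation for the exact import format before relying on it.

```python
import csv
import io

# Hypothetical bucket and folder; replace with wherever your images live.
BUCKET_PREFIX = "gs://my-example-bucket/hotdog-dataset"


def build_import_rows(files_by_label, bucket_prefix=BUCKET_PREFIX):
    """Pair every image path with its label, one (path, label) tuple per image."""
    rows = []
    for label, filenames in sorted(files_by_label.items()):
        for name in sorted(filenames):
            rows.append((f"{bucket_prefix}/{label}/{name}", label))
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text, one 'path,label' line per image."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up file names standing in for the real dataset.
example_files = {
    "hotdog": ["001.jpg", "002.jpg"],
    "not_hotdog": ["001.jpg"],
}

import_csv = rows_to_csv(build_import_rows(example_files))
print(import_csv)
```

Keeping one folder per label makes the label easy to derive from the path, so the same script works no matter how many images or labels you add later.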

Behind the Curtain

So, what exactly is happening while your model is training? First, you need to understand the concept of a “neural network”. In machine learning, a neural network is a method that teaches computers to process data in a way inspired by the human brain. It creates an adaptive system that computers use to learn from their mistakes and improve continuously.¹

In the case of computer vision, the network takes in the input (images of hotdogs or not hotdogs) and processes the individual pixels of each image. By doing this with hundreds or thousands of pre-labeled images, it learns which patterns to look for. When we pass in a new image for testing, the model does not look back through every hotdog image it has seen. Instead, it runs the new image’s pixels through the patterns it learned during training and produces a confidence score; if that score meets a minimum threshold, it declares the image a hotdog!²
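To make the idea of a score and a threshold a bit more concrete, here is a toy sketch of a single artificial “neuron” that turns pixel values into a confidence. Everything in it is an assumption for illustration: the four-pixel “image”, the weights, the bias, and the function names are made up, and a real image model (including whatever Vertex AI trains for you) has many layers and learns millions of weights on its own.

```python
import math


def sigmoid(x):
    """Squash any number into the range 0..1 so it reads like a confidence."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_hotdog(pixels, weights, bias, threshold=0.5):
    """Return (confidence, label) for a flat list of pixel values in [0, 1]."""
    # Weighted sum of the pixels: large weights mark pixels that matter most.
    score = sum(p * w for p, w in zip(pixels, weights)) + bias
    confidence = sigmoid(score)
    label = "hotdog" if confidence >= threshold else "not hotdog"
    return confidence, label


# Made-up "learned" values; training is the process of finding numbers like these.
weights = [2.0, -1.0, 1.5, -0.5]
bias = -1.0

# A made-up 4-pixel "image".
image = [0.9, 0.1, 0.8, 0.2]

confidence, label = predict_hotdog(image, weights, bias)
print(f"{confidence:.1%} sure -> {label}")
```

Training adjusts the weights a little every time the network gets a labeled example wrong, which is the “learning from mistakes” part.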

Testing Our Model!

This is where our work pays off — how well does the model work?

Vertex AI has generated this useful matrix (known as a confusion matrix) for us. It shows that our model correctly labels a hotdog 92% of the time, and a not hotdog 88% of the time. For our purposes, this is good enough. But if you want to improve the accuracy of your model, you should include more images and allow it to train for longer. Now let’s test with some new images…
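If you are curious where percentages like these come from, they can be computed from counts of how the test images were labeled. The sketch below uses hypothetical counts that I picked only because they produce 92% and 88%; they are not the actual numbers from my model, and the function names are my own.

```python
def per_class_rates(confusion):
    """For each true label, return the fraction of its images labeled correctly.

    confusion[true_label][predicted_label] holds a count of test images.
    """
    rates = {}
    for true_label, predictions in confusion.items():
        total = sum(predictions.values())
        correct = predictions.get(true_label, 0)
        rates[true_label] = correct / total if total else 0.0
    return rates


def overall_accuracy(confusion):
    """Fraction of all test images that received the correct label."""
    total = sum(sum(predictions.values()) for predictions in confusion.values())
    correct = sum(predictions.get(label, 0) for label, predictions in confusion.items())
    return correct / total if total else 0.0


# Hypothetical counts, chosen only to reproduce the 92% and 88% figures.
confusion = {
    "hotdog": {"hotdog": 23, "not_hotdog": 2},
    "not_hotdog": {"hotdog": 3, "not_hotdog": 22},
}

rates = per_class_rates(confusion)
for label, rate in rates.items():
    print(f"{label}: {rate:.0%} labeled correctly")
print(f"overall: {overall_accuracy(confusion):.0%}")
```

Looking at each row separately is useful because a single overall number can hide the fact that the model is better at one label than the other.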

You can see that the model is 100% certain that the hotdogs are indeed hotdogs! It is 99.7% sure that the image of a pizza is not a hotdog, and 99.9% sure that the image of a mouse is not a hotdog. To show that our model is not exactly perfect, I tested an image of a dachshund in a hotdog costume… our model is 94.5% sure that this is a hotdog! 😅 For better accuracy, we would need to train with significantly more data. But I hope this blog serves as a good starting point for beginners interested in machine learning! 💻

[1] AWS. What Is A Neural Network? https://aws.amazon.com/what-is/neural-network/

[2] The Next Web. A beginner’s guide to AI: Computer vision and image recognition. https://thenextweb.com/news/a-beginners-guide-to-ai-computer-vision-and-image-recognition


Emilie Robichaud

University of Toronto graduate with majors in Computer Science and Mathematics! Always eager to explore more in the world of technology.