Building an AI-first startup

Rob Vickery

Artificial intelligence, as a technology, is in a constant state of evolution. From the emergence of consumer AI over ten years ago, through the development of AI for the enterprise sector, to today, when it performs tasks that we cannot, AI has become a permanent and essential part of our lives.

When we assess startups that pitch us an AI-based business model, we look to quickly identify and explore three key areas:

  1. To understand whether the founders are aware of the capabilities and limitations of machine learning, and whether their solution fits into one of three buckets that we believe distinguish real artificial intelligence from two Excel spreadsheets talking to each other.
  2. To assess the proprietary source and nature of the data used to train the machine learning model(s) to reach minimal algorithmic performance (MAP). Coined by Zetta Ventures, MAP represents the minimum level of accurate intelligence required to justify adoption by a customer.
  3. To gain insight into the founders’ closed-loop strategy. Closed loops are an essential part of developing frontier AI solutions. In the simplest sense, they are the rinse-and-repeat of data flowing into a continuous cycle of hypothesis testing, error logging and correction. How this is managed at a philosophical and operational level is critical. In a typical closed-loop process, new data arrives, the model makes a prediction, errors against real outcomes are logged and corrected, and the corrected data is fed back in to retrain the model before the cycle starts again.
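To make the loop concrete, here is a minimal Python sketch of one closed-loop cycle: predict, log errors against real outcomes, fold the labelled results back into the training set and retrain. The `ThresholdModel` and `ClosedLoop` names are hypothetical, and the one-dimensional threshold classifier is a deliberately trivial stand-in for a real model.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdModel:
    """Toy 1-D classifier (hypothetical stand-in for a real model): predicts 1 when x >= threshold."""
    threshold: float = 0.0

    def fit(self, xs: list[float], ys: list[int]) -> None:
        # Place the threshold midway between the two class means.
        pos = [x for x, y in zip(xs, ys) if y == 1]
        neg = [x for x, y in zip(xs, ys) if y == 0]
        if pos and neg:
            self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

    def predict(self, x: float) -> int:
        return int(x >= self.threshold)


@dataclass
class ClosedLoop:
    model: ThresholdModel
    train_x: list[float] = field(default_factory=list)
    train_y: list[int] = field(default_factory=list)
    error_log: list[tuple[float, int, int]] = field(default_factory=list)

    def run_cycle(self, batch: list[tuple[float, int]]) -> int:
        """One turn of the loop: predict, log errors, fold labelled data back in, retrain."""
        errors = 0
        for x, true_label in batch:
            predicted = self.model.predict(x)  # hypothesis test
            if predicted != true_label:
                self.error_log.append((x, predicted, true_label))  # error logging
                errors += 1
            # Correction: every labelled outcome becomes new training data.
            self.train_x.append(x)
            self.train_y.append(true_label)
        self.model.fit(self.train_x, self.train_y)  # retrain on the enlarged set
        return errors


loop = ClosedLoop(ThresholdModel(threshold=0.0))
batch = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
print(loop.run_cycle(batch), loop.model.threshold)  # 3 errors, threshold moves to 5.0
print(loop.run_cycle(batch))                         # 0 errors on the next cycle
```

The first cycle, with a badly placed threshold, logs three errors; retraining moves the threshold to 5.0 and the same batch then produces none. In production, the true label would usually arrive later, from the customer or a human reviewer, rather than alongside the input.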

In our decade of experience as AI investors, we have found that machine learning technology seeks to do one of three things:

Distinguishing particular objects from others

AI excels at separating data into distinct groups. A well-defined segment is one whose members are similar to each other and different from the members of other segments.
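A classic way to find such groups is k-means clustering. The sketch below is a minimal pure-Python version for 2-D points, written for illustration only; it seeds the centroids with the first k points for simplicity (and assumes there are at least k points), whereas libraries such as scikit-learn default to smarter k-means++ seeding.

```python
import math

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iterations: int = 20) -> tuple[list[Point], list[int]]:
    """Minimal k-means: returns (centroids, cluster index for each point)."""
    # Naive initialisation: the first k points.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)  # the three points near the origin share one label, the other three share another
```

Each point ends up closer to its own group’s centroid than to any other, which is exactly the “similar within, different between” property described above.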

Take image segmentation as an example. In a segmented image, every pixel is annotated as belonging to a particular class. Segmentation is used to label images for applications that demand high accuracy, and doing it by hand is labour-intensive because it requires pixel-level precision: a single image can take 30 minutes or more to complete. Effective AI-based image segmentation can annotate every pixel automatically, distinguishing between classes such as sky, ground and vehicle type.
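As a toy illustration of what a segmentation output looks like, the sketch below assigns every pixel of a tiny RGB image to whichever class colour it is nearest to, producing a label map the same shape as the image. The class prototype colours are made up, and nearest-colour matching is only a stand-in: a real segmentation model is trained on annotated images and uses far richer context than a single pixel’s colour.

```python
import math

# Hypothetical reference colour (R, G, B) for each class.
PROTOTYPES = {
    "sky": (135, 206, 235),
    "ground": (110, 85, 60),
    "vehicle": (200, 30, 30),
}


def segment(image: list[list[tuple[int, int, int]]]) -> list[list[str]]:
    """Return a label map the same shape as the image: one class name per pixel."""
    return [
        [min(PROTOTYPES, key=lambda name: math.dist(pixel, PROTOTYPES[name])) for pixel in row]
        for row in image
    ]


image = [
    [(130, 200, 240), (140, 210, 230)],
    [(205, 35, 25), (100, 80, 70)],
]
print(segment(image))  # [['sky', 'sky'], ['vehicle', 'ground']]
```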

The augmentation of processes

AI-first businesses focus on models that automatically analyse large datasets, continuously fine-tune the parameters of what someone defines as “normal behaviour”, and flag when the pattern breaks. Real-world examples include:

  • Netflix building your watch list from what you have liked in the past.
  • Google Maps tracking historical and real-time traffic patterns and rerouting you to avoid delays.
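The “learn normal on the fly, flag the breach” idea can be sketched with a running mean and variance. This minimal Python example uses Welford’s online algorithm and a z-score test; the 3-standard-deviation threshold, the warm-up length and the `AdaptiveBaseline` name are illustrative assumptions, not a description of how Netflix or Google Maps actually work.

```python
import math


class AdaptiveBaseline:
    """Learns what 'normal' looks like on the fly and flags readings that break the pattern."""

    def __init__(self, z_threshold: float = 3.0, warmup: int = 5) -> None:
        self.z_threshold = z_threshold  # how many standard deviations counts as a breach
        self.warmup = warmup            # readings to see before judging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Return True if x is anomalous relative to the data seen so far, then learn from it."""
        is_anomaly = False
        if self.n >= max(self.warmup, 2):
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.z_threshold:
                is_anomaly = True
        # Welford update: fold x into the running mean and variance.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


commute_minutes = [20, 21, 19, 20, 22, 21, 20, 45]
baseline = AdaptiveBaseline()
print([baseline.update(m) for m in commute_minutes])  # only the 45-minute reading is flagged
```

Because every reading, including the anomaly, updates the baseline, the model keeps adapting as “normal” shifts; whether to exclude flagged readings from the update is a design choice.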

Identifying irregularities 

Anomaly detection refers to identifying objects or events that do not conform to an expected pattern or to the other items in a dataset, often ones that a human expert would be unable to detect. Examples include:

  • Multiple failed attempts to log in to a consumer’s online banking account, raising the likelihood that a brute-force or credential-stuffing attack is under way.
  • Language modelling and natural language processing that identify a keyword such as “Alexa” so that the software can react accordingly.
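The failed-login example is often handled with a sliding-window count before any learned model is involved. The sketch below uses a hypothetical rule (five failures within 60 seconds) rather than a recommended policy, and the class name is illustrative; an AI-based system would typically learn per-account baselines instead of relying on a fixed threshold.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    """Flags an account when too many failed logins land inside a sliding time window."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 60.0) -> None:
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        # One queue of failure timestamps per account.
        self._failures: dict[str, deque[float]] = defaultdict(deque)

    def record_failure(self, account: str, timestamp: float) -> bool:
        """Record a failed attempt; return True if the account now looks under attack."""
        window = self._failures[account]
        window.append(timestamp)
        # Drop attempts that have slid out of the time window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.max_failures


monitor = FailedLoginMonitor()
print([monitor.record_failure("alice", t) for t in (0, 10, 20, 30, 40)])  # fifth failure trips the rule
print(monitor.record_failure("alice", 200))  # older failures have expired, so no alert
```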

In the purest sense, all three of these capabilities simply map an input to a desired output. In our portfolio, for example, Yabble takes multi-dimensional survey data and uses it to understand real-time customer sentiment towards brands and products.

Market characteristics for AI-first startups to look out for

We believe there are several key market features that can improve an AI-first startup’s chances of success:

  • Vertical focus. AI works best when it is predicting a better answer to a very specific problem. It is far easier to be the best AI platform for auto-part fitment data than to be a platform that tries to automate car servicing, accident repairs and new leases.
  • Enterprise markets, not consumer ones. The FAANG (Facebook, Apple, Amazon, Netflix and Google) cartel controls the consumer markets. You can still build a billion-dollar business in the enterprise sector without drawing the ire, and the AI research budgets, of these behemoths.
  • Things that humans don’t want to do, or don’t do very well. We have seen time and again that products crush it when they take on the tedious, risky or unappealing tasks that humans would rather avoid.
  • Humongous, proprietary datasets. Exponential and defensible enterprise value is created when you train your machine learning models on datasets unique to your business. Training models on social media data that is available to the entire planet is the complete opposite.
  • Someone else’s unwanted data could be your machine learning goldmine. Sometimes called “data exhaust”, unwanted or discarded data from an existing process may be the best source of training data to feed into your closed-loop process.

As with all startup strategies, listen to your target market, identify potentially ripe, latent opportunities and exploit them like crazy. 

What might be some good problems to solve with AI?

The advance of artificial intelligence has revealed flaws in characteristics we have long valued in executive decision-makers. Algorithms have shown that actions once considered prophetic were lucky, that decision principles once regarded as sacred were total rubbish, and that unwavering conviction can be as blind as a bat. The performance of actively managed investment funds illustrates the shortcomings of traditional human decision-making. With few exceptions, these funds, many run by household-name investors, underperform index funds over the long term, and AI’s algorithmic trades frequently outperform human ones.

That said, let’s not kid ourselves that AI will replace intuition, excellent customer relationship management or innate creativity. But it can be used to challenge accepted ways of thinking and deliver new value to the end customer. Aside from index-fund algorithms, what other problems might AI be good at solving?

  • Tasks and procedures that are still done manually, often in an inefficient, costly and inaccurate way. Manually counting parked cars at Tesco supermarket sites in RGB satellite data is a slow and error-prone way to gauge potential rises or falls in the company’s share price. Today’s CubeSats, combined with software from companies like Slingshot Aerospace, can use hyperspectral imagery analysis and more frequently updated datasets to do this automatically and much more accurately.
  • Convex payoffs. Zetta Ventures has a wonderful way of tying the value of a product to the shape of the payoff a customer might experience from using it, describing that payoff as either concave or convex. With a convex payoff, value increases as the product is used and the downside is limited. For example, numerous AI inventory management systems highlight gaps in stock levels so that you can restock and therefore sell more product. These systems help you fill a hole that was already empty, so there is nothing to lose even if the algorithm fails. Concave payoffs might occur with diagnostic AI tools that try to predict the future occurrence of life-threatening diseases. If they get it wrong, the consequences are dire for the patient and for the business making the predictions.
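The convex/concave distinction can be put into numbers with a simplified two-outcome model: a prediction is right with probability p and earns a gain g, or is wrong and costs a loss l. This is our illustrative simplification, not Zetta Ventures’ own formula, and the figures below are made up. Setting the expected payoff p·g − (1 − p)·l to zero gives the break-even accuracy p = l / (g + l).

```python
def expected_payoff(accuracy: float, gain_if_right: float, loss_if_wrong: float) -> float:
    """Expected value per prediction for a product that is right with probability `accuracy`."""
    return accuracy * gain_if_right - (1 - accuracy) * loss_if_wrong


def break_even_accuracy(gain_if_right: float, loss_if_wrong: float) -> float:
    """Accuracy at which expected payoff is zero: p * g = (1 - p) * l  =>  p = l / (g + l)."""
    return loss_if_wrong / (gain_if_right + loss_if_wrong)


# Illustrative, made-up numbers (arbitrary units per prediction).
CONVEX = {"gain_if_right": 100.0, "loss_if_wrong": 0.0}      # restocking: a miss costs nothing extra
CONCAVE = {"gain_if_right": 100.0, "loss_if_wrong": 1000.0}  # diagnosis: a miss is very costly

for name, payoff in [("convex", CONVEX), ("concave", CONCAVE)]:
    print(name, round(expected_payoff(0.8, **payoff), 2), round(break_even_accuracy(**payoff), 3))
```

At 80% accuracy the convex product earns 80 units per prediction, while the concave one loses about 120 and needs roughly 91% accuracy just to break even, which is why concave products face a much higher bar before a customer will adopt them.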

AI is hungry for great data. Crap data in, crap predictions out. 

So what does great data look like? We believe it has a few key traits:

  • High-dimensional. Machine learning typically deals with massive amounts of data from a vast number of sources. Whether that data is an image, video, text, speech or raw binary, it almost always exists in some high-dimensional space. In a human-versus-AI match-up, it is the human who typically struggles beyond two axes.
  • Accessible. If your training data is costly or difficult to access regularly and in volume, building your business model will be a challenge. For example, if you need data generated by a specific piece of hardware that only operates twice a week, you will have an issue.
  • Broad and deep. Try to obtain a dataset that covers as much of the problem you are trying to predict or solve as possible. If there isn’t enough data available, you could augment the data points you already have, synthesise new data, or be more selective and focus on the “right data” that will have the most impact.
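One common way to augment existing data points is to apply label-preserving transformations. The plain-Python sketch below turns each small greyscale image (a list of pixel rows) into four training examples by adding a horizontal flip, a vertical flip and a 90-degree rotation. It assumes the label is unaffected by these transformations, which holds for many tasks but not all (flipping text or digits can change their meaning, for instance).

```python
Image = list[list[int]]


def flip_horizontal(img: Image) -> Image:
    return [list(reversed(row)) for row in img]


def flip_vertical(img: Image) -> Image:
    return [list(row) for row in reversed(img)]


def rotate_90(img: Image) -> Image:
    """Rotate clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]


def augment(dataset: list[tuple[Image, str]]) -> list[tuple[Image, str]]:
    """Return the originals plus flipped and rotated copies, each keeping its label."""
    out = []
    for img, label in dataset:
        out.append((img, label))
        for transform in (flip_horizontal, flip_vertical, rotate_90):
            out.append((transform(img), label))
    return out


img = [[1, 2],
       [3, 4]]
for augmented, label in augment([(img, "cat")]):
    print(augmented, label)  # original, then the horizontal flip, vertical flip and rotation
```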

Even if you have enough quality data, you still need to ensure that the AI model you are building is stable and can withstand unforeseen anomalies or bias in its training data. When Microsoft launched its chatbot Tay in 2016, for example, the bot began posting inflammatory and offensive tweets from its Twitter account, and Microsoft shut down the service only 16 hours after launch. According to Microsoft, trolls had “attacked” the service, exploiting the fact that the bot based its replies on its interactions with people on Twitter.


Getting to Minimal Algorithmic Performance (MAP)

So, you have determined that you have an abundant source of high-dimensional, broad data. The question now is how much of it you need to reach the minimal algorithmic performance required to make predictions that prove your thesis or business model.

If you are seeking to mimic or match the performance of a soil acidity tester, you will need a pretty monumental amount of data to get to MAP. If the problem you are trying to solve is simple, the amount of data needed to achieve MAP could be quite small.

In our experience, some of our portfolio companies have reached an initial level of MAP with individual data units (images, voice clips, video views, etc.) numbering in the millions.
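One practical way to estimate how much data you need is a learning curve: train on increasing amounts of data and stop at the first size that clears your MAP threshold. The sketch below assumes you supply a hypothetical `train_and_evaluate(n)` function that trains on n samples and returns a held-out score; the toy curve in the example is invented for illustration and is not data from any portfolio company.

```python
import math
from typing import Callable, Optional


def smallest_dataset_reaching_map(
    sizes: list[int],
    train_and_evaluate: Callable[[int], float],
    map_threshold: float,
) -> Optional[tuple[int, float]]:
    """Train on increasing amounts of data; return the first (size, score) that clears MAP."""
    for n in sorted(sizes):
        score = train_and_evaluate(n)  # e.g. train on n samples, score on a held-out set
        if score >= map_threshold:
            return n, score
    return None  # MAP not reached with the data available


def toy_learning_curve(n: int) -> float:
    """Made-up curve for illustration only: accuracy improves with diminishing returns."""
    return 1 - 10 / math.sqrt(n)


sizes = [1_000, 10_000, 100_000, 1_000_000]
print(smallest_dataset_reaching_map(sizes, toy_learning_curve, map_threshold=0.95))
```

With this made-up curve, 100,000 samples is the first size that clears a MAP of 0.95, and no size in the list clears 0.999, which would tell you to find more data or rethink the problem.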

It is important to remember that achieving MAP today does not guarantee it tomorrow. The world you are trying to predict changes, and as it drifts away from the snapshot your model was trained on, the model loses its value. This is why your AI models need a constant source of fresh data to continue the training process.
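A simple guard against a lapsed MAP is to track accuracy over a rolling window of recent predictions and trigger retraining when it drops below the threshold. In this minimal sketch the `MapMonitor` name, the window size and the threshold are assumptions; in practice, the ground truth for each prediction often arrives with a delay.

```python
from collections import deque


class MapMonitor:
    """Tracks accuracy over the most recent predictions and signals when it falls below MAP."""

    def __init__(self, map_threshold: float, window: int = 100) -> None:
        self.map_threshold = map_threshold
        self.outcomes: deque[bool] = deque(maxlen=window)  # True = prediction was correct

    def record(self, prediction, actual) -> None:
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def needs_retraining(self) -> bool:
        # Only judge once the window is full, so a handful of early results can't trigger it.
        return (len(self.outcomes) == self.outcomes.maxlen
                and self.rolling_accuracy() < self.map_threshold)


monitor = MapMonitor(map_threshold=0.9, window=10)
for _ in range(10):
    monitor.record(1, 1)            # the world still matches the training snapshot
print(monitor.needs_retraining())   # False
for _ in range(3):
    monitor.record(1, 0)            # the world has drifted
print(monitor.rolling_accuracy(), monitor.needs_retraining())  # 0.7 True
```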

Good habits that successful AI companies share

  1. They are designed from the ground up to automate the capture and handling of data. Your value chain should collect and tag proprietary data at every possible point, right from the start of its existence.
  2. They collaborate and build relationships with complementary data partners to share data trends. In fact, for our portfolio companies, data sharing is a mandatory prerequisite for any partnership. No data, no deal.
  3. Their data is stored and shared internally in a unified data warehouse. Siloed data lakes can stop machine learning algorithms from reaching MAP.
  4. The founding team has direct and deep experience in understanding data. Waiting until your second round of funding (seed) to hire data scientists and AI engineers will hamper your path to building meaningful predictions.
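The first and third habits can be sketched as a single store that every step of the value chain writes to, with each record tagged at the moment of capture. The `CapturedRecord` and `UnifiedStore` names are hypothetical, and the in-memory list stands in for a real data warehouse.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CapturedRecord:
    payload: dict
    source: str                                # which step of the value chain produced it
    tags: set[str] = field(default_factory=set)
    captured_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class UnifiedStore:
    """One store every team writes to, instead of per-team silos."""

    def __init__(self) -> None:
        self._records: list[CapturedRecord] = []

    def capture(self, payload: dict, source: str, *tags: str) -> CapturedRecord:
        """Tag and timestamp data at the point of capture, then keep it in the shared store."""
        record = CapturedRecord(payload=payload, source=source, tags=set(tags))
        self._records.append(record)
        return record

    def query(self, tag: str) -> list[CapturedRecord]:
        return [r for r in self._records if tag in r.tags]


store = UnifiedStore()
store.capture({"sku": "A1", "qty": 3}, "checkout", "sale", "labelled")
store.capture({"sku": "A1", "reason": "wrong size"}, "returns", "return")
store.capture({"ticket": 42, "text": "late delivery"}, "support", "complaint", "labelled")
print(len(store.query("labelled")))  # 2 records from two different teams, one place to find them
```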

Summary

Artificial intelligence continues to evolve in its sophistication and application. If we removed Netflix’s recommendation engine, Siri’s voice commands, Uber’s rider/driver matching algorithm or Google Maps’ route planning, some aspects of our lives could arguably come to a halt. Most of these products were built by companies that optimised their business models to collect oceans of unique, proprietary data.

Creating an enterprise AI model that constantly refreshes itself with proprietary, high-dimensional, focused and yet expansive datasets is probably one of the most effective ways to build a significant moat around your business. As an AI-focused VC, we treat this as ground zero for our due diligence, and it frames most of our impressions from our first pitch meeting with an AI-focused startup.

We wish you the best of luck with building your audacious and relentless AI strategy. Please look us up if you want to discuss any aspect of it.