Datasets are the lifeblood of artificial intelligence (AI) — they’re what make models tick, so to speak. But data without corresponding annotations is, depending on the type of algorithm at play (i.e., supervised versus unsupervised), more or less useless. That’s why sample-labeling startups like Scale have raised tens of millions of dollars and attracted clients like Uber and General Motors. And it’s why Kevin Guo and Dmitriy Karpman cofounded Hive, a startup which uses annotated data supplied by hundreds of thousands of volunteers to train domain-specific AI models.
Hive, which employs nearly 100 people, launched its flagship trio of products (Hive Data, Hive Predict, and Hive Enterprise) shortly before raising over $30 million in venture capital from PayPal cofounder Peter Thiel’s Founders Fund and others.
“We built [Hive] because we felt that while there’s a lot of excitement around AI and deep learning, we didn’t see many practical applications being built,” Guo told VentureBeat in a phone interview. “There’s a lot of hype, but [it] didn’t seem obvious what problems they’re really going to solve. Most of these things were demos that were somewhat working, but weren’t really enterprise-grade.”
Toward that end, Hive recruits the bulk of its human data labelers through Hive Work, a smartphone app and website that assigns them tasks like classifying images and transcribing audio. In exchange, Hive pays out small rewards, which have added up to $300,000 so far. (Guo says it can use “surge pricing” to ensure faster turnaround times when necessary, such as when a Hive customer has a specific project.)
The strategy’s been a success. Hive counts almost 700,000 users in over 30 countries among its contributor community, who help to process roughly ten million tags with 99 percent accuracy. (That accuracy’s attributable in part to a weed-out system that slips in “known” tasks every once in a while, ensuring users don’t game the system.) Clients tap the workforce through Hive Data, which provides data-labeling services tailored to a number of verticals.
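Hive hasn’t published how its weed-out system works, but the general technique of seeding a work queue with “gold” tasks whose answers are already known can be sketched briefly. In this minimal sketch, `Submission`, `flag_workers`, the 90 percent accuracy cutoff, and the three-check minimum are all hypothetical choices for illustration, not Hive’s actual parameters.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    worker_id: str
    task_id: str
    label: str


def flag_workers(submissions, gold_labels, min_accuracy=0.9, min_checks=3):
    """Return IDs of workers whose accuracy on known ("gold") tasks is too low.

    gold_labels maps task_id -> known-correct label. Submissions on ordinary
    tasks are skipped; workers with fewer than min_checks gold answers are
    not judged yet. Thresholds here are illustrative assumptions.
    """
    checked = Counter()  # gold tasks each worker has answered
    correct = Counter()  # gold tasks each worker answered correctly
    for s in submissions:
        expected = gold_labels.get(s.task_id)
        if expected is None:
            continue  # not a planted check task
        checked[s.worker_id] += 1
        if s.label == expected:
            correct[s.worker_id] += 1
    return sorted(
        w for w, n in checked.items()
        if n >= min_checks and correct[w] / n < min_accuracy
    )


gold = {"g1": "cat", "g2": "dog", "g3": "cat"}
subs = [
    Submission("a", "g1", "cat"), Submission("a", "g2", "dog"), Submission("a", "g3", "cat"),
    Submission("b", "g1", "dog"), Submission("b", "g2", "dog"), Submission("b", "g3", "dog"),
    Submission("c", "g1", "dog"),   # only one check so far, so not judged yet
    Submission("b", "t42", "cat"),  # ordinary task, not scored
]
print(flag_workers(subs, gold))  # ['b']
```

Because workers can’t tell which tasks are planted, someone who answers carelessly or uniformly (like worker “b” above) gets caught without every task needing a second reviewer.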
“Getting training data to build these models [is] actually really, really important. It’s almost ironic in a sense that the only way to automate is by enlisting an enormous amount of human labor,” Guo said. “You can have the best framework there is, but without good training data, you’re not gonna be able to have a good output. I liken it to a human mind: you can have the smartest brain, but if you don’t teach this brain the difference between cats and dogs and show it good examples, it’ll never recognize the difference between cats and dogs.”
Hive Work’s output also feeds two other products. Hive Predict offers custom-designed computer vision models that help enterprises automate business processes, while Hive Enterprise targets domains like auto, retail, security, and media with customized deep learning models built from scratch with proprietary data. Using a backend based on Google’s open source TensorFlow framework, Hive deploys its AI systems via an API or the cloud, or engineers an on-premises solution in partnership with integration partners.
Using its in-house servers and networking infrastructure, Hive has so far created machine learning models that recognize activity, predict age and gender, classify cars, determine the distance between a camera sensor and a subject of interest, and even detect things like explosions, gunshots, fights, and commercials in television feeds. Guo declined to name any of Hive’s customers, but said that each is making tens of millions of API requests a month.
One of Hive’s models, the Logo Model API, detects logos but also the products or ads on which they’re displayed and how long they remain visible. Hive claims it achieves 99 percent recall and 98 percent precision, compared to Google Cloud Vision’s 66 percent recall and 5 percent precision.
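For readers unfamiliar with the two metrics: precision is the share of the model’s detections that are correct, and recall is the share of real logos the model finds. The sketch below assumes each detection is reduced to a hypothetical `(frame, brand)` pair and matched exactly; real detection benchmarks usually match bounding boxes by overlap, and Hive hasn’t said how it ran its comparison. The brand names and counts are made up for illustration.

```python
def precision_recall(predicted, actual):
    """Precision and recall for detections matched exactly as (frame, brand) pairs."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical ground truth: which logo actually appears in each frame.
actual = {("frame1", "acme"), ("frame2", "acme"), ("frame3", "globex"), ("frame4", "initech")}
# Hypothetical model output: one wrong brand (frame3) and one spurious hit (frame5).
predicted = {("frame1", "acme"), ("frame2", "acme"), ("frame3", "acme"),
             ("frame4", "initech"), ("frame5", "globex")}

print(precision_recall(predicted, actual))  # (0.6, 0.75)
```

Three of the five detections are correct (precision 0.6), and three of the four real logos are found (recall 0.75). A model with 5 percent precision, by the same arithmetic, would be wrong on roughly 19 of every 20 logos it reports.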
Hive’s adding 100 logos a week, with the goal of reaching 10,000 by Q4 2018.
“Our standard for quality is just much higher than everyone else,” Guo said. “I didn’t want [Hive] to be another really overhyped AI company that couldn’t actually build technology. I don’t think that’s good for the space in general.”