Bigeye Data Labs Inc., a company that’s pushing the concept of observability into big data, has gotten a big lift in that mission after raising $45 million in a new round of funding.
Announced today, the Series B round was led by Coatue with participation from existing investors Sequoia Capital and Costanoa Ventures, bringing Bigeye’s total amount raised to $66 million to date.
Bigeye is on a mission to reinvent data quality by applying concepts proven in the areas of DevOps and site reliability engineering to the realm of datasets and data pipelines. It has created a monitoring platform that works by instrumenting a company’s databases with data quality metrics powered by machine learning. The platform can create actionable alerts when it finds anomalies within a dataset, helping teams fix the problems before they can hurt the company’s business.
The idea is that by spotting these data quality issues, teams can fix anomalies that would otherwise appear in dashboards and machine learning models. Inaccurate data can do a lot of damage to a business.
Bigeye co-founder and Chief Executive Kyle Kirwan told SiliconANGLE that he defines “low-quality data” as any data that shouldn’t be used by a given application because of some undesirable characteristic that could affect performance. A small number of missing values might not change any decisions made with a dashboard, for example, but it could have a measurable impact on a machine learning model’s performance.
“The most common issues that make data unfit for use are data not showing up on time, missing data and duplicated data,” Kirwan said. “There are other conditions that can cause problems too — like extreme outlier values or misformatted identifiers can also cause serious issues depending on the application.”
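The three issues Kirwan names — late-arriving data, missing values and duplicates — are exactly the kind of properties that can be checked programmatically. As a rough illustration (not Bigeye’s actual implementation; the field names and thresholds here are invented), each check reduces to a simple metric computed over a batch of rows:

```python
from datetime import datetime, timedelta

# Hypothetical rows from a pipeline load; field names are illustrative only.
rows = [
    {"id": "a1", "email": "x@example.com", "loaded_at": datetime(2022, 4, 1, 9)},
    {"id": "a1", "email": "x@example.com", "loaded_at": datetime(2022, 4, 1, 9)},  # duplicate id
    {"id": "b2", "email": None, "loaded_at": datetime(2022, 4, 1, 10)},            # missing value
]

def is_fresh(rows, now, max_lag):
    """Freshness check: has any new data arrived within the expected window?
    A failure here is the 'data not showing up on time' case."""
    newest = max(r["loaded_at"] for r in rows)
    return now - newest <= max_lag

def count_nulls(rows, field):
    """Completeness check: how many rows are missing a required field?"""
    return sum(1 for r in rows if r[field] is None)

def count_duplicates(rows, key):
    """Uniqueness check: how many rows repeat a key that should be unique?"""
    seen, dupes = set(), 0
    for r in rows:
        if r[key] in seen:
            dupes += 1
        seen.add(r[key])
    return dupes

now = datetime(2022, 4, 1, 12)
fresh = is_fresh(rows, now, timedelta(hours=3))   # True: newest row is 2h old
nulls = count_nulls(rows, "email")                 # 1 missing email
dupes = count_duplicates(rows, "id")               # 1 duplicated id
```

In practice a platform like Bigeye would compute metrics of this sort continuously across many tables rather than on a single batch, but the underlying questions — is it on time, is it complete, is it unique — are the same.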
The challenge in identifying and fixing this low-quality data stems from the fact that there are so many different sources of information, making it almost impossible to detect the numerous issues that might crop up.
“Changes to the schema the data is initially logged with, expired API tokens, mistakes in changes to data pipeline code, test data making it into production by mistake, all of these can cause serious issues,” Kirwan explained. “What frustrates data teams is that many of these issues are outside of their control, but they still have to play defense against them.”
The most common problem caused by low-quality data is what’s known as a “data outage,” which is when a system simply stops generating new data. That can cause all manner of problems, Kirwan said, such as a machine learning model that keeps suggesting customers sign up for an event that has already happened. Then there are the secondary problems that low-quality data causes.
“These are the ripple effects that occur when primary problems go unchecked,” Kirwan said. “So the executive stops trusting the data team and no longer wants to use data for anything mission-critical. The user tells other users that a product isn’t very good and that they should try something else instead. These effects are less visible and take time to accumulate but are ultimately more damaging than any single data outage.”
Bigeye aims to prevent all of these issues from happening. Its platform relies on sophisticated anomaly detection algorithms and uses application programming interfaces and pre-built integrations with tools such as Airflow to integrate with a customer’s data stack.
It works by learning what it’s looking for in each type of dataset and what the normal thresholds for a particular dataset are. So if anything falls outside that threshold, it will trigger an alert so a team can go in, see what the problem is and attempt to fix it.
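Learning a “normal threshold” for a metric and alerting when a new value falls outside it can be sketched in a few lines. This is a generic statistical approach (a mean-plus-standard-deviation band), not a description of Bigeye’s proprietary algorithms, and the sample values are made up:

```python
import statistics

def learn_band(history, k=3.0):
    """Learn a normal range from historical metric values
    (e.g. daily row counts for a table): mean ± k standard deviations."""
    mean = statistics.fmean(history)
    std = statistics.stdev(history)
    return mean - k * std, mean + k * std

def within_band(value, band):
    """Return True if the new observation looks normal;
    False would trigger an alert for the data team to investigate."""
    low, high = band
    return low <= value <= high

# Eight days of row counts for a hypothetical table:
history = [1010, 998, 1003, 995, 1007, 1001, 999, 1004]
band = learn_band(history)

ok_day = within_band(1002, band)   # typical volume, no alert
bad_day = within_band(340, band)   # volume collapsed, alert
```

Real systems typically account for seasonality (weekday versus weekend volumes, for instance) and retrain the band as new data arrives, but the core loop — learn what normal looks like, flag what isn’t — is the same.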
“Data outages happen because of changes somewhere in the data pipeline,” Kirwan said. “The data stops flowing, or it gets duplicated, or some other problem occurs that will cause an outage when it reaches the application at the other end.”
What Bigeye does, he added, is make it easier to collect thousands of metadata metrics from an organization’s data pipelines and detect changes in how the data behaves. “We have also added recognition of pattern changes and slow-moving trends, which will detect outages in addition to more benign changes that the data team will still want to be aware of,” he said.
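The “slow-moving trends” Kirwan mentions are the cases a fixed threshold misses: each individual value looks acceptable, but the metric is drifting. One simple way to illustrate the idea (again a generic sketch with invented numbers, not Bigeye’s method) is to compare the mean of a recent window against the earlier baseline:

```python
def drift_detected(series, window=4, tolerance=0.2):
    """Flag a slow drift: the mean of the most recent `window` values
    has moved more than `tolerance` (relative) from the prior baseline."""
    recent = series[-window:]
    baseline = series[:-window]
    recent_mean = sum(recent) / len(recent)
    baseline_mean = sum(baseline) / len(baseline)
    return abs(recent_mean - baseline_mean) / baseline_mean > tolerance

# A null rate creeping up gradually across pipeline runs: no single value
# is alarming, but the trend is.
creeping = [0.010, 0.011, 0.009, 0.010, 0.013, 0.016, 0.019, 0.022]
stable = [0.010, 0.011, 0.009, 0.010, 0.010, 0.011, 0.009, 0.010]

drifting = drift_detected(creeping)   # True: recent mean is ~75% above baseline
steady = drift_detected(stable)       # False: no meaningful movement
```

This kind of check catches the “more benign changes that the data team will still want to be aware of” alongside outright outages.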
Because the platform was built by former data lake engineers, Kirwan said, Bigeye imposes only a minimal version of the “performance tax” associated with most observability tools.
The company has enjoyed strong growth recently, saying its user base has doubled in each of the last four quarters. It has a notable customer base too, with the likes of Instacart Inc. and Udacity Inc. using Bigeye to clean up data they use to make strategic growth decisions.
Alpha Exploration Co., the company behind the Clubhouse social audio app, uses Bigeye to clean up the data powering artificial intelligence tools that are used to improve service and user engagement. Meanwhile SignalFire, creator of a talent platform for engineers, data scientists, product managers, designers and business leaders, uses Bigeye to clean up data from third-party sources.
Analyst Holger Mueller of Constellation Research Inc. said the observability market is maturing rapidly and stretching out into more specialized aspects such as data quality.
“Data is a key asset for any enterprise, so its management, handling and operation is going to be very high on the observability roadmap,” Mueller said. “So the funding for Bigeye is timely. It’s one of the few startups in the fast-growing segment and the money will enable it to build out its product more rapidly.”
Bigeye said it will use today’s funding to scale its team to meet the growing demand for data observability, help more companies prevent data outages and build more trust in data in general.