October 26, 2020

Brainome: Making data scientists 10x more productive

Spencer Greene

We are living in the age of data. The world's available data has grown impressively since 2000, but our ability to turn that data into value has grown even faster. With artificial intelligence, machine learning, and a collection of data technologies, the scope of data science is expanding at a rate we can hardly keep up with.

I’ve lost count of the startups I’ve met that promise to “disrupt industry_foo using AI.” They’re all right about one thing: the industries they’re targeting are going to be reinvented in the 2020s, whether by these startups, by their competitors, or by the incumbents they’re trying to displace. Data science has already begun transforming dozens of industries, and what’s been done so far represents a tiny fraction of what’s possible. Intelligent use of data can make a real dent in some of the world’s biggest challenges, such as health outcomes, economic prosperity, and physical and information security, and the list goes on.

But here’s the thing: data science is still science. For every new problem, we still need scientists who develop hypotheses and run experiments to test them.

A data scientist’s hypothesis is a model, and each experiment tests how well that model predicts the real world. As in other sciences, your first hypothesis might turn out to be right, or you might make ten attempts, or a hundred, before finding one that works. And as in other sciences, a hypothesis being “right” means it predicts the real world better than other hypotheses, but no predictor is ever perfect. However good your model is, a more accurate or more efficient one may come along tomorrow. Data scientists improve a model until it’s accurate enough or efficient enough, but they have never had a way of knowing how much better it could be. Without a way to see how good a model could be, they also haven’t known when to stop improving it.

Until now.

Enter Brainome, a company TSVC invested in earlier this year.

Based on the research of Dr. Gerald Friedland, Brainome has a tool that makes data scientists dramatically more productive.

Gerald started out researching a key problem in data science: what is the lower bound on how small a model can be while still predicting accurately? Working with hundreds of real-world datasets, he discovered that this bound is a lot lower than people think. He took these datasets and the models that had been applied to them, and showed that equally accurate models exist that are ten to a hundred times smaller. That alone is a huge step forward: knowing that a better model exists helps in deciding whether to build it. The company has since made three even more revolutionary advances:

1) They’re able to measure the relative contribution of different features in the dataset before any training. For many problems and datasets, Brainome reduces feature-engineering time from days to minutes.

2) They can evaluate a dataset for sufficiency, answering the questions “Do I have enough data to build a worthwhile model?” and “Do I need all this data, or does a subset suffice?” This step can also be done before training.

These two capabilities mean that, instead of running an iterative cycle of compute-intensive AutoML experiments, a data scientist can use Brainome to fully characterize the dataset before investing in any training.

3) And the pièce de résistance: Brainome has developed a way to construct a radically efficient model from the given dataset.

Let me put it in business terms: companies that use Brainome can cut weeks or months off their R&D schedules. Pharmaceutical companies can identify promising drugs faster. Quantitative trading shops can verify new algorithms faster. Adtech companies can adapt faster and deliver better targeting.

CEO and cofounder Bertrand Irissou, himself a veteran of machine learning research, says the company is challenging the unexamined belief that more data and more compute power are the answer to data scientists’ problems. Brainome replaces the brute-force state of the art with a “measure-first” approach that’s orders of magnitude more efficient.

TSVC sees a huge opportunity here, and we’re delighted to be a part of Brainome’s journey. If your company uses AI or data science, check them out!
