Big data is not always the best data

For a few years now, jobs in Big Data have been touted as a great path to IT career success. After all, we are told, data is everywhere, and somebody has to figure out how to handle it all. But is big data really the best data? Not according to a group of researchers at Harvard Business School.

Marco Iansiti, David Sarnoff Professor of Business Administration, and colleagues Ehsan Valavi, Joel Hestness, and Newsha Ardalani reported that data is time dependent. It has a shelf life.

Time-dependency, according to the Harvard team, “means that data loses its relevance to problems over time. This loss causes deterioration in the algorithm’s performance and, thereby, a decline in created business value.”

Among their findings, the researchers showed that even an infinite amount of data collected over time may have limited use. A much smaller dataset that is current can actually prove more useful. They added that increasing data volume by including older datasets may put a company at a disadvantage.

In one experiment, the Harvard group measured the value of data in a “next word prediction” task. They found that, as data ages, 50MB of new data is just as useful for the task as twice as much old data.

Economists and data scientists have long argued that having more data improves the quality of AI-based products and services. This argument triggered debates on whether the volume of data owned by big tech companies gives them a competitive advantage. In other words, if you have a sufficiently massive amount of data compared with your competitors, you can simply push them out of the market.

The Harvard study calls this idea into question. Having a lot of old data may not be nearly as valuable as having a modest amount of current data.

“Time-dependency,” the researchers reported, “plays a crucial role in determining the importance of data in AI-based businesses.”

So maybe it’s time to throw out some of that old data. And if you’re like me, the satisfaction of throwing out junk will be almost better than improving your business operations.