I have a smart scale that gives me great information about my weight, BMI, body fat, etc., but it was probably manufactured somewhere in Asia where the beauty standard is different, because it thinks I’m fat. You be the judge: I am female, 161 cm and 55.7 kg. I would like to argue that there is bias in the data the manufacturer used: they probably did not have weight data from the rest of the world.

 I guess I'm fat in Asia.
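For anyone who wants to be the judge with actual numbers, here is a minimal sketch of the BMI arithmetic for the height and weight above. The cut-offs used are the standard WHO adult categories (18.5, 25 and 30); the function names are my own, and the scale's internal thresholds are unknown to me, so this only shows what the general-purpose formula says.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres, squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def who_category(value: float) -> str:
    """Standard WHO adult BMI categories (not the scale's own, unknown, thresholds)."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal weight"
    if value < 30.0:
        return "overweight"
    return "obese"


my_bmi = bmi(55.7, 161)
print(f"BMI = {my_bmi:.1f} ({who_category(my_bmi)})")
```

By the standard categories, the result sits comfortably in the normal range, which is why the scale's verdict looks like a product of whatever reference data it was built on.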

(And don’t tell my friends who have innocently stepped on the scale while visiting me that 1) I know their weight, too and 2) my scale also judges them.)

In 2013, Angelina Jolie decided to pre-empt breast cancer by undergoing a double mastectomy before any symptoms had shown. She chose to listen to the data, which told her she had an 87% chance of developing breast cancer because of a dangerous mutation of the BRCA1 gene she was carrying.

What if we could all benefit from such medical analysis? We would need everyone’s health data in the data pool so we can create reliable and unbiased prediction models.

Collecting unbiased data and listening to the algorithm is sometimes not just nice to have, but necessary.

Building self-driving cars is one example. According to a 2016 report from the research organization RAND, autonomous cars would need to be tested over 11 billion miles in order to prove that they’re better drivers than humans.

Have you seen those self-driving test cars with a thing on top that looks like a bucket from Kentucky Fried Chicken? That is a Lidar. Each test car is packed with sensors, cameras, radars and often a Lidar. The mission of the test car is to capture as much data as possible about the environment and the conditions of the road. Data scientists then use the collected data to train neural networks to act appropriately on the road.

About 1 TB of data per hour is captured by each test car. And how much is 1 TB of data? It’s about 400 hours of high-definition video. In order to ensure our safety on the road, a fail-proof self-driving car model demands 99.9% accuracy, and that requires a large amount of labeled data covering all kinds of driving scenarios.
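To get a feel for the scale, here is a rough back-of-the-envelope sketch that combines the two figures above: 11 billion test miles and 1 TB of data per car per hour. The fleet size of 100 cars and the average speed of 25 mph are my own assumptions for illustration, not numbers from this post, so treat the output as an order of magnitude only.

```python
# Figures taken from the text above
MILES_TO_TEST = 11_000_000_000  # test miles quoted from the RAND report
TB_PER_CAR_HOUR = 1             # data captured by one test car per hour

# Assumptions for illustration only (not from the text)
FLEET_SIZE = 100                # hypothetical number of test cars
AVG_SPEED_MPH = 25              # hypothetical average driving speed
HOURS_PER_YEAR = 24 * 365       # cars driving around the clock


def driving_hours(miles: float, speed_mph: float) -> float:
    """Total car-hours needed to cover the given distance."""
    return miles / speed_mph


def years_for_fleet(miles: float, fleet_size: int, speed_mph: float) -> float:
    """Calendar years for a fleet driving non-stop to cover the distance."""
    return driving_hours(miles, speed_mph) / (fleet_size * HOURS_PER_YEAR)


def data_volume_tb(miles: float, speed_mph: float, tb_per_hour: float) -> float:
    """Raw data captured over the whole test distance, in terabytes."""
    return driving_hours(miles, speed_mph) * tb_per_hour


hours = driving_hours(MILES_TO_TEST, AVG_SPEED_MPH)
years = years_for_fleet(MILES_TO_TEST, FLEET_SIZE, AVG_SPEED_MPH)
terabytes = data_volume_tb(MILES_TO_TEST, AVG_SPEED_MPH, TB_PER_CAR_HOUR)

print(f"Car-hours of driving: {hours:,.0f}")
print(f"Years for a {FLEET_SIZE}-car fleet: {years:,.0f}")
print(f"Raw data: {terabytes:,.0f} TB (about {terabytes / 1_000_000:,.0f} exabytes)")
```

Under these assumptions, a single company's fleet would need centuries of non-stop driving, which is the practical argument for pooling data.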

What if all the automotive companies could share their data, or draw from the same data pool? This would greatly reduce the labor and time it takes to collect the necessary data. And if one self-driving car got into an accident, the learning from that accident could be shared with all autonomous vehicles so that no second accident needs to happen.

“Data” is the lifeline and oftentimes the Achilles’ heel of AI innovation. AI development is sometimes hindered by the lack of high-quality data, and your AI model is only as good as the data that goes into it. The potential for AI to boost the economy and to improve our lives is huge, but good AI products require high-quality, unbiased data in enormous quantities.

Yuval Noah Harari, the author of Sapiens and Homo Deus, says that in this new era, whoever owns our data will own our future, and they will take control of all the resources in the world, rendering the rest of us “useless”.

Therefore we should be asking: who owns our data?

Currently, it’s U.S. companies such as Google, Facebook, Amazon and Apple. While they don’t necessarily have Chinese citizens’ data, the Chinese government does, and it has arrested a few suspects using facial recognition technology.

Where does Europe stand in the worldwide AI race? The French president Emmanuel Macron gets how important it is. He said, “public data needs to be open and accessible for AI research”.

Europe has the chance to develop AI products that unlock the value of unbiased data for the good of humanity instead of locking it down for the good of a few corporations. We need to start thinking about ways to open our data and contribute it to a decentralized, encrypted data pool that is not controlled by one person, company, or country. Such a pool could perhaps be based on blockchain, which is decentralized and immutable, and a good fit for storing encrypted data.
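To make “immutable” a little more concrete, here is a minimal sketch of the hash-chaining idea that blockchains build on: each record stores the hash of the previous one, so changing any earlier record breaks the chain. This is a toy with no network, no consensus and no encryption; the function names and the payload strings are hypothetical placeholders of my own, standing in for already-encrypted records.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, payload: str, prev_hash: str) -> str:
    """SHA-256 over the block's contents, including the previous block's hash."""
    record = json.dumps(
        {"index": index, "payload": payload, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def append_block(chain: list, payload: str) -> None:
    """Add a block that links back to the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "payload": payload,
        "prev": prev_hash,
        "hash": block_hash(index, payload, prev_hash),
    })


def is_valid(chain: list) -> bool:
    """Recompute every hash and check every link; any edit shows up as a mismatch."""
    prev_hash = GENESIS_HASH
    for position, block in enumerate(chain):
        if block["index"] != position or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["payload"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
for payload in ["encrypted-record-1", "encrypted-record-2", "encrypted-record-3"]:
    append_block(chain, payload)

print("Untouched chain valid:", is_valid(chain))
chain[1]["payload"] = "tampered-record"  # quietly alter an earlier record
print("Tampered chain valid:", is_valid(chain))
```

In a real blockchain, many independent parties hold copies of the chain and agree on new blocks, which is what would keep any one person, company, or country from rewriting it.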

Decentralized data owned by no one would also help ensure that a fiasco such as the one created by Cambridge Analytica, where companies used AI that knows us better than we know ourselves for mass manipulation, does not happen again.

A flourishing data economy built on blockchain could be the engine of an AI revolution that everyone benefits from, not just the major corporations that own our data. This is especially essential if Europe is to build economies of scale in data and establish leadership in AI.

It is high time we take control of our own fate to ensure AI is developed for the good of humanity, and for everyone.

We can’t just leave it to the researchers or governments, and certainly not the companies without our best interest in mind.

We need to open our data for Europe to have a chance in the AI race - it’s a little like donating our blood.

We need to make sure our data is safe in a decentralized data pool, so that it is not controlled by only a few companies.

We need to make sure regulations such as GDPR are kept up to date and do not become outdated.

We can democratize AI by first building an unbiased, decentralized data pool. Join the revolution before time runs out. This is the most important revolution of our lifetime.  