
Deep Natural Anonymization

LAB51 Featuring Marian Gläser
By Dolma Memmishofer


May 22, 2023


Marian, it is great to have you with us! Could you tell us about brighter AI, its vision, and mission?

brighter AI emerged from the automotive industry in 2017 and began with generative neural networks. We started working with a variety of companies, primarily in the automotive industry. Later on, most of our projects were halted due to GDPR privacy regulations. Long story short, brighter AI was born at the moment when there was a clear and urgent problem in the industry.

Our mission is to enable companies to work with data in a compliant manner while using it for analytics purposes, thus bridging the gap between privacy and innovation and, in many ways, moving society forward. We have enabled companies in Europe to be as competitive as companies anywhere else in the world, despite strict privacy rules.

Ultimately, our vision is to protect people’s identities in public places.

What are the technologies underlying brighter AI anonymization, and how does it work?

We enable businesses to use their data while protecting people’s privacy by anonymizing it. This matters because anonymous data falls outside the scope of the GDPR, which means anonymized data can be handled, processed, and transferred much more openly. To allow companies to fully utilize their data, we developed a new type of anonymization that replaces faces and license plates with artificial ones. Even after anonymization, a face still appears to belong to a person with similar characteristics, such as age and gender, yet it completely protects against any kind of facial recognition. One of our goals is to stay as close to the original data as possible, so that AI and analytics can be used without drawbacks. The other goal is to move as far away from the original identity as possible. Because of the complexity involved, you could say that what happens in there is a bit of magic.
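The approach described here is detect-and-replace: only faces and license plates are swapped for synthetic ones, and everything else in the frame stays as it was. The sketch below shows that structure on a toy grayscale frame. `Detection`, `anonymize_frame`, and `placeholder_generator` are hypothetical names, and the generator is a placeholder that merely fills the box; it is not brighter AI’s model or API, which would synthesize a natural-looking replacement conditioned on attributes such as age and gender.

```python
from dataclasses import dataclass, field
from typing import Callable

# Toy frame: a 2D grid of grayscale pixel values (list of rows).
Frame = list[list[int]]


@dataclass
class Detection:
    """A region to anonymize, as a hypothetical detector might report it."""
    kind: str    # e.g. "face" or "license_plate"
    top: int
    left: int
    height: int
    width: int
    attributes: dict = field(default_factory=dict)  # e.g. {"age_group": "30-40"}


def anonymize_frame(frame: Frame, detections: list[Detection],
                    generate: Callable[[Detection], Frame]) -> Frame:
    """Return a copy of `frame` in which each detected region is overwritten
    by a generated patch. Pixels outside the regions are left untouched.
    Assumes every box lies fully inside the frame."""
    out = [row[:] for row in frame]
    for det in detections:
        patch = generate(det)
        for dy in range(det.height):
            for dx in range(det.width):
                out[det.top + dy][det.left + dx] = patch[dy][dx]
    return out


def placeholder_generator(det: Detection) -> Frame:
    """Stand-in for a generative model. A real system would synthesize a new
    face or plate conditioned on det.attributes; this just fills the box."""
    return [[255] * det.width for _ in range(det.height)]


frame = [[0] * 6 for _ in range(4)]
faces = [Detection("face", top=1, left=2, height=2, width=2,
                   attributes={"age_group": "30-40"})]
anonymized = anonymize_frame(frame, faces, placeholder_generator)
```

Copying the frame first keeps the original untouched, which mirrors the goal of changing only what identifies a person.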

[Image: Original vs. Deep Natural Anonymization]

From whom does brighter AI mainly protect the data: from people or from AI?

When it comes to identifying or searching for specific people, the machine or AI takes a different approach than humans. However, the question is: when is a person protected? When is an identity anonymized?

If it is impossible to find the original identity with reasonable resources, time, and effort, the data is considered anonymized. With this short definition in mind, we can derive what it means for humans and for machines.

We’re dealing with massive amounts of data; the clients who use our technology typically have millions, if not billions, of frames. The volume of data is huge, and anyone going through it only sees artificial faces that look similar to the originals but aren’t the same. At that scale, it is nearly impossible to trace and track an individual’s identity with any reasonable effort or means.


Looking at the examples on your website, DNAT produces results that look very close to reality. However, it seems that if I knew the person, I could still recognize them. To what extent does DNAT replace the facial traits of the people in focus?

If you alter a face, or a person’s identity, and the result looks very similar, it is still not the same identity. It’s like a passport number: if you change just one digit, the number looks very similar, but you would still have to guess up to ten times to find the original. In our case, we change about 40,000 different elements in the frame to make sure the result is hard to decode.
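The passport analogy can be made concrete by counting how many candidate originals an attacker would have to consider. The function below is a generic combinatorics sketch, not a model of how DNAT alters a face: pixels and facial features are not independent digits, and the last line uses the 40,000 figure from the interview only as a hypothetical number of positions, with an arbitrary 1,000 of them changed.

```python
from math import comb, log10


def candidate_count(n_positions: int, k_changed: int, alphabet: int) -> int:
    """Count the strings that differ from an observed string in exactly
    `k_changed` of `n_positions` positions, where each position holds one of
    `alphabet` symbols and the attacker does not know which positions changed."""
    # Choose which positions changed, then one of (alphabet - 1) other
    # symbols at each changed position.
    return comb(n_positions, k_changed) * (alphabet - 1) ** k_changed


# One digit changed, position known: 9 alternatives to the observed digit.
print(candidate_count(1, 1, 10))                # 9
# One digit changed somewhere in a 9-digit number, position unknown.
print(candidate_count(9, 1, 10))                # 81
# Three digits changed somewhere in a 9-digit number.
print(candidate_count(9, 3, 10))                # 61236
# Hypothetical: 1,000 of 40,000 binary elements flipped (number of digits of the count).
print(int(log10(candidate_count(40_000, 1_000, 2))) + 1)
```

The search space grows combinatorially with the number of changed elements, which is the intuition behind the answer.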

Obviously, if you have, for example, a tattoo on your neck, special jewelry, or anything else unique to you, that could make you identifiable. But this brings us back to the point: it’s not feasible to search through this data, especially publicly collected data, where people aren’t captured like in portraits and can’t be found through such additional attributes.

So, as facial recognition gets better, do we need to improve as well? That is quite easy to answer.

Overall, the face stays similar, but it represents a new identity, so an exact match is no longer possible. The more advanced facial recognition software becomes, the better it understands subtle differences between people, which means that better facial recognition software is even more likely to recognize our anonymized face as a new, unique identity.
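Face recognition systems commonly compare embedding vectors and declare a match only above a similarity threshold. A minimal sketch of that verification step, assuming cosine similarity and hand-picked toy vectors (the embeddings and the 0.6 threshold are illustrative assumptions, not output of a real model), shows why a face that looks similar but lands far away in embedding space is treated as a different identity.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_identity(a: list[float], b: list[float], threshold: float = 0.6) -> bool:
    """Declare a match only if the embeddings are similar enough.
    The 0.6 threshold is an arbitrary placeholder, not a recommended value."""
    return cosine_similarity(a, b) >= threshold


# Toy 3-D embeddings; real face embeddings have hundreds of dimensions.
original = [0.9, 0.1, 0.4]
anonymized = [0.2, 0.9, 0.3]
print(same_identity(original, original))      # True
print(same_identity(original, anonymized))    # False
```

A more discriminative recognizer separates identities more sharply in embedding space, so a synthetic identity is less likely, not more, to cross the match threshold.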

[Image: Original vs. Deep Natural Anonymization (face detail)]

What are the main applications of Deep Natural Anonymization compared to regular blur?

Deep Natural Anonymization fits wherever heavy data collection is used for training, analysis, and verification of neural networks. The major applications we currently see in the industry come from the automotive side, where massive amounts of data are continuously collected from vehicle fleets to improve neural networks. These companies need images that look completely natural, and they cannot afford any loss of quality.

With a blurred person, you introduce a basic distortion that degrades the image by a certain percentage; you cannot have this in automotive, where the data needs to be as raw as possible. Second, you need other attributes. For example, an automotive company needs to understand whether a person is looking at the vehicle or somewhere else, and for that, you need a clear, sharp face with a visible gaze direction.

Anonymization is also used in public transport and smart cities. For instance, Deutsche Bahn operates 40,000 cameras, and to leverage them fully and compliantly, it needs a robust data privacy framework. This is where we come into play: we provide that necessary baseline, enabling Deutsche Bahn to make full use of its camera data for new applications such as capacity management.

In addition to our operations in the medical field, where we gather training data from surgical suites, we have also made modest strides in the education sector. Our focus there is solely on AI, and we use anonymized data to maintain the highest standards of privacy.


When do companies truly need to scrutinize detailed footage of people’s movements, and when is it sufficient to just use feature extraction on the node or edge, thereby avoiding the need for extra data protection measures by not transmitting vast amounts of data to the server?

Companies purchase large amounts of training data to build an understanding of the environment and develop algorithms. In this initial phase, when the models are trained and tuned, the training data needs to be anonymized. But we see that once a company starts using its cameras, it usually wants to use them for more and more use cases. So the picture is quite diverse.

Last but not least, improving the accuracy of your models inevitably requires more training data. A common practice among our clients is to anonymize data at the edge, where it is aggregated, before streaming it to the cloud for analytics. This works mainly because, once anonymized, the data can be streamed off-premises into the cloud. Moreover, because the anonymized data still looks natural, it doesn’t affect the models, which is a significant advantage for analytics: no modifications to the model, hardware, or environment are needed, making the entire process seamless and efficient.
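The edge-first workflow can be sketched as a loop in which raw frames never reach the upload step. `run_edge_pipeline` is a hypothetical name, and `anonymize` and `upload` are placeholders: in practice the anonymizer would be a GPU model running on the edge device and the uploader a client for whichever cloud service is in use.

```python
from typing import Callable, Iterable


def run_edge_pipeline(frames: Iterable[bytes],
                      anonymize: Callable[[bytes], bytes],
                      upload: Callable[[bytes], None]) -> int:
    """Anonymize every frame on the edge device and upload only the result.

    Raw frames stay local: they are passed to `anonymize` and then dropped,
    so only anonymized data is handed to `upload` (e.g. a cloud client).
    Returns the number of frames uploaded."""
    count = 0
    for raw in frames:
        upload(anonymize(raw))
        count += 1
    return count


# Usage with stand-ins: a fake "anonymizer" and an in-memory "cloud".
cloud: list[bytes] = []
raw_frames = [b"frame-1", b"frame-2"]
sent = run_edge_pipeline(raw_frames, lambda raw: b"#" * len(raw), cloud.append)
```

Passing the anonymizer and uploader in as functions keeps the privacy boundary in one place: whatever `upload` receives has already gone through `anonymize`.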

Is there a minimum camera resolution required for anonymization to work well?

In general, we are resolution-agnostic. Of course, as faces get smaller, our chance of missing one increases. So it’s a question of data quality and data privacy: the higher the quality, the better we can ensure that no faces are missed, because at some point a face simply gets too small. On the other hand, if a face covers only a couple of pixels in a very low-resolution image, it’s questionable whether it’s still identifiable. As a result, there are no hard requirements for data or resolution, though companies usually strive for higher quality when it comes to AI and analytics.
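To reason about when a face “just gets too small”, one can estimate its size in pixels from the camera’s resolution and field of view. This pinhole-camera estimate is a generic sketch; the 0.15 m face width and the example camera settings are assumptions, and, as the answer notes, brighter AI sets no hard resolution requirement.

```python
import math


def face_width_px(image_width_px: int, hfov_deg: float, distance_m: float,
                  face_width_m: float = 0.15) -> float:
    """Approximate how many pixels wide a face appears, using a pinhole
    camera model. face_width_m ~ 0.15 m is a rough adult face width
    (an assumption); lens distortion and off-axis positions are ignored."""
    # Focal length expressed in pixels, derived from the horizontal field of view.
    focal_px = (image_width_px / 2) / math.tan(math.radians(hfov_deg) / 2)
    return focal_px * face_width_m / distance_m


# Full HD vs 4K camera with a 90-degree horizontal field of view, face 10 m away.
print(round(face_width_px(1920, 90, 10), 1))   # 14.4
print(round(face_width_px(3840, 90, 10), 1))   # 28.8
```

Doubling the horizontal resolution doubles the face width in pixels at the same distance, which is why higher-quality footage makes missed faces less likely.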

How much computational power do you need to run anonymization on the edge before streaming into the cloud?

It depends on the frame rate and resolution, as well as the codec and container format. In general, we need GPU-based infrastructure on the edge, for example an NVIDIA Jetson. To run deep neural networks efficiently, you need the right infrastructure and hardware. With the necessary hardware, it performs decently; if you want more performance, you need to scale up the hardware.
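The dependency on frame rate can be turned into a rough capacity check: at a given frame rate, each frame must be anonymized within a fixed time budget. The sketch assumes a hypothetical per-frame inference time that would have to be measured on the actual device (for example a Jetson board) at the actual resolution; the numbers are not benchmarks, and decoding and encoding costs are ignored.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to process one frame if a stream is to keep up in real time."""
    return 1000.0 / fps


def streams_per_device(fps: float, inference_ms: float) -> int:
    """How many streams one device can anonymize in real time, assuming frames
    are processed one after another and per-frame inference time dominates
    (decoding, encoding and I/O are ignored)."""
    return int(frame_budget_ms(fps) // inference_ms)


# Hypothetical measurement: 10 ms per frame at the target resolution.
print(streams_per_device(30, 10))   # 3
print(streams_per_device(15, 10))   # 6
```

Halving the frame rate doubles the per-frame budget, so the same hardware can serve twice as many streams; raising resolution usually increases `inference_ms` and has the opposite effect.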

Speaking of security, is it possible to decode anonymized data?

No, it isn’t. We take the original face as input, analyze the person’s features, and then add a certain amount of randomness to generate an entirely new identity. This randomness cannot be reversed, so we cannot go from the final result back to the original face. The process is irreversible.
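The irreversibility argument rests on the replacement depending on the original only through coarse attributes plus fresh randomness. Below is a toy sketch of that idea with hypothetical names (`FaceAnalysis`, `replacement_conditioning`) and an 8-dimensional stand-in for an identity embedding; it illustrates the description in the answer and is not brighter AI’s implementation.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceAnalysis:
    """Hypothetical output of analyzing an original face."""
    identity: tuple[float, ...]   # identity embedding of the real person
    age_group: str
    gender: str


def replacement_conditioning(face: FaceAnalysis, dim: int = 8) -> dict:
    """Build the inputs for generating a replacement face (toy sketch).

    Only coarse attributes are copied over. The original identity embedding
    is never passed on; a fresh identity is drawn from the OS random source,
    whose seed() is a no-op, and the draw is not stored anywhere."""
    rng = secrets.SystemRandom()
    new_identity = tuple(rng.gauss(0.0, 1.0) for _ in range(dim))
    return {"identity": new_identity, "age_group": face.age_group, "gender": face.gender}


alice = FaceAnalysis(identity=(0.3,) * 8, age_group="30-40", gender="female")
bob = FaceAnalysis(identity=(-0.7,) * 8, age_group="30-40", gender="female")
a = replacement_conditioning(alice)
b = replacement_conditioning(bob)
print(a["age_group"], a["gender"])   # 30-40 female
```

Because two different people with the same attributes yield outputs drawn from the same distribution, the output carries no information about which of them was the input, so there is nothing to invert.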

Some of your clients, for example, in public transportation, use cameras for several purposes, e.g., to analyze the traffic flow and ensure safety. How does DNAT work at a crime scene? Isn’t it true that data anonymization can aid criminals in their efforts?

Businesses use our services for commercial purposes outside of security. In a train station, for instance, there is a legitimate interest in recording people for safety and security, and in Germany such footage can usually be stored for up to 48 hours. One video stream is stored so that the police can review it in case of a crime or accident. The second video stream is captured and anonymized by us and used for data analytics and training.
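The two-stream setup can be sketched as a retention rule applied to stored footage: raw recordings kept for the police are deleted once they exceed the retention window, while the anonymized copy feeds analytics and training. The `Recording` type, `purge_expired` function, and camera ID are hypothetical; the 48-hour value is the figure given in the interview, and keeping anonymized recordings beyond that window is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Upper bound quoted in the interview for Germany; actual limits vary by case.
RAW_RETENTION = timedelta(hours=48)


@dataclass
class Recording:
    camera_id: str
    recorded_at: datetime
    anonymized: bool


def purge_expired(recordings: list[Recording], now: datetime) -> list[Recording]:
    """Keep raw recordings only within the retention window. Anonymized
    recordings are kept (an assumption: they are treated as outside the
    raw-footage retention rule)."""
    return [r for r in recordings
            if r.anonymized or now - r.recorded_at <= RAW_RETENTION]


now = datetime(2023, 5, 22, 12, 0, tzinfo=timezone.utc)
recordings = [
    Recording("platform-1", now - timedelta(hours=47), anonymized=False),
    Recording("platform-1", now - timedelta(hours=49), anonymized=False),
    Recording("platform-1", now - timedelta(hours=49), anonymized=True),
]
kept = purge_expired(recordings, now)
```

Here the 49-hour-old raw recording is dropped, while the 47-hour-old raw recording and the anonymized copy remain.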