
Bridging Innovation and Integrity: Our Collaboration with KTH on Privacy-Preserving AI

  • Jan 29
  • 2 min read

At SEB, our data is one of our most valuable assets. It allows us to understand market trends and build smarter services for our customers. However, the most valuable information we hold, such as individual customer transactions, is also the most sensitive.

Under regulations like GDPR, we must rightfully ensure the highest standards of data protection. While these regulations are vital for maintaining customer trust, they can create a hurdle for rapid experimentation in Machine Learning (ML). Traditionally, the journey from an innovative idea to testing it with real data involves extensive approval processes to ensure compliance, which can extend development timelines.

 

The Vision: Innovation Without Risk

To address this, SEBx partnered with KTH Royal Institute of Technology to explore the potential of Synthetic Data. The objective is to create a statistical equivalent of our data: data that mirrors the patterns and behaviors of the real thing, allowing our models to learn and improve, all without containing any information that can be traced back to an actual individual.
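
As a toy illustration of that idea, the sketch below fits the parameters of a simple distribution to stand-in "transaction" data and samples fresh records from it. The numbers are entirely made up, and this single-distribution resampler is far simpler than any generator used in the actual project; the point is only that the synthetic sample reproduces the statistics without copying any record.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "real" transaction amounts (stand-in data, not SEB's).
real = rng.lognormal(mean=4.0, sigma=0.8, size=10_000)

# Simplest possible generator: fit the log-normal parameters and resample.
# Real synthetic-data pipelines use far richer models (and, in the work
# described here, a PML-constrained generator), but the goal is the same:
# reproduce the statistics, not the records.
mu, sigma = np.log(real).mean(), np.log(real).std()
synthetic = rng.lognormal(mean=mu, sigma=sigma, size=10_000)

# The synthetic sample mirrors the real distribution...
print(real.mean(), synthetic.mean())
# ...but no synthetic record is a copy of a real one.
print(np.isin(synthetic, real).any())
```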

 

The challenge in this field is ensuring that the generation process is truly robust. Even when data is anonymized, some Machine Learning models risk "leaking" information if they inadvertently memorize specific details from their training sets.

 

To solve this, we leveraged a sophisticated privacy framework developed by Professor Tobias Oechtering and his research team at KTH: Pointwise Maximal Leakage (PML).

 

While common industry standards like Differential Privacy apply a blanket layer of mathematical noise to hide individuals, PML is a context-aware measure. In our collaboration, we integrated the PML framework into a synthetic data generation algorithm. By accounting for the specific nature of the data we are protecting, we can achieve a much higher level of data quality without sacrificing security.
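
For readers curious about the underlying quantity: as we understand the definition introduced by Saeidian, Oechtering and co-authors, PML measures, for each concrete outcome y of a mechanism, how much that outcome can shift an adversary's posterior about the secret X away from the prior, l(X -> y) = log max_x P(x|y) / P(x). The sketch below computes this for a made-up three-symbol mechanism; every number in it is illustrative only.

```python
import numpy as np

# Toy Pointwise Maximal Leakage (PML) computation:
#   l(X -> y) = log max_x  P(x | y) / P(x),
# the largest multiplicative shift that observing y induces in the
# posterior over the secret X. All values below are made up.

p_x = np.array([0.5, 0.3, 0.2])          # prior over the secret X

# Channel P(Y|X): rows = x, cols = y (a hypothetical noisy mechanism)
p_y_given_x = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])

p_y = p_x @ p_y_given_x                  # marginal over outcomes Y
# Posterior via Bayes' rule: P(x|y) = P(y|x) P(x) / P(y)
p_x_given_y = (p_y_given_x * p_x[:, None]) / p_y[None, :]

# PML for each possible outcome y, in nats: some outcomes leak more
# than others, which is exactly the "context-aware" aspect of PML.
pml = np.log(np.max(p_x_given_y / p_x[:, None], axis=0))
print(pml)
```

Note how each outcome gets its own leakage value, rather than one worst-case noise budget applied uniformly as in differential privacy.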

 

Working alongside Professor Oechtering, Sara Saeidian, and Leonhard Grosse, Master's student Ata Yavuzyılmaz developed a method to make this theory a reality. He then put the synthetic data to the ultimate test: training Machine Learning models on it and comparing their performance against models trained on the real, sensitive data.

 

The results were highly encouraging:

  • Comparable Performance: The models trained on synthetic data performed with a level of accuracy nearly identical to those trained on real data.

  • Enhanced Utility: By using the PML framework, we achieved a better balance between high privacy and practical usefulness than previously possible.

  • Proof-of-Concept: The research demonstrates that we can unlock the power of our most sensitive data assets while providing a mathematical guarantee of privacy.
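
The comparison behind the first bullet is often called "train on synthetic, test on real". The sketch below mimics it with toy Gaussian data and a deliberately simple nearest-centroid classifier; the synthetic set here is a naive per-class Gaussian resample, not the PML-based generator from the thesis, so it only illustrates the shape of the evaluation.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_data(n):
    """Two well-separated Gaussian classes (a stand-in for real data)."""
    x0 = rng.normal(loc=-2.0, scale=1.0, size=(n, 2))
    x1 = rng.normal(loc=+2.0, scale=1.0, size=(n, 2))
    return np.vstack([x0, x1]), np.array([0] * n + [1] * n)

def fit_centroids(x, y):
    """'Train' a nearest-centroid classifier: one mean per class."""
    return np.stack([x[y == c].mean(axis=0) for c in (0, 1)])

def accuracy(centroids, x, y):
    """Classify by nearest centroid and score against true labels."""
    pred = np.argmin(((x[:, None, :] - centroids) ** 2).sum(-1), axis=1)
    return (pred == y).mean()

x_real, y_real = make_data(500)          # "sensitive" training data
x_test, y_test = make_data(500)          # held-out evaluation data

# Crude stand-in synthetic set: resample from per-class Gaussians
# fitted to the real data (the actual work uses a PML-based generator).
synth = []
for c in (0, 1):
    xc = x_real[y_real == c]
    synth.append(rng.normal(xc.mean(0), xc.std(0), size=xc.shape))
x_synth, y_synth = np.vstack(synth), y_real.copy()

# Train one model per data source, evaluate both on the same real test set.
acc_real = accuracy(fit_centroids(x_real, y_real), x_test, y_test)
acc_synth = accuracy(fit_centroids(x_synth, y_synth), x_test, y_test)
print(f"trained on real: {acc_real:.3f}, trained on synthetic: {acc_synth:.3f}")
```

When the generator captures the relevant statistics, the two accuracies land close together, which is the pattern the results above describe.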

Why This Matters to Us

This collaboration proves that we do not have to choose between innovation and privacy. In the future, this approach could allow us to verify a data-generation algorithm once, and thereafter provide our developers with safe, high-quality data instantly. This would move our innovation cycle from months to days, all while upholding our commitment to customer integrity.

 

We would like to thank Ata, Tobias, Leonhard and Sara. This work has not only resulted in an excellent Master's thesis but also a letter published in IEEE Signal Processing Letters.

 

Read the research publications here:

 

Georg Schuppe

AI researcher

SEBx



 
 
 
