
Synthetic Data? I’d Tread Carefully

Published
November 14, 2023

Gartner predicts that by 2024, more than 60% of AI model training data will be synthetic. 

Some might think this brings us one step closer to the Matrix, but I remain cautiously optimistic, and you should too.

Synthetic data, if managed correctly, can positively impact industries that struggle to collect organic data. 

For example, healthcare workers will be able to augment real patient data for training doctors, and finance professionals can improve fraud detection by expanding their data models and testing for abnormal transactions. 

But what about IT? If you’re tasked with managing your company’s digital experience, should you incorporate synthetic data?

Synthetic Testing for App Development

I think if you’re looking to develop and improve your own internal applications, synthetic testing might be for you. 

Instead of bugging employees to test out features in your app, you can use an AI tool fed with synthetic data and avoid straining those relationships.

And before you release the application into the wild, you can run thousands of simulations to verify that its functions perform correctly and to catch bugs before your users do.

Most importantly, synthetic testing permits you to stress test load capacity and performance. Instead of trialing how one application performs for one employee, you can exercise how it performs across thousands of stand-in employees. 
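To make that concrete, here is a minimal sketch of what exercising an app across stand-in employees could look like. Everything in it is an assumption for illustration: fake_app_request is a hypothetical placeholder for a call to your own application, and the employee profiles are randomly generated rather than drawn from any real tool or dataset.

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_app_request(user_id: int) -> bool:
    """Hypothetical stand-in for one call to the application under test."""
    time.sleep(0.001)  # pretend the app takes about 1 ms to respond
    return True


def simulate_employee(user_id: int, seed: int = 0) -> dict:
    """Build one synthetic employee profile and time a single request."""
    rng = random.Random(seed + user_id)  # seeded so runs are repeatable
    profile = {
        "user_id": user_id,
        "device": rng.choice(["laptop", "desktop", "tablet"]),
    }
    start = time.perf_counter()
    profile["ok"] = fake_app_request(user_id)
    profile["latency_s"] = time.perf_counter() - start
    return profile


def run_load_test(num_users: int = 1000, workers: int = 50) -> dict:
    """Run all synthetic sessions concurrently and summarize the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sessions = list(pool.map(simulate_employee, range(num_users)))
    latencies = sorted(s["latency_s"] for s in sessions)
    return {
        "sessions": len(sessions),
        "failures": sum(1 for s in sessions if not s["ok"]),
        "median_s": statistics.median(latencies),
        # Simple nearest-rank style 95th percentile on the sorted latencies.
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
    }


if __name__ == "__main__":
    print(run_load_test())
```

Swapping the placeholder for a real request to a staging environment is where the numbers start to mean something. Until then, the summary only tells you how the stand-ins behaved, not how your employees will.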

Okay, But Pump the Brakes 

It remains to be seen how synthetic datasets will inform large language models. In the meantime, IT leaders need to remember that what they’re looking at isn’t real.

You should also be cognizant of the inherent biases that come with this innovation. AI and large language models aren’t great at correcting for their own isms (racism, sexism, ageism, etc.).

Ultimately, IT leaders should aim to collect and understand real human sentiment in the context of the digital workplace, in addition to the hard data they obtain.

You don’t want to unintentionally hide behind a blanket of false security. 

For example, imagine you’ve just rolled out a digital transformation project and all of the hard data (network connections, RAM, boot time, etc.) comes back positive. Should you declare victory? What about the employees interacting with the tool? How would they perceive your definition of success?

These are questions that synthetic data, at the moment, cannot answer. I’m hopeful that one day we’ll get there. Until then, I’ll stay cautiously optimistic.

Sean is a writer, editor, and content marketing strategist. He has a hand in shaping most of Nexthink's digital content and has made a career in product marketing and writing for private technology companies and media outlets.
