Data hiding: Using stealth to preserve content quality
Once upon a time, content publishers didn’t have to worry about content duplication, but with more than 4.5 billion people worldwide – nearly 60% of the world's population – now using the internet, that’s a luxury providers can no longer enjoy.
Multimedia content is flooding the internet, making up 80% of internet traffic. Anyone can showcase full-screen high-definition videos without waiting for the videos to buffer. This has led to a considerable shift in the way people are using the internet.
Content streaming services such as Netflix, Amazon Prime, Hulu, and Disney+ offer thousands of films and television programs owned by major film studios. Streaming also provides an alternative to file downloading, which is blocked in many countries.
The deluge of content
In 2016, Facebook upped its game by allowing any registered user to broadcast live videos for free, a move that came at a hefty price. People could upload videos of live concerts, closed-door events, and fashion shows without acquiring the licences for them.
The piracy of digital content became unstoppable. Content providers had to deal with their content being duplicated and shared for free, or sold at a lower price on other platforms.
According to Associate Professor Wong Kok Sheik from the School of Information Technology at Monash University Malaysia, piracy could also impact the academic environment. With myriad lecture videos out there (besides the institute's own), how can students who want to further understand a subject tell original content from fake?
Data hiding was introduced to combat such issues. It’s the process of inserting data into a host to serve specific purposes, such as proving ownership, verifying that content is genuine, and linking related content.
The data can be information derived from the content itself (such as a description), something external to the host content, or a mixture of both. Depending on its purpose, data can be inserted in various ways.
With traditional data hiding, however, some distortion of the host content is inevitable. Dr Wong and his team aim to ensure that the content's quality is ultimately preserved by preventing unintended or intended changes.
"We’re analysing the coding structure, which is how things are stored in the digital domain. We are looking for ways to encode [insert] additional data while preserving the perceptual quality of the content," he says.
Same value, different methods
The first breakthrough was to identify two distinct ways to store the same value in video. For example, the value 12 can be represented as either 2 x 6 or 4 x 3.
"To the decoder [a piece of software or device that reads the file and produces the corresponding content], it delivers the exact same value, but what's under the hood [for example, 2 x 6 versus 4 x 3] carries additional data," he explains. A similar idea is deployed in other media, including image and document files.
Instead of using the traditional methods that modify the values directly, Dr Wong used different ways to obtain the same values or achieve the same visual impression. With this approach, the quality and appearance of videos, PDFs and photos remain the same, even after data hiding.
"This is a complete quality-preserving type of processing. Existing techniques can't offer this. We’re also able to scale according to demands. If you tell me how much data you want to insert, I’ll prepare the space for you to do so," he says.
Dr Wong's innovation is also reversible – the process can be reverted to obtain an exact copy of the original content, down to the bit level.
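The "12 as 2 x 6 or 4 x 3" idea can be illustrated with a toy sketch in Python. This is not Dr Wong's actual technique, which works on real video, image and document coding structures that the article does not detail. The codebook FACTOR_PAIRS and the functions embed, decode, extract and restore are hypothetical names invented for illustration, and the sketch assumes the original file always uses the first factor pair of each value.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]

# Hypothetical toy codebook: two ways to store each value.
# Assumption: index 0 is the form used by the original file (hidden bit 0),
# index 1 is the alternative form (hidden bit 1).
FACTOR_PAIRS: Dict[int, List[Pair]] = {
    12: [(2, 6), (4, 3)],
    18: [(2, 9), (3, 6)],
    20: [(2, 10), (4, 5)],
}


def embed(values: List[int], bits: List[int]) -> List[Pair]:
    """Store each value as the factor pair selected by one payload bit."""
    if len(bits) > len(values):
        raise ValueError("payload is larger than the available capacity")
    stored = []
    for i, value in enumerate(values):
        bit = bits[i] if i < len(bits) else 0  # unused slots keep form 0
        stored.append(FACTOR_PAIRS[value][bit])
    return stored


def decode(stored: List[Pair]) -> List[int]:
    """What an ordinary decoder sees: the product, whatever the form."""
    return [a * b for a, b in stored]


def extract(stored: List[Pair], n_bits: int) -> List[int]:
    """Recover hidden bits by checking which form each value was stored in."""
    return [FACTOR_PAIRS[a * b].index((a, b)) for a, b in stored[:n_bits]]


def restore(stored: List[Pair]) -> List[Pair]:
    """Undo the embedding by switching every value back to form 0."""
    return [FACTOR_PAIRS[a * b][0] for a, b in stored]


if __name__ == "__main__":
    values = [12, 18, 20, 12]
    payload = [1, 0, 1, 1]
    original = embed(values, [])       # the unmarked file
    marked = embed(values, payload)    # the file carrying hidden bits
    print("decoded values unchanged:", decode(marked) == values)
    print("payload recovered:", extract(marked, len(payload)) == payload)
    print("original restored:", restore(marked) == original)
```

In this toy model each value carries one hidden bit, so capacity grows with the number of values that have an alternative form. The decoded values never change, which is why the visible content is untouched, and switching every value back to its first form gives back the original representation.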
The outlook for this research is bright. When new formats are put forward, new techniques specific to each format are required to provide security and management functions for the content. This situation opens opportunities for innovations in data hiding.
Digital content such as video, audio, images, and encrypted files can be better managed and linked to related content. The source of leaked content can also be identified easily with "fingerprinting", and ownership of content can be claimed with the aid of a watermark whenever there’s a dispute.
Data hiding can also serve another purpose – to obscure the meaning of the data. This is more commonly known as encryption. Dr Wong is also researching in this domain, and has completed some pioneering research in the unified domain of encryption and data insertion.