Published Feb 22 2018

Data mining and the art of painting a bigger picture

Each day in 2016, Ambulance Victoria received about 60 calls from people poisoned by alcohol, and 25 from people affected by other drugs. Although the ambos saw a 30 per cent rise in crystal methamphetamine-related emergencies on the previous year, prescription medicines were responsible for more calls than illicit substances.

The data was gathered by the Turning Point Drug and Alcohol Research Centre, which is affiliated with Monash University. For more than a decade, Turning Point has been collecting the reports paramedics fill out every time an ambulance goes on a call. The centre has collected reams of data from Victoria and other states, but five years ago it was unsure how to make the best use of the material.

Enter Monash Professor Geoff Webb, from the Faculty of Information Technology. Last year he won the inaugural Eureka Prize for Excellence in Data Science for a body of work, including the Turning Point research. The citation said: “His work, which has included supporting research into male suicide and a range of diseases, has had significant social and economic impact.”

For the Movember-funded project, post-doctoral researcher Christoph Bergmeir was placed at Turning Point, where he examined the date, time and location of each ambulance callout. Professor Webb says: “We did analysis around what factors are associated with different calls. What happens on the weekends? Are there more calls? What happens in the country versus the city? Does the provision of mental health resources in the area affect the number of ambulance callouts in the area?

“One ambition we had for the data was to create surveillance techniques. It sounds creepy, but it’s not. Currently, for example, if there is a batch of some drug out there that has something in it that means people are dying, that routinely won’t be picked up until it works its way through to the Coroner’s Court.” This process can take several months. But the data gathered by Turning Point will make it possible to detect an overdose spike within weeks.
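As a rough illustration of what such surveillance could look like, the sketch below flags any week whose callout count sits well above the preceding weeks. It is not Turning Point's or Professor Webb's method: the function name `flag_spikes`, the eight-week window, the threshold and the weekly counts are all assumptions made up for this example.

```python
from statistics import mean, stdev

def flag_spikes(weekly_counts, window=8, threshold=3.0):
    """Return the indices of weeks whose count is unusually high
    compared with the preceding `window` weeks (hypothetical helper)."""
    flagged = []
    for i in range(window, len(weekly_counts)):
        baseline = weekly_counts[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # Use a floor of 1.0 so a perfectly flat baseline does not
        # turn every small wobble into an alert.
        limit = mu + threshold * max(sigma, 1.0)
        if weekly_counts[i] > limit:
            flagged.append(i)
    return flagged

# Made-up weekly overdose callout counts; week 10 holds an artificial spike.
counts = [12, 14, 11, 13, 12, 15, 13, 12, 14, 13, 31, 14]
print(flag_spikes(counts))
```

Comparing each week only with the weeks before it means an alert can be raised as soon as that week's reports are in, rather than months later.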

The 2016 data showed that ambulance callouts for prescription medicine overdoses – including painkillers, anti-psychotics and anti-depressants – were higher in regional areas, relative to population. This could indicate, for example, the need for better education concerning prescription drug use in the regions.

Big data: transforming the world

The first round of work on the Movember project has been completed and a draft report written. Professor Webb says the Eureka Prize was awarded for the methodologies he employed rather than the results – the Turning Point work is ongoing. He's using the same techniques in bioinformatics, a field that uses computer programming tools to understand biological data. He says the beauty of working with big data analytics is that it allows him to do cross-disciplinary work (he doesn't have a medical or social science background). “It's an extraordinarily exciting field. It's transforming the world beneath our feet.”

Professor Webb describes data mining as “more an art than a science; it requires lateral thinking. When you have a new task, it's not obvious how to go about it. It’s a very creative process, working out where you would like to go with it.”

His method, when faced with reams of new data, is to devise algorithms that can then detect useful patterns.

“My research usually takes the form of, what needs to be done? What do we need to do to get the information from the data?” he says. “I begin the brute force way – crunching numbers on the University’s supercomputer. My aim is to do the work efficiently, to bring it down,” so that the algorithms can run on a personal computer.

At Monash, Professor Webb’s bioinformatics work is being used to predict how protease enzymes work in the human body. While advanced microscopy and x-ray crystallography techniques allow crystallised molecules to be visualised, the technology cannot capture them in movement.

“We use data-driven techniques to infer things about structure and function – without doing a full-blown structural analysis,” he says.

“Protease is a class of enzyme whose job it is to cleave, or cut, other proteins. Understanding what that protease will cut is basic to understanding their function. We take a protease and all the things that it's known to cut and where it cuts them. From there we can predict what other proteins it can cut and in what location. Cutting the wrong thing at the wrong time in the wrong place is the basis of many awful diseases.”
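The idea of learning from known cuts can be sketched with a simple position-specific scoring profile. This is a swapped-in textbook technique, not necessarily the one Professor Webb's group uses. The four-residue windows, the sequence and the function names below are invented for illustration, and the code assumes only the 20 standard amino acid letters appear.

```python
from collections import Counter
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def build_profile(windows, pseudocount=1.0):
    """Build per-position log scores from equal-length windows
    centred on known cut sites (hypothetical helper)."""
    total = len(windows) + pseudocount * len(ALPHABET)
    profile = []
    for pos in range(len(windows[0])):
        counts = Counter(w[pos] for w in windows)
        # Score each residue relative to a uniform background of 1/20.
        profile.append({
            aa: math.log((counts[aa] + pseudocount) / total * len(ALPHABET))
            for aa in ALPHABET
        })
    return profile

def score_window(profile, window):
    """Sum the per-position scores for one candidate window."""
    return sum(column[aa] for column, aa in zip(profile, window))

def best_cut(profile, sequence):
    """Return (cut_position, score) for the highest-scoring window,
    where cut_position is the number of residues before the cut."""
    width = len(profile)
    scored = [
        (score_window(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    score, start = max(scored)
    return start + width // 2, score

# Made-up windows: two residues either side of each known cut.
known_windows = ["LVAG", "LVSG", "IVAG", "LVAA"]
profile = build_profile(known_windows)

# Made-up protein sequence to scan for a likely cut site.
position, score = best_cut(profile, "MKTLVAGQWE")
print(position, round(score, 2))
```

Here the profile plays the role of “all the things that it's known to cut and where it cuts them”, and scanning a new sequence stands in for predicting what else the protease might cut and where.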

Professor Webb readily acknowledges that the techniques he uses are also being employed by corporations to monitor – and possibly manipulate – their clients on the internet. “As with any technology, you need to be afraid of what it can do. Like driverless cars – it's natural to be afraid of them, but they have great potential, too,” he says.

“Any technology can be used for positive or negative ends. I don’t think we know where it's going to end. The world is an extraordinarily interconnected place.”

About the Authors

  • Geoff Webb

    Director, Monash University Centre for Data Science and Professor of Information Technology Research, School of Information Technology

    Geoff is a world-renowned data scientist whose research investigates how to use data to best support effective evidence-based decision making and derive useful knowledge and insight. This spans artificial intelligence, machine learning, data mining, data analytics and big data. Geoff is the author of the Magnum Opus commercial data mining software package, a system that embodies many of his research contributions in the area of data mining, and he has contributed many components to the popular Weka machine learning workbench. He is a technical adviser to Froomle, a data-science-driven recommendation engine.
