Machine learning to the rescue: Preventing cyberbullying in real time
In today's digital age, the widespread use of social media and online communication has brought new challenges, including the rise of cyberbullying.
With the anonymity and accessibility of the internet, individuals may engage in harassing or intimidating behaviour online, leading to devastating consequences for victims.
However, technological advancements such as machine learning offer hope in improving the efficiency of detecting and preventing cyberbullying.
Machine learning is a powerful tool within the field of artificial intelligence that allows machines to learn and enhance their performance without explicit programming.
Specifically, machine learning algorithms can be trained to detect patterns within online communication that may indicate cyberbullying behaviour.
These algorithms can identify instances of cyberbullying in real time by analysing vast amounts of data gathered from social media platforms, messaging apps, and other online platforms.
This paves the way for prompt intervention and prevention measures.
“One application of machine learning that can help identify cyberbullying is natural language processing [NLP],” says Associate Professor Manjeevan Singh, from the School of Business at Monash University Malaysia.
“NLP algorithms can analyse the language used in online communication to determine the tone and sentiment of the message, as well as identify specific terms or phrases associated with bullying behaviour.
“For example, if an individual frequently uses foul language or makes threatening statements, the algorithm may flag it as potentially abusive behaviour, and alert the appropriate authorities.”
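To make the idea of flagging concrete, here is a minimal sketch in Python. It is not the researchers' system: the word lists, the weights, and the threshold are all hypothetical, and a real NLP model would learn such signals from labelled data rather than rely on a fixed lexicon.

```python
import re

# Hypothetical lexicons, for illustration only (not taken from the study).
ABUSIVE_TERMS = {"idiot", "loser", "stupid"}
THREAT_PHRASES = ("i will hurt you", "watch your back")


def score_message(text: str) -> int:
    """Return a crude abuse score: 1 per abusive term, 2 per threat phrase."""
    lowered = text.lower()
    tokens = re.findall(r"[a-z']+", lowered)
    term_hits = sum(1 for token in tokens if token in ABUSIVE_TERMS)
    threat_hits = sum(1 for phrase in THREAT_PHRASES if phrase in lowered)
    return term_hits + 2 * threat_hits


def flag_user(messages, threshold=3):
    """Flag a user whose messages together reach the (assumed) threshold."""
    return sum(score_message(message) for message in messages) >= threshold
```

Summing the score across a user's messages, rather than judging each message alone, reflects the point that it is frequent foul or threatening language that triggers a flag.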
According to Dr Manjeevan, using machine learning for the identification of cyberbullying offers numerous advantages, particularly in terms of scalability.
Conventional ways of preventing cyberbullying, such as manually monitoring online platforms, can be inefficient and time-consuming, particularly for major social media sites that have millions of users.
In contrast, machine learning algorithms make it possible to recognise and respond to cyberbullying incidents in a timely and effective manner.
However, this approach also presents certain challenges. One of the most difficult is that training the algorithms requires significant quantities of high-quality data.
Although cyberbullying is rife, it remains a relatively unexplored area, particularly in the context of the Malay language. There’s a dearth of publicly accessible datasets containing hate speech, which poses a challenge for researchers.
To address this issue, tweets in Malay were collected and then filtered to remove any tweets in related languages, such as Indonesian, that had been mixed in. Although this effort began with only several thousand tweets, it represents an important starting point for further research.
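The article does not say how the mixed-in tweets were identified. One simple possibility, sketched below in Python, is to drop tweets containing words that are characteristic of colloquial Indonesian. The marker list and the function names are assumptions made for illustration, and a production pipeline would more likely use a trained language-identification model.

```python
import re

# Illustrative marker words from colloquial Indonesian (an assumption,
# not the researchers' list); a real filter would need a far larger set.
INDONESIAN_MARKERS = {"nggak", "banget", "gue", "udah"}


def looks_indonesian(tweet: str, min_hits: int = 1) -> bool:
    """Return True if the tweet contains at least min_hits marker words."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    hits = sum(1 for token in tokens if token in INDONESIAN_MARKERS)
    return hits >= min_hits


def keep_malay(tweets):
    """Keep only the tweets that do not look Indonesian."""
    return [tweet for tweet in tweets if not looks_indonesian(tweet)]
```

Because Malay and Indonesian share much of their vocabulary, a word-list filter like this will miss many cases, which is one reason manual checking remains useful for a small dataset.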
After each tweet was manually labelled as bullying or not, almost 40% of the selected dataset was found to be marked as bullying.
“To classify the tweets, we experimented with several deep-learning models, including BERT, XLNet, and fastText. The F1 scores for XLNet outperformed BERT, with an achieved classification accuracy of 76%. By incorporating both XLNet and fastText, the accuracy rate increased to 80%,” Dr Manjeevan stated.
It was acknowledged that the accuracy rates could be further improved with additional training and the incorporation of hate speech data.
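For readers unfamiliar with the two measures mentioned above, the following Python sketch shows how accuracy and the F1 score are computed for a binary bullying label, and one common way of combining two classifiers. The article does not describe how XLNet and fastText were combined, so the averaging of probabilities shown here ("soft voting") is an assumption, as are the function names and the 0.5 threshold.

```python
def accuracy(y_true, y_pred):
    """Fraction of tweets whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def soft_vote(probs_a, probs_b, threshold=0.5):
    """Average two models' bullying probabilities, then threshold (assumed scheme)."""
    return [1 if (a + b) / 2 >= threshold else 0
            for a, b in zip(probs_a, probs_b)]
```

Accuracy counts every correct prediction equally, whereas F1 focuses on how well the bullying class itself is found, which matters when that class makes up well under half of the data.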
To help researchers move more quickly through their research, this dataset will be made publicly available.
Dr Manjeevan added it’s likely there’ll be an increase in the utilisation of machine learning technology to identify and prevent cyberbullying as the technology continues to advance. Cyberbullying can be reduced, and a more secure environment can be created for all internet users if the appropriate tools and strategies are used.
Experimental design and implementation were carried out with Associate Professor Sriparna Saha, Shaubhik Bhattacharya, and Krishanu Maity, from the Department of Computer Science and Engineering at the Indian Institute of Technology, Patna.