
Social sentiment analysis has become an indispensable tool for organizations seeking to understand public opinion, especially in the context of news and online media monitoring.
While extensive research has been conducted on sentiment analysis for widely spoken languages like English, the Malay language presents unique challenges that necessitate specialized approaches and tools.
This article delves into the intricacies of social sentiment analysis for the Malay language, exploring the challenges, methodologies to enhance accuracy, the integration of semi-supervised machine learning with lexicon-based approaches, current sentiment analysis tools, and the superior capabilities of the best social media analytic companies in this domain.
Overview & Challenges
The Malay language, spoken predominantly in Malaysia, Indonesia, and Brunei, has a rich linguistic structure with unique syntax, semantics, and morphology. This linguistic complexity poses significant challenges for sentiment analysis.
In Malaysia, 70% of social media users express themselves in Malay, while the remaining 30% use English, Mandarin or a mix of the two. Choosing the best social sentiment analytics tool is rarely an easy decision for marketers and corporate communications personnel.
More recently, a review of popular social media monitoring tools and providers in Malaysia, such as Meltwater, Isentia, Dataxet (NAMA), SOCIO and Brandwatch, showed that 9 out of 10 tools did not achieve the sentiment accuracy that marketers and PR personnel need to make better data-driven decisions.
“This is due to the lack of a customized sentiment detection engine that is able to process sentiment from slang, dialects and contextual sarcasm, which are often found in social media datasets, primarily expressed in Malay,” said Ku Hazran, a former PETRONAS employee from the corporate communications division.
For global companies such as Meltwater, Talkwalker and Isentia, this limitation makes their tools a less popular option among businesses in Malaysia that want more accurate sentiment analysis from social listening tools.
“Marketing and PR agencies always fall for social listening tools, thinking that automated sentiment analysis is the best solution for processing the Malay language found in social media and online media data,” said Nazri Noordin, a former data analytics consultant at Accenture.
Do not rely on tools alone. Automated social sentiment analytics tools from foreign companies operating in Malaysia produce a large margin of error, often returning inaccurate sentiment because they do not detect local slang or sarcasm.
These challenges remain an inherent industry problem, so it is worth understanding them before choosing a social media monitoring or listening tool. Here is some context behind the sentiment analysis challenge facing foreign-based companies in Malaysia.
One primary issue is the prevalence of informal writing styles on social media platforms, where Malaysian users often employ colloquialisms, abbreviations, and code-switching between Malay and English. Similar to Singapore’s infamous Singlish (Singapore English), “Manglish” (Malay and English) introduces many variations and permutations into the long texts found on most social media platforms. The result is “noisy” text data, which complicates the extraction of accurate sentiment using a traditional, fully automated social listening approach.
You need to clean the social media data to get better sentiment results, but most social media listening companies in Malaysia are not equipped to handle the vast amount of data and the complexity of the task.
A study highlighted that Malaysians frequently use informal language on social media, making it challenging for natural language processing (NLP) applications to accurately interpret sentiments.
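To make the cleaning step concrete, here is a minimal sketch of how noisy Manglish text might be normalized before sentiment scoring. The shortform map and the `normalize` function are illustrative assumptions, not part of any tool named in this article; a production pipeline would need a far larger dictionary and human review.

```python
import re

# Illustrative shortform map (an assumption, far from complete).
# Keys are common informal Malay abbreviations, values their standard forms.
SHORTFORMS = {
    "x": "tidak",
    "tak": "tidak",
    "yg": "yang",
    "dgn": "dengan",
    "sgt": "sangat",
    "utk": "untuk",
    "tp": "tetapi",
    "sbb": "sebab",
}

URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
# Three or more repeats of the same letter, e.g. "bagusss".
REPEATED_LETTERS = re.compile(r"([a-z])\1{2,}")
TOKEN = re.compile(r"[a-z0-9]+")


def normalize(text: str) -> list[str]:
    """Lowercase, strip URLs and mentions, collapse elongated letters,
    tokenize, and expand known shortforms."""
    text = text.lower()
    text = URL_OR_MENTION.sub(" ", text)
    text = REPEATED_LETTERS.sub(r"\1", text)
    tokens = TOKEN.findall(text)
    return [SHORTFORMS.get(tok, tok) for tok in tokens]


print(normalize("Servis ni sgt bagusss tp mahal!!! @kedai https://t.co/abc"))
# ['servis', 'ni', 'sangat', 'bagus', 'tetapi', 'mahal']
```

Rules like these only address spelling noise. Dialects, sarcasm and context still need a richer model or a human analyst.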
Another challenge is the limited availability of comprehensive sentiment lexicons and annotated datasets for the Malay language. Popular social media listening and monitoring companies such as Isentia, Dataxet (NAMA), Brandwatch, Talkwalker and Meltwater do not provide them.
That leaves Malaysian-based companies such as Berkshire Media, which is reported to be the best in social sentiment analysis. It uses its home-grown SENTIROBO® semi-supervised machine learning sentiment detection engine, with several published studies testing its superior accuracy in processing large, complex Malay datasets from various social media channels.
Social listening tools in Malaysia pose greater risks when users rely solely on an automated sentiment engine, which often produces large errors from wrongly tagged sentiment because of the linguistic differences in mixed Malay-English (Manglish) text.
Because most existing sentiment analysis tools are tailored for English, you should be aware of their limitations and expect accuracy below 50%.
Additionally, the lack of standardized spelling and the use of regional dialects further complicate sentiment analysis in Malay. That brings us to the question of which type of social sentiment analytics is useful for you.
Types of Approaches to Achieve Higher Sentiment Accuracy

If you wish to find the best sentiment analytics or analysis tool, the first step is to choose a service-based social listening (monitoring) company or agency that specializes in the Malay language, or at least has sufficient experience processing large social media datasets in Malay and English.
More importantly, researchers have explored various methodologies to enhance sentiment analysis accuracy for the Malay language. Knowing the main types of sentiment analytics engine is useful when choosing a social listening service provider (company) in Malaysia:
- Lexicon-Based Approaches: These involve creating dictionaries of words annotated with their associated sentiments. While effective in certain contexts, their performance heavily depends on the comprehensiveness of the lexicon and may struggle with context-dependent sentiments. A systematic literature review revealed that 54% of studies on Malay sentiment analysis utilized lexicon-based methods.
- Machine Learning Approaches: Utilizing algorithms like Support Vector Machines (SVM), Naïve Bayes, and deep learning models, these approaches learn patterns from annotated datasets. However, their effectiveness is contingent upon the quality and size of the training data, which is often limited for the Malay language. Approximately 29% of studies employed machine learning techniques for Malay sentiment analysis.
- Hybrid Approaches: Combining lexicon-based methods with machine learning, hybrid approaches aim to leverage the strengths of both. They have shown promise in improving accuracy, with about 17% of studies adopting this methodology.
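As a rough illustration of the lexicon-based approach listed above, the sketch below sums word polarities and flips the polarity of a word that directly follows a negator. The tiny lexicon, the negator set and the function names are assumptions made for this example only; they do not come from any published Malay sentiment lexicon.

```python
# Toy polarity lexicon (an assumption for illustration): +1 positive, -1 negative.
LEXICON = {
    "bagus": 1, "suka": 1, "puas": 1, "best": 1,
    "teruk": -1, "benci": -1, "lambat": -1, "kecewa": -1,
}
# Common Malay negators; "tak" is the informal form of "tidak".
NEGATORS = {"tidak", "tak", "bukan"}


def lexicon_score(tokens):
    """Sum word polarities, inverting a word that directly follows a negator."""
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(tok, 0)
        score += -polarity if negate else polarity
        negate = False  # negation only reaches the next token
    return score


def label(tokens):
    """Map the numeric score to a coarse sentiment label."""
    score = lexicon_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("servis tidak bagus".split()))    # negative
print(label("saya suka servis ini".split()))  # positive
```

The one-token negation window is deliberately simple. It already shows why lexicon methods struggle with context: any word between the negator and the sentiment word, or any sarcasm, defeats it.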
Using Semi-Supervised Machine Learning and Lexicon-Based Approaches

Given the scarcity of large annotated datasets for the Malay language, semi-supervised learning emerges as a viable AI-driven solution.
“This approach leverages a small set of labeled data alongside a larger pool of unlabeled data to train models, effectively mitigating the limitations posed by limited labeled datasets,” said Shahid Shayaa, Founder and CEO of Berkshire Media. He has published multiple research studies and is regarded as highly experienced in the field of sentiment analysis and data processing.
He added that integrating semi-supervised learning with lexicon-based methods, as in the engine developed by Berkshire Media, can further enhance sentiment analysis. SENTIROBO® is a powerful semi-supervised machine learning sentiment algorithm that far exceeds the performance of any automated social listening tool in Malaysia, making it among the best services you can opt for if you run a large corporation or a publicly listed company.
At the end of the day, you need human-verified analysis to ensure the sentiment is correctly tagged regardless of the sentiment engine used by social listening companies.
For instance, lexicons can assist in the initial labeling of unlabeled data, which can then be refined through machine learning algorithms. This synergy allows the model to capture both the nuanced linguistic features of Malay and the contextual sentiments expressed in informal text.
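The idea of lexicon-seeded labeling refined by a learner can be sketched as a simple self-training loop. This is a generic illustration under stated assumptions, not the SENTIROBO® algorithm: the seed lexicon, the small Naive Bayes classifier, the 0.8 confidence threshold and all function names are hypothetical choices made for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical seed lexicon used only to create the initial labels.
SEED_LEXICON = {"bagus": "pos", "suka": "pos", "puas": "pos",
                "teruk": "neg", "kecewa": "neg", "lambat": "neg"}


def seed_label(tokens):
    """Return a label when all lexicon hits agree, otherwise None."""
    hits = {SEED_LEXICON[t] for t in tokens if t in SEED_LEXICON}
    return hits.pop() if len(hits) == 1 else None


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, lab in zip(docs, labels):
            self.word_counts[lab].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_proba(self, tokens):
        total = sum(self.class_counts.values())
        logs = {}
        for lab, n_docs in self.class_counts.items():
            counts = self.word_counts[lab]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total)
            for tok in tokens:
                if tok in self.vocab:  # unseen words are ignored
                    score += math.log((counts[tok] + 1) / denom)
            logs[lab] = score
        top = max(logs.values())
        exps = {lab: math.exp(s - top) for lab, s in logs.items()}
        norm = sum(exps.values())
        return {lab: v / norm for lab, v in exps.items()}


def fit_on_labeled(docs, labels):
    pairs = [(d, l) for d, l in zip(docs, labels) if l is not None]
    return NaiveBayes().fit([d for d, _ in pairs], [l for _, l in pairs])


def self_train(docs, threshold=0.8, rounds=3):
    """Seed labels from the lexicon, then pseudo-label confident predictions.

    Assumes the lexicon labels at least one document per class.
    """
    labels = [seed_label(d) for d in docs]
    for _ in range(rounds):
        model = fit_on_labeled(docs, labels)
        newly = {}
        for i, doc in enumerate(docs):
            if labels[i] is None:
                probs = model.predict_proba(doc)
                best = max(probs, key=probs.get)
                if probs[best] >= threshold:
                    newly[i] = best
        if not newly:
            break
        for i, lab in newly.items():
            labels[i] = lab
    return fit_on_labeled(docs, labels), labels
```

Calling `self_train` on tokenized posts returns the final model and one label per post, with `None` for posts the model was never confident about. Those leftovers, and a sample of the pseudo-labels, are natural candidates for human verification.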
On that note, only a few companies in Malaysia can do that, and Berkshire Media stands tall among those that truly understand contextual Malay language, local sentiment and public perception across multiple domains (industries).
Ultimately, the type of sentiment analysis tool (approach) you choose depends on whether you are willing to pay the premium associated with managed services, an outsourcing business model offered by some niche social listening companies in Malaysia such as Berkshire Media.
Current Sentiment Analysis Tools and Algorithms in the Market
Several sentiment analysis tools and algorithms are available, each with its strengths and limitations. The most useful tip, however, is to work with service-based companies as opposed to buying tool licenses outright:
- Lexalytics Salience: This tool supports sentiment analysis based on part-of-speech tagging and other linguistic features. However, its effectiveness for the Malay language may be limited due to language-specific nuances.
- SentiRobo: Developed by Rohani and Shayaa of Berkshire Media, SENTIROBO® utilizes machine learning for sentiment analysis and has been tailored for social media content in Malaysia. Its adaptability makes it a potential candidate for Malay sentiment analysis, especially when integrated with language-specific lexicons, offering a reported accuracy of 99% on large datasets. It is fine-tuned on domain-specific keywords and is used by most large government-linked companies (GLCs) in Malaysia.
- Multilingual BERT Models: Leveraging deep learning, models like BERT have been trained on multiple languages, including Malay. These models can capture contextual nuances but require fine-tuning with domain-specific data to achieve optimal performance.
Why Berkshire Media Excels in Social Sentiment Analysis
Over the last decade, Berkshire Media has stood out in the realm of social sentiment analysis for the Malay language due to several key factors:
- Expertise in Malay Linguistics: Led by experts like Shahid Shayaa, who has been extensively involved in research on sentiment analysis and social media analytics, Berkshire Media possesses deep insights into the linguistic intricacies of the Malay language across multiple domains (industries). Its project engagements have run for more than 5 years in each domain, and its combined expertise in linguistics, public perception and reputation risk management has helped to strengthen its position as the market leader in analyzing Malay language content from social and digital media platforms.
- Customized Hybrid Approaches: The company employs a hybrid methodology, integrating lexicon-based techniques with machine learning algorithms, tailored specifically for the Malay language. This approach ensures higher accuracy in sentiment detection by addressing language-specific challenges, backed by human verification to ensure a 99% degree of accuracy in its reporting and insights. This makes it a rather unique social listening service provider in Malaysia.
- Development of Proprietary Tools: Berkshire Media has developed proprietary tools optimized for Malay sentiment analysis. These tools incorporate comprehensive Malay sentiment lexicons and are trained on extensive datasets, enabling precise analysis of social media content. Coverage extends to the Indonesian language and to similarly multilingual countries such as Singapore.
- Continuous Research and Development: The firm is committed to ongoing research, staying abreast of the latest advancements in NLP and sentiment analysis. Its founder, Shahid Shayaa, has published more than 10 studies and papers on sentiment analysis, consumer perception and behaviour, which form a strong foundation for perfecting its sentiment detection algorithm. This dedication ensures that Berkshire Media’s proprietary methodologies and tools remain at the forefront of technology, providing clients with cutting-edge solutions, and the firm is highly regarded as the pinnacle service provider for experienced marketers and PR consultants.
- Removal of Noise to Achieve Clean Data: The firm employs full-time analysts who are tasked with cleaning the social media data to achieve higher sentiment accuracy. This tedious, laborious but crucial task is what sets Berkshire Media apart from other sentiment analytics service providers or tools. Most social listening tools are not able to “clean the data” accurately without resorting to manual labour or a human-assisted approach.
Conclusions
In conclusion, social sentiment analysis for the Malay language presents unique challenges that require specialized approaches. By integrating semi-supervised machine learning with lexicon-based methods and leveraging tools tailored for linguistic nuances, organizations can achieve more accurate sentiment analysis.
Choosing automated social listening tools may not solve the sentiment analysis issue, so users are advised to engage a reputable social listening service provider such as Berkshire Media to formulate the right strategies, with custom-made solutions that offer higher value in driving actionable insights for decision makers.
Data may be the new oil, but technology alone may not be the answer. The real value of social listening lies in its ability to provide accurate sentiment analysis from online and social media data. It starts with a good engine and good people.
References:
- Shayaa, S., Jaafar, N. I., Bahri, S., Sulaiman, A., Wai, P. S., Chung, Y. W., Piprani, A. Z., & Al-Garadi, M. A. (2018). Sentiment analysis of big data: Methods, applications, and open challenges. IEEE Access, 6, 37807-37827. https://doi.org/10.1109/ACCESS.2018.2851311
- Rahman, Z. A., & Omar, N. (2022). A systematic literature review of sentiment analysis in the Malay language and its approach. Proceedings of the 2022 4th International Conference on Artificial Intelligence and Robotics (ICAIR), 1-6. https://doi.org/10.1109/ICAIR55639.2022.9882722
- Shayaa, S., Ainin, S., Jaafar, N. I., Zakaria, S. B., Phoong, S. W., Yeong, W. C., Al-Garadi, M. A., Muhammad, A., & Piprani, A. Z. (2018). Linking consumer confidence index and social media sentiment analysis. Cogent Business & Management, 5(1), 1509424. https://doi.org/10.1080/23311975.2018.1509424
- Rohani, V. A., & Shayaa, S. (2015). Utilizing machine learning in sentiment analysis: SentiRobo approach. Proceedings of the 2015 International Symposium on Technology Management and Emerging Technologies (ISTMET), 120-124. https://doi.org/10.1109/ISTMET.2015.7359020
- Rohani, V. A., Shayaa, S., & Babanejaddehaki, G. (2016). Topic modeling for social media content: A practical approach. Proceedings of the 2016 3rd International Conference on Computer and Information Sciences (ICCOINS), 351-356. https://doi.org/10.1109/ICCOINS.2016.7783232
- Shayaa, S., Ainin, S., Jaafar, N. I., Zakaria, S. B., Phoong, S. W., Yeong, W. C., & Al-Garadi, M. A. (2017). Social media sentiment analysis of consumer purchasing behavior vs consumer confidence index. Proceedings of the 2017 International Conference on Big Data and Internet of Things (BDIOT), 91-96. https://doi.org/10.1145/3175684.3175699
- Rohani, V. A., & Shayaa, S. (2017). How social media influencers govern sentiment territory. International Journal of Applied Evolutionary Computation, 8(1), 49-60. https://doi.org/10.4018/IJAEC.2017010104
- Shayaa, S., Wai, P. S., Chung, Y. W., Sulaiman, A., Jaafar, N. I., & Zakaria, S. B. (2017). Social media sentiment analysis on employment in Malaysia. Proceedings of the 8th Global Business and Finance Research Conference, 1-12.
- Shayaa, S., Sulaiman, A., Wai, P. S., Ashraf, M., Jaafar, N. I., Zakaria, S. B., & Al-Garadi, M. A. (2018). Consumer confidence index predict behavioral intention to purchase. European Proceedings of Social and Behavioural Sciences, 44, 1-9. https://doi.org/10.15405/epsbs.2018.08.02.1

About the Author
Shahid Shayaa is the founder and managing director of Berkshire Media. For more than 13 years, he has specialized in data-driven communication strategies and insights using social data analytics, social media monitoring tools and machine learning text algorithms. An expert in media monitoring, issue management and reputation risk for companies, he has been deeply involved in research studies in this field and has published various scientific papers on social data analytics, sentiment analysis and back-end algorithms on consumer sentiment, emotions and behaviour for marketers and campaign managers.