Science and Research
SAR Journal
ISSN 2619-9955 | eISSN 2619-9963 | Frequency: 4/year | Peer Reviewed: Yes | UIKTEN Publisher
Profiling Noisy Social Media Data for Sentiment Applications: A Visual and Analytical Framework
Daniela Pencheva
© 2025 Daniela Pencheva, published by UIKTEN. This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Citation Information: SAR Journal. Volume 8, Issue 3, Pages 213-224, ISSN 2619-9955, https://doi.org/10.18421/SAR83-01, September 2025.
Received: 30 July 2025.
Revised: 05 September 2025.
Accepted: 12 September 2025.
Published: 27 September 2025.
Abstract:
This article presents a practical approach for analyzing noisy social media data in sentiment analysis. It focuses on identifying and mitigating the impact of elements such as emojis, repeated characters, hashtags, links, and informal expressions, which often distort results. A simple lexicon-based method is applied to assess sentiment both before and after data cleansing. The objective is to illustrate how data quality influences sentiment interpretation, supported by clear examples and interactive dashboards. Results confirm that noisy input affects sentiment scores and can lead to misleading conclusions. The study outlines a structured process for profiling and cleaning textual data, demonstrating how business intelligence tools assist in monitoring and improving data quality. The proposed approach is applicable in practice and enhances the reliability and transparency of sentiment analysis. It also provides a foundation for future advancements in data preprocessing within natural language processing tasks.
Keywords – Data quality, data preprocessing, noisy data, business intelligence, sentiment analysis.
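As an illustration of the before-and-after comparison described in the abstract, the following minimal Python sketch scores one noisy post with a toy lexicon, cleans it, and scores it again. The lexicon entries, the regular expressions, the function names (`normalize_token`, `clean`, `score`), and the sample post are assumptions made for this example; they are not the lexicon or the cleaning rules used in the study.

```python
import re

# Hypothetical toy lexicon (assumption); the study's actual lexicon is not given here.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
HASHTAG_RE = re.compile(r"#(\w+)")              # keep the word, drop the '#'
NON_WORD_RE = re.compile(r"[^\w\s]")            # emojis and punctuation
REPEAT_RE = re.compile(r"(.)\1{2,}")            # runs of 3+ identical characters


def normalize_token(token):
    """Collapse long character runs, preferring a form found in the lexicon."""
    two = REPEAT_RE.sub(r"\1\1", token)   # "goooood" -> "good"
    one = REPEAT_RE.sub(r"\1", token)     # "loooove" -> "love"
    for candidate in (two, one):
        if candidate in LEXICON:
            return candidate
    return two


def clean(text):
    """Remove links, hashtag marks, emojis and punctuation; fix repeated characters."""
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)
    text = NON_WORD_RE.sub(" ", text)
    return " ".join(normalize_token(t) for t in text.lower().split())


def score(text):
    """Sum the lexicon weights of whitespace-separated tokens."""
    return sum(LEXICON.get(token, 0) for token in text.lower().split())


raw = "I loooove this phone!!! 😍 #great http://t.co/xyz"
cleaned = clean(raw)
print(score(raw), score(cleaned), cleaned)
```

In this sketch the raw post scores 0 because no token matches the lexicon exactly, while the cleaned post scores 4. Emojis are simply discarded here as noise; a pipeline that treats them as sentiment signals would map them to lexicon entries instead.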