Bridging Bytes and Business: A Research Inquiry into Big Data’s Strategic Significance (Published)

Article Author: Liam Parker & Ava Brooks

This paper examines the strategic role of big data in contemporary business, focusing on its influence on operational efficiency, decision-making, and competitive advantage. The research methodology combines literature reviews, case studies, and real-world applications to connect the technical details of data analytics with the strategic priorities of modern businesses. By synthesizing insights from industries including technology, finance, and healthcare, the analysis aims to provide a comprehensive understanding of how big data links analytical capability to business success.

Keywords: Big Data, Data Driven Decision Making, Data Insights, Data Revolution, Data Streams, Digital Innovation, Industry Disruption, Industry Transformation, Technological Advancements, data analytics

Author Identification Based on NLP (Published)

Article Author: Noura Khalid Alhuqail

The amount of textual content is growing exponentially, especially through the publication of articles, and the growth of anonymous text complicates matters further. Researchers are therefore looking for methods to predict the author of an unknown text, a task known as author identification. This study uses Bag of Words (BOW) and Latent Semantic Analysis (LSA) features. The “All the news” dataset on Kaggle is used for experimentation and to compare BOW and LSA on the author identification task. Support vector machine, random forest, logistic regression (LR), and Bidirectional Encoder Representations from Transformers (BERT) classifiers are used for author prediction. In the first scope, with 20 authors and 100 articles per author, the highest accuracy comes from logistic regression using BOW, followed by random forest, also using BOW; for every algorithm, BOW scored better than LSA. The BERT model was also applied and achieved 70.33% accuracy. In the second scope, which increases the number of articles to 500 per author and decreases the number of authors to 10, BOW achieves better performance with the logistic regression algorithm, at 93.86%. The best accuracy, 94.9%, is obtained with LR when the BOW and LSA features are merged, which outperforms BOW and LSA applied individually, with a reported improvement of almost 0.1% over BOW alone. Finally, BERT achieved 86.56% accuracy and a log loss of 0.51.

Keywords: Analysis, Identification, NLP, author, data analytics
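
As a rough illustration of the bag-of-words representation that the abstract above compares against LSA, the following minimal sketch turns documents into word-count vectors using only the Python standard library. The function names and the two-sentence toy corpus are hypothetical and are not taken from the study or from the “All the news” dataset; the study's own preprocessing and classifiers are not described in enough detail to reproduce here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token in the corpus to a column index (sorted for stability)."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Return a count vector for one document; out-of-vocabulary tokens are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical toy corpus, used only to show the shape of the features.
corpus = [
    "the market fell and the market rose",
    "the senate passed the bill",
]
vocab = build_vocabulary(corpus)
vectors = [bag_of_words(doc, vocab) for doc in corpus]
```

Each row of `vectors` could then be passed to a classifier such as logistic regression, with one label per author. Word order is discarded, which is the main simplification the bag-of-words model makes.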

Empirical Study of Features and Unsupervised Sentiment Analysis Techniques for Depression Detection in Social Media (Published)

Article Author: Shahad Ayedh Alharthi

This study provides an empirical evaluation of traditional learning, deep learning, and unsupervised techniques, based on diverse sets of features, for the problem of depression detection among Twitter and Reddit users. Its main objective is to identify the most appropriate features, document representations, and text classifiers for depression detection on social media microblogs, such as tweets, as well as macroblogs, such as Reddit posts. The investigation concentrates on linguistic characteristics, blogging behavior, and topics as features; on multi-word and word embeddings (WE) for document representation; and on unsupervised learning for text clustering. The study selects the best approaches in the literature as baselines and examines them in practice on a dataset of depressive and non-depressive blogs designed for this work. Integrations and ensembles of the selected baselines are also tested in order to recommend a design for an effective social media blog classifier based on unsupervised learning and WE document representation. The experiments showed that a stacking ensemble of Adam Deep Learning with SOM clustering, followed by Agglomerative Hierarchical clustering with topic features and pre-trained word2vec embeddings, achieved an accuracy of more than 92% on the Twitter and Reddit depression analysis datasets.

Keywords: Analysis, Depression, Social media, computing, data analytics, empirical study

Predicting Student University Admission Using Logistic Regression (Published)

Article Author: Sharan Kumar Paratala Rajagopal

The primary purpose of this study is to discuss the prediction of student admission to university from numerous factors using logistic regression. Many prospective students apply for Master’s programs, and the admission decision depends on criteria set by the particular college or degree program. The independent variables in this study are measured statistically to predict graduate school admission. If the data exploration and analysis succeed, predictive models could help prioritize the screening of applicants to Master’s degree programs, which in turn would help offer admission to the right candidates.

Keywords: Logistic regression, college admission, data analytics, predictive analysis
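
To make the idea in the abstract above concrete, here is a minimal sketch of logistic regression fitted by batch gradient descent, written with the Python standard library only. The two features (a test score and a GPA, both rescaled to the range 0 to 1), the six training rows, the learning rate, and the epoch count are all hypothetical assumptions chosen for illustration; the abstract does not list the study's actual variables or data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, weights, bias):
    """Estimated probability of admission for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic_regression(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, weights, bias) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical applicants: [scaled test score, scaled GPA]; 1 = admitted.
X = [[0.2, 0.3], [0.3, 0.2], [0.4, 0.4], [0.6, 0.7], [0.8, 0.6], [0.9, 0.9]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic_regression(X, y)
strong = predict_proba([0.95, 0.9], weights, bias)
weak = predict_proba([0.1, 0.2], weights, bias)
```

Sorting applicants by the predicted probability is one way such a model could support the prioritization of screening that the abstract describes. A real application would need held-out evaluation and a much larger dataset.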
