CYBY23 CORPUS FOR CYBERBULLYING WITH BYSTANDER INFORMATION

Authors
LEBAI LUTFI, SYAHEERAH
ALFURAYJ, HAIFA
Description
Information on the Cyberbullying Bystander (CYBY23) Dataset
---------------------------------------------------

This dataset is labeled by thread. Threads are separated by blank rows and can be organized by reply_id. Each thread includes the main tweet and its corresponding replies.

The dataset statistics are as follows:

--------------------------------------------
# threads | # tweets (main tweets + replies)
--------------------------------------------
112       | 639
--------------------------------------------

The data is stored as an .xlsx file. The data file contains 17 columns:

tweet_id = ID number of each tweet
reply_id = conversation ID number, identical for all tweets belonging to the same thread
text = text of the tweet
Cyberbullying classes label = presents the level of aggression of the whole thread on a three-point scale (0-1-2):
    0 - aggression (not bullying)
    1 - bullying with low aggression
    2 - bullying with high aggression
bystander roles label = presents the role of the bystander who replies to the main tweet (0-1-2-3):
    0 - instigator - this person agrees with the main post
    1 - defender - this person disagrees with the main post
    2 - impartial - this person is not taking any side
    3 - other - this person posted an unrelated reply
created_at = date the tweet was created
retweet_count = total number of retweets
favorite_count = total number of favorites
Insult, Threat, Identity_Attack, Profanity, Toxicity, and Severe_Toxicity = feature scores obtained from the Perspective API
polarity, subjectivity, sentiment = feature scores obtained from TextBlob

Reference for the dataset
-------------------------

We kindly request that you cite the following papers in any publications that make use of our dataset, to ensure proper acknowledgment of the dataset's source:

[Alfurayj, H. S., Yee, N. S., & Lutfi, S. L. (2023, October). Bystanders Unveiled: Introducing a Comprehensive Cyberbullying Corpus with Bystander Information. In TENCON 2023-2023 IEEE Region 10 Conference (TENCON) (pp. 1012-1017). IEEE.]

[Alfurayj, H. S., & Lutfi, S. L. (2023, October). Exploring Bystanders' Roles in Labeled Cyberbullying Threads on Twitter: A preliminary analysis. In TENCON 2023-2023 IEEE Region 10 Conference (TENCON) (pp. 1018-1023). IEEE.]

This citation is essential to acknowledge the efforts of the creators and contributors who have made this dataset available for public use. It also enables us to track the impact and reach of our dataset, which in turn helps us to secure funding for future data collection and sharing initiatives.
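As an illustration of the thread layout described above (blank rows between threads, one shared reply_id per thread), the following is a minimal sketch of how rows might be grouped into threads and bystander roles tallied. It is not part of the dataset release. The column key "bystander_role", the helper names, and the toy rows are assumptions made for illustration; the actual header strings in the .xlsx file may differ, and the sketch also assumes the main tweet carries no bystander role value.

```python
from collections import Counter, defaultdict

# Role codes as listed in the dataset description.
BYSTANDER_ROLES = {0: "instigator", 1: "defender", 2: "impartial", 3: "other"}


def _is_blank(value):
    """True for None, empty string, or NaN (how blank cells often load)."""
    return value is None or value == "" or (isinstance(value, float) and value != value)


def group_threads(rows):
    """Group row dicts into threads keyed by reply_id, skipping blank separator rows."""
    threads = defaultdict(list)
    for row in rows:
        reply_id = row.get("reply_id")
        if _is_blank(reply_id):
            continue
        threads[reply_id].append(row)
    return dict(threads)


def count_roles(thread, role_key="bystander_role"):
    """Count bystander roles in one thread; role_key is a hypothetical column name."""
    counts = Counter()
    for row in thread:
        value = row.get(role_key)
        if not _is_blank(value):
            counts[BYSTANDER_ROLES[int(value)]] += 1
    return counts


# Toy placeholder rows (not taken from the dataset) showing the expected shape.
rows = [
    {"tweet_id": 1, "reply_id": 100, "text": "main tweet", "bystander_role": None},
    {"tweet_id": 2, "reply_id": 100, "text": "reply a", "bystander_role": 0},
    {"tweet_id": 3, "reply_id": 100, "text": "reply b", "bystander_role": 1},
    {"tweet_id": None, "reply_id": None, "text": None, "bystander_role": None},
    {"tweet_id": 4, "reply_id": 200, "text": "main tweet", "bystander_role": None},
    {"tweet_id": 5, "reply_id": 200, "text": "reply c", "bystander_role": 2},
]

threads = group_threads(rows)
for reply_id, thread in threads.items():
    print(reply_id, len(thread), dict(count_roles(thread)))
```

With the real file, one possible way to obtain such row dicts is pandas.read_excel(path).to_dict("records"), where path points to the downloaded .xlsx file; the role_key argument would then need to match the header actually used in the spreadsheet.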
Keywords
BYSTANDER, CYBERBULLYING, TWEETS, MULTILABELS, BYSTANDER ROLES, FINE-GRAINED DETECTION