Online harassment is a major part of online life, especially for women and minorities. There is no single, straightforward solution to the problem. Ultimately, it will require the combined efforts of researchers in artificial intelligence, computational linguistics, sociology, communications, social network analysis, interface design, and online communities to understand the full scope of the issue and create technical solutions to address it.
Before any of that work can begin, it is necessary to understand exactly what is happening. That means all interested researchers will need a large, diverse, representative corpus of online harassment data to work with.
The purpose of this workshop is to take a major step forward in building an online harassment corpus.
We hope to bring together researchers who have this data in one form or another. This could include organizations with large volumes of blocked or flagged content, researchers who have been building their own databases of offensive or hateful content, and those who have large general databases of social media content from which they can extract harassing messages.
Our workshop goals are as follows:
- Develop a shared vocabulary of harassment types (hate speech, individual harassment, threats, deeply offensive language, etc.)
- Develop a codebook, with examples, that allows qualitative researchers to hand-label these messages
- Identify simple linguistic patterns, hashtags, and other features that will aid in the search for harassing messages
- Map codebooks from existing projects to our group codebook
- Actually create some hand-coded data as a first step toward building a large community corpus of harassing content
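As a rough illustration of the pattern-search and codebook-mapping goals above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the label names, the project-to-group mapping, and the search patterns are hypothetical placeholders, not outputs of the workshop or of any existing project.

```python
import re
from typing import Optional

# Hypothetical group codebook labels, loosely based on the harassment
# types named in the goals above. The real vocabulary is yet to be defined.
GROUP_CODEBOOK = {
    "hate_speech",
    "individual_harassment",
    "threat",
    "offensive_language",
}

# Hypothetical mapping from one existing project's labels to the group labels.
PROJECT_TO_GROUP = {
    "abusive": "offensive_language",
    "targeted_insult": "individual_harassment",
    "violent_threat": "threat",
}

# Placeholder search features (a hashtag and a simple phrase pattern).
CANDIDATE_PATTERNS = [
    re.compile(r"#examplehashtag\b", re.IGNORECASE),
    re.compile(r"\byou (?:should|deserve to) \w+", re.IGNORECASE),
]


def is_candidate(message: str) -> bool:
    """Return True if any placeholder pattern matches the message."""
    return any(p.search(message) for p in CANDIDATE_PATTERNS)


def map_label(project_label: str) -> Optional[str]:
    """Map a project label to a group codebook label, or None if unmapped."""
    group_label = PROJECT_TO_GROUP.get(project_label)
    return group_label if group_label in GROUP_CODEBOOK else None
```

A filter like `is_candidate` would only surface messages for human review; the hand-coding itself would still follow the codebook.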
We anticipate this workshop will be the inauguration of a virtual research community where volunteers can add content, reviewed and approved by other members, to an ever-growing research corpus.
This will be a true working workshop. We will spend the morning reviewing available data and developing a typology of online harassment, and the afternoon actually coding data and refining a codebook. The dataset, codebook, and mappings to existing typologies will be published at the end of the workshop.
Anyone interested in the topic can participate in the workshop. You do not need to submit a paper to attend and participate!
We will be accepting short papers (up to 4 pages) that address the following topics:
- Available social media datasets that could be part of this repository
- Codebooks, coding schemes, taxonomies, and typologies of online harassment, either in general or for a particular sub-type
- Other resources that may be useful for building a central repository of online harassment data for research
Note: at least one author of each accepted paper must attend the workshop. All participants must register for both the workshop and at least one day of the conference.
Submissions should be in standard ACM format.
Please submit through EasyChair at https://easychair.org/conferences/conference_info.cgi?a=12704859
Deadline: paper submissions are due by 21 December 2016