Evaluating Short-Term Temporal Fluctuations of Social Biases in Social Media Data and Masked Language Models

Kavli Affiliate: Yi Zhou

| First 5 Authors: Yi Zhou, Danushka Bollegala, Jose Camacho-Collados

| Summary:

Social biases such as gender or racial biases have been reported in language
models (LMs), including Masked Language Models (MLMs). Given that MLMs are
continuously trained with increasing amounts of additional data collected over
time, an important yet unanswered question is how the social biases encoded
in MLMs vary over time. In particular, the number of social media users
continues to grow at an exponential rate, and it is a valid concern for the
MLMs trained specifically on social media data whether their social biases (if
any) would also amplify over time. To empirically analyse this problem, we use
a series of MLMs pretrained on chronologically ordered temporal snapshots of
corpora. Our analysis reveals that, although social biases are present in all
MLMs, most types of social bias remain relatively stable over time (with a few
exceptions). To further understand the mechanisms that influence social biases
in MLMs, we analyse the temporal corpora used to train the MLMs. Our findings
show that some demographic groups, such as male, are consistently preferred
over others, such as female, in the training corpora.

| Search Query: ArXiv Query: search_query=au:"Yi Zhou"&id_list=&start=0&max_results=3