Kavli Affiliate: Wei Gao
| First 5 Authors: Yuxi Sun, Wei Gao, Jing Ma, Hongzhan Lin, Ziyang Luo
| Summary:
With the rise and widespread use of Large Language Models (LLMs), ensuring
their safety is crucial to prevent harm to humans and promote ethical
behaviors. However, directly assessing value valence (i.e., support or oppose)
through large-scale data training is untrustworthy and hard to explain. We
assume that emulating the way humans rely on social norms to make moral
decisions can help LLMs understand and predict moral judgments. However,
capturing human values remains a challenge, as multiple related norms may
conflict in specific contexts. Norms that are upheld by the majority and
promote the well-being of society are more likely to be accepted and widely
adopted (e.g., "don't cheat"). Therefore, it is essential for LLMs to identify the
appropriate norms for a given scenario before making moral decisions. To this
end, we introduce a novel moral judgment approach called ClarityEthic
that leverages LLMs’ reasoning ability and contrastive learning to uncover
relevant social norms for human actions from different perspectives and select
the most reliable one to enhance judgment accuracy. Extensive experiments
demonstrate that our method outperforms state-of-the-art approaches in moral
judgment tasks. Moreover, human evaluations confirm that the generated social
norms provide plausible explanations that support the judgments. This suggests
that modeling moral judgment by emulating human moral strategies is a
promising way to improve the ethical behavior of LLMs.
| Search Query: ArXiv Query: search_query=au:"Wei Gao"&id_list=&start=0&max_results=3
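
The abstract describes selecting the most reliable norm via contrastive learning, but does not give implementation details. As a purely illustrative sketch (not the authors' released code), the snippet below shows one common way a contrastive, InfoNCE-style objective can score candidate norms against an action embedding, pulling the relevant norm closer and pushing conflicting norms away. The function name, embedding dimensions, temperature, and the choice of loss are all assumptions.

```python
# Hypothetical sketch of contrastive norm selection; NOT the ClarityEthic code.
import torch
import torch.nn.functional as F

def info_nce_loss(action_emb, norm_embs, positive_idx, temperature=0.07):
    """InfoNCE-style loss: treat the most relevant norm as the positive
    and the conflicting candidate norms as negatives for one action."""
    action_emb = F.normalize(action_emb, dim=-1)   # (d,)
    norm_embs = F.normalize(norm_embs, dim=-1)     # (k, d)
    logits = norm_embs @ action_emb / temperature  # (k,) cosine similarities
    target = torch.tensor(positive_idx)
    return F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))

# Toy usage: 3 candidate norms for one action; norm 0 is the gold norm.
torch.manual_seed(0)
action = torch.randn(128)
norms = torch.randn(3, 128)
loss = info_nce_loss(action, norms, positive_idx=0)
print(f"contrastive loss: {loss.item():.4f}")
```

At inference time, under this assumed setup, the highest-scoring norm would be selected to ground the final support/oppose judgment, which is consistent with the paper's claim that the chosen norm serves as a plausible explanation.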