Tracing Text Provenance via Context-Aware Lexical Substitution

Kavli Affiliate: Feng Wang

| First 5 Authors: Xi Yang, Jie Zhang, Kejiang Chen, Weiming Zhang, Zehua Ma

| Summary:

Text content created by humans or language models is often stolen or misused
by adversaries. Tracing text provenance can help claim the ownership of text
content or identify the malicious users who distribute misleading content like
machine-generated fake news. There have been some attempts to achieve this,
mainly based on watermarking techniques. Specifically, traditional text
watermarking methods embed watermarks by slightly altering text format, such as
line spacing and font, but such watermarks are fragile under cross-media
transmission such as OCR. Considering this, natural language watermarking methods represent
watermarks by replacing words in original sentences with synonyms from
handcrafted lexical resources (e.g., WordNet), but they do not consider the
substitution’s impact on the overall meaning of the sentence. Recently, a
transformer-based network was proposed to embed watermarks by modifying
unobtrusive words (e.g., function words), which also impairs the sentence’s
logical and semantic coherence. Moreover, a well-trained network fails to
transfer to other types of text content. To address the limitations mentioned
above, we propose a natural language watermarking scheme based on context-aware
lexical substitution (LS). Specifically, we employ BERT to suggest LS
candidates by inferring the semantic relatedness between the candidates and the
original sentence. Based on this, a selection strategy in terms of
synchronicity and substitutability is further designed to test whether a word
is exactly suitable for carrying the watermark signal. Extensive experiments
demonstrate that, under both objective and subjective metrics, our watermarking
scheme preserves the semantic integrity of original sentences well and has
better transferability than existing methods. Besides, the proposed LS approach
outperforms the state-of-the-art approach on the Stanford Word Substitution
Benchmark.
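
The synchronicity idea in the summary can be illustrated with a small sketch. This is not the authors' implementation: the paper uses BERT to propose candidates, whereas the hypothetical `TOY_LEXICON` and `toy_candidates` below are a context-free stand-in so the example runs without a model. The names `is_synchronous`, `embed`, and `extract` are also assumptions made for illustration. A word is treated as usable only if every candidate, once substituted, regenerates the same candidate list, so that the extractor can locate the same positions and read the bits back. Substitutability (whether a candidate really fits the sentence) is not modeled here beyond membership in the toy lexicon.

```python
from typing import Callable, Dict, List

# A candidate generator maps (tokens, position) to a list of substitutes.
Generator = Callable[[List[str], int], List[str]]

# Hypothetical toy lexicon standing in for BERT-based candidate generation.
TOY_LEXICON: Dict[str, List[str]] = {
    "big": ["big", "large"],
    "large": ["big", "large"],
    "quick": ["quick", "fast"],
    "fast": ["quick", "fast"],
    # "glad" and "content" have no entries, so "happy" is not synchronous.
    "happy": ["happy", "glad", "content"],
}


def toy_candidates(tokens: List[str], i: int) -> List[str]:
    """Return sorted candidates for tokens[i]; this toy version ignores context."""
    return sorted(TOY_LEXICON.get(tokens[i], []))


def is_synchronous(tokens: List[str], i: int, gen: Generator) -> bool:
    """True if every candidate at position i regenerates the same candidate list."""
    cands = gen(tokens, i)
    if len(cands) < 2 or tokens[i] not in cands:
        return False
    return all(gen(tokens[:i] + [c] + tokens[i + 1:], i) == cands for c in cands)


def embed(tokens: List[str], bits: List[int], gen: Generator) -> List[str]:
    """Encode each bit (0 or 1) by choosing the first or second candidate."""
    out = list(tokens)
    remaining = list(bits)
    for i in range(len(out)):
        if remaining and is_synchronous(out, i, gen):
            out[i] = gen(out, i)[remaining.pop(0)]
    if remaining:
        raise ValueError("not enough suitable words to carry all bits")
    return out


def extract(tokens: List[str], n_bits: int, gen: Generator) -> List[int]:
    """Recover bits by locating the same positions and reading candidate indices."""
    bits: List[int] = []
    for i in range(len(tokens)):
        if len(bits) < n_bits and is_synchronous(tokens, i, gen):
            bits.append(gen(tokens, i).index(tokens[i]) % 2)
    return bits


if __name__ == "__main__":
    marked = embed("the big dog is quick and happy".split(), [1, 0], toy_candidates)
    print(" ".join(marked))                    # the large dog is fast and happy
    print(extract(marked, 2, toy_candidates))  # [1, 0]
```

With a real context-aware generator, substituting one word can change the candidates proposed for its neighbours, so an actual scheme would need to account for that interaction; the toy lookup sidesteps it by ignoring context.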

