Robust-Wide: Robust Watermarking against Instruction-driven Image Editing

Kavli Affiliate: Ting Xu

| First 5 Authors: Runyi Hu, Jie Zhang, Ting Xu, Jiwei Li, Tianwei Zhang

| Summary:

Instruction-driven image editing allows users to quickly edit an image
according to text instructions in a forward pass. Nevertheless, malicious users
can easily exploit this technique to create fake images, which could cause a
crisis of trust and harm the rights of the original image owners. Watermarking
is a common solution to trace such malicious behavior. Unfortunately,
instruction-driven image editing can significantly change the watermarked image
at the semantic level, making current state-of-the-art watermarking methods
ineffective. To remedy this, we propose Robust-Wide, the first robust
watermarking methodology against instruction-driven image editing.
Specifically, we follow the classic structure of deep robust watermarking,
consisting of the encoder, noise layer, and decoder. To achieve robustness
against semantic distortions, we introduce a novel Partial Instruction-driven
Denoising Sampling Guidance (PIDSG) module, which consists of a large variety
of instruction injections and substantial modifications of images at different
semantic levels. With PIDSG, the encoder tends to embed the watermark into more
robust and semantic-aware areas, so the watermark survives even severe
image editing. Experiments demonstrate that Robust-Wide can effectively extract
the watermark from the edited image with a low bit error rate of about 2.6%
for 64-bit watermark messages. Meanwhile, it has only a negligible
influence on the visual quality and editability of the original images.
Moreover, Robust-Wide holds general robustness against different sampling
configurations and other popular image editing methods such as
ControlNet-InstructPix2Pix, MagicBrush, Inpainting, and DDIM Inversion. Code
and models are available at https://github.com/hurunyi/Robust-Wide.
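
The bit error rate quoted in the summary can be made concrete with a small
sketch. The function name `bit_error_rate`, the random message, and the two
flipped positions below are illustrative assumptions; none of them is taken from
the Robust-Wide code. The sketch only shows how the metric is conventionally
computed: the fraction of decoded bits that differ from the embedded bits. For
a 64-bit message, a rate of 2.6% corresponds to roughly 1.7 wrong bits per
image on average (64 x 0.026 = 1.664).

```python
import random


def bit_error_rate(sent, received):
    """Return the fraction of positions where two bit sequences differ."""
    if len(sent) != len(received):
        raise ValueError("messages must have the same length")
    if not sent:
        raise ValueError("messages must be non-empty")
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)


# Hypothetical 64-bit watermark message (seeded for reproducibility).
rng = random.Random(0)
message = [rng.randint(0, 1) for _ in range(64)]

# Stand-in for a decoder output after editing: flip two arbitrary bits.
decoded = list(message)
for position in (3, 40):
    decoded[position] ^= 1

ber = bit_error_rate(message, decoded)
print(f"bit errors: {round(ber * len(message))} of {len(message)}")
print(f"bit error rate: {ber:.3%}")
```

In a real evaluation, `decoded` would come from running the watermark decoder
on the edited image, and the rate would be averaged over many images.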

| Search Query: ArXiv Query: search_query=au:”Ting Xu”&id_list=&start=0&max_results=3