Kavli Affiliate: Dan Luo
| First 5 Authors: , , , ,
| Summary:
Enabling large language models (LLMs) to effectively process and reason with
graph-structured data remains a significant challenge despite their remarkable
success in natural language tasks. Current approaches either convert graph
structures into verbose textual descriptions, consuming substantial
computational resources, or employ complex graph neural networks as tokenizers,
which introduce significant training overhead. To bridge this gap, we present
NT-LLM, a novel framework with an anchor-based positional encoding scheme for
graph representation. Our approach strategically selects reference nodes as
anchors and encodes each node’s position relative to these anchors, capturing
essential topological information without the computational burden of existing
methods. Notably, we identify and address a fundamental issue: the inherent
misalignment between discrete hop-based distances in graphs and continuous
distances in embedding spaces. By implementing a rank-preserving objective for
positional encoding pretraining, NT-LLM achieves superior performance across
diverse graph tasks ranging from basic structural analysis to complex reasoning
scenarios. Our comprehensive evaluation demonstrates that this lightweight yet
powerful approach effectively enhances LLMs’ ability to understand and reason
with graph-structured information, offering an efficient solution for
graph-based applications of language models.
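
To make the described mechanism concrete, below is a minimal, hypothetical sketch of the general idea the abstract outlines: pick a few anchor nodes, describe every node by its hop distances to those anchors, and pretrain a small encoder with a rank-preserving (ordering) objective so that embedding-space distances respect hop-distance order. All function names, the greedy farthest-point anchor selection, the MLP encoder, and the margin ranking loss are illustrative assumptions, not the actual NT-LLM implementation.

```python
import random
from collections import deque

import torch
import torch.nn as nn


def bfs_hops(adj, source):
    """Hop distance from `source` to every node via BFS (-1 if unreachable)."""
    dist = {v: -1 for v in adj}
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] == -1:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def pick_anchors(adj, k, seed=0):
    """Greedy farthest-point selection: each new anchor maximizes its hop
    distance to the anchors chosen so far (one plausible 'strategic' choice)."""
    rng = random.Random(seed)
    nodes = list(adj)
    anchors = [rng.choice(nodes)]
    hop_tables = [bfs_hops(adj, anchors[0])]
    while len(anchors) < k:
        nxt = max(nodes, key=lambda v: min(t[v] for t in hop_tables))
        anchors.append(nxt)
        hop_tables.append(bfs_hops(adj, nxt))
    return anchors, hop_tables


def anchor_hop_matrix(adj, hop_tables):
    """Raw positional feature per node: its hop distance to each anchor."""
    nodes = list(adj)
    hops = torch.tensor([[t[v] for t in hop_tables] for v in nodes],
                        dtype=torch.float)
    return nodes, hops


def rank_preserving_loss(node_emb, anchor_emb, hops, n_pairs=256, margin=0.1):
    """Sample node pairs (u, v) and an anchor a; if u is fewer hops from a than
    v, push u's embedding closer to a's than v's (margin ranking loss)."""
    n, k = hops.shape
    u = torch.randint(n, (n_pairs,))
    v = torch.randint(n, (n_pairs,))
    a = torch.randint(k, (n_pairs,))
    d_u = (node_emb[u] - anchor_emb[a]).norm(dim=-1)
    d_v = (node_emb[v] - anchor_emb[a]).norm(dim=-1)
    target = (hops[u, a] < hops[v, a]).float() * 2 - 1  # +1: u closer, -1: v closer
    keep = hops[u, a] != hops[v, a]                     # ignore tied pairs
    return nn.functional.margin_ranking_loss(
        d_v[keep], d_u[keep], target[keep], margin=margin)


# Toy usage: a 6-node path graph, 2 anchors, a small MLP positional encoder.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
anchors, tables = pick_anchors(adj, k=2)
nodes, hops = anchor_hop_matrix(adj, tables)
encoder = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 32))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for _ in range(200):
    emb = encoder(hops)
    loss = rank_preserving_loss(emb, emb[[nodes.index(a) for a in anchors]], hops)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The ranking loss only constrains the *order* of distances rather than their exact values, which is one way to reconcile discrete hop counts with continuous embedding geometry; the resulting per-node vectors could then be fed to an LLM as compact positional tokens instead of verbose textual graph descriptions.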
| Search Query: ArXiv Query: search_query=au:"Dan Luo"&id_list=&start=0&max_results=3