Why Does Your CoT Prompt (Not) Work? Theoretical Analysis of Prompt Space Complexity, its Interaction with Answer Space During CoT Reasoning with LLMs: A Recurrent Perspective

Kavli Affiliate: Xiang Zhang

| First 5 Authors: Xiang Zhang, Juntai Cao, Jiaqi Wei, Chenyu You, Dujian Ding

| Summary:

Despite the remarkable successes of Large Language Models (LLMs), their
fundamental Transformer architecture possesses inherent theoretical limitations
that restrict their capability to handle reasoning tasks with increasing
computational complexity. Chain-of-Thought (CoT) prompting has emerged as a
practical solution, supported by several theoretical studies. However, current
CoT-based methods (including ToT, GoT, etc.) generally adopt a
"one-prompt-fits-all" strategy, using fixed templates (e.g., "think step by
step") across diverse reasoning tasks. This method forces models to navigate an
extremely complex prompt space to identify effective reasoning paths. Current
prompt-design research also relies heavily on trial-and-error rather than
theoretically informed guidance. In this paper, we provide a
rigorous theoretical analysis of the complexity and interplay between two
crucial spaces: the prompt space (the space of potential prompt structures) and
the answer space (the space of reasoning solutions generated by LLMs) in CoT
reasoning. We demonstrate how reliance on a single universal prompt (e.g.,
"think step by step") can negatively impact the theoretical computability of
LLMs, illustrating that prompt complexity directly influences the structure and
effectiveness of navigation in the answer space. Our analysis highlights that
sometimes human supervision is critical for efficiently navigating the prompt
space. We theoretically and empirically show that task-specific prompting
significantly outperforms unsupervised prompt generation, emphasizing the
necessity of thoughtful human guidance in CoT prompting.
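To make the distinction concrete, here is a minimal illustrative sketch (not taken from the paper) contrasting a universal "think step by step" template with a task-specific prompt that spells out the intermediate state the model should track at each step, which is the kind of structure the abstract argues narrows the answer-space search. The function name, prompt wording, and example task below are hypothetical.

```python
# Illustrative sketch only: a universal CoT template vs. a hypothetical
# task-specific template that encodes the per-step state and update rule.

UNIVERSAL_COT = "Let's think step by step."

def task_specific_prompt(task_description: str, state_schema: str, step_rule: str) -> str:
    """Build a prompt that constrains the answer-space search by telling the
    model what intermediate state to carry between steps and how to update it."""
    return (
        f"{task_description}\n"
        f"At each step, write the current state as: {state_schema}\n"
        f"Then apply: {step_rule}\n"
        "Repeat until the state is final, then report the answer."
    )

# Example task (hypothetical): parity of a bit string, where a fixed template
# leaves the per-step computation underspecified but a task-specific one pins it down.
prompt = task_specific_prompt(
    task_description="Determine whether the bit string 1011001 has an even or odd number of 1s.",
    state_schema="(position, running_parity)",
    step_rule="flip running_parity if the current bit is 1, then advance one position",
)
print(prompt)
```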

| Search Query: ArXiv Query: search_query=au:"Xiang Zhang"&id_list=&start=0&max_results=3