Kavli Affiliate: Xiang Zhang
| First 5 Authors: Xiang Zhang, Juntai Cao, Jiaqi Wei, Chenyu You, Dujian Ding
| Summary:
Despite the remarkable successes of large language models (LLMs), the
underlying Transformer architecture has inherent limitations in handling
complex reasoning tasks. Chain-of-thought (CoT) prompting has emerged as a
practical workaround, but most CoT-based methods rely on a single, generic
prompt such as "think step by step", with no task-specific adaptation. These
approaches expect the model to discover an effective reasoning path on its own,
forcing it to search through a vast prompt space. In contrast, several studies
have explored task-specific prompt designs to boost performance. However, these
designs are typically developed through trial and error, lacking theoretical
grounding. As a result, prompt engineering remains largely ad hoc and unguided.
In this paper, we provide a theoretical framework that explains why some
prompts succeed while others fail. We show that prompts function as selectors,
extracting task-relevant information from the model’s full hidden state during
CoT reasoning. Each prompt defines a unique trajectory through the answer
space, and the choice of trajectory is crucial for task performance and future
navigation within the space. We analyze the complexity of finding optimal
prompts and characterize the size of the prompt space for a given task. Our
theory reveals principles behind effective prompt design and shows that naive
CoT, which relies on generic self-guided prompts like "think step by step", can severely hinder
performance. Through experiments, we show that optimal prompt search can lead
to more than a 50% improvement on reasoning tasks, providing a theoretical
foundation for prompt engineering.
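To make the idea of searching the prompt space concrete, the following is a minimal sketch (an illustration, not the paper's method) of scoring candidate task-specific prompts against a dev set and keeping the best one; the candidate prompts, the dev set, and the query_llm callable are hypothetical placeholders.

    # Sketch: pick, from a small candidate pool, the prompt that maximizes
    # exact-match accuracy on a held-out dev set. All names below are
    # placeholders, not the paper's actual search procedure.
    from typing import Callable, List, Tuple

    def best_prompt(
        candidates: List[str],
        dev_set: List[Tuple[str, str]],       # (question, gold answer) pairs
        query_llm: Callable[[str], str],      # full prompt -> model answer
    ) -> Tuple[str, float]:
        """Return the candidate prompt with the highest exact-match accuracy."""
        best, best_acc = candidates[0], -1.0
        for prompt in candidates:
            correct = 0
            for question, gold in dev_set:
                answer = query_llm(f"{prompt}\n\n{question}")
                correct += int(answer.strip() == gold.strip())
            acc = correct / max(len(dev_set), 1)
            if acc > best_acc:
                best, best_acc = prompt, acc
        return best, best_acc

    # Example usage (hypothetical candidates, including the generic baseline):
    # prompt, acc = best_prompt(
    #     ["Let's think step by step.",
    #      "First restate the constraints, then solve each sub-goal in order."],
    #     dev_set=[("2+2*3=?", "8")],
    #     query_llm=my_model_call,   # hypothetical model wrapper
    # )

Even this naive enumeration illustrates the abstract's point: the generic baseline is just one point in a large prompt space, and a task-adapted candidate can score substantially higher on the same dev set.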
| Search Query: ArXiv Query: search_query=au:"Xiang Zhang"&id_list=&start=0&max_results=3