Revolutionize Your Thinking: Chain of Draft - The Efficient Alternative to Chain of Thought

Discover how Chain of Draft, a new prompting strategy, offers an efficient alternative to Chain of Thought. Learn how it can revolutionize your thinking process, reduce costs, and cut latency, while delivering comparable or better performance. Explore the insights and examples that showcase the power of this innovative approach.

March 22, 2025


Discover a groundbreaking prompting strategy that matches or outperforms the popular Chain of Thought approach while significantly reducing cost and latency. This innovative technique, called Chain of Draft, offers a more efficient way for AI models to tackle complex problems, closely mirroring the human thought process.

How Chain of Thought Works and Its Limitations

Chain of Thought enables language models to break down problems into step-by-step solutions, mimicking the structured reasoning process that humans go through. This approach allows models to think through problems systematically and arrive at the final answer.
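To make this concrete, here is a hand-written illustration of the kind of prompt and verbose trace Chain of Thought typically produces. The question and the step-by-step text below are invented for the example, not taken from the paper or from actual model output.

```python
# Illustrative only: the question and the step-by-step trace below are
# hand-written to show the *shape* of a Chain of Thought interaction,
# not real model output.

COT_SYSTEM_PROMPT = (
    "Think step-by-step to answer the following question. "
    "Return the answer at the end of the response after a separator ####."
)

question = (
    "Jason had 20 lollipops. He gave some to Denny and now has 12. "
    "How many did he give to Denny?"
)

# A typical Chain of Thought completion spells out every step in full prose:
example_cot_trace = """\
1. Jason starts with 20 lollipops.
2. After giving some away, he has 12 lollipops left.
3. The number given away is the difference: 20 - 12 = 8.
#### 8"""

print(COT_SYSTEM_PROMPT)
print(question)
print(example_cot_trace)
```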

However, Chain of Thought has some limitations:

  1. Computational Overhead: The step-by-step reasoning process required by Chain of Thought demands substantially more computational resources at inference time, leading to verbose outputs and higher latency.

  2. Lack of Awareness: Chain of Thought models often lack awareness regarding task complexity, leading to overthinking even on simple problems and resulting in unnecessary resource consumption.

  3. Limited Workarounds: Techniques like streaming reduce perceived latency by providing partial outputs incrementally as they are generated, but they cannot fully mitigate the overall latency or computational cost.

  4. Exposed Intermediate Steps: In some cases, the intermediate steps of Chain of Thought reasoning are not intended to be shown to end users, making the approach unsuitable for certain applications.

To address these limitations, researchers have proposed alternative techniques, such as "Skeleton of Thought," which aims to guide language models to generate a concise outline of the answer before filling in the details. However, these approaches still do not fully solve the problem of computational cost and latency.

What is Chain of Draft and How It Differs from Chain of Thought

Chain of Draft is a new prompting strategy that aims to address the inherent slowness and verbosity of the Chain of Thought approach. While Chain of Thought enables models to break down problems step-by-step and reflect on potential solutions, it requires a large number of tokens, leading to higher computational costs and latency.

In contrast, Chain of Draft encourages language models to generate concise, dense information outputs at each step, mimicking how humans typically approach problem-solving by relying on concise drafts or shorthand notes. Instead of providing verbose intermediate steps, Chain of Draft prompts the model to capture only the essential insights needed to progress towards the solution.
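For contrast, a Chain of Draft response to the same kind of question compresses each step into a terse, equation-like note. The draft below is likewise hand-written for illustration, following the five-words-per-step guideline described later in this article.

```python
# Illustrative only: a hand-written Chain of Draft "draft" for the same
# lollipop question used above. Each step is kept to roughly five words,
# and only the final answer follows the '####' separator.

example_cod_draft = """\
20 - x = 12
x = 20 - 12
x = 8
#### 8"""

# A handful of tokens per step, versus a full sentence per step
# in the Chain of Thought trace shown earlier.
print(example_cod_draft)
```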

This approach has several key advantages:

  1. Efficiency: Chain of Draft uses a fraction of the tokens and latency compared to Chain of Thought, making it more cost-effective and faster to execute.
  2. Conciseness: The prompting strategy asks the model to keep each thinking step to at most five words, resulting in more concise and focused outputs.
  3. Scalability: By reducing the computational resources required, Chain of Draft can be more easily scaled to handle larger and more complex problems.

The paper demonstrates that Chain of Draft can perform on par with or even exceed the performance of Chain of Thought, while being significantly more efficient in terms of both cost and latency. This new prompting technique represents a significant advancement in the field of large language models, allowing for more effective and streamlined problem-solving.

Implementing Chain of Draft: No Model Updates Required

Implementing the Chain of Draft prompting strategy is remarkably simple, as it does not require any updates to the underlying model. It is solely a prompt engineering technique that can be applied to existing language models.

The key steps are:

  1. Standard Prompt: Answer the question directly, without any preamble, explanation, or reasoning.

  2. Chain of Thought Prompt: Instruct the model to "think step-by-step to answer the following question" and return the answer at the end, separated by four hash symbols (####).

  3. Chain of Draft Prompt: Instruct the model to "think step-by-step but only keep a minimum draft for each thinking step with five words at most" and return the answer at the end, separated by four hash symbols (####).

By providing this simple prompt guidance, the model is encouraged to generate concise, essential information at each step, rather than verbose, intermediate reasoning. This approach closely mimics how humans typically approach problem-solving, relying on concise notes and drafts to capture key insights.
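As a minimal sketch of how these three prompts could be wired up in practice, the snippet below assumes the OpenAI Python SDK, an API key in the environment, and an illustrative model name (gpt-4o); the prompt wording follows the descriptions above.

```python
# Minimal sketch, assuming the OpenAI Python SDK (`pip install openai`)
# and an API key in the OPENAI_API_KEY environment variable.
# The model name "gpt-4o" is illustrative; substitute whatever model you use.
from openai import OpenAI

client = OpenAI()

PROMPTS = {
    "standard": (
        "Answer the question directly. "
        "Do not return any preamble, explanation, or reasoning."
    ),
    "chain_of_thought": (
        "Think step-by-step to answer the following question. "
        "Return the answer at the end of the response after a separator ####."
    ),
    "chain_of_draft": (
        "Think step-by-step, but only keep a minimum draft for each "
        "thinking step, with five words at most. "
        "Return the answer at the end of the response after a separator ####."
    ),
}

def ask(question: str, strategy: str = "chain_of_draft") -> str:
    """Send the question with the chosen prompting strategy and
    return only the final answer (the text after '####')."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[
            {"role": "system", "content": PROMPTS[strategy]},
            {"role": "user", "content": question},
        ],
    )
    text = response.choices[0].message.content
    # The standard prompt has no separator; return the raw answer in that case.
    return text.split("####")[-1].strip() if "####" in text else text.strip()

if __name__ == "__main__":
    q = ("Jason had 20 lollipops. He gave some to Denny and now has 12. "
         "How many did he give to Denny?")
    print(ask(q, "chain_of_draft"))
```

Switching strategies is then just a matter of choosing a different key from the PROMPTS dictionary; nothing about the model itself changes.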

The beauty of this technique is that it can be easily implemented without any changes to the model architecture or the need for fine-tuning or reinforcement learning. It is a pure prompt engineering strategy that can be applied to a wide range of language models, as demonstrated in the results across different benchmarks and models.

Comparing Performance: Chain of Thought vs. Chain of Draft

The paper presents a comparison of the performance between the traditional Chain of Thought approach and the newly proposed Chain of Draft technique. The key findings are as follows:

  • GPT-4o Performance: Using the standard prompt, GPT-4o achieved 53% accuracy. With Chain of Thought, the accuracy increased to 95.4%, but at the cost of about 200 tokens and 4.2 seconds of latency. In contrast, Chain of Draft achieved 91.1% accuracy, using only 43 tokens and 1 second of latency.

  • Claude 3.5 Sonnet Performance: Similar trends were observed with Claude 3.5 Sonnet. The standard prompt resulted in 64% accuracy, Chain of Thought achieved 95% accuracy but with higher token usage and latency, while Chain of Draft maintained 91% accuracy with significantly reduced tokens and latency.

  • Common Sense Reasoning: For common sense reasoning tasks, Chain of Draft and Chain of Thought performed similarly, with Chain of Draft slightly outperforming Chain of Thought (90% vs. 88%) while using less than half the tokens and lower latency.

  • Sports Understanding: In the sports understanding category, Chain of Draft outperformed Chain of Thought for both GPT-4o and Claude 3.5 Sonnet, again with a fraction of the token usage and latency.

  • Coin Flip Evaluation: Both Chain of Thought and Chain of Draft achieved 100% accuracy for the coin flip evaluation task, but Chain of Draft demonstrated better efficiency in terms of token usage and latency.

The key takeaway is that Chain of Draft, a simple prompt-based technique, can achieve performance on par with or exceeding Chain of Thought, while significantly reducing the computational resources and latency required. This makes Chain of Draft a more efficient and practical approach for real-world applications.
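If you want to verify the savings on your own workload, a rough measurement loop like the sketch below is usually enough. It assumes the OpenAI Python SDK (which reports completion token counts in the response's usage field) and an illustrative model name; the question is invented for the example, and your numbers will depend on the model and tasks you use.

```python
# Rough benchmarking sketch: compare completion tokens and wall-clock latency
# for Chain of Thought vs. Chain of Draft on the same question.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY; "gpt-4o" is illustrative.
import time
from openai import OpenAI

client = OpenAI()

COT = ("Think step-by-step to answer the following question. "
       "Return the answer at the end of the response after a separator ####.")
COD = ("Think step-by-step, but only keep a minimum draft for each thinking step, "
       "with five words at most. "
       "Return the answer at the end of the response after a separator ####.")

def measure(question: str, system_prompt: str) -> tuple[int, float]:
    """Return (completion_tokens, seconds) for one call with the given system prompt."""
    start = time.perf_counter()
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return response.usage.completion_tokens, time.perf_counter() - start

# Invented example question in the spirit of the coin-flip evaluation above.
question = "A coin is heads up. Alice flips it. Bob does not flip it. Is it still heads up?"
for name, prompt in [("chain_of_thought", COT), ("chain_of_draft", COD)]:
    tokens, seconds = measure(question, prompt)
    print(f"{name}: {tokens} completion tokens, {seconds:.2f}s")
```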

Conclusion

The introduction of Chain of Draft as a new prompting strategy represents a significant advancement in the field of large language models (LLMs). Compared to the traditional Chain of Thought approach, Chain of Draft offers several key advantages:

  1. Efficiency: Chain of Draft generates concise and dense information outputs at each step, reducing the overall number of tokens required and resulting in a fraction of the computational cost and latency compared to Chain of Thought.

  2. Performance: Despite the reduced verbosity, Chain of Draft is able to maintain performance on par with or even exceeding Chain of Thought across various benchmarks, including math, common sense reasoning, and sports understanding.

  3. Simplicity: Implementing Chain of Draft does not require any model updates, fine-tuning, or reinforcement learning. It can be easily adopted by simply updating the prompt, making it a highly accessible and versatile prompting strategy.

The results presented in the paper demonstrate the power of Chain of Draft in addressing the inherent slowness and resource-intensive nature of Chain of Thought. By emulating the human approach of relying on concise drafts and shorthand notes, Chain of Draft enables LLMs to focus on advancing towards solutions without the overhead of verbose reasoning.

This new prompting technique represents a significant step forward in the evolution of thinking models, paving the way for more efficient and cost-effective AI-powered solutions. As the field of AI continues to advance, innovations like Chain of Draft will play a crucial role in driving the development of more practical and scalable applications.

FAQ