Emil Bols
GPT-4 has taken the world by storm, impressing everyone with its capability to solve a wide range of tasks. Microsoft researchers recently released a paper [1] where they claim to have observed the first ‘sparks’ of artificial general intelligence in GPT-4.
In fact, it does not seem far-fetched to imagine GPT-4 soon becoming a virtual assistant performing any number of tasks for you. Considering how well GPT-4 understands human language, you can imagine it:
Managing your calendar by booking and rescheduling meetings
Booking flights and making itineraries for business trips
Helping you get ready for meetings by doing background research and preparing presentations
Potentially, it could even be customer-facing and work as a customer service agent, dealing with client complaints and directly solving their issues.
However, to create such a virtual assistant, GPT-4 must be able to ‘talk’ with other software, such as search engines, email, calendars and calculators. This is needed because GPT-4 still has many limitations that stem from its design. Andrej Karpathy from OpenAI has also highlighted this concept as the next key use case for AI.
In this blog post, we will talk about why GPT-4 needs external tools and how we can integrate them to superpower its potential!
The limitations of GPT-4
Out of the box, GPT-4 has many limitations that prevent it from acting as a copilot. Some examples include:
◾️ It only knows what was in its training data, which means it lacks knowledge of current events and of your company's internal data. For example, if you ask it who won the last World Cup, it will say France, as it believes you are referring to the 2018 World Cup.
◾️ It can fail basic reasoning tasks, especially in cases where humans might also get fooled. For example, if you ask whether 30 kg of feathers is heavier than 10 kg of iron, it will reply no, as it assumes feathers are very light and forgets to consider that 30 kg is more than 10 kg.
◾️ Unlike normal computer programs, it struggles with math. Like a human, it can do simple math, but it fails at more difficult calculations. For example, when asked what 7457 multiplied by the square root of 23 is, it will not find the correct result of approximately 35762.5.
While these limitations exist, each can be overcome using external tools and clever prompting.
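As a small illustration of why a calculator tool helps, the arithmetic from the last example is trivial for an ordinary program. The snippet below is only a sanity check of the figure quoted above, not part of any GPT-4 integration.

```python
import math

# The calculation from the math example above, done by a normal program.
result = 7457 * math.sqrt(23)

# Rounded to one decimal place, this matches the value quoted in the text.
print(round(result, 1))
```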
Making GPT-4 think it through!
In order to solve more complicated problems, GPT-4 needs to be able to think logically. Often GPT-4 does this really well, but there are two aspects of its design that make it more difficult.
GPT-4 generates its answers bit by bit, much like a human speaking before thinking. It cannot go back and edit its answer if it makes a mistake. If you ask a human a complex question, they think it through first. GPT-4 is not designed to do this.
GPT-4 is a large language model trained to fill in the next words of the text that it is reading. If it accidentally reaches the wrong conclusion at the start of its answer, it will try to complete the rest of the text to fit that wrong conclusion. It doesn’t like to contradict the text that comes before.
The combination of these effects means that GPT-4 might blurt out a wrong answer and then make arguments that stubbornly support the wrong conclusion. So how can we deal with this?
A simple but very powerful method is Chain of Thought prompting [2]. We can request that GPT-4 write down its reasoning step by step before it answers the question. This significantly helps GPT-4 perform logical reasoning. For example, using this approach, it easily solves the feather and iron question from earlier.
In practice, you might not want GPT-4 to always explain its logic to the user before it gives an answer. However, we can simply provide GPT-4 with a notepad hidden from the user, where it can write down its thoughts before answering the question.
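One way the hidden notepad could be wired up is sketched below. It is a minimal illustration under assumptions, not the method used in this post: the `<notepad>` and `<answer>` tags, the instruction wording, and the function names are all hypothetical, and `example_response` is a hand-written stand-in for a model reply rather than real GPT-4 output.

```python
import re
from typing import Tuple

# Hypothetical instruction asking the model to reason before answering.
COT_INSTRUCTIONS = (
    "First reason step by step inside <notepad>...</notepad>. "
    "Then give your final reply inside <answer>...</answer>."
)


def build_prompt(question: str) -> str:
    """Prepend the step-by-step instruction to the user's question."""
    return f"{COT_INSTRUCTIONS}\n\nQuestion: {question}"


def split_response(response: str) -> Tuple[str, str]:
    """Separate the hidden reasoning from the part shown to the user."""
    notepad = re.search(r"<notepad>(.*?)</notepad>", response, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    hidden = notepad.group(1).strip() if notepad else ""
    # Fall back to the whole response if the model ignored the format.
    visible = answer.group(1).strip() if answer else response.strip()
    return hidden, visible


# Hand-written stand-in for a model reply (not real GPT-4 output).
example_response = (
    "<notepad>Both weights are given in kg. 30 kg > 10 kg, "
    "so the feathers are heavier.</notepad>"
    "<answer>Yes, 30 kg of feathers is heavier than 10 kg of iron.</answer>"
)

hidden, visible = split_response(example_response)
print(visible)  # only this part would be shown to the user
```

The reasoning in `hidden` can be logged for debugging while only `visible` is returned to the user.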
Beyond solving simple logic puzzles, this significantly increases the number of tasks GPT-4 can solve.
However, even with Chain of Thought (CoT) prompting, GPT-4 still cannot answer the two remaining questions from the introduction regarding current events and math. This could be overcome by letting GPT-4 ‘talk’ with software such as APIs. In the same way that software can empower humans to do tasks beyond our normal capabilities, it can vastly increase the range of tasks that GPT-4 can perform.
The two remaining problems in the introduction can then be solved by enabling GPT-4 to use a search engine and a calculator.
Let GPT-4 do the searching for you!
By allowing GPT-4 to use search engines, you can give it access to information about current events and even internal data. In the last blog post, we solved this by searching through our data using the question as a search query and then providing the results to GPT-4.
However, when using Chain of Thought prompting, it is possible to go beyond that and let GPT-4 use the search tool itself. This is advantageous when dealing with a complex question where you might need multiple searches to find the answer.
For example, imagine that you had to analyze your internal financial data. If you searched, ‘How has our revenue evolved in the period 2018-2022?’, you would only find the answer if that specific analysis had already been done in the documents. Therefore, the method from the previous blog post would most likely fail.
However, if we let GPT-4 do the searching itself, it can analyze the question to determine what information is needed. It can then decide to make multiple searches like ‘revenue 2022’, ‘revenue 2021’ and so on. After obtaining the revenue for each year, GPT-4 can then combine the information and explain the trend to the user.
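The multi-search behaviour described above can be sketched as a simple loop in which the model either requests a search or gives a final answer. Everything here is an assumption made for illustration: the `SEARCH:`/`ANSWER:` text protocol, the function names, and the revenue figures are invented, and `scripted_model` is a stand-in that replays fixed steps where a real system would call GPT-4.

```python
from typing import Callable, Dict, List

# Toy stand-in for an internal document index; the figures are made up.
DOCUMENTS: Dict[str, str] = {
    "revenue 2021": "Revenue in 2021 was 10 million.",
    "revenue 2022": "Revenue in 2022 was 12 million.",
}


def search(query: str) -> str:
    """Hypothetical search tool: exact-match lookup in the toy index."""
    return DOCUMENTS.get(query, "No results.")


def scripted_model(transcript: List[str]) -> str:
    """Stand-in for GPT-4: picks the next step from a fixed script."""
    seen = [line for line in transcript if line.startswith("Observation:")]
    script = ["SEARCH: revenue 2021", "SEARCH: revenue 2022"]
    if len(seen) < len(script):
        return script[len(seen)]
    return "ANSWER: Revenue grew from 10 million in 2021 to 12 million in 2022."


def run_agent(question: str, model: Callable[[List[str]], str],
              max_steps: int = 5) -> str:
    """Let the model alternate between searching and answering."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(transcript)
        transcript.append(step)
        if step.startswith("ANSWER:"):
            return step[len("ANSWER:"):].strip()
        if step.startswith("SEARCH:"):
            query = step[len("SEARCH:"):].strip()
            # Feed the search result back so the model can use it next turn.
            transcript.append(f"Observation: {search(query)}")
    return "No answer found within the step limit."


final_answer = run_agent("How has our revenue evolved?", scripted_model)
print(final_answer)
```

The step limit is a design choice in this sketch so that a model that keeps searching cannot loop forever.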
Superpowering GPT-4
Now imagine that instead of analyzing the revenue, you want to calculate some more complex KPI for each year. As highlighted, GPT-4 struggles with math, but we can provide it with a calculator!
Imagine posing a question to GPT-4 like ‘How has this KPI evolved in the period 2018-2022 for my company?’ By using Chain of Thought prompting, GPT-4 can write down a list of tasks to execute and decide when to use tools. For example, the plan might be to search for the figures behind the KPI for each year, pass them to the calculator, and then summarize the trend.
It can then execute the plan step by step to obtain the answer. The execution plan can be hidden from the user, with only the answer and relevant findings being communicated.
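A calculator tool for such a plan could look roughly like the sketch below. It is an assumption-laden illustration rather than the implementation behind this post: the `calculate` function, the choice of profit margin as the KPI, and all figures are hypothetical. It parses the expression with Python's `ast` module instead of calling `eval()`, so that text produced by the model cannot run arbitrary code.

```python
import ast
import operator

# Arithmetic operators the calculator tool is allowed to apply.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def calculate(expression: str) -> float:
    """Evaluate a plain arithmetic expression without using eval()."""
    def _eval(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("Unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


# Made-up figures standing in for values found by earlier search steps:
# year -> (profit, revenue), both in millions.
figures = {2021: (2.0, 10.0), 2022: (3.0, 12.0)}

# The "plan": compute a hypothetical KPI (profit margin in %) for each year.
margins = {
    year: calculate(f"100 * {profit} / {revenue}")
    for year, (profit, revenue) in figures.items()
}
print(margins)
```

The same tool also handles the earlier example, since `calculate("7457 * 23 ** 0.5")` multiplies 7457 by the square root of 23.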
We are, however, not limited to just providing GPT-4 with a calculator. For example, we could also let it have access to a tool for making graphs, letting it make a figure showing the evolution of the KPI. With these tools, you could imagine GPT-4 generating an accurate financial analysis from scratch based on your documents.
In short
GPT-4 is a huge step forward in AI, showing the first signs of a machine being able to reason. It has the potential to act as an assistant for any number of tasks, whether that be for analyzing data or acting as a secretary. However, it has many limitations that need to be overcome if you want to use its full potential. Using methods like Chain of Thought prompting, it can derive complex plans and use tools. These tools can be almost anything, and they can enable it to do math, execute code or use APIs like Google Maps or Wolfram Alpha. Allowing GPT-4 to use external tools vastly increases the tasks it is able to perform, making the possibilities seemingly endless!