Table of contents

Chat with your data using GPT-4!
Efficient information gathering with GPT-4!
An internal perspective
But it’s not all sunshine and roses: The limitations of GPT-4
AI Retriever to the rescue
Our own Superlinear GPT-4!
In short…

Chat with your data using GPT-4!

21 Mar 2023

Learn how GPT-4 can be tailored to your data and help increase efficiency in your business. Get insights on incorporating GPT-4 to solve complex tasks and leverage its capabilities for innovative solutions.

OpenAI recently released the newest version of their GPT model, GPT-4. This model is a significant upgrade from the already powerful ChatGPT. GPT-4's impressive ability to answer complex questions on a wide range of topics has made headlines. It has memorized an enormous amount of information from the large online text datasets it was trained on.

Beyond memorization, it can even do creative tasks such as coding, copywriting and inventing new recipes. All these skills enable a wide range of applications for GPT-4. As Superlinear is a contributor to the OpenAI evaluation code [1], we have already been granted priority access to the GPT-4 API. This allows us to build new projects that utilize this powerful tool.

How can you best incorporate GPT-4 to increase efficiency? And how can you tailor it to your own data? That is exactly what we want to shed light on in this blog post.

Efficient information gathering with GPT-4!

One area where GPT-4 can superpower your company is information retrieval. Many organizations, particularly big ones, struggle with employees spending significant amounts of time searching for the information they need to do their jobs. While internal search tools exist, employees often rely on 1-to-1 communication through email or meetings to retrieve information. According to McKinsey, knowledge workers spend 19% of their time trying to find information [2], resulting in wasted time and inefficiency.

However, imagine if GPT-4 learned all the internal information of the organization. It could then instantly answer employees' questions directly, providing quick and easy access to information while eliminating the need to search through piles of documents or have endless back-and-forth communication with colleagues!

Furthermore, employees regularly create new documents and data that can quickly become lost. You could have GPT-4 read every new document produced so that it would always provide up-to-date information, reducing the need for meetings and emails to stay informed.

Andrej Karpathy, former director of AI at Tesla and now a researcher at OpenAI, also sees Large Language Models as the way we'll interact with information in the future.

An internal perspective

Here at Superlinear, we face the same challenges. We have worked on a wide variety of interesting Machine Learning problems, but newcomers to the company might not be aware of the specific details of our past projects and what was learned from them. Usually, a new member would have to go around asking others for the needed details, which can be quite a hassle. Luckily, we already have many case studies and blog posts on our website describing in detail past AI problems and solutions that we dealt with. Additionally, we have an internal technical guide to the best Machine Learning and project management practices to be followed within Superlinear. Therefore, it would be great if we could load all this information into GPT-4, making it possible to ask our own AI helper for advice.

But it’s not all sunshine and roses: The limitations of GPT-4

While GPT-4 is an extremely powerful tool, it does have some disadvantages that are related to its training:

  • GPT-4 is a large language model that is trained to predict the next word in a piece of text. It's not designed to be factual, so it may generate answers that sound plausible but are entirely made up by the AI.

  • GPT-4 doesn't know where the information it has learned comes from, so it can't provide sources for the answers it generates. This makes it difficult for a user to verify the statements it makes.

  • Training GPT-4 on your own data is very hard because it uses an approach called Reinforcement Learning from Human Feedback. This requires human annotators to select good answers by hand, which are then provided as feedback to the AI. This process takes a lot of effort and money.

All these challenges make it hard to use GPT-4 out of the box if you want to apply it to your internal knowledge in a dependable way.

AI Retriever to the rescue

Luckily, there exists a simple approach that eliminates all three of these challenges in one fell swoop, called open-book generative question answering. This method adds another AI called the ‘retriever’ (i.e. the open book) to help GPT-4. The retriever searches through the organization's documents to find the information most relevant to the question. GPT-4 is then forced to use only the provided information to answer, which prevents it from making things up. While answering, GPT-4 can also be forced to reference the source of the information, which enables the user to easily verify the answer. Finally, this approach requires no further training of GPT-4. In other words, this solution saves a lot of time and effort while providing a better and more useful result.
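
To make this concrete, here is a minimal sketch of what such an open-book pipeline can look like, using the OpenAI Python SDK of the time (an embedding model as the retriever, GPT-4 as the answer generator). The document list, model names and prompt wording are illustrative assumptions, not our production setup:

import openai
import numpy as np

# Hypothetical in-memory knowledge base of (source, text) pairs.
documents = [
    ("website/case-study-anomaly-detection", "We built an unsupervised anomaly detection system that flags product defects with minimal labelling."),
    ("internal/ml-best-practices", "Every project should start with a simple baseline model before trying more complex architectures."),
]

def embed(texts):
    # Embed texts with OpenAI's embedding endpoint (pre-1.0 SDK style).
    response = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([item["embedding"] for item in response["data"]])

doc_embeddings = embed([text for _, text in documents])

def retrieve(question, k=3):
    # The 'open book': return the k documents most similar to the question.
    q = embed([question])[0]
    scores = doc_embeddings @ q / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(q)
    )
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def answer(question):
    # Force GPT-4 to answer only from the retrieved documents and cite them.
    context = "\n\n".join(f"[{source}]\n{text}" for source, text in retrieve(question))
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": (
                "Answer ONLY using the provided documents and cite the [source] "
                "you used. If the documents do not contain the answer, say so."
            )},
            {"role": "user", "content": f"Documents:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response["choices"][0]["message"]["content"]

print(answer("How should we start a new Machine Learning project?"))

Because the model only sees what the retriever returns, the quality of the whole system hinges on the retrieval step, which is exactly the challenge we come back to below.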

Our own Superlinear GPT-4!

We built a question-answering AI based on the data found on the Superlinear website as well as the information found in our internal documents. This new tool allows newcomers to ask questions about Superlinear, with the AI providing concise answers and linking to where the information can be found. It works by having a retriever search through our documents, finding relevant ones, and then providing them to GPT-4 to generate an answer. In an upcoming blog post, we will provide all the technical details of how we built this AI.

One of the most important aspects of building a good question-answering AI is connecting it to great search. Without the right information being provided, even GPT-4 can't answer your questions about internal data. There are several key challenges in doing this:

  • Many data types like Word documents, SharePoint pages, PDFs or websites were designed to be read by humans, not computers. These are typically called unstructured data. Good search requires carefully processing this input data to make it easier for computers to understand.

  • Your internal data might use terms and expressions specific to your company that the AI retriever is not familiar with, making it less effective. However, by fine-tuning the retriever on your own data, it can learn these, improving the quality of the search (see the sketch after this list). Such a fine-tuning dataset is easily created from customer data or FAQ documents, which many organizations already have.
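
As an illustration of that second point, here is a minimal sketch of fine-tuning an off-the-shelf retriever on company-specific question–answer pairs with the sentence-transformers library. The FAQ pairs, base model and training settings are assumptions for the example rather than a prescribed recipe; in practice you would want hundreds of pairs:

from sentence_transformers import InputExample, SentenceTransformer, losses
from torch.utils.data import DataLoader

# Hypothetical FAQ pairs from internal documents: (question, answer passage).
faq_pairs = [
    ("What is our standard approach for anomaly detection projects?",
     "For anomaly detection, we typically start with an unsupervised baseline and add labels only where needed."),
    ("Which checklist do we follow before a model goes to production?",
     "Before deployment, every model goes through data validation, monitoring setup and a rollback plan."),
]

# Start from a general-purpose retriever and adapt it to company terminology.
model = SentenceTransformer("all-MiniLM-L6-v2")
train_examples = [InputExample(texts=[question, answer]) for question, answer in faq_pairs]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Pull matching question/answer pairs together, push non-matching ones apart.
train_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)

model.save("retriever-finetuned-on-internal-faq")

The fine-tuned model can then take over the embedding step in a retrieval pipeline like the one sketched earlier, so that searches understand your company's own vocabulary.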

In short…

AI has the potential to significantly improve information retrieval for both small and large organizations that are accumulating large amounts of unstructured data. These methods can eliminate the inefficient and tedious step of acquiring information through 1-to-1 correspondence with other members of the organization. Instead, questions can be posed to an AI helper that quickly provides the information without the need to search through internal documents. This is possible today with state-of-the-art AI models like GPT-4. However, the key challenge of such an implementation is building a great search system that works for you. This requires expertise and careful work on the data itself, and it can often be necessary to build a custom AI retriever fine-tuned to your internal knowledge. When done right, though, it can hugely improve efficiency for everyone!

[1] https://github.com/openai/evals
[2] https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/the-social-economy

Author:

Emil Bols

Solution Architect
