
Does RAG still matter?

  • Generative AI
  • Data Contextualization

Published: April 25, 2024

Lars Moastuen

Product Management, Cognite

Spoiler: it does.

We don't mean to dissuade anyone from reading this fully, but allow us to cut to the chase and deliver the payload upfront for those with more limited context windows. A scalable, trustworthy, and safe generative AI implementation needs an excellent RAG solution to feed relevant, trustworthy, and up-to-date information into large language models.

Now, let’s get into the details…

In 2023, ChatGPT was introduced to the world, and soon after, retrieval-augmented generation (RAG) became a ubiquitous term.

Some companies are training large language models (LLMs) from scratch, which requires a large investment. At the same time, there has been an influx of innovative methods for efficiently fine-tuning existing models: Parameter-Efficient Fine-Tuning (PEFT) [1], QLoRA [2], and prompt tuning [3] have all become popular. These techniques allow companies to “train” models on their own data without large investments in data sets or training hardware.

In addition, context window size (the amount of text an LLM can take in and reason over in a single request) has greatly increased:

Context window (token count) comparison between the most well-known LLMs.
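To make those token counts concrete, here is an illustrative sketch of checking whether a document fits a given context window. It assumes the common rough heuristic of ~4 characters per token for English text; the window sizes and the output budget are hypothetical examples.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_context(text: str, context_window: int, reserved_for_output: int = 1024) -> bool:
    """True if the document plus an output budget fits in the model's window."""
    return estimate_tokens(text) + reserved_for_output <= context_window

doc = "word " * 200_000  # a large document: ~1,000,000 characters, ~250,000 tokens
print(fits_context(doc, context_window=128_000))    # False: too big for a 128k window
print(fits_context(doc, context_window=1_000_000))  # True: fits a 1M-token window
```

For billing-accurate counts you would use the provider's own tokenizer rather than a character heuristic, but the arithmetic of "does it fit" stays the same.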

Google has reported that they are experimenting with a context window of 10 million tokens. Given this development, RAG is surely obsolete.

Or is it?

How LLMs answer user queries with and without retrieval-augmented generation.

Although these new techniques open up new possibilities, RAG remains a key ingredient in an effective enterprise generative AI framework. There are several reasons why fine-tuning and prompt tuning don't replace the need for a good RAG system:

  1. Challenges in source attribution: Determining the origin of information from the fine-tuned LLM can be difficult, making it challenging to differentiate between fabrications and factual data.
  2. Costly knowledge updates: Incorporating new information or removing outdated information necessitates retraining or re-fine-tuning, translating to additional time and financial investment.
  3. Issues with information access control: Fine-tuned models lack mechanisms to manage which data is accessible to different users.
  4. Ensuring data integrity: Compiling large, accurate data sets demands thorough data verification to guarantee the information remains current and correct.
  5. Information overload: Like humans overwhelmed by too much information, a large context window holds vast data, but LLMs might struggle to pinpoint what's relevant, increasing the risk of hallucination and inaccuracies [4].
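By contrast, the core RAG loop addresses points 1 and 2 directly: retrieve only the relevant, up-to-date snippets and cite their sources in the prompt. A minimal sketch, with naive word-overlap scoring standing in for a real retrieval index and made-up document names:

```python
# Minimal RAG sketch: retrieve the most relevant snippet by word overlap,
# then build a grounded prompt with source attribution.
def retrieve(query: str, corpus: dict[str, str], k: int = 1) -> list[tuple[str, str]]:
    """Return the k corpus entries sharing the most words with the query."""
    q = set(query.lower().split())
    scored = sorted(corpus.items(),
                    key=lambda kv: len(q & set(kv[1].lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, corpus: dict[str, str]) -> str:
    """Assemble a prompt that cites each retrieved source by name."""
    context = "\n".join(f"[{src}] {text}" for src, text in retrieve(query, corpus))
    return f"Answer using only the sources below; cite them.\n{context}\n\nQuestion: {query}"

corpus = {
    "manual-7.2": "Pump P-101 requires bearing inspection every 2000 operating hours.",
    "memo-2024-03": "The cafeteria menu changes weekly.",
}
print(build_prompt("When should pump P-101 bearings be inspected?", corpus))
```

Because the answer is grounded in a named source, attribution is trivial; and updating knowledge means updating the corpus, not retraining the model.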

LLMs with a very large context window are a valuable addition to the toolbox, but they are not a universal remedy. Although these models open up a variety of solutions, they have drawbacks:

  1. Performance: A long context window requires additional processing. This can push the time-to-first-token (TTFT, a measure of how long it takes before the LLM starts “typing” its response) to minutes, which is unacceptable for many, or even most, use cases.
  2. Scalability: With additional processing needs comes reduced scalability. LLM providers operate with quotas: Google allows two requests per minute for Gemini 1.5 Pro (in preview), and Anthropic limits its top-tier subscription to 10,000,000 tokens per day and 40,000,000 tokens per minute. Although large enterprises can often negotiate exceptions to such quotas, the numbers make clear that inference capacity is scarce.
  3. Cost: A natural consequence of reduced performance and scalability is increased cost. A single request to Anthropic Claude 3 with 200,000 tokens costs about $3.00 USD. This might be an acceptable price for some use cases, but for automated processing pipelines triggering hundreds or thousands of requests on a daily or hourly schedule, the cost will quickly explode.
  4. Environmental impact: Training LLMs is an extremely power-intensive operation. Although energy efficiency is improving thanks to new hardware and better algorithms, inference remains a compute-intensive operation and will remain so for the foreseeable future. A recent report estimates that energy consumption associated with AI will reach 0.5% of global electricity consumption by 2027 [5].
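The cost point can be illustrated with back-of-the-envelope arithmetic. The sketch below assumes a $15-per-million-input-token price, consistent with the ~$3 per 200,000-token request above; actual prices vary by provider and change over time, and the request volumes are hypothetical.

```python
# Assumed price, consistent with ~$3.00 per 200k-token request.
PRICE_PER_M_INPUT = 15.00

def request_cost(input_tokens: int) -> float:
    """Cost in USD of a single request's input tokens."""
    return input_tokens / 1_000_000 * PRICE_PER_M_INPUT

def daily_cost(requests_per_day: int, tokens_per_request: int) -> float:
    """Daily input-token cost of an automated pipeline."""
    return requests_per_day * request_cost(tokens_per_request)

# One 200k-token full-context request vs. a 4k-token RAG-trimmed request:
print(f"${request_cost(200_000):.2f} per full-context call")   # $3.00
print(f"${request_cost(4_000):.2f} per RAG call")              # $0.06
# An automated pipeline firing 1,000 requests per day:
print(f"${daily_cost(1_000, 200_000):,.2f}/day full context")  # $3,000.00
print(f"${daily_cost(1_000, 4_000):,.2f}/day with RAG")        # $60.00
```

Trimming the context with retrieval, rather than stuffing the whole corpus into every request, is where the factor-of-50 difference comes from.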

A scalable, trustworthy, and safe generative AI implementation needs an excellent RAG solution to feed relevant, trustworthy, and up-to-date information to the LLMs.

The Cognite contextualization engine and industrial knowledge graph allow accurate information to be retrieved. The contextualization engine ensures data is connected across source systems, and the new AI service for populating the structured knowledge graphs from unstructured documents ensures the industrial knowledge graph is as complete, accurate, and up-to-date as possible.

Additional services for semantic search enable retrieving information based on meaning rather than exact keyword matches. This technique can even find information across multiple languages. These services make it easy to provide the best possible information to the LLMs, reducing the risk of hallucination while keeping the number of tokens to process low.
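At its core, semantic search ranks documents by the similarity of embedding vectors rather than shared keywords. A toy sketch, with hand-made three-dimensional vectors standing in for a real embedding model (real embeddings have hundreds or thousands of dimensions, and a model assigns them; the values below are illustrative only):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical directions, ~0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

doc_vectors = {
    "pump maintenance schedule": [0.90, 0.10, 0.00],
    "plan de mantenimiento de bombas": [0.88, 0.12, 0.05],  # same meaning, in Spanish
    "cafeteria menu": [0.00, 0.10, 0.95],
}
query_vec = [0.92, 0.08, 0.02]  # stand-in embedding of "when do we service the pumps?"

ranked = sorted(doc_vectors, key=lambda d: cosine(query_vec, doc_vectors[d]), reverse=True)
print(ranked)  # both maintenance documents outrank the menu, regardless of language
```

Because embeddings capture meaning, the Spanish maintenance plan ranks close to its English counterpart even though they share no words, which is why semantic search works across languages.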

These strategies can be integrated with fine-tuning to achieve an optimal balance of timely, relevant updates and a deep understanding of the data. In this hybrid approach, fine-tuning adapts the LLM to the general data landscape while RAG supplies current, relevant information, yielding trustworthy outputs with traceable data sources.

The key is to build a representative benchmarking dataset for any generative AI capability to ensure consistent accuracy and performance across a wide range of use cases. Such data sets can also be used to compare the results from different LLMs and assess how the different methodologies mentioned above perform.
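Such a benchmarking harness can be as simple as a fixed question/answer set scored against each candidate pipeline. A sketch using a naive substring check and two hypothetical pipelines (real benchmarks would use larger data sets and stricter metrics, such as an LLM-as-judge; every name and answer below is made up):

```python
# Hypothetical benchmark: each case pairs a question with the expected fact.
benchmark = [
    {"question": "What is the inspection interval for P-101?", "expected": "2000 hours"},
    {"question": "Which unit feeds tank T-3?", "expected": "pump P-101"},
]

def accuracy(answer_fn, benchmark) -> float:
    """Fraction of cases where the expected fact appears in the answer."""
    hits = sum(1 for case in benchmark
               if case["expected"].lower() in answer_fn(case["question"]).lower())
    return hits / len(benchmark)

# Two hypothetical pipelines to compare:
rag_pipeline = lambda q: "Per manual 7.2, inspect every 2000 hours; pump P-101 feeds tank T-3."
finetuned_only = lambda q: "Inspection is typically annual."

print(accuracy(rag_pipeline, benchmark))    # 1.0
print(accuracy(finetuned_only, benchmark))  # 0.0
```

The same fixed set can then be rerun against different LLMs or retrieval configurations, turning "which approach works best for us" into a measurable comparison.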

In conclusion, while advancements in fine-tuning and training LLMs offer impressive capabilities, the integration of RAG remains indispensable for delivering accurate and reliable AI-driven solutions. By combining innovative fine-tuning techniques with robust RAG systems from Cognite, companies can ensure their AI implementations are powerful and practical. Such a balanced approach enhances the efficacy of generative AI and maintains trustworthiness and relevance in industrial applications.

Learn more about Cognite’s comprehensive suite of Generative AI capabilities: https://www.cognite.com/en/generative-ai

[1] https://www.nature.com/articles/s42256-023-00626-4

[2] https://arxiv.org/abs/2305.14314

[3] https://arxiv.org/abs/2104.08691

[4] https://medium.com/enterprise-rag/why-gemini-1-5-and-other-large-context-models-are-bullish-for-rag-ce3218930bb4

[5] https://www.theverge.com/24066646/ai-electricity-energy-watts-generative-consumption

