Abstract
This article explores the application of Retrieval-Augmented Generation (RAG) to enhance the creation of knowledge assets and to develop actionable insights from complex datasets. It begins by contextualising the limitations of large language models (LLMs), notably their knowledge cut-offs and tendency to hallucinate, and presents RAG as a promising solution that integrates external knowledge retrieval to improve factual accuracy and relevance. The study reviews current RAG architectures, from naïve to advanced models, emphasising techniques such as optimised indexing, query refinement, metadata utilisation, and the incorporation of autonomous AI agents in agentic RAG systems. Methodologies for effective data preprocessing, semantic-aware chunking, and retrieval strategies such as multi-hop retrieval and reranking are also discussed, addressing challenges such as irrelevant retrieval and semantic fragmentation. The work further examines embedding models, notably state-of-the-art vector representations, which enable precise similarity searches within knowledge bases. A case study demonstrates the deployment of a RAG pipeline for analysing multi-sheet datasets, highlighting challenges in data structuring, prompt engineering, and ensuring output consistency.
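The retrieve-then-generate flow the abstract describes can be sketched minimally. The following is an illustrative sketch only, not code from the article: the bag-of-words "embedding" is a toy stand-in for a real dense embedding model, and the function names (`retrieve`, `build_prompt`) are hypothetical. It shows the core RAG idea of ranking knowledge-base chunks by similarity to a query and injecting the top matches into the LLM prompt.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would use a
    # dense vector model instead of token counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank all chunks by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Augment the user question with retrieved context before
    # passing it to an LLM (the generation step is omitted here).
    context = "\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

chunks = [
    "RAG augments an LLM with retrieved documents.",
    "Chunking splits documents into retrievable passages.",
    "The weather today is sunny.",
]
print(build_prompt("How does RAG use retrieved documents?", chunks))
```

Advanced variants described in the article (query refinement, reranking, multi-hop retrieval) refine the `retrieve` step; the overall retrieve-then-prompt shape stays the same.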
| Original language | English |
|---|---|
| Article number | 6247 |
| Pages (from-to) | 1-17 |
| Number of pages | 17 |
| Journal | Applied Sciences (Switzerland) |
| Volume | 15 |
| Issue number | 11 |
| Early online date | 1 Jun 2025 |
| DOIs | |
| Publication status | E-pub ahead of print - 1 Jun 2025 |
Keywords
- RAG
- Retrieval-Augmented Generation
- LLM
- large language models
- AI
- artificial intelligence