📝 RAG vs Long-Context LLM: Who Wins?

LLM development these days seems to follow two broad trends:

1. Using external knowledge through RAG (Retrieval-Augmented Generation), or
2. Training Long-Context (LC) LLMs to increase the number of input tokens a model can process at once.

RAG lets even small models work with a large body of knowledge and deliver up-to-date information without additional training, so most LLM-based search services, such as Perplexity AI and Claude, use it. To support this technique, most newly released LLMs are also trained to handle long contexts. GPT-4 Turbo and the recently updated Llama 3.1 models can all process 128K tokens, and Gemini 1.5 Pro reportedly handles as many as 2M tokens.

🤔 So which approach is more effective, RAG or LC?

To answer this question, researchers at Google DeepMind ran a range of experiments and concluded that LC performs better overall across several benchmarks. RAG, on the other hand, has the advantage of reaching performance comparable to LC at a much lower cost. Going a step further, the authors propose Self-Route, a hybrid method that lowers cost while maintaining answer quality.

🔗 Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach
Google DeepMind, University of Michigan
https://arxiv.org/pdf/2407.16833
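The post above only names Self-Route, so here is a rough sketch of how I understand the idea from the paper: try the cheap RAG path first, let the model say it cannot answer from the retrieved passages, and only then fall back to the full long-context prompt. Everything in the code (the `self_route` function, the `retrieve` and `llm` callables, the prompt wording, the `unanswerable` sentinel, `top_k`) is a hypothetical stand-in for illustration, not the authors' implementation.

```python
from typing import Callable, List, Tuple

UNANSWERABLE = "unanswerable"  # hypothetical sentinel the model is asked to emit


def self_route(
    query: str,
    document: str,
    retrieve: Callable[[str, str, int], List[str]],  # assumed: (query, doc, k) -> passages
    llm: Callable[[str], str],                       # assumed: prompt -> completion
    top_k: int = 5,
) -> Tuple[str, str]:
    """Return (answer, route), where route is "rag" or "long-context"."""
    # Step 1: cheap attempt on retrieved passages, with permission to decline.
    passages = retrieve(query, document, top_k)
    rag_prompt = (
        "Answer the question using only the passages below. "
        f"If they are not sufficient, reply with '{UNANSWERABLE}'.\n\n"
        + "\n\n".join(passages)
        + f"\n\nQuestion: {query}"
    )
    answer = llm(rag_prompt).strip()
    if answer.rstrip(".").lower() != UNANSWERABLE:
        return answer, "rag"

    # Step 2: the model declined, so pay for the expensive path with the whole document.
    lc_prompt = (
        "Answer the question using the full document below.\n\n"
        f"{document}\n\nQuestion: {query}"
    )
    return llm(lc_prompt).strip(), "long-context"
```

The cost saving in this sketch comes from the fact that the long prompt is only built and sent for queries the RAG step declines. How often that happens depends on the retriever and the model, so treat the routing rule here as a starting point rather than a tuned recipe.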
