
LLM training: post-training will matter more going forward

GPT-4's current Elo score is about 100 points higher than that of the originally released version. According to an interview with OpenAI co-founder John Schulman, GPT-4 has become "smarter" than it was a year ago because of sophisticated post-training. One sign of this is that the outputs GPT-4 generates are now of higher quality than most of the content on the web. So, in his view, it makes more sense to have the model think for itself than to simply train it to imitate what is on the web.

Right now, many companies are concentrating on pre-training, doing a lot of research on more data and better architectures. In a sense, this has become fairly easy: the methodologies have been widely popularized, and many pre-trained models are now publicly available. Post-training, by contrast, is a highly complex task that requires a great deal of tacit knowledge and many skilled people. According to Schulman, that makes it currently the hardest and most demanding part of the work, and the part that creates the biggest technical barrier to entry. As a result, more compute may eventually be spent on post-training than on pre-training.

Since John Schulman is OpenAI's Post-training Lead, his opinion may be biased toward post-training. But he is also a co-founder of OpenAI, so his view seems worth thinking about carefully.
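To put "about 100 points higher" in perspective, here is a minimal sketch assuming the standard logistic Elo formula with a 400-point scale. The post does not say which leaderboard or rating system the figure comes from, and the absolute ratings below are hypothetical. Only the roughly 100-point gap is taken from the text.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    The expected score is the probability that A wins, with ties counted
    as half a win. `scale` is the conventional 400-point Elo scale (an assumption here).
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Hypothetical absolute ratings: Elo depends only on the difference,
# so 1000 is arbitrary. The ~100-point gap comes from the post.
original_gpt4 = 1000.0
current_gpt4 = original_gpt4 + 100.0

p = elo_expected_score(current_gpt4, original_gpt4)
print(f"Expected score of the newer model vs. the original: {p:.3f}")  # → 0.640
```

Under this model, a 100-point gap means the newer version would be expected to come out ahead in roughly 64% of head-to-head comparisons against the original. It does not mean the model is "100 points" better in any absolute sense.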
