🔍 How are search features put together?

My company uses a search SaaS, so I don't get the chance to implement a search engine myself, but I still need to understand the various search features. This post explains how the inverted index structure is laid out for each search feature.

[Terminology]
Index: a sorted/ordered list that makes keywords in a document easy to look up.
Inverted index: a way of finding documents from a query. It works like the index at the back of a book, where you go from a keyword to the pages that contain it.

As the names suggest, the difference between the two is direction. A regular index (forward index) maps docID → document, while an inverted index maps query (document content) → docID. That is why the inverted index is so useful for search.

[Summary]
1. Numeric search
Numeric search is mostly range search. If each specific number is used as a key, the number of keys can grow almost without bound (5, 5.2, 5.9, ...), so the index is organized into buckets that each cover a fixed range, such as [5,6).

2. Phrase search
A phrase can be thought of as an ordered sequence of words. By storing position (pos) information, the index can find the word combination that most closely matches the phrase.

3. Preprocessing
Applying preprocessing to the query can improve performance. You can drop punctuation such as periods and commas, or remove words like 'a' and 'the' to focus on the semantically meaningful terms (no-op words, commonly called stop words). A single word often appears in several forms (past tense, future tense, ...), so you can treat those forms as one word (stemming), or set a substring length and decide whether those substrings should be treated as the word (n-gram).

4. Fuzzy search
Search that takes synonyms into account (hello = hi, bonjour, ...). This probably cannot be done automatically, so someone has to enter the synonyms by hand. You cannot account for every synonym up front, but you can add them as the need arises. The growth in index size is unavoidable.

5. Snippet
Sometimes you want more than the matching document: you want to know exactly where in the document the query appears, for example when highlighting search results. For snippets, per-character offsets are added to the index so that the range of surrounding text can be returned quickly.

6. Per-field search
You can search by treating the entire document content as a single field, but when search has to be offered per field, a separate inverted index is built for each field. With storage and performance in mind, only the fields actually used for search should be indexed.

7. Scoring and ranking
Among the many documents that match a query, the one most relevant to the user needs to be picked out and shown first. What counts as relevant differs with each user's context. Results are generally produced by layering the user's search experience (use-case-specific ranking) on top of a good default ranking. Good default ranking signals include recency, search time, and per-field importance.

8. Faceting
Sometimes you want to search with the scope of a field already fixed. A typical example is a real-estate app, where you search within several conditions at once.
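To make the Preprocessing and Phrase search items more concrete, here is a minimal Python sketch of a positional inverted index. The function names, the tiny stop-word list, and the sample documents are all assumptions made for illustration, and the phrase match here is the simple exact-adjacency case rather than a "closest combination" search.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "the"}  # assumed, deliberately tiny stop-word list


def tokenize(text):
    """Lowercase, strip punctuation, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_index(docs):
    """Map term -> {doc_id: [positions]} over the preprocessed tokens."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted doc IDs where the phrase terms sit at consecutive positions."""
    terms = tokenize(phrase)
    if not terms:
        return []
    # A candidate document must contain every term of the phrase.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    hits = []
    for doc_id in candidates:
        starts = index[terms[0]][doc_id]
        # The i-th term must appear exactly i positions after the first term.
        if any(
            all(start + i in index[t][doc_id] for i, t in enumerate(terms))
            for start in starts
        ):
            hits.append(doc_id)
    return sorted(hits)


docs = {
    1: "The quick brown fox.",
    2: "A brown, quick fox!",
    3: "Quick brown foxes",
}
index = build_index(docs)
print(phrase_search(index, "quick brown"))  # [1, 3]
```

Document 2 contains both words but in the wrong order, so only the stored positions rule it out. "foxes" does not match "fox" here because this sketch does no stemming.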
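The bucketing idea from the Numeric search item can be sketched as follows. This is only an illustration under assumptions: the bucket width of 1.0, the function names, and the choice to keep the raw value next to each docID (so the edges of a range can be filtered exactly) are not from the original post.

```python
import math
from collections import defaultdict

BUCKET_WIDTH = 1.0  # assumed bucket size, giving buckets such as [5, 6)


def bucket_of(value):
    """Return the integer bucket number that contains the value."""
    return math.floor(value / BUCKET_WIDTH)


def build_numeric_index(values):
    """values: {doc_id: number}. Returns bucket -> [(value, doc_id), ...]."""
    index = defaultdict(list)
    for doc_id, value in values.items():
        index[bucket_of(value)].append((value, doc_id))
    return index


def range_search(index, low, high):
    """Return sorted doc IDs whose value satisfies low <= value < high."""
    hits = []
    # Only buckets that overlap the requested range are visited.
    for bucket in range(bucket_of(low), bucket_of(high) + 1):
        for value, doc_id in index.get(bucket, []):
            if low <= value < high:
                hits.append(doc_id)
    return sorted(hits)


ratings = {1: 5.0, 2: 5.2, 3: 5.9, 4: 6.0, 5: 4.7}
numeric_index = build_numeric_index(ratings)
print(range_search(numeric_index, 5, 6))  # [1, 2, 3]
```

The number of keys is bounded by the number of buckets instead of the number of distinct values, at the cost of a small filtering step inside the buckets at the edges of the range.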
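For the Snippet item, one way to picture the extra offset data is the sketch below. The posting layout (term → docID → list of character offsets), the bracket-style highlight, and the `context` parameter are hypothetical choices for illustration, not a description of any particular search engine.

```python
import re
from collections import defaultdict


def build_offset_index(docs):
    """Map term -> {doc_id: [(start, end), ...]} using character offsets."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for match in re.finditer(r"\w+", text):
            index[match.group().lower()][doc_id].append((match.start(), match.end()))
    return index


def snippet(docs, index, term, doc_id, context=10):
    """Return the text around the first hit, with the hit wrapped in [brackets]."""
    offsets = index.get(term.lower(), {}).get(doc_id)
    if not offsets:
        return None
    start, end = offsets[0]
    text = docs[doc_id]
    # Slice straight out of the stored text; no re-scan of the document is needed.
    left = text[max(0, start - context):start]
    right = text[end:end + context]
    return f"{left}[{text[start:end]}]{right}"


docs = {1: "Inverted indexes map terms to documents."}
offset_index = build_offset_index(docs)
print(snippet(docs, offset_index, "terms", 1, context=4))  # map [terms] to 
```

Because the offsets are stored at indexing time, producing the highlighted range is a pair of slices rather than a fresh scan of the document.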

The System Design Ideas Behind Advanced Search Functions

Medium



April 11, 2021, 8:40 AM
