
🌈 Transformers from Scratch: A Complete Guide to Transformers

If you study deep learning, you will sooner or later have to learn about the Transformer. What background do you need to understand the components that make it up? I'm sharing a document that neatly organizes the knowledge required to understand the Transformer. It builds everything up from scratch, one concept at a time. The illustrations are really well done, so I often rely on them to get through the harder parts! If you are studying deep learning, I'd suggest saving it now and making sure to read it later :) It is not short and covers a lot, but working through it one section at a time should help your learning! I plan to go through it again myself :)

✨ Recommended for
- Anyone studying deep learning
- Anyone who wants to understand the Transformer from the ground up

🎁 Topics covered
► One-hot encoding
► Dot product
► Matrix multiplication
► Matrix multiplication as a table lookup
► First order sequence model
► Second order sequence model
► Second order sequence model with skips
► Masking
► Rest Stop and an Off Ramp
► Attention as matrix multiplication
► Second order sequence model as matrix multiplications
► Sequence completion
► Embeddings
► Positional encoding
► De-embeddings
► Softmax
► Multi-head attention
► Single head attention revisited
► Skip connection
► Multiple layers
► Decoder stack
► Encoder stack
► Cross-attention
► Tokenizing
► Byte pair encoding
► Audio input
