softmax

k๊ฐœ์˜ ์‹ค์ˆ˜๋กœ ์ด๋ฃจ์–ด์ง„ ๋ฒกํ„ฐ๋ฅผ k๊ฐœ์˜ ๊ฐ€๋Šฅํ•œ ๊ฒฐ๊ณผ์— ๋Œ€ํ•œ ํ™•๋ฅ  ๋ถ„ํฌ๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ํ•จ์ˆ˜

๋กœ์ง€์Šคํ‹ฑ ํ•จ์ˆ˜(Logistic function)์˜ ๋‹ค์ฐจ์› ํ™•์žฅ์ด๋ฉฐ, ๋‹คํ•ญ ๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€๋‚˜ ์ธ๊ณต์‹ ๊ฒฝ๋ง์—์„œ ์‚ฌ์šฉ๋œ๋‹ค.

์†Œํ”„ํŠธ๋งฅ์Šค ํ•จ์ˆ˜๋Š” ์ฃผ๋กœ ๋‹ค์ค‘ ํด๋ž˜์Šค ๋ถ„๋ฅ˜ ๋ฌธ์ œ์— ์‚ฌ์šฉ๋˜๋Š” ํ™œ์„ฑํ™” ํ•จ์ˆ˜ ์ค‘ ํ•˜๋‚˜๋‹ค. ์†Œํ”„ํŠธ๋งฅ์Šค๋Š” input์„ ํ™•๋ฅ ๋กœ ๋ณ€ํ™˜ํ•˜๋ฉฐ, ๊ฐ ํด๋ž˜์Šค์— ๋Œ€ํ•œ ํ™•๋ฅ  ๋ถ„ํฌ๋ฅผ ์ƒ์„ฑํ•œ๋‹ค. ์ฃผ๋กœ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์ƒํ™ฉ์—์„œ ์‚ฌ์šฉ๋œ๋‹ค. ์ด์ง„ ๋ถ„๋ฅ˜์™€ ๋น„๊ตํ•ด์„œ ์ƒ๊ฐํ•ด๋ณด๋ฉด ์ข€ ๋” ์šฉ์ดํ•˜๋‹ค. sigmoid๋Š” yes or no ๋งŒ์„ ์œ„ํ•ด ์‚ฌ์šฉ๋˜์—ˆ๋‹ค๋ฉด, ์—ฌ๋Ÿฌ ๋“ฑ๊ธ‰์ด๋‚˜ ํด๋ž˜์Šค๋กœ ๋‚˜๋ˆ„๋Š” ๊ฒฝ์šฐ์— softmax๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด๋ผ๊ณ  ๋ณด๋ฉด ๋œ๋‹ค.

https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FQGFKh%2FbtqPQtew8NG%2FP5e54TRwt9fZqmXi55866k%2Fimg.jpg

Fig. 2 softmax activation function overview

  1. ๋‹ค์ค‘ ํด๋ž˜์Šค ๋ถ„๋ฅ˜ : ์ฃผ์–ด์ง„ ์ž…๋ ฅ์— ๋Œ€ํ•ด ์—ฌ๋Ÿฌ ํด๋ž˜์Šค ์ค‘ ํ•˜๋‚˜๋ฅผ ์„ ํƒํ•˜๋Š” ๋ฌธ์ œ(์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜, ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ ์–ธ์–ด ๋ชจ๋ธ)

  2. ํ™•๋ฅ  ๋ถ„ํฌ ์ƒ์„ฑ : ๊ฐ ํด๋ž˜์Šค์— ์†ํ•  ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•˜๊ณ , ๊ฐ€์žฅ ํ™•๋ฅ ์ด ๋†’์€ ํด๋ž˜์Šค๋ฅผ ์„ ํƒํ•˜๋Š”๋ฐ ์‚ฌ์šฉ๋จ

softmax function์„ ํ†ต๊ณผํ•œ ๋ชจ๋“  output๊ฐ’๋“ค์˜ ํ•ฉ์€ 1์ด ๋œ๋‹ค. ์ด๋ฅผ ๋‹ค์‹œ ๋งํ•˜๋ฉด ํ™•๋ฅ (Probability)๊ฐ€ ๋˜๋Š” ๊ฒƒ์ด๋‹ค. sigmoid๊ฐ€ output layer๊ฐ’์„ ๋ณด๊ณ  threshold(๋ณดํ†ต 0.5)๋ณด๋‹ค ํฌ๋ฉด 1, ์ž‘์œผ๋ฉด 0์œผ๋กœ ์ด์ง„๋ถ„๋ฅ˜ ๋ฐ–์— ๋ชปํ•˜๋Š”๋ฐ, softmax output layer๋Š” ๋‚˜์˜ค๋Š” ๋ชจ๋“  ๊ฐ’๋“ค์„ normalizing ํ•ด๋ฒ„๋ฆฌ๊ณ , ๊ฐ๊ฐ์— ๋Œ€ํ•œ ํ™•๋ฅ ์„ ๊ตฌํ•ด๋‚ธ๋‹ค.

\[ \text{Softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^Ke^{z_j}} \]

input vector z์— ๋Œ€ํ•ด์„œ output์€ i๋ฒˆ์งธ ํด๋ž˜์Šค์— ๋Œ€ํ•œ ํ™•๋ฅ ์„ ๋งํ•œ๋‹ค. \(e^{z_i}\)๋Š” i๋ฒˆ์งธ ์ž…๋ ฅ ์š”์†Œ์˜ ์ง€์ˆ˜ ํ•จ์ˆ˜๋ฅผ ๋‚˜ํƒ€๋‚ธ๋‹ค.

https://velog.velcdn.com/images%2Fguide333%2Fpost%2F05653beb-0e79-48d8-90d9-3744253421d5%2FScreenshot%20from%202021-05-17%2011-10-04.png

Fig. 3 softmax function

softmax ๊ตฌํ˜„#

import numpy as np

def softmax(x):
    # subtract the maximum input for numerical stability
    # (prevents overflow in np.exp for large inputs)
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

x = np.array([1, 1, 2])
y = softmax(x)
print(y)  # [0.21194156 0.21194156 0.57611688]
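The function above assumes a 1-D input; applied to a batch of logits (one row per example) it would normalize over the whole array at once. Below is a minimal axis-aware sketch, assuming NumPy broadcasting with keepdims (SciPy's scipy.special.softmax offers the same axis semantics, if a reference implementation is wanted for comparison):

import numpy as np

def batched_softmax(x, axis=-1):
    # per-row max subtraction for numerical stability; keepdims=True
    # keeps the reduced dimension so the broadcasting lines up
    e_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e_x / e_x.sum(axis=axis, keepdims=True)

batch = np.array([[1.0, 1.0, 2.0],
                  [3.0, 0.0, 0.0]])
print(batched_softmax(batch))               # one distribution per row
print(batched_softmax(batch).sum(axis=-1))  # [1. 1.]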
