# Tensor Parallelism

# Weight Partitioning

Consider the matrix multiplication $X * W = Y$, where the input $X$ has shape $(b, s, h)$:

  - $b$: batch_size, the batch size
  - $s$: sequence_length, the input sequence length
  - $h$: hidden_size / embedding_size
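To make the shapes concrete, here is a minimal NumPy sketch of the unpartitioned multiplication. The dimension names follow the list above; the square $(h, h)$ weight is an assumption for simplicity, not from the original post:

```python
import numpy as np

b, s, h = 2, 4, 8             # batch size, sequence length, hidden size
X = np.random.randn(b, s, h)  # input activations
W = np.random.randn(h, h)     # weight; square (h, h) is assumed for simplicity
Y = X @ W                     # batched matmul over the last dim of X
assert Y.shape == (b, s, h)
```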

# Row-wise Weight Partitioning

# forward
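In the row-wise scheme, $W$ is split along its rows into $W_1, W_2$, which forces $X$ to be split along its hidden dimension into $X_1, X_2$; each GPU computes a partial product $X_i W_i$, and an all-reduce sums the partials into $Y$. A minimal NumPy sketch that simulates the two GPUs with array slices (the all-reduce is modeled as a plain sum):

```python
import numpy as np

b, s, h = 2, 4, 8
X = np.random.randn(b, s, h)
W = np.random.randn(h, h)

# Split W by rows and X along its hidden (last) dimension across two "GPUs"
W1, W2 = W[:h // 2, :], W[h // 2:, :]
X1, X2 = X[..., :h // 2], X[..., h // 2:]

# Each device computes a partial product of the full output shape
Y1 = X1 @ W1
Y2 = X2 @ W2
Y = Y1 + Y2  # simulated all-reduce (sum of partials)

assert np.allclose(Y, X @ W)
```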

# backward

  - Backward pass for $W$

$$\frac{\partial L}{\partial W_{1}} = \frac{\partial L}{\partial Y} \cdot \frac{\partial Y}{\partial W_{1}}, \qquad \frac{\partial L}{\partial W_{2}} = \frac{\partial L}{\partial Y} \cdot \frac{\partial Y}{\partial W_{2}}$$

For the backward pass, to update $W$ we only need to send $\frac{\partial L}{\partial Y}$ to the two GPUs, which then update $W_{1}$ and $W_{2}$ locally.

  - Backward pass for $X$

    $$\frac{\partial L}{\partial X} = \mathrm{concat}\left[\frac{\partial L}{\partial X_{1}}, \frac{\partial L}{\partial X_{2}}\right]$$
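The two backward rules above can be checked numerically. For a linear layer, $\frac{\partial L}{\partial W_i} = X_i^{T} \frac{\partial L}{\partial Y}$ and $\frac{\partial L}{\partial X_i} = \frac{\partial L}{\partial Y} W_i^{T}$; concatenating the per-device pieces recovers the full gradients. A NumPy sketch (the random `dY` stands in for an upstream gradient):

```python
import numpy as np

b, s, h = 2, 4, 8
X = np.random.randn(b, s, h)
W = np.random.randn(h, h)
W1, W2 = W[:h // 2, :], W[h // 2:, :]
X1, X2 = X[..., :h // 2], X[..., h // 2:]

dY = np.random.randn(b, s, h)  # upstream gradient dL/dY, identical on both devices

# Per-device weight gradients: dL/dWi = Xi^T @ dL/dY (summed over batch and sequence)
dW1 = np.einsum('bsi,bsj->ij', X1, dY)
dW2 = np.einsum('bsi,bsj->ij', X2, dY)
dW = np.einsum('bsi,bsj->ij', X, dY)  # reference: full-weight gradient
assert np.allclose(np.concatenate([dW1, dW2], axis=0), dW)

# Input gradients: dL/dXi = dL/dY @ Wi^T, concatenated along the hidden dim
dX1 = dY @ W1.T
dX2 = dY @ W2.T
dX = np.concatenate([dX1, dX2], axis=-1)
assert np.allclose(dX, dY @ W.T)
```

Note that no communication is needed for the weight gradients; only $\frac{\partial L}{\partial X}$ requires the pieces to end up back together.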

# Column-wise Weight Partitioning

# forward
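In the column-wise scheme, $W$ is split along its columns into $W_1, W_2$ while each GPU keeps the full input $X$; each device computes $X W_i$, and the partial outputs are concatenated along the hidden dimension (an all-gather in a real multi-GPU setup). A NumPy sketch under the same simplifying assumptions as above:

```python
import numpy as np

b, s, h = 2, 4, 8
X = np.random.randn(b, s, h)
W = np.random.randn(h, h)

# Split W by columns; each device keeps the full X
W1, W2 = W[:, :h // 2], W[:, h // 2:]

Y1 = X @ W1  # shape (b, s, h/2) on device 1
Y2 = X @ W2  # shape (b, s, h/2) on device 2
Y = np.concatenate([Y1, Y2], axis=-1)  # simulated all-gather

assert np.allclose(Y, X @ W)
```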

# backward
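For the column-wise backward pass, each device sees only its slice $\frac{\partial L}{\partial Y_i}$ of the upstream gradient, so the weight gradient $\frac{\partial L}{\partial W_i} = X^{T} \frac{\partial L}{\partial Y_i}$ stays local, while the input gradient requires an all-reduce: $\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Y_1} W_1^{T} + \frac{\partial L}{\partial Y_2} W_2^{T}$. A NumPy check, with the all-reduce again modeled as a plain sum:

```python
import numpy as np

b, s, h = 2, 4, 8
X = np.random.randn(b, s, h)
W = np.random.randn(h, h)
W1, W2 = W[:, :h // 2], W[:, h // 2:]

dY = np.random.randn(b, s, h)
dY1, dY2 = dY[..., :h // 2], dY[..., h // 2:]  # each device holds its slice of dL/dY

# Weight gradients stay local: dL/dWi = X^T @ dYi
dW1 = np.einsum('bsi,bsj->ij', X, dY1)
dW2 = np.einsum('bsi,bsj->ij', X, dY2)
dW = np.einsum('bsi,bsj->ij', X, dY)  # reference: full-weight gradient
assert np.allclose(np.concatenate([dW1, dW2], axis=1), dW)

# Input gradient needs an all-reduce: dL/dX = dY1 @ W1^T + dY2 @ W2^T
dX = dY1 @ W1.T + dY2 @ W2.T
assert np.allclose(dX, dY @ W.T)
```

This is the mirror image of the row-wise case: row-wise partitioning all-reduces in the forward pass and concatenates in the backward pass, while column-wise partitioning concatenates forward and all-reduces backward.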
