Hello everyone,
I have implemented a special kvstore named kvstore_special. It is useful for some model-parallel problems, such as a distributed fully connected (FC) layer.
GitHub: https://github.com/starimpact/mxnet_v1.0.0/tree/kvstore_special
There are some details in the README.md.
It is based on MXNet v1.0.0.
It works well in the multi-card, single-machine setting, but there is still one problem in the multi-card, multi-machine setting: it may crash after hours or days with a strange error from van.cc.
I am asking for help: can anyone help me get it merged into the MXNet master branch?


Further discussion is welcome.

2 Comments

  1. Hi Ming,
    Can you provide an example of what nsoftmax does? Is it dividing the total number of classes of softmax across different machines? And can you share the error you see from van.cc with your current implementation?

    1. Hi Rahul,

      Thanks for your attention.

      1. nsoftmax is like softmax, but the weight and the 1-D input vector are normalized before the FC multiplication.

      2. You are right: it divides the total number of softmax classes across different machines.

      3. I will try to post the van.cc error here.
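
To make points 1 and 2 concrete, here is a rough sketch in plain Python. It is not taken from the repository and does not use the MXNet or kvstore API. It assumes "normalized" means L2 normalization of the input vector and of each class weight row, and that classes are split into contiguous shards; the function names (`normalized_logits`, `split_classes`, `sharded_nsoftmax`) are made up for illustration. The real nsoftmax may differ, for example by adding a scale factor.

```python
import math


def l2_normalize(vec, eps=1e-12):
    # Assumption: "normalized" means dividing by the L2 norm.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def normalized_logits(x, weight_rows):
    # FC multiplication after normalizing the input and every class weight row.
    x_hat = l2_normalize(x)
    return [
        sum(a * b for a, b in zip(x_hat, l2_normalize(w)))
        for w in weight_rows
    ]


def softmax(logits):
    # Standard softmax, shifted by the max for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def split_classes(weight_rows, num_workers):
    # Contiguous shards of class rows, one per hypothetical worker.
    per_worker = -(-len(weight_rows) // num_workers)  # ceiling division
    return [
        weight_rows[i:i + per_worker]
        for i in range(0, len(weight_rows), per_worker)
    ]


def sharded_nsoftmax(x, weight_rows, num_workers):
    # Each worker computes logits only for its own classes ...
    local_logits = [
        normalized_logits(x, shard)
        for shard in split_classes(weight_rows, num_workers)
    ]
    # ... then only the global max and the global sum of exponentials
    # need to be shared between workers to finish the softmax.
    global_max = max(max(logits) for logits in local_logits)
    local_exps = [
        [math.exp(z - global_max) for z in logits] for logits in local_logits
    ]
    total = sum(sum(exps) for exps in local_exps)
    return [e / total for exps in local_exps for e in exps]
```

In this sketch the sharded result matches `softmax(normalized_logits(x, weight_rows))` computed on a single machine, because each class row is normalized independently of the others and only two scalars (the max and the sum) have to be combined across shards.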