I came across some APIs in DyNet related to dropout masks, such as:
set_dropout_masks(batch_size=1)
The documentation says:
Set dropout masks at the beginning of a sequence for a specific batch size
If this function is not called on batched input, the same mask will be applied across all batch elements. Use this to apply different masks to each batch element.
What does that mean?
A dropout mask is nothing more than a vector \(\mathbf{d}\) of random values \(d_i\in\{0, 1\}\), each sampled from a Bernoulli distribution. To apply the mask to a data point \(\mathbf{x}\), you multiply element-wise:
\(\mathbf{x}=\mathbf{x}\odot\mathbf{d}\)
Suppose your batch has \(2\) elements, input1 and input2. If you call set_dropout_masks(batch_size=2), two masks are generated, one for each element. If you do not call it, the same dropout mask is applied to every element of the batch, as the documentation you quoted states.
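To make the difference between one shared mask and one mask per batch element concrete, here is a minimal NumPy sketch. It does not use DyNet at all; the helper `make_masks`, the keep probability of 0.5, and the toy inputs are assumptions made purely for illustration.

```python
import numpy as np

def make_masks(dim, batch_size, keep_prob, rng, share_across_batch):
    """Return a (batch_size, dim) array of 0/1 dropout masks (hypothetical helper)."""
    if share_across_batch:
        # Sample a single mask and reuse it for every batch element.
        mask = rng.binomial(1, keep_prob, size=(1, dim))
        return np.repeat(mask, batch_size, axis=0)
    # Sample an independent mask for each batch element.
    return rng.binomial(1, keep_prob, size=(batch_size, dim))

rng = np.random.default_rng(0)
x = np.ones((2, 8))  # a toy batch: row 0 is input1, row 1 is input2

shared = make_masks(8, 2, 0.5, rng, share_across_batch=True)
separate = make_masks(8, 2, 0.5, rng, share_across_batch=False)

# Applying a mask is an element-wise product.
x_shared = x * shared
x_separate = x * separate

print(shared)    # both rows are identical
print(separate)  # each row is sampled independently
```

Going by the documentation quoted in the question, calling set_dropout_masks with the actual batch size corresponds to the second case, and not calling it corresponds to the first.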