I am training a narrow 3-layer TensorFlow neural network with layer sizes (input_size, small_number_hidden_units, output_size) and using the learned weights as a blueprint for the initial conditions of a wider 3-layer network with layer sizes (input_size, large_number_hidden_units, output_size). My goal is to piggyback on the solution found by the narrow model so that the wider model is less costly to train. Apart from the number of hidden units, both models have the same architecture.
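For context, here is roughly how I transfer the narrow weights into the wide model at the moment (a simplified sketch; the concrete sizes are placeholders, and leaving the extra units at their random initialization is just my current choice):

```python
import tensorflow as tf

input_size, small_units, large_units, output_size = 10, 8, 64, 3

def build_model(hidden_units):
    inputs = tf.keras.Input(shape=(input_size,))
    hidden = tf.keras.layers.Dense(hidden_units, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(output_size)(hidden)
    return tf.keras.Model(inputs, outputs)

narrow = build_model(small_units)
# ... narrow.compile(...) and narrow.fit(...) happen here ...

wide = build_model(large_units)

# Copy the narrow weights into the first small_units slots of the wide
# model; the remaining hidden units keep their random initialization.
w1_n, b1_n = narrow.layers[1].get_weights()
w2_n, b2_n = narrow.layers[2].get_weights()
w1_w, b1_w = wide.layers[1].get_weights()
w2_w, b2_w = wide.layers[2].get_weights()

w1_w[:, :small_units] = w1_n   # input -> hidden kernel
b1_w[:small_units] = b1_n      # hidden bias
w2_w[:small_units, :] = w2_n   # hidden -> output kernel
b2_w[:] = b2_n                 # output bias

wide.layers[1].set_weights([w1_w, b1_w])
wide.layers[2].set_weights([w2_w, b2_w])
```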
Is it possible to avoid the overhead of creating and compiling two models by using a single model and adding units to its hidden layer when needed? For example, would it be possible to take an untrained network with layer sizes (input_size, large_number_hidden_units, output_size), turn off a big chunk of the hidden units so that they are ignored during both the forward and backward pass, train the network in that state to save computational time, and then at some point in the middle of training turn all hidden units back on and finish training?
I was thinking of using a mask to turn off hidden units, like in this post, but it is unclear to me whether that would actually reduce the computational cost.
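Roughly what I had in mind (a minimal sketch, assuming a Keras functional model and a fixed binary mask; the sizes are placeholders):

```python
import tensorflow as tf

input_size, large_units, small_units, output_size = 10, 64, 8, 3

# Fixed binary mask: 1.0 for the first small_units units, 0.0 for the rest.
mask = tf.constant([1.0] * small_units + [0.0] * (large_units - small_units))

inputs = tf.keras.Input(shape=(input_size,))
hidden = tf.keras.layers.Dense(large_units, activation="relu")(inputs)
# Zero out the "disabled" units. Their outputs and their gradients become
# zero, but as far as I can tell the full (input_size x large_units) matmul
# still runs, which is why I doubt this actually saves compute.
hidden = tf.keras.layers.Lambda(lambda h: h * mask)(hidden)
outputs = tf.keras.layers.Dense(output_size)(hidden)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```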