05-14-2025, 09:59 AM
During training:
We have some training input and output data:
Each piece of training data is used in turn:
a) It is processed forwards through the network, from the 1st layer, 2nd, ... to the output, using the weights (random initially)
b) The output from the ANN is compared with the expected result from the training data to obtain an error
c) A backwards propagation step updates all the weights progressively from the last layer to the first, using the output error (chain-rule derivative optimisation). The simple form of the activation function means it is analytically differentiable (and, importantly, non-linear), so the backward pass is reasonably fast even for ANNs with large numbers of nodes and layers - see the sigmoid sketch just after this list.
This is repeated for each piece of training data, gradually improving the weights.
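To make (c) concrete: with the classic sigmoid activation, the derivative comes essentially for free from the value already computed in the forward pass, which is part of what keeps backpropagation cheap. A minimal Python sketch (the helper names are mine, not from any particular library):

import numpy as np

def sigmoid(x):
    # Smooth, non-linear activation, bounded in (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(a):
    # With a = sigmoid(x), calculus gives d/dx sigmoid(x) = a * (1 - a),
    # so the backward pass can reuse the activations saved during the
    # forward pass - no extra function evaluations are needed.
    return a * (1.0 - a)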
The whole process is then repeated over all the training data (a new epoch) many times, until the errors are within tolerance or a maximum number of epochs is reached. It is a gradual tuning of the node weights over many training cycles, repeatedly using all of the training data - the more the better - face recognition systems, for example, are trained on vast numbers of face images scraped from the web and from social media, private uploads often included. A minimal end-to-end sketch of the loop follows.
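Here is a toy but runnable version of the whole procedure - steps (a) to (c) per training sample, repeated over epochs - training a tiny network on XOR. The network size, learning rate, tolerance, and the bias-via-extra-column trick are illustrative choices of mine, not anything prescribed, and convergence speed will vary with the random seed:

import numpy as np

def sigmoid(x):
    # Same helper as in the sketch above.
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Toy training data: XOR inputs and expected outputs. The trailing
# column of ones provides the bias inputs for the first layer.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

# Weights are random initially (step a): 3 inputs -> 4 hidden -> 1 output,
# with one extra hidden "unit" fixed at 1 acting as the output-layer bias.
W1 = rng.normal(scale=0.5, size=(3, 4))
W2 = rng.normal(scale=0.5, size=(5, 1))

lr = 0.5          # learning rate (illustrative)
tolerance = 1e-3  # stop once the mean squared error is this small
max_epochs = 20000

for epoch in range(max_epochs):
    sq_errors = []
    for x, y in zip(X, Y):             # each piece of training data in turn
        x, y = x.reshape(1, -1), y.reshape(1, -1)
        # (a) Forward pass: 1st layer, 2nd layer ... to output.
        h = sigmoid(x @ W1)
        h_b = np.hstack([h, [[1.0]]])  # append the fixed bias unit
        out = sigmoid(h_b @ W2)
        # (b) Compare the ANN output with the expected result.
        error = y - out
        # (c) Backward pass: chain-rule derivatives, last layer first.
        delta_out = error * out * (1.0 - out)
        delta_hid = (delta_out @ W2[:-1].T) * h * (1.0 - h)
        W2 += lr * h_b.T @ delta_out   # update the last layer first ...
        W1 += lr * x.T @ delta_hid     # ... then the earlier layer
        sq_errors.append((error ** 2).item())
    # Each pass over all the training data is one epoch; stop early
    # once the error is within tolerance.
    if np.mean(sq_errors) < tolerance:
        break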
Note that the backwards propagation updates all layers' node weights using the error obtained from the forward step, passing derivative information backwards as the layers are processed from last to first.
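For completeness, the update each backward step implements can be written out; this is the standard backpropagation delta rule in my own notation (not from this post): E is the error, W^(l) the weight matrix of layer l, a^(l) its activations, z^(l) its weighted inputs, sigma the activation function, eta the learning rate, and \odot element-wise multiplication:

\[
\delta^{(L)} = \frac{\partial E}{\partial a^{(L)}} \odot \sigma'\!\left(z^{(L)}\right),
\qquad
\delta^{(l)} = \left( W^{(l+1)\top} \delta^{(l+1)} \right) \odot \sigma'\!\left(z^{(l)}\right),
\]
\[
\frac{\partial E}{\partial W^{(l)}} = \delta^{(l)} \, a^{(l-1)\top},
\qquad
W^{(l)} \leftarrow W^{(l)} - \eta \, \frac{\partial E}{\partial W^{(l)}}.
\]

The recursion for delta runs from the last layer L down to the first - exactly the last-to-first ordering described above - and each step needs only the activations saved during the forward pass.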
I do suggest you google and read about how this works. I know you want to investigate it yourself, but at some point it is worth understanding how others have done it.