05-13-2025, 11:18 AM
Maybe terminology? " the neurons in the second layer are being trained on UNTRUE data from the neurons in the first layer that haven't been trained yet " - this is not right.
The backpropagation starts from the last layer and works "backwards" toward the first. As each layer is processed, chain-rule derivatives are used, capturing the changes already computed for the later layers (to the right), including the error from the previous forward pass.
I can't explain it in a couple of sentences, and there is plenty on the web about it, but you are not right that layers are trained on UNTRUE data (if they were, it just wouldn't work). They are trained on derivative information that includes the error signal AND the information from the later layers (to the right) during the backward propagation.
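To make the chain-rule point concrete, here is a minimal sketch (my own toy example in plain NumPy, not anything from this thread): a two-layer network where the first layer's gradient is built from the output error AND the second layer's weights, so the early layer is trained on real derivative information, not "untrue data."

import numpy as np

rng = np.random.default_rng(0)

# tiny network: 3 inputs -> 4 hidden -> 1 output
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(1, 3))   # one input sample
y = np.array([[1.0]])         # its target

lr = 0.1
for step in range(100):
    # forward pass
    h = sigmoid(x @ W1)       # first-layer activations
    out = sigmoid(h @ W2)     # second-layer output

    # error from this forward pass (squared-error loss)
    err = out - y

    # backward pass: start at the LAST layer...
    d_out = err * out * (1.0 - out)   # gradient at layer 2's pre-activation
    grad_W2 = h.T @ d_out

    # ...then chain-rule back through layer 2 to layer 1.
    # d_h depends on W2 and d_out, so the first layer's gradient
    # already carries the later layer's information plus the error.
    d_h = (d_out @ W2.T) * h * (1.0 - h)
    grad_W1 = x.T @ d_h

    # update both layers with their own gradients
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

In this sketch both gradients are computed before the weights are updated, which is the usual textbook form; the point is simply that the backward pass hands each earlier layer genuine derivative information flowing back from the output error through the layers to its right.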