Creation of SB-Neuron. Ours. Branded.(v2)
#91
Classically, the training is performed by 'back propagation', tuning weights to minimise the difference between observed results and training data.  This is basically an optimisation algorithm that tunes the weights to minimise the residual variance between the training data outcome and the ANN result.  So normally all layers are tuned for each piece of training data at each step of the training.
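As a rough illustration of that tuning idea, here is a toy Python sketch for a single weight (all the numbers are invented for illustration):

Code:
# One weight tuned by gradient descent to shrink the squared error
# between the prediction and a training target (toy values only).
w = 0.1                  # initial weight (random in a real ANN)
x, target = 2.0, 1.0     # one training input and its expected output
lr = 0.1                 # learning rate

for step in range(50):
    y = w * x            # forward pass (no activation, for simplicity)
    error = y - target   # residual between ANN output and training data
    w -= lr * error * x  # d(error^2/2)/dw, by the chain rule

print(w, w * x)          # w tends towards 0.5, the output towards 1.0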
Reply
#92
(05-12-2025, 06:16 PM)litdev Wrote: Classically, the training is performed by 'back propagation', tuning weights to minimise the difference between observed results and training data.  This is basically an optimisation algorithm that tunes the weights to minimise the residual variance between the training data outcome and the ANN result.  So normally all layers are tuned for each piece of training data at each step of the training.

I have thought about your words.
But, still, in fact, only the first hidden layer of neurons is trained ON RELIABLE data.
This fact is not mentioned at all in your mathematical description of the neural network training process.  Blush
But this fact is of great importance for a person who wants to CONSCIOUSLY design a neural network, and not just wait until a randomly selected configuration can learn to solve a problem.  Angel

If you do not see an OBVIOUS error in my reasoning, then I will continue my fascinating amateur research in the same direction.
Reply
#93
Back propagation optimises all layer weights to better match the test value, using the chain-rule derivative of the error.  I guess one proof is that it works, but it's always nice to know why something works.

https://towardsdatascience.com/understan...c509ca9d0/
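A two-neuron example of that chain rule, as a toy Python sketch (the weights and input are invented):

Code:
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, target = 1.0, 0.0
w1, w2 = 0.8, -0.4            # layer-1 and layer-2 weights (illustrative)

# forward pass
h = sigmoid(w1 * x)           # hidden (layer-1) output
y = sigmoid(w2 * h)           # network output
e = y - target                # output error

# backward pass: chain rule, last layer first
d2 = e * y * (1 - y)          # error derivative at the output neuron
g_w2 = d2 * h                 # gradient for the layer-2 weight
d1 = d2 * w2                  # the SAME error signal, passed back to layer 1
g_w1 = d1 * h * (1 - h) * x   # gradient for the layer-1 weight

print(g_w2, g_w1)             # both gradients come from the one output error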
Reply
#94
(05-13-2025, 08:35 AM)litdev Wrote: Back propagation optimises all layer weights to better match the test value...

I don't argue with this statement.
What I don't like is that while the neurons in the first hidden layer are being trained on TRUE data, the neurons in the second layer are being trained on UNTRUE data from the neurons in the first layer that haven't been trained yet.

Why do I need such training?  Huh
I will feel more comfortable if I start training the second layer only once the fully trained neurons of the first layer provide the second layer with correct data for its training.

Is that logical?  Blush
Reply
#95
Maybe it's terminology? "The neurons in the second layer are being trained on UNTRUE data from the neurons in the first layer that haven't been trained yet" - this is not right.

The back propagation starts from the last layer and works 'backwards' towards the first.  As each layer is processed, chain-rule derivatives are used, capturing all the previously updated layers' changes (to the right), including the error difference from the previous forward pass.

I can't explain it in a couple of sentences, and there is plenty on the web about it, but you are not right that layers are trained on UNTRUE data (or it just wouldn't work); they are trained on derivative information that includes the error variance AND the changes made in the previously processed layers (to the right) during the backwards propagation.
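In sketch form (toy numpy code, all sizes invented), the backward pass looks like this; note that the first layer's update is driven by the same output error, carried back through the deltas:

Code:
import numpy as np

rng = np.random.default_rng(0)
sizes = [4, 3, 2, 1]                 # input, two hidden layers, output
W = [rng.normal(size=(m, n)) for n, m in zip(sizes, sizes[1:])]

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = rng.normal(size=(4, 1))
target = np.array([[1.0]])

# forward pass, keeping every layer's activation
acts = [x]
for w in W:
    acts.append(sigmoid(w @ acts[-1]))

# backward pass: delta at the output, then pulled back layer by layer
delta = (acts[-1] - target) * acts[-1] * (1 - acts[-1])
for i in reversed(range(len(W))):
    grad = delta @ acts[i].T         # gradient for layer i's weights
    if i > 0:                        # pass the error signal to the layer before
        delta = (W[i].T @ delta) * acts[i] * (1 - acts[i])
    W[i] -= 0.5 * grad               # every layer is tuned from that ONE error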
Reply
#96
I am grateful to you for your patience.  Shy

Most likely, we understand the terms in the same way.
I think that we have different ideas about the situation in which the neurons of the first hidden layer and the second hidden layer are.

For example, during the first training run, the weights of the neurons in the first layer will be adjusted depending on the reliable information at their inputs.
But, since the neurons themselves have not yet been trained, their outputs will contain INCORRECT information.
Therefore, the same backpropagation cycle that trains the neurons of the first layer on reliable information will force the neurons of the second layer to study the false signals of the first layer.

And only after the neurons of the first layer have learned well enough that their outputs form sufficiently reliable information will the second layer begin to retrain on this information.
Until this point, training the second layer does not make any sense.  Blush

Am I not right?
Reply
#97
During training:

We have some training input and output data:

Each piece of training data is used in turn:

a) It is processed through the layers, from 1st, 2nd .. to output, using the current weights (random initially)
b) The output from the ANN is compared with the training data expected result to obtain an error
c) A backwards propagation step updates all the weights progressively from the last to the first layer using the output error (chain-rule derivative optimisation).  The simple form of the activation function means it is analytically differentiable (and, importantly, non-linear), so the backwards propagation is reasonably fast even for ANNs with a large number of nodes and layers.

This is repeated for each piece of training data, gradually improving the weights.

The whole process is then repeated using all the training data (a new epoch) many times, until the errors are within tolerance or the maximum number of epochs is reached.  It is a gradual tuning of the node weights over many training cycles, repeatedly using all of the training data - the more the better - face recognition is trained on every face image on the web, and of course every image (private or not) on social media.

Note that the backwards propagation updates all layers' node weights using the error data obtained from the forwards step, passing the derivative information backwards as layers are processed from last to first.
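Putting steps a) to c) and the epoch loop together, a toy Python sketch (XOR data; the layer sizes and learning rate are invented, and convergence is usual but not guaranteed):

Code:
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0,0],[0,1],[1,0],[1,1]], float)  # training inputs (XOR)
T = np.array([[0],[1],[1],[0]], float)          # expected outputs
W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))  # input -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))  # hidden -> output

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for epoch in range(5000):                       # repeat all data: new epochs
    for x, t in zip(X, T):
        x = x.reshape(1, 2); t = t.reshape(1, 1)
        h = sigmoid(x @ W1 + b1)                # a) forward, 1st then last layer
        y = sigmoid(h @ W2 + b2)
        e = y - t                               # b) compare with expected result
        d2 = e * y * (1 - y)                    # c) backwards: last layer first...
        d1 = (d2 @ W2.T) * h * (1 - h)          #    ...then the first layer
        W2 -= 0.5 * h.T @ d2; b2 -= 0.5 * d2    # all weights tuned every step
        W1 -= 0.5 * x.T @ d1; b1 -= 0.5 * d1

print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2))  # approaches 0,1,1,0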

I do suggest you google and read about how this works.  I know you want to investigate yourself, but at some point it is worth understanding how others have done it.
Reply
#98
(05-14-2025, 09:59 AM)litdev Wrote: ...
I do suggest you google and read about how this works.  I know you want to investigate yourself, but at some point it is worth understanding how others have done it.

Dear LitDev,
I read articles and watched many videos that visualized in detail everything that you explained to me.  Shy

But in your explanation there is not a word about WHAT data the neurons of the first hidden layer use, and WHAT data is available to the neurons of the second layer.
And if you wanted to tell exactly this, then you would be forced to say out loud that, for example, in the first training cycle, reliable initial training data was fed to the inputs of the neurons of the first layer, and the neurons of the second layer were forced to use the “noise” that was formed at the outputs of the neurons of the first layer.

I don't think you would apply the word "reliable" to the output data of the first layer, created by completely untrained neurons.
I also cannot call such data "reliable". Therefore, I call it "unreliable".

What can I do about this? I am not to blame.  Angel
Reply
#99
I can't really explain it any more: backwards propagation DOES pass information about the fitness of the solution back to the previous layers' weights.

ANN training in this way does work, as we see with all the AI tools that have used it in their training.  It's not perfect, but for some pattern-recognition type tasks it is very effective.
Reply
#100
It's even good that somewhere something doesn't work perfectly.  Big Grin
What would I be doing now for fun if the neural networks we know didn't have any shortcomings?

But now I clearly realized that each hidden layer of the neural network “lives” in its own closed world.
Each layer "sees" only what arrives at the inputs of its neurons. And each layer doesn't care at all what the physical world around it will do with the data that this layer sets at the outputs of its neurons.
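In code form, my point looks something like this (a toy Python sketch, all sizes invented):

Code:
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

image = np.random.rand(784)        # a flattened 28x28 image, for example
W1 = np.random.rand(100, 784)      # first hidden layer's weights
W2 = np.random.rand(10, 100)       # second hidden layer's weights

out1 = sigmoid(W1 @ image)         # only this layer ever "sees" the image
out2 = sigmoid(W2 @ out1)          # this layer sees only out1, never the image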

Of the entire huge neural network that performs image recognition on the matrix, ONLY THE FIRST LAYER sees this image.
This layer reacts to this image in some way. The reaction of this layer is the creation of a NEW "picture" on the matrix of outputs of its neurons.

It is this “picture” that the second layer of neurons sees. The second layer knows nothing about the real original image.
If a developer wants to consciously create his own neural network, then he must decide what exactly the first layer should make out of the original image.

If I want a neural network to recognize a figure based on a set of features, then perhaps the first layer should recognize the features and prepare data for the second layer about which features are present in the original image and where they are located.
And, according to logic, it makes sense to train the second layer to work with the data of the first layer ONLY WHEN the data of the first layer is correct.
That is, when the first layer has already been trained.

In my opinion, everything is logical.  Blush
Reply

