
1 Gradient Descent

Gradient descent is an iterative optimization algorithm.

Gradient means the rate of inclination or declination of a slope.

Descent means the act of descending, i.e., moving down that slope.
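Putting the two ideas together, gradient descent repeatedly steps a parameter against the gradient of a cost function. Here is a minimal Python sketch; the cost function f(w) = (w - 3)^2, the starting point, the learning rate, and the step count are all illustrative assumptions, not part of the text above:

    # Minimize f(w) = (w - 3)^2, whose gradient is f'(w) = 2 * (w - 3).
    def grad(w):
        return 2.0 * (w - 3.0)

    w = 0.0              # starting point (assumed)
    learning_rate = 0.1  # step size (assumed)

    for step in range(50):
        w = w - learning_rate * grad(w)  # move against the gradient

    print(w)  # converges toward 3.0, the minimizer of f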

To understand gradient descent, we also need terminology such as the learning rate, the cost function, and the terms defined below:

1.1 Epoch

One Epoch is when an ENTIRE dataset is passed forward and backward through the neural network only ONCE.

So, what is the right number of epochs?

There is no single right answer to this question; it differs from dataset to dataset. The number of epochs you need is related to how diverse your data is.

1.2 Batch Size

Batch size is the total number of training examples present in a single batch.

But what is a Batch?

We can't pass the entire dataset through the neural network at once, so we divide the dataset into a number of batches (also called sets or parts).

Note: Batch size and number of batches are two different things.
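As a toy illustration of the note above, here is one way to slice a dataset into batches in Python. The dataset of 10 examples and the batch size of 2 are assumptions, chosen so that the batch size (2) and the number of batches (5) are visibly different quantities:

    dataset = list(range(10))  # stand-in for 10 training examples
    batch_size = 2

    batches = [dataset[i:i + batch_size]
               for i in range(0, len(dataset), batch_size)]

    print(len(batches))     # number of batches: 5
    print(len(batches[0]))  # batch size: 2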

1.3 Iterations

Iterations is the number of batches needed to complete one epoch. W and b are updated once per iteration.

Note: The number of batches is equal to the number of iterations for one epoch.

Example:

If we divide a dataset of 2,000 examples into batches of 500, it will take 4 iterations to complete 1 epoch.

Here the batch size is 500 and the number of iterations is 4, for 1 complete epoch.
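The same arithmetic can be checked with a minimal Python sketch of this example (2,000 examples, batch size 500); the counting loop is only a stand-in for real forward and backward passes:

    num_examples = 2000
    batch_size = 500
    iterations_per_epoch = num_examples // batch_size

    total_updates = 0
    for epoch in range(1):  # one complete pass over the dataset
        for iteration in range(iterations_per_epoch):
            total_updates += 1  # W and b would be updated here, once per iteration

    print(iterations_per_epoch)  # 4
    print(total_updates)         # 4 updates after 1 epoch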


2 Optimization Algorithms

2.1 Adam (Adaptive Moment Estimation)

pass