deep learning - Are the val data different from train data in data creation in caffe?


I have three sets of images:

1. train, containing 130,523 images
2. validation, containing 14,503 images
3. test, containing 94,500 images

Now I want to create .lmdb files from my data to use for training. The tutorial says to group the data into train and val. Does that mean I should use only the train and val sets and not use the test set at all? Later, when I want to test my model, what happens to the test set? Shouldn't it be converted to .lmdb as well? I want to make sure I have understood the differences. Sorry if the question is very basic, but I did not find any answers.

This one [caffe.berkeleyvision.org/gathered/examples/imagenet.html] talks only about train and val, not test. I want to make sure I should not use the test data when creating the .lmdb files. – user6726469 Aug 17, 2016 at 13:19

And also this one [github.com/BVLC/caffe/issues/550], the explanation by **dennis-chen**. – user6726469 Aug 17, 2016 at 13:21

There are three types of datasets.

Training set - This is the data that the network is trained on.

Testing set - This dataset is used to verify that the network is not overfitted to the training set and that it is regularised.

Validation set - Since we actually use the testing set during training (to check the regularisation), it is advisable to keep a separate set that the network has not seen until now. Running the network on this set tells us how it will perform when it is used in the real world.

In your case, you should make lmdb files for all three. During training use the training and testing set. After training use the validation set to confirm that the trained network is accurate.
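As a rough sketch of what "make lmdb files for all three" could look like in practice: Caffe's `convert_imageset` tool reads a listing file with one `relative/path label` line per image. The Python below writes such a listing for each split and builds (but does not run) the corresponding command lines. The one-subfolder-per-class layout, the `data/...` paths, the output names and the 256x256 resize are all assumptions made for illustration, not details from the question, and a test split without labels would need a different listing.

```python
import os


def class_names(train_root):
    """Derive a sorted class list from the training folder.

    Assumes (hypothetically) one subfolder per class. Reusing this list for
    every split keeps the integer labels consistent across the databases.
    """
    return sorted(
        d for d in os.listdir(train_root)
        if os.path.isdir(os.path.join(train_root, d))
    )


def write_listing(image_root, listing_path, classes):
    """Write 'relative/path label' lines and return the number of images."""
    count = 0
    with open(listing_path, "w") as out:
        for label, cls in enumerate(classes):
            cls_dir = os.path.join(image_root, cls)
            if not os.path.isdir(cls_dir):
                continue  # this split has no images of that class
            for name in sorted(os.listdir(cls_dir)):
                out.write(f"{cls}/{name} {label}\n")
                count += 1
    return count


def convert_command(image_root, listing_path, db_path, size=256):
    """Build the argv list for convert_imageset (lmdb is its default backend)."""
    return [
        "convert_imageset",
        f"--resize_height={size}",
        f"--resize_width={size}",
        "--shuffle",
        image_root.rstrip("/") + "/",  # the tool expects a trailing slash
        listing_path,
        db_path,
    ]


if __name__ == "__main__":
    # Hypothetical folder and file names; adjust to the real data layout.
    for split in ("train", "val", "test"):
        cmd = convert_command(f"data/{split}", f"{split}.txt", f"{split}_lmdb")
        print(" ".join(cmd))
```

Each printed command could then be run from a shell, or through `subprocess.run(cmd, check=True)`, once the listing files exist.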

Thanks for the answer. So should 30% of the data be kept for the testing data or for the validation data? How about 10% of the data? – user6726469 Aug 17, 2016 at 13:35

@malreddysid - For `BVLC Caffe`, the example model shown has a `training` and a `validation` set for the training phase. Testing is not used at all. Are you sure about the above explanation? – Chetan Arvind Patil Oct 9, 2017 at 3:22

Sometimes the terms validation and test are used interchangeably (at least in Caffe). However, judging from the size of each of your sets, I think the validation set (~14k images) is meant for checking the accuracy of your trained model before you actually test it on unseen real-world data. Your test dataset (~94k images) would then be treated as the unseen real-world data.
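The size argument can be checked with a little arithmetic. The sketch below uses only the image counts given in the question; treating train plus val as one "labeled pool" is an assumption about how the split was made, not something the question states.

```python
# Image counts as reported in the question.
sizes = {"train": 130523, "val": 14503, "test": 94500}

# Assumption: val was carved out of the same pool as train.
labeled_pool = sizes["train"] + sizes["val"]
val_share = sizes["val"] / labeled_pool

# Share of each split relative to all images.
total = sum(sizes.values())
shares = {name: n / total for name, n in sizes.items()}

print(f"val / (train + val) = {val_share:.1%}")
for name, share in shares.items():
    print(f"{name}: {share:.1%} of all images")
```

Under that assumption the validation set is about 10% of the train-plus-val pool, which looks like a held-out slice of the training data rather than a separate real-world test set.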

To get insight into the train-val-test process, also have a look at the examples provided in the caffe directory. 00-classification.ipynb and 01-learning-lenet.ipynb should be enough.
