Just a bit of background on what I am trying to create:
I’m aiming to create a model that converts sign language to text, using the WLASL dataset. Right from the start, after downloading it from Kaggle, I noticed that while the dataset seems quite comprehensive, the number of videos per class ranges from 5 to 13, which is obviously very little to train on. I decided to try Apple's Create ML instead of something like TensorFlow or even more complex deep learning frameworks, as this would be much more straightforward. Since the dataset is quite limited in terms of videos per class, I used all 6 data augmentations in the “Hand Action Classifier” (Horizontally Flip, Rotate, Translate, Scale, Interpolate Frames, Drop Frames). While I knew this would not save the model, I expected it to increase the accuracy by a lot. Note that I’m not using all 2000 classes (words) from the dataset; rather, I just used a subset of 300. I got 16% validation accuracy and 90% training accuracy with all augmentations, so my model was clearly overfitting. So I tried the same with 25 classes, and this time I got 42% validation accuracy with 100% training accuracy. Again, overfitting. I went over to the live preview, and almost every sign I tried was predicted wrong.
Next, I decided to use the “model sources” in the sidebar. I’m not really sure what they are for, but here is what I tried:
I split the subset of the data into 2 separate model sources (16 classes, but the number is still high), and got 83% and 90% validation accuracy respectively. Both of these model sources use all data augmentations. My model is clearly still overfitting, with 100% training accuracy in both sources, but splitting it into two models clearly increased my accuracy. When I tested this in the “live preview” with ASL signs that I performed myself, it was able to guess EVERY SINGLE WORD correctly with over 90% confidence.
So my question is: even with my limited data (while augmentations do increase it by a lot, surely the performance difference shouldn’t be this large), how have my models performed so well? Moreover, is splitting one model into separate model sources even viable? I’m not sure what the “model sources” are actually meant for, so I tried this, and somehow I got better results. If it is viable, how can I combine them in one Swift app? I’m just a little confused right now, so hopefully someone can tell me what is going on. If this isn’t a viable solution, can somebody suggest another way I can use this dataset? Prior knowledge of it would be really helpful, but even if you don't have any, could you please help me?
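To make the question more concrete, this is roughly what I imagine "combining" the two models in one app would look like. It is only a minimal sketch under my own assumptions: `bestLabel` is a hypothetical helper I made up, the dictionaries stand in for whatever label probabilities each exported model returns, and the numbers are invented. I'm also not sure that comparing confidences across two separately trained models is even meaningful, since each model only knows its own classes.

```swift
// Hypothetical helper: merge the label probabilities from two separately
// trained classifiers and return the single most confident label.
// The dictionaries are placeholders for each model's output;
// nothing here calls a real Core ML API.
func bestLabel(_ first: [String: Double],
               _ second: [String: Double]) -> (label: String, confidence: Double)? {
    // If both models happen to share a label, keep the higher probability.
    let merged = first.merging(second) { max($0, $1) }
    // Pick the entry with the highest probability across both models.
    guard let top = merged.max(by: { $0.value < $1.value }) else { return nil }
    return (label: top.key, confidence: top.value)
}

// Made-up example outputs, not real results from my models.
let modelA = ["book": 0.91, "drink": 0.06]
let modelB = ["chair": 0.55, "go": 0.40]
if let result = bestLabel(modelA, modelB) {
    print(result.label, result.confidence)
}
```

Is something like this a reasonable approach, or is there a proper way to do it?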
Thanks so much guys 🙂
PS: Here are the links to the dataset:
Kaggle link: https://www.kaggle.com/datasets/risangbaskoro/wlasl-processed
Original paper GitHub page: https://github.com/dxli94/WLASL
Sorry for such a long message. If you need any images for more insight, or more information to give better help, I’ll happily provide them.
Once again, thank you so much