Best Image Datasets for Machine Learning and Data Science (ML002)

You can check out the previous post - Why we need to know about Machine Learning? (ML001)

This is the second post in my Machine learning series (you can check out the first part of this series). In this post I will discuss machine learning datasets in particular. Some people start with algorithms, but I think one should start with the data itself, which is the most important aspect. We will look at some of the best available machine learning image datasets and go through different use cases and their datasets. In the future we will also go in depth on how to visualize and understand the data well in order to choose the right algorithms.

Best machine learning and data science datasets

The datasets listed below are some of the best image datasets available for machine learning research and development. With high-quality datasets we can achieve a lot in this field, but finding them is often a challenge and comes at a high cost because of the time needed to label a dataset. Even datasets for unsupervised learning, where labelling is not required, are very hard to collect.

Where to find Machine learning datasets?

Good datasets are now available on many websites, including university websites, and they serve as a great starting point for machine learning projects.

So, we will list out some of the best resources below -

  • Kaggle - Kaggle is a machine learning website which hosts different ML competitions and provides quality datasets. It is a great community to get started in: visit it, start using it, and interact with the community to learn a lot. I will publish a detailed post on how to get started with Kaggle.

  • UCI ML Repository - The University of California, Irvine maintains this repository, which provides rich resources for machine learning. It has some state-of-the-art datasets which one can easily download and get started with.

  • VisualData - It has a great collection of datasets, including datasets from recent conferences such as CVPR 2020 and ECCV 2020. Do check it out.

  • CMU Libraries - It is a repository by Carnegie Mellon University and contains some of the best datasets in the field of AI and Machine learning.

  • Google Dataset Search - It works just like a search engine, but it is only for datasets.
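Once a dataset from one of these sources has been downloaded and extracted, a common first step is simply to see how many images each label has. The snippet below is a minimal sketch, not tied to any of the sites above. It assumes the images are laid out as one sub-folder per label, which is a common layout but not a guaranteed one. The function name count_images_per_label and the extension list are hypothetical, chosen only for illustration.

```python
from pathlib import Path

# Assumed set of image file extensions; adjust it for the dataset at hand.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".pgm", ".tiff"}


def count_images_per_label(root):
    """Count image files under `root`, assuming one sub-folder per label.

    Returns a dict mapping each sub-folder name to the number of image
    files found inside it (including nested folders).
    """
    counts = {}
    for label_dir in sorted(Path(root).iterdir()):
        if not label_dir.is_dir():
            continue  # skip stray files such as a README at the top level
        counts[label_dir.name] = sum(
            1
            for path in label_dir.rglob("*")
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

Calling count_images_per_label on the extracted folder gives a quick view of how balanced the labels are before any algorithm is chosen.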

Now that we have seen some of the online resources where you can find datasets to get started with, let us look at the best datasets by use case.

Machine learning datasets (by use case) -

Image Datasets -

  • Facial Recognition Datasets

  • Action Recognition Datasets

  • Object detection and recognition

  • Handwriting and character recognition

  • Aerial Images

Image Datasets -

Facial Recognition Datasets -

  • FERET (Face Recognition Technology) - 11,338 images of 1,199 individuals in different positions and at different times.

  • Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) - 7,356 video and audio recordings of 24 professional actors. 8 emotions each at two intensities.

  • SCFace - Color images of faces at various angles.

  • Yale Face Database - Faces of 15 individuals in 11 different expressions.

  • Cohn-Kanade AU-Coded Expression Database - Large database of images with labels for expressions.

  • JAFFE Facial Expression Database - 213 images of 7 facial expressions (6 basic facial expressions + 1 neutral) posed by 10 Japanese female models.

  • FaceScrub - Images of public figures scrubbed from image searching.

  • BioID Face Database - Images of faces with eye positions marked.

  • Skin Segmentation Dataset - Randomly sampled color values from face images.

  • Bosphorus - 3D Face image database.

  • UOY 3D-Face - Neutral face and 5 expressions: anger, happiness, sadness, eyes closed, eyebrows raised.

  • CASIA 3D Face Database - Expressions: anger, smile, laugh, surprise, closed eyes.

  • CASIA NIR - Expressions: anger, disgust, fear, happiness, sadness, surprise.

  • BU-3DFE - Neutral face and 6 expressions: anger, happiness, sadness, surprise, disgust, fear (4 levels). 3D images extracted.

  • Face Recognition Grand Challenge Dataset - Up to 22 samples for each subject. Expressions: anger, happiness, sadness, surprise, disgust, puffy. 3D data.

  • Gavabdb - Up to 61 samples for each subject. Expressions: neutral face, smile, frontal accentuated laugh, frontal random gesture. 3D images.

  • 3D-RMA - Up to 100 subjects, expressions mostly neutral. Several poses as well.

  • SoF - Images of 112 persons (66 males and 46 females) wearing glasses under different illumination conditions.

  • IMDB-WIKI - IMDB and Wikipedia face images with gender and age labels.

Action Recognition Datasets -

Object detection and recognition -

Handwriting and character recognition -

Aerial Images -

  • Aerial Image Segmentation Dataset - 80 high-resolution aerial images with spatial resolution ranging from 0.3 to 1.0.

  • KIT AIS Data Set - Multiple labeled training and evaluation datasets of aerial images of crowds.

  • Wilt Dataset - Remote sensing data of diseased trees and other land cover.

  • MASATI dataset - Maritime scenes of optical aerial images from the visible spectrum. It contains color images in dynamic marine environments. Each image may contain one or multiple targets in different weather and illumination conditions.

  • Forest Type Mapping Dataset - Satellite imagery of forests in Japan.

  • Overhead Imagery Research Data Set - Annotated overhead imagery. Images with multiple objects.

  • SpaceNet - SpaceNet is a corpus of commercial satellite imagery and labeled training data.

So, these are some of the datasets you can try your hand at. We will keep updating our repository with more such image datasets, and we will also cover datasets in other domains. Thank you again for reading this blog. If you find it helpful, do like, comment and share this post. If you have any questions, do mail me.


© Subham Tewari