5 Kaggle Competitions Every Aspiring Data Scientist Should Participate In
Data science competitions give you much-needed practical experience. They are an excellent way to hone your coding and analytical skills, and they expose you to real-world problems.
In this blog, we outline 5 Kaggle competitions that every aspiring data scientist should take a look at and participate in!
These competitions are under Kaggle’s Getting Started category, which is meant for beginners in the field. If you’re looking for a simple competition with low stakes and plenty of tutorials, consider one of these. However, do note that there are no monetary prizes or awards for these competitions.
Why Kaggle?
Kaggle is a well-known platform for data science competitions and challenges, and it offers a diverse range of them for various skill levels. That variety makes it easy to find a competition that aligns with your interests, your future area of study, and even your hobbies!
Who is eligible?
Before we dive into the list, please note that the eligibility rules of the competitions include the following statement: “The older of 18 years old or the age of majority in your jurisdiction of residence (unless otherwise agreed to by Competition Sponsor and appropriate parental/guardian consents have been obtained by Competition Sponsor).” If you are below the age of 18, you will likely need your guardian’s written permission to take part in these contests.
1. House Prices - Advanced Regression Techniques
Start date: August 30, 2016
End date: Rolling leaderboard
If you have some experience with R or Python and know the basics of machine learning, ‘House Prices’ is a good place to put that knowledge into practice.
In this competition, your goal is to predict the final prices of residential homes in Ames, Iowa, based on a dataset with 79 explanatory variables. You can work in teams of up to 10 people. You will also be provided with a starter notebook to get you going, and the tutorials on Kaggle Learn can guide you!
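To make the idea of predicting a price from explanatory variables concrete, here is a minimal sketch of a one-variable linear regression in plain Python. The numbers are invented for illustration and do not come from the Ames dataset, and the names `living_area`, `sale_price`, and `fit_line` are hypothetical. A real entry would use many of the 79 variables and a proper machine learning library.

```python
# Toy data, invented for illustration; NOT taken from the Ames dataset.
living_area = [1000.0, 1500.0, 2000.0, 2500.0]          # square feet (hypothetical)
sale_price = [150000.0, 200000.0, 250000.0, 300000.0]   # dollars (hypothetical)


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor,
    # which cancels in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(living_area, sale_price)

# Predict the price of a hypothetical 1,800 sq ft home.
predicted = slope * 1800.0 + intercept
print(round(predicted))
```

The same pattern, fit on training data and then predict on unseen rows, carries over when you move to more features and more capable models.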
2. Spaceship Titanic
Start date: February 23, 2022
End date: Rolling leaderboard
Spaceship Titanic sets up a futuristic scenario where a spaceship suffers a collision that sends half its passengers to a different dimension. Using records from the spaceship’s damaged computer system, you must predict which passengers were transported so that they can be rescued.
Your results will be judged on classification accuracy, that is, the percentage of predicted labels that are correct. Through this competition, you can hone your programming and data preprocessing skills and delve into the fundamentals of machine learning!
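Classification accuracy is simple enough to compute by hand. The sketch below shows one way to do it in plain Python; the function name `accuracy` and the sample labels are made up for illustration and are not part of the competition's own tooling.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: True means the passenger was transported.
actual = [True, False, True, True, False]
predicted = [True, False, False, True, False]

score = accuracy(actual, predicted)
print(f"{score:.0%}")  # 4 of 5 predictions match
```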
A similar competition geared toward beginners is the Titanic – Machine Learning from Disaster competition.
3. I’m Something of a Painter Myself
Start date: August 28, 2020
End date: Rolling leaderboard
If you’re particularly interested in computer vision and the intersection of data science and visual art, you should definitely consider participating! The challenge is to build a Generative Adversarial Network (GAN), which comprises a generator model and a discriminator model, to generate 7,000 to 10,000 Monet-style images.
Your work will be evaluated using MiFID (Memorization-informed Fréchet Inception Distance): the smaller the score, the better your images. Working with GANs will give you hands-on experience with AI-based image generation while also honing your general coding skills.
4. Digit Recognizer
Start date: July 25, 2012
End date: Rolling leaderboard
In this competition, you are required to correctly identify digits from a dataset of handwritten images. Through the process, you will learn about simple neural networks and classification methods such as SVM and K-nearest neighbors.
While working in teams of up to 10 people, you are encouraged to experiment with various algorithms and compare results as you work. An added bonus is that Kaggle offers several tutorials to help guide you during the contest!
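To give a feel for K-nearest neighbors, one of the methods mentioned above, here is a minimal sketch in plain Python. It classifies a point by majority vote among its `k` closest training points. The two-number feature vectors are invented stand-ins for image pixels, and `knn_predict` is a hypothetical helper, not a Kaggle or library function.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the most common label among the k training points nearest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy feature vectors, invented for illustration (real digit images have many pixels).
train_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (1.0, 0.9), (0.9, 1.0), (0.8, 0.8)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (0.15, 0.1)))
print(knn_predict(train_points, train_labels, (0.85, 0.9)))
```

Trying different values of `k` and comparing the results is a small-scale version of the experimentation the competition encourages.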
5. Contradictory, My Dear Watson
Start date: July 29, 2020
End date: Rolling leaderboard
In this Kaggle competition, participants are challenged to develop a computer model using Natural Language Processing (NLP) to understand how pairs of sentences relate to each other. The goal is to predict if one sentence implies, contradicts, or has no connection with the other.
You will work with a dataset that includes pairs of sentences in fifteen different languages, and your model must predict the relationship label for each pair, a task known as natural language inference (NLI). This is a good opportunity for beginners to explore NLP and machine learning and to build coding skills in the context of language and linguistics.
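The three-way labeling task can be sketched as a small data structure plus the simplest possible baseline, which always predicts the most frequent label in the training data. Everything here is an assumption made for illustration: the example sentences are invented, the field names `premise` and `hypothesis` follow common NLI terminology rather than the competition's own column names, and a real entry would use a trained language model instead of this baseline.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    ENTAILMENT = "entailment"        # the first sentence implies the second
    NEUTRAL = "neutral"              # the sentences have no clear connection
    CONTRADICTION = "contradiction"  # the sentences contradict each other


@dataclass(frozen=True)
class SentencePair:
    premise: str
    hypothesis: str
    relation: Relation


def majority_baseline(training_pairs):
    """Return the most frequent relation in the training data."""
    counts = Counter(pair.relation for pair in training_pairs)
    return counts.most_common(1)[0][0]


# Invented examples, not taken from the competition dataset.
pairs = [
    SentencePair("A man is playing a guitar.", "A man is making music.", Relation.ENTAILMENT),
    SentencePair("A man is playing a guitar.", "The man is asleep.", Relation.CONTRADICTION),
    SentencePair("A man is playing a guitar.", "The man is famous.", Relation.NEUTRAL),
    SentencePair("The shop is closed on Sundays.", "The shop is open every day.", Relation.CONTRADICTION),
]

print(majority_baseline(pairs).value)
```

A baseline like this gives you a floor to beat: any model worth submitting should be more accurate than always guessing the most common label.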
If you’re looking to build unique projects in the field of AI/ML, consider applying to Veritas AI!
Veritas AI was founded by Harvard graduate students, and through the programs, you get a chance to learn the fundamentals of AI and computer science while collaborating on real-world projects. You can also work 1-1 with mentors from universities like Harvard, Stanford, MIT, and more to create unique, personalized projects. In the past year, we had over 1000 students learn data science and AI with us. You can apply here!