Veritas AI

10 Data Science Competitions for High School Students

If you’re a high school student passionate about data science, competitions are one of the best ways to put that passion to work. They allow you to apply your theoretical knowledge to practical, real-world challenges. Many competitions will also hone your leadership, project management, and collaborative skills.

In this list, we have detailed 10 data science competitions for high school students. Note that while all of these competitions are data science centric, some are hackathons that require you to rely heavily on data science concepts!

1. Kaggle Big Data Bowl

Kaggle is a data science and AI platform that hosts contests, usually sponsored by large organizations, that offer some form of monetary prize. Its NFL Big Data Bowl 2023 is a competition where you use NFL data to generate novel, actionable stats.

You will have access to the ‘NFL’s Next Gen Stats’ dataset, including player tracking, play, game, and player information. Your role is to create new metrics and stats to analyze linemen on passing plays. The submissions are graded based on 5 categories - innovation, accuracy, relevance, clarity, and data visualization. As a bonus, you can win prizes of up to $10,000!

Location: Virtual

Eligibility: Open to everyone who has a Kaggle account. Note: If you are under 18, you will have to obtain parental permission and be approved by the sponsor to participate.

Registration and submission deadlines: To be announced for 2023-2024. Registration is likely to open around October 2023, and submissions are likely to close in January 2024 (based on previous dates).

2. NWEA Data Science for Everyone (DS4E)

The NWEA Data Science for Everyone (DS4E) Competition was created to increase diversity in the field of data science. In the competition, you and your team will receive data from the Programme for International Student Assessment (PISA), a study of 15-year-old students’ knowledge of mathematics, reading, and science.

In particular, you will use a subset of the full data on students’ mathematics assessment scores, along with a host of demographic and psychological variables. Your challenge is to predict mathematics assessment scores by building machine-learning models based on the demographic and psychological variables in the dataset.
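To give a feel for this kind of prediction task, here is a minimal sketch that fits a one-variable least-squares line and scores it with root-mean-squared error, a common regression metric. The feature (`study_hours`) and every number below are made up for illustration and are not taken from the PISA data. A real entry would likely use many variables and a dedicated machine-learning library.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rmse(actual, predicted):
    """Root-mean-squared error between two equal-length lists."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical rows: weekly study hours -> math score (made-up values).
study_hours = [2, 4, 6, 8, 10]
math_scores = [420, 455, 480, 510, 545]

slope, intercept = fit_line(study_hours, math_scores)
predictions = [slope * h + intercept for h in study_hours]
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
print(f"training RMSE: {rmse(math_scores, predictions):.2f}")
```

In practice you would measure the error on held-out data rather than on the training rows, since a model can look accurate on data it has already seen.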

The competition is held on Kaggle, and the winners of the competition (judged through the accuracy of their model) can win scholarships! 

Location: Virtual

Eligibility: Open to all high school students

Prizes: Winners can win scholarship money (amounts not specified). 

Registration deadline: Rolling registration

Competition Date: Late February - Mid-May (dates are not specified)

3. MachineHack’s Data-Centric AI Competition

MachineHack, in collaboration with Cleanlab, has launched the Data-Centric AI Competition, where you will predict the correct class for each datapoint in a test set using a machine learning classifier trained on a training dataset. The event has 2 components: in the first 5 weeks, you will analyze a text data set, and in the final 3 weeks, you will analyze image data!

For the 2023 edition, your text data set consists of reviews from Amazon and their ‘star’ rating. You will have to build and optimize a classification model that predicts the star rating given a review. On the other hand, the image dataset consists of alpha-numeric character images. You will have to build and optimize a classification model that predicts the character type given an image. 
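As a rough sketch of the text component, the snippet below trains a tiny bag-of-words Naive Bayes classifier that maps a review to a star rating. Naive Bayes is our choice for illustration, not a method prescribed by the competition, and the four reviews are invented placeholders rather than rows from the actual dataset.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def train(reviews):
    """reviews: list of (text, star_rating) pairs."""
    class_counts = Counter()            # number of reviews per rating
    word_counts = defaultdict(Counter)  # word frequencies per rating
    vocab = set()
    for text, stars in reviews:
        class_counts[stars] += 1
        for word in tokenize(text):
            word_counts[stars][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the rating with the highest log-probability for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # skip words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented example reviews (not from the competition data).
reviews = [
    ("great product works perfectly", 5),
    ("love it great value", 5),
    ("terrible quality broke quickly", 1),
    ("awful waste of money", 1),
]
model = train(reviews)
print(predict(model, "great value"))
print(predict(model, "terrible waste"))
```

Because the competition is data-centric, much of the work would likely go into finding and fixing mislabeled or noisy training examples rather than swapping in a fancier model.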

Location: Virtual

Eligibility: Open to all students who have a MachineHack account

Registration deadline: Rolling Registrations 

Competition Date: January 23rd - March 21st, 2023 (for both components of the competition). 

4. Data Crunch

Data Crunch is a team of 42-Paris students and a former finance teacher from ESSEC Paris. They scrape financial data and turn it into investment strategies using statistics and machine learning. Data Crunch runs weekly sprints where you can build models and submit your predictions to win cash prizes!

Each round comes with its own dataset, and the top 3 models will be selected to join Data Crunch’s Arena. If your model stays on top, you can win 200€ monthly!

Location: Virtual

Eligibility: Open to everyone. However, Data Crunch recommends that you have some knowledge in statistics and machine learning

Registration deadline: Rolling Registrations 

Competition Dates: Weekly

5. Bitgrit Competitions

Bitgrit is a platform where you can find online challenges to develop your skills in data science! For example, one of the active challenges is to develop an ML model to predict bird species using their attributes and geographical locations! 

In this challenge, a non-profit conservation society has taken up the task of tracking and estimating the population of a known species of bird. You are given 2 data sets, a training set and a test set, both containing bird data for locations 1-3. Using the birds’ genetic traits and location data, you will have to predict the species of bird that has been observed!
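One simple way to approach a species-prediction task like this is k-nearest neighbors: label a new bird by a majority vote among the most similar birds in the training set. The sketch below assumes two hypothetical numeric traits (wing and beak length) and made-up sightings. The real challenge data has its own columns, and a categorical field such as location would need encoding before it could be used in a distance.

```python
import math
from collections import Counter

def knn_predict(train_rows, query, k=3):
    """train_rows: list of (features, label); query: tuple of features."""
    # Sort training rows by Euclidean distance to the query point.
    by_distance = sorted(train_rows, key=lambda row: math.dist(row[0], query))
    # Majority vote among the k closest rows.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Made-up sightings: (wing_length_cm, beak_length_cm) -> species label.
sightings = [
    ((10.0, 2.0), "A"),
    ((10.5, 2.2), "A"),
    ((9.8, 1.9), "A"),
    ((15.0, 3.5), "B"),
    ((14.6, 3.3), "B"),
    ((15.4, 3.6), "B"),
]

print(knn_predict(sightings, (10.2, 2.1)))
print(knn_predict(sightings, (15.1, 3.4)))
```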

There are other active challenges such as a weather forecast challenge, so you can choose a challenge that is suitable for your skill level! 

Location: Virtual

Eligibility: Open to everyone

Registration deadline: Varies based on competition

Competition Date: Varies based on competition

6. Temple University’s OwlHacks

OwlHacks is Temple University’s annual, student-run hackathon. This year, the event is held in-person and will include workshops, panels, games and more! However, they also hold a competition where you can submit a project based on predetermined topics. For 2023, the tracks are: 

- Health and Wellness - With the Covid-19 pandemic highlighting the importance of good health, this track could focus on developing solutions related to improving healthcare accessibility and affordability, promoting healthy lifestyles, and preventing the spread of disease. For example, MD Insider uses machine learning to better match patients with doctors by collecting data from various institutions to analyze physician factors and pair patients with doctors who are able to meet their individual needs.

- Sustainable Communities - In recent years, there have been numerous incidents in the US that have put young people at risk. How can we utilize technology to enhance the safety of children, students, and young adults within our communities? With security systems in these schools using datasets and machine learning to assess threats, how do we make sure these systems are accurate and avoid profiling students? 

- Inclusive Education - How can we use technology to improve access to education and promote social justice for all students, regardless of their background or location? This track could focus on developing solutions related to online learning, language translation, adaptive technologies for students with disabilities, combating discrimination and prejudice, supporting marginalized communities, and promoting equal opportunities.

Location: Temple University’s College of Science and Technology building, Philadelphia

Eligibility: Open to everyone

Registration deadline: Rolling Registrations 

Competition Dates: September 23rd-24th, 2023

7. Driven Data Competitions

Driven Data works on projects that are at the intersection of data science and social impact, in areas such as development, health, education, research and conservation, and public services. They run online data science and machine learning competitions with social impacts, with projects from organizations like NASA, Microsoft, Meta AI, and The World Bank! 

Currently, Driven Data is holding a Research Rovers prize competition with NASA where you will prototype and demo an AI-based research assistant solution for the NASA workforce. Some tasks that the assistant must do are identifying seminal papers in particular domains, summarizing research results across different publication formats and standards, and more! 

Driven Data has similar active projects, so everyone can find a project that they’re interested in! For example, you can participate in a competition by Meta that focuses on content tracing, or you can use unsupervised machine learning to extract insights about older adult falls from emergency department narratives.

Location: Virtual 

Eligibility: Open to U.S. residents and citizens who are over 18 years of age or have reached the age of legal majority in their current country of residence

Registration deadline: Rolling Registrations 

Submission deadline: October 2nd, 2023

8. Saint Joseph’s University’s Analytics and Data Visualization Competition

This competition by Saint Joseph’s University runs for 6 weeks and culminates in a one-day event on campus at the Haub School of Business. In the competition, you will explore a data set, create visualizations from that data, and present your findings to the judges!

You will have 6 weeks to create your visualization, with support from an SJU mentor, on-demand visualization tutorials, and synchronous sessions to get feedback on your project!

Location: Virtual + Saint Joseph’s University Campus, Philadelphia

Eligibility: Open to all high school students who live in Philadelphia

Registration deadline: Rolling Registrations 

Competition Date: Tentatively January 2024 - March 2024, based on the 2023 edition.

9. CodaLab

CodaLab Competitions is an open source framework for running competitions that are typically about machine learning and data science. The topics of these competitions can range from comparing dictionaries and word embeddings to detecting partially fake audio. 

One active competition right now is the Alice Benchmarks competition. Alice Benchmarks aims to test domain adaptation methods when the source data is synthetic data. In this competition, there are 3 tasks - pedestrian re-identification, vehicle re-identification and scene semantic segmentation. For the object re-id, the source and target domains will be in completely different classes. Your role will be to retrieve the pedestrian instances of the same ID as the query image. If you are interested in learning more about this specific competition, here is the overview! 

CodaLab has similar competitions across all experience levels. It is a great opportunity to explore real-life applications of data science!

Location: Virtual

Eligibility: Open to everyone

Registration deadline: Varies based on competition

Competition Date: Varies based on competition

10. MathWorks Math Modelling Challenge

MathWorks’ Math Modelling Challenge is a free contest for high school juniors and seniors in the U.S. During the competition weekend, you will log in and gain access to a specific real-world problem, for which you will have to create a viable solution driven by math and data.

For 2023, the challenge was regarding the growing use of electric bikes. You had the choice of answering 3 questions: 

- Q1: The Road Ahead—Create a model to predict growth in e-bike sales. Predict the number of e-bikes that will be sold two and five years from now.

- Q2: Shifting Gears—In addition to being able to predict growth, it is often important to understand the underlying causes of that growth. For e-bikes, there is a lot of debate about what exactly led to the increase in their usage and/or sales. Some commonly cited reasons include: environmental awareness, gas prices, personal finances, health and exercise, and the “coolness factor” (increased visibility of others with e-bikes increases demand). Consider one or more factors that may have contributed to e-bike growth and use mathematical modeling to argue whether that factor (or factors) was a significant reason for the growth of e-bike usage.

- Q3: Off the Chain—As more people choose e-bikes as their preferred mode of transportation, there may be reduced usage of other modes of transportation, like gas vehicles and regular bikes. Quantify the resulting impacts on carbon emissions, traffic congestion, health and wellness, and/or other factors you deem important.
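For a question like Q1, one possible starting point is an exponential growth model fitted by least squares on the logarithm of sales. The sketch below is only an illustration under that assumption: the yearly figures are invented placeholders (not real e-bike sales), and a competitive paper would need to justify its model choice, for example by comparing it against a logistic curve that levels off.

```python
import math

def fit_exponential(years, sales):
    """Fit sales = a * exp(b * t), with t measured from the first year,
    using least squares on log(sales)."""
    t0 = years[0]
    ts = [y - t0 for y in years]
    logs = [math.log(s) for s in sales]
    n = len(ts)
    mean_t = sum(ts) / n
    mean_log = sum(logs) / n
    num = sum((t - mean_t) * (v - mean_log) for t, v in zip(ts, logs))
    den = sum((t - mean_t) ** 2 for t in ts)
    b = num / den                        # growth rate per year
    a = math.exp(mean_log - b * mean_t)  # fitted sales in the first year
    return a, b, t0

def forecast(model, year):
    """Predicted sales for a given calendar year."""
    a, b, t0 = model
    return a * math.exp(b * (year - t0))

# Invented placeholder data: sales (in thousands) growing 50% per year.
years = [2018, 2019, 2020, 2021, 2022]
sales = [100.0, 150.0, 225.0, 337.5, 506.25]

model = fit_exponential(years, sales)
for horizon in (2, 5):
    year = years[-1] + horizon
    print(f"{year}: about {forecast(model, year):.0f} thousand e-bikes")
```

Pure exponential growth cannot continue indefinitely, so a forecast five years out from a model like this should be treated with caution.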

Your submission should include a one-page executive summary of your findings, written as a brief to the head of the U.S. Department of Transportation, followed by your solution paper, for a maximum of 20 pages. You can look at the specifics of the challenge here.

Location: Virtual

Eligibility: 

- High schools in the U.S. (including US territories and DoDEA schools) are eligible. Schools with sixth form students (age 16-19) in England and Wales are eligible. International and exchange students may participate in the Challenge.

- A maximum of two (2) teams per school, each consisting of three (3) to five (5) students and one (1) teacher-coach, may register for the M3 Challenge. 

- Teams must consist of high school juniors and seniors or sixth form students (ages 16-19) from the same school; no exceptions will be made to allow underclassmen. For M3 purposes, your school is the school name that will appear on your diploma.

- Dual/joint enrollment programs, magnet programs, and other academic or training programs that draw students from more than one high school for a subset of classes or academic enrichment may be eligible, at the discretion of SIAM, if they meet all of the required criteria. You can view the detailed criteria here.

- Homeschool and cyber school students may either form their own team(s) under these guidelines or request to participate on a team at the school in the district or community in which they reside. Contacting the local school is the responsibility of the home or cyber schooled student.

Prizes: Winners receive scholarships that total up to $100,000.

Registration deadline: February 24th, 2023

Competition Date: March 3rd, 2023 - March 6th, 2023

If you’re looking to build unique projects in the field of AI/ML, consider applying to Veritas AI! 


Veritas AI was founded by Harvard graduate students, and through the programs, you get a chance to learn the fundamentals of AI and computer science while collaborating on real-world projects. You can also work 1-1 with mentors from universities like Harvard, Stanford, MIT, and more to create unique, personalized projects. In the past year, we had over 1000 students learn data science and AI with us. You can apply here!
