No Free Hunch

Every year, thousands of entrepreneurs launch startups, aiming to make it big. This journey and the perils of failure have been examined from many angles, from the risky decision to start the next iconic business to the demands of running your own startup. But while much has been written about startup survival, how do survival rates actually shake out when we look at empirical evidence? As it turns out, the U.S. Census Bureau collects data on business dynamics that can be used for survival analysis of firms and jobs. In this tutorial, we build a series of functions in Python to better understand business survival across the United States.

Today, we're excited to announce a new type of submission on Kaggle. Instead of an Id column, your next submission just might start with the words: import kagglegym. Thanks to our partner Two Sigma, we have launched our inaugural Code Competition: The Two Sigma Financial Modeling Challenge. For the first time, we are accepting and scoring the algorithms that create the numbers, instead of just the numbers themselves.

Kaggle users have created nearly 30,000 kernels on our open data science platform so far, which represents an impressive and growing amount of reproducible knowledge. In this blog post, I feature some great user kernels as mini-tutorials for getting started with mapping using datasets published on Kaggle. You'll learn several ways to wrangle and visualize geospatial data in Python and R, with real code examples and additional resources.

Does every painter leave a fingerprint? In the Painter by Numbers playground competition, Kagglers were challenged to identify whether pairs of paintings were created by the same artist. In this winner's interview, Nejc Ilenič describes his first-place convolutional neural network approach. The greatest testament to his final model's performance? It generally predicts greater similarity among authentic works by an artist than between those works and fraudulent imitations.

The Red Hat Predicting Business Value competition ran on Kaggle from August to September 2016. Over 2,000 teams competed to identify the potential customers with the most business value based on their characteristics and activities. In this interview, Darius Barušauskas (AKA raddar) explains how he earned his very first solo gold medal with a first-place finish. Now a Competitions Grandmaster after one year of competing on Kaggle, Darius shares his winning XGBoost solution plus words of wisdom for aspiring data scientists.

Challenge conventional wisdom about the American people, study over 100 years of global weather data, and uncover themes underlying creativity and innovation. We invite you to analyze some of the world's most interesting data, made available on Kaggle Datasets by the US Department of Commerce. Read more about these datasets, which were expertly prepared for analysis, and how you can get involved. We want to see what you create: authors of top kernels will receive our newest Kaggle swag.

Can daily news headlines be used to accurately predict movements in the stock market? This is the challenge put forth by Jiahao Sun in the dataset featured in this interview. Jiahao curated the Daily News for Stock Market Prediction dataset from publicly available sources, both for a course he's teaching on deep learning and natural language processing and to share with the Kaggle community.

On our open data analytics platform, you can find datasets on topics ranging from European soccer matches to full-text questions and answers about R published by Stack Overflow. Whether you're a researcher making your analyses reproducible or a hobbyist data collector, you may be interested in learning how you can get involved in open data publishing. In this blog post, I dive into the details of navigating open data publishing on Kaggle, where data and reproducible code live and thrive together in our community of data scientists.

Kagglers competed in the TalkingData Mobile User Demographics challenge to predict the gender of mobile users based on their app usage, geolocation, and mobile device properties. In this interview, Danijel Kivaranovic and Matias Thayer, whose team utc(+1,-3) finished third, describe how actively sharing their solutions and exchanging ideas in Kernels gave them a competitive edge with their Keras + XGBoost solution.

The ongoing Seizure Prediction competition, hosted by Melbourne University AES, MathWorks, and NIH, invites Kagglers to accurately forecast the occurrence of seizures using intracranial EEG recordings. In this blog post, you'll learn about the contest's potential to improve the lives of people with epilepsy, the outcomes of previous seizure prediction contests on Kaggle, and resources to help you get started in the competition, including a free temporary MATLAB license and starter code.

Eager to apply his skills in math and computer science to real-world data, David (AKA cactusplants) recently discovered the world of data science: "the perfect science". After eight competition finishes in the top 10% and a number of popular kernels, his portfolio quickly piqued the interest of his new employer, SeamlessML. In this interview, David, a Competitions Master, describes how his experience on Kaggle led him from third place in the Draper Satellite Image Chronology competition to his new role as a data scientist.