Overview
Preparing for a data science interview is hard: candidates can be asked a wide range of questions, often without much focus or guidance. Sometimes interviewers ask for hard-core coding on data structures and machine learning; sometimes they ask you to solve business problems using data and experiments; sometimes you have to work out probability and statistics questions; and sometimes you have to figure out the right way to query data with SQL.
Things can feel overwhelming, but stay calm: we can tackle data science interviews. Personally, I would prepare for the interview in the following three steps.
- Understand the industry and available positions
- Learn and review algorithms and data structures, machine learning, statistics, SQL, and business analytics.
- Prepare specifically for the upcoming phone/on-site interviews
People often focus more on the technical side when preparing for interviews. There is nothing wrong with that. Still, it helps to get a bigger picture of the business and the industry by reading online, talking to data science practitioners, and looking at and applying for a variety of jobs. Since data science is such a new field, one that did not exist six years ago, there are not many established standards and routines in the industry yet, so it is worth keeping a fresh understanding of trends and of the data science job market. In addition, networking is one of the best ways to get into data science, since people in this field tend to hire through referrals, informal meetings, and outreach more often than in other fields.
My Experience
Getting into the data science world is not easy. I have been dreaming of being part of the cool data science world since my senior year of college, and it took me years to finally get there.
To get there, I received academic training in math, statistics, and biostatistics. My adventure in data science started two years ago. I did research on parallel data simulations and missing data imputation, worked as a statistician for a biotech startup, and learned about machine learning and big data infrastructure at Georgia Tech, Galvanize, Udacity and BitTiger. Now I will be working on data aggregation and engineering at a B2B marketing analytics company.
Along the way, I applied and interviewed for data science positions, with lots of struggles. Sometimes you think you know regression, data structures, and algorithms, but actually you don't. The hard way to find that out is to do lots of interviews. Each failure is a new starting point. As the old saying goes, practice makes perfect.
For the detailed resources I used in my interview preparation, please see the Resources section.
Anyway, best of luck, data science fellows!
Resources
In this section, I will briefly introduce the resources and notes I used to prepare for data science interviews. The ones I used routinely during my preparation are marked with a ♡ symbol. Definitely check them out if you have not done so!
Coding Preparation
- Whiteboard
- You definitely need a whiteboard to practice coding
- Leetcode ♡
- The No. 1 online coding practice site; visit it daily while preparing for coding interviews
- The weekly coding contests are very good for both practice and rewards
- HackerRank
- Many companies use HackerRank for online coding challenges, so make sure you are familiar with its environment.
- Codewars
- SQLZoo
- RegexOne ♡
- Hands-on walk-through of regular expressions
- PostgreSQL Exercises
- Highly recommend using Postico ♡ to set up a local Postgres database for practice
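If you just want to drill SQL without installing anything, Python's built-in `sqlite3` module gives you an in-memory database. This is a minimal sketch, assuming SQLite is an acceptable stand-in for PostgreSQL (the dialects differ in types and some functions). The `users`/`orders` tables and their data are hypothetical; the query shows a common interview pattern: `LEFT JOIN` to keep users without orders, aggregate per group, then filter groups with `HAVING`.

```python
import sqlite3

# In-memory SQLite database, used only as a lightweight stand-in for Postgres.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical toy schema and data for practice.
cur.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
INSERT INTO users VALUES (1, 'US'), (2, 'US'), (3, 'CA'), (4, 'CA');
INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 30.0), (12, 2, 15.0), (13, 3, 40.0);
""")

# LEFT JOIN keeps users with no orders; COUNT(DISTINCT o.user_id) ignores
# their NULLs, so it counts only buyers. HAVING filters whole groups.
query = """
SELECT u.country,
       COUNT(DISTINCT o.user_id) AS buyers,
       COUNT(DISTINCT u.user_id) AS users,
       COALESCE(SUM(o.amount), 0) AS revenue
FROM users AS u
LEFT JOIN orders AS o ON o.user_id = u.user_id
GROUP BY u.country
HAVING COUNT(DISTINCT u.user_id) >= 2
ORDER BY revenue DESC;
"""
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

A good exercise is to predict each row before running it, then rewrite the query with an `INNER JOIN` and explain why the `buyers` and `users` columns change.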
Data Science/Statistics Preparation
- Awesome Data Science
- 108 Data Science Interview Questions by Company
- If you guys want to work on the solutions together, you can join me in the shared Google Doc
- Answers can be found here
- Probability Cheatsheet
- How to interpret p-value
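To make the p-value concrete, here is a minimal sketch of a two-sided, two-proportion z-test, the kind of calculation behind a simple A/B test on conversion rates. It assumes large samples (so the normal approximation holds) and uses the pooled proportion under the null hypothesis; the function name and the example numbers are hypothetical. The p-value it returns is the probability, assuming no true difference between A and B, of seeing a difference at least as extreme as the observed one. It is not the probability that the null hypothesis is true.

```python
import math

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference in conversion rates (pooled variance).

    Hypothetical helper for illustration; assumes large n_a and n_b.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis p_a == p_b.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 10% vs 12.5% conversion on 2,000 users per arm.
z, p = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

With these made-up numbers z is about 2.5 and the p-value is about 0.012, so the difference would be called significant at the 5% level. In an interview, be ready to explain why that still does not tell you how large or how important the effect is.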
Analytics Preparation
- Lean Analytics Workshop - Alistair Croll and Ben Yoskovitz
- Google the following concepts:
- Funnel analysis
- Cohort study
- SEO
- KPI
- User Segmentation
- Click-through rate, conversion rate, bounce rate, retention rate, churn
- Bucket analysis
- Lifetime value
- Channel
- Seasonality
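Several of the concepts above boil down to simple ratios, and interviewers often check that you pick the right denominator. Below is a minimal sketch under assumed definitions: the counts and field names are hypothetical, and real companies define "session", "bounce", and "active user" in their own ways, so always state your definitions first. Retention here compares one cohort of users between two periods, and churn is its complement for that same cohort.

```python
from dataclasses import dataclass

@dataclass
class FunnelCounts:
    # Hypothetical counts for one period; real definitions vary by company.
    impressions: int       # times an ad or link was shown
    clicks: int            # clicks on that ad or link
    sessions: int          # visits that landed on the site
    bounced_sessions: int  # sessions that left after a single page view
    signups: int           # sessions that converted (e.g. signed up)

def rate(numerator: int, denominator: int) -> float:
    """Ratio that returns 0.0 instead of raising when the denominator is 0."""
    return numerator / denominator if denominator else 0.0

def funnel_metrics(c: FunnelCounts) -> dict:
    return {
        "ctr": rate(c.clicks, c.impressions),
        "bounce_rate": rate(c.bounced_sessions, c.sessions),
        "conversion_rate": rate(c.signups, c.sessions),
    }

def retention_and_churn(active_start: set, active_end: set) -> tuple:
    """Share of the starting cohort still active at the end, and its complement."""
    if not active_start:
        return 0.0, 0.0
    retained = len(active_start & active_end)
    retention = rate(retained, len(active_start))
    return retention, 1.0 - retention

metrics = funnel_metrics(FunnelCounts(10_000, 250, 200, 80, 10))
retention, churn = retention_and_churn({1, 2, 3, 4}, {2, 3, 5})
print(metrics, retention, churn)
```

Note that user 5 is new in the second period, so it does not count toward retention; mixing new users into the numerator is a classic mistake.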
Books
- Data Science Interviews Exposed ♡
- A very good book containing real data science interview questions with detailed answers.
- Be aware that there are minor mistakes in some calculations.
- A collection of Data Science Interview Questions Solved in Python and Spark: Hands-on Big Data and Machine Learning (A Collection of Programming Interview Questions) (Volume 6)
- A collection of Advanced Data Science and Machine Learning Interview Questions Solved in Python and Spark (II): Hands-on Big Data and Machine … Programming Interview Questions) (Volume 7)
- The Elements of Statistical Learning
- An Introduction to Statistical Learning ♡
- Cracking the Coding Interview
- Grokking Algorithms: An illustrated guide for programmers and other curious people
- Lean Analytics: Use Data to Build a Better Startup Faster
School
Online
- Udacity
- A/B testing ♡ (Taught by Googlers, highly recommended)
- Coursera
- Machine Learning
- Functional Programming Principles in Scala
- Stanford
- Blogs
- web analytics
- RNN
- CNN
- Udacity
Bootcamp
- Galvanize
- Data Incubator
- Insight Data Science
Master's Degree
There are many choices: a lot of schools offer one-year or two-year MS programs in data science, data analytics, computer science, machine learning, or statistics, all of which are related to data science. My advice is to consider them if you do not have training or experience in computer science or statistics, and you really want to learn the methodologies in depth together with their applications. The cost of an MS degree is not negligible, and the data science field is evolving quickly. There is no guarantee that you will learn the latest data science and machine learning technologies, or that you will find the data science job you want afterward.
Data Science Challenges
- Kaggle
- Yelp Challenge
- Udacity Didi Challenge
- Other data team challenges and hackathons
Python 101
- https://www.edx.org/course/introduction-python-data-science-microsoft-dat208x-7
- https://python101.pythonlibrary.org/
- http://interactivepython.org/runestone/static/pythonds/index.html
Coding/Debugging
- Check variable names (spelling) and other typos
- Check base/corner cases (when the input is None, [], -1,…)
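As a small illustration of checking base and corner cases, here is a sketch of a common warm-up question, finding the second-largest distinct value in a list. The function name and its "return None when there is no answer" contract are assumptions for illustration. The guards handle `None`, `[]`, and single-element input up front. The loop compares against `None` explicitly rather than relying on truthiness, so values such as `0` and negative numbers like `-1` are handled correctly.

```python
from typing import List, Optional

def second_largest(nums: Optional[List[int]]) -> Optional[int]:
    """Return the second-largest distinct value, or None if it does not exist."""
    # Base/corner cases first: None input, empty list, single element.
    if not nums or len(nums) < 2:
        return None
    largest: Optional[int] = None
    second: Optional[int] = None
    for x in nums:
        # Use "is None" rather than "not largest": 0 is falsy but a valid value.
        if largest is None or x > largest:
            second, largest = largest, x
        elif x != largest and (second is None or x > second):
            second = x
    return second

for case in (None, [], [5], [3, 3], [0, -1], [-1, -5, -3], [3, 1, 2]):
    print(case, "->", second_largest(case))
```

Walking through a table of inputs like this on the whiteboard, before and after writing the loop, is a quick way to catch the typos and missing cases listed above.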