Zhifan (Jeff) Sang
Data Scientist
- Email: zfsang@gmail.com
- Blog: zfsang.github.io
- GitHub: zfsang
Summary
I am an enthusiastic data scientist with a strong statistics background
who has built multiple data-driven projects and packages.
I am passionate about building high-quality predictive models and
user-friendly analytics web apps with modern machine learning and web technologies.
Being conversant in data engineering, predictive analytics, experiment design and inference, and visualization,
I am able to execute end-to-end on a product or project.
Experience
Data Engineer at Radius Intelligence
May 2017 - Present
– Worked on data crawling, ETL, and QA analysis to process TB-scale data via batch/streaming processes
– Designed, researched, and developed a business-related entity extraction pipeline on webpages with natural language processing (NLP) models and human evaluations
– Researched and developed big data warehouse solutions with Spark/Hive, and collaborated across teams to make the data highly available internally for business analytics and modeling
Self-Driving Car Mentor at Udacity Enterprise
January 2017 - August 2017
– Provided technical support in computer vision and machine learning, and one-on-one mentorship to students
– Built and optimized deep learning models for traffic sign recognition, lane finding, vehicle tracking,
and driving behavior cloning with sklearn, cv2, Keras, and TensorFlow in Python
Statistician at Assurex Health
June 2016 - December 2016
– Provided statistical support for precision medicine based on genetic testing, managed 1.2M medical
claim records (EHR), and predicted adherence with MSE=0.08 using an RNN
– Built an interactive meta-analysis web app specializing in genetic association analysis in R Shiny and
data processing tools in Python, which reduced analysis time by 30% for science teams
– Improved the quality of consult calls by identifying keywords with medical marketing teams
Research Assistant at [Emory University](http://www.emory.edu)
April 2015 - May 2016
– Implemented parallel MCMC to distribute computation and eliminate network traffic, with a 3x speedup
– Developed an R package for missing data imputation, which was widely used in the department
– Simulated protein images using spatial autoregression and tested image density with ANOVA
Skills & Expertise
These are languages, tools, and practices to which I have had exposure over the
past 3 years or so. Those things which enjoy routine usage in my daily work are
denoted with a ^†^ symbol.
Programming Languages
- Python^†^
- R^†^
- JavaScript
Frameworks & APIs
Software & Tools
- Amazon Web Services
- Git^†^
- Mac OS X^†^
- MongoDB^†^
- MySQL
- Nginx
- PostgreSQL
- Sublime Text
- tmux^†^
- Ubuntu Linux
- Vim^†^
Education
Emory University,
MS, Biostatistics, 2014-2016
Nankai University,
BS, Statistics, 2010-2014
Honors and Awards
Interests
- Photography
- Marathon/Triathlon
- Backpacking/Trekking
©2017 Zhifan (Jeff) Sang. All rights reserved.