about

Zhifan (Jeff) Sang

Data Scientist

Summary

I am an enthusiastic data scientist with a strong statistics background
who has built multiple data-driven projects and packages.
I am passionate about building high-quality predictive models and
user-friendly analytics web apps with modern machine learning and web technologies.
Being conversant in data engineering, predictive analytics, experiment design and inference, and visualization,
I am able to execute end-to-end on a product or project.

Experience

Data Engineer at Radius Intelligence

May 2017 - Present

– Worked on data crawling, ETL, and QA analysis to digest terabyte-scale data via batch and streaming processes
– Designed, researched, and developed a pipeline for extracting business-related entities from webpages using natural language processing (NLP) models and human evaluation
– Researched and developed big data warehouse solutions with Spark/Hive, and collaborated across teams to make the data highly available internally for business analytics and modeling

Self-Driving Car Mentor at Udacity Enterprise

January 2017 - August 2017

– Provided technical support in computer vision and machine learning, and one-on-one mentorship to students
– Built and optimized deep learning models for traffic sign recognition, lane finding, vehicle tracking,
and driving behavior cloning with sklearn, cv2, Keras, and TensorFlow in Python

Statistician at Assurex Health

June 2016 - December 2016

– Provided statistical support for precision medicine based on genetic testing, managed 1.2M medical
claim records (EHR), and predicted adherence with MSE = 0.08 using an RNN
– Built an interactive meta-analysis web app specialized in genetic association analysis in R Shiny and
data processing tools in Python, which reduced analysis time by 30% for the science teams
– Improved the quality of consult calls by identifying keywords with medical marketing teams

Research Assistant at [Emory University](http://www.emory.edu)

April 2015 - May 2016

– Implemented parallel MCMC to distribute computation and eliminate network traffic, with a 3x speedup
– Developed an R package for missing data imputation, which was widely used in the department
– Simulated protein images using spatial autoregression and tested image density with ANOVA

Skills & Expertise

These are the languages, tools, and practices I have worked with over the
past three years or so. Those I use routinely in my daily work are
marked with a ^†^ symbol.

Programming Languages

Frameworks & APIs

Software & Tools

Education

Emory University,
MS, Biostatistics, 2014-2016

Nankai University,
BS, Statistics, 2010-2014

Honors and Awards

Interests

  • Photography
  • Marathon/Triathlon
  • Backpacking/Trekking

©2017 Zhifan (Jeff) Sang. All rights reserved.