About

Hi there! I am a research data scientist at NYU Langone Medical Center with a strong interest in machine learning and statistics, particularly their applications in healthcare. My expertise spans both the theoretical foundations and the practical application of machine learning and deep learning. Driven by a passion for extracting insights from complex datasets, I aim to apply my skills to impactful data-driven research, open-source tool development, and innovation across industries.

Skills

Statistics

Machine Learning

AI in Healthcare

Natural Language Processing

Bayesian Inference

Visualization

Data Engineering

Big Data

Education

MS in Data Science

September 2022 - Present
Relevant Coursework
  • Optimization and Computational Linear Algebra
  • Machine Learning
  • Big Data
  • Bayesian Machine Learning
  • Natural Language Processing and Representation Learning

B.S.M. in Quantitative Finance

September 2017 - June 2021
Relevant Coursework
  • Mathematical Statistics
  • Derivatives Pricing
  • Financial Technology
  • Advanced Calculus



Publications

Predicting Risk of Alzheimer's Diseases and Related Dementias with AI Foundation Model on Electronic Health Records

Structural Equation Modeling of the Marine Ecological System in Nanwan Bay Using SPSS Amos

Professional Experience

Berkley Center for Entrepreneurship, NYU Stern

June 2023 - May 2024

Data Engineer

  • Constructed a cloud database integrating multiple data sources to monitor the engagement and growth progress of NYU-affiliated startups, resulting in a 59-fold increase in data collection size
  • Designed and implemented Mage.ai ETL data pipelines in Python from scratch to preprocess and cross-reference founder and startup information from diverse sources, culminating in the deployment of a public-facing Tableau dashboard
  • Assisted the Dean of the Stern School of Business and the Director of the Berkley Center in completing the annual performance report

Pinkoi

July 2021 - January 2022

Data Scientist Intern - Product Team

  • Modeled quantitative and qualitative data with regression and machine learning methods for user behavior research, and ran A/B tests to optimize e-commerce web and app platforms
  • Built a user road map from exhaustive exploratory data analyses and hierarchical regression models to support business and product teams' understanding of the different phases of user behavior, increasing the YOY user conversion rate by 27% in six months
  • Supported product feature design and key objective planning within a 12-person team consisting of engineers, product designers, and a project manager
  • Organized and taught a series of SQL and data visualization sessions for more than 60 colleagues from the sales and product teams

Research Experience

NYU Langone Health

June 2023 - Present

Research Data Scientist

  • Leveraged time-series Electronic Health Records data to build a Generative Pre-trained Transformer (GPT) model for early disease detection through classification learning and disease-specific fine-tuning
  • Applied the XPos rotary positional embedding method and experimented with the Mamba model to address the extrapolation challenge of LLMs, improving the precision-recall score by 0.19
  • Reweighted token representations to reduce the impact of chronic diseases on inference performance
  • A pre-print of our findings is forthcoming; models will be open-sourced.

National Kaohsiung University of Science and Technology

April 2022 - September 2022

Project Assistant

  • Worked with a multidisciplinary team; co-authored a research paper published in the journal Sustainability
  • Proposed the use of structural equation modeling (SEM) to explore causal relationships between environmental and marine-life factors
  • Applied principal component analysis (PCA) and factor analysis to identify latent factors in marine ecosystems
  • Discovered that upwelling currents have a limited direct impact on marine life, challenging previous beliefs
  • Participated in drafting and iterative revision of the manuscript, particularly during the peer review process

Department of Quantitative Finance, NTHU

June 2020 - July 2021

Research Assistant

  • Processed stand-alone CSR reports and MSCI KLD scores in Python by conducting web scraping, data cleansing, and textual analysis
  • Reviewed and presented more than 20 academic papers on topics such as NLP analysis in finance and the influence of CSR performance on firm financial performance

Projects

Bayesian Optimization for Discrete Choice Model Likelihood Estimation

Predicting Stock Price Movements Using Daily News

Bayesian Posterior Approximation

Music Recommendation Systems

Modeling Priors in Bayesian Inference: Insights from a Number Game Experiment

Hedging Climate Change News

Analysis on Health Data for Medical Insurance

Contact

My Address

1275 E University Dr

Unit 212

Tempe, AZ 85281

Social Profiles

Email

allen.sl.huang@gmail.com

sh7008@nyu.edu

Phone

+1 480-401-8112
