Machine Learning Projects

I have completed three machine learning projects. The first two are unrelated to biology, and the third is in synthetic biology.


Next housing market crash

download

This project aimed to identify cities that can serve as early warning signs of a future housing market crash. I identified the ~250 cities that responded earliest during the 2007 housing market crash by computing, for each city, the time point at which its median house price reached its maximum (the tipping point). Gaussian Mixture Model clustering showed that these cities had two characteristics in common: small populations and low growth rates. As of October 2016, 34% of the cities had already reached their tipping point (maximum median house price) and their median house prices had started to decrease. I interpreted this fraction as the risk level of the next housing market crash: 34%. In addition, I identified the most investable cities in the US: Bellevue and Redmond in Washington, and Palo Alto, Cupertino, Sunnyvale, San Francisco, San Mateo, Redwood City, Mountain View, South San Francisco, Daly City, and Montebello in California.
Data used in this project:
The historical median house prices by city (downloaded from here) were obtained from Zillow Research.
The city populations for 2010 to 2014 were obtained from census.gov.
Algorithm used: Gaussian Mixture Model.
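
The tipping-point idea described above can be illustrated with a minimal sketch. It is not the project's actual code: the city names, months, and prices below are invented toy values, and the function names (`tipping_point`, `has_tipped`, `tipped_fraction`) are hypothetical. It only shows one plausible way to locate each city's price maximum and to compute the share of cities that are already past it. The Gaussian Mixture Model clustering step is not shown.

```python
# Hypothetical toy data: monthly median house prices (in $1000s) per city.
# The real project used Zillow Research data; these numbers are invented.
MONTHS = ["2016-06", "2016-07", "2016-08", "2016-09", "2016-10"]
PRICES = {
    "City A": [300, 310, 320, 315, 305],  # peaked in 2016-08, then fell
    "City B": [200, 205, 210, 215, 220],  # still rising at the last month
    "City C": [150, 160, 155, 150, 148],  # peaked in 2016-07, then fell
}


def tipping_point(prices):
    """Return the index of the maximum price (first occurrence on ties)."""
    return max(range(len(prices)), key=lambda i: prices[i])


def has_tipped(prices):
    """Assumption: a city has 'tipped' if its maximum is not the last observation."""
    return tipping_point(prices) < len(prices) - 1


def tipped_fraction(price_table):
    """Share of cities whose median price is already past its maximum."""
    tipped = sum(has_tipped(p) for p in price_table.values())
    return tipped / len(price_table)


for city, prices in PRICES.items():
    peak_month = MONTHS[tipping_point(prices)]
    print(city, "peak:", peak_month, "tipped:", has_tipped(prices))
print("fraction tipped:", round(tipped_fraction(PRICES), 2))
```

With this toy table, two of the three cities are past their peak. On real data, short-term noise would probably need smoothing before taking the maximum, which this sketch leaves out.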

Gold price prediction

[github]
Data used in this project: Yahoo Finance
Algorithm used: linear regression

Parameter inference for synthetic gene circuits (stochastic dynamical systems)

[github]
This project was carried out with my undergraduate student, Keagan Moo.

Algorithm used: Approximate Bayesian Computation (ABC)
We aimed to infer the system parameters of gene regulatory networks from stochastic trajectories of the observed signals (fluorescence intensity over time). The ABC method systematically takes the parameter correlation matrices into account.
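
To give a feel for how ABC works on a stochastic system, here is a minimal rejection-sampling sketch. It is a toy illustration under stated assumptions, not the project's method: the model is a simple birth-death process (production at rate `k`, degradation at rate `G * n`) standing in for a gene circuit, the copy number stands in for fluorescence intensity, the summary statistic is the time-averaged copy number, and the prior, tolerance `eps`, and all names are hypothetical. It does not use parameter correlation matrices.

```python
import random

G = 1.0       # degradation rate, assumed known (hypothetical value)
T_END = 20.0  # length of each simulated trajectory (arbitrary time units)


def simulate_time_average(k, rng):
    """Gillespie simulation of a birth-death process starting from n = 0.

    Production occurs at rate k and degradation at rate G * n.
    Returns the time-averaged copy number over [0, T_END].
    """
    t, n, area = 0.0, 0, 0.0
    while True:
        total = k + G * n                # total event rate (always > 0 since k > 0)
        dt = rng.expovariate(total)      # waiting time until the next event
        if t + dt >= T_END:
            area += n * (T_END - t)
            return area / T_END
        area += n * dt
        t += dt
        if rng.random() * total < k:     # choose production or degradation
            n += 1
        else:
            n -= 1


def abc_rejection(observed, n_draws, eps, rng, prior=(1.0, 30.0)):
    """Keep prior draws of k whose simulated summary lies within eps of the data."""
    accepted = []
    for _ in range(n_draws):
        k = rng.uniform(*prior)          # draw a candidate from a uniform prior
        if abs(simulate_time_average(k, rng) - observed) < eps:
            accepted.append(k)
    return accepted


rng = random.Random(0)
observed = simulate_time_average(10.0, rng)  # stand-in for measured data (true k = 10)
posterior = abc_rejection(observed, n_draws=1000, eps=1.0, rng=rng)
posterior_mean = sum(posterior) / len(posterior) if posterior else float("nan")
print("accepted draws:", len(posterior), "posterior mean of k:", posterior_mean)
```

The accepted values of `k` approximate the posterior under the chosen summary statistic and tolerance. A smaller `eps` gives a closer approximation at the cost of fewer accepted draws.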