
Predicting Real Estate Prices in New York State: A Machine Learning Approach

In this project, I developed a machine learning model to predict transaction prices for real estate properties in New York State. Using regression models and extensive data analysis, I aimed for a mean absolute error (MAE) below the project’s win condition of $70,000.

The project involved comprehensive data cleaning, exploratory data analysis, and feature engineering to ensure high-quality input data for the models. Various regression algorithms, including Lasso, Ridge, Elastic Net, Random Forest, and Gradient Boosting Trees, were trained and evaluated for their prediction performance.

With careful feature engineering and preprocessing, including standardization and outlier handling, the models predicted real estate prices accurately. Random Forest was the top performer, achieving the highest R² score and the lowest MAE among the models evaluated.
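As a rough illustration of the two metrics used to rank the models, here is a minimal sketch in plain Python. The prices are made-up numbers and the function names are my own for this example, not code from the project; only the $70,000 win condition comes from the project itself.

```python
# Illustrative only: the prices below are invented, not project data.

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted prices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Share of the variance in actual prices explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

WIN_CONDITION_MAE = 70_000  # the project's target, in dollars

actual = [300_000, 450_000, 520_000, 610_000]
predicted = [320_000, 430_000, 560_000, 600_000]

mae = mean_absolute_error(actual, predicted)
r2 = r2_score(actual, predicted)
meets_win_condition = mae < WIN_CONDITION_MAE

print(f"MAE: ${mae:,.0f}  R^2: {r2:.3f}  win condition met: {meets_win_condition}")
```

MAE is reported in dollars, so it can be compared directly with the win condition, while R² is unitless and is mainly useful for comparing models against each other on the same data.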

These results support more accurate, data-driven pricing in the real estate industry. The model can help real estate investment firms optimize pricing decisions, assess risk, and reduce the cost of manual appraisals. The next steps are to deploy the winning model, monitor it continuously, incorporate external data sources, evaluate model robustness, and work with stakeholders on further improvements.

To explore the project in detail, please visit the ML-real-estate-prediction GitHub repository.

Developing Metrics for Structural-Level Discrimination: A Data Science Project Focused on LGBTQ+ Populations

In this data science project, I took on the challenge of quantifying discrimination using real-world data. Structural stigma, which encompasses discrimination embedded in societal-level conditions, cultural norms, and institutional policies, has a significant impact on LGBTQ+ individuals in the U.S. Established methods exist for measuring individual and interpersonal stigma, but structural stigma is harder to quantify because of its complex macro-social nature.

To address this, I designed an approach based on Confirmatory Factor Analysis, drawing on variables from Project Implicit, an open data set, that capture both explicit and implicit attitudes toward LGBTQ+ individuals. This allowed me to create an index of county-level structural stigma. Visualized on a U.S. map, the resulting factor shows how structural stigma varies across the country. It helps clarify the macro-social context and the pervasive challenges that LGBTQ+ populations in the U.S. encounter in their day-to-day lives.