Kiel Dang


💻 Experienced ML Engineer. Previously contributed to projects at Hitachi Vantara, Definity Insurance, Northeastern University, and Alibaba E-commerce.

🏫 PhD student in AI, specializing in LLMs, based in New York, USA. πŸŽ“ Valedictorian of Master in DS, Machine Intelligence (Fall 2021 Cohort) with an overall GPA of 4.0.

View My LinkedIn Profile

View My Medium Blog

Return to my GitHub

My highlighted Data Science projects.


📘 Deep Learning and AI


✨ Construction Safety - Object Detection

This project is designed to identify unsafe holes on construction sites, helping to ensure the well-being of workers and the integrity of job sites. When combined with a Personal Protective Equipment (PPE) detection model, it forms a robust safety monitoring system deployed on Jetson-based edge inference systems.

🛠️ Workflow and Tech Stack: To train this model, I adopted an iterative training process. Data preparation and deployment were accomplished with Roboflow, while model customization took place in Google Colab.

🔬 Techniques and Strategies: Transfer learning, hyperparameter tuning, training and comparing multiple deep learning algorithms (YOLO, DETR, RCNN, COCO, UNet), iterative training, and model refinement.

Python Google Colab Roboflow RCNN YOLO DETR NVIDIA

View project on Github

View project on Google Colab


✨ Customer Service Chatbot with In-Context Learning

This project explores AI-driven customer service chatbots enhanced by in-context learning. It uses LlamaIndex and a language model API to build a chatbot that understands and responds to customer inquiries effectively, changing the way businesses provide support.
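As a rough illustration of in-context learning, the sketch below picks the most relevant snippets for a question and places them in the prompt sent to a language model. It is a simplified stand-in, not the project's code and not the LlamaIndex API: the `build_prompt` helper, the word-overlap ranking, and the FAQ entries are all hypothetical.

```python
def build_prompt(question, documents, top_k=2):
    """Select the top_k documents sharing the most words with the question
    and embed them in the prompt as context for the language model."""
    q_words = set(question.lower().split())

    def overlap(doc):
        # Naive relevance score: number of words shared with the question.
        return len(q_words & set(doc.lower().split()))

    ranked = sorted(documents, key=overlap, reverse=True)[:top_k]
    context = "\n".join(f"- {doc}" for doc in ranked)
    return (
        "Answer the customer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Customer: {question}\n"
        "Agent:"
    )


# Hypothetical knowledge base, for illustration only.
faq = [
    "Refunds are processed within 5 business days.",
    "Our support line is open from 9am to 5pm.",
    "Shipping is free for orders over 50 dollars.",
]
prompt = build_prompt("How long do refunds take?", faq, top_k=1)
```

The resulting `prompt` string would then be passed to whichever language model API is in use; a real system would typically replace the word-overlap score with embedding-based retrieval.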

🔍 Project Highlights:


View project on Github


📘 Data Science and Machine Learning


✨ Dune Series Network Analysis and Community Detection

This project delves into the captivating “Dune” book series by Frank Herbert using advanced data analysis techniques. By harnessing natural language processing and network science, we uncover the intricate web of character relationships and communities within this iconic science fiction universe.

🔍 Project Highlights:

View project on Github
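One common way to build a character network like the one in the Dune project is to count how often two characters appear in the same sentence. The sketch below shows that idea under stated assumptions: the `cooccurrence_edges` helper and the toy sentences are hypothetical, and plain substring matching stands in for proper named-entity recognition.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences, characters):
    """Count how often each pair of characters appears in the same sentence."""
    edges = Counter()
    for sentence in sentences:
        # Sort names so each pair is always stored in the same order.
        present = sorted(name for name in characters if name in sentence)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


# Toy sentences written for this example, not quotes from the books.
sentences = [
    "Paul spoke with Jessica before dawn.",
    "Jessica warned Paul about the Baron.",
    "The Baron plotted alone.",
]
edges = cooccurrence_edges(sentences, ["Paul", "Jessica", "Baron"])
```

The weighted pairs in `edges` can then be loaded into a graph library as weighted edges, which is the usual starting point for community detection.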


✨ CV and Job Matching

This application predicts the matching percentage of a candidate’s resume to a job posting. It uses the Doc2Vec model, which represents job descriptions and resumes as numerical vectors. Doc2Vec extends Word2Vec to whole documents: its two training modes, Distributed Memory (PV-DM) and Distributed Bag of Words (PV-DBOW), are analogous to Word2Vec’s Continuous Bag-of-Words (CBOW) and Skip-Gram, and the resulting vectors make it efficient to compare documents and compute their similarity. The trained model can be easily deployed and hosted online (Azure), providing a convenient solution for matching CVs with job postings. Note: The algorithm serves as the first step in a use-case scenario where a company receives multiple job applications for various job postings. The second step involves employing the modified Gale-Shapley algorithm to rank candidates for each job and select the best match.
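To make the matching percentage concrete, here is a minimal sketch that compares two document vectors. It assumes cosine similarity is the comparison measure and that negative similarities are reported as 0%; the `match_percentage` helper and the example vectors are hypothetical, not the application's actual code.

```python
import math


def match_percentage(resume_vec, job_vec):
    """Cosine similarity between two document vectors, scaled to 0-100."""
    dot = sum(a * b for a, b in zip(resume_vec, job_vec))
    norm = math.sqrt(sum(a * a for a in resume_vec)) * math.sqrt(
        sum(b * b for b in job_vec)
    )
    if norm == 0:
        # A zero vector carries no information, so report no match.
        return 0.0
    # Clamp negative similarity to 0 so the result reads as a percentage.
    return round(max(dot / norm, 0.0) * 100, 2)


# Hypothetical 3-dimensional vectors; real Doc2Vec vectors are much longer.
score = match_percentage([0.2, 0.8, 0.1], [0.25, 0.7, 0.0])
```

In practice the two vectors would come from the trained Doc2Vec model, one inferred from the resume and one from the job posting.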


View code on Github


✨ Sales forecasting using SARIMAX (Industry best practices)

This project follows industry best practices for time series problems. Key steps include checking for stationarity, transforming the data, decomposing the series into its components, detecting anomalies, checking for white noise, identifying model orders, and measuring performance. The goal is to provide accurate sales forecasts for Walmart superstore and facilitate data-driven decisions.
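Differencing is a common transformation for making a series stationary before fitting a SARIMAX model. The sketch below is illustrative only, with a hypothetical `difference` helper and made-up sales figures: a seasonal difference (the seasonal order D) removes the repeating pattern, and a first difference (the order d) removes what is left of the trend.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag.

    lag=1 removes a linear trend; lag=s removes a seasonal pattern of period s.
    """
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Made-up sales with an upward trend and a seasonal pattern of period 2.
sales = [10, 20, 12, 22, 14, 24]

seasonal = difference(sales, lag=2)   # seasonal pattern removed: [2, 2, 2, 2]
stationary = difference(seasonal)     # remaining trend removed: [0, 0, 0]
```

On real data the differenced series would then be checked with a stationarity test before choosing the remaining model orders.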

View code on Github


✨ Automated Text Data Extraction and Form Filling System

This project introduces an innovative solution for automating text data extraction and form filling, aiming to streamline data processing in the digital age. Leveraging a combination of OCR, natural language processing, and rule-based approaches, it offers an efficient way to extract information from unstructured text and populate forms accurately, saving time and reducing errors.
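The rule-based part of such a pipeline can be sketched with regular expressions applied to text that OCR has already produced. Everything here is an assumption made for illustration: the field names, the patterns, the `fill_form` helper, and the sample text are hypothetical and not taken from the project.

```python
import re

# Hypothetical extraction rules: one pattern per form field.
FIELD_PATTERNS = {
    "name": re.compile(r"Name:\s*(.+)"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def fill_form(text):
    """Apply each rule to the text; fields with no match are left as None."""
    form = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            form[field] = None
        else:
            # Use the capture group when the pattern defines one.
            value = match.group(1) if pattern.groups else match.group(0)
            form[field] = value.strip()
    return form


# Sample text standing in for OCR output.
ocr_text = "Name: Jane Doe\nEmail: jane.doe@example.com\nSigned on 2024-01-15"
form = fill_form(ocr_text)
```

Leaving unmatched fields as `None` keeps missing data visible, so a reviewer or an NLP fallback can fill the gap instead of a wrong value slipping through.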

🔍 Project Highlights:


View project on Github


✨ Explainable Machine Learning - Understand the Black-Box

Interpretable Machine Learning (ML) is a critical aspect of advancing the use of machine learning in various fields. Many black box models hinder ML’s adoption due to their lack of transparency and interpretability. The Jupyter Notebook in this repository includes the following sections:

Python Scikit-learn Lime Shap

View project on Github
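The SHAP library listed among the tools for the explainability project builds on Shapley values, which attribute a prediction to individual features. As a hedged illustration of the underlying idea (not the SHAP API and not the notebook's code), the sketch below computes exact Shapley values for a tiny hypothetical model by averaging each feature's marginal contribution over every feature ordering.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over all orderings in which features switch from baseline to instance."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = instance[i]
            value = predict(current)
            totals[i] += value - previous
            previous = value
    return [total / len(orderings) for total in totals]


def toy_model(x):
    # Hypothetical model with an interaction between features 0 and 2.
    return 2 * x[0] + 3 * x[1] + x[0] * x[2]


phi = shapley_values(toy_model, instance=[1, 1, 1], baseline=[0, 0, 0])
# phi == [2.5, 3.0, 0.5]: the interaction term is split evenly between features 0 and 2.
```

The attributions sum to the difference between the prediction for the instance and for the baseline. Enumerating all orderings is only feasible for a handful of features, which is why practical tools rely on approximations.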


📘 MLOps


✨ Salary Prediction Application

This application predicts the salary of software engineers based on key pieces of information. It features two sections: a prediction page for salary prediction and an exploration page for EDA insights from the dataset. The predictions are generated using an XGBoost model, while the web app is built with the Streamlit framework. To ensure reproducibility, the app runs in virtual environments on local hosts and is containerized with Docker. It is also deployed on GCP. A video guide on how to use the application is also available.

View code on Github


📘 Data Analysis and Business Intelligence


✨ Walmart Ecommerce Dashboard Project

This project showcases the creation of an interactive Ecommerce Dashboard for Walmart using Power BI. The goal was to analyze and visualize key performance indicators (KPIs) to gain insights into sales, revenue, customer behavior, and more. The project followed a structured approach, encompassing defining KPIs, working with raw data in SQL Server for efficient manipulation, building SQL queries for validation, connecting Power BI for visualization, and utilizing Power Query for data cleaning. By incorporating calculated measures and time intelligence functions, the dashboard provides a comprehensive overview of Walmart’s ecommerce operations. The project follows a standard pipeline in BI and DA, starting from database to transformation and visualization.
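A validation query of the kind mentioned above recomputes a KPI directly from the raw tables so the figure shown in the dashboard can be cross-checked. The sketch below uses an in-memory SQLite database as a stand-in for SQL Server; the `orders` table, its columns, and its rows are hypothetical.

```python
import sqlite3

# In-memory database standing in for the SQL Server source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER, category TEXT, quantity INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "Grocery", 2, 5.0), (2, "Grocery", 1, 10.0), (3, "Toys", 3, 4.0)],
)

# Revenue per category, to be compared with the dashboard's measure.
rows = conn.execute(
    """
    SELECT category, SUM(quantity * unit_price) AS revenue
    FROM orders
    GROUP BY category
    ORDER BY category
    """
).fetchall()
conn.close()
```

If the totals in `rows` disagree with the calculated measure in the report, the discrepancy points to a problem in the cleaning or modeling steps rather than in the source data.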

Power BI SQL Server Power Query

View code on Github


✨ CO2 Emission Data Visualization Dashboard

This interactive dashboard project empowers users to explore and visualize carbon dioxide (CO2) emissions data from Our World in Data. It leverages cutting-edge Python libraries, including Panel, Hvplot, and GeoPandas, to create an intuitive and informative platform for analyzing CO2 emissions worldwide. The dashboard enables users to filter emissions data by year and country, compare emissions trends through scatterplots, and visualize geographical variations on a map. It serves as a valuable tool for gaining insights into the primary driver of global climate change and fostering data-driven discussions around emissions reduction.

View code on Github



💻💻💻 Welcome to my portfolio 💻💻💻

Hello! My name is Kiel, and I set up this page to showcase some of the data science projects I’ve been working on.

Data science practitioner with strong business acumen and 2 years of professional, hands-on experience in data preprocessing and predictive modeling. Expert in SQL, Python, and R, with a track record of applying machine learning and statistical analysis to solve complex problems.

My 📋 CV has plenty of information about the professional projects I’ve worked on, but the purpose of this page is to showcase some of my favourite personal (on-the-side) projects in a more visual way.

If you have any questions, feel free to drop me an 📧 email or send me a message on 🌐 LinkedIn.

Thanks for reading,

Kiel Dang