This project aims to speed up quantitative decision making about steel tempering conditions through predictive modeling, ultimately aiding the automation of heat treatment workflows in engineering applications. Using primarily polars and scikit-learn, I am revising my UC Berkeley Data Analytics capstone project into the engineering tool described in the proposal and above. To aid in this, I use vega-altair for quick visualizations and seaborn for statistical visualizations. The next step for this project is to use FastAPI to build a functional interface for the tool that does not require running the notebook.
Tempering of steel is an important process in engineering. It ensures that the steel is homogeneous and has the mechanical properties desired of the final product. Whereas annealing is a similar heat treatment aimed at making a metal as soft as possible for continued working, tempering is a heat treatment aimed at reaching a target final hardness. This target hardness is indicative of a set of other mechanical properties, but hardness is used as the metric because it can easily be tested non-destructively in a manufacturing setting for quality control.
Broadly, tempering is one of several heat treatments that can transform a certain grade of steel to be better suited for specific applications. It may also help undo internal stresses accumulated through shaping during manufacturing.
There are two notebooks in this repository.
- The *-main.ipynb notebook was turned in for the capstone project of the UC Berkeley Data Analytics boot camp. I had 3 weeks to find a dataset, submit a business case proposal, and complete a notebook that showed at least 2 machine learning models, one of which had to reach > 0.75 accuracy. This notebook has 4 distinct models and explores many optimization tweaks to them, but the general approach was not appropriate for the problem.
- The *-revised.ipynb notebook is currently in progress (Dec 2, '24). It is a reconstruction of the project based on lessons learned from completing the capstone project. The end goal of the revised notebook is to fully implement a scalable version of the business idea presented in the proposal. That is, to build a pipeline, model, and application that can accept steel compositions and a desired final hardness, and output the appropriate tempering conditions in a user-friendly manner.
For the tempering time and temperature classification problem, I tried several simpler models and settled on an Extra Trees Classifier as a baseline. It severely overfit the data, with 90% combined training accuracy and 25% combined testing accuracy. The neural network classifier overfit less, but its combined accuracy did not break 50%.
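To make the overfitting check concrete, here is a minimal sketch of a multi-output Extra Trees classifier scored with a "combined" accuracy. It is not the notebook's code. The data is synthetic with random labels (so a large train/test gap is expected by construction), and I am assuming that "combined accuracy" means both the time class and the temperature class are correct for the same sample. The helper name `combined_accuracy` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 5 numeric features and two class
# labels per sample, playing the role of binned tempering time and temperature.
X = rng.normal(size=(400, 5))
Y = np.column_stack([
    rng.integers(0, 4, size=400),  # hypothetical time bins
    rng.integers(0, 6, size=400),  # hypothetical temperature bins
])

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# ExtraTreesClassifier accepts a 2-D label array and predicts all outputs at once.
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, Y_train)


def combined_accuracy(y_true, y_pred):
    """Fraction of rows where every output label is predicted correctly."""
    matches = np.all(np.asarray(y_true) == np.asarray(y_pred), axis=1)
    return float(np.mean(matches))


train_acc = combined_accuracy(Y_train, clf.predict(X_train))
test_acc = combined_accuracy(Y_test, clf.predict(X_test))
print(f"combined train accuracy: {train_acc:.2f}")
print(f"combined test accuracy:  {test_acc:.2f}")
```

Comparing the two numbers side by side is the quickest way to spot the kind of gap described above. A high training score with a low testing score suggests the trees are memorizing the training rows.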
Moving to regression models that predict the final hardness as a continuous measure, gradient boosting regression achieved an R-squared of 0.93. Using a neural network for the same prediction resulted in similar MSE and R-squared. Plotting the GBR model's predicted vs. actual values showed a linear relationship with some noise, indicating the more complex NN model was unnecessary.
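As an illustration of the regression setup, the sketch below fits a gradient boosting regressor and reports R-squared and MSE on a held-out split. The data is synthetic. The feature names (`temp_c`, `time_s`, `carbon`) and the formula that generates `hardness` are made up for the example and are not taken from the dataset or from metallurgy.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in data (assumption): hardness drops with tempering
# temperature and log time, rises with carbon content, plus noise.
n = 500
temp_c = rng.uniform(150, 700, size=n)
time_s = rng.uniform(600, 36000, size=n)
carbon = rng.uniform(0.2, 1.0, size=n)
hardness = (
    65
    - 0.05 * (temp_c - 150)
    - 2.0 * np.log10(time_s)
    + 10 * carbon
    + rng.normal(0, 1.0, size=n)
)

X = np.column_stack([temp_c, time_s, carbon])
X_train, X_test, y_train, y_test = train_test_split(
    X, hardness, test_size=0.25, random_state=42
)

gbr = GradientBoostingRegressor(random_state=42)
gbr.fit(X_train, y_train)
y_pred = gbr.predict(X_test)

# Score on held-out data only, so the numbers reflect generalization.
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print(f"R^2: {r2:.3f}, MSE: {mse:.3f}")
```

Plotting `y_pred` against `y_test` from a run like this gives the predicted vs. actual view mentioned above. Points that hug the diagonal suggest a simpler model is already capturing most of the signal.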
I provide a set of next steps to undertake in the revised project, covering both how to build such a tempering process prediction application from the available data and what data augmentation and processing can improve a direct prediction model. Both extensions are outside the scope of the capstone project and will be accomplished later as an extension of learning.
- Extra Trees Classifier
- Deep Neural Network Classifier
- Gradient Boosting Regressor
- Neural Network Regressor
For classification models the metrics accuracy and loss were used.
For regression models the metrics Mean Squared Error and loss were used.
Spend more time on exploratory data analysis. The issues that arose in the classification model were evident in the data from the start; however, I did not have the experience and expertise to identify them. This whole project was a huge learning experience, from using polars, a dataframe tool I had never used before, to building a multi-output model, which I had also never done.
I initially gravitated towards PCA as a tool to reduce multicollinearity, as there was a substantial amount in my data, but found that it eliminated the explainability of the model. I am now learning additional ways to handle that issue, as I hope to go into chemical applications of machine learning.
The number of lines of code was cut in half. Excessive data augmentation and processing were removed. EDA pipeline readability and thoroughness were improved. Model preprocessing is now done with a pipeline object to allow scaling. Bugs resulting from erroneous preprocessing were resolved.
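For readers unfamiliar with pipeline objects, here is a minimal sketch of the idea using scikit-learn's `Pipeline`. It is not the revised notebook's actual pipeline. The step names, the choice of `StandardScaler`, and the synthetic data are assumptions for illustration. Wrapping the scaler and model together means the scaler is refit on each training fold during cross-validation, which avoids one common class of preprocessing bug (leaking test-set statistics into training).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic stand-in features (assumption): temperature, time, carbon content.
X = rng.normal(loc=[500, 3600, 0.5], scale=[100, 1000, 0.2], size=(300, 3))
# Made-up target for illustration only.
y = 60 - 0.04 * X[:, 0] + 8 * X[:, 2] + rng.normal(0, 1, size=300)

# Preprocessing and model travel together as one estimator.
model = Pipeline([
    ("scale", StandardScaler()),
    ("gbr", GradientBoostingRegressor(random_state=1)),
])

# Each fold fits the scaler on its own training portion only.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"mean cross-validated R^2: {scores.mean():.3f}")

# The same object can then be fit once on all data and reused for prediction.
model.fit(X, y)
print(model.predict(X[:3]))
```

A side benefit is that the fitted pipeline is a single object, so it can be saved and loaded as one unit when the tool moves out of the notebook.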
A function was developed that takes a trained model, a steel input, and a desired hardness, and outputs the tempering temperature needed to reach that target hardness for a given tempering time. The model scores 0.98, which is very good and better than the score found in the original project.
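One way such a function could work is a simple grid search over candidate temperatures, sketched below. This is an assumption about the approach, not the notebook's implementation. The function name `suggest_tempering_temperature`, the feature order `[temperature, time, *composition]`, and the synthetic training data are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def suggest_tempering_temperature(model, composition, time_s, target_hardness, temp_grid):
    """Return the grid temperature whose predicted hardness is closest to the target.

    Assumes (hypothetically) that the model was trained on features ordered as
    [temperature, time, *composition].
    """
    temp_grid = np.asarray(temp_grid, dtype=float)
    composition = np.asarray(composition, dtype=float)
    # One candidate row per temperature; time and composition are held fixed.
    rows = np.column_stack([
        temp_grid,
        np.full(temp_grid.shape, time_s, dtype=float),
        np.tile(composition, (temp_grid.size, 1)),
    ])
    predicted = model.predict(rows)
    best = int(np.argmin(np.abs(predicted - target_hardness)))
    return float(temp_grid[best]), float(predicted[best])


# Train a toy hardness model on made-up data so the function can be exercised.
rng = np.random.default_rng(7)
n = 600
temp = rng.uniform(150, 700, size=n)
time_s = rng.uniform(600, 36000, size=n)
carbon = rng.uniform(0.2, 1.0, size=n)
hardness = 65 - 0.05 * (temp - 150) + 10 * carbon + rng.normal(0, 0.5, size=n)

toy_model = GradientBoostingRegressor(random_state=7)
toy_model.fit(np.column_stack([temp, time_s, carbon]), hardness)

best_temp, predicted = suggest_tempering_temperature(
    toy_model,
    composition=[0.6],          # hypothetical single-element composition (carbon)
    time_s=3600,
    target_hardness=50,
    temp_grid=np.arange(150, 701, 10),
)
print(f"suggested temperature: {best_temp:.0f}, predicted hardness: {predicted:.1f}")
```

A grid search keeps the sketch independent of the model type, since it only needs `predict`. The grid should stay within the temperature range seen in training, because tree-based models do not extrapolate beyond it.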
TODO: add an API and localhost for the GitHub page (Jan 22, '25)
Raiipa Technologies Kaggle dataset
The Effects of Alloying Elements on Steels
Bringas, J. E. (2002). Handbook of Comparative World Steel Standards. ASTM International.
Nathan Sheibley
UC Berkeley-Ext Data Analytics Boot Camp Final Project - Solo