White Wine Quality Prediction using PyTorch

Ameya Shahu
4 min read · Jun 27, 2020


Photo by Matthieu Joannon on Unsplash

This is my first blog post, written as a report on my final project for the Deep Learning with PyTorch (Zero to GANs) course conducted by FreeCodeCamp and Jovian.ml.

For this project, I used the Wine Quality data set. It provides red wine and white wine samples in two separate CSV files. My project deals specifically with white wine, so I used the white wine file.

The data set consists of 12 attributes. Each sample has a “quality” score that ranges from 0 to 10, and this is the only target variable. The quality of a wine is determined by the other 11 attributes: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates and alcohol. These 11 attributes are the input variables.

The objective of this project was to predict the quality of white wine for given input variables using different deep learning algorithms.

1. I started by importing the libraries. I ran the notebook on Kaggle, which has all the required libraries installed except the jovian package. The jovian package and jovian.ml are used to manage versions of the notebook and to compare outputs between versions. The package can be installed with the command pip install jovian --upgrade -q.

2. In the next step, I downloaded the data using the download_url() utility function (from torchvision.datasets.utils, part of the PyTorch ecosystem) and explored it.

I imported the CSV file as a pandas data frame. Values in a CSV file are usually separated by commas, but in this file they are separated by semicolons, so with the default settings the data was not parsed into separate columns. I solved this by passing the delimiter argument to the pd.read_csv() function.
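As a minimal sketch of the delimiter fix, the snippet below reads a small in-memory excerpt instead of the downloaded file. The excerpt and the name sample_csv are illustrative assumptions; only a few of the 12 columns are shown.

```python
import io

import pandas as pd

# Illustrative excerpt in the same semicolon-separated layout as the CSV file
# (only 4 of the 12 columns, with made-up rows).
sample_csv = io.StringIO(
    '"fixed acidity";"volatile acidity";"alcohol";"quality"\n'
    "7.0;0.27;8.8;6\n"
    "6.3;0.30;9.5;6\n"
)

# Without delimiter=";" pandas would put each whole line into a single column.
dataframe = pd.read_csv(sample_csv, delimiter=";")

print(dataframe.shape)
print(list(dataframe.columns))
```

With the real file, the path of the downloaded CSV would take the place of sample_csv.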

I got a statistical description of the data set using the describe() method.

The statistical description shows that there is no missing data in the data set, as the count for every column is 4898, the total number of samples. It also gives the mean, standard deviation, minimum, maximum and quartiles of each column.
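The missing-data check described above can be sketched as follows. The tiny data frame is made up for illustration; on the real data set the count row would show 4898 for every column.

```python
import pandas as pd

# Made-up stand-in for the wine data frame (3 of the 12 columns, 4 rows).
dataframe = pd.DataFrame({
    "alcohol": [8.8, 9.5, 10.1, 9.9],
    "pH": [3.00, 3.30, 3.26, 3.19],
    "quality": [6, 6, 5, 7],
})

summary = dataframe.describe()

# The "count" row holds the number of non-missing values per column.
# If it equals the number of rows everywhere, nothing is missing.
no_missing = bool((summary.loc["count"] == len(dataframe)).all())

print(summary)
print(no_missing)
```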

To analyze the relationship between the input variables and the quality of wine, I generated a correlation matrix using the standard Pearson method.

I made the following observations from this matrix:

  • None of the features has a strong linear relationship with the quality of the wine.
  • Alcohol, sulphates and pH have a positive relationship with quality (if one quantity increases, the other tends to increase too).
  • The other features show a negative trend with quality.
  • Among all the features, alcohol shows the strongest positive relationship with quality.
  • Citric acid, free sulfur dioxide and sulphates have a weak relationship with quality.
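A correlation matrix like the one behind these observations can be computed with pandas. This is only a sketch: the values below are made up for illustration and are not taken from the wine data set.

```python
import pandas as pd

# Made-up values for illustration only.
dataframe = pd.DataFrame({
    "alcohol": [8.8, 9.5, 10.1, 11.4, 12.8],
    "density": [1.0010, 0.9940, 0.9951, 0.9920, 0.9900],
    "quality": [5, 6, 6, 7, 8],
})

# Pairwise Pearson correlation between all numeric columns.
correlation = dataframe.corr(method="pearson")

# Correlation of each input column with the target, from most negative to most positive.
quality_corr = correlation["quality"].drop("quality").sort_values()

print(quality_corr)
```

A value near +1 or -1 indicates a strong linear relationship and a value near 0 a weak one, which is how the list above was read off the matrix.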

3. After data exploration, the data needs to be prepared for model training.

I began the data preparation phase by defining variables to hold the number of rows and columns in the data set, and then created separate lists for the input and output variables.

PyTorch works with tensors, but the data I was processing was a pandas data frame. There are several ways to convert a pandas data frame to tensors. I converted the data set from the pandas data frame to NumPy arrays and then to PyTorch tensors.

Further, I combined the inputs and targets into a dataset for model training.
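The data frame to NumPy to tensor path can be sketched like this. The small data frame and the names input_cols and output_cols are assumptions for illustration; the real data set has 11 input columns.

```python
import numpy as np
import pandas as pd
import torch

# Made-up stand-in for the wine data frame.
dataframe = pd.DataFrame({
    "alcohol": [8.8, 9.5, 10.1],
    "pH": [3.00, 3.30, 3.26],
    "quality": [6, 6, 5],
})

input_cols = ["alcohol", "pH"]   # hypothetical list of input columns
output_cols = ["quality"]        # the target column

# Step 1: pandas data frame -> NumPy arrays (float32 matches PyTorch's default dtype).
inputs_array = dataframe[input_cols].to_numpy(dtype=np.float32)
targets_array = dataframe[output_cols].to_numpy(dtype=np.float32)

# Step 2: NumPy arrays -> PyTorch tensors.
inputs = torch.from_numpy(inputs_array)
targets = torch.from_numpy(targets_array)

print(inputs.shape, targets.shape, inputs.dtype)
```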

Training a deep learning model is divided into three steps:

  1. Training: the actual training of the model
  2. Validation: tuning of the hyperparameters
  3. Testing: testing of the trained model

Each of these three steps needs its own data, so I divided the data set into three parts: a training set, a validation set and a test set.

Calling torch.manual_seed() fixes the random seed, so the same split is generated every time the cell is run.

Training the model on a single batch containing the whole data set is expensive, so the data set is divided into batches.

I committed my changes to jovian.ml after every important step.
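The split and batching steps can be sketched with torch.utils.data. The seed, the 70/15/15 split sizes and the batch size below are assumptions chosen for illustration, and random tensors stand in for the wine data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(42)  # arbitrary seed; makes the random split reproducible

# Random stand-in data: 100 samples with 11 input features and 1 target each.
inputs = torch.randn(100, 11)
targets = torch.randn(100, 1)
dataset = TensorDataset(inputs, targets)

# Hypothetical 70/15/15 split into training, validation and test sets.
train_ds, val_ds, test_ds = random_split(dataset, [70, 15, 15])

batch_size = 16  # hypothetical value
train_loader = DataLoader(train_ds, batch_size, shuffle=True)
val_loader = DataLoader(val_ds, batch_size)
test_loader = DataLoader(test_ds, batch_size)

# Each iteration over a loader yields one batch of inputs and targets.
xb, yb = next(iter(train_loader))
print(xb.shape, yb.shape)
```

Only the training loader is shuffled, since the order of samples does not matter when evaluating on the validation and test sets.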

4. As soon as data preprocessing was over, I moved on to defining the model.

I used the template code provided by Akash N S, the course instructor and founder of jovian.ml, to define the model class.

I tried linear regression before using a feedforward neural network, but the results were better with the feedforward neural network.

I used a neural network with an input layer of size 11, a hidden layer of size 64 and a single output. Mean squared error is used as the loss function.
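A network with these layer sizes could look roughly like the sketch below. It is not the course template: the ReLU activation and the attribute names are assumptions, since the post only states the layer sizes and the loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WineModel(nn.Module):
    """Feedforward network: 11 inputs -> 64 hidden units -> 1 output."""

    def __init__(self, input_size=11, hidden_size=64, output_size=1):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, output_size)

    def forward(self, xb):
        out = F.relu(self.linear1(xb))  # ReLU is an assumption, not stated in the post
        return self.linear2(out)


model = WineModel()

# Random stand-in batch of 4 samples to check shapes and the loss computation.
xb = torch.randn(4, 11)
yb = torch.randn(4, 1)

preds = model(xb)
loss = F.mse_loss(preds, yb)  # mean squared error
print(preds.shape, loss.item())
```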

5. I defined evaluate() and fit() to evaluate the model and to fit it to the data set. The SGD optimizer is used to apply the gradient descent algorithm.

Then I created an object of the WineModel class and passed it to the fit() function to fit the model to the data set.
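One way to write such helpers is sketched below. The function signatures, learning rate and epoch count are assumptions, not the notebook's exact code, and a small nn.Sequential network trained on synthetic data stands in for WineModel and the wine data set.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader):
    """Return the mean squared error averaged over all samples in loader."""
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for xb, yb in loader:
            total += F.mse_loss(model(xb), yb, reduction="sum").item()
            count += yb.numel()
    return total / count


def fit(epochs, lr, model, train_loader, val_loader):
    """Train with plain SGD and return the validation loss after each epoch."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    history = []
    for _ in range(epochs):
        model.train()
        for xb, yb in train_loader:
            loss = F.mse_loss(model(xb), yb)
            loss.backward()        # compute gradients
            optimizer.step()       # gradient descent update
            optimizer.zero_grad()  # reset gradients for the next batch
        history.append(evaluate(model, val_loader))
    return history


torch.manual_seed(0)

# Synthetic stand-in data: the target is the sum of the 11 input features.
inputs = torch.randn(200, 11)
targets = inputs.sum(dim=1, keepdim=True)
train_loader = DataLoader(TensorDataset(inputs[:160], targets[:160]), batch_size=16, shuffle=True)
val_loader = DataLoader(TensorDataset(inputs[160:], targets[160:]), batch_size=16)

model = nn.Sequential(nn.Linear(11, 64), nn.ReLU(), nn.Linear(64, 1))

initial_loss = evaluate(model, val_loader)
history = fit(20, 0.01, model, train_loader, val_loader)
print(initial_loss, history[-1])
```

The returned history is the list of per-epoch validation losses, which is the kind of data a loss vs. epochs plot is drawn from.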

Loss Vs Epochs

After training the model, I logged the different parameters to jovian.ml so that models can be compared in the future.

In the end, I defined a function, predict_single(), to get a prediction from the trained model, and I also evaluated the model on the test data set.

The quality of wine is an integer but the prediction is a float, so I rounded the prediction off.
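A single-sample prediction followed by rounding can be sketched as below. The signature of predict_single() is an assumption, and an untrained stand-in network with random input is used, so the printed numbers are meaningless in themselves.

```python
import torch
import torch.nn as nn


def predict_single(inputs, model):
    """Return the model's float prediction for one sample of input features."""
    model.eval()
    with torch.no_grad():
        batch = inputs.unsqueeze(0)  # add a batch dimension: (11,) -> (1, 11)
        prediction = model(batch)
    return prediction[0, 0].item()


torch.manual_seed(0)

# Untrained stand-in for the trained model, and a random stand-in sample.
model = nn.Sequential(nn.Linear(11, 64), nn.ReLU(), nn.Linear(64, 1))
sample = torch.randn(11)

raw = predict_single(sample, model)
quality = int(round(raw))  # quality scores are integers, so round the float prediction

print(raw, quality)
```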

References

  1. https://jovian.ml/ameya-shahu/white-wine-quality-prediction
  2. https://jovian.ml/ameya-shahu/02-insurance-linear-regression
  3. https://archive.ics.uci.edu/ml/datasets/wine+quality

Database citation

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547–553, 2009.

This is my first blog post. Thanks for reading this far!

I thank the instructor Akash N S, Jovian.ml and FreeCodeCamp for this course. Special thanks to Akash N S for motivating me to write this blog.

If you have any feedback, or if there is an issue in this blog, kindly share it with me.

Thank you!

Written by Ameya Shahu

Graduate Computer Science student at Arizona State University
