Every asset has a life cycle and therefore needs regular maintenance. Repairing too early wastes resources, while repairing too late is risky. Thus, “when” to repair is an important problem.
Predictive maintenance is a way to predict or forecast the probability that a fixed asset will break down. It is useful for all kinds of businesses, from a large company predicting motor breakdowns to a small business predicting printer breakdowns. It can also help save lives, for example by predicting the likelihood of factory machine breakdowns or even gas leaks.
Traditionally, predictive modelling is done with feature engineering and simple regression models; however, these models are difficult to reuse. We will use a more advanced model, the LSTM (long short-term memory) network. LSTMs can use sequences of data to make predictions on a rolling basis, and the sequence length can be as small as 5 or as large as 100. For the data backend, we will use GridDB, which is highly scalable and reliable. Installing GridDB is simple and is well documented here. To check out the Python GridDB client, please refer to this video.
Let us set up GridDB first!
Quick setup of GridDB Python Client on Ubuntu 20.04:
1) Install GridDB
Download and install the deb package from here.
2) Install the C client
Download and install the Ubuntu package from here.
3) Install requirements (SWIG)
wget https://github.com/swig/swig/archive/refs/tags/v4.0.2.tar.gz
tar xvfz v4.0.2.tar.gz
cd swig-4.0.2
./autogen.sh
./configure
make
sudo make install
4) Install the Python client
wget https://github.com/griddb/python_client/archive/refs/tags/0.8.4.zip
unzip 0.8.4.zip
Make sure you have python-dev installed for the corresponding Python version. We will use Python 3.8 for this post.
5) We also need to point to the correct locations:
export CPATH=$CPATH:<python header file directory path>
export LIBRARY_PATH=$LIBRARY_PATH:<c client library file directory path>
We can also use GridDB with Docker, as shown here.
Next, we install the Python libraries. numpy, keras, tensorflow, scikit-learn and pandas can each be installed with pip (note that the scikit-learn package is called scikit-learn on PyPI, not sklearn):
pip install keras
pip install numpy
pip install tensorflow
pip install pandas
pip install scikit-learn
Step 1: Downloading Dataset
We use a subset of the NASA turbofan dataset, which can be downloaded from this Kaggle project. For each engine, the data has the unit number, the time in cycles, three operational settings and 21 sensor measurements. The training file runs each engine until it fails, the test file stops some time before failure, and the truth file gives the number of remaining cycles for each engine in the test set.
Step 2: Importing Libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import confusion_matrix, recall_score, precision_score, accuracy_score
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, Activation
from keras.callbacks import EarlyStopping
Step 3: Data Loading and Processing
dataset_train = pd.read_csv('/content/PM_train.txt', sep=' ', header=None).dropna(axis=1)
dataset_test = pd.read_csv('/content/PM_test.txt', sep=' ', header=None).dropna(axis=1)
dataset_truth = pd.read_csv('/content/PM_truth.txt', sep=' ', header=None).dropna(axis=1)
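The dropna(axis=1) calls are there because each line of the PM_*.txt files ends with trailing spaces, which read_csv with sep=' ' turns into extra, all-NaN columns. The sketch below reproduces the effect on a made-up two-line sample (hypothetical values, only three real columns) so you can see which columns get dropped.

```python
import io

import pandas as pd

# Hypothetical sample in the same layout as PM_*.txt:
# space-separated values with a trailing space at the end of each line.
sample = "1 1 0.5 \n1 2 0.7 \n"

raw = pd.read_csv(io.StringIO(sample), sep=' ', header=None)
print(raw.shape)    # the trailing space produces a 4th, all-NaN column

# Drop every column that contains NaN, i.e. the empty trailing column.
clean = raw.dropna(axis=1)
print(clean.shape)
```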
Alternatively, we can use GridDB to get this data frame.
import griddb_python as griddb

# Connect to the cluster (host, port, cluster_name, uname and pwd are your own connection settings)
factory = griddb.StoreFactory.get_instance()
gridstore = factory.get_store(host=host, port=port, cluster_name=cluster_name, username=uname, password=pwd)

# Initialize container
conInfo = griddb.ContainerInfo("attrition",
                               [["id", griddb.Type.LONG],
                                ["cycle", griddb.Type.LONG],
                                .... # for all 23 variables
                               ],
                               griddb.ContainerType.COLLECTION, True)
cont = gridstore.put_container(conInfo)
cont.create_index("id", griddb.IndexType.DEFAULT)
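The container code above only creates the collection. To read its rows back into a pandas DataFrame, a helper along these lines could be used. This is a minimal sketch that assumes the griddb_python query calls shown in the client's samples (cont.query, query.fetch(False), rs.has_next(), rs.next()). The function name container_to_dataframe is hypothetical, and loading rows into the container first is not shown.

```python
import pandas as pd


def container_to_dataframe(cont, columns):
    """Read every row of a GridDB container into a pandas DataFrame.

    Sketch only: assumes the griddb_python query API
    (cont.query(...), query.fetch(False), rs.has_next(), rs.next()).
    `columns` are the column names used when the container was created.
    """
    query = cont.query("select *")
    rs = query.fetch(False)       # False: do not lock the rows for update
    rows = []
    while rs.has_next():
        rows.append(rs.next())    # each row is returned as a list of values
    return pd.DataFrame(rows, columns=columns)
```

For example, `dataset_train = container_to_dataframe(cont, names)` would return the stored rows, where `names` is the same list of column names that was used to create the container.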
Now we rename the columns for easy identification:
features_col_name = ['os1', 'os2', 'os3', 's1', 's2', 's3', 's4', 's5', 's6', 's7', 's8', 's9', 's10',
                     's11', 's12', 's13', 's14', 's15', 's16', 's17', 's18', 's19', 's20', 's21']
col_names = ['id', 'cycletime'] + features_col_name

# rename the columns
dataset_train.columns = col_names
dataset_test.columns = col_names
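We do the same for the truth file. It has a single column, the number of remaining cycles for each test engine, which we name rul. The merge in the next step also needs an id column that matches each row to its engine.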
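The truth-file preparation described above can be sketched as follows. This assumes that row i of PM_truth.txt belongs to engine i + 1, so the id is derived from the row position. The helper name prepare_truth is hypothetical.

```python
import pandas as pd


def prepare_truth(dataset_truth):
    """Name the remaining-cycles column 'rul' and add a 1-based engine 'id'.

    Assumption: the truth file lists engines in order, so row i -> engine i + 1.
    """
    truth = dataset_truth.copy()
    truth.columns = ['rul']
    truth['id'] = truth.index + 1
    return truth
```

In the tutorial's flow this would be called as `dataset_truth = prepare_truth(dataset_truth)` right after loading the file.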
Next, we generate the labels. We want to predict whether an engine will fail within the next 15 cycles. In the training set, the last cycle recorded for each engine is the point of failure. In the test set, however, each series stops before failure, and the number of cycles remaining is given in the truth file. So, for each test engine, we take the number of cycles run so far and add the remaining cycles from the truth file to get the cycle at which it fails. Finally, we subtract the current cycle from that total to get the time to failure (ttf).
# time to failure (ttf) for the training set: last cycle of each engine minus the current cycle
dataset_train['ttf'] = dataset_train.groupby(['id'])['cycletime'].transform('max') - dataset_train['cycletime']

# last observed cycle for each test engine (sorted by id)
rul = dataset_test.groupby('id')['cycletime'].max().reset_index()

# cycle of failure = remaining cycles (truth file) + cycles observed so far
# (assumes the truth rows are in the same engine-id order as rul)
dataset_truth['rtf'] = dataset_truth['rul'] + rul['cycletime']
dataset_test = dataset_test.merge(dataset_truth[['id', 'rtf']], on=['id'], how='left')
dataset_test['ttf'] = dataset_test['rtf'] - dataset_test['cycletime']
dataset_test.drop('rtf', axis=1, inplace=True)
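To check the arithmetic, here is the same ttf calculation on a tiny hypothetical example. Engine 1 runs 3 cycles in training. In the test set it has been observed for 2 cycles, and the truth file says 4 cycles remain. Unlike the code above, this sketch joins the truth values on id rather than relying on row order.

```python
import pandas as pd

# Hypothetical toy data (not from the NASA dataset).
train = pd.DataFrame({'id': [1, 1, 1], 'cycletime': [1, 2, 3]})
test = pd.DataFrame({'id': [1, 1], 'cycletime': [1, 2]})
truth = pd.DataFrame({'id': [1], 'rul': [4]})

# Train: the last observed cycle is the failure point.
train['ttf'] = train.groupby('id')['cycletime'].transform('max') - train['cycletime']

# Test: failure cycle = last observed cycle + remaining cycles from the truth file.
last = test.groupby('id')['cycletime'].max().rename('last_cycle').reset_index()
truth = truth.merge(last, on='id')
truth['rtf'] = truth['rul'] + truth['last_cycle']
test = test.merge(truth[['id', 'rtf']], on='id', how='left')
test['ttf'] = test['rtf'] - test['cycletime']

print(train['ttf'].tolist())  # [2, 1, 0]
print(test['ttf'].tolist())   # [5, 4]
```

The test engine fails at cycle 2 + 4 = 6, so at cycles 1 and 2 it has 5 and 4 cycles left.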
Next, we assign labels based on the prediction period:
period = 15
dataset_train['label'] = dataset_train['ttf'].apply(lambda x: 1 if x <= period else 0)
dataset_test['label'] = dataset_test['ttf'].apply(lambda x: 1 if x <= period else 0)
dataset_train.head()
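The labelling rule is simply "1 if ttf is at most the prediction period, else 0". As a side note, the row-by-row apply can be replaced by a single vectorized comparison. The sketch below, using hypothetical ttf values, shows that both produce the same labels.

```python
import pandas as pd

period = 15
# Hypothetical time-to-failure values.
ttf = pd.Series([40, 16, 15, 3, 0])

# Row-by-row version, as in the code above.
labels_apply = ttf.apply(lambda x: 1 if x <= period else 0)
# Vectorized equivalent: compare the whole column at once, then cast to 0/1.
labels_vec = (ttf <= period).astype(int)

print(labels_vec.tolist())  # [0, 0, 1, 1, 1]
```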
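Next, we scale the features to the range [0, 1], since LSTMs train much more reliably on normalized inputs. The scaler is fit on the training data only and then applied unchanged to the test data.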
sc = MinMaxScaler()
dataset_train[features_col_name] = sc.fit_transform(dataset_train[features_col_name])
dataset_test[features_col_name] = sc.transform(dataset_test[features_col_name])
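MinMaxScaler maps each feature x to (x - min) / (max - min), using the min and max learned during fit. Because the scaler is fit on the training set only, test values outside the training range map outside [0, 1]. The single-feature example below uses hypothetical values to show this.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single-feature data.
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[25.0], [40.0]])   # 40 is above the training maximum

sc = MinMaxScaler()
train_scaled = sc.fit_transform(train)  # min/max learned from training data only
test_scaled = sc.transform(test)        # the same min/max reused for test data

# Manual version of the formula: (x - min) / (max - min)
manual = (test - train.min(axis=0)) / (train.max(axis=0) - train.min(axis=0))
print(test_scaled.ravel().tolist())  # approximately [0.75, 1.5]
```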
Next, we choose how many time steps each LSTM input sequence contains. We use 50: each sample is a window of 50 consecutive cycles from a single engine, and its label is the label of the last cycle in that window.
seq_length = 50
seq_cols = features_col_name

def gen_sequence(id_df, seq_length, seq_cols):
    # pad with seq_length-1 rows of zeros so that the first cycles also get a full window
    df_zeros = pd.DataFrame(np.zeros((seq_length - 1, id_df.shape[1])), columns=id_df.columns)
    id_df = pd.concat([df_zeros, id_df], ignore_index=True)
    data_array = id_df[seq_cols].values
    num_elements = data_array.shape[0]
    la = []
    # one window per original row; the last window ends at the engine's final cycle
    for start, stop in zip(range(0, num_elements - seq_length + 1), range(seq_length, num_elements + 1)):
        la.append(data_array[start:stop, :])
    return np.array(la)

# generate train data
X_train = np.concatenate([gen_sequence(dataset_train[dataset_train['id'] == engine_id], seq_length, seq_cols)
                          for engine_id in dataset_train['id'].unique()])
y_train = np.concatenate([gen_sequence(dataset_train[dataset_train['id'] == engine_id], seq_length, ['label'])
                          for engine_id in dataset_train['id'].unique()]).max(axis=1)

# generate test data
X_test = np.concatenate([gen_sequence(dataset_test[dataset_test['id'] == engine_id], seq_length, seq_cols)
                         for engine_id in dataset_test['id'].unique()])
print(X_test.shape)
y_test = np.concatenate([gen_sequence(dataset_test[dataset_test['id'] == engine_id], seq_length, ['label'])
                         for engine_id in dataset_test['id'].unique()]).max(axis=1)
print(y_test.shape)
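To see what the windowing produces, here is a vectorized sketch of the same idea using NumPy's sliding_window_view (available in NumPy 1.20 and later). It pads with zeros and returns one window per cycle, shaped (samples, time steps, features) as the LSTM expects. The helper name windows_for_engine and the toy data are hypothetical.

```python
import numpy as np


def windows_for_engine(values, seq_length):
    """Return one zero-padded window per row: shape (n_rows, seq_length, n_features)."""
    n_rows, n_features = values.shape
    # Prepend seq_length - 1 zero rows so the first cycle also gets a full window.
    padded = np.vstack([np.zeros((seq_length - 1, n_features)), values])
    # sliding_window_view puts the window axis last: (n_rows, n_features, seq_length)
    win = np.lib.stride_tricks.sliding_window_view(padded, seq_length, axis=0)
    # Reorder to (samples, time steps, features).
    return win.transpose(0, 2, 1)


# Hypothetical engine with 4 cycles and 2 features, windows of length 3.
values = np.arange(8, dtype=float).reshape(4, 2)
X = windows_for_engine(values, seq_length=3)
print(X.shape)   # (4, 3, 2)
print(X[-1])     # the last window covers cycles 2-4, ending at the final cycle
```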
Step 4: Prediction
Next, we start the prediction process.
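We first create an LSTM model in Keras. For that, we use a Sequential model that stacks two LSTM layers with dropout, followed by a single sigmoid output unit: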
nb_features = X_train.shape[2]
timestamp = seq_length

model = Sequential()
model.add(LSTM(input_shape=(timestamp, nb_features), units=100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=seq_length, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
The model is compiled with binary cross-entropy as the loss, which suits this binary (fail / no fail) classification task, and it is evaluated on accuracy.
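For a label y in {0, 1} and a predicted probability p, binary cross-entropy is -(y log p + (1 - y) log(1 - p)), averaged over all samples. The NumPy sketch below is for intuition only. Like Keras, it clips p slightly away from 0 and 1 to avoid log(0). The function name is our own.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)] averaged over samples."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))


# A confident correct prediction costs little; a confident wrong one costs a lot.
print(binary_crossentropy([1], [0.9]))   # about 0.105
print(binary_crossentropy([1], [0.1]))   # about 2.303
```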
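Next, we train the model for up to 10 epochs. We hold out 5% of the training windows for validation and stop as soon as the validation loss stops improving.
# fit the network
model.fit(X_train, y_train, epochs=10, batch_size=200, validation_split=0.05, verbose=1,
          callbacks=[EarlyStopping(monitor='val_loss', min_delta=0, patience=0, verbose=0, mode='auto')])
Epoch 1/10 98/98 [==============================] - 28s 250ms/step - loss: 0.1596 - accuracy: 0.9394 - val_loss: 0.0611 - val_accuracy: 0.9679
Alternatively, the model can be trained for a fixed 100 epochs without early stopping. With batch_size=1, this is much slower than the batched run above.
history = model.fit(X_train, y_train, epochs=100, batch_size=1, verbose=1, shuffle=False)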
Step 5: Evaluation and Predictions
Finally, we evaluate the model on the test set.
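y_prob = model.predict(X_test)
# the sigmoid output is a probability of failure; threshold it at 0.5 to get class labels
y_classes = (y_prob > 0.5).astype(int).ravel()
print('Accuracy of model on test data: ', accuracy_score(y_test, y_classes))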
Accuracy of model on test data: 0.9744536780547861
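Because the model ends in a single sigmoid unit, predict returns an (n, 1) array of probabilities. Taking argmax over that single column always gives 0, so class labels should come from a threshold instead. Failures are also rare in this data, so accuracy alone can look high even when many failures are missed. The precision and recall functions imported earlier give a fuller picture. The sketch below uses made-up probabilities and a 0.5 threshold.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical model output: one sigmoid probability per window, shape (n, 1).
y_prob = np.array([[0.05], [0.20], [0.70], [0.40], [0.95], [0.10]])
y_test = np.array([0, 0, 1, 1, 1, 0])

# argmax over a single column is always 0, so threshold the probability instead.
print(y_prob.argmax(axis=-1))            # [0 0 0 0 0 0]
y_pred = (y_prob[:, 0] > 0.5).astype(int)

print(confusion_matrix(y_test, y_pred))  # rows: true class, columns: predicted class
print(precision_score(y_test, y_pred))   # 1.0
print(recall_score(y_test, y_pred))      # about 0.667
```

Lowering the threshold below 0.5 would typically trade some precision for higher recall, which may be preferable when a missed failure is costly.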
We can also calculate the probability of failure for every machine as follows:
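machine_id = 1
machine_df = dataset_test[dataset_test.id == machine_id]
machine_test = gen_sequence(machine_df, seq_length, seq_cols)
m_pred = model.predict(machine_test)
# probability of failure within the prediction period (in percent), from the most recent window
failure_prob = list(m_pred[-1] * 100)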
failure prob is 0.15824139
Now we can experiment with the prediction period, the LSTM sequence length and the number of variables used to get even better results.
In this post, we learned how to train an LSTM predictive maintenance model with Keras, Python and GridDB. We can get a predictive accuracy of ~97% with a few lines of code.