
Create an Email Salutation Identification ML Model and Run It in AWS Lambda


Why?

Many tools feed the email body to an ML model for analysis or further processing. For them, the first line, which in 40% of cases is a salutation or greeting, is just noise.

Let's get started...!

We will pass an email body to the model, and it should return True if the first line of the email body is a greeting and False otherwise.

Dataset

For 7Targets, we wanted a model that would remove the salutation line before emails are processed further. Training an ML model requires a dataset, and in our case that meant a set of emails. We used the emails we already had, manually classified them, and marked each one according to whether it contained a greeting.


Preprocessing

To create feature vectors for the emails, we first had to preprocess the data. We removed the empty lines from every email and then selected the first remaining line for further processing. For the sake of simplicity, we always check only whether the first line of the email is a salutation.
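The preprocessing step can be sketched in a few lines of Python. This is a minimal illustration rather than our production code, and the function name `first_nonempty_line` is made up for this example.

```python
def first_nonempty_line(email_body: str) -> str:
    """Return the first non-blank line of an email body, without surrounding whitespace."""
    for line in email_body.splitlines():
        stripped = line.strip()
        if stripped:  # skip empty and whitespace-only lines
            return stripped
    return ""  # the body was empty or contained only whitespace


if __name__ == "__main__":
    sample = "\n\n  Hi John,\n\nThanks for the quick reply."
    print(first_nonempty_line(sample))
```

Returning an empty string for an empty body keeps the later steps simple, because they always receive a string.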

Feature Vector Generation

From the first line obtained in the previous step, we had to generate the feature vectors that are fed to the model for training. We wrote code to convert the first line of each email into 3 features.
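To give a feel for what such a conversion might look like, here is a hedged sketch. The three features it computes (whether the line starts with a greeting word, how many words it has, and whether it ends with typical salutation punctuation) are assumptions chosen for illustration and are not necessarily the features we used. The greeting word list and the name `line_features` are hypothetical as well.

```python
import re
from typing import List

# Hypothetical greeting vocabulary, chosen only for this illustration.
GREETING_WORDS = {"hi", "hello", "hey", "dear", "greetings"}


def line_features(first_line: str) -> List[int]:
    """Turn one line of text into a vector of three illustrative features."""
    words = re.findall(r"[A-Za-z']+", first_line.lower())
    # Feature 1 (assumed): 1 if the first word is a known greeting word.
    starts_with_greeting = int(bool(words) and words[0] in GREETING_WORDS)
    # Feature 2 (assumed): number of words, since salutations tend to be short.
    word_count = len(words)
    # Feature 3 (assumed): 1 if the line ends with ',', '!' or ':'.
    ends_with_punct = int(first_line.rstrip().endswith((",", "!", ":")))
    return [starts_with_greeting, word_count, ends_with_punct]


if __name__ == "__main__":
    print(line_features("Hi John,"))
    print(line_features("Please find the attached invoice."))
```

Keeping every feature numeric means the vectors can be passed straight to a scikit-learn classifier.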

Model Training

We first split the data into training and testing sets with a ratio of 3:1 and chose a Random Forest classifier for the project. We trained the model on the training data, then evaluated it on the testing data and got an accuracy of 97%, which was higher than I had expected. Finally, we saved the model to a pickle file using joblib for later use.
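The training step could look roughly like the sketch below, assuming scikit-learn and joblib are installed. The function name `train_and_save`, the file name `salutation_model.pkl`, the random seed, and the number of trees are illustrative assumptions. `X` is a list of feature vectors and `y` the matching manual labels.

```python
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_save(X, y, model_path="salutation_model.pkl", seed=42):
    """Train a Random Forest on a 3:1 split, save it, and return (model, test accuracy)."""
    # test_size=0.25 keeps 75% for training and 25% for testing, i.e. a 3:1 ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, clf.predict(X_test))
    # joblib.dump serializes the fitted model to a pickle-based file.
    joblib.dump(clf, model_path)
    return clf, accuracy
```

The saved file can later be restored with `joblib.load(model_path)`. Fixing `random_state` makes the split and the forest reproducible, which helps when comparing feature choices.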

How did we use the model?

We had to run real-time inference on AWS Lambda because we use serverless technology a lot. We first created an AWS Lambda layer from the scikit-learn library and used the AWS prebuilt NumPy layer. We then wrote the inference code and pushed the model's pickle file to the Lambda function. It worked like a charm, and we are now using it in production.
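A Lambda handler for this kind of inference might be shaped like the sketch below. It is not our production code: the event key `email_body`, the `MODEL_PATH` environment variable, the file name, and the three features are all assumptions made for illustration. The feature code must match whatever was used at training time.

```python
import json
import os
import re

# Hypothetical location; assumes the pickle file ships with the deployment package.
MODEL_PATH = os.environ.get("MODEL_PATH", "salutation_model.pkl")
GREETING_WORDS = {"hi", "hello", "hey", "dear", "greetings"}  # illustrative list
_model = None  # cached so that warm invocations skip reloading the file


def get_model():
    """Load the model once per container and reuse it afterwards."""
    global _model
    if _model is None:
        import joblib  # installed as a scikit-learn dependency in the layer
        _model = joblib.load(MODEL_PATH)
    return _model


def first_line_features(email_body):
    """Pick the first non-empty line and compute three assumed features for it."""
    lines = [ln.strip() for ln in email_body.splitlines() if ln.strip()]
    first = lines[0] if lines else ""
    words = re.findall(r"[A-Za-z']+", first.lower())
    return [
        int(bool(words) and words[0] in GREETING_WORDS),
        len(words),
        int(first.endswith((",", "!", ":"))),
    ]


def predict_is_greeting(email_body, model):
    """Return True if the model labels the first line as a greeting."""
    return bool(model.predict([first_line_features(email_body)])[0])


def lambda_handler(event, context):
    """Lambda entry point; expects the email text under the assumed key 'email_body'."""
    is_greeting = predict_is_greeting(event.get("email_body", ""), get_model())
    return {"statusCode": 200, "body": json.dumps({"is_greeting": is_greeting})}
```

Loading the model lazily and caching it in a module-level variable means only a cold start pays the cost of reading the pickle file.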

Summing up!

In this article, I have demonstrated the simplest workflow required to develop a machine learning model that identifies salutations in emails.

The model still needs a few improvements, though. For example, if there is no newline between the salutation and the email body, the model does not work well. We will discuss that at a later time.

Hit me up on my email if you have any questions.
