12- Feature Engineering: Manual Feature Engineering


Hello guys. Today we’re going to talk about one of the most important feature engineering steps: feature generation. It is divided into two main parts, manual feature generation and automatic feature generation. So let’s start with manual feature generation.

Manual feature generation relies on domain knowledge, intuition and data manipulation. It can be grouped into two methods: transformation and aggregation. Today we will see how to generate new features using both methods.

So let’s start by importing the libraries we need. We need the featuretools library so we can load a raw dataset from it and apply manual feature engineering to that data. In the next video we will use the same library to apply automatic feature engineering. We will also import numpy and pandas.

Now let’s import our dataset. We see that this dataset has several tables: customers, products, sessions and transactions. Now let’s separate these tables so we can work with them more easily, and take a look at each one. We see that the transactions table has 5 features: transaction_id, session_id, transaction_date, product_id and amount. The sessions table has 4 features: session_id, customer_id, device and session_ Finally, the customers table has the features customer_id, zip_code, join_date and date_of_birth.

Now let’s talk about the first feature generation type, which is transformation. Transformations act on a single table by creating new features out of one or more existing columns. We’re going to create new features from our customers table: join_day, join_month and join_year. The initial table only has 4 features (customer_id, zip_code, join_date and date_of_birth), and from join_date we’ve been able to create three more.

The second type of manual feature engineering is aggregation. Aggregations are performed across tables and use a one-to-many relationship to group observations and then calculate statistics over each group.
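Before going further into aggregation, the transformation step described above can be sketched in pandas. This is only a minimal sketch, not the exact code shown in the video: the small customers table is a hand-made stand-in with made-up values (the video loads its data from featuretools), and the names join_day, join_month and join_year are assumed spellings of the three new features.

```python
import pandas as pd

# Hypothetical stand-in for the customers table; all values are made up.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "zip_code": ["60091", "13244", "13244"],
    "join_date": pd.to_datetime(["2011-04-17", "2012-04-15", "2011-08-13"]),
    "date_of_birth": pd.to_datetime(["1994-07-18", "1986-08-18", "2003-11-21"]),
})

# Transformation: each new feature is derived from an existing column
# of the same table (here, the join_date datetime column).
customers["join_day"] = customers["join_date"].dt.day
customers["join_month"] = customers["join_date"].dt.month
customers["join_year"] = customers["join_date"].dt.year

print(customers[["customer_id", "join_date", "join_day", "join_month", "join_year"]])
```

The `.dt` accessor only works on datetime columns, so if the dates arrive as strings they would need to go through `pd.to_datetime` first.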
In our case, each customer has many sessions and each session has many transactions. So first, let’s merge our sessions and transactions tables on the session_id values. Now we can see how each transaction has been merged with its information from the sessions table, including the customer_id.

Then we calculate the mean, maximum and minimum amounts for each customer. We group the merged table by the customer_id values and then apply the aggregations to the amount feature. Now let’s give the new columns, the new features, a name. With this, we have created three more features: the mean, max and min transaction amount for each customer.

Finally, we merge the results with the customers table to get our new dataset with the new features. We have managed to manually create 6 new features: join_day, join_month, join_year, mean_transaction_amount, max_transaction_amount and min_transaction_amount.

So that’s it for today’s video, and see you in the next one.
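The aggregation steps described above (merge, group, aggregate, rename, merge back) can be sketched in pandas as follows. This is a rough sketch under assumptions rather than the code from the video: the three tables are tiny hand-made stand-ins with made-up values, only the columns needed for the example are included, and the new column names are assumed spellings of the features named in the video.

```python
import pandas as pd

# Hypothetical stand-ins for the three tables; all values are made up.
customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "zip_code": ["60091", "13244", "13244"]})
sessions = pd.DataFrame({"session_id": [1, 2, 3],
                         "customer_id": [1, 1, 2],
                         "device": ["desktop", "mobile", "tablet"]})
transactions = pd.DataFrame({"transaction_id": [10, 11, 12, 13, 14],
                             "session_id": [1, 1, 2, 3, 3],
                             "product_id": [5, 2, 3, 1, 4],
                             "amount": [20.0, 40.0, 60.0, 10.0, 30.0]})

# Step 1: attach session information (including customer_id) to each transaction.
merged = transactions.merge(sessions, on="session_id", how="left")

# Step 2: group by customer and compute statistics of the amount column.
stats = merged.groupby("customer_id")["amount"].agg(["mean", "max", "min"])

# Step 3: give the new features descriptive names.
stats.columns = ["mean_transaction_amount",
                 "max_transaction_amount",
                 "min_transaction_amount"]
stats = stats.reset_index()

# Step 4: merge the aggregates back into the customers table.
# A left join keeps customers with no transactions (their new features are NaN).
customers = customers.merge(stats, on="customer_id", how="left")

print(customers)
```

In this toy data customer 3 has no sessions, so the left join in the last step leaves that customer's three new columns as NaN instead of dropping the row.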
