15 - Feature Engineering: Feature Selection 2 - Univariate Selection

Hello guys. In this video we are going to cover the first feature selection technique, which is univariate feature selection.

So what is univariate feature selection? Univariate feature selection examines each feature individually to determine the strength of its relationship with the response variable. In other words, it selects the best features based on univariate statistical tests. There are a lot of different options for univariate selection. In this course we're going to use SelectKBest, which removes all but the k highest-scoring features.

Let's first import SelectKBest and chi2. Used together, they keep the k features with the highest chi-squared statistics with respect to the target. Now we will create a SelectKBest object, set score_func to chi2, and set the number of features to select (k) to 10. Next we fit the object to our training data and then apply transform to get the selected features.

Let us now print the count of our original features and the count of the selected ones. We see that the selector kept the best 10 features out of 80. Now let's see which features it has chosen. Let's plot the features we selected and compare them to the features we ignored.

Finally, to show that feature selection has a positive impact on our model's accuracy, let's compare the accuracy of a model fitted on all the features with one fitted only on the selected ones. To do this we will import RandomForestClassifier. Now let's select our test features and create a classifier object.
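The selection steps described above (import, create the selector with k=10, fit on the training data, transform, then inspect what was kept) can be sketched with scikit-learn as follows. This is a minimal sketch, not the video's exact code: the video's dataset is not shown, so a synthetic 80-feature dataset from make_classification stands in for it (an assumption). Because chi2 only accepts non-negative values, the features are rescaled to [0, 1] with MinMaxScaler first; that step is also an assumption. Instead of a plot, it prints the chi-squared scores of the selected and ignored features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the video's dataset: 80 features, only 8 informative
X, y = make_classification(n_samples=1000, n_features=80, n_informative=8,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=42)

# chi2 requires non-negative inputs, so rescale features using training-set ranges
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Score every feature independently with the chi-squared test, keep the top 10
selector = SelectKBest(score_func=chi2, k=10)
selector.fit(X_train, y_train)
X_train_selected = selector.transform(X_train)

print("Original feature count:", X_train.shape[1])
print("Selected feature count:", X_train_selected.shape[1])

# Which columns were kept, and how their scores compare with the ignored ones
selected_idx = selector.get_support(indices=True)
ignored_mask = ~selector.get_support()
print("Selected feature indices:", selected_idx)
print("Selected scores:", np.round(selector.scores_[selected_idx], 2))
print("Mean score, selected:", selector.scores_[selected_idx].mean())
print("Mean score, ignored: ", selector.scores_[ignored_mask].mean())
```

With real data you would swap in your own training matrix; the printed indices then map back to your column names.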
We will give the classifier 100 estimators (n_estimators=100) and a random state of 42. First, let's fit the classifier to all the features and print the score. Finally, let's fit the classifier to the selected features and print the score.

We notice that we achieved a huge improvement when we used only the selected features: our model's accuracy improved from 77% to almost 99% just by applying this simple feature selection technique. Now, I want to mention that accuracy is not the best way to estimate how good our model is, and in the next section of this course we will learn a much better metric, the F1-score. But we will get to that later.

So that's it for today's video. In the next one, we will learn another technique for feature selection, which is feature importance. See you in the next one.
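The accuracy comparison described above can be sketched like this: one RandomForestClassifier(n_estimators=100, random_state=42) trained on all features, and one trained only on the 10 features that SelectKBest kept, with the test set transformed by the same fitted selector. As in the previous sketch, the make_classification data and the MinMaxScaler step are assumptions standing in for the video's unseen dataset, so the printed accuracies will not reproduce the 77% and 99% figures from the video.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in data (80 features), split into train and test sets
X, y = make_classification(n_samples=1000, n_features=80, n_informative=8,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=42)

# Non-negative scaling for chi2, fitted on the training data only
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Fit the selector on training data, then apply the same selection to test data
selector = SelectKBest(score_func=chi2, k=10).fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Model 1: all 80 features
clf_all = RandomForestClassifier(n_estimators=100, random_state=42)
clf_all.fit(X_train, y_train)
acc_all = clf_all.score(X_test, y_test)  # mean accuracy on the test set

# Model 2: only the 10 selected features
clf_sel = RandomForestClassifier(n_estimators=100, random_state=42)
clf_sel.fit(X_train_sel, y_train)
acc_sel = clf_sel.score(X_test_sel, y_test)

print(f"Accuracy with all features:      {acc_all:.3f}")
print(f"Accuracy with selected features: {acc_sel:.3f}")
```

Fitting the selector on the training split only, and reusing it on the test split, keeps the test set out of the feature-selection decision so the comparison stays fair.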
