Build A Complete Project In Machine Learning | Credit Card Fraud Detection | Eduonix



Hello and welcome to this machine learning tutorial presented by Eduonix. My name is Brendan and I will be your instructor for this project. In this lesson we are going to be doing credit card fraud detection using several methods of anomaly detection from the scikit-learn package. We are going to use a Local Outlier Factor to calculate anomaly scores, as well as an Isolation Forest algorithm. These two algorithms will comb through our data set of almost 280,000 credit card transactions and predict which ones are fraudulent.

Here is an example data point and the results that we can expect to achieve with the models that we are going to train. In our data set we have 30 parameters: the time and the amount of the transaction, plus 28 other features that are the result of a PCA dimensionality reduction used to protect the identities and sensitive information involved in these credit card transactions. Here is our trained Isolation Forest algorithm and the results you can expect to get. As you see here, a class of 1 means this was a fraudulent transaction, whereas in our data set 0 means it was valid. The model is going to predict -1 for an outlier or 1 for an inlier, and as we can see it correctly predicts that this example is an outlier, a fraudulent transaction.

Some Python knowledge is recommended; however, I will be taking you through each individual step, so there is no need to worry about getting lost or falling behind. I hope you can learn some valuable information from this tutorial, where we will be using Jupyter notebooks to develop this Python application. It holds some valuable lessons about pre-processing data sets as well as the deployment of multiple anomaly detection algorithms, namely our Local Outlier Factor and our Isolation Forest. I hope you continue to listen and enjoy the project.

Hello and welcome to this tutorial on machine learning by Eduonix. In this project we are going to be performing credit card fraud detection using several different anomaly detection methods. I'm really excited to dive in, because this is such an incredible application of anomaly detection, or outlier detection, and an interesting part of unsupervised machine learning. So let's dive right in.

For this project we are going to use a data set hosted on Kaggle.com: the Credit Card Fraud Detection data set. It has an open database license, so it's free for us to use. Unfortunately it's behind a sign-up wall, so you have to create a Kaggle account to download it. I'm hoping we can host it on the Eduonix website so you can get it along with the project, but it's just a simple sign-up with an email address and then you'll be able to access it. I've already gone ahead and done that and downloaded the file. It's a simple CSV file, and you can see it here in my tutorial folder, so I will be able to access it there. Go ahead and pause the video and, whether it's from the Eduonix website or from the Kaggle page, create an account and download that CSV file. It will come as a zip; extract it and place it in whichever folder you are going to complete this project in.

Once that's done, just like in the previous projects, we are going to use conda to launch Jupyter notebooks and program our project in Python through Jupyter, because that is a great way to make it transferable and work cross-platform.
I'm using Windows, so if you are on a Windows machine you'll be able to follow along exactly; if you are on Linux or macOS you may have to change a few things to launch your Jupyter notebook appropriately. On Windows, just type "jupyter notebook" and that will open the notebook in your web browser. As usual we are going to be using numpy, pandas, matplotlib, and scikit-learn, so if you don't have those packages installed, do "conda install" followed by the package name. If you have been following along with the projects we have completed so far, you likely have all of these installed already, but we'll do an import check to make sure we're all running the same versions. I was in the wrong folder here, so I'm going to move into my tutorial folder; remember, Jupyter notebook opens up whatever folder your terminal is currently in.

Now that I'm in my tutorial folder, I'll open a new Python (Jupyter) notebook and name it something appropriate: Credit Card Fraud Detection, because that is what we are doing today. Let's start with generic imports of all the libraries we need and print off their versions so we know we are using the same ones. As always, we'll import sys to get our Python version; we'll be using numpy and pandas again; we'll import matplotlib; we'll need seaborn once more to draw a correlation matrix; and we'll also need scipy. Let's print those off with a nice little .format call. format is a great way to substitute variables into strings in Python, so whenever I have to substitute a variable into a string I usually go with this .format method, and you can use multiple variables; it all depends on what you need. I'm going to copy this line a couple of times to speed up the process: numpy, pandas, matplotlib, seaborn, and the last one, scipy. I was going to wait to import scikit-learn until we got to the machine learning steps, but let's go ahead and do that now as well. After fixing a small error, here are the packages I am using: Python 2.7, as always, and then these packages and versions, so make sure you have up-to-date versions and everything should run appropriately.

Now let's move on, because we don't just need the overall packages, we need some more specific imports. What I mean is that we're going to shorten things down: numpy as np, pandas as pd, pyplot as plt for our graphical interface, and seaborn as sns. Go ahead and press Shift+Enter, and those are all good. If you hit an error in one of these steps, you probably don't have one of these packages or modules installed. You can go back to your terminal, or open a new terminal since the Jupyter notebook is running in this one, and type "conda install" followed by whatever package you are missing. As long as conda is in your path variables, that will download and install the appropriate module, and then you should be able to come back here and import it successfully.
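For reference, here is a minimal sketch of what those two cells might look like; the exact versions printed will of course depend on your own installation.

    # Version-check cell: import the core packages and print their versions.
    import sys
    import numpy
    import pandas
    import matplotlib
    import seaborn
    import scipy
    import sklearn

    print('Python: {}'.format(sys.version))
    print('numpy: {}'.format(numpy.__version__))
    print('pandas: {}'.format(pandas.__version__))
    print('matplotlib: {}'.format(matplotlib.__version__))
    print('seaborn: {}'.format(seaborn.__version__))
    print('scipy: {}'.format(scipy.__version__))
    print('sklearn: {}'.format(sklearn.__version__))

    # Shortened aliases used in the rest of the notebook.
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns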
As long as that's all up and running, we are ready to move on, so let's load the data set. I'm going to use pandas for this. If you were successful in downloading the data set and it's in the same folder as your Jupyter notebook file, then all it takes to get it into our notebook is a read_csv call on 'creditcard.csv', which pulls it in as a pandas DataFrame. Press Shift+Enter to make sure this works; it takes a moment because this is a pretty big CSV file, but as soon as the cell number pops up you know you're good to go, and it looks like it worked.

Let's start exploring the data set a little so we know what we're working with. The first thing I like to do is data.columns so we know what is included. It looks like we have 31 columns, going from Time all the way to Amount and Class. The columns V1 through V28 are the result of a PCA dimensionality reduction that was used to protect the sensitive information in this data set; for example, we don't want to expose the identity of the individual who made the credit card transaction, and we also don't want to expose things like location. So we have Time, which measures the separation from previous transactions, the Amount, and then a Class. Class is going to be 0 for a valid, normal credit card transaction and 1 for a fraudulent transaction, but let's learn a little more about that.

Let's do print(data.shape) so we know what's here: we have about 284,000 credit card transactions, with 31 columns of information for each transaction. Let's also do print(data.describe()); after restoring the print function and the correct parentheses, this gives us useful information about each column: the mean, min, max, count, and so on. The count is the same for every one of these parameters, which means we're not missing any data, so that's helpful. The output is formatted a bit oddly; the dots in the middle just mean the middle V columns aren't being shown. The Class column is the important part: we have about 284,000 values, a max of 1 (fraud) and a min of 0 (valid), and the mean looks really close to 0, which must mean we have far more valid transactions than fraudulent ones. We're going to have to account for that as we go on.

One more thing: since this is such a large data set, in order to save on time and computational requirements I am going to sample only a fraction of it. Instead of using all 280,000 rows, I'm going to do data = data.sample(frac=0.1), so I only take 10% of the data, and, just so we all have the same data, we'll define a random_state here as well. Then print(data.shape) to make sure that worked. Where we had 284,000 rows we now have roughly 28,481, still with all of our columns, but this is a much more reasonably sized data set to work with. If we used all 284,000 we'd probably get better results; however, for the sake of computational requirements I'm going to cut it down so that we have more manageable data.

So let's keep exploring our data set. We've imported our CSV file and looked at each of the columns, the column names, and the distributions.
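Here is a minimal sketch of the loading, exploration, and sampling steps just described; the random_state value of 1 is an assumption for reproducibility, not something fixed by the video.

    import pandas as pd

    # Load the Kaggle CSV; assumes creditcard.csv sits next to the notebook.
    data = pd.read_csv('creditcard.csv')

    print(data.columns)     # Time, V1 ... V28, Amount, Class
    print(data.shape)       # roughly (284807, 31)
    print(data.describe())  # count, mean, std, min, max for every column

    # Keep only 10% of the rows so the later steps stay manageable;
    # random_state is an assumed value so the sample is reproducible.
    data = data.sample(frac=0.1, random_state=1)
    print(data.shape)       # roughly (28481, 31)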
Another way to look at the distributions visually is to plot a histogram of each parameter. That's really easy with a pandas DataFrame: data.hist(), with a figure size defined to make it a little prettier, and then plt.show(), since pyplot works hand in hand with matplotlib and pandas. Press Shift+Enter and it pulls this up after it finishes thinking. Here are all of the histograms: most of our V features are clustered right around 0, some with fairly large outliers and some with hardly any. The interesting thing, again, is that we see very few 1 values, fraudulent transactions, in comparison to our valid transactions. That's kind of surprising, although it makes sense since this is real-world data.

Let's actually calculate the number of fraudulent cases and the number of valid cases so we can get an outlier fraction, which will go into our anomaly detection methods. We'll start with a comment, then Fraud is the data indexed where Class equals 1 (the fraudulent cases), and Valid is the data indexed where Class equals 0 (the valid transactions). We're using these to calculate the ratio of fraudulent cases to valid cases. So outlier_fraction is the length of Fraud divided by the length of Valid, but we have to be careful here: if we don't use a float, the division rounds to an integer, which would round to zero since this ratio is really small. So we wrap the denominator in float() to carry our decimal points. Let's print that out, and I'm also going to print the number of fraud cases and the number of valid cases, using another .format call because it's good practice.

Press Shift+Enter, and we see that only about 0.17% of our data set is fraudulent: a total of 49 fraudulent cases against roughly 28,000 valid cases. We have a huge disparity between the fraudulent cases and the valid transactions, which makes this harder to predict, but it's important to carry this information through: if you have an over-representation of fraudulent cases in the data you use to train your algorithms, you will end up predicting more fraud than there actually is. So it's important that we account for this underlying percentage of fraudulent cases, the outlier fraction, before actually building our models.
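A sketch of the histogram and outlier-fraction cells, assuming the data DataFrame from the previous step; the figure size is a placeholder choice.

    import matplotlib.pyplot as plt

    # Histogram of every column in the (sampled) DataFrame.
    data.hist(figsize=(20, 20))
    plt.show()

    # Split the transactions by the Class label.
    Fraud = data[data['Class'] == 1]
    Valid = data[data['Class'] == 0]

    # Ratio of fraud to valid cases; float() keeps Python 2 from doing
    # integer division and truncating the tiny ratio to zero.
    outlier_fraction = len(Fraud) / float(len(Valid))
    print(outlier_fraction)

    print('Fraud Cases: {}'.format(len(Fraud)))
    print('Valid Cases: {}'.format(len(Valid)))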
One more thing we should do is build a correlation matrix to see whether there are any strong correlations between the different variables in our data set. This will tell us whether we need to remove anything (in previous projects it was the ID column, but we don't have an ID here), whether there are strong linear relationships that we could model with linear methods, and which features are important for the overall classification.

So here is our correlation matrix. The pandas DataFrame, which is the type our data set is in right now, makes this really easy with the .corr() function. Then we simply create a pyplot figure, plt.figure() with a figsize to make it nicer, and use seaborn's heat map, sns.heatmap(), which turns our correlation matrix into a very readable visual display, followed by plt.show() once again. Shift+Enter to pull this up, and here is our correlation matrix rendered as a heat map. A lot of the values are really close to zero, so there aren't strong relationships between our different V parameters, V1 through V28; most of them are fairly unrelated to the others. What we do care about is the Class column: there is some variation in the relationships between the different parameters and the class. The lighter cells indicate a positive correlation and the darker cells a negative correlation, so V11 looks like a stronger positive correlation while V17 looks like a stronger negative one. There isn't a strong correlation between Amount and whether the transaction was fraudulent, or between Time and whether it was fraudulent; that could be interesting information to take away. We also don't see many one-to-one correlations in the matrix, which is good: we don't need to pull out any of our columns before diving into the actual machine learning.

Before we get started with that, though, we need to format our data set slightly. We need to get all of the columns from the DataFrame, so we do columns = data.columns.tolist(), which generates a list of the column names. Then we filter the columns to remove data we do not want; in this case there is only one column to pull out, and that is Class. It would be very easy to predict which transactions were fraudulent if we told our model which ones are fraudulent; this is unsupervised learning, since it's anomaly detection, so we don't want the labels fed to our models ahead of time. We also want to store the variable we are predicting on, the target, which once again is Class: that's what we are trying to predict. Then X is the data restricted to the filtered columns, with Class pulled out, and Y is the data restricted to the target, so only the Class column. Let's print the shapes so we know what we are working with, and if everything worked correctly we'll see it here. Press Shift+Enter, and indeed it worked: we now have 30 columns in X, which is everything except our Class label, and Y is a one-dimensional array holding the class labels for all of the roughly 28,000 samples in our data set. That's exactly what we wanted, and everything is now set up to start building our models. We're going to use an Isolation Forest algorithm as well as a Local Outlier Factor algorithm to do anomaly detection on this data set; however, that is going to be in the next video. I'm going to stop this one here, since we have successfully imported, pre-processed, and explored our data set.
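To recap the remaining data-preparation cells from this first part, here is a minimal sketch; the figure size is a placeholder choice, and data comes from the earlier cells.

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Correlation matrix rendered as a seaborn heat map.
    corrmat = data.corr()
    plt.figure(figsize=(12, 9))
    sns.heatmap(corrmat)
    plt.show()

    # Every column except Class becomes a feature; Class is the target.
    columns = data.columns.tolist()
    columns = [c for c in columns if c not in ['Class']]
    target = 'Class'

    X = data[columns]
    Y = data[target]

    print(X.shape)  # (n_samples, 30)
    print(Y.shape)  # (n_samples,)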
We have put ourselves in a position to pick appropriate machine learning algorithms and methods and to produce results that are going to be meaningful and accurate. Keep following along in the second video, and I hope everything went well up until this point. Thank you very much.

Hello and welcome back to this tutorial on credit card fraud detection using multiple anomaly detection methods, programmed in Python through Jupyter notebooks. In the last video we successfully set up our data set, which originally included 284,000 data points; we down-sampled it to about 10%, leaving 28,481 credit card transactions, with a total of 49 anomaly (fraud) cases and 28,432 valid transactions. We separated that into X, the data with all of the columns we are interested in, and Y, the target column Class, where 0 is a valid transaction and 1 is a fraudulent credit card transaction. Now we are in a position to fit this data and try to predict which transactions are outliers.

As usual, we need to import the packages we are going to use, and these come from scikit-learn. The first ones are the metrics we'll use to determine how successful we are at outlier detection: accuracy_score and classification_report (I made a spelling error here, so let me correct that). Then, from the ensemble module, we import IsolationForest, and we'll also use LocalOutlierFactor; we'll talk about what these methods are and how they work in a second. These are two common anomaly detection methods from the scikit-learn package. You might also see something like a one-class support vector machine used for outlier detection, but that takes quite a bit longer to train; with 28,000 data points, computing the support vectors to build a machine like that would take a while, which is why I'm only using the isolation forest and the local outlier factor.

Let's talk about these methods for a moment before we get started. The Local Outlier Factor is an unsupervised outlier detection method that calculates the anomaly score of each sample, called the local outlier factor. It measures the local deviation in density of a given sample with respect to its neighbors; it is local in the sense that the anomaly score depends on how isolated the object is with respect to its surrounding neighborhood. The neighbors are determined in the same way as in the k-nearest neighbors method, so we're doing something very similar, except that we calculate an anomaly score based on those neighbors.

The Isolation Forest algorithm is a little different. It returns the anomaly score of each sample by isolating the observations: it randomly selects a feature and then randomly selects a split value between the maximum and minimum values of that feature, and every one of our columns can be considered a feature. Since recursive partitioning can be represented by a tree structure, the number of splits required to isolate a sample is equivalent to the path length from the root node to the terminating node.
If, like me, that doesn't immediately make a lot of sense to you, it's more easily understood like this: the path length, averaged over a forest of such random trees, is a measure of normality and forms our decision function. Random partitioning produces noticeably shorter paths for anomalies, so when a forest of random trees collectively produces shorter path lengths for a particular sample, that sample is highly likely to be an anomaly. So this builds on the random forest idea and uses it to isolate the points with short path lengths, the ones more likely to be anomalies. If you want to read more about these methods I encourage you to do so, but for now we'll dive into programming them in Python and see what kind of results they produce. In machine learning it's really important to understand the algorithms you are using, because you need to pick the algorithm that is going to be the most successful. A lot of the time it's a good idea to compare multiple methods, as we're comparing these two in this tutorial, but even narrowing it down to two different machine learning methods requires a fair amount of foresight, so it's something to consider.

To start, I'm going to define a random state so that we're all on the same page, and then define the outlier detection methods in a dictionary of classifiers. The first entry is the Isolation Forest, which is simply IsolationForest (no space) with a couple of parameters. The first is max_samples, which we set to the length of X, the total number of samples we have. Next is contamination, the fraction of outliers we expect, and we know this: it's the outlier fraction we calculated earlier while exploring the data set, simply the number of fraudulent cases over the number of valid credit card transactions. We need one more, random_state, which we set to the state defined above. The second entry is the Local Outlier Factor, which also takes a couple of parameters. The number of neighbors to consider is what goes into the k-nearest neighbors method it uses; we'll set it to 20, which is a common default. The higher the percentage of outliers in your data set, the larger you'll want to make this number, so it might even be worth playing around with it, lowering it a little and seeing whether it produces different results. Once again we also need contamination, which is again equal to our outlier fraction. That's our dictionary of classifiers; press Shift+Enter to make sure everything imports and the methods are defined correctly, and it looks like they are, so we are good to move on.
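A sketch of the imports and the classifier dictionary described above; the state value of 1 is an arbitrary assumption, and X and outlier_fraction come from the earlier cells.

    from sklearn.metrics import classification_report, accuracy_score
    from sklearn.ensemble import IsolationForest
    from sklearn.neighbors import LocalOutlierFactor

    # Fixed random state so runs are reproducible (the value itself is arbitrary).
    state = 1

    # The two outlier-detection methods, both told to expect the
    # outlier_fraction computed earlier as their contamination level.
    classifiers = {
        'Isolation Forest': IsolationForest(max_samples=len(X),
                                            contamination=outlier_fraction,
                                            random_state=state),
        'Local Outlier Factor': LocalOutlierFactor(n_neighbors=20,
                                                   contamination=outlier_fraction)
    }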
Let's actually fit the model; this is where the fun begins. First we define a variable called n_outliers, equal to the length of Fraud; it will be referenced a couple of times, so we define it ahead of time. Then we loop through the two classifiers defined above: for i, and then we enumerate our collection of classifiers so that we can cycle through them. It's a dictionary, so we can iterate it with .items() (you can tell it's a dictionary because we used curly braces), and we need a colon to round out the for loop. Now we can start defining things. We actually need two different branches here, because the procedure is a little different for the Local Outlier Factor than for the Isolation Forest. So: if clf_name is 'Local Outlier Factor', we compute the prediction with the convenient fit_predict function on X. It fits the X data, which is all of our columns without the Class, and also predicts the labels for those values in one step. We also grab the scores, scores_pred, from the classifier's negative_outlier_factor_ attribute; that will come in handy later. Otherwise, if it's not the Local Outlier Factor, in other words it's our Isolation Forest, we call clf.fit(X), just the fit function, then scores_pred comes from the decision_function it generates on X, and finally y_pred is clf.predict(X). That gives us everything we need: two different fitting procedures depending on the method, a fit_predict for the local outlier factor, and a normal fit followed by a separate predict for the isolation forest.

There is one thing we have to do before we can move on, though. The y_pred values we get back are -1 for an outlier and 1 for an inlier. That is useful information, but we need to process it a little before we can compare it to our class labels, because, if you remember, our class labels are 0 for valid and 1 for fraudulent, and we want the same convention here. We can do that by indexing y_pred where it equals 1 and reassigning those entries to 0, which takes all of our inliers and classifies them as valid credit card transactions; then we index y_pred where it equals -1, our outliers, the ones we think don't belong with the rest of the transactions, and classify those as 1, fraudulent transactions. With that done, we want to calculate the number of errors, which means a comparison against Y; remember, Y is our target, the 0-or-1 valid-or-fraudulent value for each individual case. So the number of errors is the count of places where our prediction does not equal Y, which we get by summing that comparison.
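Putting that together, here is a sketch of the fitting loop, including the evaluation prints that are described in the next paragraph; X, Y, Fraud, and the classifiers dictionary are assumed from the earlier cells.

    n_outliers = len(Fraud)

    for i, (clf_name, clf) in enumerate(classifiers.items()):
        # LOF only offers fit_predict; the isolation forest is fit first
        # and then asked for its scores and predictions separately.
        if clf_name == 'Local Outlier Factor':
            y_pred = clf.fit_predict(X)
            scores_pred = clf.negative_outlier_factor_
        else:
            clf.fit(X)
            scores_pred = clf.decision_function(X)
            y_pred = clf.predict(X)

        # The detectors return 1 for inliers and -1 for outliers;
        # remap to the data set's convention of 0 = valid, 1 = fraud.
        y_pred[y_pred == 1] = 0
        y_pred[y_pred == -1] = 1

        n_errors = (y_pred != Y).sum()

        # Evaluation prints discussed below.
        print('{}: {}'.format(clf_name, n_errors))
        print(accuracy_score(Y, y_pred))
        print(classification_report(Y, y_pred))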
Finally, we want to run the classification metrics, because they will tell us much more useful information. First we print the classifier name along with the number of errors, using .format with clf_name and n_errors. Then we print the accuracy score, one of the metrics we imported above, comparing Y, the values we want, against our predictions. We also print a classification report, and we'll see very quickly why this classification report is important. Everything looks good, and we initialized our classifiers up above, so press Shift+Enter and see whether this is indeed going to work.

It thinks for a while, and here comes the first result: the Local Outlier Factor had 97 total errors, which is relatively high, and yet it was 99.659% accurate, which sounds very accurate. But if you look down at the precision, recall, and F1 score, you'll see we're not quite as good as we think. For class 0 we had a precision of 100%, but for class 1 we only had 0.02, which is not good at all: very few of the actual fraudulent cases are getting labeled as fraudulent; I think we only got about one of them right. Precision accounts for false positives (we flagged a bunch of valid credit card transactions as fraud), and recall accounts for false negatives, so we're not good with false positives or false negatives at all, and the F1 score is a combination of the two. Our Isolation Forest was a little better: 99.75% accuracy, and a precision of about 30%, which is a lot better than 0.02, but it still means we are only correctly identifying about 30 percent of the actual fraudulent cases, and we still have a lot of false positives, so we'd be frustrating our customers by constantly calling to ask, "Did you make this transaction?" The recall is only about one percentage point better, so still not good: there were cases we should have classified as fraudulent and did not. However, the F1 score is better for the Isolation Forest than for the Local Outlier Factor. Essentially, this data set is sufficiently complex that our random-forest-based method was able to produce better results. Thirty percent of the time we will detect the fraudulent transaction, so as long as the criminals in our case make four transactions, statistically we will find them. There are better methods out there than this; since we do have labels, you could use supervised techniques such as neural networks, but this is an interesting data set to explore with anomaly detection. Again, we could probably improve our results if we went back and took a larger sample of our data, since we have 284,000 cases available, but that is going to be computationally expensive as it is.
It already took around twenty seconds to run the isolation forest method here, so if you imagine scaling that up you're going to really increase the computational requirements of this program, and that is why I cut it down. Thank you for following along; I hope you were able to learn a lot of new information in this tutorial. We went over importing a CSV data set, pre-processing it, and exploring and describing the data so that we knew what we were working with and what information we had. We plotted histograms to see whether there were any unusual parameters. We separated the data into fraudulent and valid transactions and determined the outlier fraction, which was very low, less than one percent, 0.17% to be exact. We built a correlation matrix to show which parameters were important for our class, and we saw some variation there, so we knew we would be able to make decent predictions. We separated the data based on our target parameter, the Class, and then we used two different methods, the Isolation Forest and the Local Outlier Factor, to do anomaly detection on this data set. This was also a valuable lesson in the importance of understanding your data and understanding precision and recall, because we showed a 99.65% accuracy, but we were only that accurate because there are so many more valid cases than fraudulent cases; we were actually very bad at detecting the fraudulent cases with our Local Outlier Factor. With the Isolation Forest we achieved almost 30 percent detection of those outliers, the fraudulent cases, which still isn't great but is a significant step in the right direction. Once again, thank you for following along, and I hope you enjoy the future projects. Thank you very much.

29 Comments

  1. Eduonix Learning Solutions said:

    Check out another Credit Card fraud detection project at – https://youtu.be/9cXeEwOXWU8

    June 30, 2019
    Reply
  2. Tawhidur Rahman Bhuiyan said:

    How can I solve this problem in this project?

    ZeroDivisionError                         Traceback (most recent call last)
    <ipython-input-11-9d4aa9daab59> in <module>()
          4 Valid = data[data['Class'] == 0]
          5
    ----> 6 outlier_fraction = len(Fraud)/float(len(Valid))
          7 print(outlier_fraction)
          8

    ZeroDivisionError: float division by zero

    June 30, 2019
    Reply
  3. Anuran Aich said:

    Hello Sir, its a very good example. Please let me know how can we deploy this in the current world? Now you are showing the result from Jupyter Note book, but if you want to give it to someone then how you will provide the same?

    June 30, 2019
    Reply
  4. Manjusri Samanta said:

    What is the error? How can I resolve it?
    AttributeError                            Traceback (most recent call last)
    <ipython-input-7-b0eb7d78ca44> in <module>
          1 import sys
          2 import numpy
    ----> 3 import pandas
          4 import matplotlib
          5 import seaborn

    c:\users\manjusri\appdata\local\programs\python\python36\lib\site-packages\pandas\__init__.py in <module>
         38
         39 # let init-time option registration happen
    ---> 40 import pandas.core.config_init
         41
         42 from pandas.core.api import *

    c:\users\manjusri\appdata\local\programs\python\python36\lib\site-packages\pandas\core\config_init.py in <module>
         10
         11 """
    ---> 12 import pandas.core.config as cf
         13 from pandas.core.config import (
         14     is_bool, is_callable, is_instance_factory, is_int, is_one_of_factory,

    c:\users\manjusri\appdata\local\programs\python\python36\lib\site-packages\pandas\core\config.py in <module>
         54 import warnings
         55
    ---> 56 import pandas.compat as compat
         57 from pandas.compat import lmap, map, u
         58

    AttributeError: module 'pandas' has no attribute 'compat'

    June 30, 2019
    Reply
  5. Ejaz Ahmed said:

    Will you please send the documentation of this project?

    June 30, 2019
    Reply
  6. chaitanya vardhan said:

    I am working on this, could anyone please help me?

    June 30, 2019
    Reply
  7. Victor Velea said:

    why would u use python 2.7 lol

    June 30, 2019
    Reply
  8. mustafa aydın said:

    I can't understand the result. It is hard to predict, that's OK. But how? The video does not explain such a result. Instead of all this work, if we apply just a simple decision tree algorithm to the entire data set, it gives 0.99 accuracy as well. Somehow, it is necessary to reduce the imbalance of the fraud cases to analyse the data instead.

    June 30, 2019
    Reply
  9. Bhim Nation said:

    Nice; sir, how do I get the next video?

    June 30, 2019
    Reply
  10. Pragya Mittal said:

    Can you please provide the dataset?

    June 30, 2019
    Reply
  11. chindanuru manogna said:

    Sir, can you please share the code for detection of viruses or malware?

    June 30, 2019
    Reply
  12. john dicks said:

    Hey there. I wanted to ask about these errors I was receiving, can you help me?

    c:\users\muj\appdata\local\programs\python\python37\lib\site-packages\sklearn\ensemble\iforest.py:223: FutureWarning: behaviour="old" is deprecated and will be removed in version 0.22. Please use behaviour="new", which makes the decision_function change to match other anomaly detection algorithm API.
      FutureWarning)

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-36-f489c68bb402> in <module>
          7 clf.fit(X);
          8 scores_pred=clf.decision_function(X);
    ----> 9 y_pred=clf.fit.predict(X);
         10
         11

    AttributeError: 'function' object has no attribute 'predict'

    June 30, 2019
    Reply
  13. Gilded Protagonist said:

    I have just completed this project 😁 feel free to check it out.
    https://github.com/supriem/Data_Analysis_Projects/blob/master/CREDIT%20CARD%20FARUD%20DETECTION/Credit%20CARD%20FRAUD%20DETECTION.ipynb

    PS: thank you Eduonix for this

    June 30, 2019
    Reply
  14. Shivam Bhirud said:

    I am a beginner so I have a few doubts: don't we ever have to hand-code any algorithm on our own? Can we always use sklearn, and is that acceptable when working on actual industrial projects?

    June 30, 2019
    Reply
  15. Aditya Sharma said:

    It is good that you are sharing your knowledge, but do you think that in an actual scenario the data would be this easy? There would be so many categorical variables to handle. The example you are showing here is just a classification problem.

    June 30, 2019
    Reply
  16. Habib Hanifa said:

    Please help me: I want to develop a fraud management system project, so how can I do that?

    June 30, 2019
    Reply
  17. vam si said:

    Is it a top-down or bottom-up approach?

    June 30, 2019
    Reply
  18. Naveen Kumar Dasari said:

    How can we increase the accuracy of the classifier?

    June 30, 2019
    Reply
  19. Debasish Mahana said:

    I need the creditcard.csv dataset … can you provide the link for it?

    June 30, 2019
    Reply
  20. Paul McGuire said:

    At 18:18 my code just prints ( ) as the result of determining the parameters. Can anyone help?

    June 30, 2019
    Reply
  21. artificial intelligence said:

    Helpful Video
    https://itscienceprojects.com/machine-learning-projects/

    June 30, 2019
    Reply
  22. PANKAJ KUMAR said:

    What are the parameters from V1 to V28?

    June 30, 2019
    Reply
  23. Abhishek Dubey said:

    Is this spending-behavior credit card fraud detection?

    June 30, 2019
    Reply
  24. Ravi Kumar said:

    Sir, I got an AttributeError regarding decision_function.
    Can you resolve this problem?

    June 30, 2019
    Reply
  25. naina bhatt said:

    I'm getting an error in scores_pred = clg.decision_function(x)
    An AttributeError. Please help me.

    June 30, 2019
    Reply
  26. Nayeem sunny said:

    Sir, can you please provide me the link to the dataset?
    my mail : [email protected]

    June 30, 2019
    Reply
  27. Akhil Chaitanya said:

    Can you explain the output, and also explain what the values (V1-V28) are?

    June 30, 2019
    Reply
  28. Max Qin said:

    good job man.

    June 30, 2019
    Reply
  29. Yang Lyu said:

    Very good starting tutorial for credit card anomaly detection project. Would love to see ways to improve on current models uses, Isolation Forest and Local Outlier Factor. Could you improve these two models using hyperparameter tuning? Also, the default cutoff is 0.5 which yields 0.29 recall, which in my opinion would be too risky for credit card companies. What is the average industry-wide cutoff and what factors to consider when choosing such cutoffs? Thank you!

    June 30, 2019
    Reply
