Applications of Predictive Analytics in Legal | Litigation Analytics, Data Mining & AI | Great Lakes

analytics, communication, change management, strategy, processes, cross-cultural relationships, product management, cost and quality management. I think that's the complete business, yeah. Well, ladies and gentlemen, he's a senior manager, technology, with Thomson Reuters. Please give a huge round of applause for Krishna Mohan.

Along with him we also have another gentleman. He has over 15 years of rich experience in the field of consulting, technical project management and delivery. He has extensively worked on customizations, extensions and interfaces, including B2B and RN, and has experience in managing delivery on implementation, upgrade, rollout and production support projects. His expertise lies in the areas of analytics, problem solving and troubleshooting, along with hands-on technical and functional knowledge in Oracle EBS SCM. Ladies and gentlemen, he is the technical supervisor at Arrow Electronics, so please put your hands together for Nitin ho Sircar.

Thanks a lot for coming right after lunch. I know you may have had to hurry up and finish your lunch just to get here in time for this talk. Nitin and I just completed a one-year program in business analytics at Great Lakes. As part of the requirement for the course we had to complete a capstone project, and this is the one we chose, primarily because we wanted something where we could implement as much of what we learnt as possible, and second, we wanted to be as lost as possible, and we succeeded in it. So: predictive analytics in legal.

Now let me give you some background on why we talk about analytics in legal, something you don't normally associate easily. It all started in about the year 2002. The three people behind me noticed in 2002 that the bench of the Supreme Court, which comprises nine justices, had remained unchanged for seven years, between 1995 and 2002. What they realized is that this gives a tremendous amount of data as to how these justices
ruled in different cases. They thought, why don't we use this corpus of data to build predictive models that can predict whether the Supreme Court is going to affirm the ruling of a lower court or reverse it? That is how it all started. Eventually they built models like this, which said, okay, this is what the prediction is going to be. But they had to validate that against the experts in the legal field. So what they did is they assembled a team of legal experts from all over the country who understood the Supreme Court, its rulings and the different cases to be heard in the coming year. They asked these experts to predict each one of those cases, and they compared those predictions against what their models predicted. What you can see there is that the model predicted with 75% accuracy, whereas the experts were able to predict with only 56% accuracy. A big, big win for analytics in legal. That is what clinched the fact that analytics has a role in the legal world, so when we were trying to pick a capstone project, this is something that appealed to us.

So we said, okay, what are we going to do next? For those who are not very familiar with the US court system, here is a brief 30-second introduction. Just like in India, the cases that reach the Supreme Court come through different channels. They can come through the state supreme courts, which are what we would consider the state high courts, or they can come through one of the several federal courts. The one that we decided to concentrate on was the US Court of Appeals. For a large number of cases this matters, because the US Supreme Court takes up only one percent of the cases that come before it, as compared to the 41 percent that the Supreme Court of India accepts. So the Court of Appeals becomes the last resort for many, many cases. So we looked at what we could do in this particular space. Let me
just give you a little more introduction as to what this Court of Appeals is all about. In the United States court system there are 13 courts of appeals, or as they are popularly known, the circuit courts, and these are the different geographies that they cover.

We then decided to define our problem statement, which is primarily to predict what the outcome of a case in these courts of appeals is going to be. The big difference is that the models built by Professor Katz and others were about a binomial outcome, so they extensively used logistic regression and techniques like that. What we had in front of us was 11 different outcomes from the Court of Appeals. Could we build a model that can predict the outcome of the US Court of Appeals? That was the challenge ahead of us.

What did we base it on? We thought, okay, let's look at the basic characteristics of the case, the participants, the nature of the case, and also who the judge is. We thought these were primarily the factors which would eventually influence the outcome of a case. This was our hypothesis. Did it turn out to be true? From here on we started hunting for data. How we did it and what happened to it, my friend Nitin is going to talk about for a few minutes.

In the next few slides I will take you through how we sourced the data and what kind of data prep we did before we ran the actual model. So this is the codebook. We had to understand what the underlying data was and what each column was, so we had a codebook of about 90 pages that we downloaded from the South Carolina website just to understand the data. These two documents were the basis for starting our study.

How did the data look after we downloaded it from the website? There was a total of 2,160 observations and about 245 columns, and one of them was the column that we were interested
in: the case outcome, with the 11 possible outcomes Krishna was talking about. The dataset was a combination of integers, date fields, categoricals, binaries and complex variables. I'll explain in a moment what the complex variables were and how we dealt with them. The majority of the variables were categorical, very few were numerical, and most were multi-level. The levels were coded as 1, 2, 3, 4, 5, 6, 7, and we had to refer to the codebook to find out what those codes meant and then massage the data accordingly before we built the model. We also had coded factor variables, a few variables which were just one number like 45678. How do you decode that and make sense of that data? That was one of the exercises we did before we ran the model. And then there were missing values: we had certain missing values in the dataset, and the question was how to deal with them.

When we started the project we thought we would do our whole study in four phases: get the data, massage the data, prepare it for the analysis, and then run the final model.

So, data prep. One thing was obviously the many missing values. Where certain variables were missing across all observations, we just removed those columns from the study. Then we decomposed the complex fields. This is what I was talking about with the complex variables. We have a field which has a value like 12345; how do you decode that? We had to refer back to the codebook: the first digit means something; if the first digit is 1, then the second digit means something; depending on the second digit, the third and fourth mean something. So we had to write a huge algorithm just to decode this number into a meaningful categorical column which we could use for the model. I think we eventually ended
up with some 150 combinations that we had to decode finally to get the data prepared. There were as many as 12 subcategories. I'll take an example for understanding. Suppose we had something called an appellant; the appellant is the person who is appealing. If the code is 12345, a one means it is a natural person, a two means it's a corporation, a three means it's a bank, a four means maybe something tax-related, and so on. We had to read each of them and then decompose the whole data. So this was one phase.

We also changed the column names. Just by looking at the dataset we didn't understand what the columns were, but to build the model we needed a proper description for those columns, so we changed the names as per what we understood, and we changed the levels from numbers, which I already explained.

In this graph which you are seeing here, there were the 11 levels that we talked about initially, the 11 outcomes of the case. Of those 11 outcomes, five case treatments actually constitute 88% of the entire dataset, and the remaining 12 percent are distributed across the remaining 7 outcomes. So what we did is reduce the number of levels: we merged similar case treatments and reduced them to five or six outcomes, and then finally we built the model for that.

Now, I told you there were two hundred forty-four columns. We dropped some and it became two hundred columns, but we thought it would be nice if we could represent the data in certain categories. So these were all the columns we had in the dataset and how we organized them for our understanding of the data. For example, which was the previous court that heard the case, which district it was coming from and which state it was coming from: that constituted the case origin.
Then there is who the participants were, such as who the appellant was, and the nature of the case: whether it was criminal, or corporate espionage, or tax fraud, or immigration. Then what kind of laws were applied, and who the judge was, whether it was a single judge or a panel of judges. If there was more than one judge, typically we saw three judges for each case, so it became a panel of judges. And the outcome was the variable that interested us.

Before we started on the model, we also thought, let us understand which predictors are important for the model. So we ran a quick chi-square test for independence. We formulated a null hypothesis and an alternate hypothesis, ran the chi-square test, and analyzed which variables were significant and which were not. This was at a significance level of 5%, an alpha of 0.05.

What we also did, as part of dimensionality reduction or feature selection, was use a package called Boruta. I'm not sure how many people have seen this. It's a package in R that uses the random forest algorithm to give the list of important predictors. So we ran that as well, and then we corroborated our chi-square results with the Boruta package to find out which variables were significant. For all those marked in green, the package is saying they are significant variables; for whatever is in red, it is saying they are not significant; and for whatever is in blue, it says, I cannot decide, you take the decision whether you want to include that in the model or not. So as domain experts, as the people running the model, it becomes our call whether to include that predictor in the final model or not. Then, after we ran this, what we did
is we reorganized the entire data. We had created various categories, so based on the results from the chi-square test and the Boruta package we came up with a list of predictors that we were interested in. We also went through the dataset and found other variables which we felt may be significant. Maybe the chi-square test didn't flag them as significant, but we still wanted to include them in the model. So we included those variables and came up with these pillars, again the same ones: case origin, participants, nature of the case, laws applied, and whether it was a single judge or a panel of judges. We organized the data in this fashion because it makes it clearer for anybody to understand how the entire dataset looks.

This actually took almost sixty to sixty-five percent of our whole project timeline, because running the model and tuning the model was fairly easy after we did this. This is where we took the maximum amount of time: to decompose and understand the data and, since everything was coded as a number, to recode it into categorical variables which we could understand. So this is where we spent most of our time, on the data prep and cleansing the data before running the actual model. I think I'll hand it over from here. Thank you.

So, as you understood, the data was complex. It was not something straightforward: it was predominantly categorical, and it was something we were not familiar with. Having done all the exercises so far, what is the next thing we are going to do? We have to figure out which models we are going to apply. Based on all these criteria, the ones we eventually settled on were random forests, XGBoost, neural networks, and then of course we wanted to try out ensembles. Yes, there is a fascination with techniques, and like many others we also fell into the trap of saying, okay, let's concentrate on the techniques. It happens, and we
wanted to try them out and experiment with them. We didn't know which ones were going to work and which ones were not, or how much we would be able to tune the models, but it was definitely an exercise we did not want to miss out on in this particular project. In the interest of time I'm going to talk about neural networks and XGBoost only, although we did all four models as part of our project.

So here are the things that we did. We first ran the model on the entire dataset. Can people in the back see the slides? Is it visible with these fonts? Okay. As you can see, the accuracy levels were pretty low when we used the entire dataset. Then we said, okay, let's try to trim it, because the chi-square test and the Boruta package indicated which variables are significant, and at the same time, when you run the model, the neural network also tells you which variables are significant. Based on this we trimmed the dataset to only the significant variables and reran it. Not much of an improvement: we just moved from 62% to 64%.

Now what is the next step? When we looked at our dataset, the distribution was imbalanced. Was there a good representation of all the different kinds of combinations in the data that we used when we trimmed the model? We figured, okay, no. So we went ahead and balanced it with the caret package. We moved up a little bit, but not enough. Then we looked at the distribution of our outcome variable. There was one outcome, affirmed, which was dominating all the other outcomes. We thought this was the reason the model was not learning what the other outcomes look like. Therefore we said, why don't we do something like oversampling, so that there is a good representation of all the different outcomes, and run that against our model? That's where the big change
happened. You can see that the accuracy moved significantly, to 85%, and when we ran it against the test dataset it was at about 82 percent. So what we understood in the process was that neural networks are one of the techniques we should be focusing on a whole lot more.

Similarly, we looked at the other packages, among them XGBoost. It was a similar exercise; we went through the same kind of steps. We said, okay, run it against the full model and let's see what it looks like. Eerily, the results were very, very similar. It's just that when we ran XGBoost against the balanced and the oversampled datasets, it gave a much higher level of accuracy. We tried different combinations of tuning with both neural networks and XGBoost, and this is what we were eventually able to settle on.

I want to show in a little more detail the confusion matrices of these two models. The thing I want to bring some attention to is where the misclassification occurred. You see that a good number of the cases the model predicted as affirmed actually happened to be reversed. That means the model is predicting exactly the opposite of what the actual outcome was. Similarly, vacated is another one, where the model thought it was affirmed but the actual outcome happened to be vacated. That's something which we observed. When we looked at the misclassification with XGBoost, again it was very, very similar. You can see right here, about 42 and 40, and 96 of them were misclassified: those 96 should have been reversed, but the model thought they were affirmed.

What this is telling us is that there are probably just a few variables which are tilting the balance and causing this misclassification, to the point that you get a result which is exactly the opposite of the actual one. Now this is where we are focusing a
lot more at this moment, on this misclassification. What we are trying to do in some of the steps ahead is boil this down to an E = mc² kind of formula, where we can simplify it very much and say, if you understand two or three different variables really well, you can predict what the outcome of the US Court of Appeals is going to be. That is something we are working on, and hopefully we'll have some breakthrough in the coming days.

Now, the lessons we learned from all these exercises. When you use a classification model it is very, very important that you have good representation of all the possible outcomes. That is where the oversampling helped us in getting that balance. But there is always a trade-off when you do something like this, when you try to interpret the result and tie it back to what exactly is happening within the model. It is difficult enough to figure out what happened in there when you have just the data as you received it in the first place, and when you have oversampling, where data is being simulated, interpretation of the outcome becomes a lot more difficult. So that's the trade-off, and it is something important that we understood in the process.

The other thing is that there was a pattern in the most significant variables across all the different models that we used. One of them is the circuit court. Again, we had the 13 circuit courts we started off with, and which circuit court is dealing with the case is a significant variable. So it can be interpreted that if you take a case to one circuit court, your chances of winning the appeal could be different than if you take it to a different court of appeals. This comes into play especially in interstate disputes. For example, take the whole Cauvery dispute: if it had gone to a circuit court and one party had decided to appeal, this
kind of information would help them figure out which circuit court is more likely to give a ruling in their favor.

Another one is the verdict of the previous court. If it happened to be not ascertained, maybe because jurisdiction was an issue, or the laws applied were an issue, or for several other reasons the lower court said, okay, I don't know what to do with this, I will leave it at that; if that was the outcome of the lower court, the chances of you winning the appeal in the circuit court are much higher.

Then there is the nature of the appeal, and there are different kinds of appellants, as Nitin mentioned. If the appellant happens to be a natural person, not a company or another organization, and the issue is related to the First Amendment, which primarily deals with a person's right to protest or to have liberty of speech, then the Court of Appeals is more likely to take up the case and give a ruling in your favor. And finally, who was the initiator: whether it was the plaintiff, or another party, or the person who actually lost the case in the previous court. All of these are very significant factors which help the different parties figure out whether they want to go to the Court of Appeals or not.

So, overall, how does this help in India's context? Right now there are about 3 crore cases sitting in the legal system in India, with billions of documents and tens of billions of papers in the court system. Data analytics can play a role in helping with all of these. The reluctance of the legal fraternity to adapt to or adopt changes, especially technology, is going to be a challenge in the Indian context. Apart from that, if you simply use an 80/20 rule, by using data analytics you may be able to resolve 80% of the cases using 20% of the cases as your reference. You can also look at it as saying that 80% of the cases actually are
identical, and only 20% of the cases are probably consuming 80% of the court's resources, which doesn't make sense and which is the reason why we have such a huge backlog of cases. By being able to cluster them or classify them into certain categories, and also by giving information to the parties on what their chance of winning or losing a particular case is, the court system can definitely be much more optimized. So when you look at the Indian legal system and this quick 30-minute flavor of how data analytics can come into it, you see that there is a huge challenge. Can data analytics find a bigger challenge anywhere else? So, as the data analytics fraternity, are you ready to help? Thank you.

So how much time do we have? Okay. Any questions?

[inaudible audience question]

So, the first question, with regard to the chi-square. Yes, we had predominantly categorical data, and chi-square was just a reference point for us. When we looked at the significant variables from all the different models that were built, we compared them to see how many of them happened to be in common, so there was a validation. Chi-square was just a starting point for us, and that's also the reason why we used the Boruta package and everything else as well. So eventually you
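
A rough sketch of the chi-square screening mentioned in this answer, for readers who want to see the mechanics. The speakers worked in R; this is a Python illustration using only the standard library, so it is not their code. The `circuit` and `outcome` columns are made-up toy data rather than the appeals court dataset, and the helper names `chi2_sf` and `chi_square_independence` are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be at least 1")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1/2, y) = erfc(sqrt(y)) for odd df or Q(1, y) = exp(-y) for
    # even df, then apply Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1).
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return min(q, 1.0)


def chi_square_independence(xs, ys):
    """Pearson chi-square test of independence for two categorical columns."""
    n = len(xs)
    row, col, cell = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n  # count expected under independence
            stat += (cell[(r, c)] - expected) ** 2 / expected
    df = (len(row) - 1) * (len(col) - 1)
    return stat, df, chi2_sf(stat, df)


# Toy data (assumed for illustration): does the outcome depend on the circuit?
circuit = ["A"] * 30 + ["B"] * 30
outcome = (["affirmed"] * 10 + ["reversed"] * 20
           + ["affirmed"] * 20 + ["reversed"] * 10)
stat, df, p = chi_square_independence(circuit, outcome)
print(f"chi2={stat:.2f}, df={df}, p={p:.4f}, significant at 0.05: {p < 0.05}")
```

In practice a library routine such as `scipy.stats.chi2_contingency` would normally replace the hand-written functions. A predictor would be kept as a candidate when `p` falls below the 0.05 alpha mentioned in the talk, and, as the answer says, that result is only a starting point to compare against what the models themselves report.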

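The oversampling step that the talk credits with the jump in accuracy can be illustrated in the same spirit. The talk does not say which oversampling method was used (the remark about data being simulated may point to a synthetic method), so this sketch assumes the simplest variant, random duplication of minority-class rows. The function name `random_oversample`, the class proportions and the `case_id` field are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class is as large
    as the biggest class. This is plain random oversampling, not SMOTE."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw extra copies with replacement from the class's own rows.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy labels (assumed): one dominant outcome, as described for "affirmed".
labels = ["affirmed"] * 70 + ["reversed"] * 20 + ["vacated"] * 10
rows = [{"case_id": i} for i in range(len(labels))]
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(labels))
print(Counter(bal_labels))
```

A common precaution is to oversample only the training split, so that duplicated rows cannot leak into the test set and inflate the reported accuracy. Whether the speakers did this is not stated in the talk.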