-
Naive-Bayes work flow
Naveenchandra 2012-02-15, 11:34
Dear all,

I am a newbie to Mahout, currently working on naive Bayes classification. I ran the classifier on the popular 20 newsgroups example and got quite good results, as given in the Mahout in Action book.

But when I ran the classifier on my own data set, which has 2 files each containing 20k records, I get an efficiency of around 50%. Please help me.

I would be grateful if you could explain how the actual process takes place.

Thanks a ton,
-Naveenchandra
-
Re: Naive-Bayes work flow
Ramprakash Ramamoorthy 2012-02-15, 13:20
Is it efficiency or accuracy? I mean the 50% you mentioned.

--
With thanks and regards,
Ramprakash Ramamoorthy
-
Re: Naive-Bayes work flow
Naveenchandra 2012-02-15, 13:43
Hi,

Thanks for taking the time to reply. It is efficiency, actually; I got that figure when I tested the classifier with sequential as the method.

I wanted to know what effect the ng setting has on the result: does increasing ng increase the efficiency or decrease it?

Just now I trained and tested the naive Bayes classifier on an input data set containing 2 csv files of around 30k records each, with 2 target variables. I got the following confusion matrix with ng set to 3:

Confusion Matrix
-------------------------------------------------------
a       b       <--Classified as
32151   1       |  32152   a = target_a
32095   57      |  32152   b = target_b

The efficiency is 50%. Please help me to improve it.
-
Re: Naive-Bayes work flow
Ted Dunning 2012-02-15, 15:02
Efficiency is not normally a term used with classifiers. Can you define it?

From your confusion matrix, it looks like nearly all of your documents are being classified into one class. That usually indicates that there is some fundamental formatting difference between your original training data and your test data.

Without being able to see any of your data or the output of your training run, it is impossible to say more.
-
Re: Naive-Bayes work flow
Lance Norskog 2012-02-16, 03:45
NGrams also have this effect in the ASF emails example.

--
Lance Norskog
-
Re: Naive-Bayes work flow
Naveenchandra 2012-02-16, 03:54
Hi Ted,

By the term efficiency I only mean the percentage of correctly classified instances, that's it. I am providing the same data set for training as well as for testing; it is in the format (target_variable'\t'data). Do you want me to mail you the data set?

I have a comma-separated data set, so I just want to know whether I could specify any desired column as the target variable instead of the 1st column followed by a tab?

Thank you,
Naveenchandra
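
Taking "efficiency" as the fraction of correctly classified instances, the figure can be recomputed from the confusion matrix posted earlier in the thread. The snippet below is only a sketch: the function names are made up for illustration, and it assumes rows are true classes and columns are predicted classes, as the "<--Classified as" header suggests. Per-class recall is included because it shows what a single 50% figure hides.

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """For each true class, the fraction of its instances labelled correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


# Rows are true classes, columns are predicted classes (assumed layout).
matrix = [
    [32151, 1],   # a = target_a
    [32095, 57],  # b = target_b
]

print("accuracy: %.4f" % accuracy(matrix))
print("recall per class:", ["%.4f" % r for r in per_class_recall(matrix)])
```

Overall accuracy comes out just above 50%, but recall for target_b is well under 1%: almost every record is assigned to target_a.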
-
Re: Naive-Bayes work flow
Naveenchandra 2012-02-16, 05:41
Hi all,

Can anyone explain how and where the input files are read? I am going through the source code of the Bayes classifier now, and I know that we create a job which calls the class that reads the input file.

May I know which Java file reads the records from the input csv files?
-
Re: Naive-Bayes work flow
Lance Norskog 2012-02-16, 06:23
The file examples/bin/asf-examples.sh shows how to use the Naive Bayes classifier. There are a few stages needed to prepare files before they get to the classifier training and test passes.

If writing your own code, I would use the Apache Commons CSV parser:
http://commons.apache.org/sandbox/csv/

This is pulled into the Mahout integration/ sub-project, and is used by the 'CSVVectorIterator'. If you need a Hadoop file reader for CSV, you would have to create a new one from scratch.

--
Lance Norskog
-
Re: Naive-Bayes work flow
Naveenchandra 2012-02-16, 06:52
Hi,

Lance, you might have gone through the confusion matrix which I posted. Can you work out why target_b is so poorly classified?

My input, as I said earlier, is 2 csv files in which each record is in the format "target_variable<tab>text".
-
Re: Naive-Bayes work flow
Lance Norskog 2012-02-17, 00:15
I do not know why they were classified poorly. It would be really helpful if the classifiers logged every item and the confidence level for the classification choice. This would let you examine your data.

--
Lance Norskog
-
Re: Naive-Bayes work flow
Naveenchandra 2012-02-17, 03:58
Please can you explain exactly which format my input data set should be in? I assume it is target_variable<tab>data (comma-separated values).

Also, please explain how the naive Bayes classifier works. It calculates TfIdf, weights and ThetaNormalization. How are these calculated, and how are the target variables of the test data compared with those of the training data?

These may be silly questions, but I want to learn from scratch. Starting from the inputs taken from the command line: how and where are the csv files read, in which format are they stored, how are they processed, how is the classifier trained, and what is the testing mechanism?

Thanks a ton
-
Re: Naive-Bayes work flow
Sreejith S 2012-02-17, 04:14
Hi Naveenchandra,

Some of your questions have already been addressed on the mailing list, so please take a look at the archives. :) It is not advisable on any mailing list to post questions without doing a basic search of the past archives first.

Thank you,

--
Sreejith.S
http://srijiths.wordpress.com/
http://sreejiths.emurse.com/
tweet2sree@twitter
-
Re: Naive-Bayes work flow
Naveenchandra 2012-02-17, 05:07
Hi,

Thanks for your advice, Sreejith, but my problem is that I want to know exactly where my csv file is read for training the classifier. Do we have any Java class, such as a CSV file reader, which reads each line of the input csv file? I assume the class org.apache.hadoop.mapred.KeyValueTextInputFormat is called to do this, but when I checked that Java file I did not find any function to read the csv file.

Please help me
-
Re: Naive-Bayes work flow
Stuart Smith 2012-02-18, 06:54
Nave,

It's:

    Target value (tab) _space_ separated values

Not comma-separated values.

That is, if you are using trainclassifier and not nbtrain, or whatever the vector-based one is... which it sounds like you are.

Don't worry about asking dumb questions. AFAICT the Mahout documentation is a mess right now. Yes, yes, I should do my part to help and not complain, just sayin'.

For all the theory, google for "tackling poor assumptions naive bayes" (without the quotes).

Take care,
-stu

P.S. Sorry, I'm on a phone, so I just truncated your name...
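
One way to get from a comma-separated file to the "target value (tab) space-separated values" layout, with any column used as the target, is to rewrite the file before handing it to Mahout. The sketch below is an assumption about how that preprocessing could look, not something Mahout provides: the function name, the sample rows and the column index are all made up for illustration.

```python
import csv
import io


def csv_to_training_lines(csv_text, label_column):
    """Turn CSV rows into 'label<TAB>space separated features' lines.

    label_column is the zero-based index of the column to use as the target.
    """
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        label = row[label_column].strip()
        features = [v.strip() for i, v in enumerate(row) if i != label_column]
        lines.append(label + "\t" + " ".join(features))
    return lines


# Hypothetical sample: the target is in the last column (index 3).
sample = (
    "10034317,region3,district15,deactive\n"
    "10062351,region2,district32,active\n"
)

for line in csv_to_training_lines(sample, label_column=3):
    print(line)
```

A value that itself contains spaces would end up as several tokens, so such fields may need their spaces replaced before conversion.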
-
Re: Naive-Bayes work flow
Naveenchandra 2012-02-21, 13:56
Hi,

Thanks, Stuart, for the info, but can you tell me in which format the input file is stored and exactly which process reads the input files?

As far as I have analyzed it, training the classifier goes this way: the TrainClassifier class is called and params are passed to it; then BayesDriver is called with those params; it calls BayesFeatureDriver and passes the params on; in BayesFeatureDriver a job is created and the inputPath is passed to FileInputFormat. From here I am unable to follow how the process continues. I just want to know which class reads all the input files, and in which format it stores them.

The params contain: inputPath, outputPath, type of classifier, nGrams value, etc.
-
Re: Naive-Bayes work flow
Stuart Smith 2012-02-22, 22:55
Hello Naveenchandra,

The input file format is what I said it was (a clearer explanation is below, if it helps). What problem did you have using that format?

Clearer explanation:

You need to do a funky directory layout to get it to work. If you have two classes you want to train on:

    purple
    yellow

and each object has 4 attributes (say RGBA color levels), and you want stuff classified as purple or yellow, you should make the following directories on HDFS:

    hdfs://user/naveen/bayes/input/train
    hdfs://user/naveen/bayes/input/test

Put two files in each directory:

    hdfs://user/naveen/bayes/input/train/purple.tsv
    hdfs://user/naveen/bayes/input/train/yellow.tsv
    hdfs://user/naveen/bayes/input/test/purple.tsv
    hdfs://user/naveen/bayes/input/test/yellow.tsv

Each file is formatted like so:

purple.tsv:

    purple[tab][R level][space][G level][space][B level][space][A level]
    purple[tab][R level][space][G level][space][B level][space][A level]
    purple[tab][R level][space][G level][space][B level][space][A level]
    ...

yellow.tsv:

    yellow[tab][R level][space][G level][space][B level][space][A level]
    yellow[tab][R level][space][G level][space][B level][space][A level]
    yellow[tab][R level][space][G level][space][B level][space][A level]
    ...

Put 10% of your examples in test, 90% in train. If you can't figure out how to do that, just duplicate your data in the test and train directories, and worry about splitting it up later. Get it to work first.

Now train your classifier by running:

    mahout trainclassifier -i /user/naveen/bayes/input/train -o /user/naveen/bayes/color-model -type bayes -source hdfs -ng 1

Wait. If you see any errors, post the COMPLETE OUTPUT to the mahout list along with the COMPLETE COMMAND you used to run it and the directory layout of your input.

Once that works, do:

    mahout testclassifier -d /user/naveen/bayes/input/test -m /user/naveen/bayes/color-model -type bayes -source hdfs -ng 1

You should get some M/R stuff, then a nice grid of results (which will probably suck initially, until you curate your feature set). If you get all zeros in the grid, something went wrong. Email the list with your COMPLETE command & output and COMPLETE directory layout as before.

As far as how stuff is stored internally... it's really not relevant if you can't get everything to work, but I believe it's sparse vectors (Mahout vectors, not vanilla Java API vectors).

If you want more helpful replies, please post complete output and errors. Otherwise I'll just keep reposting this response in reply ;)

You seem like the kind of engineer that likes to have all their ducks in a row first... sometimes you just gotta get your hands dirty and start trying. But, if it helps, see:
http://www.manning.com/ingersoll/ingersoll_meapch1.pdf

Take care,
-stu
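
The 90%/10% train/test split suggested in this message can be sketched as below. This is only an illustration under stated assumptions: the function name, the fixed seed and the synthetic records are made up, and writing the two lists to files and copying them into the HDFS directories is left out.

```python
import random


def split_train_test(lines, test_fraction=0.1, seed=42):
    """Shuffle a copy of the records and split off a test portion.

    Returns (train, test). A fixed seed keeps the split reproducible.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records in the 'label<TAB>space separated values' layout.
records = ["purple\t%d %d %d %d" % (i, i + 1, i + 2, i + 3) for i in range(100)]

train, test = split_train_test(records)
print(len(train), "training records,", len(test), "test records")
```

Shuffling before splitting avoids taking only the first records of a file as test data, and keeping the two sets disjoint means the test pass measures something other than memorised training records.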
-
Re: Naive-Bayes work flow
Naveenchandra 2012-02-24, 07:35
Hi Stuart,

I performed the classification as you described, and got the following confusion matrix:

Confusion Matrix
-------------------------------------------------------
a    b    <--Classified as
59   41   |  100   a = purple
58   42   |  100   b = yellow

Is the result acceptable?

A look at the input data set:

purple.csv

    purple 3 2 0 5
    purple 2 5 5 4
    purple 1 1 1 1
    ....

yellow.csv

    purple 3 2 0 5
    purple 2 5 5 4
    purple 1 1 1 1
    ....
-
Re: Naive-Bayes work flow
Naveenchandra 2012-02-24, 13:08
The Python code I used is:

    # Python 2 syntax
    import random

    f = open("/home/hadoop/yellow.tsv", "w")
    for i in range(0, 1000):
        # Label, a tab, then four random integers in 0..5 separated by spaces
        print >> f, "yellow\t", random.randint(0, 5), random.randint(0, 5), \
            random.randint(0, 5), random.randint(0, 5)
    f.close()

The same for purple.tsv. I also copied the first 100 records from the tsv files to use as test data.
-
Re: Naive-Bayes work flow
Naveenchandra 2012-02-24, 13:09
Then I ran the classifier using the following commands:

    bin/mahout trainclassifier -d naveen/bayes/input/train/ -m naveen/bayes/color-model -type bayes -source hdfs -ng 1

    bin/mahout testclassifier -d naveen/bayes/input/test/ -m naveen/bayes/color-model -type bayes -source hdfs -ng 1
-
Re: Naive-Bayes work flow
Naveenchandra 2012-02-24, 13:10
And please can you explain how TfidfMapper.java works?
-
Re: Naive-Bayes work flow
Ted Dunning 2012-02-24, 19:25
If your synthetic data comes from the same distribution for yellow and purple, then clearly no classifier will help.

Also, naive Bayes wants words, not numbers.

Sent from my iPhone
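
Why identical distributions leave a classifier nothing to work with can be shown with Bayes' rule directly. The sketch below is not what Mahout computes internally; it assumes equal class priors (both files were generated with the same number of records) and that each of the four values is drawn uniformly from the six integers 0 to 5, as in the posted generator script. The function name is made up for illustration.

```python
from fractions import Fraction


def posterior_a(prior_a, likelihood_a, likelihood_b):
    """P(class a | record) for a two-class problem, by Bayes' rule."""
    prior_b = 1 - prior_a
    evidence = prior_a * likelihood_a + prior_b * likelihood_b
    return prior_a * likelihood_a / evidence


# Each of the 4 values is one of 6 equally likely integers, so any record
# has probability (1/6)**4 under BOTH classes.
likelihood = Fraction(1, 6) ** 4

print(posterior_a(Fraction(1, 2), likelihood, likelihood))  # 1/2
```

Whatever record is observed, the posterior stays at the prior, so results near 50% on such data are the expected outcome rather than a sign of a bug.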
-
Re: Naive-Bayes work flow
Stuart Smith 2012-02-24, 21:18
Ted,

> Also naive bayes wants words not numbers.

Doh. Yeah, I should have given a better example.

Naveen,

Glad you finally got it working! Yes, doing purple/yellow RGBA was not intended to give you good results, just a good example to get things going. So, yes, your results look appropriate for what you're trying to do.

Now for TfIDF, look at:

(1) http://en.wikipedia.org/wiki/Tf*idf
(2) Once you have the concept down, take another look at the code.
(3) It probably will still not make sense, but will make more sense.
(4) Look at the math here: https://cwiki.apache.org/MAHOUT/bayesian.html
(5) Go back to (1) :)

And randomly insert "look at news group example shell script" and reviewing machine learning theory into your steps.

For a review of the theory, I recommend:
http://academicearth.org/courses/machine-learning

This course was simply awesome. Watch the video, read the course notes. It will do wonders, or your money back ;) Andrew Ng was the first person I've seen explain machine learning with any sort of clarity. And I tried to understand it many times before... I really did... And that was via a video where I couldn't interact with him. So, yah, a good source of information, that.

For a review of Naive Bayes specifically, try:
http://see.stanford.edu/materials/aimlcs229/cs229-notes2.pdf

If you're still interested in the theory after that, I'd start on Christopher Bishop's book. It doesn't have the clarity of Ng's lectures, but it has been nice so far (still making my way through it).

If it makes you feel better, it took me a while to sort out the format and just understand Mahout's directory names, so I could understand where the code _was_, let alone start deciphering the code. Time and patience.

Take care,
-stu
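
For the tf-idf concept mentioned in this message, a minimal sketch of the common textbook weighting may help before reading the code. It is an assumption-laden illustration, not Mahout's TfidfMapper: term frequency is taken as count divided by document length, inverse document frequency as log(N / document frequency), and the sample documents are made up from tokens that appear in the thread.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    documents is a list of token lists. Weight = tf * idf, with
    tf = count / document length and idf = log(N / document frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "region3 package4 active active deactive".split(),
    "region7 package1 deactive deactive deactive".split(),
    "region10 package3 active active active".split(),
]

weights = tf_idf(docs)
for term, weight in sorted(weights[0].items()):
    print("%-10s %.4f" % (term, weight))
```

A term that occurs in only one document (such as region3) gets a larger idf than one spread across several documents (such as active), which is the intuition behind down-weighting common words.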
-
Re: Naive-Bayes work flow
Naveenchandra 2012-02-27, 11:57
Hi guys,

Thanks a lot for your regular replies and support. I am trying to classify a data set of mine, which contains 3 files, namely active.txt, deactive.txt and expired-i.txt. Each of these is in the form (target_variable\tspace_separated_values). The number of records in each file is:

    active.txt    : 4789
    deactive.txt  : 5199
    expired-i.txt : 12

I know that expired-i.txt is classified completely as other classes because it has far fewer records than they do, but I do not understand the drastic result for active.txt.

I am posting sample records from each of the files and also the confusion matrix.

deactive.txt

    deactive 10034317 region3 district15 28 package4 active active deactive deactive deactive
    deactive 10090713 region7 district17 23 package1 deactive deactive deactive deactive deactive
    deactive 10094740 region7 district23 24 package4 active deactive deactive deactive deactive
    deactive 10032155 region10 district56 24 package3 active deactive deactive deactive deactive
    deactive 10029994 region2 district51 34 package3 active active active active deactive
    deactive 10048464 region10 district56 38 package2 deactive active active deactive deactive

active.txt

    active 10062351 region2 district32 34 package3 active active active active deactive
    active 10068051 region10 district56 97 package1 active active active active active
    active 10096942 region10 district56 14 package3 active active active active active
    active 10072680 region10 district56 91 package2 active active active active active
    active 10087947 region11 district39 6 package1 active active active active active

expired-i.txt

    expired-i 10060808 region8 district29 34 package3 expired-i expired-i expired-i expired-i expired-i
    expired-i 10012966 region4 district50 40 package3 deactive deactive deactive expired-i expired-i
    expired-i 10080110 region7 district17 16 package1 expired-i expired-i expired-i expired-i expired-i
    expired-i 10083495 region7 district17 65 package2 deactive deactive deactive deactive deactive
    expired-i 10035336 region6 district48 18 package1 expired-i expired-i expired-i expired-i expired-i

confusion matrix:

    a    b      c      <--Classified as
    0    1      11     |  12     a = expired-i
    0    3577   1622   |  5199   b = deactive
    0    4338   451    |  4789   c = active

Thank you,
-
Re: Naive-Bayes work flow
Ted Dunning 2012-02-27, 17:39
This is a tiny dataset. Have you considered just trying R? In fact, in terms of just diagnosing the problem, it would be good to run a regression in R first.

Sent from my iPhone
-
Re: Naive-Bayes work flow
Naveenchandra 2012-02-28, 10:58
Hi Ted,

I have not used R. Are you saying the poor performance of the classifier is because of the small number of records? I have tried with 50000 records (the same for train and test) but the result is the same: deactive is classified correctly and most of active is classified as deactive. How many records do you want me to provide to get a correct result? Or is it because of the data?
-
Re: Naive-Bayes work flow
Ted Dunning 2012-02-28, 14:27
I think that you have an invocation or format bug and you are effectively giving NB different data than you think.

Note that this is what is called a stopped clock model. That means it is only getting correct results by putting out a constant value.

Sent from my iPhone
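
A quick way to spot a stopped clock model is to compare the classifier's accuracy with what a constant prediction would score. The sketch below uses the record counts and the three-class confusion matrix posted earlier in the thread; the function name is made up for illustration, and the matrix rows are assumed to be true classes.

```python
def constant_predictor_accuracy(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    total = sum(class_counts.values())
    label = max(class_counts, key=class_counts.get)
    return label, class_counts[label] / total


# Record counts per file, as posted in the thread.
counts = {"expired-i": 12, "deactive": 5199, "active": 4789}

label, baseline = constant_predictor_accuracy(counts)

# Diagonal of the posted confusion matrix: 0 + 3577 + 451 correct.
model_accuracy = (0 + 3577 + 451) / sum(counts.values())

print("always predict %r: %.4f" % (label, baseline))
print("posted classifier: %.4f" % model_accuracy)
```

On these numbers the classifier scores below the constant baseline, which fits the suggestion that something in the invocation or data format is off rather than the data set merely being small.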
-
Re: Naive-Bayes work flow
Naveenchandra 2012-02-29, 07:01
Hi,

How do I solve this format bug? Please explain exactly which type of data to provide to get good classification results.

I think the problem is caused by columns which contain the same value as the target variable. I had read about the target leak problem in Mahout, which says that if the predictor variables include columns containing the same value as the target variable, then we will get poor results. Am I getting such results because of target leak?
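
Whether predictor columns repeat the target value can at least be measured before training. The sketch below is a hypothetical check, not part of Mahout: the function name is made up, and it assumes records in the "label<TAB>space separated values" layout discussed in the thread, using two sample records posted earlier.

```python
def label_in_features_rate(lines):
    """Fraction of records whose label also appears among the feature tokens."""
    hits = 0
    total = 0
    for line in lines:
        label, _, features = line.partition("\t")
        if not features:
            continue  # skip lines without a tab-separated feature part
        total += 1
        if label in features.split():
            hits += 1
    return hits / total if total else 0.0


# Two sample records from the thread, with a tab after the label.
records = [
    "active\t10062351 region2 district32 34 package3 active active active active deactive",
    "expired-i\t10083495 region7 district17 65 package2 deactive deactive deactive deactive deactive",
]

print("label appears among features in %.0f%% of records"
      % (100 * label_in_features_rate(records)))
```

A high rate only shows that the label text occurs among the predictors; whether that explains the posted results is a separate question that the thread leaves open.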