Re: Beginner's Question: What is a feature?
Em 2011-05-22, 17:32
Thank you for your answer.
I have no data yet; I am just trying to understand and learn more about Mahout,
since I am a beginner in machine learning.
Mahout in Action says that there are typically four types of features:
categorical, word-like, text-like and continuous.
So, let's say I have a descriptive text of 100-200 words (text-like).
Does this mean that I have one feature (the description), or does it mean
that I have 100-200 features (the words)?
The OnlineLogisticRegression class requires me to tell it how many
categories there are and how many features I would like to provide.
My question now is: if I have one categorical and one text-like feature, do
I have to tell the class that I am going to add two features?
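
To make the question concrete, here is a minimal sketch assuming hashed feature encoding, where every field value or word is hashed into a slot of a fixed-length vector. The class and method names (HashedEncodingSketch, addToVector, encode) are hypothetical and written for illustration only; this is not Mahout's encoder API. Under this assumption, the "number of features" is the vector length you choose, not the number of fields or words.

```java
import java.util.Arrays;

/** Hypothetical illustration of hashed feature encoding; not Mahout's implementation. */
class HashedEncodingSketch {

    /** Adds weight to the slot that name hashes to; the vector length never changes. */
    static void addToVector(String name, double weight, double[] vector) {
        int slot = Math.floorMod(name.hashCode(), vector.length);
        vector[slot] += weight;
    }

    /** Encodes one categorical value and one free-text field into a vector of fixed size. */
    static double[] encode(String category, String description, int numFeatures) {
        double[] vector = new double[numFeatures];
        // Prefix with the field name so the same string in different fields is hashed as a different key.
        addToVector("category:" + category, 1.0, vector);
        for (String word : description.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                addToVector("description:" + word, 1.0, vector);
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        double[] v = encode("books", "a short description of a short book", 16);
        System.out.println(v.length); // prints 16, whatever the length of the text
        System.out.println(Arrays.toString(v));
    }
}
```

In this sketch, two fields and seven words still produce a vector of length 16. Different keys can collide in the same slot, which is the price paid for the fixed size.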
What happens if I encode 20 different features into the vector but
misconfigure the algorithm by telling it there are only 10
features? I am missing a formula or something similar for
the algorithms that are part of Mahout. That would make
the different parameters easier to understand, I think.
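
For reference, this is the kind of formula the question asks about: the standard textbook form of multinomial logistic regression with one reference category. It is a general sketch, not a statement of how OnlineLogisticRegression is implemented internally. The symbols n (number of features), K (number of categories), x (the feature vector) and \beta_k (the weight vector for category k) are notation introduced here.

```latex
p(y = k \mid x) = \frac{\exp(\beta_k^\top x)}{1 + \sum_{j=1}^{K-1} \exp(\beta_j^\top x)},
\quad k = 1, \dots, K-1,
\qquad
p(y = K \mid x) = \frac{1}{1 + \sum_{j=1}^{K-1} \exp(\beta_j^\top x)},
\qquad
x,\ \beta_k \in \mathbb{R}^{n}
```

In this form the number of features n is the length of x and of every \beta_k, so a vector of length 20 cannot be combined with weights of length 10: the dot product \beta_k^\top x is simply not defined.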
That's what I meant.
Hopefully my explanation is clearer now?
On 22.05.2011 18:15, Jeremy Lewi wrote:
> Typically in machine learning a feature vector is just a vector of
> numbers which describes the data.
> For example, if you are trying to classify images, the features might be
> a vector of pixel intensities. Or you could process the image to extract
> higher level features. For example, you might compute some basic
> statistics of the pixel intensities for each image (e.g., the mean, max,
> min, etc.) and then use those summary statistics as the features for
> each image.
> So in your case if you use key and value as the features then you have a
> 2-d feature vector.
> Can you describe your data a little more?
> On Sun, 2011-05-22 at 05:56 -0700, Em wrote:
>> Hi list,
>> I just read Mahout in Action and I tried to understand the chapter about
>> classifying data.
>> While I am reimplementing one of the examples from the book, I get really
>> confused and a little bit disappointed about the assumptions the author
>> makes about the reader.
>> There are some lines of code where you can see a variable is in use but you
>> never saw where and how it was defined.
>> So far, my question is:
>> When using an OnlineLogisticRegression algorithm, what is meant by "feature"?
>> Let's say I have a bunch of data in CSV format.
>> There are the following columns I want to consider for classification:
>> "Key", "Value" - does that mean I have two features?
>> View this message in context: http://lucene.472066.n3.nabble.com/Beginner-s-Question-What-is-a-feature-tp2971745p2971745.html