Mahout, mail # dev - CIMapper Question


Jeff Eastman 2012-02-12, 01:18
Paritosh Ranjan 2012-02-12, 15:00
Sean Owen 2012-02-12, 15:27
Ted Dunning 2012-02-12, 16:01
Sean Owen 2012-02-12, 16:27
Jeff Eastman 2012-02-12, 16:35
Raphael Cendrillon 2012-02-12, 16:43
Jeff Eastman 2012-02-12, 17:22
Lance Norskog 2012-02-13, 04:57
Re: CIMapper Question
Jeff Eastman 2012-02-13, 05:07
PolymorphicWritable actually works great in the two applications of it I
committed today. Both are low-volume, of course, so the overhead of
writing the class name is not onerous.
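The overhead mentioned above comes from writing the concrete class name in front of every serialized value. A minimal sketch of that general technique, using plain `java.io` only and assuming a local `Writable` stand-in that mirrors Hadoop's `write(DataOutput)`/`readFields(DataInput)` pair; `ClusterSketch`, `KMeansClusterSketch`, and `ClassNamePrefix` are hypothetical names, and this is not Mahout's PolymorphicWritable source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring org.apache.hadoop.io.Writable (assumption: no Hadoop on the classpath).
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical stand-ins for o.a.m.clustering.Cluster and one implementation.
interface ClusterSketch extends Writable {
  double radius();
}

class KMeansClusterSketch implements ClusterSketch {
  double radius;

  public KMeansClusterSketch() {} // no-arg constructor needed for reflective instantiation

  KMeansClusterSketch(double radius) {
    this.radius = radius;
  }

  public double radius() {
    return radius;
  }

  public void write(DataOutput out) throws IOException {
    out.writeDouble(radius);
  }

  public void readFields(DataInput in) throws IOException {
    radius = in.readDouble();
  }
}

// Hypothetical helper: prefix each value with its class name, then write its fields.
final class ClassNamePrefix {
  private ClassNamePrefix() {}

  static byte[] write(Writable value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(value.getClass().getName()); // overhead: 2 + name length bytes for an ASCII name
      value.write(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static Writable read(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      Class<?> type = Class.forName(in.readUTF()); // recover the concrete class first
      Writable value = (Writable) type.getDeclaredConstructor().newInstance();
      value.readFields(in);
      return value;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

With only k clusters per mapper, the extra cost is k * (2 + name length) bytes, which is negligible here but would add up for high-volume records.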

On 2/12/12 9:57 PM, Lance Norskog wrote:
> Another option is TupleWritable, but pull the source and make sure it
> works; I had problems with it.
>
> On Sun, Feb 12, 2012 at 9:22 AM, Jeff Eastman
> <[EMAIL PROTECTED]>  wrote:
>> This approach worked out, not exactly as below, but I was able to create a
>> ClusterWritable which used PolymorphicWritable to read and write its Cluster
>> value field. This makes it through the mapper and reducer but I'm still
>> working on getting it all to fly in the ClusterIterator.
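The wrapper described above works because the job can declare one concrete value class while the wrapped field stays interface-typed. A minimal sketch of that shape, assuming a local `Writable` stand-in and hypothetical cluster types (`ClusterSketch`, `KMeansSketch`, `CanopySketch`); `ClusterWritableSketch` is illustrative only and is not Mahout's ClusterWritable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring org.apache.hadoop.io.Writable.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical cluster interface with two implementations.
interface ClusterSketch extends Writable {
  int id();
}

class KMeansSketch implements ClusterSketch {
  int id;
  public KMeansSketch() {}
  KMeansSketch(int id) { this.id = id; }
  public int id() { return id; }
  public void write(DataOutput out) throws IOException { out.writeInt(id); }
  public void readFields(DataInput in) throws IOException { id = in.readInt(); }
}

class CanopySketch implements ClusterSketch {
  int id;
  public CanopySketch() {}
  CanopySketch(int id) { this.id = id; }
  public int id() { return id; }
  public void write(DataOutput out) throws IOException { out.writeInt(id); }
  public void readFields(DataInput in) throws IOException { id = in.readInt(); }
}

// One concrete class for the job's value type; the concrete cluster travels inside it.
class ClusterWritableSketch implements Writable {
  private ClusterSketch value;

  public ClusterWritableSketch() {}

  ClusterWritableSketch(ClusterSketch value) { this.value = value; }

  ClusterSketch get() { return value; }

  public void write(DataOutput out) throws IOException {
    out.writeUTF(value.getClass().getName()); // records which implementation follows
    value.write(out);
  }

  public void readFields(DataInput in) throws IOException {
    String className = in.readUTF();
    try {
      value = (ClusterSketch) Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IOException("cannot instantiate " + className, e);
    }
    value.readFields(in);
  }

  // Serializes to bytes and reads back into a fresh instance, roughly as a shuffle would.
  static ClusterWritableSketch roundTrip(ClusterWritableSketch original) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      original.write(out);
      out.flush();
      ClusterWritableSketch copy = new ClusterWritableSketch();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Both wrapped records share the runtime class `ClusterWritableSketch`, so a check against the declared value class passes for either implementation, while the class-name prefix lets `readFields` rebuild the right concrete cluster.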
>>
>>
>> On 2/12/12 9:43 AM, Raphael Cendrillon wrote:
>>> Hi Jeff,
>>>
>>> It's great to see some discussion on this. I ran into a similar problem
>>> when trying to make the SplitInput job work for arbitrary key and value
>>> classes. In the end I was able to sidestep the issue by reading the
>>> key and value classes from the SequenceFileInput, but I never found a way to
>>> deal with this head on.
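This workaround relies on the input carrying its own types: a SequenceFile header records the key and value class names, and Hadoop's `SequenceFile.Reader` exposes them through `getKeyClass()` and `getValueClass()`. The stdlib-only sketch below imitates that idea with a hypothetical mini-format (`HeaderedFile`) and hypothetical record types (`IntRecord`, `TextRecord`); it is not the SequenceFile format itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Local stand-in mirroring org.apache.hadoop.io.Writable.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical record types.
class IntRecord implements Writable {
  int v;
  public IntRecord() {}
  IntRecord(int v) { this.v = v; }
  public void write(DataOutput out) throws IOException { out.writeInt(v); }
  public void readFields(DataInput in) throws IOException { v = in.readInt(); }
}

class TextRecord implements Writable {
  String s = "";
  public TextRecord() {}
  TextRecord(String s) { this.s = s; }
  public void write(DataOutput out) throws IOException { out.writeUTF(s); }
  public void readFields(DataInput in) throws IOException { s = in.readUTF(); }
}

// Hypothetical format: key/value class names once in a header, then the records.
final class HeaderedFile {
  private HeaderedFile() {}

  // Assumes keys and values have the same length and match the declared classes.
  static byte[] write(Class<? extends Writable> keyClass, Class<? extends Writable> valueClass,
                      Writable[] keys, Writable[] values) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(keyClass.getName()); // written once per file, not per record
      out.writeUTF(valueClass.getName());
      out.writeInt(keys.length);
      for (int i = 0; i < keys.length; i++) {
        keys[i].write(out);
        values[i].write(out);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Reads the header first, then instantiates records of whatever classes it names.
  static List<Writable[]> read(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      Class<?> keyClass = Class.forName(in.readUTF());
      Class<?> valueClass = Class.forName(in.readUTF());
      int n = in.readInt();
      List<Writable[]> pairs = new ArrayList<>();
      for (int i = 0; i < n; i++) {
        Writable key = (Writable) keyClass.getDeclaredConstructor().newInstance();
        Writable value = (Writable) valueClass.getDeclaredConstructor().newInstance();
        key.readFields(in);
        value.readFields(in);
        pairs.add(new Writable[] {key, value});
      }
      return pairs;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Class names are paid once per file rather than once per record, but every record in the file must share the header's classes, which is why this sidesteps rather than solves the mixed-Cluster case.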
>>>
>>> On 12 Feb, 2012, at 8:35 AM, Jeff Eastman wrote:
>>>
>>>> Thanks Sean & Ted. That is what I've observed experimentally. I was going
>>>> to pursue a ClusterWritable along the lines of VectorWritable but will try
>>>> PolymorphicWritable<Cluster> first. Looking at it, I see it does send the
>>>> class name, which might be onerous as Sean observed, except that
>>>> I am only sending (k) clusters between each mapper and the reducer. I will
>>>> report on this in an hour or so.
>>>>
>>>>
>>>> On 2/12/12 9:01 AM, Ted Dunning wrote:
>>>>> But this sounds like a runtime problem, not a type checking problem.
>>>>>
>>>>> Polymorphism is generally a problem in the Hadoop API. That is why we
>>>>> have VectorWritable and why I added PolymorphicWritable.
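A single wrapper class can also stay compact by writing a one-byte tag from a closed set of known implementations instead of a class name. This is a sketch in the spirit of a wrapper like VectorWritable around the Vector interface, not a description of VectorWritable's actual encoding; `TaggedClusterWritable`, `KMeansSketch`, `CanopySketch`, and the tag values are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring org.apache.hadoop.io.Writable.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical cluster interface with two implementations.
interface ClusterSketch extends Writable {
  double radius();
}

class KMeansSketch implements ClusterSketch {
  double radius;
  KMeansSketch() {}
  KMeansSketch(double radius) { this.radius = radius; }
  public double radius() { return radius; }
  public void write(DataOutput out) throws IOException { out.writeDouble(radius); }
  public void readFields(DataInput in) throws IOException { radius = in.readDouble(); }
}

class CanopySketch implements ClusterSketch {
  double radius;
  CanopySketch() {}
  CanopySketch(double radius) { this.radius = radius; }
  public double radius() { return radius; }
  public void write(DataOutput out) throws IOException { out.writeDouble(radius); }
  public void readFields(DataInput in) throws IOException { radius = in.readDouble(); }
}

final class TaggedClusterWritable implements Writable {
  // Hypothetical tags; every supported implementation needs one.
  private static final byte KMEANS = 1;
  private static final byte CANOPY = 2;

  private ClusterSketch value;

  TaggedClusterWritable() {}
  TaggedClusterWritable(ClusterSketch value) { this.value = value; }
  ClusterSketch get() { return value; }

  public void write(DataOutput out) throws IOException {
    // Exact-class comparison so a subclass is not silently written as its parent.
    if (value.getClass() == KMeansSketch.class) {
      out.writeByte(KMEANS);
    } else if (value.getClass() == CanopySketch.class) {
      out.writeByte(CANOPY);
    } else {
      throw new IOException("no tag for " + value.getClass().getName());
    }
    value.write(out);
  }

  public void readFields(DataInput in) throws IOException {
    byte tag = in.readByte();
    switch (tag) {
      case KMEANS: value = new KMeansSketch(); break;
      case CANOPY: value = new CanopySketch(); break;
      default: throw new IOException("unknown tag " + tag);
    }
    value.readFields(in);
  }

  static byte[] toBytes(TaggedClusterWritable w) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      w.write(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static TaggedClusterWritable fromBytes(byte[] data) {
    TaggedClusterWritable w = new TaggedClusterWritable();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      w.readFields(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return w;
  }
}
```

The tag costs one byte per record instead of a class name, at the price of editing the wrapper whenever a new Cluster implementation appears; the class-name approach accepts implementations it has never seen.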
>>>>>
>>>>> Jeff,
>>>>>
>>>>> Two questions:
>>>>>
>>>>> 1) would PolymorphicWritable<Cluster> help?
>>>>>
>>>>> 2) can you say more about what the IOException is?  Does it give any
>>>>> hints?
>>>>>
>>>>> On Sun, Feb 12, 2012 at 7:00 AM, Paritosh Ranjan <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>
>>>>>> Can something like this help?
>>>>>>
>>>>>> public class CIMapper<T extends Cluster> extends
>>>>>>     Mapper<WritableComparable<?>, VectorWritable, IntWritable, T> {
>>>>>> ...
>>>>>> }
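Ted's remark that this is a runtime problem rather than a type-checking one follows from type erasure: a parameter such as `T` in `CIMapper<T extends Cluster>` exists only at compile time, so it cannot change the concrete class handed to the framework. A minimal sketch, assuming a hypothetical `GenericEmitter<T>` standing in for the mapper and hypothetical cluster types; it makes no Hadoop calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cluster types.
interface ClusterSketch {}
class KMeansSketch implements ClusterSketch {}
class CanopySketch implements ClusterSketch {}

// Hypothetical stand-in for a generic mapper such as CIMapper<T extends Cluster>.
class GenericEmitter<T extends ClusterSketch> {
  private final List<T> emitted = new ArrayList<>();

  void emit(T value) {
    emitted.add(value);
  }

  // Only the runtime class of each value is visible here; T itself has been erased.
  List<Class<?>> emittedClasses() {
    List<Class<?>> classes = new ArrayList<>();
    for (T value : emitted) {
      classes.add(value.getClass());
    }
    return classes;
  }
}

final class ErasureDemo {
  private ErasureDemo() {}

  // True: both objects are plain GenericEmitter instances at runtime.
  static boolean sameRuntimeClass() {
    GenericEmitter<KMeansSketch> kmeans = new GenericEmitter<>();
    GenericEmitter<CanopySketch> canopy = new GenericEmitter<>();
    return kmeans.getClass() == canopy.getClass();
  }

  // Declaring T as the interface satisfies the compiler, yet each value keeps its own class.
  static List<Class<?>> mixedEmission() {
    GenericEmitter<ClusterSketch> emitter = new GenericEmitter<>();
    emitter.emit(new KMeansSketch());
    emitter.emit(new CanopySketch());
    return emitter.emittedClasses();
  }
}
```

If the framework compares runtime classes, as observed in the original question, the type parameter does not help unless every emitted value already has exactly the declared output class.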
>>>>>>
>>>>>> On 12-02-2012 06:48, Jeff Eastman wrote:
>>>>>>
>>>>>>> I'm wondering how to tease the elephant into accepting any concrete
>>>>>>> instance of the interface o.a.m.clustering.Cluster when writing trained
>>>>>>> clusters in the cleanup() method of CIMapper. I've gotten the MR version of
>>>>>>> the ClusterIterator to get to that point in testing, but it blows chunks
>>>>>>> with an IOException when I try to pass an o.a.m.clustering.kmeans.Cluster
>>>>>>> (I will rename the latter for 0.7). It seems MapTask.collect() wants ==
>>>>>>> and not instanceof.
>>>>>>>
>>>>>>> I've talked with Ted about passing Clusters rather than the current
>>>>>>> ClusterObservations but don't see how at this point. Any ideas?
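The difference described in the original question can be shown in a few lines: comparing a value's runtime class for exact equality with the declared output value class rejects an implementation of the interface that an `instanceof`-style check would accept. `CollectCheck` and the cluster types below are hypothetical; this models the behavior described, not Hadoop's MapTask code.

```java
import java.io.IOException;

// Hypothetical cluster types.
interface ClusterSketch {}
class KMeansSketch implements ClusterSketch {}

final class CollectCheck {
  private CollectCheck() {}

  // Exact-class check: the runtime class must equal the declared class.
  static void collectExact(Object value, Class<?> declaredValueClass) throws IOException {
    if (value.getClass() != declaredValueClass) {
      throw new IOException("Type mismatch in value: expected " + declaredValueClass.getName()
          + ", received " + value.getClass().getName());
    }
  }

  // Returns whether the exact check above would have accepted the value.
  static boolean acceptsExact(Object value, Class<?> declaredValueClass) {
    try {
      collectExact(value, declaredValueClass);
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  // What an instanceof-style check would accept instead.
  static boolean acceptsInstanceof(Object value, Class<?> declaredValueClass) {
    return declaredValueClass.isInstance(value);
  }
}
```

Under an exact check, declaring the interface as the map output value class fails for every concrete cluster, which is why a single concrete wrapper class that carries the cluster inside it gets past the check.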