Re: Java Heap Error: ItemSimilarityJob
Sean Owen 2012-06-06, 09:50
You need to increase the heap size of the child JVMs that run the map
and reduce tasks. mapred.child.java.opts can be set to -Xmx4g, for
example. This is usually put in mapred-site.xml.
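As a rough sketch, the entry in mapred-site.xml would look something like
this, placed inside the file's <configuration> element. The 4g figure is
only the example value from above; what actually fits depends on how much
memory each worker has and how many task slots it runs.

```xml
<!-- Heap setting for the child JVMs that run map and reduce tasks. -->
<!-- -Xmx4g is an example value, not a recommendation for your cluster. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx4g</value>
</property>
```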
Sampling does decrease the size of the intermediate outputs; probably
not the final output so much. But this is not your problem. You are
running out of heap on the workers.
You should definitely use more than one reducer! Hadoop leaves it up to
you to specify this: use -Dmapred.reduce.tasks=10 or whatever is
appropriate.
The names of the jobs kind of say what they do, and the javadoc says a
little more. If you have specific questions I bet people can explain.
On Wed, Jun 6, 2012 at 7:39 AM, Something Something
<[EMAIL PROTECTED]> wrote:
> I am running this job with a file containing 791,732,411 lines.
> Step 1 (PreparePreferenceMatrixJob-ItemIDIndexMapper-Reducer) completed in
> 3 minutes.
> Step 2 (PreparePreferenceMatrixJob-ToItemPrefsMapper-Reducer) took 2 hours
> but completed successfully. It used only 1 Reducer so I am assuming the
> output is sorted, right?
> Step 3 (PreparePreferenceMatrixJob-ToItemVectorsMapper-Reducer) failed
> after running for 54 minutes with 'Error: Java heap space' error & it was
> all downhill from there.
> Question: Are there any configuration parameters I can use to cut down
> the size of the output? I noticed this in ToItemVectorsMapper:
> public static final String SAMPLE_SIZE = ToItemVectorsMapper.class +
> How do I cut down this sample size?
> Also, is there any documentation available that shows what each of these
> steps does? If not, I will just debug. Please let me know. Thanks.