Mahout, mail # user - Good starting instance for AMI


Re: Re : Good starting instance for AMI
Ken Krugler 2010-01-18, 20:31

On Jan 18, 2010, at 12:15pm, Ted Dunning wrote:

> Is there an important difference between creating a custom AMI and
> using an existing AMI with a startup script that populates everything
> from S3?
>
> Building an AMI takes a few hours of time and is a total pain in the
> butt. My eventual result was that I didn't need to do it at all.

[snip]

Leaving aside the pros/cons of having a pre-installed Hadoop, there  
were two things that I found non-trivial to handle via the init script:

1. Get LZO support installed.

Though I didn't dig into the various ways to do a scripted install.

2. Turn off atime (i.e. mount with noatime).

You can do it via the script, but it feels kind of odd to have to
remount disks, and either know about the set of volumes or do fancy
sed-fu to dynamically generate the list.

Maybe there's an easy way that I missed? Input welcome...
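
One possible way to generate the volume list for item 2 without sed, as a rough sketch only: parse the mount table and emit one remount command per volume that is not already mounted with noatime. The function name and the default set of filesystem types are assumptions for illustration, and mount points containing spaces (escaped in /proc/mounts) are not handled.

```python
def noatime_remount_commands(mounts_text, fs_types=("ext3", "ext4", "xfs")):
    """Build remount commands from text in /proc/mounts format.

    Hypothetical helper: the name and the default fs_types are assumptions,
    not part of any existing init script.
    """
    commands = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type not in fs_types:
            continue  # ignore proc, sysfs, tmpfs and similar
        if "noatime" in options.split(","):
            continue  # already mounted the way we want
        commands.append(["mount", "-o", "remount,noatime", mount_point])
    return commands
```

An init script could read /proc/mounts, pass its contents to this function, and run each returned command as root (for example with subprocess.check_call). Printing the commands first makes for an easy dry run.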

-- Ken
>
> I found that I had roughly three levels of variation in my production
> systems:
>
> - the OS
> - the infrastructural components like Java, Hadoop and ZooKeeper
> - the application that I wanted to run
>
> My initial thought was that the AMI should cover the first two
> aspects of variability.  But I also found that I wanted to change the
> version of the infrastructure stuff fairly often in development of
> the AMI and not infrequently in production.
>
> For Mahout customers, I would imagine that there is a reasonable
> amount of variability in desired OS (Ubuntu versus Red Hat versus
> CentOS at least), JDK and Hadoop versions.  We definitely can't
> afford the time to build AMIs for all options.
>
> My final answer for DeepDyve was to use a standard alestic.com AMI.
> That let me change the OS whenever I needed to and would let Mahout
> customers pick their preference.  These AMIs allow a 16K startup
> script, which I used to handle infrastructure variation.  That worked
> very well for me and could be used for Mahout.
>
> The cost was a few tens of seconds at boot time.  The benefit was a
> vastly better debug and development cycle.  Somebody else handled the
> OS and I could test many variations of setup script very quickly.
> This practice is very much in line with what RightScale does.
>
> Generally, I would avoid the full-custom AMI in favor of a few
> S3-hosted tarballs rooted at / that anybody can rain down on any
> Linux version they want.
>
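
The tarball approach Ted describes could look roughly like the following. This is a sketch under stated assumptions, not his actual startup script: it assumes the tarballs are reachable over plain HTTP(S) URLs (for example, publicly readable S3 objects), and the function name and example URL are hypothetical.

```python
import os
import tarfile
import tempfile
import urllib.request


def fetch_and_unpack(urls, root="/"):
    """Download each tarball and extract it relative to `root`.

    Hypothetical helper; `urls` is any list of URLs that urllib can fetch.
    """
    for url in urls:
        fd, tmp_path = tempfile.mkstemp(suffix=".tar")
        os.close(fd)  # urlretrieve reopens the file by path
        try:
            urllib.request.urlretrieve(url, tmp_path)
            # Mode "r" (the default) auto-detects gzip/bzip2 compression.
            with tarfile.open(tmp_path) as tar:
                tar.extractall(root)
        finally:
            os.remove(tmp_path)


# Example call from a boot script (placeholder URL, not a real bucket):
# fetch_and_unpack(["https://example-bucket.s3.amazonaws.com/hadoop.tar.gz"])
```

Because the archives are rooted at /, swapping a Hadoop or JDK version only means pointing the script at a different tarball, with no AMI rebuild.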
> On Mon, Jan 18, 2010 at 6:54 AM, Grant Ingersoll
> <[EMAIL PROTECTED]> wrote:
>
>> Create an AMI with:
>> 1. Java 1.6
>> 2. Maven
>> 3. svn
>> 4. Mahout's exact Hadoop version
>> 5. A checkout of Mahout
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g