Re: ASF Public Mail Archives on Amazon S3
Grant Ingersoll 2010-11-17, 20:03
Hmmm, let me look. I don't know if I will be able to recover it.

On Nov 17, 2010, at 1:48 PM, Michael McCandless wrote:

> Grant, public_p_r.tar seems to be missing? Is that intentional?
> Maybe some super-secret project inside there :)
>
> Mike
>
> On Thu, Oct 14, 2010 at 12:05 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>> Hi ORPers,
>>
>> I put up the complete ASF public mail archives as of about 3 weeks ago on Amazon's S3 and have made them public (let me know if I messed up, it is the first time I've done this). I also intend, in the coming weeks, to convert them into Mahout files (if anyone wants to help let me know).
>>
>> There are 5 files:
>> https://s3.amazonaws.com/asf-mail-archives/public_a_d.tar
>> https://s3.amazonaws.com/asf-mail-archives/public_e_k.tar
>> https://s3.amazonaws.com/asf-mail-archives/public_l_o.tar
>> https://s3.amazonaws.com/asf-mail-archives/public_s_t.tar
>> https://s3.amazonaws.com/asf-mail-archives/public_u_z.tar
>>
>> The tarballs are organized by Top Level Project name (i.e. Mahout is in the public_l_o.tar file). The tarballs contain GZIP files by date, I believe. I believe the total uncompressed file size is somewhere in the 80-100GB range. That should be sufficient to drive some semi-interesting things in terms of scale, even if it is towards the smaller end of things.
>>
>> As the ASF has very clear public mailing list archive policies, it is my belief that this data set is completely unencumbered.
>>
>> From an ORP standpoint, this might make for a first data set for evaluation once we have the evaluator framework in place.
>>
>> Cheers,
>> Grant
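
For anyone who wants to poke at one of the tarballs after downloading it, here is a minimal Python sketch of walking the layout described above. It assumes only what the message says ("GZIP files by date, I believe"): that the tar members of interest are gzip-compressed files whose names end in `.gz`. The function name `iter_gzip_members` and the `.gz` suffix check are illustrative assumptions, not something confirmed in the thread, and the real member names and directory structure may differ.

```python
import gzip
import tarfile


def iter_gzip_members(tar_path):
    """Yield (member_name, decompressed_bytes) for each .gz file in a tarball.

    Assumption: archive members are gzip files named with a .gz suffix.
    Anything else (directories, other files) is skipped.
    """
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if not member.isfile() or not member.name.endswith(".gz"):
                continue
            fileobj = tar.extractfile(member)
            if fileobj is None:
                continue
            # Decompress the member in memory; each archive file is read whole.
            with gzip.GzipFile(fileobj=fileobj) as gz:
                yield member.name, gz.read()
```

Usage would look something like `for name, data in iter_gzip_members("public_l_o.tar"): print(name, len(data))`. Reading each member fully into memory keeps the sketch short; with 80-100GB uncompressed in total, a streaming read per member may be a better fit for real processing.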