Mahout, mail # user - time-weighted averages


time-weighted averages
Lance Norskog 2011-04-10, 03:56
Is this work appropriate for summarizing irregularly sampled events
into timed buckets?

http://tdunning.blogspot.com/2011/03/exponentially-weighted-averaging-for.html
et al.

The use case is actually more complex: summarizing multiple time
series in map/reduce.
Given N system logs with parallel recordings, I would like to
summarize these in one common set of time buckets.
Each mapper would emit "something" with the time bucket as the sorting key.
Each reducer would summarize and emit the bucket.
Of course, to minimize I/O there should be a combiner that does a
partial summarization.

One use case would be total query/second for a distributed system.
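
For illustration, the basic bucketed approach could be sketched as below. This is only a single-process simulation in plain Python, not Hadoop or Mahout code. The names (`BUCKET_SECONDS`, `map_event`, `combine`, `summarize`) and the 60-second bucket width are assumptions made for the example. The "something" each mapper emits is taken here to be a (count, sum) pair, chosen because it merges associatively, so the same function can serve as combiner and reducer.

```python
from collections import defaultdict

BUCKET_SECONDS = 60  # assumed bucket width, in seconds


def bucket_of(timestamp):
    """Start time of the bucket that contains this timestamp."""
    return int(timestamp // BUCKET_SECONDS) * BUCKET_SECONDS


def map_event(timestamp, value):
    """Mapper: key by time bucket, emit a (count, sum) partial."""
    return bucket_of(timestamp), (1, value)


def combine(partials):
    """Combiner and reducer: merge (count, sum) partials for one bucket."""
    count, total = 0, 0.0
    for c, t in partials:
        count += c
        total += t
    return count, total


def summarize(logs):
    """Simulate map, per-log combine, shuffle by bucket, and reduce."""
    shuffled = defaultdict(list)
    for log in logs:
        local = defaultdict(list)
        for timestamp, value in log:
            key, partial = map_event(timestamp, value)
            local[key].append(partial)
        # Combiner step: one partial per bucket per log crosses the shuffle.
        for key, partials in local.items():
            shuffled[key].append(combine(partials))
    return {key: combine(partials) for key, partials in sorted(shuffled.items())}


def events_per_second(summary):
    """Per-bucket event rate, e.g. queries/second when each event is a query."""
    return {key: count / BUCKET_SECONDS for key, (count, _) in summary.items()}
```

With one event per query, `events_per_second(summarize(logs))` gives the total query rate per bucket across all logs.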

I can see the very basic stupid approach to this. Does the
exponentially weighted algorithm provide a more sophisticated result
set? And is it worth the complexity of including the original time
stamps of the events?
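
For comparison, here is a minimal sketch of one common way to form an exponentially weighted average over irregularly spaced samples. It is assumed to be in the spirit of the linked post rather than a reproduction of it. The class name `DecayedAverage` and the time constant `alpha` are hypothetical, and samples are assumed to arrive in timestamp order.

```python
import math


class DecayedAverage:
    """Exponentially weighted mean for irregular samples (illustrative sketch)."""

    def __init__(self, alpha):
        self.alpha = alpha      # assumed time constant, same units as timestamps
        self.weighted_sum = 0.0
        self.weight = 0.0
        self.last_time = None

    def add(self, timestamp, value):
        """Fold in one sample; older samples decay by the elapsed time."""
        if self.last_time is None:
            decay = 1.0  # nothing accumulated yet, so the factor is irrelevant
        else:
            decay = math.exp(-(timestamp - self.last_time) / self.alpha)
        self.weighted_sum = decay * self.weighted_sum + value
        self.weight = decay * self.weight + 1.0
        self.last_time = timestamp

    def mean(self):
        """Current weighted mean, or NaN before any sample is added."""
        if self.weight == 0.0:
            return float("nan")
        return self.weighted_sum / self.weight
```

The decay factor depends on the gap between consecutive samples, so in this formulation each emitted record would have to carry its original timestamp rather than only a bucket key.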

--
Lance Norskog
[EMAIL PROTECTED]