Lance Norskog 2011-04-10, 03:56
Is this work appropriate for summarizing irregularly sampled events
into timed buckets?
The use case is actually more complex: summarizing multiple time
series in map/reduce.
Given N system logs with parallel recordings, I would like to
summarize these in one common set of time buckets.
Each mapper would emit "something" with the time bucket as the sorting key.
Each reducer would summarize and emit the bucket.
Of course, to minimize I/O there should be a combiner that does a
One use case would be total queries per second for a distributed system.
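
To make the mapper/combiner/reducer flow above concrete, here is a minimal sketch in plain Python (no Hadoop API involved). The 60-second bucket width and the function names `bucket_of`, `map_log`, `combine` and `reduce_all` are assumptions for illustration only; each log is taken to be a list of event timestamps in seconds.

```python
from collections import Counter
from itertools import chain

BUCKET_SECONDS = 60  # assumed bucket width, not specified in the post


def bucket_of(ts, width=BUCKET_SECONDS):
    """Return the start time of the bucket that contains timestamp ts."""
    return int(ts // width) * width


def map_log(timestamps, width=BUCKET_SECONDS):
    """Mapper: emit (bucket, 1) for every event in one log."""
    for ts in timestamps:
        yield bucket_of(ts, width), 1


def combine(pairs):
    """Combiner: pre-sum counts per bucket for one mapper to cut I/O."""
    partial = Counter()
    for bucket, count in pairs:
        partial[bucket] += count
    return sorted(partial.items())


def reduce_all(combined_outputs, width=BUCKET_SECONDS):
    """Reducer: sum partial counts across all logs, emit events/second."""
    totals = Counter()
    for bucket, count in chain.from_iterable(combined_outputs):
        totals[bucket] += count
    return {bucket: totals[bucket] / width for bucket in sorted(totals)}


# Two hypothetical logs recorded in parallel, irregularly sampled.
logs = [[0.5, 10.0, 61.0], [3.0, 59.9, 125.0]]
rates = reduce_all(combine(map_log(log)) for log in logs)
```

Because counts are summed, the combiner can be applied any number of times without changing the result, which is what makes it safe as a combiner.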
I can see the very basic, naive approach to this. Does the
exponentially weighted algorithm give a more sophisticated result?
And is it worth the complexity of including the original timestamps
of the events?
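
For comparison, here is a sketch of one common way to do exponential weighting over irregularly sampled events. It is not necessarily the algorithm this thread refers to. The class name `DecayedRate` and the time constant `tau` are assumptions; the point is only that the decay factor depends on the gap between the original timestamps, so those timestamps have to be carried along.

```python
import math


class DecayedRate:
    """Exponentially decayed event rate for time-ordered, irregular events."""

    def __init__(self, tau):
        self.tau = tau              # assumed decay time constant, in seconds
        self.weighted_count = 0.0   # decayed sum of event weights
        self.last_ts = None         # timestamp of the previous event

    def update(self, ts, n=1.0):
        """Fold in n events at time ts (ts must not go backwards)."""
        if self.last_ts is not None:
            # Older events fade by exp(-gap / tau) before the new one is added.
            gap = ts - self.last_ts
            self.weighted_count *= math.exp(-gap / self.tau)
        self.weighted_count += n
        self.last_ts = ts
        # For a steady stream the decayed count settles near rate * tau.
        return self.weighted_count / self.tau


rate = DecayedRate(tau=10.0)
for ts in [0.0, 10.0, 12.5]:
    current = rate.update(ts)
```

Unlike the per-bucket count, this estimate needs events in time order, so in map/reduce the timestamp would have to be part of the sort key.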