On Sat, Apr 9, 2011 at 8:56 PM, Lance Norskog <[EMAIL PROTECTED]> wrote:
> Is this work appropriate for summarizing irregularly sampled events
> into timed buckets?
>
> http://tdunning.blogspot.com/2011/03/exponentially-weighted-averaging-for.html et al.
>
Yes. That is the idea. This averaging turns the irregular samples into a
step function which can be sampled at any time.
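A minimal sketch of such an averager. It assumes one common formulation: keep a decayed sum of the values and a decayed count, both referenced to the time of the most recent sample. The class name `ExpAverager` and the time constant `alpha` are hypothetical and not taken from the blog's code.

```python
import math


class ExpAverager:
    """Exponentially weighted average of irregularly sampled values.

    Hypothetical sketch: keeps a decayed sum of values and a decayed count,
    both referenced to the time of the most recent sample.
    """

    def __init__(self, alpha):
        self.alpha = alpha  # decay time constant, same units as sample times
        self.total = 0.0    # sum of x_i * exp(-(t_last - t_i) / alpha)
        self.weight = 0.0   # sum of exp(-(t_last - t_i) / alpha)
        self.t_last = None  # time of the most recent sample

    def add(self, t, x):
        """Fold in sample x observed at time t (times must not decrease)."""
        if self.t_last is not None:
            if t < self.t_last:
                raise ValueError("samples must arrive in time order")
            decay = math.exp(-(t - self.t_last) / self.alpha)
            self.total *= decay
            self.weight *= decay
        self.total += x
        self.weight += 1.0
        self.t_last = t

    def value(self):
        """Current average.

        Letting time pass multiplies total and weight by the same factor,
        so the ratio only changes when a sample arrives. The result is a
        step function that can be read at any time after the first sample.
        """
        return self.total / self.weight
```

Feed samples in time order with `add(t, x)`, then call `value()` at each bucket boundary to get one summary per bucket. The decay factor cancels in the ratio between samples, which is why the result is a step function.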
> The use case is actually more complex: summarizing multiple time
> series in map/reduce.
>
That is a bit trickier. If you can assume that mappers get disjoint time
ranges, it gets a bit easier, but you still have to glue the time ranges
together. This isn't too hard to do since the state of the averager is
summarized by the last value of the average and the time it was established.
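A sketch of the gluing step for disjoint, consecutive time ranges. It carries the accumulated weight alongside the average. That is an added assumption: with the weight included, the glued result exactly reproduces a single pass over all the samples. `AvgState`, `summarize`, and `glue` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class AvgState:
    """Hypothetical summary of an exponential averager.

    total and weight are decayed sums referenced to time t (the last
    sample's time); the average itself is total / weight.
    """
    total: float
    weight: float
    t: float

    def average(self):
        return self.total / self.weight


def summarize(samples, alpha):
    """What one mapper would compute over its time-ordered (t, x) samples."""
    total = weight = 0.0
    t_last = None
    for t, x in samples:
        if t_last is not None:
            decay = math.exp(-(t - t_last) / alpha)
            total *= decay
            weight *= decay
        total += x
        weight += 1.0
        t_last = t
    return AvgState(total, weight, t_last)


def glue(earlier, later, alpha):
    """Join states from two time ranges, `earlier` ending before `later` starts.

    The earlier state is decayed forward to the later state's time and the
    two are added; no individual samples are needed.
    """
    decay = math.exp(-(later.t - earlier.t) / alpha)
    return AvgState(earlier.total * decay + later.total,
                    earlier.weight * decay + later.weight,
                    later.t)
```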
> Given N system logs with parallel recordings, I would like to
> summarize these in one common set of time buckets.
>
Sorting by time first would help a good bit. On the other hand, since the
exponential averager is a linear operator, it follows the distributive law
and you should be able to average each sequence in isolation and
combine them. There is a wee bit of math necessary to figure out just how, but it
should make a lot of sense when you get the result.
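One way to spell out that math, assuming the averager is kept as a decayed weighted sum S and decayed weight W. These symbols, the average x̄, and the split of the samples into two sequences A and B with last sample times t_A and t_B are introduced here for illustration only.

```latex
\begin{aligned}
S(T) &= \sum_i x_i\, e^{-(T - t_i)/\alpha}, \qquad
W(T) = \sum_i e^{-(T - t_i)/\alpha}, \qquad
\bar{x}(T) = \frac{S(T)}{W(T)} \\
S(T) &= S_A(T) + S_B(T), \qquad W(T) = W_A(T) + W_B(T) \\
S_A(T) &= e^{-(T - t_A)/\alpha}\, S_A(t_A), \qquad
W_A(T) = e^{-(T - t_A)/\alpha}\, W_A(t_A) \qquad (T \ge t_A) \\
\bar{x}(T) &= \frac{e^{-(T - t_A)/\alpha}\, S_A(t_A) + e^{-(T - t_B)/\alpha}\, S_B(t_B)}
                   {e^{-(T - t_A)/\alpha}\, W_A(t_A) + e^{-(T - t_B)/\alpha}\, W_B(t_B)}
\end{aligned}
```

The same decay relation holds for B with t_B in place of t_A. Each sequence enters only through its stored sums and the time of its last sample. The sequences can therefore be averaged separately and combined at any common time T ≥ max(t_A, t_B).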
> Each mapper would emit "something" with the time bucket as the sorting key.
>
I think that each mapper would emit the time for the end of the window that
the current input sample falls into, the sample and the time the sample
occurred relative to the end time of the window.
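A sketch of that mapper output, assuming fixed-width windows [k*width, (k+1)*width). The functions `map_sample` and `map_log` and the parameter `width` are hypothetical.

```python
import math


def map_sample(t, x, width):
    """Hypothetical mapper step for fixed-width windows [k*width, (k+1)*width).

    Emits (window_end, (x, t - window_end)). window_end is the sort key; the
    relative time is negative because the sample precedes its window end.
    """
    end = (math.floor(t / width) + 1) * width
    return end, (x, t - end)


def map_log(samples, width):
    """Apply map_sample to every (t, x) sample of one log."""
    for t, x in samples:
        yield map_sample(t, x, width)
```

In a MapReduce job the window end would play the role of the map output key. The sketch stays in plain Python so that it can be read without the framework.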
The combiner and reducer would key on the window end time, compute a
weighted average of all available samples, and emit the rectified time,
the weighted average, and the time of the latest sample relative to the
rectified time, which serves as the time for the weighted average.
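A sketch of the combiner and reducer for one window. Each sample is weighted by exp(rel / alpha), where rel = t - window_end ≤ 0. This sketch makes one assumed change to the description above: the combiner emits partial sums (weighted sum, total weight, latest relative time) rather than a finished average, because partial sums can be folded again while plain averages cannot. The reducer divides at the end. All function names are hypothetical.

```python
import math


def partial(values, alpha):
    """Combiner step (hypothetical): fold (x, rel_t) pairs for one window.

    Each sample is weighted by exp(rel_t / alpha) = exp(-(end - t) / alpha).
    Returns (weighted sum, total weight, latest rel_t) so that the result
    can be folded again by another combiner or by the reducer.
    """
    s = w = 0.0
    latest = -math.inf
    for x, rel in values:
        weight = math.exp(rel / alpha)
        s += weight * x
        w += weight
        latest = max(latest, rel)
    return s, w, latest


def merge_partials(parts):
    """Fold several partial results for the same window: sums simply add."""
    s = sum(p[0] for p in parts)
    w = sum(p[1] for p in parts)
    latest = max(p[2] for p in parts)
    return s, w, latest


def reduce_window(end, parts):
    """Reducer: emit (rectified time, weighted average, latest relative time)."""
    s, w, latest = merge_partials(parts)
    return end, s / w, latest
```

Every partial result is referenced to the same window end. For that reason, combiners running on different logs can be merged by plain addition.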
If desired, relative time could be expressed as exp(-t / alpha) to avoid
recomputing exponentials.
This would compute weighted averages. Average rates would be analogous,
following the derivation in the blog.
> set? And is it worth the complexity of including the original time
> stamps of the events?
>
By superposition, you only need the time for the last event.