No, it's not something I've encountered during algebraic expressions. I'd be using it for the results of the embarrassingly parallel sections of a GA, so the algebraic expressions will be in-core on the back end.
A very straightforward Map-only evaluation with a utility function in the closure (appending a score to the last column), then a sort on the score for crossover, etc.
I could do it in Spark only, but if possible it would be good to keep it engine-neutral.
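For concreteness, the Map-only scoring step above can be sketched with Samsara's mapBlock; `appendScores` and the caller-supplied fitness function are hypothetical names for illustration, not existing Mahout API:

```scala
import org.apache.mahout.math.{Matrix, Vector}
import org.apache.mahout.math.drm.DrmLike
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.math.scalabindings.RLikeOps._

// Map-only pass: evaluate each row with a fitness function from the
// closure and append the score as a new last column.
def appendScores[K](pop: DrmLike[K], score: Vector => Double): DrmLike[K] =
  pop.mapBlock(ncol = pop.ncol + 1) { case (keys, block) =>
    // new block with one extra column to hold the score
    val scored: Matrix = block.like(block.nrow, block.ncol + 1)
    for (r <- 0 until block.nrow) {
      for (c <- 0 until block.ncol) scored(r, c) = block(r, c)
      scored(r, block.ncol) = score(block(r, ::)) // score in last column
    }
    keys -> scored
  }
```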

> Another potential issue is that matrices do not generally imply ordering or
> formation of intermediate products, i.e., inside the optimizer you might
> build a pipeline that implies an ordered RDD in the Spark sense, but there is
> no algebraic operator consuming sorted RDDs, and no operator that guarantees
> preserving the ordering (even if it is just a checkpoint). This may create
> ambiguities as more rewriting rules are added. This is not a major concern.
Thanks - I hadn't thought of this. It does complicate things a bit; I need to think more about it.
________________________________
From: Dmitriy Lyubimov <[EMAIL PROTECTED]>
Sent: Tuesday, September 5, 2017 6:01:21 PM
To: [EMAIL PROTECTED]
Subject: Re: [DISCUSS] New feature - DRM and in-core matrix sort and required test suites for modules.

In general, +1, don't see why not.

Q -- is it something that you have encountered while doing algebra? I.e.,
do you need the sorted DRM to continue algebraic operations between
optimizer barriers, or you just need an RDD as the outcome of all this?

If it is just an RDD, then you could just do a Spark-supported sort; that's
why we have a drm.rdd barrier (Spark-specific). Barrier out to a Spark RDD
and then continue doing whatever Spark already supports.
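That barrier-out approach might look something like the following sketch (assuming the score sits in the last column; the variable names are illustrative):

```scala
import org.apache.mahout.math.Vector
import org.apache.mahout.math.drm._
import org.apache.mahout.sparkbindings._

val scored: DrmLike[Int] = ???            // DRM with the score in its last column
val rdd = scored.checkpoint().rdd         // barrier out to RDD[(Int, Vector)]
// Spark-side sort, descending by the last (score) column
val sorted = rdd.sortBy { case (_, v) => -v.get(v.size - 1) }
```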

Another potential issue is that matrices do not generally imply ordering or
formation of intermediate products, i.e., inside the optimizer you might
build a pipeline that implies an ordered RDD in the Spark sense, but there is
no algebraic operator consuming sorted RDDs, and no operator that guarantees
preserving the ordering (even if it is just a checkpoint). This may create
ambiguities as more rewriting rules are added. This is not a major concern.

On Tue, Sep 5, 2017 at 2:24 PM, Trevor Grant <[EMAIL PROTECTED]>
wrote:

> Ever since we moved Flink to its own profile, I have been thinking we ought
> to do the same with H2O, but I haven't been too motivated because it was never
> causing anyone any problems.
>
> Maybe it's time to drop H2O "official support" and move Flink Batch / H2O
> into a "mahout/community/engines" folder.
>
> I've been doing a lot of Flink Streaming the last couple of weeks and have
> already bootlegged a few of the "Algorithms" into Flink. Pretty sure we could
> support those easily, and I _think_ we could do the same with the
> distributed operators (e.g., wrap a DataStream[(Key, MahoutVector)] and
> implement the Operators on that).
>
> I'd put FlinkStreaming as another community engine.
>
> If we did that, I'd say- by convention we need a Markdown document in
> mahout/community/engines that has a table of what is implemented on what.
>
> That is to say, even if we were only able to implement the "algos" on Flink
> Streaming, there would still be a lot of value in that for many
> applications (especially considering the state of FlinkML). It also beats
> having a half-cooked engine sitting on a feature branch.
>
> Beam does something similar to that for their various engines.
>
> Speaking of Beam, I've heard rumblings here and there of people talking
> about making a Beam engine; this might motivate people to get started (no
> one person feels responsible for "boiling the ocean" and throwing down an
> entire engine in one go, but instead can hack out the portions they need).
>
>
> My .02
>
> tg
>
> On Tue, Sep 5, 2017 at 4:04 PM, Andrew Palumbo <[EMAIL PROTECTED]> wrote:
>
> > I've found a need for sorting a DRM as well as in-core matrices,
> > something like, e.g., DrmLike.sortByColumn(...). I would like to implement
> > this at the math-scala engine-neutral level with pass-through functions
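A possible shape for that engine-neutral entry point, purely as a sketch — the name `sortByColumn` comes from the quoted proposal, everything else here is an assumption:

```scala
import org.apache.mahout.math.drm.DrmLike

// Hypothetical engine-neutral surface; each engine's bindings would
// supply the actual implementation (e.g., Spark's sortBy underneath).
trait DrmSortOps[K] {
  def drm: DrmLike[K]
  def sortByColumn(col: Int, descending: Boolean = false): DrmLike[K]
}
```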