FWIW, I now understand what I was missing that made me think Manifold
was running Tika when it wasn't.  It turns out that Alfresco uses Tika
internally, and when you get a document from Alfresco (using the
WebScripts connector, anyway) the set of fields you get includes all
the image metadata and whatnot (for image files).  I never realized
this because I don't typically use Alfresco for images.  But when I
added extra logging to the Alfresco WebScripts connector code to spit
out the incoming field set, I saw things like:

Found property exif:yResolution = 72.0
Found property cm:owner = admin
Found property exif:isoSpeedRatings = 400
Found property exif:fNumber = 3.5
Found property sys:node-uuid = 0516a5cc-fc04-4512-a4ed-b595b7c3908b
Found property exif:pixelYDimension = 2048
Found property exif:resolutionUnit = Inch
Found property exif:dateTimeOriginal = 2005-01-09T16:00:55Z
Found property sys:locale = en_GB

which explains why the Solr connector was trying to save fields like
exif_fNumber and exif_resolutionUnit.  This came up because the
Alfresco instance I'm experimenting with has its default sample
workspace, which includes images and other things I don't normally
touch.

To keep all those failure messages out of my history, I thought about
creating a "WhitelistFieldTransform" as a transform connection that
drops any fields other than the ones that are whitelisted.  Two
questions:

1. Does this seem like a reasonable approach, or is there a better way?

2. If this is reasonable and I create such a filter, would there be
any interest in having it contributed back to MCF?
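
For concreteness, the core of such a whitelist filter could look
roughly like the sketch below.  It is plain Java over a field map and
does not use the actual MCF transformation connector API; the class
name WhitelistFieldFilter and the method filter() are hypothetical,
made up purely for illustration, and the property names are taken
from the log output above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not part of MCF, and not tied to its connector API.
public class WhitelistFieldFilter {

    // Returns a new map holding only the fields whose names are whitelisted,
    // preserving the order in which they arrived.
    public static Map<String, String> filter(Map<String, String> fields,
                                             Set<String> whitelist) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            if (whitelist.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // A few of the properties seen in the connector logging.
        Map<String, String> incoming = new LinkedHashMap<>();
        incoming.put("cm:owner", "admin");
        incoming.put("exif:fNumber", "3.5");
        incoming.put("exif:resolutionUnit", "Inch");
        incoming.put("sys:locale", "en_GB");

        // Assumed whitelist: keep only the non-EXIF properties.
        Set<String> whitelist =
            new HashSet<>(Arrays.asList("cm:owner", "sys:locale"));

        System.out.println(filter(incoming, whitelist));
    }
}
```

A real transform connection would apply the same check to each
document's fields before handing the document on to the output
connection, with the whitelist coming from the job configuration.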

This message optimized for indexing by NSA PRISM
On Sun, Oct 15, 2017 at 10:11 AM, Karl Wright <[EMAIL PROTECTED]> wrote: