Hi Karl,

Sorry for not explaining the issue in detail.

(1) Is it likely to go away or not on a retry;

The DM_PLATFORM_E_INTEGER_CONVERSION_ERROR and DM_OBJECT_E_LOAD_INVALID_STRING_LEN errors are not likely to go away on an immediate retry.

(2) Does it substantially impact the ability of ManifoldCF to properly process the document;

The impact is that someone needs to monitor the indexing and, if it stops on one of these errors, use restart-minimal to start the indexing again.

(3) Is it generally acceptable to skip ALL documents where the error occurs.

Yes. These errors occur for a large number of documents, and restarting the indexing each time is a burden for the user. Total document count: 700,000+.
I'm not sure whether these issues are common in Documentum or are caused by improper Documentum configuration/maintenance. We have encountered these errors on a couple of Documentum instances in lower environments (not validated on production).
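One way to read the three answers above is as a per-error policy. The sketch below is hypothetical: the class DctmErrorPolicy, the Disposition enum, and the assumption that the error code always appears in square brackets are illustrative, not ManifoldCF or DFC APIs. It extracts the DM_... code from a message and marks the two errors reported here as skippable, leaving everything else as retryable.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ManifoldCF: maps a Documentum error message
// to what the connector should do with the affected document.
final class DctmErrorPolicy {

  enum Disposition { RETRY, SKIP }

  // Assumes the error code appears in square brackets, as in
  // "[DM_PLATFORM_E_INTEGER_CONVERSION_ERROR]error: ...".
  private static final Pattern CODE = Pattern.compile("\\[(DM_[A-Z0-9_]+)\\]");

  // Returns the bracketed DM_... code, or null if none is present.
  static String errorCode(String message) {
    if (message == null) {
      return null;
    }
    Matcher m = CODE.matcher(message);
    return m.find() ? m.group(1) : null;
  }

  static Disposition classify(String message) {
    String code = errorCode(message);
    if (code == null) {
      return Disposition.RETRY; // unknown: retry rather than silently drop (assumed default)
    }
    switch (code) {
      // Reported above as not going away on retry, and acceptable to skip.
      case "DM_PLATFORM_E_INTEGER_CONVERSION_ERROR":
      case "DM_OBJECT_E_LOAD_INVALID_STRING_LEN":
        return Disposition.SKIP;
      default:
        return Disposition.RETRY;
    }
  }
}
```

For example, classify() on the message quoted in the original report ("Documentum error: [DM_PLATFORM_E_INTEGER_CONVERSION_ERROR]error: ...") returns SKIP, while an unrecognized code falls back to RETRY.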

The Documentum repository errors DM_PLATFORM_E_INTEGER_CONVERSION_ERROR and DM_OBJECT_E_LOAD_INVALID_STRING_LEN are thrown as DfException from the getObjectByQualification method in org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.

We made a fix to log the error (in the Documentum server process log) and return null:
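    catch (DfException e)
    {
      // Log the error to the Documentum server process log (logging call omitted), then skip the object
      return null;
      //throw new DocumentumException("Documentum error: "+e.getMessage());
    }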
In the run() method of the ProcessDocumentThread inner class in org.apache.manifoldcf.crawler.connectors.DCTM.DCTM, we added a null check so that document processing continues only when an object is returned:
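      try {
        IDocumentumObject object = session.getObjectByQualification("dm_document where i_chronicle_id='" + documentIdentifier +
          "' and any r_version_label='CURRENT'");
        // null means the lookup failed and was logged; skip this document
        if (object != null) {
          // ... existing document processing ...
        }
      }
      catch (Throwable e) {
        this.exception = e;
      }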
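The effect of the two changes above (returning null from getObjectByQualification and skipping null objects in run()) can be shown in isolation. In this hypothetical sketch, RepositoryException stands in for DfException and the Lookup interface stands in for the session call; neither is a ManifoldCF or DFC type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not ManifoldCF code.
final class SkipOnLookupFailure {

  // Stand-in for DfException.
  static final class RepositoryException extends Exception {
    RepositoryException(String message) {
      super(message);
    }
  }

  // Stand-in for session.getObjectByQualification(...).
  interface Lookup {
    String find(String id) throws RepositoryException;
  }

  // Like the patched lookup: record the error and return null instead of throwing.
  static String findOrNull(Lookup lookup, String id, List<String> log) {
    try {
      return lookup.find(id);
    } catch (RepositoryException e) {
      log.add(id + ": " + e.getMessage());
      return null;
    }
  }

  // Like the null check in run(): process only the objects that were found.
  static List<String> processAll(Lookup lookup, List<String> ids, List<String> log) {
    List<String> processed = new ArrayList<>();
    for (String id : ids) {
      String object = findOrNull(lookup, id, log);
      if (object != null) {
        processed.add(object);
      }
    }
    return processed;
  }
}
```

One trade-off worth noting: because the patched catch block turns every DfException into null, any transient failure surfaced as a DfException is also skipped instead of retried. Narrowing the catch to the specific non-transient error codes would avoid that.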

The [DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED] error occurs very rarely. It happens because an uploaded document is first parked on an interim BOCS server and is moved to the repository a short time later.
If indexing happens during that gap, the document's properties are accessible but its content is not, which causes the error. We have not completed a fix for this yet.
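Because the parked state is temporary, one possible approach is to retry the content fetch a bounded number of times and skip the document only if it is still parked. The sketch below is hypothetical: ParkedContentRetry, the attempt limit, and the linear backoff are assumptions, and it sleeps in the calling thread only to stay self-contained. Inside ManifoldCF the connector would more likely reschedule the document through its own service-interruption mechanism instead of blocking a worker thread.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not a ManifoldCF or DFC API.
final class ParkedContentRetry {

  static final String PARKED = "[DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]";

  // Calls fetch up to maxAttempts times while it fails with the PARKED error.
  // Returns the fetched value, or null if the content is still parked at the end.
  // Any other failure is rethrown unchanged.
  static <T> T fetchWithRetry(Callable<T> fetch, int maxAttempts, long delayMillis)
      throws Exception {
    for (int attempt = 1; ; attempt++) {
      try {
        return fetch.call();
      } catch (Exception e) {
        String message = e.getMessage();
        if (message == null || !message.contains(PARKED)) {
          throw e; // not the parked case: leave existing handling in charge
        }
        if (attempt >= maxAttempts) {
          return null; // still parked: caller skips the document
        }
        Thread.sleep(delayMillis * attempt); // wait longer after each parked attempt
      }
    }
  }
}
```

With maxAttempts set to 3, for example, a document whose content reaches the repository before the third attempt is fetched normally, and one that is still parked after the third attempt comes back as null so the caller can skip it.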
The code involved in this error is shown below.
In the run() method of the ProcessDocumentThread inner class in org.apache.manifoldcf.crawler.connectors.DCTM.DCTM:
          try
          {
            strFilePath = object.getFile(objFileTemp.getCanonicalPath());
          }
          catch (DocumentumException dfe)
          {
            // Fetch failed, so log it
            activityStatus = "NOCONTENT";
            activityMessage = dfe.getMessage();
            // Only TYPE_NOTALLOWED is absorbed here; any other type is rethrown
            if (dfe.getType() != DocumentumException.TYPE_NOTALLOWED)
              throw dfe;
          }

The getFile method in org.apache.manifoldcf.crawler.common.DCTM.DocumentumObjectImpl:

    catch (DfException dfe)
    {
      // Can't decide what to do without looking at the exception text.
      // This is crappy but it's the best we can manage, apparently.
      String errorMessage = dfe.getMessage();
      if (errorMessage.indexOf("[DM_CONTENT_E_CANT_START_PULL]") == -1)
      {
        // Treat it as transient, and retry.
        // Note: this branch also catches [DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED].
        throw new DocumentumException(dfe.getMessage(),DocumentumException.TYPE_SERVICEINTERRUPTION);
      }
      // It's probably not a transient error.  Report it as an access violation, even though it
      // may well not be.  We don't have much info as to what's happening.
      throw new DocumentumException(dfe.getMessage(),DocumentumException.TYPE_NOTALLOWED);
    }
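The message check in getFile can be restated in isolation to show why the parked-content error is not skipped. In this hypothetical sketch, the ErrorType enum stands in for the DocumentumException.TYPE_* constants: any message without [DM_CONTENT_E_CANT_START_PULL], including the PARKED one, comes back as a service interruption, and the run() snippet rethrows every type except TYPE_NOTALLOWED rather than absorbing it.

```java
// Hypothetical restatement of the message check in DocumentumObjectImpl.getFile.
final class GetFileErrorCheck {

  // Stand-in for DocumentumException.TYPE_SERVICEINTERRUPTION / TYPE_NOTALLOWED.
  enum ErrorType { SERVICEINTERRUPTION, NOTALLOWED }

  static ErrorType classify(String errorMessage) {
    if (errorMessage.indexOf("[DM_CONTENT_E_CANT_START_PULL]") == -1) {
      // Every other message, including the PARKED error, is treated as transient.
      return ErrorType.SERVICEINTERRUPTION;
    }
    return ErrorType.NOTALLOWED;
  }

  // Mirrors the check in run(): only NOTALLOWED is absorbed, everything else is rethrown.
  static boolean rethrownByRun(String errorMessage) {
    return classify(errorMessage) != ErrorType.NOTALLOWED;
  }
}
```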

Discarding uncrawlable documents and continuing the indexing process seems more sensible than stalling the job. If you agree, could you please add the required coding exception?

Tamizh Kumaran Thamizharasan

From: Karl Wright [mailto:[EMAIL PROTECTED]]
Sent: Friday, July 14, 2017 12:36 PM
Cc: Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
Subject: Re: Documentum job stops on error

Hi Tamizh,

For any repository errors, ManifoldCF needs to know the following:
(1) Is it likely to go away or not on a retry;
(2) Does it substantially impact the ability of ManifoldCF to properly process the document;
(3) Is it generally acceptable to skip ALL documents where the error occurs.

In this case your underlying error seems quite worrying:

[DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]error: "The content is temporarily parked on a BOCS server host. It will be available when it is moved to a permanent storage area."

I could imagine that many or most documents are in fact in that state, in which case nothing can really be crawled?

I'm happy to make coding exceptions in the Documentum connector for discarding uncrawlable documents, but only if it makes sense to do that. Here it is not at all clear that we'd want to change MCF to throw away all documents with this problem. It sounds to me instead like there's some significant Documentum configuration issue.

On Fri, Jul 14, 2017 at 2:39 AM, Tamizh Kumaran Thamizharasan <[EMAIL PROTECTED]> wrote:
Hi Team,

We observed the following behavior when using the ManifoldCF Documentum connector.
• On any Documentum-specific error, the application throws the error and the job stops abruptly. Is there a specific reason for this approach?

Can we handle these errors by logging them, skipping the document, and continuing the indexing?
Below is a sample error that causes the job to fail:
Documentum error: [DM_PLATFORM_E_INTEGER_CONVERSION_ERROR]error:  "The server was unable to convert the following string (String Unavailable) to an integer or long."

Caused by: org.apach