Thanks Karl!!

After monitoring the CPU usage of PostgreSQL, the agents process, and the Documentum server process, I found that the Documentum server process consumes the most CPU, and the agents process is the second-largest consumer.

In the Documentum server run script, the Java heap is set as follows:
-Xmx512m -Xms32m

Is there any way to speed up the indexing through heap configuration or by adding hardware?
If so, kindly share the details.
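
One way to experiment with the heap values quoted above without editing the run script each time is to read them from environment variables. This is only a minimal sketch: the helper name `heap_opts` and the variables `DCTM_MAX_HEAP` and `DCTM_MIN_HEAP` are hypothetical and are not part of the ManifoldCF scripts, and nothing in this thread establishes that a larger heap actually improves indexing speed.

```shell
#!/bin/bash

# Hypothetical helper (not part of ManifoldCF): build the JVM heap flags,
# falling back to the values currently used in the run script.
heap_opts() {
    local max="${DCTM_MAX_HEAP:-512m}"   # assumed variable name
    local min="${DCTM_MIN_HEAP:-32m}"    # assumed variable name
    printf -- '-Xmx%s -Xms%s\n' "$max" "$min"
}
```

With neither variable set, `heap_opts` prints the current flags; running `DCTM_MAX_HEAP=1024m heap_opts` prints `-Xmx1024m -Xms32m`, which could then be spliced into the java command line of the run script.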

Regards,
Tamizh Kumaran

From: Karl Wright [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 05, 2017 6:19 PM
To: [EMAIL PROTECTED]
Cc: Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
Subject: Re: ManifoldCF slow documentum indexing performance

Hi Tamizh,

The likely culprit is Documentum itself.  In my experience it can be quite slow, depending on how it is configured.  But you can confirm that by monitoring the CPU usage of Postgresql, the agents process, and the documentum server process.  If none of these are CPU bound, then Documentum itself is the problem.
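
The CPU check described above could be scripted roughly as follows. This is a sketch under assumptions: the function names are made up for illustration, and the patterns used to find each process (`postgres`, `AgentRun`, `documentum-server`) are guesses that should be adjusted to whatever actually appears in the command lines on the boxes in question.

```shell
#!/bin/bash

# Sum the %CPU column of "pcpu args..." lines on stdin that match the
# pattern in $1. The pattern is passed via the environment so that it
# does not show up in awk's own command line.
sum_cpu() {
    PAT="$1" awk '$0 ~ ENVIRON["PAT"] { sum += $1 } END { printf "%.1f\n", sum + 0 }'
}

# Print one line per pattern with the summed %CPU of matching processes.
# Note: on Linux, ps reports %CPU averaged over each process's lifetime,
# so this is a rough indicator, not an instantaneous reading.
report_cpu() {
    local pat
    for pat in "$@"; do
        printf '%-20s %s%%\n' "$pat" "$(ps -e -o pcpu= -o args= | sum_cpu "$pat")"
    done
}
```

Calling `report_cpu postgres AgentRun documentum-server` on each box during a crawl gives a quick comparison. For interval-based readings, a tool such as `top` or `pidstat` is a better fit than `ps`.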

Thanks,
Karl

On Wed, Jul 5, 2017 at 8:24 AM, Tamizh Kumaran Thamizharasan <[EMAIL PROTECTED]> wrote:
Hi Team,

PostgreSQL 9.2, Solr 5.3.2, and ManifoldCF 2.7.1 are installed on the same Linux box. The Documentum server sits on a different Linux box. Indexing with the Documentum crawler is slow (approx. 1000 documents per hour). The properties file in use is shown below for reference.

<configuration>
  <!-- Version string for UI -->
  <!-- Point to a specific (common) logging file -->
  <property name="org.apache.manifoldcf.logconfigfile" value="./logging.ini"/>
  <!-- Specify the connectors to be loaded -->
  <property name="org.apache.manifoldcf.connectorsconfigurationfile" value="../connectors.xml"/>
  <!-- Specify the path to the file resources directory -->
  <property name="org.apache.manifoldcf.fileresources" value="../file-resources"/>
  <property name="org.apache.manifoldcf.databaseimplementationclass" value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
  <property name="org.apache.manifoldcf.postgresql.hostname" value="localhost"/>
  <property name="org.apache.manifoldcf.postgresql.port" value="5432"/>
  <property name="org.apache.manifoldcf.dbsuperusername" value="postgres"/>
  <property name="org.apache.manifoldcf.dbsuperuserpassword" value=""/>
  <property name="org.apache.manifoldcf.database.name" value="manifoldcf"/>
  <property name="org.apache.manifoldcf.database.username" value="postgres"/>
  <property name="org.apache.manifoldcf.database.password" value=""/>
  <property name="org.apache.manifoldcf.database.maxhandles" value="100"/>
  <property name="org.apache.manifoldcf.crawler.threads" value="15"/>
  <property name="org.apache.manifoldcf.crawler.repository.store_history" value="false"/>

  <property name="org.apache.manifoldcf.zookeeper.connectstring" value="***********:8349"/>
  <property name="org.apache.manifoldcf.zookeeper.sessiontimeout" value="5000"/>
  <!-- Tell MCF where to find the connector jars -->
  <libdir path="../connector-lib"/>
  <libdir path="../connector-common-lib"/>
  <libdir path="../connector-lib-proprietary"/>
  <!-- Any additional local properties go here -->
</configuration>

Initially, org.apache.manifoldcf.crawler.threads was set to 45, and we observed a long gap between each batch of 45 documents during processing.
Can you please point out any changes or recommendations that would speed up the indexing?

Regards,
Tamizh Kumaran Thamizharasan

#!/bin/bash -e

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

JAVA_HOME="/Data/POC/jdk1.8.0_121"
DOCUMENTUM="/Data/POC/documentum_shared/config"
MCF_HOME="/Data/POC/apache-manifoldcf-2.7.1/multiprocess-zk-example"

if [[ $OSTYPE == "cygwin" ]] ; then
    PATHSEP=";"
else
    PATHSEP=":"
fi

#Make sure environment variables are properly set
if [ -e "$JAVA_HOME"/bin/java ] ; then
    if [ -f "$MCF_HOME"/properties.xml ] ; then
        # TODO: Check this on both Windows and Redhat
        if [ -f "$DOCUMENTUM"/dmcl.ini -o -f "$DOCUMENTUM"/dfc.properties ] ; then
   
            # Build the classpath
            CLASSPATH=""
            for filename in $(ls -1 "$MCF_HOME"/../processes/documentum-server/lib) ; do
                if [ -n "$CLASSPATH" ] ; then
                    CLASSPATH="$CLASSPATH""$PATHSEP""$MCF_HOME"/../processes/documentum-server/lib/"$filename"
                else
                    CLASSPATH="$MCF_HOME"/../processes/documentum-server/lib/"$filename"
                fi
            done

            for filename in $(ls -1 "$MCF_HOME"/../processes/documentum-server/lib-proprietary | grep "\.jar$") ; do
                if [ -n "$CLASSPATH" ] ; then
                    CLASSPATH="$CLASSPATH""$PATHSEP""$MCF_HOME"/../processes/documentum-server/lib-proprietary/"$filename"
                else
                    CLASSPATH="$MCF_HOME"/../processes/documentum-server/lib-proprietary/"$filename"
                fi
            done

            LIB_STATEMENT=""
            if [[ $JAVA_LIB_PATH != "" ]] ; then
                LIB_STATEMENT=-Djava.library.path="$JAVA_LIB_PATH"
            fi
     