Reporting Problems

This page lists procedures for extracting information from a running Nuxeo instance.
Most of these scripts generate files in the /tmp/ directory.
The Support team may request this information.
Please always compress files before uploading them to your JIRA ticket.
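
As a minimal sketch of the compression step, the function below bundles every file matching nuxeo-* from a directory into one archive. The function name and the support-bundle archive name are illustrative choices, not something Nuxeo provides.

```shell
# Sketch: bundle the generated reports into a single compressed archive.
# bundle_reports and the "support-bundle" name are hypothetical.
bundle_reports() (
  dir="${1:-/tmp}"                                             # where the reports were written
  out="${2:-/tmp/support-bundle-$(date +%Y%m%d-%H%M%S).tgz}"   # absolute path of the archive
  cd "$dir" || exit 1
  # archive only the files whose name starts with "nuxeo-", then print the archive path
  tar czf "$out" nuxeo-* && echo "$out"
)
```

Calling bundle_reports with no argument should pick up the /tmp/nuxeo-* files produced by the commands in this page, assuming they have not been moved elsewhere.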

Nuxeo Status and Configuration

To dump your server status and configuration:

./bin/nuxeoctl showconf > /tmp/nuxeo-showconf-`date +%Y%m%d-%H%M%S`.txt

Nuxeo Health Check Status

curl -XGET http://localhost:8080/nuxeo/runningstatus > /tmp/nuxeo-healthcheck-`date +%Y%m%d-%H%M%S`.json

Nuxeo Metrics via JMX

When JMX is enabled (uncomment the JMX-related lines in nuxeo.conf), the Nuxeo Platform exposes many metrics in the "metrics" domain.

You can use GUI tools like Java Mission Control or VisualVM to inspect these metrics. To dump all of them when reporting a problem, you can use jmxterm (run it with the same JVM and user as your Nuxeo instance):

Initialize the script:

# download jmxterm
wget https://github.com/jiaqi/jmxterm/releases/download/v1.0.1/jmxterm-1.0.1-uber.jar -O /tmp/jmxterm-1.0.1-uber.jar
# list metrics beans and create a script
echo -e "domain metrics\nbeans" | java -jar /tmp/jmxterm-1.0.1-uber.jar -l localhost:1089 -n | sed -e "s,^,get -b ,g" -e "s,$, \*,g" > /tmp/metrics-script.txt
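
To make the sed step above easier to follow, this sketch applies the same expression to two made-up bean names. The function name and the bean names are hypothetical; the real names come from your own instance.

```shell
# Same sed expression as in the command above, wrapped in a function
# (beans_to_script is an illustrative name).
beans_to_script() {
  # prefix each bean with "get -b " and append " *" to request all its attributes
  sed -e "s,^,get -b ,g" -e "s,$, \*,g"
}

# Hypothetical bean names, only for illustration:
printf '%s\n' 'metrics:name=example.counter' 'metrics:name=example.timer' | beans_to_script
# prints:
#   get -b metrics:name=example.counter *
#   get -b metrics:name=example.timer *
```

Each resulting line is a jmxterm command, which is why the generated file can be replayed with the -i option.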

Then proceed to the metric capture:

now=`date +%Y%m%d-%H%M%S`; java -jar /tmp/jmxterm-1.0.1-uber.jar -l localhost:1089 -n -i /tmp/metrics-script.txt > /tmp/nuxeo-metrics-$now.txt 2>&1

JVM Garbage Collector

The garbage collector attempts to reclaim memory used by objects that are no longer in use by the application.

Garbage collector logging has been enabled by default since Nuxeo 6.0. The log file is located at ${nuxeo.log.dir}/gc.log.

If a problem occurs, save this file before restarting, because it is overwritten on start. If you see many full GCs in the file, try running a JVM heap histo.
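
A minimal sketch of both steps, assuming you pass the real path of gc.log yourself. The function names are illustrative, and the two search patterns are assumptions about the log wording ("Full GC" in older HotSpot logs, "Pause Full" with unified logging), so adjust them to what your file actually contains.

```shell
# Sketch: keep a timestamped copy of gc.log and count full collections.
# save_gc_log and count_full_gc are hypothetical helper names.
save_gc_log() {
  # $1: path to gc.log, $2: target directory (defaults to /tmp)
  cp -p "$1" "${2:-/tmp}/nuxeo-gc-$(date +%Y%m%d-%H%M%S).log"
}

count_full_gc() {
  # counts the lines mentioning a full collection; both patterns are assumptions
  grep -cE 'Full GC|Pause Full' "$1"
}
```

For example, count_full_gc followed by the path of the saved copy gives a quick idea of how often full collections happened before the restart.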

JVM Heap Histo

To see what objects are present in the heap:

jcmd Bootstrap GC.class_histogram > /tmp/nuxeo-heap-histo-`date +%Y%m%d-%H%M%S`.txt

JVM Thread Dump and CPU Activity

A thread dump is useful to understand what code is running at a given time.

The first step is to log in as the same user as the Nuxeo JVM, then use jcmd:

jcmd Bootstrap Thread.print > /tmp/nuxeo.tdump

It is useful to correlate the code path given by a thread dump with the CPU activity:

top -bcH -n1 -w512 > /tmp/top-thread.txt

To pinpoint stuck code, this needs to be done multiple times:

# 6 dumps during one minute
for i in {0..5}; do now=`date +%Y%m%d-%H%M%S`; jcmd Bootstrap Thread.print > /tmp/nuxeo-$now.tdump; top -bcH -n1 -w512 > /tmp/top-$now.txt; sleep 10; done
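
One possible way to do the correlation is sketched below. top -H lists thread ids in decimal, while HotSpot thread dumps have traditionally printed them as nid=0x followed by the hexadecimal value; some recent JVMs may print a decimal nid instead, so the search accepts both forms. The function names are illustrative.

```shell
# Sketch: locate a busy thread reported by top in a thread dump.
# tid_to_nid and find_thread are hypothetical helper names.
tid_to_nid() {
  # $1: decimal thread id taken from the PID column of "top -H"
  printf 'nid=0x%x\n' "$1"
}

find_thread() {
  # $1: decimal thread id, $2: thread dump file
  # matches the hexadecimal nid and, as a fallback, a decimal nid
  grep -E "nid=(0x$(printf '%x' "$1")|$1)( |\$)" "$2"
}

tid_to_nid 12345   # prints nid=0x3039
```

Running find_thread with a thread id from /tmp/top-thread.txt and the matching .tdump file prints the header line of that thread; the stack follows it in the dump.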

Oracle JVM Flight Recording

If you have an appropriate Oracle commercial license for the JVM, you can activate this option in nuxeo.conf:

JAVA_OPTS=$JAVA_OPTS -Dcom.sun.management.jmxremote.autodiscovery=true -Dcom.sun.management.jdp.name=Nuxeo -XX:+UnlockCommercialFeatures -XX:+FlightRecorder

Then to record JVM activity for 1 minute use the following command:

jcmd Bootstrap JFR.start duration=60s filename=/tmp/nuxeo-record-`date +%Y%m%d-%H%M%S`.jfr

JVM Core Dump

When the JVM is stuck, a core dump taken in addition to a thread dump, and before restarting, can give more context.

If you have gdb installed, you can generate a core dump without killing the application:

sudo gdb --pid=<PID> --batch -ex generate-core-file -ex detach
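
The PID placeholder has to be replaced with the process id of the Nuxeo JVM. One way to look it up is sketched here, assuming the main class name contains "Bootstrap" as in the jcmd commands of this page; the function name and the sample line in the comment are illustrative.

```shell
# Sketch: extract the pid of the Nuxeo JVM from the output of "jcmd -l".
# nuxeo_pid is a hypothetical helper name.
nuxeo_pid() {
  # jcmd -l prints one JVM per line: "<pid> <main class> <arguments>"
  # e.g. (made-up line): 4242 org.apache.catalina.startup.Bootstrap start
  awk '$2 ~ /Bootstrap/ { print $1; exit }'
}

# Usage, as the same user as the Nuxeo JVM:
#   jcmd -l | nuxeo_pid
```

If several JVMs match, only the first one is printed, so check the result before passing it to gdb.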

Network

Measure the round trip between Nuxeo and the database:

ping -s 8192 <database IP>

Use mtr to discover what sits between the Nuxeo server and the database, and report any firewall or known hardware.

Look at the number of errors reported by netstat -s, as a large number of errors may indicate a network problem.
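
Since netstat -s prints many counters, a small filter can help spot the suspicious ones. This is only a sketch: the function name and the keyword list are assumptions, and counters printed in a "Name: value" form are not covered.

```shell
# Sketch: keep only non-zero counters whose description looks like an error.
# nonzero_net_errors and its keyword list are hypothetical.
nonzero_net_errors() {
  # first field must be a number greater than 0, and the line must mention a problem
  awk '$1 ~ /^[0-9]+$/ && $1 > 0 && tolower($0) ~ /error|retrans|drop|fail|bad/'
}

# Usage:
#   netstat -s | nonzero_net_errors
```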

A network capture can be helpful at some point:

# Capture all eth0 traffic
sudo tcpdump -i eth0 -w /tmp/out.tcpdump
# Capture HTTP traffic on localhost port 8080
sudo tcpdump  -i lo -A host localhost and tcp port 8080 -w /tmp/nuxeo-network-`date +%Y%m%d-%H%M%S`.tcpdump

OS

You can report a Linux configuration using the aspersa summary script:

wget https://raw.githubusercontent.com/AnthemiusGuo/aspersa/master/summary && bash summary > /tmp/nuxeo-os-summary-`date +%Y%m%d-%H%M%S`.txt

The sysstat utilities are a collection of performance monitoring tools for Linux that are easy to set up.

You can monitor the system activity like this:

sar -d -o /tmp/nuxeo-sysstat-`date +%Y%m%d-%H%M%S`.log 5 720 >/dev/null 2>&1 &

This samples the activity every 5 seconds for 1 hour.

Process-level monitoring is also very useful. It can be done with atop running as root:

atop -w /tmp/nuxeo-atop-`date +%Y%m%d-%H%M%S`.log 5 720 >/dev/null 2>&1 &
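
Both commands above take an interval in seconds followed by a number of samples, so 5 and 720 cover 5 x 720 = 3600 seconds. To adapt them to another duration, the count can be computed as in this small sketch (the function name is illustrative):

```shell
# Sketch: number of samples needed to cover a duration at a given interval.
# sample_count is a hypothetical helper name.
sample_count() {
  # $1: total duration in seconds, $2: interval in seconds
  echo $(( $1 / $2 ))
}

sample_count 3600 5   # prints 720
```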

PostgreSQL

Follow the Nuxeo recommendations for PostgreSQL and perform the corresponding problem-reporting procedure. pgBadger and EXPLAIN are your friends.

Elasticsearch

If the problem is related to Elasticsearch access (initialization or bad health status), please list:

the non-default elasticsearch.* options in nuxeo.conf
the non-default Elasticsearch configuration options (especially the discovery settings)

And report the output of the following commands, assuming that Elasticsearch is on localhost and that the HTTP protocol is open on port 9200:

(ES="localhost:9200"; curl "$ES"; curl "$ES/_cat/health?v"; curl "$ES/_cat/nodes?v"; curl "$ES/_cat/indices?v") > /tmp/nuxeo-elastic-`date +%Y%m%d-%H%M%S`.txt
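
For a quick look at the cluster status before sending the report, the status column of the _cat/health output can be extracted as sketched below. The function name is illustrative; it locates the column by its "status" header rather than by position, and expects the output of that single request, not the combined file.

```shell
# Sketch: print the cluster status (green, yellow or red) from "_cat/health?v".
# es_status is a hypothetical helper name.
es_status() {
  # first line: find the index of the "status" column; next line: print that field
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "status") col = i; next }
       col     { print $col; exit }'
}

# Usage, assuming Elasticsearch listens on localhost:9200:
#   curl -s "localhost:9200/_cat/health?v" | es_status
```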

In addition, if the problem is related to unexpected search results or errors, follow the procedure described in Reporting Settings and Mapping.

Redis

How much memory is used:

redis-cli info memory > /tmp/nuxeo-redis-mem-`date +%Y%m%d-%H%M%S`.txt
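
To read the headline figure without opening the file, the used_memory_human field can be picked out of the same output. This is a sketch with an illustrative function name; it strips carriage returns first because the INFO output uses CRLF line endings.

```shell
# Sketch: print the human-readable memory usage from "redis-cli info memory".
# redis_used_memory is a hypothetical helper name.
redis_used_memory() {
  # drop carriage returns, then print the value of the used_memory_human field
  tr -d '\r' | awk -F: '$1 == "used_memory_human" { print $2 }'
}

# Usage:
#   redis-cli info memory | redis_used_memory
```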

Capture activity, hit Ctrl-C to stop:

redis-cli monitor > /tmp/nuxeo-redis-monitor-`date +%Y%m%d-%H%M%S`.txt

Kafka

You can get low-level information by using the Kafka scripts directly, for instance:

# List all topics:
/opt/kafka/bin/kafka-topics.sh  --zookeeper zookeeper:2181 --list

# Describe a topic
/opt/kafka/bin/kafka-topics.sh  --zookeeper zookeeper:2181 --describe --topic nuxeo-audit

# List all consumer groups
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

# Describe a consumer group
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group nuxeo-AuditLogWriter

Nuxeo Stream Consumer Lag

The stream.sh utility is located in the same bin directory as nuxeoctl.

When Using Kafka

# List topics and consumer lag
./bin/stream.sh lag -k >  /tmp/nuxeo-stream-lag-`date +%Y%m%d-%H%M%S`.md

# Get the latency in addition to the lag
./bin/stream.sh latency -k --codec avro >  /tmp/nuxeo-stream-lag-`date +%Y%m%d-%H%M%S`.md

When Using Chronicle

When not using Kafka, you need to get the consumer activity on each Nuxeo node:

# List streams and consumer positions
(NUXEO_DATA=/var/lib/nuxeo/data; ./bin/stream.sh lag --chronicle $NUXEO_DATA/stream/bulk; ./bin/stream.sh lag --chronicle $NUXEO_DATA/stream/audit; ./bin/stream.sh lag --chronicle $NUXEO_DATA/stream/default) >  /tmp/nuxeo-stream-lag-`date +%Y%m%d-%H%M%S`.md

If required you can also take a snapshot of the streams:

(NUXEO_DATA=/var/lib/nuxeo/data; tar czsf /tmp/nuxeo-cq-`date +%Y%m%d-%H%M%S`.tgz $NUXEO_DATA/stream/ $NUXEO_DATA/avro/)

Security

If you think you've found a security issue, please report it privately to [email protected].