This page lists procedures for extracting information from a running Nuxeo instance.
Most of these scripts generate files in the /tmp/ directory.
The Support team may request this information.
Please always compress files before uploading them to your JIRA ticket.
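The compression step can be sketched as a small shell function. This is only an illustration: the function name bundle_nuxeo_reports, the archive name and the choice of the nuxeo* and top-* file patterns are assumptions based on the file names used on this page, not a Nuxeo tool.

```shell
# Hypothetical helper: bundle the report files produced by the commands on this page.
# $1: directory holding the reports (default /tmp)
# $2: archive to create (default: timestamped .tgz in the current directory)
bundle_nuxeo_reports() {
  src_dir="${1:-/tmp}"
  archive="${2:-support-reports-$(date +%Y%m%d-%H%M%S).tgz}"
  # Make the archive path absolute because the subshell below changes directory
  case "$archive" in
    /*) ;;
    *) archive="$PWD/$archive" ;;
  esac
  (
    cd "$src_dir" || exit 1
    set --
    for f in nuxeo* top-*; do
      # An unmatched glob stays literal, so keep only existing regular files
      if [ -f "$f" ]; then set -- "$@" "$f"; fi
    done
    if [ "$#" -eq 0 ]; then
      echo "no report files found in $src_dir" >&2
      exit 1
    fi
    # Store the files by name only, without the leading directory
    tar czf "$archive" "$@"
  )
}
```

For example, bundle_nuxeo_reports /tmp ~/ticket-reports.tgz creates a single compressed file that can be attached to the ticket.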
Nuxeo Status and Configuration
To dump your server status and configuration:
./bin/nuxeoctl showconf > /tmp/nuxeo-showconf-`date +%Y%m%d-%H%M%S`.txt
Nuxeo Health Check Status
curl -XGET http://localhost:8080/nuxeo/runningstatus > /tmp/nuxeo-healthcheck-`date +%Y%m%d-%H%M%S`.json
Nuxeo Metrics via JMX
When JMX is enabled (uncomment the JMX-related lines in nuxeo.conf), the Nuxeo Platform exposes many metrics in the "metrics" domain.
You can use GUI tools like Java Mission Control or VisualVM to inspect these metrics. To dump all of them when reporting a problem, you can use jmxterm (run it with the same JVM and user as your Nuxeo):
Initialize the script:
# download jmxterm
wget https://github.com/jiaqi/jmxterm/releases/download/v1.0.1/jmxterm-1.0.1-uber.jar -O /tmp/jmxterm-1.0.1-uber.jar
# list metrics beans and create a script
echo -e "domain metrics\nbeans" | java -jar /tmp/jmxterm-1.0.1-uber.jar -l localhost:1089 -n | sed -e "s,^,get -b ,g" -e "s,$, \*,g" > /tmp/metrics-script.txt
Then proceed to the metric capture:
java -jar /tmp/jmxterm-1.0.1-uber.jar -l localhost:1089 -n -i /tmp/metrics-script.txt > /tmp/nuxeo-metrics-`date +%Y%m%d-%H%M%S`.txt 2>&1
JVM Garbage Collector
The garbage collector attempts to reclaim memory used by objects that are no longer in use by the application.
Garbage collection has been logged by default since Nuxeo 6.0; the log file is located at ${nuxeo.log.dir}/gc.log.
If a problem occurs, save this file before restarting, because it is overwritten on start. If you see many full GCs in the file, try to run a JVM heap histo.
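Saving the log and checking for full GCs could look like the sketch below. The function names are hypothetical, and the search patterns are an assumption: "Full GC" is typical of Java 8 style GC logs and "Pause Full" of the unified logging format used since Java 9, so adjust them to what your gc.log actually contains.

```shell
# Hypothetical helpers; pass the path of your gc.log explicitly.

# Copy the GC log to a timestamped file so a restart cannot overwrite it.
# $1: path to gc.log   $2: destination directory (default /tmp)
save_gc_log() {
  cp -p "$1" "${2:-/tmp}/nuxeo-gc-$(date +%Y%m%d-%H%M%S).log"
}

# Count the lines that report a full GC.
# Assumption: "Full GC" (Java 8 style logs) or "Pause Full" (unified logging).
count_full_gc() {
  grep -c -E 'Full GC|Pause Full' "$1"
}
```

Run save_gc_log with the resolved path of gc.log before restarting, then count_full_gc on the saved copy to see how many full collections were logged.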
JVM Heap Histo
To see what objects are present in the heap:
jcmd Bootstrap GC.class_histogram > /tmp/nuxeo-heap-histo-`date +%Y%m%d-%H%M%S`.txt
JVM Thread Dump and CPU Activity
A thread dump is useful to understand what code is running at a given time t.
The first step is to log in as the same user as the Nuxeo JVM, then use jcmd:
jcmd Bootstrap Thread.print > /tmp/nuxeo.tdump
It is interesting to correlate the code path given by a thread dump with the CPU activity:
top -bcH -n1 -w512 > /tmp/top-thread.txt
To pinpoint stuck code this needs to be done multiple times:
# 6 dumps during one minute
for i in {0..5}; do now=`date +%Y%m%d-%H%M%S`; jcmd Bootstrap Thread.print > /tmp/nuxeo-$now.tdump; top -bcH -n1 -w512 > /tmp/top-$now.txt; sleep 10; done
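To correlate the two outputs, the thread ID shown by top -H (a decimal number in the PID column) has to be matched with the native thread ID in the dump. The sketch below assumes the dump prints that ID in hexadecimal as nid=0x...; this format can differ between JVM versions, so check your dump first. The function name find_thread_in_dump is hypothetical.

```shell
# Hypothetical helper: show the thread dump entry for a thread ID reported by top -H.
# $1: decimal thread ID from the PID column of top -H
# $2: thread dump file produced by jcmd Thread.print
# Assumption: the dump prints native thread IDs as hexadecimal "nid=0x..." values.
find_thread_in_dump() {
  nid=$(printf 'nid=0x%x' "$1")
  # The trailing space avoids matching a longer ID that starts with the same digits.
  # Print the matching header line and up to 15 lines of its stack trace.
  grep -A 15 -- "$nid " "$2"
}
```

For instance, if top reports thread 6699 as the busiest one, find_thread_in_dump 6699 /tmp/nuxeo.tdump looks for nid=0x1a2b in the dump.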
Oracle JVM Flight Recording
If you have an appropriate Oracle commercial license for the JVM, you can activate this option in nuxeo.conf:
JAVA_OPTS=$JAVA_OPTS -Dcom.sun.management.jmxremote.autodiscovery=true -Dcom.sun.management.jdp.name=Nuxeo -XX:+UnlockCommercialFeatures -XX:+FlightRecorder
Then to record JVM activity for 1 minute use the following command:
jcmd Bootstrap JFR.start duration=60s filename=/tmp/nuxeo-record-`date +%Y%m%d-%H%M%S`.jfr
JVM Core Dump
When the JVM is stuck, in addition to a thread dump and before restarting, a core dump can give more context information.
If you have gdb installed, you can generate a core dump without killing the application:
sudo gdb --pid=<PID> --batch -ex generate-core-file -ex detach
Network
Measure the round trip between Nuxeo and the database:
ping -s 8192 <database IP>
Use mtr to discover what sits between the Nuxeo server and the database, and report any firewall or known hardware.
Look at the number of errors reported by netstat -s, as a large number of errors may indicate a network problem.
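Because netstat -s prints many counters, a filter can help to spot the suspicious ones. The following is a rough sketch: the function name net_error_counters is hypothetical and the keyword list is an assumption that serves as a starting point, not an exhaustive definition of what counts as an error.

```shell
# Hypothetical filter: keep the "netstat -s" counters whose description suggests
# trouble and whose value is not zero. Reads the netstat output on standard input.
# Assumption: the keyword list below is a starting point, not an exhaustive one.
net_error_counters() {
  grep -i -E 'error|fail|retransmit|drop|overflow|timeout' |
    grep -v -E '^[[:space:]]*0[[:space:]]'
}
```

It can be used as netstat -s | net_error_counters, ideally twice a few minutes apart to see which counters are still increasing.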
A network capture can be helpful at some point:
# Capture all eth0 traffic
sudo tcpdump -i eth0 -w /tmp/out.tcpdump
# Capture HTTP traffic on localhost port 8080
sudo tcpdump -i lo -A host localhost and tcp port 8080 -w /tmp/nuxeo-network-`date +%Y%m%d-%H%M%S`.tcpdump
OS
You can report a Linux configuration using the aspersa summary script:
wget https://raw.githubusercontent.com/AnthemiusGuo/aspersa/master/summary && bash summary > /tmp/nuxeo-os-summary-`date +%Y%m%d-%H%M%S`.txt
The sysstat utilities are a collection of performance monitoring tools for Linux that are easy to set up.
You can monitor the system activity like this:
sar -d -o /tmp/nuxeo-sysstat-`date +%Y%m%d-%H%M%S`.log 5 720 >/dev/null 2>&1 &
This samples the activity every 5 seconds for 1 hour (720 samples).
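The two trailing numbers of the sar command are the interval in seconds and the number of samples, so 5 and 720 cover 5 × 720 = 3600 seconds. To cover another duration, the count can be derived as in this small sketch (the function name sample_count is hypothetical):

```shell
# Hypothetical helper: number of samples needed to cover a duration.
# $1: total duration in seconds   $2: interval between samples in seconds
sample_count() {
  echo $(( $1 / $2 ))
}
```

For example, sample_count 7200 10 gives the count to pass to sar for a 2 hour capture with a 10 second interval.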
Process-level monitoring is also very useful; this can be done with atop running as root:
atop -w /tmp/nuxeo-atop-`date +%Y%m%d-%H%M%S`.log 5 720 >/dev/null 2>&1 &
PostgreSQL
Follow the Nuxeo recommendations and perform the problem reporting procedure. pgBadger and EXPLAIN are your friends.
Elasticsearch
If the problem is related to Elasticsearch access (initialization or bad health status), please list:
- the non-default elasticsearch.* options in nuxeo.conf
- the non-default Elasticsearch configuration options (especially the discovery)
And report the output of the following commands, assuming that Elasticsearch is on localhost and that the HTTP protocol is open on port 9200:
(ES="localhost:9200"; curl "$ES"; curl "$ES/_cat/health?v"; curl "$ES/_cat/nodes?v"; curl "$ES/_cat/indices?v") > /tmp/nuxeo-elastic-`date +%Y%m%d-%H%M%S`.txt
In addition, if the problem is related to unexpected search results or errors, follow this procedure: Reporting Settings and Mapping
Redis
How much memory is used:
redis-cli info memory > /tmp/nuxeo-redis-mem-`date +%Y%m%d-%H%M%S`.txt
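To read a single value from the saved report, a helper like the following can be used. It is only a sketch with a hypothetical function name; used_memory_human is one of the fields of the Redis INFO memory section.

```shell
# Hypothetical helper: print one field of a saved "redis-cli info" report.
# $1: field name, for example used_memory_human   $2: report file
# Redis terminates INFO lines with CRLF, so the carriage return is stripped.
redis_info_field() {
  grep "^$1:" "$2" | cut -d: -f2- | tr -d '\r'
}
```

Calling redis_info_field used_memory_human on the file generated above prints the memory used in a human-readable unit.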
Capture activity; hit Ctrl-C to stop:
redis-cli monitor > /tmp/nuxeo-redis-monitor-`date +%Y%m%d-%H%M%S`.txt
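A monitor capture can be large, so a summary per command may be easier to read first. The sketch below assumes the usual MONITOR line layout (timestamp, then the database and client address in brackets, then the quoted command); the function name redis_command_counts is hypothetical.

```shell
# Hypothetical helper: count the commands seen in a saved "redis-cli monitor" capture.
# $1: capture file
# Assumption: lines look like: 1339518083.107412 [0 127.0.0.1:60866] "get" "key"
redis_command_counts() {
  awk '$1 ~ /^[0-9]+\.[0-9]+$/ {      # skip the initial OK line and anything else
         gsub(/"/, "", $4)            # drop the quotes around the command name
         count[toupper($4)]++
       }
       END { for (c in count) print count[c], c }' "$1" | sort -rn
}
```

The output lists each command with its number of occurrences, most frequent first.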
Kafka
You can get low-level information by using the Kafka scripts directly, for instance:
# List all topics:
/opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper:2181 --list
# Describe a topic
/opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper:2181 --describe --topic nuxeo-audit
# List all consumer groups
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
# Describe a consumer group
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group nuxeo-AuditLogWriter
Nuxeo Stream Consumer Lag
The stream.sh utility is located in the same bin directory as nuxeoctl.
When Using Kafka
# List topics and consumers lag
./bin/stream.sh lag -k > /tmp/nuxeo-stream-lag-`date +%Y%m%d-%H%M%S`.md
# Get the latency in addition to the lag
./bin/stream.sh latency -k --codec avro > /tmp/nuxeo-stream-lag-`date +%Y%m%d-%H%M%S`.md
When Using Chronicle
When not using Kafka, you need to get the consumer activity on each Nuxeo node:
# List streams and consumers position
(NUXEO_DATA=/var/lib/nuxeo/data; ./bin/stream.sh lag --chronicle $NUXEO_DATA/stream/bulk; ./bin/stream.sh lag --chronicle $NUXEO_DATA/stream/audit; ./bin/stream.sh lag --chronicle $NUXEO_DATA/stream/default) > /tmp/nuxeo-stream-lag-`date +%Y%m%d-%H%M%S`.md
If required you can also take a snapshot of the streams:
(NUXEO_DATA=/var/lib/nuxeo/data; tar czsf /tmp/nuxeo-cq-`date +%Y%m%d-%H%M%S`.tgz $NUXEO_DATA/stream/ $NUXEO_DATA/avro/)
Security
If you think you've found a security issue, please report it privately to [email protected].