No. | JIRA | Affected Version(s) | Fix Version | Issue | Steps to work around this issue
No. 14 | JIRA: - | Affected Version(s): 2.4.x+ | Fix Version: -

HiveServer2 can send a very large number of metrics, which causes performance issues with the Metrics Collector.

Symptoms

  • AMS goes down intermittently.
  • AMS displays only host metrics (Host summary page on Ambari / System - Servers Grafana dashboard).
  • Aggregated data is not seen (AMS Summary page / System - Home Grafana dashboard / HBase - Home Grafana dashboard).

How do you find out if this is the issue?

  • Check the AMS metadata endpoint: http://<ams-host>:6188/ws/v1/timeline/metrics/metadata.
  • Use a JSON viewer to inspect the JSON response; it shows the number of metrics coming from each component. AMS can handle around 10,000 unique metrics. Check whether hiveserver2, or any other component's metric count, is causing an explosion of metrics (a command-line sketch of the same check follows this list).
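A minimal sketch of the same check from the command line, assuming curl and Python 3 are available and substituting the real collector host for <ams-host> (the response is a JSON map of component name to its list of metrics, as described above):

    curl -s "http://<ams-host>:6188/ws/v1/timeline/metrics/metadata" > /tmp/ams-metadata.json
    # Print each component with its metric count, largest first.
    python -c "import json; d = json.load(open('/tmp/ams-metadata.json')); [print(k, len(v)) for k, v in sorted(d.items(), key=lambda kv: -len(kv[1]))]"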

 

Steps to work around this issue:

  • Try increasing the heap settings for the Metrics Collector and AMS HBase (Configurations - Tuning), since the system might be tuned for a small cluster while in fact receiving a lot of metrics.
  • Set (a command-line sketch follows this list):
    • ams-site : timeline.metrics.service.resultset.fetchSize = 10000
    • ams-hbase-site : hbase.regionserver.handler.count = 30
  • If on Ambari 2.5.x, set ams-site : timeline.metrics.cluster.aggregation.sql.filters = sdisk_%,boottime,default.General%
  • Whitelisting:
    • In Ambari 2.4.x the whitelisting feature is quite limited (only a metric whitelist file can be used).
    • In Ambari 2.5.0, however, many refinements were added to whitelisting.
    • See Ambari Metrics - Whitelisting.
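These properties can also be set from the command line with the configs.sh helper shipped with Ambari Server (a sketch, not part of the original steps; substitute real credentials, the Ambari Server host, and the cluster name):

    cd /var/lib/ambari-server/resources/scripts
    # Usage: configs.sh [-u user] [-p password] set <ambari-host> <cluster> <config-type> <key> <value>
    ./configs.sh -u admin -p admin set <ambari-host> <cluster> ams-site "timeline.metrics.service.resultset.fetchSize" "10000"
    ./configs.sh -u admin -p admin set <ambari-host> <cluster> ams-hbase-site "hbase.regionserver.handler.count" "30"

Restart the Metrics Collector afterwards for the changes to take effect.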

 

No. 13 | JIRA: AMBARI-20056 | Affected Version(s): 2.2.2 | Fix Version: 2.5.0

On large clusters, AMS can become inoperable due to store-file explosion and lack of compaction.

 

Consequence: a large number of store files (~10,000) in AMS HBase, and AMS shutting down regularly.

Steps to work around this issue:
  1. Set ams-site : timeline.metrics.hbase.fifo.compaction.enabled = false
  2. Connect to the HBase shell on the Metrics Collector host:
    /usr/lib/ams-hbase/bin/hbase --config /etc/ams-hbase/conf shell
  3. Execute the following statement:
alter 'METRIC_RECORD', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '1000', 'hbase.hstore.defaultengine.compactionpolicy.class' => 'org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy'}
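To confirm the change took effect, the table descriptor can be inspected from the same shell (a quick verification, not part of the original steps):

    describe 'METRIC_RECORD'
    # The CONFIGURATION section of the output should now show
    # hbase.hstore.blockingStoreFiles => '1000' and the compaction policy class.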


If the above does not solve the issue, the only way to recover the system is to reset the metric system.

No. 12 | JIRA: AMBARI-18093 | Affected Version(s): 2.2.2 | Fix Version: 2.4.0

On large clusters, if the TTL of the high-precision tables is more than 3 days, it leads to too much data and too many regions in AMS HBase. It is better to have a smaller TTL for the high-precision data. The 5-minute aggregate data will still be available for 7 days to work with.

Steps to work around this issue:
  1. Make the following config changes in "ams-site" from the UI (the equivalent values in seconds are noted after these steps):

    timeline.metrics.host.aggregator.ttl : 1 day
    timeline.metrics.cluster.aggregator.second.ttl : 3 days

  2. Restart AMS collector
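Note: these TTL properties take their values in seconds (1 day = 86400 s, 3 days = 259200 s), so the day values above correspond to:

    ams-site : timeline.metrics.host.aggregator.ttl = 86400
    ams-site : timeline.metrics.cluster.aggregator.second.ttl = 259200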
No. 11 | JIRA: AMBARI-17779 | Affected Version(s): 2.2.1, 2.2.2 | Fix Version: 2.4.0

The HBase normalizer, which automatically splits/merges regions based on region size, was leveraged in AMS (2.2.1). However, due to occasionally over-aggressive region splitting by the normalizer, on large clusters this could lead to an explosion of regions, eventually causing AMS to crash every time it starts up.

As of 2.2.2, the AMS HBase normalizer cannot be disabled through AMS configs.

 

Instructions for disabling the normalizer on AMS HBase tables:

1. su ams (kinit if needed)
2. Connect to the HBase shell on the Metrics Collector host:
/usr/lib/ams-hbase/bin/hbase --config /etc/ams-hbase/conf shell
3. Execute the following statements one by one.

 

 alter 'METRIC_RECORD', {NORMALIZATION_ENABLED => 'false'} 
 alter 'METRIC_AGGREGATE', {NORMALIZATION_ENABLED => 'false'}
 alter 'METRIC_RECORD_MINUTE', {NORMALIZATION_ENABLED => 'false'} 
 alter 'METRIC_AGGREGATE_MINUTE', {NORMALIZATION_ENABLED => 'false'}
 alter 'METRIC_RECORD_HOURLY', {NORMALIZATION_ENABLED => 'false'}
 alter 'METRIC_AGGREGATE_HOURLY', {NORMALIZATION_ENABLED => 'false'}
 alter 'METRIC_RECORD_DAILY', {NORMALIZATION_ENABLED => 'false'}
 alter 'METRIC_AGGREGATE_DAILY', {NORMALIZATION_ENABLED => 'false'}
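Equivalently, all eight statements can be piped into the shell in one pass (a sketch; it relies on the hbase shell reading statements from stdin and uses the same table names as above):

    for t in METRIC_RECORD METRIC_AGGREGATE METRIC_RECORD_MINUTE METRIC_AGGREGATE_MINUTE \
             METRIC_RECORD_HOURLY METRIC_AGGREGATE_HOURLY METRIC_RECORD_DAILY METRIC_AGGREGATE_DAILY; do
      echo "alter '$t', {NORMALIZATION_ENABLED => 'false'}"
    done | /usr/lib/ams-hbase/bin/hbase --config /etc/ams-hbase/conf shell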

 

4. Verify the configuration change took effect: open the HBase Master UI in a browser (http://<collector_host>:61310) and search for the string "NORMALIZATION". It should return no matches.

No. 10 | JIRA: AMBARI-15492 | Affected Version(s): 2.2.1 | Fix Version: 2.2.2

The Ambari Metrics Collector shuts down and restarts randomly. The following error is seen in the collector log.

ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: RECEIVED SIGNAL 15: SIGTERM

 
  • Comment out these two properties in /etc/ambari-server/conf/ambari.properties (a sed sketch follows this list):
    • #recovery.enabled_components=METRICS_COLLECTOR
    • #recovery.type=AUTO_START
  • Restart Ambari Server.
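The same edit can be scripted (a sketch assuming GNU sed and the default ambari.properties path):

    # Prefix both recovery properties with '#' to comment them out.
    sed -i -e 's/^recovery.enabled_components=METRICS_COLLECTOR/#&/' \
           -e 's/^recovery.type=AUTO_START/#&/' \
           /etc/ambari-server/conf/ambari.properties
    ambari-server restart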
No. 9 | JIRA: AMBARI-13758 | Affected Version(s): 2.1.2 and lower | Fix Version: 2.2.0

When the Ambari Metrics Collector is moved from one host to another, host metrics are not seen.
  • Restart Ambari Server.
  • Restart the Ambari Metrics Monitor.
No. 8 | JIRA: AMBARI-14257 | Affected Version(s): 2.1.2 | Fix Version: 2.2.0

Storm metrics are not seen after upgrading to Ambari 2.1.2.

On every host with a Storm component (nimbus / supervisor / client), carry out the following steps (a scripted sketch follows the list).

  1. Verify the broken link:
    ls -al /usr/hdp/current/storm-<component>/lib/ambari-metrics-storm-sink.jar
  2. Remove the symlink:
    rm -f /usr/hdp/current/storm-<component>/lib/ambari-metrics-storm-sink.jar
  3. Reattach the symlink to the new JAR:
    ln -s /usr/lib/storm/lib/ambari-metrics-storm-sink*.jar /usr/hdp/current/storm-<component>/lib/ambari-metrics-storm-sink.jar
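If a host runs several Storm components, the three steps can be scripted across all of them (a sketch; it assumes the component directories match /usr/hdp/current/storm-* and that exactly one sink JAR exists under /usr/lib/storm/lib):

    for d in /usr/hdp/current/storm-*/lib; do
      rm -f "$d/ambari-metrics-storm-sink.jar"     # remove the broken symlink
      ln -s /usr/lib/storm/lib/ambari-metrics-storm-sink*.jar "$d/ambari-metrics-storm-sink.jar"
    done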
No. 7 | JIRA: AMBARI-13798 | Affected Version(s): 2.1.2 | Fix Version: 2.2.0

Ambari Metrics service graphs might not show data for certain metrics, and the following error might be seen in the Metrics Collector log (Ambari 2.1.2, 2.1.2.1).

"The time range query for precision table exceeds row count limit, please query aggregate table instead"

  • In the ams-site configuration, set timeline.metrics.service.default.result.limit = 15840 (a command-line sketch follows this list).
  • Restart the Collector.
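As with issue 14 above, the property can also be applied with the configs.sh helper (a sketch; substitute real credentials, the Ambari Server host, and the cluster name):

    /var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin set \
        <ambari-host> <cluster> ams-site "timeline.metrics.service.default.result.limit" "15840"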
No. 6 | JIRA: AMBARI-13711 | Affected Version(s): 2.1.2 | Fix Version: -

Ambari Metrics Server won't start successfully with Kerberos in distributed mode (AMBARI-13711).

The problem is that we cannot have separate principals for the HBase Master and RegionServer: ZooKeeper ACLs will not allow a znode created with one principal to be read by the other unless proper ACLs are set.

Since in 2.1.2 the Master creates the znode with a different principal than the RegionServer, this issue occurs.

Change the AMS configuration to use the Master keytab and principal for the RegionServer (a keytab check is sketched after the restart step below). Set:

  • ams-hbase-site ::: hbase.regionserver.keytab.file = /etc/security/keytabs/ams-hbase.master.keytab
  • ams-hbase-site ::: hbase.regionserver.kerberos.principal = amshbasemaster/_HOST@REALM

Restart the Collector.
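Before reusing the Master keytab for the RegionServer, it helps to confirm which principal the keytab actually holds (a quick check, not part of the original steps; the keytab path is the one set above):

    klist -kt /etc/security/keytabs/ams-hbase.master.keytab
    # The principal listed here is the value to use for
    # hbase.regionserver.kerberos.principal (with _HOST and REALM substituted).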

No. 5 | JIRA: - | Affected Version(s): - | Fix Version: -

Metrics data for the last month is missing many data points that should exist.

- Check /var/log/ambari-metrics-collector/ambari-metrics-collector.log for metrics-data aggregation errors such as OutOfOrderScannerNextException or SpoolTooBigToDiskException (a grep sketch follows this list).

- Set a bigger value for the hbase_regionserver_heapsize property in Advanced ams-hbase-env using the Ambari Web UI.

- Restart the Metrics Collector.
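A quick way to scan the collector log for the aggregation errors named in the first step (log path as given above):

    grep -E 'OutOfOrderScannerNextException|SpoolTooBigToDiskException' \
        /var/log/ambari-metrics-collector/ambari-metrics-collector.log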

No. 4 | JIRA: AMBARI-11501, AMBARI-12347 | Affected Version(s): 2.0.x and 2.1.0 | Fix Version: -

AMS HBase does not start after Kerberization in distributed mode in Ambari 2.0.x and 2.1.0. (Note: also see issue 1.)

Issue 1 (AMBARI-11501)
Steps to work around this issue:

On the Ambari Server host:

  • Edit /var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/params.py and point hbase_staging_dir to the desired location on HDFS, e.g. "/ams-hbase/staging"
  • Restart Ambari server.
  • Restart Metrics Collector.

Issue 2 (AMBARI-12347)
Steps to work around this issue:

ams.zookeeper.principal = zookeeper/_HOST@EXAMPLE.COM (substitute the appropriate REALM)

ams.zookeeper.keytab = /etc/security/keytabs/zk.service.keytab

Note: this assumes you have a ZooKeeper keytab on the host running the Metrics Collector. If not, you should create one with appropriate permissions.

If a keytab already exists, make sure to chmod 440 /etc/security/keytabs/zk.service.keytab.

Example:

# klist -kt /etc/security/keytabs/zk.service.keytab
Keytab name: FILE:/etc/security/keytabs/zk.service.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   1 07/08/15 22:29:07 zookeeper/ambari-sid-3.c.pramod-thangali.internal@EXAMPLE.COM
  • Restart Metrics Collector
No. 3 | JIRA: - | Affected Version(s): 2.0.0, 2.1.0, 2.1.1 | Fix Version: 2.1.2

Altering TTL is not supported by the version of Phoenix used with Ambari 2.0.0, 2.1.0, and 2.1.1. The TTL property can instead be modified with HBase shell commands, as below (172800 seconds = 2 days).
~]$ su - ams
~]$ export JAVA_HOME=/usr/jdk64/jdk1.8.0_40/
~]$ /usr/lib/ams-hbase/bin/hbase --config /etc/ams-hbase/conf shell

hbase(main):007:0> describe 'METRIC_RECORD'
------- HBase output describing table information ---------------
hbase(main):009:0> alter 'METRIC_RECORD', { NAME => '0', TTL => 172800}
hbase(main):007:0> describe 'METRIC_RECORD'
------- HBase output describing table information with new TTL ---------------
No. 2 | JIRA: - | Affected Version(s): 2.0.x | Fix Version: 2.1.0

Ambari Metrics service does not work after enabling security with AMS in distributed mode in Ambari 2.0.x.
  1. Copy core-site.xml and hdfs-site.xml to the Metrics Collector config directories after HA is enabled:

    cp /etc/hadoop/conf/core-site.xml /etc/ambari-metrics-collector/conf/
    cp /etc/hadoop/conf/hdfs-site.xml /etc/ambari-metrics-collector/conf/
    cp /etc/hadoop/conf/core-site.xml /etc/ams-hbase/conf/
    cp /etc/hadoop/conf/hdfs-site.xml /etc/ams-hbase/conf/
  2. Restart the Metrics Collector Component.

No. 1 | JIRA: AMBARI-10707 | Affected Version(s): 2.0.x | Fix Version: 2.1.0

Ambari Metrics service does not work with NameNode HA in distributed mode in Ambari 2.0.x.


1 Comment

  1. In the case of embedded mode in a Kerberized environment, we should not have hdfs-site.xml and core-site.xml in /etc/ams-hbase/conf and /etc/ambari-metrics-server/conf. If they are present, AMS considers the system to be Kerberos-authenticated and looks for a keytab, but none is currently required for AMS in embedded mode. In such a situation, starting the AMS Collector fails as follows:

    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2979) 

    Caused by: java.io.IOException: Running in secure mode, but config doesn't have a keytab 

    at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:236) 

    Removing the hdfs-site.xml and core-site.xml from the AMS config locations resolves this issue (a cleanup sketch follows).
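A minimal sketch of that cleanup (it moves the files aside rather than deleting them; the collector config directory name may differ by version, e.g. /etc/ambari-metrics-collector/conf):

    mkdir -p /tmp/ams-conf-backup
    mv /etc/ams-hbase/conf/hdfs-site.xml /etc/ams-hbase/conf/core-site.xml /tmp/ams-conf-backup/
    mv /etc/ambari-metrics-server/conf/hdfs-site.xml /etc/ambari-metrics-server/conf/core-site.xml /tmp/ams-conf-backup/

Restart the Metrics Collector afterwards.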