C-DOT Event Logs

Back in 7-Mode, each node had a /etc/log/messages{.[0-6]} file containing all event log entries. Ideally, all events should be forwarded to a centralised event management system (e.g. Splunk); in the absence of such a facility, you would normally refer to these log files.

In C-DOT, the equivalent is the event log show command.

You can filter the list by node, time, severity, event name, etc. For example:

CLUSTER02::> event log show -node CLUSTER02-01 -time <13h -severity ERROR,WARNING,CRITICAL,NOTICE -event *Panic*,*takeover*,*down*
Time                Node             Severity      Event
------------------- ---------------- ------------- ---------------------------
4/14/2015 21:00:25  CLUSTER02-01 NOTICE        cf.transition.summary: eventType="SFO Giveback (bsg_01_t1_aggr01)", eventSummary="Protocol Transition Time(msec):NFS=2594[1436|1158], CIFS=2594[1436|1158], FCP=2595[1437|1158], ISCSI=2595[1437|1158]", shutdownSummary="Pre_critical=0, Phase1=8, Phase2=0, Phase3=153, Phase4=0, Phase5=12, Phase6=7, Phase7=10, Phase8=4, Phase9=25, aggr_offline=89, aggr_migrate=838", onlineSummary="src_done_total=0, pre_wafl=568, Phase1=698 {aggr_premount=23, aggr_mount=391 {new_label=41, old_label=0}, vvol_mount=284}, Phase2=785 {aggr_premount=43, vvol_premount=0, aggr_flush_raid=0, set_vols_dirty=0, upgrade_cp_wait=0, upgrade_cp=0, aggr_mount=0, vvol_mount=0, nvfail_cp_wait=0, nvfail_cp=731, lmgr_mount_start=11}, Phase3=107 {aggr_mount=78, vvol_mount=28}, lun_online(6luns)=1", lagSummary="start_lag=9, pkt_sent=1429009165223, pkt_recv=1429009165214"
4/14/2015 20:59:29  CLUSTER02-01 NOTICE        mgr.stack.framename: Stack frame 2: kernel::kproc_shutdown(0xffffffff80341c40) + 0x18b
4/14/2015 20:59:29  CLUSTER02-01 NOTICE        mgr.stack.proc: Panic in process: nodewatchdog
4/14/2015 20:59:29  CLUSTER02-01 NOTICE        mgr.stack.at: Panic occurred at: Tue Apr 14 20:34:19  2015
4/14/2015 20:59:29  CLUSTER02-01 NOTICE        mgr.stack.string: Panic string: Resource exec_context had 100% stranded capacity for 725 seconds in process nodewatchdog on release 8.3RC1 (C)
4/14/2015 20:58:34  CLUSTER02-01 ERROR         vifmgr.lifdown.noports: LIF cus_vsv01_iscsi01 (on virtual server 14), IP address 192.153.23.19, currently cannot be hosted on node CLUSTER02-01, port a0a-19, or anywhere else, and is being marked as down.
4/14/2015 20:58:34  CLUSTER02-01 ERROR         vifmgr.lifdown.noports: LIF bba_vsv01_iscsi01 (on virtual server 21), IP address 192.153.33.9, currently cannot be hosted on node CLUSTER02-01, port a0a-18, or anywhere else, and is being marked as down.
4/14/2015 20:58:34  CLUSTER02-01 ERROR         vifmgr.lifdown.noports: LIF trn_vsv01_iscsi01 (on virtual server 25), IP address 192.154.24.68, currently cannot be hosted on node CLUSTER02-01, port a0a-12, or anywhere else, and is being marked as down.
4/14/2015 20:56:37  CLUSTER02-01 WARNING       pvif.alllinksdowntrap: vifName="a0a"
4/14/2015 20:56:37  CLUSTER02-01 CRITICAL      pvif.allLinksDown: a0a: all links down
4/14/2015 20:54:37  CLUSTER02-01 NOTICE        cf.fsm.takeoverOfPartnerEnabled: Failover monitor: takeover of CLUSTER02-02 enabled
4/14/2015 20:54:36  CLUSTER02-01 NOTICE        cf.fsm.takeoverByPartnerEnabled: Failover monitor: takeover of CLUSTER02-01 by CLUSTER02-02 enabled
4/14/2015 20:54:35  CLUSTER02-01 NOTICE        kern.syslog.msg: The system was down for 7 seconds
13 entries were displayed.
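
If you paste output like this into a file, or capture it over SSH, a few lines of Python turn it into records you can sort and count. This is only a minimal sketch: it assumes each entry sits on a single line in the Time / Node / Severity / Event layout shown above, and the names EventLogEntry, parse_event_log and SAMPLE are made up for this illustration (SAMPLE is a trimmed excerpt of the output above). It is not a NetApp tool or API.

```python
import re
from collections import Counter
from datetime import datetime
from typing import List, NamedTuple


class EventLogEntry(NamedTuple):
    """One row of 'event log show' output (hypothetical helper type)."""
    time: datetime
    node: str
    severity: str
    event: str
    message: str


# Assumed row layout: "M/D/YYYY HH:MM:SS  <node> <SEVERITY>  <event.name>: <message>"
LINE_RE = re.compile(
    r"^(?P<time>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2})\s+"
    r"(?P<node>\S+)\s+(?P<severity>[A-Z]+)\s+"
    r"(?P<event>[^\s:]+):\s*(?P<message>.*)$"
)


def parse_event_log(text: str) -> List[EventLogEntry]:
    """Parse pasted output; header, separator and footer lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not an event row (prompt, header, dashes, footer)
        entries.append(EventLogEntry(
            time=datetime.strptime(match["time"], "%m/%d/%Y %H:%M:%S"),
            node=match["node"],
            severity=match["severity"],
            event=match["event"],
            message=match["message"],
        ))
    return entries


# Trimmed excerpt of the output shown above.
SAMPLE = """\
Time                Node             Severity      Event
------------------- ---------------- ------------- ---------------------------
4/14/2015 20:59:29  CLUSTER02-01 NOTICE        mgr.stack.proc: Panic in process: nodewatchdog
4/14/2015 20:56:37  CLUSTER02-01 WARNING       pvif.alllinksdowntrap: vifName="a0a"
4/14/2015 20:56:37  CLUSTER02-01 CRITICAL      pvif.allLinksDown: a0a: all links down
4/14/2015 20:54:35  CLUSTER02-01 NOTICE        kern.syslog.msg: The system was down for 7 seconds
"""

if __name__ == "__main__":
    entries = parse_event_log(SAMPLE)
    # Count entries per severity, then list the non-NOTICE ones oldest first.
    print(Counter(entry.severity for entry in entries))
    for entry in sorted(entries, key=lambda e: e.time):
        if entry.severity != "NOTICE":
            print(entry.time, entry.severity, entry.event, entry.message)
```

If your terminal wraps long messages (such as the cf.transition.summary entry) across several lines, the continuation lines will not match the pattern and will be dropped, so the regular expression would need adjusting for that case.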

Historical Logs

If anyone can suggest an easy way to access old logs (e.g. the equivalent of messages.5 etc.), feel free to comment below. So far the only advice seems to be to connect to individual nodes via CIFS (the "old way"). I'm hoping Data ONTAP has addressed this in C-DOT (at least from 8.3 onwards).