How to check if Centrify is down?
04-22-2016 12:48 AM
We are using IBM DataStage on AIX servers, which use Centrify for authentication. I support DataStage, and we are creating a script to check that our DataStage login works on all servers through ssh.
Now, I need your help. From one of the sandbox servers, I want to check whether Centrify services are running on the other dev, QA, UAT and prod servers.
04-22-2016 03:50 AM
Welcome to the Centrify forums.
In practice, there is no Centrify being "down" and that's because of how the client is architected.
You could have your utilities routinely ask for the status with "adinfo --mode". The statuses are:
- connected (normal operation, connected to AD),
- disconnected (running, but no DCs available) and
- down (service down).
$ adinfo --mode
connected
You can also use lssrc, SMIT and other AIX utilities.
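If you want to script against those three states, a minimal sketch could look like the following. The exit codes and message wording are my own convention (adinfo does not define them), classify_mode is a hypothetical helper name, and the query is skipped when adinfo is not on the PATH.

```shell
#!/bin/sh
# Map the text printed by `adinfo --mode` to a monitoring-style exit code.
# The three mode strings come from the list above; the exit codes
# 0/1/2/3 are an arbitrary convention chosen for this sketch.
classify_mode() {
    case "$1" in
        connected)
            echo "OK: adclient is connected to AD"
            return 0 ;;
        disconnected)
            echo "WARNING: adclient is running but no DC is available"
            return 1 ;;
        down)
            echo "CRITICAL: adclient service is down"
            return 2 ;;
        *)
            echo "UNKNOWN: unexpected mode '$1'"
            return 3 ;;
    esac
}

# Only query the agent when adinfo is actually installed on this host.
if command -v adinfo >/dev/null 2>&1; then
    classify_mode "$(adinfo --mode 2>/dev/null)"
fi
```

A wrapper like this keeps the parsing in one place, so a cron job or monitoring agent only has to look at the exit code.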
"Down" may mean a lot of things to different people, but I can walk you through scenarios that will provide you with the assurance that DirectControl has the mechanisms in place to ensure smooth operation.
It all starts with setup. Before installing our software, you'll most likely run the adcheck utility. The first check is an OS check: unless you force it, the client will tell you if it's not compatible with the platform.
The second check is a patch check. This is especially important on Solaris, HP-UX and AIX; we will point out if there's a critical patch that affects our functionality.
$ adcheck --test os
OSCHK    : Verify that this is a supported OS                      : Pass
PATCH    : Linux patch check                                       : Pass
PERL     : Verify perl is present and is a good version            : Pass
SAMBA    : Inspecting Samba installation                           : Pass
SPACECHK : Check if there is enough disk space in /var /usr /tmp   : Pass
HOSTNAME : Verify hostname setting                                 : Pass
The Server Suite Cheat Sheet contains many adcheck tips and tricks: http://community.centrify.com/t5/Community-Tech-Bl
During operations "down" may mean:
a) Lack of Active Directory connectivity - this is where AD Sites and Services and the offline cache come in: either the client finds connectivity or, if there is none at all, you can still log in with cached credentials or, in large environments, if you've been prevalidated (this means logging in to a system without AD connectivity even if you've never logged in there before).
In addition, DirectControl implements telemetry calculations. These are proactive checks against a pool of domain controllers to determine which ones are responding and at what speed. The client will proactively latch onto the best-connected eligible DCs. The adinfo command with the --sysinfo (-y) switch offers many tests.
$ adinfo -y netstate
System Diagnostic
===============Network State===================
Site Map
  centrifyimage.vms=>PreferredSite:Demo-Site, SubnetSite:Demo-Site
Domain Map
  centrifyimage.vms
    dc: dc.centrifyimage.vms
    gc: dc.centrifyimage.vms
    forest: centrifyimage.vms
    state: alive
    swept: 16 mins ago
Domain Controllers
  dc.centrifyimage.vms (192.168.81.10)
    pinged: 16 mins ago
    state: up
    ping: 0.000753 secs
    forest: centrifyimage.vms
    nbhost: dc
    site: Demo-Site
    flags: WCTKLG
Blocked Services: None
===============DC Statistics===================
dc.centrifyimage.vms
  Last Success: Fri Apr 22 06:32:58 2016
  Last Failure: Mon Apr 4 16:52:37 2016
  Successes: 37776
  Failures: 6
Account Prevalidation HOWTO: http://community.centrify.com/t5/Community-Tech-Bl
DNS - since domain name system is so integral to AD operations, the client keeps its own DNS cache and performs sweeps against name servers.
$ adinfo -y dns
System Diagnostic
=======DNS Servers State==========
DNS Server Used: 192.168.81.11
DNS Status: Up
=======DNS Server Info=======
Last Sweep: Fri Apr 22 05:46:32 2016
Fast Sweeps: 5
Deep Sweeps: 1751
Okay Sweeps: 1755
Failed Sweeps: 1
Cache Hits: 16718
Cache Misses: 42
DNS Flushes: 4
b) Integrity issues
File integrity > our files are owned by root and we provide a functionality called autoedit. Autoedit ensures that the system files we need, like /etc/nsswitch.conf, /etc/krb5.conf, PAM files, etc. (file names and locations vary on other platforms), are automatically fixed if someone overrides them manually. The autoedit parameters are configurable via the config file or GPOs:
$ dzdo grep autoedit /etc/centrifydc/centrifydc.conf
Demo Password:
# adclient.autoedit: true
# adclient.autoedit.pwgrd: false
# adclient.autoedit.nscd: false
# adclient.autoedit.irs: true
# adclient.autoedit.logincfg: true
# adclient.autoedit.nss: true
# adclient.autoedit.pam: true
# adclient.autoedit.nss.netgroup.reaction: [NOTFOUND=return]
# adclient.autoedit.nss.<map_name>.reaction:
# adclient.krb5.autoedit: true
Process integrity and health > we implement a parallel process called cdcwatch. Its sole purpose is to make sure that the main daemon (DirectControl, a.k.a. adclient) is working. If adclient is "down" or overwhelmed, cdcwatch will adjust threads or even spawn new processes.
$ ps -ef | grep cdcwatch
root 3288 3287 0 Feb08 ? 00:00:00 cdcwatch 3287 -F -M
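One caveat if you script this check: a plain "ps -ef | grep name" can also match the grep process itself. A rough sketch that compares the command name exactly is below. has_process is a hypothetical helper, and the sketch assumes your ps accepts the POSIX "-e -o comm=" options.

```shell
#!/bin/sh
# Read one command name per line on stdin and succeed if any of them,
# with its directory prefix stripped, equals the name given as $1.
has_process() {
    awk -v name="$1" '
        { n = split($1, parts, "/"); if (parts[n] == name) found = 1 }
        END { exit(found ? 0 : 1) }'
}

# Report on the main daemon and its watchdog. `ps -e -o comm=` prints
# only the command name of every process, without a header line.
for daemon in adclient cdcwatch; do
    if ps -e -o comm= 2>/dev/null | has_process "$daemon"; then
        echo "$daemon: running"
    else
        echo "$daemon: NOT running"
    fi
done
```

Because the comparison is an exact match on the command name, a process such as "grep adclient" is never counted as adclient.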
Performance > Centrify implements sophisticated cache systems. The idea here is to balance performance against the impact on Domain Controllers. Different types of cache are implemented: object, authorization and DNS caches.
During normal operation, adclient will log events of interest to the syslog. We have an XML file that provides the definitions of all these audit trails. The idea is that you can use your Splunk, ArcSight, LogLogic, etc. infrastructure to alert or trigger actions based on this.
Audit Trail Event list: https://docs.centrify.com/en/css/suite2016/AuditTr
Because it's software, in the event of a crash, core dumps will be created in well-known locations.
In this playlist I walk through many disaster scenarios involving Centrify software:
I originally talked about HA in this post in my personal blog: http://centrifying.blogspot.com/2014/05/security-c
Finally, @SatishV (PM Extraordinaire) is working on a project that will make apps available for 3 of the most popular SIEM tools; this should be available very soon. Perhaps he will chime in.
04-22-2016 09:01 AM
Mani - A good place to start is the XML document around CSS events. I would love to know if there are specific events that you are looking for and are not in the document.
Currently, the best way to know the Centrify agent health is using the Centrify utilities like Robertson mentioned.
04-24-2016 11:17 PM
Thanks, Robertson and Veerapuneni, for your replies.
Your input was helpful, but sorry if I conveyed my requirement wrongly. Let me put it this way: I have server A, which hosts Centrify. Other servers like B, C, D, ... have adclient. I only have access to servers B, C, D, etc., which are my DataStage servers.
Server B $> ps -ef | grep adclient
root 1638800 2228252 2 Feb 06 - 4412:14 /usr/sbin/adclient
Server C $> ps -ef | grep adclient
Server D $> ps -ef | grep adclient
Now, my script running on Server B needs to check whether adclient is running on the other servers C, D, ... I can run ssh user@serverC ps -ef | grep adclient and get output like the above, so no issues there.
The problem is that if for some reason adclient is not running on server C, then ssh user@serverC will hang, so I need to trap this issue remotely. Note that I do not have root access, and unless end users complain we will not know.
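One way to trap such a hang from Server B is to run ssh under a watchdog that kills it after a deadline. The sketch below is only an illustration under a few assumptions: an OpenSSH client, key-based login (BatchMode=yes makes ssh fail instead of prompting for a password), and a 20-second default chosen arbitrarily. run_with_timeout, check_remote_adclient and CHECK_TIMEOUT are hypothetical names.

```shell
#!/bin/sh
# Run a command, but send it SIGTERM if it is still running after
# <seconds>. Returns the command's own exit status; if the watchdog
# had to kill it, most sh implementations report 143 (128 + SIGTERM).
# Note: when the command finishes early, the watchdog's sleep lingers
# harmlessly until its timer ends.
run_with_timeout() {
    rwt_limit=$1
    shift
    "$@" &
    rwt_cmd=$!
    ( sleep "$rwt_limit"; kill -TERM "$rwt_cmd" 2>/dev/null ) >/dev/null 2>&1 &
    rwt_dog=$!
    rwt_status=0
    wait "$rwt_cmd" || rwt_status=$?
    # The command is finished (or was killed); stop the watchdog.
    kill -TERM "$rwt_dog" 2>/dev/null || true
    wait "$rwt_dog" 2>/dev/null || true
    return "$rwt_status"
}

# Ask one remote host for its adclient mode without risking a hang.
check_remote_adclient() {
    host=$1
    rc=0
    mode=$(run_with_timeout "${CHECK_TIMEOUT:-20}" \
        ssh -n -o BatchMode=yes -o ConnectTimeout=10 "$host" \
        'adinfo --mode' 2>/dev/null) || rc=$?
    case "$rc" in
        0)   echo "$host: ${mode:-no output}" ;;
        255) echo "$host: ssh failed (connection or authentication error)" ;;
        *)   if [ "$rc" -gt 128 ]; then
                 echo "$host: TIMEOUT, login hung (adclient may be down)"
             else
                 echo "$host: remote command exited with status $rc"
             fi ;;
    esac
    return "$rc"
}

# Hosts are taken from the command line, e.g.: ./check.sh user@serverC
for host in "$@"; do
    check_remote_adclient "$host" || true
done
```

ConnectTimeout only bounds the TCP connection, which is why the watchdog is still needed for a login that hangs after the connection is established.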
Thanks in advance.
04-26-2016 12:41 PM
My recommendation is that your monitoring script runs on the server itself. For example, a cronjob that runs every so often checking the status of adclient and its connectivity health with AD.
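As a rough illustration of such a cron job, the sketch below records the last mode reported by adinfo and writes a syslog entry only when it changes. The state-file path, the syslog tag, the script location and the 15-minute schedule are all assumptions for the example; report_mode is a hypothetical helper.

```shell
#!/bin/sh
# Local adclient health check, intended to be run from cron on each
# server. STATE_FILE is a hypothetical location for the last seen mode.
STATE_FILE=${STATE_FILE:-/tmp/adclient_mode.last}

# Remember the current mode and print a message only when it differs
# from the one recorded by the previous run.
report_mode() {
    current=$1
    previous=$(cat "$STATE_FILE" 2>/dev/null)
    echo "$current" > "$STATE_FILE"
    if [ "$current" != "$previous" ]; then
        echo "adclient mode changed: ${previous:-unknown} -> $current"
    fi
}

# Query the agent only where adinfo is installed.
if command -v adinfo >/dev/null 2>&1; then
    mode=$(adinfo --mode 2>/dev/null)
    [ -n "$mode" ] || mode=unknown
    msg=$(report_mode "$mode")
    if [ -n "$msg" ]; then
        # Send the change to syslog so existing log monitoring sees it.
        logger -t adclient-check "$msg"
    fi
fi

# Example crontab entry (hypothetical script path), every 15 minutes:
# 0,15,30,45 * * * * /usr/local/bin/adclient_check.sh
```

Alerting on state changes rather than on every run keeps the syslog quiet during normal operation.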
More strategically, we should look into why the agent is going down in the first place as this should not be happening. If the agent does go down, a core file will be produced in /var/centrifydc and a stack trace should tell us what happened. Further, the watchdog process should restart the service. If this is not happening, we should look into why. The Centrify team is happy to engage to help troubleshoot the problem.
Technical Director - NA East, LATAM