Wednesday, February 23, 2011

Cisco WLC RADIUS Fallback

Many of you probably haven't been here before, but as I take up studying again, this time for the CWSP, I figured I could use my blog as a place to regurgitate (and probably be corrected more than once) what I am attempting to learn.  So in my first attempt to actually use this little corner of the internet that I call my own, I will cover some testing I recently did at work with Cisco's 7.0 code and RADIUS fallback.


I recently upgraded all my distribution center locations from 4.2 to the WLC 7.0.98.0 code.  This upgrade was a long time coming, and lucky for me there are many feature enhancements to play with, the biggest being CleanAir support.  I haven't had the chance to roll out CleanAir APs to all my sites yet, but there are a few other enhancements that I will be taking advantage of in this code release, one being RADIUS fallback.  Before I go into my testing, let me explain a little bit about this feature and the need for it in my environment.

When using EAP authentication on your SSID, up to 3 RADIUS servers can be defined for client authentication.  What the WLC had done in the past when your primary RADIUS server stopped responding to authentication requests was roll through the list of servers in a looping fashion.  So if server 1 died, it would move on to server 2 and stay there.  If server 2 went out, the WLC would move on to server 3 until that one stopped responding.  Finally, if server 3 died, the WLC would move back to server 1 and continue to roll through the servers until it found one that would respond.  Overall I don't think this logic is terrible, but it's not the best: there is a reason I have the servers defined in that order, and I would prefer the WLC to roll back to a higher priority server whenever possible.
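To make the old behavior concrete, here is a minimal Python sketch of the "move on and never look back" logic described above.  It is only a model of my description, not Cisco's implementation; the class name, the server names, and the `is_up` callback are all made up for illustration.

```python
class StickyFailover:
    """Model of the pre-7.0 behavior described above (hypothetical names)."""

    def __init__(self, servers):
        self.servers = list(servers)  # index 0 is the highest priority server
        self.current = 0              # index of the server currently in use

    def authenticate(self, is_up):
        """Return the server that handles the request, or None if all are down.

        is_up is a callable taking a server name and returning True/False.
        """
        for _ in range(len(self.servers)):
            server = self.servers[self.current]
            if is_up(server):
                return server
            # Current server is dead: move to the next one, wrapping around.
            self.current = (self.current + 1) % len(self.servers)
        return None


up = {"local": True, "dc1": True, "dc2": True}
wlc = StickyFailover(["local", "dc1", "dc2"])

up["local"] = False                 # primary goes down for patching
first = wlc.authenticate(up.get)    # fails over to dc1
up["local"] = True                  # primary comes back
second = wlc.authenticate(up.get)   # still dc1: nothing ever moves it back
print(first, second)
```

The second request still lands on `dc1` even though `local` is healthy again, which is exactly the WAN traffic problem I wanted to avoid.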

In my situation I have multiple RADIUS servers, with the primary being local and the remaining two being at remote data centers.  I'd really prefer that all my authentication requests are sent to the local server to avoid unnecessarily sending traffic across the WAN.  Again, in the 4.2 version of code, if the primary server stops responding due to something like server patching, the WLC will just move on to the next server in the order and never look back.  I am now stuck sending all my authentication requests across the WAN, which was what I was trying to avoid by having local RADIUS in the first place!

So what has changed in the 7.0 release?  Well, Cisco has created what they call RADIUS fallback, which when enabled (it is disabled by default) tells the WLC to check whether the primary RADIUS server is active again and, if it is, to start using it again for authentication requests.  Not only did Cisco add this feature, but they created two methods of checking! (Yahtzee!)

The first method is called passive.  After the controller has reached the predefined time limit (default is 300 seconds), it will send a real client authentication request to the primary RADIUS server to see if it responds.  If it does, all is good and the primary becomes the active server once again.  If not, the client auth will time out and the client will have to resend its request.
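Here is a rough Python sketch of the passive method as I understand it from the description above.  Treat it as a model under assumptions, not Cisco's code: the class, the `now` parameter (a timestamp in seconds that the caller passes in), and the `is_up` callback are hypothetical, and I am assuming the timer restarts when the primary fails the check.

```python
class PassiveFallback:
    """Model of passive RADIUS fallback (hypothetical names and timing rules)."""

    def __init__(self, servers, interval=300):
        self.servers = list(servers)  # index 0 is the primary
        self.interval = interval      # seconds to ignore the primary after failover
        self.current = 0
        self.retry_at = None          # earliest time the primary may be retried

    def authenticate(self, now, is_up):
        """Return the server that answered, or None if the request timed out."""
        # Passive check: once the timer has expired, the next real client
        # request is sent to the primary to see whether it responds.
        if self.current != 0 and now >= self.retry_at:
            if is_up(self.servers[0]):
                self.current = 0
                self.retry_at = None
                return self.servers[0]
            self.retry_at = now + self.interval  # assumed: restart the timer
            return None                          # client has to resend
        for _ in range(len(self.servers)):
            server = self.servers[self.current]
            if is_up(server):
                return server
            self.current = (self.current + 1) % len(self.servers)
            if self.current == 0:
                self.retry_at = None
            elif self.retry_at is None:
                self.retry_at = now + self.interval
        return None


up = {"local": False, "dc1": True}
wlc = PassiveFallback(["local", "dc1"], interval=300)
at_0 = wlc.authenticate(0, up.get)      # primary down, fails over to dc1
up["local"] = True
at_100 = wlc.authenticate(100, up.get)  # timer not expired, stays on dc1
at_300 = wlc.authenticate(300, up.get)  # timer expired, real request tries primary
print(at_0, at_100, at_300)
```

The downside shows up in the `return None` branch: if the primary is still down when the timer expires, a real client pays for the check with a timed out authentication.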

The second method is called active.  In this scenario you must create a fake username; I used WLAN-fallback so the RADIUS admin can easily filter it out as a failed auth and know where it's coming from.  Again there is a predefined timer, like the passive method, and after it expires the WLC will send the fake username to see if the RADIUS server will respond.  It doesn't matter if the server sends a reject.  And it should send a reject, because the account being used is fake and just a username with no password.  All the WLC wants to know is whether its primary RADIUS server is up by getting a response back.
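For the curious, this is roughly what such a probe looks like on the wire.  The sketch below builds a plain RADIUS Access-Request following the packet layout in RFC 2865 and treats any well-formed reply as "server is up".  I have not captured the WLC's actual probe, so the attribute choice (User-Name, an empty User-Password, and a NAS-Identifier of `wlc`) is my assumption, and the function names are made up.

```python
import hashlib
import os
import struct

ACCESS_REQUEST, ACCESS_ACCEPT, ACCESS_REJECT, ACCESS_CHALLENGE = 1, 2, 3, 11
USER_NAME, USER_PASSWORD, NAS_IDENTIFIER = 1, 2, 32  # RFC 2865 attribute types


def _attr(attr_type, value):
    """Encode one attribute as type, length (including the 2 byte header), value."""
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def _hide_password(password, secret, authenticator):
    """RFC 2865 User-Password hiding: XOR 16 byte blocks with chained MD5 digests."""
    padded = password + b"\x00" * (-len(password) % 16) or b"\x00" * 16
    hidden, previous = b"", authenticator
    for i in range(0, len(padded), 16):
        digest = hashlib.md5(secret + previous).digest()
        block = bytes(p ^ d for p, d in zip(padded[i:i + 16], digest))
        hidden += block
        previous = block
    return hidden


def build_probe(username, secret, identifier=0, authenticator=None, nas_id=b"wlc"):
    """Build an Access-Request for a fake user (attribute set is an assumption)."""
    authenticator = authenticator or os.urandom(16)
    attrs = (
        _attr(USER_NAME, username.encode("ascii"))
        + _attr(USER_PASSWORD, _hide_password(b"", secret, authenticator))
        + _attr(NAS_IDENTIFIER, nas_id)
    )
    header = struct.pack("!BBH", ACCESS_REQUEST, identifier, 20 + len(attrs))
    return header + authenticator + attrs


def server_is_alive(response, identifier):
    """Any matching Accept, Reject, or Challenge proves the server is responding.

    This sketch does not verify the Response Authenticator.
    """
    return (
        len(response) >= 20
        and response[1] == identifier
        and response[0] in (ACCESS_ACCEPT, ACCESS_REJECT, ACCESS_CHALLENGE)
    )


packet = build_probe("WLAN-fallback", b"sharedsecret", identifier=7,
                     authenticator=bytes(16))
fake_reject = bytes([ACCESS_REJECT, 7, 0, 20]) + bytes(16)
print(len(packet), server_is_alive(fake_reject, 7))
```

To try it against a lab server you would send `packet` over UDP to the RADIUS authentication port (1812 by default) and pass whatever comes back to `server_is_alive`.  The point is the last function: an Access-Reject counts as success for the probe.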

In my testing I used the active method and set my timer to 300 seconds.  Once the WLC failed over to the secondary RADIUS server, I waited a few minutes for the primary to receive the dummy user account auth request from the WLC.

Below is the output log from the RADIUS server in use.  Disregard the failed auths, as the primary purpose of my testing was not to ensure successful auths but rather to ensure my WLC would actually reactivate my primary RADIUS server.




02/17/2011 15:09:46 User dhc\test1 ultimately failed challenge sequence 
            Last request from the WLC to primary 
02/17/2011 15:17:51 Unable to find user WLAN-Fallback with matching password 
           WLC actively checking the status
02/17/2011 15:21:56 User dhc\test1 ultimately failed challenge sequence
           WLC activating the primary RADIUS server and sending user auth requests again
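
If you want to see how long each step took, the timestamps in the log above can be subtracted with a few lines of Python.  The timestamps are copied from the log; the event labels are just my shorthand.

```python
from datetime import datetime

FMT = "%m/%d/%Y %H:%M:%S"

# Timestamps copied from the RADIUS server log above.
EVENTS = [
    ("02/17/2011 15:09:46", "last request from the WLC to primary"),
    ("02/17/2011 15:17:51", "WLC actively checking the status"),
    ("02/17/2011 15:21:56", "primary active again, user auth requests resume"),
]


def gaps_in_seconds(events):
    """Return the number of seconds between each pair of consecutive events."""
    times = [datetime.strptime(stamp, FMT) for stamp, _ in events]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


gaps = gaps_in_seconds(EVENTS)
print(gaps)  # [485, 245]
```

So roughly 8 minutes passed between the last request the primary saw and the probe, and about 4 minutes between the probe and the first user request back on the primary.  Keep in mind these gaps also depend on when my test client happened to authenticate, not only on the WLC timer.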

Ultimately the testing met my requirements, and I will be rolling this out in the near future to ensure my WLCs keep using the local RADIUS server at my sites whenever possible.

Below are the CLI commands and GUI screenshots for setting up RADIUS fallback.


CLI active setup
(Cisco Controller) >config radius fallback-test mode active 
(Cisco Controller) >config radius fallback-test username WLAN-fallback
(Cisco Controller) >config radius fallback-test interval 300

CLI passive setup
(Cisco Controller) >config radius fallback-test mode passive
(Cisco Controller) >config radius fallback-test interval 300
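
If you have a lot of controllers to touch, a small helper can generate these same commands for pasting or scripting.  This is just a sketch: the function name is made up, and the `off` mode and the 180 to 3600 second interval range are what I recall from Cisco's configuration guide, so verify them against the documentation for your release.

```python
def fallback_commands(mode, interval=300, username=None):
    """Return the WLC CLI commands for the requested RADIUS fallback mode."""
    if mode not in ("off", "passive", "active"):
        raise ValueError("mode must be off, passive, or active")
    if mode == "off":
        return ["config radius fallback-test mode off"]
    # Assumed valid range; check the config guide for your code release.
    if not 180 <= interval <= 3600:
        raise ValueError("interval must be between 180 and 3600 seconds")
    commands = [f"config radius fallback-test mode {mode}"]
    if mode == "active":
        if not username:
            raise ValueError("active mode needs a probe username")
        commands.append(f"config radius fallback-test username {username}")
    commands.append(f"config radius fallback-test interval {interval}")
    return commands


active = fallback_commands("active", interval=300, username="WLAN-fallback")
for line in active:
    print(line)
```

Called with the values from my testing, it prints the same three lines as the active setup above.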

GUI active setup

GUI passive setup

You can find the Cisco documentation here: RADIUS Fallback Documentation




Travis