Friday, April 8, 2011

WGB acting up? Clean it up!

It's Friday, and it's supposed to be the day to ease into the weekend. Well, I'm doing that by fixing problems with WGBs. I was pinged bright and early to take a look at a Cisco WGB (workgroup bridge) that was having issues maintaining a stable connection to its parent AP.
Upon further inspection I noticed the WGB was roaming between several APs, causing problems for the device connected to its Ethernet port. The funny part is that this application controls the main gate to the facility: if the application's session keeps breaking, people aren't able to enter the facility to work.

My first reaction was to use a command that I have used in the past to stabilize WGB connections and keep them from roaming. On a Cisco Autonomous AP there is a command, applied to the radio interface, that specifies the number of attempts the AP makes to send a packet before giving up. Optionally, a keyword can be added that tells the AP to drop the packet rather than seek a new association when the maximum number is reached.
The command is as follows:

#conf t
(config)#interface dot11 0
(config-if)#packet retries 128 drop-packet

You can verify your command by doing a show run on the dot11 radio interface you applied the command to. In my case it was dot11 radio 0.

#sh running-config | begin interface Dot11
interface Dot11Radio0
 no ip address
 no ip route-cache
 !
 encryption mode ciphers aes-ccm
 !
 ssid lab
 !
 speed  basic-6.0 9.0 12.0 18.0 24.0 36.0 48.0 54.0
 packet retries 128 drop-packet
 station-role workgroup-bridge
 bridge-group 1
 bridge-group 1 subscriber-loop-control
 bridge-group 1 block-unknown-source
 no bridge-group 1 source-learning
 no bridge-group 1 unicast-flooding
 bridge-group 1 spanning-disabled

You have now successfully told the WGB to retry sending the data packet (frame) up to 128 times and, when that value is reached, send it to the bit bucket rather than attempting to find a different AP. Keep in mind this really doesn't address the root cause of the issue, the number of packet retries, but it helps mitigate the effects. To really fix that I would likely need to travel onsite and do some additional surveys and antenna adjustments.
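A rough way to picture what the command changes is the Python sketch below. It is only a toy model of the behavior described above, not Cisco's implementation; the function name `send_frame` and its parameters are made up for illustration.

```python
def send_frame(attempt_ok, max_attempts=128, drop_packet=True):
    """Toy model of 'packet retries <n> [drop-packet]' on a WGB radio.

    attempt_ok   -- hypothetical callable; returns True when an attempt is acknowledged
    max_attempts -- the value given to 'packet retries' (128 in this post)
    drop_packet  -- True when the 'drop-packet' keyword is configured
    """
    for _ in range(max_attempts):
        if attempt_ok():
            return "delivered"
    # Every attempt failed: either discard the frame or go looking for a new AP.
    return "dropped" if drop_packet else "roam"


# A link where no attempt ever gets through:
print(send_frame(lambda: False))                     # dropped
print(send_frame(lambda: False, drop_packet=False))  # roam
```

With the keyword in place, a bad stretch of retries costs one frame instead of triggering a roam, which is why it masks the symptom without fixing the RF problem.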

There is another option that was pointed out to me that allows the WGB to be configured with parent APs.  This is not an option I have used or tested, so I plan to mock this up in my lab and write a short post about how to configure it.




Wednesday, February 23, 2011

Cisco WLC Radius Fallback

Many of you probably haven't been here before, but as I take up studying again, this time for the CWSP, I figured I could use my blog as a place to regurgitate (and probably be corrected more than once) what I am attempting to learn. So in my first attempt to actually utilize this little corner of the internet that I call my own, I will cover some testing I recently did at work with Cisco's 7.0 code and RADIUS fallback.


I recently upgraded all my distribution center locations from 4.2 to the WLC 7.0.98.0 code. This upgrade was a long time coming, and lucky for me there are many feature enhancements to play with, the biggest being CleanAir support. I haven't had the chance to roll out CleanAir APs to all my sites yet, but there are a few other enhancements that I will be taking advantage of in this code release, one being RADIUS fallback. Before I go into my testing, let me explain a little bit about this feature and the need for it in my environment.

When using EAP authentication on your SSID, up to 3 RADIUS servers can be defined for client authentication. In the past, when your primary RADIUS server stopped responding to authentication requests, the WLC would roll through the list of servers in a looping fashion. So if server 1 died, it would move on to server 2 and stay there. If server 2 went out, the WLC would move on to server 3 until that one stopped responding. Finally, if server 3 died, the WLC would move back to server 1 and continue to roll through the servers until it found one that would respond. Overall I don't think this logic is terrible, but it's not the best, considering there is a reason I have the servers defined in that order and would prefer the WLC to roll back to a higher priority server whenever possible.
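The looping behavior described above can be sketched in a few lines of Python. This is just a model of the logic as described in this post, not controller code; `pick_server_legacy` is a made-up name and the list of booleans stands in for whether each server is answering.

```python
def pick_server_legacy(current, responding):
    """Model of the pre-7.0 failover: stay on the current server while it
    answers, otherwise walk forward through the list, wrapping around.

    current    -- index of the server in use (0 is the primary)
    responding -- list of booleans, one per configured server
    Returns the index of the server to use, or None if none respond.
    """
    count = len(responding)
    for step in range(count):
        candidate = (current + step) % count
        if responding[candidate]:
            return candidate
    return None


# The primary (index 0) dies, so the WLC moves to server 1 ...
current = pick_server_legacy(0, [False, True, True])
print(current)  # 1
# ... and stays there even after the primary comes back.
print(pick_server_legacy(current, [True, True, True]))  # 1
```

The second call is the whole problem: nothing in this logic ever prefers a lower index once the WLC has moved on.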

In my situation I have multiple RADIUS servers, with the primary being local and the remaining two being at remote data centers. I'd really prefer that all my authentication requests are sent to the local server to avoid unnecessarily sending traffic across the WAN. Again, in the 4.2 version of code, if the primary server stops responding due to something like server patching, the WLC will just move on to the next server in the order and never look back. I am now stuck sending all my authentication requests across the WAN, which was what I was trying to avoid by having local RADIUS in the first place!

So what has changed in the 7.0 release? Well, Cisco has created what they call RADIUS fallback, which when enabled (it is disabled by default) tells the WLC to check whether the primary RADIUS server is active and, if it is, to start using it again for authentication requests. Not only did Cisco add this feature, but they created two methods of checking! (Yahtzee!)

The first method is called passive, meaning that after the controller has reached the predefined time limit (default is 300 seconds) it will send a real client authentication request to the primary RADIUS server to see if it responds. If it does, all is good and the primary becomes the active server once again. If not, the client auth will time out and the client will have to resend its request.

The second method is called active. In this scenario you must create a fake username; I used WLAN-fallback so the RADIUS admin can easily filter it out as a failed auth and know where it's coming from. Again there is a predefined timer, like in the passive method, and after it expires the WLC will send the fake username to see if the RADIUS server will respond. It doesn't matter if the server sends a reject. And it should send a reject, because the account being used is fake and just a username with no password. All the WLC wants to know is whether its primary RADIUS server is up, by getting a response back.
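The active check can be pictured with the small Python sketch below. It is a simplified model of the behavior described above, not the controller's code: `FallbackState`, `maybe_fall_back`, and the `send_probe` callable are hypothetical names, and a reply of `None` stands in for a timeout.

```python
from dataclasses import dataclass


@dataclass
class FallbackState:
    active_server: int = 0   # index into the server list; 0 is the primary
    last_probe: float = 0.0  # time of the last probe, in seconds


def maybe_fall_back(state, now, interval, send_probe, username="WLAN-fallback"):
    """Probe the primary once per interval while running on a backup server.

    send_probe -- hypothetical callable that sends an auth request for the
                  dummy username to the primary and returns the reply,
                  or None if the request timed out.
    """
    if state.active_server == 0:
        return state                 # already on the primary
    if now - state.last_probe < interval:
        return state                 # timer has not expired yet
    state.last_probe = now
    reply = send_probe(username)
    if reply is not None:            # accept or reject, any answer means "up"
        state.active_server = 0
    return state


state = FallbackState(active_server=1)
maybe_fall_back(state, 300.0, 300, lambda user: None)      # primary still down
maybe_fall_back(state, 600.0, 300, lambda user: "reject")  # primary answers
print(state.active_server)  # 0
```

The key point is the `is not None` test: a reject for the fake account is just as good as an accept, because it proves the server is answering.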

In my testing I used the active method and set my timer to 300 seconds. Once the WLC failed over to the secondary RADIUS server, I waited a few minutes for the primary to receive the dummy user account auth request from the WLC.

Below is the output log from the RADIUS server in use. Disregard the failed auths, as the primary purpose of my testing was not to ensure successful auths but rather to ensure my WLC would actually reactivate my primary RADIUS server.




02/17/2011 15:09:46 User dhc\test1 ultimately failed challenge sequence 
            Last request from the WLC to primary 
02/17/2011 15:17:51 Unable to find user WLAN-Fallback with matching password 
           WLC actively checking the status
02/17/2011 15:21:56 User dhc\test1 ultimately failed challenge sequence
           WLC activating the primary RADIUS server and sending user auth requests again
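
To read the timeline in the log above at a glance, a few lines of Python can turn the three timestamps into elapsed seconds. This is only a convenience sketch; `gaps_in_seconds` is a made-up helper name, and the timestamps are copied from the log entries as shown.

```python
from datetime import datetime

# Timestamps copied from the three log entries above.
LOG_TIMES = [
    "02/17/2011 15:09:46",  # last request from the WLC to the primary
    "02/17/2011 15:17:51",  # WLC probing with the dummy username
    "02/17/2011 15:21:56",  # real client auth hitting the primary again
]


def gaps_in_seconds(stamps, fmt="%m/%d/%Y %H:%M:%S"):
    """Return the elapsed seconds between consecutive timestamps."""
    times = [datetime.strptime(s, fmt) for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


print(gaps_in_seconds(LOG_TIMES))  # [485, 245]
```

So roughly 8 minutes passed between the last request to the primary and the probe, and about 4 more minutes before a real client auth request showed up on the primary again.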

Ultimately the testing met my requirements, and I will be rolling this out in the near future to ensure my WLCs keep using the local RADIUS server at my sites whenever possible.

Below are CLI outputs and GUI screenshots of the setup of RADIUS fallback.


CLI active setup
(Cisco Controller) >config radius fallback-test mode active 
(Cisco Controller) >config radius fallback-test username WLAN-fallback
(Cisco Controller) >config radius fallback-test interval 300

CLI passive setup
(Cisco Controller) >config radius fallback-test mode passive
(Cisco Controller) >config radius fallback-test interval 300

GUI active setup (screenshot not preserved)

GUI passive setup (screenshot not preserved)

You can find the Cisco documentation here: RADIUS Fallback Documentation




Travis