Friday, October 30, 2009

OCS XMPP TLS Handshake Error FIX

Not your fault. Probably not even your certificate's fault. Google has a bug. Essentially, if anyone has recently created a Google Apps account under a corporate email address in your domain, you cannot create an XMPP channel with Google until Google flips a bit on its side. Mine was flipped after a lot of people got involved to 1) diagnose the issue and 2) figure out why Google wouldn't let us communicate with the XMPP gateway.

The symptom is a TLS error on your Edge server for its connection to your OCS XMPP server. This is just a symptom of the above problem. Newer Google Apps accounts do not have this issue, but I found that we had four employees who had signed up for Google Apps with their work email in 2008. Google is going through all these domains and fixing the bug. Mine got fixed faster because I had someone on the Microsoft side pushing for me.

To find out if you have users who have registered using your domain, guess what you have to do? Sign up for Google Apps using your domain account. You can then see "users". I only had four users who had signed up, but one was enough. I've since deleted their accounts after taking Admin control over our domain with Google Apps. Now if they want to use Google Apps, they go through me.

In addition, as soon as Google flipped the bit, we were on. Full-on presence and chat with Gmail users with our OCS R2 clients. Hope you find this and hope it helps.

Tuesday, October 27, 2009

The Nightmare before OCS R1 Cutover

High Level OCS R1 to OCS R2 migration path

Build R2 Server Infrastructure
I was able to do this in tandem with my R1 infrastructure with additional hardware and VMs - yes, I realize I am lucky that way.

I built every role for the R2 pool, moved myself to the pool, upgraded my client and played for weeks in the environment before I felt comfortable moving anyone else. I moved my team, my boss, and then a few poor souls who wouldn't mind getting kicked off during troubleshooting.

I had the most issues with CWA R2 running on 2K8 and getting ISA set up correctly for the web access; there are some posts in this blog on that. I even got an iPhone application working with CWA R2, although I can't recommend it (if you want to know more, contact me through this blog).

Move Users to R2 Pool
When I was ready to bite the bullet, and after having both environments up for a while to validate client compatibility (R1 client against the R2 pool, etc.), I moved every user to the R2 Pool a few weeks ago without any issues. I did not change the SRV records as I continued to use the R1 Edge servers. Essentially, the R1 front end server was acting like a director without officially being one.

Migrate to R2 Edge Services (move names, IPs, DNS, certs so everything keeps the same name/resolution).
I moved from two dedicated R1 Edge servers, one running the Access Proxy and Web Conferencing Role, and one running the A/V R1 Edge Role to an R2 Consolidated Edge.

Since we use Federation and PIC, I essentially had to cut over from R1 Edge to R2 Edge. This was a huge pain in the rear. Microsoft, thanks for this horrible migration “path”.

For the A/V edge, since I didn't use NAT in R1 but am doing so for R2, I used a new name, IP, DNS, certificate, etc. for that role. This is an OK strategy and would probably work fine for the web conf role also. It’s the provisioning that is in place for the Access Proxy that really requires you to move the names/certs over. Now that AOL and MSN no longer require a license, perhaps when Yahoo gets on board this will no longer be an issue for Wave14 migrations.

At the time of the edge cutover I changed out all the SRV records for each of my SIP domains to point to the new R2 Front End server.
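As a rough illustration of what "all the SRV records for each SIP domain" means in practice, here is a minimal Python sketch that lists the records to review per domain. It assumes the conventional OCS record labels and ports (`_sipinternaltls._tcp` on 5061, `_sip._tls` on 443, `_sipfederationtls._tcp` on 5061); the domains and target host names are made-up placeholders, and your own DNS may well differ.

```python
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    name: str    # full SRV record name, e.g. _sip._tls.example.com
    port: int    # port advertised by the record
    target: str  # host the record should point at after cutover


# Conventional OCS SRV labels and ports. This is an assumption for the
# sketch; verify against what is actually published in your DNS zones.
SRV_TEMPLATES = [
    ("_sipinternaltls._tcp", 5061, "internal"),    # internal automatic sign-in
    ("_sip._tls", 443, "external"),                # external automatic sign-in
    ("_sipfederationtls._tcp", 5061, "external"),  # federation
]


def srv_records_for(domain: str, internal_target: str,
                    external_target: str) -> List[SrvRecord]:
    """Return the SRV records to review for one SIP domain."""
    targets = {"internal": internal_target, "external": external_target}
    return [
        SrvRecord(f"{label}.{domain}", port, targets[side])
        for label, port, side in SRV_TEMPLATES
    ]


if __name__ == "__main__":
    # Hypothetical SIP domains and targets, for illustration only.
    for sip_domain in ["example.com", "example.org"]:
        for rec in srv_records_for(sip_domain, "r2pool.example.com",
                                   "sip.example.com"):
            print(f"{rec.name} -> {rec.target}:{rec.port}")
```

Printing the list per domain gives a simple checklist to tick off against DNS during the cutover window.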

Note: I had already cut over my ISA server rules to reflect R2 reverse proxy and CWA R2 rules pointing to the new R2 web access role. My R1 ISA rules were removed when I deactivated the R1 web access server.

Deactivate and decommission R1 servers/pool
The migration guide wants you to do this during migration - how about no, not a great idea. Once you deactivate, all your R1 bits are gone from AD. There is no way I am going without a fallback plan. Microsoft will just have to live with this.

The good news is that after the successful Edge cutover, once all roles were fully on R2 and everything was validated, I shut down my remaining R1 servers overnight to see if R2 complained. There were no issues. Deactivation took place on a weekend; every R1 role was removed cleanly and R2 never once blinked an eye. Was I lucky? Probably.

In Process: Upgrade Client end points to new client/addons (this is in process and I will be HAPPY to share whatever I get out of this as I have had zero luck finding someone out there in the same situation).

This is probably the biggest pain point as it touches every client in the realm, including remote clients around the globe on slow links. I will be utilizing our existing SCCM infrastructure and will need to uninstall all the existing software. Doing this cleanly involves closing Outlook, closing Communicator, and uninstalling the modules (in this case the Outlook Conf Add-In, Office Communicator 2007, OC 2007 MUI, and the Live Meeting Console), then reinstalling the 3 upgraded modules and relevant hotfixes. Plus I have a registry key I'll be throwing in for the Live Meeting portal (this gives me single sign-on for the Live Meeting Service).
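To make the ordering concrete, here is a hedged Python sketch that only builds the command lines for such a package; it does not run anything. The `taskkill` and `msiexec` switches are the standard ones, but the product codes and MSI file names are hypothetical placeholders that you would need to look up for your own builds, and the Live Meeting registry key is left out because it is specific to each environment.

```python
from typing import Dict, List

# Modules to remove, in the order listed in the post.
OLD_MODULES = [
    "Outlook Conf Add-In",
    "Office Communicator 2007",
    "OC 2007 MUI",
    "Live Meeting Console",
]


def build_plan(product_codes: Dict[str, str],
               new_packages: List[str]) -> List[str]:
    """Return the ordered command lines for the upgrade.

    product_codes maps each old module name to its MSI product code;
    new_packages lists the MSI files to install afterwards. Both are
    supplied by the caller; nothing here is discovered automatically.
    """
    plan = [
        # Close Outlook and Communicator first so the uninstalls run cleanly.
        "taskkill /IM OUTLOOK.EXE /F",
        "taskkill /IM communicator.exe /F",
    ]
    for module in OLD_MODULES:
        # Silent uninstall by product code.
        plan.append(f"msiexec /x {product_codes[module]} /qn /norestart")
    for msi in new_packages:
        # Silent install of each upgraded module or hotfix.
        plan.append(f'msiexec /i "{msi}" /qn /norestart')
    return plan


if __name__ == "__main__":
    # Placeholder codes and file names, for illustration only.
    codes = {name: "{PRODUCT-CODE-%d}" % i for i, name in enumerate(OLD_MODULES)}
    packages = ["communicator_r2.msi", "conf_addin_r2.msi", "livemeeting.msi"]
    for step in build_plan(codes, packages):
        print(step)
```

Keeping the plan as data makes it easy to paste into an SCCM program or wrap in a batch file once the real product codes are filled in.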

We might also get the Group Chat client installed as well.

In addition, any users out there (who knows who they are?) that have multiple endpoints MUST be upgraded to R2 at the same time. Otherwise, the Communicator client freaks and basically breaks if an R1 client and an R2 client for the same user attempt to sign in. This is really going to help me find all those users with the COMO client manually installed years ago when we went to R1.
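One way to hunt for those users ahead of time, sketched in Python: group an inventory of (user, machine) pairs and flag anyone who shows up on more than one machine. The inventory format is an assumption for illustration (for example, rows exported from a software inventory report); the names below are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def users_with_multiple_endpoints(
    inventory: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Map each user seen on more than one machine to a sorted machine list.

    inventory is an iterable of (user, machine) pairs. Names are compared
    case-insensitively, and duplicate rows are collapsed.
    """
    endpoints = defaultdict(set)
    for user, machine in inventory:
        endpoints[user.lower()].add(machine.lower())
    return {
        user: sorted(machines)
        for user, machines in endpoints.items()
        if len(machines) > 1
    }


if __name__ == "__main__":
    # Hypothetical inventory rows, for illustration only.
    rows = [
        ("alice", "DESKTOP-01"),
        ("alice", "LAPTOP-07"),
        ("bob", "DESKTOP-02"),
        ("Alice", "desktop-01"),  # duplicate row, different casing
    ]
    for user, machines in users_with_multiple_endpoints(rows).items():
        print(user, machines)
```

Anyone on the resulting list would need all of their machines upgraded in the same window.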

XMPP Gateway to Hell?

TLS Handshake errors your enemy? You are not alone.

I’ve tried both public and internal certs. I have a ticket open with MS - going on two weeks, and I'm on my fourth engineer, now at the senior level. I've requested certs from the command line, IIS, and the OCS front end wizard, imported them, etc., all without success. The Edge server and XMPP server can telnet to each other over port 5061, and I can telnet externally to the FQDN of my XMPP gateway over 5269. It looks perfect - yet won't work. If I receive a fix from the latest engineer I will post it here. The only thing I haven't tried is to rebuild on Server 2003 instead of 2008 because I do not want to go backwards. If you assign a certificate to the XMPP configuration, inadvertently or for testing, and want to remove it, you need to uninstall XMPP and reinstall.
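If you want to repeat the telnet checks without typing them by hand, here is a minimal Python sketch that does the same thing: it only tests whether a TCP connection can be opened, so like telnet it says nothing about whether the TLS handshake will succeed afterwards. The function name is my own for this sketch.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This is the equivalent of a successful telnet connect; it does not
    attempt any TLS negotiation.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False
```

For example, `tcp_port_open("xmpp.example.com", 5269)` would mirror the external check and `tcp_port_open("edge.example.com", 5061)` the internal one, with both host names being placeholders for your own servers.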