Antivirus is the bane of a SysAdmin’s existence. Well, one of many banes; notable others being the engineering department and text messages from Nagios at 2:37 AM on a Saturday.
Over at the SysAdmin Network I was intrigued by one member’s comment on a thread concerning what the best enterprise antivirus software was. That member was Isaac Bush and his intriguing comment was that he had forgone the search for the best antivirus software because he had successfully dropped antivirus for his users’ PCs three years ago.
I’m here today with Isaac to interview him concerning his anti-antivirus project.
Wesley “Nonapeptide”: Starting off, who are you and what do you do?
Isaac Bush: My name is Isaac Bush and I’m the IT manager for the Georgia O’Keeffe Museum in Santa Fe. The Museum’s IT department is quite small so in addition to the manager hat I also wear the lead sysadmin hat for our servers, network, storage, etc … so a typical small shop admin really.
Wesley “Nonapeptide”: Can you explain your workplace’s technology environment a little bit?
Isaac Bush: We have a little over 100 users and these are primarily knowledge workers. Like most companies our desktops are Windows based due to application requirements. We use Active Directory for our Windows machines and leverage AD functionality (GPO, managed software installs, etc …) to manage them. We have around a dozen SQL backed line of business apps, and an Exchange 2003/Office 2007 deployment for groupware. Typical stuff really.
We have approximately 20 PC desktops/laptops and a handful of Macs. The Museum is a little unusual in that we use Microsoft Terminal Services heavily to provide desktop sessions. Unlike many TS deployments we’re not using TS exclusively for task workers or kiosks. Instead our standard desktop for all users, including knowledge workers, is a TS session and over 70% of our users have thin clients. Although we are using TS for these desktops currently, shortly we’re going to be transitioning the user base to VMware View.
Server side we’re a mix of Windows, Solaris, and Linux, heavier on Solaris and Windows. We have 43 servers by my last count; it’s a rather high server to user ratio given that we are not an ASP. Most of these servers are dedicated to a single application or provide redundancy for important services, multiple domain controllers or multiple terminal services servers as examples. The majority of our servers are VMs in a VMware vSphere cluster.
WN: What was your experience with antivirus while you were using it?
IB: When I came on board the company was using Symantec products for AV and anti-spam. Like most companies all of our Windows machines were running an AV client. The reality was that managing AV, in and of itself, did not take exorbitant amounts of time. All client deployment and definition updates were done automatically so we only needed to deal with a single management application.
The real time sink for IT staff was dealing with cleanup when AV didn’t catch something. Wipe and reload always worked, but it cost time for our users and for IT. I viewed this time cost as unfortunate, but not unusual since that was what I was used to during my whole career in IT. I had always deployed AV and I always had to reload machines due to malware from time to time.
In addition to malware clean up we also needed to manage the interaction of the AV clients with the rest of the system. For instance it was common to have to configure the AV client to exclude certain directories, or sometimes the AV client had to be disabled while we did updates to the OS or to apps. Other times we had problems with AV killing performance on machines. We had more than one user complain about their machine running very slowly and the problem was tracked down to AV. In these cases, although the problem clearly was the performance impact of the AV agent, we viewed AV as mandatory so it was just too bad for the user. We’ll buy you a faster machine next year. And of course like any other piece of software we needed to keep it patched. There have been many ugly AV security flaws and because AV inherently runs highly privileged these were patches we really needed to jump on. Hello irony! Real irony, not Alanis Morissette irony.
WN: What inspired you to ditch your antivirus? Was there a major failure of the antivirus system? Was there an “Ah ha!” moment when you realized it would work or was it just common sense that you knew all along?
IB: At every company I’ve worked with it has been standard practice to give local admin rights to the end users. This avoided a lot of problems with applications that assumed admin rights and, to be honest, we just didn’t care what the end users did with their desktops; we cared about the server side but not the desktops.
As I mentioned earlier we had ongoing problems with our desktops picking up malware despite updated AV definitions. In particular there were a set of common use PCs that were constantly picking up infections, and almost every few weeks we were doing something to those machines. These machines were used by our staff, as opposed to random guests, so these desktops were configured the same as every other desktop which included giving local admin rights. Eventually we started locking these machines down and part of that included dropping admin rights. Once we started using non-admin accounts infections dropped off dramatically.
Honestly this is something I should have been doing to begin with. I’m from a Unix background originally and of course least privilege and not running as root are core concepts there. Moreover I was always careful about privileges for application service accounts on our Windows servers. For some reason I just didn’t apply that to Windows desktops, even though NT based operating systems have a significantly more advanced and fine grained ACL system than most flavors of Unix. Just a blind spot from tradition I guess.
AV had not been able to keep those machines clean; it was dropping admin rights that did that. This experience really started to put AV in a bad light; it seemed very superfluous. After all, on our non-Windows machines we don’t run AV, we use proper security procedures. So why should Windows be any different? I did still see the value of server side AV for mail filtering. We’re filtering for spam of course, and we could deal with many viruses during that process as well. Even if malware from email would not be able to infect a machine, it would still fill the end user’s mailbox. Plus we could kill phishing emails which was very important as no desktop security model was going to stop that. For file servers we also saw the value as we have documents going back many years and from many sources and it seemed prudent to scan those periodically. It was the desktops, as opposed to the servers, where we were primarily interested in dropping AV.
WN: Did you have to present the idea to upper management? If so, did you get any pushback and have to convince them it was viable?
IB: I’m responsible for IT planning and implementation, so I didn’t need to formally present this to upper management for approval. That said, I certainly keep my management in the IT loop, and while I didn’t have any “hard” pushback there were concerns expressed. However presenting the results I had seen in the pilot group, plus the expected cost saving, went far in allaying any concerns amongst senior management.
The real pushback came from some of our IT staff that were unhappy with the idea of removing AV from the desktops and laptops. Their view was that while it didn’t seem to help all that much, it didn’t hurt either. We should use defense in depth, multiple layers of security, etc, etc … I agree with the importance of multiple layers of security, but only if a layer seems to add to the overall security posture. Based on my experiences the client side AV layer seems to be ineffective at best and is rendered pointless by a restricted desktop security model. Moreover, AV comes with a cost, a cost that can be substantial in both time and money. Therefore, we moved ahead with dropping client side AV.
WN: How did you prepare to do this changeover? What were the challenges that you ran into?
IB: The main issue wasn’t so much that we would be dropping client side AV, but rather that we would be removing admin rights. The principal problem this caused was that a number of our applications would not work correctly without administrative rights. In every case the problem was tracked down to applications wanting to write to areas of the registry or file system that are read only to non-admins. So, we needed to loosen permissions enough for these apps to run, but not loosen them enough to render the security model pointless. Thanks to the various Sysinternals tools we were able to identify all the places in the file system and registry where these applications needed additional access. Once we had that information we set up a GPO to alter the ACLs on the particular files and registry entries in question. Later we filtered this down with groups so that these changes would only be made to certain computers and users.
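To illustrate the kind of triage Isaac describes (this is a sketch, not his actual tooling), suppose a Sysinternals Process Monitor trace of a misbehaving application has been exported to CSV. The column names below assume Process Monitor's default export layout, and the process name and paths in the sample are made up. The function collects every path where the application was refused access, which is the list you would then review before loosening any ACLs.

```python
import csv
import io
from collections import defaultdict


def denied_paths(csv_text, process_name):
    """Map each path that returned ACCESS DENIED to the operations that failed.

    Assumes columns named "Process Name", "Operation", "Path" and "Result",
    as in a default Process Monitor CSV export.
    """
    denied = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Only look at events from the application under test.
        if row["Process Name"].lower() != process_name.lower():
            continue
        if row["Result"] == "ACCESS DENIED":
            denied[row["Path"]].add(row["Operation"])
    return {path: sorted(ops) for path, ops in denied.items()}


# Hypothetical sample rows in the assumed export layout.
SAMPLE = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"9:01:02.1","legacyapp.exe","1234","RegSetValue","HKLM\SOFTWARE\LegacyApp\Settings","ACCESS DENIED",""
"9:01:02.2","legacyapp.exe","1234","CreateFile","C:\Program Files\LegacyApp\app.ini","ACCESS DENIED",""
"9:01:02.3","legacyapp.exe","1234","RegQueryValue","HKLM\SOFTWARE\LegacyApp\Settings","SUCCESS",""
"9:01:02.4","explorer.exe","900","CreateFile","C:\Windows\example.dat","ACCESS DENIED",""
'''

for path, operations in denied_paths(SAMPLE, "legacyapp.exe").items():
    print(path, operations)
```

Each reported path is a candidate for a narrowly scoped permission change; the list still needs a human to decide which grants are safe.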
WN: How long did it take to fully implement this?
IB: I’d estimate roughly a month to 2 months after I decided to proceed and the majority of that time was spent testing things. The actual implementation only took an hour, if that. We created a few GPOs to adjust the file system and registry permissions and then ran a script after hours to remove domain users from the local admin group. One reboot later and it was done. This entire project could have been completed much faster if we hadn’t had other projects going on concurrently.
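The interview does not say what the after-hours script looked like. One minimal sketch, assuming the built-in Windows `net localgroup` command and a placeholder domain name `EXAMPLE`, is to build the removal command for each account and only execute it when explicitly asked to. The function names here are hypothetical.

```python
import subprocess


def removal_command(member, group="Administrators"):
    """Build the `net localgroup` command that removes one member from a local group."""
    return ["net", "localgroup", group, member, "/delete"]


def remove_members(members, dry_run=True):
    """Return the commands for all members; execute them only if dry_run is False.

    Executing requires Windows and an elevated session. The default is a dry run
    so the command list can be reviewed first.
    """
    commands = [removal_command(member) for member in members]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands


# EXAMPLE is a placeholder domain, not one named in the interview.
planned = remove_members([r"EXAMPLE\Domain Users"])
```

Keeping the dry run as the default fits the testing-heavy approach described above: the list of planned changes can be checked on a pilot machine before anything is touched.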
WN: What was the user response both while it was happening and after it was done? Were they annoyed that they’d lose admin privileges?
IB: For the most part the users were not really aware of the change in their privilege level as their normal work did not require admin rights. IT already handled maintenance of these machines so it was not as if they were used to running their own updates, or otherwise were responsible for administrating their own machines. We didn’t make an announcement to the effect that we were removing admin rights as, to be honest, it would have sounded negative. Instead, we stated that IT would perform any installs of hardware or software in the future. This was the policy anyway, so really it was more like a reminder. Later we did have some complaints from people that were used to installing whatever they felt like. In many of these cases we didn’t want the software installed at all, iTunes as an example, while in other cases we installed the software, and made sure that the software was part of our standard install in the future.
The most challenging group of users was actually IT. The IT department had a very bad habit of having their accounts be domain administrators. To address that serious problem we created new domain administrator accounts and dropped admin rights from our existing accounts. Unlike the rest of the users we needed admin rights in our day to day jobs so this was something of a hassle as it meant using runas all the time which in turn made it tempting to just use the new admin accounts for everything. What IT staff ended up doing was logging into a management server over RDP with their admin account and leaving the session open, and that made it a lot easier to use the two accounts simultaneously.
WN: How do you handle software deployment and updates? I’m sure users want certain software titles now and then. Also, Adobe Reader gets patched roughly every 4 and a half hours.
IB: We had viewed AV as covering us until patches could be applied. Now that we were dropping AV we would have to be very aggressive about getting patches out. We have clients configured to install patches from WSUS quickly so getting Microsoft patches out was very straightforward. A bigger issue was dealing with possible application problems caused by patch/app incompatibilities. This wasn’t really an issue with Microsoft products, but a few of our 3rd party apps would sometimes break after patching; one vendor in particular has QA “issues” and their apps are “fragile” to put it mildly.
In order to address this we developed a structured testing methodology. Ideally we would have run an automated test suite of some sort, but that’s really beyond us as a smaller outfit. Instead we have testing VMs with different application loads that mirror the different configs we have deployed. We’ve developed a checklist of tests to run to verify that our apps are functioning correctly after patches. Due to Microsoft’s patch Tuesday policy we’ve been able to streamline this according to a regular monthly schedule. Typically we’ll integrate 3rd party vendor patches at that same time.
Patch priority really depends on the severity of the issue. Many patches are addressing issues that are not widely seen out in the wild at this time or relate to software that’s not publicly accessible and in those cases we scheduled them into the monthly patch install. Others are far more critical and we’ll push them through as soon as possible, ideally the same day as the patch being released or the day after. It all depends.
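Read literally, the rule of thumb above can be written down as a small decision function. This is only a sketch of that reading: the `Patch` fields and the two schedule labels are illustrative names, and as Isaac says, real decisions involve more judgment than two yes/no questions.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    name: str
    exploited_in_wild: bool   # is the issue widely seen in the wild right now?
    publicly_exposed: bool    # does it affect software reachable from outside?


def schedule(patch):
    """Return "expedite" for same-day or next-day rollout, else "monthly".

    A patch waits for the monthly cycle if the issue is not widely exploited
    or the affected software is not publicly accessible.
    """
    if patch.exploited_in_wild and patch.publicly_exposed:
        return "expedite"
    return "monthly"


# Hypothetical examples.
print(schedule(Patch("browser fix", exploited_in_wild=True, publicly_exposed=True)))
print(schedule(Patch("internal tool fix", exploited_in_wild=True, publicly_exposed=False)))
```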
WN: How do you handle threats born from removable devices?
IB: Malware spread from removable devices is, essentially, little different from malware spread through other vectors. Assuming up to date patches, the worst case is that malware will be able to infect the user’s profile. In every case we have seen malware limited to infecting the profile as it lacks admin rights, again assuming everything is patched. However the truth of the matter is that once a machine has been infected, even if it is limited to the user’s profile, it cannot be fully trusted again. Therefore, we always reload the OS, and in order to follow that policy we have had to streamline our installation processes to minimize downtime.
The main issue that is specific to removable devices is tracking down the removable device in question, and that’s an issue we’d be dealing with regardless of whether we had AV on the desktops or not. Incidentally, the fact that most of our people are using TS sessions has minimized these issues; you’re not going to be plugging music players into your thin client.
WN: Are there exception users in the environment?
IB: Not users, but rather certain machines. A handful of our desktops are used for working with unusual hardware that requires admin rights for the interfacing software. In these cases we use “runas” to use those programs under a local admin account. It’s not the best solution and it wouldn’t be viable for every case, but it’s been sufficient for these few machines.
WN: Do you use client firewalls?
IB: Yes for our laptops, but no for the desktops. We’ve had a number of issues with remote management when the firewall is enabled. I see the real benefits of client side firewalls, and I’ve never been too thrilled that we don’t use them, but it just seemed to be the pragmatic solution. We have perimeter firewalls of course, and that has been viewed as our primary network defense. I’m well aware of the flaws in that model, that it makes our network hard on the outside but soft on the inside. Like candy! Hacker candy. But that is more of a conceptual problem, whereas issues relating to client side firewalls blocking required ports are real immediate problems. Of course you could simply punch holes in the client firewall, but if you open up all the important ports then how much value does the firewall even have?
Recently however I’ve been talking with a colleague at another company where they keep client side firewall on and open up whatever ports they need but only for connections from an internal management subnet. I like that idea a lot so we’ll be rolling that out later this year. It gives us the advantages of client side firewalls, but doesn’t turn the firewall into network Swiss cheese. I’m sure everyone else is doing something like that but I’m not always the quickest on the uptake.
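The management-subnet idea can be sketched as follows. The subnet and port list are hypothetical values chosen for illustration, and the generated command assumes a Windows version that ships `netsh advfirewall`; older systems use different firewall tooling. The first function models the policy, and the second produces a rule that opens a port only to the management subnet.

```python
import ipaddress

# Hypothetical values for illustration only.
MGMT_SUBNET = ipaddress.ip_network("10.0.50.0/24")
MGMT_PORTS = {135, 445, 3389}


def is_allowed(source_ip, port):
    """Allow inbound traffic only on management ports from the management subnet."""
    return port in MGMT_PORTS and ipaddress.ip_address(source_ip) in MGMT_SUBNET


def netsh_rule(port):
    """Build a netsh command that opens one TCP port to the management subnet only."""
    return (
        f'netsh advfirewall firewall add rule name="Mgmt TCP {port}" '
        f"dir=in action=allow protocol=TCP localport={port} remoteip={MGMT_SUBNET}"
    )


for port in sorted(MGMT_PORTS):
    print(netsh_rule(port))
```

Because every rule is scoped by `remoteip`, the ports stay closed to ordinary workstations on the same LAN, which is what keeps the firewall from turning into Swiss cheese.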
WN: Have you heard of anyone else doing this?
IB: I’m not aware of anyone else that has actually removed client side AV like us, however I’m aware of many companies that do not give admin rights to end users. Microsoft is clearly pushing that model with their newer operating systems so I’m sure it’s only a matter of time before this is standard procedure at every company.
WN: Do you have visitors that come on the corporate network or do you have a sandbox network? How do you protect your clients from them?
IB: We routinely have visitors coming onto our network and, as a rule, we sandbox them. However, we don’t have any real controls on it because it just comes down to what VLAN the port is configured for. It’s hardly a hardened or robust environment, especially given our lack of client firewalls. We’re investigating what will be involved in implementing 802.1x, and it is virtually certain we will be using 802.1x in the future as it solves a number of problems we have. Ideally we would pair it with a NAC implementation.
WN: If you could do it over again, what would you do differently?
IB: We got caught by those hardware edge cases I mentioned, and we also ran into problems with certain low end multifunction printers. We didn’t catch these before we made changes and that caused disruptions for the users during the transition. So instead of just testing our common apps, I’d have tested this with our peripherals as well.
WN: What advice would you offer to anyone considering embarking on this same journey?
IB: We’re talking about removing AV here but I want to strongly emphasize that this wasn’t about removing client side AV and then simply leaving it at that. What this came down to was that we realized a proper desktop security model was more effective than client side AV and made the AV client unnecessary.
If you were in a similar situation and wanted to move in this direction, I’d start by evaluating the end user environment first. The user base in some orgs not only has admin rights, they use those rights extensively. Can this be feasibly changed? Then the other principal consideration is your IT staff. Fast patching is absolutely vital, so the question is how busy are your people and how disciplined is your department. Of course patching is absolutely mandatory anyway, but I feel it’s more so in this arrangement. I’d also make sure that you have all your processes lined up beforehand. If wiping machines is SOP, then how fast can you get the machine back to its pre-infection state? Is it a matter of a quick scripted install or image load? Or do you need to break out a bunch of disks? Will the user need to recreate personal settings and preferences in their profile or will the profile just pull down from a server? How prepared are you for app testing? Do you have a testing environment already set up? You really don’t want to be dealing with things once you’ve already changed the install base.
There you have it. One man’s fight against antivirus software ended with him as the victor. Have you been fighting antivirus? You may want to consider Isaac’s methods for your environment. Do you have a similar story of ditching antivirus for good? I’d love to hear it.