I have an appliance that performs email spam and virus scanning and includes a quarantine system for people to check and release messages that were wrongfully flagged as spam. It’s been merely an okay appliance so far, but I am especially frustrated by a few of the design decisions for the overall application.
In one case, there is a report dashboard that shows delegated administrators of an email domain what their incoming, outgoing, unscanned, and quarantined queues look like. The incoming, outgoing, and unscanned queues are hard coded to have their statistics and current state updated on the dashboard every ten minutes. The quarantined queue requires manual intervention on the part of the universal appliance administrator to recalculate.
What’s the rationale from the appliance maker? Apparently the quarantine queue is often the largest of the four, and on some appliances there are a million or more messages in the queue. Calculating the statistics and state of the quarantine queue affects every domain on the appliance at the same time; you can’t recalculate the statistics for individual domains. Recalculating that many messages will have a significant detrimental impact on the appliance.
Does the vendor assume that maybe the appliance administrator knows his own domains’ characteristics and usage patterns? Of course not! Does the vendor give the appliance owner the ability to change the interval at which statistics are calculated? Nope. How about giving me the option of turning automatic quarantine queue recalculations on and off, for those of us who don’t have a million messages a day? Nope, vendor knows best. How about scoping the queue recalculation down to specific domains rather than appliance-wide? Apparently that would be HARRRRRRDD!
No, the most obvious solution is to hard code it all so that I, as the appliance super admin, have no other option but to log in, click a button in the appliance’s UI, affect all the domains and sub-administrators, without any other options for the process. Thrilling.
Oh… wait! I could always use the appliance’s shell access to script out a solution, right?! Let’s explore that option for a moment.
Real Appliances Have Shells
This is not a real appliance. There’s no shell access, so the scripting has to happen from another server, and wget is the tool for the job.
Wget allows you to have command line HTTP conversations in much the same way that your browser does with any website. This includes posting form data, e.g. usernames and passwords, as well as handling cookies.
In the case of my email appliance, I have to do two simple actions. I first have to log in to the appliance. Then, once I’ve logged in and saved my authenticated session, I have to request a simple URL that causes the recalculation of the quarantine statistics. Easy, right? Right.
First, let’s figure out how to log into the appliance.
Getting Through the Gate
In my case, the login page is at https://stupidemailappliance.com/index.html. However, I needed to view the page source to find out where the login form POSTs to. The POST target is the /checklogin.phtml document. The username element of the form is named uname and the password element of the form is named passwd. Also make note that the URL is HTTPS, which means we need OpenSSL on the server that this script runs on. Furthermore, we need to make sure that the certificate presented to us from the appliance is trusted on the server that the wget script runs from.
If it’s not trusted, we can make the dubious choice of running wget with the --no-check-certificate option. A better choice would be to figure out why the certificate is untrusted and fix that. If it’s a self-signed certificate, or perhaps a certificate with an untrusted intermediate certificate (like what happened to me in the post Solving wget “ERROR: cannot verify site certificate. Unable to locally verify the issuer’s authority.”), then we can either amend the main cert.pem file that OpenSSL looks to, or make a separate file with the certificates that need to be trusted and use the --ca-certificate=file option in wget.
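For the separate-file route, a minimal sketch might look like the following. The function names and any file names you pass in are hypothetical, and it assumes each input file is a PEM-format certificate that ends with a newline.

```shell
#!/bin/sh
# Concatenate PEM certificates into one bundle for wget's --ca-certificate.
# Usage: make_bundle OUTPUT CERT...
# (make_bundle and count_certs are illustrative names, not part of wget or OpenSSL.)
make_bundle() {
    out=$1
    shift
    cat -- "$@" > "$out"
}

# Print how many certificates a PEM bundle contains, as a quick sanity check.
count_certs() {
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}
```

Something like `make_bundle rapidsslcabundle.pem intermediate.pem root.pem` would then produce a file to hand to --ca-certificate, and `count_certs rapidsslcabundle.pem` should report the number of certificates you expected to put in it.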
So what does all of the above tell us? Four things: the URL that we POST to is https://stupidemailappliance.com/checklogin.phtml, we have to POST our username to the uname element and the password to the passwd element, and in my case I have a RapidSSL certificate that has an untrusted intermediate certificate. As a result I need to point to a specific .pem file on the server invoking wget in order to avoid errors. The command is looking like this so far:
wget --post-data='uname=admin&passwd=stupidemailappliance' --ca-certificate=/etc/pki/tls/rapidsslcabundle.pem https://stupidemailappliance.com/checklogin.phtml
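One thing to watch: wget sends the --post-data string as given, so a password containing characters like & or = needs to be percent-encoded first. Here is a rough sketch of a helper for that. The urlencode function and the sample password are made up for illustration, and the helper is only meant for plain ASCII input.

```shell
#!/bin/sh
# Percent-encode a string for use in form-urlencoded POST data.
# (urlencode is a hypothetical helper; it runs in a subshell so its
# variables do not leak into the caller.)
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        s=$rest
        case $c in
            [a-zA-Z0-9.~_-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
    done
    printf '%s\n' "$out"
)

# Made-up credentials, only to show the encoding.
post_data="uname=$(urlencode 'admin')&passwd=$(urlencode 'p&ss=word')"
printf '%s\n' "$post_data"
```

The resulting string can then be passed along as --post-data="$post_data" in place of the hand-typed one.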
Staying in the Courtyard
Great, we got through the gate. However, the next wget command we issue will have no idea that we’ve logged in before. The appliance will completely reject our advances and toss us off the property.
To remind a website that you’ve logged in previously, you’ll need to save any cookies that you receive. Wget offers the --save-cookies option to write them to a local file that another invocation of wget can later consume. In my case, the appliance uses session cookies rather than persistent cookies. No problem! We can use the --keep-session-cookies option to save the cookie that’s normally kept purely in memory.
So now the command to get in the gate and stay there is looking like this:
wget --save-cookies mycookie.txt --keep-session-cookies --post-data='uname=admin&passwd=stupidemailappliance' --ca-certificate=/etc/pki/tls/rapidsslcabundle.pem https://stupidemailappliance.com/checklogin.phtml
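A form login can answer with a perfectly normal page even when the password was wrong, so wget's exit status alone may not tell you whether you really got in. One cheap sanity check is to confirm that the cookie file actually holds a cookie. The sketch below assumes the tab-separated, seven-field cookie file layout that wget writes, with comment lines starting with #. The has_cookie name is made up, and an appliance that hands out a session cookie before authentication would pass this check anyway, so treat it as a rough signal only.

```shell
#!/bin/sh
# Succeed if the cookie jar contains at least one cookie entry.
# (has_cookie is a hypothetical helper name.)
has_cookie() {
    # Entries are tab-separated lines with seven fields; '#' lines are comments.
    [ -f "$1" ] &&
        awk -F '\t' '!/^#/ && NF >= 7 { found = 1 } END { exit !found }' "$1"
}
```

After the login command, something like `has_cookie mycookie.txt || echo "login did not set a cookie" >&2` gives an early warning before the second request is sent.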
Charging the Castle!
Okay, we’re in and we’re staying in. Now what do we do? I know the specific URL that the appliance requires me to GET to update the statistics in question. But first I have to snarf up the session cookie that was saved earlier, using the --load-cookies option. The second command is looking fairly simple:
wget --load-cookies mycookie.txt --ca-certificate=/etc/pki/tls/rapidsslcabundle.pem 'https://stupidemailappliance.com/moardumb.phtml?section=stupidqueuestatus&rr=1'
Notice that I had to put the URL in single quotes because there’s an ampersand in it. Without the quotes, your shell treats the & as a control operator: it runs everything before the & as a background job and treats whatever follows as a separate command.
Clean up Your Mess
Of course, running the above commands will download the page that you’re POSTing to or GETting. I’m not really interested in that, so I’ll add the --delete-after option to each command so that all downloaded files are removed after each command is run.
Writing Your Own History
You do remember to log everything, right? I prefer to have a history of all my automation, so we simply add --append-output=logfile to each of the above commands and we’re good. Future problems can now be troubleshot with more information to help lead us to the root of whatever the problem might be.
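Put together, the two commands with cleanup and logging might be wrapped up roughly like this. It is only a sketch: the variable names, the overridable WGET setting, the "run" argument, and the final removal of the cookie file are my own assumptions, not anything the appliance or wget requires.

```shell
#!/bin/sh
# Sketch of the two-step job: log in, then trigger the recalculation.
WGET=${WGET:-wget}                      # overridable, e.g. WGET=echo for a dry run
BASE='https://stupidemailappliance.com'
CA=/etc/pki/tls/rapidsslcabundle.pem
JAR=mycookie.txt
LOG=logfile

# Step 1: POST the credentials and keep the session cookie.
login() {
    "$WGET" --save-cookies "$JAR" --keep-session-cookies \
        --post-data='uname=admin&passwd=stupidemailappliance' \
        --ca-certificate="$CA" --delete-after --append-output="$LOG" \
        "$BASE/checklogin.phtml"
}

# Step 2: replay the cookie and GET the recalculation URL.
recalc() {
    "$WGET" --load-cookies "$JAR" \
        --ca-certificate="$CA" --delete-after --append-output="$LOG" \
        "$BASE/moardumb.phtml?section=stupidqueuestatus&rr=1"
}

# Only talk to the appliance when called with "run", so the file can
# also be sourced to try the functions out.
if [ "${1:-}" = run ]; then
    login && recalc
    rm -f "$JAR"    # optional: don't leave the session cookie on disk
fi
```

Setting WGET=echo before calling login or recalc prints the arguments instead of sending anything, which is handy for checking the quoting.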
Enjoying the Spoils of War
In this case, the spoils are that I get better behavior from the appliance, all automated and repeatable, and have the satisfaction of sticking it to the vendor. It’s as simple as putting the wget commands in a cron job at the interval I want.
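As a rough illustration of the cron side, the snippet below just prints a crontab line for a chosen interval. The cron_line helper and the script path are hypothetical; the path stands in for wherever you saved the wget commands.

```shell
#!/bin/sh
# Print a crontab entry that runs a script every N minutes.
# $1 = interval in minutes, $2 = path to the script (both illustrative).
cron_line() {
    printf '*/%s * * * * %s\n' "$1" "$2"
}

cron_line 10 /usr/local/bin/recalc-quarantine.sh
```

The printed line can be pasted in via `crontab -e`. Ten minutes matches what the appliance already does for the other three queues, but the whole point is that the interval is now yours to pick.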
Have any additional tips? Any cool uses of wget to automate website interactions? See something I missed? Let me know in the comments below or tweet them at me @Nonapeptide.