Today I’ll be live blogging the May 2013 CentOS Dojo event held in Scottsdale, Arizona. I can’t find a specific itinerary for which talks will be given when, but this isn’t a multi-track event, so we all get what they give, and it looks like some cool talks are coming. The event starts at 9:30 AM Arizona time (currently the same time as Los Angeles). Refresh this page for updates!
Talk #1: Pinterest and scaling
Older code at Pinterest is Python, but the newer stuff is on the Java stack. They use Ostrich for Java processes. The Python side uses Sentry and StatsD, and the distributed tracing system is an internal tool called Kafka.
Application metrics are in Varnish / MySQL / Nginx / Redis. There are some tools like Percona’s toolkit.
When Pinterest writes code that needs metrics, they don’t like gauges. A gauge is a “right now” view. That’s not as useful because you can miss a lot if there is a giant spike between collection periods. They use counters instead: you might not know exactly what happened between two sample periods, but you absolutely know that something happened and how much of it happened.
The nice thing about a counter is that if the collector is down for a period, the counters in your code keep counting, so the totals are still there when collection resumes.
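To make the gauge vs. counter point concrete, here is a minimal sketch in Python. It is not Pinterest’s code; the load numbers and the 5 second polling interval are made up for illustration. A spike lands between two polls, so the gauge never sees it, while the counter delta for that window does.

```python
# Simulated per-second request load: quiet, one large spike, quiet again.
# All numbers are invented for illustration.
load_per_second = [5, 5, 5, 900, 5, 5, 5, 5, 5, 5]
SAMPLE_EVERY = 5  # hypothetical collector that polls once every 5 seconds


def sample_gauge(load, every):
    """Gauge: record only the instantaneous value at each poll."""
    return [load[t] for t in range(every - 1, len(load), every)]


def sample_counter(load, every):
    """Counter: record the running total at each poll, then take deltas."""
    total = 0
    totals = []
    for t, value in enumerate(load):
        total += value  # the application increments on every request
        if (t + 1) % every == 0:
            totals.append(total)
    # The delta between successive polls is the work done in that window.
    previous = [0] + totals[:-1]
    return [now - before for now, before in zip(totals, previous)]


gauge_samples = sample_gauge(load_per_second, SAMPLE_EVERY)
counter_deltas = sample_counter(load_per_second, SAMPLE_EVERY)
print(gauge_samples)   # [5, 5]    -> the spike is invisible
print(counter_deltas)  # [920, 25] -> the first window clearly had a spike
```

If a poll is skipped entirely, the next delta is simply larger, which is the “counters keep putting up numbers” behavior described above.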
They measure requests per second to get an idea of which systems are unbalanced, split into read vs write requests, and of course errors. They measure latency, but also payload size. They saw latency issues in memcache, then noticed that the payload was huge, which explained the latency.
- Ganglia is used for Cluster Metrics and Host Metrics
- OpenTSDB for high cardinality and alerting data (some metrics are really unique and ephemeral, like memcache slabs and EC2 autoscaling)
- Kafka for event data
- Graphite for Application Metrics. Simple incremental monitoring for new applications as they’re made.
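As a rough idea of what “simple incremental monitoring” can look like with StatsD in front of Graphite, here is a small sketch of a counter increment sent over UDP. The `name:value|c` line format is the standard StatsD counter syntax; the metric name, host, and helper function names are assumptions for the example, not anything shown in the talk.

```python
import socket


def statsd_counter_packet(name, value=1):
    """Format a counter increment in the StatsD line protocol: name:value|c"""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name, value=1, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; 8125 is the conventional StatsD port."""
    packet = statsd_counter_packet(name, value)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
    return packet


# Hypothetical metric name for a new application.
print(statsd_counter_packet("newapp.signup.requests"))
```

Because it is UDP, the application does not block or fail if the collector is unavailable, which is part of why this style is cheap to add to new code.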
They use cgroups, which allow them to run collectors in isolation. Kernel processes are separated from user space processes and user code is protected from kernel code. So there’s a cgroup for monitoring that has constrained resources. That way monitoring can’t take down the entire system. In the worst case scenario of bad monitoring code, it is the monitoring that gets killed by the OOM killer, not the core services that you’re offering.
They use Ganglia because it requires the least amount of touch. It’s set up and they’ve had very few problems with it. They use rrdcached to buffer the stats before they go to disk, and they don’t use SSDs. They don’t use EBS because they don’t want to. They use local ephemeral storage.
Graphite and StatsD are used. The talk on this was very fast and over my head. Sorry.
OpenTSDB. They have 60 billion points on a 10 node cluster. They keep network performance data, memcached ketama ring and slab stats, and HBase dynamic metrics in it.
Their Graphite front end is written in Flask. They use other tools like D3 and Rickshaw for UI elements. JSON is good. Data sources need to speak JSON to their front ends (or so people say).
They use R for really complex, weird data queries. If you have data of any kind, you can look at it with R in esoteric ways; it’s just a little more difficult. Pinterest has a data science team that handles it.
Talk #2: Sensu as a monitoring system
Joe Miller of Pantheon (www.getpantheon.com) speaking. Pantheon does PaaS with Drupal. Sensu is a monitoring tool / framework.
Sensu has been called “the monitoring router.” It’s really a framework for building a monitoring tool. It’s Ruby 2.0 based, uses RabbitMQ for messaging, and uses Redis for a small amount of state storage. JSON is everywhere. You can simply re-use Nagios plugins. It is packaged with all the gems and Ruby packages that are needed.
Leverage your existing config management system to attach checks to your infrastructure. They tend to focus on Chef. There is a Puppet project.
The GUIs for Sensu are slim. There are two options. The Sensu-dashboard (which is included in the omnibus package that has everything in it). It has no user authentication and is stateless. No roles. The other option is Sensu-admin which is a separate package that has users, roles, scheduled downtimes, etc. It’s a Rails app.
Sensu runs on Checks, Events, and Handlers. Checks create events, which are fed into handlers. Handlers are passed data simply through STDIN. There are handlers available to send information to Logstash, Graylog, and many more. It’s all in the community. Another way to diagram how Sensu works is that there is a Sensu-Server on one end and Sensu-Clients on the other end, with RabbitMQ in the middle. A client, meaning a node that needs to be monitored, gets subscribed to a type, like webserver, and is monitored however webservers are defined. Messages are passed from client to server, and then the server can fire off actions like alerts.
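Since handlers just receive event data on STDIN as JSON, a handler can be written in any language. Here is a minimal sketch in Python. The field names (`client.name`, `check.name`, `check.status`, `check.output`) are my assumption of what an event looks like, and the sample event is invented; check the Sensu docs for the real schema.

```python
import io
import json

# Nagios-style exit codes, which reused Nagios plugins report.
LEVELS = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def summarize_event(event):
    """Turn an event dict (assumed layout) into a one-line alert message."""
    client = event.get("client", {}).get("name", "unknown-client")
    check = event.get("check", {})
    level = LEVELS.get(check.get("status"), "UNKNOWN")
    name = check.get("name", "unknown-check")
    output = str(check.get("output", "")).strip()
    return f"{level}: {client}/{name}: {output}"


def handle(stream):
    """Read one JSON event from a file-like object, as a handler would from STDIN."""
    return summarize_event(json.load(stream))


# Invented sample event standing in for what the server would pipe in.
sample = io.StringIO(json.dumps({
    "client": {"name": "web-01"},
    "check": {"name": "check_http", "status": 2,
              "output": "HTTP CRITICAL: connection refused\n"},
}))
print(handle(sample))
```

A real handler script would call `handle(sys.stdin)` and then send the message somewhere (email, chat, a pager) instead of printing it.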
You don’t have to discover new systems and you don’t have to add them to any other monitoring system; it hooks into config management. The config files specific to Sensu are all JSON.
Some examples: Because every Sensu client heartbeats via RabbitMQ to the server, if a client disappears for 180 seconds it generates an event that is sent to the default handler. One large company, however, handles that event by running a decommission script that removes the server from configuration management tools and from the other systems that keep track of it. Obviously, that’s intended for large cloud installations with lots of ephemeral, come-and-go servers.
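The keepalive logic is easy to picture. This is a sketch of the idea only, not Sensu’s implementation: given the last heartbeat time for each client, anything silent for 180 seconds or more gets flagged, and what happens next is up to whichever handler is wired in. The client names and timestamps are invented.

```python
KEEPALIVE_THRESHOLD = 180  # seconds of silence before a client is flagged


def stale_clients(last_seen, now, threshold=KEEPALIVE_THRESHOLD):
    """Return the names of clients whose last heartbeat is too old."""
    return sorted(
        name for name, seen_at in last_seen.items()
        if now - seen_at >= threshold
    )


def default_handler(client):
    """Hypothetical default action: just raise an alert."""
    return f"ALERT: no keepalive from {client}"


def decommission_handler(client):
    """Hypothetical alternative: treat silence as 'this instance is gone'."""
    return f"DECOMMISSION: removing {client} from config management"


# Invented heartbeat timestamps (seconds) and current time.
last_seen = {"web-01": 1000, "web-02": 1150, "db-01": 790}
now = 1200

for client in stale_clients(last_seen, now):
    print(decommission_handler(client))
```

Swapping `decommission_handler` for `default_handler` is the whole difference between “page someone” and “clean up after an autoscaled instance that went away.”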
18 month old project. Seems very malleable and able to morph with rapidly changing environments, or rather, make a static and rigid environment more flexible with the help of config management.
Talk #3: Linux Enablement for Calxeda ARM Servers
Rob Herring speaking. He works for Calxeda, an ARM processor licensee that targets server deployments. They are making an “EnergyCore” architecture: quad core Cortex based ARM processors, with a fifth core dedicated to IPMI, OOB management, etc. It includes an on-board fabric switch with 10Gbit links, and of course an IO controller is included. It’s all the functionality of a server on one SOC. No South Bridge, North Bridge, etc.
The amazing part is that they are making four-node cards, so four SOCs baked into one card that is about the size of a PCI card. The system board is essentially passive and just connects all the nodes together. The first generation processor is called the ECX-1000, AKA “Highbank”, and it has a 1.4GHz quad core Cortex-A9. The next level up is the ECX-2000, a Cortex-A15 with virtualization support for KVM and Xen.
There are already systems on the market that use massive numbers of these cards, including systems from HP and Boston Servers.
The Linux kernel has increasing ARM support, and major distros are starting to include support for it, but distro support might be the next hurdle. The hardware is useful for low power, highly distributed applications, but distribution support still needs to mature. UEFI, ACPI, and LPAE support need to grow on Linux for ARM. A demonstration included running OpenStack on a Calxeda box and spinning up a new VM.
Talk #4: Performance tuning as a web hosting provider
Mike McLane speaking. He works on the Genesis Team at GoDaddy, which performance-optimizes and tunes the services. GoDaddy started as web nodes attached to a NAS, then went to multiple web nodes attached to NASs with a hardware load balancer in front.
They use nodes with lots of RAM and local disk space, but relatively conservative CPUs. CentOS 6.current is used. Apache is 2.2.current, but they are moving to Apache 2.4.current. All customer content is served via NAS (NFS) rather than from local storage. GD is not using RHEL / CentOS 3 as is rumored. It’s 6.current.
They run mod_fcgid for FastCGI. mod_fastcgi was not ideal for scalable web hosting. They do run mod_rewrite, which should dispel the myth floating around the internet that GoDaddy doesn’t use mod_rewrite.
They use NAS because it was comparable to DAS / local storage for user experience, so they didn’t change anything.
GD decided to take a close look at performance, but they didn’t have quite the amount of data they would need to do a deep, scientific analysis of the large systems. They ran queries for the fastest loading pages, the slowest, the 95th percentile, the fastest 250 pages, etc., all measured across all customers, but it ended up being all over the place and just a bunch of numbers. It was very hard to draw firm conclusions from it.
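For anyone who hasn’t computed one, here is a small sketch of a nearest-rank percentile in Python. The load times are invented, not GoDaddy data. It also shows why numbers aggregated across every customer can be “all over the place”: one pathological site drags the mean and the 95th percentile far away from the median.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("no data")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Invented page load times in milliseconds; one site is badly broken.
load_times_ms = [120, 95, 110, 105, 130, 100, 115, 125, 90, 4000]

median = percentile(load_times_ms, 50)
p95 = percentile(load_times_ms, 95)
mean = sum(load_times_ms) / len(load_times_ms)
print(median, p95, mean)
```

With these numbers the median is 110 ms while the mean is 499 ms and the 95th percentile is the 4000 ms outlier, so which summary you pick changes the story completely.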
The problem was later found to be PHP request serialization: if two requests came in at the exact same time, two processes would be spun up, but the requests would be processed one after the other. It turns out there was a one second sleep in mod_fcgid that would be triggered to keep the box from being thrashed under certain circumstances. The mod_fcgid author was notified and he made a nice patch, but apparently it’s not in the stable branch at the moment.
That was a good first major win for tuning the GD stack. Then they implemented Graphite to gauge the performance of the machines across the board. They sometimes use ApacheBench to dump the stats to a CSV and then plot it out. Good ol’ iperf for bandwidth measurement.
It’s important to test performance from cold, warm, and hot starts if you have different levels of cache. Cache can skew things.
They noticed that certain Xeon processors had problems with C-states. Even if power savings mode was disabled in the BIOS, some CentOS kernels would have stability and performance issues surrounding power state options.
Talk #5: Logging Love w/ Rashid Khan, Elasticsearch developer
He worked at a newspaper where they logged directly to disk; he later created Kibana. If you store logs locally, you have to ssh into each server, or use a really bad for loop over ssh. Then you get smarter and set up a central syslog server. But then you end up making a massive grep/sed/awk spaghetti script. Log search scripts would break as things changed.
Then he started looking at commercial log analyzers, but they were punishing and hard. Enter: Logstash and Elasticsearch. Logstash is like an event pipe that works over the network. It takes things in through various inputs: amqp, eventlog, exec, irc, tcp, redis, twitter, udp, xmpp, etc.
Log files suck. Dates in particular. So many different types of date possibilities. Logstash can take in different date inputs and normalize them. Logstash can use filters like grep, geoip, metrics, and multiline. Then outputs can be things like circonus, cloudwatch, elasticsearch, and more and more and more. Rashid uses Elasticsearch to store the Logstash information. It’s schema free, clusters very well, speaks JSON over HTTP, and does search and aggregation.
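The date problem is worth a quick illustration. This is not how Logstash does it internally; it’s a plain Python sketch of the same idea: try a list of known formats and emit one normalized UTC timestamp. The format list and the choice to treat timestamps without an offset as UTC are assumptions for the example.

```python
from datetime import datetime, timezone

# Candidate formats, tried in order. This list is an illustrative assumption.
FORMATS = [
    "%d/%b/%Y:%H:%M:%S %z",  # Apache access log style: 10/May/2013:09:30:00 -0700
    "%Y-%m-%dT%H:%M:%S%z",   # ISO 8601 with a numeric offset
    "%Y-%m-%d %H:%M:%S",     # no offset at all; assumed to be UTC here
]


def normalize(raw):
    """Parse a timestamp in any known format and return it as UTC ISO 8601."""
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(raw, fmt)
        except ValueError:
            continue  # not this format, try the next one
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"unrecognized timestamp: {raw!r}")


for raw in ("10/May/2013:09:30:00 -0700",
            "2013-05-10T16:30:00+0000",
            "2013-05-10 16:30:00"):
    print(normalize(raw))
```

All three inputs describe the same instant, and all three come out as `2013-05-10T16:30:00Z`, which is what makes cross-server searching sane.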
Talk #6: Der.Hans talking about SSH and Bash tricks
Set your shell’s prompt to something informative. $PS1 is the variable. \u@\h is a minimal set of information that shows username and hostname. Sometimes you need the FQDN instead though. If you have your own account, just use your shell config files on the remote machines, or you can use SSH config files. However, SSH config files run before SSH launches your shell, so they run with the system default shell, not your user account’s shell.
If you can’t modify any of those files, you can call ssh with a script that sets up the remote environment on a temporary basis.
Obfuscate the host names in known_hosts (the HashKnownHosts option). If you put command= in front of a key in your authorized_keys file, whatever command you put after the equal sign is run when a connection to that machine is opened with that key.
Set PermitRootLogin no in sshd_config and never log in as root. You can pipe straight to ssh, so you can tar on one machine and pipe it over ssh to another. Fuse-sshfs can mount a remote filesystem as if it were local.
Parallel SSH! for i in $( cat hostnames.txt ); do ssh "$i" some_command; done. However, Shmux is better! Also, ClusterSSH, parallel SSH, Mussh, Massh. sslh runs on a port and listens for incoming connections, determines what is intended for HTTP and what is for SSH, and then sends the traffic on to the proper daemon.
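The for loop above actually runs hosts one at a time. For the flavor of what the parallel tools do, here is a rough Python sketch using a thread pool around the ssh binary. It is not how Shmux or the others are implemented, and the function names are mine. `BatchMode=yes` is a real ssh option that makes ssh fail instead of stopping to ask for a password.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_argv(host, command):
    """Build the ssh command line; BatchMode avoids hanging on a password prompt."""
    return ["ssh", "-o", "BatchMode=yes", host, command]


def run_ssh(host, command):
    """Run one command on one host and return (exit code, stdout)."""
    result = subprocess.run(ssh_argv(host, command),
                            capture_output=True, text=True)
    return result.returncode, result.stdout


def run_everywhere(hosts, command, runner=run_ssh, max_workers=10):
    """Run the command on all hosts concurrently; results are keyed by host."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(runner, host, command) for host in hosts}
        return {host: future.result() for host, future in futures.items()}


# Dry run with a stand-in runner so nothing is contacted; swap in run_ssh for real use.
def fake_runner(host, command):
    return 0, f"{host} would run: {command}"


for host, (code, out) in run_everywhere(["web-01", "web-02"], "uptime",
                                        runner=fake_runner).items():
    print(code, out)
```

To use it for real, read the host list from hostnames.txt and call `run_everywhere(hosts, "some_command")` with the default runner.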
SSH can perform local and remote tunnels, etc. etc. Check the -L and -R switches. Lots of examples, but I was paying attention and not writing.
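Since I didn’t capture the examples, here is a small sketch of my own showing the shape of the two tunnel types. It only builds and prints the command lines; the host names and ports are placeholders. Both switches take `port:host:hostport`, and `-N` tells ssh not to run a remote command, which is handy when you only want the tunnel.

```python
def local_forward(gateway, local_port, target_host, target_port):
    """ssh -L: listen on local_port on this machine, deliver to the target via the gateway."""
    return ["ssh", "-N", "-L",
            f"{local_port}:{target_host}:{target_port}", gateway]


def remote_forward(gateway, remote_port, target_host, target_port):
    """ssh -R: listen on remote_port on the gateway, deliver to the target from this machine."""
    return ["ssh", "-N", "-R",
            f"{remote_port}:{target_host}:{target_port}", gateway]


# Placeholder hosts: reach an internal web server through a bastion,
# and expose this machine's sshd on port 2222 of the bastion.
print(" ".join(local_forward("bastion.example.com", 8080, "intranet.example.com", 80)))
print(" ".join(remote_forward("bastion.example.com", 2222, "localhost", 22)))
```

With the first command running, browsing to localhost:8080 reaches intranet.example.com:80 as seen from the bastion. With the second, someone on the bastion can `ssh -p 2222 localhost` to get back to your machine.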
Talk #7: CentOS & Xen at GoDaddy
Mike Dorman speaking. A couple of years ago GoDaddy released a product called cloud servers. It turned out not to be such a great fit for GD’s target group. GD is more focused on the SMB that needs a simple website and a simple online store. The product has been EOL’d and was completely turned off last week.
The development group for that product has come back together and they’re revamping it into a private cloud platform that will be used internally. The vision is kind of like Google, where their infrastructure runs everything concerning the company, public and private. Maybe in the future GD will get away from the physical focus of dropping in more metal and be more cloud based.
To provision a new server, there are server build times of 10, 12, even 15 weeks because of “The Checklist” that five, six, or seven groups have to look at and approve. The idea is to have a cloud platform that makes it faster to spin up servers and develop internal products and projects.
One of the troubles with monitoring it all is that the dynamic creation and destruction of machines is hard with traditional tools. They’ve also been trying to decouple the idea of physical servers from services. Physical servers are like puppies. You name them, care about them, and nurse them back to health when they’re sick. Virtual servers are like cattle. They’re given numbers, they’re driven hard, and if they get sick you shoot them in the head.
GoDaddy went with CloudStack at the outset of their decision to go cloud. Clustering and HA functions gave them trouble. What they ran was essentially a forked version of CloudStack, with code that couldn’t be submitted back up to the core project for various reasons.
They looked at OpenStack, and they ran into the problem known in the OpenStack community as “clouds of clouds” or “pods of pods,” where you can’t really grow a single cloud too big. There were some management problems with it as well, where it was not operationally simple to deal with. There are also concerns about the OpenStack community: looking at the big players, there aren’t a lot of good contributions back into the core. A lot of the main users of OpenStack are essentially using forked versions of the project.
So, GD developed their own internal management stack and called it Norm, after the robot from some cartoon. It’s basically CloudStack, rebuilt into something very different.
So why not use XenServer as the hypervisor? It worked okay. You do have official Microsoft support if you want to run MSFT products in XenServer. However, it looks like Linux, but it can’t be treated like Linux. It has the network layer configured through a different configuration system. Management of XenServer couldn’t be done like a regular Linux machine.
Now they use stock CentOS and open source Xen. There is more of a community around CentOS; the XenServer community isn’t comparable in a support sense. Patch management is easier now, because XenServer patches couldn’t be applied through regular management systems like Spacewalk. They never found a way to update XenServers at scale.
So, now after trying all the major cloud stacks, they’re building from CentOS and Xen raw and real.