I think it’s important to keep you guys in the loop. I’ve spent the past 3 days making a lot of small fixes to improve reliability and speed.
Haproxy
I’ve upgraded haproxy to version 1.3.14.10 and split the haproxy config into two separate instances, one for the WWW connections and another for persistent connections to the jabber servers. This will make it easier to maintain the load-balanced WWW services without touching the Jabber processes.
Haproxy is also sitting on a server that otherwise only provides memcached, memqd, and mysql. This means that when something runs away (e.g. the multi-protocol server for MSN and AOL), it will not affect load balancing at all. The new server is also fast and reliable:
02:11:10 up 439 days, 17:51, 2 users, load average: 0.10, 0.10, 0.05
Stable Servers for Stable Services
I have moved the stable services, i.e. Jabber, the Static Image server, and the RPC servers, to machines that only run stable services. Again, this separates Jabber, RPC, and status serving from the other services and guards against a cascading failure (as we’ve seen in the past).
Currently, there are 2 servers that each run RPC / Clustered Jabber, and the status server. Haproxy load balances between these servers to make sure your request hits the fastest server available.
Cacti Monitoring of Diskspace
We do not yet have automatic notification when a partition fills up with logs, which is the problem responsible for 99% of the issues we have had (though it never seems to happen with the same log file twice). In the meantime, I have added a view to cacti to help us monitor the diskspace on all of our servers in one place.
I have added logrotate configurations for all the log files I have been able to find, and will use the cacti graphs to make sure the logs are being rotated in a safe manner. I have also adjusted the logrotate scripts to rotate based on file size, not time, to ensure that a copytruncate operation doesn’t fill up a partition when a log goes too long without rotation.
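To illustrate why size-based rotation is safer with copytruncate: the log is copied in full before the original is truncated, so the partition needs roughly the log's size in free space at rotation time. Below is a rough Python sketch of that check, not the actual logrotate setup. The threshold values and function names are assumptions made up for the example.

```python
import os
import shutil

# Hypothetical thresholds for illustration; tune them per partition.
MAX_LOG_BYTES = 100 * 1024 * 1024   # rotate once a log reaches 100 MiB
SAFETY_FACTOR = 1.5                 # want 1.5x the log size free before copying


def needs_rotation(size_bytes, max_bytes=MAX_LOG_BYTES):
    """Size-based trigger: rotate as soon as the log reaches max_bytes."""
    return size_bytes >= max_bytes


def has_copy_headroom(size_bytes, free_bytes, safety_factor=SAFETY_FACTOR):
    """copytruncate writes a full copy first, so that much space must be free."""
    return free_bytes >= size_bytes * safety_factor


def check_log(path):
    """Return (needs_rotation, has_headroom) for one log file on disk."""
    size = os.path.getsize(path)
    log_dir = os.path.dirname(os.path.abspath(path))
    free = shutil.disk_usage(log_dir).free
    return needs_rotation(size), has_copy_headroom(size, free)
```

Because the trigger is the file size rather than the calendar, the copy that copytruncate makes is bounded by the size limit, no matter how quickly a runaway process writes.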
Standardization of Hab.la website cookies
Some users have reported problems logging into the forum. Although we cannot recreate these problems, the only thing that would prevent a user from logging into the forum is a bad cookie or a cookie set on the wrong domain. In the past we have set cookies on .hab.la, .www.hab.la, and www.hab.la. Although most modern browsers can handle these 3 domains with no problem, from now on we will only set cookies with domain=www.hab.la.
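For anyone curious what a single-domain cookie looks like on the wire, here is a small Python sketch using the standard library's http.cookies module. The cookie name `session`, the value, and the helper name are made up for the example; only the domain comes from the change described above, and this is not the code the site actually runs.

```python
from http.cookies import SimpleCookie


def build_cookie_header(name, value, domain="www.hab.la"):
    """Build the value of a Set-Cookie header pinned to one domain."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["domain"] = domain   # the single standardized domain
    cookie[name]["path"] = "/"        # assumption: cookie applies site-wide
    return cookie[name].OutputString()


# Hypothetical cookie name and value, for illustration only.
header = build_cookie_header("session", "abc123")
print("Set-Cookie:", header)
```

Setting every cookie through one helper like this makes it hard to end up with the same cookie name living on several domains at once.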
Cleaner and more secure architecture
In the past our subnets and network configuration were pretty ad-hoc. I put in some time juggling services around so that all of the Hab.la services are now on nonroutable internal IPs. For example, in the past I was using firewall rules to keep outsiders from reaching SNMP or memcached. I have replaced all of those firewall rules with a much cleaner assignment of new IP addresses. Internally, all hab.la servers are now on 192.168.2.0/24. This is a lot cleaner than having iptables set up to accept packets from the same subnet, and it lets me do things like organize servers by type, e.g. 192.168.2.100 – 192.168.2.102 are RPC/Jabber/Status servers.
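The addressing scheme can be sketched in a few lines of Python with the standard ipaddress module. This is only an illustration: the role table and function names are hypothetical, and it assumes the RPC/Jabber/Status range ends at 192.168.2.102.

```python
import ipaddress

# The internal subnet mentioned above.
INTERNAL_NET = ipaddress.ip_network("192.168.2.0/24")

# Hypothetical role table; only the RPC/Jabber/Status range comes from the post,
# and its upper bound (192.168.2.102) is an assumption.
ROLE_RANGES = {
    "rpc-jabber-status": (
        ipaddress.ip_address("192.168.2.100"),
        ipaddress.ip_address("192.168.2.102"),
    ),
}


def is_internal(addr):
    """True if addr is a nonroutable address inside the internal subnet."""
    ip = ipaddress.ip_address(addr)
    return ip.is_private and ip in INTERNAL_NET


def role_for(addr):
    """Return the role whose address range contains addr, or None."""
    ip = ipaddress.ip_address(addr)
    for role, (first, last) in ROLE_RANGES.items():
        if first <= ip <= last:
            return role
    return None
```

With servers grouped into contiguous ranges, a service such as memcached only needs to listen on its internal address, and the role of a machine can be read straight from its IP.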
[NOTE: I still need to figure out a clean method of mirroring static.hab.la across multiple servers – but everything else is peachy]
That’s it for now. There is more work to be done, but you can rest assured that we are getting very close to the type of architecture you would expect us to have.