Linux is the most popular server platform due to its low TCO, stability, and security. However, Linux servers can face performance and security issues if not maintained properly. With regular security and performance optimization and emergency server administration, our Linux engineers deliver high server uptime and reliability. Bobcares Linux Systems Administrators provide 24/7 end-customer technical support and reactive/preventive server management that keeps your service top notch.
How we manage your Linux infrastructure
A typical day for a Linux system administrator involves a wide range of server management tasks. Here’s a summary of what we do to keep your infrastructure top notch.
24/7 emergency administration
Today’s resource-hungry applications need services not only to be available, but also to be highly responsive. With 24/7 monitoring and emergency administration, we ensure your Linux servers meet these performance standards.
Resource abuse issues
– Memory usage spikes – Such issues lead to increased page swapping, and thereby sluggish service performance. The particular applications and users causing the memory spike are identified through load monitoring tools, and memory leaks are resolved using debugging tools.
– I/O usage spikes – I/O spikes are the most common resource bottleneck, and are usually seen in mail and database servers. Mitigation actions include I/O throttling for individual users, application optimization, server settings tweaks, etc.
– CPU usage spikes – CPU-hungry servers such as web servers hosting interactive websites, media streaming servers, etc., sometimes cause CPU bottlenecks. Some common solutions are CPU throttling, implementing caching systems, building load balancers, etc.
– Bandwidth usage spikes – In media servers, bandwidth bottlenecks are caused by increased traffic. Resolution of such issues includes improving the compression systems, implementing load balancers, upgrading the network speed, etc.
– Web server crashes – Issues like module upgrade incompatibility, incompatible new security rules, configuration update errors, file system errors, etc., lead to web server crashes. Resolution includes log analysis, isolating the affected server from the cluster, methodical issue resolution, stage-by-stage service restoration, etc.
– Database server crashes – Database table corruption, lost connectivity between master and slave, upgrade issues, configuration update errors, file system errors, etc., lead to database server crashes. Resolution includes quick isolation of the affected server from the cluster, repairing the corrupted tables, restoring the server files, syncing the data, etc.
– DNS server crashes – Configuration errors, zone data sync errors, upgrade errors and file system errors are the most common causes of DNS server crashes. Resolution includes methodical identification of corrupt rules, re-syncing zone files, running configuration checks, re-linking the server to the DNS server cluster, etc.
– Mail server crashes – Upgrade errors, rule update errors, permission errors and file system errors lead to mail server crashes. Resolution includes de-linking the server from the mail cluster, saving the mail queue data, methodical resolution of the error, sending queued mails, re-linking to the mail server cluster, etc.
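As a sketch of how the resource-abuse triage above begins, the heaviest memory consumers can be pulled from process listings. The function name and thresholds here are illustrative, not our exact production tooling.

```shell
#!/bin/sh
# Illustrative helper: report the heaviest memory consumers from
# `ps aux`-style output supplied on stdin.
top_mem_procs() {
    # $1 = how many processes to report (default 3).
    # Sorts by the %MEM column (field 4) and prints the command
    # name (field 11) with its memory share.
    n="${1:-3}"
    sort -rn -k4,4 | awk -v n="$n" 'NR <= n { print $11, $4 "%" }'
}

# On a live server:
#   ps aux --no-headers | top_mem_procs 5
```

From there, the offending application or user is traced with debugging tools before any throttling is applied.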
Data center issues
– Hardware failures – Hardware components like RAID, memory, motherboard, power unit, hard disk, network card, etc., are prone to service deterioration or failure. Quick reaction and coordination with the data center minimizes downtime and saves customer data.
– Network lag – Services might be unreachable or delayed for customers if the data center network has issues. Some common issues are routing tables getting corrupted, upstream bandwidth providers going down, etc. Such issues are resolved by arranging an alternate network route in coordination with the data center.
– Listing in RBL/DNSBL – If the server IP is listed in RBLs, the immediate resolution is to block any ongoing bulk-mail campaign and then change the interface IP. A more detailed investigation is done based on log analysis and mail header analysis. Tools like Wireshark are used to locate any outbound attack. Identified vulnerabilities are patched immediately, and firewall rules are updated to prevent future exploits.
– DoS/DDoS attacks – The quick reaction to DoS/DDoS attacks is to deploy custom scripts that automatically block attacking IPs. Additionally, specialized software like mod_evasive, Nginx reverse proxies, etc. are used to block future attacks.
– Brute force attacks – Immediate mitigation of brute force attacks is done using custom scripts that block attacking IPs based on connection frequency and authentication failures. Specialized software such as LFD, BFD, etc. is used to dynamically block such attacks in the future. Additionally, the ports of administrative services like SSH are changed to negate attacks from automated bots.
– Spamming – Large volume inbound spamming can lead to sluggish mail services. Mitigation is done by adding custom rules to the anti-spam filter to block particular IP ranges or detect specific patterns in the connection string. Outbound spamming is caused by malware in websites, or abuse of mail accounts. Emergency response includes clearing the spam queue, changing the interface IP, quarantining the user, updating security rules in web application firewall, etc.
– SEO issues – Website blacklisting in search engines is usually caused by malware uploaded by hackers. Hack recovery includes cleaning the malware code, restoring specific files from backup, re-scanning the website using specialized security tools, assisting in search engine review, etc.
– Performance degradation – Optimal service levels are maintained by monitoring performance parameters like the number of slow queries, replication delays between master-slave servers, thread counts of web applications, average response times, etc. Root causes are traced to issues like hardware degradation, network lag, application bugs, resource bottlenecks, etc. Resolution includes freeing up resources from abusive connections, coordinating with the data center to replace hardware, changing load-balancing settings to reduce traffic, etc.
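The custom IP-blocking scripts used against DoS floods and brute force attacks can be sketched as below; the function name, threshold, and the netstat-based wiring are illustrative assumptions, not a fixed implementation.

```shell
#!/bin/sh
# Illustrative sketch: find remote IPs holding an abnormal number of
# concurrent connections, from `netstat -ntu`-style lines on stdin.
flooding_ips() {
    # $1 = connection-count threshold. Prints each remote IP whose
    # concurrent connection count meets or exceeds the threshold.
    limit="$1"
    awk '{ print $5 }' | cut -d: -f1 \
        | sort | uniq -c \
        | awk -v lim="$limit" '$1 >= lim { print $2 }'
}

# On a live server, the output could feed a block list, e.g.:
#   netstat -ntu | tail -n +3 | flooding_ips 100 \
#     | while read ip; do iptables -I INPUT -s "$ip" -j DROP; done
```

In practice such scripts run from cron or a daemon, and tools like LFD/BFD take over the dynamic blocking long-term.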
Patch and upgrade management
Timely patching and server upgrades ensure security and a better user experience, but they can lead to service downtime if not managed properly. It's our priority to keep your servers updated while ensuring zero downtime.
Applying emergency patches
– We receive priority alerts from various security organizations about emerging security threats. Such threats are mitigated in two stages. As soon as the vulnerability is disclosed, access restriction or feature quarantining is done to block an attack, and as soon as a patch is available, the server is updated to nullify any hack attempts.
Automatic security updates
– Updates critical to security such as kernel patches and system software patches are automated as far as possible using tools like KSplice, KernelCare, yum-plugin-security, etc.
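As one hedged illustration of such automation on a yum-based system, a scheduled job can apply security-only errata; the schedule and paths below are examples, not a fixed standard.

```shell
# Illustrative /etc/cron.d entry: apply security-only updates nightly.
# Requires the yum-plugin-security package on older CentOS/RHEL releases
# (the --security flag is built into newer yum/dnf versions).
0 3 * * * root /usr/bin/yum --security -y update >> /var/log/yum-security.log 2>&1
```

Kernel patches that cannot wait for a reboot window are handled separately through live-patching tools like KSplice or KernelCare.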
Software updates from inhouse development
– Following established change management procedures, new updates to applications or websites are released. Versioning systems such as CVS, SVN, Git, etc. are used to control the deployment and roll-back procedures.
Application upgrade using source
– Some applications might be compiled from source and would be customized for that particular server environment. Such applications are upgraded using the latest source files obtained from their code base.
In-house package repositories
– To minimize bandwidth usage, and to ensure the availability of only sanitized software packages, in-house software repositories are set up. Custom packages are created from source (using spec files, control files, etc.) to meet specific requirements. Some tools used for this are yum, createrepo, apt-mirror, rsync, rpmbuild, bzr, etc.
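On the client side, servers are then pointed at the repository with a yum configuration fragment like the sketch below; the repository name, hostname, and key path are illustrative.

```ini
; Illustrative /etc/yum.repos.d/inhouse.repo fragment
[inhouse]
name=In-house sanitized packages
baseurl=http://repo.example.internal/centos/$releasever/x86_64/
enabled=1
gpgcheck=1
gpgkey=http://repo.example.internal/RPM-GPG-KEY-inhouse
```

Signing packages and enforcing gpgcheck ensures only sanitized builds reach production servers.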
Server software upgrades
– Application server software like DNS servers, web servers, LDAP servers, etc. are upgraded through a change management process where the updates are tested in a sandbox first, and then implemented in the production environment in stages. Dependency resolution, regression testing, transaction testing, etc. are carried out to make sure no customer functions are affected.
System software upgrades
– System software needs to be upgraded regularly to keep pace with technology. Some distributions allow seamless upgrades, while others need migration to a new server. Such system upgrades are planned and executed ensuring minimal downtime for critical services.
Monitoring system customization
Fine grained monitoring is needed in today’s application servers to continually improve quality of service. We customize the monitoring systems to record events as varied as web server load, backup timeliness, DNS query responsiveness, etc.
Monitoring systems, technologies and utilities
commonly used include Zabbix, Nagios, MRTG, Big Brother, Xymon, Cacti, Big Sister, Ground Work, Icinga, Graphite, Ganglia, Zenoss, Munin, Monit, SNMP, Top, Atop, NetHogs, Auditd, Psacct, NtopNG, IPTraf, VMstat, IOStat, etc.
Monitoring system design
– Defining the full list of performance parameters, security requirements, and other specific needs, e.g., server uptime, web server response time, pending mail queue size, user logins, web page file size changes, etc.
– Defining the requirements for fault-tolerant monitoring (distributed monitoring), notification priorities for categories of alerts, notification mechanisms (mail, instant messenger, SMS, sound notification, browser pop-up, pager, smart phone app, etc.), and escalation policies (on call techs, senior techs, data center ops, managers, everyone, etc.).
– Identifying the monitoring system that meets the requirements. Some of the common systems are Zabbix, Nagios, Big Brother, Xymon, Cacti, Big Sister, Ground Work, Icinga, Graphite, Ganglia, Zenoss, Munin, Monit, etc.
– Identifying the tools required on the servers to collect data for unsupported monitoring requirements.
– Defining the data retention period, display formats, and the audience the data is presented to.
Monitoring system implementation
– Installing the chosen monitoring system and its plugins in standalone or multi-server configuration as per design.
– Installing agents in servers to be monitored, and enabling supporting services like SNMP.
– Creating custom software in Bash, Perl, Python or C to cover requirements not supported by the monitoring system. These are created either as plugins or as API calls that the monitoring system can parse. Some of the common tools used in such software are top, atop, nethogs, auditd, psacct, iptraf, vmstat, free, iostat, /proc data, netstat, etc.
– Testing the accuracy of the monitoring system by comparing data seen on the server with the monitoring system data, and making corrections as needed.
– Optimizing monitoring settings and server resource allocation to make sure the monitoring system doesn't hog server resources.
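A minimal sketch of such a custom plugin, following the common Nagios/Icinga exit-code convention (0=OK, 1=WARNING, 2=CRITICAL); the metric, thresholds, and the mailq wiring are examples, not a fixed check.

```shell
#!/bin/sh
# Illustrative custom monitoring plugin: check a mail queue size
# against warning/critical thresholds.
check_mail_queue() {
    # $1 = current queue size, $2 = warning threshold, $3 = critical threshold
    size="$1"; warn="$2"; crit="$3"
    if [ "$size" -ge "$crit" ]; then
        echo "CRITICAL - mail queue size $size"; return 2
    elif [ "$size" -ge "$warn" ]; then
        echo "WARNING - mail queue size $size"; return 1
    fi
    echo "OK - mail queue size $size"; return 0
}

# On a live Postfix server the size might come from, e.g.:
#   check_mail_queue "$(mailq | tail -1 | awk '{print $5}')" 500 2000
```

The monitoring system parses the exit code for alerting and the message for display.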
Monitoring system maintenance
– Installing new monitoring system modules as needed, e.g., waiting database queries, RAID status monitors, license expiry checks, etc.
– Installing new features for the monitoring system for improved reporting, security, or accessibility, e.g., service status displays for customers, responsive interfaces for mobile access, etc.
– Developing custom software using Bash/Perl/Python/C for new monitoring requirements not covered by plugins on the market.
– Resolving connection breaks between monitoring system and server pool.
– Resolving monitoring agent crashes due to various reasons like server environment changes, process priority changes, etc.
– Upgrading the monitoring system and associated modules to implement new features and improve security.
Backup management
Backups are your safety net. Through data recovery tests, backup process fault monitoring, and continual improvement of backup systems, we ensure that you always have a reliable system and data backup to restore at a moment’s notice.
Commonly used technologies
in backup and versioning systems include Amanda, Bacula, Barman, MySQLDump, Tar, RSync, FreeNAS, Mondo Rescue, Partimage, SVN, CVS, Git, Puppet, etc.
Backup system design
– Designing a backup policy that defines the extent of backup (system files, user files, databases, etc.), periodicity of backup (continuous, hourly, daily, weekly, monthly), data retention requirements (1 week to N years), off-site storage requirements, ease of restore, etc.
– Selecting hardware to meet the requirements for reliability and fast restores, e.g., SATA II, RAID 10, NAS, SAN, tape, etc.
– Selecting software systems to meet the architecture requirements, e.g., FreeNAS, Bacula, RSync, SVN, etc.
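As a sketch of how a retention rule like "keep the last 7 daily backups" can be enforced, the helper below lists backups that fall outside the window; names, date format, and the wiring are illustrative assumptions.

```shell
#!/bin/sh
# Illustrative retention helper: reads backup names carrying sortable
# date stamps (YYYY-MM-DD) on stdin and prints the ones to prune.
prune_candidates() {
    # $1 = number of most recent backups to keep (default 7).
    keep="${1:-7}"
    sort -r | awk -v keep="$keep" 'NR > keep { print }'
}

# Example wiring (paths are illustrative):
#   ls /backup/daily | prune_candidates 7 \
#     | xargs -r -I{} rm -rf /backup/daily/{}
```

Real backup suites like Bacula or Amanda implement retention natively; a helper like this suits simple rsync/tar rotations.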
Backup system implementation
– Setting up the servers in coordination with the data center.
– Testing the disk and network access speeds using stress, smartctl, iperf, etc. to confirm they work as per specifications.
– Installing and configuring the selected software to realize the backup policy.
– Testing the backup system with test data for backup and restoration performance.
– Activating the backup policy and linking to the monitoring system to log performance.
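Restore testing like the above can be partly automated with checksum comparison; the function below is a minimal sketch, with fixture paths and naming chosen for illustration.

```shell
#!/bin/sh
# Illustrative restore check: compare checksums of an original file
# and its restored copy, printing MATCH or MISMATCH.
verify_restore() {
    # $1 = original file, $2 = restored file
    a=$(md5sum "$1" | awk '{print $1}')
    b=$(md5sum "$2" | awk '{print $1}')
    [ "$a" = "$b" ] && echo MATCH || echo MISMATCH
}
```

Scheduled random-file restore tests wired to the monitoring system catch silent backup corruption early.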
Backup system maintenance
– Based on monitoring system alerts and log analysis, fixing issues like abnormal termination of backup process, backup process causing load spikes, disk space outage issues, disk read/write errors, degraded network speed causing backup transfer delay, remote server connection/access issues, etc.
– Running file size tests and random restore tests to confirm all backups are retrievable, and to measure the speed of restoration.
– Modifying the backup and versioning systems for faster backups, faster restores and adding new data to be backed up.
– Access management to versioning systems such as SVN and CVS.
– Performing restores of specific files, specific folders or the whole server as needed.
– Upgrading the backup software to implement new features and to maintain security.
– Upgrading server hardware by coordinating with data center if monitoring reveals degraded performance.
Security administration
Security of a Linux server can be guaranteed only through multi-layered defenses, reinforced through frequent testing and constant updates. Given below are a few of the common security administration procedures.
Common technologies used in security administration
include Endian, pfSense, IPFire, Shorewall, CSF, APF, mod_security, ComodoWAF, IronBee, mod_evasive, Suhosin, SuPHP, magic-smtpd, ASSP, DSpam, SpamAssassin, CHKRootkit, RootKit Hunter, KSplice, KernelCare, ClamAV, SaneSecurity, and more.
Implementing multi-layer security
– Setting up dedicated firewalls like Endian, pfSense, IPFire, etc. to implement access policies from a central location.
– Installing and optimizing firewalls and firewall management tools, like iptables, firewall-cmd, Shorewall, CSF, APF, etc.
– Custom configuring web application firewalls like mod_security, ComodoWAF, IronBee, etc.
– Hardening web server settings and installing modules to prevent exploits and DDoS. For eg., mod_evasive, Nginx tuning, OpenLiteSpeed tuning, Apache tuning, etc.
– DNS service hardening. Commonly supported servers include BIND, TinyDNS, etc.
– FTP server hardening. Commonly administered servers include ProFTPd, Pure-FTPd, VSFTPd, etc.
– PHP hardening to prevent hackers from executing malware. Some common solutions used are Suhosin, SuPHP, disabling vulnerable PHP functions, hardening settings, etc.
– Mail server hardening using advanced ACLs to prevent incoming and outgoing spamming. Supported software include Exim, Postfix, Sendmail, Qmail, magic-smtpd, etc.
– Denial of Service attack protection using solutions like sysctl hardening, DDoS-Deflate, LFD, BFD, etc.
– Access protection through restricting IPs, using custom ports, disabling direct root login, and installing login alert scripts.
– Implementing strong password policies in Web, Mail, cPanel and FTP services.
– Securing access to common services through wildcard SSL certificates and strong ciphers.
– Implementing brute force protection using solutions like BFD, LFD, non-standard service ports, etc.
– Protecting the server from port scanning using tools like Port Sentry, LFD, etc. Port scanning is usually a precursor to a hack attempt.
– Hardening temporary directories by securing the file system, e.g., mounting them with noexec and nosuid options.
– Configuring rootkit detection tools like CHKRootkit and RootKit Hunter.
– Implementing server software auto-upgrades for security patches using tools like yum-plugin-security.
– Implementing kernel auto-updates using tools like KSplice, KernelCare, etc. to prevent zero-day hacks.
– Installing file upload scanners (like LMD, CXS) and fortifying them using ClamAV beefed up with third party rules like SaneSecurity.
– Preventing unauthorized modification of server binaries using LES.
– Securing the backup process, which includes restricting root access to the central backup location, auto-integrity checking, etc. This ensures that backups will work when you need them.
– Implementing central WAF or anti-spam gateways as required. Commonly used software includes ASSP, Magic-SMTPd, DSpam, SpamAssassin, ClamAV, etc.
– Implementing two-factor authentication for administrator logins as required.
– Configuring server access notifications using custom scripts in Bash, Perl, Python or C.
– Implementing notifications for system binary changes using custom plugins for monitoring systems such as Zabbix, Nagios, Icinga, etc.
– Setting up monitoring for suspicious processes. For eg., long running processes, processes that are bound on non-standard ports, etc.
– Monitoring RBLs to get notified if the server IP gets listed for any reason.
– Configuring auditing and logging systems to provide trace data for hack investigations. Tools used for this include Auditd, PsAcct, etc.
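Several of the access-protection measures above (custom ports, disabling direct root login, brute force mitigation) are typically expressed in sshd_config; the values below are an illustrative sketch, not a fixed policy.

```
# /etc/ssh/sshd_config excerpt -- values are examples, not a fixed policy
# Non-standard port to sidestep automated bots
Port 2222
# Force logins through unprivileged accounts
PermitRootLogin no
# Key-based authentication only, blunting brute force guessing
PasswordAuthentication no
# Limit authentication attempts per connection
MaxAuthTries 3
```

Changes like these are rolled out carefully, keeping an existing session open until the new settings are verified, so administrators are never locked out.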
Regular security auditing
– Periodic top-down security audits are done to check the following, and corrective actions are taken:
– If all defenses are working as they are supposed to.
– If logs point to any system vulnerability.
– If all system software and kernel are up-to-date.
– If any password is past its expiry date.
– If all firewall rules are up-to-date.
– If all anti-virus and web application firewall rules are updated.
– If all server accesses are valid and authorized.
– If all security settings modifications were as per authorized process.
– If the organization is subscribed to all security news outlets.
– If backups are working fine.
– If disaster recovery plans are working fine.
Optimization and high availability
By implementing and maintaining software and hardware solutions such as load balancing, clustering, software optimization, SAN, RAID, etc., we ensure that applications get the resources they need to run smoothly and reliably.
Commonly used technologies
include HAProxy, Nginx, Nginx Plus, Apache mod_proxy_balancer, Apache mod_jk, PgPool, MySQL NDB, MySQL Proxy, HeartBeat, Pacemaker, Ldirectord, RSync, GlusterFS, DRBD, Memcached, Page Speed, Varnish, XCache, APC, MX clusters, SAN, NAS, RAID 10, Cloud Linux, and more.
Resource limits configuration
– Resource limits are placed on processes and users so that a misconfigured application cannot bring down the server. Some solutions used are ulimits, CloudLinux limits (IOPS, IO, NPROC, VMEM, PMEM, CPU, NCPU, etc.), Apache RLimit settings, etc.
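A minimal sketch of per-user limits via pam_limits; the user name and limit values below are illustrative, and real limits are sized against the application's normal footprint.

```
# /etc/security/limits.conf excerpt -- limits shown are illustrative
# Cap process count and address space for an untrusted application user
# so a runaway process cannot exhaust the server.
appuser  hard  nproc   100
# "as" is address space in KB; 524288 KB = 512 MB
appuser  hard  as      524288
```

On shared hosting platforms, CloudLinux LVE limits achieve the same isolation with finer-grained controls (IO, IOPS, CPU share, etc.).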
Service configuration optimization
– Web server – Settings are fine-tuned based on server traffic. A few common settings are KeepAliveTimeout, MaxRequestsPerChild, MinSpareServers, etc. in Apache; worker_processes, worker_connections, static file caching, etc. in Nginx; Max Connections, Smart Keep-Alive, Connection Timeout, etc. in LiteSpeed. Architecture changes like installing an Nginx reverse proxy in front of Apache are made to improve performance under high concurrent connections. Tools like ApacheBench, JMeter, etc. are used to identify poorly configured settings.
– Database server – Settings like max user connections, key buffer size, sort buffer size, etc. are optimized to ensure fast query execution. Tools like MySQLTuner, MTop, etc. are used to locate areas for improvement.
– Mail server – Parameters like concurrent connections, connections per IP, etc. are tweaked in mail servers to ensure fast delivery.
– DNS server – Name server settings like max-transfer-time-in, max-refresh-time, recursive-clients, etc. are tuned to limit resource wastage and deliver faster responses.
– Firewall – The firewall is optimized by clearing out unused rules, sorting the rules by hit frequency, tuning auto-block scripts, etc.
– Caching – Resource usage is reduced and response times are improved using caching systems like Memcached, XCache, APC, eAccelerator, Varnish, Page Speed, etc.
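As a sketch of the Apache tuning mentioned above, a prefork MPM fragment might look like the following; these values are only a starting point and must be sized against measured traffic (e.g., with ApacheBench).

```
# Apache prefork MPM excerpt -- values are illustrative starting points
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          150
    MaxRequestsPerChild 10000
</IfModule>
KeepAlive            On
KeepAliveTimeout     5
```

Too high a MaxClients invites memory exhaustion under load; too low leaves connections queuing, which is why load testing drives the final numbers.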
Load balancing, fail-over and clustering
– Web server clustering or load balancing – For high traffic web services, load balancing solutions like Apache mod_proxy_balancer, Apache mod_jk, HAProxy, Nginx Plus, etc. are used to distribute traffic over nodes in a cluster of servers. Systems like Rsync, DRBD, GlusterFS, etc. are commonly used for data syncing. To configure fail-over systems, we’ve found solutions like HeartBeat, Pacemaker, Ldirectord, etc. to be useful.
– Database clustering or load balancing – For high availability and responsiveness, solutions such as MySQL NDB, Replication, MySQL Proxy, HAProxy, HeartBeat, PgPool, etc. are used.
– Mail server load balancing – Load balancing for SMTP servers is implemented using multi-MX configurations.
– DNS load balancing – DNS load balancing is usually done with optimized master-slave configuration for systems such as BIND, TinyDNS, etc.
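A minimal HAProxy round-robin sketch for the web clustering scenario above; the backend names, addresses, and health-check path are examples, not a deployed configuration.

```
# haproxy.cfg excerpt -- a minimal round-robin HTTP balancer sketch
frontend web_in
    mode http
    bind *:80
    default_backend web_nodes

backend web_nodes
    mode http
    balance roundrobin
    # Remove unhealthy nodes automatically (path is an example)
    option httpchk GET /health
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check
```

Session persistence (cookie-based stickiness) or a shared session store like Memcached is added when the application is not stateless.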
We aid decision making and successful deployment of new systems by evaluating the benefits of different hardware technologies, far-sighted capacity planning, and seamless integration of new servers into the current infrastructure.
Commonly supported Linux software
include CentOS, Red Hat, OpenSuse, Ubuntu, FreeBSD, Gentoo, Vyatta, Asterisk, Endian, OpenVPN, PPTP, Shorewall, Squid, Puppet, NFS, LDAP, LVS, etc.
Infrastructure design
includes the following:
– Choosing a data center based on factors like ease of scalability, network reliability (based on peering, reputation, bandwidth provider stability, etc.), proximity to primary user base, infrastructure capabilities (cooling efficiency, power backups, etc.), availability of online service portals, support responsiveness, etc.
– Designing the network topology based on considerations such as load balancing, HTTP/SMTP/Security gateways, local/off-site backup systems, geographically redundant DNS, VPN systems, intranet vs public-facing systems, private/public bandwidth, central configuration server, IPMI, etc.
– Choosing the hardware based on reliability and performance requirements which includes evaluation of technologies such as RAID, SAS, SATA, SSD, DDR4, multi-core CPU, hardware firewalls, etc.
Installation and integration
involves the following:
– Co-ordinating with data center to ensure proper deployment of hardware.
– Stress testing the hardware using tools like iperf, stress, smartctl, netio, etc., to see if the hardware can perform as per the network architecture.
– Installing operating systems and application software as per requirements. Common distributions installed are CentOS, Red Hat, OpenSuse, Ubuntu, FreeBSD, Gentoo, Vyatta, Endian, Asterisk, etc. Application software includes OpenVPN, PPTP, Shorewall, Squid, NFS, LDAP, LVS, etc.
– Integrating the server into the existing infrastructure by linking it to the monitoring system, linking it to the central configuration system (like Puppet), setting up network routing, setting up sync with the central DNS server, linking it to the backup server, etc.
Web server management
Every webmaster loves to have the latest web technology for their websites. We enable you to power the best websites out there by keeping your web server feature-rich and streamlined to effortlessly support a growing user base.
Supported technologies include
Apache HTTP server, Nginx, OpenLiteSpeed, Litespeed, Lighttpd, tHTTPd, Apache TomCat, HAProxy, Red5, FFMPEG, Ampache, mod_security, mod_qos, mod_mono, FastCGI, WebDav, SuPHP, Suhosin, ComodoWAF, Memcached, XCache, APC, Varnish, PageSpeed, etc.
Web server setup
– Based on availability and speed requirements, appropriate technologies like RAID 10, SAS, SSD, etc. are chosen for the hardware.
– For high availability, technologies such as HAProxy, Nginx, DRBD, RSync, GlusterFS, etc. are used.
– For specialized requirements such as image server, media streaming, etc., solutions like tHTTPd, Red5, FFMPEG, Apache Tomcat, mod_mono, Lucene, etc., are employed.
– For fast access speeds, caching techniques such as Memcached, XCache, APC, Varnish, PageSpeed, etc., are used.
– For web security, solutions like mod_security, ClamAV, SuPHP, Suhosin, etc. are used.
Web server maintenance
– Troubleshooting performance issues caused by specific web applications. Memory maps, process tracing, and log analysis determine the scripts/programs causing the issue.
– Investigating security issues, and writing custom security rules to block attacks, e.g., creating custom mod_security rules to block comment spamming in web forms.
– Assisting with the installation/upgrade of specific web applications that need customized redirection rules, specific permission sets, or optimized connection settings to the database and mail servers.
– Mitigating DoS or DDoS attacks using specialized modules like mod_evasive and through custom scripts that block IPs dynamically.
– Optimizing web analyzer programs to avoid load spikes in the server.
– Regular performance testing using tools like ApacheBench, JMeter, HttPerf, etc., and adjusting the server settings for fast performance.
– Continual tuning of caching systems like Varnish, Memcached, etc. to accommodate changed feature sets and a growing user base.
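As a hedged sketch of the custom rules mentioned above for blocking comment spam, a ModSecurity (v2) rule might look like this; the rule id, form field name, and spam pattern are examples tuned per application, not a fixed ruleset.

```
# mod_security (v2) custom rule sketch -- id, field, and pattern are examples
SecRule ARGS:comment "@rx (?i:cheap\s+pills|casino\s+bonus)" \
    "id:1000001,phase:2,deny,status:403,log,msg:'Comment spam pattern'"
```

Such rules are first run in detection-only mode and checked against the audit log before enforcement, to avoid blocking legitimate users.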
Web server upgrades
– Hardware upgrades include adding new servers to the cluster, adding extra resources, or replacing aging components such as RAM, CPU, HDD, network cards, etc.
– Software upgrades include upgrading the server software, upgrading specific modules, or introducing new rule sets for security modules.
Database server management
The best online services are powered by reliable and fast database systems. We design, build and maintain stand-alone or clustered database systems of all types that meet the stringent performance criteria for world-class applications.
Supported technologies include
MySQL, PostgreSQL, MariaDB, MongoDB, Redis, SQLite, etc.
Database server design
– We help you choose the ideal hardware configuration and database platform.
– Replication, clustering or load balancing solutions are considered for heavy traffic databases that need easy scalability.
– Hardware solutions like RAID 10, SAN or NAS are considered for highly reliable databases.
Database system implementation
– Database server installations are done with careful capacity planning. Stress testing is done to find the limit of transactions the server can take before needing upgrades. The server configuration is adjusted accordingly before deploying it into the production environment. Monitoring systems are integrated, and fault tolerance systems are set up to keep the system reliable.