Or why using Command and Control (C&C) server ban lists will not help with security.
If the malware botnet concept is a bit blurry to you, this article will surely clarify things. Without further ado, here are 11 frequently asked questions about malware botnets:
1. What is a malware bot?
A malware bot is a type of malware that, upon infection, exercises control over the infected machine. However, unlike an ordinary type of malware, which is autonomous and once released in the wild requires no further input from its creators, the malware bot receives instructions from a master, and acts accordingly. The ability and necessity to communicate with the remote server is what makes malware a bot.
2. What kind of actions can a malware bot perform?
A malware bot is capable of any actions that could be performed on a machine, from Web browsing to Bitcoin mining. However, it is typically used for a certain class of actions, such as:
- Stealing information (credentials, documents, visited web sites, communications);
- Spying and tracking (capturing webcam shots, keylogging);
- Hijacking (replicating user actions on a machine, e.g. accessing corporate resources);
- Performing malicious Internet activity (sending spam, hosting command servers or phishing pages, working as proxies, infecting other machines, etc.).
There are, however, two types of malware actions the malware bot does not typically perform:
- Actions revealing the infection. The bot's ability to run on the machine would be damaged if the user knew their machine was infected. Therefore, the bot tries to be stealthy, and does not perform actions that would help the user identify the infection on their machine. So actions such as modifying browser settings or popping up dialogues are not typical for a malware bot.
- Actions that threaten the health of the machine. The malware bot needs a working environment to function properly; if the environment is damaged, the machine is typically reinstalled, wiping the bot out. Therefore, bots do not usually perform any destructive actions that would hinder their ability to run on a machine.
These actions are avoided because they make little business sense, not because the bot is incapable of performing them. On the contrary, malware bots are capable of such actions, and some bots have been used to perform them before.
3. What is a botnet?
The term botnet describes multiple computers infected by a similar malware bot, all controlled by the same operator. For the purpose of defining a botnet, it doesn't matter how exactly these computers are controlled, as long as the control is performed by the same operator or group. The number of infected computers varies, and can range from tens to tens of thousands or even more.
It is not necessary for those computers to run exactly the same malware sample, or even use the same communication servers or channels. As long as those bots are controlled by a single operator (which may be a person or a group), and can act together, they are considered a single botnet.
Unlike, for example, a corporate network, a botnet composition is not stable, but fluid. New computers get infected and “join” the botnet, while other computers “leave” the botnet permanently once the malware is detected and removed.
4. How does a botnet emerge?
First, the malware writer creates a special program, which will serve as the communication point for the bots. This program runs on dedicated machines, and is typically never delivered to the users; it is used solely to control the bots. This is described in detail below.
Second, the same or another malware writer creates the part of the bot meant to communicate with the program described above. This code is then merged with different malware payloads, depending on the distribution scenario. For example, if the malware bot is distributed as a legitimate application, the malicious payload may be added into the legitimate application installation package.
Then, once the malware is ready, it is distributed in the wild through spam, phishing, exploits or drive-by-downloads through infected sites. As soon as it infects the user machine, the bot instance starts, connects to the communication point and the machine becomes part of the botnet.
An important note is that many of the distributed bots do not make it to the end users, or do not infect the machine. Some users will not open the malicious attachment or click on the malicious link. Other users have a good security solution (like Bitdefender) whose proactive protection will catch the majority of new malware. And even the reactive antivirus products will catch up with the detection soon, once it is made public. Therefore, a malware writer would have to keep changing their bot to let it slip through at least the types of products mentioned above. Those changes include more work related to the creation and distribution processes.
To summarize the above, malware writers put a lot of effort into creating a botnet, and they do not want to let it go to waste. This affects the communication choices described below.
5. Why do botnets emerge?
As stated above, developing, deploying and maintaining a botnet requires a lot of effort. So why are the malware writers willing to put this kind of effort into botnets?
The main goal is financial gain. Once the botnet is established and controlled, the malware writers have a number of ways to capitalize on it. The most typical way is to rent the botnet for a fee to other criminals, who would then use it for malicious actions such as:
- Coordinated DDoS attacks, in which the infected machines try to flood the target Web server with a large number of requests, bringing it down. As a result, legitimate users cannot access the server. This could be done for political reasons (for example, to silence certain speech) or for ransom, when the criminals demand a payment to “restore” the normal operation of the target server.
- Hiding behind the user while performing criminal activity. When the user's machine is used as a proxy, the criminal can commit online crimes while remaining undetected. This happens because the user is seen as the source of those actions, and they cannot point to the criminal. The actions in this case may include anything from:
Those actions may also seriously hurt the user, as law enforcement may assume that the user initiated them, so the user may face arrest or even prosecution.
The botnet in these cases is used to go through a chain of computers, often spread throughout different countries, to lower the criminal’s chance of being detected.
- Infecting other machines. For example, a botnet may be used with another malware based on a new exploit, in an attempt to infect other machines. In this case, using a botnet amplifies the impact, allowing the criminal to simultaneously attempt to infect many more machines than if they were using regular distribution channels.
- Sending spam. Same as with infecting other machines, using a large number of controlled computers allows the malware writer to send more spam faster, and from different sources, increasing the chance of not being detected.
As such, the botnet potential for financial gain is significant. So much so that oftentimes it fuels rivalry between different malware writers. There have been cases in which malware writers included special code in their bots to detect and remove “competitor's” malware, and even to patch the vulnerability used by the competitor’s malware to get into the machine!
6. How is a botnet controlled?
The botnet is controlled through a dedicated computer or group of computers running a command and control server (often abbreviated to C&C server). Malware bots communicate with this server and receive instructions from it in a format the bot understands. This server typically performs a number of functions, including but not limited to:
- Instructing the bots to execute or schedule a certain task;
- Updating the bots themselves by replacing them with a new type of malware;
- Keeping track of the number of bots and their distribution (per region, country, or Internet provider).
The server may also offer a control panel with some kind of interface (similar to a Web interface) for an operator. Some control panels are very flexible, allowing the operator to schedule a task only for the bots running in certain geographic areas or in specific environments (such as corporate networks). Some go even further by offering third-party access, allowing others to rent the botnet from the operator.
7. Is there a reliable way to detect the bots that communicate with the C&C server?
Each malware creator designs their own protocol independently, so there is no standard language, protocol or way to communicate. The reason for this is that the content of the communication heavily depends on the flexibility of the bot, and the programming skills of its creator. It may vary from simple text strings executing the commands, to complex scripts written in languages such as Python or Lua.
The way the bot communicates also varies. Some bots use the IRC protocol, some use the HTTP protocol and some use a custom one (for example there are bots using the ICMP protocol to communicate). Some use protocol-provided encryption (such as HTTPS), some implement encryption themselves, while others use no encryption at all. Modern bots are highly resilient to traffic monitoring, and reliable detection of even the majority of the bots with “traffic signatures” is impossible.
Hence, each bot has to be analyzed separately to find out the protocol, and in case the protocol is encrypted, there is no easy and reliable way to detect such communications.
8. How are botnets investigated?
When the malware writers release their bots into the wild, sooner or later they reach the malware researchers. The researchers typically capture the initial wave of the bot malware through various channels such as honeypots, malware spam, phishing sites or product reports.
Once the malware bot is captured, it is analyzed in a controlled environment. Generally, the researchers want to keep the malware bot “alive” to receive its latest updates, as it can accelerate the analysis process. Because malware bot updates are not received frequently, getting updates directly within a honeypot can simplify analysis.
The main problem here is that bot operators do not want malware researchers to receive those updates. If the operator detects that a bot is running in a honeypot, the operator will isolate it from the rest of the bots and stop sending it updates. To avoid this, the researchers have to “convince” the bot operator that the infected machine is real by allowing it to send spam or infect other machines. But this poses significant difficulties for the research activity, since it is not an acceptable business practice and, in some cases (such as distributing certain materials), may carry serious legal consequences.
9. Is it possible to block bot communications with traffic analysis solutions?
This is a typical question based on some vendors’ claims that their solution is able to detect and block bots based on:
- A list of C&C IP addresses or domains;
- A list of traffic patterns used by bots to communicate;
- A list of domain names queried by bots.
Unfortunately, each of those solutions can detect only a single class of bots. None of them provides a good detection rate for malware bots overall. To understand why, let's see the various ways the bots communicate.
Direct communication to a single server
Bots using this method communicate directly with a single physical server with a fixed IP address. This mode is the easiest to detect and block, as all the researchers need to do is capture the bot and trace the destination of its communication – a very simple task. Once the server identity and location are established, law enforcement works with the ISP, which is usually very quick to shut the server down. And even if the local ISP is not cooperative, their upstream ISP usually is.
This is the only method that can be easily addressed with IP block lists. But this mode is rarely used nowadays, as once the server is down, the whole botnet is essentially deactivated – and all the effort the malware creator put into it is wasted. This forces malware creators to choose more resilient communication methods.
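To make the block-list idea concrete, here is a minimal sketch of the check such a solution performs on each outgoing connection. The function name and the listed addresses are assumptions made for illustration (the addresses come from ranges reserved for documentation), not entries from any real C&C list.

```python
import ipaddress

# Hypothetical block list. The addresses belong to documentation-only
# ranges (RFC 5737) and do not identify real C&C servers.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.10/32"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(destination: str) -> bool:
    """Return True if the destination IP falls inside any blocked network."""
    address = ipaddress.ip_address(destination)
    return any(address in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.10", "198.51.100.77", "203.0.113.5"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

The check only works while the bot keeps talking to an address that is already on the list, which is exactly the weakness the more resilient communication methods exploit.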
Direct communication to multiple servers
Bots using this method communicate with multiple C&C servers, which are geographically spread. This communication model provides some form of resilience, since it makes it impossible to shut the network down by looking at a single bot instance communicating with a single server, or by taking over this server. The bots in this model also update automatically and frequently to add new servers to the list and remove the servers which are not active anymore.
But again, once a copy of the bot is captured and analyzed by researchers, all the server information hardcoded in the bot is found, and it’s blocked quickly. The malware creators counter this by using so-called domain generation algorithms (DGA). Now, the C&C servers are not hardcoded into the malware bot anymore; instead, the list of C&C servers is generated using an algorithm, based on certain parameters such as the current date. Such an algorithm can generate hundreds of thousands of possible domain names, and the malware creator would only register a few of them, which the bot would eventually resolve.
The DGA-based approach is challenging for traffic analysis solutions relying on domain lists. The possible domain lists are virtually unlimited considering the number of different botnets available. Reverse-engineering the malware bot to reveal the algorithm is problematic as well. This is a very time-consuming process (malware is often obfuscated and uses other techniques to prevent reverse-engineering) and the bots are frequently updated. So by the time the algorithm is reversed, most bots are already using the latest version with a different algorithm.
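To show why domain lists struggle here, the following toy sketch derives candidate names from a seed and a date, seen from the point of view of a defender who would have to precompute them. It is not taken from any real malware: the seed, the hashing scheme and the function name are assumptions, and the reserved ".example" suffix ensures the names can never resolve.

```python
import hashlib
from datetime import date


def candidate_domains(seed: str, day: date, count: int) -> list[str]:
    """Derive `count` pseudo-random domain names from a seed and a date."""
    domains = []
    for index in range(count):
        material = f"{seed}|{day.isoformat()}|{index}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Use the reserved ".example" suffix so these names never resolve.
        domains.append(digest[:12] + ".example")
    return domains


if __name__ == "__main__":
    today = candidate_domains("demo-seed", date(2015, 1, 1), 1000)
    tomorrow = candidate_domains("demo-seed", date(2015, 1, 2), 1000)
    print(today[:3])
    print("names shared between the two days:", len(set(today) & set(tomorrow)))
```

A domain list built for one day, one seed and one algorithm version says little about the next day, and changing any of the three inputs produces a different list.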
Direct communication through the TOR network
This is a relatively new (late 2013) communication method, where the C&C servers are hidden in the TOR anonymous network.
These servers are very difficult to trace and locate through traditional means (they are widely used by online criminals, for example, for purposes such as selling drugs and firearms), and locating them typically requires state-level law enforcement cooperation.
They do not use traditional IP addresses and domain names, but “hidden server” addresses in the .onion domain. Only the malware and the last node in the chain know which server the malware is trying to communicate with. This means none of the solutions listed above can identify or block such malware communication.
Communication through intermediate nodes
This method of communication is relatively new, as it is more complex and requires more development and testing from the bot creators. In this method, the malware bots communicate not directly with the servers, but with other malware bots that communicate with the servers.
This method of communication lowers the load on the servers and makes it more difficult to locate the C&C servers. The researcher has to spend time to understand whether the malware is talking to a C&C server or to another infected machine. Suffice it to say, C&C server IP lists are useless in blocking this approach, as the majority of the infected machines would be talking to other infected machines, and not the C&C servers.
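A small model makes the last point measurable. The sketch below builds a hypothetical two-level topology (the machine names, the relay count and the bots-per-relay figure are all made-up assumptions) and counts how many infected machines a C&C server list would actually flag.

```python
# Hypothetical C&C block list with a single known server.
CNC_SERVERS = {"cnc-1"}


def build_topology(relays: int, bots_per_relay: int) -> dict[str, str]:
    """Map every infected machine to the single peer it contacts."""
    peers = {}
    for r in range(relays):
        relay = f"relay-{r}"
        # Only the relay bots reach the server directly.
        peers[relay] = "cnc-1"
        for b in range(bots_per_relay):
            # Ordinary bots only ever see another infected machine.
            peers[f"bot-{r}-{b}"] = relay
    return peers


def caught_by_server_list(peers: dict[str, str], blocked: set[str]) -> int:
    """Count machines whose observed peer is on the C&C block list."""
    return sum(1 for peer in peers.values() if peer in blocked)


if __name__ == "__main__":
    topology = build_topology(relays=5, bots_per_relay=200)
    flagged = caught_by_server_list(topology, CNC_SERVERS)
    # Prints: 5 of 1005 infected machines contact a listed server
    print(f"{flagged} of {len(topology)} infected machines contact a listed server")
```

In this model, even a perfectly accurate server list flags only the relays, while the other machines never touch a listed address.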
Communication through popular websites
A very attractive communication method for malware writers is to use popular web sites as C&C servers. There have been types of malware that used Gmail, Twitter and Pastebin for communications and updates. This approach provides significant advantages for malware writers. Here are just a few examples:
- Popular websites/services are typically available to most users, even in otherwise restricted environments;
- Even if the service is found to be used as C&C, it will not be shut down; rather, the service itself would have to deal with this use;
- These services are built on reliable architectures, so the malware writers do not have to take care of scalability and reliability issues.
While the earlier versions of such malware were easy to detect, as they used cryptic, base64-encoded messages which no normal user would post, there are many ways for malware writers to avoid detection. It is possible to use steganography, for example, to hide C&C instructions – or even malware updates – in text and images. Obviously, any kind of C&C list is useless against this approach.
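The "easy to detect" case can be sketched as a simple heuristic: flag any post that contains a long run of characters which decodes cleanly as base64. The 24-character threshold, the function name and the sample messages are assumptions chosen for illustration, not a description of any vendor's detection logic.

```python
import base64
import binascii
import re

# Runs of 24 or more base64 alphabet characters, optionally padded.
# The length threshold is an arbitrary assumption.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def looks_like_base64_blob(text: str) -> bool:
    """Flag text containing a long run that decodes cleanly as base64."""
    for match in BASE64_RUN.finditer(text):
        candidate = match.group(0)
        if len(candidate) % 4 != 0:
            continue  # not a complete base64 string
        try:
            base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            continue
        return True
    return False


if __name__ == "__main__":
    encoded = base64.b64encode(b"run task 42 on all hosts in region eu").decode()
    print(looks_like_base64_blob("status: " + encoded))
    print(looks_like_base64_blob("Meet me at the station at noon tomorrow"))
```

The second message passes unflagged, and so would any instruction hidden in ordinary-looking sentences or images, which is why a heuristic like this only catches the earliest and crudest variants.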
10. Can we analyze the traffic for “anomalies” to detect a bot?
It depends on the communication mode. The main problem with this approach is that it will always be reactive, as the bots keep changing to avoid detection by desktop security solutions, so changing the protocol or its encryption is not a problem for them. Also, there are a number of ways to blend into existing traffic patterns (for example, using HTTP Authentication and Cookie headers to send/receive commands – such fields can contain different values which are generally impossible to trace). Hence, this approach will only detect the easiest bots.
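One common way to score a header value for "anomaly" is its character entropy. The sketch below is an assumption about how such a score might be computed, not a description of any specific product; the two sample values are SHA-256 digests used as stand-ins for random-looking data (a session token and an encrypted command).

```python
import hashlib
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of a string in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Both values are made up: digests stand in for random-looking data.
    session_token = hashlib.sha256(b"ordinary session").hexdigest()
    hidden_command = hashlib.sha256(b"encrypted command").hexdigest()
    for label, value in (("session token", session_token),
                         ("hidden command", hidden_command)):
        print(label, round(shannon_entropy(value), 2))
```

Because legitimate cookies are often random tokens themselves, an encrypted command placed in the same field is likely to score in the same range, so the score alone cannot separate the two.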
11. What is then the best way to detect a bot?
The most reliable way to detect a bot is to perform the detection on the same machine on which the malware is running. Not only can the malware be prevented from infecting the computer altogether, but in case it slips through, the behavior blocker may detect the malicious actions even if the malware is not known to the antimalware engine. Moreover, the detection routines work the same way if the malware is not communicating with the C&C server. And the antimalware engine will detect the malware on the machine later, even if it manages to slip through undetected and the behavior blocker doesn’t notice it.
For example, one of the advertisements in DarkWeb offering hacker services states, “I can download, store and distribute child pornography using your target's computer, in a way which is easy to detect by the law enforcement. I can even purchase it online using their credit card. Several of my clients had their targets convicted of possession, and more had their lives permanently ruined”.
"Hackers Are Using Gmail Drafts to Update Their Malware and Steal Data": http://www.wired.com/2014/10/hackers-using-gmail-drafts-update-malware-steal-data/
"Twitter-based Botnet Command Channel": http://www.arbornetworks.com/asert/2009/08/twitter-based-botnet-command-channel/
"Pastebin a Convenient Way for Cybercriminals to Remotely Host Malware": http://securityintelligence.com/news/pastebin-convenient-way-cybercriminals-remotely-host-malware/