Techzine recently attended SplunkLive! in Utrecht, an event where Splunk customers and partners learn more about the products and latest features. We were mainly there as guests to get to know the company better. After all, what exactly does data company Splunk do?
Splunk initially started on the IT operations side, where it mainly provided insight into the log files of servers and endpoints by bringing them together in a central location, where they could then be searched and analyzed. This is useful because otherwise you have to go through every link in the chain, server by server or endpoint by endpoint, to identify and solve a problem.
Basically, Splunk indexes log files or lines of textual data. To do this, it needs a line of text and a timestamp. Splunk stores these in a database, after which they are searchable in many different ways. It can process all forms of log files or events, as long as they contain readable text. It cannot handle binary files; the content of a PDF or Word document, for example, cannot be indexed.
Splunk knows the formats of many log files
When indexing data, Splunk needs a log line and a timestamp; based on these, the line is written to the index. The timestamp is a very important factor in finding out where something went wrong. In addition, Splunk can recognize many different types of log files, and the user can also specify what type of log it is. Within such a log line, Splunk can then automatically recognize individual fields. In the case of a web server log, for example, it can recognize the URL path, session ID, status code, IP address, language, browser, operating system and referrer. Because Splunk recognizes these kinds of fields, it can also apply analytics to them fairly quickly.
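As a rough sketch of what that looks like in practice: for a standard Apache access log (the access_combined sourcetype), a short search can immediately report on fields Splunk has recognized, such as the status code and URL path. The index name used here is an assumption for the sake of the example.

```
index=web sourcetype=access_combined
| stats count BY status, uri_path
| sort -count
```

Because status and uri_path are extracted automatically, no parsing work is needed before the analytics can start.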
To stick with web server logs for a moment: a good example is mapping the entire payment process of an online shop and seeing exactly where people drop out. It can also be detected whether a web page was delivered correctly (status code 200) or whether an error occurred (status code 503). In this way, it can be determined whether something is technically wrong with the checkout, or whether the checkout pages are so badly designed that people abandon their purchases.
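As a hedged sketch, assuming the checkout pages live under a /checkout path and the web server logs sit in an index called web, a search along these lines shows, per step of the funnel, how often each status code was returned:

```
index=web sourcetype=access_combined uri_path="/checkout*"
| stats count BY uri_path, status
```

A step that suddenly returns a lot of 503 responses points to a technical problem; a step where traffic simply stops without errors points to a design problem.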
If there is a technical problem, there can be several causes. The web server may be the culprit, but the problem could also be caused by a database server, or perhaps by an external module that converts postcodes into addresses, for example. By bringing all the different logs together in Splunk and correlating those events based on time, it can be detected, for example, that the server that converts postcodes into addresses was unavailable, which caused an error. Or perhaps the database server was extremely overloaded. That is why the timestamp is so important.
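Purely as an illustration of that time-based correlation (the index and sourcetype names below are placeholders): pulling the web server errors, the postcode service and the database logs into one chronological view makes it easy to see which component failed first.

```
(index=web status=503) OR (index=app sourcetype=postcode_service) OR (index=db sourcetype=db_errors)
| sort _time
| table _time, index, sourcetype, host, _raw
```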
Splunk is not new, but it is on the rise
Splunk is now a sizeable company with a turnover of around 1.5 billion dollars. The company has existed since 2003 but has become increasingly popular in recent years, mainly because Splunk has become increasingly versatile beyond IT operations and security. Companies can use Splunk for almost all forms of analytics; you just have to think out of the box from time to time about how best to apply it.
We understood that about 8,000 people attended the SplunkLive event in Florida, twice as many as the year before. In Utrecht, there were about 400 attendees. This shows that more and more companies are becoming aware of what Splunk can do, and that the brand is on the rise.
From searches to dashboards
Splunk has a kind of programming language of its own, the Search Processing Language (SPL). This makes it possible to write queries that search all the data available in Splunk. It is therefore not only possible to search for a keyword, but also for elements from logs that Splunk has recognized, such as the aforementioned status code, IP address, URL, and so on.
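A minimal example of such a search, again with assumed index and field names: it combines a field comparison (all server errors) with the recognized user-agent field and plots the result over time.

```
index=web sourcetype=access_combined status>=500 useragent="*Firefox*"
| timechart span=1h count BY status
```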
These searches can also be combined with datasets containing additional information. For example, it is possible to import a CSV file with all product codes, names and prices. If a search can then determine which product codes were successfully ordered, it is also possible to calculate the turnover over a certain period.
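A sketch of how that could look, assuming the product code is extracted as a field from the order events and that products.csv has been uploaded as a lookup file with product_code, product_name and price columns:

```
index=web sourcetype=access_combined uri_path="/order/confirm" status=200
| lookup products.csv product_code OUTPUT product_name price
| stats sum(price) AS turnover BY product_name
```

Restricting the time range of this search then gives the turnover for any desired period.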
Of course, you don't want to re-enter these kinds of searches every day just to present the results. That is why Splunk has developed a dashboard function. After running a search, you can use a few tabs and options to change the layout of the result. For example, you can create line or column charts, or display the result as a single large number (the total turnover, for example). As soon as you have created a chart, table or number, you can save it to a dashboard. The next time, you don't have to repeat the search; you simply open the right dashboard, which contains the various elements with their analytic information.
Analyzing all kinds of data, from tax authorities to Irdeto
We can't emphasize enough how many different types of data you can analyze with Splunk. Although the company started out in IT operations and then developed into a Security Information and Event Management (SIEM) solution for the security market, it is now much more than that.
Belastingdienst
We talked to the Belastingdienst (the Dutch tax authorities) and Irdeto, among others. At the Belastingdienst, Splunk is used intensively. In total, it processes about 1.5TB of log data in Splunk every day; during the period in which the whole of the Netherlands has to file its tax returns, this rises to 2TB. Altogether, the Belastingdienst has collected about 1,300TB of log data from more than 26,000 systems, more than 3,700 searches are carried out every hour, and more than 100 teams use Splunk.
This is a substantial Splunk environment, which is used not only to control and monitor infrastructure, but also to improve the image of the Belastingdienst. The organisation has built an entire Splunk implementation around the DNS protocol, using DKIM and SPF records to combat phishing e-mails. Criminals send such e-mails in the name of the Belastingdienst to extract money from unsuspecting citizens. By analyzing the e-mail protocol RFCs and the SPF records, the Belastingdienst has found a method to combat this phishing. This goes quite a few steps further than standard log analysis, however.
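The Belastingdienst did not share the exact implementation, but purely to illustrate the idea: a search of roughly this shape could surface mail that claims to come from the organisation's domain while failing SPF or DKIM validation. Every index and field name below is a placeholder for the example, not the actual setup.

```
index=mail sourcetype=mta_logs sender_domain="belastingdienst.nl"
| stats count BY spf_result, dkim_result, sending_ip
| where spf_result!="pass" OR dkim_result!="pass"
```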
Irdeto
During SplunkLive!, we also spoke to Irdeto, a company that offers all kinds of services, such as streaming media and issuing game licenses. Irdeto actively analyzes its infrastructure through Splunk, as well as its media streaming and licensing services. Irdeto uses Splunk mainly to guarantee the SLAs it has agreed for its services. It can clearly demonstrate that it complies with the agreements made, and as soon as a piece of infrastructure runs up against the limits of what is acceptable, action can be taken early, before there is an outage. It also keeps an active eye on its streaming activities, because sometimes a problem originates from an unexpected source: if the use of a service suddenly drops significantly, for example, this may indicate a malfunction.
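As a hedged sketch of that kind of monitoring (the index, sourcetype and session_id field are assumptions, not Irdeto's actual setup): counting the number of distinct active streams per five minutes gives a curve on which an alert can fire as soon as usage drops well below the expected baseline.

```
index=streaming sourcetype=session_logs
| timechart span=5m dc(session_id) AS active_streams
```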
Splunk also makes sense for GDPR
One of the customers we spoke to during SplunkLive! also pointed out that Splunk is very useful in the context of the GDPR. As a company, you can never rule out the possibility of a data leak; there are simply too many processes and people involved. If it does happen, you risk a fine, but if a company can demonstrate that everything was done to prevent it, there will often be no fine. With Splunk, you can keep track of all these processes, and the logs of the processes that have to guarantee compliance are part of that evidence.
Next steps are AI, machine learning and automation
The next step for Splunk is to use artificial intelligence and machine learning to automate things further. It has already started doing this with Splunk Phantom in the field of security. Phantom can also integrate apps from security companies, so that analyses can be carried out automatically on the basis of logs as well as information from these external parties, and specific actions can then be triggered, such as shutting down or quarantining files, processes or complete systems.
It should be clear that this is not the end of the story for Splunk. If we look ahead to the Internet of Things and all the data that edge devices and sensors will collect in the future, Splunk is an ideal tool to bring all that data together, especially now that there is also support for streaming data, so that action can be taken immediately. The automation of actions in playbooks, as with Splunk Phantom, will not be limited to security, but will undoubtedly become much more versatile.