SoftNAS High-Performance SQL Storage

Many companies are moving SQL Server deployments to the cloud. If you are moving an application that uses SQL Server to the cloud, you will want the best possible performance while keeping costs low. SoftNAS can be used in the cloud to optimize SQL performance and free up resources. By reducing the resource load on the SQL Server, businesses benefit from lower SQL licensing costs and reduced cloud storage costs.

As a unified solution, SoftNAS provides an excellent base for Microsoft Windows Server deployments by providing iSCSI for Microsoft SQL Server, and network file system (NFS) or server message block (SMB) file storage for Microsoft Windows client access. This reduces complications for companies that are migrating from the data center to the cloud, while they also enjoy the benefits of dedicated storage channels.

Take advantage of high-performance storage by using SoftNAS and Microsoft SQL Server together. Protect your data and keep your SQL Server highly available while scaling up without limits.  

Microsoft SQL Server datastores consist of two fundamental data structure containers: data files and log files. Storing the data files and log files on separate disk drives distributes I/O activity. Backups should be stored on separate storage in a different availability zone.

SoftNAS uses the concept of a storage pool, which is a collection of storage and cache devices exclusively assigned to the pool. Storage is provisioned as shared file systems or block storage, and it is backed by intent-log and write-cache devices.

SoftNAS allows data on disk to be compressed automatically and transparently to the application. Various levels of compression are available, with increasing performance impact as the data compression level increases.

SoftNAS offers a broad range of instance sizes and region availability. It’s important to select the right instance size to configure a storage solution that is the right combination of performance and price for your use case. General guidance and a guide tool are provided to help you select an instance size for your workload to get your project started. Our instance size calculation tool is available directly on our main website.

For extremely heavy workloads, increase cache memory with “High-Memory Instances” and/or use EBS-Optimized and Provisioned IOPS to provide better control over available IOPS. 

The above tool is designed to help new users find the right initial instance size for their workload quickly and easily. Buurst always recommends that customers continue to analyze and test their selected instance until workload characteristics are fully understood. They can then refine their instance size selection to reach the right balance of performance and cost.

Available Memory: SoftNAS uses around 1 GB of RAM for the kernel and system operation. Memory beyond 1 GB is available for use as cache memory, which greatly improves overall system performance and response time: more memory means better performance, to a point. If application workloads involve a high number of small, random I/O requests, then cache memory will provide the best performance increase by reducing random disk I/O to a minimum. If running a SQL database application, cache memory will greatly improve query performance by keeping tables in memory. At a minimum, 2 GB of RAM will yield around 1 GB for cache. For best results, start with 4 GB or more RAM. With deduplication, add 1 GB of RAM per terabyte of deduplicated data (to keep deduplication look-up tables in RAM).
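As a rough illustration of the memory guidance above, the arithmetic can be sketched in a few lines of shell. The function name `estimate_ram_gb` and its arguments are hypothetical and are not part of SoftNAS or the Buurst sizing tool; the sketch simply adds the roughly 1 GB used by the system, the cache you would like, and 1 GB per terabyte of deduplicated data.

```shell
#!/bin/sh
# Rough RAM estimate from the rules of thumb above (illustrative only,
# not the Buurst sizing tool). All values are whole gigabytes/terabytes.
# Usage: estimate_ram_gb <desired_cache_gb> [deduplicated_tb]
estimate_ram_gb() {
    cache_gb=$1
    dedup_tb=${2:-0}
    system_gb=1    # approximate RAM used by the kernel and system operation
    echo $(( system_gb + cache_gb + dedup_tb ))
}

estimate_ram_gb 3      # 3 GB of cache, no deduplication
estimate_ram_gb 3 2    # 3 GB of cache plus 2 TB of deduplicated data
```

The two calls print 4 and 6, which matches the "start with 4 GB or more" guidance for a system without deduplication. Treat the result as a starting point for testing, not a final size.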

CPU: SoftNAS needs a minimum of 2 CPUs for normal operation. To maintain peak performance when using the Compression feature, add CPUs (e.g., 4 CPUs) if CPU usage is observed at 60% or greater on average.

Network: In EC2, SoftNAS uses Amazon Elastic Block Store (EBS) volumes, which are disks accessed across the network in a SAN (storage area network) configuration. This means all disk I/O travels across a shared network connecting the EC2 computing instance with the SAN. This makes network I/O an important factor in SoftNAS® environment performance.

Multiple Performance & Scale Options: EC2 offers Fixed Performance Instances (e.g. m3, c3, and r3) as well as Burstable Performance Instances (e.g. t2) for occasional heavy use over baseline. EC2 also offers many instance sizes and configurations. Consider all potential networking requirements when choosing instance type. Purchasing models include On-Demand, Reserved, and Spot Instances. 

For more information on deploying SoftNAS for High Performance SQL, see the deployment guide. 

SoftNAS storage solutions for SAP HANA

As we all know, one of the most critical parts of any system is storage. Buurst SoftNAS provides cloud storage performance to SAP HANA. This blog post will help you understand the options available with SoftNAS and SAP HANA to improve data performance and reduce your environment’s complexity. You will also learn how to choose specific storage options for your SAP HANA environment.

SAP HANA is a platform for data analysis and business analytics. HANA can provide insights from your data faster than traditional RDBMS systems. Performance is essential for SAP HANA because it helps deliver information more quickly. SAP HANA is optimized to work with real-time data, so performance is a significant factor.

All data and metadata for the SAP HANA system are in shared objects. These objects are copied from data tables to logical tables and accessed by SAP HANA software. So, as this information grows, the impact on performance grows as well. By addressing performance bottlenecks, SoftNAS can accelerate operations that otherwise might be less efficient.

For example, the copy operation will be faster if you deploy storage with a read cache. The read cache is implemented with NVMe or SSD drives and helps copy the parameters from source tables to specialization indexes. When tables are written frequently, as in ETL operations, storing data and logs in SoftNAS reduces complexity and improves resiliency, which reduces the overall risk of data loss. Cloud NAS can also improve your data requests’ response times and other critical factors like resource management and disaster recovery.

Storing your data and log volumes on a NAS improves resilience. Using a NAS with RAID also adds redundancy if something goes wrong with one of your drives. RAID not only helps to keep your data safe, it also allows you to maintain a predictable level of performance when it comes time to scale up your software.

Partitioning of data volumes allows for efficient data distribution and high availability. Partitioning will also help you scale up your SAP HANA performance, which can be a challenge with only one large storage pool. Partitioning will involve allocating more than one volume to each table and moving the information across volumes over time.

SAP HANA supports persistent memory for database tables. Persistent memory retains data in memory between server reboots. Loading data takes time: the server must boot, load the data, and then refresh it. With SAP HANA deployed on SoftNAS storage, loading times are not a problem. The more data you consume, the more you will benefit from persistent memory. While reading (basically accessing) records from persistent memory takes a long time, writing to the memory works much better with SoftNAS.

SoftNAS data snapshots enable SAP HANA backup multiple times a day without the overhead of file-based solutions, eliminating the need for lengthy consistency check times when backing up and dramatically reducing the time to restore the data. Schedule multiple backups a day with restore and recovery operations in a matter of minutes.

CPU and IO offloading help to support high-performance data processing. Reducing CPU and IO overhead effectively serves to increase database performance. By moving backup processes into the storage network, we can free up server resources to enable higher throughput and a lower Total Cost of Ownership (TCO).

You want to deploy SAP HANA because your business needs access to real-time information that allows you to make timely decisions with maximal value. SoftNAS is a cloud NAS that will enable you to develop real-time business applications, connecting your company with customers, partners, and employees in ways that you have never imagined before.

Three Technology Trends Helping to Revive the Oil and Gas Industry

My career has somehow always aligned with oil and gas technology, from my time at Sun Microsystems to Microsoft to AWS and now Buurst. I can say that 2020 has been one of the most painful years: it started with a price war between Saudi Arabia and Russia, followed by a global slowdown from Covid-19 and the reduced cost of competing energy sources. Ouch! So how do we start the recovery?

Here are three important technology trends to help Oil and Gas bounce back.

Trend One: The Cloud

We all know drilling dry holes is no longer acceptable. Leveraging geospatial application technology like Halliburton Decision Space 365® (Landmark), Schlumberger Petrel®, or IHS Markit Kingdom® has become table stakes for ensuring that you not only know the best place to drill, but also know the right place to drill.

Most energy companies have invested tens of millions of dollars in building world-class data centers dedicated to this work. These investments are essential and of real strategic value. But the costs keep going up, and you need more and more IT and security people to manage your investment (it’s a dangerous world out there). Worst of all, you get to replace this investment every three years if you want to stay competitive. What do you do when you don’t have $20M to retool? Today there is an answer: Move the workloads to the cloud.

Why the cloud? You get the fastest and most up-to-date processing power without the need to buy the infrastructure. Moving to the cloud lets you shift your investment from a capital expense to an operating expense that you pay for by the hour, all backed and secured by companies like Microsoft (Azure) and Amazon (AWS). Moving to the cloud is happening today, and it’s happening fast. We have seen our Energy business grow more quickly this year than over the past seven years. 2020 is a tipping point for the cloud.

Trend Two: Lift and Shift

Saving investment dollars, closing data centers, and simplifying your IT footprint are crucial goals of moving to the cloud. Companies often take one of two approaches. The first approach is to move specific workloads that are data- and processing-intensive. Geospatial applications are great examples. The second approach is to focus on closing your data centers. Many companies have thousands of applications in their data centers, and the prospect of moving these can be daunting. Migrating an application or an entire data center is commonly referred to as Lift & Shift. The good news is that 80% of these apps will go with no or very few issues. So what do you do with the 20% of the applications that are hard to move? If the goal is to close the data center, you can’t finish until all the applications are migrated. If the data center is still open, you can’t achieve the cost savings, and your IT infrastructure will be more complicated. Getting the hard-to-move apps to move is essential.

Many companies go down the path of rewriting these hard-to-move applications, but it’s unnecessary. Mostly these apps don’t work in the cloud because they are latency-sensitive or rely on protocols that are not supported in the cloud. The most common protocol that is unsupported in the cloud is iSCSI. There are solutions here, and it’s essential to leverage them for these hard-to-move apps. Buurst SoftNAS is a great example.

Legacy SQL Server workloads often fall into this category, and the countless instances across an enterprise could take all your DevOps resources years to rework. Don’t let your DevOps resources work on legacy workloads. These expensive and vital resources should be building the applications of the future. Leverage cloud storage solutions that support the protocols you’re using and move forward.

Trend Three: Cross-platform business partnerships

So this trend is a little controversial but essential. If you ask any cloud vendor about a multi-cloud strategy, they will always tell you just to pick one, and make sure it’s them. There are some excellent reasons to choose one cloud, and it’s worth spending time to look before you make the decision. The trend we’re seeing is to pick a solution that works in different clouds, so moving will not require reengineering your infrastructure.

A VMware hypervisor is a great example. You probably use it in your data center today, so moving it is a straightforward effort. Storage is often overlooked and becomes the “lock-in” element of choice for cloud vendors. Knowing what makes a cloud sticky is essential. If you know going in, you can avoid making costly mistakes. Fortunately, there are many great partners, such as SoftwareONE, Kaskade.Cloud, CANCOM, VSTECS, and LANStatus, to name a few, with cloud architects who can help you manage this part of your transition.

BP recently said that oil demand may never rebound to pre-pandemic highs as the world shifts to renewables. I don’t know if that’s true, but for now, there is a clear need to rethink spending and change resources to take advantage of the learning from other industries. No one ever wants to be first to take the plunge. Fortunately, companies like Halliburton, Schlumberger, Petronas, IHS Markit, and ExxonMobil have all moved and are leveraging these strategies. Come on in, the water’s fine.

Do IOPS really matter?

Since the beginning of the storage era, almost all storage vendors have challenged each other to achieve the highest possible number of IOPS. Many historical storage datasheets show IOPS as the only number, and customers at that time probably just followed those numbers.

Do IOPS really matter?

The short answer is: “a little bit”. IOPS is one factor among several. After the data revolution, a lot of things changed. Now the source of data could be millions of devices in an IoT system, which means millions of systems trying to read and write simultaneously. Workload types vary dramatically, especially in the presence of caching, from write-intensive solutions like VDI to read-intensive ones in the database world. The time to reach the data has become extremely important in time-sensitive architectures like core banking.

So huge numbers measured in millions are no longer something to be proud of on their own. Let us look at the other factors we need to check before selecting or judging our storage.

How are IOPS measured, and does that relate to your workload?

Storage vendors used to run their benchmarks in a way that helped them reach a higher number of IOPS. They typically used a small number of clients, which might not match your use case; a small block size such as 4k, which might be much smaller than the one you need; and random workloads with a 50% read/write mix, where SSDs excel, which might not be relevant to, for example, VDI or archiving workloads. Reads are also usually much faster than writes, especially in RAID arrays. This type of benchmarking produces a huge IOPS number that may not be relevant to workloads that need fewer IOPS but more data written per I/O, and that introduces a game-changing factor: latency.
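To see why block size matters when reading a datasheet, it helps to convert an IOPS figure into throughput, which is simply IOPS multiplied by block size. The sketch below is illustrative only: the function name `throughput_mib_s` is hypothetical and the numbers are made up, not taken from any vendor benchmark.

```shell
#!/bin/sh
# Throughput implied by an IOPS figure at a given block size.
# Usage: throughput_mib_s <iops> <block_size_kib>
throughput_mib_s() {
    awk -v iops="$1" -v kib="$2" 'BEGIN { printf "%.0f\n", iops * kib / 1024 }'
}

throughput_mib_s 100000 4    # many small I/Os
throughput_mib_s 6000 64     # far fewer, larger I/Os
```

The first call prints 391 and the second prints 375 (MiB/s), so a workload doing 6,000 large I/Os per second moves almost as much data as one doing 100,000 small ones. A headline IOPS number measured at 4k says little about a workload that writes in 64k blocks.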

Latency does matter!

Latency is a really critical factor. Never accept those huge IOPS numbers without looking at the latency figures.

Latency is how long it takes for a single I/O operation to complete. As the workload increases, the storage hardware (controller, cache, RAM, CPU, etc.) will try to keep latency consistent, but things are not that ideal. At a certain high number of IOPS, the storage appliance hardware becomes saturated, the application starts to notice delays in serving data, and problems start to happen.

Databases, for example, are very latency-sensitive workloads. They usually need low latency [5 ms or lower], especially for writes; otherwise there will be a huge performance degradation and business impact.
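One way to reason about the link between IOPS and latency is Little's Law from queueing theory: the average number of outstanding I/Os equals IOPS multiplied by average latency. This is a general result brought in here for illustration, not something taken from a specific storage benchmark, and the function name `avg_latency_ms` and the sample numbers are hypothetical.

```shell
#!/bin/sh
# Average latency implied by Little's Law:
#   outstanding I/Os = IOPS x latency (in seconds)
# Usage: avg_latency_ms <outstanding_ios> <iops>
avg_latency_ms() {
    awk -v q="$1" -v iops="$2" 'BEGIN { printf "%.1f\n", q * 1000 / iops }'
}

avg_latency_ms 32 10000    # 32 I/Os in flight at 10,000 IOPS
avg_latency_ms 64 10000    # the queue doubles while IOPS stays flat
```

The calls print 3.2 and 6.4 (milliseconds). In the second case the storage delivers the same IOPS, but each request now waits longer than the 5 ms guideline for databases, which is the kind of saturation the IOPS number alone hides.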

So if your business is growing and you notice degradation in your database performance, you need storage not only with a higher IOPS rate but with lower latency as well. That leads us to a side point, storage flexibility, which Buurst can help you with: in just a few steps you can upgrade your storage to whatever numbers satisfy your workload.

Figure: IOPS/latency chart

Note: Although the storage supports up to 10M IOPS, it is almost unusable beyond 2M IOPS.

How to get a storage that will work?

Generally speaking, a storage data sheet is not written with your specific workload in mind, but it can still be relevant and give you an idea of the general performance of the storage, especially if it includes:

1. Several benchmarks based on different block sizes, different read/write ratios, and both sequential and random workloads.

2. The number of clients used for each benchmark, the RAID type, and the storage features [compression, deduplication, etc.].

3. The IOPS/latency charts for each of the above cases, which is the most important thing.

That is not all. If you are satisfied with those initial metrics, it is recommended that you ask for a PoC to check how the storage works in your environment and in your specific case.

Buurst will be happy to help you with the sizing and with the PoC too, using a trial license.

Data Loss Prevention

In this post we will discuss data loss, the worst nightmare in the IT world, how to protect ourselves against it, and how Buurst can help you keep your data safe.

Why should we care?

I believe the below numbers are enough to make us care: 

  • 93% of companies suffering from a catastrophic data loss do not survive – 43% never reopen and 51% close within two years. (University of Texas) 
  • 30% of all businesses that have a major fire go out of business within a year and 70% fail within five years. (Home Office Computing Magazine) 
  • 7 out of 10 small firms that experience a major data loss go out of business within a year. (DTI/Price Waterhouse Coopers) 
  • Every week 140,000 hard drives crash in the United States. (Mozy Online Backup)

Know your enemy!

To know how to protect our data, we of course need to know what to protect it from. A wide range of events can cause data loss. It might be intentional or unintentional, and due to a failure, a disaster, or a crime. We can summarize them in the points below:

  • Formatted disks or deleted data, which can happen due to human error or an application bug that wipes out certain data
  • Data corruption
  • Catastrophic damage
  • Corporate sabotage, or an angry system admin who intentionally deletes all the data and even the backups on all sites (that happened)
  • A hacker who gains root privileges (this also happened)
  • Viruses, malware, and ransomware

No data or business is 100% safe. That is why you must have a backup strategy that can handle all these failures. But what strategy can handle all of that?

There are several strategies, depending on the budget and the criticality of the data. One of the most common and fairly successful backup strategies is the 3-2-1 rule, which is accepted and recommended by a wide range of organizations, including US-CERT [United States Computer Emergency Readiness Team].

What is a Backup? 

Before digging deeper into the 3-2-1 rule, let us first define what we mean by backup, to avoid any misconception in the following sections:

According to the Storage Networking Industry Association (SNIA):

a backup is a collection of data stored on (usually removable) non-volatile storage media for purposes of recovery in case the original copy of data is lost or becomes inaccessible – also called a backup copy

Reading between the lines, that means a backup is an independent copy of the data, i.e., one stored on different media. That is a very critical concept, and we will see why soon.

The 3-2-1 backup rule 

1. Have at least 3 copies of your data
Three copies means the original data you are using plus two additional backups. Usually one backup is kept on hand so that you can restore immediately after a localized failure.

2. Keep these backups on 2 different media
These backups should be stored on two different media types or technologies, since the same media type may have the same life span, and that is risky because you may lose both backups at the same time. The cloud can take care of that, as your data is distributed across several media by default.

3. Store 1 backup offsite
This copy should be far away to be safe enough and survive any catastrophes like fires, earthquakes or wars that can remove a certain area from the map. I believe in the future this copy should be sent to another planet or even another solar system!
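The three conditions above can be written down as a small check over a backup inventory. This is a minimal sketch under stated assumptions: the function name `check_321`, the three-column input format, and the sample media names are all hypothetical and not part of any Buurst tool.

```shell
#!/bin/sh
# Check a backup inventory against the 3-2-1 rule.
# Input on stdin, one copy per line: <name> <media-type> <onsite|offsite>
# The original data counts as one of the copies.
check_321() {
    awk '
        NF >= 3 {
            copies++
            media[$2] = 1
            if ($3 == "offsite") offsite++
        }
        END {
            kinds = 0
            for (m in media) kinds++
            if (copies >= 3 && kinds >= 2 && offsite >= 1) {
                print "OK"
                exit 0
            }
            print "FAIL: copies=" (copies + 0) ", media=" kinds ", offsite=" (offsite + 0)
            exit 1
        }'
}

printf '%s\n' \
    'primary  ebs onsite' \
    'replica  ebs onsite' \
    'dr-site  s3  offsite' | check_321
```

The sample inventory has three copies, two media types, and one offsite copy, so the check prints OK. Removing the last line makes it fail with a non-zero exit status, which makes the function easy to use in a scheduled job.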

Backup Myths

Before proceeding to how Buurst can help you protect your data based on the 3-2-1 rule, let us demolish two popular myths about backup:

1. I have RAID, I am Safe! 

That is a big misunderstanding of RAID. As its name (Redundant Array of Independent Disks) suggests, RAID only cares about redundancy and fault tolerance, which is a very different topic from backup. According to SNIA, fault tolerance is:

The ability of a system to continue to perform its function (possibly at a reduced performance level) when one or more of its components has failed. 

Backup is concerned with how to restore lost data through a wide range of techniques, but it does not care about downtime as long as the data is safe and restorable. Fault tolerance, on the other hand, cares about business continuity in case of failures.

If you lose one disk, RAID is important for keeping your business going, because it continues to serve your first copy of the data. But it is not an independent copy of the data, so it will never protect you from other failures like data corruption or deletion.

2. OK, I will take a snapshot

Snapshots are a great component of your backup strategy, especially when it comes to replication, but a snapshot is not a backup by itself. It does not create an independent copy; it just refers to data on the same disk. It can help restore deleted data, but in case of data corruption or disk failure it cannot be used as a recovery medium.

How can Buurst help you achieve the 3-2-1 backup rule?

Snaprep is a technology based on replicating snapshots between two nodes. The snapshot process has zero overhead on performance and storage space, and each snapshot is compressed and sent to another independent node in another availability zone, which is a different datacenter.

Both nodes can have independent automatic snapshot schedules that protect against data deletion. A SnapClone of any snapshot allows you to serve or restore the data as of the point in time it was taken.

The second node can be a redundant node and can serve the data in case of any failures and that will be discussed in a different article.  

So now we have two independent copies of the data, how about the third one?

You can use the second node as a backup source so as not to disturb the first node. You can integrate it with any backup solution you have, or you can use a third Buurst node [in a different region] to create a fully independent Disaster Recovery site by replicating the data to it using rsync, zfs send/receive, etc. This allows faster access to your data in case of an unforeseen failure and avoids the time lost restoring from tapes (of course, it is a time-budget trade-off).

So, by doing that we have achieved the 3-2-1 rule: we have two more copies of the data, one of them in a different region. But the question is: is the 3-2-1 rule enough?

Is the 3-2-1 rule enough? 

It will be sufficient in a wide range of scenarios, but it will not protect against certain cases. A terminated backup admin who still has access to all three environments can easily remove everything, including the snapshots and the DR site. A hacker with the same access can do the same.

A new, intelligent ransomware or virus that we have never heard of could also affect all the data copies, and who knows, maybe it is smart enough to understand the snapshots and harm them too. That is why more backup models, such as 3-2-2 and 3-2-3, were introduced to mitigate such problems. They can be a discussion for another day.

Final thoughts 

There are many causes of data loss, and the list will keep growing. Humans are usually the biggest threat to data, through both intentional and unintentional activities. The race between attack and defense will keep going, so always review and update the risk management plan that drives your backup strategy, but try to avoid too much paranoia!

Sar, Elasticsearch, and Kibana

Kibana is a great visualization tool, and this article shows how to automate building graphs and dashboards using its API with sar logs as a data source.

Sar is an old, but good, sysadmin tool that helps answer many performance related questions…

Did we have a CPU spike yesterday at 2 pm when the customer complained?

Do we have enough RAM?

Do we have enough IOPS with our brand new SSD disks?

Sar was a nice little tool that helped us collect statistics even without CloudWatch or SNMP or any other monitoring tool configured.

Well, sar has its issues. By default it collects statistics only once every 10 minutes, and you will be deciphering output like this:

01:00:01        CPU     %user     %nice   %system   %iowait    %steal     %idle
04:30:01        all      0.25      0.00      0.23     99.52      0.00      0.00
04:40:01        all      0.25      0.00      0.21     99.54      0.00      0.00
04:50:01        all      0.26      0.00      0.22     99.52      0.00      0.00
05:00:01        all      0.24      0.02      0.23     99.51      0.00      0.00
05:10:01        all      0.26      0.00      0.23     99.51      0.00      0.00
05:20:01        all      0.24      0.00      0.20     99.56      0.00      0.00
05:30:01        all      0.26      0.00      0.22     99.52      0.00      0.00
05:40:01        all      0.25      0.00      0.22     99.53      0.00      0.00
05:50:01        all      0.57      0.00      1.01     48.45      0.00     49.97
06:00:01        all      0.32      0.00      0.41     10.32      0.00     88.95
06:10:01        all      0.24      0.00      0.19      0.33      0.00     99.25
06:20:01        all      0.23      0.00      0.18      0.35      0.00     99.24
06:30:01        all      0.24      0.00      0.17      0.32      0.00     99.27
06:40:01        all      0.24      0.00      0.19      0.36      0.00     99.21
06:50:01        all      0.46      0.00      1.00     25.55      0.00     72.99
07:00:01        all      1.26      0.00      3.52     90.35      0.00      4.87
07:10:01        all      1.26      0.00      4.01     90.57      0.00      4.16
07:20:01        all      1.07      0.00      3.56     89.42      0.00      5.95

This is actually a good example that shows an event possibly requiring further investigation. The server was clearly stuck on the IO subsystem, as the %iowait column shows more than 99% until 05:40. Around 05:50 things suddenly got better: by 06:10 iowait had dropped to nearly zero and overall CPU usage was less than 0.5%. By 07:00 iowait was back at about 90%. Surely something was going on!
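Rather than scanning the columns by eye, a short awk filter can pull out the intervals where %iowait is high. This is only a sketch: the function name `high_iowait` and the default threshold of 50 are assumptions, and it relies on the 24-hour column layout shown above, where %iowait is the sixth field.

```shell
#!/bin/sh
# Print the timestamp and %iowait of every interval above a threshold.
# Assumes the `sar -u` layout shown above with 24-hour timestamps, where
# %iowait is the 6th column; a 12-hour (AM/PM) layout shifts the columns.
# Usage: sar -u -f <sar data file> | high_iowait [threshold]
high_iowait() {
    awk -v limit="${1:-50}" \
        '$1 ~ /^[0-9]/ && $2 == "all" && $6 + 0 > limit + 0 { print $1, $6 }'
}

printf '%s\n' \
    '01:00:01        CPU     %user     %nice   %system   %iowait    %steal     %idle' \
    '05:40:01        all      0.25      0.00      0.22     99.53      0.00      0.00' \
    '06:10:01        all      0.24      0.00      0.19      0.33      0.00     99.25' \
    '07:00:01        all      1.26      0.00      3.52     90.35      0.00      4.87' | high_iowait 50
```

With the four sample lines, the filter prints the 05:40:01 and 07:00:01 intervals and skips the header and the quiet 06:10:01 interval. The check on the first field also skips the trailing "Average:" line that sar prints.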

Elasticsearch is a much more sophisticated technology. Elasticsearch is a distributed search and analytics engine, but when we really speak of Elasticsearch, we are speaking of a bunch of interconnected products commonly known as Elastic Stack:

Beats – many small agents to upload data to Elasticsearch.

Logstash – accepts data from the Beats, and after potentially complicated processing, uploads the transformed data into Elasticsearch.

Elasticsearch – the search and analytics engine and the heart of the Elastic Stack.

Kibana – a great visualization tool and a graphical interface into Elasticsearch.

Elastic (ELK) Stack Architecture

So, these capital letters make up what used to be called the ELK stack: E from Elasticsearch, L from Logstash, and K from Kibana. These days we tend to include Beats in the stack and call it the Elastic Stack.

When performing virtual appliance health checks, our team often needs to analyze log sets from different customers. The logs contain tons of valuable information, so why not feed it to Elasticsearch and see what happens!

Naturally, the log files that we check most often have been sent to Elasticsearch using one of the Beats, such as Filebeat, so we can visually explore the logs in Kibana pretty much instantaneously. Keeping logs centrally is a good practice, and there are countless ways to do it. Rsyslog, Splunk, Loggly, and CloudWatch Logs are popular central log solutions, and Elasticsearch fits really well in this family.

Sar logs are a usual part of the log sets to be analyzed but there is sometimes a tiny inconvenience with sar logs. They are often generated by older sar versions, and there are 2 problems with that:

1. The current sar does not understand the old version logs, and the old sar version needs to be installed just to process the sar logs.

2. The graphs can’t be easily produced due to the limitations of the old versions.

The backward compatibility of sar logs is out of our hands, and with some practice and automation, installing the old sar version is not too much of a problem. At the same time, analyzing sar logs for many days and checking many parameters demands some graphical data presentation. For example, a current sar on Ubuntu allows these commands to run:

sadf -g > cpu.svg
sadf -g -- -r > ram.svg

See these graphs in your favourite browser or image viewer:

The older sar versions simply don’t have an option to produce graphics. Still, sar logs are well structured and Elasticsearch is a powerful tool to process logs in 2 easy steps:

1. Load sar data into Elasticsearch.

2. Use Kibana to do all the visualizations and dashboards based on the data in Elasticsearch.

So how do we do it automatically? After all, there are many logs, and we don’t want to do it manually after the proof of concept!

The answer is APIs and bash. We occasionally thought of writing the API calls in Python or another full-featured language, but bash proved to be more than enough for most cases.

We used 2 completely different APIs to do the task: the Elasticsearch API to load the data and the Kibana API to create all the graphs and dashboards.

We have found that the Kibana API is less documented and we feel that more examples would benefit the community. As such, we provide all the API call examples here. Each API call is a curl command referring to a json file. We provide both the curl command and the example json file for all the calls.

We have also utilized the Kibana concept of spaces to distinguish between logs from different servers. One space is only for one server. Ten servers means ten Kibana spaces. Using spaces greatly reduces the risk of processing data for the wrong server.

Depending on which metric we process in the loop, we used the following commands on the sar log, referred to as $file below.

for CPU:

sadf -d "$file"

for RAM:

sadf -d "$file" -- -r

for swap:

sadf -d "$file" -- -S

for IO:

sadf -d "$file" -- -b

for disks:

sadf -d "$file" -- -d -p

for network:

sadf -d "$file" -- -n DEV
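
The per-metric commands above fit naturally into a small lookup function. The following is a minimal sketch of that idea, not necessarily how our own script is organized; the metric labels (cpu, ram, swap, io, disk, network) and the function names sadf_flags and sadf_command are assumptions made for illustration.

```shell
# Map a metric label to the sar options that follow "--" on the sadf command line.
# The labels are our own names for this sketch, not sadf keywords.
sadf_flags() {
  case "$1" in
    cpu)     echo "" ;;
    ram)     echo "-r" ;;
    swap)    echo "-S" ;;
    io)      echo "-b" ;;
    disk)    echo "-d -p" ;;
    network) echo "-n DEV" ;;
    *)       echo "unknown metric: $1" >&2; return 1 ;;
  esac
}

# Print the sadf command line for one metric and one sar log file.
sadf_command() {
  flags=$(sadf_flags "$1") || return 1
  if [ -n "$flags" ]; then
    echo "sadf -d $2 -- $flags"
  else
    echo "sadf -d $2"
  fi
}
```

Printing the command instead of running it makes the loop easy to dry-run before pointing it at real logs.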

Once we have output from one of the above commands, or from any other command we want to process further and visualize, it’s time to create the indexes in Elasticsearch. Indexes are required so there is a place where we can upload sar data. For example, the index for CPU data is created this way:


curl -XPUT -H 'Content-Type: application/json' "$ELASTIC_HOST:9200/sar.$METRIC.$HOSTNAME?pretty" -d @create_index_$METRIC.json
 
$ cat create_index_cpu.json

{
  "mappings": {
    "properties": {
      "hostname":    { "type": "keyword" }, 
      "interval":  { "type": "integer"  },
      "timestamp":   {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss zzz"
      },
      "CPU":    { "type": "integer" }, 
      "%user":  { "type": "float"  },
      "%nice":   { "type": "float"  },
      "%system":    { "type": "float" }, 
      "%iowait":  { "type": "float"  },
      "%steal":   { "type": "float"  },
      "%idle":   { "type": "float"  }
    }
  }
}
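
Elasticsearch only accepts lowercase index names, so a hostname such as SoftNAS-A83PR has to be normalized before it goes into the sar.$METRIC.$HOSTNAME pattern. A minimal sketch of such a helper follows; the function name index_name is hypothetical.

```shell
# Build the index name sar.<metric>.<hostname> and force it to lowercase,
# because Elasticsearch rejects index names that contain uppercase letters.
index_name() {
  printf 'sar.%s.%s\n' "$1" "$2" | tr '[:upper:]' '[:lower:]'
}
```

For example, index_name swap SoftNAS-A83PR prints sar.swap.softnas-a83pr.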

Once the indexes for all the metrics are created, it’s time to upload the sar data into them. Bulk upload is the easiest way, and below is an example JSON file for swap sar data:


curl -H 'Content-Type: application/x-ndjson' -XPOST "$ELASTIC_HOST:9200/_bulk?pretty" --data-binary @interim.json

$ more interim.json

{"index": {"_index": "sar.swap.server1.example.com"}}
{"hostname":"# hostname","interval":"interval","timestamp":"timestamp","kbswpfree":"kbswpfree","kbswpused":"kbswpused","%swpused":"%swpused","kbswpcad":"kbswpcad","%swpcad":"%swpcad"}
{"index": {"_index": "sar.swap.server1.example.com"}}
{"hostname":"SoftNAS-A83PR","interval":"595","timestamp":"2020-06-01 05:10:01 UTC","kbswpfree":"0","kbswpused":"4128764","%swpused":"100.00","kbswpcad":"23324","%swpcad":"0.56"}
{"index": {"_index": "sar.swap.server1.example.com"}}
{"hostname":"SoftNAS-A83PR","interval":"595","timestamp":"2020-06-01 05:20:01 UTC","kbswpfree":"0","kbswpused":"4128764","%swpused":"100.00","kbswpcad":"23324","%swpcad":"0.56"}
{"index": {"_index": "sar.swap.server1.example.com"}}
{"hostname":"SoftNAS-A83PR","interval":"595","timestamp":"2020-06-01 05:30:01 UTC","kbswpfree":"0","kbswpused":"4128764","%swpused":"100.00","kbswpcad":"23324","%swpcad":"0.56"}
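
A bulk file like this can be produced from the semicolon-separated sadf -d output with a few lines of awk. The sketch below is one possible conversion under stated assumptions, not a copy of our script: it assumes the first line is the "# hostname;interval;timestamp;..." header, uses that header for the JSON keys only, and skips any row whose field count differs from the header. The function name sar_to_bulk is hypothetical.

```shell
# Convert sadf -d output on stdin into Elasticsearch bulk NDJSON on stdout.
# $1 is the target index name. The header line supplies the JSON keys.
sar_to_bulk() {
  awk -F';' -v index_name="$1" '
    /^#/ {
      sub(/^# */, "", $1)                      # "# hostname" -> "hostname"
      for (i = 1; i <= NF; i++) key[i] = $i
      n = NF
      next                                     # the header itself is not indexed
    }
    n > 0 && NF == n {
      printf "{\"index\": {\"_index\": \"%s\"}}\n", index_name
      line = "{"
      for (i = 1; i <= n; i++)
        line = line sprintf("%s\"%s\":\"%s\"", (i > 1 ? "," : ""), key[i], $i)
      print line "}"
    }'
}
```

A typical call would be sadf -d "$file" -- -S | sar_to_bulk sar.swap.server1.example.com > interim.json. All values are written as strings, as in the example file, and the index mapping decides how they are stored.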

All Elasticsearch work is done now. Data is uploaded to Elasticsearch indexes and we are switching to Kibana to create a few nice graphs.

First, we change the Kibana time format and time zone settings to our liking.

These settings can also be found under Advanced Settings in the Kibana UI, but that step is easy to forget on a new Kibana installation:


curl -X POST -H "Content-Type: application/json" -H "kbn-xsrf: true" -d @change_time_format.json  http://$KIBANA_HOST:5601/s/$SPACE_ID/api/kibana/settings

curl -X POST -H "Content-Type: application/json" -H "kbn-xsrf: true" -d @change_time_zone.json  http://$KIBANA_HOST:5601/s/$SPACE_ID/api/kibana/settings

$ cat change_time_format.json 
{"changes":{"dateFormat:scaled":"[\n  [\"\", \"HH:mm:ss.SSS\"],\n  [\"PT1S\", \"HH:mm:ss\"],\n  [\"PT1M\", \"MM-DD HH:mm\"],\n  [\"PT1H\", \"YYYY-MM-DD HH:mm\"],\n  [\"P1DT\", \"YYYY-MM-DD\"],\n  [\"P1YT\", \"YYYY\"]\n]"}}

$ cat change_time_zone.json 
{
  "changes":{
    "dateFormat:tz":"Etc/GMT+5"
  }
}

Let’s create a Kibana space for each server.

The screenshot shows the space selector page, where we choose to keep using the default space or choose one of the server spaces created with the API call below.

curl -X POST -H "Content-Type: application/json" -H "kbn-xsrf: true" -d @interim.json  http://$KIBANA_HOST:5601/api/spaces/space


$ cat interim.json 
{
  "id": "server1.example.com",
  "name": "server1.example.com"
}

Now, the real Kibana work – creating index patterns. The example shows a JSON file for swap data:

curl -X POST -H "Content-Type: application/json" -H "kbn-xsrf: true" -d @interim.json  http://$KIBANA_HOST:5601/s/$SPACE_ID/api/saved_objects/index-pattern

$ cat interim.json 
{
  "attributes":
    {
      "title": "sar.swap.server1.example.com*",
      "fields": "[{\"name\":\"kbswpfree\",\"type\":\"number\",\"esTypes\":[\"float\"],\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"kbswpused\",\"type\":\"number\",\"esTypes\":[\"float\"],\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"%swpused\",\"type\":\"number\",\"esTypes\":[\"float\"],\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"kbswpcad\",\"type\":\"number\",\"esTypes\":[\"float\"],\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"%swpcad\",\"type\":\"number\",\"esTypes\":[\"float\"],\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"swap\",\"type\":\"number\",\"esTypes\":[\"integer\"],\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"_id\",\"type\":\"string\",\"esTypes\":[\"_id\"],\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":false},{\"name\":\"_index\",\"type\":\"string\",\"esTypes\":[\"_index\"],\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":false},{\"name\":\"_score\",\"type\":\"number\",\"count\":0,\"scripted\":false,\"searchable\":false,\"aggregatable\":false,\"readFromDocValues\":false},{\"name\":\"_source\",\"type\":\"_source\",\"esTypes\":[\"_source\"],\"count\":0,\"scripted\":false,\"searchable\":false,\"aggregatable\":false,\"readFromDocValues\":false},{\"name\":\"_type\",\"type\":\"string\",\"esTypes\":[\"_type\"],\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":false},{\"name\":\"hostname\",\"type\":\"string\",\"esTypes\":[\"keyword\"],\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"interval\",\"type\":\"number\",\"esTypes\":[\"integer\"],\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"timestamp\",\"type\":\"date\",\"esTypes\":[\"date\"],\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true}]"
    }
}

Create graphs, which are called visualizations in Kibana. The json file below is for one of the CPU graphs:


curl -X POST -H "Content-Type: application/json" -H "kbn-xsrf: true" -d @$METRIC.$HOSTNAME.$i.json http://$KIBANA_HOST:5601/s/$SPACE_ID/api/saved_objects/visualization


$ cat cpu.server1.example.com.%user.json
{
  "attributes":
    {
      "title": "sar-cpu-server1.example.com-%user",
      "visState": "{\"title\":\"%user\",\"type\":\"line\",\"params\":{\"type\":\"line\",\"grid\":{\"categoryLines\":false},\"categoryAxes\":[{\"id\":\"CategoryAxis-1\",\"type\":\"category\",\"position\":\"bottom\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\"},\"labels\":{\"show\":true,\"filter\":true,\"truncate\":100},\"title\":{}}],\"valueAxes\":[{\"id\":\"ValueAxis-1\",\"name\":\"LeftAxis-1\",\"type\":\"value\",\"position\":\"left\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\",\"mode\":\"normal\"},\"labels\":{\"show\":true,\"rotate\":0,\"filter\":false,\"truncate\":100},\"title\":{\"text\":\"Max %user\"}}],\"seriesParams\":[{\"show\":true,\"type\":\"line\",\"mode\":\"normal\",\"data\":{\"label\":\"%user\",\"id\":\"1\"},\"valueAxis\":\"ValueAxis-1\",\"drawLinesBetweenPoints\":true,\"lineWidth\":2,\"interpolate\":\"linear\",\"showCircles\":true}],\"addTooltip\":true,\"addLegend\":false,\"legendPosition\":\"right\",\"times\":[],\"addTimeMarker\":false,\"labels\":{},\"thresholdLine\":{\"show\":false,\"value\":10,\"width\":1,\"style\":\"full\",\"color\":\"#34130C\"},\"dimensions\":{\"x\":null,\"y\":[{\"accessor\":0,\"format\":{\"id\":\"number\"},\"params\":{},\"aggType\":\"count\"}]}},\"aggs\":[{\"id\":\"1\",\"enabled\":true,\"type\":\"max\",\"schema\":\"metric\",\"params\":{\"field\":\"%user\"}},{\"id\":\"2\",\"enabled\":true,\"type\":\"date_histogram\",\"schema\":\"segment\",\"params\":{\"field\":\"timestamp\",\"useNormalizedEsInterval\":true,\"scaleMetricValues\":false,\"interval\":\"10m\",\"drop_partials\":false,\"min_doc_count\":1,\"extended_bounds\":{}}}]}",
      "uiStateJSON": "{}",
      "description": "",
      "version": 1,
      "kibanaSavedObjectMeta": {
        "searchSourceJSON": "{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[],\"indexRefName\":\"kibanaSavedObjectMeta.searchSourceJSON.index\"}"
      }
    },
  "references": [
      {
        "name": "kibanaSavedObjectMeta.searchSourceJSON.index",
        "type": "index-pattern",
        "id": "2a5ed4b0-b451-11ea-a8db-210d095de476"
      }
    ]

}
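
The $i in the curl command above stands for one column of the sar output, such as %user. A minimal sketch of how that list of columns could be derived from the sadf -d header follows. It assumes the header starts with "# hostname;interval;timestamp;" and that the caller knows where the metric columns begin (field 4 for swap, field 5 for CPU, where field 4 is the CPU identifier). The function name metric_columns is hypothetical.

```shell
# Print the metric column names from the first sadf -d header line on stdin,
# one per line. $1 is the first metric field (defaults to 4).
metric_columns() {
  sed -n '/^#/{p;q;}' | cut -d';' -f"${1:-4}"- | tr ';' '\n'
}
```

The output can drive the loop directly, for example: for i in $(sadf -d "$file" | metric_columns 5); do ... done, with one visualization JSON file per column.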

We are pretty much done, but we could have generated dozens of graphs by now, so let’s make a few dashboards to organize the graphs by metric: one dashboard for CPU, one for RAM, one for each disk, and so on:


curl -X POST -H "Content-Type: application/json" -H "kbn-xsrf: true" -d @$INTERIM_FILE http://$KIBANA_HOST:5601/s/$SPACE_ID/api/saved_objects/dashboard

{
  "attributes":
    {
      "title": "sar-swap-server1.example.com",
      "hits": 0,
      "description": "",
      "panelsJSON": "[{\"version\":\"7.5.1\",\"gridData\":{\"w\":12,\"h\":8,\"x\":0,\"y\":0,\"i\":\"sar-swap-softnas-a83pr-kbswpfree\"},\"panelIndex\":\"sar-swap-softnas-a83pr-kbswpfree\",\"embeddableConfig\":{},\"panelRefName\":\"panel_0\"},{\"version\":\"7.5.1\",\"gridData\":{\"w\":12,\"h\":8,\"x\":12,\"y\":0,\"i\":\"sar-swap-softnas-a83pr-kbswpused\"},\"panelIndex\":\"sar-swap-softnas-a83pr-kbswpused\",\"embeddableConfig\":{},\"panelRefName\":\"panel_1\"},{\"version\":\"7.5.1\",\"gridData\":{\"w\":12,\"h\":8,\"x\":24,\"y\":0,\"i\":\"sar-swap-softnas-a83pr-%swpused\"},\"panelIndex\":\"sar-swap-softnas-a83pr-%swpused\",\"embeddableConfig\":{},\"panelRefName\":\"panel_2\"},{\"version\":\"7.5.1\",\"gridData\":{\"w\":12,\"h\":8,\"x\":36,\"y\":0,\"i\":\"sar-swap-softnas-a83pr-kbswpcad\"},\"panelIndex\":\"sar-swap-softnas-a83pr-kbswpcad\",\"embeddableConfig\":{},\"panelRefName\":\"panel_3\"},{\"version\":\"7.5.1\",\"gridData\":{\"w\":12,\"h\":8,\"x\":48,\"y\":0,\"i\":\"sar-swap-softnas-a83pr-%swpcad\"},\"panelIndex\":\"sar-swap-softnas-a83pr-%swpcad\",\"embeddableConfig\":{},\"panelRefName\":\"panel_4\"}]",
      "optionsJSON": "{\"useMargins\":true,\"hidePanelTitles\":false}",
      "version": 1,
      "timeRestore": false,
      "kibanaSavedObjectMeta": {
        "searchSourceJSON": "{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"
      }


    },
    "references": [

      {
        "name": "panel_0",
        "type": "visualization",
        "id": "56224aa0-b451-11ea-a8db-210d095de476"
      },
      {
        "name": "panel_1",
        "type": "visualization",
        "id": "56b95a80-b451-11ea-a8db-210d095de476"
      },
      {
        "name": "panel_2",
        "type": "visualization",
        "id": "5752db60-b451-11ea-a8db-210d095de476"
      },
      {
        "name": "panel_3",
        "type": "visualization",
        "id": "57ec5c40-b451-11ea-a8db-210d095de476"
      },
      {
        "name": "panel_4",
        "type": "visualization",
        "id": "58865250-b451-11ea-a8db-210d095de476"
      }
    ]

}
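
The "references" array follows a regular pattern: panel_0, panel_1 and so on, each pointing at one visualization id. A sketch of a generator for that array is shown below, assuming the ids are already known and passed as arguments; the function name dashboard_references is hypothetical.

```shell
# Print a dashboard "references" array with one panel_N entry per visualization id.
# The ids are passed as arguments in the order the panels appear in panelsJSON.
dashboard_references() {
  n=0
  printf '['
  for id in "$@"; do
    if [ "$n" -gt 0 ]; then printf ','; fi
    printf '{"name":"panel_%d","type":"visualization","id":"%s"}' "$n" "$id"
    n=$((n + 1))
  done
  printf ']\n'
}
```

The panel names must match the panelRefName values used in panelsJSON, which is why the numbering starts at 0.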

JSON files often look scary, but they really are not. Once the desired object has been created manually in the Kibana UI, its JSON can be looked up there and reused by copy and paste, with only minor editing or automated replacement.
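
The automated replacement can be as simple as a sed call over a template saved from the UI. The sketch below assumes the template contains the placeholders __HOSTNAME__ and __INDEX_PATTERN_ID__; both placeholder names and the function name render_template are invented for this example.

```shell
# Fill a JSON template read from stdin.
# $1 replaces __HOSTNAME__ and $2 replaces __INDEX_PATTERN_ID__.
# The values must not contain "|", "&" or "\", which are special to sed here.
render_template() {
  sed -e "s|__HOSTNAME__|$1|g" -e "s|__INDEX_PATTERN_ID__|$2|g"
}
```

For example, render_template server1.example.com "$PATTERN_ID" < cpu.template.json > interim.json would produce a payload ready for curl, where cpu.template.json and $PATTERN_ID are likewise hypothetical names.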

Just a few more API calls are required while coding all the visualizations and dashboards:

Get index pattern id:

curl -X GET -H "Content-Type: application/json" -H "kbn-xsrf: true" "http://$KIBANA_HOST:5601/s/$SPACE_ID/api/saved_objects/_find?type=index-pattern&fields=title"

Get visualization id:

curl -X GET -H "Content-Type: application/json" -H "kbn-xsrf: true" "http://$KIBANA_HOST:5601/s/$SPACE_ID/api/saved_objects/_find?type=visualization&per_page=1000"
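
The ids then have to be pulled out of the JSON response. The sketch below does this with grep and cut only, under the assumption that the response is compact JSON in which each saved object's "type" key is immediately followed by its "id" key; a JSON-aware tool such as jq would be the more robust choice if it is available. The function name saved_object_ids is hypothetical.

```shell
# Print the ids of all saved objects of type $1 found in a _find response on stdin.
# Assumes compact JSON with "type" directly followed by "id" in each object.
saved_object_ids() {
  grep -o "\"type\":\"$1\",\"id\":\"[^\"]*\"" | cut -d'"' -f8
}
```

Matching on the type as well as the id keeps ids of other object types, such as the index pattern referenced by a visualization, out of the result.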

Let’s enjoy the newly created dashboards!

The CPU dashboard shows a spike related to a massive data copy operation:

The RAM dashboard shows the same data copy operation from a memory consumption point of view:

The root disk dashboard:

The data disk dashboard. The server has 4 data disks in RAID 0, and the dashboard shows metrics for one of the data disks: