Category Archives: Red Hat

Software-Defined Datacenter? No thanks, I prefer Open and Standardized

I recently did a presentation at HP Discover in Barcelona, Catalonia, called Red Hat’s vision for an open-hybrid cloud (the slides are also available). When preparing the presentation, I first thought of calling it “Red Hat’s vision for a Software-Defined Datacenter”. The term “Software-Defined Datacenter” (SDDC), first coined by VMware, has become extremely popular in the IT industry in recent months. There are very few parts of the datacenter that cannot be “software-defined” anymore. The first element was Software-Defined Networking (SDN), followed by Software-Defined Storage (SDS) and Software-Defined Computing (SDC), which led to the SDDC.

However, during the preparation of my session, I stepped back a little and thought about what this “software-defined” trend was about, and I asked myself this question: what datacenter today runs no BIOS? No hypervisor? No operating system? No application server? And no application? None, of course. Why? Because a datacenter has always been defined by software! What is different in today’s IT industry are two factors that are driving efficiency: openness and standardization.

  • What is software-defined networking? It is about taking a standard x86 server, connecting it to the network and, through software, making it a controller for the network environment using open protocols.
  • What is software-defined storage? It is about taking standard x86 servers and, through software, putting the capacity of their internal disks at the disposal of clients through open access protocols.
  • What is software-defined computing? It is about taking standard x86 servers and consolidating hundreds of servers onto them by virtualizing the standard x86 processor instruction set.

A software-defined datacenter is nothing but an open, standardized datacenter.

But what about the cloud? To me, cloud is the automation layer that manages resources on top of this infrastructure. Be it public or private, a cloud creates an automated way to provision services by offering a service catalogue to users through a self-service portal.

The question now is: with whom do you want to work to implement this open, standardized datacenter?

After having freed yourself from proprietary, purpose-built hardware, what would be the point of locking yourself in again with a software vendor? Openness on the infrastructure side can only be matched by openness on the software side, and Free and Open-Source Software (FOSS) is the key to keeping control of your environment, and especially to having different vendors to choose from. Open protocols are key to providing access to every part of this type of infrastructure, and that is the beauty of FOSS: there can be no proprietary protocol, as the way applications talk to each other is known by everyone. No secret sauce, no voodoo magic and no “trust us, everything is going to be fine”, just plain openness, from which you can only benefit.

Who do you think can help you build this open, standardized datacenter? In terms of vendors, think of one that has been standardizing Unix platforms onto standard x86 servers with an open-source operating system for the past 20 years. Think of a vendor that provides storage solutions based on x86 servers and open protocols. Think of a vendor heavily involved in all of OpenStack’s modules, including Neutron, which manages networking. This is what Red Hat has been doing for the past 20 years: opening and standardizing.

The future might bring surprises. The trend toward ARM-based servers, SoCs and hyperscale computing might create new silos of technology. Software-based storage on top of x86 servers will probably co-exist with Fibre Channel SANs for some time. But as long as your environment is as open (in hardware and software) and as standardized as possible, you are in good hands. Do not blindly trust vendors who claim they are open: trust the open-source communities and the vendors who contribute the most to them.

Marathon completed

After 3 hours and 50 minutes of pain and a distance of 42.195 kilometers, I finally managed to cross the finish line of the Munich marathon.

I would like to thank all the people who donated money to my charity; it means a lot to me. It helped me stay focused when I needed it, both during training and during the race. I really appreciate your support! All in all, I received 508€ in private donations, and Red Hat, my employer, will contribute 5,000€ to this project. I am really proud to work for a company that cares about communities… not just the open-source ones!

I experienced the famous “down” after 30 kilometers and felt my legs hurting, but I also felt boosted by the people cheering us along the track! There were even people waving Breton flags. That is definitely the kind of thing that motivated me!

It was a great experience and I think I did well for a first time. I did not really know how to manage my race, as I had never run 42 km before. I think I now know my body’s reaction to pain a little bit better… and I am confident that I can be faster next time!

Here is a funny picture of me crossing the finish line with a big grin on my face!

Thanks again to all those who donated, you rock!

EX436 Red Hat Enterprise Storage Management

On my way to earn the Red Hat Certified Architect (RHCA) certification, I successfully passed the EX436, one of the certificates of expertise required to get the RHCA.

This certification is mainly about three things:
– storage management (iSCSI, LVM, multipathing, udev rules)
– Red Hat Cluster Suite
– Red Hat Storage

The full list of objectives is available on Red Hat’s website.

I did not take the RH436 course that prepares for the certification, so I guess anyone can do it that way too. What do you need to pass this certification? As usual with Red Hat, this certificate of expertise is based on hands-on tasks to complete, so there is no way to get this certification by just thinking you know the technology. This presentation from Thomas Cameron, a Red Hatter, at the Red Hat Summit 2011 is a good start to get to know the technology. If you can do everything he does during the presentation, you are well on your way to getting the certification 😉

If you do not have a system running RHEL with a subscription, a CentOS server or virtual machine will also do to sharpen your skills on the High Availability Add-On of RHEL, the Resilient Storage Add-On of RHEL and Red Hat Storage.

Of course, a virtual environment is a good idea, for example to create multiple networks (such as application, cluster heartbeat, storage1 and storage2) so that you can train on multipathing with iSCSI targets, as in the sketch below.
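
For instance, a minimal iSCSI-plus-multipath practice run on RHEL 6 or CentOS 6 could look like the following sketch (the target IQN, IP addresses and backing device are made-up placeholders):

# On the iSCSI target machine, export a spare disk (scsi-target-utils):
[root@target]# service tgtd start
[root@target]# tgtadm --lld iscsi --op new --mode target --tid 1 \
    --targetname iqn.2013-01.com.example:storage.disk1
[root@target]# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/vdb
[root@target]# tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL

# On the initiator, discover the target on both storage networks and log in:
[root@node01]# iscsiadm -m discovery -t sendtargets -p 192.168.101.10
[root@node01]# iscsiadm -m discovery -t sendtargets -p 192.168.102.10
[root@node01]# iscsiadm -m node -l

# Enable device-mapper-multipath so both paths appear as one device:
[root@node01]# mpathconf --enable --with_multipathd y
[root@node01]# multipath -ll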

Finally, in order to train on Red Hat Storage, downloading the packages from the community website is not a bad idea, as the community version of Gluster is not too different from the enterprise version used in Red Hat Storage (although that might change in the future).

All in all, this certification was not too difficult, although I still learned a lot by training for it! The next certificate of expertise in sight is the EX442 Red Hat Enterprise Performance Tuning Expertise which is, from what I heard from my colleagues, much more challenging. I am looking forward to taking that one!

Red Hat Certified Engineer

When starting at Red Hat as a solution architect, one of the things one is expected to do is become a Red Hat Certified Engineer.
This certification happens in two steps. The first exam is the Red Hat Certified System Administrator (RHCSA) exam, and the second is the actual RHCE exam. You need to pass both to become an RHCE. Although I had already been an RHCSA for a couple of weeks, I failed my first attempt at the RHCE (as do 60% of all participants!), and only last week did I get my RHCE.

This certification is made up of a number of hands-on tasks. Unlike the LPI, which is a multiple-choice questionnaire where luck can play a role, with the RHCE you actually need to know how things work, which makes it so interesting and challenging.

Now that I am an RHCE (which can be verified), I will continue with other courses; the next one will be the Red Hat Enterprise Virtualization class to become an RHCVA (Red Hat Certified Virtualization Administrator). You can see below the entire curriculum that leads to the ultimate title, the Red Hat Certified Architect (RHCA).

Update: I got my RHCVA last week. Again, it was a hands-on exam with standard tasks for administrators (e.g. setting up a complete virtualization environment with management server, hypervisor, etc.). My next goal is the EX436, clustering and storage management!

Gluster: an open-source NAS solution

On October 7th 2011, Red Hat announced the acquisition of a company called Gluster, which has developed a distributed NAS technology based on open standards. This technology is included in a software appliance that can be deployed to share files over the network.

Why is this an interesting move? Because NAS is ideal for storing unstructured data, and that is the fastest-growing area of the storage industry.

But what is structured data as opposed to unstructured? Structured data consists of entries that follow a strict definition, such as defined numbers (order numbers inside a company, for example) or character strings (such as customer IDs). SANs (Storage Area Networks), such as Fibre Channel and iSCSI, are generally a good solution for storing this structured data, usually in a database. NAS, however, is the right solution for the incredible quantity of data that is produced every day from various sources (sensors, digital cameras, spreadsheets, presentations, etc.). According to an ESG report, the market for NAS will grow at a 72% compound annual growth rate (CAGR) from 2010 to 2015!

IT departments wanting to set up a network-attached storage environment have so far had two main options:
– a simple NFS (Network File System) server. This simple and cheap solution can be installed on any Linux or Unix server. However, this centralized solution concentrates all file accesses on one single server. Replication, failover and disaster recovery are limited, customized and cumbersome processes, and the single server can become a performance bottleneck.
– dedicated appliances based on proprietary technology from EMC, NetApp, etc. Although they have very powerful features and are nicely integrated into enterprise environments, they are very expensive.

What most organizations have been asking for, though, is first to reduce the cost of storing this data that grows at incredible rates, and second to have the capability to “burst” and leverage cloud capabilities, such as Amazon’s S3, while managing hybrid environments in an easy manner. Legacy solutions cannot cover these needs, and that is precisely why Gluster was developed.

I was fortunate enough to go to Mountain View, CA for a training with the Gluster people and discover their technology, which is now called Red Hat Storage.

The three strengths of Gluster are its scalability (a cluster can contain up to 64 nodes with support, and way beyond that technically), its manageability (you manage your public and private cloud storage blocks from one single point of view) and its reliability (high availability is built in and there is no centralized metadata server, hence no single point of failure).

The Gluster infrastructure is based on commodity hardware, i.e. x86_64 servers from HP, Dell or Supermicro, with direct-attached storage (DAS), i.e. the disks that are shipped inside the server. The recommended configuration is to have the OS on two disks (RAID 1) and the data on the twelve remaining disks (RAID 6). This storage space is put at the disposal of the Gluster environment through the network. No need for an expensive array: just take the servers you already know and Gluster will transform them into storage boxes!

From an architectural point of view, it is very important to mention that, although the technology is called GlusterFS, Gluster is not yet another file system. Gluster leverages standard file systems (such as XFS in the software appliance supported by Red Hat) and provides mechanisms to access the data across multiple servers.

The Gluster architecture is based on four elements (the bricks, the nodes, the volumes and the clients) and looks like this:

[Picture courtesy of Red Hat]

– the node: the Gluster software is installed on a commodity server running RHEL. This combination is called a storage node.
– the brick: the storage available to the OS, for example the RAID disks, is formatted with standard XFS (with extended attributes) and mounted to a certain mount point. A brick equals a mount point (see the sketch after this list).
– the volume: Gluster plays a sort of LVM role by managing bricks distributed across several nodes as one single volume that can be mounted over the network.
– the clients: the computers that access the data. They can be standard Windows clients (via CIFS) or NFS clients, or they can use a specific Gluster client that provides enhancements over NFS in terms of high availability.
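
As a sketch, preparing a brick on a node could look like this (the device name /dev/sdb and the mount point /brick1 are assumptions; the 512-byte inode size leaves XFS room for the extended attributes Gluster uses):

# Format the data disk with XFS, leaving room for extended attributes:
[root@node01]# mkfs.xfs -i size=512 /dev/sdb

# Mount it persistently; the mount point is the brick:
[root@node01]# mkdir /brick1
[root@node01]# echo "/dev/sdb /brick1 xfs defaults 0 0" >> /etc/fstab
[root@node01]# mount /brick1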

Example 1 of a Gluster deployment

Let’s take an example: we have two servers, node01 and node02, running RHEL and Gluster. These two servers are identical and have, for the sake of simplicity, one drive on which we want to store data. This drive is formatted with XFS and mounted to, for instance, /brick1. This directory (and mount point) is identical on the two servers node01 and node02, to manage them more easily.

What happens next is that we create one Gluster volume, called volume01 (how creative!), from the bricks available on the two servers. As I mentioned above, Gluster plays a sort of LVM role by creating one logical disk from the two distributed disks attached to node01 and node02.
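
As a minimal sketch, assuming the two nodes can resolve each other’s names, the whole operation boils down to a handful of commands run on node01:

# Add node02 to the trusted storage pool:
[root@node01]# gluster peer probe node02

# Create a distributed volume from the two bricks, start it and check it:
[root@node01]# gluster volume create volume01 node01:/brick1 node02:/brick1
[root@node01]# gluster volume start volume01
[root@node01]# gluster volume info volume01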

Concretely, this means that if I mount the volume via the network from another computer, called client1 (for example via the Gluster client), I would run the following command (the mount point /mnt/volume01 is just an example):

[root@client1]# mount -t glusterfs node01:/volume01 /mnt/volume01

and I would have access to the capacity of both drives via the network. From a client perspective, no matter what files I stored or read, I would not know that the underlying data is actually distributed across multiple nodes. Moreover, if I were an administrator of the servers, I could access the files via their mount point without even knowing that Gluster is running, because it leverages the standard components of a Linux infrastructure.

Example 2 of a Gluster deployment

In this example, two business units (marketing and legal) need two different volumes, isolated from each other. We have roughly the same configuration as before, but with two data disks per server. Each disk on each server is dedicated either to legal or to marketing. From these disks, we then create two volumes, one called marketing, the other called legal, which are mounted by their respective clients.
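
Sketched with the same commands as in the first example, and assuming the extra disks are mounted as /brick_mkt and /brick_legal on both nodes, this could look like:

# Two isolated volumes, one per business unit:
[root@node01]# gluster volume create marketing node01:/brick_mkt node02:/brick_mkt
[root@node01]# gluster volume create legal node01:/brick_legal node02:/brick_legal
[root@node01]# gluster volume start marketing
[root@node01]# gluster volume start legal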

How are the files stored?

In our first example, when the client wants to store a word-processing file called “myfile.odt” at a specific location on the volume (for example /gluster/myplace), Gluster takes into account the complete path to the file (in our example /gluster/myplace/myfile.odt) and a mechanism called the EHA (Elastic Hashing Algorithm), based on the Davies-Meyer algorithm, computes a hash that indicates on which node and on which disk the file will be stored. When the file must be retrieved, the client gives the path to the file, Gluster computes the hash and is then able to find the file on the given node.

The interesting part of this EHA is that if you store, for example, 100 files on a two-node cluster like the one in our first example, the distribution will be quite even. After having saved the 100 files on the volume, and regardless of the complexity of their names, we end up with roughly 50 files on node01 and roughly 50 files on node02. Why is that so powerful? Because instead of having one single server becoming a bottleneck, the cluster spreads the files across its nodes and ensures that network bandwidth does not become an issue, resulting in a highly scalable solution.

Another important point is that there is no centralized metadata server. The hash is computed, for example by the client, for every access, which removes a huge single point of failure compared to competing architectures: if the centralized metadata is broken there, the data (and it can be petabytes of it!) is simply gone, with no way to get it back. Gluster has no such centralized architecture, and the beauty of it is that there is no proprietary file system underneath. Every file can be accessed from a standard XFS file system, even if the Gluster daemon is shut down on the machine.
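
You can see this for yourself: still assuming the brick from the first example is mounted on /brick1, the file stored above is visible as a plain file on the node that the hash selected:

# The volume's directory tree is mirrored on the brick's XFS file system:
[root@node01]# ls /brick1/gluster/myplace
myfile.odt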

Mount type glusterfs?

As you can see in the examples above, it is possible to mount the volumes with the mount type NFS or glusterfs. In order to mount a Gluster volume in “native” mode, the client needs a specific package installed. The advantage of this client is that high availability is built in (i.e. if a node fails, access to the replicated data is possible without any disruption), and also that the client is able to use the EHA to calculate the position of a given file inside the cluster itself and hence talks directly to the node that contains the data, thus reducing data access time and network traffic.
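
Side by side, the two mount flavours look roughly like this (Gluster’s built-in NFS server speaks NFSv3, hence the version option; the mount points are again just examples):

# Native client: needs the glusterfs client package, gives built-in HA:
[root@client1]# mount -t glusterfs node01:/volume01 /mnt/volume01

# Plain NFS: no extra package, but all traffic goes through node01:
[root@client1]# mount -t nfs -o vers=3 node01:/volume01 /mnt/volume01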

What about High Availability?

Gluster offers the possibility to mirror bricks across the network. This means that if a node fails, the data is still available via another node. It is of course also possible to combine the distribution of files with replication, for example with four disks: two used to store the data and two that are their replicas. Once the node or the brick is available again, Gluster uses a technology called self-healing to update, in the background, all the data that was modified during the downtime, so that the data is identical on both replicas after the self-healing process is done.
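
A sketch of creating such a distributed and replicated volume (the brick names are again assumptions; with replica 2, consecutive bricks on the command line form a replica pair, so each pair here spans both nodes):

# Each file lands on one pair of bricks and is mirrored within that pair:
[root@node01]# gluster volume create volume01 replica 2 \
    node01:/brick1 node02:/brick1 node01:/brick2 node02:/brick2
[root@node01]# gluster volume start volume01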

When it comes to disaster recovery, it is also possible to use a technology called geo-replication, which asynchronously maintains a copy of the data at another site. The recovery site can be another Gluster cluster or another type of storage.
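
A sketch of what starting it could look like, with remote.example.com and its target directory being made-up placeholders (the slave could also be another Gluster volume):

# Asynchronously replicate volume01 to a directory on a remote host:
[root@node01]# gluster volume geo-replication volume01 remote.example.com:/data/backup start
[root@node01]# gluster volume geo-replication volume01 remote.example.com:/data/backup status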

What are the advantages for my organization?

Gluster is a great technology that brings a lot of advantages. The highlights are definitely that Gluster:
– increases the availability of the data by replicating it and by having no metadata server, i.e. no single point of failure
– manages the data better: the command-line interface is very intuitive and able to manage petabytes of data in an easy way
– scales to the petabyte level by spreading the data linearly across multiple nodes, hence avoiding bottlenecks
– lowers the cost of storage by using commodity hardware

I think that Red Hat was very smart to extend its portfolio to storage. Indeed, after the commoditization of the server market from proprietary Unix architectures to standard Linux servers, it is time for storage vendors to become more open and dramatically more affordable. This is just the beginning…