Category Archives: IT

Free Software companies create shared value

The free-market capitalistic definition of companies’ goals was, for a long time, very simple: to make as much profit as possible. With that in mind, the only difference between a success and a failure was the investor’s return on investment. Short-term profit became priority number one. Over time, however, this classic definition of capitalism has transformed the way companies are perceived by the public.

Businesses are now seen as prospering at the expense of their communities. People buy from companies, but they do not regard them as entities to be trusted. Firms are perceived as obsessed with financial results and as having a detrimental effect on their surrounding environment.

TV series, which are an expression of popular culture, are an interesting example of this change. As a child, I used to watch TV series such as Knight Rider or Airwolf, in which private companies or foundations helped fight organized crime and bring justice. Looking at the TV series of today, the contrast is striking. Prison Break and Heroes both cast corporations as their villains. Funnily, all these firms are called “The Company,” making them even more impersonal. In their respective shows, these companies are instigators of conspiracies and use treason, murder, and crime to reach their objectives. The criminals are just executing the master plans of the board of directors while the heroes fight them for justice.

This expression of the popular culture demonstrates the current perception of companies in communities: ruthless managers will do whatever it takes to optimize short-term financial results at the expense of the rest of the world.

Companies, under customer or regulatory pressure, try to correct this image through periodic social actions. However, creating shared value (for both society and corporations) should not be put at the margin of the business model, but at the core. The business model should be to do good, and make money out of it, rather than make money…and do good if time and resources permit.

An interesting example to me was the quarterly earnings call of a tech company I have invested in. The top managers spoke for an hour and a half about financial results, goals, and business initiatives, before the VP of corporate responsibility got a meagre fifteen-minute slot to present all the charitable actions undertaken by the company. These actions are laudable. Nonetheless, it really gave me the feeling that doing something good for communities was, at best, an indirect result of the products and services sold, and not at the core of this company’s business.

This vision of doing good first and making money out of it may seem like a post-capitalist utopia, driven by top managers lacking social recognition. However, this model already exists and already generates billions of dollars. Some companies’ business units already work according to these principles, but more importantly, many firms are completely focused on creating shared value.

Free software businesses are a great example of creating this shared value. Companies like Red Hat, Talend, or Pentaho embody the principles of shared value, generating billions of US dollars of revenue while supporting communities worldwide. Distributing software under, for instance, the GNU General Public License, these businesses charge neither companies nor consumers for the use of their products, but rather for support, consulting, and services. Moreover, they provide the source code (the instructions that make the programs work) for study or modification. The comparison with a proprietary approach is fascinating: instead of selling a license, free software providers sell a solution.

They don’t sell a product; they sell value.

Beyond the sole scope of corporations, free software creates shared value by directly serving disadvantaged communities, providing incredibly advanced technology to lower-income homes. The beauty of free and open source software (FOSS) is that it can be distributed at no charge. Anyone can install and use a zero-cost operating system and applications, provided by company-sponsored initiatives, such as Fedora or Ubuntu. The open source licenses make it very easy to adapt the products to the needs of users, reusing components already developed by other projects. The GPL, for instance, allows the use of GPL-licensed software for any purpose. Unlike proprietary software, which is bound to countless usage limitations, free and open source software fosters the usage of technology by giving the freedom of use to the end users. Of course, hardware is still needed at some point in time, but open source software can be equally used by people in developed as well as in developing countries, thus providing cutting-edge technology (such as virtualization) at virtually no cost.

Free software companies support the development of free and open source software projects. Their creation, as with the way the Linux kernel is written, follows a collaborative approach: anyone can participate and send patches to correct bugs in the program or launch the development of a brand-new module. Companies relying on such community projects and benefiting from the huge manpower these communities provide cannot take total control of them. Though they can influence them by offering technological support in the form of contributions, they have to take into account the will and motivation of the community to make sure they still benefit from it. They have to strike a balance between their agenda and the desires of the community. Companies will eventually benefit from the project, but they have to do so by playing fair and accepting that their views are not shared by all project members. They have to support initiatives which are good for the project but do not directly serve their interests. Why would they do so? Because they need to partner with the community and be considered a fair player to stay relevant and influential. This attitude, to me, creates shared value, both for the community and for the company.

For an employee of a free software company, this combination of working for a company and for the greater good is a compelling vision: by working for a company that shares a lot, they have a sense of working on something greater than just their own business. By helping produce software that can be used to the benefit of anyone around the world, they have a feeling of fulfilment, contributing to the global enhancement of societies.

The advantage of the open source development model is that anyone with a decent Internet connection can access all the FOSS knowledge (by downloading the source code of the programs) and participate in the process of improving the products by sending improvement suggestions. Obviously, this can be done worldwide; the only prerequisites are sufficient Internet infrastructure and working computer hardware. The work of NGOs such as Linux4Afrika helps accelerate adoption in developing countries by providing support and by teaching classes at very low cost.

I believe companies working in the FOSS ecosystem definitely create shared value. By giving away their software for free, these companies make it accessible to all. By using processes based on the Internet, they make it possible for virtually anyone connected to the Internet to participate in their development and support communities. And finally, by publishing the source code, they allow anyone to take a look at how cutting-edge software is written and learn from it.

The current financial success of free software companies shows that it is possible to combine the ideal of doing something good with making money out of it, measuring performance on a multi-dimensional scale and not only in financial terms. This success is proof that the business models built around the open source development model are fast-growing, robust, and sustainable.

This article was published on opensource.com

 

Gluster: an open-source NAS solution

On October 7th, 2011, Red Hat announced the acquisition of a company called Gluster, which has developed a distributed NAS technology based on open standards. This technology is delivered as a software appliance that can be deployed to share files over the network.

Why is this an interesting move? Because NAS is ideal for storing unstructured data, and that is the fastest-growing area of the storage industry.

But what is structured data, as opposed to unstructured? Structured data consists of entries that follow a strict definition, such as defined numbers (order numbers inside a company, for example) or character strings (such as customer IDs). SANs (Storage Area Networks), whether Fibre Channel or iSCSI, are generally a good solution for storing this structured data, typically in a database. However, NAS is the right solution for the incredible quantity of data produced every day from various sources (sensors, digital cameras, spreadsheets, presentations, etc.). According to an ESG report, the NAS market will grow at a 72% compound annual growth rate (CAGR) from 2010 to 2015!

IT departments wanting to set up a network-attached storage environment have so far had two main options:
– a simple NFS (Network File System) server. This simple and cheap solution can be installed on any Linux or Unix server. However, it concentrates all file accesses on one single server. Replication, failover and disaster recovery are limited, customized and cumbersome processes, and the single server can become a performance bottleneck.
– dedicated appliances based on proprietary technology from EMC, NetApp, etc. Although they have very powerful features and are nicely integrated in enterprise environments, they are very expensive.

What most organizations have been asking for, though, is first to reduce the cost of storing data that grows at incredible rates, and second to gain the capability to “burst” and leverage cloud capabilities, such as Amazon’s S3, while managing hybrid environments in an easy manner. Legacy solutions cannot cover these needs, and that is precisely why Gluster was developed.

I was fortunate enough to go to Mountain View, CA for a training with the Gluster people and discover their technology, which is now called Red Hat Storage.

The three strengths of Gluster are its scalability (a supported cluster can contain up to 64 nodes, and way beyond), its manageability (you manage your public and private cloud storage blocks from a single point of view) and its reliability (high availability is built in and there is no centralized metadata server, hence no single point of failure).

The Gluster infrastructure is based on commodity hardware, i.e. x86_64 servers from HP, Dell or SuperMicro, with direct-attached storage (DAS), i.e. the disks shipped inside the server. The recommended configuration is to have the OS on two disks (RAID 1) and the data on the twelve remaining disks (RAID 6). This storage space is then made available inside the Gluster environment over the network. No need for an expensive array: just take the servers you already know, and Gluster will transform them into storage boxes!

From an architectural point of view, it is very important to mention that, although the technology is called GlusterFS, Gluster is not yet another file system. Gluster leverages standard file systems (such as XFS in the software appliance supported by Red Hat) and provides mechanisms to access the data across multiple servers.
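To make this more concrete, here is a minimal sketch of how a brick could be prepared on a node. The device name /dev/sdb1 and the mount point /brick1 are assumptions for the sake of the example, and the exact mkfs options may vary with the Gluster version:

[root@node01]# mkfs.xfs -i size=512 /dev/sdb1    # 512-byte inodes leave room for Gluster's extended attributes
[root@node01]# mkdir -p /brick1                  # the directory that will become the brick
[root@node01]# mount /dev/sdb1 /brick1           # from now on, /brick1 is a standard XFS mount point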

The Gluster architecture is based on four elements (the bricks, the nodes, the volumes and the clients) and looks like this:

[Picture courtesy of Red Hat]

– the node: the Gluster software is installed on top of RHEL on a commodity hardware server. This combination is called a storage node.
– the brick: the storage available to the OS (for example the RAID disks) is formatted with standard XFS (with extended attributes) and mounted at a certain mount point. A brick equals a mount point.
– the volumes: Gluster plays a sort of LVM role by managing bricks distributed across several nodes as one single mount point over the network.
– the clients: the computers that access the data. They can be standard Windows clients (via CIFS), NFS clients, or they can use a specific Gluster client that provides enhancements over NFS in terms of high availability.

Example 1 of a Gluster deployment

Let’s take an example: we have two servers, node01 and node02, running RHEL and Gluster. These two servers are identical and have, for the sake of simplicity, one drive each on which we want to store data. This drive is formatted with XFS and mounted at, for instance, /brick1. This directory (and mount point) is identical on both servers, node01 and node02, to make them easier to manage.

What happens next is that we create one Gluster volume, called volume01 (how creative!), from the brick available on each of the two servers. As I mentioned above, Gluster plays a sort of LVM role by creating one logical disk from the two distributed disks attached to node01 and node02.
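For reference, here is a rough sketch of the server-side commands behind this example, run from node01. The volume and brick names follow the example above; the exact syntax may differ slightly between Gluster versions:

[root@node01]# gluster peer probe node02                                      # add node02 to the trusted storage pool
[root@node01]# gluster volume create volume01 node01:/brick1 node02:/brick1   # distribute files across the two bricks
[root@node01]# gluster volume start volume01                                  # make the volume mountable by clients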

Concretely, this means that if I mount the volume over the network from another computer, called client1 (for example via the Gluster client), to a local directory (say /mnt/volume01), I would run the following command:

[root@client1]# mount -t glusterfs node01:/volume01 /mnt/volume01

and I would have access to the capacity of both drives via the network. From a client perspective, no matter what new files I store or which files I read, I would never know that the underlying data is actually distributed across multiple nodes. Moreover, as an administrator of the servers, I could access the files via their mount points without even knowing that Gluster is running, because it leverages the standard components of a Linux infrastructure.

Example 2 of a Gluster deployment

In this example, two business units (marketing and legal) need two different volumes, isolated from each other. We will have roughly the same configuration as before, but with two data disks per server. Each disk on a server is dedicated either to legal or to marketing. From these two disks per server, we then create two volumes, one called marketing and the other legal, which are mounted by their respective clients, as sketched below.
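As a sketch, and assuming the two data disks are mounted as /brick1 and /brick2 on each node, the two volumes could be created like this:

[root@node01]# gluster volume create marketing node01:/brick1 node02:/brick1   # marketing gets the first disk of each node
[root@node01]# gluster volume create legal node01:/brick2 node02:/brick2       # legal gets the second disk of each node
[root@node01]# gluster volume start marketing
[root@node01]# gluster volume start legal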

How are the files stored?

In our first example, when the client wants to store a word-processing file called “myfile.odt” at a specific location on the volume (for example /gluster/myplace), Gluster takes into account the complete path to the file (in our example /gluster/myplace/myfile.odt), and a mechanism called the elastic hashing algorithm (EHA), based on the Davies-Meyer algorithm, computes a hash that indicates on which node and on which disk the file will be stored. When the file must be retrieved, the client provides the path to the file, Gluster computes the hash again and is thus able to find the file on the right node.

The interesting part of this EHA is that if you store, for example, 100 files on a two-node cluster like the one in our first example, the distribution will be quite even. After saving the 100 files on the volume, and regardless of the complexity of their names, we will end up with roughly 50 files on node01 and roughly 50 files on node02. Why is that so powerful? Because instead of one single server becoming a bottleneck, the cluster spreads the files across its nodes and ensures that network bandwidth does not become an issue, which makes for a highly scalable solution.
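You can easily observe this distribution from the servers themselves, since the bricks are just regular directories. Assuming the setup from the first example:

[root@node01]# ls /brick1 | wc -l    # roughly 50 of the 100 files end up here
[root@node02]# ls /brick1 | wc -l    # the other half ends up on node02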

One important point is also that there is no centralized metadata server. The hash is computed, for example by the client, for every access, which removes a huge single point of failure compared to competing architectures: if the metadata is broken, the data (and it can be petabytes of it!) is simply gone, with no way to get it back. Gluster, on the other hand, has no such centralized architecture, and the beauty of it is that there is no proprietary file system underneath. Every file can be accessed from a standard XFS file system, even if the Gluster daemon is shut down on the machine.

Mount type glusterfs?

As you can see in the examples above, it is possible to mount the volumes with either the NFS or the glusterfs mount type. In order to mount a Gluster volume in “native” mode, the client needs a specific package installed. The advantages of this native client are that high availability is built in (i.e. if a node fails, the replicated data remains accessible without any disruption) and that the client itself can use the EHA to calculate the position of a file inside the cluster, and hence talk directly to the node that holds the data, which speeds up data access and reduces network traffic.
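In practice, the same volume can be mounted either way. Here is a sketch, with /mnt/volume01 as a hypothetical mount point; the native mount requires the glusterfs client packages on the machine:

[root@client1]# mount -t glusterfs node01:/volume01 /mnt/volume01      # native client: built-in failover, talks directly to the node holding the data
[root@client1]# mount -t nfs -o vers=3 node01:/volume01 /mnt/volume01  # alternatively, a standard NFSv3 mount of the same volume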

What about high availability?

Gluster offers the possibility to mirror bricks across the network. This means that if a node fails, the data will still be available via another node. It is of course also possible to combine the distribution of files with replication, using, for example, four disks: two used to save the data and two that hold their replicas (see the sketch below). Once the node or the brick is available again, Gluster uses a technology called self-healing to update, in the background, all the data that was modified during the downtime, so that the data is identical on both replicas once the self-healing process is done.
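As a sketch, a distributed-replicated volume using the four disks mentioned above could be created as follows; each pair of consecutive bricks forms a replica set, and the volume and brick names are again assumptions:

[root@node01]# gluster volume create vol_replicated replica 2 node01:/brick1 node02:/brick1 node01:/brick2 node02:/brick2
[root@node01]# gluster volume start vol_replicated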

When it comes to disaster recovery, it is also possible to use a technology called geo-replication, which asynchronously maintains a copy of the data at another site. The recovery site can be a Gluster cluster or another type of storage.

What are the advantages for my organization?

Gluster is a great technology that brings a lot of advantages. The highlights are definitely that Gluster:
– increases the availability of the data by replicating it and by having no metadata server, i.e. no single point of failure
– makes the data easier to manage: the command-line interface is very intuitive and can handle petabytes of data in an easy way
– scales to the petabyte level by spreading the data linearly across multiple nodes, hence avoiding the creation of bottlenecks
– lowers the cost of storage by using commodity hardware

I think that Red Hat was very smart to extend its portfolio to storage. Indeed, after the commoditization of the server market, from proprietary Unix architectures to standard Linux servers, it is time for storage vendors to become more open and dramatically more affordable. This is just the beginning…

A little bit of paranoia is always good

I use some Google services (such as Gmail) and I respect Google’s focus on innovation and its contributions to some free software projects, but I must admit that I try to be as careful as possible when it comes to privacy. For instance, I tend to use OpenStreetMap instead of Google Maps when I look for a simple map, I use Firefox instead of the default web browser on my Android phone, and I never leave Android’s GPS enabled when I am not using it.

However, Google will soon introduce a new privacy policy that worries me. So far, your search history was not combined with other Google products: Google could not combine your feed reader or Google+ account with what it thinks you were looking for in its search engine in order to target its ads. With the new privacy policy, this separation stops and Google will use all the data at its disposal to match your searches and actions and target its ads even more precisely.

One simple step to avoid that is to follow the Electronic Frontier Foundation’s advice and have Google stop saving your search history. Do it! It takes just a couple of seconds and your privacy will thank you (a bit).

And if you want to go a bit further, the EFF’s 6 privacy tips will also help you keep your data to yourself!

HP CloudSystem Matrix Part 3: manage your resources

This post is the last in a series of three that explain the concepts and technologies used in HP CloudSystem Matrix. The first one was about creating a CloudMap. The second one was about how to deploy a complete IT service automatically. This post is about the management of the resources (servers, storage, networking, software) that can be used and shared as a pool across several services.

The idea behind CloudSystem Matrix is relatively simple: the whole environment should be as easy to manage as possible.

This starts with firmware management. All c-Class enclosures have a defined firmware level according to their Matrix version. This means that the server firmware (HBAs, BIOS, iLO, NICs, etc.), the interconnect modules (HP Virtual Connect Flex-10, Fibre Channel or FlexFabric) and the Onboard Administrator (the enclosure management processor) have firmware levels that were tested and qualified to work together in the best way. Given that HP implementation services take care of the firmware deployment, administrators don’t have to worry about it.

What can be managed by CloudSystem Matrix?

The physical servers to be deployed must be HP blades (ProLiant x86_64 or Integrity Itanium servers). The reason is that we leverage the capabilities of Virtual Connect to apply network profiles (MAC addresses and WWNs), and this technology is available on our blade servers.

However, the virtual machine hosts (VMware, Hyper-V, or HP-UX Integrity Virtual Machines) can be HP blades, HP rack-mount servers (Integrity and ProLiant) and even third-party servers (Dell PowerEdge 2000 and e300 series, IBM System x 6000, r800, r900, x300 and x3000 series, and IBM GS and LS blade servers), making CloudSystem Matrix probably one of the most open cloud solutions on the market.

For CloudSystem Matrix to work, the management server needs to discover and manage the targeted equipment. The management consoles of the VM hosts, the management processors and the interconnect modules must be recognized by the so-called CMS (central management server). The CMS will recognize the presence of the Virtual Connect Domain Group (which manages Virtual Connect for multiple enclosures) and will mark the servers not used as VM hosts as available for physical deployments.

As soon as the CMS has discovered the equipment, the administrator can use the console on the CMS to create pools of resources and assign them to different users.

From this management console, the administrator can manage all the elements provided to both IT architects and business users.

What IT architects need first to create their cloud maps is network connectivity. The VLANs available to IT architects are the Virtual Connect vNetworks. The administrator provides them to the IT architects using the “Networking” tab of the management console.
There, the CMS communicates with Virtual Connect Enterprise Manager and retrieves all available networks. Each network must then be configured to specify the range of usable IP addresses and whether addresses are allocated via DHCP or by the CMS from its pool of fixed addresses.

As soon as a server is put in the enclosure and is managed by Virtual Connect Enterprise Manager, it appears in the “Unassigned” pool of resources. From there, it can be moved to a pool of resources that can be dynamically assigned to a business user. This user will only see the pools of resources assigned to them in their self-service portal.

In CloudSystem Matrix, the Administrators group has all rights, so its members can see all services currently running. Business users can also FlexUp their service by adding either disks or servers to a currently running service, for example when an unexpected load occurs.

From this console, the administrators can see all items that can be deployed via CloudSystem Matrix: network items, operating systems (retrieved from RDP jobs, Ignite depots and golden images, as well as Hyper-V and VMware templates), storage pool entries, and servers. They can also control all requests and all currently deployed services. I will write a new post to explain exactly how storage provisioning works.

All in all, this third post explained how administrators can, from a single point of control, manage their resources and make them available to users. CloudSystem is a complete solution that can help IT departments reduce their TCO by up to 56% compared with traditional rack-mount servers. I have already deployed it for customers and must say that many of them are really impressed by the power of the overall solution.

New HP 3PAR storage arrays

The new high-end HP 3PAR high-end storage arrays P10000 were launched a couple of days ago. Here is a nice video that explains the biggest advantages of the product. To me, the most interesting feature is the storage peer motion feature. It creates some kind of a cluster / load balancing approach for storage devices. It can move data across arrays without application disruption and resolves one of the biggest thin provisioning problem: when the capacity overcommitment cannot be increased because there is no physical space left. This 3PAR array solves that issue and it really looks cool !