
Like many people I have files stored in a number of different places in the ‘cloud’. Some are on email (several different providers), others are on free storage sites, some are on FTP, documents are on Google Docs, etc. Accessing these files as and when I need them is not hard if I’m sat in front of my PC, but the fact that they are distributed across different services does present some challenges.

I’ve been using SMEStorage for a while now to tie these services together into a ‘virtual cloud file system’ so that I can see and manage them all from one place. It has worked pretty well for me, and the release of new clients, which include a Linux cloud drive and Mac integration, has made it more applicable to the devices and operating systems I use every day.

Given I spend a lot of time on the move, what was missing was mobile access, in particular iPhone access. Happily this has now been resolved with iSMEStorage now in the App Store.

Almost everything you can do with the web version you can do on the phone version. Features include:

- File Viewer

- Dropbox-like sync, but for your cloud

- Memo / Voice Memo creation and sync, and the ability to load local files

- Secure files that need a password before they can be viewed

- Cloud Clipboard: share file links over email for multiple files from multiple storage clouds

- Get a TinyURL for any file, posted to the iPhone clipboard for use in other applications

- Create / manage collaboration groups for sharing files

- Cloud Sync: if you want to sync your files in real time with your underlying storage cloud, you can initiate this from the phone and it is carried out on the server

- Easily rename and upload photos to your chosen storage cloud

- Cloud Providers: manage your cloud providers in real time and change which one you want files to be uploaded to from the iPhone


A video overview of the app can be seen below:





Clouds supported include:

- Amazon S3
- RackSpace Cloud Files
- Box.Net
- Google Docs
- Microsoft SkyDrive
- Microsoft Live Mesh
- GMail-as-a-Cloud
- Email Clouds
- FTP Clouds
- Apple MobileMe
- WebDav enabled clouds

Sun (or is that Oracle…) has released a new version of their Grid Engine which brings it into the cloud.

There are two main additions in this release. The first is integration with Apache Hadoop, in which Hadoop jobs can now be submitted to Grid Engine as if they were any other computation job. Grid Engine also understands Hadoop’s global file system, which means that it is able to send work to the correct part of the cluster (data affinity).
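To make the data affinity idea concrete, here is a minimal sketch of the kind of decision involved: given the blocks a job reads and where their replicas live, prefer the free node that already holds the most of them. This is not Grid Engine's or Hadoop's actual scheduler; the function name `pick_node`, the block ids and the node names are all made up for illustration.

```python
from collections import Counter

def pick_node(job_blocks, block_locations, free_nodes):
    """Return the free node that holds the most of the job's data blocks.

    job_blocks: block ids the job reads (hypothetical ids)
    block_locations: dict mapping block id -> list of nodes holding a replica
    free_nodes: list of node names that currently have a free slot
    """
    local_counts = Counter()
    for block in job_blocks:
        for node in block_locations.get(block, []):
            if node in free_nodes:
                local_counts[node] += 1
    if not local_counts:
        # No free node holds any of the data: fall back to any free node.
        return free_nodes[0] if free_nodes else None
    # Highest local block count wins; ties are broken by node name.
    return min(local_counts, key=lambda n: (-local_counts[n], n))

# Hypothetical replica layout for three blocks across three nodes.
locations = {"b1": ["node-a", "node-b"], "b2": ["node-b"], "b3": ["node-c"]}
chosen = pick_node(["b1", "b2"], locations, ["node-a", "node-b", "node-c"])
print(chosen)
```

A real scheduler would also weigh load, rack locality and queue policy; the sketch only shows why knowing where the data lives changes the placement decision.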

The second is dynamic resource reallocation, which includes the ability to use on-demand resources from Amazon EC2. Grid Engine is also now able to manage resources across logical clusters, which can be either in the cloud or off it. This means that Grid Engine can now be configured to “cloud burst” depending on load, which is a great feature. Integration is specifically set up with EC2 and enables scaling down as well as scaling up.
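As a rough illustration of what “cloud bursting dependent on load” means, the sketch below works out how many on-demand nodes would be needed to absorb the jobs that do not fit on the local cluster, and whether that implies scaling up or down. It is not how Grid Engine is configured; the function names, the slot counts and the node cap are assumptions chosen for the example.

```python
import math

def desired_cloud_nodes(pending_jobs, local_slots, slots_per_cloud_node, max_cloud_nodes):
    """Number of on-demand nodes needed to absorb work the local cluster cannot run."""
    overflow = max(0, pending_jobs - local_slots)
    needed = math.ceil(overflow / slots_per_cloud_node)
    # Never ask for more than the configured (hypothetical) cap.
    return min(needed, max_cloud_nodes)

def scaling_action(current_cloud_nodes, target_cloud_nodes):
    """Describe the change needed to move from the current to the target node count."""
    delta = target_cloud_nodes - current_cloud_nodes
    if delta > 0:
        return ("scale up", delta)
    if delta < 0:
        return ("scale down", -delta)
    return ("hold", 0)

# Busy period: 100 queued jobs, 64 local slots, 8 slots per cloud node, cap of 10 nodes.
busy_target = desired_cloud_nodes(100, 64, 8, 10)
print(scaling_action(0, busy_target))

# Quiet period: the queue fits locally, so the cloud nodes can be released.
quiet_target = desired_cloud_nodes(10, 64, 8, 10)
print(scaling_action(busy_target, quiet_target))
```

The scale-down branch matters as much as the scale-up one, since on-demand instances cost money for as long as they are running.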

This release of Grid Engine also implements a usage accounting and billing feature called ARCo, making it truly SaaS ready as it is able to cost and bill jobs.

Impressive and useful stuff, and if you are interested in finding out more you can do so here.

Further to the post that we outlined from Ricky Ho on NOSQL, he now has an additional post on Query Processing for NOSQL. This is well worth a read as many of the NOSQL products have fairly primitive query capabilities and Ricky outlines how this can be approached. I’d also recommend reading the comments at the foot of the blog, which provide some good discussion around the approach.

It seems that Apple has pulled the project to implement the ZFS file system on the Mac, apparently due to licensing issues (no surprise there, now that Sun is in the hands of Oracle).


The Mac OS Extended format is synonymous with HFS+. HFS+ uses B-trees to store volume metadata and has been around since Mac OS 8.1, so the news that Apple were considering ZFS created quite a buzz at the time.


The Mac OS had limited ZFS support with Leopard and there was a hope that there would be full support in Snow Leopard, but it was not to be.


ZFS fundamentally changes the storage equation by integrating devices, storage, and file system structures into a single structure. By integrating the file system with volume management, the risk of misconfigurations at one layer affecting another layer is virtually eliminated.


Meanwhile Apple has advertised openings for File Systems engineers so it appears they intend to move forward in another direction.

Ricky Ho has done a great job of providing a thorough overview of the characteristics and patterns of what are being termed NOSQL products. This includes looking at products purporting to be NOSQL products, APIs, data partitioning, data replication, client consistency, vector clocks, and more. A very good read.


Interesting post from Perils of Parallel which outlines the keynote given by Tim Sweeney, founder of Epic Games, at High Performance Graphics 2009, in which, in one of the slides (79), he compares complexity and cost of development. The bottom line according to Tim is:

 

Lessons learned: Today’s hardware is too hard!

  • If it costs X (time, money, pain) to develop an efficient single-threaded algorithm, then…
    • Multithreaded version costs 2X
    • PlayStation 3 Cell version costs 5X
    • Current “GPGPU” version costs: 10X or more
  • Over 2X is uneconomical for most software companies!
  • This is an argument against:
    • Hardware that requires difficult programming techniques
    • Non-unified memory architectures
    • Limited “GPGPU” programming models

Original post here.

Virtualisation 101

 

With virtualisation becoming intertwined with cloud computing it is worth taking a step back and looking once again at what virtualisation is, and is not. Virtualisation and emulation are often compared, but there are some important differences. Emulation provides the functionality of a target processor completely in software. The main advantage is that you can emulate one type of processor on any other type of processor. Unfortunately it tends to be slow. Virtualisation, however, involves taking a physical processor and partitioning it into multiple contexts, all of which take turns running directly on the processor itself. Because of this, virtualisation is faster than emulation.

Virtualisation introduces an abstraction layer on top of resources, so that physical characteristics are hidden from the user. This abstraction layer takes care of resource allocation in order to meet the needs of the applications being run. In essence virtualisation enables you to create one or more virtual machines that can run at the same time as the host operating system. In its early days virtualisation was more specialised and was utilised specifically in a vendor-controlled way, such as IBM’s LPAR approach. Virtualisation vendors claim consolidation ratios of 4 to 1, with the potential to make up to 75 percent of the infrastructure in a data center available.
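The two vendor figures quoted above fit together if you read them the way I do: consolidating four physical servers onto one leaves three of the four free, which is 75 percent. The short sketch below just spells out that arithmetic; the function name is my own and the reading of the claim is an assumption rather than something the vendors state.

```python
def freed_fraction(consolidation_ratio):
    """Fraction of physical servers freed when `consolidation_ratio` servers
    are consolidated onto one (assumes the workloads fit on the remaining host)."""
    if consolidation_ratio < 1:
        raise ValueError("consolidation ratio must be at least 1")
    return 1 - 1 / consolidation_ratio

ratio = 4
print(f"A {ratio}:1 consolidation frees {freed_fraction(ratio):.0%} of the servers")
```

The same formula shows the diminishing returns: going from 4:1 to 8:1 only moves the freed share from 75 percent to 87.5 percent.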

Chip manufacturers are now optimising their processors to support virtualisation. Both Intel and AMD have extended the instruction sets of their newer processors to give increased support for virtualisation. ‘AMD-V’ is what AMD have labelled their technology and Intel’s technology is called ‘VT’. Expect even further advances. For example the Intel Xeon 7400 Dunnington processors include something called FlexMigration, which allows virtual machines to be moved around easily in a server pool. You will need to understand in detail the processors that any virtualised environment runs upon, as they offer a key mechanism for optimisation.

Key to the virtualisation architecture is the hypervisor, the virtual machine manager.  A hypervisor is a program that allows multiple operating systems to share a single hardware host. Although each operating system appears to have the host’s processor, memory, and other resources all to itself, the hypervisor is actually controlling the host processor and resources. It allocates what is needed to each operating system in turn, and these allocations can be managed and tuned.

There are two types of Hypervisor:

- Type 1: This is referred to as a bare-metal or native hypervisor. This type of hypervisor is software-based, runs directly on the hardware, and hosts the guest operating systems. Xen, VMware ESX, Parallels Server and Hyper-V are all examples of this type of hypervisor.

- Type 2: This type of hypervisor runs within an actual operating system. VMware Server (GSX), VirtualBox, Parallels Workstation and Parallels Desktop are examples of this type of hypervisor. The Type 2 hypervisor is typically what people are referring to when they think of virtualisation.

There is a third element: paravirtualisation. This is when the operating system has been modified to be aware of the hypervisor that is running. This makes the interaction and integration between the two much smoother and in theory less prone to errors. ‘Enlightenment’ in Windows Server 2008 is an example of this, as it enables the OS to interact directly with the hypervisor.

With computing resources at a premium in terms of space, power, location, and cost, the use of virtualised infrastructures is a very compelling proposition for existing servers and hardware that are under-utilised or have spare capacity cycles. Virtualisation can actually be thought of as addressing one of the deficiencies of building a large infrastructure, that of resource. It also addresses differences in OS infrastructure, software stacks, etc. With virtualisation, on-demand deployment of pre-configured virtual machines containing all the software required by a job is possible. Flexibility is also added to resource management and application execution. For example, running virtual machines can be controlled by freezing them (similar to check-pointing) or by migrating them in real time while keeping the virtualised infrastructure running.

Indeed this type of proposition is beginning to be thought of as a ‘private cloud’, in which virtualisation is used to deliver services across an organisation and in which best practices from ‘public clouds’ are applied. These include Infrastructure-as-a-Service and Platform-as-a-Service type concepts, for which virtualisation and PaaS providers are releasing products and tools to enable the deployment and management of such private clouds. An example of this is GigaSpaces, which recently announced tighter integration with VMware that enables GigaSpaces to dynamically manage and scale VMware instances and lets them participate in the scaling of GigaSpaces-hosted applications. A PaaS cloud provider such as GigaSpaces is able to do this due to VMware’s launch of vSphere, which opens up their VM product to allow management of both internal and external clouds. VMware is pitching vSphere as the first Cloud-OS, which is able to break up separate hardware platforms into what they offer in terms of resources.

In terms of virtualisation, there are also drawbacks to watch out for. When you communicate ‘to’ and ‘from’ a virtualised node, the packets need to pass through the virtualised communications layer. This is an overhead and you should estimate a performance hit of between 10 and 20 percent for this. Furthermore, the specification of the VM is not an indication of the speed or performance of your grid. For example, running four virtual machines on a 4-core 4 GHz chip is not the same as having four dedicated 1 GHz chips, one for each VM. Also, when one of your virtual machines is idle, any other VMs that are co-hosted will get the majority of the power.
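To put numbers on those two points, here is a small back-of-the-envelope sketch. It applies the 10 to 20 percent communications overhead to a made-up native transfer rate, and shows how the CPU available to each VM changes with the number of busy VMs. The 1000 MB/s figure, the function names and the assumption of a simple fair-share scheduler with no caps or reservations are all mine, for illustration only.

```python
def effective_throughput(native_rate, overhead):
    """Throughput left after the virtualised communications layer takes its share."""
    if not 0 <= overhead < 1:
        raise ValueError("overhead must be a fraction in [0, 1)")
    return native_rate * (1 - overhead)

def cores_per_busy_vm(total_cores, busy_vms):
    """Cores available to each busy VM, assuming idle VMs give up their share."""
    if busy_vms <= 0:
        raise ValueError("need at least one busy VM")
    return total_cores / busy_vms

# Hypothetical native rate of 1000 MB/s with the 10-20% hit quoted above.
best = effective_throughput(1000, 0.10)
worst = effective_throughput(1000, 0.20)
print(f"Expected range: {worst:.0f} to {best:.0f} MB/s")

# Four co-hosted VMs on a 4-core host: all busy versus only one busy.
print(cores_per_busy_vm(4, 4), cores_per_busy_vm(4, 1))
```

The second calculation is why a VM's performance is hard to predict from its specification alone: what it gets depends on what its neighbours are doing at the time.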

As the machines are virtual, and using resource cycles that are not in use, you may find that certain nodes are not available when you need or expect them. To this end you should ensure you have the ability to burst when required and have virtualised management infrastructure in place to handle this.

If you intend to embrace virtualisation then you will need to review machine specifications, paying special attention to processors and RAM,  and review storage and network infrastructure.

The positives however far outweigh any drawbacks and ultimately virtualisation will, over time, save money and, with all the innovation currently occurring around virtualisation, will make server administration easier.

Content adapted from the book “TheSavvyGuideTo HPC, Grid, DataGrid, Virtualisation and Cloud Computing” available on Amazon.

The Future of the Cloud by Simon Wardley from Carsonified on Vimeo.

For those who use Sun Grid Engine or are interested in knowing more about it, there is a nice post on the blog “memories of a product manager” outlining new features coming in the December 2009 release.

Video of a pre-release Nehalem-EX processor shows its capabilities running an NYSE trading simulation.

