Sunday, December 28, 2008

Cloud computing

From Wikipedia, the free encyclopedia

Cloud computing refers to the delivery of computing resources from a location other than the one you are in. In its most common usage, it means Internet-based ("cloud") development and use of computer technology. The cloud is a metaphor for the Internet, based on how the Internet is depicted in computer network diagrams, and is an abstraction for the complex infrastructure it conceals. It is a style of computing in which IT-related capabilities are provided "as a service", allowing users to access technology-enabled services from the Internet ("in the cloud") without knowledge of, expertise with, or control over the technology infrastructure that supports them. According to a 2008 paper published by IEEE Internet Computing, "Cloud Computing is a paradigm in which information is permanently stored in servers on the Internet and cached temporarily on clients that include desktops, entertainment centers, tablet computers, notebooks, wall computers, handhelds, sensors, monitors, etc."


Cloud computing is a general concept that incorporates software as a service (SaaS), Web 2.0 and other recent, well-known technology trends, in which the common theme is reliance on the Internet for satisfying the computing needs of the users. For example, Google Apps provides common business applications online that are accessed from a web browser, while the software and data are stored on the servers.


Architecture


Cloud architecture[49] is the systems architecture of the software systems involved in the delivery of cloud computing, e.g., hardware, software, as designed by a cloud architect who typically works for a cloud integrator. It typically involves multiple cloud components communicating with each other over application programming interfaces, usually web services.[50]


This is very similar to the Unix philosophy of having multiple programs doing one thing well and working together over universal interfaces. Complexity is controlled and the resulting systems are more manageable than their monolithic counterparts.


Cloud architecture extends to the client, where web browsers and/or software applications are used to access cloud applications.


Cloud storage architecture is loosely coupled: metadata operations are centralized, enabling the data nodes to scale into the hundreds, each independently delivering data to applications or users.
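
To make the pattern concrete, here is a minimal Python sketch of that separation (all class and method names are hypothetical, invented for illustration): a client asks a central metadata service where an object lives, then fetches the bytes directly from the data node, so adding nodes scales capacity without touching the metadata path.

    # Hypothetical sketch of a loosely coupled cloud storage design:
    # one metadata service maps object names to data nodes; the data
    # nodes themselves serve the bytes and can be added independently.

    class MetadataService(object):
        """Central catalog: knows where data lives, never touches the data."""
        def __init__(self):
            self.locations = {}          # object name -> data node

        def register(self, name, node):
            self.locations[name] = node

        def locate(self, name):
            return self.locations[name]

    class DataNode(object):
        """One of potentially hundreds of nodes, each serving data directly."""
        def __init__(self, node_id):
            self.node_id = node_id
            self.blobs = {}

        def put(self, name, data):
            self.blobs[name] = data

        def get(self, name):
            return self.blobs[name]

    # A client resolves the location once, then talks to the node
    # directly, so the metadata service is never a bandwidth bottleneck.
    meta = MetadataService()
    node = DataNode("node-17")
    node.put("report.pdf", b"...bytes...")
    meta.register("report.pdf", node)
    print(meta.locate("report.pdf").get("report.pdf"))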


What cloud computing really means

The next big trend sounds nebulous, but it's not so fuzzy when you view the value proposition from the perspective of IT professionals

By Galen Gruman and Eric Knorr
April 07, 2008

Cloud computing is all the rage. "It's become the phrase du jour," says Gartner senior analyst Ben Pring, echoing many of his peers. The problem is that (as with Web 2.0) everyone seems to have a different definition.


As a metaphor for the Internet, "the cloud" is a familiar cliché, but when combined with "computing," the meaning gets bigger and fuzzier. Some analysts and vendors define cloud computing narrowly as an updated version of utility computing: basically virtual servers available over the Internet. Others go very broad, arguing anything you consume outside the firewall is "in the cloud," including conventional outsourcing.


Cloud computing comes into focus only when you think about what IT always needs: a way to increase capacity or add capabilities on the fly without investing in new infrastructure, training new personnel, or licensing new software. Cloud computing encompasses any subscription-based or pay-per-use service that, in real time over the Internet, extends IT's existing capabilities.


Cloud computing is at an early stage, with a motley crew of providers large and small delivering a slew of cloud-based services, from full-blown applications to storage services to spam filtering. Yes, utility-style infrastructure providers are part of the mix, but so are SaaS (software as a service) providers such as Salesforce.com. Today, for the most part, IT must plug into cloud-based services individually, but cloud computing aggregators and integrators are already emerging.


InfoWorld talked to dozens of vendors, analysts, and IT customers to tease out the various components of cloud computing. Based on those discussions, here's a rough breakdown of what cloud computing is all about:


1. SaaS

This type of cloud computing delivers a single application through the browser to thousands of customers using a multitenant architecture. On the customer side, it means no upfront investment in servers or software licensing; on the provider side, with just one app to maintain, costs are low compared to conventional hosting. Salesforce.com is by far the best-known example among enterprise applications, but SaaS is also common for HR apps and has even worked its way up the food chain to ERP, with players such as Workday. And who could have predicted the sudden rise of SaaS "desktop" applications, such as Google Apps and Zoho Office?
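
As a rough illustration of the multitenant idea - one application instance serving many customers - here is a minimal Python sketch (the schema and function names are invented for illustration):

    # Hypothetical sketch of multitenancy: one code base and one schema
    # serve every customer, with each row tagged by a tenant ID so
    # customers only ever see their own data.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE contacts (tenant_id TEXT, name TEXT)")

    def add_contact(tenant_id, name):
        db.execute("INSERT INTO contacts VALUES (?, ?)", (tenant_id, name))

    def list_contacts(tenant_id):
        # Every query is scoped by tenant -- the isolation is logical,
        # not physical, which is what keeps per-customer costs low.
        rows = db.execute("SELECT name FROM contacts WHERE tenant_id = ?",
                          (tenant_id,))
        return [name for (name,) in rows]

    add_contact("acme", "Alice")
    add_contact("globex", "Bob")
    print(list_contacts("acme"))    # ['Alice'] -- Globex's data is invisible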


2. Utility computing

The idea is not new, but this form of cloud computing is getting new life from Amazon.com, Sun, IBM, and others who now offer storage and virtual servers that IT can access on demand. Early enterprise adopters mainly use utility computing for supplemental, non-mission-critical needs, but one day, they may replace parts of the datacenter. Other providers offer solutions that help IT create virtual datacenters from commodity servers, such as 3Tera's AppLogic and Cohesive Flexible Technologies' Elastic Server on Demand. Liquid Computing's LiquidQ offers similar capabilities, enabling IT to stitch together memory, I/O, storage, and computational capacity as a virtualized resource pool available over the network.
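
A toy Python sketch of the utility model - capacity drawn from a shared pool on demand and billed per use - might look like this (the provider class, rate, and numbers are all hypothetical):

    # Hypothetical sketch of utility computing: capacity comes from a
    # shared pool on demand and is billed per hour of use, rather than
    # bought up front as physical hardware.

    class UtilityProvider(object):
        RATE_PER_HOUR = 0.10            # assumed price, dollars

        def __init__(self, total_cpus):
            self.free_cpus = total_cpus
            self.usage_hours = {}

        def provision(self, customer, cpus):
            if cpus > self.free_cpus:
                raise RuntimeError("pool exhausted")
            self.free_cpus -= cpus
            self.usage_hours.setdefault(customer, 0.0)
            return cpus

        def release(self, customer, cpus, hours_used):
            self.free_cpus += cpus
            self.usage_hours[customer] += cpus * hours_used

        def bill(self, customer):
            return self.usage_hours[customer] * self.RATE_PER_HOUR

    pool = UtilityProvider(total_cpus=1000)
    pool.provision("acme", cpus=8)       # burst capacity for a nightly job
    pool.release("acme", cpus=8, hours_used=3)
    print("$%.2f" % pool.bill("acme"))   # pay only for what was used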


3. Web services in the cloud

Closely related to SaaS, Web service providers offer APIs that enable developers to exploit functionality over the Internet, rather than delivering full-blown applications. They range from providers offering discrete business services -- such as Strike Iron and Xignite -- to the full range of APIs offered by Google Maps, ADP payroll processing, the U.S. Postal Service, Bloomberg, and even conventional credit card processing services.
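
In practice, consuming such a service is often just an HTTP request. Here is a hedged Python sketch - the endpoint, parameters, and response shape are invented for illustration, not any real provider's API:

    # Calling a hypothetical business web service over HTTP: the
    # provider exposes functionality as an API, and the developer
    # invokes it like a remote library.

    import json
    import urllib.parse
    import urllib.request

    def validate_address(street, city, state):
        # Service URL and response fields are hypothetical.
        params = urllib.parse.urlencode(
            {"street": street, "city": city, "state": state})
        url = "https://api.example-validator.com/v1/address?" + params
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read().decode("utf-8"))

    # e.g. {"valid": true, "zip": "98402"} on the wire
    result = validate_address("123 Main St", "Tacoma", "WA")
    print(result["valid"])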


4. Platform as a service

Another SaaS variation, this form of cloud computing delivers development environments as a service. You build your own applications that run on the provider's infrastructure and are delivered to your users via the Internet from the provider's servers. Like Legos, these services are constrained by the vendor's design and capabilities, so you don't get complete freedom, but you do get predictability and pre-integration. Prime examples include Salesforce.com's Force.com, Coghead and the new Google App Engine. For extremely lightweight development, cloud-based mashup platforms abound, such as Yahoo Pipes or Dapper.net.
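
For a flavor of the platform-as-a-service style, this is roughly what a minimal request handler looked like in Google App Engine's original Python SDK; it runs on the provider's infrastructure (and only inside the App Engine runtime), not on your own servers:

    # Runs inside the App Engine runtime; the platform supplies the
    # google.appengine packages, the web server, and the scaling.
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app

    class MainPage(webapp.RequestHandler):
        def get(self):
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.out.write('Hello from the provider\'s servers!')

    application = webapp.WSGIApplication([('/', MainPage)], debug=True)

    def main():
        run_wsgi_app(application)

    if __name__ == '__main__':
        main()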


5. MSP (managed service providers)

One of the oldest forms of cloud computing, a managed service is basically an application exposed to IT rather than to end-users, such as a virus scanning service for e-mail or an application monitoring service (which Mercury, among others, provides). Managed security services delivered by SecureWorks, IBM, and Verizon fall into this category, as do such cloud-based anti-spam services as Postini, recently acquired by Google. Other offerings include desktop management services, such as those offered by CenterBeam or Everdream.


6. Service commerce platforms

A hybrid of SaaS and MSP, this cloud computing service offers a service hub that users interact with. They're most common in trading environments, such as expense management systems that allow users to order travel or secretarial services from a common platform that then coordinates the service delivery and pricing within the specifications set by the user. Think of it as an automated service bureau. Well-known examples include Rearden Commerce and Ariba.


7. Internet integration

The integration of cloud-based services is in its early days. OpSource, which mainly concerns itself with serving SaaS providers, recently introduced the OpSource Services Bus, which employs in-the-cloud integration technology from a little startup called Boomi. SaaS provider Workday recently acquired another player in this space, CapeClear, an ESB (enterprise service bus) provider that was edging toward b-to-b integration. Way ahead of its time, Grand Central -- which wanted to be a universal "bus in the cloud" to connect SaaS providers and provide integrated solutions to customers -- flamed out in 2005.


Today, with such cloud-based interconnection seldom in evidence, cloud computing might be more accurately described as "sky computing," with many isolated clouds of services which IT customers must plug into individually. On the other hand, as virtualization and SOA permeate the enterprise, the idea of loosely coupled services running on an agile, scalable infrastructure should eventually make every enterprise a node in the cloud. It's a long-running trend with a far-out horizon. But among big metatrends, cloud computing is the hardest one to argue with in the long term.


Galen Gruman is executive editor of InfoWorld. Eric Knorr is editor in chief at InfoWorld.


But What Exactly "Is" Cloud Computing?

By Kurt Cagle
December 17, 2008

If buzzwords didn't exist, the computer industry as we know it would collapse. Really! For instance, here's a quick pop-quiz -


1. Define Cloud Computing in twenty-five words or less. Please show all work.

Er ... um ... it's well, it has to do with building virtual computers to host virtual services and support virtual communities while passing virtual messages to virtual ... um .... give me a second ... there's got to be a virtual something here.


Put another way, cloud computing is all virtual - it doesn't really exist!


Okay, so maybe this isn't the best position to take when covering cloud computing, but it does in fact provide a good starting point for understanding what cloud computing is and isn't. There are in fact two good working definitions - a very narrow one, and a much broader one. The narrow one first:


Cloud computing is grid computing, the use of a distributed network of servers, each working in parallel, to accomplish a specific task. As an acquaintance of mine put it, if it isn't using MapReduce, it probably isn't a cloud.
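
For the curious, here is a toy Python rendering of the MapReduce pattern the quote alludes to - a real framework distributes the map and reduce phases across many machines, but the shape of the computation is the same:

    # Toy word count in the MapReduce style: map emits (key, value)
    # pairs, a shuffle groups them by key, reduce folds each group.
    from itertools import groupby

    def map_phase(document):
        for word in document.split():
            yield (word.lower(), 1)

    def reduce_phase(word, counts):
        return (word, sum(counts))

    documents = ["the cloud is the network", "the network is the computer"]

    # Shuffle: collect the intermediate pairs and group them by key.
    # A real framework spreads these tasks across a cluster.
    pairs = sorted(p for doc in documents for p in map_phase(doc))
    results = [reduce_phase(word, (count for _, count in group))
               for word, group in groupby(pairs, key=lambda p: p[0])]
    print(dict(results))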


Of course, if we were to stick to this strict definition, then all the hype about "cloud computing" and the opportunities for companies to hawk their wares as "cloud friendly" simply wouldn't exist ... and where would the fun be in that? This is especially true given that there simply aren't that many problems, even at the large-enterprise level, that require "slow" massively parallel processing (i.e., processing distributed over networks whose latency is high compared to processor speed).


The Era of Distributed Virtualization

Selling massive economic simulations would probably not find much of a market at this point in time, and weather simulations are realistically feasible only if the grid is relatively self-contained. Hmmm ... you can process data from deep-space satellite programs over the grid, of course, and perhaps unfold a few proteins here and there, but chances are pretty good that most businesses just don't have the problems that make grid computing attractive. So on to the broader definition:


Cloud computing is the distributed virtualization of an organization's computing infrastructure.


Now this is good market-speak - vague enough to have almost any possible meaning, with lots of multisyllabic words that sound really impressive on a PowerPoint slide (and you have to love the way "distributed", "virtualization", and "infrastructure" got so casually tossed out there).


However, while this is perhaps a bit too broad as a working definition, it does in fact point to what seems to be emerging as the next major "platform". If you talk about cloud computing as distributed virtualization, you're actually getting pretty close to a workable definition.


Much of the work of the last decade has involved moving from centralized architectures to distributed ones. Centralized architectures, such as the famed "client-server" relationship of yore, involved a hub-and-spoke arrangement, where multiple clients connected to a single server, and each server in turn communicated with more powerful "routers". Most applications weren't truly distributed ... instead, they existed as virtual server sessions within the server itself, with just enough state pushed to the client to handle very minimal customizations.


Put another way - the applications stopped at the server boundary.


Eventually, however, it became obvious that it was not that efficient to store your data on the same machine that handled the application logic. This translated into the first "distributed" applications, in which data was kept within a separate "data tier" in a different box, and the data access then occurred through an abstraction layer between the data tier and the logic tier. Client-server became three-tier, with messaging becoming an increasingly important part of the overall process.


Three-tier rapidly became n-tier as different services entered the mix. One consequence of this shift to n-tier application development is that the messaging architecture keeps growing in importance relative to the actual services being deployed - and the standardization of messaging in turn gives services a powerful incentive to simplify their underlying public interfaces to work best with that messaging format. Put another way, as messaging becomes more uniform, service interfaces tend to become simpler in order to work best with those messaging formats - the interfaces become abstract (or virtual).


From Virtual Machines to Commodity Computing

On the physical front, virtual machines have been developing in parallel with this messaging architecture. The concept of a virtual machine has been around for a while: build a "fake" machine that takes a specific set of commands from the applications running on top of it, then convert those commands into instructions the underlying machine can use. Such systems became considerably more sophisticated, with applications like VMware able to run one operating system within another.


The VMware model is significant in a number of respects - by providing networking access and a virtualized driver model, then allocating hard drive space as a virtual partition, VMware not only let someone create a virtual system, but also made it possible to take a "snapshot" of that system at any given time, to be saved and run again at some later point. This meant that an application developer could effectively clone a "template" snapshot of a given application and distribute that as a virtual instance - you could literally have a completely functional, fully enabled system up and running in under a minute.


Other companies and projects took a different approach to machine virtualization. In essence, this approach involved building the virtualization capability into the host operating system directly (rather than running it as a secondary application), meaning that you could start with a "bare-bones" operating system and then bring multiple machines online on the same piece of hardware.


In this case, the bare-bones system became known as a hypervisor (the next step up from a supervisor, presumably), specifically a Type I hypervisor. The Xen project uses this approach, as does Microsoft's Hyper-V for Windows Server. VMware-type approaches, on the other hand, are typically considered Type II hypervisors, because they run as applications within a full host operating system, rather than as a stand-alone operating system running in tandem with other stand-alone operating systems.


These systems were originally developed as conveniences for developers, letting them work on multiple systems simultaneously, but the whole hypervisor concept has taken off dramatically in the cloud-computing space. Typically how this works is that a company with spare processing capability sets up multiple large machines that might have many terabytes of storage, hundreds or even thousands of CPUs and hundreds of gigabytes of RAM.


These systems then use hypervisors to partition these meta-systems into distinct virtual machines that can be configured to any size or power. Unlike physical machines, these virtual machines can be powered on or off without shutting down the underlying server, and they can have more memory or processing capacity added simply by changing a configuration file.
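
Xen guest definitions, for instance, are short text configuration files (the sketch below follows Xen's commonly documented domU conventions, which use Python syntax; the exact keys vary by version and the values here are invented):

    # Hypothetical Xen-style guest definition: doubling this VM's
    # memory or CPU count is a one-line edit plus a guest restart,
    # with no trip to the datacenter.
    name   = "web-01"
    memory = 512                                # megabytes of RAM
    vcpus  = 2                                  # virtual CPUs
    disk   = ['phy:/dev/vg0/web-01,xvda,w']     # backing store
    vif    = ['bridge=xenbr0']                  # virtual network interface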


There are downsides to this approach, of course. For starters, any hypervisored system is inherently running two operating systems, even if one is only minimal, and the abstraction layer takes a certain number of cycles away from actual processing - which means that a blazingly fast sea of processors will still produce only a moderately fast virtual machine. Additionally, bandwidth becomes a considerably more constrained resource, which makes hypervisored servers reasonable for web hosting but fairly abysmal (and expensive) for hosting video and similarly bandwidth-intensive media.


A number of companies have, within the last couple of years, created "cloud computing centers" that take advantage of hypervisors and Storage Area Networks (SANs) to create hosted environments where businesses can effectively duplicate (or replace) much of their existing IT infrastructure. It can be argued that, by the narrower definition above, this is not technically cloud computing, as in many cases the virtual systems are in fact working simply as fairly distinct web servers, but this is the point where the marketing hype transcends the literal definition by just a bit.


Amazon was the first to really make the "cloud computing" model work, with the creation of the Amazon Elastic Compute Cloud (EC2) system. EC2 uses virtualization and a publicly available API to make it possible to bring up one virtual computer, or a hundred, simultaneously. This is complemented by the Simple Storage Service (S3), which effectively provides SANs for data storage. Their pricing is competitive (if a bit on the high side for some applications).
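
To give a sense of how programmable this is, here is a hedged sketch using the open source boto library to start an EC2 instance and store a file in S3 - the access keys, AMI ID, bucket name, and file names are all placeholders:

    # Sketch of driving EC2 and S3 from Python with the boto library;
    # credentials and identifiers below are placeholders, not real.

    from boto.ec2.connection import EC2Connection
    from boto.s3.connection import S3Connection

    ec2 = EC2Connection('ACCESS_KEY', 'SECRET_KEY')
    reservation = ec2.run_instances('ami-12345678',    # placeholder image ID
                                    min_count=1, max_count=1)
    print(reservation.instances[0].id)                 # e.g. 'i-0a1b2c3d'

    s3 = S3Connection('ACCESS_KEY', 'SECRET_KEY')
    bucket = s3.create_bucket('example-backup-bucket') # bucket names are global
    key = bucket.new_key('snapshots/db-2008-12-28.dump')
    key.set_contents_from_filename('db.dump')          # upload a local file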


Microsoft also entered this space recently with Windows Azure, which provides similar virtual Windows systems, along with a full complement of tools for building large-scale distributed applications. Sun has effectively "re-entered" this space: its first effort in cloud computing, the Sun Grid, attracted a fair number of customers but was somewhat ahead of its time, and as a consequence the company has recently been re-promoting its own cloud credentials.


It should also be noted that many of the big hosting services have not been napping as cloud computing has caught fire. Voxel CEO Zachary Smith noted, in an interview with O'Reilly Media earlier this year, that companies such as Voxel, GoDaddy, and other large scale hosting services have been providing virtual servers at a much lower price point than their dedicated servers for a couple of years now.


Moreover, he is pushing strongly to get an industry-wide agreement on a common standardized API for creating server instances programmatically, possibly using the Amazon EC2 APIs as a model. In order for true commodity computing to come of age, a common industry standard will definitely need to emerge.


Cloud Computing Is Services Computing

You may have noticed the preponderance of the word "service" in the last section. This is no coincidence. The upshot of virtualization is that you are effectively creating an abstraction layer for the hardware - in essence turning that hardware into software that is accessible through a programmable interface, almost invariably using the Internet as the messaging infrastructure.


There is a tendency in cloud computing to focus on the hardware, but ultimately cloud computing is in fact the next stage in the evolution of services that has been ongoing for at least the last decade. The concept of software as a service (SaaS) is gaining currency especially at the small and medium-sized business (SMB) level, where the advantages of maintaining an internal IT department are often outweighed by the costs. As the economic situation continues to deteriorate, SaaS is likely to become increasingly common further up the business pyramid.


In a SaaS deployment, applications that had traditionally been desktop office "productivity" applications - word processors, spreadsheets, slide presentation software, and the like - are increasingly becoming web-based, though many are also gaining the "offline" capability that contemporary browsers are beginning to support, either built in or through components such as Google Gears. Google Apps provides a compelling example of a SaaS suite, combining sophisticated word processing, spreadsheet, and presentation software into a single web suite. Zoho offers similar (and arguably superior) capability.


Microsoft has recently debuted Microsoft Office Live Workspace, which effectively provides a workspace for working with common documents online, but it raises the question of whether it is in fact a true cloud application, as it still effectively requires a standalone copy of Microsoft Office to edit those documents.


Salesforce.com has often been described as a good cloud computing application, though it's worth noting that it also shows the effects that cloud development has on applications. The Salesforce application feels very much like a rich CMS (similar to Microsoft SharePoint or Drupal, which also have cloud-like characteristics) dealing with complex, dedicated document types.


Cloud Computing and RESTful Services

This concentration on document types itself seems to be an emergent quality of the cloud. Distributed computing doesn't tend to handle objects all that well - the object-oriented model tends to break down because imperative control (intent) is difficult to transmit across nodes in a network.



This is part of the reason why SOAP-based services, which work reasonably well as a remote procedure call mechanism for closed, localized networks (such as within the financial sector), don't seem to have taken off on the web as much as they reasonably should have. In general, distributed systems seem to work best when what is being transmitted is sent in complete, self-contained chunks ... otherwise known as documents, and when the primary operations used are database-like CRUD operations (Create, Read, Update, and Delete).


This type of architecture (called a REST architecture, for Representational State Transfer) is very much typified by the way that resources are sent and retrieved over the web; it effectively treats the web as an addressable database where collections of resources are the key to working with it.
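
A minimal Python sketch makes the verb-to-CRUD mapping concrete (using the standard library's wsgiref server; the behavior is deliberately simplified, with no authentication or proper error codes):

    # Minimal document-oriented RESTful resource: URLs name documents,
    # and the HTTP verbs map onto CRUD operations.
    from wsgiref.simple_server import make_server

    documents = {}                        # path -> stored document (bytes)

    def app(environ, start_response):
        path = environ['PATH_INFO']
        verb = environ['REQUEST_METHOD']
        if verb in ('POST', 'PUT'):       # Create / Update: store the body
            length = int(environ.get('CONTENT_LENGTH') or 0)
            documents[path] = environ['wsgi.input'].read(length)
            body = b'stored'
        elif verb == 'GET':               # Read: return the document
            body = documents.get(path, b'not found')
        elif verb == 'DELETE':            # Delete: remove it
            documents.pop(path, None)
            body = b'deleted'
        else:
            body = b'unsupported verb'
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [body]

    make_server('', 8000, app).serve_forever()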


A new model of cloud computing emerging as a consequence is the RESTful services model, in which complete state is transferred from point to point within the network via documents, while ancillary operations are accomplished through message queues that take these documents and process them asynchronously from the transmission mechanism.
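
The queue half of that model is easy to sketch in Python with the standard library - the document ID and fields below are invented for illustration:

    # The sender's job ends once the document is on the queue; a
    # worker processes it asynchronously from the transmission.
    import queue
    import threading

    inbox = queue.Queue()

    def worker():
        while True:
            document = inbox.get()        # blocks until a document arrives
            if document is None:          # sentinel: shut the worker down
                break
            print("processing:", document["id"])

    threading.Thread(target=worker).start()

    inbox.put({"id": "invoice-42", "body": "..."})  # invented document
    inbox.put(None)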


The SOAP/WSDL model has taken off especially for financial and intra-enterprise clouds, though here the SOAP wrapper is used not as a flag to trigger specific tasks on the receiving system but as an envelope for queue processing (indeed, the RPC model that many early SOAP/WSDL proponents pushed has been largely abandoned as too fragile for use over the Internet). Service-oriented architectures (SOAs) describe the combination of SOAP messages and node-oriented services, typically with a bias toward intentional systems (systems where the sender, rather than the receiver, determines the intent of the message).


A second model is the use of JSON - a textual representation of a JavaScript object - as a mechanism for transferring state. This model works very effectively in conjunction with web mashups, though its deliberately simple structure and lack of support for mixed content (among other factors) make it less than perfect for the transmission of semi-structured documents.
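
The mechanics are minimal, which is much of JSON's appeal:

    import json

    # A JavaScript-style object round-trips through a flat text form.
    state = {"user": "kcagle", "cart": ["book-4312", "book-0091"]}
    wire = json.dumps(state)              # '{"user": "kcagle", ...}'
    restored = json.loads(wire)
    print(restored["cart"][0])            # book-4312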


The third RESTful model is the use of syndication formats, such as Atom or RSS, as mechanisms for transmitting content, links, and summaries of external web resources. Because syndication formats are very closely tied to publishing operations, they tend to be nearly ideal for RESTful services in particular.
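
A minimal Atom entry can be assembled with nothing but the Python standard library (the title, ID, and URL below are invented for illustration):

    # Build a minimal Atom entry; a RESTful service would publish
    # collections of entries like this one.
    import xml.etree.ElementTree as ET

    ATOM = "http://www.w3.org/2005/Atom"
    entry = ET.Element("{%s}entry" % ATOM)
    ET.SubElement(entry, "{%s}title" % ATOM).text = "Server report, week 52"
    ET.SubElement(entry, "{%s}id" % ATOM).text = "urn:example:report-52"
    link = ET.SubElement(entry, "{%s}link" % ATOM)
    link.set("href", "http://example.com/reports/52")  # the resource itself

    print(ET.tostring(entry))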


One of the most powerful expressions of such RESTful services is the combination of XQuery, REST, and XForms (XRX), in which you have a data-abstraction layer (XQuery) pulling and manipulating data from sources such as XML or SQL databases; a publishing (RESTful) layer with a syndication format for encoding data (or data links), such as Atom and its publishing protocol AtomPub; and a declarative mechanism for displaying or editing whole documents on the client (XForms being the best known, though not the only, solution).


While this particular technology stack is still emerging, vendors and project developers are already working on building integrated solutions. Tools such as MarkLogic Server, the eXist XML database, EMC/Documentum's X-Hive XML server, Orbeon Forms, and Just Systems' xfy, as well as similar offerings by Microsoft, IBM, Oracle, and others in the syndication space, attest to the increasing awareness of and potential for XRX-oriented applications.


The Edge of the Cloud

One of the more interesting facets of clouds is that the closer you get to them, the harder it is to determine their edges. This is one thing physical clouds share with their digital analogs - the edge of a virtual cloud is remarkably ambiguous. It's typical in a network diagram to use the convention that the edges of such clouds are web clients - browsers, web-enabled applications; in essence, anything that sports a browsable screen, is used by humans, and, most importantly, doesn't actually contribute material to the state of a given application.


However, this definition is remarkably facile. Consider a tool such as curl, which has no real GUI but is quite frequently invoked by other applications. Or think of most contemporary browsers, which support (or will soon support) offline storage. Both client and server have web addresses (though admittedly DHCP can complicate this somewhat), and certain web clients (typically physical devices) actually have fixed IP addresses built in - they can act as both clients and servers.


Put another way, the notion of web client and web server is slowly giving way to that of web nodes. Such a node may act as a client, a server, a transit point, or all three, and this is increasingly true as AJAX-based web applications become the norm. What this means in practice is that in cloud computing there really are no edges, but rather a fractal envelope marking the point beyond which there are no further connection points - think of the overall outline (or envelope) of a tree: individual branches may end within the envelope or touch it, but none extend beyond it.


Is web programming part of cloud computing? Only in very abstract terms - generally, either when you're refreshing the overall state of a given document's content or when you're updating that state through XMLHttpRequest or other peer-to-peer communication protocols. It's fairer to say that most computer languages will eventually incorporate (through libraries or on their own) cloud computing components ... indeed, most already do.


Languages such as Erlang have specifically evolved for use in asynchronous, multiprocessor, distributed environments that look suspiciously like clouds, while the MapReduce framework written by Google is intended to handle the processing of large amounts of data over clusters of computers simultaneously (which also highlights that, while Google does not (yet) have a formal public or proprietary cloud offering, it has been laying the foundation for much of what is emerging as cloud computing within its own search-intensive operations).


In a sense, cloud computing is an architectural concept rather than a programming one per se. For instance, it's probably fair to say that BitTorrent, which uses a peer-to-peer architecture for transmitting pieces of a given resource from multiple sources, represents a fairly typical cloud computing application - asynchronous, massively parallel, distributed, RESTful (torrents are not concerned with the content of the pieces, only their existence), and virtual (the resource does not actually exist as a single entity, but has reality only in potential as many packets, some of which may be duplicates, and some of which may no longer actually exist on the web).


Clouds on the Horizon

It's interesting to note that this also leads full circle back to grid computing. Grid computing had its origins in applications such as SETI@home, which used the free cycles of participating PCs to analyze signals from radio telescopes, looking for apparently artificial, non-random signal patterns that might indicate intelligent life.


Ironically, such use of free cycles has never really taken off beyond very specialized applications, largely because of the very real concerns for security. Cloud computing is far more likely to continue evolving, for at least some time, within massive proprietary or dedicated public clouds, rather than ad hoc networks, at least until a way can be found to monetize such ad hoc networks.


Overall, however, the future of cloud computing actually looks quite bright, perhaps because of the very storm clouds that have gathered on the economic horizon. Cloud computing provides ways to reduce the overhead of a formal IT department within a small to medium sized organization ... especially one for which IT is a significant expense.


For instance, a school district may choose to use virtual machines to set up web sites, centralize grades and reporting, host distance-learning systems, and so forth, saving not only on the need to physically maintain machines and bandwidth but also gaining the ability to add or remove servers as demand changes.


Beyond the immediate advantage of reducing physical hardware, cloud computing also reduces the environmental costs associated with maintaining that infrastructure, along with the power costs.


For instance, the IT manager for a Postal District in Tacoma, Washington laid out to me one of the central problems with their growing IT usage - the building which housed the servers was not designed to handle the heat and electrical load of more than eighty servers, and they had reached a stage where they were seriously looking for better facilities. Instead, they began, server by server, to move non-critical servers to virtual counterparts using a hosted service provider. They kept the most critical servers local, but they were able to reduce their physical server needs by nearly 60%, and were able to put off looking for new facilities for the foreseeable future.


This does point out that, as with any IT strategy, migrating to virtual servers in the cloud makes more sense for non-mission-critical functions, and any such strategy should also weigh recovery and response time when outages do occur. The danger, of course, is that the failure of a cloud center could have disproportionately bad economic effects. On the other hand, this is true of any large-scale IT deployment, and precisely because of these considerations, cloud centers are far more likely to have multiple redundancies in power and backup in place, so that if a failure does happen, the losses will be minimal.


This also applies to the ability of such centers to handle the environmental impacts of running virtual IT centers. Virtual computers use considerably less energy per CPU cycle than physical ones (most virtual computers are very efficient in terms of memory and processing allocation, because much of that is handled in RAM rather than in far more expensive disk access operations). Moreover, facilities that host such systems are specifically designed to handle large numbers of servers running simultaneously, with much more efficient cooling and power-draw systems than tend to be found in most IT departments.


This means that by virtualizing the least mission-critical parts of your IT infrastructure on the web, you can also realize significant savings in cooling systems, electrical infrastructure, and facilities management - savings that all translate to the bottom line.


This virtual world of cloud computing does, in fact, have some significant impacts on the real world ... and will have more as businesses become more comfortable with moving their services and their infrastructure into the cloud, as technologies for dealing with cloud computing improve and as standards and methodologies for developing for this new computing environment solidify.


What this means, of course, is that this particular cloud has a practically golden lining - and will chase the storms away.


Kurt Cagle is an author, developer, and online editor for O'Reilly Media, living in Victoria, BC, Canada.
