<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:og="http://ogp.me/ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:schema="http://schema.org/" xmlns:sioc="http://rdfs.org/sioc/ns#" xmlns:sioct="http://rdfs.org/sioc/types#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" version="2.0" xml:base="https://www.linuxjournal.com/">
  <channel>
    <title>Clusters</title>
    <link>https://www.linuxjournal.com/</link>
    <description/>
    <language>en</language>
    
    <item>
  <title>PSSC Labs' PowerServe HPC Servers and PowerWulf HPC Clusters</title>
  <link>https://www.linuxjournal.com/content/pssc-labs-powerserve-hpc-servers-and-powerwulf-hpc-clusters</link>
  <description>  &lt;div data-history-node-id="1339524" class="layout layout--onecol"&gt;
    &lt;div class="layout__region layout__region--content"&gt;
      
            &lt;div class="field field--name-node-author field--type-ds field--label-hidden field--item"&gt;by &lt;a title="View user profile." href="https://www.linuxjournal.com/users/james-gray" lang="" about="https://www.linuxjournal.com/users/james-gray" typeof="schema:Person" property="schema:name" datatype="" xml:lang=""&gt;James Gray&lt;/a&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-body field--type-text-with-summary field--label-hidden field--item"&gt;&lt;p&gt;
In its quest to provide customers with the latest and best computing solutions that
deliver relentless performance at the lowest total cost of ownership (TCO), &lt;a href="http://www.pssclabs.com"&gt;PSSC Labs&lt;/a&gt; has
supercharged two server solutions with next-generation processing power. 
&lt;/p&gt;
&lt;img src="http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12237f4.png" alt="" title="" class="imagecache-large-550px-centered" /&gt;&lt;p&gt;
The breakthrough
technology of Intel's new Xeon Scalable Processors has been integrated into
PSSC Labs' PowerServe HPC line of servers and the PowerWulf line of HPC
clusters, a move that delivers performance for cutting-edge
computing tasks, such as real-time analytics, virtualized infrastructure and
high-performance computing. 
&lt;/p&gt;

&lt;p&gt;
Besides the advanced architecture, the new processors
offer a diverse suite of platform innovations for enhanced application performance,
including Intel AVX-512, Intel Mesh Architecture, Intel QuickAssist, Intel Optane
SSDs and Intel Omni-Path Fabric. 
&lt;/p&gt;

&lt;p&gt;
Both PSSC Labs offerings are designed as reliable,
flexible HPC solutions targeted at government, academic and commercial
environments. Sectors that stand to benefit from the new performance
include design and engineering, life and physical sciences, financial services and
machine/deep learning.
&lt;/p&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-node-link field--type-ds field--label-hidden field--item"&gt;  &lt;a href="https://www.linuxjournal.com/content/pssc-labs-powerserve-hpc-servers-and-powerwulf-hpc-clusters" hreflang="und"&gt;Go to Full Article&lt;/a&gt;
&lt;/div&gt;
      
    &lt;/div&gt;
  &lt;/div&gt;

</description>
  <pubDate>Mon, 16 Oct 2017 14:43:17 +0000</pubDate>
    <dc:creator>James Gray</dc:creator>
    <guid isPermaLink="false">1339524 at https://www.linuxjournal.com</guid>
    </item>
<item>
  <title>SUSE Software-Defined Storage Leverages Open Source to Break Proprietary Lock-in and Reduce Cost</title>
  <link>https://www.linuxjournal.com/content/suse-software-defined-storage-leverages-open-source-break-proprietary-lock-and-reduce-cost</link>
  <description>  &lt;div data-history-node-id="1339521" class="layout layout--onecol"&gt;
    &lt;div class="layout__region layout__region--content"&gt;
      
            &lt;div class="field field--name-node-author field--type-ds field--label-hidden field--item"&gt;by &lt;a title="View user profile." href="https://www.linuxjournal.com/users/john-grogan" lang="" about="https://www.linuxjournal.com/users/john-grogan" typeof="schema:Person" property="schema:name" datatype="" xml:lang=""&gt;John Grogan&lt;/a&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-body field--type-text-with-summary field--label-hidden field--item"&gt;&lt;p&gt;
Gartner analysts noted in a recent Cool Vendor report:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
It has become painfully
evident that storage capacity demands, and expectations for far more rapid
provisioning of that storage, have far outpaced the ability of [infrastructure and
operations] teams' capabilities. Far-more-automated systems are required to restore a
sense of balance, that is, storage solutions that offer much greater scale, but also
much more automation.
&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;
The power of storage solutions has always resided in the software. SUSE
software-defined storage gives users more flexibility and choice than traditional
storage appliances provide. It allows users to meet constantly, even exponentially,
growing storage needs more securely and cost-effectively using industry-standard
hardware and open-source-based software-defined storage solutions. Accordingly, SUSE
has introduced SUSE Enterprise Storage 5 with enhanced ease of management, improved
performance and expanded features, including new disk-to-disk backup capabilities for
enterprise customers, fulfilling the need for "much greater scale, but also much
more automation" as cited by Gartner.
&lt;/p&gt;

&lt;p&gt;
"Every generation of enterprise infrastructure innovation is now being built on open
source", said Gerald Pfeifer, Vice President of Products and Technology Programs at
SUSE. He continued:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
SUSE is expert at both contributing to and using upstream innovation to create
enterprise-grade, secure solutions that can be combined with other technologies to
best address customer needs. This approach applied to software-defined storage
delivers highly scalable solutions that radically reduce storage costs in terms of
both capital and operations expense.
&lt;/p&gt;&lt;/blockquote&gt;

&lt;span class="h3-replacement"&gt;
SUSE Enterprise Storage 5&lt;/span&gt;

&lt;p&gt;
The latest release of SUSE's intelligent
software-defined storage management solution, SUSE Enterprise Storage 5, will enable
IT organizations to accelerate innovation and reduce costs by efficiently
transforming their enterprise storage infrastructures. It is based on the Luminous
release of the Ceph open-source project, and it is ideally suited for compliance,
archive, backup and large data storage. Large data applications include video
surveillance, CCTV, online presence and training, streaming media, X-rays, seismic
processing, genomic mapping and computer-assisted design. Backup and archive
applications include Veritas NetBackup, Commvault and Micro Focus Data Protector,
along with compliance solutions such as iTernity.
&lt;/p&gt;

&lt;p&gt;
SUSE Enterprise Storage 5 is the first commercial offering to support the new
BlueStore back end within Ceph. This follows SUSE's first-to-market support for iSCSI
and CephFS in previous versions of SUSE Enterprise Storage. Notable benefits of this
release include:
&lt;/p&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-node-link field--type-ds field--label-hidden field--item"&gt;  &lt;a href="https://www.linuxjournal.com/content/suse-software-defined-storage-leverages-open-source-break-proprietary-lock-and-reduce-cost" hreflang="und"&gt;Go to Full Article&lt;/a&gt;
&lt;/div&gt;
      
    &lt;/div&gt;
  &lt;/div&gt;

</description>
  <pubDate>Thu, 12 Oct 2017 11:57:20 +0000</pubDate>
    <dc:creator>John Grogan</dc:creator>
    <guid isPermaLink="false">1339521 at https://www.linuxjournal.com</guid>
    </item>
<item>
  <title>SUSE Unveils Near-Zero Downtime for SAP Apps</title>
  <link>https://www.linuxjournal.com/content/suse-unveils-near-zero-downtime-sap-apps</link>
  <description>  &lt;div data-history-node-id="1339520" class="layout layout--onecol"&gt;
    &lt;div class="layout__region layout__region--content"&gt;
      
            &lt;div class="field field--name-node-author field--type-ds field--label-hidden field--item"&gt;by &lt;a title="View user profile." href="https://www.linuxjournal.com/users/john-grogan" lang="" about="https://www.linuxjournal.com/users/john-grogan" typeof="schema:Person" property="schema:name" datatype="" xml:lang=""&gt;John Grogan&lt;/a&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-body field--type-text-with-summary field--label-hidden field--item"&gt;&lt;p&gt;
Zero downtime is, of course, a mythical holy grail. According to IDC senior market
analyst Prabhitha Sheethal Dcruz:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
Zero downtime frequently translates to 99.999% uptime, which equates to 5.26 minutes of downtime per year. While short outages
may be acceptable for non-critical workloads, the same is not true for
business-critical and mission-critical workloads where the downtime stakes can be
very high—consider a stock exchange where a single lost transaction may incur a significant
financial cost or a medical system downtime that can cost lives.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
So zero downtime is a lofty but difficult-to-achieve goal. SUSE recently announced a
certified near-zero-downtime technology for workloads running in SAP software.
According to Naji Almahmoud, SUSE vice president of Global Alliances:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
SUSE is an
expert at bringing together emerging, fast-paced open-source innovation and turning
it into reliable enterprise-grade solutions. Customers running mission-critical
workloads can now have more confidence as SUSE works closely with SAP to help ensure
near-zero-downtime capabilities for its users.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
SUSE, as you likely already know, is an open-source operating system and
infrastructure provider for workloads running in SAP software. SUSE has further
strengthened its offerings for users of SAP software with new support for
high-availability and disaster-recovery solutions, such as:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;
&lt;p&gt;
&lt;strong&gt;SUSE support for takeover automation for scale-out clusters in SAP
HANA:&lt;/strong&gt;
SUSE now provides automated takeover for users and applications, complementing the
SAP HANA platform and data replication between SAP HANA nodes (scale-up) and clusters
(scale-out). The SUSE offering is part of a leading platform for SAP solutions, SUSE
Linux Enterprise Server for SAP Applications.
&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;
&lt;strong&gt;Certification for high-availability clusters and improved maintenance for SAP
NetWeaver 7.40:&lt;/strong&gt; SAP has certified SUSE technology to manage high-availability clusters running on the SAP NetWeaver technology platform. The
certification, NW-HA-CLU 7.40, is available for x86-64 now, with support for Power
(both Big Endian and Little Endian) coming in the next quarter. This makes possible
transparent rolling updates of the SAP NetWeaver kernel. While support for SAP
NetWeaver high availability has been available previously, SUSE now also supports SAP
NetWeaver 7.40 and above, included in SUSE Linux Enterprise Server for SAP
Applications.
&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
For more information about SUSE support for SAP solutions and customer workloads,
visit &lt;a href="https://www.suse.com/products/sles-for-sap"&gt;https://www.suse.com/products/sles-for-sap&lt;/a&gt;
and &lt;a href="https://www.suse.com/partners/alliance/sap"&gt;https://www.suse.com/partners/alliance/sap&lt;/a&gt;.
&lt;/p&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-node-link field--type-ds field--label-hidden field--item"&gt;  &lt;a href="https://www.linuxjournal.com/content/suse-unveils-near-zero-downtime-sap-apps" hreflang="und"&gt;Go to Full Article&lt;/a&gt;
&lt;/div&gt;
      
    &lt;/div&gt;
  &lt;/div&gt;

</description>
  <pubDate>Wed, 11 Oct 2017 15:54:19 +0000</pubDate>
    <dc:creator>John Grogan</dc:creator>
    <guid isPermaLink="false">1339520 at https://www.linuxjournal.com</guid>
    </item>
<item>
  <title>LINBIT's DRBD Top</title>
  <link>https://www.linuxjournal.com/content/linbits-drbd-top</link>
  <description>  &lt;div data-history-node-id="1339517" class="layout layout--onecol"&gt;
    &lt;div class="layout__region layout__region--content"&gt;
      
            &lt;div class="field field--name-node-author field--type-ds field--label-hidden field--item"&gt;by &lt;a title="View user profile." href="https://www.linuxjournal.com/users/james-gray" lang="" about="https://www.linuxjournal.com/users/james-gray" typeof="schema:Person" property="schema:name" datatype="" xml:lang=""&gt;James Gray&lt;/a&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-body field--type-text-with-summary field--label-hidden field--item"&gt;&lt;p&gt;
Many proprietary high-availability (HA) software providers require users to pay
extra for system-management capabilities. Bucking this convention and driving down
costs is &lt;a href="https://www.linbit.com/en"&gt;LINBIT&lt;/a&gt;, whose DRBD HA software solution, part of the Linux kernel since
2009, powers thousands of digital enterprises. 
&lt;/p&gt;

&lt;p&gt;
The cost savings originate from
LINBIT's DRBD Top, a new software tool to simplify the management of the
LINBIT DRBD application. Via DRBD Top's unified graphical interface,
administrators can navigate their DRBD resources conveniently without typing
multiple commands. 
&lt;/p&gt;

&lt;p&gt;
Available on GitHub, DRBD Top provides critical status,
assessment and troubleshooting capabilities for administrators who manage HA
clusters, especially those with more than two nodes.
&lt;/p&gt;
&lt;img src="http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12237f2.png" alt="" title="" class="imagecache-large-550px-centered" /&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-node-link field--type-ds field--label-hidden field--item"&gt;  &lt;a href="https://www.linuxjournal.com/content/linbits-drbd-top" hreflang="und"&gt;Go to Full Article&lt;/a&gt;
&lt;/div&gt;
      
    &lt;/div&gt;
  &lt;/div&gt;

</description>
  <pubDate>Wed, 11 Oct 2017 14:19:54 +0000</pubDate>
    <dc:creator>James Gray</dc:creator>
    <guid isPermaLink="false">1339517 at https://www.linuxjournal.com</guid>
    </item>
<item>
  <title>Kodiak Data's MemCloud</title>
  <link>https://www.linuxjournal.com/content/kodiak-datas-memcloud</link>
  <description>  &lt;div data-history-node-id="1339452" class="layout layout--onecol"&gt;
    &lt;div class="layout__region layout__region--content"&gt;
      
            &lt;div class="field field--name-node-author field--type-ds field--label-hidden field--item"&gt;by &lt;a title="View user profile." href="https://www.linuxjournal.com/users/james-gray" lang="" about="https://www.linuxjournal.com/users/james-gray" typeof="schema:Person" property="schema:name" datatype="" xml:lang=""&gt;James Gray&lt;/a&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-body field--type-text-with-summary field--label-hidden field--item"&gt;&lt;p&gt;
Scientists working with big data regularly confront the high cost
of acquiring the computational power needed to push the boundaries
and innovate in data science. In an effort to bridge the big data
infrastructure chasm, Kodiak Data, a leader in cluster virtualization
technology, presents &lt;a href="http://www.memcloud.works"&gt;MemCloud&lt;/a&gt;, an
innovative IaaS solution that accelerates the entire big data deployment
chain. 
&lt;/p&gt;

&lt;p&gt;
MemCloud is also "the first memory-speed cloud infrastructure
solution for big data scientists and software developers"
that provides big data analytic clusters "at up to one-fifth
the cost and five times the performance of typical leading cloud
hosting services". MemCloud is built on Kodiak Data's Virtual
Cluster Infrastructure platform, "the only solution capable of
in-software provisioning of compute, networking, storage and data at
the cluster level within minutes". 
&lt;/p&gt;

&lt;p&gt;
Besides the hosted cloud
service option, MemCloud also is available as a compact on-premises
appliance for private clouds, an industry first, asserts Kodiak.
&lt;/p&gt;
&lt;img src="http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12202f8.png" alt="" title="" class="imagecache-large-550px-centered" /&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-node-link field--type-ds field--label-hidden field--item"&gt;  &lt;a href="https://www.linuxjournal.com/content/kodiak-datas-memcloud" hreflang="und"&gt;Go to Full Article&lt;/a&gt;
&lt;/div&gt;
      
    &lt;/div&gt;
  &lt;/div&gt;

</description>
  <pubDate>Fri, 04 Aug 2017 15:30:56 +0000</pubDate>
    <dc:creator>James Gray</dc:creator>
    <guid isPermaLink="false">1339452 at https://www.linuxjournal.com</guid>
    </item>
<item>
  <title>Webinar: Maximizing NoSQL Clusters for Large Data Sets</title>
  <link>https://www.linuxjournal.com/content/webinar-maximizing-nosql-clusters-large-data-sets</link>
  <description>  &lt;div data-history-node-id="1338823" class="layout layout--onecol"&gt;
    &lt;div class="layout__region layout__region--content"&gt;
      
            &lt;div class="field field--name-node-author field--type-ds field--label-hidden field--item"&gt;by &lt;a title="View user profile." href="https://www.linuxjournal.com/user/800005" lang="" about="https://www.linuxjournal.com/user/800005" typeof="schema:Person" property="schema:name" datatype="" xml:lang=""&gt;LJ Staff&lt;/a&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-body field--type-text-with-summary field--label-hidden field--item"&gt;&lt;img src="http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/small-200px-left-align-wrap/u800391/brad_brech_ibm2.png" alt="" title="" class="imagecache-small-200px-left-align-wrap" /&gt;This follow-on webcast to Reuven M. Lerner's well-received and widely acclaimed Geek Guide, &lt;a href="http://geekguide.linuxjournal.com/content/take-control-growing-redis-nosql-server-clusters"&gt;"Take Control of Growing Redis NoSQL Server Clusters"&lt;/a&gt;, will extend the discussion and get into the nuts and bolts of optimally maximizing your NoSQL clusters working with large data sets. 
&lt;p&gt;
&lt;/p&gt;
Reuven's deep knowledge of development and NoSQL clusters will combine with Brad Brech's intimate understanding of the intricacies of IBM's Power Systems and large data sets in a free-wheeling discussion that will answer all your questions on this complex subject. There will be time for Q &amp; A as well. Please join us September 30 at 2:00PM EDT for this exciting, technical, deeply informative session. 
&lt;p&gt;
&lt;/p&gt;
Sign up now: &lt;a href="http://linuxjournalservices.com/portal/wts/uemc%7Cy-fn8%7CLegmRs6jwvO36kD7%3Bjb"&gt;http://linuxjournalservices.com/portal/wts/uemc%7Cy-fn8%7CLegmRs6jwvO36kD7%3Bjb&lt;/a&gt;.&lt;/div&gt;
      
            &lt;div class="field field--name-node-link field--type-ds field--label-hidden field--item"&gt;  &lt;a href="https://www.linuxjournal.com/content/webinar-maximizing-nosql-clusters-large-data-sets" hreflang="und"&gt;Go to Full Article&lt;/a&gt;
&lt;/div&gt;
      
    &lt;/div&gt;
  &lt;/div&gt;

</description>
  <pubDate>Tue, 15 Sep 2015 16:05:32 +0000</pubDate>
    <dc:creator>LJ Staff</dc:creator>
    <guid isPermaLink="false">1338823 at https://www.linuxjournal.com</guid>
    </item>
<item>
  <title>LUCI4HPC</title>
  <link>https://www.linuxjournal.com/content/luci4hpc</link>
  <description>  &lt;div data-history-node-id="1338745" class="layout layout--onecol"&gt;
    &lt;div class="layout__region layout__region--content"&gt;
      
            &lt;div class="field field--name-node-author field--type-ds field--label-hidden field--item"&gt;by &lt;a title="View user profile." href="https://www.linuxjournal.com/users/melanie-grandits" lang="" about="https://www.linuxjournal.com/users/melanie-grandits" typeof="schema:Person" property="schema:name" datatype="" xml:lang=""&gt;Melanie Grandits&lt;/a&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-body field--type-text-with-summary field--label-hidden field--item"&gt;&lt;p&gt;
Today's computational needs in diverse fields cannot be met by a single
computer. Such areas include weather forecasting, astronomy,
aerodynamics simulations for cars, material sciences and computational
drug design. This makes it necessary to combine multiple computers
into one system, a so-called computer cluster, to obtain the required
computational power. 
&lt;/p&gt;

&lt;p&gt;
The software described in this article is designed
for a Beowulf-style cluster. Such a cluster commonly consists of
consumer-grade machines and allows for parallel high-performance computing. The
system is managed by a head node and accessed via a login node. The
actual work is performed by multiple compute nodes. The individual
nodes are connected through an internal network. The head and login
nodes need an additional external network connection, while the compute
nodes often use an additional high-throughput, low-latency connection
between them, such as InfiniBand. 
&lt;/p&gt;

&lt;p&gt;
This rather complex setup requires
special software, which offers tools to install and manage such
a system easily. The software presented in this article—LUCI4HPC, an acronym
for lightweight user-friendly cluster installer for high performance
computing—is such a tool. 
&lt;/p&gt;

&lt;p&gt;
The aim is to facilitate the maintenance of
small in-house clusters, mainly used by research institutions, in order
to lower the dependency on shared external systems. The main focus of
LUCI4HPC is twofold: to be lightweight in terms of resource usage, leaving as much
of the computational power as possible for the actual calculations, and
to be user-friendly, which is achieved through a graphical Web-based control
panel for managing the system.
&lt;/p&gt;

&lt;p&gt;
LUCI4HPC focuses only on essential features in order not to burden the
user with many unnecessary options so that the system can be made 
operational quickly with just a few clicks.
&lt;/p&gt;

&lt;p&gt;
In this article, we provide an overview of the LUCI4HPC software
as well as briefly explain the installation and use. You can find a more detailed
installation and usage guide in the manual on the LUCI4HPC
Web site (see Resources). Figure 1 shows an overview of the recommended hardware setup.
&lt;/p&gt;

&lt;img src="http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1002061/11800f1.jpg" alt="" title="" class="imagecache-large-550px-centered" /&gt;&lt;p&gt;
Figure 1. Recommended Hardware Setup for a Cluster Running LUCI4HPC
&lt;/p&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-node-link field--type-ds field--label-hidden field--item"&gt;  &lt;a href="https://www.linuxjournal.com/content/luci4hpc" hreflang="und"&gt;Go to Full Article&lt;/a&gt;
&lt;/div&gt;
      
    &lt;/div&gt;
  &lt;/div&gt;

</description>
  <pubDate>Tue, 16 Jun 2015 19:07:04 +0000</pubDate>
    <dc:creator>Melanie Grandits</dc:creator>
    <guid isPermaLink="false">1338745 at https://www.linuxjournal.com</guid>
    </item>
<item>
  <title>How YARN Changed Hadoop Job Scheduling</title>
  <link>https://www.linuxjournal.com/content/how-yarn-changed-hadoop-job-scheduling</link>
  <description>  &lt;div data-history-node-id="1335912" class="layout layout--onecol"&gt;
    &lt;div class="layout__region layout__region--content"&gt;
      
            &lt;div class="field field--name-node-author field--type-ds field--label-hidden field--item"&gt;by &lt;a title="View user profile." href="https://www.linuxjournal.com/users/adam-diaz" lang="" about="https://www.linuxjournal.com/users/adam-diaz" typeof="schema:Person" property="schema:name" datatype="" xml:lang=""&gt;Adam Diaz&lt;/a&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-body field--type-text-with-summary field--label-hidden field--item"&gt;&lt;p&gt;
Scheduling means different things depending on the audience. To many
in the business world, scheduling is synonymous with workflow management. 
Workflow management is the coordinated execution of a collection of
scripts or programs for a business workflow with monitoring, logging and
execution guarantees built into a WYSIWYG editor. Tools like Platform
Process Manager come to mind as an example. To others, scheduling
is about process or network scheduling. In the distributed computing 
world, scheduling means job scheduling, or more correctly,
workload management. 
&lt;/p&gt;

&lt;p&gt;
Workload management is not only about how a specific unit
of work is submitted, packaged and scheduled, but it's also about how it runs,
handles failures and returns results. The HPC definition is fairly
close to the Hadoop definition of scheduling. One interesting way that
HPC scheduling and resource management cross paths is within the
Hadoop on Demand project. The Torque resource manager and Maui Meta
Scheduler both were used for scheduling in the Hadoop on Demand project
during Hadoop's early days at Yahoo. 
&lt;/p&gt;

&lt;p&gt;
This article compares and
contrasts the historically robust field of HPC workload management with
the rapidly evolving field of job scheduling happening in Hadoop
today.
&lt;/p&gt;

&lt;p&gt;
Both HPC and Hadoop can be called distributed computing, but they diverge rapidly
architecturally. HPC typically uses a share-everything architecture,
with compute nodes sharing common storage. In this case, the data for
each job has to be moved to the node via the shared storage system. A
shared storage layer makes writing job scripts a little easier, but it
also injects the need for more expensive storage technologies. The
share-everything paradigm also creates an ever-increasing demand on the
network with scale. HPC centers quickly realize they must move to
higher-speed networking technology to support parallel workloads at scale.
&lt;/p&gt;

&lt;p&gt;
Hadoop, on the other hand, functions in a share-nothing architecture, meaning
that data is stored on individual nodes using local disk. Hadoop moves work to
the data and leverages inexpensive and rapid local storage (JBOD) as much as
possible. A local storage architecture scales nearly linearly due to the
proportional increase in CPU, disk and I/O capacity as node count increases. A
fiber network is a nice option with Hadoop, but two bonded 1GbE interfaces or a
single 10GbE in many cases is fast enough. Using the slowest practical
networking technology provides a net savings to a project budget. 
&lt;/p&gt;

&lt;p&gt;
From the Hadoop
philosophy, funds really should be allocated to additional data nodes. The same
can be said about CPU, memory and the drives themselves. Adding nodes is what
makes the entire cluster both more parallel in operation and more
resistant to failure. The use of mid-range componentry, also called commodity
hardware, is what makes it affordable. 
&lt;/p&gt;&lt;/div&gt;
      
            &lt;div class="field field--name-node-link field--type-ds field--label-hidden field--item"&gt;  &lt;a href="https://www.linuxjournal.com/content/how-yarn-changed-hadoop-job-scheduling" hreflang="und"&gt;Go to Full Article&lt;/a&gt;
&lt;/div&gt;
      
    &lt;/div&gt;
  &lt;/div&gt;

</description>
  <pubDate>Fri, 27 Jun 2014 20:59:23 +0000</pubDate>
    <dc:creator>Adam Diaz</dc:creator>
    <guid isPermaLink="false">1335912 at https://www.linuxjournal.com</guid>
    </item>

  </channel>
</rss>
