MapR Releases New Ecosystem Pack with Optimized Security and Performance for Apache Spark

April 2017 by Emmanuelle Lamandé

MapR Technologies, Inc., announced the next major release of its MapR Ecosystem Pack (MEP) program. MEP is a broad set of mutually compatible open source ecosystem projects that enable big data applications on the MapR Converged Data Platform. Version 3.0 of MEP provides enhanced security for Spark, new Spark connectors for MapR-DB and HBase, significant updates and integrations with Drill, and a faster version of Hive.

The MapR Ecosystem Pack removes the complexity of coordinating many different community projects and versions. MapR develops, tests, and integrates open source ecosystem projects such as Apache Drill, Spark, Hive, and Myriad, among others.

The new MapR Ecosystem Pack version 3.0 includes:

Apache Spark 2.1.0

The Spark 2.1 release focuses on improvements in enterprise-ready stability and security including:
 Scalable partition handling
 Data Type APIs graduate to “stable”
 More than 1200 fixes on the Spark 2.X line
 Secure connections using MapR-SASL, in addition to Kerberos, for inbound client connections to the Spark Thrift server and for Spark connections to the Hive Metastore
 Support for impersonation on SELECT statements

Native Spark Connector for MapR-DB JSON

The Native Spark Connector for MapR-DB JSON makes it easier to build real-time or batch pipelines between your data and MapR-DB while leveraging Spark or Spark Streaming within the pipeline. Designed to be highly efficient and to simplify code development, the Native Spark Connector includes:
 Two new APIs that allow you to load data from a MapR-DB JSON table to a Spark RDD or save a Spark RDD to a MapR-DB JSON table
 A custom data partitioner for better performance
 Use of MapR-DB data locality when launching Spark executors to read data

Spark-HBase and MapR-DB Binary Connector

The new Spark-HBase connector provides the ability to write applications that consume HBase binary tables and use them in Spark. Added features include:
 Bulk insertion into HBase
 Spark SQL for HBase

Apache Drill 1.10

This release adds significant updates around BI tool integration, end-to-end security, performance, and usability. Highlights of Drill 1.10 include:
 Support for the CREATE TEMPORARY TABLE AS (CTTAS) command
 Support for Kerberos and MapR-SASL authentication between the client and drillbit
 Ability to query data with Hue 3.12 (experimental only)
 Improved compatibility with Hive/Spark generated Parquet files
 Improved query diagnostics
 110 bug fixes and other improvements

Apache Hive 2.1.1

The MEP 3.0 release includes a faster version of Hive that greatly improves the speed of data processing tasks, with lower latency for interactive queries and higher throughput for batch queries. Other key improvements include:
 2x faster ETL through a smarter Cost-Based Optimizer (CBO), faster type conversions, and dynamic partition pruning
 New HiveServer UI with new diagnostics and monitoring tools
 Dynamically partitioned hash joins, which work on unsorted inputs and so eliminate the sorting step

MapR Streams C Applications

With MapR core Release 5.2.1, you can develop C applications for MapR Streams. The MapR Streams C Client is a distribution of librdkafka that integrates with MapR Streams.

MapR Streams Python Applications

With MapR core Release 5.2.1, you can create Python applications for MapR Streams using the MapR Streams Python client. The Streams Python client is a binding for librdkafka and contains support for high-level consumers.
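
The announcement does not describe the Python client's API, so the following is only a rough sketch of what a high-level consumer loop might look like. It assumes the consumer object follows the convention common to librdkafka-based Python bindings: a `poll(timeout)` method that returns `None` when nothing is available, or a message object with `error()` and `value()` methods. The names `drain_messages`, `StubMessage`, and `StubConsumer` are hypothetical and exist only so the sketch runs without a cluster; the real client's module name and configuration are not given here and are deliberately left out.

```python
from typing import Any, List, Optional


def drain_messages(consumer: Any, max_messages: int, timeout: float = 1.0) -> List[bytes]:
    """Collect up to max_messages payloads, stopping early when a poll returns nothing.

    `consumer` is assumed (not confirmed by the article) to expose
    poll(timeout) -> message or None, where a message has error() and value().
    """
    payloads: List[bytes] = []
    while len(payloads) < max_messages:
        msg = consumer.poll(timeout)
        if msg is None:
            # The poll timed out: no more data available right now.
            break
        if msg.error():
            # Skip error events instead of treating them as payloads.
            continue
        payloads.append(msg.value())
    return payloads


class StubMessage:
    """Hypothetical in-memory message used only to exercise the loop."""

    def __init__(self, value: Optional[bytes], error: Optional[str] = None) -> None:
        self._value = value
        self._error = error

    def value(self) -> Optional[bytes]:
        return self._value

    def error(self) -> Optional[str]:
        return self._error


class StubConsumer:
    """Hypothetical stand-in for a real consumer; serves queued messages in order."""

    def __init__(self, messages: List[StubMessage]) -> None:
        self._messages = list(messages)

    def poll(self, timeout: Optional[float] = None) -> Optional[StubMessage]:
        return self._messages.pop(0) if self._messages else None


if __name__ == "__main__":
    stub = StubConsumer([StubMessage(b"first"), StubMessage(b"second")])
    print(drain_messages(stub, max_messages=10))
```

Keeping the loop independent of any particular consumer class means the same function could be handed a real consumer, provided that consumer actually follows the assumed `poll` interface.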

Availability

The MapR Ecosystem Pack version 3.0 is available now.

