Open source Alluxio 1.5.0 has been released with a large number of new features and improvements. Alluxio allows any application to access data from any storage system transparently and at memory speed. Interoperability with other technologies in the ecosystem is an important step for enabling this, and in the 1.5.0 release, we have improved the accessibility of Alluxio in several key ways:
- Alluxio Docker Integration
- Alluxio Golang Client
- Alluxio on Ceph using S3A
- Mount Specific Configuration Properties
Alluxio Docker Integration
Alluxio 1.5.0 adds documentation and scripts to make it easy to run Alluxio inside Docker containers. Alluxio configuration parameters can be passed using -e arguments, and logs are written to stdout so that they show up in the output of docker logs. The example below illustrates how to run Dockerized Alluxio on top of HDFS.
    cd alluxio-1.5.0/integration/docker
    docker build -t alluxio .
    docker run -d --net=host \
        -e ALLUXIO_UNDERFS_ADDRESS=hdfs://HdfsMaster:9000/ \
        alluxio master
    docker run -d --net=host --shm-size=10GB \
        -e ALLUXIO_MASTER_HOSTNAME=AlluxioMaster \
        -e ALLUXIO_WORKER_MEMORY_SIZE=10GB \
        -e ALLUXIO_UNDERFS_ADDRESS=hdfs://HdfsMaster:9000/ \
        alluxio worker
See the docs for a step-by-step tutorial on running Dockerized Alluxio on an EC2 instance.
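The -e variables in the example above line up with Alluxio property names (for instance, ALLUXIO_WORKER_MEMORY_SIZE and alluxio.worker.memory.size). The Go sketch below illustrates that naming convention. It assumes the mapping is simply "lowercase the name and replace underscores with dots", which this post does not spell out, and envToProperty is a hypothetical helper, not part of Alluxio.

```go
package main

import (
	"fmt"
	"strings"
)

// envToProperty converts a Docker-style environment variable name such as
// ALLUXIO_MASTER_HOSTNAME into a dotted property key such as
// alluxio.master.hostname. Hypothetical helper: the lowercase-and-dots
// mapping is an assumption, not documented behavior.
func envToProperty(name string) (string, bool) {
	if !strings.HasPrefix(name, "ALLUXIO_") {
		return "", false // not an Alluxio setting; ignore it
	}
	return strings.ToLower(strings.ReplaceAll(name, "_", ".")), true
}

func main() {
	names := []string{
		"ALLUXIO_MASTER_HOSTNAME",
		"ALLUXIO_WORKER_MEMORY_SIZE",
		"ALLUXIO_UNDERFS_ADDRESS",
		"PATH", // skipped: no ALLUXIO_ prefix
	}
	for _, name := range names {
		if key, ok := envToProperty(name); ok {
			fmt.Printf("%s -> %s\n", name, key)
		}
	}
}
```

Running the program prints one line per ALLUXIO_ variable and silently skips everything else, so unrelated container environment variables are not treated as configuration.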
Alluxio Golang Client
Previously, Alluxio introduced a proxy process, which by default runs alongside every Alluxio master and worker and provides a REST API equivalent to Alluxio’s native file system API. In version 1.5.0, Alluxio introduces a Go client for interacting with Alluxio based on the REST API. This client is available in its own repository in order to facilitate its import through the “go get” mechanism.
Besides providing a mechanism for communicating with Alluxio from Go environments, the client implementation also serves as an example of how straightforward it is to create a language binding for Alluxio based on the REST API.
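As a rough illustration of how thin such a binding can be, the Go sketch below wraps a single REST call using only the standard library. The URL layout (/api/v1/paths/<path>/exists), the empty JSON options body, and the JSON boolean reply are assumptions about the proxy's REST API rather than details taken from this post, and a local stub server stands in for the proxy so the sketch runs without a cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// existsURL builds the request URL for an "exists" call. The
// /api/v1/paths/<path>/exists layout is an assumption about the proxy's
// REST API; check the REST API docs for the real endpoint.
func existsURL(baseURL, path string) string {
	return baseURL + "/api/v1/paths/" + path + "/exists"
}

// exists POSTs an empty JSON options object and decodes a JSON boolean reply.
func exists(c *http.Client, baseURL, path string) (bool, error) {
	resp, err := c.Post(existsURL(baseURL, path), "application/json", strings.NewReader("{}"))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var ok bool
	if err := json.NewDecoder(resp.Body).Decode(&ok); err != nil {
		return false, err
	}
	return ok, nil
}

// demo calls exists against a local stub that stands in for the proxy.
// The stub answers "true" to any POST whose path ends in /exists.
func demo() (bool, error) {
	stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || !strings.HasSuffix(r.URL.Path, "/exists") {
			http.Error(w, "unsupported call", http.StatusNotFound)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, "true")
	}))
	defer stub.Close()
	return exists(stub.Client(), stub.URL, "/hello")
}

func main() {
	ok, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("path /hello exists: %v\n", ok)
}
```

Against a real deployment, the stub would be replaced by the proxy's address, and the other file system operations would presumably follow the same request-and-decode pattern.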
Note that communicating with Alluxio through the REST API requires extra network hops and/or memory copies and is therefore expected to be less performant than the native Java client. On the other hand, any improvements to the native Java client benefit all REST API-based clients, meaning the Go client and any other client developed against the REST API will always have the latest features.
The example below illustrates how to interact with Alluxio using a Go program:
    package main

    import (
        "fmt"
        "log"

        alluxio "github.com/Alluxio/alluxio-go"
        "github.com/Alluxio/alluxio-go/option"
    )

    func main() {
        // Replace the <...> placeholders with values for your deployment.
        fs := alluxio.NewClient(<proxy-host>, <proxy-port>, <timeout>)
        // Ask the proxy whether the path exists, using default options.
        ok, err := fs.Exists(<path>, &option.Exists{})
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("path %v exists: %v\n", <path>, ok)
    }
Alluxio on Ceph using S3A
In 1.5.0, Alluxio can connect to Ceph as an under storage system using the S3A connector. The S3A connector provides significant functionality and performance improvements over the Swift connector.
As shown in the graph below, the S3A connector demonstrates up to 3x gains in read performance when reading one-gigabyte files.
Mount Specific Configuration Properties
One major benefit of using Alluxio is the ability to unify different under storage systems (e.g., S3, HDFS, GCS) into one Alluxio namespace, each under a separate mount point, similar to how devices are mounted on local file systems. Since version 1.5.0, Alluxio supports setting (potentially different) configuration properties for each mount point, in addition to respecting the global configuration setting for this type of under storage system.
After configuring and mounting different under storage systems, accessing these systems is completely transparent to Alluxio file system applications. As a result, Alluxio helps system admins hide complexity and improve the ease of managing storage.
For example, suppose a user, Alice, has multiple S3 buckets on AWS and wants to access the data stored across them. Previously, Alice could only mount S3 buckets into Alluxio if they shared the same system-wide authentication key. Now she can mount each bucket individually using separate authentication keys:
    $ bin/alluxio fs mount /mnt1 s3a://alice-bucket1/ --option aws.accessKeyId=<accessKey1> --option aws.secretKey=<secretKey1>
    $ bin/alluxio fs mount /mnt2 s3a://alice-bucket2/ --option aws.accessKeyId=<accessKey2> --option aws.secretKey=<secretKey2>
After this, any authenticated Alluxio user can access /mnt1 and /mnt2 freely, without even noticing that they come from two different buckets and are accessed using different authentication keys. Thus, Alice can share her Alluxio deployment with Bob so that he can access her buckets without giving him any bucket permissions or distributing her keys to him.
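The lookup behavior described in this section, where a property set on a mount point takes precedence and the global setting acts as the fallback, can be sketched in Go as follows. This is a conceptual model only, not Alluxio's implementation: the mount type, the resolve function, and the key values are hypothetical, and only the aws.accessKeyId property name comes from the commands above.

```go
package main

import (
	"fmt"
	"strings"
)

// mount pairs an Alluxio path prefix with the options given at mount time.
type mount struct {
	point   string
	options map[string]string
}

// resolve returns the value of key for path. An option set on the longest
// matching mount point wins; otherwise the global setting is used.
func resolve(global map[string]string, mounts []mount, path, key string) (string, bool) {
	best := -1
	for i, m := range mounts {
		if path == m.point || strings.HasPrefix(path, m.point+"/") {
			if best < 0 || len(m.point) > len(mounts[best].point) {
				best = i
			}
		}
	}
	if best >= 0 {
		if v, ok := mounts[best].options[key]; ok {
			return v, true
		}
	}
	v, ok := global[key]
	return v, ok
}

func main() {
	global := map[string]string{"aws.accessKeyId": "global-key"}
	mounts := []mount{
		{point: "/mnt1", options: map[string]string{"aws.accessKeyId": "key-1"}},
		{point: "/mnt2", options: map[string]string{"aws.accessKeyId": "key-2"}},
		{point: "/mnt3", options: map[string]string{}}, // no override
	}
	for _, p := range []string{"/mnt1/a.txt", "/mnt2/b.txt", "/mnt3/c.txt"} {
		v, _ := resolve(global, mounts, p, "aws.accessKeyId")
		fmt.Printf("%s uses %s\n", p, v)
	}
}
```

In this model, paths under /mnt1 and /mnt2 pick up their own keys, while /mnt3, which was mounted without overrides, falls back to the global value.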
And Many More!
This post highlights only a few of the new features and improvements in Alluxio 1.5.0. For a more comprehensive list, check out the release notes.
You can easily get started with Alluxio open source or community edition today by following the quick start guide.