We’re pleased to announce the general availability of Alluxio Data Orchestration Hub, your single pane of glass to orchestrate data for analytics and AI. The data ecosystem has grown complex as storage and compute are separated across data centers and cloud providers. With this release, we’ve made great strides towards simplifying data access and management across multiple environments.
Data Orchestration Hub, or the Hub, is a management console that makes it easy to manage an analytics cluster and connect it with multiple data sources to unify data lakes. The service provides an easy-to-use, unified management view for configuration and monitoring, along with wizard-based curation of deployment workflows.
- Connect Your Data Sources: Connect Alluxio to data storage and catalogs across multiple clouds, a single cloud, or on-premises using guided wizards.
- Monitor Your Alluxio Cluster: View a dashboard showing the state of the processes on your cluster.
- Manage Configuration: Set and distribute configuration for a cluster.
Alluxio Data Orchestration Hub is available immediately for all Alluxio deployment scenarios with compute engines like Presto, Spark, and TensorFlow. The Hub is ready to use out of the box with Amazon EMR and Google Dataproc, and it can also be used on other platforms. Please see the documentation for more information on trying out the Hub.
When to Use
Connecting to data sources across regions
The Hub provides self-guided wizards to allow users to connect to data sources and catalogs in the same or remote data centers. A user is guided through the required configuration steps along with validation of the connection.
These wizards apply to multiple scenarios, including hybrid cloud, cross-data center, single cloud, and private data center deployments. Manage your compute clusters with Alluxio using these easy-to-use wizards.
Managing an Alluxio cluster
The Hub provides a dashboard for monitoring the state of processes on the cluster, and lets you update configuration and restart processes. This is especially useful for cloud deployments where SSH access for configuration and process management is not available.
What’s Next
To start using Alluxio Data Orchestration Hub, simply launch Alluxio-enabled clusters in your on-premises or cloud deployment. Further changes to the cluster, as well as monitoring, can now be managed using the Hub:
- Process Management: Monitor the status of each process in the Alluxio cluster, and start or stop processes.
- Connect Data Storage: Connect Alluxio to your data sources, such as HDFS, S3, or GCS, across a hybrid cloud, a single cloud, or on-premises.
- Connect Data Catalog: Configure structured data catalogs for OLAP engines like Presto on Alluxio. Connect to existing catalog definitions to avoid redefining table metadata.
- Advanced Configuration: Customize your Alluxio cluster with advanced options for setting and distributing configuration from the central console.
If you would like more information on Data Orchestration Hub and the supported toolset, please read the release notes.
Have questions? Come join the Community Slack Channel.
Read the Alluxio 2.4 release product blog to learn more about the expanded features and capabilities to advance analytics and AI in the cloud.