The Azure Container Registry team is sharing the preview of customer-managed keys for data encryption at rest. Azure Container Registry already encrypts data at rest using service-managed keys. With the introduction of customer-managed keys, you can supplement default encryption with an additional encryption layer using keys that you create and manage in Azure Key Vault. This […]
Unlocking Market Data, Amazon Introduces AWS Data Exchange
In a recent blog post, Amazon has introduced a new market data publisher/subscriber service called AWS Data Exchange. This service is an add-on to the existing AWS Marketplace and contains more than 1000 licensable data products from more than 80 data providers. These data feeds include both free and paid offerings that span industries such […]
Simplifying ETL in the Cloud, Microsoft Releases Azure Data Factory Mapping Data Flows
In a recent blog post, Microsoft announced the general availability (GA) of their serverless, code-free Extract-Transform-Load (ETL) capability inside of Azure Data Factory called Mapping Data Flows. This tool allows organizations to embrace a data-driven culture without the need to manage large infrastructure footprints while having the ability to dynamically scale data processing workloads. By […]
Presentation: Big Data’s Ethical Drought: The Thirst for More Data Has Led to a Lapse in Ethics and Privacy
Katharine Jarmul provides examples of data (mis)use and asks how we can work with data, and produce an ethical product, without violating the trust and privacy of users. By Katharine Jarmul
Google Releases Cloud Dataproc for Kubernetes in Alpha
Google Cloud Dataproc is an open-source data and analytics processing service based on Hadoop and Spark. Google has now announced the alpha availability of Cloud Dataproc for Kubernetes, which aims to let customers process data more efficiently across platforms. By Steef-Jan Wiggers
Jagadish Venkatraman on LinkedIn’s Journey to Samza 1.0
At the recent ApacheCon North America, Jagadish Venkatraman spoke about how LinkedIn developed Apache Samza 1.0 to handle stream processing at scale. He described LinkedIn’s use cases involving trillions of events and petabytes of data, then highlighted the features added for the 1.0 release, including: stateful processing, high-level APIs, and a flexible deployment model. By […]
ApacheCon 2019 Keynote: Google Cloud Enhances Big-Data Processing with Kubernetes
At ApacheCon North America, Christopher Crosbie gave a keynote talk titled “Yet Another Resource Negotiator for Big Data? How Google Cloud is Enhancing Data Lake Processing with Kubernetes.” He highlighted Google’s efforts to make Apache big-data software “cloud native” by developing open-source Kubernetes Operators to provide control planes for running Apache software in a Kubernetes […]
Google Introduces Cloud Storage Connector for Hadoop Big Data Workloads
In a recent blog post, Google announced a new Cloud Storage connector for Hadoop. This new capability allows organizations to substitute their traditional HDFS with Google Cloud Storage. Columnar file formats such as Parquet and ORC may realize increased throughput and customers will benefit from Cloud Storage directory isolation, lower latency, increased parallelization and intelligent […]
Article: Data Analytics in the World of Agility
Is it all about customer-centric business, or is there any data left? Can we integrate data analytics and customer empathy? This article explores how we can move towards a more customer-centric business and what information we require in order to understand the most valuable thing we have: our customer. By Almudena Rodriguez Pardo
An Introduction to Structured Data at Etsy
Etsy recently published a blog post detailing how they store and manage structured data. Etsy is a marketplace allowing sellers to post one-of-a-kind items. Their landing page slogan “If it’s handcrafted, vintage, custom or unique, it’s on Etsy.” reveals that uniqueness is a selling point for Etsy. As long as an item falls within the […]