Cloudera Blog: an RSS news feed containing the latest blog entries from the Cloudera blog (https://www.cloudera.com/blog.html)

Streamlining Critical Business Capabilities in Financial Services with Cloudera
https://www.cloudera.com/blog/business/streamlining-critical-business-capabilities-in-financial-services-with-cloudera
Fri, 13 Jun 2025 11:00:00 UTC

We’re living in an era of unprecedented transformation in financial services. Powerful technology and business disruptions are impacting the market, including generative AI (GenAI), cloud computing, an evolving regulatory environment, and financial product innovations, such as digital assets and payment models. In response, financial services organizations are accelerating their efforts to digitize their operations and deliver a consistent, data-driven, digital-first customer experience across channels.

However, as firms attempt to capitalize on technological innovations and maximize value from these investments, they’re running into several challenges, including: 

  • High cloud costs for compute-intensive tasks, such as AI/ML training and data engineering

  • Incomplete insights due to data and technology silos, which negatively impact decision-making 

  • Lack of real-time responses due to data duplication across systems, which increases latency in data pipelines

  • Lack of a comprehensive, fine-grained security and governance model from existing data and analytics tools

Why Financial Services Firms are Partnering with Cloudera

As the only true hybrid platform for data, analytics, and AI, Cloudera is uniquely positioned to help financial services firms overcome these challenges, successfully progress their digital transformation initiatives, and embrace a modern data architecture. 

Some of our key differentiators include:

  • Multi-function analytics for building solutions across the data lifecycle, including real-time and batch data movement, AI/ML, GenAI model contextualization and deployment, data engineering, and data warehousing for compute-intensive workloads

  • Vendor-agnostic deployment model, supporting cloud and on-premises environments across vendors and regions

  • Integrated security and governance, offering a consistent, fine-grained access model across data services and deployment models to meet the most demanding and complex security requirements

Additionally, from our work with over 450 financial services institutions across regions, we’ve identified several critical business capabilities where Cloudera provides exceptional business value. Let’s look at a few examples: 

Regulatory Compliance

While more financial data provides more granular insights into risk, managing increasing volumes of data can also make it more difficult to maintain regulatory compliance. This is especially true for banks that are struggling with the multiple data silos so often found in traditional architectures. Inflexible architectures also make it extremely difficult to address new or evolving regulatory requirements, such as DORA, the EU’s Digital Operational Resilience Act, a framework designed to strengthen the operational resilience of financial institutions against the cyberattacks specifically targeting this industry. 

Cloudera can serve as the backbone of a hybrid modern data architecture by enabling organizations to extend their on-premises analytical footprint to the cloud and use transient compute resources for end-of-month or end-of-quarter tasks, such as regulatory reporting. In addition, it can streamline complex data management tasks, such as auditing historical data and modeling market scenarios, through the time-travel capabilities of Apache Iceberg.
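To make the time-travel point concrete, here is a minimal PySpark sketch of reading an Iceberg table as it existed at an earlier point in time, which is how a historical regulatory report could be reproduced exactly. It assumes an Iceberg catalog is already configured for the Spark session; the table name and timestamp are hypothetical, and other Iceberg-aware engines offer equivalent time-travel syntax.

```python
# Minimal sketch: querying an Iceberg table as of an earlier snapshot with PySpark.
# Table name and timestamp are hypothetical; Iceberg also supports snapshot-id lookups.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-time-travel").getOrCreate()

# Read the table as it existed at the end of the prior quarter (milliseconds since epoch).
positions_q1 = (
    spark.read
    .format("iceberg")
    .option("as-of-timestamp", "1711929599000")  # 2024-03-31 23:59:59 UTC
    .load("risk_db.trading_positions")           # hypothetical table
)

# Reproduce a point-in-time regulatory summary from that historical view.
positions_q1.groupBy("asset_class").sum("notional_value").show()
```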

Financial Risk Management

Financial risk management—whether evaluating market, credit, or liquidity risk—is at the heart of a bank’s operations. As a result, banks need to continuously evolve existing risk management strategies and reduce the time to complete risk-related analytics processes (for example, stress testing). 

Cloudera AI streamlines the training and deployment lifecycle of data science models to deliver innovative AI/ML models for risk management. In addition, Apache Iceberg simplifies the process of integrating new risk attributes into existing models by optimizing foundational data management tasks, such as schema and partition evolution. 
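As a hedged illustration of that point, the sketch below adds a new risk attribute and evolves the partition layout of an Iceberg table using Spark SQL. Because Iceberg treats both operations as metadata changes, existing data files are not rewritten. The table and column names are hypothetical, and the partition statement assumes the Iceberg Spark SQL extensions are enabled.

```python
# Minimal sketch: Iceberg schema and partition evolution via Spark SQL.
# Table and column names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-schema-evolution").getOrCreate()

# Add a column for a newly tracked risk attribute (metadata-only change).
spark.sql("ALTER TABLE risk_db.counterparty_exposure ADD COLUMN liquidity_score DOUBLE")

# Evolve the partition layout; existing files keep the old spec, new writes use the new one.
spark.sql("ALTER TABLE risk_db.counterparty_exposure ADD PARTITION FIELD days(valuation_date)")
```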

Fraud Prevention

Cyberattacks and fraud attempts are becoming increasingly sophisticated as cybercriminals take advantage of new technologies, most notably AI. To combat these threats, banks also need to leverage these new technologies, yet traditional, inflexible architectures can make it difficult to implement new solutions quickly and seamlessly. 

Cloudera addresses multifaceted cyberthreats by offering real-time data processing capabilities using Cloudera Data Flow and Cloudera Streaming, enabling prompt detection and response. In addition, it offers a comprehensive AI deployment service to prevent high-volume, real-time fraud attempts by optimizing the underlying NVIDIA GPUs through NIM microservices. By leveraging Cloudera to train models on all of a firm’s data, financial services companies can produce more accurate models that result in fewer false positives, reducing friction for customers while keeping their assets safe. 
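Cloudera’s streaming services are built around Apache Kafka, so at its core a real-time fraud pipeline consumes a transaction topic and scores each event as it arrives. The sketch below is a generic, hedged illustration of that pattern rather than a Cloudera-specific API; the broker address, topic name, and threshold-based scoring function are placeholders for a properly trained model.

```python
# Minimal sketch of a real-time fraud-scoring consumer over a Kafka transaction topic.
# Broker, topic, and scoring logic are hypothetical placeholders for illustration only.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "broker:9092",        # hypothetical broker address
    "group.id": "fraud-scoring",
    "auto.offset.reset": "latest",
})
consumer.subscribe(["card_transactions"])      # hypothetical topic

def score(txn: dict) -> float:
    """Placeholder for a trained fraud model served elsewhere."""
    return 0.99 if txn.get("amount", 0) > 10_000 else 0.01

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    txn = json.loads(msg.value())
    if score(txn) > 0.9:
        print(f"ALERT: possible fraud on account {txn.get('account_id')}")
```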

Next Steps: Innovation in Financial Services

If you want to learn more about how Cloudera is accelerating innovation in financial services, check out our whitepaper. It includes several customer success stories across different financial services verticals and regions. Or, see for yourself how Cloudera can benefit your organization with our 5-day free trial.

 

Cloudera Supercharges Your Private AI with Cloudera AI Inference, AI-Q NVIDIA Blueprint, and NVIDIA NIM
https://www.cloudera.com/blog/partners/cloudera-supercharges-your-private-ai-with-cloudera-ai-inference-nvidia-ai-q-and-nvidia-nim
Wed, 11 Jun 2025 14:00:00 UTC

As we speak with our customers about their goals for AI, a common pain point we hear is that their plans and implementations are sometimes stalled due to concerns about privacy. They want to use AI on all of their corporate data since that is the way their employees and customers will get the most accurate results and answers, but they realize they can’t send their data out to a public endpoint for a closed-source large language model (LLM) because (1) there is too much data, and (2) their data would no longer be private.

To address these concerns, Cloudera has begun espousing the concept of Private AI, which would allow these customers to get all of the benefits that AI brings and keep their proprietary data safe and secure.

NVIDIA is seeing the same challenge, but at a much higher and broader level: nation states. Governments are realizing that it isn’t in the best interests of their nations to run AI in another country, so they’re working to build out the infrastructure that they need to keep their data and their AI within their own borders. They can then control what other countries or entities they share their data or AI results with.

At the GTC Paris conference today, NVIDIA provided the building blocks for Sovereign AI to support governments in their efforts. This initiative aligns well with Cloudera’s focus on enabling our customers to implement their own Private AI platforms. 

NVIDIA made two other announcements that are of particular interest to Cloudera. In this blog, we’ll dive into the AI-Q NVIDIA Blueprint for Enterprise Research and NVIDIA NIM, and what they mean for our customers.

AI-Q NVIDIA Blueprint with Cloudera AI

NVIDIA’s introduction of the AI-Q blueprint for enterprise research provides Cloudera AI with more capabilities for supporting our customers’ complex agentic AI needs. 

Cloudera AI Inference can host all of the NVIDIA NeMo Retriever and LLM inference microservices that make up the AI-Q NVIDIA Blueprint, including NVIDIA Llama Nemotron reasoning models. Combining the strong privacy and security provided by the Cloudera AI platform for the model endpoints with the powerful NVIDIA Agent Intelligence toolkit, you can take your enterprise agentic applications to the next level.

Benefits of Using AI-Q NVIDIA Blueprint with Cloudera AI
Leveraging the AI-Q NVIDIA Blueprint within the Cloudera AI Inference service unlocks massive AI potential. This powerful combination integrates leading reasoning models packaged as NVIDIA NIM and NeMo Retriever microservices onto Cloudera AI, and it ensures seamless connectivity between agents, tools, and data through full compatibility with the NVIDIA Agent Intelligence toolkit. 

This multi-framework capability empowers organizations to build sophisticated enterprise retrieval-augmented generation (RAG) applications with robust privacy and security, taking full advantage of state-of-the-art AI advancements.

NVIDIA NIM microservice with Cloudera AI Inference

NVIDIA's NIM container is a game-changer for getting the best performance from LLMs quickly and easily: it significantly speeds up LLM deployment and inference by automatically selecting the best inference backend based on the model and GPU hardware, enabling a model-agnostic inference solution that streamlines the production serving of numerous cutting-edge LLMs. 

Digging deeper, the NVIDIA NIM microservice enables users to quickly deploy LLMs accelerated by NVIDIA TensorRT-LLM, vLLM, or SGLang for top-tier inference on any NVIDIA accelerated platform. It supports models stored in Hugging Face or TensorRT-LLM formats, enabling enterprise-grade inference for a vast array of LLMs. Users can rely on smart defaults for optimized latency and throughput or fine-tune performance with simple configuration options. As part of NVIDIA AI Enterprise, the NVIDIA NIM microservice receives continuous updates from NVIDIA, ensuring compatibility with a wide range of popular LLMs.
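In practice, NIM LLM microservices expose an OpenAI-compatible API, so an application can call a deployed model with the standard OpenAI client. The sketch below is a minimal illustration under that assumption; the base URL, API key handling, and model identifier are hypothetical placeholders to be replaced with the endpoint details surfaced by your own deployment.

```python
# Minimal sketch: calling a NIM LLM endpoint through its OpenAI-compatible API.
# base_url, api_key, and model are hypothetical placeholders for your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://nim.example.internal/v1",  # hypothetical NIM endpoint
    api_key="YOUR_API_KEY",                      # credential issued by your platform
)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",          # example NIM model identifier
    messages=[{"role": "user", "content": "Summarize our Q2 risk report in 3 bullets."}],
    temperature=0.2,
)
print(response.choices[0].message.content)
```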

Benefits of Using the NVIDIA NIM within Cloudera AI Inference
NVIDIA's NIM provides our customers more flexibility in how they can make use of LLMs in their AI applications. Cloudera AI Inference service already has NVIDIA NIM embedded in it, so customers can implement the NVIDIA NIM microservice quickly and easily. Customers get the benefits of NVIDIA NIM with the ease of use, security, and streamlined support of a single, unified platform: Cloudera.

Through its seamless integration into our AI Inference service, NVIDIA NIM microservice offers significant advantages for Cloudera AI customers, including:

  • Accelerated deployment: Get your LLM applications up and running faster with pre-built, optimized containers.

  • Enhanced performance: Leverage the full potential of NVIDIA accelerated computing for high-speed inference and reduced latency.

  • Scalability: Easily scale your LLM deployments to meet the demands of your growing business.

  • Simplified management: Manage and monitor your LLM deployments with Cloudera's intuitive interface.

Conclusion

Together, Cloudera and NVIDIA empower businesses to leverage the latest advancements in AI easily, efficiently, and cost-effectively on all of their data, whether public or private. By simplifying the AI application lifecycle, from development to deployment, and by optimizing performance, we're helping our users unlock the full potential of AI.

Be sure to check out NVIDIA’s blog about announcements out of GTC Paris and Cloudera’s blogs on AI, especially the most recent one about “AI in a Box,” powered by Dell, NVIDIA, and Cloudera, which gives customers a new way to implement Private AI quickly, easily, and with minimal risk. 

 

Bringing Context to GenAI with Cloudera MCP Servers
https://www.cloudera.com/blog/technical/bringing-context-to-genai-with-cloudera-mpc-servers
Thu, 05 Jun 2025 13:00:00 UTC

Figure 1: Two scenarios of AI agents accessing data for AI context:

  • Left: Without a common protocol, AI agents must handle multiple unique APIs to access context from each source.
  • Right: MCP unifies access, enabling agents to retrieve context through a single interface, simplifying integration and improving scalability. 

Agentic Architectures Need a Standard Integration Layer

As organizations rush to adopt agentic architectures, a consistent integration layer is more important than ever. 

“The frenzy around adopting agentic architectures is driving organizations to launch multiple initiatives in parallel. While this momentum is encouraging, it also risks creating the modern equivalent of spaghetti code—something we’ve seen before in the early days of software engineering. What companies truly need is a simplified, standards-based architecture that ensures interoperability across the diverse systems participating in the agentic ecosystem. Anthropic’s MCP is emerging as a promising standard in this space, already seeing broad adoption from AI vendors.”
- Sanjeev Mohan, Principal at SanjMo and former Gartner analyst

MCP isn’t a proprietary Cloudera tool—it’s a widely adopted standard that avoids vendor lock-in while tapping into a growing ecosystem of tools. Cloudera’s approach to MCP Servers aligns with the MCP philosophy of openness, simplicity, and control. Cloudera MCP Servers run natively within Cloudera’s unified platform, eliminating risky data movement and enabling seamless deployment across both multi-cloud and on-premises environments.

A Tenet of Private AI: Bringing AI Compute to Data

AI’s transformative power relies on the quality of the data that fuels it. When data and AI systems operate in isolation, disconnected information delays insights, creates fragile pipelines, and leaves models without the necessary context for accurate decisions. 

Cloudera brings data and AI together in a cohesive lifecycle. Data flows smoothly into AI workflows, governed by shared metadata, security policies, and optimized compute resources. This approach eliminates costly data duplication and movement while making every prediction traceable to its origin—ensuring transparency, trust, and compliance.

Take the Next Step

Ready to eliminate integration friction? Explore Cloudera MCP Server for Apache Iceberg here—currently available in preview—and discover how you can empower your AI applications with the context they need, right where your data lives. To put this into action today, try our FREE 5-day trial.

Three years ago, Cloudera customers began exploring generative AI to transform data interactions—building intelligent assistants, summarizing complex documents, and generating insights on demand. And today, our customers manage more than 25 exabytes (that’s 25 billion gigabytes!) of enterprise data across on-premises and cloud environments.

The Context Gap in Enterprise AI

How organizations manage their data is key: in the age of AI, context isn’t just helpful—it’s the difference between accurate decisions and hallucinations. AI models need seamless access to proprietary data to generate insights, answer questions, or automate workflows. Yet, in most organizations, this data remains fragmented across siloed object stores, Iceberg tables, Kafka streams, and operational databases. Developers waste valuable time writing custom connectors and maintaining fragile pipelines—a tax on innovation that slows time to value.

Introducing Cloudera MCP Servers: A Universal Gateway to Your Data

That’s where Cloudera’s Model Context Protocol (MCP) Servers come in. Our servers are built on MCP and provide a universal gateway to governed enterprise data. MCP is an open standard that aims to standardize AI integration in the same way that Microsoft Open Database Connectivity (ODBC) standardized access to relational databases (more on MCP in the next section).

To support this mission, we’re launching with Cloudera MCP Server for Apache Iceberg via Impala. Apache Iceberg is the backbone of modern lakehouses, offering petabyte-scale management, ACID compliance, time travel, and granular governance. It’s the perfect starting point for bridging the gap between data and AI.

By starting with Apache Iceberg, we address a critical challenge: AI applications need real-time, governed access to analytical data without additional custom code. Our MCP Server enables developers to query Iceberg tables in natural language, integrate seamlessly with frameworks—like CrewAI, Microsoft AutoGen, LangChain or LangGraph, LlamaIndex, and agentic AI toolkits that work with these frameworks, like NVIDIA Agent Intelligence (AIQ) toolkit—while maintaining robust security with Cloudera SDX policies. And this is just the beginning: future Cloudera MCP Servers will extend support to streaming data, operational databases, and file/object stores.
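To show what that integration looks like from an agent’s point of view, here is a minimal sketch using the open-source MCP Python SDK: the client launches an MCP server, discovers its tools, and calls one of them. The server command, tool name, and query are hypothetical placeholders, not the published interface of Cloudera’s MCP Server for Apache Iceberg.

```python
# Minimal sketch: an MCP client discovering and calling tools on an MCP server.
# Server command, tool name, and arguments are hypothetical placeholders.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the MCP server as a local subprocess (command is hypothetical).
    server = StdioServerParameters(command="iceberg-mcp-server", args=["--profile", "dev"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()            # discover what the server exposes
            print([t.name for t in tools.tools])
            result = await session.call_tool(             # tool name/args are illustrative
                "run_query",
                arguments={"sql": "SELECT region, SUM(sales) FROM demo.orders GROUP BY region"},
            )
            print(result.content)

asyncio.run(main())
```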

Beyond the Textbook: Peter Norvig on the Future of AI Literacy
https://www.cloudera.com/blog/business/beyond-the-textbook-peter-norvig-on-the-future-of-ai-literacy
Wed, 04 Jun 2025 16:00:00 UTC

In a world where AI is rapidly reshaping everything from how we work to how we live, one truth stands out: education is the cornerstone of both innovation and safety.  

On The AI Forecast, host Paul Muller sat down with Peter Norvig, former Director of Research at Google, Stanford Education Fellow, and co-author of the world’s most-used AI textbook—Artificial Intelligence: A Modern Approach. Their discussion spanned decades of AI progress, the deep transformation of education, and what the next evolution of data might look like. 

Here are some takeaways from Paul and Peter’s conversation. 

AI education must go beyond a static textbook, according to the author of the go-to AI education textbook.

Paul: How would you describe the state of AI education today? 

Peter: It’s overwhelming. There’s just too much going on, too fast. When I started, you could keep up with every new development. AI papers came out slowly, and textbooks were relevant for years. Now, there are a dozen breakthrough papers published every week. 

I don’t think the “one textbook” model works anymore. What we need is something interactive and personalized. And we need to shift from the idea that you go to college for four years, get a degree, and never have to learn again. That’s not how the world works. AI education should be a lifelong, continually evolving experience. 

Paul: What’s your view on how AI is impacting education? 

Peter: AI is changing both what we teach and how we teach. Tools like ChatGPT or GitHub Copilot can be great accelerators, but only if you already know what you’re doing. They can lead you astray if you don’t. 

The real value of learning to program isn’t memorizing syntax – it’s developing judgment. It’s about knowing how to scope a problem, debug it, recover from failure, and think critically. That’s the mindset we should teach, whether people are coding or using AI tools. The goal is not to produce code. It’s to produce understanding. 

AI is all about solving uncertainty.

Paul: What does AI mean to you? 

Peter: In our book, we define it as making better decisions to accomplish your goals. But that’s also what economists and software engineers try to do. The real difference lies in the kind of problems AI takes on. 

Traditional software is about complexity: writing exact rules to solve exact problems. But AI deals with uncertainty. You're asked to classify an image, decide what a sentence means, or predict how someone might vote. There’s often no ground truth. You're trying to make the best decision based on incomplete, noisy, or ambiguous data. That’s where AI lives. 

Can AI give us the most unbiased view of our world?

Paul: What are you most excited about when it comes to the future of AI? 

Peter: I’m excited about new ways of gathering data, especially video. We’ve made huge strides in training on text because it’s compact. Image models are catching up. But video? That’s still untapped. 

The challenge is scale. Training on all of YouTube isn’t economically feasible today. But give it a few more generations of processing power, and we’ll get there. What excites me is that the video medium captures action. It gives a more accurate view of the world. And, unlike text or photos, it’s less biased. 

Everything written is something someone thought was important. Every photo was taken deliberately. But some videos are just a camera running 24/7. That’s about as close as we get to an unbiased record of physical reality. Combine that with what we already know, and we start to connect the physics of the world with the psychology of the world. 

We’ve improved processing power by a factor of 1,000 at least three times in my lifetime. We just need to do it one more time. 


Catch the full conversation with Peter Norvig on The AI Forecast on Apple Podcasts, Spotify, and YouTube.

Cloudera's AI Studios: Making Advanced AI Accessible to All
https://www.cloudera.com/blog/business/clouderas-ai-studios-making-advanced-ai-accessible-to-all
Fri, 30 May 2025 13:00:00 UTC

RAG Studio

The RAG Studio enables rapid development and deployment of RAG applications through a no-code interface. By integrating external knowledge sources with large language models (LLMs), users can create a more informed, context-aware AI application that excels at document search and question-answering tasks involving real-time, dynamic data.

Synthetic Data Generation Studio

The Synthetic Data Generation Studio provides users with powerful tools to create synthetic datasets for fine-tuning, model training or alignment, and evaluation. This studio offers a scalable and compliant alternative when real-world data is scarce or sensitive. By producing data that mirrors real-world patterns, the studio enables organizations to improve AI model and application robustness while ensuring compliance with regulations like CCPA in the US and GDPR in the EU.

The demand for AI-driven applications is surging, and enterprises have reached an inflection point where they can no longer afford fragmented, siloed development. 

Traditionally, AI development is done by data scientists or machine learning experts with deep expertise in multiple tools and frameworks. But now a new class of builders has emerged—developers, not experts—who are enthusiastic about using the power of generative AI (GenAI) to solve their real-world use cases but who often lack specialized AI skills. Enterprises need solutions to simplify the development complexity for these GenAI builders, giving them an easier, faster path to production while maintaining enterprise-grade security, governance, and scalability.

Additionally, conventional enterprise software upgrade cycles are too slow to match the pace of AI innovation. This delay puts organizations at risk of building solutions that are outdated before they are even deployed. Enterprises need adaptive, modular solutions that evolve in-step with the AI landscape, ensuring that their solutions remain cutting-edge.

Cloudera AI Studios can help solve these challenges: by providing modular, no-code tools with high-code extensibility, these studios accelerate AI adoption by guiding developers through various stages of the AI application lifecycle. AI Studios are designed to not only streamline development but also equip a broader range of users with knowledge about the underlying technologies—empowering organizations to solve meaningful business challenges with GenAI.


Democratizing AI Innovation: The Strategic Vision and Design Behind Cloudera AI Studios

Delivering real enterprise value with GenAI demands mastering distinct stages across the complete AI application lifecycle. We deliberately architected AI Studios to map directly to these critical stages, democratizing the entire process with intuitive, low-code tools accessible to all users—regardless of technical expertise.

By seamlessly guiding users through each development stage, our approach eliminates traditional barriers and dramatically accelerates time-to-value. Our comprehensive ecosystem addresses critical challenges across the GenAI lifecycle:

  • Synthetic Data Studio reimagines data availability by generating enterprise-grade synthetic datasets that solve compliance and data scarcity challenges.

  • Retrieval-Augmented Generation (RAG) Studio transforms model intelligence by seamlessly connecting foundation models with organizational knowledge—delivering contextually aware AI.

  • Fine Tuning Studio redefines model specialization through frictionless adaptation workflows that align generic models with specific domain expertise.

  • Agent Studio pioneers the next frontier of business transformation through sophisticated agentic applications that deliver measurable value across the enterprise.

Fine Tuning Studio

The Fine Tuning Studio serves as a one-stop shop for customizing foundation models to meet specific business needs through increased model accuracy and domain relevance. Without it, fine-tuning would require writing extensive code and managing complex workflows. Instead, users can train, compare, and evaluate adapters against base models through a single interface. With built-in support for supervised fine-tuning (SFT), MLflow-based evaluation, and native integration with Cloudera AI Workbench and Inference, Fine Tuning Studio simplifies and accelerates the entire model adaptation process.

Customizable Across Expertise Levels

The artificial boundary between technical and non-technical users has historically limited innovation. Our architectural approach breaks down this divide and differentiates AI Studios from other low-code solutions by offering seamless switching between visual interfaces and full code environments—customizable to each user’s expertise and needs. These deliberate "escape hatches" prevent lock-in to proprietary black-box solutions with limited functionality, giving executive decision-makers confidence to invest in low-code solutions for serious AI development. 

Our architecture builds on the strong foundation of the Cloudera AI Workbench—an established, enterprise-grade, self-service data science product with developer-friendly features like interactive notebooks, models, jobs, and applications. By accessing AI Studios within the AI Workbench, developers can begin with intuitive visual interfaces and transition to custom coding environments as greater control or expertise is required. 

We designed AI Studios this way because of our strong belief that technical growth should be encouraged rather than constrained. Business developers with domain expertise can use the visual interfaces and collaborate effectively in the same environment with data experts using the coding interfaces—accelerating AI adoption across the enterprise.

Interoperable Functionality Through AI Development Stages

Each of the AI Studios is designed to be fully functional in a self-contained manner, yet can seamlessly interoperate with other studios through sharing resultant artifacts in the same project environment. This extends beyond just studio-to-studio integration—it encompasses the entire Cloudera AI platform. The studios integrate directly with the underlying AI Workbench and AI Inference services, creating an end-to-end system with consistent data and model governance throughout.

For example, Synthetic Data Studio can generate domain-specific training datasets that Fine Tuning Studio can then use to adapt a foundation model for agentic tasks. This specialized model can then be served by the Cloudera AI platform to power agentic applications orchestrated in Agent Studio, with contextual knowledge enhancement through RAG Studio. This deliberate multi-level interoperability enables organizations to build comprehensive AI solutions while still allowing users to have full flexibility in selecting which stages of the GenAI lifecycle they want assistance with and which they want to handle independently. 

Accelerated Integration of Open-Source Innovation

We built each of the AI Studios as independent components capable of rapid release cycles aligned with the AI community's pace of innovation. This modular architecture allows AI Studios to leverage state-of-the-art open-source frameworks and swap underlying libraries without disrupting core functionality.

This reflects our belief that no single organization will drive AI innovation in isolation, and embraces open-source innovation as a way of contributing to a broader ecosystem of shared advancement.

 

Introducing Cloudera AI Studios

AI Studios provide a purpose-built experience targeted to each of the critical stages of the Generative AI lifecycle. The studios are Synthetic Data Studio, Fine Tuning Studio, RAG Studio, and Agent Studio.

 

The Next Chapter in Enterprise AI

As GenAI becomes a cornerstone of enterprise innovation, AI Studios represent a paradigm shift: bringing the power of AI to a broader set of users while maintaining the robustness and security that organizations demand. AI Studios are now available in the Cloudera AI Workbench, and together with the Cloudera AI Inference service, power Cloudera’s AI platform for the enterprise.

The future of AI is about more than advanced algorithms: it’s about making AI more accessible, interoperable, and impactful across the enterprise. Empower your workforce to begin building enterprise-grade AI applications by starting with Cloudera’s 5-day free trial.

Agent Studio

Agent Studio empowers enterprises to build, test, and deploy AI agents that combine the reasoning capabilities of LLMs with the operational strength of traditional software. Through native integration with the Cloudera platform, Agent Studio uniquely exposes the full suite of Cloudera’s enterprise-grade services—Cloudera Data Flow, Cloudera Data Warehouse, Cloudera Data Visualization, and more—as composable, callable agents. This foundation, when combined with open-source agents and frameworks, enables sophisticated multi-agent orchestration that seamlessly coordinates operations across diverse data environments, from structured and unstructured data to real-time streams.

Figure 1: Cloudera AI: Enabling every stage of the Generative AI lifecycle
Figure 2: All four AI Studios are interoperable within a single Cloudera AI project
Figure 3: Synthetic Data Studio within a Cloudera AI Workbench project
Figure 4: Fine Tuning Studio within a Cloudera AI Workbench project
Figure 5: RAG Studio within a Cloudera AI Workbench project
Figure 6: Agent Studio within a Cloudera AI Workbench project

The core design philosophies of AI Studios are:

  • Customizable across expertise levels

  • Interoperable functionality through AI development stages

  • Accelerated integration of open-source innovation

#ClouderaLife Employee Spotlight: Meet Orla McCarthy, Cloudera’s Vice President of Professional Services, EMEA
https://www.cloudera.com/blog/culture/clouderalife-employee-spotlight-meet-orla-mccarthy-cloudera-s-vice-president-of-professional-services-emea
Wed, 28 May 2025 13:00:00 UTC

Working together across 31 countries means that teamwork and collaboration are at the heart of everything we do at Cloudera.  

The unique corporate culture that makes Cloudera what it is today is due, in part, to leaders like Orla McCarthy – a long-time Clouderan who does what she can to foster development, support growth and encourage opportunities for those she works with, just as she experienced in her own journey.  

Her professional background includes years of experience in professional services working with teams around the globe. Most recently, she was with Hortonworks – a company that merged with Cloudera in 2019 – marking a pivotal moment that brought Orla and many of her talented colleagues into the Cloudera family. Today, she’s Cloudera’s Vice President of Professional Services for the EMEA region and has valuable insights to share about the path she took to get here. 

Let’s meet Orla McCarthy and learn more about what it’s like growing a career with Cloudera. 

Tell us about your career journey at Cloudera. How has the organization supported your growth and development along the way?  

When I joined the company nine years ago, I had no idea what kind of rollercoaster I was about to get on. At the time, there wasn’t even an office in Cork; now it’s grown into Cloudera’s European hub.  

I started in operations but soon moved over to professional services, not knowing much about the field.  Over the eight years since, I’ve had the chance to work with amazing teams across Europe, each role stretching me, challenging me, and giving me new perspective.  

Now, I am the Vice President of Professional Services for EMEA and lead an exceptionally talented and lovely team. Together, we help our clients unlock the full value of their data by providing expert services that ensure success throughout their digital transformation journeys. 

What got me here was a willingness to be open to change and say yes to opportunities – even if I wasn’t sure I was ready for them. What’s kept me growing: A mix of great mentors, a team that brings out the best in me, and a culture that gives you the space to take risks, make mistakes, learn from them, and keep moving forward. That pattern has continued throughout my journey from leading regional teams to ultimately stepping into the VP role. 

Who has inspired you in your career and why?  

My former boss and mentor at Hortonworks. Early on, he recognized something in me and decided to bring me into the professional services team. This opportunity was a game-changer, marking the beginning of my journey.

His leadership style emphasized accountability, a relentless focus on customer success, and teamwork, and I still carry some of his lessons with me.

How would you describe the professional development, working environment, and culture at Cloudera?  

Cloudera’s culture has always been one of support, inclusivity and growth. It’s a place where you’re encouraged to take on new challenges.  

There’s a big focus on development, whether it’s through the mentorship program, internal stretch opportunities, or just having leaders who want to see you succeed. There is also a strong emphasis on wellbeing, with things like unplug days and dedicated give-back days to support our communities.  

One of the defining characteristics of Cloudera culture is that success here isn’t just about hitting targets. Don’t get me wrong, they matter, but it’s about the bigger picture: the impact we’re having on our customers, how the team is growing, and how we’re pushing the business forward in EMEA. 

When I see my team delivering great results for our customers and developing both personally and professionally, that’s true success. 

What excites you most about your work in professional services at Cloudera?  

One of the most rewarding parts of working in professional services at Cloudera is seeing how our technology and my team can make a difference in addressing global challenges. It’s wonderful to be part of projects that not only push the boundaries of innovation, but also make a positive impact in communities worldwide. 

A great example is our recent collaboration with Mercy Corps, a nonprofit supporting communities affected by crisis, disaster, poverty, and climate change. Through Tech To The Rescue’s AI program, we built an AI-powered tool designed to provide field teams with quick, data-driven insights. The AI tool summarizes, references, and recommends relevant research, best practices, and crisis response strategies, enabling faster, more informed decision-making in critical and time-sensitive situations. 

I’ve learned that success truly takes a village. It’s about showing up, listening, adapting, and moving forward together, and projects like this are a real reminder of why we do what we do, and how technology, when used thoughtfully, can be a powerful force for good. 

How do you like to spend your free time?  

I love spending my free time exploring new places, learning about different cultures and meeting interesting people. I also prioritize quality time with my family; it’s important to me to stay connected with them, especially as we live far away from each other. 

Closing Thoughts 

Orla McCarthy’s journey offers an inspiring example of what’s possible at Cloudera. She’s proof that if you’re willing to collaborate, ask questions, and say “yes” to new challenges, Cloudera is the perfect place to find meaningful camaraderie and new opportunities, and to grow your career. 

Want to keep reading? Click here to learn more about Cloudera Professional Services, or click here to meet another inspiring Clouderan. 

Eight Clouderans Earn CRN Women of the Channel Distinction
https://www.cloudera.com/blog/partners/eight-clouderans-earn-crn-women-of-the-channel-distinction
Thu, 22 May 2025 13:00:00 UTC

The ability to solve today's complex business challenges around hybrid cloud, AI, and data analytics hinges on a thriving partner ecosystem. Cloudera’s Partner Organization is the driving force behind this collaboration, relentlessly focused on delivering innovation and accelerating customer success at scale.

We are thrilled to share that the dedication and expertise of eight remarkable women within Cloudera have been recognized by CRN in their 2025 Women of the Channel (WOTC) list. This award celebrates leaders with exceptional expertise and vision who are driving innovation and shaping the future of the IT channel.

Our SVP of Global Alliances, Michelle Hoover, was also elevated to the prestigious Power 100 list – an elite group of WOTC honorees whose contributions over the past year have had a transformative impact on the channel.

Let’s celebrate Cloudera’s own Women of the Channel and learn a bit more about the winners.  

Michelle Hoover, SVP, Global Alliances, Channels – Cloudera’s Power 100 honoree made the list due to her leadership through transformative initiatives over the past year. Michelle’s efforts have accelerated enterprise AI deployments and advanced Cloudera's AI and cloud ecosystem through key integrations. 

Michelle leads Cloudera’s Global Alliances & Channels team, bringing with her more than 20 years of partner experience. Her deep partnership expertise and strong relationships within the data and AI ecosystem make her an invaluable leader in shaping Cloudera's alliances and partner channel strategies. Her leadership style embodies Cloudera’s values by prioritizing collaboration with stakeholders and team members alike, consistently making time to ensure alignment and shared understanding on every project. She has played a major role in advancing Cloudera’s AI Ecosystem—a group of technology providers that helps enterprises maximize the value of their AI initiatives more easily, cost-effectively, and securely.

Michelle believes a good leader works on the front lines, ensuring that each team member is utilized to their full potential through hands-on partner and customer engagement. This focus fosters strong collaboration and a united front with sales, which is critical to a thriving Cloudera partner ecosystem. 

Natascha Lee, Head of Global Partner & Alliance Marketing – A six-time winner of the Women of the Channel awards, Natascha brings over 20 years of dynamic and impactful channel marketing experience, and serves as Head of Cloudera’s Global Partner Marketing team.  She leads innovative programs spanning a vast ecosystem of technology partners, driving audacious pipeline goals, deepening partner engagement, and significantly increasing their bottom-line impact. 

Her leadership expertly combines the art and science of marketing to ignite partners and customers through innovative messaging, segmentation, targeting, and measurement, consistently generating demand, increasing market share, and exceeding aggressive revenue goals.

Valaretha Brown, Senior Partner Marketing Manager – Valaretha brings innovative, global programs to Cloudera while also strengthening customer, channel, and team relationships to deliver business impacting growth and results. A four-time winner of the Women of the Channel awards, Valaretha has had a significant impact in developing strong partner relationships, establishing joint go-to-market programs and driving operational excellence to increase revenue opportunities. With over 15 years of B2B technology partner marketing experience, Valaretha is masterful at uncovering strategic initiatives that make an immediate impact while establishing herself as a trusted advisor with her marketing counterparts. She is dedicated to evangelizing partners and elevating Cloudera by developing impactful content for joint digital campaigns that deliver new leads and drive existing leads through the sales funnel. 

Lan Chu, Senior Partner Marketing Manager – At Cloudera, Lan has consistently built meaningful partner relationships and driven impactful results across the organization. Her diverse experience encompasses marketing, partnerships, and sales enablement, fueling her passion for bringing strategic ideas to life. Lan leverages this deep expertise in marketing strategy and demand generation, collaborating closely with Cloudera’s cross-functional teams, partners, and vendors to deliver channel-focused marketing programs that amplify sales and drive business success for the entire organization. 

Naomi Gravelding, Partner Marketing Manager – Since joining Cloudera in late 2021, Naomi has spearheaded a comprehensive partner marketing communications program. Through innovative strategies, improved communication channels, and streamlined processes, she has strengthened awareness of Cloudera's partner ecosystem and significantly expanded its reach with both internal teams and external partners. Her efforts have demonstrably increased the volume of partner-specific updates to internal teams and fostered overall partner communications and engagement, driving deeper collaboration and business impact while establishing clear guidelines for effective digital communications.

Janet O'Sullivan, Senior Partner Marketing Manager – With a far-reaching role spanning four continents, Janet has significantly impacted Cloudera's success by fostering substantial growth and cultivating a strong partner ecosystem that directly addresses crucial customer needs, generating significant ROI in the process. Her strategic approach has driven a significant increase in partner pipeline within her region, achieved through collaborative execution with partners and internal teams, from multi-partner events to focused account-based marketing (ABM) strategies. Notably, Janet has also been the architect of a highly successful 1:many Partner Days program, which has been instrumental in engaging a broader partner base and driving further pipeline growth within her regions. Additionally, she has been a key driver in effectively leveraging the partner marketing team's marketing development funds to further empower the partner ecosystem.

Jessica Espinoza, Senior Partner Marketing Manager –  Jessica is a seasoned marketing professional with over 20 years of experience in partner marketing, branding, and integrated strategies. She currently manages marketing relationships for Cloudera’s Cloud Alliances. Jessica is known for her cross-functional collaboration and ability to execute campaigns that align with business goals and scale globally. She has led multi-million-dollar co-marketing programs, produced events for 40,000+ attendees, and developed strategic content across digital, social, and internal platforms. Fluent in English and Spanish, she excels in partner relationship management, content development, and event production—blending creativity and operational precision to drive measurable results.

Chloe Gibel, Director of Partner Strategy and Programs – A seasoned leader in partner strategy, programs, and operations, Chloe spearheads global initiatives to streamline partner experiences and accelerate joint success. Chloe leads the Cloudera Partner Network, designing and implementing partner programs that enhance the partner experience and drive meaningful growth. With more than a decade of experience in partner marketing, demand generation, and channel programs, Chloe brings a unique blend of creativity and precision to building scalable ecosystems.  

These Women of the Channel winners embody Cloudera's culture of collaboration that empowers innovation and enables operational excellence in the complex partner landscape, facilitating the delivery of exceptional results to our partner ecosystem. 

Learn more about how Cloudera’s partner ecosystem can support your hybrid cloud journey. 

AI in a Box: Experience the Future of Private AI at Dell Technologies World with Cloudera, Dell Technologies and NVIDIA
https://www.cloudera.com/blog/business/ai-in-a-box--experience-the-future-of-private-ai-at-dell-technologies-world-part-1
Tue, 20 May 2025 19:00:00 UTC

The race to operationalize private AI at enterprise scale isn’t just about models and algorithms—it’s about infrastructure that refuses to compromise. Welcome to the inaugural post of AI in a Box, a three-part blog series that unpacks how Cloudera, Dell Technologies, and NVIDIA are redefining enterprise AI with a turnkey solution that unifies cutting-edge AI-optimized hardware, intelligent data orchestration and AIOps tooling, and zero-trust governance. No more duct-taping legacy systems or gambling on cloud-only “black boxes.”

Speed to Market with Seamless AI Development to Deployment

Accelerated time to value starts at the silicon layer. Dell PowerEdge servers, equipped with NVIDIA accelerated compute, NVIDIA RAPIDS, and NVIDIA NIM, provide the high-performance foundation required for today’s most demanding AI workloads—whether contextualizing billion-parameter models on proprietary data or delivering low-latency inference at scale.

Cloudera AI, built on this foundation and delivered as a fully managed service, eliminates operational complexity. It automatically provisions GPU clusters for tasks such as fine-tuning domain-specific LLMs or powering real-time RAG pipelines, then dynamically reallocates resources to maximize efficiency. 

The result? No more infrastructure sizing guesswork or compatibility issues—just a seamless private AI journey from development to deployment.

The speed advantage extends to pre-integrated AI blueprints. A financial institution rolls out fraud detection models in days, not months, using pre-optimized workflows that plug directly into existing transaction systems. A manufacturer activates predictive maintenance by training on sensor data stored in Dell’s high-performance storage, with Cloudera auto-scaling GPU resources as demand spikes. These aren’t generic templates but battle-tested pipelines refined across various verticals, all while enforcing zero-trust security and granular governance.

Comply with Emerging Regulations by Securing Sensitive Data Across the Private AI Lifecycle 

Security and compliance are engineered into AI in a Box, ensuring data protection and regulatory adherence without sacrificing agility. The solution employs hardware-enforced isolation (via NVIDIA MIG technology) and Cloudera’s unified governance to secure sensitive data across the AI lifecycle. Healthcare, finance, and government sectors benefit from HIPAA-compliant diagnostics, air-gapped underwriting models, and immutable audit trails to meet local and global compliance requirements.

For financial services, the stack accelerates AML/BSA compliance with AI-driven transaction monitoring, anomaly detection, and real-time reporting. It aligns with EU AI Act, SEC, and FCA regulations through continuous monitoring, bias mitigation tools, and explainable AI workflows. Data residency controls and zero-trust architecture address GDPR/CCPA mandates, while end-to-end audit trails ensure transparency for credit risk or fraud detection models.

Proactive threat detection, automated incident response, and adherence to Basel Committee-aligned frameworks assist in minimizing breaches and regulatory risks. AI in a Box enables institutions to scale AI confidently, allowing enterprises to transform regulatory complexity into a source of competitive advantage. Managed updates and patching enable teams to strike the right balance between innovation and production-scale operations.

Maximize Cost Efficiency with Scalable and Extensible AI

By colocating AI compute with on-premises data lakes on Dell’s scalable storage, the solution keeps petabytes local and dodges the latency penalties and egress fees of cloud-centric AI. NVIDIA’s latest GPUs slash training times by up to half compared to prior generations, while Cloudera’s policy-driven autoscaling ensures resources align perfectly with workload demands. 

The result? Optimized Private AI workload economics with a predictable cost base that transitions AI from experimental sandbox environments to a core value engine driving enterprise outcomes.

But the real game-changer is agility. Enterprises pivot on a dime: today’s customer churn model becomes tomorrow’s supply chain optimizer, all on the same infrastructure. The full-stack integrated AI software accelerates everything from hybrid data pipeline management all the way to serving up model endpoints, while Cloudera’s data lineage built into the Shared Data Experience (SDX) stack tracks each of these endpoints to its source—critical for audits in regulated industries. Dell’s infrastructure, always forward-compatible, ensures seamless adoption of next-gen chipsets without costly re-architecture, maintaining effective sustainability within the data center.

Experience the Future of Private AI

At its core, AI in a Box is silicon-to-inference synergy. Dell PowerEdge servers, armed with NVIDIA H100 GPUs and NVLink-switched topology, deliver FP8-precision performance for trillion-parameter training and real-time RAG pipelines. Kubernetes-driven orchestration auto-provisions GPU clusters, dynamically allocating resources to tasks like fine-tuning Mistral-7B models or parallelizing MONAI medical imaging workflows. Cloudera’s Data Fabric unifies streaming and batch ingestion into optimized Parquet sinks, while SDX enforces strong data access control and granular governance—tracking lineage from raw data to model predictions. 

This is enterprise AI stripped of excuses. No more waiting for cloud migrations, wrestling with fragmented tools, or risking compliance for innovation. With Cloudera AI, Dell infrastructure offered as a managed service, and NVIDIA accelerated compute and NVIDIA NIM, you’re not just deploying AI—you’re operationalizing it. Fast. Secure. Future-proof. 

Reimagine Possibilities with AI in a Box

  • Schedule a Consultation: Ready to transform your AI strategy after DTW? Connect with us to continue the conversation, schedule a custom workshop, or kick off a pilot initiative to drive measurable outcomes for your business.

Cloudera, Dell, and NVIDIA are propelling enterprises into the AI fast lane with a fully managed service that pairs cutting-edge hardware innovation with operational simplicity. This isn’t about squeezing legacy infrastructure to run AI—it’s about unleashing the full potential of your data using the latest high-performance Dell PowerEdge servers, NVIDIA’s accelerated compute, NVIDIA RAPIDS, NVIDIA NIM, and Cloudera AI. Together, they work to generate effective and efficient data pipelines, all orchestrated as a turnkey Enterprise AI solution. 

Agentic AI Deep Dive: How AI is Changing the Modern Enterprise
https://www.cloudera.com/blog/business/agentic-ai-deep-dive-how-ai-is-changing-the-modern-enterprise
Mon, 12 May 2025 13:00:00 UTC

Agentic AI has the potential to revolutionize workplace processes in nearly every sector by improving business decision-making, workflow efficiency and customer interactions and experiences.  

While interest in agentic AI is widespread, the motivation to use it differs by industry. Cloudera surveyed 1,484 enterprise IT leaders across 14 countries to better understand their approach to agentic AI in 2025, including how specific industries plan to implement the technology. It found that highly regulated fields, such as finance and healthcare, look to agentic AI to strengthen their cybersecurity posture and protect sensitive data, whether transactional information or patient records.  

Almost two-thirds of respondents (63%) intend to use agentic AI for security monitoring. Other industries, such as retail and telecommunications, look to AI agents to improve customer experience, with half of organizations implementing agents for customer support purposes.  

Let’s dive into current perceptions of agentic AI and implementation plans by sector.  

Improving Security and Client Relations in Finance and Insurance  

Financial and insurance companies primarily value agentic AI technologies for their ability to help with security monitoring and for customer experience. Respondents said their top use cases for agentic AI include fraud detection (56%), risk assessment (44%), and investment advisory (38%). 

AI agents can identify patterns and anomalies in data sets to avoid data breaches and identify vulnerabilities through security monitoring. This is highly beneficial for protecting sensitive data in regulated industries.  

AI agents can also improve advisory services and other client-facing tasks. Seventy-eight percent of industry decision-makers intend to use AI agents for customer support. Agents quickly pull data from multiple sources to generate complex responses to client requests. For example, an AI agent can analyze large volumes of data to generate a response if a client asks about potential investment opportunities with low risk and high returns.  

While the thought of agentic AI systems working autonomously with sensitive data understandably raises concerns, security guardrails keep these systems in check. Authorization and permissions are critical components of AI implementation, as AI systems can only access the data they are permitted to work with.  

Agentic AI also presents a customer experience continuity challenge. While the technology works efficiently with multiple data sets and can quickly process client information, it does not offer the personal touch of client-facing personnel.  

Streamlining Healthcare Workflows

Healthcare workers see several applications for agentic AI, including streamlining administrative tasks and providing decision-making recommendations for better patient care outcomes. According to the survey, healthcare providers view the top use cases of the technology as appointment scheduling (51%), diagnostic assistance (50%), and medical records processing (47%).  

AI agents can relieve medical professionals of repetitive tasks by processing insurance information and scheduling appointments. They can also streamline daily workflows to make patient visits more efficient by quickly processing a patient’s medical history and delivering a summary to a healthcare professional. These systems can take this action further by providing diagnoses and evidence-based treatment recommendations.  

One scenario might be an AI diagnostic assistance agent trained on thousands of pneumonia or lung cancer X-ray images. Leveraging pattern recognition, it could find early signs not immediately visible to the human eye, highlighting where a radiologist should examine more closely. This would help physicians make more accurate diagnoses.  

Improve Manufacturing Efficiency and Safety

Nearly half of manufacturing organizations are exploring AI agents for process automation (49%), supply chain optimization (48%), and quality control (47%). Specifically, they hope to use AI agents to intelligently monitor production lines for defects or reroute supply chain logistics when disruptions occur, dramatically boosting efficiency.  

AI agents can also efficiently monitor operations from a safety perspective. Health and safety teams typically inspect manufacturing plants by sending contractors on-site to assess risk. However, these processes are time-consuming and error-prone, as incidents still occur despite protocols.  

Agentic AI is a transformative opportunity in this field. It enables organizations to analyze historical data, detect patterns, and identify potential hazards before they materialize. This helps employees deliver more accurate automated risk assessment reports, resulting in safer environments.  

Helping Retail, E-Commerce, and Telecommunications with Client Relations

The retail, e-commerce, and telecommunications sectors primarily plan to use AI agents for customer-facing initiatives. Roughly half of organizations in these industries are considering AI agents for customer support (50%), price optimization (49%), and demand forecasting (48%).

Agentic AI systems can analyze customer browsing history, preferences, and purchases to tailor personalized product recommendations to increase the likelihood of repeated purchases. Based on personal customer data, they can curate special offers, emails, and ads to nurture customers along the sales funnel, allowing human employees to focus on more strategic tasks.   

Telecommunications organizations see value in using agents to comb through historical data, such as usage patterns, billing history, and customer support interactions, to predict which customers are at risk of churn and why. In the case of a telecommunications business, an agent can flag customers who reduced their monthly usage or had multiple interactions with customer support.  

The Future of Agentic AI   

Agentic AI has the potential to transform the way work gets done, regardless of industry. Regulated industries can significantly benefit from security monitoring to prevent breaches, while customer-facing organizations can improve client experiences. 

Whether a financial firm is looking to improve security, a healthcare provider hopes to improve efficiency, or a retailer aims to improve customer experience, implementing these systems for industry-specific purposes can take enterprises to new heights. To get a leg up on the competition, many enterprises plan to adopt the technology as soon as this year before expanding and scaling their capabilities.

Learn more about how enterprises plan to leverage agentic AI by reading the full report.  

]]>
https://www.cloudera.com/api/www/blog-feed?page=agentic-ai-deep-dive-how-ai-is-changing-the-modern-enterprise0
Leading in the Age of AI https://www.cloudera.com/blog/business/leading-in-the-age-of-aihttps://www.cloudera.com/blog/business/leading-in-the-age-of-aiThu, 08 May 2025 13:00:00 UTC

As AI becomes a regular topic in boardrooms, many executives face critical blind spots around strategy, governance, and implementation. Few are native AI users, and many struggle to connect high-level goals with practical, accountable systems. 

To explore what leading in this new era looks like, Dr. Maya Dillon, an astrophysicist turned AI thought leader and CEO of consultancy XSAIA, joined The AI Forecast. In her conversation with host Paul Muller, Maya emphasizes the need for human-centric leadership in AI and the importance of understanding the holistic impact of AI on businesses.

Here are some takeaways from Paul and Maya’s conversation.

What is AI? A form of co-creation

Paul: What does AI mean to you? 

Maya: AI for me means co-creation. A little bit more about me: I am a huge fan of creative things that aren't on my CV. I paint, I write, I play music. And for me, when I'm using AI, I want it to be something that enhances my voice. And it's the same with anything else that I create. I bring in AI and utilize it when searching for ideas or leveling up what I already have. And in that process, it is co-creation. 

That's what that means. It's enhancing what's already there. I know we've heard the adage that AI is supposed to augment human intellect. That's what it's here for. It's to bring out the best aspect of us and help us level up. 

Real leaders forgo the pressure to ‘just deploy’ and approach AI thoughtfully

Paul: You talk about the idea that the winners in AI won't be the ones who have the best tech, but are the ones with the best leadership. Unpack that for me.

Maya: Fundamentally, AI-first leadership is seeing AI not just as a tech stack. It's looking at it in terms of strategy. When businesses are employing AI, the ones who lead well with it are the ones who think about the holistic impact of AI. 

So, normally, what tends to happen is people put AI in a box with IT and R&D, and then they wonder why the transformation isn't happening; why aren't they the disruptors? The reason is that when you start to develop and implement AI projects, you are already changing and challenging the status quo. Developing and building AI demands that you ask particular questions. What problem am I trying to solve? How am I trying to solve it? Who am I serving? How is this going to be deployed, and what impact will it actually have? 

People who are leaders in AI involve all aspects of the business. And unless you lead with that holistic view, you are going to find yourself moving from point-of-contact to point-of-contact constantly. You will be at the mercy of the next ‘latest technology’ you wish you had. Everyone seems to be running ahead, and you are falling behind. It's not that they've got the fastest or the fanciest algorithm. It's because they are seeing the world exactly the way I've just described, which is, creating the solution with a view in mind that whoever deploys it is going to be achieving X, Y, Z. 

Paul: There is this enormous pressure that, ‘Hey, if we don't start doing something, our competition will. Let's just get going with something and clean up the mess later.’ Tell me why that's the case because intuitively, that seems like not a bad idea.

Maya: AI is now prolific and endemic. It's in everything. The reason it's a bad idea is because of the real-world impact these solutions actually have on people, not just on businesses. We could talk until the cows come home about all the different examples of where it's gone wrong, the lives affected, the reputations damaged.

Another big reason is the negative impact this approach has on your reputation. There’s a business leader who said it took him 20 years to build a reputation and two minutes to have it destroyed. And once that reputation is damaged, the value you lose, all those intangibles, is very hard to get back.

The power of mentorship – a rising tide lifts all ships

Paul: I know that you are an advocate of great mentoring and mentorship. What's your advice to would-be mentors and hopeful mentees?

Maya: I've been a mentor and a mentee. When someone comes to you for mentorship, you must remember one thing. Their success is not a threat to you. We've been brought into this world where we believe that success is somehow a zero-sum game. And that's the old adage. The candle does not diminish by lighting something else. 

The truth is your success is enhanced by theirs. And if we buy into this scarcity mindset of, ‘Oh no. If I help this other individual, they're going to take my job or my opportunities,’ we perpetuate this whole thing, this mythology, and we hold all of us back.

Catch the full conversation with Dr. Maya Dillon on The AI Forecast on Apple Podcasts, Spotify, and YouTube.

]]>
https://www.cloudera.com/api/www/blog-feed?page=leading-in-the-age-of-ai0
How Customized Professional Services Elevate Customer Experience in Product Companies: A Technical Perspective Using Apache Trino Integrationhttps://www.cloudera.com/blog/business/how-customized-professional-services-elevate-customer-experience-in-product-companies-a-technical-perspective-using-apache-trino-integrationhttps://www.cloudera.com/blog/business/how-customized-professional-services-elevate-customer-experience-in-product-companies-a-technical-perspective-using-apache-trino-integrationWed, 30 Apr 2025 13:00:00 UTC

In today’s digital economy, product companies can’t rely on high-quality products alone. Delivering a seamless, personalized customer experience is now essential for standing out in a crowded market. That’s where Customized Professional Services (CPS) come in.

CPS go beyond standard product offerings by delivering bespoke solutions, specialized expertise, and ongoing support tailored to each client’s needs. In this blog, we’ll explore how CPS enhances customer experience—using Apache Trino integration as a practical example.

What Are Customized Professional Services?

CPS are tailored services designed to address the individual requirements of a customer. Unlike generic solutions, CPS involve deep collaboration to understand client goals, challenges, and environments—delivering highly specific outcomes. Typical engagements include:

  • Consulting and Strategy Development: Helping customers understand how to best utilize a product to meet their business goals.

  • Product Customization and Configuration: Adapting the product to fit the customer’s unique needs.

  • Implementation and Integration: Ensuring seamless integration of the product into the customer's existing infrastructure.

  • Ongoing Support and Optimization: Providing continuous support and making recommendations for improvements over time.

CPS are often associated with complex enterprise-level solutions, but they can be beneficial in any industry where customers require more than just off-the-shelf products.

The Importance of CPS in Enhancing Customer Experience

The customer experience encompasses every interaction a customer has with a company. It is a key determinant of customer satisfaction and loyalty. When customers feel that a company understands their unique challenges and delivers tailored solutions, it creates a deeper, more meaningful relationship. CPS directly support this by offering: 

1. Tailored Solutions to Meet Unique Needs

Every business is different, and off-the-shelf products often don’t provide the flexibility needed to address specific customer requirements. CPS allow businesses to modify their solutions, integrate them with existing systems, and create value in ways that generic offerings can’t match.

For instance, if a customer needs a data analytics solution for querying large datasets spread across multiple sources, a customized service can ensure that the product fits seamlessly into their data architecture, workflows, and business goals.

2. Faster Time to Value

CPS help customers realize the benefits of a product more quickly through faster onboarding, smoother implementation, and fewer roadblocks—leading to faster ROI and greater satisfaction.

3. Expert Guidance and Support

With CPS, customers gain access to specialized expertise, troubleshooting, and strategic advice. This is especially valuable in complex environments or when internal expertise is limited.

4. Long-term Partnership and Relationship Building

By offering customized services, companies build stronger relationships. Customers who feel seen and supported are more likely to stay loyal—and to explore additional offerings.

5. Scalability and Flexibility

CPS ensure that solutions evolve alongside the customer’s business. Whether scaling to support growth or pivoting for new market demands, CPS offer the agility customers need.

Apache Trino Integration: A Case Study of Customized Professional Services

Apache Trino (formerly PrestoSQL) is a high-performance, distributed SQL query engine designed for querying large datasets across multiple data sources. It can query data in real-time across various storage systems such as relational databases, NoSQL systems, and cloud storage platforms. Its versatility makes it an excellent candidate for integration into complex data architectures, and this is where CPS plays a vital role in enhancing the customer experience.

Let’s dive into how CPS can elevate the customer experience when integrating Apache Trino into a customer’s data ecosystem.

1. Customized Trino Deployment and Configuration

While Apache Trino is a powerful tool, configuring it to fit into an organization’s specific data infrastructure can be complex. Through its CPS offerings, Cloudera can provide customers with tailored deployments of Trino, ensuring that it works optimally in their unique environment. For example:

  • Cloudera Manager Integration: Integrating Apache Trino with Cloudera Manager as an add-on service by developing a Cloudera Manager parcel and CSD file.

  • Cluster Sizing: Determining the appropriate cluster size based on the customer’s data volume, query complexity, and performance needs.

  • Security and Compliance: Configuring access controls, encryption, and compliance features to meet the customer’s security requirements.

CPS ensure Trino is not only deployed but optimized, stable, and secure.

2. Integration with Diverse Data Sources

Trino’s ability to query across data lakes, warehouses, and cloud platforms is powerful—but integration can be complex. CPS help by:

  • Data Connector Configuration: Setting up connectors for various data sources such as Hadoop, S3, or Teradata.

  • Data Mapping: Ensuring that data from different sources can be queried seamlessly by mapping schemas, resolving conflicts, and standardizing data formats.

  • ETL Optimization: Helping customers optimize their extract, transform, and load (ETL) processes to ensure Trino can efficiently query the data.

CPS allow Trino to operate as a unified query layer, helping customers gain holistic data insights, as the brief sketch below illustrates.
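To make the idea of a unified query layer concrete, the following is a minimal sketch that uses the open-source trino Python client to run a single federated query across two catalogs. The host, user, catalog, schema, and table names are hypothetical placeholders; in a real engagement they would come from the connector configuration work described above.

```python
# Minimal sketch: Trino as a unified query layer via the open-source "trino"
# Python client (pip install trino). All names below are hypothetical.
import trino

conn = trino.dbapi.connect(
    host="trino-coordinator.example.com",  # hypothetical coordinator host
    port=8080,
    user="analyst",
    catalog="hive",    # e.g., an S3/HDFS-backed data lake catalog
    schema="sales",
)

cur = conn.cursor()
# One federated query joining a data lake table with a relational source
# exposed through a second (hypothetical) "postgres" catalog.
cur.execute("""
    SELECT o.region, SUM(o.amount) AS total_sales
    FROM hive.sales.orders AS o
    JOIN postgres.crm.customers AS c ON o.customer_id = c.id
    GROUP BY o.region
    ORDER BY total_sales DESC
""")
for row in cur.fetchall():
    print(row)
```

Because the join happens inside Trino, the customer does not need to copy data between systems before analyzing it, which is exactly the kind of architectural decision a CPS engagement helps validate.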

3. Advanced Query Optimization and Performance Tuning

Apache Trino is designed for high-performance querying, but customers may need help optimizing their queries to maximize speed and efficiency. With CPS, product companies can provide performance tuning by:

  • Query Optimization: Helping customers write efficient SQL queries, indexing relevant columns, and suggesting better ways to execute complex joins and aggregations.

  • Query Plan Analysis: Analyzing the query execution plans to identify bottlenecks and make optimizations such as partitioning, indexing, and caching.

  • Resource Allocation: Tuning the Trino cluster’s resource allocation (e.g., memory, CPU) to ensure that it performs optimally under different workloads.

This level of tailored support ensures that customers can run complex analytics workloads quickly and cost-effectively.
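As an illustration of the query plan analysis described above, the short sketch below reuses the same hypothetical connection to compare a query’s plan with its measured runtime behavior. EXPLAIN and EXPLAIN ANALYZE are standard Trino statements; the table and column names are again illustrative.

```python
# Minimal sketch: inspecting a Trino query plan to look for bottlenecks.
# Assumes the "conn" object from the previous sketch; names are hypothetical.
cur = conn.cursor()

# Show the distributed plan without executing the query.
cur.execute("EXPLAIN SELECT region, COUNT(*) FROM hive.sales.orders GROUP BY region")
print("\n".join(str(row[0]) for row in cur.fetchall()))

# Execute the query and report per-stage statistics (rows, CPU, memory),
# which is where skewed joins or missing partition pruning tend to show up.
cur.execute("EXPLAIN ANALYZE SELECT region, COUNT(*) FROM hive.sales.orders GROUP BY region")
print("\n".join(str(row[0]) for row in cur.fetchall()))
```

Reading these plans alongside cluster metrics is typically how a CPS engagement decides whether to repartition data, adjust resource settings, or rewrite the query itself.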

4. Ongoing Monitoring and Support

Beyond deployment, CPS provide essential post-launch services:

  • Proactive Monitoring: Setting up monitoring tools and dashboards to track the health of the Trino cluster, ensuring that issues are detected early and addressed before they impact customers.

  • Troubleshooting and Support: Providing hands-on troubleshooting and support for any issues that arise with the Trino system, such as query failures, performance issues, or integration problems.

  • Training and Knowledge Transfer: Offering training sessions for customer teams to ensure they can independently operate and optimize Trino over time.

This comprehensive support ensures customers can continue to derive value from their Trino integration long after the initial deployment.

Staying Ahead with Customized Professional Services 

Customized Professional Services are a game-changer for enhancing the customer experience. By delivering tailored solutions, expert guidance, and long-term support, companies can help customers fully leverage powerful tools like Apache Trino—while building stronger, more loyal relationships.

As business needs grow more complex, embracing CPS isn’t just smart—it’s essential for staying competitive.

]]>
https://www.cloudera.com/api/www/blog-feed?page=how-customized-professional-services-elevate-customer-experience-in-product-companies-a-technical-perspective-using-apache-trino-integration0
What no one tells you about RAGhttps://www.cloudera.com/blog/technical/what-no-one-tells-you-about-raghttps://www.cloudera.com/blog/technical/what-no-one-tells-you-about-ragTue, 22 Apr 2025 17:01:00 UTC

Let’s explore the critical stages of a RAG workflow—partitioning, chunking, embedding, and inserting—and demonstrate how Cloudera’s technology simplifies each step.

Data Partitioning: The Foundation of RAG

The first essential step in a RAG workflow is partitioning. This process involves breaking down large and sometimes unstructured data sources into meaningful segments, enabling programmatic iteration over unstructured data. Of course, the retrieval process is still possible without partitioning, but the more granular control you have over your processing, the more flexibility you will have to build flows for different data sources. Partitioning ensures that data is structured into manageable portions that align with how users query information.

Partitioning strategies vary based on the nature of the data. For example, partitioning by section headers allows for more organized retrieval when processing lengthy documents such as user manuals. In contrast, for conversational data such as chat logs, partitioning might involve breaking content down by timestamps to preserve conversational flow. Another key consideration is token limits—since most embedding models can only process a predefined number of tokens at once, partitioning must align with these constraints to ensure optimal performance.

A well-defined partitioning approach helps maintain RAG applications’ accuracy, efficiency, and usability. Developers can optimize response quality by ensuring that only the most relevant data is retrieved and passed to the LLM, while minimizing unnecessary computational overhead.
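To make this concrete, here is a minimal, plain-Python sketch of header-based partitioning over a Markdown-style document. It is only an illustration of the idea, not Cloudera’s partitioning processor, and the sample document is invented.

```python
# Minimal sketch: partition a Markdown-style document into sections by header.
# Illustrative only; real pipelines handle many more formats and edge cases.
def partition_by_headers(text: str) -> list[dict]:
    sections, current = [], {"title": "Preamble", "lines": []}
    for line in text.splitlines():
        if line.startswith("#"):               # a header starts a new section
            if current["lines"]:
                sections.append(current)
            current = {"title": line.lstrip("# ").strip(), "lines": []}
        else:
            current["lines"].append(line)
    if current["lines"]:
        sections.append(current)
    return [{"title": s["title"], "text": "\n".join(s["lines"]).strip()} for s in sections]

doc = "# Setup\nInstall the agent on each node.\n# Troubleshooting\nCheck the logs first."
for part in partition_by_headers(doc):
    print(part["title"], "->", part["text"])
```

The same loop could partition chat logs by timestamp instead of by header; the point is that each partition becomes a unit you can reason about programmatically.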

Chunking: Ensuring Context Preservation

Once partitioning is complete, the next step is chunking. Chunking involves bundling related partitions together to maintain meaningful context. While partitioning breaks content into fundamental components, chunking ensures that these components retain their relationships, preventing context loss.

For example, in legal documents a clause or regulation might span multiple paragraphs. If these are partitioned too narrowly, the meaning may be lost when retrieving content based on a user’s query. Chunking helps by grouping related text segments together into a logically complete unit. This ensures that when a user issues a query, the model receives enough contextual information to generate an accurate and relevant response.

Chunking strategies vary depending on the nature of the dataset. Some approaches involve simple fixed-length chunking, where segments are grouped based on a predefined number of tokens. More advanced strategies can involve chunking a document’s title together with its related text.

Effective chunking improves search accuracy, optimizes retrieval latency, and ensures that LLM-generated responses are contextually aware and precise. Additionally, once you settle on a chunking strategy that maximizes context preservation, the resulting chunk sizes can inform your choice of embedding model.
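The sketch below shows fixed-length chunking with overlap, plus a simple title-aware variant. Whitespace-separated words stand in for real tokens here; in practice you would count tokens with the tokenizer of your chosen embedding model so every chunk respects its limit.

```python
# Minimal sketch: fixed-length chunking with overlap. Whitespace "tokens" are a
# stand-in for a real tokenizer; sizes are illustrative.
def chunk_tokens(text: str, max_tokens: int = 200, overlap: int = 20) -> list[str]:
    tokens = text.split()
    chunks, start = [], 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        chunks.append(" ".join(tokens[start:end]))
        if end == len(tokens):
            break
        start = end - overlap   # overlap preserves context across chunk boundaries
    return chunks

# A title-aware variant simply prepends the document title to every chunk so the
# embedding carries that extra context.
def chunk_with_title(title: str, text: str, **kwargs) -> list[str]:
    return [f"{title}\n{chunk}" for chunk in chunk_tokens(text, **kwargs)]
```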

Embedding: Transforming Text into Searchable Vectors

With well-structured chunks in place, the next step in the RAG workflow is embedding. Embeddings are numerical representations of text, allowing machines to understand and compare the semantic meaning of different text segments. Without embedding, RAG applications would be limited to simple keyword searches, which lack the contextual understanding of true semantic retrieval.

Embedding is a multi-step process that involves tokenization, vector transformation, and storage. When a text chunk passes through an embedding model, it is first broken down into tokens. These tokens are then converted into a high-dimensional vector that captures the essence of the text in a format suitable for mathematical similarity measures such as Euclidean distance (L2) and cosine similarity.

Choosing the right embedding model is crucial. Some models are optimized for general-purpose retrieval, while others are fine-tuned for domain-specific applications like legal, medical, or technical documents. Another key consideration is vector dimensionality, which must align with the schema of the vector database. A mismatch in vector size can lead to inefficient searches or compatibility issues.

Once text chunks are embedded into vector representations, they become searchable using similarity metrics. This enables highly efficient retrieval of the most relevant content based on user queries, greatly enhancing the accuracy and responsiveness of RAG-powered applications.
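As a brief illustration (assuming the open-source sentence-transformers library and its all-MiniLM-L6-v2 model, which produces 384-dimensional vectors), the sketch below embeds two chunks and compares a query against them with cosine similarity.

```python
# Minimal embedding sketch using the open-source sentence-transformers library
# (pip install sentence-transformers numpy). The model choice is illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")    # 384-dimensional embeddings

chunks = [
    "Reset the device by holding the power button for ten seconds.",
    "Warranty claims must be filed within 12 months of purchase.",
]
chunk_vecs = model.encode(chunks)                  # shape: (num_chunks, 384)
query_vec = model.encode(["How do I restart the device?"])[0]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine(query_vec, vec) for vec in chunk_vecs]
print(chunks[int(np.argmax(scores))])              # most semantically similar chunk
```

Note that the 384-dimensional output is exactly the kind of detail that must match the schema of the vector database you later insert into.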

Cloudera Data Flow offers a powerful yet easy-to-use embedding processor that extends the capabilities of your data flows, allowing you to leverage an embedding model within the context of the processor itself. There is no need to call an external API, and no GPU is required; the processor is configured with just three simple properties.

Building RAG Applications - The Devil is in the Details

Building Retrieval-Augmented Generation (RAG) applications can become complex quickly, requiring careful handling of data ingestion, processing, and retrieval. Traditionally, developers have navigated the steps of chunking data, inserting embeddings, and integrating vector databases.  

However, one of the most common pitfalls when implementing a RAG solution is failing to understand how these components are co-dependent.  Developers should ask the question, “Can our data be chunked as-is, or should we refine it prior to chunking?”   

Cloudera Data Flow and Cloudera’s exclusive RAG Pipeline processors simplify the complex process of refining unstructured data through partitioning, enabling more effective chunking and higher-quality vector embeddings. While poorly designed partitioning or chunking can harm performance and embedding quality, Cloudera’s tools abstract much of this complexity, streamlining the development of efficient and reliable RAG solutions.

This gives you the granular control to choose the best embedding model for each data flow.

Inserting the Embedded Chunks into a Vector Database: Enabling Efficient Retrieval

The final step in the RAG workflow is inserting the embedded chunks into a vector database. Vector databases are designed to perform high-speed similarity searches, enabling the efficient retrieval of relevant content when a user issues a query.

Unlike traditional databases that rely on structured indexing for exact matches, vector databases use similarity search algorithms such as approximate nearest neighbor (ANN) and k-nearest neighbor (KNN) to find embeddings that closely match the user’s query. This is what enables RAG applications to retrieve semantically relevant content, even if the query wording differs from the stored text.

Once embedded data is inserted into the vector database, the system is ready for real-time querying. When a user submits a request, the query is transformed into an embedding, compared against stored vectors, and the most relevant results are retrieved, forming the basis of the LLM’s response.
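The sketch below shows the insert-and-query loop using the open-source chromadb client as a stand-in for whichever vector database you actually deploy. It assumes the chunks, chunk_vecs, and query_vec variables from the embedding sketch above; everything else is illustrative.

```python
# Minimal sketch: insert embedded chunks and run a similarity query, using the
# open-source chromadb client as a stand-in for a production vector database.
import chromadb

client = chromadb.Client()                         # in-memory instance for illustration
collection = client.create_collection(name="product_docs")

collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    embeddings=[vec.tolist() for vec in chunk_vecs],
    documents=chunks,
)

# At query time the user's question is embedded, compared against stored vectors,
# and the closest chunks are returned as context for the LLM prompt.
results = collection.query(query_embeddings=[query_vec.tolist()], n_results=1)
print(results["documents"][0])
```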

Cloudera Data Flow offers many VectorDB connection processors such as Milvus, Pinecone, and Chroma, with more on the way.

Streamline Your RAG Application Development Today

With Cloudera Data Flow and its specialized RAG Pipeline processors, organizations can now build, deploy, and optimize RAG applications with unprecedented ease. By abstracting much of the technical complexity, Cloudera’s solutions enable developers to focus on enhancing retrieval accuracy, optimizing response generation, and improving the overall user experience.

By leveraging Cloudera’s exclusive partitioning, chunking, embedding, and VectorDB integration processors, businesses can rapidly implement RAG solutions that scale efficiently and deliver precise, context-aware responses.

If you’d like to explore how Cloudera can help streamline your RAG application development, reach out to our team for a demo or check out our technical documentation for more information.

Stay tuned for an upcoming deep dive into advanced RAG optimization techniques!

Learn More:

To explore the new capabilities of Cloudera Data Flow 2.9 and discover how it can transform your data pipelines, watch this video.

https://community.cloudera.com/t5/What-s-New-Cloudera/Cloudera-DataFlow-now-powers-GenAI-pipelines-and-supports/ba-p/388173

https://community.cloudera.com/t5/What-s-New-Cloudera/Cloudera-DataFlow-2-9-now-supports-building-GenAI-pipelines/ba-p/395546

]]>
https://www.cloudera.com/api/www/blog-feed?page=what-no-one-tells-you-about-rag0
How Cloudera Helps Enterprises Meet Their ESG Goalshttps://www.cloudera.com/blog/culture/how-cloudera-helps-enterprises-meet-their-esg-goalshttps://www.cloudera.com/blog/culture/how-cloudera-helps-enterprises-meet-their-esg-goalsTue, 22 Apr 2025 13:00:00 UTC

Across the globe, organizations are united by one shared priority: sustainability. While committing to environmental, social, and governance (ESG) initiatives is simply the right thing to do to preserve our planet for future generations, businesses face additional pressure to prioritize sustainability via regulations, expectations from investors, and consumer preferences. Since 2019, thousands of companies have committed to achieving net-zero emissions. More than 9,000 organizations have pledged to reach carbon neutrality by 2030, and that number is expected to continue growing year over year. 

At Cloudera, we’re proud to help enterprises meet their ESG goal of carbon neutrality through smarter, data-driven technology decisions. We also practice what we preach by driving our own journey toward carbon neutrality. As we celebrate Earth Day, we’re highlighting the ways our platform empowers customers to operate sustainably, efficiently, and responsibly.

Why ESG Matters More Than Ever

ESG initiatives are reshaping industries and becoming integral to corporate strategy and operations. Energy consumption is a key focus area for vendors and consumers alike:

  • Data centers use about 4.4% of the world’s electricity today, and that share will only increase as demand grows. This substantial usage has placed infrastructure providers under increased scrutiny and has compelled companies to adopt sustainable practices to mitigate their environmental impact.

  • To meet the soaring power demands driven by AI, cloud vendors are investing billions in advanced energy infrastructure—from small modular nuclear reactors to undersea cable networks—to help offset the ecological impact.

Simultaneously, regulatory bodies are enforcing greater transparency and accountability. For example, the European Union’s (EU) Corporate Sustainability Reporting Directive mandates that large companies, including non-EU enterprises, share clear details about their environmental and social impact.

Sustainability has emerged as a significant competitive differentiator. Consumers and investors increasingly evaluate companies based on their ESG commitments and performance. Organizations that proactively integrate sustainable practices are positioning themselves for long-term success.

Cloudera's Role in Supporting ESG

As ESG becomes mission critical, enterprises must evaluate their vendor relationships to ensure that their technology partners can support their enterprise sustainability initiatives.

Cloudera’s true hybrid platform for data, analytics, and AI empowers you to make informed infrastructure choices that align with your unique ESG initiatives. The solutions you build with our platform can help you lower your environmental impact while delivering high performance and driving business growth.

Here’s how Cloudera enables sustainability through data, analytics, and AI:

Lowering Emissions with a True Hybrid, Cloud-Native Platform

Our true hybrid platform for data, analytics, and AI enables businesses to manage data across their on-premises infrastructure and any cloud environment. This flexibility allows organizations to choose the most efficient, and often lowest carbon, infrastructure for each workload. It also reduces infrastructure sprawl and aligns IT resource usage directly with business demand—a powerful step toward achieving ESG targets.

An additional component that drives sustainability is the cloud-native elasticity of our platform. Enterprises can scale resources dynamically, up when needed and down to zero when idle, which ensures they use only what they need. For example, PTT OR, a leading energy company in Thailand, uses Cloudera’s hybrid platform to optimize its supply chain and reduce fuel delivery routes, leading to lower energy consumption and a reduced carbon footprint. 

Reducing Carbon Footprint Through High-Density Storage

Cloudera Object Store, powered by Apache Ozone, makes data storage more sustainable. With the capability to scale to more than 10 billion objects per cluster, Ozone significantly reduces physical hardware requirements, energy use, and carbon emissions. And it supports larger drive sizes, resulting in higher storage density compared to systems like HDFS. 

Many of our large enterprise customers already leverage Cloudera Object Store to collectively manage more than an exabyte of data and consolidate infrastructure, optimize resources, and minimize environmental impact. That is what makes Cloudera Object Store a powerful storage choice for enterprises committed to advancing their ESG goals.

Lowering Power Consumption with Energy-Efficient Compute

By leveraging energy-efficient processors such as ARM-based AWS Graviton and AMD chips, Cloudera helps enterprises reduce their power usage significantly without sacrificing performance. Switching to these processors can save businesses between 20% and 90% on infrastructure costs while lowering their carbon footprint.

Reducing Resource Waste with Intelligent Data Lifecycle Management

Cloudera Lakehouse Optimizer automates complex data lifecycle management tasks for Apache Iceberg, including compaction and table cleanup operations, making queries more performant and efficient while reducing resource consumption. This automation supports enterprise ESG goals by minimizing energy usage, hardware requirements, and ultimately, carbon footprints.

Driving Continuous Efficiency Through Operational Insights

Cloudera Observability provides unparalleled insights into enterprise operations. By highlighting inefficiencies and wasteful energy consumption, Cloudera Observability enables continuous improvement for sustainability metrics.

Enhancing ESG Transparency with Automated Data Lineage

Cloudera’s recent acquisition of Octopai strengthens our support for ESG transparency with powerful, automated metadata management. Octopai’s AI-driven solutions provide clear, real-time insights into data lineage, which makes ESG reporting more accurate and reliable. This helps businesses avoid compliance issues and ensures their sustainability claims are real and auditable.

Social Responsibility and Data-Driven Inclusion

ESG goes beyond the environment to include social impact. Cloudera’s data tools help customers track and improve workforce management by offering visibility into diversity, fair pay, hiring, and employee satisfaction. These insights help businesses improve socially and environmentally, while promoting fairness and inclusion for everyone.

Looking Ahead: Our Commitment to ESG Innovation

Sustainability is a journey and, at Cloudera, we're constantly innovating to further support our customers' ESG strategies. Our future initiatives include even deeper integrations with sustainable technologies, enhanced automation for ESG reporting and compliance, and growing and expanding partnerships that help enterprises meet and exceed global ESG standards.

Building a Sustainable, Data-Driven Future

For Earth Day, we at Cloudera want to reiterate our commitment to our customers’ growth as well as to protecting our collective future. By leveraging data and innovative technology, enterprises can meet sustainability goals without compromising their operational or business objectives.

Let’s shape a greener future together. Reach out today to see how Cloudera can power your ESG goals, or click here to start a free trial!

]]>
https://www.cloudera.com/api/www/blog-feed?page=how-cloudera-helps-enterprises-meet-their-esg-goals0
The Breakout Year for Enterprise AI Agentshttps://www.cloudera.com/blog/business/the-breakout-year-for-enterprise-ai-agentshttps://www.cloudera.com/blog/business/the-breakout-year-for-enterprise-ai-agentsMon, 21 Apr 2025 13:00:00 UTC

2025 is shaping up to be a landmark year for enterprise AI. Advancements in generative AI (GenAI) and large language models (LLMs) have brought the transformative power of agentic AI to the forefront of every IT leader’s mind. While often mistaken for simple chatbots, AI agents are far more advanced—autonomous tools capable of executing complex, goal-oriented tasks. Their impact is already felt across sectors—from real-time fraud detection in finance to workflow optimization in manufacturing and precision diagnostics in healthcare.

With investment and adoption of AI agents soaring, how are enterprise leaders prioritizing the technology in their own organizations? To explore how organizations are embracing this new wave of AI, Cloudera surveyed 1,484 enterprise IT leaders across 14 countries. The findings reveal not just a strong commitment to AI agents, but a transformative shift in how businesses are planning, deploying, and evaluating them.

Enterprises Are Doubling Down on Agentic AI Investment

Adoption of AI agents is no longer an experimental endeavor—it’s a strategic imperative. An overwhelming 87% of respondents said investing in AI agents is essential to stay competitive. Even more telling, 96% plan to increase their use of agents in the next 12 months, with half of them aiming for widespread, enterprise-level implementation​.

Considering the uptick in investment, agentic AI adoption is actually a relatively new development for many enterprises. In fact, a majority (57%) of surveyed respondents said their organizations only began implementing them in the last two years, with 21% starting within the past year​. This rapid embrace of agentic AI is reflected in how organizations are prioritizing their investments.

Top areas of investment reflect this mindset. Organizations are prioritizing performance optimization bots (66%), security monitoring agents (63%), and development assistants (62%)—tools that promise to enhance both productivity and resiliency​. So, how are organizations enabling those AI agents to take hold? According to survey respondents, 66% said they are using enterprise AI infrastructure platforms to develop and deploy AI agents. And 60% are taking advantage of agent capabilities embedded within their existing core applications. 

As adoption accelerates, these trends signal a critical need for enterprises to have a reliable, scalable, data infrastructure in place. Given the universal commitment to investing in that adoption, organizations need to ensure that infrastructure is in place quickly, or risk getting left behind on the road to agentic AI. 

Where AI Agents Deliver Value—and What’s Holding Them Back

Once successfully implemented, AI agents can deliver tremendous value to organizations. Some of the tangible benefits that enterprises are seeing include improving existing GenAI models (81%), as well as applications such as customer support (78%), process automation (71%), and predictive analytics (57%).

For most enterprises, those AI agents are most deeply embedded in their IT operations (61%), followed by customer support (18%) and marketing (6%). Companies that have incorporated agents into IT functions are more likely to branch out into customer and marketing use cases, suggesting that IT is the natural launching pad for wider agentic AI integration.

The benefits are clear, and the use cases associated with AI agents have serious potential to transform the way enterprises function. But the path forward still comes with some apprehension. Specifically, over half of IT leaders identified data privacy (53%) as a concern with adopting agentic AI, while integration with legacy systems (40%) and implementation costs (39%) followed close behind​. 

Organizations have to strike a delicate balance between protecting sensitive data and ensuring it is leveraged throughout the AI lifecycle. Any inadvertent exposure of that sensitive data could hamper the quality of AI outputs and leave organizations out of compliance with key regulations, like DORA.

AI Built with Accountability

The concerns many organizations have often boil down to two major considerations: trust and bias. As AI agents gain more responsibility and take control over mission-critical tasks, questions of accountability, fairness, and transparency are becoming top of mind. Over half (51%) of enterprise leaders reported significant concerns about bias in AI systems.

Understanding of bias is growing, and enterprises are taking added steps to build in accountability and govern AI properly. A sizable share of respondents (38%) are implementing processes that include human reviews, diversified training data, and formal fairness audits. Beyond those steps, another 36% said they have introduced some bias-check measures, like periodic human reviews or bias-detection tools.

But for all the mitigation efforts that are under way, a gap still exists with some enterprises. Cloudera found that 14% of respondents said they have only taken minimal or ad hoc steps so far to combat bias. AI agents cannot function without accountability and fairness. Enterprises adopting this technology need to ensure that they are taking the necessary steps to reduce the impact bias can have on real-world outputs. 

Accelerating AI Agent Adoption

2025 is poised to be a massive year, with accelerating adoption of AI agents unlocking new use cases for businesses everywhere. And as investment in the technology accelerates, prioritizing adoption of AI agents is quickly becoming nonnegotiable for long-term success. 

Find out what else Cloudera’s survey uncovered and take a deeper dive into the state of AI agents in 2025.

]]>
https://www.cloudera.com/api/www/blog-feed?page=the-breakout-year-for-enterprise-ai-agents0
#ClouderaLife Allyship April Q&A: Otho Lyonhttps://www.cloudera.com/blog/culture/cloudera-life-allyship-april-q-a-otho-lyonhttps://www.cloudera.com/blog/culture/cloudera-life-allyship-april-q-a-otho-lyonWed, 16 Apr 2025 13:00:00 UTC

Allyship is more than a concept—it’s a commitment. It’s about standing beside others, listening deeply, and using our voices to uplift marginalized communities. As we turn the calendar to a new month, we enter into a celebration of Allyship April. Over the course of this month, Clouderans will engage in conversations and activities to show support, advocate for, and understand the experiences of others.

As part of our celebration of allyship, we sat down with Otho Lyon, vice president of customer experience, to learn a bit more about his journey back to Cloudera, what allyship means to him, and how he puts words into action in his own life.

Here’s what Otho had to say.

Can You Tell Us a Bit About Your Journey with Cloudera?

My journey with Cloudera has certainly been unique. I was here for seven years before leaving for another role. But after a year and a half away I made the decision to leave that company, ultimately ending up back at Cloudera. An important part of that decision came down to my own skillset and the kind of work I am most passionate about. I’ve always known what my superpower was—people. I love connecting with others and helping to solve challenges for our customers. Cloudera is where I’ve always found the most opportunity to flex those people skills – I’m able to help our organization grow by delivering the best possible experience for our customers and partners.

But it’s about much more than just the work here, there are real people behind the customers we serve. And whether it’s a healthcare or financial services client, the use cases we are involved in have a major impact on people’s lives. Having the opportunity to get back to Cloudera has been incredibly rewarding, and I’m excited for what the future has in store.

Shifting to Allyship April, What Do You Think Makes This Topic So Important?

In my opinion, allyship is the fabric that keeps our society functioning. We need that support and understanding of one another in order to grow and work together. We all have differences, and recognizing those is critically important. There are certainly a variety of definitions, but true allyship comes down to two things: listening to others and taking meaningful action. Making a point to utilize your voice and advocate for equality and inclusion for those who might be marginalized is so important.

To go a level deeper, a critical component of allyship is leading by example—can you go further than just talking about being an ally? Can you ‘walk the walk,’ so to speak? My dad used to tell me, “You talk a good talk, but the walk is long.” That’s something I’ve kept in my mind throughout my life. The real work comes when you put yourself out there and support the people around you. It’s a mindset where you are always thinking about how you can put those sentiments of equality, inclusion, and advocacy into action.

How Do You See Allyship in Action at Cloudera?

As someone who was at Cloudera, left briefly, and then returned, I think I have a really unique perspective on this. Allyship is something I have always seen as a core part of the way we operate at Cloudera. Coming back to Cloudera, it’s been abundantly clear that the emphasis on allyship and the commitment to fostering a supportive work environment has remained constant.

It’s something that I’ve consistently seen with our executive leadership too. Everyone is totally bought in, which really helps set the tone for the way we all work together. There’s a top-down approach here that is deeply aligned with the ’lead by example’ aspect of allyship I mentioned earlier.

How Do You Put Allyship Into Action in Your Own Life?

This is something that’s deeply personal for me. And as I mentioned earlier, it’s so important to get beyond just talking about allyship and start really bringing it to life. Whenever I find myself out in the world, I always make it a point to be conscious of how I treat other people and ensure I’m coming to every interaction with respect and a sense of fairness. I also work hard to maintain a highly diverse circle of friends and connections to keep myself balanced.

I also do a lot of mentoring with the next generation of professionals, helping prepare them for the working world. I love to have conversations with younger generations and really help them flex that muscle of having face to face interactions. Within my own personal life, my family and I also spend time giving back in our community whether that’s through volunteering or just a small act of kindness to someone in need.

For anyone wondering what they need to do to be a better ally, there’s plenty of opportunities out there—being a good ally doesn’t have to feel like rocket science. Volunteer, help out, just get involved in your community or at your workplace. I would also say don’t be afraid to be bold and stand for what you believe in, that gives you a really strong foundation to build off of when growing as an ally.

Learn more about what Cloudera is doing to take action and support allyship. 

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-life-allyship-april-q-a-otho-lyon0
Unlocking the Future of AI with ClouderaNOWhttps://www.cloudera.com/blog/technical/unlocking-the-future-of-ai-with-clouderanowhttps://www.cloudera.com/blog/technical/unlocking-the-future-of-ai-with-clouderanowMon, 14 Apr 2025 13:00:00 UTC

The world of AI and analytics is evolving at a breakneck pace. Staying ahead requires more than just keeping up; it demands hands-on access to the latest innovations and insights from industry thought leaders. That’s precisely what ClouderaNOW, our quarterly virtual event series, is designed to deliver.

ClouderaNOW provides direct access to AI advancements, machine learning strategies, and real-world use cases. Through interactive demos, customer success stories, and live Q&A, attendees gain the knowledge and tools to turn AI potential into real business impact.

In our most recent ClouderaNOW series, we hosted five webinars exploring the latest trends in AI adoption. Here’s a recap of the key takeaways from our first event.

The State of AI Adoption

AI is no longer a futuristic concept; it’s a critical part of modern business strategy. According to Cloudera’s State of Enterprise AI survey, 50% of businesses already use generative AI (GenAI). Even more notable: 0% reported having no plans to adopt it or actively banning it. This means every business, regardless of industry, is exploring AI in some capacity.

Jake Bengston, Cloudera’s global AI solution director, covered how many organizations approach AI adoption. They often begin their AI journey by leveraging managed services through external APIs such as ChatGPT, Claude, or Gemini. This initial phase allows companies to quickly test AI’s capabilities, often with chatbots, content generation, or internal workflow automation, without the overhead of building and managing AI infrastructure. It can be a quick way to show AI’s ROI. 

However, as businesses start integrating AI into their workflows, they hit a key limitation: off-the-shelf models fail to fully align with industry-specific needs. To enhance AI performance and relevance, companies begin customizing models using techniques like prompt engineering, Retrieval-Augmented Generation (RAG), or fine-tuning to incorporate their proprietary data.

As AI matures within an organization, businesses often transition to open-source models like LLaMA, Gemma, or DeepSeek to increase control, enhance security, and reduce long-term costs. Doing so opens up more possibilities.

The Shift to Open-Source AI Models

Open-source AI is becoming the preferred choice for businesses that need customization, privacy, and cost control. By self-hosting models, companies can fine-tune AI for industry-specific applications while ensuring sensitive data remains secure. This is crucial as data-driven insights increasingly drive competitive advantage.

Cost is also a key factor. Managed AI services typically charge per-token fees, which can quickly scale and become unpredictable. Cloudera’s internal benchmarking found that hosting an open-source AI model within Amazon Web Services (AWS) can reduce costs by up to 40% compared to API-based alternatives. As businesses prioritize cost efficiency and data control, many are shifting toward custom AI solutions that balance performance, privacy, and sustainability.

As more businesses embrace open-source AI to gain greater control over their models and costs, the challenge shifts from simply adopting AI to efficiently deploying and scaling it. To bridge this gap, Cloudera provides Accelerators for ML Projects (AMPs), which are pre-built, one-click deployment solutions that streamline the transition from AI experimentation to production.

Accelerating AI with Cloudera’s AMPs

Many data scientists don’t build AI models from the ground up. Instead, they adapt existing solutions, which can lead to quality issues, integration challenges, and inefficiencies. Cloudera AMPs solve these problems by providing tested, production-ready AI accelerators that work seamlessly within Cloudera’s ecosystem. In addition to accelerating AI projects, Cloudera AMPs are fully open source and include deployment instructions for any environment, serving as further testament to Cloudera’s commitment to the open source community.

AMPs in Action

ClouderaNOW covered two key Cloudera AMPs helping enterprises reach AI production faster:

  • Fine-Tuning Studio

Fine-tuning allows businesses to adapt large language models (LLMs) for specific tasks, but traditionally this requires deep technical expertise. Cloudera’s Fine-Tuning Studio is a one-stop-shop application that covers the entire workflow and lifecycle of fine-tuning, evaluating, and deploying fine-tuned LLMs in Cloudera’s AI Workbench.

Fine-tuned models often outperform larger, generic AI models for specialized tasks while also reducing computational costs. Developers, data scientists, solution engineers, and all AI practitioners working within Cloudera’s AI ecosystem can easily organize data, models, training jobs, and evaluations related to fine tuning LLMs.

  • RAG with Knowledge Graph

RAG enhances AI accuracy by integrating real-time, domain-specific data. However, standard RAG implementations rely solely on vector search, which can overlook important contextual relationships between pieces of information.

Cloudera’s RAG with Knowledge Graph AMP improves AI responses by combining vector search with Neo4j-powered knowledge graphs. This strengthens contextual understanding by establishing meaningful connections between data points. By prioritizing authoritative sources over purely semantic matches, it ensures AI-generated responses are more factually reliable and relevant to users' specific needs.
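For a rough sense of what graph-augmented retrieval looks like in code (this is a conceptual sketch, not the AMP’s implementation), the snippet below uses the official neo4j Python driver and a hypothetical (:Chunk)-[:MENTIONS]->(:Entity) schema to enrich chunks returned by a vector search with related entities from the graph.

```python
# Conceptual sketch of graph-augmented retrieval; the connection details and the
# graph schema are hypothetical, and this is not the AMP's actual code.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def expand_with_graph(chunk_ids: list[str]) -> list[dict]:
    cypher = """
        MATCH (c:Chunk)-[:MENTIONS]->(e:Entity)
        WHERE c.id IN $ids
        RETURN c.id AS chunk, collect(e.name) AS related_entities
    """
    with driver.session() as session:
        return [record.data() for record in session.run(cypher, ids=chunk_ids)]

# Chunk IDs would come from the vector search step; these are placeholders.
print(expand_with_graph(["chunk-0", "chunk-7"]))
```

The related entities can then be appended to the prompt context, which is how combining semantic search with graph structure produces more grounded answers.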

Cloudera’s AMPs help businesses transition from AI experimentation to real-world deployment. Rather than building models from scratch, organizations can leverage tested, ready-to-use solutions that seamlessly integrate with enterprise environments.  

More Insights on Accelerating AI Adoption

To dive deeper into how Cloudera is helping businesses accelerate AI adoption, watch the full webinar here. Want to stay updated on the latest innovations? Sign up for upcoming ClouderaNOW events here.

]]>
https://www.cloudera.com/api/www/blog-feed?page=unlocking-the-future-of-ai-with-clouderanow0
Octopai Acquisition Amplifies Metadata Across the Data Estatehttps://www.cloudera.com/blog/business/octopai-acquisition-amplifies-metadata-across-the-data-estatehttps://www.cloudera.com/blog/business/octopai-acquisition-amplifies-metadata-across-the-data-estateThu, 10 Apr 2025 13:00:00 UTC

Whether it’s a financial services firm or a major healthcare provider, no other technology is poised to be as transformative for businesses as artificial intelligence (AI). It is redefining how every organization approaches its data and business applications. And so, it’s within this reality that Cloudera has consistently recognized how important it is to set out a clear and actionable roadmap to tap into the potential of AI and GenAI. 

As we continue redefining the way organizations approach critical business applications, the acquisition of Octopai represents the next step in that journey.

Octopai Arrives at Just the Right Time

Making the most of data for AI, machine learning, and predictive analytics initiatives is a top priority for countless organizations and a key to improving operations. The effective, data-driven decision-making that leads to those outcomes depends on their ability to harness company data—specifically, it depends on data that can be trusted.

That process is only as good as the data lineage system that helps users quickly find what they need. Without the right tools, searching for data would be akin to looking at the ocean but only seeing the coastline, ignoring the vast surface area that exists beyond that. 

Octopai comes as yet another expansion of Cloudera’s portfolio, fresh off the acquisition of AI startup Verta in 2024. That acquisition signaled Cloudera’s strong commitment to supporting AI adoption and further cemented its position as an industry leader in the space, bringing in Verta’s AI solutions, including model cataloging, model development, model monitoring, and AI governance tools. Combined with Cloudera’s platform, these expanded offerings have set a strong foundation for AI.

This steady growth, set against a backdrop of AI dominance in the marketplace, has made it the perfect time for Cloudera to offer its customers even more capabilities spanning data lineage, data discovery, mapping, and impact analysis. With that, the stage was set for yet another expansion of Cloudera’s AI portfolio, this time through the acquisition of Octopai. 

What It Means for Cloudera

The acquisition, which brought Octopai’s data lineage platform into Cloudera’s data lineage and metadata management capabilities, helps organizations understand and govern their data and their AI initiatives. But beyond that, the capabilities within Cloudera’s platform are now able to help organizations amplify the value of their data and leverage the vastness of the data estate to achieve data as a product—something that is highly sought after among the most ambitious, AI-minded businesses. 

Now, Cloudera’s platform is ready to deliver a significantly deeper picture of data, letting businesses quickly find relevant data in complex data sets across cloud environments, follow its journey from the source to ensure quality, and stay in compliance with critical regulatory requirements like GDPR and HIPAA, with tools that automatically map data across systems and provide detailed insights.

The addition of Octopai is a true game changer for organizations looking to tap into the structured and unstructured data that exists in the data estate. It’s one thing to have a data estate, but it is a whole separate endeavor to actually trust it, understand it, and use it to implement AI that powers data-driven decisions.

The integration of Octopai is a major step in Cloudera’s goal to deliver trusted data that spans the entire data estate and help our customers in their efforts to power the most robust data, analytics, and AI applications.

Read more about the acquisition of Octopai and what it means for Cloudera customers. 

]]>
https://www.cloudera.com/api/www/blog-feed?page=octopai-acquisition-amplifies-metadata-across-the-data-estate0
Use AI to Unearth Trends in Customer Loyalty Program Datahttps://www.cloudera.com/blog/technical/use-ai-to-unearth-trends-in-customer-loyalty-program-datahttps://www.cloudera.com/blog/technical/use-ai-to-unearth-trends-in-customer-loyalty-program-dataMon, 07 Apr 2025 15:00:00 UTC

Before publishing, you can customize the welcome message shown on the dashboard, the prompt that is sent to the AI model, and token limits for each response. These settings enable you to manage response length, optimize performance, and control costs as well as provide your business users with an overview of how to leverage this powerful AI analytics solution. 

After publishing, you can interact with the AI Visual by asking questions about your data using voice or text. Want to understand your new loyalty program members’ purchasing patterns in a particular product category? Just ask the AI Visual about trends in this segment, and it will respond with key data points related to customer shipping preferences, purchase frequency, and use of discount and promo codes. These data insights can quickly and easily optimize your retail strategy.

Bring AI to Your BI Workflows with Cloudera Data Visualization and Cloudera AI

If you’re in the retail and consumer goods sector, you know that customer data is a goldmine. An organization’s ability to leverage this data and unlock insights into customer behavior is key to expediting and supercharging sales and marketing efforts.

For example, analyzing loyalty program data can tell beauty brands which skincare products, lip gloss shades, and fragrances their top-tier shoppers are searching for, or it can tell a grocery retailer which customers may have food allergies or dietary restrictions, allowing both retailers to take those data points into consideration and send relevant, targeted digital coupons.

Of course, this wealth of data is worthless if organizations can’t turn it into actionable insights. Not to mention, they must do so while contending with an evolving landscape of data privacy regulations, and while ensuring the security of their customer data. 

Cloudera’s artificial intelligence (AI) and business intelligence (BI) offerings are transformative to this effort, helping organizations across the retail industry quickly and securely analyze their datasets, ultimately driving tremendous ROI and increasing the lifetime value of every customer. 

Let’s take a closer look at two solutions—Cloudera AI and Cloudera Data Visualization—that can help retailers slice and dice through heaps of loyalty program data and identify patterns and trends in your customers’ purchasing preferences and transaction histories, all underpinned by a robust set of security and governance capabilities.

Cloudera Data Visualization also uses your questions to create smart filters on the dashboard automatically. For example, asking a question about backpacks will filter the rest of the visuals to show records containing similar products. Not only does this let you validate the accuracy of the model’s response, it also gives consumers of the visual an even deeper layer of insight into the data, without forcing them to dig through endless menus to hunt for the relevant data points. 

How Retailers Can Use AI Visual 

AI Visuals in Cloudera Data Visualization are incredibly easy to build—no coding skills required. You can configure an AI Visual using Cloudera Data Visualization’s drag-and-drop dashboard builder to show fields including items purchased, promo codes used, and so on.

Deploying Private AI with Cloudera AI 

Cloudera AI is built to empower your organization to wield control over your data while still capitalizing on AI innovations. 

The Cloudera AI Inference service provides a production-grade environment for deploying AI models at scale and delivers performance without compromising on security. Since all model endpoints are within your organization’s security perimeter, your data—such as your proprietary customer loyalty data—is never exposed. With Cloudera, you can deploy Private AI across the enterprise, including in your BI and data analytics workflows. 

Introducing the AI Visual in Cloudera Data Visualization

Cloudera Data Visualization is a powerful BI tool for creating interactive dashboards and custom applications. It’s designed as the first stop for business analysts, data scientists, or line-of-business users who need to turn data into insights. 

Cloudera Data Visualization’s AI Visual feature brings AI to your BI workflows. With the power of retrieval-augmented generation (RAG) inside the dashboard, you can explore AI-driven insights alongside traditional analytics, query large datasets in natural language, and pass the context of a conversation to structured reports for dynamic decision-making. 
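
To make the pattern concrete, here is a minimal, illustrative sketch of a RAG-style flow in Python: retrieve the rows of a dataset that relate to a question, then pass them as context to a model through an OpenAI-compatible client. This is not Cloudera Data Visualization’s internal implementation; the endpoint URL, dataset, and model name are hypothetical placeholders.

```python
# Illustrative RAG-style flow; endpoint, dataset, and model name are hypothetical.
import pandas as pd
from openai import OpenAI  # any OpenAI-compatible client works here

client = OpenAI(
    base_url="https://models.example.internal/v1",  # hypothetical private endpoint
    api_key="YOUR_TOKEN",                           # e.g. a JWT issued by your platform
)

loyalty = pd.read_csv("loyalty_program.csv")        # hypothetical extract of loyalty data

def ask(question: str, max_rows: int = 20) -> str:
    # Naive retrieval: keep rows whose values match terms from the question.
    terms = [t.lower() for t in question.split() if len(t) > 3]
    mask = loyalty.apply(
        lambda row: any(t in str(row.values).lower() for t in terms), axis=1
    )
    context = loyalty[mask].head(max_rows).to_csv(index=False)

    # Send the retrieved rows plus the question to the model.
    response = client.chat.completions.create(
        model="llama-3-8b-instruct",  # whichever model profile you have configured
        messages=[
            {"role": "system", "content": "Answer using only the provided rows."},
            {"role": "user", "content": f"Rows:\n{context}\n\nQuestion: {question}"},
        ],
        max_tokens=300,  # mirrors the token limits you can set before publishing
    )
    return response.choices[0].message.content

print(ask("What trends are visible for new members that have purchased a backpack recently?"))
```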

Figure 1: Adding columns to the Embedding Context shelf in Cloudera Data Visualization’s drag-and-drop visual builder.

Figure 2: Asking the AI Visual in a published Cloudera Data Visualization dashboard “What trends are visible for new members that have purchased a backpack recently?”

Configuring Model Profiles in Cloudera Data Visualization 

The Cloudera Data Visualization 8.0.0 release brings support for connecting to models from a variety of service providers through the authentication methods of your choice. This includes out-of-the-box support for Amazon Bedrock, OpenAI, and Microsoft Azure OpenAI models, along with seamless JWT authentication for models hosted on your Cloudera platform with Cloudera AI Workbench and Cloudera AI Inference service. 

You can now easily configure and save multiple model profiles within a single Data Visualization instance, tailoring your AI applications to the models of your choice without friction. Want to use a trusted off-the-shelf model for most of your applications, and an experimental fine-tuned model for a specific set of dashboards? It’s possible. 
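
As an illustration of the profile idea, the sketch below keeps two model configurations side by side and picks one at call time. It is not Cloudera Data Visualization’s saved-profile format (those are configured through the product UI); the endpoints, keys, and model names are hypothetical, and both profiles are reached through OpenAI-compatible clients.

```python
# Illustrative "model profile" switcher; all endpoints, keys, and model names are placeholders.
from openai import OpenAI

PROFILES = {
    # A trusted, off-the-shelf model from a public provider.
    "default": {
        "client": OpenAI(api_key="PUBLIC_PROVIDER_API_KEY"),
        "model": "gpt-4o-mini",
    },
    # An experimental fine-tuned model served on your own platform,
    # reached via an OpenAI-compatible endpoint with a JWT bearer token.
    "experimental": {
        "client": OpenAI(
            base_url="https://ai-inference.example.internal/v1",
            api_key="JWT_FROM_YOUR_PLATFORM",
        ),
        "model": "my-finetuned-llama",
    },
}

def complete(profile_name: str, prompt: str) -> str:
    profile = PROFILES[profile_name]
    resp = profile["client"].chat.completions.create(
        model=profile["model"],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Most dashboards use the trusted profile; a specific set uses the experimental one.
print(complete("default", "Summarize loyalty sign-ups by region."))
```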

Gain Deeper Data Insights With Cloudera

Cloudera gives retailers more control over—and insight into—customer data, reducing the time required to parse through datasets like loyalty program data, purchase preferences, and transaction histories. 

Where previously an analyst might spend days digging through data from disparate sources—including social media and influencer marketing campaigns, e-commerce and brick-and-mortar sales, and supply chain operations—now, using Cloudera’s data and AI solutions, they can work smarter, not harder.

Launch your retail and consumer goods data analysts into a new level of productivity. Analyze your data and access insights at speeds like never before with Cloudera AI and Cloudera Data Visualization. 

Here are a few options to get started:

]]>
https://www.cloudera.com/api/www/blog-feed?page=use-ai-to-unearth-trends-in-customer-loyalty-program-data0
3 Elements of a Forward-Looking Data and AI Strategyhttps://www.cloudera.com/blog/business/3-elements-of-a-forward-looking-data-and-ai-strategyhttps://www.cloudera.com/blog/business/3-elements-of-a-forward-looking-data-and-ai-strategyThu, 03 Apr 2025 13:00:00 UTC

Key Takeaways from the 2025 Gartner® Data & AI Summit

“AI is inevitable, but is your data ready for all AI has to offer?” That was the unspoken question every keynote, panel, and hallway conversation sought to answer at the 2025 Gartner® Data & Analytics (D&A) Summit. Gartner’s® response was loud and clear: AI can drive incredible value, but without a good data foundation, it’s garbage in, garbage out. 

Every year, the Cloudera team attends the D&A summit. We exchange insights with top analysts and enterprise data leaders, participate in panels, host sessions, and engage with data practitioners at our booth. From these conversations, we know organizations are interested in AI, but they want guidance on how to make smart investments that support their current strategies and set them up for future success. 

In this blog, we’ll highlight three takeaways from the summit that data leaders can use to build a solid data foundation and a future-proof data and AI strategy:

  1. Fuel AI-Ready Data with Metadata and Governance Tools
  2. Build Open Architectures with Engine Freedom
  3. Leverage Private Cloud AI for Sensitive Data

1. Fuel AI-Ready Data with Metadata and Governance Tools

In the keynote session, Gartner® identified data quality as the top risk D&A leaders must solve, stating that poor data availability is the biggest barrier to AI implementation. To ensure data is AI-ready, enterprise teams must align data with specific AI use cases, enforce contextual governance, and assess data qualifications continuously. 

Comprehensive metadata practices are central to an organization’s AI-readiness efforts. Without unified data governance, Gartner® predicts that by 2027, 60% of companies will fail to realize the anticipated value from their AI use cases.

Cloudera empowers organizations to achieve AI readiness with a unified approach to metadata management, security, and data governance. This gives organizations better visibility, accessibility, traceability, and control over all their data, anywhere it may be, ensuring they can trust the outcomes delivered from the data. With the recent acquisition of Octopai, Cloudera enables customers to gain full visibility across the entire data ecosystem—from on-premises and cloud databases to ETL, analytics, and reporting tools—ensuring robust governance and transparency across all data touchpoints. 

2. Build Open Architectures with Engine Freedom

Another critical component of an AI-ready data foundation is an open architecture. Across sessions, Gartner® emphasized that to realize the full value of AI, organizations need to follow a modular, open approach that builds trust across every layer of their data ecosystem.

The modular approach to building a tech stack requires organizations to move beyond rigid, one-size-fits-all architectures. Rather than rely on a single vendor for every need, data leaders will benefit from selecting best-of-breed tools that support diverse, modern data and AI workloads.

A modular approach achieves a balanced integration of FinOps, DataOps, and PlatformOps:

  • FinOps optimizes costs and resources.
  • DataOps ensures data flows smoothly and remains healthy.
  • PlatformOps combines FinOps and DataOps to maintain platform performance, integration, and scalability, and drive efficiency across the entire ecosystem. 

Open architecture is at the heart of the Cloudera platform, offering enterprise-scale data management without vendor lock-in. Powered by Apache Iceberg, Cloudera’s open data lakehouse uniquely supports hybrid and multi-cloud environments while offering full engine flexibility for diverse workloads across data engineering, advanced analytics, and AI. This allows you to bring your engine of choice and future-proof your platform. With Cloudera Observability, customers can further monitor, optimize, and financially govern various deployments in real time. 

"Data and analytics teams can create, govern, and share data products—like datasets, dashboards, and AI models—directly from Cloudera’s open data lakehouse. Zero data copies and zero ETL is key to lowering TCO while ensuring security and scalability across teams." - Jeff Healey, SVP of Product Marketing, Cloudera.

3. Leverage Private Cloud AI for Sensitive Data

Security and data governance are perennial themes at the yearly D&A summit. As AI reshapes the data landscape, discussions around these topics focus on the best methods and tools for ensuring security and governance.

This year, Gartner® highlighted the importance of private cloud AI, stating that organizations should deploy small language models (SLMs) on-premises or in private cloud environments to enhance security and compliance. 

Private AI refers to the ability to use all your proprietary data to build and run AI models, applications, and agents, without data or insight being shared outside of your organization in any way. To leverage private AI, organizations must have an open foundation, the ability to leverage public and on-premises infrastructure alike, and seamless integration across the data and AI lifecycle.

Cloudera is uniquely positioned to meet this demand with Cloudera Private AI, empowering enterprise customers to build and run AI with full control and zero external exposure of sensitive data.

At the summit, our team showcased Cloudera’s new AI Inference service, which streamlines deployment of production-ready models, applications, and agents, providing ease of use and scalability, with future support for hybrid and on-premises deployments. Attendees also experienced Cloudera AI Agent Studio in action, watching AI agents come to life in under five minutes with a low-code experience. 

"What’s unique about Cloudera’s AI Studio is that teams can switch seamlessly from low-code to high-code. We’re all about empowering AI builders—whether they are data scientists, engineers, GenAI builders, or business analysts—to collaborate effortlessly and accelerate AI innovations with proven ROI." - Robert Hryniewicz, Director of Enterprise AI GTM, Cloudera

Data Leads the Way in the Enterprise AI Journey

As we look ahead, the convergence of AI, metadata governance, and open ecosystems presents an exciting frontier for enterprises. The insights shared at the 2025 Gartner® D&A summit resonate strongly with Cloudera’s commitment to empowering organizations to unlock the full potential of their data through secure, scalable, and flexible solutions that foster collaboration and drive AI innovation. 

Learn more about how Cloudera helps customers drive AI and analytics success. Or, if you’re ready to dive in, you can try the various Cloudera tools and services mentioned here free for five days.

]]>
https://www.cloudera.com/api/www/blog-feed?page=3-elements-of-a-forward-looking-data-and-ai-strategy0
Announcing Cloudera’s Unified Runtimehttps://www.cloudera.com/blog/business/announcing-clouderas-unified-runtimehttps://www.cloudera.com/blog/business/announcing-clouderas-unified-runtimeMon, 31 Mar 2025 13:00:00 UTC

At Cloudera, we’re committed to helping enterprises navigate and simplify the complexities of hybrid data management. That’s why we’re thrilled to unveil Unified Runtime as part of Cloudera’s 7.3.1 release. This update paves the way for seamless workload portability, consistent feature access, and synchronized updates across all environments, delivering a true hybrid experience for our customers.

What is “True Hybrid”?

“True hybrid” is more than a hybrid cloud infrastructure. Delivering a true hybrid experience means giving organizations the flexibility to manage, analyze, and govern their data seamlessly across both cloud and on-premises environments. It enables consistent operations, portability, and governance regardless of where your data resides. 

A key aspect of true hybrid is the ability to bring compute to your data (processing data where it resides rather than moving data to a central processing location). This reduces the need for data movement and enables faster, more efficient analytics. With Cloudera’s Unified Runtime, organizations can break down silos and optimize for cost, performance, or compliance without compromising capabilities or user experience.

Why is Unified Runtime a Game Changer?

Unified Runtime enables organizations to manage data and analytics cohesively across cloud and on-premises deployments, federated into a unified whole. This helps organizations standardize the development of analytics applications, which greatly simplifies IT management and reduces operational overhead. 

It also ensures that features, updates, and workflows are consistent between infrastructures, meaning they work and feel the same across all environments. This dramatically improves the user experience.

Additionally, and perhaps most importantly, Unified Runtime plays a crucial role in helping us deliver a true hybrid experience for our customers. Cloudera is the only true hybrid platform offering dynamic workload portability, centralized security, and scalable solutions for modern data architectures. Whether you're scaling AI initiatives or optimizing analytics, our roadmap reflects a commitment to building the future of true hybrid.

How Do I Upgrade?

Based on your current deployment, there are different paths for customers looking to upgrade to Cloudera Runtime 7.3.1:

  • Cloudera Base on premises: Direct upgrades (including downgrades and rollbacks) are supported from versions 7.1.9 SP1, 7.1.8, and 7.1.7 SP3.

  • Cloudera on cloud: Direct upgrades are supported from 7.2.18.0, 7.2.18.100, 7.2.18.300, and 7.2.17.200 through 7.2.17.500.

  • Upgrades from other versions not listed above may require a two-hop upgrade.

For detailed instructions, visit our upgrade documentation, or contact your Cloudera account team to discuss your upgrade path.

Key Features 

The Cloudera Runtime 7.3.1 release also includes the following key features and innovations:

  • Write once, deploy anywhere: Develop workflows once and execute them across hybrid environments without refactoring.

  • ARM CPU support with AWS Graviton: Improved performance, energy efficiency, and cost savings for data processing workloads with enhanced Spark job performance.

  • Apache Iceberg + Hive Integration: Advanced analytics with features like time travel, schema evolution, and efficient metadata queries for hybrid data workflows (a minimal query sketch follows this list).

  • Enhanced security: Strengthened safeguards to maintain data integrity and ensure compliance across hybrid deployments.
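
As a rough illustration of what time travel and metadata queries look like in practice, the sketch below uses Spark SQL against an Iceberg table (Hive exposes similar FOR SYSTEM_TIME AS OF syntax). It assumes a Spark session already wired to an Iceberg catalog; the table name, timestamp, and snapshot id are placeholders.

```python
# Minimal Iceberg time-travel sketch; table, timestamp, and snapshot id are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-time-travel").getOrCreate()

# Query the table as it looked at a point in time (e.g. for an end-of-quarter audit).
spark.sql("SELECT * FROM db.orders TIMESTAMP AS OF '2025-03-31 23:59:59'").show()

# Or pin the query to a specific snapshot id taken from the table's metadata.
spark.sql("SELECT snapshot_id, committed_at FROM db.orders.snapshots").show()
spark.sql("SELECT * FROM db.orders VERSION AS OF 4925898133594110938").show()

# Schema evolution is a metadata-only change; earlier snapshots stay readable.
spark.sql("ALTER TABLE db.orders ADD COLUMN discount_code STRING")
```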

Learn More

Discover the full potential of our Unified Runtime by checking out the true hybrid checklist, or experience the latest features firsthand with our free 5-day trial. For a deeper dive into how this release can transform your hybrid data strategy, reach out to your Cloudera representative or check out our technical resources.

]]>
https://www.cloudera.com/api/www/blog-feed?page=announcing-clouderas-unified-runtime0
Clouderans Reflect on Women’s History Monthhttps://www.cloudera.com/blog/culture/clouderans-reflect-on-womens-history-monthhttps://www.cloudera.com/blog/culture/clouderans-reflect-on-womens-history-monthMon, 31 Mar 2025 13:00:00 UTC

“Women’s History Month and International Women's Day strike a deep chord with me as they highlight how urgent it is to achieve true diversity and inclusion. It’s a time to celebrate female achievements but also remind everyone of the value and distinct capabilities each person contributes. Throughout my life, women have taught me to be strong, given me the confidence to speak up, and the power to believe in myself. The people who have inspired me are those that value competencies and expertise regardless of background.”

- Sara Lewis, Director of Communications, EMEA

“This month is an opportunity for all of us to reflect on how we can accelerate action toward a more inclusive and equitable workplace. It serves as an important reminder that representation matters—when women see other women leading, innovating, and breaking barriers, it fuels the next generation. A good friend and mentor once taught me that leadership isn’t about having all the answers, but about empowering others to find them. That perspective reshaped how I support my team and show up as a leader.”

- Laura Hughes, Manager - R&D, EMEA

Women’s History Month is nearing its end. But the lessons learned during this time don’t stop when the calendar flips to April. The month serves as an important checkpoint for us all to reflect on the countless accomplishments and achievements women have pioneered, and the work that remains to be done in reaching an equitable experience for everyone. 

As we wrap up a busy few weeks of celebration, with Clouderans reflecting on their own experiences, engaging in conversations, and hearing from leaders, we spoke with several individuals to learn more about why this month holds such significance for them. Here’s what they had to say.

“It’s a dual emotion—on the one hand it’s wonderful to spotlight equality, women who have mentored me, women who inspire me, women who fought for the rights and privileges I enjoy today. But on the other hand, I wish we didn’t need initiatives like these. This month is a great reminder for me to step up and do all I can to support the women in my life. My first manager did so much for me, training me in a trade, giving me the opportunity to rise from her assistant—sweeping hair and shampooing customers—to eventually managing the 30-stylist salon.”

- Amanda Allan, Manager, Cloudera Executive Briefing Program

“Women’s History Month and International Women’s Day are powerful reminders of the resilience, creativity, and impact of women across the world. To me, they’re about honoring the trailblazers—those who’ve shattered ceilings, defied norms, and built foundations for progress, often against steep odds. In a way, every woman who’s dared to question, invent, or lead has indirectly “mentored” me by fueling the knowledge I draw from.”

- Puja Singh, Staff Program Manager, US Security and Compliance, AMER

“Women's History Month is a powerful celebration of the incredible legacies of women who have made unparalleled contributions across all fields. It’s a time to reflect on the achievements of women like me—a mother, an author, an entrepreneur, and an athlete. For me, these celebrations are not just about looking back at history, but also about amplifying the voices of women today. The women in my life have been a constant source of inspiration and mentorship. I've gained so much from their strength, resilience, and creativity. Whether through their words of wisdom or by observing how they navigate life's challenges, they've taught me to approach every situation with courage and perseverance.”

- Jasmine Fambro, Global Program Manager -  Enablement Operations, AMER

“International Women's Day, to me, is a reminder of our continuous efforts toward building a more equitable world, where everyone has the opportunity to thrive regardless of gender. I've been fortunate to have great friends who embraced my imperfections and offered unwavering support. They never tried to change me but instead motivated me to work harder, helping me become a better, more refined version of myself. Together, we continue to grow, supporting and uplifting each other as true allies on this journey.”

- Glitty Jacob, Staff Software Engineer - ML, APAC

 

Reflecting on a Month of Learning and Recognition

These are just a few examples of some of the incredible women that make Cloudera such a special place to work. At Cloudera, we are committed to fostering a work environment where women are supported and empowered to thrive in their careers. 

Whether it’s through the Women Leaders in Technology, Women+ ERG, or celebrating important milestones like International Women’s Day and Women’s History Month, ensuring we are taking the time to recognize every Clouderan is critical to our success.

Discover how Cloudera is accelerating action and growing its commitment to supporting every Clouderan in their own career journeys. To learn more about Cloudera’s Women Leaders in Technology, visit here.

“For me, Women's History Month and International Women’s Day are about celebrating the impact women have made across every field. I’ve been fortunate to work alongside and learn from talented women who continue to drive innovation and leadership in the technology space and have benefited from strong female mentors and role models throughout my journey, each shaping my career in meaningful ways. My mom was my first inspiration. She encouraged me to pursue engineering and computer science, always believing in my ability to take on new challenges and excel.”

- Praveena Ram, Senior Product Manager, AMER 

“For me, IWD is a celebration of women for the different roles we play. We don’t just give birth to life; we shape the future with resilience and strength.

I have been deeply inspired by the women in my life. Starting with my mother, who held our family together with so much grace. She spent most of her life away from her own family, raising both of us without any support, and she did it all selflessly, never expecting anything in return. We often think being a homemaker is the easiest job, even a luxury. But in reality, it’s one of the toughest, and we (including myself) rarely acknowledge the effort women put in every single day.

My ex-boss & mentor, Rani Belliappa - She has shown me what it means to never give up, to focus on the solution and not ponder over the problem. The way she multitasks so effortlessly—balancing both professional and personal commitments—is something I admire. No matter how much she has on her plate, she always makes time for what truly matters.

My sister, Anju Mohan, who manages a full-time career while raising a four-year-old son, never compromising on her own well-being—her dedication to fitness even earned her a place among the top 10 fittest women in India in CrossFit.”

- Asha Mohan Chandran, Learning and Enrichment Partner, APAC

]]>
https://www.cloudera.com/api/www/blog-feed?page=clouderans-reflect-on-womens-history-month0
Cloudera Enables Telecommunications Service Providers to Modernize their Data Architecturehttps://www.cloudera.com/blog/business/cloudera-enables-telecommunications-service-providers-to-modernize-their-data-architecturehttps://www.cloudera.com/blog/business/cloudera-enables-telecommunications-service-providers-to-modernize-their-data-architectureThu, 27 Mar 2025 13:00:00 UTC

In February 2025, TM Forum, the leading global industry association focused on digital transformation in the telecommunications industry, hosted its Accelerate event in Lisbon, Portugal. This event brings together telecommunications service providers and technology partners to address challenges within the association’s three pillars: autonomous network operations, composable IT and ecosystems, and AI and data innovation.

Of note was discussion around the optimal modern data architecture for telecom companies. Data architecture impacts virtually every aspect of digital transformation, from embedding AI into network operations, to optimizing service delivery and reducing costs, to delivering innovative and differentiated customer experiences. Therefore, it’s critical for telecom companies to be strategic and intentional as they build a data architecture that will help them achieve their digital transformation initiatives. 

Key Components of a Modern Data Architecture

Modern data architectures should enable telecom operators to manage massive volumes of data across distributed environments and make decisions faster than ever, often without human intervention. To achieve this, data architectures must support:

  • Streaming data processing: Batch ingestion, processing, and analysis are no longer sufficient, especially for use cases like network telemetry analysis. Telecom companies must be able to collect, process, and analyze data in real time, as close to the source as possible (a minimal sketch follows this list).

  • Hybrid cloud deployments: A modern data architecture should provide a consistent platform for data management, analytics, and AI across on-premises and cloud environments.

  • Distributed data fabric: For a variety of reasons, distributed data stores are necessary for telecom operators. While distributed data is inevitable, modern data architectures must break down silos and enable secure, real-time access to all of an organization’s data.

  • Integrated data governance: Unified security and governance across hybrid data architectures ensures compliance and trust in data while enabling self-service access for AI and analytics.

  • Open source and open APIs: Closed, proprietary systems ultimately increase costs and limit innovation. Conversely, data architectures built on open standards, processes, and technologies enable flexibility, agility, and the freedom to choose the best tool or execution engine for every data workload.
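
To ground the streaming requirement, here is a minimal sketch of reading telemetry events from Kafka and aggregating them in near real time with Spark Structured Streaming. It assumes a Kafka topic of JSON events and requires the Spark Kafka connector on the classpath; the broker, topic, and field names are placeholders, not a prescribed Cloudera pipeline.

```python
# Minimal streaming sketch; broker, topic, and field names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, window, avg
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("telemetry-stream").getOrCreate()

schema = (
    StructType()
    .add("cell_id", StringType())
    .add("latency_ms", DoubleType())
    .add("event_time", TimestampType())
)

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker.example.internal:9092")
    .option("subscribe", "network-telemetry")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Average latency per cell over 1-minute windows, tolerating 30 seconds of late data.
latency = (
    events.withWatermark("event_time", "30 seconds")
    .groupBy(window("event_time", "1 minute"), "cell_id")
    .agg(avg("latency_ms").alias("avg_latency_ms"))
)

query = latency.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```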

Benefits of Adopting a Modern Data Architecture

Data platforms have evolved significantly in recent years: what was once primarily a storage solution that provided reporting and dashboarding is now foundational to business strategy and operations. Specifically, the data and insights derived from your data platform’s analytics and AI capabilities are central to business planning and to efficient and effective OSS and BSS operations. 

By implementing a modern data architecture, telecom companies can advance their organizational strategies and realize significant benefits, including:

  1. Operational efficiency and cost reduction: By leveraging data to optimize network performance, reduce downtime, and enhance capacity planning, telecom companies can optimize service delivery, improve customer satisfaction, and ultimately reduce their costs.

  2. AI-driven automation: AI-powered services and applications can automate network operations, mitigate the impact of unplanned downtime (via predictive maintenance), and facilitate personalized customer interactions, ultimately reducing manual effort and enhancing service quality.

  3. Data-driven innovation and growth: Telecom operators looking for growth opportunities can leverage a wealth of customer data to deliver B2C and B2B services and applications beyond simply providing the network and infrastructure. The key to innovative, scalable growth is to harness the data. 

Cloudera Enables the Modern Data Architecture for Telecom Companies

Cloudera is one of the most prominent industry thought leaders in open data standards, processes, and technologies. We’re a leading member of the TM Forum in data and AI, a signatory of the Open Digital Architecture (ODA) Manifesto, chair of the Modern Data Architecture Collaboration Group, and a catalyst leader and contributor to numerous other projects including data governance and AI governance. Additionally, Cloudera supports the ODA Component Directory through fully certified applications with partners like Amdocs, Nokia, Mobileum, and Subex, developing and adding new data platform components that have not been included explicitly in the past. 

Cloudera provides all of the essential components of a modern data architecture: 

  • We offer a single, open, and extensible platform that telecoms can deploy anywhere—on premises, in multiple clouds, and in hybrid environments. 

  • Our platform enables telecoms to capture, process, and analyze data in real-time, secure and govern their data in motion and at rest, and leverage that data for analytics and AI applications. Through these capabilities, we help telecom operators streamline operations, improve the customer experience, and reduce costs.

  • Cloudera is built on open standards and technologies, so telecom companies can leverage the best tool for every job and adopt the next wave of innovation. Data is easy to discover and access for analytics and AI while remaining secure, governed, and trusted. 

Telecom operators looking for a carrier-grade, edge-to-AI solution for their data architecture should implement Cloudera to streamline operations, pursue growth, and reduce costs.

Connect with us at DTW Ignite

Cloudera will be at DTW Ignite in Copenhagen from June 17-19, 2025. We would love to meet with you and discuss where we fit in your plans to adopt a modern data architecture. Schedule a meeting with our team onsite.

 

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-enables-telecommunications-service-providers-to-modernize-their-data-architecture0
Cloudera Partner of the Year Awards for AWS, Mission Cloud, PUEDATA, Carahsoft, and Morehttps://www.cloudera.com/blog/partners/cloudera-partner-of-the-year-awards-for-aws-mission-cloud-pue-data-carahsoft-and-morehttps://www.cloudera.com/blog/partners/cloudera-partner-of-the-year-awards-for-aws-mission-cloud-pue-data-carahsoft-and-moreWed, 19 Mar 2025 18:01:00 UTC

Global and Regional Awards Recognize Outstanding Partnerships

We’re thrilled to recognize the outstanding achievements, innovation, and dedication of our partners from around the world with our 2025 Cloudera Partner of the Year Awards. 

The announcement was made at IMPACT26 - Cloudera's Partner Town Hall, held on March 18 and 19. The recipients were selected based on their exceptional performance across key criteria, including revenue impact, new customer acquisitions, strategic innovation, and commitment to driving business growth. These partners exemplify excellence in technology expertise, solution delivery, and collaborative go-to-market execution, positioning them as true leaders in the Cloudera ecosystem.

This distinguished list of partners drives success, not just for Cloudera, but for our customers and the broader Data & AI community. 

Global Partner Award Recipients 

  • Cloud Partner of the Year: Amazon Web Services (AWS)

AWS, the leading on-demand cloud computing platform, delivers outstanding performance and remarkable growth. Its go-to-market approach drives significant new business opportunities, while its efforts have not only broadened Cloudera’s global market reach but also positioned us for sustained success in 2025 and beyond. 

  • Public Sector Partner of the Year: Carahsoft

Carahsoft is a leading provider of IT solutions to the U.S. Government. It supports Cloudera Government Solutions with dedicated teams across sales, marketing, and contract management, and has been responsible for significantly expanding Cloudera’s market reach. 

  • Cloudera Solution Partner of the Year: Fluxraum 

Fluxraum, an IT services company specializing in data platforms, worked with Cloudera to build a central Data & AI platform for one of the world's most prestigious automotive brands. This software enables hundreds of AI use cases, such as predictive maintenance and connected cars.

The Fluxraum platform is projected to save the company 200 million euros annually by 2030.

  • Emerging Partner of the Year: Mission Cloud

Mission Cloud, A CDW Company, made a remarkable entrance into the AMER sales motion in early 2024, bringing innovative solutions to the forefront. It spearheaded Generative AI Pilots, accelerated AWS migrations, and made significant investments in building advanced machine learning models—all while playing a key role in closing a major Managed Services deal. Its rapid growth and strategic service delivery make it a standout partner. 

  • Impact Partner of the Year: PUEDATA

This is our flagship award, and it recognizes a partner whose commitment and expertise have truly made an IMPACT! PUEDATA is a long-standing partner with deep expertise in Cloudera Data Services and a relentless focus on customer acquisition in new markets, driving significant growth and impact in key markets across EMEA and the Middle East.

Kolon Benit, SVA, and VERT Analytics recognized on a regional level

  • APAC Partner of the Year: Kolon Benit

Kolon Benit, an IT solutions provider, demonstrated stellar performance in new and expansion deals, marking significant year-over-year growth. It established a key digital transformation within a major Korean manufacturing company and led an impressive data warehouse replacement project for a large financial customer. The company’s strategic achievements and projected growth potential in the coming year exemplify the gold standard of partnership. 

  • EMEA Partner of the Year: SVA

System Vertrieb Alexander (SVA), a leading systems integrator, co-sold a monumental Public Sector deal with IBM. This Public Sector customer supports over 200 government agencies, creating extensive opportunities for long-term growth. SVA’s strategic partnership has significantly strengthened Cloudera’s brand and presence in Germany’s Public Sector.

  • AMER Partner of the Year: VERT Analytics

VERT Analytics, a solutions and services provider specializing in data and analytics, has exemplified excellence in solutions expertise, technology, and execution and has showcased remarkable growth and leadership. With strong future projections, this partner is poised to unlock new market opportunities and drive sustained long-term growth. 

The Power of Collaboration and Innovation

A strong partner ecosystem is built on collaboration, innovation, and shared success. That’s why our Partner Town Hall series and awards program are important components of Cloudera Partner Network (CPN). Since launching in November 2022, CPN has deepened collaboration, fostered solution-building, and empowered partners to guide their customers in adopting modern data strategies on the Cloudera hybrid platform.

And Cloudera’s commitment to investing in partnerships remains strong as ever. Over the past year, we have experienced remarkable growth across our ecosystem. Our competency-based, points-driven partner program enables partners to enhance their expertise, differentiate themselves in the market, and collaboratively deliver tangible customer outcomes.

Congratulations to each of this year’s Partner of the Year award recipients. Their dedication, innovation, and partnership are what drive Cloudera forward, enabling us to deliver unparalleled value to our customers worldwide. 

Learn more about the Cloudera Partner Network, and sign up to become a Cloudera partner today.

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-partner-of-the-year-awards-for-aws-mission-cloud-pue-data-carahsoft-and-more0
Cloudera Joins NVIDIA GTC as a First-Time Sponsor – Powering Innovation Togetherhttps://www.cloudera.com/blog/business/cloudera-joins-nvidia-gtc-as-a-first-time-sponsor-powering-innovation-togetherhttps://www.cloudera.com/blog/business/cloudera-joins-nvidia-gtc-as-a-first-time-sponsor-powering-innovation-togetherTue, 18 Mar 2025 20:01:00 UTC

Last October, Cloudera introduced the Cloudera AI Inference service powered by NVIDIA NIM microservices, marking a significant milestone in our partnership with NVIDIA. Now, we are excited to announce that we will be a sponsor at the NVIDIA GTC event this year from March 17-21, in San Jose, CA. This marks Cloudera’s first time sponsoring NVIDIA GTC, highlighting our expanding partnership with NVIDIA. 

Engage with Industry Thought Leaders

Robert Hryniewicz, Director of AI Product Marketing at Cloudera, along with members of NVIDIA’s product teams, will be hosting daily Q&A sessions at Cloudera’s booth (all times PST): 

  • Tuesday: 3:00 - 4:00 PM with Rohit Taneja, NVIDIA Product Management Leader
  • Wednesday: 3:00 - 4:00 PM with Judy Lee, NVIDIA Senior Manager, Product Marketing
  • Thursday: 12:00 - 1:00 PM with Nick Reamaroon, PhD, NVIDIA Senior Solutions Architect

Robert will also be leading an informative session, “Accelerate AI Innovation with Cloudera’s AI Studios” on Thursday between 1:40 and 1:55 PM PST at ESJCC Hall Theater 2. 

Connect with Us!

Visit us at booth #2303 to discover how our partnership with NVIDIA is empowering businesses to harness AI. Cloudera’s product experts will be on hand to answer your questions and showcase live demos of Cloudera AI services. 

Check out our On-Demand Session

When you have time, be sure to watch the “Scaling Generative AI with Cloudera and NVIDIA (S74326)” on-demand session featuring Peter Ableda (Director of AI Product Management, Cloudera) and Judy Lee (Senior Manager, Product Marketing, NVIDIA) listed in the event catalogue.

We look forward to seeing you there.

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-joins-nvidia-gtc-as-a-first-time-sponsor-powering-innovation-together0
Generative AI needs to become private to thrive - introducing Private AIhttps://www.cloudera.com/blog/business/generative-ai-needs-to-become-private-to-thrive-introducing-private-aihttps://www.cloudera.com/blog/business/generative-ai-needs-to-become-private-to-thrive-introducing-private-aiTue, 18 Mar 2025 20:00:00 UTC

By anchoring our strategy in open-source agility, enterprise-grade security, and a relentless focus on private AI, we empower organizations to break free from yesterday’s compromises. The AI revolution isn’t coming—it’s here. And with Cloudera, it’s yours to shape on your terms.

Take the Next Step

Ready to build AI on your own terms? Learn more about Cloudera’s Private AI capabilities and see how enterprises are putting them into action by taking advantage of our FREE 5-day trial.

3 Years and 25 Exabytes of Data

Cloudera is used by the largest global companies across industries—including healthcare, life sciences, financial services, manufacturing, and high tech—to collectively manage over 25 exabytes of data and drive real-time insights via AI and analytics. We’ve helped and learned from hundreds of enterprises using large language models (LLMs) to build applications, assistants, and agents that power productivity and transform processes. Over the last three years, open and closed models have thrived and application architectures have evolved from RAG to agentic. However, one theme remains consistent: combining proprietary enterprise data and context with generative AI models.

What is Private AI?

Private AI refers to the ability to use all your proprietary data to build and run AI models, applications and agents—whether you use public cloud or on-premises infrastructure—without data or insight being shared outside of your organization in any way. 

  • When AI is built privately, all your training data, configurations, and resulting fine-tuned models are kept within your security perimeter, ensuring every step of model creation remains entirely under your control.

  • When AI is run privately, all your model endpoints are within your security perimeter, so all of your prompts and context sent to models and the responses received stay within your organization’s environment.

Simply put, Private AI is about taking the tremendous innovation in AI while guaranteeing zero external exposure of sensitive data.

What are the key tenets of Private AI?

A Private AI platform must be built on an open foundation, have the ability to leverage public and on-premises infrastructure alike, and have seamless integration across the data and AI lifecycle. Let’s look at what this means:

1. Open Source is not just a Philosophy—It’s the Foundation of Private AI

The momentum behind open-source AI is undeniable. Early models like BLOOM and Falcon proved that open-source AI had the potential to compete with proprietary alternatives in both scale and capability. This paved the way for models like Llama, bringing state-of-the-art language AI within reach of businesses eager to tailor solutions to their unique needs. Today, advancements like DeepSeek continue to push boundaries in code generation, reasoning, and operational efficiency. Yet this is just the beginning. The open-source community thrives on iteration, and tomorrow’s models will raise the bar again, becoming smaller, faster, and more specialized.

Open source is foundational at Cloudera: Cloudera AI is designed to embrace this continuous wave of innovation. We empower customers to adopt any open-source model, whether it’s an early model like BLOOM or Falcon, a versatile workhorse like Llama, or a cutting-edge reasoning model like DeepSeek. Our platform enables seamless transitions between AI model generations, eliminating the need for costly infrastructure overhauls. Today, our customers are evolving their workflows on Cloudera, from leveraging text-generation models for summarization to harnessing advanced reasoning capabilities for mission-critical challenges like code optimization and decision automation. With the same platform, they’re laying the groundwork for tomorrow’s multimodal AI, where models unify text, data, and visual inputs to solve complex problems that once required siloed tools and teams. 

This agility isn’t incidental; it’s intentional. By supporting every phase of the open-source journey, we enable users to turn disruption into opportunity, giving customers the freedom to experiment, scale, and future-proof their AI investments without compromise.

2. Bringing AI Compute to Data

AI’s transformative potential depends on a simple principle: models are only as powerful as the data that fuels them. When data and AI systems operate in isolation, challenges arise. Data stored in disconnected systems becomes difficult to access, leading to delays in insights, fragile pipelines, and models that lack the real-time context needed for accurate decisions. Moving data between fragmented tools also increases risk, compromising security and compliance.

At Cloudera, we unify data and AI into a single, cohesive lifecycle. The Cloudera platform and Cloudera AI services are built to work as one integrated system, where data flows seamlessly into AI workflows—governed, secure, and optimized for performance. Shared metadata, security policies, and compute resources eliminate costly data duplication and movement. Every prediction is traceable back to its origin, ensuring transparency and trust.

This integration is core to Cloudera’s design. By unifying the data and AI lifecycle, models stay updated with the latest information while respecting strict access controls and audit requirements. Organizations shift from experimenting with AI to deploying it at scale, turning raw data into actionable results. The outcome? AI that delivers real-world impact—accelerating innovation without sacrificing security, speed, or governance.

3.  Private AI, Even in the Public Cloud

Early AI adoption was defined by limitations. Organizations restricted their AI usage to non-sensitive datasets—drafting generic content, analyzing public trends, or automating routine tasks—because moving proprietary data outside their environments posed unacceptable risks. They still do. As a result, mission-critical workflows remained untouched: financial institutions couldn’t safely analyze transaction logs, healthcare providers avoided patient record insights, and manufacturers hesitated to optimize operations with proprietary sensor data.

As the only true hybrid platform for data and AI, Cloudera redefines what’s possible—our customers can run the same AI workload on every cloud and data center, all within their virtual company firewall. With Private AI, enterprises deploy models like Llama3 and DeepSeek directly within their existing data environments—whether in a data center or in a secured AWS or Microsoft Azure cloud, or a hybrid architecture. When all the enterprise data can be used with AI, our customers evolve from basic tasks like generating reports to solving mission-critical challenges—analyzing proprietary sensor data to optimize operations, detecting anomalies in real-time transaction logs, or personalizing customer interactions—all governed by their encryption, access policies, and compliance guardrails.

This is AI without asterisks: your data stays yours, models adapt to your infrastructure, and innovation aligns with your risk tolerance. Cloudera ensures privacy isn’t a constraint but the foundation for AI that transforms every corner of your business—securely, seamlessly, and on your terms.

Cloudera AI: Build and Run Private AI 

We built Cloudera AI to ensure you don’t need to choose between innovation and control; instead, you can wield both. Our services enable users to build and run AI privately:

  • Cloudera AI Workbench accelerates AI development with a flexible, low-to-high-code platform for building and fine-tuning models and creating applications and agents—leveraging private data to transform ideas into production-ready solutions faster.
  • Cloudera AI Inference is our production-ready service for deploying AI models, applications, and agents at enterprise scale. Its natively integrated, optimized model microservices accelerate inference speeds by 36x on the Cloudera platform, enabling responsive, high-performance AI operations with predictable total cost of ownership (TCO)—so you never have to choose between speed, scale, and cost. A minimal sketch of calling such a privately hosted endpoint follows this list.
  • Cloudera AI Registry acts as the central hub for your end-to-end AI lifecycle—bridging model development and operations. With access to hundreds of optimized models from open-source communities and our AI partners, it ensures cutting-edge advancements are always readily available, easily integrated, and adaptable to your ongoing AI initiatives.
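
As a minimal sketch of what "running AI privately" looks like from an application’s point of view, the snippet below calls a model endpoint that resolves only inside the corporate network, so prompts containing proprietary data never leave the security perimeter. It assumes an OpenAI-compatible endpoint; the URL, token, file, and model name are hypothetical placeholders rather than a documented Cloudera AI Inference interface.

```python
# Private inference sketch; URL, token, file, and model name are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="https://ai-inference.corp.internal/v1",  # resolves only inside the company network
    api_key="JWT_ISSUED_BY_YOUR_PLATFORM",
)

# Proprietary context stays in-house; it is sent only to the internal endpoint above.
with open("transaction_log_excerpt.txt") as f:
    proprietary_context = f.read()

resp = client.chat.completions.create(
    model="llama-3-70b-instruct",
    messages=[
        {"role": "system", "content": "Flag anomalous transactions and explain why."},
        {"role": "user", "content": proprietary_context},
    ],
)
print(resp.choices[0].message.content)
```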
]]>
https://www.cloudera.com/api/www/blog-feed?page=generative-ai-needs-to-become-private-to-thrive-introducing-private-ai0
Celebrating International Women’s Day at Clouderahttps://www.cloudera.com/blog/culture/celebrating-international-womens-day-at-clouderahttps://www.cloudera.com/blog/culture/celebrating-international-womens-day-at-clouderaMon, 17 Mar 2025 13:00:00 UTC

International Women’s Day is more than a celebration – it’s a powerful reminder of the voices, achievements and experiences that shape our world. To honor the day, Cloudera’s Women Leaders in Technology (WLIT) hosted a panel discussion, moderated by Chief Marketing Officer Mary Wells, aimed at uncovering insights, career inspirations, and practical takeaways for driving inclusive innovation.

This year, the theme of International Women’s Day was ‘Accelerating Action’, and we were fortunate to be joined by two leaders who embody that sentiment—Cheryl Kiser, Founding Executive Director, The Institute for Social Innovation and Executive Fellow in Social Innovation at Babson College, and Sandi Peterson, Operating Partner at Clayton Dubilier & Rice (CD&R).  

From skills and career learnings to overcoming common biases, our panelists covered the ins and outs of what it means to be a confident disrupter in the workplace and lead with authenticity. Here are a few of the highlights.

Building the Skills to Thrive in the Workplace

So, what does it take to chart a successful path as a woman in the workplace? What kind of skills should you focus on to grow and advance in your career?

Reflecting on her own journey, Cheryl noted that taking on roles that are challenging is an important way to grow, both as an individual and as a professional. Stepping into unfamiliar positions, she explained, often means taking on a set of responsibilities in an area you may know little about. That’s where, according to Cheryl, the most important skill comes in: relationship building. It’s essential to connect with and learn from those around you—people who can provide greater insight or understanding into the work in front of you.

For Sandi, there was also an emphasis on embracing challenges that may be outside of your known experiences. She shared three skills that are critical to finding success when navigating uncharted waters: listening intently, being decisive at the right moment, and having a willingness to take risks. As she put it, “There are always going to be people out there who know more than you do or who can do a certain job better than you can, but building relationships and trusted teams, and listening to those individuals, gives you a new opportunity to conquer challenges even quicker.”

Dealing With the Times Where Things Don’t Go as Planned

As much as we try to avoid it, failure is inevitable. Sandi shared that throughout her career, plenty of things didn’t go as planned. What matters most, she explained, is having a clear understanding of the need you’re trying to address—and then taking action. “Just do it – do it, learn from it, and iterate.”

Even when things go wrong, it’s important to remember: it’s okay for a project to fail. We all make mistakes. But getting lost in the ‘what-ifs’ or worrying about what could go wrong doesn’t help anyone move forward. Instead, learning from those experiences and iterating on them lays the foundation for future success.

Cheryl echoed this sentiment, adding how she reframes the idea of risk by thinking of it as an ‘affordable loss.’ Mistakes, she emphasized, are often the only real way to learn. Rather than focusing on the risk itself, she encourages asking, “What am I willing to lose to take that risk and move along?” As Cheryl put it, the key to success lies in falling in love with the problem, not the solution.

Building Credibility and Addressing Bias  

We all want to step into a new role and hit the ground running. But building credibility among peers can be a daunting task, especially when you’re new to an organization or earlier in your career. Demonstrating competence—without coming across as overconfident or forceful—can be a delicate balance to strike.

Sandi’s advice: don’t feel like you need to have all the answers on day one, or deliver immediate results in your first week. Instead, focus on being engaged, curious, and genuinely interested in the work you’re doing. Taking the time to build relationships early on pays dividends as you take on new projects and collaborate with different teams.

Of course, building credibility doesn’t come without its challenges. Many people encounter workplace dynamics where it can be difficult to find the right tone—whether it’s the risk of seeming too agreeable or, conversely, too assertive in meetings and discussions. Both Cheryl and Sandi emphasize the importance of creating a culture of mutual support. When you’re in a room with others, take the opportunity to support and elevate your peers. Be authentic, and when you contribute ideas or offer feedback, focus on being inclusive and collaborative. Sometimes, it’s as simple as acknowledging another person’s point or giving credit where it’s due—small actions that go a long way in building trust and credibility.

A Day of Learning and Accelerating Action

This only scratches the surface of the experiences and learnings our panelists and attendees covered over the course of the event. International Women’s Day is a great touch point for us all to look at how we approach difficult challenges in the workplace, whether that’s tackling biases or overcoming fear of failure. The theme of International Women’s Day this year centers around ‘Accelerating Action’. The power of taking informed action, surrounded by trusted and diverse voices, is something our panelists emphasized throughout each story. 

Follow and join Cloudera’s Women Leaders in Technology group, get involved, and make sure not to miss out on future events.

]]>
https://www.cloudera.com/api/www/blog-feed?page=celebrating-international-womens-day-at-cloudera0
Embrace a Hybrid Data Platform for DORA Compliancehttps://www.cloudera.com/blog/business/embrace-a-hybrid-data-platform-for-dora-compliancehttps://www.cloudera.com/blog/business/embrace-a-hybrid-data-platform-for-dora-complianceThu, 06 Mar 2025 14:00:00 UTC

Cybersecurity is a top priority across industries, but no sector has more to lose from a successful cyberattack than financial institutions. According to the International Monetary Fund’s April 2024 Global Financial Stability Report, nearly one-fifth of all cyberattacks target financial institutions, costing firms as much as $2.5 billion. These institutions are increasingly susceptible to cybercrime as a result of their digital transformation initiatives, which often introduce complexity, risk, and new vulnerabilities that chief information security officers (CISOs) must account for.

Recognizing the significance of these targeted attacks, the European Union introduced the Digital Operational Resilience Act (DORA), a framework that standardizes risk management and operational resilience processes across the financial sector. 

Over the last few years, multinational firms doing business in Europe have been preparing for DORA to go into effect, and as of January 17, 2025, financial services institutions must now demonstrate progress toward compliance. 

An organization’s data, analytics, and AI platform is a critical component of both its digital transformation strategy and its ability to demonstrate DORA compliance. Maintaining data security, governance, and resilience across the entire data estate is paramount, and only a true hybrid platform can provide this degree of coverage. 

Let’s take a closer look at DORA, its impact, and how a true hybrid platform for data, analytics, and AI creates a flexible, secure, and resilient solution and allows financial institutions to meet DORA compliance requirements.

Understanding DORA and Its Impact

DORA is a regulatory framework designed to strengthen the operational resilience of financial institutions and their technology service providers. It mandates comprehensive risk management, incident reporting, resilience testing, and third-party oversight. Specifically, DORA requires that financial institutions:

  • Implement systematic risk assessment to identify and mitigate cyber threats.
  • Establish robust data governance policies to safeguard financial systems.
  • Conduct continuous resilience testing to ensure operational continuity.
  • Manage third-party risks, particularly those associated with cloud providers and technology and service providers.
  • Facilitate cyber threat intelligence sharing in compliance with GDPR and other data privacy laws.

Non-compliance comes with serious consequences, including steep fines and penalties.

The Case for Hybrid Platforms in DORA Compliance

Hybrid and multi-cloud environments are common in financial services. For many organizations, distributed architectures evolve by accident, built in an ad-hoc manner as firms respond to the growing volume, variety, and velocity of data by continually adopting new methods of storing and analyzing it. 

While accidental hybrid and multi-cloud environments often pose governance and security risks, an intentional, true hybrid architecture delivers enhanced flexibility and resilience while protecting data wherever it resides. The following criteria differentiate a true hybrid strategy from accidental hybrid architectures:

1. Hybrid and multi-cloud flexibility. As Flexera’s State of the Cloud Report continues to reinforce, most organizations now manage hybrid and multi-cloud environments, with critical sources of data residing everywhere. A true hybrid platform provides a consistent data management, analytics, and AI experience across environments, including on-premises data centers.

2. Consistent data security and governance. DORA mandates strict data access controls and security measures. Intentional and true hybrid architectures provide unified security and governance across on-premises and multi-cloud environments, with automated policy enforcement and comprehensive encryption and access controls for data in motion and at rest.

3. Comprehensive data lifecycle management. DORA requires financial institutions to maintain full visibility over their data, from collection to storage, analysis, and deletion. Ideally, a hybrid data platform provides integrated data services addressing the full data lifecycle.

4. Hybrid portability. Resilience is a core pillar of DORA. Hybrid portability gives data teams the flexibility to lift and shift workloads between environments and run them seamlessly without refactoring or redevelopment. This capability supports operational resilience and business continuity, enables deployment flexibility, and reduces costs. 

Cloudera meets all four criteria and is the only true hybrid platform for data, analytics, and AI. It’s deployable and portable across hybrid and multi-cloud environments and provides consistent data services and unified security and governance across the entire data lifecycle. The platform enables financial institutions to meet DORA requirements while delivering real-time and predictive customer and operational insights.

The Time to Act is Now

DORA is now in effect, and organizations have no time to waste in demonstrating progress towards compliance. The financial sector’s increasing reliance on digital technology requires a proactive approach to cybersecurity, risk management, and governance. A unified hybrid data platform is not just a solution for compliance—it is a strategic enabler for resilience, security, and future-proof operations.

As financial institutions embrace DORA’s mandates, those that invest in hybrid platforms will be best positioned to thrive in an era of heightened regulatory scrutiny and evolving cyber threats. The time to act is now. By adopting a secure, portable, and resilient hybrid data strategy, financial institutions can turn compliance into a competitive advantage while safeguarding their operations against emerging risks.

Learn more about how Cloudera can help financial services firms stay compliant.

]]>
https://www.cloudera.com/api/www/blog-feed?page=embrace-a-hybrid-data-platform-for-dora-compliance0
The Future: AI Agents in Life and Businesshttps://www.cloudera.com/blog/business/the-future-ai-agents-in-life-and-businesshttps://www.cloudera.com/blog/business/the-future-ai-agents-in-life-and-businessWed, 05 Mar 2025 14:00:00 UTC

AI is evolving from simple automations to cognitive agents that can perform specific tasks without human intervention, unleashing unprecedented levels of agility and performance. This technology is poised to take AI to the next level and change organizations—and life as we know it—forever. To explore that future, Mike Walsh, CEO of Tomorrow, best-selling author, and host of the Between Worlds podcast, joined The AI Forecast podcast this month. Mike examines AI agents through the lens of the fifth industrial revolution, which he believes stems from AI.

Here are some takeaways from Paul and Mike’s conversation.

The pandemic kickstarted the fifth industrial revolution

Paul: What's the big idea of the fifth industrial revolution and what do you see as the opportunity for our business audience?

Mike: We probably weren't on track to have a fifth industrial revolution until the 2030s, but then something we didn't expect happened, which was COVID-19. The world shut down. People made massive investments in digital transformation: in cloud, in data infrastructure (which is appropriate for today's discussion), and in robotic process automation. Unknowingly, what we did was not just discover remote work in the background, we accelerated the forces that brought a fifth industrial revolution forward. And that is, I believe, powered by AI – particularly AI agents and eventually humanoid robotics. This is going to be basically a form of digital labor and the catalyst to transforming productivity in almost every industry.

The value of AI agents falls in two camps – back office and front office

Paul: Let’s dive into AI agents. Where do you think it’ll have the most impact?

Mike: Let's define what an agent is because people have different views on this. In the simplest terms, OpenAI defines an AI agent as ‘a system for taking action’.

We're moving away from the idea that AI is going to be an answer machine for all the things that we want to know, to the idea that AI is the powering spirit or the animus of systems which can proactively act on data and information and take action in commercially valuable ways. These agents have commercial intent, and that's why I think people are so excited now because a year ago people were struggling to figure out the real ROI of some of these generative AI tools. But with AI agents, it's pretty clear exactly what they can and will do.

Paul: What are some examples where this technology is being applied today?

Mike: Think of the use cases in two buckets: front of house and back of house. The most obvious areas are in the back office – business workflows, call centers, processes, even the functions of entire departments. If you can describe it, you can automate it. But if you can ask for it, you can create some sort of cognitive tool that will really accelerate the decision-making process without the boundaries of departments.

The other side of the house is something we often forget about, but it's the thing that touches us all the time, which is the customer experience. And the truth is, we will all have our own personal agents very soon. They will shop on our behalf. They will sometimes sit in on calls on our behalf. They may even go on dates on our behalf to screen partners.

If you think about that, a lot of the traffic in the future on websites and digital channels won't even be human. It'll be our agents interacting on those platforms. And this is already creating problems because if you use OpenAI's operator, which is their agent tool, there are some sites that will block you. They realize it's non-human traffic. One of the other side projects that Sam Altman is invested in is called World. You get your eyeball scanned and as a reward, it gives you some cryptocurrency. It's clear that once your identity gets locked onto the blockchain, it'll be tied eventually to a personal AI agent so that you can actually say that agent is now acting on your behalf. It's like a blue tick mark for that virtual extension of yourself. That is going to be a very real thing in the next 18 months as agents start to represent particular humans and can be authorized to make transactions and representations on their behalf.

Leaders shouldn’t be afraid of AI

Paul: You talk about the algorithmic leader, what does the future look like to the algorithmic leader? What are some of the skills they need?

Mike: To be an algorithmic leader in this new era means two things for me. It means a deep understanding of human complexity, understanding what motivates people, what drives people, what's a great customer experience – these analog qualities. But it also means thinking computationally, which is not only to know how to apply technology to a decision, but to be able to break a problem down into smaller pieces and approach the problem strategically like a poker player would. When you do that, the decision whether you get it right or wrong is not as important as the process by which you approach the decision.

That is probably the single biggest insight for any leader in this new world: it doesn't matter whether or not you've got the decision right or wrong. What matters is whether you, over time, personally (but also at scale with you and your team or your organization) are building a better system and model for evaluating and executing high-quality decisions. If you take that frame of reference, AI becomes just another tool in your kit for how you improve that decision making environment.

For the full conversation, listen to Mike’s episode on The AI Forecast here.

]]>
https://www.cloudera.com/api/www/blog-feed?page=the-future-ai-agents-in-life-and-business0
Cloudera and NiFi: Driving Data Ingestion and Processing Excellencehttps://www.cloudera.com/blog/business/cloudera-and-nifi-driving-data-ingestion-and-processing-excellencehttps://www.cloudera.com/blog/business/cloudera-and-nifi-driving-data-ingestion-and-processing-excellenceWed, 26 Feb 2025 14:00:00 UTC

Empowering Data-Driven Organizations with Cloudera Flow Management 4 (powered by Apache NiFi 2.0) 

Apache NiFi has long been a cornerstone for data engineering, providing a powerful and flexible framework for data ingestion, transformation, and distribution. As a leading contributor to NiFi, Cloudera has been instrumental in driving its evolution and adoption. With the recent release of Cloudera Flow Management 4.0 in Technical Preview, the first Cloudera Flow Management release based on NiFi 2.0, we are excited to showcase its enhanced capabilities and show how Cloudera continues to lead the way in data flow management.

The Value of NiFi 2.0 and Cloudera Flow Management 4.0

Cloudera Flow Management 4.0 (powered by Apache NiFi 2.0) introduces significant improvements, including:

  • Enhanced Performance: NiFi 2.0 boasts significant performance enhancements, handling data flows more efficiently and scaling to larger workloads. These enhancements give users more power and reliability to ingest, process, and distribute larger and more complex data sets.

  • Streamlined Development: The new flow canvas interface and improved drag-and-drop functionality make flow development faster and more intuitive, significantly reducing development time and cost.

  • Advanced Security: NiFi 2.0 introduces enhanced security features, including improved encryption and authentication mechanisms. This provides more confidence in a secure and reliable system for processing sensitive data.

  • Expanded Integrations: NiFi 2.0 seamlessly integrates with a wider range of data sources and systems, expanding its applicability across various use cases. Cloudera Flow Management 4.0 retains components that integrate with Cloudera applications, such as Hive and Accumulo, even though many of those components were removed from Apache NiFi 2.0. In addition, Cloudera Flow Management 4.0 includes new integrations such as Change Data Capture (CDC) capabilities for relational database systems as well as Iceberg. This allows users to design their own end-to-end systems using Cloudera applications as well as external systems.

  • Native Python Processor Development: NiFi 2.0 provides a Python SDK with which processors can be rapidly developed in Python and deployed in flows (see the sketch after this list). Some common document parsing processors written in Python are included in the release. Cloudera Flow Management 4.0 specifically adds components for embedding data, ingesting into vector databases, prompting several GenAI systems, and working with Large Language Models (LLMs) via Amazon Bedrock. This provides users with an impressive set of GenAI capabilities to empower their business cases.

  • Best Practices in Flow Design: NiFi 2.0 provides a rules engine for developing flow analysis rules that recommend and enforce best practices for flow design. Cloudera Flow Management 4.0 provides several Flow Analysis Rules for such aspects as thread management and recommended components. Cloudera Flow Management administrators can leverage these to ensure well-designed and robust flows for their use cases.
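
To make the Python processor model concrete, here is a minimal sketch of a custom NiFi 2.0 Python processor. The class layout follows the Apache NiFi 2.0 Python extension API as we understand it (FlowFileTransform, inner Java and ProcessorDetails classes, and a transform method); the RedactEmail processor itself, its regex, and its attribute name are illustrative assumptions, not components shipped with NiFi or Cloudera Flow Management. Verify the exact API against the NiFi Python developer guide for your release.

```python
import re

# Assumes the NiFi 2.0 Python extension API; verify module and class names
# against the NiFi Python developer guide for your release.
from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult


class RedactEmail(FlowFileTransform):
    """Illustrative processor that masks email addresses in FlowFile content."""

    class Java:
        implements = ['org.apache.nifi.python.processor.FlowFileTransform']

    class ProcessorDetails:
        version = '0.0.1'
        description = 'Replaces email addresses in FlowFile content with a placeholder.'
        tags = ['example', 'pii', 'redaction']

    EMAIL = re.compile(rb'[\w.+-]+@[\w-]+\.[\w.]+')

    def __init__(self, **kwargs):
        # The framework passes keyword arguments on instantiation.
        pass

    def transform(self, context, flowfile):
        # Read the incoming content, mask anything that looks like an email,
        # and route the result to the 'success' relationship.
        content = flowfile.getContentsAsBytes()
        redacted = self.EMAIL.sub(b'[REDACTED]', content)
        return FlowFileTransformResult(
            relationship='success',
            contents=redacted,
            attributes={'redaction.applied': 'true'},
        )
```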

Cloudera and NiFi - Continued Support, Innovation, and Simplified Migration 

Cloudera has been a driving force behind NiFi's development, actively contributing to its open-source community and providing expert guidance to users. Cloudera has invested heavily in NiFi, ensuring its continued evolution and relevance in the ever-changing data landscape.

Our commitment to NiFi is evident in our initiatives. We actively participate in the Apache NiFi community, sharing knowledge and best practices and supporting users through mailing lists, forums, and events. In addition to community contributions, the Cloudera Flow Management Operator enables customers to deploy and manage NiFi clusters and NiFi Registry instances on Kubernetes application platforms. The operator simplifies data collection, transformation, and delivery across enterprises and, by leveraging containerized infrastructure, streamlines the orchestration of complex data flows.

Cloudera is the only provider with a Migration Tool that simplifies the complex and repetitive process of migrating Cloudera Flow Management flows from the NiFi 1 set of components to the NiFi 2 set. In addition, Cloudera provides comprehensive training and consulting services to help organizations leverage the full potential of NiFi.

Driving the Future of Data Flow Management

With Cloudera Flow Management 4.0.0 (powered by Apache NiFi 2.0), Cloudera fortifies its leadership in data flow management. We will continue to invest in NiFi's development, ensuring it remains a powerful and reliable tool for data engineers and data scientists. In addition, Cloudera provides cloud-based deployments of Cloudera Flow Management, optimizing your operational efficiency and allowing you to scale to the enterprise with confidence. Features enabling, integrating with, and enhancing your AI-based solutions are a central focus of Cloudera Flow Management. We also continue to provide support and guidance to our customers, helping them harness the full power of NiFi to drive business-critical data initiatives.

Learn More:

To explore the new capabilities of Cloudera Flow Management and discover how it can transform your data pipelines, learn more here:

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-and-nifi-driving-data-ingestion-and-processing-excellence0
Forget What You Know About Channel Marketing - Here’s How to Win with Partners in the Age of AIhttps://www.cloudera.com/blog/business/forget-what-you-know-about-channel-marketing-here-s-how-to-win-with-partners-in-the-age-of-aihttps://www.cloudera.com/blog/business/forget-what-you-know-about-channel-marketing-here-s-how-to-win-with-partners-in-the-age-of-aiTue, 25 Feb 2025 14:00:00 UTC

Technology partnerships and channel marketing have long been vital strategies for B2B organizations seeking scalable growth. However, in the age of artificial intelligence (AI), a seismic shift is transforming how businesses approach these tactics. Traditional methods of partner engagement, content creation, and campaign execution are being redefined as AI emerges as a powerful tool for driving efficiency, delivering insights, and enabling personalization like never before.

Are you ready to revolutionize how you engage partners, generate pipeline, and achieve strategic alignment? This blog breaks down how AI is changing the landscape of channel marketing, explores winning strategies for success, and shares actionable advice to ensure your organization thrives in this new era. Be sure to check out the PRO-TIPS to incorporate new tactics into your workflow.

The Shift in Channel Marketing: The Role of AI

While traditional channel marketing relies on static tactics and manual execution, AI introduces dynamic, data-driven approaches that elevate efficiency across the board. Businesses are now leveraging AI to simplify complex processes, gain predictive insights, and personalize partner engagement.

Let’s dive into some key areas where AI is reshaping channel marketing.

Automate Repetitive Tasks: AI can take charge of time-consuming tasks such as scheduling email campaigns, generating reports, and identifying high-potential accounts. Marketers can reclaim their time by reallocating energy to designing strategies, building stronger relationships between teams, and boosting overall productivity. Employ AI tools for tasks like data analysis or repetitive workflows, but maintain the human touch in communications and relationship building.

Predictive Analytics for Better Decision-Making: AI-driven tools like Salesforce, Tableau, and Demandbase streamline decision-making by utilizing predictive analytics to uncover opportunities within the sales pipeline. For example, they can spotlight which partner campaigns are likely to drive ROI, allowing businesses to distribute resources effectively alongside partners.

  • Pro-Tip: Focus on What Works, Quit Doing What Doesn’t. Leverage AI analytics to identify successful campaigns and strategies. Avoid wasting resources on tactics that fail to deliver measurable results.

Personalized Co-Marketing at Scale: AI makes it easier to create highly personalized, co-branded materials for partners. Tools like Anthropic Claude can draft tailored blog posts, landing pages, and video scripts while breaking down complex technical graphics for campaign use. Align partner messaging more effectively with target audiences while reinforcing a unified brand presence. 

The key is prioritizing personalization by using AI tools to customize messaging and co-marketing materials based on partner-specific needs. Personalization strengthens connections and delivers better results.

Winning Strategies for Technology Partners

Success in channel marketing requires a shift in mindset paired with the use of AI tools. Technology partners can gain an edge with AI by revolutionizing processes around content creation, content distribution, and performance measurements. Let’s dive into what that looks like. 

Create a Content Flywheel with Partners: Consistent partner engagement depends on offering value through content. AI can revolutionize how you plan, execute, and measure content campaigns.

  • Pro-Tip: Start with a single piece of content, like a white paper, and turn it into multiple demand generation assets. Here are some examples:
  • Blog Posts: Break down key sections into short, engaging blog articles.
  • Infographics: Turn data or insights into visually appealing infographics.
  • Social Posts: Pull key stats or quotes to create bite-sized content for LinkedIn.
  • E-books: Expand or combine white papers into a more comprehensive e-book.

Streamline Content Creation and Distribution: Marketers can streamline content creation with tools like Claude, Jasper, and Canva. These AI-powered platforms help you quickly produce partner-specific blogs, whitepapers, and playbooks that are perfectly tailored to your brand and partner needs. The goal is to keep it simple for partners. Use AI to produce co-marketing materials, but be sure to make the outputs clear, concise, and actionable so partners can use them with ease. Once your content is ready for showtime, AI platforms like Smartsheet can automate content distribution across multiple channels, ensuring your messaging reaches the right audience at the right time.

Measure Performance: As Cloudera CMO Mary Wells often says, “What gets measured, gets done.” I couldn’t agree more. But having the right tools to determine what to measure is just as essential. Platforms like Tableau and Salesforce provide AI-powered insights that illuminate which campaigns generate the most engagement, empowering teams to refine their strategies for even greater success.

  • Pro-Tip: Measure What Matters. If you're not measuring it, is it really worth doing? Focus on the metrics that drive ROI. Leverage AI-powered dashboards to track campaign performance and real-time partner engagement where it matters most.

Leverage Data-Driven Decision Making

AI-powered tools uncover insights that were once buried in complex datasets, allowing businesses to optimize campaigns continuously, align teams, and maximize ROI. Tools like Demandbase and Salesforce provide real-time data on partner marketing performance. By identifying top-performing campaigns, marketers can replicate successful approaches and remove inefficiencies.

  • Pro-Tip: Regularly Review and Refine Processes. AI isn’t a one-time fix. Routinely evaluate AI-driven workflows to address inefficiencies and improve results.

Remember to share your findings using AI dashboards with both marketing and sales teams. Improving sales and marketing alignment improves performance and ensures partnership strategies are grounded in data and executed effectively. Then teams can share their research and newly found data-backed insights with partners to highlight opportunities for improvement and growth. Being transparent builds trust.

Optimize Partner Relationships with AI

By leveraging data-driven insights, AI helps marketers identify key opportunities, predict partner needs and market trends, and address potential challenges before they arise, strengthening collaboration and driving mutual success. With AI, marketers can personalize communication, track performance metrics in real time, and create collaborative plans that align with shared goals. By fostering mutual trust and efficiency, businesses can build stronger, more sustainable partnerships that drive long-term success for both parties.

Enable Your Partners to Succeed: Identifying gaps in training, enablement, and resources becomes easier with AI tools like HubSpot, which can recommend customized materials or highlight areas where partners need added resources. AI models can anticipate partner needs, whether for additional co-marketing resources or localized support in growth regions, and produce actionable plans.

  • Pro-Tip: Provide Partners a Role in AI Implementation. Ask for their feedback on how AI can meet their needs, and integrate those suggestions into your strategy.

Take Partnerships to the Next Level with AI

AI has incredible potential to boost productivity by enhancing, not replacing, the work marketers need to get done, helping them be more efficient in data analysis, content creation, and, ultimately, building stronger relationships. Learn more about Cloudera’s work with AI to take your operations to the next level.

]]>
https://www.cloudera.com/api/www/blog-feed?page=forget-what-you-know-about-channel-marketing-here-s-how-to-win-with-partners-in-the-age-of-ai0
The Strategic Importance of a Unified Data Platformhttps://www.cloudera.com/blog/technical/the-strategic-importance-of-a-unified-data-platformhttps://www.cloudera.com/blog/technical/the-strategic-importance-of-a-unified-data-platformMon, 24 Feb 2025 14:00:00 UTC

Today, organizations are on a continuous journey of transformation to enhance key areas such as reducing the time to market for digital products, fostering operational excellence across all layers of their operating model, and maintaining a competitive edge. However, these advancements must not come at the expense of cost-efficiency or cybersecurity, particularly when handling sensitive organizational data.

As organizations move to public and private clouds to take advantage of their many operational benefits, they are increasingly shifting away from single-provider solutions. According to the latest Flexera State of the Cloud Report, 89% of organizations now use multi-cloud strategies. Many organizations also manage hybrid environments, which combine cloud and on-premises data centers. Hybrid cloud gives organizations the ultimate flexibility by enabling them to take advantage of cloud innovation where possible while maintaining full control over business-critical areas, such as sensitive data, or choosing the infrastructure and operations with the best total cost of ownership (TCO).

From a data maturity perspective, organizations face many challenges, like industry-specific regulatory requirements, different Service-Level Agreements (SLAs) for delivery of insights, and different levels of technical skills and abilities. Despite these differences, most companies follow a well-established analytics lifecycle encompassing data ingestion, data engineering, analytics, visualization, and machine learning (ML) and artificial intelligence (AI) processes. However, the exponential growth of data, combined with the proliferation of tools designed to extract its value, has created significant hurdles. These include decreased productivity and profitability, driven by a shortage of talent and the maintenance overhead associated with managing complex, heterogeneous ecosystems.

To increase productivity and profitability organization-wide, teams need access to a seamless framework that combines cloud operations, analytics, and data science functions to promote interoperability, automation, security, and governance.

The strategic importance of a Cloud Center of Excellence

According to the Flexera report, most organizations (63%) have a Cloud Center of Excellence (CCoE) or plan to create one within the next year (14%). At the same time, 70% of large enterprises already have a CCoE, whereas only 29% of small and medium-sized businesses (SMBs) do. About 15% of enterprises expect to add a CCoE in the next twelve months, and 6% expect to add one beyond that time. Just over a quarter (26%) of SMBs are planning to have a CCoE in the future.

A cross-functional CCoE will guide your organization's teams and leadership, helping them overcome many of the challenges of cloud computing by providing reusable artifacts that benefit other teams, reduce cognitive load, and accelerate cloud-native maturity, allowing teams to spend more time on business value. The key benefits of a CCoE include:

1. Federated Expertise: CCoE provides a central structure of the best experts from different teams and business units who are proficient in critical areas, not just in technologies. The CCoE members are leaders within their teams and areas of expertise, mastering communication, leadership, business acumen, and emotional intelligence. 

The CCoE facilitates better decision-making and strategy formulation, architecting reusable assets to empower teams across the organization.

2. Governance and Compliance: CCoE establishes governance policies and ensures compliance with regulatory standards, well-architected frameworks, security frameworks, and business-critical design principles. These policies efficiently enhance security and risk management across cloud and data operations.
In addition to these critical aspects, Cloudera provides out-of-the-box compliance with the majority of security certifications required by many industries to operate.

3. Cost Optimization: By optimizing cloud resource usage and implementing cost management strategies, CCoE helps reduce overall cloud expenditure, creating a FinOps strategy every team should follow.

4. Portfolio Hierarchy: A primary goal of the CCoE is to create reusable artifacts that provide self-service product blueprints for the organization. These products inherit best practices for cloud architecture, development, and operations, enhancing the efficiency and reliability of cloud-based services. One of the most effective approaches to achieving this is through the establishment of reference architectures.

5. Skills Development: CCoE supports the upskilling of employees by providing training and resources, fostering a culture of continuous learning and innovation.

6. Accelerated Cloud Adoption: With its guidance and support, CCoE accelerates the adoption of cloud technologies within the organization, enabling faster time to market and innovation.

7. Risk Mitigation: By conducting thorough assessments and implementing mitigation strategies, CCoE minimizes risks associated with cloud migration and operation.

8. Standardization: CCoE establishes standards and frameworks for cloud deployment and management, promoting consistency and interoperability across different cloud environments.

9. Vendor Management: CCoE manages relationships with cloud service providers, ensuring alignment with organizational objectives and negotiating favorable terms.

10. Business Agility: Through its agile approach and continuous improvement initiatives, CCoE enables the organization to respond quickly to changing market demands and opportunities.

Cloudera stands out as the premier choice for large organizations addressing modern data challenges. It supports CCoEs worldwide in standardizing product portfolios across various business units while ensuring data maturity and cloud-native processes, including world-class security, auditing, and lineage. The integrated platform uniquely enables organizations to navigate and overcome critical challenges, including cloud adoption and the deployment of production-ready AI.

Cloudera is a mature, enterprise-grade platform that manages the full data lifecycle. It is designed for rapid, secure deployment in compliance with industry standards, enabling end-users to focus on deriving value from data while aligning with corporate policies. By accelerating time to value, Cloudera allows IT and R&D teams to concentrate on achieving business goals rather than dealing with development and integration challenges. These are some key benefits that the Cloudera hybrid platform for data, analytics, and AI provides:

  • “Edge2AI” Self-Service and Low/No Ops Platform: The platform offers the full data lifecycle anywhere at scale, from IoT field management and ingestion to AI on a self-service model.
  • Enterprise Support, Migration Acceleration Programs, Migration Assistants, and Professional Services: Cloudera offers innovative tools and professional services bundles to facilitate customer cloud journeys and improve the maturity of projects, resulting in cost-efficiency.
  • Most Open and Complete Data Platform: Customers can avoid vendor lock-in by integrating open source projects designed for modern data architectures. Our strong belief is that open source, open standards, and open markets will drive the next wave of innovation, powering every organization with open-source, community-shared development and knowledge.
  • Security: Recently, Cloudera achieved the FedRAMP "In Process" milestone to deliver a true hybrid data platform that securely accelerates AI across the U.S. Government, having already completed SOC 2 and ISO 27001. Cloudera helps project leaders with internal security approvals, InfoSec, and legal, in addition to providing a clear shared responsibility model where Cloudera takes ownership of platform software patches, Common Vulnerabilities & Exposures (CVEs), and infrastructure lifecycle management. Customers and partners only need to plan the required updates based on their maintenance window opportunities.
  • Unified Governance and Auditing for Any Compute Form Factor: There is no other full data lifecycle management platform with such a broad governance scope, nor hybrid and multi-cloud integration capabilities. 

By leveraging a modern, unified data platform that combines cloud operations and data analytics, organizations can seamlessly automate important tasks and work with data while abiding by even the most stringent security and governance requirements. 

Learn more about Cloudera’s data platform

Benefits of Cloudera 

]]>
https://www.cloudera.com/api/www/blog-feed?page=the-strategic-importance-of-a-unified-data-platform0
Measuring the Impact of Strategic Allianceshttps://www.cloudera.com/blog/partners/measuring-the-impact-of-strategic-allianceshttps://www.cloudera.com/blog/partners/measuring-the-impact-of-strategic-alliancesThu, 13 Feb 2025 14:00:00 UTC

Strategic sales and marketing alliances are increasingly vital to driving growth and scalability in IT. Whether you're building relationships with resellers, distributors, or system integrators, the ability to measure the impact of these partnerships is critical for long-term success. Yet, many businesses still struggle with establishing metrics or evaluating the outcomes of these collaborations effectively. 

This conversation explores how to leverage effective measurement frameworks and key strategies for managing and enhancing your strategic sales alliances. By following these insights, you can align your partnerships with business goals, unlock efficiencies, and ultimately help your partners grow—because, as the saying goes, your success begins with theirs.

Why Are Strategic Sales Alliances Essential? 

Today’s most innovative and successful businesses rely on collaboration. I have worked primarily in the channel with multiple Routes To Market (RTM) including VARs (Value-Added Resellers), MSPs (Managed Service Providers), GSIs (Global System Integrators), ISVs (Independent Software Vendors), and IHVs (Independent Hardware Vendors) to fuel the growth of our business and drive pipeline with our partners. 

Cloudera supports our partners and enables them to tap into new markets, expand their customer base, and deliver more comprehensive solutions that provide the best OUTCOMES for the customer. But navigating these partnerships isn’t without challenges, which makes consistent measurement and alignment vital for success.

I discussed the four pillars for sustainable and impact-driven partnerships in a previous post, but let's dive a bit deeper here before we hit some key metrics.

4 Pillars for Sustainable and Impact-Driven Partnerships 

1. Identify Key Partners: 

Not all partnerships are created equal. Identify partners whose strategic goals and market presence align with your organization. The goal isn’t just to recruit partners; it is to recruit the right partners who amplify your value proposition.

Partner selection works best when you implement a structured approach to support the process. I encourage you to use various criteria to evaluate variables such as market overlap, complementary offerings, and the partner’s ability to execute. Begin by asking:

  • Does this partner address a market gap we aim to resolve?
  • How well do their capabilities align with our target customer needs?
  • Can they scale effectively to support long-term growth and shared objectives? 
  • Do they demonstrate a commitment to collaboration and mutual success?
  • Are they Marketing READY? 

By answering these questions, you can focus your efforts on partnerships that not only drive revenue but also enhance your overall market positioning. Partner selection is the foundation upon which successful alliances are built, and prioritizing quality over quantity ensures a more efficient, impactful, and sustainable approach. 

2. Maintain Open Communication 

Strong alliances rely on clear, continuous communication. Sharing priorities, market insights, and operational updates builds trust while ensuring partner alignment stays intact. 

Schedule regular partner touchpoints like:

  • Quarterly Business Reviews (QBRs): CRITICAL!!! Provide progress updates, surface potential challenges, and recalibrate strategies at these reviews.  Make sure the decision makers are in the room and ACTION plans are provided back to the team after the review.  
  • Enablement Sessions: Bilateral learning is key. If our partners and alliances don't know or understand how our products solve customer needs, then the game is already lost. Practical tips: host webinars or training workshops; build relationships between your account executives and the partner's; empower partner teams with up-to-date knowledge of your products and strategies.
  • On-Demand Portals: Offer access to a portal where partners can train and certify on your products and solutions, track opportunities through deal registrations or incentive programs, and deliver marketing kits that make it easier for partners to promote joint solutions. 

Platforms like the Cloudera Partner Network make collaboration seamless by offering enablement workshops, partner development, and ready-made kits that help partners build a marketing and sales funnel leading to closed-won deals.

3. Align Objectives and Help Your Partners Sell

Misaligned sales goals breed inefficiency. To unlock shared success, ensure that your sales objectives map closely with your partner’s goals. Shared goals also prevent cannibalization of resources and help maximize synergy.  TIP: Build this into the alignment when partners onboard.  Set the expectation of the partnership so everyone is aligned at the start.

A key strategy that I've implemented, and that has worked for me, is to focus on “Through Partner Marketing” campaigns that target shared customers and improve each partner’s ability to close deals. For example:

  • Provide Co-Branded Collateral: Equip your partners with marketing assets like solution briefs, battlecards, customer success stories, and demo scripts for sales and business development teams.  They go a long way in supporting the sales process.
  • Empower Scalable Campaigns: Deploy ready-to-launch campaigns that partners can execute with minimal lift.  Think campaigns in a box, social and email drip campaigns, sales 101 and 201 assets, and even program guides to support the campaigns.
  • Monetarily Incentivize Success: Reward key performance with revenue-sharing models, volume incentive rebates, or exclusive access to new solutions. Tip: This is a useful tool, but not all partners can accept these incentives. Work it through with leadership at both orgs to implement it effectively.

4. Regularly Assess Partnerships 

Channel dynamics shift often. Competitors enter the market, new technologies emerge, and customer preferences evolve. I've seen these types of shifts firsthand, and businesses must continuously evaluate partnership effectiveness to remain competitive.

To assess your partnerships effectively, focus on these questions:

  1. Are partners meeting performance metrics consistently? 
  2. Do they provide constructive feedback, or are they disengaged?
  3. How are they contributing to innovation or enhancing customer satisfaction? 

Frameworks like Reach/Frequency/Yield, a model pioneered at Microsoft, offer a simple, actionable way to dissect partner performance by assessing:

  • Reach: How many partners are transacting at least one deal this month?
  • Frequency: How many deals is each partner bringing in regularly?
  • Yield: What is the average deal size that each partner contributes? 

These granular insights help companies focus on specific growth levers, from enabling better training to launching more competitive financial incentives.
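
To make these three measures concrete, here is a small, hypothetical Python sketch that derives Reach, Frequency, and Yield from a month of closed-deal records; the partner names, deal values, and record layout are illustrative assumptions, not real data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical closed-deal records for one month: (partner, deal value in USD).
deals = [
    ("Partner A", 40_000), ("Partner A", 25_000),
    ("Partner B", 90_000),
    ("Partner C", 15_000), ("Partner C", 22_000), ("Partner C", 18_000),
]

by_partner = defaultdict(list)
for partner, value in deals:
    by_partner[partner].append(value)

reach = len(by_partner)                                # partners transacting at least one deal
frequency = mean(len(v) for v in by_partner.values())  # average deals per transacting partner
avg_yield = mean(value for _, value in deals)          # average deal size across all deals

print(f"Reach: {reach} partners")
print(f"Frequency: {frequency:.1f} deals per partner")
print(f"Yield: ${avg_yield:,.0f} average deal size")
```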

5 Metrics Beyond Pipeline to Measure Partner Marketing Success 

For marketers, the word we live by is PIPELINE.  We have to build the pipeline but not all tactics translate immediately to support this measure, specifically in the channel/partner world. Beyond this foundational pillar, success is built on actionable tactics along the way. Ensuring you're equipped with the means to consistently track impact is critical for making informed decisions. 

Here are five key metrics I use to prioritize measurement of my channel marketing efforts:

  1.  Sales Velocity: Measure how quickly deals are closed with the help of partners, from lead generation to final conversion. Shorter cycles highlight operational efficiency within the partnership.
  2. Marketing Development Funds Efficiency (MDF Usage): MDF is directly tied to ROI.  Spend on campaigns that provide returns that the partner values. Some partners value logo recognition in your campaigns, others value a stricter set of ROI measurements.  When using MDF for campaigns, understand the needs of the partner and provide the results they are looking for and agreed upon.
  3. Net Partner-Initiated Leads: How many high-quality leads are directly sourced through your partner ecosystem? Consistent growth here often reveals the strength of their outbound efforts.  Investing in the mapping of accounts with your field and partner sales teams can yield amazing results.  This pays dividends long term.
  4. Customer Acquisition Rate (Net new Logos): New customer acquisition growth indicates the success of through-partner programs or go-to-market collaborations.  How many opportunities are generated from the mapped white space between your org and the partner?  This is a magical area to have a win but it is also more difficult.  Build the relationships, leverage the assets, execute a campaign, and target the right customers to find lasting success.
  5. Partner Satisfaction Scores: Check in with your partners. A good time for this activity is during QBRs. Happy partners mean sustainable growth.

How A Data-Driven Approach Strengthens Alliances 

Remember, while metrics help sharpen focus, numbers alone paint only a partial picture. Combining quantitative data (e.g., quarterly deal volume) with context-specific insights like challenges partners face unlocks richer potential for collaboration. 

For instance, when frequency metrics decline, respond by assembling workshops that equip struggling partners with more practical tools and marketing guides to reignite deal activity.

Whether it’s refining incentives via insights or integrating multi-channel dashboards to streamline updates, empowering your ecosystem through data sharpens competitive advantages for everyone.

Unlock Your Strategic Alliance Potential Today with Cloudera

By prioritizing the right partners, fostering transparent communication, and maintaining alignment, companies can supercharge their strategic alliances. Regular assessments anchored by robust data frameworks ensure these relationships don’t just survive; they thrive!

Interested in taking your sales alignment strategies to the next level? Sign up to become a Cloudera partner today.

]]>
https://www.cloudera.com/api/www/blog-feed?page=measuring-the-impact-of-strategic-alliances0
Data leakage: The missed opportunity of lost datahttps://www.cloudera.com/blog/business/data-leakage-the-missed-opportunity-of-lost-datahttps://www.cloudera.com/blog/business/data-leakage-the-missed-opportunity-of-lost-dataWed, 12 Feb 2025 14:00:00 UTC

Data has so much promise, but organizations struggle to use it effectively, especially with the increasing complexity of managing large projects and the pressure of incorporating AI. To help us understand where organizations go right—and wrong—with their data strategy, Simon Asplen-Taylor, the founder and CEO of DataTick and author of “Data and Analytics Strategy for Business”, joined The AI Forecast. Simon has spent years counseling organizations on getting their data right, and he shared the common missteps organizations experience with their data, with all roads leading back to data quality and a wild west of data collection across the business.

Here are some highlights from Paul and Simon’s conversation.

You can’t value data if you don’t understand it

Paul: Is data really still a hard sell for some CEOs?

Simon: Yeah, it is. A CEO six months ago said to me, “People talk about data, but I don't get it.” This ultimately comes down to everyone having a different view of data. If you're a regulator for the Bank of England, you think of data as being one thing. If you're in marketing, you think of data as another thing. Technologists are historically subservient to those in the business who will come to them with certain requirements to build a data set around.

The problem here is oftentimes people don’t know what to ask for. Technologists end up explaining the weaknesses they’ve seen, and you don’t really maximize the full power of data. We need to get people to the place where they aren’t giving their requirements from the jump, and instead allow the space for technologists to proactively explain the value of data and what it can do. When that happens, it’s much easier for the CEO or another leader to “see” the value of data.

Data leakage is actually a data literacy problem

Paul: I want to touch on data quality. I remember one of my first projects that I worked on in the supermarket business and they were trying to establish master data management. They just wanted a single view of all their products, what they bought and sold to their customers. It seemed to me they were constantly chasing their tail with that. How much of the problem do you think in enterprise is data quality? And how much of that is born out of the division of labor and business silos where someone owns this chunk of the data and someone else owns another chunk?

Simon: Yeah, big question. Data quality is not a homogenous problem. It starts with what I call data leakage. For example, you start off at the beginning of a process and you may have a salesperson talking to a customer and then only talk about products relevant to them. But your organization may sell three or four products, and that salesperson won’t ask about other interests that may be relevant to those additional products. That’s data leakage because there's an opportunity to do some upsell and some cross-sell. You could sell more things to them, but you haven’t caught that information. That's data leakage.

Someone else could have used that data downstream. Suddenly you have a problem later on where someone has to go back to the customer. That means more interventions picking up on data that's being missed out – the quality isn't good enough. Data quality problems and that leakage is caused by real world problems and by peoples’ incentivization to do certain things. You could argue that it comes down to a data literacy problem – making sure people understand that doing this level of data collection is part of your job. You are going to help the organization grow more because if you do this, we can sell more downstream. Ultimately, it’s about business incentivization and business process and people being able to say, “I am going to do the right thing for the organization, not just for myself.”

+++

Don’t forget to tune in to Spotify or Apple Podcasts to listen to future episodes of The AI Forecast: Data and AI in the Cloud Era.

]]>
https://www.cloudera.com/api/www/blog-feed?page=data-leakage-the-missed-opportunity-of-lost-data0
Accelerate Regulatory Reporting in Financial Services and Insurance with Clouderahttps://www.cloudera.com/blog/business/accelerate-regulatory-reporting-in-financial-services-and-insurance-with-clouderahttps://www.cloudera.com/blog/business/accelerate-regulatory-reporting-in-financial-services-and-insurance-with-clouderaThu, 06 Feb 2025 14:00:00 UTC

Regulatory compliance is one of the most important priorities for financial services and insurance organizations, and it has become increasingly complex for them to navigate a multi-dimensional operating environment that spans the regions in which they work. Amidst a wave of technology changes, including the proliferation of cloud solutions and the explosion of data volumes, organizations must rethink how they use data to meet their regulatory reporting requirements. This includes reducing the operational burden without inhibiting business growth or the customer experience.

In this blog post, we will cover some of the challenges financial services and insurance companies face, and how Cloudera provides a complete platform with data services that simplify and accelerate regulatory reporting for financial institutions.

Key Challenges in Financial Services

Over the last few decades, there has been a dramatic change in both the regulatory landscape and the technologies used for regulatory reporting. This landscape is dynamic and constantly evolving, while the data required to satisfy auditors is siloed and difficult to access and integrate. Because of this, compliance teams face several key challenges. 

New regulatory obligations

Innovations in business models and financial products are two driving forces behind the growing complexity of regulatory reporting requirements. For example, banking regulators globally have introduced a wave of new regulatory requirements in response to the explosive demand for digital assets among retail and institutional investors.

Constant regulatory change

It is not just the new regulations that introduce change, however. Regulators consistently revise existing regulations and request more information, greater reporting granularity, and increased reporting frequency. As an example, the latest regulatory framework for the management of market risk, the Fundamental Review of the Trading Book (FRTB), requires more historical data, more complex regulatory measures, and more granular reporting than previous methodologies.

Scarcity of regulatory compliance talent

As the Wall Street Journal suggests, hiring good people in compliance departments is getting harder, as younger professionals gravitate towards front-office jobs. That introduces several limitations to financial services and insurance organizations that need to scale their regulatory reporting capabilities with even scarcer talent.

Data accuracy and consistency challenges

The rise of software solutions that compose the regulatory reporting stack has led to many data silos and Extract, Transform, & Load (ETL) pipelines used to move data between data warehouses and data lakes, source systems, and proprietary tools. All of that inefficiency has contributed to data accuracy issues among systems, as it is virtually impossible to keep different data sources in sync and resolve discrepancies in data and metadata among tools and environments.

Performance and scalability limitations

The technologies initially used to build regulatory reporting solutions, such as relational databases and proprietary regulatory reporting solutions, lack distributed processing capabilities for growing volumes of data and computational needs.

High technology costs

The legacy, appliance-based platforms that many financial institutions use to deliver regulatory reporting solutions cannot scale in a cost-efficient manner to meet growing business needs due to the high capital outlays required for purchasing proprietary hardware. As a result of these challenges, financial institutions must re-evaluate their data analytics architecture based on a new set of criteria that meets the needs of compliance teams while preserving business agility and delivering a unified customer experience across channels.

Figure 1: Complexities of Traditional Architectures

Requirements for a Modern Regulatory Reporting Platform

To meet dynamic and evolving regulatory requirements in a cost-efficient manner, organizations need a modern data architecture that delivers five major capabilities:

  • A unified approach for structured and unstructured data: A modern data architecture must adopt a unified data management approach to break down silos across data warehouses, data lakes, and other databases and analytical solutions.

  • A scalable data movement and processing model: A modern data platform should scale to meet existing and future storage and processing requirements for regulatory reporting related to moving data between systems and calculating regulatory measures.

  • A flexible and elastic deployment model: Regulatory processing happens at specified periods of time, and it involves data generated predominantly by on-premises systems. Because of this, a modern data platform should offer a hybrid deployment model, using elastic cloud resources to perform complex regulatory calculations at the end of the reporting period.

  • End-to-end security and governance: Regulatory reporting involves sensitive information (e.g., over-the-counter trades between financial services institutions) that needs to be safeguarded from unauthorized access. A modern data platform should offer fine-grained security and governance to control access for different stakeholders who participate in the regulatory reporting process.

  • Automation with AI and ML: A modern data platform should provide the capability to meet reporting requirements and build automation into the reporting process, with AIOps tools that provide governance, observability, repeatability, and model explainability across the model development process.

Cloudera Delivers a Modern Regulatory Reporting Platform

Cloudera delivers the only true hybrid platform for regulatory reporting, offering features such as:

Open Table Format with Apache Iceberg

Apache Iceberg lies at the heart of Cloudera as the common table format for both structured and semistructured data, offering interoperability with Cloudera services such as Data Engineering and Data Warehousing and proprietary tools such as Snowflake. In addition, it streamlines changing and enriching complex data models for regulatory reporting, making it easy to adapt to new regulatory requirements and perform complex computations for risk models. Additionally, Iceberg enables auditability of historical data by giving data analysts a mechanism to reproduce a previous state of the data model to assess the impact of regulations or market scenarios.
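
As a brief illustration of the time-travel capability described above, the hedged PySpark sketch below queries an Iceberg table as it existed at a past timestamp and at a specific snapshot, which is how an analyst might reproduce quarter-end positions for an audit. The catalog, schema, table name, and snapshot ID are placeholders, and the TIMESTAMP AS OF / VERSION AS OF syntax assumes a recent Spark and Iceberg combination; check the syntax supported by your deployment.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("regulatory-audit").getOrCreate()

# Reproduce trade positions exactly as they looked at quarter end
# (catalog, schema, and table names are placeholders).
quarter_end_positions = spark.sql("""
    SELECT counterparty, SUM(notional) AS total_notional
    FROM finance_catalog.risk.trade_positions
    TIMESTAMP AS OF '2025-03-31 23:59:59'
    GROUP BY counterparty
""")

# Or pin the query to a specific Iceberg snapshot captured for a filing.
snapshot_positions = spark.sql("""
    SELECT *
    FROM finance_catalog.risk.trade_positions
    VERSION AS OF 1234567890123456789
""")

quarter_end_positions.show()
```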

Cloudera Data Engineering

With Apache Spark as its processing engine, Cloudera Data Engineering enables large-scale, compute-intensive data transformations, automated data validation, and data normalization and standardization. Beyond that, it offers several tools to streamline data operations, such as deep analysis, which offers a visual interface to identify and resolve performance issues, and Apache Airflow to schedule and manage the lifecycle of complex data engineering jobs.
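
For readers unfamiliar with how such jobs are typically orchestrated, here is a minimal, generic Apache Airflow sketch of a month-end reporting pipeline. It uses stock Airflow operators rather than any Cloudera-specific operator, and the DAG name, task names, and placeholder callables are illustrative assumptions.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def validate_positions(**context):
    # Placeholder: run row-count and schema checks against the staged extract.
    print("Validating staged positions extract")


def normalize_positions(**context):
    # Placeholder: standardize counterparty identifiers and currencies.
    print("Normalizing positions for the reporting schema")


with DAG(
    dag_id="month_end_regulatory_reporting",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@monthly",
    catchup=False,
) as dag:
    validate = PythonOperator(task_id="validate_positions", python_callable=validate_positions)
    normalize = PythonOperator(task_id="normalize_positions", python_callable=normalize_positions)

    # Validation must succeed before normalization runs.
    validate >> normalize
```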

Hybrid Cloud Platform

Cloudera’s platform enables organizations to leverage cloud resources in addition to their on-premises environments, executing regulatory analytics in the cloud when on-premises capacity is not sufficient. This model supports heavy computational tasks for regulatory reporting at the end of the reporting period by providing elastic cloud resources for those transient workloads. Additionally, hybrid cloud enhances resilience by offering an additional hosting environment to seamlessly run analytics applications in case of failure in an on-premises-only or cloud-only deployment, thus addressing Digital Operational Resilience Act (DORA) requirements.

Cloudera Data Flow

Cloudera Data Flow provides data integration and movement between source systems, Cloudera services, and other analytics solutions. It offers a universal data movement solution with a no-code flow designer enabled by Apache NiFi to unlock data silos from many different technologies and data sources. It can also simplify complex architectures by replacing point-to-point integrations between internal databases and regulatory systems with an enterprise message bus based on Apache Kafka. 
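
As a small illustration of the message-bus pattern, the sketch below publishes a trade event to a Kafka topic using the confluent-kafka Python client instead of writing it point-to-point into each downstream regulatory system. The broker address, topic name, and payload fields are placeholder assumptions.

```python
import json

from confluent_kafka import Producer

# Placeholder broker address; in practice this points at the enterprise Kafka bus.
producer = Producer({"bootstrap.servers": "kafka-broker:9092"})

# Hypothetical trade event; downstream regulatory consumers subscribe to the topic.
trade_event = {
    "trade_id": "T-000123",
    "counterparty": "BANK-XYZ",
    "notional": 5_000_000,
    "currency": "EUR",
    "booked_at": "2025-03-31T16:42:00Z",
}


def on_delivery(err, msg):
    # Surface delivery failures so missed events can be replayed before filing.
    if err is not None:
        print(f"Delivery failed: {err}")


producer.produce(
    "regulatory.trades",
    key=trade_event["trade_id"],
    value=json.dumps(trade_event),
    on_delivery=on_delivery,
)
producer.flush()
```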

Cloudera AI

Cloudera AI is a platform for end-to-end AI and ML model development, providing the tools required to build, train, and deploy models on-premises and in the cloud. Cloudera AI enables secure and governed AI and ML workflows with full observability over the development process. It also supports the automation of regulatory reporting workflows to reduce SLAs and deliver consistent and accurate results.

Comprehensive Data Governance

Cloudera Shared Data Experience (SDX) provides fine-grained security for data assets across analytics services to ensure proper data access for all different roles in the organization. It also offers traceability through end-to-end data lineage to support the auditability of risk assets and processes, addressing a key requirement of regulations such as BCBS 239.

Get Started with the Cloudera Open Data Lakehouse Today

Amidst a global wave of regulatory transformation, organizations need to adopt an open, hybrid data strategy, using Cloudera as the foundational data management and processing platform to bring together internal and external data for regulatory processing. Serving over 80% of the largest global banks, four of the top five stock exchanges, and eight of the top ten wealth management firms, Cloudera has the track record and expertise to help organizations meet their regulatory reporting needs, regardless of scale or complexity. Learn more about how the Cloudera Data Platform can help you.

]]>
https://www.cloudera.com/api/www/blog-feed?page=accelerate-regulatory-reporting-in-financial-services-and-insurance-with-cloudera0
Celebrating Black History Month with Clouderahttps://www.cloudera.com/blog/culture/celebrating-black-history-month-with-clouderahttps://www.cloudera.com/blog/culture/celebrating-black-history-month-with-clouderaThu, 06 Feb 2025 14:00:00 UTC

This week marks the start of Black History Month—a time that is incredibly important here in the U.S., but also an opportunity for people everywhere to celebrate and reflect on a subset of society that has often found itself silenced. The impact of Black history can be seen over hundreds of years, including many inventions that have defined the world we live in today. 

This time is about much more than knowing the history of the Black community—it is a checkpoint for us to reflect on how we’ve collaborated to improve the impact of this month throughout the year.

We have a busy month ahead, filled with learning, celebration, and action. Let’s explore how Clouderans are getting involved for Black History Month this year. 

An Opportunity for Action 

Our approach to these activities is broken into three phases: educate, include, and empower. In essence, we want everything we offer Clouderans this month to be purposeful so that it creates a working environment that is uplifting for people of all heritages and backgrounds.

As part of our Black History Month activities, we are hosting a virtual education event for Clouderans. In past years, we’ve held fun events like a Jeopardy-style game or an event focused around Juneteenth. This year, we’ve got a fun-filled educational event planned where attendees will get to engage with speakers and learn more about some of the most important aspects of Black History.  

Bringing Clouderans together in this setting lets us present important facets of Black History that stitch together the past, present, and future in a way that leaves participants with a lasting memory of what they’ve learned. 

Black History Month is about more than education, though. With that in mind, our team is taking action by providing students at Historically Black Colleges and Universities (HBCUs) with mentorship and guidance on how they can build a career in the technology field. 

This is an incredibly rewarding experience, as Clouderans will join an event to connect directly with the students and answer questions they have about how to get started in their careers. This mentorship initiative shows just how important representation can be for building confidence and learning from others who may have had a similar path—particularly for segments of the population that may be underrepresented in the broader industry. Many of us have leaned on mentors throughout our careers, so having the chance to give back and serve that role for the next generation is critically important. 

Learning from Black History Month is a Year-Round Endeavor

Black History Month is a time for us to really focus on the impact of the Black community. It’s something that we treat not just as a recognition month, but as a checkpoint that exists during the year—a point to look at the actions we have taken, and will take, during the year. Outside of Black History Month, Clouderans regularly take action by working with, or donating to charities, holding days of service to give back, and further strengthening their commitments to supporting these communities. 

At times, it can be easy to take for granted just how much knowledge and awareness has shifted around Black History in recent years. And just as we’ve seen the Black History celebration grow from a single week into an entire month, there is always more we can do to understand the history and uplift the stories and achievements of this community. 

Learn more about how Cloudera is building a more inclusive and diverse workplace. 

]]>
https://www.cloudera.com/api/www/blog-feed?page=celebrating-black-history-month-with-cloudera0
Diving Deeper on Data and AI in 2025https://www.cloudera.com/blog/business/diving-deeper-on-data-and-ai-in-2025https://www.cloudera.com/blog/business/diving-deeper-on-data-and-ai-in-2025Sat, 25 Jan 2025 05:01:00 UTC

The AI and data landscape is evolving at an unprecedented pace. Organizations are not only grappling with the challenge of managing massive volumes of data but also seeking ways to harness it for AI-driven innovation. As we enter 2025, the intersection of data and AI will continue to transform, unlocking new opportunities and reshaping the future in exciting and unexpected ways.

 

To better understand the innovation and changes in the future, Cloudera hosted a panel discussion consisting of industry experts who shed light on the biggest trends and changes they see surrounding data and AI. Moderated by Cloudera’s Senior Director of Product Marketing, Wim Stoop, the panel included fellow Clouderans Manasi Vartak, Chief AI Architect, Christopher Royles, Field CTO - EMEA, and guest speaker and Principal at SanjMo and former Gartner Analyst, Sanjeev Mohan. The panelists covered a broad range of trends—from the use of AI agents to effective governance. 

 

You can watch the entire conversation here. A myriad of topics were covered, so in this blog, we’ll explore some of the questions we didn’t have time to answer on the live webinar. 

 

What can we expect with regard to AI agents in the coming year? 

AI agents are a key part of AI’s evolution, designed to operate autonomously and mimic human decision-making, problem-solving, and learning. Over the past year, they’ve gained traction across industries, handling tasks from customer support to streamlining internal operations.

One area where AI agents show significant promise in the coming year is security. Security operations centers (SOCs) are grappling with increasing demands and overwhelming alert volumes. AI agents powered by GenAI have the potential to enhance SOC capabilities and reduce the cognitive load on analysts. These capabilities include the ability to autonomously monitor for threats in real time, automate routine tasks with minimal human intervention, and provide contextual decision-making support. You can read more on the security implications of AI agents here.

 

Considering AI governance is still evolving, what are some best practices, and how should we expect it to develop? 

New AI models are coming out nearly every day. Given the rapid evolution of AI models, AI governance needs to keep pace and adapt along with them. While data governance practices have matured considerably in recent years, AI governance has not yet fully taken off. 

 

But there is still a series of steps and best practices that organizations can, and should, follow to be successful. These include establishing clear guardrails around the development and deployment of AI systems, as well as implementing robust monitoring and evaluation frameworks. When implementing these practices, the focus should come back to a few key areas: transparency, mitigating bias, maintaining data security, and enforcing accountability at every stage of the AI lifecycle. As AI governance continues to develop, we expect to see more standardized frameworks that integrate these best practices into model development and deployment processes.

 

What’s the more effective route for my business to manage our data and AI initiatives? Single or hybrid cloud? 

The answer here is actually more nuanced than an either-or scenario. Taking a hybrid cloud approach alone is not enough; on its own, it only adds complexity and challenges. What organizations should be working toward is 'true' hybrid. But what does that mean?  

 

One of the biggest differentiators for true hybrid is the ability to operate as a single platform across both data center and cloud, and at the edge. When examining your own operations, there are several areas to focus on to achieve true hybrid, including a distributed model and portable and interoperable data services. You can see what else is needed and the necessary steps to take in this checklist.

 

How is success with AI defined through the lens of social and environmental well-being? 

Success with AI, through the lens of social and environmental well-being, is defined by its ability to drive meaningful, measurable impact beyond business objectives. While AI can enhance productivity and efficiency, it can also do a great deal to mitigate environmental and social impact. 

 

One key measure of success is environmental impact, including reductions in energy consumption and carbon footprints. By optimizing workloads and resource allocation, AI can help organizations minimize their environmental footprint—an approach we actively support by enabling customers to run more efficient, sustainable operations. 

 

We’ve also been able to support customers who are making major contributions to society, leveraging Cloudera’s platform in fields like pharmaceutical research, for example. When it comes to measuring success, the path forward has increasingly become about finding a balance between both of these elements. 

 


 

Thank you to everyone who attended our webinar! The future of data and AI is full of possibilities, and businesses and organizations of nearly every type and size are gearing up to take advantage. As the industry navigates what’s next, Cloudera is already deeply ingrained in shaping that future, providing customers with capabilities that enable them to attain the trusted data needed to harness AI. 


Interested in learning about what else our expert panel thinks is in store for 2025? Watch the entire webinar here, then find out how Cloudera can help you dive deeper into data and AI in the new year.

]]>
https://www.cloudera.com/api/www/blog-feed?page=diving-deeper-on-data-and-ai-in-20250
The Art of Getting Stuff Done with NVIDIA’s Kari Briskihttps://www.cloudera.com/blog/business/the-art-of-getting-stuff-done-with-nvidias-kari-briskihttps://www.cloudera.com/blog/business/the-art-of-getting-stuff-done-with-nvidias-kari-briskiFri, 24 Jan 2025 05:01:00 UTC

Each quarter, we’ll take a break from the regularly scheduled AI programming on The AI Forecast podcast to direct our attention to women leaders in technology. We’ll hear from powerful visionaries at the top of their fields as they share their journeys, the lessons they've learned, and the insights gleaned from their success. 

For the inaugural episode of Women Leaders in Technology on The AI Forecast, we welcomed Kari Briski - Vice President AI Software Product Management at NVIDIA. Kari shared the stories and strategies that inform her leadership style (like GSD or “getting stuff done”), what it means to trust your instinct, and the advice she gives to young women embarking on a career in technology and to women further along who feel stuck where they are.

Here are some highlights from that conversation. 

For tough decisions, be like water and flow around it

Paul: Let's talk about roadblocks. When you do face a roadblock or a barrier, especially something that you care deeply about, how do you navigate that?

Kari: When you hit a roadblock, how can you be like water? How can you flow around it?

To flow around it, I have to see my options – what's going to get me the most success immediately to get data back so that I can use that data to then make another decision? When you hit a roadblock, A) don't get discouraged, B) step back and look at the picture, and then C) take small steps along the way to go around it. Because the roadblock could be there for a really good reason. Or it’s a bad reason. Either way, you need to learn from it and move on. That's the biggest thing, too: just learning from it and still moving forward.

It's going to be a bit bumpy along the way. You've got to be able to clearly lay out the path forward versus everyone tackling the project on their own terms with no direction. Emphasize that we're going to build the road forward together.

Leaning on resources to allow for more focused time with family

Paul: We're all entitled to lives outside of work – looking after ourselves and our family at the same time as ensuring that we're having great, fulfilling careers. It's easier to say, but it just seems devilishly hard to do.

Kari: It's really hard to do. There has never been a better time to have services that can support you in making time for your family. For example, grocery delivery – I'm not spending time at the supermarket. I'm spending time picking up my daughter and having a 20-minute conversation with her in the car on the way home after practice. And then the food's already there when we both get there. And then you and the family can make dinner. 

You always have to make focused time. That's true when you're at work and it's true when you're at home. Because if you're doing six things at once, you will burn out and then no one ever feels that they got any attention. From customers, teams at work, and your family, to yourself. 

Advice from Kari to women at all stages in their careers: It gets harder, but you get better

Paul: What's the one piece of advice you find yourself going back to most often as you're mentoring people?

Kari: I have one piece of advice I've always said, and I will continue to say it. I say that it doesn't get any easier. It will never get any easier. It just gets harder, but you get better. With your capacity to learn and take on more, you become a better leader, you become a better person, you are able to gain scope, and then you are now able to take on more. It might seem easier, but it's actually harder. You're just able to do more. Allow yourself that ability to grow and handle more.

Paul: I know your mother was a mentor and huge influence in your life as someone who worked in STEM. Were there other mentors in your career and your life that helped bring you to where you are or that maybe are still in your life?

Kari: Yeah, all the time. I came out of college with five women in my class, so I wasn't always looking for a woman mentor, to be honest. I've had a lot of really fantastic male mentors because it's more about either our personality or what we are trying to achieve or goals in life or they hadn't been through that before. 

Don't try and look for one person, because again, nobody's perfect. And don't try and use that one person as a model, but kind of use information from as many sources as you can get. Find the qualities in the people that you admire, and then apply it to yourself.

Paul: What’s your advice for someone beginning to embark on their career?

Kari: Number one, it takes time. You don't have to be the biggest leader in the first four years out of college. Give yourself time. And at the same time, stick with it. Try and gain as many different types of experiences to grow yourself. So, if you're not pushing yourself, if you feel comfortable, then you're not growing.

Get comfortable feeling uncomfortable. And I think that's another one that's especially true for women. It’s more likely that women don’t put themselves in positions to feel uncomfortable. For example, maybe opting against taking a job where they feel they meet nine out of 10 of the job description bullet points, not all 10/10. Get comfortable feeling uncomfortable. Push yourself to learn that one bullet point that you don't have or go take up a project around that so that you learn. 

The other thing would be it's okay to fail. Because when you fail fast, those are the growth moments as you learn from your mistakes very quickly. But don't be afraid to get back into it because now you know what you did wrong and you're not going to do that wrong again. Give yourself time, push yourself, and be able to learn from your mistakes.

Paul: For those who are mid-career and feeling like they're stuck and they've got a manager or a culture that doesn't get it. What would you say to them?

Kari: Change. I felt that way too. I was scared to leave a company or switch a position, but I definitely looked at the road ahead of me or the people ahead of me. It was not very diverse, and I thought to myself: I have the ability to break that glass ceiling, but is this something I'm passionate about that I want to do it here?

Are you passionate enough about the work that you're doing to be uncomfortable, to grow, to push boundaries, to challenge people? And if you feel like you're constantly getting those roadblocks, then make a change. Someone else will value you more. You'll learn more through the process. So that's one aspect. And then the other one is really reflect on yourself. Why are you hitting that wall? How can I improve myself? How can I take on a job? Again, how can I learn?

To learn more about Cloudera’s Women Leaders in Technology, join the LinkedIn community and visit the home page.

Like and subscribe to The AI Forecast to stay up to date on the latest episodes. You can watch the video version of this episode on The AI Forecast YouTube page.

]]>
https://www.cloudera.com/api/www/blog-feed?page=the-art-of-getting-stuff-done-with-nvidias-kari-briski0
AI-Driven SOC Transformation with Cloudera: Enhancing Security Operations with Agentic AIhttps://www.cloudera.com/blog/technical/ai-driven-soc-transformation-with-cloudera-enhancing-security-operations-with-agentic-aihttps://www.cloudera.com/blog/technical/ai-driven-soc-transformation-with-cloudera-enhancing-security-operations-with-agentic-aiThu, 23 Jan 2025 05:01:00 UTC

Image: AI Agents utilize privately hosted LLMs on the Cloudera AI Inference service

Image: Architecture of AI Agents integrated with Cloudera AI Inference, for their interaction with private LLMs and enterprise data in use for SOC Activities


Security Operations Centers (SOCs) are the backbone of organizational cybersecurity, responsible for detecting, investigating, and responding to threats in real-time. Yet, the increasing complexity and volume of cyber threats present significant challenges. SOC teams often grapple with alert fatigue, skill shortages, and time-consuming processes.

Generative AI (GenAI), coupled with Agentic AI, offers a revolutionary approach to addressing these pain points. By automating repetitive tasks, enabling proactive threat mitigation, and providing actionable insights, artificial intelligence (AI) is reshaping the future of SOCs. In this blog, we explore how Agentic AI, powered by Cloudera, enhances SOC effectiveness and ensures secure, efficient operations.

Challenges in Security Operations Centers

According to a Trend Micro survey, 70% of SOC analysts feel overwhelmed by alert volumes, while another report from Tines found that 64% plan to leave their roles due to stress and burnout. Additionally, 72% of organizations express concerns about safeguarding sensitive data, highlighting the critical need for privately hosted AI-driven solutions to address these challenges.

Overwhelmed analysts: SOC analysts contend with thousands of daily alerts from disparate sources. The relentless volume leads to alert fatigue, impacting their ability to prioritize and respond to genuine threats effectively.

Shortage of skilled analysts: The cybersecurity talent shortage is a persistent challenge. The demand for skilled SOC professionals far exceeds supply, making it difficult for organizations to scale their teams and maintain strong defenses.

Time-consuming documentation: Incident response requires detailed documentation, including reports, audits, and stakeholder summaries. These manual processes divert analysts from their primary investigative tasks.

Sensitivity of network data: Handling sensitive network data while integrating advanced AI technologies requires robust security measures to prevent data breaches and ensure compliance.

What are AI Agents?

AI agents are autonomous software systems designed to interact with their environments, gather data, and use that information to perform tasks aimed at achieving predefined objectives. They are a central concept in the field of AI and operate with a degree of autonomy, mimicking intelligent human behavior in decision-making, problem-solving, and learning. While humans define the goals, the AI agent independently determines the most effective actions required to accomplish them.

End-to-end Context with Enterprise Integration

Integrating enterprise-specific data, such as historical incidents, network topology, and response protocols, enables the AI model to generate highly relevant insights. This contextual understanding enhances the model’s accuracy and applicability to the SOC’s unique requirements.

For example, in a SOC use case, an AI agent tasked with threat detection and response might continuously monitor network traffic, analyze security logs, and correlate data from multiple sources to identify potential threats. Once it detects an anomaly, the agent can assess the severity, suggest remediation actions, or even execute automated responses like isolating affected systems. If the situation requires more nuanced decision-making or is beyond its scope, the AI agent escalates the incident to human analysts with detailed contextual insights, enabling faster and more informed responses.
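
As a purely illustrative sketch of that monitor, assess, and respond-or-escalate loop, the Python below groups alerts by source, scores them, and either triggers an automated response or escalates to an analyst. Every function, threshold, and data value here is a hypothetical stand-in rather than a Cloudera or vendor API.

```python
# Illustrative sketch of the monitor -> assess -> respond/escalate loop described
# above. All functions, thresholds, and sample data are hypothetical stand-ins.
from dataclasses import dataclass

@dataclass
class Alert:
    source_ip: str
    severity: float        # 0.0 (benign) to 1.0 (critical)
    description: str

def correlate(alerts):
    """Group alerts by source so repeated activity raises the assessed score."""
    by_source = {}
    for alert in alerts:
        by_source.setdefault(alert.source_ip, []).append(alert)
    return by_source

def agent_step(alerts, auto_threshold=0.8, escalate_threshold=0.5):
    actions = []
    for source_ip, group in correlate(alerts).items():
        score = max(a.severity for a in group) + 0.05 * (len(group) - 1)
        if score >= auto_threshold:
            actions.append(("isolate_host", source_ip))        # automated response
        elif score >= escalate_threshold:
            actions.append(("escalate_to_analyst", source_ip))  # human-in-the-loop
        # below escalate_threshold: log and keep monitoring
    return actions

alerts = [
    Alert("10.0.0.12", 0.9, "beaconing to known C2 domain"),
    Alert("10.0.0.44", 0.55, "unusual login time"),
]
print(agent_step(alerts))
```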

Key Features and Benefits of this Agentic AI Solution

Organizations that employ Agentic AI solutions can save hundreds of analyst hours per month, with automated responses addressing up to 40% of repetitive threat scenarios. This translates into more focused, high-impact work by SOC teams and a stronger overall security posture.

Summarization of incident events: GenAI can process and condense large volumes of event data, providing analysts with concise summaries of incidents. Instead of sifting through logs and alerts, analysts can quickly understand the scope and nature of an event, allowing for faster decision-making.

Proactive threat mitigation: Agentic AI leverages predictive analytics to foresee potential attack vectors and suggests mitigation strategies before a threat fully manifests. This capability helps organizations stay ahead of adversaries.

Suggested remediation: AI-powered assistants can recommend remediation steps based on the analysis of past incidents and best practices. These suggestions can include isolating affected systems, patching vulnerabilities, or updating security configurations, empowering analysts with actionable insights.

Coding assistance for analysts: GenAI can act as a coding assistant, helping analysts develop new investigation notebooks and detection algorithms. This feature streamlines the creation of custom scripts and tools, enabling SOC teams to address unique threats more effectively.

The challenges SOC teams face demand innovative, scalable solutions. GenAI and Agentic AI, powered by the Cloudera platform, transform SOC operations by enhancing efficiency, reducing workloads, and improving threat response.

With Cloudera, organizations can deploy tailored AI solutions, ensuring data security and compliance. Future-proof your SOC and stay ahead of cybersecurity challenges with Cloudera’s unified approach to data management, advanced analytics, machine learning, and AI.

Enhancing Security Operations with Agentic AI

GenAI offers a promising solution to these challenges. By deploying privately hosted GenAI foundational models tailored to enterprise needs, and incorporating the capabilities of Agentic AI, organizations can enhance SOC effectiveness while maintaining data security and compliance.

In the realm of SOC, AI agents represent autonomous, adaptive systems capable of perceiving cybersecurity landscapes, contextualizing threats, and executing intelligent responses in real-time. 

Proactive and Autonomous Security with AI Agents

Agentic AI builds on the capabilities of GenAI by introducing a layer of autonomy and proactivity. It enables SOC systems to:

  • Actively monitor and respond to threats in real time.
  • Automate routine SOC tasks with minimal human intervention.
  • Provide contextual decision-making support, reducing the cognitive load on analysts.

Integrating your Agents with Privately Hosted AI Models (LLMs)

Deploying GenAI models in secure environments ensures data confidentiality. With Cloudera AI Inference service, enterprises can host AI models on-premises or in the cloud, maintaining compliance while harnessing AI’s power.

Your AI agents can now interact with AI models hosted on Cloudera, with all proprietary data remaining within your organization's VPC. These agents can also interact with enterprise tools and environments to take further actions and gather feedback.
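
As a rough sketch of that interaction, the snippet below shows an agent calling a privately hosted model to summarize an incident, assuming an OpenAI-compatible chat-completions endpoint. The endpoint URL, model name, and token variable are hypothetical placeholders; the actual API surface exposed by your Cloudera AI Inference deployment should be taken from its documentation.

```python
# Minimal sketch of an agent calling a privately hosted LLM endpoint.
# Assumes an OpenAI-compatible chat-completions API; the endpoint URL, model
# name, and token variable are hypothetical placeholders.
import os
import requests

ENDPOINT = "https://ai-inference.internal.example.com/v1/chat/completions"  # hypothetical
API_TOKEN = os.environ["INFERENCE_API_TOKEN"]  # issued by your model-serving platform

def summarize_incident(event_log: str) -> str:
    """Ask the privately hosted model for a concise incident summary."""
    response = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={
            "model": "private-llama-3-8b",  # hypothetical model name
            "messages": [
                {"role": "system",
                 "content": "You are a SOC assistant. Summarize incidents concisely."},
                {"role": "user", "content": event_log},
            ],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```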

]]>
https://www.cloudera.com/api/www/blog-feed?page=ai-driven-soc-transformation-with-cloudera-enhancing-security-operations-with-agentic-ai0
Strengthening Cloudera’s Commitment to Corporate Equalityhttps://www.cloudera.com/blog/culture/strengthening-clouderas-commitment-to-corporate-equalityhttps://www.cloudera.com/blog/culture/strengthening-clouderas-commitment-to-corporate-equalityTue, 14 Jan 2025 05:01:00 UTC

Inclusion and accessibility are at the heart of our mission and culture at Cloudera. This approach and our focus on inclusion are crucial to both our broader business objectives and to establishing the kind of workplace where our employees can be their best, authentic selves every day. As we strive to make data and analytics accessible, we pride ourselves on creating a working environment that makes every employee feel valued and supported, regardless of how they identify. 

In that spirit, we are proud to have taken part in the Human Rights Campaign (HRC) Corporate Equality Index (CEI)  this year. This report, released by the HRC annually, serves as a national benchmarking tool examining the corporate policies, practices, and benefits provided by companies that are pertinent to lesbian, gay, bisexual, transgender, and queer employees. 

With the newest report freshly released, now is the perfect time to dive into what makes the CEI such an important metric and how Cloudera is working to provide LGBTQ+ employees with a positive and supportive work environment.

Why the Corporate Equality Index Matters

In 2024, Cloudera participated in the HRC Corporate Equality Index and was subsequently evaluated for its benefits and efforts to support the LGBTQ+ community. The report is critically important, highlighting successes and areas for improvement. The Human Rights Campaign has released these CEI reports annually for over two decades, becoming increasingly valuable to companies looking to innovate on their inclusion and diversity efforts. 

The first year of the CEI included 319 participants, and now the 2025 CEI has grown to include 1,449 participating organizations. The results of the 2025 CEI showcase how U.S.-based companies are promoting LGBTQ+-friendly workplace policies in the U.S. and abroad. Scoring is broken down into four main sets of criteria: workforce protections, inclusive benefits, supporting and inclusive culture, and corporate social responsibility. Among others, initiatives like non-discrimination policies, family and domestic partner benefits, LGBTQ+ internal training and accountability, transgender inclusion best practices, and outreach or engagement with the broader LGBTQ+ community all factor into the final reporting. 

The categories and scoring requirements within the CEI report are all areas where Cloudera has committed significant time and effort to build, and continuously improve, our culture. That culture has, in large part, been enriched and fostered by the efforts of our LGBTQ+ ERG, working to bring greater awareness to intersectional topics. For example, during Pride Month 2024, the ERG organized multiple events and meaningful opportunities for dialogue. Take a look at what our team in Cork, Ireland, did to celebrate Pride earlier this year. They equipped peers and leadership alike with tools to foster inclusivity, build empathy, and better understand the diverse experiences of the LGBTQ+ community.

From Cloudera’s perspective, the report’s goals and findings are directly aligned with our values. The CEI is much more than just a benchmark; it’s a reflection of who we are as a company. Taking part in the CEI helps us demonstrate to employees, customers, and partners just how committed we are to operating as a socially responsible organization, prioritizing values that empower all people. 

Delivering on Our Commitment to an Inclusive Work Environment

Cloudera has a commitment to supporting the LGBTQ+ community and all employees that is both actionable and continuous. We are constantly looking at ways that we, as an organization, can enhance our policies, benefits, and programming to ensure they fully reflect and address the needs of our employees. Examples of this commitment can be seen in Cloudera’s ongoing Our Culture programming, which is designed to build awareness and allyship and encourage employees to actively share feedback to identify opportunities for improvement. 

Cloudera employees, through groups like our ERGs, are strong advocates for representation and equality in their communities. Programs like Women Leaders in Technology have helped provide opportunities for personal growth and a sense of community while volunteering initiatives give Clouderans a chance to provide young students mentorship and support. Recently, Cloudera also successfully completed the recertification process with Fair Pay Workplace, demonstrating the company’s commitment to ongoing efforts to maintain equitable compensation practices. The arrival of the 2025 CEI report is a reminder that there is always work to be done when it comes to corporate equality and Our Culture. 

Learn more about how Cloudera is working to foster an inclusive and supportive workplace where everyone can feel empowered to be themselves. 

]]>
https://www.cloudera.com/api/www/blog-feed?page=strengthening-clouderas-commitment-to-corporate-equality0
From Machine Learning to AI: Simplifying the Path to Enterprise Intelligencehttps://www.cloudera.com/blog/business/from-machine-learning-to-ai-simplifying-the-path-to-enterprise-intelligencehttps://www.cloudera.com/blog/business/from-machine-learning-to-ai-simplifying-the-path-to-enterprise-intelligenceFri, 10 Jan 2025 05:01:00 UTC

A Name That Matches the Moment

For years, Cloudera’s platform has helped the world’s most innovative organizations turn data into action. As the AI landscape evolves from experiments into strategic, enterprise-wide initiatives, it’s clear that our naming should reflect that shift. That’s why we’re moving from Cloudera Machine Learning to Cloudera AI.

This isn’t just a new label or even “AI washing.” It’s a signal that we’re fully embracing the future of enterprise intelligence. That’s a future where AI isn’t a nice-to-have—it’s the backbone of decision-making, product development, and customer experiences.

From Science Fiction Dreams to Boardroom Reality

The term “Artificial Intelligence” once belonged to the realm of sci-fi and academic research. Decades ago, it was a moonshot idea, and progress often stalled. But over the years, data teams and data scientists overcame these hurdles and AI became an engine of real-world innovation. Today, it’s everywhere–from conversational chatbots anticipating and reacting to questions to copilots accelerating development to advanced analytics driving strategic decisions.

This evolution has changed expectations. Businesses no longer wonder if AI can help them—they ask how soon and seamlessly they can put it to work. By embracing the term “AI,” we’re signaling that we’re here to meet that expectation head-on with the comprehensive solutions that organizations need right now not only to compete but to differentiate.

Why “AI” Matters More Than “ML”

Machine learning (ML) is a crucial piece of the puzzle, but it’s just one piece. AI today involves ML, advanced analytics, computer vision, natural language processing, autonomous agents, and more. It means combining data engineering, model ops, governance, and collaboration in a single, streamlined environment.

Renaming our platform “Cloudera AI” acknowledges that our customers aren’t just training models—they’re embedding intelligence across their business. It’s about comprehensive solutions, not isolated algorithms.

Speaking the Market’s Language

The marketplace has chosen “AI” as the universal shorthand for smart, automated decision-making. Executives, data teams, and even end-users understand that “AI” means more than building models; it means unlocking strategic value. By aligning our brand with that term, we’re ensuring that when you say “Cloudera AI,” everyone knows exactly what we offer: an integrated, future-focused platform that accelerates innovation for enterprise AI.

Simplifying to Amplify

This renaming is part of a broader effort to simplify how we present our offerings. Our platform, once known as Cloudera Data Platform or CDP, is now simply “Cloudera.” Within it, you’ll find capabilities that clearly map to what they deliver. By trimming acronyms and complex naming, we’re removing barriers so you can quickly find what you need and get to work building intelligence into every decision. Cloudera is also making major investments to accelerate our vision with the acquisition of Verta and many strategic partnerships, including NVIDIA.

More Than a Tool—A Full AI Suite

Cloudera AI brings together everything you need to operationalize AI at scale:

  • Cloudera AI Workbench: The collaborative development environment where data scientists, analysts, and engineers build solutions together—faster, smarter, and at enterprise scale.
  • Cloudera AI Registry: A place to govern and track all your AI assets—models, applications, and beyond—so you can deploy and update them confidently, on-premises and in multiple clouds.
  • Cloudera AI Inference: The reliable engine that gets your intelligence into production with NVIDIA, delivering insights where and when they’re needed most.

Best of all, they are all designed to work together seamlessly, providing you with the capabilities for a smooth path from raw data to AI-driven results.

Beyond Buzzwords: Real Results

We know “AI” can sound like hype. But at Cloudera, it’s about delivering tangible outcomes. We’re building on our foundation of proven data and analytics expertise to deliver a platform that’s ready to help you realize real business value from your AI initiatives. 

Whether you’re boosting customer satisfaction, reducing costs, or uncovering new revenue streams, Cloudera AI equips you to act with confidence and speed. We put the “intelligence” in “AI” by ensuring that your solutions are contextualized with your data–scalable and easy to evolve.

The Road Ahead

Cloudera AI isn’t just about what we have today—it’s about what comes next. We’re exploring AI Studios, such as our latest low-code RAG Studio, and AI Agents capable of proactive decision-making. As generative AI models mature, we’ll help you leverage them to enrich your data, enhance user experiences, and spark fresh innovation.

Our goal is to keep you at the forefront of what’s possible, so you’re never stuck playing catch-up in a rapidly evolving enterprise AI landscape.

Your Next Move

With Cloudera AI, we’re making it simpler, clearer, and more powerful to bring intelligence into everything you do. Ready to learn more? Check out the Cloudera AI page and discover how we can help you move from ideas to impact, accelerating your AI journey every step of the way.

Let’s build the future of enterprise intelligence—together.

Ready to experience Cloudera AI firsthand? Start your free 5-day trial and experience enterprise-scale AI capabilities for yourself.

]]>
https://www.cloudera.com/api/www/blog-feed?page=from-machine-learning-to-ai-simplifying-the-path-to-enterprise-intelligence0
Predictive Models Are Nothing Without Trusthttps://www.cloudera.com/blog/business/predictive-models-are-nothing-without-trusthttps://www.cloudera.com/blog/business/predictive-models-are-nothing-without-trustWed, 08 Jan 2025 05:01:00 UTC

Airports are an interconnected system where one unforeseen event can tip the scale into chaos. For a smaller airport in Canada, data has grown to be its North Star in an industry full of surprises. In order for data to bring true value to operations–and ultimately customer experiences–those data insights must be grounded in trust. Ryan Garnett, Senior Manager Business Solutions of Halifax International Airport Authority, joined The AI Forecast to share how the airport revamped its approach to data, creating a predictions engine that drives operational efficiency and improved customer experience.

Here are some highlights from Paul and Ryan’s conversation.

Building a data culture

Paul: You joined Halifax International Airport Authority over a year ago. Tell me about what you were trying to build or replace or accomplish.

Ryan: First, I wanted to build a culture. Data needs to be an asset and not a commodity. And we need to think of it differently as something that we leverage and value. But the reality is people weren't valuing it. I'm reminded of a previous place where I worked in finance and reported to the CFO. It was early on in my time there and I was getting to know him. He tells me, “Ryan, I don't value data.” That obviously stunned me. I just joined this organization to do data, and you don't value it. He elaborated and explained the reason why is because he could ask five different people the same question and get back five different answers. That always stuck with me. 

Building that culture is around trust. Why am I doing this? Why are we doing this? What's the reason for data? Everyone may answer and say, “informed decision making, generate profit, improve customer relations optimization.” That’s not it. You're in the business of building trust. That's your business. As soon as you build trust, you can do anything.

Paul: Where do you see your data journey heading? How did executives feel about the data refresh?

Ryan: Kudos to our executives because they bought into the value of data from the start. We didn’t need to sell them on why we needed it. They gave us the opportunity to build it. We have a strategic plan like most organizations, but the underpinning part for me is to change the culture. I want to get people to think of not what has happened but what could happen. If it did happen, is it significant? Just because it seems like a big number to you, it doesn't mean it's actually significant. Take that gut feeling and don't throw it away, but just pause for a second. 

For example, we send routine reports to the senior leadership team. After one particular report, our CEO asked why a particular number was down. We said, “Do you remember that day? That was the hurricane.” You could see that relationship. People began to get that understanding that external forces are pushing numbers. Was that dip in numbers significant? Not all the time, but that’s why we support this broader thinking with data so people can plan for erroneous events and better understand the shifts.

Transforming operations and customer experience

Paul: Talk us through how you’re using that data—like passenger transit data—to plan for future events.

Ryan: Instead of looking in the past, we've built a predictive model and its origins come from people trusting in us—they ask us about different scenarios. Victor, who’s in airport planning, asked us to help his team understand when someone might show up. To dive into the problem, we had to uncover what that means for him. He specifically asked to know in the future what our biggest day is going to be or our biggest hour. From there, we have data we can look at, but we also have the schedule: When does a flight come? How many are we expecting? What size planes are they? How many seats do they have? And you can't just say, this type of plane has these many seats. It depends on the carrier. They might take them out to give those nice comfy seats.

We also pulled in real-time weather bulletins. And then we partnered with Halifax Discovery. That's part of the group that brings in events into Halifax. That partnership gave us data of when people are coming 10 years in the future. This ultimately can build a model to look at the history to determine, “Hey, every Wednesday it seems to be this big and there's these many planes.”

But what about when the snowstorm hits? What did that look like? What is the impact over a long weekend? We leveraged a model put out by Meta, and it predicted it pretty decently. There’s so much more we can use with this model. We can even use it to figure out how to staff security because we have a pretty good idea when it’s going to be busy. 

Paul: What's the next big innovation for Halifax airport?

Ryan: I think the big one for us and for me is building on passenger experience. Travel anxiety is a real thing. People get extremely anxious about traveling. My hope, my desire, my dream is to take a data-driven approach and provide something to the general public that helps them lower that anxiety as low as possible when their journey starts by opening their door. 

Don’t forget to tune in to Spotify or Apple Podcasts to listen to future episodes of The AI Forecast: Data and AI in the Cloud Era.

]]>
https://www.cloudera.com/api/www/blog-feed?page=predictive-models-are-nothing-without-trust0
Mastering Multi-Cloud with Cloudera: Strategic Data & AI Deployments Across Cloudshttps://www.cloudera.com/blog/business/mastering-multi-cloud-with-cloudera-strategic-data-ai-deployments-across-cloudshttps://www.cloudera.com/blog/business/mastering-multi-cloud-with-cloudera-strategic-data-ai-deployments-across-cloudsWed, 08 Jan 2025 05:01:00 UTC

Image: The Importance of Hybrid and Multi-Cloud Strategy

Leveraging AI and Machine Learning in a Hybrid Environment

A hybrid approach is ideal for deploying AI and ML. For instance, models can be trained in the cloud, leveraging scalable resources, and then deployed on-premises or at the edge for faster insights. This approach supports real-time analytics and end-to-end governance, ensuring consistency in AI models across environments. Leading enterprises have successfully adopted this strategy to enhance data access, security, and scalability​.

Crafting a Hybrid Strategy for Enterprise Success

To implement a successful hybrid strategy, enterprises should:

  • Assess Current Infrastructure: The first step is to assess your current infrastructure - understanding what you have in place, what workloads you’re running, and how they align with your overall business goals. This will help identify gaps and opportunities, allowing you to optimize for performance, cost, and scalability.
  • Map Workload Placement: It’s important to map workload placement based on a 'best-fit' approach. Not all workloads belong in the same environment. For example, latency-sensitive applications might be better suited for on-prem or edge environments, while applications that require scalability could benefit from cloud environments. By strategically placing workloads where they perform best, you can improve efficiency and reduce costs.
  • Ensure Portability: One of the most critical elements in a hybrid cloud strategy is ensuring portability for both data and applications. In a multi-cloud environment, you must ensure that workloads can move seamlessly between clouds without vendor lock-in. Leveraging containerization, Kubernetes, and other cloud-agnostic software will help maintain flexibility and agility as business needs evolve (see the sketch after this list).
  • Establish Governance: Lastly, don’t forget about governance, compliance, and security. As you expand across different cloud environments, it's essential to establish clear governance policies that ensure compliance with industry regulations like GDPR or HIPAA.
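
To illustrate the portability point above, here is a minimal sketch, assuming the official Kubernetes Python client, that applies one unchanged deployment definition to two clusters (one on-premises, one in a public cloud). The context names, namespace, and container image are hypothetical.

```python
# Minimal sketch: deploying the same containerized workload definition to two
# Kubernetes clusters using the official Kubernetes Python client. Context
# names, namespace, and image are hypothetical placeholders.
from kubernetes import client, config  # pip install kubernetes

def build_deployment(image: str) -> client.V1Deployment:
    """Provider-agnostic deployment spec; nothing in it is cloud-specific."""
    container = client.V1Container(name="etl-job", image=image)
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": "etl-job"}),
        spec=client.V1PodSpec(containers=[container]),
    )
    spec = client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "etl-job"}),
        template=template,
    )
    return client.V1Deployment(metadata=client.V1ObjectMeta(name="etl-job"), spec=spec)

deployment = build_deployment("registry.example.com/etl-job:1.4.2")

# The same spec is applied unchanged to both environments, which is what makes
# the workload portable across on-premises and cloud clusters.
for kube_context in ("onprem-cluster", "cloud-cluster"):   # hypothetical contexts
    api_client = config.new_client_from_config(context=kube_context)
    apps = client.AppsV1Api(api_client)
    apps.create_namespaced_deployment(namespace="data-pipelines", body=deployment)
```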

Key benefits of a hybrid and multi-cloud approach include:

  • Flexible Workload Deployment: The ability to place workloads in environments that best meet performance needs and regulatory requirements allows organizations to optimize operations while maintaining compliance.
    • A prominent public health organization integrated data from multiple regional health entities within a hybrid multi-cloud environment (AWS, Azure, and on-premise). This approach enabled real-time disease tracking and advanced genomic research while ensuring compliance with stringent privacy regulations like HIPAA. Leveraging Cloudera’s hybrid architecture, the organization optimized operational efficiency for diverse workloads, providing secure and compliant operations across jurisdictions while improving response times for public health initiatives.
  • Cost Savings: Hybrid and multi-cloud setups allow organizations to optimize workloads by selecting cost-effective platforms, reducing overall infrastructure costs while meeting performance needs.
    • A leading meal kit provider migrated its data architecture to Cloudera on AWS, utilizing Cloudera's Open Data Lakehouse capabilities. This transition streamlined data analytics workflows to accommodate significant growth in data volumes. By leveraging the Open Data Lakehouse's ability to unify structured and unstructured data with built-in governance and security, the organization tripled its analyzed data volume within a year, boosting operational efficiency. The scalable cloud infrastructure optimized costs, reduced customer churn, and enhanced marketing efficiency through improved customer segmentation and retention models. This modernization also provided a future-proof platform for advanced analytics and AI-driven insights, ensuring continued innovation.
  • Risk Mitigation: By using multiple vendors, organizations can mitigate risks such as vendor lock-in, outages, and sudden pricing changes, ensuring operational resilience.
    • Several organizations utilize multiple cloud providers—such as AWS, Azure, and Google Cloud—to enhance risk mitigation. This multi-cloud strategy helps prevent vendor lock-in, ensures compliance with various regulations, and bolsters resilience against potential service disruptions.

Role of a True Hybrid Platform

A well-integrated hybrid platform is essential for seamless data movement, governance, and workload management across environments. However, a few foundational components are needed to make this possible:

  • Unified Runtime: Run applications and manage data seamlessly across environments without extensive rewrites.
  • Hybrid Control Plane: A single management interface to oversee both cloud and on-premises deployments.
  • Hybrid Experience: Effortlessly move workloads and data across clouds as business needs change.

In today’s dynamic digital landscape, multi-cloud strategies have become vital for organizations aiming to leverage the best of both cloud and on-premises environments. As enterprises navigate complex data-driven transformations, hybrid and multi-cloud models offer unmatched flexibility and resilience. Here’s a deep dive into why and how enterprises master multi-cloud deployments to enhance their data and AI initiatives.

The terms hybrid and multi-cloud are often used interchangeably. However, for clarity and consistency, we will use hybrid throughout the text. While multi-cloud generally refers to the use of multiple cloud providers, hybrid encompasses both cloud and on-premises integrations, as well as multi-cloud setups. This holistic approach better reflects the flexibility and strategic advantages discussed herein.

Why Hybrid and Multi-Cloud?

Adopting hybrid and multi-cloud models provides enterprises with flexibility, cost optimization, and a way to avoid vendor lock-in. In fact, recent research suggests that 93% of enterprises will adopt hybrid or multi-cloud models in the near future. This will allow companies to deploy workloads in environments where they are best placed, balancing on-prem and cloud advantages to maintain agility and meet evolving business demands​.

Enhancing Hybrid Cloud Deployments with Observability

When you move to hybrid and multi-cloud deployments, you’ll see many components, including containers, services, schedulers, and more - potentially deployed across many different infrastructures. Keeping such a complex platform under control and stable requires serious effort. And when you do need help from support to troubleshoot issues, most tickets require multiple interactions to gather the right context and information, which leads to frustration on the customer side.

Observability in a hybrid or multi-cloud setup ensures that your data and AI applications function optimally across environments. It helps prevent silos by offering unified views of performance and bottlenecks. This allows teams to proactively manage workloads and financial governance, and to optimize resources. Cloudera offers a comprehensive solution that encompasses all of these features.

Cloudera Observability helps organizations achieve efficiency and performance by enabling them to automatically analyze, manage, and improve deployments. Customers can maximize cost efficiency, enhance performance, and unlock intelligent insights with Cloudera Observability - a single pane of glass that provides visibility across hybrid and multi-cloud environments.

Image: Cloudera Observability Features

Image: Aspects of a True Hybrid and Multi-Cloud Platform

Partnering for Hybrid Success

A successful hybrid strategy often benefits from partnering with experts. Cloudera, with its wealth of experience in hybrid and multi-cloud deployments, data, analytics, and AI, can accelerate an organization’s time to value, ensuring scalability, security, and seamless integration across platforms. Through Cloudera Observability, governance, and workload optimization, enterprises can turn their multi-cloud journeys into powerful assets for growth and innovation.

Embrace multi-cloud for strategic data and AI initiatives, and position your enterprise to be resilient, innovative, and competitive in an ever-evolving digital world.

Image: Steps for a Successful Hybrid and Multi-Cloud Strategy

Image: Key Considerations for Workload Placement

A strong platform ensures that businesses can manage their multi-cloud environments effectively and with confidence, knowing that their data is secure, accessible, and compliant with regulations​. Cloudera is the gold standard for hybrid and multi-cloud data platforms, offering enterprises a seamless, secure, and scalable foundation to manage their data and workloads across diverse environments.

Key Considerations for Data & AI Workload Placement

When deploying data and AI workloads across environments, certain factors can influence where they reside:
1. Data Security: It’s essential to safeguard sensitive information across environments using secure protocols and ensuring compliance. Balancing security with performance in a multi-cloud setup is paramount. Some best practices include:

  • Zero Trust Architecture: Verify users explicitly, require multi-factor authentication, and enforce least-privileged access through fine-grained RBAC/ABAC policies.
  • Data Encryption: Organizations encrypt data both in transit and at rest to prevent unauthorized access (a generic sketch follows these considerations).
  • Logging and Monitoring: Robust monitoring helps enterprises identify potential threats and performance bottlenecks across clouds.

2. Cost Optimization: Select cost-effective platforms and manage resources efficiently to minimize infrastructure costs.

3. Scalability: Choose platforms that can dynamically scale to meet fluctuating workload demands.
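
As flagged under Data Encryption above, encryption at rest and in transit is normally handled at the platform level, for example through transparent data encryption and TLS. Purely as a generic illustration of the at-rest principle, here is a minimal Python sketch using the cryptography library's Fernet recipe; the record contents are hypothetical, and real deployments would source and rotate keys from a KMS or HSM rather than generating them inline.

```python
# Generic illustration of encrypting a sensitive payload before writing it to
# shared storage. Key handling here is deliberately simplified: production
# systems obtain and rotate keys from a KMS/HSM and rely on platform-level
# transparent data encryption plus TLS in transit.
from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()        # in practice, issued and rotated by a KMS/HSM
fernet = Fernet(key)

record = b'{"customer_id": "C-1029", "ssn": "***-**-6789"}'  # hypothetical record
ciphertext = fernet.encrypt(record)

with open("customer_record.enc", "wb") as fh:
    fh.write(ciphertext)

# Only holders of the key can recover the plaintext.
assert fernet.decrypt(ciphertext) == record
```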

]]>
https://www.cloudera.com/api/www/blog-feed?page=mastering-multi-cloud-with-cloudera-strategic-data-ai-deployments-across-clouds0
Key Takeaways from AWS re:Invent 2024https://www.cloudera.com/blog/partners/key-takeaways-from-aws-reinvent-2024https://www.cloudera.com/blog/partners/key-takeaways-from-aws-reinvent-2024Thu, 19 Dec 2024 14:00:00 UTC

Sustainable AI Will Be a Core Competency for Enterprises

Sustainability was a key theme throughout the conference. As companies think about where and how they deploy AI workloads, they must consider both the financial and environmental impacts of those decisions. Cloudera is proud to partner with AWS to help mutual customers deploy sustainable AI solutions by leveraging AWS Graviton processors, reducing consumption and costs while improving performance for AI workloads.

You can read more about the partnership and its implications here.

Distributed Data is Here to Stay

re:Invent is obviously a cloud conference, but most of the customers we spoke with had much more than just their AWS environment. Customers are dealing with data stores across multiple clouds and on-premises environments that, for a variety of reasons, may never move to a cloud. However, they still need to provide access to a unified view of that data for many use cases–from customer analytics to operational efficiency. 

A key consideration for customers who find themselves in this scenario is to simplify as much as possible: choose platforms that provide a consistent experience, leverage tools that span multiple environments, and invest in open standards, technologies, and processes to ensure maximum flexibility now and in the future. 

Trusted Data is Critical for AI Success

Data practitioners know that while the current hype is around  AI, the real work of generating positive outcomes from AI models starts with providing secure and governed access to trusted data. Inevitably, the majority of companies will find themselves managing distributed systems, often in multiple clouds and on-premises. Cloudera Shared Data Experience (SDX) remains the only solution that provides consistent security, governance, accessibility, and observability across data stores wherever they reside, while also providing the engines to process and analyze that data. 

This is a critical component of the AI workflow, and if I were going to bet on some of the reasons AI might not ultimately live up to the hype, my first guess would be that trusted data is harder to achieve than most organizations realize without a focus on metadata and a single source of security and governance.

Iceberg is the Winner

Apache Iceberg was everywhere this week, with big announcements from AWS, Cloudera, and others related to supporting our customers’ transition to the open table format. It’s been clear for most of 2024 that Iceberg is now the consensus choice for open table format, and the market is coalescing around it.

Cloudera’s investment in and support for open metadata standards, our true hybrid architecture, and our native Spark offering for Iceberg combine to make us the ideal Iceberg data lakehouse. Spark is the best engine for data processing, including ingestion and transformation, and while we provide many execution engines for data workloads on top of Iceberg, we also support data sharing via the Iceberg REST catalog specification, as well as connections to third-party engines. The result is true engine freedom, and reduced data copies and data movement.
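
As a minimal sketch of what that engine freedom looks like from Spark, the PySpark snippet below attaches an Iceberg catalog via the Iceberg REST catalog specification and runs a query against it. The catalog name, REST endpoint, and table are hypothetical, and the iceberg-spark-runtime jar matching your Spark version must be on the classpath.

```python
# Minimal PySpark sketch: attaching an Iceberg catalog over the Iceberg REST
# catalog specification and querying a table. Catalog name, endpoint, and
# table names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-rest-demo")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "rest")
    .config("spark.sql.catalog.lakehouse.uri",
            "https://catalog.internal.example.com/api/catalog")  # hypothetical endpoint
    .getOrCreate()
)

# Any REST-capable engine pointed at the same catalog sees the same tables,
# which is what reduces data copies and data movement.
spark.sql("""
    SELECT trade_date, SUM(notional) AS total_notional
    FROM lakehouse.finance.trades
    GROUP BY trade_date
    ORDER BY trade_date
""").show()
```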

It’s an exciting time for everyone in the Iceberg ecosystem. In past conferences, I have spent a lot more time introducing and educating customers on the benefits of Iceberg and open table formats. There was still a lack of awareness around the project. This year, it seemed the table format needed no introduction. We’re looking forward to working with AWS and others to build the best possible Iceberg lakehouse.

Flo Rida’s Still Got It

At re:Invent, we partnered with Mission Cloud to co-host IGNITE24 for our customers featuring Flo Rida, and it did not disappoint. The event embodied the collaborative spirit that defines our work with AWS and our partners. During the partner keynote at AWS re:Invent, Dr. Ruba Borno, Vice President, Global Specialists and Partners, compared the AWS partner ecosystem to a symphony. Just as a symphony requires diverse instruments to create a harmonious masterpiece, digital transformation relies on the orchestration of expertise and innovation from partners across the ecosystem. 

At IGNITE24, we celebrated the importance of working together to achieve something greater than the sum of its parts—a principle AWS echoed in its messages about the power of partnership, shared goals, and mutual success. 

Events like these remind us that while the work of transforming businesses with data is challenging, it’s also an opportunity to connect, collaborate, and celebrate our shared journey. 

AWS re:Invent is one of my favorite trade shows. It is one of the biggest technology conferences of the year and is an opportunity to have hundreds of conversations with customers and prospects, listen to their priorities, challenges, and hopes, and give them a Cloudera tote bag or a pair of orange sunglasses.

What follows is a collection of just a few things I learned and observed during my week in Las Vegas.

It’s All About AI

Over the past couple of years, AI has skyrocketed to the top of the “Peak of Inflated Expectations” in Gartner’s Hype Cycle and, not surprisingly, many of the sessions, demonstrations, and conversations at the conference were focused on leveraging AI. 

Cloudera partnered with NVIDIA on two sessions where we shared our AI Inference service, which uses NVIDIA NIM microservices to accelerate the development and deployment of AI models, and supports the scaling of those models. We also shared our Accelerators for Machine Learning Projects, or AMPs, which are templates for machine learning/AI  models that customers can deploy with the click of a button and start to customize, reducing the time it takes to get models into production. 

Finally, we hosted a hands-on workshop to walk attendees through a Retrieval-Augmented Generation (RAG) workflow within Cloudera AI to show how easy it is to deploy contextualized models based on organizational data. 
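
For readers who did not attend, the snippet below is a self-contained sketch of the core RAG pattern: embed documents, retrieve the most relevant ones for a query, and assemble a grounded prompt. It uses toy hash-based embeddings so it runs with nothing but NumPy; it is not the workshop's actual code.

```python
# Self-contained sketch of the core RAG pattern: embed documents, retrieve the
# most similar ones for a query, and assemble a grounded prompt for an LLM.
# The bag-of-words "embeddings" are a toy stand-in for a real embedding model.
import numpy as np

DOCS = [
    "Quarterly revenue grew 12% driven by subscription renewals.",
    "The incident on March 3 was caused by an expired TLS certificate.",
    "Employee travel policy: economy class for flights under six hours.",
]

def embed(text, dim=256):
    """Toy embedding: hash each token into a fixed-size, normalized vector."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

DOC_VECTORS = np.stack([embed(d) for d in DOCS])

def retrieve(query, top_k=2):
    """Return the top_k documents by cosine similarity to the query."""
    scores = DOC_VECTORS @ embed(query)   # unit vectors, so dot = cosine
    return [DOCS[i] for i in np.argsort(scores)[::-1][:top_k]]

query = "What caused the March outage?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this prompt would then be sent to a privately hosted LLM
```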

Our goal is to make it easy, fast, and safe for customers to get started with AI and see value from their projects, and that was a message that resonated with virtually everyone who stopped by our sessions and our booth.

]]>
https://www.cloudera.com/api/www/blog-feed?page=key-takeaways-from-aws-reinvent-20240
Women Leaders in Technology: A Conversation with Cloudera CMO, Mary Wellshttps://www.cloudera.com/blog/culture/women-leaders-in-technology-a-conversation-with-cloudera-cmo-mary-wellshttps://www.cloudera.com/blog/culture/women-leaders-in-technology-a-conversation-with-cloudera-cmo-mary-wellsWed, 18 Dec 2024 16:00:00 UTC

It’s no secret that women have long been underrepresented in the tech space. 

This issue demands our attention, as it not only limits opportunities for women to work, grow, and thrive but also hinders companies in their pursuit of top talent. Although global organizations, policies, and programs to address this issue have gained momentum in recent years, there’s still work left to do. According to one recent study, women held just 27.6% of tech jobs in 2022, and a separate study showed that women identifying as Asian or Pacific Islander make up just 7% of the IT workforce, while Black and Hispanic women account for 3% and 2%, respectively. The same study found even less representation in senior positions: women made up 33.8% of entry-level tech jobs but just 23% of senior-level roles.

Simply put, we can and must do better.

To amplify and support women’s voices, Cloudera has introduced the Women Leaders in Technology (WLIT) initiative. Recently, I had the opportunity to sit down with Mary Wells, Chief Marketing Officer at Cloudera and executive sponsor of WLIT, for a conversation about why Cloudera launched this forum, her own experience working in male-dominated industries, and her advice for the next generation.

Mary, your career spans over 25 years across some of the biggest names in the tech industry. Can you share more about your background?

I’ve been fortunate to work with some of the most transformative brands in the technology industry. Before joining Cloudera in 2023, I served as CMO of ASG Technologies, an enterprise management software provider, where I helped shape its growth and eventual acquisition by Rocket Software in 2021. Prior to that, I held senior marketing leadership roles at trailblazing companies like HPE, Axeda, Kalido, BEA Systems, BroadVision, and Sybase—each of which has played a pivotal role in shaping the tech landscape.

What stands out about these organizations isn’t just their innovative contributions to the industry but their forward-thinking approaches to equity, diversity, and inclusion. That environment not only enabled me to thrive but also solidified my belief in the importance of fostering a culture where all voices are heard.

I earned my Bachelor of Science in Marketing and Organizational Behavior from Boston College, which laid the foundation for my career. Outside of work, I’m passionate about giving back, serving on boards focused on animal welfare and elder advocacy. These experiences continually remind me that true leadership is rooted in service—a principle I strive to uphold in every decision I make.

Can you tell us more about Women Leaders in Technology (WLIT)—what inspired its creation, and what are your aspirations for the impact it can have on Cloudera’s culture and the tech industry as a whole?

WLIT is about creating meaningful connections and opportunities for women and allies in tech leadership through networking and collaboration, and it’s about inspiring the next generation of women and girls to see a place for themselves in the industry. Beyond that, it’s a platform for advocating policies and programs that drive real change, helping to build a stronger, more diverse workforce. By nurturing and leading these conversations, WLIT aims to serve as a model for other organizations, encouraging them to embrace similar initiatives to address these critical issues. What makes WLIT particularly unique is that it is supplemental to the important work of our internal-facing employee resource group, Women+. WLIT is not confined to Cloudera or limited to an internal audience; it is an industry-wide initiative designed to bring together people across the tech sector, fostering a collective commitment to inclusivity and progress.

Having launched similar initiatives at HPE and ASG Technologies, I’ve seen firsthand the transformative power they can have. I’ve seen women lift each other up in board rooms, in leadership rooms, and in everyday meetings. With the imperative on us all to lift up underrepresented voices, an encouraging word or helpful piece of feedback can help supercharge a career and give women the confidence to push themselves further.

One moment that stands out to me happened during a meeting with senior leaders at a previous company. When someone questioned the value of such efforts, a male colleague spoke up, saying, “Look around the room—this is why it matters.” His words highlighted the glaring lack of diversity in leadership and underscored how important allyship is to advance these causes. Moments like that remind me that change comes not just from the efforts of underrepresented groups but from allies who recognize the need for progress and actively support it.

At Cloudera, I’m proud to say we already have a strong foundation of diversity, equity, and inclusion. Every executive leader sponsors initiatives designed to elevate employee voices and ensure representation. WLIT builds on this work, amplifying our commitment to creating an environment where everyone can thrive.

For the tech industry as a whole, I hope to see more women not just entering the field but advancing into leadership roles. Additionally, I’ve seen how initiatives like WLIT can lead to real business opportunities—executives have reached out to me directly because of the visibility WLIT provides. Elevating female voices and fostering inclusivity isn’t just about representation; it’s also a business imperative that opens doors to innovation, partnerships, and growth.

Looking back at the first WLIT event Cloudera hosted, what made it memorable, and how did it set the tone for the initiative’s mission?

Cloudera WLIT held its first event at our EVOLVE24 conference in New York, where we came together for a panel of women leaders from across the industry. The panel included Manasi Vartak of Cloudera, Nichola Hammerton of Deutsche Bank, and Melissa Dougherty of AWS. The moderator of the discussion was Zoya Hasan of Forbes, who covers young leaders and the Forbes 30 Under 30 lists, including U30 U.S., Europe, and Local. She also co-authors a weekly newsletter and writes features on young founders. The panel was an in-depth and meaningful discussion about the real challenges women face entering the technology sector, how they can overcome them, and the advice they have for their peers and for women looking to enter the field.

Mentorship emerged as another pivotal theme among the panelists, with each crediting a mentor for shaping their career paths. These mentors varied widely in background and timing, appearing at different stages of the panelists’ journeys. Yet, one common thread stood out: every mentor created opportunities for learning in a trusted and empowering environment. They also connected mentees with other people, broadening networks to help build a career path in a competitive workforce. For me, this network of teachers, peers, and colleagues has enriched my career path tremendously, and it is one of the key values I see a program like WLIT providing for others.

The event was inspiring, informative, and a celebration of the successful careers women can have in tech. I’m looking forward to more events, forums, and conversations coming up in 2025.

Throughout your career, mentorship has clearly played a pivotal role. Can you share a story about a mentor who had an impact on you?

I started my career working on the HR side of a company in their local regional office. I had a double major in HR and marketing, but I was just starting out and still figuring out where I wanted to go. The regional vice president of sales had an office nearby, and we’d often chat in the kitchen. At first, our roles didn’t really overlap, but he found out I went to Boston College and that I was one of the few women in the school of management at the time. That seemed to shift his perspective of me—he realized I had established myself even though I was a junior employee.

Soon after, he started asking me to help with small projects outside of HR. I remember him asking, “Can you help with this database?” I said, “I don’t know, but I’ll figure it out.” That became a pattern—he’d hand me something new, and I’d find a way to make it work. One day, he asked if I knew what a 10-K report was. I deadpanned, “I think it’s a road race.” He started explaining, but I cut him off and said, “I know what a 10-K report is.” That little bit of humor and confidence helped solidify our relationship, and he became my first and best mentor.

He believed in me in ways that gave me the confidence to take on challenges. I remember him asking me to present an insurance sales play to the entire sales team—this was when I had been in marketing for what felt like all of three seconds! But with his support, I felt validated and ready to stretch far beyond my comfort zone.

He saw me for my potential and abilities, and that made all the difference. I feel incredibly fortunate to have had that kind of mentorship so early in my career—it set the tone for how I wanted to lead and mentor others in the years that followed.

For women - and allies - aspiring to build careers in technology, what is the one piece of advice you believe can help them navigate challenges and seize opportunities in the industry?

“Do it afraid.” Whatever “it” is for you—whether it’s speaking up in a meeting, applying for a role you’re not sure you’re qualified for, or taking on a new challenge—don’t let doubt hold you back. Everyone faces moments of self-doubt, and imposter syndrome is something most of us have felt at some point in our careers. Please remember: those feelings don’t define you or your abilities.

To learn more about Cloudera’s WLIT initiative, join the LinkedIn group to get involved, and visit our website.

]]>
https://www.cloudera.com/api/www/blog-feed?page=women-leaders-in-technology-a-conversation-with-cloudera-cmo-mary-wells0
Telco Enterprise Data Platforms: Key Success Factors in Building for an AI Futurehttps://www.cloudera.com/blog/business/telco-enterprise-data-platforms-key-success-factors-in-building-for-an-ai-futurehttps://www.cloudera.com/blog/business/telco-enterprise-data-platforms-key-success-factors-in-building-for-an-ai-futureTue, 17 Dec 2024 17:00:00 UTC

Since 5G networks began rolling out commercially in 2019, telecom carriers have faced a wide range of new challenges: managing high-velocity workloads, reducing infrastructure costs, and adopting AI and automation. Because data management is a key variable for overcoming these challenges, carriers are turning to hybrid cloud solutions, which provide the flexibility and scalability needed to adapt to the evolving landscape 5G enables. 

The introduction of these faster, more powerful networks has triggered an explosion of data, which needs to be processed in real time to meet customer demands. Traditional data architectures struggle to handle these workloads, and without a robust, scalable hybrid data platform, the risk of falling behind is real. The pressure to manage increasing data volumes while maintaining performance and reliability makes modernization an urgent priority. Carriers must balance performance needs with the growing complexity of their data environments, especially as 5G applications continue to evolve.

Cost is also a constant concern, especially as carriers work to scale their infrastructure to support 5G networks. The challenge lies in finding a solution that enables growth while keeping infrastructure and operational expenses under control. The vast amounts of data generated by 5G networks necessitate a robust, scalable data platform, and the wrong approach can lead to spiraling costs.

As more data is processed, carriers increasingly need to adopt hybrid cloud architectures to balance different workload demands. High-velocity workloads like network data are best managed on-premises, where operators have more control and can optimize costs. Meanwhile, public cloud environments provide the flexibility and scalability needed for dynamic workloads, such as customer applications or AI-driven services.

However, the complexity of managing workloads across different environments can be daunting. Carriers need tools that enable them to monitor performance, optimize workload distribution, and ensure data governance across both on-premises and cloud environments. This is where hybrid data platforms like Cloudera come into play, providing carriers with the flexibility to seamlessly shift workloads between environments and optimize costs while maintaining high performance.

As with many industries, the future of telecommunications lies in AI and automation. From customer service to network management, AI-driven automation will transform the way carriers run their businesses. However, implementing AI models requires significant computing power and real-time data processing, which cannot be achieved without modern, scalable data platforms.

Telecom Carrier Vi Found Success with Cloudera

Just over six years ago, Vodafone India and Idea Cellular merged to form Vi, now one of the largest carriers in the world with over 200 million subscribers. As India continues to be a hub for innovation, Vi is embracing the challenges and opportunities of 5G, with massive data volumes and increasing demands for automation.

Vi considered various data platform options, including public and private cloud as well as open source and proprietary solutions, but in the end, Vi decided to extend and grow its relationship with Cloudera. The first two reasons were cost and scale. When you have high-volume, high-velocity data, you need a reliable, robust environment to triage, treat, and distribute it. For Vi, as for most telecom carriers, very large, high-velocity, predictable workloads – such as network data – make sense to run on-prem before serving data applications downstream, either on-prem or in public cloud environments.

Vi's modernization journey has paid off, saving $20-$30 million in infrastructure costs and reducing support tickets by 80%. The hybrid cloud architecture also positions Vi for seamless future deployments and AI/ML workloads. By adopting hybrid cloud, streaming analytics, and data fabric, Vi is laying a strong foundation for future transformations and scaling its AI initiatives, and the potential for growth is immense.

The 5G era presents telecom carriers with both significant challenges and opportunities, each of which can be met effectively with a firm grounding in data. Telecom companies that adopt modern data architectures that are flexible, scalable, and cost-effective are better positioned to identify opportunities early, counter service degradations before they manifest themselves, and automate processes for maximum efficiency.

Hybrid data cloud platforms like Cloudera provide carriers with the ability to handle high-velocity workloads on-premises and leverage the scalability of the public cloud, enabling telecom companies to manage their data efficiently while laying the groundwork for future innovations.

]]>
https://www.cloudera.com/api/www/blog-feed?page=telco-enterprise-data-platforms-key-success-factors-in-building-for-an-ai-future0
Celebrating a Busy Week of Giving at Clouderahttps://www.cloudera.com/blog/culture/celebrating-a-busy-week-of-giving-at-clouderahttps://www.cloudera.com/blog/culture/celebrating-a-busy-week-of-giving-at-clouderaTue, 17 Dec 2024 16:00:00 UTC

Now let’s turn to what Clouderans in the Americas did to celebrate Week of Giving. 

“Volunteers from Cloudera’s Santa Clara team spent the Tuesday morning of Week of Giving helping provide groceries to over 540 local families, working with Second Harvest of Silicon Valley. Supporting SHSV is an ongoing effort for Santa Clara-based Clouderans.” - Amanda Allan, Manager, Cloudera Executive Briefing Program

"The Costa Rica office had the opportunity to spend a day at Fundación Génesis Costa Rica, an organization that provides vital support to children and families in need. It was a full day, packed with tasks like cleaning the facilities, decorating the social dining hall for Christmas, helping in the kitchen, and, most memorably, playing games and connecting with the kids. Stepping away from our usual routines as technologists to spend time with a community we don’t often interact with reminded us of how impactful volunteering can be. Experiences like this show us how important it is to give back and take time to make a difference.” - Natalia Aviles, Data Analyst

Next, let’s turn to our EMEA-based teams for a look at how they spent the week. 

"I had the opportunity to participate in events including writing Christmas cards for clients of Friendly Call Cork, a service that helps tackle isolation and loneliness among older people, and helping wrap Christmas boxes for the Crann Centre, which supports people living with neuro-physical disabilities. Team members from our Cork office also contributed to a food drive for local charity Penny Dinners which provides hot meals for those in need in the city. I am very grateful for the opportunities Cloudera provides to support initiatives like these during our Week of Giving especially at this time of year. It’s a great reminder of just how lucky we all are as many people find this a very difficult time of year and need our support!” - James Ahern, HR Systems Analyst

“We were thrilled with the outcome of our efforts supporting Friendly Call Cork, having produced over 150 written Christmas cards for their clients, and we even received incredibly kind praise directly from the nonprofit partner, thanking us for our efforts. It would not have been possible without the support of our Cloudera Cares Ambassadors, our fellow teammates, and everyone at Cloudera.” - Laura O’Connor, HR Business Partner

“For members of Cloudera’s Budapest office, Week of Giving meant spending a day at the REX Dog Home, an animal shelter that is home to animals of all kinds, from goats to cats and even horses. The Cloudera team spent the day helping to transform a massive heap of soil into a smooth, inviting surface for the dogs. Afterward, we spent time with the animals and helped fix and clean their enclosures as well. Big thanks to the Clouderans who brought their best animal-loving energy and made this day such a success. Whether you came for the cuddles, the gardening… or the gyros, we hope you left with a smile. The REX animals were definitely happy to meet us!” - Glória Benkő, Software Engineer II

Now in its third year, our Week of Giving has seen Clouderans from all over the globe dive into so many heartwarming and impactful projects, and we could not be prouder. Thank you to everyone who participated, and we look forward to seeing what our teams have in store for next year.

Check out the video below with highlights from this year’s Week of Giving and learn more about how Cloudera is working to support the communities that we live in. 

In November, Cloudera celebrated its third annual Week of Giving, one of the company’s most cherished initiatives. Each year, this event is a time for Clouderans across the globe to join in and spend time giving back to the causes they care about. From partnering with a nonprofit organization to gathering donations or volunteering their time, Clouderans rolled up their sleeves to give back to their communities. 

This year, we had volunteers spanning nearly every continent and making a difference in their communities, from India and Australia to Ireland, Costa Rica, the U.S. and so many more. 

Don’t just take our word for it, though. Read and learn about what some of our team members chose to do to celebrate Week of Giving. 

First, let’s hear from a few of our team members in APAC.

“Team members in Cloudera’s Bangalore office spent time during Week of Giving creating notebooks, completely from scratch, using recycled materials, and kits to help equip underprivileged students with the necessary school supplies. I realized that doing things with love can greatly impact people's lives.” – Philemon Johnson, Senior Software Engineer

“Cloudera’s Sydney office spent time throughout the week supporting Foodbank NSW & ACT in the fight against hunger across Australia. We volunteered labeling and packaging a variety of essential grocery items, which will be distributed through our community partners to help those in need. Given their high cost, providing these products to those in need is crucial in supporting local communities. It was a rewarding opportunity to come together, learn more about food rescue efforts, and make a tangible impact on regional communities throughout NSW & ACT. We’re proud to say we’ve all officially joined the ranks of #HungerFighters!” - Sarah Robinson, Talent Acquisition Lead APAC

]]>
https://www.cloudera.com/api/www/blog-feed?page=celebrating-a-busy-week-of-giving-at-cloudera0
Cloudera’s Take: What’s in Store for Data and AI in 2025https://www.cloudera.com/blog/business/clouderas-take-whats-in-store-for-data-and-ai-in-2025https://www.cloudera.com/blog/business/clouderas-take-whats-in-store-for-data-and-ai-in-2025Mon, 16 Dec 2024 17:00:00 UTC

In the last year, we’ve seen the explosion of AI in the enterprise, leaving organizations to consider the infrastructure and processes for AI to successfully—and securely—deploy across an organization. As we head into 2025, it’s clear that next year will be just as exciting as past years.  

Here, Cloudera experts share their insights on what to expect in data and AI for the enterprise in 2025.   

Bridging the Gap Between Business and IT  

Bridging the gap between business and IT teams is not a new mission for enterprises. However, the onus has historically fallen on business leaders to adopt more technical skills and proficiency. Cloudera CEO Charles Sansbury predicts a reverse trend in 2025 and sees data scientists and IT teams stepping into a more business-conscious role to bridge this gap: 

“Business leaders are becoming more ‘savvy’: the proliferation of user-friendly AI tools like assistants and copilots has made it possible for business professionals to leverage analytics to inform better decision-making. This trend is ongoing, and I expect it will continue into 2025. However, I also expect a new, reverse trend to take shape: IT teams and data scientists will start to glean even greater business acumen to plug into the broader needs of the enterprise. 

For too long, IT and business teams have been siloed, with business users making requests of the IT team without understanding the scope of the technology needed, and IT teams producing insights without knowing what business problem they’re meant to solve. In 2025, we will see that gap start to close, with the most advanced enterprises arming themselves with an entire staff — from the marketing and finance departments to the IT team and data scientists, all the way to the C-Suite — leveraging data, analytics, and AI to accelerate growth.” 

The Shift to Private LLMs and the Ripple Effects  

AI is only as powerful as the data behind it. As such, Remus Lim, Senior Vice President of APAC and Japan at Cloudera, believes enterprises will grow to favor private LLMs to spur their own AI innovation:  

“With enterprise AI innovation taking center stage in the year ahead, businesses will eschew public large language models (LLMs) in favor of enterprise-grade or private LLMs that can deliver accurate insights informed by the organizational context. 

“As more businesses deploy enterprise-grade LLMs, they will require the support of GPUs for faster performance over traditional CPUs, and robust data governance systems with improved security and privacy. In the same vein, businesses will also ramp up their use of retrieval-augmented generation (RAG) in a bid to transform generic LLMs into industry-specific or organization-specific data repositories that are more accurate and reliable for end users working in field support, HR, or supply chain.” 

Hybrid Cloud Alone Will Be Insufficient for GenAI

2024 was the pilot year of GenAI, and 2025 will see businesses seeking to advance to full production and scale with GenAI deployments. As such, Lim believes hybrid cloud isn’t enough: 

“With the growth in hybrid environments, companies’ data footprints span on-premises systems, mainframes, the public cloud, and the edge. Businesses need the capability to bring GenAI models to wherever the data resides, and to seamlessly move data and workloads across the business, to derive valuable insights and address organizational needs. With so much data being fed into AI model services, security and governance will also come to the fore. 

As businesses turn to running AI models and applications privately, whether on premises or in public clouds, there will be a greater emphasis on hybrid data management platforms that integrate both on-premises and cloud data sources for greater flexibility and wider access to diverse datasets while maintaining control, security, and governance over model endpoints and operations.” 

Agentic AI’s Big Step Forward

The training wheels are coming off with AI. Enterprises witnessed improved productivity and efficiency with AI-based solutions. But IT experts agree the technology has great potential for more, and Chris Royles, Field CTO of EMEA at Cloudera, foresees that coming to life through agentic AI in 2025: 

“Currently, AI still falls short of replicating human-level decision-making, but next year that is set to change with Agentic AI.  

Agentic AI is set to drive a wave of innovation, transforming real-time problem-solving and decision-making. Expect these AI agents to optimize tasks with ant-like efficiency, navigating challenges quickly and adapting in real time. This will see businesses building event-driven architectures that allow AI to react instantly to real-life events, revolutionizing industries like telecom and logistics.” 

Research Will Fuel the Development of Legislation for AI Guardrails  

Safeguarding for responsible, ethical AI use was a bubbling issue in 2024 as the technology advanced at a rapid pace. This will remain a priority, and Manasi Vartak, Chief AI Architect at Cloudera, foresees a larger focus on academic research for more informed GenAI policy in the coming year: 

“While AI regulation is certainly necessary, it must be based on a deep understanding of how GenAI models and applications function. I also expect there to be an increase in funding for academic research into GenAI, including think tanks and labs to address what AI safety means and how it can be implemented, which will likely come from an increase in government partnerships with academic institutions. 

Academic research plays a critical role in understanding AI and the necessary protocols, and government partnerships with academic institutions are essential to generate the knowledge and influence needed to establish effective safeguards and guide responsible regulation.” 

Move Aside AI: Here Comes the Quantum Computing Revolution

The enterprise’s focus of late has been steadfastly on AI. Royles believes quantum computing will become the next “tech arms race” in 2025:  

“Quantum computing is set to overshadow AI as the next major technological revolution. Rapid development is underway, with organizations investing heavily in next-generation data centers equipped to provide ultra-cold temperatures, specialized infrastructure, and massive power requirements needed to support quantum systems. 

The potential value of quantum breakthroughs is immeasurable, from accelerating drug discovery and genetic reprogramming in healthcare to pushing energy closer to fusion, potentially rendering traditional power sources obsolete. As quantum emerges as a game-changer, this shift will trigger a race as companies rush to harness quantum’s capabilities, using it to enhance AI capabilities and gain a competitive edge.” 

2025 is sure to be filled with exciting changes and developments. To dive deeper into these and other predictions, join us for our webinar on January 21.

]]>
https://www.cloudera.com/api/www/blog-feed?page=clouderas-take-whats-in-store-for-data-and-ai-in-20250
Cloudera Commits to CISA’s “Secure by Design” Pledge, Strengthening Security for Our Customershttps://www.cloudera.com/blog/business/cloudera-commits-to-cisas-secure-by-design-pledge-strengthening-security-for-our-customershttps://www.cloudera.com/blog/business/cloudera-commits-to-cisas-secure-by-design-pledge-strengthening-security-for-our-customersThu, 12 Dec 2024 17:00:00 UTC

We’re proud to announce that Cloudera signed the Cybersecurity and Infrastructure Security Agency (CISA) “Secure by Design” pledge, joining a network of industry leaders dedicated to embedding security at every stage of the product lifecycle. To be good stewards of our customers’ data, it is critical for security to be a fundamental component of every product and service we offer–not just an afterthought. This commitment aligns with our ongoing mission to empower organizations to transform their data into valuable insights in the most secure and compliant way possible.

What is the “Secure by Design” Pledge?

The CISA “Secure by Design” pledge encourages technology providers to prioritize security throughout the development process rather than focusing solely on post-production fixes. It is a proactive approach that requires security to be integrated from the initial concept of a product through every phase of design, testing, deployment, and operation. By signing this pledge, Cloudera solidifies its role as a leader in cybersecurity, ensuring that every product feature and service capability meets strict security standards aligned with CISA’s best practices for resilience against cyber threats. Cloudera has pledged to build security protocols directly into our development pipeline, making sure security is robust and ready to defend against both known and unknown threats.

Why is “Secure by Design” so Important?

Protecting sensitive data requires vigilance and an evolving approach to security. Traditional security approaches focus on patching vulnerabilities after a product is deployed in production. 

While these responses remain necessary, the “Secure by Design” framework focuses on prevention, embedding security into the DNA of products from day one. The goal is to take a proactive stance against security threats, preventing potential vulnerabilities before they surface and making Cloudera technology more secure and resilient for our customers, designed from the outset to resist attacks and protect sensitive data.

How Do Cloudera Customers Benefit from Our “Secure by Design” Commitment?

With over 25 exabytes of data under management, Cloudera is committed to being good stewards of our customers’ data by delivering solutions that meet and often surpass modern security standards. This pledge reflects that commitment. Here’s what this means for our customers:

  1. Enhanced Proactive Security Measures
    Every product, feature, and update from Cloudera is built with a security-first mindset. From the beginning stages of development to the deployment of a product, security controls are integrated to protect data, ensuring that vulnerabilities are minimized and managed as part of the core functionality. Customers can rely on Cloudera’s commitment to building security measures into every product, ensuring that potential vulnerabilities are addressed early and thoroughly.
  2. Continuous Security and Compliance Monitoring
    As part of our ongoing security management strategy, Cloudera regularly conducts internal and external audits, risk assessments, and continuous security monitoring to ensure compliance with industry standards and regulations such as GDPR, PCI DSS, ISO27001, and more, easing the compliance burden for our customers and supporting risk mitigation. Through this rigorous process, we can address emerging security threats swiftly, keeping our customers’ environments secure.
  3. Collaboration with Pledge Members
    Cloudera will attend regular technical exchange meetings with other companies who signed the “Secure by Design” pledge. These meetings promote collaboration and best practices sharing with a community of technology providers who are committed to the security of their products and services, and we can leverage our collective expertise as we implement the pledge.
  4. Support for a Shared Responsibility Model
    Security is a shared responsibility between Cloudera and our customers, particularly in hybrid and multi-cloud environments. By building a solid security foundation with the “Secure by Design” approach, Cloudera empowers our customers to operate securely across any infrastructure or data store. We provide tools, insights, and resources that enable our customers to make informed security decisions and achieve the right configurations for their specific environments.
  5. Access to Industry-Leading Security Expertise
    Our teams are dedicated to working continuously to support customers with best-in-class security architectures and capabilities tailored to individual needs. With our dedicated Trust Center, customers can explore resources like our shared responsibility model, risk assessments, vulnerability management, and comprehensive documentation around security practices. This commitment extends to ongoing customer engagement, ensuring Cloudera remains a trusted partner through every stage of the security lifecycle.
  6. Building Resilient Data Solutions for a Changing World
    With our “Secure by Design” pledge, Cloudera reinforces its dedication to protecting our customers from potential disruptions. Our security features, built into every layer of our data and analytics solutions, provide the flexibility and resilience to meet the demands of a data-driven world. By reducing vulnerabilities upfront, we’re helping our customers focus on what matters most - transforming data into actionable insights - without compromising on security or compliance.

What This Means for the Future of Cloudera and Our Customers

This commitment to the “Secure by Design” pledge is more than just a formal obligation. It’s a testament to Cloudera’s core values of security, trust, and customer success. As cyber threats continue to evolve, so will Cloudera’s security practices, staying at the forefront of secure technology and setting new standards in the industry. Our customers can expect a data platform that safeguards their most important asset while enabling innovation and insights that lead to better business outcomes.

Learn More

To discover how Cloudera’s commitment to security can empower your organization, visit our Trust Center. Here, you’ll find valuable resources on Cloudera’s secure architecture, compliance standards, and risk management practices. Or, if you’re interested in trying Cloudera for yourself, check out our 5-day trial on AWS.  

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-commits-to-cisas-secure-by-design-pledge-strengthening-security-for-our-customers0
Scaling AI Solutions with Cloudera: A Deep Dive into AI Inference and Solution Patternshttps://www.cloudera.com/blog/technical/scaling-ai-solutions-with-cloudera-a-deep-dive-into-ai-inference-and-solution-patternshttps://www.cloudera.com/blog/technical/scaling-ai-solutions-with-cloudera-a-deep-dive-into-ai-inference-and-solution-patternsMon, 09 Dec 2024 19:27:00 UTC

Accelerators for Faster Deployment

Cloudera provides pre-built accelerators (AMPs) and ReadyFlows to speed up AI application deployment:

  • Accelerators for ML Projects (AMPs): To quickly build a chatbot, teams can leverage the DocGenius AI AMP, which utilizes Cloudera’s AI Inference service with Retrieval-Augmented Generation (RAG). In addition to this, many other great AMPs are available, allowing teams to customize applications across industries with minimal setup.
  • ReadyFlows (NiFi): Cloudera’s ReadyFlows are pre-designed data pipelines for various use cases, reducing complexity in data ingestion and transformation. These tools allow businesses to focus on building impactful AI solutions without needing extensive custom data engineering.

Also, Cloudera’s Professional Services team brings expertise in tailored AI deployments, helping customers address their unique challenges, from pilot projects to full-scale production. By partnering with Cloudera’s experts, organizations gain access to proven methodologies and best practices that ensure AI implementations align with business objectives.

Conclusion

With Cloudera’s AI Inference service and scalable solution patterns, organizations can confidently implement AI applications that are production-ready, secure, and seamlessly integrated with enterprise operations, whether they’re building chatbots, virtual assistants, or complex agentic workflows.

For those eager to accelerate their AI journey, we recently shared these insights at ClouderaNOW, highlighting AI Solution Patterns and demonstrating their impact on real-world applications. This session, available on-demand, offers a deeper look at how organizations can leverage Cloudera's platform to accelerate their AI journey and build scalable, impactful AI applications.

As organizations increasingly integrate AI into day-to-day operations, scaling AI solutions effectively becomes essential yet challenging. Many enterprises encounter bottlenecks related to data quality, model deployment, and infrastructure requirements that hinder scaling efforts. Cloudera tackles these challenges with the AI Inference service and tailored Solution Patterns developed by Cloudera’s Professional Services, empowering organizations to operationalize AI at scale across industries.

Effortless Model Deployment with Cloudera AI Inference

Cloudera AI Inference service offers a powerful, production-grade environment for deploying AI models at scale. Designed to handle the demands of real-time applications, this service supports a wide range of models, from traditional predictive models to advanced generative AI (GenAI), such as large language models (LLMs) and embedding models. Its architecture ensures low-latency, high-availability deployments, making it ideal for enterprise-grade applications.

Key Features:

  • Model Hub Integration: Import top-performing models from different sources into Cloudera’s Model Registry. This functionality allows data scientists to deploy models with minimal setup, significantly reducing time to production.
  • End-to-End Deployment: The Cloudera Model Registry integration simplifies model lifecycle management, allowing users to deploy models directly from the registry with minimal configuration.
  • Flexible APIs: With support for the Open Inference Protocol and OpenAI API standards, users can deploy models for diverse AI tasks, including language generation and predictive analytics (a minimal client sketch follows this list).
  • Autoscaling & Resource Optimization: The platform dynamically adjusts resources with autoscaling based on Requests per Second (RPS) or concurrency metrics, ensuring efficient handling of peak loads.
  • Canary Deployment: For smoother rollouts, Cloudera AI Inference supports canary deployments, where a new model version can be tested on a subset of traffic before full rollout, ensuring stability.
  • Monitoring and Logging: In-built logging and monitoring tools offer insights into model performance, making it easy to troubleshoot and optimize for production environments.
  • Edge and Hybrid Deployments: With Cloudera AI Inference, enterprises have the flexibility to deploy models in hybrid and edge environments, meeting regulatory requirements while reducing latency for critical applications in manufacturing, retail, and logistics.
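To make the flexible API support above concrete, here is a minimal, hypothetical sketch of calling a model deployed behind an OpenAI-compatible endpoint from Python. The endpoint URL, model name, and token environment variable are placeholders for illustration, not actual Cloudera values; consult the service documentation for the real connection details.

    # Hypothetical sketch: endpoint, model name, and token variable are placeholders.
    import os
    from openai import OpenAI  # pip install openai

    client = OpenAI(
        base_url="https://your-ai-inference-endpoint.example.com/v1",  # placeholder endpoint
        api_key=os.environ["INFERENCE_TOKEN"],                         # placeholder auth token
    )

    response = client.chat.completions.create(
        model="your-deployed-model",  # placeholder name of a deployed model
        messages=[{"role": "user", "content": "Summarize this quarter's support tickets in three bullet points."}],
        max_tokens=256,
        temperature=0.2,
    )
    print(response.choices[0].message.content)

Because the endpoint speaks the OpenAI API, client libraries and frameworks that already understand that protocol can typically be pointed at a deployed model by changing only the base URL and credentials.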

Scaling AI with Proven Solution Patterns

While deploying a model is critical, true operationalization of AI goes beyond deployment. Solution Patterns from Cloudera’s Professional Services provide a blueprint for scaling AI by encompassing all aspects of the AI lifecycle, from data engineering and model deployment to real-time inference and monitoring. These solution patterns serve as best-practice frameworks, enabling organizations to scale AI initiatives effectively.

GenAI Solution Pattern

Cloudera’s platform provides a strong foundation for GenAI applications, supporting everything from secure hosting to end-to-end AI workflows. Here are three core advantages of deploying GenAI on Cloudera:

  • Data Privacy and Compliance: Cloudera enables private and secure hosting within your own environment, ensuring data privacy and compliance, which is crucial for sensitive industries like healthcare, finance, and government.
  • Open and Flexible Platform: With Cloudera’s open architecture, you can leverage the latest open-source models, avoiding lock-in to proprietary frameworks. This flexibility allows you to select the best models for your specific use cases.
  • End-to-End Data and AI Platform: Cloudera integrates the full AI pipeline—from data engineering and model deployment to real-time inference—making it easy to deploy scalable, production-ready applications.

Whether you’re building a virtual assistant or content generator, Cloudera ensures your GenAI apps are secure, scalable, and adaptable to evolving data and business needs.

  • Knowledge Base Integration: Cloudera DataFlow, powered by NiFi, enables seamless data ingestion from Amazon S3 to Pinecone, where data is transformed into vector embeddings. This setup creates a robust knowledge base, allowing for fast, searchable insights in Retrieval-Augmented Generation (RAG) applications. By automating this data flow, NiFi ensures that relevant information is available in real-time, giving dispatchers immediate, accurate responses to queries and enhancing operational decision-making.
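As a rough illustration of what this flow automates, the hedged Python sketch below reads a document from Amazon S3, generates a vector embedding, and upserts it into a Pinecone index. The bucket, key, index, and embedding model names are assumptions for illustration; in practice, NiFi performs these steps as a managed, continuously running flow rather than ad hoc code.

    # Illustrative only: bucket, key, and index names are placeholders.
    import boto3
    from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
    from pinecone import Pinecone                          # pip install pinecone

    # Read a source document from S3
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket="my-logistics-docs", Key="maintenance/truck-42.txt")["Body"]
    doc = body.read().decode("utf-8")

    # Turn the document into a vector embedding
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    vector = embedder.encode(doc).tolist()

    # Upsert into the vector index that backs the RAG knowledge base
    pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
    index = pc.Index("logistics-knowledge-base")
    index.upsert(vectors=[{
        "id": "truck-42",
        "values": vector,
        "metadata": {"source": "s3://my-logistics-docs/maintenance/truck-42.txt", "text": doc[:1000]},
    }])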

GenAI Use Case Spotlight: Smart Logistics Assistant

Using a logistics AI assistant as an example, we can examine the Retrieval-Augmented Generation (RAG) approach, which enriches model responses with real-time data. In this case, the logistics AI assistant accesses data on truck maintenance and shipment timelines, enhancing decision-making for dispatchers and optimizing fleet schedules:

  • RAG Architecture: User prompts are supplemented with additional context from the knowledge base and external lookups. This enriched query is then processed by the Meta Llama 3 model, deployed through Cloudera AI Inference, to provide contextual responses that aid logistics management (see the sketch below).
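To sketch that request path, the hypothetical snippet below retrieves supporting context from the vector index and builds an enriched prompt; the index name, metadata fields, and question are placeholders. The enriched prompt is then sent to the deployed Llama 3 endpoint in the same way as the OpenAI-compatible client example shown earlier in this post.

    # Hedged sketch of prompt enrichment for RAG; all names are placeholders.
    from pinecone import Pinecone
    from sentence_transformers import SentenceTransformer

    question = "Which trucks are due for maintenance before Friday's shipments?"

    # Retrieve the most relevant knowledge-base entries for the question
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    index = Pinecone(api_key="YOUR_PINECONE_API_KEY").Index("logistics-knowledge-base")
    results = index.query(vector=embedder.encode(question).tolist(), top_k=3, include_metadata=True)
    context = "\n".join(match.metadata.get("text", "") for match in results.matches)

    # Supplement the user prompt with the retrieved context before calling the model
    enriched_prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )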
]]>
https://www.cloudera.com/api/www/blog-feed?page=scaling-ai-solutions-with-cloudera-a-deep-dive-into-ai-inference-and-solution-patterns0
Introducing Accelerator for Machine Learning (ML) Projects: Summarization with Gemini from Vertex AIhttps://www.cloudera.com/blog/business/introducing-accelerator-for-machine-learning-ml-projects-summarization-with-gemini-from-vertex-aihttps://www.cloudera.com/blog/business/introducing-accelerator-for-machine-learning-ml-projects-summarization-with-gemini-from-vertex-aiMon, 09 Dec 2024 17:00:00 UTC

We’re thrilled to announce the release of a new Cloudera Accelerator for Machine Learning (ML) Projects (AMP): “Summarization with Gemini from Vertex AI”. An AMP is a pre-built, high-quality minimum viable product (MVP) for Artificial Intelligence (AI) use cases that can be deployed in a single click from Cloudera AI (CAI). AMPs are all about helping you quickly build performant AI applications. More on AMPs can be found here.

We built this AMP for two reasons:

  1. To add an AI application prototype to our AMP catalog that can handle both full document summarization and raw text block summarization. 
  2. To showcase how easy it is to build an AI application using Cloudera AI and Google's Vertex AI Model Garden.

Summarization has consistently been the ultimate low-hanging fruit of Generative AI (GenAI) use cases. For example, a Cloudera customer saw a large productivity improvement in their contract review process with an application that extracts and displays a short summary of essential clauses for the reviewer. Another customer in banking reduced the time it took to produce a prospective client's source of wealth review memo from one day to just 15 minutes with a custom GenAI application that summarizes key details from tens to hundreds of financial documents.

This will be our first AMP using the Vertex AI Model Garden, and it’s about time. It’s incredibly beneficial to only need a single account for easy API access to over a hundred of the leading closed-source and open-source models, including a strong set of task-specific models. The models in the Garden are already optimized for running efficiently on Google's Cloud infrastructure, offering cost effective inference and enterprise-grade scaling, even on the highest-throughput apps.

This will also be our first AMP using the Gemini Pro models, which work well for multi-modal and text summarization applications and offer a large context window of up to one million tokens. Benchmark tests indicate that Gemini Pro demonstrates superior speed in token processing compared to competitors like GPT-4. And compared to other high-performing models, Gemini Pro offers competitive pricing structures for both free and paid tiers, making it an attractive option for businesses seeking cost-effective AI solutions without compromising on quality.

How to deploy the AMP:

  1. Get Gemini Pro Access: From the Vertex AI Marketplace, find and enable the Vertex AI API, create an API key, and then enable Gemini for the same project in which you generated the API key.
  2. Launch the AMP: Click on the AMP tile “Document Summarization with Gemini from Vertex AI” in Cloudera AI Learning, input the configuration information (Vertex AI API key and ML runtime info), and then click launch.

The AMP scripts will then do the following:

  1. Install all dependencies and requirements (including the all-MiniLM-L6-v2 embedding model, Hugging Face transformers library, and LlamaIndex vector store). 
  2. Load a sample doc into the LlamaIndex vector store
  3. Launch the Streamlit UI

And there you have it: a summarization application deployed in mere minutes. Stay tuned for future AMPs we’ll build using Cloudera AI and Vertex AI. 

You can then use the Streamlit UI to: 

  • Select the Gemini Pro Model you’d like to use for summarization
  • Paste in text and summarize it 
  • Load documents into the vector store (which generates the embeddings)
  • Select a loaded document and summarize it
  • Adjust response length (max output tokens) and randomness (temperature), as shown in the sketch below
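For readers curious what a summarization call like this might look like under the hood, here is a minimal, hedged sketch using the Vertex AI Python SDK. The project, region, and model ID are assumptions for illustration, and the AMP's actual implementation (with LlamaIndex, the all-MiniLM-L6-v2 embedding model, and Streamlit) is more involved.

    # Hedged sketch, not the AMP's code: project, region, and model ID are placeholders.
    import vertexai
    from vertexai.generative_models import GenerativeModel, GenerationConfig

    vertexai.init(project="your-gcp-project", location="us-central1")  # placeholders
    model = GenerativeModel("gemini-1.5-pro")  # assumed Gemini Pro model ID

    with open("sample_doc.txt", encoding="utf-8") as f:
        document_text = f.read()

    response = model.generate_content(
        f"Summarize the following document in five bullet points:\n\n{document_text}",
        generation_config=GenerationConfig(
            max_output_tokens=512,  # the "response length" control in the UI
            temperature=0.3,        # the "randomness" control in the UI
        ),
    )
    print(response.text)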
]]>
https://www.cloudera.com/api/www/blog-feed?page=introducing-accelerator-for-machine-learning-ml-projects-summarization-with-gemini-from-vertex-ai0
The Struggle Between Data Dark Ages and LLM Accuracyhttps://www.cloudera.com/blog/business/the-struggle-between-data-dark-ages-and-llm-accuracyhttps://www.cloudera.com/blog/business/the-struggle-between-data-dark-ages-and-llm-accuracyFri, 06 Dec 2024 17:00:00 UTC

Artificial Intelligence promises to transform lives and business as we know it. But what does that future look like? The AI Forecast: Data and AI in the Cloud Era, sponsored by Cloudera, aims to take an objective look at the impact of AI on business, industry, and the world at large. 

Hosted weekly by Paul Muller, The AI Forecast speaks to experts in the space to understand the ins and outs of AI in the enterprise, the kinds of data architectures and infrastructures that support it, the guardrails that should be put in place, and the success stories to emulate…or cautionary tales to learn from.

AI is only as successful as the data behind it. To explore what the next era of data looks like in this AI boom, R “Ray” Wang, principal analyst, founder, and chairman of Constellation Research, joined us to kick off this new podcast and dig into the topic.

Here are some key takeaways from Ray in that conversation.

LLM precision is good, not great, right now

Paul: I wanted to chat about this notion of precision data with you. And specifically, I was reading one of your blog posts recently that talked about the dark ages of data. Walk us through where we are with precision data today and how this relates to the dark ages of data.

Ray: We’re at a point where people get excited about 85% accuracy in their LLMs. For customer experience, 85% accuracy isn’t bad. What does that look like? You may get a telemarketing call and it gets routed to the wrong person. Or you might get an extra fry by accident at the checkout. These are all minor. 

But 85% accuracy in the supply chain means you have no manufacturing operations. 85% accuracy in finance can put you in jail. Therefore, the next 10%, which are small language models, are going to come into play. And the value of the 10% is as much as the 85% and as much as the next 5% to get to 95%. To get to a full 100%, that last 5% is even more valuable. That's context, that's location. It could be metadata that you weren't capturing before. That's anything from perspiration to heart rate – it's all being captured.

The final hurdle to LLM precision, available data

Ray: But to get to a level of precision that your stakeholders are going to trust, there's not enough data. Most of the publicly available information on the internet has already been scraped. There's nothing new. People aren’t putting stuff out there anymore because they're afraid. We went from not having enough data, to having all the data we know, to after 2022 not being sure what happened because people started hoarding data. 

We are going to enter the dark ages of data and the internet because nothing of value is going to be available publicly.

Value chains emerge in the midst of Dark Ages

Ray: Given the dark ages of data and the internet, all the new information and insights are going to be worth something. You're going to value your company not just by the revenues, but also by the business graph and the data that's behind it.

Companies will partner, but not with each other in terms of competitors. A big retailer might partner with the manufacturer and a distributor to share information on demand or intervention on pricing elasticity or about available supply. That kind of information is going to become very valuable, and people are going to bid and build markets against that. 

Data collectives are going to merge over time, and industry value chains will consolidate and share information. It's not direct competitors. Retail, manufacturing, and distribution form a natural value chain. These natural value chains are going to start learning how to share data and use different mechanisms to do that.

+++

Don’t forget to tune in to Spotify or Apple Podcasts to listen to future episodes of The AI Forecast: Data and AI in the Cloud Era.

]]>
https://www.cloudera.com/api/www/blog-feed?page=the-struggle-between-data-dark-ages-and-llm-accuracy0
Cloudera AI Inference Service Enables Easy Integration and Deployment of GenAI Into Your Production Environmentshttps://www.cloudera.com/blog/technical/cloudera-ai-inference-service-enables-easy-integration-and-deployment-of-genaihttps://www.cloudera.com/blog/technical/cloudera-ai-inference-service-enables-easy-integration-and-deployment-of-genaiWed, 04 Dec 2024 18:17:00 UTC

It complements Cloudera AI Workbench (previously known as Cloudera Machine Learning Workspace), a deployment environment that is more focused on the exploration, development, and testing phases of the MLOps workflow.

Why did we build it?

The emergence of GenAI, sparked by the release of ChatGPT, has facilitated the broad availability of high-quality, open-source large language models (LLMs). Services like Hugging Face and the ONNX Model Zoo made it easy to access a wide range of pre-trained models. This availability highlights the need for a robust service that enables customers to seamlessly integrate and deploy pre-trained models from various sources into production environments. To meet the needs of our customers, the service must be highly:

  • Secure - strong authentication and authorization, private, and safe
  • Scalable - hundreds of models and applications with autoscaling capability
  • Reliable - minimalist, fast recovery from failures
  • Manageable - easy to operate, rolling updates
  • Standards compliant - adopt market-leading API standards and model frameworks
  • Resource efficient - fine-grained resource controls and scale to zero
  • Observable - monitor system and model performance
  • Performant - best-in-class latency, throughput, and concurrency
  • Isolated - avoid noisy neighbors to provide strong service SLAs

These and other considerations led us to create the Cloudera AI Inference service as a new, purpose-built service for hosting all production AI models and related applications. It is ideal for deploying always-on AI models and applications that serve business-critical use cases.

High-level architecture

Welcome to the first installment of a series of posts discussing the recently announced Cloudera AI Inference service.

Today, Artificial Intelligence (AI) and Machine Learning (ML) are more crucial than ever for organizations to turn data into a competitive advantage. To unlock the full potential of AI, however, businesses need to deploy models and AI applications at scale, in real-time, and with low latency and high throughput. This is where the Cloudera AI Inference service comes in. It is a powerful deployment environment that enables you to integrate and deploy generative AI (GenAI) and predictive models into your production environments, incorporating Cloudera’s enterprise-grade security, privacy, and data governance.

Over the next several weeks, we’ll explore the Cloudera AI Inference service in-depth, providing you with a comprehensive introduction to its capabilities, benefits, and use cases. 

In this series, we’ll delve into topics such as:

  • A Cloudera AI Inference service architecture deep dive
  • Key features and benefits of the service, and how it complements Cloudera AI Workbench
  • Service configuration and sizing of model deployments based on projected workloads
  • How to implement a Retrieval-Augmented Generation (RAG) system using the service
  • Exploring different use cases for which the service is a great choice

If you’re interested in unlocking the full potential of AI and ML in your organization, stay tuned for our next posts, where we’ll dig deeper into the world of Cloudera AI Inference.

What is the Cloudera AI Inference service?

The Cloudera AI Inference service is a highly scalable, secure, and high-performance deployment environment for serving production AI models and related applications. The service is targeted at the production-serving end of the MLOps/LLMOps pipeline, as shown in the following diagram:

The diagram above shows a high-level architecture of Cloudera AI Inference service in context:

  1. KServe and Knative handle model and application orchestration, respectively. Knative provides the framework for autoscaling, including scale to zero.
  2. Model servers are responsible for running models using highly optimized frameworks, which we will cover in detail in a later post.
  3. Istio provides the service mesh, and we take advantage of its extension capabilities to add strong authentication and authorization with Apache Knox and Apache Ranger.
  4. Inference request and response payloads ship asynchronously to Apache Iceberg tables. Teams can analyze the data using any BI tool for model monitoring and governance purposes.
  5. System metrics, such as inference latency and throughput, are available as Prometheus metrics. Data teams can use any metrics dashboarding tool to monitor these.
  6. Users can train and/or fine-tune models in the AI Workbench, and deploy them to the Cloudera AI Inference service for production use cases.
  7. Users can deploy trained models, including GenAI models or predictive deep learning models, directly to the Cloudera AI Inference service.
  8. Models hosted on the Cloudera AI Inference service can easily integrate with AI applications, such as chatbots, virtual assistants, RAG pipelines, real-time and batch predictions, and more, all with standard protocols like the OpenAI API and the Open Inference Protocol (see the sketch after this list).
  9. Users can manage all of their models and applications on the Cloudera AI Inference service with common CI/CD systems using Cloudera service accounts, also known as machine users.
  10. The service can efficiently orchestrate hundreds of models and applications and scale each deployment to hundreds of replicas dynamically, provided compute and networking resources are available.
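To make item 8 above concrete, here is a minimal, hedged sketch of calling a predictive model over the Open Inference Protocol (the KServe v2 REST API). The host, model name, token variable, and input shape are placeholders rather than Cloudera-specific values; GenAI models can be called in a similar way through the OpenAI-compatible API.

    # Hedged sketch of an Open Inference Protocol request; all names are placeholders.
    import os
    import requests

    host = "https://your-ai-inference-endpoint.example.com"                 # placeholder host
    headers = {"Authorization": f"Bearer {os.environ['INFERENCE_TOKEN']}"}  # placeholder token

    payload = {
        "inputs": [{
            "name": "input-0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [[5.1, 3.5, 1.4, 0.2]],  # one feature row for a predictive model
        }]
    }

    resp = requests.post(f"{host}/v2/models/my-model/infer", json=payload, headers=headers)
    resp.raise_for_status()
    print(resp.json()["outputs"])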

Conclusion

In this first post, we introduced the Cloudera AI Inference service, explained why we built it, and took a high-level tour of its architecture. We also outlined many of its capabilities. We will dive deeper into the architecture in our next post, so please stay tuned.

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-ai-inference-service-enables-easy-integration-and-deployment-of-genai0
Fueling the Future of GenAI with NiFi: Cloudera DataFlow 2.9 Delivers Enhanced Efficiency and Adaptabilityhttps://www.cloudera.com/blog/business/fueling-the-future-of-genai-with-nifi-cloudera-dataflow-2-9-delivers-enhanced-efficiency-and-adaptabilityhttps://www.cloudera.com/blog/business/fueling-the-future-of-genai-with-nifi-cloudera-dataflow-2-9-delivers-enhanced-efficiency-and-adaptabilityWed, 04 Dec 2024 17:00:00 UTC

For more than a decade, Cloudera has been an ardent supporter and committee member of Apache NiFi, long recognizing its power and versatility for data ingestion, transformation, and delivery. Our customers rely on NiFi as well as the associated sub-projects (Apache MiNiFi and Registry) to connect to structured, unstructured, and multi-modal data from a variety of data sources – from edge devices to SaaS tools to server logs and change data capture streams.

Now, the era of generative AI (GenAI) demands data pipelines that are not just powerful, but also agile and adaptable. Cloudera DataFlow 2.9 delivers on this need, providing enhancements that streamline development, boost efficiency, and empower organizations to build cutting-edge GenAI solutions.

This release underscores Cloudera's unwavering commitment to Apache NiFi and its vibrant open-source community. We're particularly excited about the advancements in Apache NiFi 2.0 and its potential to revolutionize data flow management. If you can’t wait to try Apache NiFi 2.0, access our free 5-day trial now. For a brief review of the new capabilities of Cloudera DataFlow 2.9, read on.

Accelerating GenAI with Powerful New Capabilities

Cloudera DataFlow 2.9 introduces new features specifically designed to fuel GenAI initiatives:

  • New AI Processors: Harness the power of cutting-edge AI models with new processors that simplify integration and streamline data preparation for GenAI applications.
  • Ready Flows for RAG Architectures: Jumpstart your Retrieval Augmented Generation (RAG) projects with pre-built data flows that accelerate the development of GenAI applications that leverage external knowledge sources.

These enhancements empower organizations to build sophisticated GenAI solutions with greater ease and efficiency, unlocking the transformative power of AI.

Boosting Developer Productivity

DataFlow 2.9 introduces features to enhance developer productivity and streamline data pipeline development:

  • Parameter Groups: Simplify flow management and promote reusability by grouping parameters and applying them across multiple flows. This reduces development time and enhances consistency.
  • Ready Flows: Accelerate development with pre-built templates for common data integration and processing tasks, freeing up developers to focus on higher-value activities.

By simplifying development and promoting reusability, DataFlow 2.9 empowers data engineers to build and deploy data pipelines faster, accelerating time-to-value for the business.

Simplifying Operations and Enhancing Observability

DataFlow 2.9 also includes enhancements that make operating and monitoring data pipelines easier than ever:

  • Notifications: Stay informed about the health and performance of your data flows with customizable notifications that alert you to critical events.
  • Enhanced NiFi Metrics: Gain deeper insights into your data pipelines with improved monitoring capabilities that provide detailed metrics on flow performance and can be integrated into your preferred observability tool.

These operational enhancements ensure smoother data pipeline management, reducing troubleshooting time and maximizing efficiency.

Cloudera's Vision: Universal Data Distribution

With DataFlow 2.9, Cloudera continues to deliver on its vision of universal data distribution, empowering organizations to seamlessly move and process data across any environment, from edge to AI. This release provides the essential building blocks for creating efficient, adaptable, and future-proof data pipelines that fuel innovation and drive business value in the age of GenAI.

Learn More:

To explore the new capabilities of Cloudera DataFlow 2.9 and discover how it can transform your data pipelines, watch this video.

]]>
https://www.cloudera.com/api/www/blog-feed?page=fueling-the-future-of-genai-with-nifi-cloudera-dataflow-2-9-delivers-enhanced-efficiency-and-adaptability0
Cloudera announces ‘Interoperability Ecosystem’ with founding members AWS and Snowflakehttps://www.cloudera.com/blog/partners/cloudera-announces-interoperability-ecosystem-with-founding-members-aws-and-snowflakehttps://www.cloudera.com/blog/partners/cloudera-announces-interoperability-ecosystem-with-founding-members-aws-and-snowflakeWed, 04 Dec 2024 14:00:00 UTC

Today enterprises can leverage the combination of Cloudera and Snowflake—two best-of-breed tools for ingestion, processing and consumption of data—for a single source of truth across all data, analytics, and AI workloads.

Now AWS customers will gain more flexibility, greater data utility, and reduced complexity in support of a modern data architecture. This comes from making it easier for customers to connect their workloads with Snowflake, Cloudera, and unique AWS services such as Amazon Simple Storage Service (Amazon S3), Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Relational Database Service (Amazon RDS), Amazon Elastic Compute Cloud (Amazon EC2), Amazon EMR, and Amazon Athena.

Our customers have spoken; they are seeking the following:

  • Scalable, Cost-Efficient Performance: Dynamically scale analytics and AI workloads while optimizing costs using Snowflake’s compute engine and Cloudera’s Iceberg-powered lakehouse
  • Seamless Ecosystem Integration: Enable unified workflows across AWS services and hybrid/multi-cloud setups without data silos.
  • Secure, Real-Time Insights: Combine robust governance with real-time analytics for efficient, secure data management and AI-driven insights.

Our joint collaboration will enable the following:

  • Seamless Data Sharing and Interoperability: The integration enables AWS customers to leverage Cloudera’s data lakehouse capabilities alongside Snowflake’s AI Data Cloud, facilitating unified data access and sharing across platforms
  • Enhanced AI/ML Performance: The partnership optimizes data workflows for AI/ML applications by enabling Cloudera’s on-premises or hybrid data sets running AWS to integrate with Snowflake’s analytics capabilities, reducing latency and improving insights
  • Maximized Cloud Investments: Customers can utilize Snowflake for specific analytics use cases while continuing to manage broader data operations with Cloudera, maximizing their investment in AWS infrastructure by combining the strengths of both platforms
  • Support for Multi-Cloud Strategies: The collaboration simplifies bridging data stored on AWS with Snowflake, supporting multi-cloud strategies and enhancing data mobility across environments without vendor lock-in
  • Scalable and Secure Data Management: Cloudera’s enterprise-grade security and governance capabilities extend to data shared with Snowflake, ensuring compliance and scalability for customers handling sensitive data

This partnership unlocks several new use cases. For example, an AWS customer using Cloudera for hybrid workloads can now extend analytics workflows to Snowflake, gaining deeper insights without moving data across infrastructures. Customers can also combine Cloudera’s raw data processing with Snowflake’s analytical capabilities to build efficient AI/ML pipelines. The use cases are boundless and may extend beyond the limits of even our collective companies’ imaginations.

“This helps accelerate enterprises’ lakehouse and analytics deployments across the deployment zone of choice, form factor of choice and compute engine of choice, with the best TCO. Through this partnership with AWS and Snowflake, customers will now be able to build mission critical enterprise applications once and deploy anywhere – public cloud, private cloud, their VPC, and share data assets between practitioners and data teams” said Abhas Ricky, Chief Strategy Officer of Cloudera. 

“We’re thrilled to be working alongside AWS and Snowflake to deliver a unique alignment amongst three powerful companies. The positive feedback from our customers affirms we’re delivering meaningful impact,” said Michelle Hoover, SVP Global Alliances & Channels of Cloudera. 

Earlier this year Cloudera announced the expansion of its Enterprise AI ecosystem with partners Pinecone, Anthropic, Google, Amazon, and Snowflake. This initiative brings together a diverse group of industry-leading AI providers to deliver comprehensive, end-to-end AI solutions for  customers that help to maximize the value of AI.

By extension, the REST Catalog ecosystem broadens access to data across multiple vendors, so customers no longer have to pick and choose vendors; they get the best of each through a single point of access.

Read this blog to learn more about how Amazon EMR seamlessly integrates with Cloudera’s lakehouse for secure data sharing and interoperability powered by Iceberg REST Catalog.  

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-announces-interoperability-ecosystem-with-founding-members-aws-and-snowflake0
Secure Data Sharing and Interoperability Powered by Iceberg REST Cataloghttps://www.cloudera.com/blog/business/secure-data-sharing-and-interoperability-powered-by-iceberg-rest-cataloghttps://www.cloudera.com/blog/business/secure-data-sharing-and-interoperability-powered-by-iceberg-rest-catalogTue, 03 Dec 2024 17:00:00 UTC

Prerequisites

The following components in Cloudera on cloud should be installed and configured:

The following AWS prerequisites are also required:

  • An AWS account and an IAM role with permissions to create Athena notebooks

In this example, you will see how to use Amazon Athena to access data that is being created and updated in Iceberg tables using Cloudera.

Please reference user documentation for installation and configuration of Cloudera Public Cloud.

Follow the steps below to setup Cloudera:

1. Create Database and Tables:

Open HUE and execute the following to create a database and tables.

CREATE DATABASE IF NOT EXISTS airlines_data;

DROP TABLE IF EXISTS airlines_data.carriers;
CREATE TABLE airlines_data.carriers (
   carrier_code STRING,
   carrier_description STRING)
STORED BY ICEBERG
TBLPROPERTIES ('format-version'='2');

DROP TABLE IF EXISTS airlines_data.airports;
CREATE TABLE airlines_data.airports (
   airport_id INT,
   airport_name STRING,
   city STRING,
   country STRING,
   iata STRING)
STORED BY ICEBERG
TBLPROPERTIES ('format-version'='2');

2. Load data into Tables:

In HUE execute the following to load data into each Iceberg table.

INSERT INTO airlines_data.carriers (carrier_code, carrier_description)
VALUES
    ("UA", "United Air Lines Inc."),
    ("AA", "American Airlines Inc.");

INSERT INTO airlines_data.airports (airport_id, airport_name, city, country, iata)
VALUES
    (1, 'Hartsfield-Jackson Atlanta International Airport', 'Atlanta', 'USA', 'ATL'),
    (2, 'Los Angeles International Airport', 'Los Angeles', 'USA', 'LAX'),
    (3, 'Heathrow Airport', 'London', 'UK', 'LHR'),
    (4, 'Tokyo Haneda Airport', 'Tokyo', 'Japan', 'HND'),
    (5, 'Shanghai Pudong International Airport', 'Shanghai', 'China', 'PVG');

3. Query Carriers Iceberg table:

In HUE, execute the following query. You will see the two carrier records in the table.

SELECT * FROM airlines_data.carriers;

4. Setup REST Catalog 

5. Setup Ranger Policy to allow “rest-demo” access for sharing:

Create a policy that will allow the “rest-demo” role to have read access to the Carriers table, but will have no access to read the Airports table.

In Ranger go to Settings > Roles to validate that your Role is available and has been assigned group(s).

Follow the steps below to create an Amazon Athena notebook configured to use the Cloudera Iceberg REST Catalog:

6. Create an Amazon Athena notebook with the “Spark_primary” Workgroup

Before continuing with the notebook setup, it is worth stepping back to the broader problem this solution addresses. Many enterprises have heterogeneous data platforms and technology stacks across different business units or data domains. For decades, they have been struggling with the scale, speed, and correctness required to derive timely, meaningful, and actionable insights from vast and diverse big data environments. Despite various architectural patterns and paradigms, they still end up with perpetual “data puddles” and silos in many non-interoperable data formats. Constant data duplication, complex Extract, Transform & Load (ETL) pipelines, and sprawling infrastructure lead to prohibitively expensive solutions, adversely impacting the Time to Value, Time to Market, overall Total Cost of Ownership (TCO), and Return on Investment (ROI) for the business.

Cloudera’s open data lakehouse, powered by Apache Iceberg, solves the real-world big data challenges mentioned above by providing a unified, curated, shareable, and interoperable data lake that is accessible by a wide array of Iceberg-compatible compute engines and tools. 

The Apache Iceberg REST Catalog takes this accessibility to the next level by simplifying Iceberg table data sharing and consumption between heterogeneous data producers and consumers via an open, standard RESTful API specification.

REST Catalog Value Proposition

  • It provides open, metastore-agnostic APIs for Iceberg metadata operations, dramatically simplifying the Iceberg client and metastore/engine integration.
  • It abstracts the backend metastore implementation details from the Iceberg clients.
  • It provides real time metadata access by directly integrating with the Iceberg-compatible metastore.
  • Apache Iceberg, together with the REST Catalog, dramatically simplifies the enterprise data architecture, reducing the Time to Value, Time to Market, and overall TCO, and driving greater ROI.

The Cloudera open data lakehouse, powered by Apache Iceberg and the REST Catalog, now provides the ability to share data with non-Cloudera engines in a secure manner.
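
As an illustration of what sharing with a non-Cloudera engine can look like from the client side, the snippet below is a minimal PyIceberg sketch that connects to an Iceberg REST catalog endpoint. The URI and credential are placeholders following the same pattern as the Spark configuration shown later in this post; PyIceberg is just one of many compatible clients.

# Minimal sketch: connecting PyIceberg to an Iceberg REST catalog (URI and credential are placeholders).
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "demo",
    **{
        "type": "rest",
        "uri": "https://<cloudera-knox-gateway-node>/<cloudera-env-name>/cdp-share-access/hms-api/icecli",
        "credential": "<client-id>:<client-secret>",
    },
)
table = catalog.load_table("airlines_data.carriers")
print(table.schema())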

With Cloudera’s open data lakehouse, you can improve data practitioner productivity and launch new AI and data applications much faster with the following key features:

  • Multi-engine interoperability and compatibility with Apache Iceberg, including Cloudera DataFlow (NiFi), Cloudera Stream Analytics (Flink, SQL Stream Builder), Cloudera Data Engineering (Spark), Cloudera Data Warehouse (Impala, Hive), and Cloudera AI (formerly Cloudera Machine Learning).
  • Time Travel: Reproduce a query as of a given time or snapshot ID, which can be used for historical audits, validating ML models, and rolling back erroneous operations, as examples (see the sketch after this list).
  • Table Rollback: Enable users to quickly correct problems by rolling back tables to a good state.
  • Rich set of SQL (query, DDL, DML) commands: Create or manipulate database objects, run queries, load and modify data, perform time travel operations, and convert Hive external tables to Iceberg tables using SQL commands.
  • In-place table (schema, partition) evolution: Evolve Iceberg table schema and partition layout on the fly without requiring data rewriting, migration, or application changes.
  • Cloudera Shared Data Experience (SDX) Integration: Provide unified security, governance, and metadata management, as well as data lineage and auditing on all your data. 
  • Iceberg Replication: Out-of-the-box disaster recovery and table backup capability.
  • Easy portability of workloads between public cloud and private cloud without any code refactoring.
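
To make the time travel feature concrete, here is a minimal Spark SQL sketch in the same spark.sql style as the notebook commands later in this walkthrough. The timestamp and snapshot ID are illustrative placeholders, and the syntax assumes the Iceberg Spark SQL extensions are enabled.

# Query the carriers table as of an earlier point in time (timestamp is a placeholder).
spark.sql("SELECT * FROM airlines_data.carriers TIMESTAMP AS OF '2024-11-01 00:00:00'").show()

# Query a specific snapshot by ID (snapshot ID is a placeholder).
spark.sql("SELECT * FROM airlines_data.carriers VERSION AS OF 1234567890123456789").show()

# Inspect the available snapshots via Iceberg's metadata table.
spark.sql("SELECT committed_at, snapshot_id FROM airlines_data.carriers.snapshots").show()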

Solution Overview

Data sharing is the capability to share data managed in Cloudera, specifically Iceberg tables, with external users (clients) who are outside of the Cloudera environment. You can share Iceberg table data with clients who can then access the data using third-party engines that support the Iceberg REST Catalog, such as Amazon Athena, Trino, Databricks, or Snowflake.

The solution covered by this blog describes how Cloudera shares data with an Amazon Athena notebook. Cloudera uses a Hive Metastore (HMS) REST Catalog service implemented based on the Iceberg REST Catalog API specification. This service can be made available to your clients by using the OAuth authentication mechanism defined by the KNOX token management system and by using Apache Ranger policies to define the data shares for those clients. Amazon Athena then uses the Iceberg REST Catalog open API to execute queries against the data stored in Cloudera Iceberg tables.

In this case I’m using a role named - “UnitedAirlinesRole” that I can use to share data.

In Ranger, add a new policy under Hadoop SQL with the following settings, and be sure to save the policy (a rough REST API equivalent is sketched after this list):

  • Policy Name: rest-demo-access-policy
  • Hive Database: airlines_data
  • Hive Table: carriers
  • Hive Column: *
  • In Allow Conditions
    • Select your role under “Select Roles”
    • Permissions: select
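
For teams that prefer to automate policy creation, the snippet below is a rough sketch of the same policy expressed against the Apache Ranger public v2 REST API. The Ranger host, port, credentials, and the service name ("cm_hive") are assumptions/placeholders and should be verified against your own Ranger deployment before use.

# Rough sketch of creating the Ranger policy via the public v2 REST API (all connection details are placeholders).
import requests

ranger_url = "https://<ranger-host>:6182/service/public/v2/api/policy"  # placeholder host/port
policy = {
    "service": "cm_hive",              # assumed name of the Hadoop SQL service; adjust for your cluster
    "name": "rest-demo-access-policy",
    "policyType": 0,                   # 0 = access policy
    "resources": {
        "database": {"values": ["airlines_data"]},
        "table": {"values": ["carriers"]},
        "column": {"values": ["*"]},
    },
    "policyItems": [
        {
            "roles": ["UnitedAirlinesRole"],  # the role used for sharing in this example
            "accesses": [{"type": "select", "isAllowed": True}],
        }
    ],
}

resp = requests.post(ranger_url, json=policy, auth=("<admin-user>", "<admin-password>"), timeout=30)
resp.raise_for_status()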

With the policy in place, continue creating the Athena notebook from step 6:

a. Provide a name for your notebook

b. Additional Apache Spark properties: this enables use of the Cloudera Iceberg REST Catalog. Select the “Edit in JSON” button, then copy the following and replace <cloudera-knox-gateway-node>, <cloudera-env-name>, <client-id>, and <client-secret> with the appropriate values. See the REST Catalog Setup blog to determine which values to use.

{
  "spark.sql.catalog.demo": "org.apache.iceberg.spark.SparkCatalog",
  "spark.sql.catalog.demo.default-namespace": "airlines",
  "spark.sql.catalog.demo.type": "rest",
  "spark.sql.catalog.demo.uri": "https://<cloudera-knox-gateway-node>/<cloudera-env-name>/cdp-share-access/hms-api/icecli",
  "spark.sql.catalog.demo.credential": "<client-id>:<client-secret>",
  "spark.sql.defaultCatalog": "demo",
  "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
}

c. Click the “Create” button to create the new notebook

7. Spark-sql Notebook - execute commands via the REST Catalog

Run the following commands one at a time to see what is available from the Cloudera REST Catalog. You will be able to:

  • See the list of available databases

spark.sql("show databases").show()

  • Switch to the airlines_data database

spark.sql("use airlines_data")

  • See the available tables (you should not see the Airports table in the returned list)

spark.sql("show tables").show()

  • Query the Carriers table to see the two carriers currently in this table

spark.sql("SELECT * FROM airlines_data.carriers").show()

Follow the steps below to make changes to the Cloudera Iceberg table & query the table using Amazon Athena:

8. Cloudera - Insert a new record into the Carriers table:

In HUE execute the following to add a row to the Carriers table.

INSERT INTO airlines_data.carriers
    VALUES("DL", "Delta Air Lines Inc.");

9. Cloudera - Query Carriers Iceberg table:

In HUE, execute the following query. You should now see three carrier records, including the newly added row.

SELECT * FROM airlines_data.carriers;

10. Amazon Athena Notebook - query subset of Airlines (carriers) table to see changes:  

Execute the following query; you should see three rows returned. This shows that the REST Catalog automatically handles any metadata pointer changes, guaranteeing that you will get the most recent data.

spark.sql("SELECT * FROM airlines_data.carriers").show()

11. Amazon Athena Notebook - try to query Airports table to test security policy is in place:  

Execute the following query.  This query should fail, as expected, and will not return any data from the Airports table.  The reason for this is that the Ranger Policy is being enforced and denies access to this table.

spark.sql("SELECT * FROM airlines_data.airports").show()

Conclusion

In this post, we explored how to set up a data share between Cloudera and Amazon Athena.  We used Amazon Athena to connect via the Iceberg REST Catalog to query data created and maintained in Cloudera.

Key features of the Cloudera open data lakehouse include:

  • Multi-engine compatibility with various Cloudera products and other Iceberg REST compatible tools.
  • Time Travel and Table Rollback for data recovery and historical analysis.
  • Comprehensive SQL support and in-place schema evolution.
  • Integration with Cloudera SDX for unified security and governance.
  • Iceberg replication for disaster recovery.

Amazon Athena is a serverless, interactive analytics service that provides a simplified and flexible way to analyze petabytes of data where it lives. Amazon Athena also makes it easy to interactively run data analytics using Apache Spark without having to plan for, configure, or manage resources. When you run Apache Spark applications on Athena, you submit Spark code for processing and receive the results directly. Use the simplified notebook experience in the Amazon Athena console to develop Apache Spark applications using Python, or use the Athena notebook APIs. The Iceberg REST Catalog integration with Amazon Athena allows organizations to leverage the scalability and processing power of Apache Spark for large-scale data processing, analytics, and machine learning workloads on large datasets stored in Cloudera Iceberg tables.

For enterprises struggling with scale, speed, and data correctness across diverse data platforms, this solution can provide significant value: it reduces data duplication, simplifies complex ETL pipelines, and lowers costs, while improving business outcomes.

To learn more about Cloudera and how to get started, refer to Getting Started. Check out Cloudera’s open data lakehouse for more information about the capabilities available, or visit Cloudera.com for details on everything Cloudera has to offer. To begin working with Athena, refer to Getting Started with Amazon Athena.

]]>
https://www.cloudera.com/api/www/blog-feed?page=secure-data-sharing-and-interoperability-powered-by-iceberg-rest-catalog0
Cloudera and AWS Partner to Deliver Cost-Efficient and Sustainable Infrastructure for AI and Analyticshttps://www.cloudera.com/blog/partners/cloudera-and-aws-partner-to-deliver-cost-efficient-and-sustainable-infrastructure-for-ai-and-analyticshttps://www.cloudera.com/blog/partners/cloudera-and-aws-partner-to-deliver-cost-efficient-and-sustainable-infrastructure-for-ai-and-analyticsMon, 02 Dec 2024 14:00:00 UTC

As organizations adopt a cloud-first infrastructure strategy, they must weigh a number of factors to determine whether or not a workload belongs in the cloud. Cost has been a key consideration in public cloud adoption from the start. Today, energy efficiency is gaining importance, not only for cutting costs but also as a vital step toward sustainable business practices. By optimizing energy consumption, companies can significantly reduce the cost of their infrastructure. In this way, FinOps and GreenOps go hand-in-hand.

Cloudera is committed to providing the most optimal architecture for data processing, advanced analytics, and AI while advancing our customers’ cloud journeys. To that end, we’re collaborating with Amazon Web Services (AWS) to deliver a high-performance, energy-efficient, and cost-effective solution by supporting many data services on AWS Graviton. Together, Cloudera and AWS empower businesses to optimize performance for data processing, analytics, and AI while minimizing their resource consumption and carbon footprint.

FinOps and GreenOps are Critical for Cloud Computing in the Age of AI

FinOps in cloud computing is essential for businesses to understand, manage, and optimize cloud spending. As the scale of data and computing grows, especially with the increase of AI workloads, FinOps provides a strategic approach to keep cloud expenses predictable and aligned with business objectives.

Meanwhile, GreenOps focuses on reducing the environmental impact of cloud operations. With more companies committing to carbon neutrality and recognizing their roles in climate action, GreenOps practices are gaining traction, encouraging efficient infrastructure management that conserves energy and reduces carbon emissions. Together, FinOps and GreenOps form a powerful approach to cloud strategy supporting cost-efficient sustainable operations.

Cloudera’s Commitment to Sustainable Analytics and AI

With 25 exabytes of data under management across the globe, Cloudera is leading the way in sustainable analytics and AI. We have implemented several product and corporate initiatives to help our customers reduce costs and be more energy efficient, including:

  • Native Support for Apache Iceberg: Cloudera’s adoption of Apache Iceberg enables customers to build data lakehouses that reduce data replication, data movement, and data copies, saving on storage costs and reducing the compute required to move and duplicate data.
  • Lakehouse Optimizer: Cloudera introduced a service that automatically optimizes Iceberg tables for high-performance queries and reduced storage utilization. The net result is that queries are more efficient and run for shorter durations, while storage costs and energy consumption are reduced.
  • Unified Codebase: A unified codebase across platforms and infrastructure simplifies analytics workflows, eliminating redundant processes that consume resources and ensuring that analytics workloads run efficiently.
  • Net Zero Commitment: Cloudera has committed to reducing carbon emissions and reaching Net Zero by 2050. With each technology advancement, Cloudera moves closer to creating a sustainable analytics ecosystem.

These features align with FinOps and GreenOps principles, demonstrating Cloudera’s commitment to cost savings and environmental stewardship while delivering industry-leading analytics capabilities.

Cloudera and AWS Graviton Support Sustainable Analytics and AI

As more companies deploy AI workloads, they require greater computing power, resulting in increased energy consumption and emissions. Companies adopting AI now face a new obstacle to innovation: they must support AI development while meeting corporate goals for sustainability. 

Sustainable infrastructure is no longer optional–it’s essential. And AWS is a crucial ally for Cloudera in enabling companies to scale AI operations responsibly.

AWS Graviton processors are designed to run critical workloads at the lowest cost, with the best performance, and with the lowest energy consumption. Today, Cloudera Data Engineering, a data service that streamlines and scales data pipeline development, is available with support for AWS Graviton processors. Benchmarks show significant performance improvements for Spark jobs on AWS Graviton processors compared to alternatives. 

Performance Benchmark of Cloudera Data Engineering on AWS Graviton

As a result of this collaboration, Cloudera Data Engineering on AWS Graviton is the industry leader in performance, price, and energy efficiency for data processing workloads utilizing Spark on Iceberg tables. 

Cloudera Data Engineering is just the start. Several other data services are planned for general availability or technical preview on AWS Graviton in the coming months.

Customer Spotlight: A Multinational Utility Company’s Journey in Sustainable Data Optimization

At a recent event, a multinational utility company shared its Cloudera journey, showcasing how sustainable business practices evolved alongside its data capabilities. From its early data lakes, the company progressed through several optimizations, including adopting columnar storage, migrating to the public cloud, and utilizing autoscaling to more efficiently manage data. It also implemented Spark for data processing and adopted Iceberg as an open table format to reduce data copies and manage a single, consistent version of its data. Each step of their evolution resulted in improvements in efficiency and a reduction in resource demands.

Cloudera’s continuous innovation enabled this company to maintain relevance and prepare for the future of AI with a well-optimized, sustainable infrastructure that minimizes costs and environmental impact.

For Cloudera, this utility customer’s journey reflects our commitment to modernization and sustainability, providing a real-world example that customers across industries can replicate as they consider their own journeys toward responsible AI.

Try Cloudera on AWS Today

The Cloudera and AWS collaboration represents a significant step forward for companies seeking to balance the need to innovate with advanced analytics and AI while optimizing costs and significantly reducing their environmental impact. Delivering business impact with data while achieving financial and environmental objectives for sustainable operations will be critical for nearly every business, and Cloudera and AWS are ready to partner with your organization to meet those challenges.

If you’re ready to try Cloudera and AWS, you can start with a free 5-day trial that provides common patterns for analytics use cases to get you started, including Cloudera Data Engineering, generative AI, the open data lakehouse, and streaming data distribution. Give it a try today.

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-and-aws-partner-to-deliver-cost-efficient-and-sustainable-infrastructure-for-ai-and-analytics0
Elevating Productivity: Cloudera Data Engineering Brings External IDE Connectivity to Apache Sparkhttps://www.cloudera.com/blog/technical/elevating-productivity-cloudera-data-engineering-brings-external-ide-connectivity-to-apache-sparkhttps://www.cloudera.com/blog/technical/elevating-productivity-cloudera-data-engineering-brings-external-ide-connectivity-to-apache-sparkThu, 21 Nov 2024 11:36:00 UTC

Best-in-Class Apache Spark on Iceberg

This release also brings new capabilities designed to enhance cost-effectiveness. Support for Apache Iceberg 1.5, together with Apache Spark 3.5, delivers better performance and optimized cost management. In Change Data Capture (CDC) use cases, advanced row-level deletes with Merge-on-Read improve query efficiency, reducing resource consumption and operational costs.

Why Cloudera Data Engineering?

Cloudera customers benefit from enterprise-secured tools to build collaborative sandboxes, empowering data engineers, data scientists, and extended data practitioner teams that need insights to drive decisions. With 100x more data under management compared to other cloud-only vendors, Cloudera empowers enterprises to build open data lakehouses for scalable and secure data management with portable analytics across hybrid cloud environments.

Top innovators from financial, healthcare, and other data-intensive industries rely on Cloudera Data Engineering for several reasons:

  • Secure Data Pipelining Across Hybrid Environments: With Apache Spark as the engine, Cloudera Data Engineering provides secure ingestion, seamlessly handling data in different formats across hybrid clouds to meet the varied needs of modern data pipelines. Powered by integrated platform services, Cloudera Data Engineering ensures data governance with robust data handling and automated lifecycle lineage tracking.
  • Simplified Workflows and Iterative Collaborations: With Apache Airflow, Cloudera Data Engineering provides API integrations for external data tools like dbt. Interactive Sessions and the latest External IDE Connectivity support quick iterations and collaborations.
  • Data Interoperability With Lower TCO: Cloudera Data Engineering has native support for Apache Iceberg - the leading open table format purpose-built for managing exabyte-scale data lakes and delivering high-performance queries. Unlike cloud vendors with proprietary engines, Cloudera Data Engineering optimizes cost efficiency by leveraging open-source technologies and integrated platform services like Cloudera Observability.

Ready to Explore?

Discover how Cloudera Data Engineering can accelerate time-to-value in building future-proof modern data architectures: 

As advanced analytics and AI continue to drive enterprise strategy, leaders are tasked with building flexible, resilient data pipelines that accelerate trusted insights. AI pioneer Andrew Ng recently underscored that robust data engineering is foundational to the success of data-centric AI—a strategy that prioritizes data quality over model complexity. McKinsey Quarterly’s latest research further forecasts a future of “data ubiquity” by 2030, where enterprise data is seamlessly embedded across systems, processes, and decision points. For enterprises, the challenge now is not just rapid deployment; it’s about building trusted, iterative processes that ensure high-quality and actionable data at scale. 

Cloudera Data Engineering’s latest version release on public cloud addresses this rising challenge by introducing major enhancements in development productivity with enterprise-secured toolings, bringing remote access to Apache Spark from the practitioner’s preferred coding environments. This release marks a milestone toward Cloudera Data Engineering’s vision of providing the best practitioner-centric, production-grade pipelining and orchestration solutions. 

A New Level of Productivity with Remote Access

The new Cloudera Data Engineering 1.23 on public cloud spotlights External IDE Connectivity, which enables data engineers to access Apache Spark clusters and data pipelines directly from their preferred development environments (e.g., Jupyter, PyCharm, and VS Code). Extended data practitioner teams can work in their preferred coding environments without proprietary lock-ins.
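
As a rough illustration of what remote access from a local IDE can look like, the sketch below uses the generic Spark Connect client in PySpark. The endpoint and token are placeholders; the exact connection details for a Cloudera Data Engineering virtual cluster should be taken from the product documentation rather than from this sketch.

# Generic Spark Connect sketch from a local IDE (PySpark 3.4+); endpoint and token are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .remote("sc://<spark-connect-endpoint>:443/;token=<access-token>")
    .getOrCreate()
)

# Once connected, Spark code runs remotely on the cluster while you edit and debug locally.
spark.sql("SHOW DATABASES").show()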

Along with Cloudera Data Engineering’s Interactive Sessions, data teams can reap the benefits of iterative development, fostering more collaborative iterative workflows to drive quality while maintaining robust security standards.

]]>
https://www.cloudera.com/api/www/blog-feed?page=elevating-productivity-cloudera-data-engineering-brings-external-ide-connectivity-to-apache-spark0
Enable Image Analysis with Cloudera’s New Accelerator for Machine Learning Projects Based on Anthropic Claudehttps://www.cloudera.com/blog/technical/enable-image-analysis-with-clouderas-new-accelerator-for-machine-learning-projectshttps://www.cloudera.com/blog/technical/enable-image-analysis-with-clouderas-new-accelerator-for-machine-learning-projectsFri, 15 Nov 2024 11:18:00 UTC

Enterprise organizations collect massive volumes of unstructured data, such as images, handwritten text, and documents, and much of this data is still captured through manual processes. The way to leverage it for business insight is to digitize that data. One of the biggest challenges with digitizing the output of these manual processes is transforming the unstructured data into something that can actually deliver actionable insights.

Artificial Intelligence is the new mining tool to extract business insight gold from more complex and more abstract unstructured data assets. To help quickly and efficiently create these new AI applications to mine unstructured data, Cloudera is excited to introduce a new addition to our Accelerators for Machine Learning Projects (AMPs), easy-to-use AI quick starters, based on Anthropic Claude, a Large Language Model (LLM) that supports the extraction and manipulation of information from images. Claude 3 goes beyond traditional Optical Character Recognition (OCR) with advanced reasoning capabilities that enable users to specify exactly what information they need from an image, whether it’s converting handwritten notes into text or pulling data from dense, complicated forms.

Unlike other OCR systems, which can often miss context or require multiple steps to clean the data, Claude 3 enables customers to perform complex document understanding tasks directly. The result is a powerful tool for businesses that need to quickly digitize, analyze, and extract machine-usable data from unstructured visual inputs.

Searching and retrieving information from unstructured data is critical for companies who want to quickly and accurately digitize manual, time-consuming administrative tasks.  This AMP makes it possible to quickly deliver a production-ready model that is fine-tuned with organizational data and context specific to each individual use case.

Some possible use cases for this AMP include:

  • Transcribing Typed Text: Quickly extract digital text from scanned documents, PDFs, or printouts, supporting efficient document digitization.
  • Transcribing Handwritten Text: Convert handwritten notes into machine-readable text. This is ideal for digitizing personal notes, historical records, and even legal documents.
  • Transcribing Forms: Extract data from structured forms while preserving the organization and layout, automating data entry processes.
  • Complex Document QA: Ask context-specific questions about documents, extracting relevant answers from even the most complicated forms and formats.
  • Data Transformation: Transform unstructured image content into JSON format, making it easy to integrate image-based data into structured databases and workflows (see the sketch after this list).
  • User-Defined Prompts: For advanced users, this AMP also provides the flexibility to create custom prompts that cater to niche or highly specialized use cases involving image data.
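
As a rough illustration of the data transformation use case, the sketch below calls the Anthropic Messages API directly with an image and asks for JSON back. The model ID and image file are placeholders, and the AMP itself packages this kind of interaction for you inside Cloudera AI.

# Minimal sketch: asking Claude to turn an image of a form into JSON.
# Model ID and image path are placeholders; requires the anthropic Python SDK and an API key.
import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("scanned_form.png", "rb") as f:  # placeholder image file
    image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="<claude-3-model-id>",  # placeholder model identifier
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
            {"type": "text", "text": "Extract all fields from this form and return them as a JSON object."},
        ],
    }],
)
print(message.content[0].text)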

Get Started Today

Getting started with this AMP is as simple as clicking a button. You can launch it from the AMP catalog within your Cloudera AI (Formerly Cloudera Machine Learning) workspace, or start a new project with the repository URL. For more information on requirements and for more detailed instructions on how to get started, visit our guide on GitHub.

]]>
https://www.cloudera.com/api/www/blog-feed?page=enable-image-analysis-with-clouderas-new-accelerator-for-machine-learning-projects0
Empower Your Cyber Defenders with Real-Time Analyticshttps://www.cloudera.com/blog/technical/empower-your-cyber-defenders-with-real-time-analyticshttps://www.cloudera.com/blog/technical/empower-your-cyber-defenders-with-real-time-analyticsFri, 15 Nov 2024 11:05:00 UTC

Today, cyber defenders face an unprecedented set of challenges as they work to secure and protect their organizations. In fact, according to the Identity Theft Resource Center (ITRC) Annual Data Breach Report, there were 2,365 cyber attacks in 2023 with more than 300 million victims, and a 72% increase in data breaches since 2021. 

The constant barrage of increasingly sophisticated cyberattacks has left many professionals feeling overwhelmed and burned out. With the sheer volume and sophistication of these attacks increasing daily, defenders must implement AI and automation to combat intrusions proactively and effectively.

However, there is a fundamental challenge standing in the way of being successful: data. Read on to discover the issues that cyber defenders face leveraging data, analytics, and AI to do their jobs, how Cloudera’s open data lakehouse mitigates those issues, and how this architecture is crucial for successfully navigating the complexities of the modern cybersecurity landscape.

The Problem with Cyber Data

Data is both the greatest asset and the biggest challenge for cyber defenders. The problem isn’t just the volume of the data, but also how difficult it is to manage and make sense of it. Cyber defenders struggle with:

  • Too much data: Cybersecurity tools generate an overwhelming volume of log data, including Domain Name Service (DNS) records, firewall logs, and more. All of this data is essential for investigations and threat hunting, but existing systems often struggle to manage it efficiently. Ingesting the data is often too slow and/or expensive, leading to latent responses and missed opportunities. 
  • Too many tools: An average enterprise organization deploys more than 40 different tools for cyber defense. Each tool serves a unique purpose, but analysts are often left juggling multiple interfaces, leading to fragmented investigations. The manual process of switching between tools slows down their work, often leaving them reliant on rudimentary methods of keeping track of their findings.
  • Unstructured data not ready for analysis: Even when defenders finally collect log data, it’s rarely in a format that’s ready for analysis. Cyber logs are often unstructured or semi-structured, making it difficult to derive insights from them. The result is that analysts waste valuable time and resources normalizing, parsing, and preparing data for investigation.

A Better Way Forward: Cloudera’s Open Data Lakehouse

Cloudera offers a solution to these challenges with its open data lakehouse, which combines the flexibility and scalability of data lake storage with data warehouse functionality to unify and simplify the management of cyber log data. By breaking down data silos and integrating log data from multiple sources, Cloudera empowers defenders with the real-time analytics to respond to threats swiftly.

Here’s how Cloudera makes it possible:

  • One unified system: Cloudera’s open data lakehouse consolidates all critical log data into one system. By leveraging Apache Iceberg, an open table format designed for high-performance analytics on massive volumes of data, cyber defenders can access all of their data and conduct investigations with greater speed and efficiency. Whether they need to query data from today or from years past, the system scales up or down to meet their needs.
  • Optimized for analytics: Iceberg tables are designed to deliver analytics faster and more effectively. With flexible schema and partitioning, Iceberg tables can scale to handle petabytes of data while compressing logs to save on storage costs. The metadata-driven approach ensures quick query planning so defenders don’t have to deal with slow processes when they need fast answers.
  • Secure and governed data: With Cloudera Shared Data Experience (SDX), security and governance are built into every step. Cyber logs often contain sensitive data about users, networks, and investigations, so it’s critical to protect this information while ensuring that authorized teams can access and share it safely.
  • Streaming pipelines for real-time insights: While the open data lakehouse provides a foundation for analytics, it is Cloudera’s data pipeline capabilities that transform raw, unstructured cyber logs into optimized Iceberg tables. Using Cloudera Data Flow and Cloudera Stream Processing, teams can filter, parse, normalize, and enrich log data in real time, ensuring that defenders are always working with clean, structured data that’s ready for advanced analytics.
  • Seamless integration: Cloudera’s open data lakehouse integrates with a wide range of tools, enabling investigators, threat hunters, and data scientists to work with their preferred tools. From drag-and-drop interfaces in Cloudera Data Visualization to advanced machine learning models for anomaly detection, the possibilities are endless. Plus, with Iceberg’s combination of interoperability and open standards, customers can choose the best tool for each job.

Real-Time Threat Detection with Iceberg

Cyber log data is massive and constantly evolving. In many traditional systems, query planning can take as long as executing the query itself. Iceberg makes query planning more efficient by storing all of the table metadata–including partitioning and file locations–in a way that’s easy for query engines to consume. It ensures that even large, constantly evolving tables remain manageable, enabling cyber defenders to perform real-time threat detection without being bogged down by inefficient query planning processes, and leading to faster, more efficient threat detection and investigation workflows.

Additionally, as threats evolve, so too must the systems and processes used to detect and respond to them. Iceberg enables teams to modify schemas, partitioning, and enrichment processes on the fly without having to rewrite tables. Versioning with Iceberg snapshots makes it easy to reproduce a previous state of the table so cyber defenders always have access to historical context without managing and maintaining multiple copies of the data.
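
To ground this, here is a small Spark SQL sketch of the kinds of operations described above, using a hypothetical logs.firewall_events Iceberg table from a Spark session with the Iceberg SQL extensions enabled; the table, column, and snapshot values are illustrative only.

# Hypothetical Iceberg cyber-log table: evolve schema and partitioning in place, with no table rewrite.
spark.sql("ALTER TABLE logs.firewall_events ADD COLUMN threat_score DOUBLE")
spark.sql("ALTER TABLE logs.firewall_events ADD PARTITION FIELD days(event_ts)")

# List snapshots, then reproduce an earlier state of the table for historical context.
spark.sql("SELECT committed_at, snapshot_id FROM logs.firewall_events.snapshots").show()
spark.sql("SELECT * FROM logs.firewall_events VERSION AS OF 1234567890123456789 LIMIT 10").show()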

The Future: AI-Powered Cyber Defense

Cloudera also prepares cyber defenders for the future of AI-driven cybersecurity. With built-in generative AI tools like the SQL AI Assistant, analysts can quickly write SQL queries to extract the needed answers. From automating routine tasks to building chatbots for incident summaries, Cloudera’s AI capabilities make cyber defense more efficient, while keeping data secure and under control.

Conclusion: Empower Your Defenders, Protect Your Business

By uniting cyber data in a scalable, secure, and analytics-ready environment, Cloudera’s open data lakehouse empowers defenders to stay one step ahead of cyber threats. With seamless integration with many tools and execution engines, flexible and cost-effective storage, and built-in AI capabilities, Cloudera empowers defenders to protect their organizations with real-time and predictive insights that help them keep pace with cyber threats.

Learn more about this solution, and all of the other innovations from Cloudera, by watching the on-demand recording of Cloudera NOW.

]]>
https://www.cloudera.com/api/www/blog-feed?page=empower-your-cyber-defenders-with-real-time-analytics0
Octopai Acquisition Enhances Metadata Management to Trust Data Across Entire Data Estatehttps://www.cloudera.com/blog/business/octopai-acquisition-enhances-metadata-managementhttps://www.cloudera.com/blog/business/octopai-acquisition-enhances-metadata-managementThu, 14 Nov 2024 17:00:00 UTC

We are excited to announce the acquisition of Octopai, a leading data lineage and catalog platform that provides data discovery and governance for enterprises to enhance their data-driven decision making. Cloudera’s mission since its inception has been to empower organizations to transform all their data to deliver trusted, valuable, and predictive insights. With AI and generative AI powering the next wave of business applications, the real competitive edge lies in collecting vast amounts of data and  deeply understanding and leveraging it for business value. Cloudera, together with Octopai, will make it easier for organizations to better understand, access, and leverage all their data in their entire data estate – including data outside of Cloudera – to power the most robust data, analytics and AI applications. This acquisition delivers access to trusted data so organizations can build reliable AI models and applications by combining data from anywhere in their environment.

Propel AI and analytic success with better data discovery and data cataloging

As organizations collect a vast and diverse array of data sources, they face significant challenges in achieving a comprehensive understanding of their data. This includes having full visibility into the origin of the data, the transformations it underwent, its relationships, and the context that was added or stripped away from that data as it moved throughout the enterprise. In today’s heterogeneous data ecosystems, integrating and analyzing data from multiple sources presents several obstacles: data often exists in various formats, with inconsistencies in definitions, structures, and quality standards. Additionally, multiple copies of the same data locked in proprietary systems contribute to version control issues, redundancies, staleness, and management headaches. This dampens confidence in the data and hampers access, in turn impacting the speed to launch new AI and analytic projects.

Founded in 2016, Octopai offers automated solutions for data lineage, data discovery, data catalog, mapping, and impact analysis across complex data environments. Combining Octopai capabilities with Cloudera’s AI powered hybrid data platform provides deeper data understanding, enhanced security, and robust data governance – essential for driving AI and analytics success. The combined platform will integrate data – from wherever it originates and wherever it  is stored (cloud or on prem) – to deliver real-time insights required for faster decision making and predictive generative AI applications for personalized customer experiences. By adding the Octopai platform, Cloudera customers will benefit from:

Enhanced Data Discovery: Octopai’s automated data discovery enables instantaneous search and location of desired data across multiple systems. It leverages knowledge graphs to keep track of all the data sources and data flows, using AI to fill the gaps so you have the most comprehensive metadata management solution.

Data Trust and Quality: Octopai’s multi-layered data lineage solution provides the most complete, in-depth, and trustworthy automated lineage so data users can always trust the data and the insights generated from it. The end-to-end lineage also automates tasks such as  predicting the impact of a process change, analyzing the impact of a broken process, discovering parallel processes performing the same tasks, and performing root cause analysis to uncover the source of reporting errors. This guarantees data quality and automates the laborious, manual processes required to maintain data reliability. 

Robust Data Catalog: Organizations can create company-wide consistency with a self-creating, self-updating data catalog. This automated data catalog always provides up-to-date inventory of assets that never get stale. Octopai’s 50+ connectors make it easy to capture the metadata  from different data sources and maintain the catalog automatically so users always know what data is available, where it can be found, what it represents, and who is responsible for it.

AI Co-pilot: The co-pilot empowers data teams with a real-time, unified workspace that automates, optimizes, and interprets scripts while providing immediate insights into data lineage. It allows users to mitigate risks, increase efficiency, and make data strategy more actionable than ever before.

The path forward with data governance and metadata management

With this acquisition, Cloudera bolsters its rich metadata management with Octopai’s market leading data discovery and data lineage capabilities, enabling customers to understand and trust their data across not just the Cloudera platform but the entire enterprise ecosystem. Together, Cloudera and Octopai will help reinvent how customers manage their metadata and track lineage across all their data sources. This will also accelerate deployment of new data products for AI,  gen AI, and analytics applications. It will increase the discovery of the data products and ensure the usability and consistent delivery of these data products, providing essential elements of a data mesh architecture for self-service decentralized access to data.

]]>
https://www.cloudera.com/api/www/blog-feed?page=octopai-acquisition-enhances-metadata-management0
Introducing Cloudera Fine Tuning Studio for Training, Evaluating, and Deploying LLMs with Cloudera AIhttps://www.cloudera.com/blog/technical/cloudera-fine-tuning-studio-for-training-evaluating-deploying-llms-with-cloudera-aihttps://www.cloudera.com/blog/technical/cloudera-fine-tuning-studio-for-training-evaluating-deploying-llms-with-cloudera-aiWed, 13 Nov 2024 10:27:00 UTC

To help remedy these issues, Cloudera introduces Fine Tuning Studio, a one-stop-shop studio application that covers the entire workflow and lifecycle of fine tuning, evaluating, and deploying fine-tuned LLMs in Cloudera’s AI Workbench. Now, developers, data scientists, solution engineers, and all AI practitioners working within Cloudera’s AI ecosystem can easily organize data, models, training jobs, and evaluations related to fine tuning LLMs.

Large Language Models (LLMs) will be at the core of many groundbreaking AI solutions for enterprise organizations. Here are just a few examples of the benefits of using LLMs in the enterprise for both internal and external use cases:

Optimize Costs. LLMs deployed as customer-facing chatbots can respond to frequently asked questions and simple queries. These enable customer service representatives to focus their time and attention on more high-value interactions, leading to a more cost-efficient service model.

Save Time. LLMs deployed as internal enterprise-specific agents can help employees find internal documentation, data, and other company information to help organizations easily extract and summarize important internal content.

Increase Productivity. LLMs deployed as code assistants accelerate developer efficiency within an organization, ensuring that code meets standards and coding best practices.

Several LLMs are publicly available through APIs from OpenAI, Anthropic, AWS, and others, which give developers instant access to industry-leading models that are capable of performing most generalized tasks. However, these LLM endpoints often can’t be used by enterprises for several reasons:

  • Private Data Sources: Enterprises often need an LLM that knows where and how to access internal company data, and users often can’t share this data with an open LLM.
  • Company-specific Formatting: LLMs are sometimes required to provide a very nuanced formatted response specific to an enterprise’s needs, or meet an organization’s coding standards. 
  • Hosting Costs: Even if an organization wants to host one of these large generic models in their own data centers, they are often limited to the compute resources available for hosting these models.

The Need for Fine Tuning

Fine tuning solves these issues. Fine tuning involves another round of training for a specific model to help guide the output of LLMs to meet specific standards of an organization. Given some example data, LLMs can quickly learn new content that wasn’t available during the initial training of the base model. The benefits of using fine-tuned models in an organization are numerous:

  • Meet Coding Formats and Standards: Fine tuning an LLM ensures the model generates specific coding formats and standards, or provides specific actions that can be taken from customer input to an agent chatbot. 
  • Reduce Training Time: AI practitioners can train “adapters” for base models, which update only a specific subset of parameters within the LLM. These adapters can be swapped freely between one another on the same model, so a single model can perform different roles based on the adapters (see the sketch after this list).
  • Achieve Cost Benefits: Smaller models that are fine-tuned for a specific task or use case perform just as well as or better than a “generalized” larger LLM that is an order of magnitude more expensive to operate.
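
For readers who want a sense of what an adapter is at the code level, the snippet below is a minimal LoRA sketch using the Hugging Face PEFT library. The hyperparameters are illustrative, the target module name is specific to BLOOM-family models, and Fine Tuning Studio manages this kind of configuration for you through its UI.

# Minimal LoRA adapter sketch with Hugging Face PEFT (hyperparameters are illustrative).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-1b1")

lora_config = LoraConfig(
    r=8,                                 # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["query_key_value"],  # attention projection used by BLOOM models
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters are trainable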

Although the benefits of fine tuning are substantial, the process of preparing, training, evaluating, and deploying fine-tuned LLMs is a lengthy LLMOps workflow that organizations handle differently. This leads to compatibility issues with no consistency in data and model organization.

Introducing Cloudera’s Fine Tuning Studio

Want to see what’s under the hood? For advanced users, contributors, or other users who want to view or modify Fine Tuning Studio, the project is hosted on Cloudera’s github.

Get Started Today!

Cloudera is excited to be working on the forefront of training, evaluating, and deploying LLMs to customers in production-ready environments. Fine Tuning Studio is under continuous development and the team is eager to continue providing customers with a streamlined approach to fine tune any model, on any data, for any enterprise application. Get started today on your fine tuning needs, and Cloudera AI’s team is ready to assist in fulfilling your enterprise’s vision for AI-ready applications to become a reality.

Check Adapter Performance. Once the training job completes, it’s helpful to “spot check” the performance of the adapter to make sure that it was trained successfully. Fine Tuning Studio offers a Local Adapter Comparison page to quickly compare the performance of a prompt between a base model and a trained adapter. Let’s try a simple customer input, pulled directly from the bitext dataset: “i have to get a refund i need assistance”, where the corresponding desired output action is get_refund. Looking at the output of the base model compared to the trained adapter, it’s clear that training had a positive impact on our adapter!

Evaluate the Adapter. Now that we’ve performed a spot check to make sure training completed successfully, let’s take a deeper look into the performance of the adapter. We can evaluate the performance against the “test” portion of the dataset from the Run MLFlow Evaluation page. This provides a more in-depth evaluation of any selected models and adapters. For this example, we will compare the performance of 1) just the bigscience/bloom-1b1 base model, 2) the same base model with our newly trained better-ticketing adapter activated, and finally 3) a larger mistral-7b-instruct model.

How Can I Get Started with Fine Tuning Studio?

Cloudera’s Fine Tuning Studio is available to Cloudera AI customers as an Accelerator for Machine Learning Projects (AMP) in Cloudera’s AMP catalog. To install and try Fine Tuning Studio, follow the instructions for deploying the AMP directly from your workspace.

As we can see, the rougeL metric (a measure of overlap with the reference text that is more forgiving than an exact match) for the 1B model with its adapter is significantly higher than the same metric for the untrained 7B model. As simple as that, we trained an adapter for a small, cost-effective model that outperforms a significantly larger one. Even though the larger 7B model may perform better on generalized tasks, it has not been trained on the available “actions” the model can take for a given customer input, and therefore would not perform as well as our fine-tuned 1B model in a production environment.
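
For reference, ROUGE-L can also be computed outside the Studio with the Hugging Face evaluate library (which requires the rouge_score package). The predictions and references below are placeholders standing in for real model outputs and labels.

    # Minimal ROUGE-L computation with the Hugging Face `evaluate` library.
    # The predictions/references below are placeholders, not real results.
    import evaluate

    rouge = evaluate.load("rouge")
    predictions = ["get_refund", "cancel_ticket"]
    references = ["get_refund", "change_ticket"]
    scores = rouge.compute(predictions=predictions, references=references)
    print(scores["rougeL"])  # higher is better; 1.0 means full overlap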

Accelerating Fine Tuned LLMs to Production

As we saw, Fine Tuning Studio enables anyone of any skill level to train a model for an enterprise-specific use case. Customers can now incorporate cost-effective, high-performance, fine-tuned LLMs into their production-ready AI workflows more easily than ever, and expose those models to end users while maintaining safety and compliance. After training a model, users can use the Export Model feature to export trained adapters as a Cloudera Machine Learning model endpoint, a production-ready model hosting service available to Cloudera AI (formerly known as Cloudera Machine Learning) customers. Fine Tuning Studio also ships with an example application showing how easy it is to incorporate a model trained within the Studio into a full-fledged production AI application.
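
Once an adapter is exported as a Cloudera Machine Learning model endpoint, applications typically call it over HTTPS. The sketch below is a generic illustration of that pattern only; the endpoint URL, access key, and request payload shape are placeholders and will depend on your actual deployment.

    # Hypothetical call to a deployed model endpoint. The URL, key, and payload
    # shape are placeholders (consult your deployment for the real contract).
    import requests

    ENDPOINT = "https://modelservice.your-cluster.example.com/model"
    payload = {
        "accessKey": "YOUR_MODEL_ACCESS_KEY",
        "request": {"prompt": "i have to get a refund i need assistance"},
    }
    resp = requests.post(ENDPOINT, json=payload, timeout=30)
    resp.raise_for_status()
    print(resp.json())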

Train a New Adapter. With a dataset, model, and prompt selected, let’s train a new adapter for our bloom-1b1 model so it can handle customer requests more accurately. On the Train a New Adapter page, we can fill out all relevant fields, including the name of the new adapter, the dataset to train on, and the training prompt to use. For this example, we had two L40S GPUs available for training, so we chose the Multi Node training type. We trained for two epochs on 90% of the dataset, leaving the remaining 10% for evaluation and testing.
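
Under the same assumptions (the bitext dataset and the bloom-1b1 base model), the 90/10 split and two-epoch configuration might look like the following open-source sketch. Fine Tuning Studio configures and launches the equivalent job for you from the UI, so the batch size and learning rate shown here are illustrative assumptions.

    # Illustrative 90/10 split and training configuration (not the Studio's
    # internal code); epoch count and split mirror the values described above.
    from datasets import load_dataset
    from transformers import TrainingArguments

    ds = load_dataset("bitext/Bitext-events-ticketing-llm-chatbot-training-dataset")
    split = ds["train"].train_test_split(test_size=0.1, seed=42)
    train_ds, eval_ds = split["train"], split["test"]

    training_args = TrainingArguments(
        output_dir="better-ticketing-adapter",
        num_train_epochs=2,              # two epochs, as in the example run
        per_device_train_batch_size=8,   # assumption; tune for your GPUs
        learning_rate=2e-4,              # assumption
        logging_steps=50,
    )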

Monitor the Training Job. On the Monitor Training Jobs page we can track the status of our training job, and also follow the deep link to the Cloudera Machine Learning Job to view log outputs directly. With two L40S GPUs, two epochs over our bitext dataset completed training in only 10 minutes.

Fine Tuning Studio Key Capabilities 

Once Fine Tuning Studio is deployed to an enterprise’s Cloudera AI Workbench, users gain instant access to powerful tools for organizing data, testing prompts, training adapters for LLMs, and evaluating the performance of these fine-tuning jobs:

  • Track all your resources for fine tuning and evaluating LLMs. Fine Tuning Studio enables users to track the location of all datasets, models, and model adapters used for training and evaluation. Datasets imported from Hugging Face or directly from a Cloudera AI project (such as a custom CSV), along with models imported from sources such as Hugging Face and Cloudera’s Model Registry, are organized in one place and can be used throughout the tool, regardless of their type or location.
  • Build and test training and inference prompts. Fine Tuning Studio ships with powerful prompt templating features, so users can build and test prompts against different models and model adapters, and compare the performance of those prompts, before using them for training or inference.
  • Train new adapters for an LLM. Fine Tuning Studio makes training new adapters for an LLM a breeze. Users can configure training jobs right within the UI, either keeping the sensible defaults or customizing a job down to the individual parameters passed to it. Training jobs use Cloudera’s Workbench compute resources, and users can track a job’s progress within the UI. Furthermore, Fine Tuning Studio comes with deep MLFlow experiments integration, so every metric related to a fine tuning job can be viewed in Cloudera AI’s Experiments view.
  • Evaluate the performance of trained LLMs. Fine Tuning Studio ships with several ways to test a trained model and compare models against one another, all within the UI. It provides quick spot-checking of a trained adapter as well as full MLFlow-based evaluations that compare models using industry-standard metrics. These built-in evaluation tools let AI professionals verify the safety and performance of a model before it ever reaches production.
  • Deploy trained LLMs to production environments. Fine Tuning Studio ships with deep integrations with Cloudera’s AI suite of tools to deploy, host, and monitor LLMs. Users can immediately export a fine-tuned model as a Cloudera Machine Learning Model endpoint, which can then be used in production-ready workflows. Users can also export fine-tuned models into Cloudera’s new Model Registry, from which they can later be deployed to Cloudera AI’s new AI Inference service running within a Workspace.
  • No-code, low-code, and all-code solutions. Fine Tuning Studio ships with a convenient Python client that makes calls to the Fine Tuning Studio’s core server (a hypothetical sketch of what such a call might look like follows this list). This means that data scientists can build and develop their own training scripts while still using Fine Tuning Studio’s compute and organizational capabilities. Anyone of any skill level can leverage the power of Fine Tuning Studio, with or without code.
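
The real client interface is best learned from the project’s GitHub repository; purely to illustrate the idea of a thin Python client driving the Studio’s core server, a call might look something like the sketch below. The module, class, and method names here are invented for illustration and are not the actual client API.

    # Hypothetical sketch only: the module, class, and method names are invented
    # to illustrate the "thin client over a central server" pattern.
    # See the Fine Tuning Studio GitHub project for the real client API.
    from ft_studio_client import FineTuningStudioClient  # hypothetical import

    client = FineTuningStudioClient(server_url="http://localhost:8080")  # assumption
    datasets = client.list_datasets()
    job = client.start_training_job(
        base_model="bigscience/bloom-1b1",
        dataset=datasets[0],
        adapter_name="better-ticketing",
    )
    print(job.status)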

An End-to-End Example: Ticketing Support Agent

To show how easy it is for GenAI builders to build and deploy a production-ready application, let’s walk through an end-to-end example: fine tuning an event ticketing customer support agent. The goal is to fine tune a small, cost-effective model that, based on customer input, can extract the appropriate “action” (think API call) that the downstream system should take for the customer. Given the cost constraints of hosting and infrastructure, the model should be small enough to host on a consumer GPU while providing the same accuracy as a larger model.

Data Preparation. For this example, we will use the bitext/Bitext-events-ticketing-llm-chatbot-training-dataset dataset available on Hugging Face, which contains pairs of customer inputs and the desired intent/action output for a variety of support scenarios. We can import this dataset on the Import Datasets page.
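
Outside the Import Datasets page, the same dataset can be pulled down with the Hugging Face datasets library. The sketch below loads it and prints the two fields used later in this example; the split and field names follow how the dataset is described here and may need adjusting.

    # Load the bitext ticketing dataset and peek at the fields used later.
    from datasets import load_dataset

    ds = load_dataset("bitext/Bitext-events-ticketing-llm-chatbot-training-dataset")
    example = ds["train"][0]
    print(example["instruction"])  # the customer's message
    print(example["intent"])       # the desired action, e.g. "get_refund"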

Model Selection. To keep our inference footprint small, we will use the bigscience/bloom-1b1 model as our base model, which is also available on HuggingFace. We can import this model directly from the Import Base Models page. The goal is to train an adapter for this base model that gives it better predictive capabilities for our specific dataset.

Creating a Training Prompt. Next, we’ll create a prompt for both training and inference. We can use this prompt to give the model more context on the possible actions it can select. Let’s name the prompt better-ticketing and use our bitext dataset as its base dataset. The Create Prompts page enables us to create a prompt “template” based on the features available in the dataset, and we can then test the prompt against the dataset to make sure everything is working properly. Once everything looks good, we hit Create Prompt, which makes the prompt available throughout the tool. Our prompt template uses the instruction and intent fields from our dataset.
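
An illustrative version of such a template (a sketch, not necessarily the exact wording used in the Studio) might be:

    You are a support agent for an event ticketing company. Given the
    customer's message, reply with the single action the downstream
    system should take.

    Customer message: {instruction}
    Action: {intent}

At training time, {instruction} and {intent} are filled in from each row of the dataset; at inference time, only the text up to “Action:” is sent, and the model generates the action.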

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-fine-tuning-studio-for-training-evaluating-deploying-llms-with-cloudera-ai0
Meet Michelle Hoover, Cloudera’s new SVP of Global Alliances and Channelshttps://www.cloudera.com/blog/culture/meet-michelle-hoover-clouderas-new-svp-of-global-alliances-and-channelshttps://www.cloudera.com/blog/culture/meet-michelle-hoover-clouderas-new-svp-of-global-alliances-and-channelsTue, 05 Nov 2024 16:00:00 UTC

Cloudera’s partner ecosystem delivers best-of-breed technology solutions to joint customers from the biggest names in the industry and is a core pillar of the company’s growth strategy.  

Cloudera is committed to fostering collaboration with partners, growing relationships, and innovating for the future. To elevate Cloudera’s partner ecosystem, the company recently announced the promotion of Michelle Hoover to Senior Vice President of Global Alliances & Channels.  

Michelle brings over 20 years of experience to her role, having managed software alliances and partners at many key industry organizations, including two years at Cloudera. Her extensive background includes management and executive positions at Confluent, Red Hat, and Oracle. Most recently, she was VP of Cloud & AI Ecosystem Partners at Cloudera. Michelle’s deep partnership expertise and strong relationships within the data and AI ecosystem make her a great leader of the Cloudera alliances and partner channels strategies.  

Let’s get to know Cloudera’s newest SVP, Michelle Hoover.  

Michelle, congratulations on your new position. What excites you most about your new role?  

I’m truly excited to step into this position and work toward improving and growing our partner ecosystem because of the position Cloudera and our partners are in today and the opportunities that lie ahead.  

Cloudera is the only true hybrid data platform that enables enterprises to analyze, control, and modernize data, analytics, and AI. Organizations recognize the value of hybrid data platforms that run anywhere and create true value for our customers.  

But we don’t do this alone. We work closely with our partners to ensure that the treasure trove of data our customers hold can easily be turned into actionable insights, no matter where the data resides. This positions the partner ecosystem well for the future by creating an incredible opportunity to generate services and grow with us.

What is your vision for the Cloudera partner ecosystem as you take on this role? 

One of my goals is to build a robust system integrator partner business. Regional and global system integrators are critical, because the large enterprise customers we work with rely on them when making decisions.

We must double down on our focus on that community of partners, along with independent software vendors, who are more important than ever to our customers with the advent of AI and new applications. It’s my objective to make sure those two communities of partners are successful with Cloudera. We can only be as successful as our partner ecosystem, so it is fundamental to our growth strategy to help partners excel with customers.   

Our diverse partner set delivers the best portfolio of solutions to make it easy for customers to run their businesses in the cloud – whether that’s with AWS, Microsoft, or Google – or in a hybrid environment with Dell, IBM, and others. We want to help our customers leverage our partner ecosystem to make it easier to solve their problems and address their specific needs.

The key to partnerships is making sure it’s a win-win situation. How can our partners be successful and build a robust business around our customers? It’s important to explore new channels and routes to do that. 

What makes Cloudera such a unique place to work?  

The people are what really sets Cloudera apart. Everyone at Cloudera is passionate and engaged in collaboration across lines of business to deliver the most innovative world-class solutions for our customers in any environment.  

That passion is something I have seen extend across the culture at Cloudera as well. Take, for example, our Cloudera Cares program. Employees in this program volunteer to help our surrounding communities through acts of service and give back to those in need. These initiatives embody the collective spirit of Cloudera employees worldwide. It’s a pillar of our company and builds a strong and rewarding culture that I’m proud to be a part of.    

What advice would you give to partners when it comes to rapidly growing technologies such as AI?

My advice is to align with partners you can trust and know that it’s more important than ever to be able to trust your data.  

Cloudera is uniquely positioned in that way. It offers technologies to help organizations trust their data, and it will continue to lead in strategic partnerships. We offer products and solutions to help bring AI models to your data versus bringing the data to the AI model.  

When considering a partner vendor, remember that your customers innovate and take advantage of newer technologies, but you also need to know that the solutions you build and the vendors you partner with will be around for a long time.  

Trust your data and trust the vendors you partner with to manage that data.  

What advice would you give your younger self or young professionals today?

Early in my career, I was told that everyone you meet can help you in some way. There’s value in everybody.  

It’s something that I believe is particularly relevant in my work in the partner space today. Just as every individual you meet or work with can help you, in the partner ecosystem, every partner has the potential to help you support your customers. Cloudera partners are important – both current partners and new partners that may emerge. 

How do you like to spend your free time?

I love the outdoors, nature, competitive sports, like tennis, and spending time with my husband and two young adult kids. I’ve also been trying to take more opportunities to explore new areas of interest such as gardening and travel. My travel bucket list includes Australia and New Zealand, so I would love to visit soon.  

Learn more about the Cloudera partner ecosystem. 

]]>
https://www.cloudera.com/api/www/blog-feed?page=meet-michelle-hoover-clouderas-new-svp-of-global-alliances-and-channels0
Unlocking Faster Insights: How Cloudera and Cohere can deliver Smarter Document Analysishttps://www.cloudera.com/blog/technical/unlocking-faster-insights-how-cloudera-and-cohere-can-deliverhttps://www.cloudera.com/blog/technical/unlocking-faster-insights-how-cloudera-and-cohere-can-deliverMon, 04 Nov 2024 09:57:00 UTC

Today we are excited to announce the release of a new Cloudera Accelerator for Machine Learning (ML) Projects (AMP) for PDF document analysis, “Document Analysis with Command R and FAISS”, leveraging Cohere’s Command R Large Language Model (LLM), the Cohere Toolkit for retrieval augmented generation (RAG) applications, and Facebook AI Similarity Search (FAISS). 

Document analysis is crucial for efficiently extracting insights from large volumes of text. It has wide-ranging applications including legal research, market analysis, and scientific research. For example, cancer researchers can use document analysis to quickly understand the key findings of thousands of research papers on a certain type of cancer, helping them identify trends and knowledge gaps needed to set new research priorities. 

Before the widespread use of LLMs, document analysis was primarily conducted through manual methods and rule-based systems. These methods were often time-consuming, labor-intensive, and limited in their ability to handle complex language nuances and unstructured data. 

The development of advanced LLMs, such as Cohere’s Command R, and AI Platforms, such as Cloudera Artificial Intelligence (CAI), made it easier than ever for enterprises to deploy high-impact document analysis applications. We created our “Document Analysis with Command R and FAISS” AMP to make that process even easier. 

Cohere’s Command R Family of Models are advanced LLMs that leverage state-of-the-art transformer architectures to handle complex text generation and understanding tasks with high accuracy and speed, making them suitable for enterprise-level applications and real-time processing needs. They were made to be easily integrated into various applications, offering scalability and flexibility for both small-scale and large-scale implementations. The Cohere Toolkit is a collection of pre-built components enabling developers to quickly build and deploy retrieval augmented generation (RAG) applications.

CAI is a robust platform for data scientists and Artificial Intelligence (AI) practitioners to build, train, deploy, and manage models and applications at scale. AMPs are one-click deployments of commonly used AI/ML-based prototypes that reduce time to value by providing high-quality reference examples leveraging Cloudera’s research and expertise to showcase cutting-edge AI applications. 

This AMP is a single project launched from CAI that automatically deploys an application, loads vectors into a FAISS vector store, and enables interfacing with Cohere’s Command R LLM to perform document analysis. The image below illustrates the Retrieval-Augmented Generation (RAG) architecture used by the AMP, and how the components of Cohere, FAISS, the user’s knowledge base, and Streamlit work together to create a ready-to-use Generative AI use case.

This project brings together several exciting new themes to Cloudera’s AMP library, especially in terms of RAG. Facebook’s open source FAISS is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. By leveraging it in this AMP, Cloudera demonstrates its flexibility in vector search applications and adds this capability on top of its adoption of Milvus, Chroma, Pinecone, and others in its existing AMP catalog. 
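
To make that concrete, here is a minimal FAISS sketch that indexes a handful of dense vectors and retrieves the nearest neighbors of a query vector. The dimensionality and random vectors are placeholders; in the AMP, the vectors come from Cohere’s embedding model.

    # Minimal FAISS example: index a few dense vectors and search them.
    # Random vectors stand in for real document embeddings.
    import faiss
    import numpy as np

    dim = 1024                      # embedding dimensionality (assumption)
    docs = np.random.rand(100, dim).astype("float32")

    index = faiss.IndexFlatL2(dim)  # exact L2 search; FAISS also offers ANN indexes
    index.add(docs)

    query = np.random.rand(1, dim).astype("float32")
    distances, ids = index.search(query, 4)
    print(ids[0])                   # positions of the 4 nearest documents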

Additionally, the AMP leverages LangChain’s AI toolkit, which takes advantage of custom connectors to Cohere and FAISS to enable advanced semantic search and summarization capabilities in a clean, easy-to-understand code base. It also utilizes Cohere’s embed-english-v3.0 model, which is tailor-made for generating high-quality text embeddings from English-language inputs and excels at capturing semantic nuances. By using Streamlit for the UI, users have a simple starting template, which can be the basis for a full-scale production deployment. 
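
A condensed sketch of that retrieval layer using the LangChain connectors is shown below. It assumes a COHERE_API_KEY environment variable is set and uses two toy strings in place of parsed PDF chunks, so treat it as the shape of the integration rather than the AMP’s actual code.

    # Sketch of the retrieval layer: Cohere embeddings + a FAISS vector store
    # via LangChain connectors (assumes COHERE_API_KEY is set; toy documents
    # stand in for parsed PDF chunks).
    from langchain_cohere import CohereEmbeddings
    from langchain_community.vectorstores import FAISS

    texts = [
        "Trial A reported a 12% improvement in progression-free survival.",
        "Trial B was stopped early due to recruitment issues.",
    ]
    embeddings = CohereEmbeddings(model="embed-english-v3.0")
    store = FAISS.from_texts(texts, embeddings)

    retriever = store.as_retriever(search_kwargs={"k": 2})
    docs = retriever.invoke("Which trial improved survival?")
    print([d.page_content for d in docs])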

More on how the “Document Analysis with Command R and FAISS” AMP works and how to deploy it can be found in this GitHub repository

Be on the lookout for more news from Cohere and Cloudera as we work together to make it easier than ever to deploy high-performance AI applications.

]]>
https://www.cloudera.com/api/www/blog-feed?page=unlocking-faster-insights-how-cloudera-and-cohere-can-deliver0
Looking Back on Our First Women Leaders in Technology Eventhttps://www.cloudera.com/blog/culture/looking-back-on-our-first-women-leaders-in-technology-eventhttps://www.cloudera.com/blog/culture/looking-back-on-our-first-women-leaders-in-technology-eventFri, 01 Nov 2024 15:00:00 UTC

Over the last few months, Cloudera has been traversing the globe hosting our EVOLVE24 event series. It has been a time full of excitement, innovative ideas, and connection with our partners and customers. It also provided the moment for us to launch an important program for Cloudera: our Women Leaders in Technology (WLIT) initiative. 

WLIT is a global initiative developed to create a forum where women and allies in tech leadership roles can connect with women and girls and demonstrate that it is possible to enter, grow, and thrive in the tech industry. It aims to shine a light on the gender imbalance in the industry, provide insight into policies and programs that help foster a stronger, more diverse workforce, and create networking opportunities for women. This program goes beyond the critical work of our Womens+ ERG, which aims to cultivate an inclusive environment among Cloudera employees that supports and encourages women to advance their skills and leadership potential through connection, mentorship, collaboration, recruiting, retention, and discussion. WLIT is industry-wide and seeks to connect, inspire, and elevate Clouderans as well as cross-sector leaders. 

During EVOLVE New York, the WLIT group came together for a luncheon panel designed to kick off a conversation among the women—and allies—in the room, and in the tech space more broadly, about the challenges faced by women in tech and how to overcome them. The panelists included Manasi Vartak of Cloudera, Nichola Hammerton of Deutsche Bank, and Melissa Dougherty of AWS. Moderating the conversation was Forbes reporter Zoya Hasan, who edits the Forbes 30 Under 30 lists, including U30 U.S., Europe, and Local, co-authors a weekly newsletter, and writes features on young founders.

Let’s dive into the panel discussion and a few of the biggest takeaways from our participants. 

Building Inclusive Data-Driven Organizations: Leadership Strategies for the Modern Workplace

As it stands, women currently account for approximately 25% of the technology workforce. And that number only gets smaller the further up you advance in your career, with women holding just 11% of executive roles in the technology space. But, it’s about much more than a number. As Zoya pointed out in her opening remarks, women in technology are not just a statistic and we should be doing everything we can to flip prevailing assumptions to demonstrate that it’s not women in technology, instead, it is just people in technology who happen to be women. 

As we started the discussion, our panelists covered several pressing issues surrounding how women leaders find success in building inclusive and data-driven organizations. The speakers covered everything from different leadership approaches to overcoming systemic barriers and driving organizational transformation and inclusion. Here are a few key takeaways: 

It’s never too early (or late) to enter into a STEAM field. This was one of the key points raised during the discussion. There is tremendous value in encouraging women to get involved early, whether that’s in technology, mathematics, or other STEAM-related subjects. Likewise, while getting started early is important, it’s also not the only way to get into these disciplines. When it comes to carving out a career in technology, it’s never too late to take the first step. 

Find a mentor who can help you grow in your career. For women looking to succeed professionally in technology, having a mentor can be incredibly impactful. Mentors, both women and men, bring plenty of experience and insight from their own lived experiences that can help you better understand how to handle various situations, deepen your networks, and provide trusted guidance in a competitive field. Cultivating a company culture that prioritizes and facilitates mentorship and sponsorship is imperative.

Don’t put limits on what you’re capable of. We’re all susceptible to second-guessing and self-doubt. But it’s important to recognize that feeling and work on overcoming it. Whether it’s thinking a project or task is too challenging for your skills or that you’re underqualified for a job you want (or maybe even already have), imposter syndrome is a feeling all the panelists could relate to but agreed it is rarely, if ever, justified. 

“As women leaders in this space, it’s so important to share our experiences and learnings with other women to help encourage them in their own careers. Our first WLIT event has been incredibly rewarding and having the opportunity to connect with so many people throughout the luncheon goes to show just how important this community is.” – Manasi Vartak, Chief AI Architect, Cloudera

Our first WLIT event was an incredible experience, and we were so thrilled to see how engaged attendees were throughout the luncheon and how active the Q&A portion was. With the launch of our WLIT group, we hope to grow this community and support women throughout their technology careers—at Cloudera, our partner organizations, our customers, and beyond. 

Find out more about Cloudera’s Women Leaders in Technology initiative and join our LinkedIn group to get involved. 

]]>
https://www.cloudera.com/api/www/blog-feed?page=looking-back-on-our-first-women-leaders-in-technology-event0
#ClouderaLife Employee Spotlight: Julia Ostrowskihttps://www.cloudera.com/blog/culture/clouderalife-employee-spotlight-julia-ostrowskihttps://www.cloudera.com/blog/culture/clouderalife-employee-spotlight-julia-ostrowskiWed, 30 Oct 2024 15:00:00 UTC

In this Employee Spotlight, we sat down with Julia Ostrowski to learn about her time at Cloudera, what she loves about her job, her experience on both sides of Cloudera’s mentorship program, and her impressive volunteer work.  

Meet Julia Ostrowski

Julia is the Director of Enterprise Entitlement at Cloudera and has been with the company since 2019, joining via Hortonworks. Outside of her typical responsibilities, Julia is deeply involved in various philanthropic initiatives within Cloudera as well as in her own free time.  

“Whenever there is an opportunity to do something good in the Santa Clara office for a couple of hours, I always sign up for it, no matter what it is,” Julia said. “There are so many rewarding initiatives that Cloudera puts together, from mentorships to taking care of animals or helping people in need. It’s a great part of Cloudera’s culture.”  

Julia’s Cloudera Career Journey  

When Julia first arrived at Cloudera, she worked in Support as the manager of the Support Product Management team, helping ensure Cloudera’s COEs and Support Managers were able to provide world-class technical support to our customers. In 2022 she was offered the opportunity to join the growing and dynamic IT department under Olivia Keenaghan as a part of Business Applications. Now, she’s responsible for managing a small team that reviews the entire entitlement business process, taking ownership of any parts of the lifecycle that were distributed and/or unowned. 

This was quite the career shift, but she described switching to the IT department as “fantastic,” crediting an amazing manager, a great team, and rewarding responsibilities.  

“Making the change was such a welcoming experience,” she said. “I still get to my laptop each day with a smile on my face. I like the challenge of never knowing exactly what each day will bring, and I get excited to solve complex problems to help Cloudera grow. I have always said that at Cloudera your career is in your hands. I've seen people go from one team to a completely different team with such success and grace, totally supported by their former and new managers. You just don’t see that at every company, unfortunately.” 

Julia’s team, consisting of Ken McCarthy and Susy Mena-McCarthy, tracks and manages all of Cloudera’s public cloud metering for billing purposes and consistently optimizes processes to support business needs. Their current big project is Project Lionheart: the replacement of a licensing server to allow for more flexibility and much deeper insight into customer usage of their on-premises product. 

While Julia has a real passion for solving problems and driving value during her day-to-day job, she is always eager to leverage Cloudera’s extensive giving and volunteering programs to volunteer her time to help others.  

Mentoring and Volunteering at Cloudera  

Cloudera offers a mentorship program to help employees improve and navigate their careers. Participants gain advice, set goals, and learn how to handle difficult situations at work, such as a project not going according to plan. Julia, who shared that she has benefited from being a mentee herself, both currently at Cloudera and at previous organizations, volunteers to mentor up to four colleagues at a time and views it as paying it forward.   

“I really enjoy it,” she said. “I've gotten so much out of having a mentor that I feel obligated to be a mentor myself, but fortunately, I do love it. I appreciate being able to meet new people at Cloudera and do my best to help guide them in their careers.” 

Julia says that a lot of the time, the key is to keep a good attitude, but the biggest piece of advice she offers is that uncertainty is not always a bad thing as long as you have a path forward.  

“It's okay not to know exactly where you will be in ten years, five years, or even two years,” she said. “Expecting to have your whole career path mapped out down to the last rock or pebble is a very old-school way of thinking, but you should always have your next goal in mind. It doesn't have to be a new title or a particular salary, but it could be a new skill set that you want to add to your repertoire, for example. I always ask mentees, ‘What is your next goal?’ and work with them on finding a path there.” 

Cloudera Cares Volunteer Program

Julia also lends her time to philanthropic initiatives like Cloudera’s Teen Accelerator Program, an initiative organized by Cloudera’s volunteer group, Cloudera Cares.  

The Teen Accelerator Program is a partnership with the Boys & Girls Club of America in both Tennessee and the San Francisco Bay Area. The program offers students a six-week paid internship at Cloudera and 1:1 employee mentorship, helping facilitate opportunities in corporate America for high school students in under-resourced communities. 

At first, Julia explained, she was a little nervous to start working with the program because she had not had much interaction with teens. However, despite that initial nervousness, she dove right in and was hooked instantly as the program tapped into her passion for helping others.  

“It was such a rewarding experience,” Julia said. “I asked myself, ‘What did I want to know about the world of business when I was 16 years old?’ The first time I was in an office in my early years, I didn’t know how to dress or what to do. And there are so many simple questions. I remember not knowing whether you have to raise your hand to go to the bathroom or what ‘CEO’ means. I put myself in those shoes, and I’m glad I did. It was honestly fantastic.” 

During the teen mentorship, Julia partnered with Cy Jervis (Senior Manager, Support Knowledge Programs) to virtually guide the teen mentee through the ins and outs of Cloudera, introducing him to key departments, and allowing him to ask questions and engage with colleagues. By the end of the program, he left with a solid understanding of how modern tech companies operate.

“It was so interesting to hear all our colleagues go in-depth sharing what they do and what their department works on that I was even learning right along with him,” Julia said. “I wasn’t expecting to have so much fun, but he was so engaged, and it was just great.” 

Julia also takes advantage of another Cloudera volunteer initiative by spending some of her time with Second Harvest, a food distribution group in her local area of San Jose, California. Through this volunteer program, a group of Cloudera employees (championed by Executive Briefing Program Manager Amanda Allen) assist the organization in distributing food to people in need.

“Cloudera has quite a few volunteer opportunities, and this one is one of my absolute favorites,” Julia said. “I take advantage of every opportunity to take part in it. Whether it’s sunny or raining out, it doesn’t matter. I get to go and spend time with other fantastic Clouderans who I wouldn’t normally see, and I truly enjoy it. This is a great organization that puts food in the hands of people who really need it, and I am so appreciative of Amanda’s strong advocacy in organizing these opportunities for Cloudera.”  

Julia is truly passionate about her philanthropic efforts and helping those in need both through Cloudera and outside of work. She continues to foster dogs and cats for local animal rescues (and occasionally they find a permanent home with her!). She loves being able to make a difference in her role in Enterprise Entitlement and appreciates the flexibility Cloudera granted her to explore a new position in IT that she truly enjoys. 

Read our last employee spotlight here.  

]]>
https://www.cloudera.com/api/www/blog-feed?page=clouderalife-employee-spotlight-julia-ostrowski0
Cloudera and Snowflake Partner to Deliver the Most Comprehensive Open Data Lakehousehttps://www.cloudera.com/blog/business/cloudera-and-snowflake-partner-to-deliver-the-most-comprehensive-open-data-lakehousehttps://www.cloudera.com/blog/business/cloudera-and-snowflake-partner-to-deliver-the-most-comprehensive-open-data-lakehouseWed, 23 Oct 2024 16:00:00 UTC

In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI. One of the most important innovations in data management is open table formats, specifically Apache Iceberg, which fundamentally transforms the way data teams manage operational metadata in the data lake. By maintaining operational metadata within the table itself, Iceberg tables enable interoperability with many different systems and engines.

The Iceberg REST catalog specification is a key component for making Iceberg tables available and discoverable by many different tools and execution engines. It enables easy integration and interaction with Iceberg table metadata via an API and also decouples metadata management from the underlying storage. It is a critical feature for delivering unified access to data in distributed, multi-engine architectures.
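
In practice, a client or engine that speaks the REST catalog specification needs only the catalog’s URI and credentials to discover and load tables. A minimal sketch with the open-source PyIceberg client follows; the URI, credential, and table identifier are placeholders.

    # Sketch: connect to an Iceberg REST catalog with PyIceberg and load a table.
    # The URI, credential, and table identifier below are placeholders.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(
        "my_rest_catalog",
        **{
            "type": "rest",
            "uri": "https://catalog.example.com/api/catalog",
            "credential": "CLIENT_ID:CLIENT_SECRET",
        },
    )
    print(catalog.list_namespaces())
    table = catalog.load_table("sales.orders")
    print(table.schema())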

That’s why Cloudera added support for the REST catalog: to make open metadata a priority for our customers and to ensure that data teams can truly leverage the best tool for each workload – whether it’s ingestion, reporting, data engineering, or building, training, and deploying AI models.

Snowflake and Cloudera: Better Together

In the spirit of open data and engine freedom, Cloudera is excited to partner with Snowflake to bring the most comprehensive open data lakehouse, and the freedom it provides, to all of our customers.

Snowflake is one of the most popular platforms for data sharing, business intelligence (BI), reporting, and dashboarding due to its ease of use, self-service capabilities, and the performance of its execution engine. Snowflake is a prominent contributor to the Iceberg project, understanding the value it brings to its customers in terms of interoperability, data management, and data governance.

By leveraging Cloudera to build and manage Iceberg tables, Snowflake customers can make a single, consistent, and accurate view of their data available for their BI users without moving or copying data to other systems. They can take advantage of Cloudera’s true hybrid architecture and even provide easy access to on-premises data sources by leveraging Apache Ozone.

They can also leverage a single view of their data for any other Cloudera or third-party engine for other analytic workloads, including streaming, advanced analytics, and AI/ML.

With Snowflake’s engine, Cloudera customers get easy self-service access to their data for BI and interactive dashboards anywhere their data lives, including multiple public clouds and on-premises.

The Cloudera + Snowflake Advantage

The partnership between Cloudera and Snowflake gives several advantages to joint customers:

  • Lower Total Cost of Ownership: Reducing data copies and data movement while guaranteeing engine and infrastructure freedom enables customers to reduce storage, compute, and operational costs of maintaining their analytics stack. 
  • Choose the best tool for the job: By keeping data in open formats, customers can choose the environment and tools that provide the most ideal balance of cost and performance on a workload-by-workload basis. Customers have access to multiple public and private clouds and on-premises data stores, and they can use any engine that can read or write to Iceberg tables.
  • True hybrid: Customers have full access to data stores on-premises and in every cloud without undertaking an expensive and complex migration project. They are free to choose the infrastructure best suited for each workload. Cloudera Shared Data Experience (SDX) enables customers to enforce consistent security and governance policies across all of their environments – even if data moves across clouds.

Try Cloudera and Snowflake Today

Together, Cloudera and Snowflake deliver the most comprehensive hybrid open data lakehouse. It enables customers to confidently address virtually any analytic use case, from self-service BI that delivers actionable intelligence to business users to AI that transforms business processes and powers differentiated customer experiences.

Both platforms are free to try today. Try Cloudera’s open data lakehouse on AWS for 5 days for free here, or try Snowflake for free for 30 days here

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-and-snowflake-partner-to-deliver-the-most-comprehensive-open-data-lakehouse0
The Evolution of LLMOps: Adapting MLOps for GenAIhttps://www.cloudera.com/blog/technical/the-evolution-of-llmops--adapting-mlops-for-genaihttps://www.cloudera.com/blog/technical/the-evolution-of-llmops--adapting-mlops-for-genaiTue, 22 Oct 2024 08:38:00 UTC

In recent years, machine learning operations (MLOps) have become the standard practice for developing, deploying, and managing machine learning models. MLOps standardizes processes and workflows for faster, scalable, and risk-free model deployment, centralizing model management, automating CI/CD for deployment, providing continuous monitoring, and ensuring governance and release best practices.

However, the rapid rise of large language models (LLMs) has introduced new challenges around compute cost, infrastructure needs, prompt engineering and other optimization techniques, governance, and more. This requires an evolution of MLOps into what we now call “large language model operations” (LLMOps).

Let’s explore some key areas where LLMOps introduces novel processes and workflows compared to traditional MLOps.

  • Expanding the Builder Persona: Traditional ML applications largely involve data scientists building models, with ML engineers focusing on pipelines and operations. With LLMs, this paradigm has shifted. Data scientists are no longer the only ones involved—business teams, product managers, and engineers play a more active role, particularly because LLMs lower the barrier to entry for AI-driven applications. The rise of both open-source models (e.g., Llama, Mistral) and proprietary services (e.g., OpenAI) has removed much of the heavy lifting around model building and training. This democratization is a double-edged sword. While LLMs can be easily integrated into products, new challenges like compute cost, infrastructure needs, governance, and quality must be addressed.
  • Low-Code/No-Code as a Core Feature: In MLOps, tools were primarily designed for data scientists, focusing on APIs and integrations with Python or R. With LLMOps, low-code/no-code tooling has become essential to cater to a broader set of users and make LLMs accessible across various teams. A key trend is how LLMOps platforms now emphasize user-friendly interfaces, enabling non-technical stakeholders to build, experiment, and deploy LLMs with minimal coding knowledge.
  • More Focus on Model Optimization: When using LLMs, teams often work with general-purpose models, fine-tuning them for specific business needs using proprietary data. Therefore, model optimization techniques are becoming central to LLMOps. These techniques, such as quantization, pruning, and prompt engineering, are critical to refining LLMs to suit targeted use cases. Optimization not only improves performance but is essential for managing the cost and scalability of LLM applications.
  • Prompt Engineering: A completely new concept introduced by LLMOps is prompt engineering—the practice of crafting precise instructions to guide the model’s behavior. This is both an art and a science, and serves as a key method for improving the quality, relevance, and efficiency of LLM responses. An LLMOps stack should therefore include tools for prompt management, such as prompt chaining, playgrounds for testing, and advanced concepts like meta-prompting, where users leverage one prompt to improve another. Techniques like Chain of Thought and Assumed Expertise are becoming standard strategies in this new domain.
  • The Emergence of Retrieval-Augmented Generation (RAG): Unlike traditional ML models, many enterprise-level GenAI use cases involving LLMs rely on retrieving relevant data from external sources, rather than solely generating responses from pre-trained knowledge. This has led to the rise of Retrieval-Augmented Generation (RAG) architectures, which integrate retrieval models to pull information from enterprise knowledge bases, and then rank and summarize that information using LLMs (a minimal sketch of this retrieve-then-generate loop follows this list). RAG significantly reduces hallucinations and offers a cost-effective way to leverage enterprise data, making it a new cornerstone of LLMOps. Building and managing RAG pipelines is a completely new challenge that wasn’t part of the MLOps landscape. In the LLMOps life cycle, building and managing a RAG pipeline has replaced traditional model training as a key focus. While fine-tuning LLMs is still critical (and similar to ML model training), it brings new challenges around infrastructure and cost. Additionally, the use of enterprise data in RAG pipelines creates new data management challenges. Capabilities like vector storage, semantic search, and embeddings have become essential parts of the LLMOps workflow—areas that were less prevalent in MLOps.
  • Evaluation and Monitoring is Less Predictable: Evaluating and monitoring LLMs is more complex than with traditional ML models. LLM applications are often context-specific, requiring significant input from subject matter experts (SMEs) during evaluation. Auto-evaluation frameworks, where one LLM is used to assess another, are beginning to emerge. However, challenges like the unpredictability of generative models and issues like hallucination remain difficult to address. To navigate these challenges, many companies first deploy internal LLM use cases, such as agent assistants, to build confidence before launching customer-facing applications.
  • Risk Management and Governance: Model risk management has always been a critical focus for MLOps, but LLMOps introduces new concerns. Transparency into what data LLMs are trained on is often murky, raising concerns about privacy, copyrights, and bias. Additionally, making LLMs auditable and explainable remains an unsolved problem. Enterprises are beginning to adopt AI risk frameworks, but best practices are still evolving. For now, focusing on thorough evaluation, continuous monitoring, creating a catalog of approved models, and establishing governance policies are essential first steps. AI governance will be a central pillar of LLMOps tooling going forward.
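
To ground the RAG pattern described in the list above, here is a deliberately minimal sketch of the retrieve-then-generate loop. The embedding function is a toy placeholder and the final generation call is left as a comment, so it illustrates the pipeline shape rather than a production implementation.

    # Minimal RAG loop: embed documents, retrieve the closest ones for a query,
    # and build an augmented prompt. embed() is a toy placeholder for a real
    # embedding model, and the final LLM call is left as a comment.
    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Placeholder embedding: hash words into a fixed-size bag-of-words vector.
        vec = np.zeros(256)
        for word in text.lower().split():
            vec[hash(word) % 256] += 1.0
        return vec / (np.linalg.norm(vec) + 1e-9)

    docs = [
        "Refunds are processed within 5 business days.",
        "Tickets can be transferred up to 24 hours before the event.",
    ]
    doc_vecs = np.stack([embed(d) for d in docs])

    def retrieve(query: str, k: int = 1) -> list:
        sims = doc_vecs @ embed(query)            # cosine similarity (unit vectors)
        return [docs[i] for i in np.argsort(sims)[::-1][:k]]

    question = "How long do refunds take?"
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # generate(prompt) would now be a call to the LLM endpoint of your choice.
    print(prompt)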

As enterprises adopt LLMs, the shift from MLOps to LLMOps is essential for addressing their unique challenges. LLMOps emphasizes prompt engineering, model optimization, and RAG. It also introduces new complexities in governance, risk management, and evaluation, making LLMOps crucial for successfully scaling and managing these advanced models in production.

For enterprises interested in learning more about leveraging LLMs, click here

]]>
https://www.cloudera.com/api/www/blog-feed?page=the-evolution-of-llmops--adapting-mlops-for-genai0
Cloudera Lakehouse Optimizer Makes it Easier Than Ever to Deliver High-Performance Iceberg Tableshttps://www.cloudera.com/blog/technical/cloudera-lakehouse-optimizer-easier-to-deliver-high-performance-iceberg-tableshttps://www.cloudera.com/blog/technical/cloudera-lakehouse-optimizer-easier-to-deliver-high-performance-iceberg-tablesThu, 10 Oct 2024 08:20:00 UTC

The open data lakehouse is quickly becoming the standard architecture for unified multifunction analytics on large volumes of data. It combines the flexibility and scalability of data lake storage with the data analytics, data governance, and data management functionality of the data warehouse. Open table formats are a key component of this architecture, as they provide many of the capabilities of traditional data warehousing directly on data lake storage, and Apache Iceberg is quickly becoming the standard format for vendors and customers alike.

Iceberg has many features that drastically reduce the work required to deliver a high-performance view of the data, but many of these features create overhead and require manual job execution to optimize for performance and costs. To make the data lakehouse even easier to manage, Cloudera is introducing Cloudera Lakehouse Optimizer, which intelligently automates Iceberg table maintenance so many of these jobs automatically run in the background. Let’s take a look at some of the features in Cloudera Lakehouse Optimizer, the benefits they provide, and the road ahead for this service.

Cloudera Lakehouse Optimizer Features

Cloudera Lakehouse Optimizer runs automatic, policy-based Iceberg table optimization tasks based on user configurations and Iceberg table statistics. Automatic optimization jobs include:

Compaction: Companies often ingest many small files, such as with micro-batching or streaming ingestion, and reading many small files can negatively impact query performance. Compaction is a process that rewrites small files into larger ones to improve performance. Cloudera Lakehouse Optimizer autonomously determines the best time to automatically compact data files so users always get the best performance from their tables. It also prioritizes the tables to optimize based on usage patterns, so optimization only runs where there is real ROI.

Table Cleanup: As tables grow, they often accumulate unused data files, manifest files, and snapshots that aren’t needed anymore. Users may want to perform table maintenance functions, like expiring snapshots, removing old metadata files, and deleting orphan files, to optimize storage utilization and improve performance. Cloudera Lakehouse Optimizer will autonomously determine the best time to perform these maintenance tasks and ensure tables always utilize optimal storage.
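
For context, these are the kinds of maintenance jobs that otherwise have to be run by hand. With Spark and Iceberg’s built-in procedures they look roughly like the sketch below, where the catalog and table names are placeholders; Cloudera Lakehouse Optimizer decides when (and whether) to run the equivalent work for you.

    # The manual Iceberg maintenance that Lakehouse Optimizer automates,
    # expressed as Spark SQL procedure calls. Catalog/table names are placeholders;
    # assumes a SparkSession configured with the Iceberg extensions and catalog.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Compaction: rewrite many small data files into fewer, larger ones.
    spark.sql("CALL my_catalog.system.rewrite_data_files(table => 'db.events')")

    # Cleanup: expire old snapshots and delete orphaned files.
    spark.sql("""
        CALL my_catalog.system.expire_snapshots(
            table => 'db.events',
            older_than => TIMESTAMP '2024-09-01 00:00:00'
        )
    """)
    spark.sql("CALL my_catalog.system.remove_orphan_files(table => 'db.events')")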

In addition to optimization and policy-based controls, Cloudera Lakehouse Optimizer features observability for optimization jobs, so data teams can see and understand how their policies are impacting the health and performance of their tables and storage.

The Benefits

Cloudera Lakehouse Optimizer provides several benefits for companies managing Iceberg tables:

  • They experience lower Total Cost of Ownership (TCO) as a result of optimizing their storage footprint and reducing query runtimes.
  • They can deliver a high-performance view of their data by reducing the number of files that need to be read in a query.
  • They reduce data management effort and overhead by automating some of the most tedious lakehouse maintenance tasks.

The Road Ahead

The features we are launching in Cloudera Lakehouse Optimizer solve two very important challenges for companies that want to move to an open data lakehouse architecture. This is just the first step in advancing Cloudera’s vision of making it easier than ever to deliver a high-performance view of your data. Down the road, we plan to add support for more optimization features, including reorganizing partitions to solve data distribution problems that can impact query performance, as well as query optimization.

The goal for all of these features is to ensure that Cloudera is the best platform for managing and delivering access to Iceberg tables, and that the path to adopting an open data lakehouse is easier than ever.

Our Open Data Lakehouse is Free to Try

You can try Cloudera’s open data lakehouse on AWS for free today. Go sign up for our 5-day trial here to see for yourself.

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-lakehouse-optimizer-easier-to-deliver-high-performance-iceberg-tables0
Deploy and Scale AI Applications With Cloudera AI Inference Servicehttps://www.cloudera.com/blog/business/deploy-and-scale-ai-applications-with-cloudera-ai-inference-servicehttps://www.cloudera.com/blog/business/deploy-and-scale-ai-applications-with-cloudera-ai-inference-serviceTue, 08 Oct 2024 16:00:00 UTC

We are thrilled to announce the general availability of the Cloudera AI Inference service, powered by NVIDIA NIM microservices, part of the NVIDIA AI Enterprise platform, to accelerate generative AI deployments for enterprises. This service supports a range of optimized AI models, enabling seamless and scalable AI inference.

Background

The generative AI landscape is evolving at a rapid pace, marked by explosive growth and widespread adoption across industries. In 2022, the release of ChatGPT attracted over 100 million users within just two months, demonstrating the technology's accessibility and its impact across various user skill levels.

By 2023, the focus shifted towards experimentation. Enterprise developers began exploring proof of concepts (POCs) for generative AI applications, leveraging API services and open models such as Llama 2 and Mistral. These innovations pushed the boundaries of what generative AI could achieve.

Now, in 2024, generative AI is moving into the production phase for many companies. Businesses are now allocating dedicated budgets and building infrastructure to support AI applications in real-world environments. However, this transition presents significant challenges. Enterprises are increasingly concerned with safeguarding intellectual property (IP), maintaining brand integrity, and protecting client confidentiality while adhering to regulatory requirements.

A major risk is data exposure — AI systems must be designed to align with company ethics and meet strict regulatory standards without compromising functionality. Ensuring that AI systems prevent breaches of client confidentiality, personally identifiable information (PII), and data security is crucial for mitigating these risks.

Enterprises also face the challenge of maintaining control over AI development and deployment across disparate environments. They require solutions that offer robust security, ownership, and governance throughout the entire AI lifecycle, from POC to full production. Additionally, there is a need for enterprise-grade software that streamlines this transition while meeting stringent security requirements.

To safely leverage the full potential of generative AI, companies must address these challenges head-on. Typically, organizations approach generative AI POCs in one of two ways: by using third-party services, which are easy to implement but require sharing private data externally, or by developing self-hosted solutions using a mix of open-source and commercial tools.

At Cloudera, we focus on simplifying the development and deployment of generative AI models for production applications. Our approach provides accelerated, scalable, and efficient infrastructure along with enterprise-grade security and governance. This combination helps organizations confidently adopt generative AI while protecting their IP, brand reputation, and compliance with regulatory standards.

Cloudera AI Inference Service

The new Cloudera AI Inference service provides accelerated model serving, enabling enterprises to deploy and scale AI applications with enhanced speed and efficiency. By leveraging the NVIDIA NeMo platform and optimized versions of open-source models like Llama 3 and Mistral, businesses can harness the latest advancements in natural language processing, computer vision, and other AI domains.

Cloudera AI Inference: Scalable and Secure Model Serving 

The Cloudera AI Inference service offers a powerful combination of performance, security, and scalability designed for modern AI applications. Powered by NVIDIA NIM, it delivers market-leading performance with substantial time and cost savings. Hardware and software optimizations enable up to 36 times faster inference with NVIDIA accelerated computing and nearly four times the throughput on CPUs, accelerating decision-making.

Integration with NVIDIA Triton Inference Server further enhances the service. It provides standardized, efficient deployment with support for open protocols, reducing deployment time and complexity.

In terms of security, the Cloudera AI Inference service delivers robust protection and control. Customers can deploy AI models within their virtual private cloud (VPC) while maintaining strict privacy and control over sensitive data in the cloud. All communications between the applications and model endpoints remain within the customer’s secured environment.

Comprehensive safeguards, including authentication and authorization, ensure that only users with configured access can interact with the model endpoint. The service also meets enterprise-grade security and compliance standards, recording all model interactions for governance and audit.

The Cloudera AI Inference service also offers exceptional scalability and flexibility. It supports hybrid environments, allowing seamless transitions between on-premises and cloud deployments for increased operational flexibility.

Seamless integration with CI/CD pipelines enhances MLOps workflows, while dynamic scaling and distributed serving optimize resource usage. These features reduce costs without compromising performance. High availability and disaster recovery capabilities help enable continuous operation and minimal downtime.

Feature Highlights:

  • Hybrid and Multi-Cloud Support: Enables deployment across on-premises*, public cloud, and hybrid environments, offering flexibility to meet diverse enterprise infrastructure needs.
  • Model Registry Integration: Seamlessly integrates with Cloudera AI Registry, a centralized repository for storing, versioning, and managing models, enabling consistency and easy access to different model versions.
  • Detailed Data and Model Lineage Tracking*: Ensures comprehensive tracking and documentation of data transformations and model lifecycle events, enhancing reproducibility and auditability.
  • Enterprise-Grade Security: Implements robust security measures, including authentication, authorization*, and data encryption, helping ensure that data and models are protected both in transit and at rest.
  • Real-time Inference Capabilities: Provides real-time predictions with low latency and batch processing for large datasets, offering flexibility in serving AI models based on different needs.
  • High Availability and Dynamic Scaling: Features high availability configurations and dynamic scaling capabilities to efficiently handle varying loads while delivering continuous service.
  • Advanced Language Model Support: Pre-generated, optimized engines for a diverse range of cutting-edge LLM architectures.
  • Flexible Integration: Easily integrate with existing workflows and applications. Developers are provided open inference protocol APIs for traditional ML models and an OpenAI-compatible API for LLMs (see the sketch after this list).
  • Multiple AI Framework Support: Integrates seamlessly with popular machine learning frameworks such as TensorFlow, PyTorch, Scikit-learn, and Hugging Face Transformers, making it easy to deploy a wide variety of model types.
  • Advanced Deployment Patterns: Supports sophisticated deployment strategies like canary and blue-green deployments*, as well as A/B testing*, enabling safe and gradual rollouts of new model versions.
  • Open APIs: Provides standards-compliant, open APIs for deploying, managing, and monitoring online models and applications*, as well as for facilitating integration with CI/CD pipelines and other MLOps tools.
  • Performance Monitoring and Logging: Provides comprehensive monitoring and logging capabilities, tracking performance metrics such as latency, throughput, resource utilization, and model health, supporting troubleshooting and optimization.
  • Business Monitoring*: Supports continuous monitoring of key generative AI model metrics, like sentiment, user feedback, and drift, that are crucial for maintaining model quality and performance.
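
Because the LLM endpoint is OpenAI-compatible, existing client code can typically be pointed at the service by changing only the base URL and credentials. The snippet below is a minimal illustrative sketch assuming the openai Python client (v1 or later); the endpoint URL, model ID, and access token are placeholders rather than real values, and the exact path and authentication scheme will depend on your deployment.

```python
# Minimal sketch of calling an OpenAI-compatible LLM endpoint (illustrative only;
# the endpoint URL, model ID, and token below are placeholders, not real values).
from openai import OpenAI

client = OpenAI(
    base_url="https://your-inference-endpoint.example.com/v1",  # hypothetical endpoint
    api_key="YOUR_ACCESS_TOKEN",                                 # hypothetical token
)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # placeholder model ID
    messages=[{"role": "user", "content": "Summarize our Q3 risk report in three bullet points."}],
    temperature=0.2,
)
print(response.choices[0].message.content)
```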

The Cloudera AI Inference service, powered by NVIDIA NIM microservices, delivers seamless, high-performance AI model inferencing across on-premises and cloud environments. Supporting open-source community models, NVIDIA AI Foundation models, and custom AI models, it offers the flexibility to meet diverse business needs. The service enables rapid deployment of generative AI applications at scale, with a strong focus on privacy and security, helping enterprises unlock the full potential of their data with AI models in production environments.

* feature coming soon - please reach out to us if you have questions or would like to learn more.

]]>
https://www.cloudera.com/api/www/blog-feed?page=deploy-and-scale-ai-applications-with-cloudera-ai-inference-service0
The Global Impact of Cloudera in Our Daily Liveshttps://www.cloudera.com/blog/business/the-global-impact-of-cloudera-in-our-daily-liveshttps://www.cloudera.com/blog/business/the-global-impact-of-cloudera-in-our-daily-livesFri, 27 Sep 2024 16:00:00 UTC

Cloudera customers understand the potential impact of data, analytics, and AI on their respective businesses -- reducing costs, managing risk, improving customer satisfaction, and generating new business opportunities that help to increase market share.

But, what is the ultimate impact of all this effort and investment on each of us in our daily lives? At EVOLVE in Singapore, the Manila Electric Company, Meralco, won the Cloudera 2024 Data Impact Award in the Leadership and Transformation category for its customer-centric and data-driven transformation. In addition to literally powering countries, there were other impressive stories from award finalists and winners that focused on the human element. Specifically, the People and Society award category highlighted data’s positive influence on the world at large. 

That got us thinking: How could we share our customer stories in a way that clearly articulates the impact of Cloudera on our daily lives? So we built an interactive tour showcasing that impact throughout a typical day. This tour features examples of data, analytics, and AI delivering memorable and tangible experiences for everyday people.

For example, SAIC-Volkswagen delivers insights and services to drivers through an application powered by Cloudera. In fact, 8 out of the top 10 global automakers use Cloudera to deliver a connected vehicle experience.

BT Group is among 7 of the top 10 global telecommunications companies that use Cloudera to improve the customer experience and operational efficiency. They process 5 times the data in a third of the time, enabling them to deliver better network performance to their customers.

And pharmaceutical companies like IQVIA use Cloudera to provide visibility across the R&D pipeline, accelerating the development of life-saving drugs.

These are just a few of the stories we highlight in our interactive tour. You can see the rest for yourself here.

And if you want to learn why the world’s most recognized enterprises trust Cloudera for business-critical data, analytics, and AI initiatives, you can read their stories here.

]]>
https://www.cloudera.com/api/www/blog-feed?page=the-global-impact-of-cloudera-in-our-daily-lives0
Streamlining Generative AI Deployment with New Acceleratorshttps://www.cloudera.com/blog/technical/streamlining-generative-ai-deployment-with-new-acceleratorshttps://www.cloudera.com/blog/technical/streamlining-generative-ai-deployment-with-new-acceleratorsThu, 26 Sep 2024 07:58:00 UTC


Overcoming the challenges of developing production-ready Generative AI with four new ready-to-deploy Accelerators for ML Projects (AMPs)

The journey from a great idea for a Generative AI use case to deploying it in a production environment often resembles navigating a maze. Every turn presents new challenges—whether it’s technical hurdles, security concerns, or shifting priorities—that can stall progress or even force you to start over. 

Cloudera recognizes the struggles that many enterprises face when setting out on this path, and that’s why we started building Accelerators for ML Projects (AMPs). AMPs are fully built-out ML prototypes that can be deployed with a single click directly from Cloudera Machine Learning. AMPs enable data scientists to go from an idea to a fully working ML use case in a fraction of the time. By providing pre-built workflows, best practices, and integration with enterprise-grade tools, AMPs eliminate much of the complexity involved in building and deploying machine learning models.

In line with our ongoing commitment to supporting ML practitioners, Cloudera is thrilled to announce the release of five new Accelerators! These cutting-edge tools focus on trending topics in generative AI, empowering enterprises to unlock innovation and accelerate the development of impactful solutions.

Fine Tuning Studio

Fine tuning has become an important methodology for creating specialized large language models (LLMs). Since LLMs are trained on essentially the entire internet, they are generalists capable of doing many different things very well. However, in order for them to truly excel at specific tasks, like code generation or language translation for rare dialects, they need to be tuned for the task with a more focused and specialized dataset. This process allows the model to refine its understanding and adapt its outputs to better suit the nuances of the specific task, making it more accurate and efficient in that domain.

The Fine Tuning Studio is a Cloudera-developed AMP that provides users with an all-encompassing application and “ecosystem” for managing, fine tuning, and evaluating LLMs. This application is a launcher that helps users organize and dispatch other Cloudera Machine Learning workloads (primarily via the Jobs feature) that are configured specifically for LLM training and evaluation type tasks.
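
To make the idea of task-specific tuning concrete, here is a heavily simplified, generic LoRA fine-tuning sketch. It is not Fine Tuning Studio internals; it assumes the transformers, peft, and datasets packages plus a local JSONL file of task examples with a "text" field, and the base model name is a placeholder.

```python
# Illustrative LoRA fine-tuning sketch (generic example, not Fine Tuning Studio code).
# Assumes `pip install transformers peft datasets` and a local task_examples.jsonl file.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token  # ensure padding works
model = AutoModelForCausalLM.from_pretrained(base_model)

# Attach low-rank adapters so only a small fraction of the weights is updated.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# Tokenize the focused, task-specific dataset (placeholder file name).
dataset = load_dataset("json", data_files="task_examples.jsonl", split="train")
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./ft-out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In Fine Tuning Studio, steps like these are configured in the application and dispatched as Cloudera Machine Learning Jobs rather than run by hand.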

 

PromptBrew by Verta

80% of Generative AI success depends on prompting and yet most AI developers can't write good prompts. This gap in prompt engineering skills often leads to suboptimal results, as the effectiveness of generative AI models largely hinges on how well they are guided through instructions. Crafting precise, clear, and contextually appropriate prompts is crucial for maximizing the model’s capabilities. Without well-designed prompts, even the most advanced models can produce irrelevant, ambiguous, or low-quality outputs.

PromptBrew provides AI-powered assistance to help developers craft high-performing, reliable prompts with ease. Whether you’re starting with a specific project goal or a draft prompt, PromptBrew guides you through a streamlined process, offering suggestions and optimizations to refine your prompts. By generating multiple candidate prompts and recommending enhancements, it ensures that your inputs are tailored for the best possible outcomes. These optimized prompts can then be seamlessly integrated into your project workflow, improving performance and accuracy in generative AI applications.

 

Chat with your Documents  

This AMP showcases how to build a chatbot using an open-source, pre-trained, instruction-following Large Language Model (LLM). The chatbot’s responses are improved by providing it with context from an internal knowledge base, created from documents uploaded by users. This context is retrieved through semantic search, powered by an open-source vector database.

In comparison to the original LLM Chatbot Augmented with Enterprise Data AMP, this version includes new features such as user document ingestion, automatic question generation, and result streaming. It also leverages Llama Index to implement the RAG pipeline.

To learn more, click here.
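
For readers who want to see what a Llama Index-based RAG pipeline looks like in code, here is a minimal, generic sketch rather than the AMP’s actual implementation. It assumes a recent llama-index release with an LLM and embedding model configured separately (for example via an OpenAI-compatible endpoint); the folder path and question are placeholders.

```python
# Generic RAG sketch with LlamaIndex (illustrative; not the AMP's code).
# Assumes `pip install llama-index` and an LLM/embedding model configured
# via environment variables or llama_index Settings.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# 1. Ingest user-uploaded documents from a local folder (placeholder path).
documents = SimpleDirectoryReader("./user_docs").load_data()

# 2. Chunk and embed the documents into a vector index.
index = VectorStoreIndex.from_documents(documents)

# 3. Retrieve relevant chunks via semantic search and generate an answer.
query_engine = index.as_query_engine()
response = query_engine.query("What does the uploaded policy say about data retention?")
print(response)
```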

 

RAG with Knowledge Graph

Retrieval Augmented Generation (RAG) has become one of the default methodologies for adding additional context to responses from an LLM. This application architecture makes use of prompt engineering and vector stores to provide an LLM with new information at the time of inference. However, the performance of RAG applications is far from perfect, prompting innovations like integrating knowledge graphs, which structure data into interconnected entities and relationships. This addition improves retrieval accuracy, contextual relevance, reasoning capabilities, and domain-specific understanding, elevating the overall effectiveness of RAG systems.

RAG with Knowledge Graph demonstrates how integrating knowledge graphs can enhance RAG performance, using a solution designed for academic research paper retrieval. The solution ingests significant AI/ML papers from arXiv into Neo4j’s knowledge graph and vector store. For the LLM, we used Meta-Llama-3.1-8B-Instruct, which can be run either remotely or locally. To highlight the improvements that knowledge graphs deliver to RAG, the UI compares the results with and without a knowledge graph.
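
To illustrate what graph augmentation can look like in practice, here is a minimal, hypothetical sketch using the Neo4j Python driver; the connection details, the Paper/CITES schema, and the question are assumptions for illustration and are not taken from the AMP.

```python
# Illustrative sketch of graph-augmented retrieval (hypothetical schema; not the AMP's code).
# Assumes `pip install neo4j` and a Neo4j instance populated with (:Paper)-[:CITES]->(:Paper)
# relationships; the URI and credentials below are placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def related_papers(title: str, limit: int = 5) -> list[str]:
    """Pull titles of papers connected to a seed paper to enrich the prompt context."""
    query = (
        "MATCH (p:Paper {title: $title})-[:CITES]-(other:Paper) "
        "RETURN other.title AS title LIMIT $limit"
    )
    records, _, _ = driver.execute_query(query, title=title, limit=limit)
    return [record["title"] for record in records]

# Graph context is appended to the usual vector-search context before prompting the LLM.
graph_context = related_papers("Attention Is All You Need")
prompt = (
    "Answer using the context below.\n"
    f"Related work from the knowledge graph: {graph_context}\n"
    "Question: How do knowledge graphs improve RAG retrieval accuracy?"
)
print(prompt)
```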

]]>
https://www.cloudera.com/api/www/blog-feed?page=streamlining-generative-ai-deployment-with-new-accelerators0
Celebrating Hispanic Heritage Month with Clouderahttps://www.cloudera.com/blog/culture/celebrating-hispanic-heritage-month-with-clouderahttps://www.cloudera.com/blog/culture/celebrating-hispanic-heritage-month-with-clouderaMon, 23 Sep 2024 15:00:00 UTC

We’re more than a week into Hispanic Heritage Month, which started on September 15 and continues through October 15. This month is an annual celebration in the United States that honors the contributions, culture, and achievements of Hispanic and Latinx Americans. Over the next few weeks, we’ll be gathering with fellow Clouderans to reflect on and celebrate the achievements of the Hispanic and Latinx communities here in the U.S. and across the globe.

Fundamentally, this month is about sharing what makes each of us unique and learning about the cultural differences across these communities to better understand the experiences that unite us all. 

It’s primed to be a busy month, so let’s dive into what we have planned for learning, growing, and giving back.

A Month Full of Action and Learning

Cloudera’s calendar is full of engaging activities to help not just bring awareness to the Hispanic and Latinx experience and history but also to give back and make an impact on the next generation. 

At our most recent event, Cloudera volunteers helped Hispanic and Latinx students at under-resourced schools enhance their LinkedIn profiles. We spent time offering constructive recommendations to build on what they already had and align them with professional standards. Our goal was to enable these students to find more secure employment opportunities, build a strong professional brand, and learn from what other Latinx and Hispanic Americans are doing, both in their careers and via networking settings like LinkedIn.

Looking ahead, the rest of the month is jam-packed with fun and informative activities too. On October 8, we’re hosting a Hispanic Heritage Month Workshop and a Hispanic and Latin American History and Culture Trivia Event. Both offer a great opportunity for attendees to celebrate the diverse histories, languages, and heritage of Hispanic and Latinx cultures across the U.S. and Latin America. During trivia, we’ll test our knowledge on everything from famous Hispanic and Latinx figures to historical events to pop culture knowledge. 

Whether it's through volunteering or trivia, there are numerous opportunities for Clouderans to get involved and learn about their colleagues’ experiences. Each activity this month will help attendees gain an appreciation for the long, rich, and complex history that got us here, and better understand the unique perspectives that shape the Latinx and Hispanic experience around the world. 

Reflecting on A Month of Learning and Growth

This month is so important in helping bring a real, meaningful level of engagement and connection to Clouderans. As we participate in each of these events, it’s important to step back and appreciate the many contributions Hispanic and Latinx Americans have made to the world while also acknowledging that the concept of Hispanic or Latinx is not a monolith. So many amazing cultures and communities are encompassed by those labels and it’s important to highlight and celebrate the unique aspects of each one. 

Hispanic Heritage Month has been around in the U.S. since 1988. Since then, it’s evolved from a weeklong event into its own month-long celebration. This evolution is a testament to the profound impact of Hispanic and Latinx culture and history in the U.S. As we near the final weeks of celebration, it is rewarding to see just how invested everyone has been in building a warm, welcoming environment for team members to connect and showcase the things that make them unique. 

Learn more about how Cloudera is building a more inclusive and diverse workplace. 

]]>
https://www.cloudera.com/api/www/blog-feed?page=celebrating-hispanic-heritage-month-with-cloudera0
Cloudera Evaluates Integrated Data and AI Exchange Business Line to Optimize Data-Driven Generative AI Use Caseshttps://www.cloudera.com/blog/business/cloudera-evaluates-integrated-data-and-ai-exchange-business-line-to-optimize-data-driven-generative-ai-use-caseshttps://www.cloudera.com/blog/business/cloudera-evaluates-integrated-data-and-ai-exchange-business-line-to-optimize-data-driven-generative-ai-use-casesWed, 18 Sep 2024 16:00:00 UTC

According to recent survey data from Cloudera, 88% of companies are already utilizing AI to enhance efficiency in IT processes, improve customer support with chatbots, and leverage analytics for better decision-making. More and more enterprises are leveraging pre-trained models for various applications, from natural language processing to computer vision. For that reason, Cloudera is evaluating a new line of business: Cloudera Integrated Data and AI Exchange (InDaiX). 

InDaiX provides data consumers with unparalleled flexibility and scalability, streamlining how businesses, researchers, and developers access and integrate diverse data sources and AI foundational models, expediting the process of Generative AI (GenAI) adoption.

As part of this evaluation process with InDaiX, Cloudera is conducting workshops with end users to better understand the practical use cases that enterprises are hoping to use AI for. InDaiX is being evaluated as an extension of Cloudera to include:

  •  Datasets Exchange:
    • Industry Datasets: Comprehensive datasets across various domains, including healthcare, finance, and retail.
    • Alternative Datasets: Unique datasets, such as location intelligence and social media data, providing novel insights for various applications.
    • Synthetic Datasets: High-quality synthetic data generated using state-of-the-art techniques, ensuring privacy and compliance.
  • AI Foundational Models Exchange: Access to pre-trained AI models, including natural language processing models, computer vision models, and reinforcement learning models, catering to various industry needs.
  • Unique Data Integration and Experimentation Capabilities: Enables users to choose from and experiment with multiple data sources and AI foundational models in one place, supporting quicker iterations and more effective testing.
  • Scalability and Flexibility: Cloudera’s scalable architecture supports the growing volume and variety of data from InDaiX, allowing the platform to expand and adapt to changing data needs without compromising performance.

Offering a wide variety of data choices and AI foundational models, InDaiX enables businesses to create new data assets at scale and run workloads in private or public clouds, choosing the most suitable environment for their needs. For data providers, InDaiX enhances distribution by reaching Cloudera's established customer base and provides valuable feedback on data usage and integration with AI models. This dual value proposition makes InDaiX an essential platform for driving innovation and operational efficiency in the data and AI markets.

By offering high-quality, privacy-compliant, and scalable data solutions and AI models, Cloudera is cementing itself as the go-to platform for businesses, researchers, and developers seeking to build bespoke, innovative data-driven insights and applications.

For enterprises looking to work and participate in the evaluation process, learn more about how to get involved here.

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-evaluates-integrated-data-and-ai-exchange-business-line-to-optimize-data-driven-generative-ai-use-cases0
Cloudera Launches Private Link Network for Secure, Internet-Free Cloud Connectivityhttps://www.cloudera.com/blog/business/cloudera-launches-private-link-network-for-secure-internet-free-cloud-connectivityhttps://www.cloudera.com/blog/business/cloudera-launches-private-link-network-for-secure-internet-free-cloud-connectivityThu, 12 Sep 2024 16:00:00 UTC

Imagine a world where your sensitive data moves effortlessly between clouds – secure, private, and far from the prying eyes of the public internet. Today, we’re making that world a reality with the launch of Cloudera Private Link Network.

Organizations are continuously seeking ways to enhance their data security. One of the challenges is ensuring that data remains protected as it traverses different cloud environments. Cloud provider solutions like AWS PrivateLink and Azure Private Link are a step in the right direction, but they often fall short of providing a comprehensive solution across multiple clouds.

Cloudera Private Link Network is designed to provide seamless, private connectivity between your cloud workloads and the Cloudera Control Plane. Cloudera Private Link Network ensures that data never leaves your secure, private network, even across multiple cloud environments, offering peace of mind in an increasingly privacy-conscious climate.

Why Cloudera Private Link Network?

As industries like financial services, healthcare, and pharmaceuticals continue to navigate strict data privacy policies, the need for secure, private connectivity has never been more critical. Cloudera Private Link Network addresses these concerns head-on by providing a unified, cross-cloud private connectivity service that goes well beyond what’s currently available.

  1. Integrated Solution: Whether you’re connecting workloads in AWS or Azure, Cloudera Private Link Network provides a seamless, integrated solution that eliminates the need for multiple vendor-specific solutions.
  2. Better Security Posture: Cloudera Private Link Network ensures that your data remains within a secure, private network, significantly reducing the attack surface. This is particularly crucial for highly regulated industries where even the slightest data exposure can have serious consequences.
  3. Reduced Network/CloudOps Load: By utilizing the Cloudera Private Link Network, you free your network and CloudOps teams from having to design, test, deploy, manage, and monitor multi-cloud connectivity. These teams can spend their time on more strategic work.
  4. Lower TCO: Unlike point solutions that require you to pay for individual, atomic links on a per link basis, with Private Link Network, you pay to access the network on a per-VPC or per-account basis. The pricing is based on consumption so you only pay for what you use. Ease of management also results in lower operational costs.
  5. Network Performance: With Cloudera Private Link Network, security doesn’t come at the cost of network performance. Your operations remain smooth and efficient, even as your data remains fully protected.

Cloudera Private Link Network is a game changer for industries where data privacy is paramount. Here’s a sample of how it's making a difference:

  • Financial Services: For financial institutions managing large volumes of sensitive customer and financial data, Cloudera Private Link Network enables secure connectivity while ensuring compliance with regulations and policies by keeping data off of the public internet.
  • Pharmaceuticals: Pharmaceutical companies need to integrate sensitive data, including Personally Identifiable Information (PII), across the R&D pipeline to accelerate the development of life-saving medications. Cloudera Private Link Network provides a secure environment where teams can integrate and analyze sensitive information without fear of exposing that data outside the organization.
  • Healthcare: Healthcare providers are under pressure to leverage data to deliver patient centricity and a continuum of care. Cloudera Private Link Network enables providers to build a proactive, data-driven healthcare experience while maintaining strict privacy standards, reducing the risk of data loss and ensuring data privacy. 

At Cloudera, we understand that ease of use is just as important as security. That’s why Cloudera Private Link Network is designed to be user-friendly and flexible, with two deployment options to suit your organization’s needs. Customers can turn on these options using the Cloudera Command Line Interface (CLI), enabling this functionality on demand. 

  • VPC Option: This option is ideal for organizations that prefer to use Cloudera’s CLI for a complete setup, offering end-to-end control and management.
  • Authorization Option: This option is perfect for customers who want to integrate Cloudera Private Link Network with their existing cloud automation tools, such as Terraform or AWS CloudFormation, without the need for cross-account permissions.

Pricing

Securing your cloud infrastructure shouldn’t come with hidden costs. That’s why Cloudera Private Link Network uses a consumption-based pricing model, ensuring that you only pay for what you use. This approach provides transparency and predictability, enabling organizations to scale their use of Private Link Network according to their needs, whether it’s for single cloud virtual networks or entire Cloudera accounts that span multiple cloud accounts.

Click here for more information on Cloudera Private Link Network.

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-launches-private-link-network-for-secure-internet-free-cloud-connectivity0
The critical role of a hybrid cloud architecture in ensuring regulatory compliance in financial serviceshttps://www.cloudera.com/blog/business/the-critical-role-of-a-hybrid-cloud-architecture-in-ensuring-regulatory-compliance-in-financial-serviceshttps://www.cloudera.com/blog/business/the-critical-role-of-a-hybrid-cloud-architecture-in-ensuring-regulatory-compliance-in-financial-servicesTue, 10 Sep 2024 16:00:00 UTC

Register for EVOLVE24 in Dubai (September 12, 2024) to hear from industry leaders on why hybrid solutions are essential for navigating an increasingly complex regulatory environment.

A prominent global bank was thrust into the spotlight for all the wrong reasons. The institution was hit with a staggering fine - multiple billions - for failing to comply with new data protection regulations that ultimately led to a customer data breach. The breach, which exposed sensitive information, not only resulted in financial penalties but also caused significant reputational damage. Customers lost trust, investors questioned the bank’s governance, and competitors seized the opportunity to highlight the incident, swaying customers away from the bank with messaging about data privacy and incentives. 

Another scenario: A major lender rolls out a new AI-driven credit scoring system to streamline loan approvals. The system was expected to reduce processing times and improve customer satisfaction. However, six months into its implementation, regulators discovered that the AI model had been trained on biased historical data and was inadvertently discriminating against certain demographic groups, leading to unfair lending practices.

Regulators determined the bank was not compliant with anti-discrimination laws and data protection regulations, as the AI system lacked transparency and failed to meet the required standards for fairness. The bank was fined $100 million and ordered to audit and overhaul its AI practices. The incident not only resulted in financial penalties but also sparked public outrage, damaging the bank's reputation and leading to a significant loss of customer trust.

While these scenarios are hypotheticals, the risk is real. 

For good reason, the financial services industry is facing an increasingly complex regulatory landscape, particularly when it comes to data privacy and the use of artificial intelligence. However, as regulations become more stringent and data governance demands grow, financial institutions are under immense pressure to manage their data with greater precision, making effective data management within a hybrid cloud environment essential.

How a Hybrid Cloud Architecture Empowers Regulatory Compliance

A hybrid cloud architecture has emerged as a crucial strategy for financial institutions to navigate these regulations while maintaining innovation and operational efficiency. By combining the best of on-premises and cloud environments, hybrid architectures offer a flexible, secure, and scalable data management solution that empowers financial institutions to maintain compliance, enhance security, and adapt to regulatory changes—all while optimizing costs and ensuring business continuity. 

Let’s review some of the more critical regulations and the impact of a hybrid cloud architecture.

Privacy Regulations

Privacy regulations like GDPR (EU), CCPA (California, US), LGPD (Brazil), APPI (Japan), and PIPL (China) have profoundly influenced how financial institutions manage personal data. Implementing a hybrid cloud architecture offers several key advantages in complying with these stringent requirements:

Data Sovereignty and Localization
Many privacy laws require certain types of data to be stored within specific geographic boundaries. Hybrid cloud allows financial institutions to maintain sensitive data on-premises or in private clouds within the required jurisdictions while leveraging public cloud resources for non-sensitive workloads.

Granular Data Control
Hybrid cloud enables financial institutions to implement fine-grained access controls and data classification systems. This allows for better management of personal data, making it easier to comply with data subject rights (e.g., right to access, right to be forgotten) mandated by regulations like GDPR and CCPA.

Enhanced Security Measures
Hybrid cloud allows for the implementation of robust security measures, including encryption, tokenization, and data masking. These techniques are crucial for protecting personal data and meeting the security requirements of privacy regulations.
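
As a simple illustration of what tokenization can look like in practice (independent of any particular platform feature), the sketch below deterministically replaces a sensitive field with a keyed hash so records remain joinable without exposing the raw value; the secret key shown is a placeholder that would normally live in a KMS or vault.

```python
# Minimal sketch of deterministic tokenization for a PII field (illustrative only).
# Assumes the secret key is managed externally (KMS/vault); the value below is a placeholder.
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"

def tokenize(value: str) -> str:
    """Replace a sensitive value with a stable, non-reversible token so records
    can still be joined and analyzed without exposing the raw identifier."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

record = {"customer_id": "C-1029", "iban": "DE89370400440532013000", "balance": 1520.75}
masked = {**record, "iban": tokenize(record["iban"])}
print(masked)  # the raw IBAN never leaves the secure environment
```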

Compliance Monitoring and Reporting
Hybrid cloud often includes tools that facilitate continuous compliance monitoring and automated reporting. This capability is essential for financial institutions to maintain transparency and accountability in line with regulatory requirements.

Disaster Recovery and Business Continuity
A hybrid cloud’s ability to distribute workloads across different environments provides a strong foundation for disaster recovery and business continuity. This ensures that personal data remains protected and accessible even in the event of a system failure or cyberattack.

AI-Specific Regulations

As AI becomes increasingly integral to financial services, regulations like the EU AI Act (EU), AIDA (Canada), the Digital India Act (India), and most recently the California S.B. 1047 AI bill are emerging to ensure its ethical and responsible use. Navigating these regulations requires a robust infrastructure, and hybrid cloud architectures are proving to be essential in meeting these new challenges in the following ways:

Transparency and Explainability
AI regulations often require organizations to provide transparency in their AI decision-making processes. Hybrid cloud environments can facilitate the storage and processing of AI models and their associated data, allowing for easier auditing and explanation of AI outcomes.

Model Governance
Hybrid cloud enables financial institutions to implement comprehensive model governance frameworks. This includes version control, model testing, and validation processes, which are crucial for complying with AI regulations that demand rigorous oversight of AI systems.

Data Quality and Bias Mitigation
Many AI regulations focus on ensuring fairness and preventing bias in AI systems (Ethical AI). Hybrid cloud architectures allow for better data management and quality control, helping financial institutions maintain high-quality, diverse datasets for training AI models and mitigating potential biases.

Financial Services-Specific Regulations

Financial institutions face additional industry-specific regulations that impact their IT infrastructure choices. Hybrid cloud architectures are well-suited to address these requirements:

Basel III and IV: These regulations focus on capital adequacy, stress testing, and market liquidity risk. Hybrid cloud architectures provide the computational power needed for complex risk calculations and stress tests while allowing sensitive data to remain on-premises or in private clouds.

MiFID II: This regulation requires extensive record-keeping and reporting. Hybrid cloud architectures offer the scalability to handle large volumes of transaction data while maintaining the security needed for sensitive financial information.

DORA (Digital Operational Resilience Act): DORA focuses on the digital operational resilience of financial institutions. Hybrid cloud architectures enhance operational resilience by providing redundancy, disaster recovery capabilities, and the ability to quickly scale resources in response to operational challenges.

ESG Regulations: As ESG (Environmental, Social, and Governance) reporting becomes mandatory, financial institutions need robust data management and analytics capabilities. Hybrid cloud architectures provide the flexibility to collect, store, and analyze vast amounts of ESG-related data while ensuring compliance with data privacy regulations.

How can Cloudera’s Hybrid Data Platform help address regulatory compliance?

Cloudera's hybrid data platform is a comprehensive solution for financial institutions navigating today's complex regulatory environment while striving for innovation, operational efficiency, and reduced risk. By integrating on-premises, private, and public cloud resources into a unified architecture, Cloudera helps organizations address data sovereignty requirements mandated by international privacy regulations such as GDPR, CCPA, and PIPL. The platform's advanced security and governance features, powered by Cloudera’s Shared Data Experience (SDX), ensure compliance with AI-specific regulations like the EU AI Act and AIDA, delivering transparency, explainability, and robust model governance.

For regulations, including Basel III/IV, MiFID II, and DORA, Cloudera’s scalable analytics capabilities support intricate risk calculations, comprehensive record-keeping, and enhanced operational resilience. The platform's flexibility enables institutions to adapt swiftly to changing regulatory demands while harnessing advanced analytics and AI for critical functions such as fraud detection, risk modeling, and ESG reporting. By providing a cohesive environment for managing and analyzing data across hybrid and multi-cloud deployments, Cloudera empowers financial institutions to stay compliant, improve customer experiences, and drive innovation in a dynamic digital landscape.

To learn more about Cloudera’s work with financial institutions, click here.

]]>
https://www.cloudera.com/api/www/blog-feed?page=the-critical-role-of-a-hybrid-cloud-architecture-in-ensuring-regulatory-compliance-in-financial-services0
Moving Your AI Pilot Projects to Productionhttps://www.cloudera.com/blog/business/moving-your-ai-pilot-projects-to-productionhttps://www.cloudera.com/blog/business/moving-your-ai-pilot-projects-to-productionTue, 10 Sep 2024 16:00:00 UTC

Without a doubt, Artificial Intelligence (AI) is revolutionizing businesses, with Australia’s AI spending expected to hit $6.4 billion by 2026. However, according to The State of Enterprise AI and Modern Data Architecture report, while 88% of enterprises adopt AI, many still lack the data infrastructure and team skilling to fully reap its benefits. In fact, over 25% of respondents stated they don’t have the data infrastructure required to effectively power AI. We also found that over 39% of respondents said that almost none of their employees are currently using AI. 

Interestingly, Gartner has predicted that at least 30% of GenAI projects will be abandoned after proof of concept by the end of 2025. With that in mind, the question then becomes: How will you embrace technologies and projects when you can’t see the time to value that AI will bring to the organization?

Translating AI’s Potential into Measurable Business Impact

It can’t be denied that a mature enterprise data strategy generates better business outcomes in the form of revenue growth and cost savings. Organizations also see improvements in customer experience, operational efficiency, and supply chain optimization. 

However, to fully realize the benefits of AI and its perceived value, organizations must measure their AI objectives against key business metrics used internally. This alignment is crucial for the progression of these projects. It also becomes the basis for communicating to internal stakeholders to secure sustained funding and financial investment. Adopting common business metrics also enhances the likelihood of successful implementation and value realization from these investments.

OCBC Bank’s adoption of AI has driven revenue generation and better risk management. In addition, it has improved developers' efficiency by 20%.

Ensuring AI’s Trust with Intent

AI projects cannot begin without trust. Trusting AI equates to trusting the data it uses, meaning it must be accurate, consistent, and unbiased. Ethical AI depends on trustworthy data, guaranteeing equitable outcomes that reflect the company’s principles. 

This means access to data completeness is critical. Yet, it’s a challenge for 55% of organizations that suggest accessing all of their company’s data is more daunting than a root canal.

Ensuring AI trust involves understanding your data and scrutinizing data sources, quality, access, and storage within your organization. Consider the intent, potential biases, and implications of AI decisions. Empathize with customers’ perspectives on data usage to guide ethical practices. If you wouldn’t approve of how the data would be used, it’s a sign to reassess your approach.

Kick-starting Your AI Journey

So, how do you transition an AI project from concept to full production and reap its benefits? Here are some tips for organizations starting on their ethical AI journey:

  • Formulate a data strategy. This starts and ends with business value. Look at the organization’s mission, vision, and key objectives, and develop a holistic approach that involves people, processes, and technology to leverage your data assets and develop capabilities and use cases to support business objectives. 
  • Know Your Data, Know Your Intent. Ask yourself: is the data integrated into your systems reliable, and can you trust your organization’s intentions for using that data? A deliberate and thoughtful design of AI systems is crucial to ensure the outcomes are fair and unbiased, reflecting the organization’s ethos and principles. Organizations must have a clear vision of what they aim to achieve with AI to avoid missing out on its benefits or, worse, damaging their reputation and customer trust.
  • Utilize a modern data platform that unifies the data lifecycle. Your data platform should facilitate the implementation of modern data architectures - data mesh, fabric, or open data lakehouse - with security and governance as the foundation. This platform should enable your organization to handle the complex data challenges that arise daily across different functions, enabling seamless deployment of workloads between on-premises and cloud (or multi-cloud) environments without workload refactoring. Most importantly, it should maintain data traceability and uphold stringent security policies and access controls from one environment to another. 

AI Assistants - Democratize AI For Users

What’s in trend today may not be tomorrow, and it’s possible that public LLMs will soon become a thing of the past before the next disruptive technology comes along. Perhaps you find accessing your data challenging or you lack the technical skills in-house to build and deploy GenAI capabilities. 

Fortunately, modern data platforms with AI Assistants can facilitate AI adoption across the organization, giving Data Analysts access to ‘conversational AI’ capabilities and everyday users faster access to their data-driven insights. 

Learn more about how Cloudera can help accelerate your enterprise AI adoption. 

]]>
https://www.cloudera.com/api/www/blog-feed?page=moving-your-ai-pilot-projects-to-production0
DEI-focused Cloudera Sponsorship Program Named Finalist for Ragan Awards 2024 CSR &amp; Diversity Awardshttps://www.cloudera.com/blog/culture/dei-focused-cloudera-sponsorship-program-named-finalist-for-ragan-awards-2024-csr-diversity-awardshttps://www.cloudera.com/blog/culture/dei-focused-cloudera-sponsorship-program-named-finalist-for-ragan-awards-2024-csr-diversity-awardsTue, 03 Sep 2024 15:00:00 UTC

Achieving equality and fairness requires ongoing effort, and for a business to be truly successful, it’s critical to raise awareness and create leadership and growth opportunities for underrepresented communities. 

We are proud to share that Cloudera was recently named a Finalist for the Ragan 2024 Corporate Social Responsibility (CSR) & Diversity Awards under the Mentoring Program category for our Sponsorship Program’s commitment to Diversity, Equity, and Inclusion (DE&I). At Cloudera, we believe that true success exists when we foster an inclusive environment where everyone can thrive.

Cloudera's DE&I strategy centers on a multifaceted approach at every level of the organization. Central to this is establishing clear goals and objectives, regular reviews of diversity analytics, and enhanced transparency and accountability. The company actively engages leaders and team members across all business units to ensure widespread participation and ownership of DE&I initiatives.   

The Ragan Awards celebrate the most successful campaigns, initiatives, people, and teams in the communication, public relations, marketing, and employee wellbeing industries. Its Mentoring Program award goes to organizations that champion a mentorship program that is instrumental in advancing the careers of employees from underrepresented communities. All finalists are recognized at a special event in New York City on September 27, before category winners are announced.  

Cloudera’s Sponsorship Program addresses the disparity between mentorship and sponsorship experienced by underrepresented groups (URGs). Historically, these communities have not received the same opportunities as others and lack active advocates and supporters of their career advancement. The primary goal of Cloudera's Sponsorship Program is to amplify the high-potential talent within these groups, increase their visibility within the Company, and actively advocate for them, which results, oftentimes, in talented individuals advancing into leadership positions.  

"This program was life-changing for me,” said a Sponsorship Program participant. “There were many opportunities to reflect on my areas of improvement and also on the responsibility that I have to lead my own career. Besides that, to have the opportunity to learn with the Cloudera executive was remarkable."

How the Sponsorship Program Works

Cloudera’s Sponsorship Program operates on a cohort-based model, spanning six months. It strategically pairs senior leaders, or “sponsors,” with “proteges”, Clouderans with diverse backgrounds across all company functions. This highly intentional pairing ensures that proteges receive personalized support and advocacy from experienced leaders, committed to advancing diversity and inclusion. 

Through Cloudera’s partnership with Sounding Board, a renowned leadership coaching firm, proteges also receive additional professional coaching and support throughout the duration of the program. This enhances participants' personal and professional development, empowering them to navigate challenges and seize opportunities for growth.  

The Work Behind the Scenes  

Cloudera's Sponsorship Program involves meticulous planning and coordination to ensure the success of each cohort. Program administrators work closely with senior leadership to identify suitable sponsors and proteges, considering each participant's unique strengths and aspirations.

Once pairs are established, sponsors are encouraged to advocate for their proteges, leveraging their influence and networks to create visibility and exposure to other senior leaders and professional development opportunities. Simultaneously, proteges engage in one-on-one coaching sessions with Sounding Board’s certified professional development experts, focusing on career advancement, skill development, and leadership competencies. Throughout, sponsors, proteges, and program administrators collaborate closely to monitor progress, address challenges, and celebrate achievements.  

Keys to Success  

Cloudera's Sponsorship Program is evaluated on its ability to elevate high-potential talent, increase their visibility within the company, and foster diverse leadership pipelines. It conducts engagement and diversity surveys to draw a comparative analysis of program participants’ sentiment relative to the rest of the employee population. These statistically backed reviews measure the number of proteges transitioning into leadership roles, participant survey feedback regarding coaching and sponsorship support, and the overall impact diversity initiatives have on organizational culture.

According to participant feedback, 84% of Cloudera’s Sponsorship Program graduates believe that people from all backgrounds have equal opportunities to succeed at Cloudera, while 83% feel like a valued member of the organization. Regular assessments and feedback mechanisms ensure that the program remains responsive to participants’ evolving needs while also aligning with Cloudera's commitment to promoting DE&I initiatives.  

Cloudera is honored to be recognized as a Finalist in the Mentoring Program category for the Ragan 2024 CSR & Diversity Awards. The Company remains committed to advancing the  Sponsorship Program and to providing mentorship opportunities for employees from diverse backgrounds.  

To learn more about Cloudera’s DE&I initiatives, click here.  

]]>
https://www.cloudera.com/api/www/blog-feed?page=dei-focused-cloudera-sponsorship-program-named-finalist-for-ragan-awards-2024-csr-diversity-awards0
Add Flexera’s State of the Cloud Report to Your Summer Reading Listhttps://www.cloudera.com/blog/business/add-flexeras-state-of-the-cloud-report-to-your-summer-reading-listhttps://www.cloudera.com/blog/business/add-flexeras-state-of-the-cloud-report-to-your-summer-reading-listTue, 27 Aug 2024 16:00:00 UTC

It’s nearing the end of the summer in North America, and one report has been a staple on my reading list for more than a decade: the Flexera State of the Cloud Report. The annual survey of hundreds of global IT decision makers assesses cloud strategies, migration trends, and important considerations for companies moving to the cloud or managing cloud environments. As a long-running report, it’s also a valuable resource for understanding how cloud strategies and priorities have evolved over time. I’ve referenced it dozens of times over the years. 

The 2024 edition of the Flexera State of the Cloud report was released in March and, as usual, it serves as a fantastic resource for data, analytics, and AI leaders as they consider the infrastructure and platform options for their architecture. Here are a few key takeaways from the report:

Managing Cloud Spend Remains a Top Challenge - Even Overtaking Security

One of the biggest surprises of the 2023 report was that managing cloud spend overtook security as the top challenge for organizations for the first time in 11 years. Cloud spend remained on top for the second year in a row, with public cloud spend exceeding budgets by an average of 15%. The economic uncertainty that many companies have faced in the past two years has exacerbated cost overruns, and most data teams should expect greater scrutiny over their public cloud consumption. 

There are two ways to combat the high costs of public clouds. The first is to architect for hybrid deployments. The cloud is ideal for workloads with intermittent or burst capacity requirements, like training AI models. But companies can save money by running other workloads with predictable resource requirements on-premises. The other way is to be strategic about where and when to leverage SaaS platforms, which take control over workload tuning and resource allocation out of the user’s hands in favor of ease of use. That trade-off is not always necessary or ideal.

Increased Adoption of Multi-Cloud Strategies

Multi-cloud strategies continue to dominate. 89% of respondents report using multiple clouds, up from 87% in 2023. The dominance of multi-cloud is the result of the increased parity in functionality and ecosystems between the hyperscalers, as well as the desire to avoid lock-in with any individual cloud provider.

Data teams operating in multi-cloud environments must make some critical architectural decisions. The first is to leverage open formats, including Apache Parquet at the file level and Apache Iceberg at the table level, to ensure that data is both transferable between clouds and interoperable with a wide range of tools for different use cases. The second and perhaps the most critical component of multi-cloud architectures is unified security and governance across the entire data estate, so sensitive data is always protected and data consumers have access to a consistent and accurate view of the data wherever it lives.

A Lot of Data Will Remain On-Premises

Many organizations still prefer to keep sensitive data on-premises, including consumer data, corporate financial data, intellectual property, research data, and more, while the majority of non-sensitive data is destined for the public cloud. This result is intuitive for anyone who has spent time talking to customers in highly regulated industries like banking and healthcare, but it can serve as a healthy sanity check for customers who are feeling pressure to migrate. The reality is that, despite a lot of cloud hype, less than 20% of all companies who participated in the survey plan to move their sensitive data to the cloud.

Cloudera Customers Have an Advantage

The good news for Cloudera customers, as they consider their cloud strategy, is that it doesn’t really matter whether they plan to migrate sensitive data to a public cloud or leave it on-premises. As the only true hybrid platform for data, analytics, and AI, Cloudera enables customers to freely choose any infrastructure for their data analytics workloads, and that data remains in open formats and available for a wide range of workloads, from data engineering to Business Intelligence to AI and ML. It’s portable, meaning that if infrastructure requirements change, it’s easy to move. It’s interoperable, so data teams and data consumers can choose the best tool or execution engine on a workload-by-workload basis. And it’s protected by a unified governance and security solution, so customers can rest assured that whether their data is in the cloud or on-premises, it’s safe and accessible only by the right users.

Try Cloudera Today

Cloudera is available for AWS customers to try today. Deploy one of three use case patterns and get your hands on the platform that can dramatically simplify and accelerate your cloud journey.

]]>
https://www.cloudera.com/api/www/blog-feed?page=add-flexeras-state-of-the-cloud-report-to-your-summer-reading-list0
Cloudera’s Bangalore Center of Excellence - Local Innovation Driving Global Impacthttps://www.cloudera.com/blog/business/clouderas-bangalore-center-of-excellence-local-innovation-driving-global-impacthttps://www.cloudera.com/blog/business/clouderas-bangalore-center-of-excellence-local-innovation-driving-global-impactThu, 22 Aug 2024 16:00:00 UTC

In the heart of India’s tech hub, Bangalore, you’ll find our Center of Excellence (CoE), an innovation hub focused on technological advancement. Established in 2014, this center has become a cornerstone of Cloudera's global strategy, playing a pivotal role in driving the company's three growth pillars: accelerating enterprise AI, delivering a truly hybrid platform, and enabling modern data architectures. It is an engine of cutting-edge solutions that keep us close to our open source roots, and drive our mission of making data and analytics accessible and easy for everyone.  

The Indian Talent Prowess

Over nearly a decade, the Bangalore CoE has grown into a robust hub, housing over 600 employees, primarily in engineering roles, which form the backbone of product innovation and customer support together with the team in Chennai. The teams responsible for delivering these products include performance engineering, data engineering, storage, control plane, test infrastructure, security, partner certification, release engineering, and site reliability. This diverse range of expertise ensures that our solutions are comprehensive and of the highest quality to support the data journeys of top enterprises globally. 

We operate three major engineering centers worldwide in the US, India, and Hungary. This strategic distribution allows Cloudera to drive continuous innovation and provide timely support to its global customer base. Within India, the engineering teams in Chennai and Bangalore play a strategic role in global growth and progress thanks to their rich talent pool and historical contributions to open-source projects like Iceberg. The center's prominence in the open-source community has made it a magnet for top talent.

Our team in Bangalore is instrumental in providing comprehensive cloud support for end-to-end delivery of projects, keeping major releases up to date with the latest features, and delivering innovations such as features in Cloudera Machine Learning (CML) that empower customers to develop, test, train, and deploy models within their data environments. The team plays a crucial role in delivering features and enhancements by designing and building solutions that accelerate the journey from exploration to production and scale Machine Learning workloads. These efforts support Generative AI applications for enterprises in a hybrid cloud environment, ensuring that all their data is AI-ready. 

Forging ahead with innovations in machine learning and cloud computing, the team has refined flows for Generative AI workloads and developed an alert system for Cloud, which enables customers to achieve significant savings in cloud costs. With the team propelling these advancements, Cloudera retains its foothold as a leader providing cutting-edge machine learning and data science solutions worldwide.

Local Innovation with Global Impact

The impact of our Bangalore CoE extends beyond the borders of India. The team works on global projects that drive significant growth across industries, such as finance, telcos and manufacturing. Solutions engineered here are designed with a global perspective in mind, ensuring they meet the diverse needs of our global and Indian customers. 

Cloudera is assisting leading banks, stock exchanges, and top organizations across industries in India by providing the infrastructure needed to analyze their data. This support helps them reduce costs, increase revenue, and enhance operational efficiency. These sectors frequently handle large volumes of sensitive data demanding robust security, and there is a growing need to manage large and complex data sets with better strategies and infrastructure. 

We believe that data can make what is impossible today, possible tomorrow. Our Bangalore and Chennai CoE play a key role in empowering organizations to unlock the full potential of their data. By developing robust and secure data platforms, our team ensures businesses can derive meaningful insights that drive a competitive advantage. From enhancing data governance and security to optimizing data workflows and enabling real-time analytics, our team’s contributions are instrumental in shaping the future of data-driven enterprises.

The center’s ability to handle all facets of software development from ideation to delivery ensures that innovation is continuous. 600+ engineers with a passion for open source technologies have helped accelerate the enterprise AI and machine learning story for hundreds of customers — across a series of generative AI use cases such as agentic applications on our lakehouse, co-pilots for increased productivity, and text summarization at scale. This unique colocation has also helped customers move to the cloud faster and deploy secure hybrid solutions that empower them to transform data of all types on any public or private cloud into valuable and trusted insights.  

Looking ahead, Cloudera plans to continue its investments in the Bangalore CoE, with plans to hire an additional 50 engineers this year alone. The focus will be on developing talent to fuel enterprise AI and hybrid platform capabilities. Despite challenges in finding niche skill sets, Cloudera’s commitment to upskilling and providing growth opportunities remains steadfast. 

As Cloudera continues to evolve, the Bangalore CoE will undoubtedly remain at the forefront, pushing the boundaries of what’s possible in the world of big data and AI.

]]>
https://www.cloudera.com/api/www/blog-feed?page=clouderas-bangalore-center-of-excellence-local-innovation-driving-global-impact0
Cloudera Open Data Lakehouse Named a Finalist in the CRN Tech Innovator Awardshttps://www.cloudera.com/blog/business/cloudera-open-data-lakehouse-named-a-finalist-in-the-crn-tech-innovator-awardshttps://www.cloudera.com/blog/business/cloudera-open-data-lakehouse-named-a-finalist-in-the-crn-tech-innovator-awardsWed, 21 Aug 2024 16:00:00 UTC

The CRN Tech Innovator Awards spotlight innovative products and services across 36 categories, with winners chosen by CRN staff from over 320 product applications. This year, we’re excited to share that Cloudera’s Open Data Lakehouse 7.1.9 release was named a finalist under the category of Business Intelligence and Data Analytics. 

These awards, held annually, are intended to help solution providers identify IT products and services that are truly innovative and deliver customer value. The Awards showcase IT vendor offerings that provide significant technology advances – and partner growth opportunities – across technology categories including AI and AI infrastructure, cloud management tools, IT infrastructure and monitoring, networking, data storage, and cybersecurity. 

Embracing the Open Data Lakehouse

The selection of Cloudera’s Open Data Lakehouse signals just how important this platform has become with the rise of artificial intelligence (AI) and generative AI (GenAI) alike. 

AI is at the forefront of nearly every business’ list of priorities. Its potential is huge when it comes to carving out a competitive edge or just boosting operational efficiency. But even as adoption continues to accelerate, many organizations find themselves struggling with how to fully tap into the power of AI. 

The root of the problem comes down to trusted data. Pockets and silos of disparate data can accumulate across an enterprise, or legacy data warehouses may not be equipped to properly manage a sea of structured and unstructured data at scale. Successful AI implementations require businesses to adequately access and collect often disparate and siloed data across hybrid environments. 

The latest release of the Open Data Lakehouse on private cloud brings a number of features, expanded support, and capabilities that make managing that data easier. The release includes Apache Iceberg support, unlocking opportunities for enterprises to apply mission-critical data to AI, address the most error-prone processes, generate new use cases, improve overall performance, and reduce costs.

The platform is truly unique, offering critical capabilities like zero downtime upgrades and security enhancements to limit disruptions and improve business continuity. Additionally, this release of Open Data Lakehouse includes a set of Apache Ozone capabilities, such as quotas, snapshots, and disaster recovery enhancements, as well as expanded support for Python 3.10 and RHEL 9.1, all of which add another layer of compatibility and flexibility.
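
As a rough illustration of what working with an Iceberg table looks like, the sketch below creates and queries a table from PySpark. It assumes a Spark session already configured with an Iceberg catalog and the Iceberg runtime on the classpath; the database, table, and column names are hypothetical, and the snippet is not tied to any particular Cloudera release.

    from pyspark.sql import SparkSession

    # Assumes the Iceberg runtime is available and a catalog is already
    # configured for this session; database/table names are hypothetical.
    spark = SparkSession.builder.appName("iceberg-sketch").getOrCreate()

    # Create an Iceberg table and load a row.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS demo_db.payments (
            id BIGINT,
            amount DOUBLE,
            ts TIMESTAMP
        ) USING iceberg
    """)
    spark.sql("INSERT INTO demo_db.payments VALUES (1, 42.50, current_timestamp())")

    # Query it like any other table; Iceberg tracks snapshots and metadata.
    spark.sql("SELECT id, amount, ts FROM demo_db.payments").show()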

Driving AI Innovation on the Open Data Lakehouse

We’re incredibly honored and excited to be named a finalist among hundreds of innovative products and solutions that were considered for CRN’s Tech Innovator Awards. Many businesses understand the need for effective AI and analytics within their own operations but struggle with getting their data architectures to support it. 

Cloudera’s platform offers portable, cloud-native analytics deployable across infrastructures, all while maintaining consistent data governance and security. Leveraging a modern data architecture like the Open Data Lakehouse for private cloud helps cut through that complexity to deliver meaningful insights and truly effective AI, at scale.

Learn more about how Cloudera’s Open Data Lakehouse for private cloud can help fuel your AI journey.  

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-open-data-lakehouse-named-a-finalist-in-the-crn-tech-innovator-awards0
AI Challenges and How Cloudera Can Helphttps://www.cloudera.com/blog/business/ai-challenges-and-how-cloudera-can-helphttps://www.cloudera.com/blog/business/ai-challenges-and-how-cloudera-can-helpTue, 20 Aug 2024 16:00:00 UTC

By now, every organization, regardless of industry, has at least explored the use of AI, if not already embraced it. In today’s market, the AI imperative is firmly here, and failing to act quickly could mean getting left behind. But even as adoption soars, struggles remain, and scalability continues to be a major issue. Organizations are quick to adopt AI, but establishing it across the business brings its own set of challenges.

Whether it’s rapidly rising costs, an inefficient and outdated data infrastructure, or serious gaps in data governance, there are myriad reasons why organizations are struggling to move past adoption and achieve AI at scale in their enterprises. 

But with the right technology partner, businesses can accelerate their adoption and maximize the value of both their own data and the AI outputs it can generate. Let’s take a closer look at what they face and how Cloudera is uniquely positioned to help them find success. 

A New Set of Challenges

Getting up and running with AI is not always straightforward. Anytime a technology is integrated into a business, there’s potential for a new set of challenges to take hold. At the core of many of those is the issue of trust, specifically trusted data. Trusted data is what makes the outputs of AI not just accurate, but impactful in decision making. Ensuring data is trustworthy comes with its own complications. 

Cloudera’s State of Enterprise AI and Modern Data Architecture survey identified several challenges when it comes to data. Among those challenges, survey respondents cited simply having too much data (35%) and ensuring good governance (36%) as serious obstacles. Trust represents a unique challenge for business leaders: the insights gathered from data are only useful if those leaders know they can trust the data behind them. Essentially, if the business data that’s fed into AI models is bad, the resulting insights will be flawed as well. As data volume grows, data silos proliferate, making it harder for leadership to manage their collective data estates. That lack of visibility also feeds directly into the problem of governance, as those gaps leave room for data to be misused or mishandled.

Businesses require a modern data architecture that is ready to support the needs of a contemporary business. With this architecture, businesses can build in greater flexibility and scalability in their existing infrastructure to support the rise of AI. These architectures enable enterprises to future-proof their AI models, drive innovation, and stand out in competitive markets.

Cloudera is Your Trusted AI Partner

Getting past these challenges and successfully tapping into the power of AI requires businesses to work with a technology partner proven to deliver crucial data and analytics solutions for hybrid cloud, industry-leading AI expertise, and a strong foundation to support and future-proof AI investments. 

With Cloudera, organizations can build a data infrastructure that is flexible and scalable enough to ensure that as the use of AI grows, the data that fuels it keeps pace. Cloudera also ensures data governance is robust enough to protect data as it’s used in AI models and to keep that use in line with internal standards and external regulations.

Likewise, Cloudera’s open data lakehouse presents a standout option for organizations to leverage as a foundational part of their data infrastructure. This platform brings together the flexibility of data lakes with the power of a data warehouse, all in one place. The open data lakehouse is critical for organizations looking to harness AI, helping them run analytics on data—structured and unstructured—at scale. It’s tailor-made for the data challenges that often hinder AI adoption. Particularly as organizations are inundated with more and more data, an open data lakehouse serves as a strong foundation to help them keep pace. With Cloudera, the only true hybrid platform for data, analytics, and AI, organizations can eliminate data silos and empower data teams to collaborate on the same data with the tools of their choice on any public or private cloud.

Ultimately, every AI journey will encounter a bump in the road. But there’s no need to face those challenges alone. Cloudera brings a wide set of technology, solutions, and know-how to demystify the process and ensure that AI is implemented securely, effectively, and simply enough to rapidly scale and generate maximum business value. 

Read the full survey report and learn how Cloudera can help accelerate your AI journey. 

]]>
https://www.cloudera.com/api/www/blog-feed?page=ai-challenges-and-how-cloudera-can-help0
Navigating the Future with Cloudera’s Updated Interfacehttps://www.cloudera.com/blog/business/navigating-the-future-with-clouderas-updated-interfacehttps://www.cloudera.com/blog/business/navigating-the-future-with-clouderas-updated-interfaceThu, 15 Aug 2024 16:00:00 UTC
Data practitioners are consistently asked to deliver more with less, and although most executives recognize the value of innovating with data, the reality is that most data teams spend the majority of their time responding to support tickets for data access, performance and troubleshooting, and other mundane activities. At the heart of this backlog of requests is this: data is hard to work with, and it’s made even harder when users have to work just to get or find what they need.

As a long-time partner to some of the largest enterprises in the world, we recognize the critical role Cloudera plays in making data teams and data consumers successful in their day-to-day work. That's why we are rolling out a significant update to the Cloudera platform homepage, including a new set of features we designed to provide a more intuitive and efficient experience for data practitioners.

The decision to revamp the Cloudera UI was driven by our commitment to enhancing user experience and addressing the evolving needs of our customers. We have always listened closely to our users and tailored our solutions to meet their specific requirements. Over the years, our platform has grown in capability, offering a diverse range of services and tools. However, while the traditional tile-based homepage was functional, it did not fully support the intuitive navigation and quick access to information that our users require. This UI improvement ensures that our platform remains at the forefront of user-friendly design, making it easier for users to take advantage of the full potential of our services.

By implementing these changes, our goal is to create a more cohesive, intuitive, and efficient user interface that simplifies navigation, enhances the discoverability of features within the platform, and improves overall user productivity. With much quicker access to frequently used tools and services, integrated analytics for quick insights, comprehensive guides for exploring new solutions, and a powerful search function, users can now navigate the platform with greater ease and efficiency.

Key Enhancements:

  • Streamlined Navigation:
    • Navigating through multiple pages to access frequently used services and workspaces has historically been a time-consuming task for our users. The new Favorites feature addresses this problem by enabling users to bookmark their most frequently used Data Hubs, services, and workspaces, making them available wherever they open Cloudera. It ensures that users can quickly reach the tools and services they rely on the most, making their daily operations smoother and more efficient.
    • Additionally, several changes have been made to streamline navigation across the platform. These enhancements reduce the number of clicks required, ensuring users can quickly reach the tools and services they are looking for.
  • Analytics Summary:
    • It is essential for data teams to quickly access and understand the health and performance of their data services. The Analytics Summary section addresses this need by integrating key Observability Dashboard metrics directly into the homepage. Users can now view a summary of analytics for individual clusters and virtual warehouses without navigating away from the main page. This section enables users to select and display operational insights for specific services, such as Data Hub, Data Engineering, and Data Warehouse, providing immediate insights into their operations. Bringing these metrics to the homepage helps users monitor performance and make data-driven decisions more effectively.
  • Quick Start:
    • The Quick Start section is designed to help users perform essential data tasks with ease and efficiency. This section offers a step-by-step guide for common activities, such as connecting to or importing data, querying and transforming data, and visualizing data. Each guide includes links that take users directly to the relevant sections in Cloudera’s documentation, providing a supportive experience for both new and experienced users.
  • Documentation Search:
    • The new Documentation Search feature adds a convenient search bar at the top of the homepage, enabling users to quickly find the information they need within Cloudera’s extensive documentation. Whether you need guidance on a specific feature, troubleshooting tips, or detailed technical documentation, the Documentation Search makes it easy to access the comprehensive resources available.
  • Solution Explorer:
    • The Solution Explorer section is designed to help users discover and explore new innovations within the Cloudera platform tailored to specific roles and needs. It offers a comprehensive guide to various solutions, including Enterprise AI, Open Data Lakehouse, Scalable Data Mesh, Unified Data Fabric, and Hybrid Data Platform, with more to come! This section provides detailed descriptions and relevant documentation for users whose roles benefit from these capabilities, such as data scientists, analytics professionals, and database administrators. By centralizing information about Cloudera’s latest innovations, the Solution Explorer makes it easier for users to stay informed and leverage the full potential of the platform’s capabilities.

This homepage update is just the beginning. The new UI is designed to expand beyond the homepage, and we will gradually integrate it into individual services across the platform. Our vision is to make it easier than ever for our customers to deliver actionable insights to the business by providing the most intuitive and user-friendly experience for working with data.

We’re rolling out this new UI gradually, and we’d love for you to try it out. You can enable the new homepage by clicking the “Enable New UI” toggle button in the top-right navigation bar. We know many of our customers are comfortable using the current UI, and that’s fine! You can opt in or opt out with the click of a button.

We encourage you to explore the new homepage and share your thoughts. Your feedback is crucial in helping us refine and improve the Cloudera experience for all users.
]]>
https://www.cloudera.com/api/www/blog-feed?page=navigating-the-future-with-clouderas-updated-interface0
Cloudera Partners with Allitix to Fuel Enterprise Connected Planning Solutionshttps://www.cloudera.com/blog/partners/cloudera-partners-with-allitix-to-fuel-enterprise-connected-planning-solutionshttps://www.cloudera.com/blog/partners/cloudera-partners-with-allitix-to-fuel-enterprise-connected-planning-solutionsThu, 08 Aug 2024 23:46:00 UTC

Cloudera is excited to announce a partnership with Allitix, a leading IT consultancy specializing in connected planning and predictive modeling. This collaboration is set to enhance Allitix’s offerings by leveraging Cloudera’s secure, open data lakehouse, empowering enterprises to scale advanced predictive models and data-driven solutions across their environments.

Through this strategic partnership, Allitix applications will enable business users to more easily work with data in the lakehouse, collaborate across functions with this data, and use it to build advanced predictive models, giving its end customers a competitive edge.

“Allitix constantly seeks to do more for our customers, and our extensive search showed that Cloudera is best in class to service our clients’ end-to-end data needs,” said Jon Pause, Practice Director for Data and Advanced Tools at Allitix. “We love Cloudera’s hybrid model, coding portability, and open-source AI approach. And through this partnership, we can offer clients cost-effective AI models and well-governed datasets as this industry charges into the future.”

Allitix will leverage Cloudera's open data lakehouse to support its connected planning solutions for enterprise clients and partners across various markets, including regulated industries such as finance, healthcare, pharmaceuticals, and consumer packaged goods. This will enable these clients and partners to make more informed strategic decisions regarding marketing, operations, customer success, overall business strategy, and more.

“This partnership is a significant win for enterprise customers,” said Andy Moller, SVP of Global Alliances and Ecosystem at Cloudera. “With Cloudera and Allitix, they can develop complex predictive data models to make crucial business decisions. These large, regulated organizations depend heavily on data management and security. This strategic partnership strengthens our connection with business users through Allitix solutions and extends our technology into new markets.”

Data-backed Decisions Through Predictive Models

Predictive models use historical data and analytics to forecast future outcomes through mathematical processes. They help organizations allocate resources appropriately, anticipate potential challenges, and identify market trends. This capability is crucial for enterprises to make more informed financial and resource allocation decisions. Cloudera’s open data lakehouse will enable Allitix customers to build more comprehensive predictive models, leading to faster, data-driven decision making. 
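
As a purely generic illustration of the kind of model described above, the sketch below fits a simple trend line to made-up monthly demand history and projects the next quarter. It uses scikit-learn and invented figures; it is not Allitix’s or Cloudera’s implementation, just the basic shape of forecasting from historical data.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical monthly demand history (units sold per month).
    months = np.arange(1, 13).reshape(-1, 1)           # months 1..12
    demand = np.array([120, 125, 130, 128, 140, 150,
                       155, 160, 158, 165, 170, 178])  # made-up figures

    # Fit a simple trend model on the historical data ...
    model = LinearRegression().fit(months, demand)

    # ... and forecast the next quarter so planners can allocate resources.
    future = np.arange(13, 16).reshape(-1, 1)
    print(model.predict(future))  # projected demand for months 13-15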

“Cloudera’s open data lakehouse is the core functionality that enables self-service analytics, governance, and cost-cutting architecture,” Pause said. “Through this partnership, our customers will benefit from more democratized data, reducing risk to all downstream projects while significantly cutting their variable IT spend.”

Cloudera’s open data lakehouse unlocks the power of enterprise data across private and public cloud environments. It streamlines data ingestion, analytics, and the development of AI and ML models, allowing enterprises to feed their data into a large language model (LLM) to build advanced AI applications in a secure, governed environment. The Cloudera platform allows businesses to build AI applications hosted by any open-source LLM they choose, enabling scale across an enterprise for a variety of users and data sets.

“Cloudera partners with the world’s most innovative companies – across industries – to bring our leading trusted data management platform to organizations leveraging the technologies of tomorrow,” Moller said. “This strategic partnership with Allitix will empower enterprises to harness our world-class data management platform, driving innovation, operational excellence, and competitive advantages.”

 Learn more about how you can partner with Cloudera.

How Cloudera and Allitix Fit Together

Allitix will now leverage Cloudera’s open data lakehouse to help its enterprise clients eliminate data silos and integrate plans across functions through connected planning. This facilitates improved collaboration across departments via data virtualization, which allows users to view and analyze data without needing to move or replicate it. Cloudera’s data lakehouse provides enterprise users with access to structured, semi-structured, and unstructured data, enabling them to analyze, refine, and store various data types, including text, images, audio, video, system logs, and more. Allitix enterprise clients will also benefit from the enhanced data security, data governance, and data management capabilities offered with Cloudera’s open data lakehouse. 

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-partners-with-allitix-to-fuel-enterprise-connected-planning-solutions0
The Data Turf Wars are Over, But the Metadata Turf Wars Have Just Begunhttps://www.cloudera.com/blog/business/the-data-turf-wars-are-over-but-the-metadata-turf-wars-have-just-begunhttps://www.cloudera.com/blog/business/the-data-turf-wars-are-over-but-the-metadata-turf-wars-have-just-begunTue, 06 Aug 2024 16:00:00 UTC

Over the past several years, data leaders asked many questions about where they should keep their data and what architecture they should implement to serve an incredible breadth of analytic use cases. Vendors with proprietary formats and query engines made their pitches, and over the years the market listened, and data leaders made their decisions.

The most interesting thing about their choices is that, despite the millions of marketing dollars vendors spent trying to convince customers that they built the next greatest data platform, there has been no clear winner.

Many companies adopted the public cloud, but very few organizations will ever move everything to the cloud, or to a single cloud. The future for most data teams will be multi-cloud and hybrid. And although there is clear momentum behind the data lakehouse as the ideal architecture for multi-function analytics, the demand for open table formats, including Apache Iceberg, is a clear signal that data leaders value interoperability and engine freedom. It no longer matters where the data is. What matters is how we understand it and make it available to share and use.

The direction is clear. Proprietary formats and vendor lock-in are a thing of the past. Open data is the future.  And for that future to be a reality, data teams must shift their attention to metadata, the new turf war for data.

The need for unified metadata

While open and distributed architectures offer many benefits, they come with their own set of challenges. As companies seek to deliver a unified view of their entire data estate for analytics and AI, data teams are under pressure to:

  • Make data easily consumable, discoverable, and useful to a wide range of technical and non-technical data consumers
  • Improve the accuracy, consistency, and quality of data
  • Ensure the efficient querying of data, including high availability, high performance, and interoperability with multiple execution engines
  • Apply consistent security and governance policies across their architecture
  • Achieve high performance while managing costs

The answer to unifying the data has traditionally been to move or copy data from one source or system to another. The problem with that approach is that data copies and data movement actually undermine all five of the points above, increasing costs while making it more difficult to manage and trust the data as well as the insights derived from it.

This leads us to a new frontier of data management, which is especially critical for teams managing distributed architectures. Unifying the data isn’t enough. Data teams actually need to unify the metadata.

There are two types of metadata, and they both serve critical functions within the data lifecycle:

Operational metadata supports the data team’s goals of securing, governing, processing, and exposing the data to the right data consumers while also keeping queries against that data performant. Data teams manage this metadata with a metastore.

Business metadata supports data consumers who want to discover and leverage data for a broad range of analytics. It provides context so users can easily find, access, and analyze the data they’re looking for. Business metadata is managed with a data catalog.

Many solutions manage at least one of these types of metadata well. A few solutions manage both. However, there are very few platforms that can unify and manage business and operational metadata from on-premises and cloud environments as well as metadata from multiple disparate tools and systems. Additionally, almost none of the available tools do all of that and also provide the automation required to scale these solutions for enterprise environments.

Cloudera is built on open metadata

Cloudera’s open data lakehouse is built on Apache Iceberg, which makes it easy to manage operational metadata. Iceberg maintains the metadata within the table itself, eliminating the need for metadata lookups during query planning and simplifying formerly complex data management tasks like partition and schema evolution. With Cloudera’s open data lakehouse, data teams store and manage a single physical copy of their data, eliminating additional data movement and data copies and ensuring a consistent and accurate view of their data for every data consumer and analytic use case.
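
As a rough sketch of what that looks like in practice, the PySpark snippet below queries Iceberg’s built-in metadata tables and applies schema and partition changes as metadata-only operations. It assumes an existing Spark session with an Iceberg catalog and the Iceberg SQL extensions enabled; the table and column names are hypothetical.

    # Assumes an existing SparkSession ("spark") with an Iceberg catalog, the
    # Iceberg SQL extensions enabled, and a hypothetical table demo_db.payments.

    # Operational metadata travels with the table and is directly queryable.
    spark.sql(
        "SELECT snapshot_id, committed_at, operation "
        "FROM demo_db.payments.snapshots"
    ).show()
    spark.sql("SELECT file_path, record_count FROM demo_db.payments.files").show()

    # Schema and partition evolution are metadata-only operations; existing
    # data files are not rewritten.
    spark.sql("ALTER TABLE demo_db.payments ADD COLUMN currency STRING")
    spark.sql("ALTER TABLE demo_db.payments ADD PARTITION FIELD days(ts)")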

Cloudera also supports the REST catalog specification for Iceberg, ensuring that table metadata is always open and easily accessible by third-party execution engines and tools. While a lot of vendors are focused on locking in metadata, Cloudera remains cloud- and tool-agnostic to ensure customers continue to have the freedom to choose.
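
As a hedged example of what consuming a REST catalog can look like from Spark, the configuration below points a session at a placeholder endpoint using standard Iceberg catalog properties. The catalog name and URI are invented, and the exact properties (authentication in particular) depend on the deployment, so treat this as a sketch rather than a reference configuration.

    from pyspark.sql import SparkSession

    # "lakehouse" is a placeholder catalog name; the URI and any credentials
    # depend entirely on your deployment.
    spark = (
        SparkSession.builder
        .appName("rest-catalog-sketch")
        .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.lakehouse.type", "rest")
        .config("spark.sql.catalog.lakehouse.uri", "https://catalog.example.com/api/catalog")
        .getOrCreate()
    )

    # Any engine that implements the REST catalog spec resolves the same tables.
    spark.sql("SHOW TABLES IN lakehouse.demo_db").show()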

Cloudera is also working on accessing and tracking metadata outside of the Cloudera ecosystem, so data teams will have visibility across their entire data estate, including data stored in a variety of other platforms and solutions.

Automating business metadata is the key to achieving scale

While operational metadata is often generated by a system and maintained within Iceberg tables, business metadata is often generated by domain experts or data teams. In an enterprise environment, which often features hundreds or even thousands of data sources, files, and tables, scaling the human effort required to ensure these datasets are easily discoverable is impossible. 

Cloudera’s vision is to augment the data catalog experience and remove the manual effort of generating business metadata. Customers will be able to leverage Generative AI to ensure that every dataset is properly tagged and classified, and is easily discoverable. With an automated business metadata solution, data consumers and data teams can easily find the data they’re looking for, even with huge catalogs, and no dataset will fall through the cracks.
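
As a purely illustrative sketch of the general idea, and not a description of Cloudera’s implementation, the snippet below drafts a description and tags for a newly discovered table by prompting a language model. The call_llm function is a stand-in for whatever model endpoint you use, and the table and column names are invented.

    def call_llm(prompt: str) -> str:
        # Stand-in for a call to whatever hosted or open-source model you use;
        # returns a canned answer so the sketch runs end to end.
        return "Description: Card payment transactions. Tags: finance, payments, pii"

    def draft_business_metadata(table_name: str, columns: list) -> str:
        # Build a prompt from the table schema a catalog crawler has discovered.
        prompt = (
            "Write a one-sentence description and three classification tags "
            "for a table named '" + table_name + "' with columns: "
            + ", ".join(columns) + "."
        )
        return call_llm(prompt)

    # Hypothetical table surfaced by a catalog scan; a human still reviews the draft.
    print(draft_business_metadata("card_transactions",
                                  ["txn_id", "card_no", "amount", "merchant"]))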

Unified security and governance

Data teams strive to balance the need for broad access to data for every data consumer with centralized security and governance. That task becomes much more complicated in distributed environments, and in situations where the data moves from its source to another destination. 

Cloudera Shared Data Experience (SDX) is an integrated set of security and governance technologies for tracking metadata across distributed environments. It ensures that access control and security policies that are set once still apply wherever and however that data is accessed, so data teams know that only the right data consumers have access to the right datasets, and the most sensitive data is protected. Unlike decentralized and siloed data systems, having a centralized and trusted security management layer makes it easier to democratize data with the confidence that nobody will have unauthorized access to data. From a governance perspective, data teams have control over and visibility into the health of their data pipelines, the quality of their data products, and the performance of their execution engines.

The metadata turf wars have just begun

As data teams adopt hybrid, distributed data architectures, managing metadata is critical to providing a unified self-service view of the data, to delivering analytic insights that data consumers trust, and to ensuring security and governance across the entire data estate.

Chief Data Analytics Officers can take some important lessons from the data wars onto this new battlefield:

  1. Choose open metadata: Don’t lock your metadata into a single solution or platform. Iceberg is a great tool for ensuring openness and interoperability with a large commercial and open source software ecosystem.
  2. Unify metadata management: Invest in a metadata management solution that unifies operational and business metadata across all environments and systems, even third-party tools and platforms.
  3. Automate for scalability: Leverage automation to handle the scale and complexity of creating and managing metadata in large, distributed environments.
  4. Centralize security and governance: Ensure that security and governance policies are consistently applied and enforced across the entire data landscape to protect sensitive data and ensure the health and performance of your data estate.

These are the guiding principles of Cloudera’s metadata management solutions, and why Cloudera is uniquely positioned to support an open metadata strategy across distributed enterprise environments.

Learn more about Cloudera’s metadata management solutions here.

]]>
https://www.cloudera.com/api/www/blog-feed?page=the-data-turf-wars-are-over-but-the-metadata-turf-wars-have-just-begun0
#ClouderaLife Employee Spotlight: Stephanie Hanhttps://www.cloudera.com/blog/culture/clouderalife-employee-spotlight-stephanie-hanhttps://www.cloudera.com/blog/culture/clouderalife-employee-spotlight-stephanie-hanMon, 05 Aug 2024 15:00:00 UTC

In 2022, Stephanie joined Cloudera’s Ambassador Network, the company’s global network of philanthropic champions, dedicated to planning and executing giving and volunteering initiatives. After serving in this role for two years, Stephanie expressed interest and subsequently expanded her professional responsibilities to include the management of this critical culture initiative. Under Stephanie’s management, the Ambassador Network is wrapping up its inaugural Summer of Service campaign, which supplements Cloudera’s existing annual Week of Giving campaign. Both initiatives consist of philanthropic events and global volunteer opportunities.

“It was so much fun working with my team to plan our first Summer of Service and seeing Clouderans so excited to plan events giving back to their local communities and beyond,” Stephanie said.

Her interest in expanding her expertise and impact did not stop there. Stephanie’s scope now also includes Cloudera’s ESG and talent management programs.

“I feel very privileged to be a part of this team,” Stephanie said. “Volunteering as an ambassador for several years was really rewarding. It’s really unique to now have the opportunity to expand my scope and work on this initiative – as well as others of interest, like ESG and Talent Management – as part of my formal responsibilities too. Having the support of our HR Leadership to continue to build my experience and expertise is invaluable.”

One of the other areas of interest Stephanie has taken on is talent management. She serves as a member of the talent management team, which is made up of employees from various HR groups who are incredibly passionate about helping people grow their careers. The team works on succession planning, development plans, and maps out ways to help employees grow within Cloudera.  

“Talent management is an area I’m grateful for during my time at Cloudera,” Stephanie said. “It’s an area I took great interest in before this role, and it’s one of the reasons I pursued a career change. I wanted to work with people and help them improve and succeed professionally, and now I get to do that through various unique and impactful initiatives at Cloudera.”  

Empowering Professional Development Around the Globe

After the 2015 earthquake devastated much of Nepal, Stephanie, who lives in the Bay Area, visited the country to see the landscape, talk to locals, and try to address widespread needs. Now, Stephanie devotes a great amount of her free time to helping people in Nepal through group volunteer work. Recently, she helped start the Institute of Higher Learning (IHL), which aims to equip the Nepali people with skills to help them pursue a better life. 

“We realized there’s very little that we can do living halfway across the world, but one way we could really make an impact is by setting up a learning institute,” Stephanie said. “There’s a wealth of resources here in the Bay Area and many people are trained in various transferable fields including IT and Engineering. Nepal is quickly developing, so there will be jobs needed in the IT space, for example. We have people in my group who come from various professional backgrounds, and it is our mission to teach and mentor students to help open the door to greater economic opportunities.”

The institute is starting its first English class and hopes to launch technology classes and coding camps in the future. 

Following Passions of Helping People Grow 

Career changes are not always easy, but after a 12-year career in accounting, Stephanie managed to find a role she was truly passionate about – empowering others to grow and thrive at work while bringing people together through DEI and HR initiatives. In parallel, she continues to focus her free time on her philanthropic work to help students in Nepal and appreciates the ability to weave that passion for impact into her work here at  Cloudera.

Read our last employee spotlight here.

In this Employee Spotlight, we sat down with Stephanie Han to learn about her tenure at Cloudera, her journey from accounting to leading diversity, equality & inclusion (DEI) programs, and her impressive volunteer work.   

Meet Stephanie Han

Stephanie is a Senior Program Manager on the HR team at Cloudera. She’s been with the company since 2019 and plays a key role in a variety of employee-centric initiatives, including Cloudera’s employee volunteering program, talent management program, and both its DEI and environmental, social, and governance (ESG) efforts. She exudes enthusiasm about all the work she’s involved in and is thankful for the variety of initiatives she gets to work on and the diverse perspectives she works alongside.

“Building a diverse team is so important because it allows us to learn from one another,” Stephanie said. “That is very true with my team. We all come from different backgrounds – both professional and lived experiences. It’s been so amazing to be able to grow with my team and learn from our different ways of thinking, ways of doing, and ways of being.” 

Making the Switch from Accounting to HR – A Unique Opportunity at Cloudera 

When Stephanie first joined Cloudera, she found herself in a role doing what she had always done – finance and accounting. During her time as senior operations revenue manager, Stephanie worked on revenue close processes, financial reporting, and more. However, after she started managing employees, she found her passion for professional development and the opportunity to help teammates grow and advance their careers. She wanted to do more than just help “close the books,” and this inspired her to seek a career change to HR. 

She knew switching from a 12-year accounting career to HR would be difficult, but she remained persistent, conducting as many informal interviews with HR professionals as she could. Fortunately, those calls led her back to Cloudera.  

“The right opportunity at the right time came up,” Stephanie said. “All the skill sets I gained in my previous role in accounting and revenue operations helped me to seamlessly hit the ground running in HR and program management. I’m a big believer that building skill sets is more important than actual job titles, and I was thankful that the leaders at Cloudera recognized that too.”  

Stephanie took a consultancy position on Cloudera’s DEI team in October 2020 before earning a full-time role in April 2021. The team later became part of the HR department and Stephanie welcomed the opportunity to further integrate into areas of interest.   

Growing Cloudera’s DEI Programs

After joining Cloudera’s DEI team, Stephanie helped to build the company’s Teen Accelerator Program, which has a partnership with the Boys and Girls Club of America in both Tennessee and the San Francisco Bay Area. This program facilitates opportunities in corporate America for high school students in under-resourced communities. The Teen Accelerator program offers students a six-week paid internship program at Cloudera and 1:1 employee mentorship. Managing this group of employee mentors inspired Stephanie to begin volunteering alongside colleagues as well. 

]]>
https://www.cloudera.com/api/www/blog-feed?page=clouderalife-employee-spotlight-stephanie-han0
An Overview of Cloudera’s AI Survey: The State of Enterprise AI and Modern Data Architecturehttps://www.cloudera.com/blog/business/an-overview-of-clouderas-ai-survey-the-state-of-enterprise-ai-and-modern-data-architecturehttps://www.cloudera.com/blog/business/an-overview-of-clouderas-ai-survey-the-state-of-enterprise-ai-and-modern-data-architectureThu, 01 Aug 2024 16:00:00 UTC

Enterprise IT leaders across industries are tasked with preparing their organizations for the technologies of the future – which is no simple task. With the use of AI exploding, Cloudera, in partnership with Researchscape, surveyed 600 IT leaders who work at companies with over 1,000 employees in the U.S., EMEA and APAC regions. The survey, ‘The State of Enterprise AI and Modern Data Architecture’ uncovered the challenges and barriers that exist with AI adoption, current enterprise AI deployment plans, and the state of data infrastructures and data management.  

The State of Enterprise AI

It will likely come as little surprise that businesses across the world are swiftly incorporating AI into their operations, with 88% of surveyed companies already utilizing this transformative technology. AI is starting to revolutionize industries by changing how businesses operate and how their teams work. The departments leading this adoption are IT (92%), Customer Service (52%), and Marketing (45%). Across these business areas, AI is enhancing efficiency in IT processes, improving customer support with chatbots, and leveraging analytics for better decision-making.

Among various AI implementations, Generative AI (GenAI) stands out as the most popular, with 67% of respondents utilizing generative models in some capacity. Companies are deploying GenAI using several architectures: exposing data to open-source models without training on it (60%), training open-source models on their data (57%), using open-source models trained on-premises or in private clouds (50%), and developing proprietary Large Language Models (LLMs) or Small Language Models (26%).

In addition to GenAI, respondents noted they are deploying predictive (50%), deep learning (45%), classification (36%) and supervised learning (35%) applications.

Challenges in Implementing AI

Implementing AI does not come without challenges for many organizations, primarily due to outdated or inadequate data infrastructures. While every business has adopted some form of data architecture, the types they use vary widely. The majority of organizations store their data in private clouds (81%), but other architectures are also prevalent, including public clouds (58%), on-premises mainframes (42%), on-premises distributed systems (31%), other physical environments (29%), and data lakehouses (19%).

Navigating the complexity of modern data landscapes brings its own set of challenges. Key issues include data security and reliability (66%), escalating data management costs (48%), compliance and governance challenges (38%), overly complex processes (37%), siloed and difficult-to-access data (36%), mistrust in connecting private data and inaccuracies in AI models (32%), and the need for standardized data formats (29%).

Adding to these complexities is the rapidly evolving nature of data technologies and the growing volume of data businesses must manage. Ensuring that AI implementations are effective and secure requires continuous adaptation and investment in robust, scalable data infrastructures. This is essential for businesses aiming to leverage AI for competitive advantage and operational efficiency.

Leveraging Modern Data Architectures

In today’s landscape, the only way to ensure data reliability is through the adoption of modern data architectures. These advanced architectures provide critical flexibility and visibility, acting as a blueprint for accelerating the extraction of insights and value from data. They simplify data access across organizations, breaking down silos and making data easier to understand and act upon.

When asked about the most valuable advantages of hybrid data architectures, respondents highlighted data security (71%) as the primary benefit. Other significant advantages include improved data analytics (59%), enhanced data management (58%), scalability (53%), cost efficiency (52%), flexibility (51%), and compliance (37%).

Modern data architectures support the integration of diverse data sources and formats, providing a cohesive and efficient framework for data operations. This integration is essential for businesses aiming to leverage data-driven strategies, ensuring that their data infrastructure can meet the demands of evolving technologies and increasing data volumes. By adopting these architectures, organizations can position themselves to unlock new opportunities and drive innovation through reliable and accessible data.

The enhanced security, transparency, accessibility, and insights provided by modern data architectures directly contribute to a business's agility, adaptability, and informed decision-making. These factors are crucial for future-proofing data infrastructure, ensuring it remains robust over time, and achieving tangible ROI from AI implementations.

To gain more insights from Cloudera’s latest survey report, click here.

]]>
https://www.cloudera.com/api/www/blog-feed?page=an-overview-of-clouderas-ai-survey-the-state-of-enterprise-ai-and-modern-data-architecture0
What Makes Data-in-Motion Architectures a Must-Have for the Modern Enterprisehttps://www.cloudera.com/blog/business/what-makes-data-in-motion-architectures-a-must-have-for-the-modern-enterprisehttps://www.cloudera.com/blog/business/what-makes-data-in-motion-architectures-a-must-have-for-the-modern-enterpriseMon, 29 Jul 2024 16:00:00 UTC

Cloudera’s data-in-motion architecture is a comprehensive set of scalable, modular, re-composable capabilities that help organizations deliver smart automation and real-time data products with maximum efficiency while remaining agile to meet changing business needs. In this blog, we will examine the “why” behind streaming data and review some high-level guidelines for how organizations should build their data-in-motion architecture of the future.

Businesses everywhere seek to be more data-driven not just when it comes to big strategic decisions, but also when it comes to the many low-level operational decisions that must be made every day, every hour, every minute, and, in many cases, every second. The transformative power of incremental improvement at the operational level has been proven many times over. Executing better on the processes that add value to your value chain is bound to reap benefits. Take a hypothetical manufacturer for example.  On the shop floor, myriad low-level decisions add up to manufacturing excellence, including: 

  • Inventory management
  • Equipment health and performance monitoring 
  • Production monitoring
  • Quality control
  • Supply chain management

It’s no wonder that businesses are working harder than ever to embed data deeper into operations. In 2022, McKinsey imagined the Data-Driven Enterprise of 2025, where winner-takes-all market dynamics incentivize organizations to pull out all the stops and adopt the virtuous cycle of iterative improvement. It was very telling that, of the seven characteristics highlighted in that piece, the first two are:

  • Data should be embedded in every decision, interaction, and process
  • Data should be processed and delivered in real time

Notice that McKinsey isn’t talking about how fast data is created. They are talking about data being processed and delivered in real time. It is not the speed at which data is created that determines an organization’s response time to a critical event; it’s how quickly the organization can execute an end-to-end workflow and deliver processed data. A sensor on a machine recording a vibration has, on its own, very little value. What matters is how fast that data can be captured, processed to put that vibration reading within the context of the machine’s health, used to identify an anomaly, and delivered to a person or system that can take action.
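
As a minimal illustration of that capture-contextualize-act loop, the sketch below flags vibration readings that deviate sharply from a rolling baseline. It is a self-contained Python example with made-up readings and thresholds, not Cloudera-specific code; in practice this logic would run in a stream processing engine and publish alerts rather than print them, with one baseline per machine.

    from collections import deque
    import statistics

    WINDOW = 50        # recent readings kept as context for one machine
    THRESHOLD = 3.0    # flag readings more than 3 std devs from the rolling mean

    history = deque(maxlen=WINDOW)

    def on_vibration_reading(machine_id: str, value: float) -> None:
        """Called for each incoming sensor event (e.g., from a stream consumer)."""
        if len(history) >= 10:
            mean = statistics.fmean(history)
            stdev = statistics.pstdev(history) or 1e-9
            if abs(value - mean) / stdev > THRESHOLD:
                # In a real pipeline this would publish an alert event or open
                # a work order instead of printing.
                print(f"Anomaly on {machine_id}: {value:.2f} (baseline {mean:.2f})")
        history.append(value)

    # Simulated stream of readings with one spike at the end.
    for v in [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.0, 0.98, 1.02, 1.0, 1.01, 5.0]:
        on_vibration_reading("press-07", v)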

Businesses are challenged, however, with transforming legacy architectures to deliver real-time data that is ready for business use. For many organizations, the analytics stack was built to consolidate transactional data in batches, often over multiple steps, to report on Key Performance Indicators (KPIs). These stacks were never built for real-time data, yet they are still the primary means of moving and processing data for most data teams. To fit into them, real-time data must first come to rest and wait to make its way through the stack. By the time it is ready for analysis, it is a historical view of what happened, and the opportunity to act on events in real time has passed, reducing the value of the insights.

The growing number of disparate sources that business analysts and data scientists need access to further complicates efforts. Unfortunately, a lot of enterprise data is underutilized. Underutilized data often leads to lost opportunities as data loses its value, or decays, over time. For example, 50% of organizations admit that their data loses value within hours, and only 26% said their streaming data is analyzed in real time. If an organization is struggling to utilize data before it decays, it fails to fully leverage the high-speed data in which it has invested.

Before we go any further, let’s clarify what data in motion is. Data in motion, simply put, is data that is not sitting at rest in permanent storage. It includes streaming data: a continuous series of discrete events that happen at a point in time, such as sensor readings. It also includes data that is currently moving through an organization’s systems. For example, a record of login attempts being sent from an authentication server to a Security Information and Event Management (SIEM) tool is also data in motion. By contrast, data at rest isn’t doing much besides waiting to be queried. Data in motion is active data that is flowing.

Data-in-motion architecture is about building the scalable data infrastructure required to remove friction that might impede active data from flowing freely across the enterprise. It’s about building strategic capabilities to make real-time data a first-class citizen. Data in motion is much more than just streaming. 

Delivering real-time insights at scale with the efficiency and agility needed to compete in today’s business environment requires more than just building streaming pipelines to move high-velocity data into an old analytics stack.  The three key elements of a data-in-motion architecture are: 

  • Scalable data movement is the ability to pre-process data efficiently from any system or device into a real-time stream incrementally as soon as that data is produced.  Classic Extract, Transform, & Load (ETL) tools have this functionality, but they typically rely on batching or micro-batching as opposed to moving the data incrementally.  Thus, they are not built for true real-time.
  • Enterprise stream management is the ability to manage an intermediary that can broker real-time data between any number of “publishing” sources and “subscribing” destinations. This capability is the backbone of building real-time use cases, and it eliminates the need to build sprawling point-to-point connections across the enterprise. Management involves utilizing tools to easily connect publishing and subscribing applications, ensure data quality, route data, and monitor health and performance as streams scale (a minimal publish/subscribe sketch follows this list).
  • Democratized stream processing is the ability of non-coder domain experts to apply transformations, rules, or business logic to streaming data to identify complex events in real time and trigger automated workflows and/or deliver decision-ready data to users.  This capability converts large volumes of raw data into contextualized data that is ready for use in a business process.  Domain experts need to have access to inject their knowledge into data before it is distributed across the organization.  A traditional analytics stack typically has this functionality spread out over multiple inefficient steps.
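
To make the broker pattern behind the second element concrete, here is a minimal publish/subscribe sketch using the kafka-python client against Apache Kafka, one of the streaming technologies referenced later in this feed. The broker address, topic name, and event payload are placeholders, and a production deployment would add security, schemas, and error handling; this is a sketch of the pattern, not a reference implementation.

    import json
    from kafka import KafkaProducer, KafkaConsumer

    # Placeholder broker address and topic name; real values depend on your cluster.
    BROKERS = "broker-1:9092"
    TOPIC = "machine.vibration"

    # A "publishing" source: any system can emit events to the broker ...
    producer = KafkaProducer(
        bootstrap_servers=BROKERS,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send(TOPIC, {"machine_id": "press-07", "vibration": 1.02})
    producer.flush()

    # ... and any number of "subscribing" destinations can read the same stream
    # independently, without point-to-point connections between systems.
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BROKERS,
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
        auto_offset_reset="earliest",
    )
    for message in consumer:
        print(message.value)
        break  # stop after one event for this sketch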

To transform business operations with data embedded in every process and decision, a data-in-motion architecture must be able to capture data from any source system, process that data within the context of the processes and decisions that need to be made, and distribute it to any number of destinations in real time. As organizations scale, the benefits of data in motion grow exponentially.  The hallmark of an effective data-in-motion architecture is maximal data utilization with minimal latency across the organization. Examples of this include: 

  • An order flowing across an e-commerce organization to provide real-time updates to marketing, fulfillment, supply chain, finance, and customer service, enabling efficient operations and delighting customers.  
  • A user session on a telco network flowing across the organization and being utilized by various processes, including fraud detection, network optimization, billing, marketing, and customer service.  

With data in motion enabling true real-time, analysts can get fresh, up-to-the-second, processed data ready for analysis, improving the quality of insights and accelerating their time to value.

A data-in-motion architecture delivers these capabilities in a way that makes them independently modifiable.  That way, organizations can adopt technology that meets their current needs and continue to build their streaming maturity as they go.  It should be easy to do things like onboard a new sensor stream when a manufacturing production line has been retrofitted with sensors by using data movement capabilities to bring data into an existing stream without modifying the entire architecture.  We should be able to add new rules to how we manage streaming data without rebuilding connectivity to the source system.  Similarly, it should be easy to add new logic into real-time monitoring for cybersecurity threats when we identify a new tactic.  As demand for real-time data continues to grow and new data sources and applications come online, it should be effortless to scale up the necessary components independently without compromising the efficient use of resources.  The speed with which an enterprise can make changes to the way they capture, process, and distribute data is essential for organizational agility. 

Capturing, processing, and distributing real-time data is critical to unlocking new opportunities to drive operational efficiency. The ability to do so at scale is the key to reaping greater economic value. The ability to remain agile is critical to sustaining innovation speed. Additionally, the value of architectural simplicity cannot be overstated. In a recent paper, Harvard Business School professor and technology researcher Marco Iansiti collaborated with economist Ruiqing Cao to model “data architecture coherence” and the cascading benefit of sustained innovation speed across an enterprise. A coherent data architecture, in Professor Iansiti’s definition, is one that is simple to understand and modify and that is well aligned with business processes and broader digital transformation goals. Professor Iansiti theorizes that the real driving force behind the innovation speed of many digital natives is not culture so much as a coherent data architecture that lends itself well to a rapid-iteration approach to business process optimization. Reductions in redundant tools and process steps can be quantified in terms of licensing, resource utilization, personnel impacts, and administrative overhead. However, these benefits are dwarfed by the sustained innovation speed that coherent data architectures deliver, enabling constant incremental improvements at the operational level.

Cloudera’s holistic approach to real-time data is designed to help organizations build a data-in-motion architecture that simplifies legacy processes for data movement as it scales.  

Ready to take action? Get started by reviewing Gigaom's Radar for Streaming Data Platforms to see how vendors stack up in this space.

]]>
https://www.cloudera.com/api/www/blog-feed?page=what-makes-data-in-motion-architectures-a-must-have-for-the-modern-enterprise0
Zero Downtime Upgrades - Redefining Your Platform Upgrade Experiencehttps://www.cloudera.com/blog/business/zero-downtime-upgrades-redefining-your-platform-upgrade-experiencehttps://www.cloudera.com/blog/business/zero-downtime-upgrades-redefining-your-platform-upgrade-experienceWed, 24 Jul 2024 16:00:00 UTC

Enter Zero Downtime Upgrades

ZDU is an answer to the increased demands placed on IT infrastructure as internal stakeholders and external customers become global. The days when IT infrastructure could be brought down at night or on weekends to apply updates are disappearing.

Cloudera recently unveiled the latest version of Cloudera Private Cloud Base with the Zero Downtime Upgrade (ZDU) feature to enhance your user experience. The goal of ZDU is to make upgrades simpler for you and your stakeholders by increasing the availability of Cloudera’s services.

How Do You Keep IT Infrastructure (and Buses) Running and Avoid Downtime?

Before I dive into the depths of ZDU, let me offer an analogy inspired by a customer. Citizens of large cities heavily depend on their local metro systems to plan their day-to-day lives. People need to get to work, go to the doctor, and get groceries, and it’s up to their local transportation department to ensure they make it to their destinations reliably. Managing IT infrastructure starts to look like a city’s transportation infrastructure when you realize that end users also depend on the reliability of IT systems to complete work and get home on time. IT organizations have the thankless job of ensuring infrastructure is up to date and patched against the latest vulnerabilities while keeping downtime to a minimum. Similarly, transportation agencies reduce downtime through innovations like automatic tire-inflation systems, so it’s about time we innovate too. That begs the question: if clusters are like buses, how do I inflate the tires while the bus is en route? In other words, how do I keep my infrastructure running and avoid downtime?

Similar to how a bus needs tune-ups, IT infrastructure needs maintenance to perform major upgrades, apply performance enhancements to scale workloads, or patch vulnerabilities to keep your environments safe. Cloudera helps you with this maintenance by delivering improvements and vulnerability patches in Service Packs and Cumulative Hotfixes (CHFs). Although applying Service Packs and CHFs is a straightforward process, you do need to restart services. Therefore, Cloudera Private Cloud Base needs to adopt a fundamental change to the upgrade and patch process to reduce, and eventually eliminate, workload downtime.

ZDU isn’t Cloudera’s first experience with providing the ability to upgrade services with no downtime. Rolling upgrades and restarts have long been available in services like HDFS and YARN. This capability, which is still available in Cloudera Private Cloud Base, allows users to restart some of Cloudera’s fundamental services with reduced capacity and no downtime.

Diving into Zero Downtime Upgrades

With that context, let’s dive into how ZDU in Cloudera Private Cloud Base keeps your end users “on the bus” while you perform critical maintenance. ZDU allows platform administrators to perform major upgrades and apply service packs and cumulative hotfixes with minimal to no downtime. The first innovation of this experience was improving Cloudera Manager’s upgrade process. The service upgrade sequence is optimized to account for service dependencies and to limit the time a service experiences reduced capacity. These optimizations improve upgrade time whether you perform a regular full-downtime upgrade or a ZDU. Next, services have been improved to either add the ability to upgrade without downtime or reduce the amount of downtime you may experience.

Let’s talk about what you should expect during a zero downtime upgrade. When initiating an upgrade with Cloudera Manager, you will first be presented with a checklist page to ensure your cluster is ready for an upgrade. After completing the checklist, you can perform either a regular or a zero downtime upgrade. Once the ZDU begins, Cloudera Manager upgrades the services in two stages. First, services that will experience some downtime are upgraded. This ensures that any service downtime is predictable and is only experienced at the beginning of your upgrade window. Next, Cloudera Manager upgrades the remaining services, which experience reduced capacity but zero downtime. When Cloudera Manager completes the sequence of commands, administrators can validate the cluster, much like a regular upgrade, before finalizing it. If any issues occur during the process, Cloudera Private Cloud Base now supports downgrades, allowing a cluster to return to the previous version without losing any metadata.

The Cloudera team is passionate about helping you confidently tackle your toughest data and AI challenges. This first step into Zero Downtime Upgrades is a big achievement in providing a revolutionary experience for cluster administration teams. Ultimately, our goal is to provide you with the tools to keep the buses rolling and passengers moving so we can all make it home on time.

To learn more, visit our product page.

]]>
https://www.cloudera.com/api/www/blog-feed?page=zero-downtime-upgrades-redefining-your-platform-upgrade-experience0
Resilience in Action: How Cloudera’s Platform, and Data in Motion Solutions, Stayed Strong Amid the CrowdStrike Outagehttps://www.cloudera.com/blog/business/resilience-in-action-how-clouderas-platform-and-data-in-motion-solutions-stayed-strong-amid-the-crowdstrike-outagehttps://www.cloudera.com/blog/business/resilience-in-action-how-clouderas-platform-and-data-in-motion-solutions-stayed-strong-amid-the-crowdstrike-outageMon, 22 Jul 2024 16:00:00 UTC

Late last week, the tech world witnessed a significant disruption caused by a faulty update from CrowdStrike, a cybersecurity software company that focuses on protecting endpoints, cloud workloads, identity, and data. This update led to global IT outages, severely affecting various sectors such as banking, airlines, and healthcare. Many organizations found their systems rendered inoperative, highlighting the critical importance of system resilience and reliability. 

However, amidst this disruption, one Cloudera customer reported that although many of their systems were impacted, Cloudera’s data-in-motion stack specifically demonstrated remarkable resilience, experiencing no downtime. Here, we’ll briefly discuss the incident, and how Cloudera protected its customers’ most critical analytic workloads from potential downtime.

The Incident: A Brief Overview

The CrowdStrike incident, which stemmed from a problematic update to their Falcon platform, caused widespread compatibility issues with Microsoft systems. This resulted in numerous systems experiencing the infamous Windows "blue screen of death" among other operational failures. While this incident did not involve a cyberattack, the technical glitch led to significant disruptions to global operations.

Cloudera’s Resilience - Data in Motion and the Entire Cloudera Data Platform

The Cloudera customer reported that despite many of their systems going down, Cloudera services running on Linux instances in Amazon Web Services (AWS) remained up and functional. These services included their data-in-motion stack, but it’s important to note that Cloudera’s entire platform and all hybrid cloud data services are equally resilient, largely due to Cloudera’s focus on high availability, disaster tolerance, and a long history of serving mission-critical workloads for our large enterprise customers.

Cloudera offers the only open, true hybrid platform for data, analytics, and AI, and with that comes unique opportunities for supporting high availability and disaster tolerance. With portable data services that can run on any cloud and on-premises, you can configure a variety of available sites that span different clouds and include on-premises resources, reducing the dependency on a single platform, vendor, or service. For more information on how Cloudera is designed for resilience, read the Cloudera blog on Disaster Recovery, and follow the Cloudera Reference Architecture for Disaster Recovery for guidance and best practices to further your own resilience and availability goals with Cloudera.

Data in motion is a set of technologies, including Apache NiFi, Apache Flink, and Apache Kafka, that enable customers to capture, process, and distribute any data anywhere, enabling real-time analytics, AI, and machine learning. These technologies are key components for many mission-critical workloads and applications – from network monitoring and service assurance in telecommunications to fraud detection and prevention in financial services. Real-time workloads, when they are mission critical, carry the additional weight of timeliness, and, as such, a potential outage could have a significantly greater business impact compared to less time-critical workloads.
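
To make this more concrete, here is a minimal, hedged sketch of the kind of client-side resilience settings a Kafka-based pipeline might use, written with the confluent-kafka Python client. The broker addresses, topic name, and payload are hypothetical placeholders rather than configuration from any specific Cloudera deployment; the point is simply that idempotent, fully acknowledged writes keep a single broker failure from silently dropping data.

# A minimal sketch of a fault-tolerant Kafka producer.
# Broker addresses, topic name, and payload are hypothetical placeholders.
from confluent_kafka import Producer

conf = {
    # List brokers from more than one rack or site so a single broker outage
    # does not prevent the producer from connecting.
    "bootstrap.servers": "broker-a:9092,broker-b:9092,broker-c:9092",
    # Require acknowledgment from all in-sync replicas before a write counts.
    "acks": "all",
    # Retry transient failures while preserving ordering and avoiding duplicates.
    "enable.idempotence": True,
}

producer = Producer(conf)

def on_delivery(err, msg):
    # Surface per-message delivery outcomes so failures are visible immediately.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] at offset {msg.offset()}")

producer.produce("transactions", value=b'{"amount": 42.50}', callback=on_delivery)
producer.flush()  # Block until outstanding messages are delivered or fail.

Settings like these trade a little latency for durability, which is usually the right default for the mission-critical, real-time workloads described above.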

Fortunately for this and many other Cloudera customers, data in motion has been designed to Cloudera’s most exacting standards for high availability and disaster tolerance, including support for hybrid cloud. This ensures that even if some components had a dependency on a CrowdStrike-affected system or service, that dependency would not become a single point of failure for the platform. The continuity of service that they experienced underscores the reliability and resilience of Cloudera, even in the face of significant external disruptions, as well as Cloudera’s potential for reducing the business impact of cloud provider outages.

Architect for Resilience, Especially for Real-Time Applications

The CrowdStrike incident is not the first major service disruption that businesses have experienced, and it very likely will not be the last. The cloud provides many benefits from a cost, flexibility, and scalability perspective, especially for analytic workloads. However, it also comes with some operational risk. Many workloads and applications that rely on the real-time capturing, processing, and analysis of data have zero tolerance for downtime.

Cloudera’s platform, and the data-in-motion stack, are built with resilience in mind. Cloudera’s unique approach to hybrid cloud and investment in proven architectures for high availability and disaster tolerance can mitigate the challenges many companies have experienced in the past few days, protecting their mission-critical workloads and ensuring business continuity.

Learn more about Cloudera and data in motion here.

]]>
https://www.cloudera.com/api/www/blog-feed?page=resilience-in-action-how-clouderas-platform-and-data-in-motion-solutions-stayed-strong-amid-the-crowdstrike-outage0
Introducing Cloudera Observability Premiumhttps://www.cloudera.com/blog/business/introducing-cloudera-observability-premiumhttps://www.cloudera.com/blog/business/introducing-cloudera-observability-premiumWed, 10 Jul 2024 16:00:00 UTC

Cloudera Observability Premium immediately addresses challenges related to the health and optimization of data centers.

There’s nothing worse than wasting money unnecessarily. In on-premises data estates, these costs appear as person-hours wasted waiting for inefficient analytics to complete, or troubleshooting jobs that failed to execute as expected, or at all. They also manifest as idle hardware held as spare capacity for urgent workloads, sized to run them amidst noisy neighbors and resource-hungry, lower-priority workloads. In the public cloud, these cost management issues are compounded by consumption-based pricing, where compute is often overused due to a lack of visibility into optimization opportunities.

With observability, you gain more than just the information about what’s happening in your infrastructure, workloads, and related services. You can tap into insights such as where to optimize for the biggest gains, what you can do to fix workloads that don’t run, and how you can save money in the cloud.  

Observability for your most secure data

For your most sensitive, protected data, we understand that even the metadata and telemetry about your workloads must be kept under close watch, and must stay within your secured environment. You may be behind heavy firewalls, or even in a completely air-gapped environment, where sending telemetry to any third-party service is simply not an option. For scenarios like this, we have created the Cloudera Observability Premium on-premises service. Simply install it in your data center, set it up to receive Cloudera telemetry, and enjoy all the premium benefits without any data or metadata ever leaving your secured environment. If you have multiple Cloudera environments, as long as they can all connect to the same on-premises observability server you’ve installed, you can use this service across your organization and benefit from federated telemetry and centralized visibility.

Observability for your public cloud data estate

We all know how fast things move in the public cloud, and nothing moves faster, it seems, than the bill! One way we can help you regain control and reduce overspending in the cloud is through real-time monitoring and real-time automatic actions. Can you imagine being able to stop runaway jobs before you have to pay for them? Imagine no more. With Cloudera Observability’s latest innovation in real-time monitoring, customers running Cloudera DataHub on a public cloud can take full advantage of this feature, along with many other high-value capabilities, and start saving on cloud costs today.

New Data Observability capabilities

You’ve seen how Cloudera Observability Premium can tell you what you’re doing with your data - how many resources you’re using to process it, query it, and more. But what about the data itself? Wouldn’t it be great if you could also have some observability into which tables are hot and cold? Cloudera Observability Premium now includes features that measure your data’s temperature, identify which tables are used the most, and report on their health and other key measures - all so you can improve data quality, performance, and health.

The data temperature feature lets you see whether hot or cold data sets are deployed optimally, including the underlying file sizes and partitioning styles. This allows you to quickly determine whether your most important data is managed efficiently. You can easily check if these data sets are correctly secured, properly stored to minimize bottlenecks during analysis, and effectively partitioned to remain performant as they grow. With this added telemetry, you can take control of your data and ensure optimal use of one of your company’s most precious assets, driving even more business value from it.

Cloudera Observability does it again

Cloudera Observability Premium users have seen the advantages of immediately addressing issues and concerns related to the health and optimization of their data centers. Now, on-premises users with the most secure data centers can enjoy these same benefits—all without any metadata or telemetry leaving their protected environments. With Cloudera Observability Premium for DataHub on public cloud, we extend these benefits to public cloud workloads, where our customers run their mission-critical and complex applications. With these new observability features, you’ll maximize your investment and eliminate unnecessary spending.

To learn more, click here. Get Observability for your data center today as a SaaS application or reach out to your local Cloudera sales representative and let us know where you'd like to start.

]]>
https://www.cloudera.com/api/www/blog-feed?page=introducing-cloudera-observability-premium0
Revolutionize Your Business Dashboards with Large Language Modelshttps://www.cloudera.com/blog/technical/revolutionize-your-business-dashboards-with-large-language-modelhttps://www.cloudera.com/blog/technical/revolutionize-your-business-dashboards-with-large-language-modelWed, 26 Jun 2024 07:30:00 UTC

And don't worry about losing track of the data behind the insights. Cloudera Data Visualization allows users to easily delve deeper into the underlying data, providing transparency and fostering trust in the results. This means users can build powerful visual dashboards and reports, and also have an additional layer of contextual intelligence through the AI visual for a comprehensive business intelligence workflow.

New AI assistant within Cloudera Data Visualization gets users talking to their data

In today’s data-driven world, businesses rely heavily on their dashboards to make informed decisions. However, traditional dashboards often lack the intuitive interface needed to truly harness the power of data. But what if you could simply talk to your data and get instant insights?

In the latest version of Cloudera Data Visualization, we’re introducing a new AI visual that helps users leverage the power of Large Language Models (LLMs) to “talk” to their data. Cloudera Data Visualization now leverages the latest advancements in natural language processing to transform your business dashboards into intelligent platforms.

Gone are the days of tedious filtering schemes and dropdown menus. With Cloudera Data Visualization, users can now have interactive conversations with their data, thanks to its seamless integration with LLMs of their choosing. This means users can ask questions in plain language and receive accurate, contextually relevant responses. Say goodbye to static dashboards and hello to a whole new level of engagement.

One of the most remarkable features of the AI visual is its ability to understand context. For example, if a user asks their office supply data sets about “binders,” Cloudera Data Visualization automatically recognizes that the query might refer to both the durable and economy types sold by their organization. This level of intelligence streamlines the analysis process and saves valuable time. In the example below, the assistant is answering a question about the sales performance of a particular product in a particular region.

The future of data insight from visualization is here – and it's smarter than ever before.

So, whether you're a data scientist, business analyst, or executive, Cloudera Data Visualization revolutionizes the way you interact with data. It empowers users to make faster, more informed decisions by putting the power of natural language processing at their fingertips.

Dashboards, Visuals and Apps in Cloudera Data Visualization

Ready to experience the future of business intelligence? Cloudera customers can now access the Technical Preview of this new AI visual within any Data Visualization dashboard or application, and see firsthand how LLMs can transform dashboards to make it easier to surface insights.

For more information on these features and our AI capabilities, visit our Enterprise AI page. When you’re ready, you can request a demo at the bottom of the page to see how these capabilities can work in the context of your business.

]]>
https://www.cloudera.com/api/www/blog-feed?page=revolutionize-your-business-dashboards-with-large-language-model0
Introducing Cloudera’s AI Assistantshttps://www.cloudera.com/blog/business/introducing-clouderas-ai-assistantshttps://www.cloudera.com/blog/business/introducing-clouderas-ai-assistantsMon, 24 Jun 2024 16:00:00 UTC

Under the hood, the assistant uses advanced techniques like prompt engineering and retrieval augmented generation (RAG) to truly understand your database. It works with many large language models (LLMs), whether they are public or private, and it effortlessly scales to handle thousands of tables and users simultaneously. So whether you're under pressure to answer critical business questions or just tired of wrestling with SQL syntax, the AI assistant has your back, enabling you to focus on what really matters – getting insights from your data.
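
As a rough, hedged sketch of the RAG pattern described here (not Cloudera’s actual implementation), the flow looks like this: retrieve the table definitions most relevant to the user’s question, fold them into the prompt, and hand that prompt to whichever public or private LLM the platform is configured to use. The schemas, the naive keyword scoring, and the call_llm placeholder below are all hypothetical.

# A minimal, generic sketch of retrieval-augmented SQL generation.
# The schemas, the relevance scoring, and call_llm() are hypothetical stand-ins.

SCHEMAS = {
    "orders": "orders(order_id INT, customer_id INT, amount DECIMAL, order_date DATE)",
    "customers": "customers(customer_id INT, name STRING, region STRING)",
    "inventory": "inventory(sku STRING, warehouse STRING, quantity INT)",
}

def retrieve_schemas(question: str, k: int = 2) -> list[str]:
    """Naive retrieval step: rank schemas by keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        SCHEMAS.items(),
        key=lambda item: -len(q_words & set(item[1].lower().replace(",", " ").split())),
    )
    return [ddl for _, ddl in scored[:k]]

def build_prompt(question: str) -> str:
    """Prompt-engineering step: ground the model in the retrieved schemas only."""
    context = "\n".join(retrieve_schemas(question))
    return (
        "You are a SQL assistant. Using only these tables:\n"
        f"{context}\n"
        f"Write a single ANSI SQL query answering: {question}\n"
        "Then explain the query in one sentence."
    )

def call_llm(prompt: str) -> str:
    # Placeholder for whichever public or private LLM is configured;
    # swap in a real client call here.
    raise NotImplementedError

if __name__ == "__main__":
    print(build_prompt("Total order amount by customer region last quarter"))

A production assistant would use embedding-based retrieval over real catalog metadata and dialect-aware prompting, but the grounding idea is the same: the model only sees the schemas that matter for the question at hand.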

AI Chatbot in Cloudera Data Visualization: Your Data's New Best Friend

BI dashboards are undeniably useful, but they often only tell part of the story. To gain meaningful and actionable insights, data consumers need to engage in a conversation with their data, and ask questions beyond simply the “what” that a dashboard typically shows. That's where the AI Chatbot in Cloudera Data Visualization comes into play.

Cloudera's ML copilots, powered by pre-trained LLMs, are like having machine learning experts on call 24/7. They can write and debug Python code, suggest improvements, and even generate entire applications from scratch. With seamless integration with over 130 Hugging Face models and datasets, you have a wealth of resources at your disposal.

Whether you're a data scientist looking to streamline your workflow or a business user eager to get an AI application up and running quickly, the ML copilots support the end-to-end development process and get models into production fast.
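
To give a flavor of what pulling in a Hugging Face model looks like in plain Python, here is a minimal sketch using the transformers library. The model name is an illustrative placeholder and this is not the copilot’s internal code; it simply shows how a pre-trained code-generation model can be loaded and queried in a few lines.

# A minimal sketch of loading a Hugging Face code-generation model.
# The model name is an illustrative placeholder, not the one the copilot ships with.
from transformers import pipeline

generator = pipeline("text-generation", model="bigcode/starcoder2-3b")

prompt = "# Python function that reads a CSV file and returns the mean of each column\n"
outputs = generator(prompt, max_new_tokens=80, do_sample=False)

print(outputs[0]["generated_text"])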

Elevate Your Data with AI Assistants

By embedding AI assistants for SQL, BI, and ML directly into the platform, Cloudera is simplifying and enhancing the data experience for every single user. SQL developers will be more efficient and productive than ever. Business analysts will be empowered to have meaningful, actionable conversations with data, uncovering the “why” behind the “what.” Additionally, data scientists will be empowered to bring new AI applications to production faster and with greater confidence.

For more information on these features and our AI capabilities, visit our Enterprise AI page. When you’re ready, you can request a demo at the bottom of the page to see how these capabilities can work in the context of your business.  

Discover Cloudera's AI-Driven SQL, BI, and ML Assistants

In the last couple of years, AI has launched itself to the forefront of technology initiatives across industries. In fact, Gartner predicts the AI software market will grow from $124 billion in 2022 to $297 billion in 2027. As a data platform company, Cloudera has two very clear priorities. First, we need to help customers get AI models based on trusted data into production faster than ever. And second, we need to build AI capabilities into Cloudera to give more people access to data-driven insights for their everyday roles.

At our recent Cloudera Now virtual event, we announced three new capabilities that support both of our AI priorities: An AI-driven SQL assistant, a Business Intelligence (BI) chatbot that converses with your data, and an ML copilot that accelerates machine learning development. Let’s take a deeper dive into how these capabilities accelerate your AI initiatives and support data democratization.

SQL AI Assistant: Your New Best Friend

Writing complex SQL queries can be a real challenge. From finding the right tables and columns to dealing with joins, unions, and subselects, then optimizing for readability and performance, and doing all of that while taking into account the unique SQL dialect of the engine, it's enough to make even the most seasoned SQL developer’s head spin.  And at the end of the day, not everyone who needs data to be successful in their day-to-day work is an SQL expert.

Imagine, instead, having a domain expert and a SQL guru always by your side. That's exactly what Cloudera's SQL AI assistant is. Users simply describe what they need in plain language, and the assistant will find the relevant data, write the query, optimize it, and even explain it back in easy-to-understand terms. 

The chatbot resides directly within your dashboard, ready to answer any question you pose. And when we say “any question,” we mean it. Why are sales down in the Northeast? Will this trend continue? What actions should we take? The chatbot leverages the context of the data behind the dashboard to deliver deeper, more actionable insights to the user. 

A written answer is a great way to start understanding your data, but let’s not forget the power of the visuals in our dashboards and reports.  The chatbot eliminates the burden of clicking through dropdowns and filters to find answers. Simply ask what you want to know, in plain language, and the chatbot will intelligently match it to the relevant data and visuals. It's like having a dedicated subject matter expert right there with you, ready to dive deep into the insights that matter most to your business.

Cloudera Copilot for Cloudera Machine Learning: Your Model's New Best Friend

Building machine learning models is no easy feat. From data wrangling to coding, model tuning to deployment, it's a complex and time-consuming process. In fact, many models never make it into production at all. But what if you had a copilot to help navigate all of the challenges related to deploying models in production?

]]>
https://www.cloudera.com/api/www/blog-feed?page=introducing-clouderas-ai-assistants0
Unparalleled Productivity: The Power of Cloudera Copilot for Cloudera Machine Learninghttps://www.cloudera.com/blog/business/unparalleled-productivity-the-power-of-cloudera-copilot-for-cloudera-machine-learninghttps://www.cloudera.com/blog/business/unparalleled-productivity-the-power-of-cloudera-copilot-for-cloudera-machine-learningMon, 24 Jun 2024 16:00:00 UTC

When it comes to debugging and troubleshooting, the Copilot reduces time to resolution by clarifying error messages, identifying bugs within code, and proposing practical fixes: it analyzes static code, recognizes common issues, and provides proactive recommendations to address them. This proactive approach not only improves code quality but also empowers developers to resolve issues more efficiently, fostering a smoother development process and minimizing disruptions to project timelines.

The Future of Data Science, Today

As we continue to innovate within Cloudera Machine Learning, the introduction of Cloudera Copilot marks a significant leap forward in productivity for data science practitioners. Our commitment to empowering data practitioners extends beyond mere automation—it's about equipping them with the tools to rapidly drive innovation and achieve meaningful outcomes.

Get Started Today

Are you ready to unlock the full potential of your data workflows? Explore the capabilities of Cloudera Copilot for Cloudera Machine Learning and experience firsthand how AI-powered assistance can transform your productivity. 

In the fast-evolving landscape of data science and machine learning, efficiency is not just desirable—it's essential. Imagine a world where every data practitioner, from seasoned data scientists to budding developers, has an intelligent assistant at their fingertips. This assistant doesn't just automate mundane tasks but understands the intricacies of your workflows, anticipates your needs, and dramatically enhances your productivity at every turn. Welcome to the era of Cloudera Copilot for Cloudera Machine Learning.

The Evolution of AI-Powered Assistance

At Cloudera, we understand the challenges faced by data practitioners. The complexities of modern data workflows often translate into countless hours spent coding, debugging, and optimizing models. Recognizing this pain point, we set out to redefine the data science experience with AI-driven innovation.

Cloudera Copilot for Cloudera Machine Learning integrates cutting-edge large language models directly into the machine learning service. This integration empowers developers and data scientists alike with advanced capabilities for code completion, generation, and troubleshooting. Whether you're tackling data transformation challenges or refining intricate machine learning models, our Copilot is designed to be your reliable partner in innovation.

Accelerating Productivity with AI

The Cloudera Copilot for Cloudera Machine Learning redefines data practitioners' workflows across critical areas: code generation and autocompletion, debugging and troubleshooting, and code understanding and exploration. 

In code generation and autocompletion, the Copilot dramatically speeds up the development lifecycle. It begins by generating initial notebooks to kickstart projects and continues to assist in writing functions, test cases, and documentation. This practical support speeds up project initiation and maintains consistent coding practices. By automating these essential tasks, the Copilot frees developers to concentrate on innovation and problem-solving rather than on repetitive work. This integrated assistant fosters a more efficient development process and boosts overall productivity in data science and machine learning endeavors.

In the realm of code understanding and exploration, the Cloudera Copilot for Cloudera Machine Learning speeds up developer onboarding and assists in navigating complex projects. By capturing project-specific knowledge, the Copilot can assist new team members in grasping project structures, helping with high-level questions, and explaining implementation details. This capability accelerates learning curves and fosters better collaboration within teams of varying expertise, ultimately leading to more efficient development cycles and enhanced project outcomes.

]]>
https://www.cloudera.com/api/www/blog-feed?page=unparalleled-productivity-the-power-of-cloudera-copilot-for-cloudera-machine-learning0
The Importance of Recognizing Juneteenthhttps://www.cloudera.com/blog/culture/the-importance-of-recognizing-juneteenthhttps://www.cloudera.com/blog/culture/the-importance-of-recognizing-juneteenthFri, 21 Jun 2024 15:00:00 UTC

“A vital part of our culture at Cloudera is ensuring all voices are heard, and that we take the time to connect with others and understand the unique backgrounds that shape our lives. Our Juneteenth History & Culture Jeo-Party was an incredible success, filled with engaging dialogue and great insight into both Black history and the day-to-day experiences of our fellow colleagues.” - Dipto Chakravarty, Chief Product Officer

Interested in resources to learn about Juneteenth? We encourage you to watch this video from a previous Understanding Juneteenth event and review our Understanding Juneteenth Resource Guide.

For more on Cloudera’s commitment to Diversity, Equality and Inclusion, click here.

Juneteenth holds profound significance in the history of freedom and equality for Black Americans. Also known as Freedom Day or Emancipation Day, Juneteenth commemorates the anniversary of June 19, 1865, when news of the Emancipation Proclamation reached Galveston, Texas, finally declaring freedom for enslaved Americans held in the Confederacy – more than two years after the proclamation was issued on January 1, 1863.

Black Americans have celebrated Juneteenth across the US – predominantly in the south and southwestern regions – for over 150 years. However, awareness and recognition of its importance have only come into the mainstream since it was recognized as a federal holiday in 2021.

Juneteenth is a momentous occasion that marked the culmination of a long and arduous struggle for freedom and justice. It calls for celebration, but also provides an important opportunity to reflect on the experiences of Black Americans throughout history and the challenges they still face today. As we celebrated Juneteenth in the U.S., we honored the resilience, courage, and perseverance of those who fought for freedom and equality then, and of those who continue to fight for true equality and equity today.

Cloudera is committed to fostering a culture of inclusivity, diversity, and belonging that empowers all employees to bring their whole, authentic selves to work every day. One of the ways we did this was by providing opportunities for our employees to educate themselves on the experiences of their colleagues and community members.

This year, we encouraged our employees to come together to celebrate their shared humanity and reaffirm their commitment to building a more just and equitable world for all. We urged them to take the time to engage in celebrations and activities that expanded their awareness of the holiday, including by joining our Juneteenth History & Culture Jeo-Party, which took place on June 20th. The event featured a fast-paced and competitive game that celebrated and shared education on Black history and culture.

]]>
https://www.cloudera.com/api/www/blog-feed?page=the-importance-of-recognizing-juneteenth0
Empowering Enterprise Generative AI with Flexibility: Navigating the Model Landscapehttps://www.cloudera.com/blog/business/empowering-enterprise-generative-ai-with-flexibility-navigating-the-model-landscapehttps://www.cloudera.com/blog/business/empowering-enterprise-generative-ai-with-flexibility-navigating-the-model-landscapeTue, 18 Jun 2024 16:00:00 UTC

The world of Generative AI (GenAI) is rapidly evolving, with a wide array of models available for businesses to leverage. These models can be broadly categorized into two types: closed-source (proprietary) and open-source models.

Closed-source models, such as OpenAI's GPT-4o, Anthropic’s Claude 3, or Google's Gemini 1.5 Pro, are developed and maintained by private and public companies. These models are known for their state-of-the-art performance and extensive training on vast amounts of data. However, they often come with limitations in terms of customization, control, and cost.

On the other hand, open-source models, such as Llama 3 or Mistral, are freely available for businesses to use, modify, and deploy. These models offer greater flexibility, transparency, and cost-effectiveness compared to their closed-source counterparts.

Advantages and Challenges of Closed-source Models

Closed-source models have gained popularity due to their impressive capabilities and ease of use. Platforms like OpenAI's API or Google Cloud AI provide businesses with access to powerful GenAI models without the need for extensive in-house expertise. These models excel at a wide range of tasks, from content generation to language translation.

However, the use of closed-source models also presents challenges. Businesses have limited control over the model's architecture, training data, and output. This lack of transparency can raise concerns about data privacy, security, and bias. Additionally, the cost of using closed-source models can quickly escalate as usage increases, making it difficult for businesses to scale their GenAI applications.

 The Rise of Open-source Models: Customization, Control, and Cost-effectiveness

Open-source models have emerged as a compelling alternative to closed-source models, and usage has been on the rise. According to GitHub, there was a 148% year-over-year increase in individual contributors and a 248% rise in the total number of open-source GenAI projects on GitHub from 2022 to 2023. With open-source models, businesses can customize and fine-tune models to their specific needs. By training open-source models on enterprise-specific data, businesses can create highly tailored GenAI applications that outperform generic closed-source models.

Moreover, open-source models provide businesses with complete control over the model's deployment and usage. According to data gathered by Andreessen Horowitz (a16z), 60% of AI leaders cited control as the primary reason to leverage open source. This control enables businesses to ensure data privacy, security, and compliance with industry regulations. Open-source models also offer significant cost savings compared to closed-source models, as businesses can run and scale these models on their own infrastructure without incurring excessive usage fees.

Selecting the right GenAI model depends on various factors, including the specific use case, available data, performance requirements, and budget. In some cases, closed-source models may be the best fit due to their ease of use and state-of-the-art performance. However, for businesses that require greater customization, control, and cost-effectiveness, open-source models are often the preferred choice.

Cloudera's Approach to Model Flexibility and Deployment

At Cloudera, we understand the importance of flexibility in GenAI model selection and deployment. Our platform supports a wide range of open-source and closed-source models, allowing businesses to choose the best model for their specific needs.

Fig 1. Cloudera Enterprise GenAI Stack
Openness and interoperability are key to leveraging the full GenAI ecosystem.

With Cloudera, businesses can easily train, fine-tune, and deploy open-source models on their own infrastructure. The platform provides a secure and governed environment for model development, enabling data scientists and engineers to collaborate effectively. Our platform also integrates with popular open-source libraries and frameworks, such as TensorFlow and PyTorch, ensuring compatibility with the latest advancements in GenAI.

For businesses that prefer to use closed-source models, Cloudera's platform offers seamless integration with leading public cloud AI services, such as Amazon Bedrock. This integration allows businesses to leverage the power of closed-source models while still maintaining control over their data and infrastructure.
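
As one hedged illustration of what consuming a closed-source model on Amazon Bedrock can look like from application code (independent of Cloudera’s own integration, and with the region, model ID, and request body shown here as illustrative placeholders), the boto3 runtime client can be called directly:

# A minimal sketch of invoking a hosted model through Amazon Bedrock with boto3.
# The region, model ID, and request shape are illustrative; check the Bedrock
# documentation for the exact body format of the model you choose.
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 200,
    "messages": [{"role": "user", "content": "Summarize the key risks in our Q3 loan portfolio."}],
})

response = client.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=body,
    contentType="application/json",
    accept="application/json",
)

print(json.loads(response["body"].read()))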

Find out how Cloudera can help fuel your enterprise AI journey. 

]]>
https://www.cloudera.com/api/www/blog-feed?page=empowering-enterprise-generative-ai-with-flexibility-navigating-the-model-landscape0
Cloudera Unveils Plans for Annual Pride Celebration in Corkhttps://www.cloudera.com/blog/culture/cloudera-unveils-plans-for-annual-pride-celebration-in-corkhttps://www.cloudera.com/blog/culture/cloudera-unveils-plans-for-annual-pride-celebration-in-corkMon, 17 Jun 2024 15:00:00 UTC

Pride Month is underway and we at Cloudera are looking forward to joining the global celebration of diversity, equity and the ongoing effort for LGBTQ+ (Lesbian, Gay, Bisexual, Transgender, Queer/Questioning) rights and recognition.

Pride Month serves as a reminder that the fight for equality and equity for members of the LGBTQ+ community is not over. It is a call to action to continue to support and advocate for the rights of our LGBTQ+ colleagues and friends within our organization and our communities.

We are proud that Cloudera’s culture is built on a strong foundation of fostering inclusivity and creating an environment where everyone feels valued, respected, and empowered to bring their authentic selves to work. In past years, Cloudera employees have hosted events, both in-person and virtually, to celebrate. Last year, our team in Cork, Ireland joined those efforts and hosted its first on-site Pride event. The event brought together over 80 employees to celebrate the 30th anniversary of the decriminalization of same-gender loving relationships in Ireland. The event proved to be a resounding success, but don’t just take our word for it. Here’s just a sampling of what our Cork team had to say.

“Our Cork Pride events have gotten bigger and bigger with each passing year and we want to surpass that again this year. We are very fortunate in Cloudera to have the backing of our Leadership sponsors across all areas of the Organisation and without them, these events would not be possible.” Charlotte Keating, Senior Operations Analyst

“Being a member of the LGBTQ+ community, I was delighted to have the opportunity to give Clouderans in Ireland the chance to experience a colorful pride celebration, even people outside of the community should experience at least once in their life!” - Sean Murphy Phelan, Associate Business System Analyst

“The growing participation among the Clouderans both within and outside the community to take part in organising and attending the Pride events amplifies the culture of acceptance, welcome, and celebration that has been created in Cloudera over the previous years and will continue to grow through their support, participation, and help.” - Ódhlan Duff, Cloudera Cork Intern, FSS. 

Building on that success, the Cork Team is excited to announce that this year's celebration will be held on Wednesday, June 26. Once again, they invite Cloudera’s global workforce to join the festivities which they promise will be even bigger and bolder this year. The event will include party games, a special performance by Cork's own drag performer Krystal Queer, and inspiring conversations around the triumphs and struggles of the LGBTQ+ community. Attendees will also hear from Cloudera leadership and members of the EMEA+ ERG Committee about our ongoing efforts in promoting Diversity, Equality, and Inclusion within our workforce. Additionally, the Cork Pride Committee will join the team to share insights into their mission and the significance of Pride in the local community.  

The celebration will continue in August, when Cloudera participates in Cork’s Pride Parade for the second year. This all-out colorful event draws people from all walks of life to honor the LGBTQ+ community. This year's theme, 'Unity in Community,' is a powerful call to action focusing on inclusivity and acceptance, as well as the importance of strong support systems. We encourage all employees and our broader community to attend and celebrate the LGBTQ+ community, commemorate their LGBTQ+ peers, family, and friends, and to expand their knowledge of the history of Pride and the powerful contributions of the volunteers at Cork Pride in our community.

For more on Cloudera’s commitment to Diversity, Equality and Inclusion, click here.

]]>
https://www.cloudera.com/api/www/blog-feed?page=cloudera-unveils-plans-for-annual-pride-celebration-in-cork0
Where Does Data Governance Fit Into Hybrid Cloud?https://www.cloudera.com/blog/technical/where-does-data-governance-fit-into-hybrid-cloudhttps://www.cloudera.com/blog/technical/where-does-data-governance-fit-into-hybrid-cloudSat, 15 Jun 2024 06:54:00 UTC

At a time when artificial intelligence (AI) and tools like generative AI (GenAI) and large language models (LLMs) have exploded in popularity, getting the most out of organizational data is critical to driving business value and carving out a competitive market advantage. To reach that goal, more businesses are turning toward hybrid cloud infrastructure - with data on-premises, in the cloud, or both -  as a means to tap into valuable data. 

But for all the excitement and movement happening within hybrid cloud infrastructure and its potential with AI, there are still risks and challenges that need to be appropriately managed—specifically when it comes to the issue of data governance. Inherently, a hybrid cloud infrastructure allows data to move between environments, which can make that data vulnerable not only to security risks but also to lapses in compliance with internal standards or external regulations like the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and even the Health Insurance Portability and Accountability Act (HIPAA).

The need for effective data governance itself is not a new phenomenon. It’s something that’s always been an important task alongside everyone’s day-to-day workflows. It’s also something that, unlike other projects, is always happening. With hybrid cloud infrastructure firmly cemented as the preferred approach to data infrastructure, governance needs to be at the top of every to-do list. 

Barriers to Good Data Governance Remain

A Cloudera survey found that 72% of enterprise leaders agreed that data governance was an enabler of business value. But whether it’s a lack of buy-in from leadership, a disparate set of tools and users, or increasingly siloed data, a broad range of complications can prevent data governance measures from being comprehensive and impactful. As hybrid cloud architecture leads the way, data silos—a challenge that most IT leaders are all too familiar with—have increasingly been a pain point in managing governance. Any time operations run across a hybrid architecture with multiple environments, and data access and visibility are not managed effectively, it becomes difficult to maintain a holistic view of what’s happening across the organization as a whole, with different workloads running in isolated pockets.

This, understandably, presents a major challenge when it comes to ensuring data governance practices are effectively implemented—even in the most data-driven organizations. The Cloudera survey found that over one quarter (26%) of enterprise practitioners reported they had anywhere from 51 to 100 data silos spread across their organizations. To break those silos and achieve a meaningful, comprehensive level of data governance, both large enterprises and data-driven organizations must prioritize and fully integrate solutions that help boost data visibility in a hybrid setting while also ensuring consistent compliance with both business-level data guidelines and external regulation.  

Establishing Comprehensive and Impactful Data Governance

So, how should organizations go about data governance in a hybrid cloud? Some IT and business leaders attempt a more fragmented approach to governance. While somewhat effective, this can create ‘islands’ of perfection: places where governance appears to be working well, leading IT leaders to assume it is not much of a concern. While these islands may seem sound up close, at an enterprise level they become fragmented, isolated, and prone to collapse should governance change in other areas. Ultimately, good governance must leverage solutions and architectures that provide an enterprise-level, unified view of governance and data visibility.

In a hybrid cloud setting, data gravity pulls smaller bodies of data toward larger ones, where the center of gravity sits. Creating, managing, and maintaining each of those connections adds up over time and hinders data governance efforts. That’s where adopting the right hybrid data platform can help transform those operations and achieve a true hybrid cloud experience. With the right solution, businesses gain the ability to leverage architectures like a data fabric—a type of data architecture designed to give a unified view of data across an organization, regardless of where data is stored or how it is structured.

It also infuses automation into data management, handling not just the unlocking of data but also the sorting and investigation of information: determining what goes where and who needs access, and reinforcing the internal guidelines and access rules that feed back into data governance. In the context of comprehensive data governance, implementing a data fabric has become a vitally important first step.

Learn more about how Cloudera can help guide you on your journey toward comprehensive data governance. 

]]>
https://www.cloudera.com/api/www/blog-feed?page=where-does-data-governance-fit-into-hybrid-cloud0
Addressing the Elephant in the Room - Welcome to Today’s Clouderahttps://www.cloudera.com/blog/business/addressing-the-elephant-in-the-room-welcome-to-todays-clouderahttps://www.cloudera.com/blog/business/addressing-the-elephant-in-the-room-welcome-to-todays-clouderaThu, 13 Jun 2024 16:00:00 UTC

 

The Future of Enterprise AI, Delivered Today

If the Big Data era was this century’s gold rush, then AI is the next moon shot. Hyperbole doesn’t really apply when you speak to the potential impact of AI for every business and person on earth (and beyond). But, what is essential to putting AI into practice to improve productivity? Again, it’s all about the data, but more specifically, trusted data so that you can trust in Enterprise AI. 

Only Cloudera has the ability to help organizations overcome the three barriers to trust in Enterprise AI:

  • Readiness - Can you trust the safety of your proprietary data in public AI models? Cloudera’s true hybrid approach ensures you can leverage any deployment, from virtual private cloud to on-premises data centers, to maximize the use of AI. 
  • Reliability - Can you trust that your data quality will yield useful AI results? With Cloudera’s modern data architectures, you can ensure your data is of high quality, well-governed, and managed as a single data estate.
  • Responsibility - Can you trust your AI models will give meaningful insight? Cloudera’s support for both open and closed models for enterprise AI available to all form factors ensures you have the choice, flexibility, and ability to cross-compare and ensure useful outcomes that you can trust.

With last week’s acquisition of Verta’s operational AI platform, we are deepening our technology and talent to accelerate AI innovation and, more specifically, simplify the process of bolstering customers’ private datasets to build retrieval-augmented generation (RAG) and fine-tuning applications. As a result, developers – regardless of their expertise in machine learning – will be able to develop and optimize business-ready large language models (LLMs). These bold acquisitions, a continual release of innovations, and key partnerships from the ecosystem, including NVIDIA, will enable all companies to prosper in the Enterprise AI era.

The Enterprise Runs on Cloudera

Innovative technology aside, the best evidence of how a vendor has evolved to meet the business-critical use cases of its customers is its success stories.

Cloudera plays a central role not only at work but in all of our daily lives – from the money you save and spend, to the energy and connectivity in your home, to the car you are driving (and your insurance rates), to the phone and network that you are using, to the life-saving drugs and healthcare that keep you and your loved ones healthy.

A recent customer story - OCBC Bank has accelerated its data strategy with Cloudera - illustrates the power of Cloudera for machine learning use cases, particularly in the area of GenAI with business impact:

  • OCBC’s Next Best Conversation, a centralized platform that uses machine learning to analyze real-time contextual data from customer conversations related to sales, service and more. The bank increased their revenue by more than $100M annually by using the data to identify the most relevant information for each customer and curate personal experiences across communication channels. 
  • OCBC also developed a credit card fraud detection solution that reduced the volume of transactions reviewed by anti-money laundering compliance analysts and increased the accuracy rate of identifying suspicious transactions. They developed smarter processes on the platform by introducing chatbots to take over 10% of customer interactions on their website.

But, What Happened to Hadoop?

Many of our customers store and manage their data – much of it unstructured – in HDFS, particularly in on-prem environments. And, with the growing popularity of object storage, we support a variety of S3 object stores from our partners for customers who want cloud-native architectures delivered on public and private clouds. 

That open approach is key to enabling our customers to analyze data wherever it resides. Instead of moving the data each time to the compute that you want to use, you just keep all the data in its current place and bring the compute to the data. That is the key to our open data lakehouse architecture.

Also, we have seen significant adoption of Apache Ozone, a scalable, redundant, and distributed object store optimized for big data workloads running on-premises. In fact, Cloudera customers have just exceeded 1 exabyte of data stored in Ozone. In addition, customers can use the Ozone file system with key Apache technologies, including Apache Hive, Apache Spark, and Apache Iceberg, as well as any S3-compatible workload.
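
For a sense of how that looks in practice, here is a minimal PySpark sketch that reads Parquet data from an Ozone path using the ofs:// filesystem scheme. The Ozone service ID, volume, and bucket names are hypothetical placeholders, and the same data would also be reachable through Ozone’s S3-compatible gateway for s3a:// workloads.

# A minimal sketch of reading data stored in Apache Ozone from PySpark.
# The Ozone service ID, volume, and bucket names below are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ozone-read-example")
    .getOrCreate()
)

# ofs:// paths address Ozone as <service-id>/<volume>/<bucket>/<key>.
df = spark.read.parquet("ofs://ozone-service/analytics/events/2024/")

df.printSchema()
print(f"Row count: {df.count()}")

spark.stop()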

Those are just a few examples of how Cloudera constantly evolves with customer-led innovation to prepare everyone for a truly open future of data, analytics, and AI. That’s today’s Cloudera.

To learn more about groundbreaking innovations and customer stories, join us at Cloudera EVOLVE, the industry’s premier data and AI conference. We hope to see you there. 

Hadoop. The first time that I really became familiar with this term was at Hadoop World in New York City some ten or so years ago. There were thousands of attendees at the event - lining up for book signings and meetings with recruiters to fill the endless job openings for developers experienced with MapReduce and managing Big Data. 

This was the gold rush of the 21st century, except the gold was data. Two companies were at the center of it all, handing out the proverbial pickaxes: Cloudera and Hortonworks. 

After countless open-source innovations ushered in the Big Data era, including the first commercial distribution of Apache Hadoop and its distributed file system, HDFS, the two companies joined forces, giving birth to an entire ecosystem of technology and tech companies.

You could argue that the Big Data and analytics movement would not have happened without Cloudera. And all of those massive volumes of data are now today’s data lakes - more than 25 exabytes managed by Cloudera.

But, let’s make one thing clear - we are no longer that Hadoop company.

Welcome to Today’s Cloudera

Since the Big Data era, Cloudera has made massive investments with the north star of delivering customer-focused innovation wherever our customers run their business-critical data and analytics. This includes running analytics at the edge, supporting multi-cloud environments, treating Apache Iceberg as a first-class citizen, and introducing many more innovations like data observability.

That investment and support have resulted in the first true hybrid platform for data, analytics, and AI, backed by a seasoned and proven leadership team, with a go-to-market strategy focused on ensuring our customers' success in the future of Enterprise AI.

]]>
https://www.cloudera.com/api/www/blog-feed?page=addressing-the-elephant-in-the-room-welcome-to-todays-cloudera0
Making an AI Investment: How Finance Institutions are Harnessing the Power of AI and Generative AIhttps://www.cloudera.com/blog/business/making-an-ai-investment-how-finance-institutions-are-harnessing-the-power-of-ai-and-generative-aihttps://www.cloudera.com/blog/business/making-an-ai-investment-how-finance-institutions-are-harnessing-the-power-of-ai-and-generative-aiWed, 12 Jun 2024 16:00:00 UTC

Of all the emerging tech of the last two decades, artificial intelligence (AI) is tipping the hype scale, causing organizations across industries to rethink their digital transformation initiatives and ask where it fits in. In Financial Services, the projected numbers are staggering. According to a recent McKinsey & Co. article, “The McKinsey Global Institute (MGI) estimates that across the global banking sector, [Generative AI] could add between $200 billion and $340 billion in value annually, or 2.8 to 4.7 percent of total industry revenues.”

While these numbers reflect the potential impact of broad implementation, I’m often asked by our Financial Services customers for suggestions as to which use cases to prioritize as they plan Generative AI (GenAI) projects, and AI more broadly.

In truth, the question is usually framed more like, “How are my competitors using AI and GenAI?” and “What business use cases are they focused on?” 

What Should Institutions Invest In?

The truth is, the industry is rapidly adopting AI and GenAI technologies to drive innovation across various domains. Traditional machine learning (ML) models enhance risk management, credit scoring, anti-money laundering efforts and process automation. Meanwhile, GenAI unlocks new opportunities like personalized customer experiences through virtual assistants, automated content creation, advanced risk and compliance analysis, and data-driven trading strategies. 

Some of the biggest and well-known financial institutions are already realizing value from AI and GenAI:

  • JPMorgan Chase uses AI for personalized virtual assistants and ML models for risk management.
  • Capital One leverages GenAI to create synthetic data for model training while protecting privacy.
  • BlackRock utilizes GenAI to automatically generate research reports and investment summaries.
  • Deloitte employs AI for risk, compliance, and analysis while also using ML models for fraud detection.
  • HSBC harnesses ML for anti-money laundering efforts based on transaction patterns.
  • Bridgewater Associates leverages GenAI to process data for trading signals and portfolio optimization.

The key is identifying high-value, high-volume tasks that can benefit from automation, personalization and rapid analysis enabled by ML, AI, and GenAI models. Prioritizing use cases that directly improve customer experiences, operational efficiency and risk management can also drive significant value for the industry. 

AI and ML for Risk Management

ML models can analyze large volumes of data to identify patterns and anomalies indicating potential risks such as fraud, money laundering or credit default, enabling proactive mitigation. In credit scoring and loan underwriting, AI algorithms evaluate loan applications, credit histories and financial data to assess creditworthiness and generate more accurate approval recommendations than traditional methods. ML models enhance anti-money laundering (AML) compliance by detecting suspicious transaction patterns and customer behaviors. Additionally, AI and robotic process automation (RPA) improve operational efficiency by automating repetitive tasks like data entry, document processing, and report generation.
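
As a small, hedged illustration of the anomaly-detection idea behind such fraud and AML models (using synthetic data and scikit-learn’s IsolationForest, not any institution’s production approach), flagging unusual transactions can be prototyped in a few lines:

# A minimal sketch of unsupervised anomaly detection on transaction features.
# The data here is synthetic and purely illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(seed=7)

# Synthetic features: [transaction amount, transactions in the last hour]
normal = rng.normal(loc=[50.0, 2.0], scale=[20.0, 1.0], size=(1000, 2))
suspicious = rng.normal(loc=[5000.0, 30.0], scale=[500.0, 5.0], size=(10, 2))
X = np.vstack([normal, suspicious])

# Fit an Isolation Forest; contamination is the expected share of anomalies.
model = IsolationForest(contamination=0.01, random_state=42)
labels = model.fit_predict(X)  # -1 marks an anomaly, 1 marks normal

flagged = np.where(labels == -1)[0]
print(f"Flagged {len(flagged)} of {len(X)} transactions for analyst review")

In a real deployment the features would come from governed transaction data, and the flagged cases would feed an analyst review queue rather than being acted on automatically.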

Quick Wins with GenAI Opportunities

Financial institutions can achieve quick wins by leveraging GenAI to enhance or improve a range of use cases including customer service, operations, and decision-making processes. 

Customer experiences

One significant application is in creating personalized customer experiences. AI-powered virtual assistants and chatbots can understand natural language queries, enabling them to provide tailored financial advice, product recommendations, and support. This personalized approach will improve customer satisfaction and engagement.

Content creation

Another area where AI will make a substantial impact is in automated content creation. GenAI models can automatically generate a wide range of materials, including marketing content, research reports, investment summaries and more. By analyzing data, news, and market trends, these models produce high-quality content quickly and efficiently, freeing up human resources for more strategic tasks.

Risk and compliance analysis

Risk and compliance analysis is another critical application of AI in finance. AI can rapidly analyze complex legal documents, regulations, financial statements and transaction data to identify potential risks or regulatory and compliance issues. This capability allows financial institutions to generate detailed assessment reports swiftly, ensuring they remain compliant with evolving regulations and mitigate risks effectively.

Trading and portfolio optimization

GenAI can play a pivotal role in trading and portfolio optimization by processing vast amounts of data to generate actionable insights and trading signals. These insights enable automated investment strategies, additional variables in decision-making, and optimized portfolio management, allowing financial institutions to deliver superior investment performance to their clients.

The Opportunities are Compelling, but Significant Challenges Must be Addressed

Data privacy and security in the financial sector demand rigorous protection measures for sensitive information. This includes robust encryption, stringent access controls and advanced anonymization techniques to ensure financial data remains secure. Moreover, ensuring AI decision-making processes are transparent and explainable is crucial for meeting regulatory compliance standards. This transparency helps in understanding and verifying AI-driven decisions, thereby fostering trust. 

Addressing biases and errors in training data is essential to prevent the propagation of incorrect insights. Bias mitigation ensures that AI systems provide fair and accurate outcomes, which is critical for maintaining the integrity of financial services. Additionally, safeguarding AI systems against data manipulation attacks and exploitation for fraudulent activities is vital to address cybersecurity vulnerabilities. This involves implementing strong defensive measures and continuously monitoring for potential threats.

Adhering to industry regulations and guidelines is necessary to ensure fairness and accountability in AI decision-making processes. Compliance with these standards helps in maintaining governance and regulatory oversight, which are essential for building a trustworthy AI ecosystem. 

Monitoring for new sources or transmission channels of systemic risks introduced by AI adoption is crucial for managing systemic financial risks. These might include unforeseen vulnerabilities in AI models, reliance on flawed or biased data, or new types of cyber threats targeting AI systems. Understanding how these risks can spread within the financial system is critical to safe and effective AI. For instance, an error in an AI model used by one financial institution could propagate through interconnected systems and markets, affecting other institutions and leading to broader financial instability. Not addressing these risks can impact the entire financial system, not just individual entities, and have the potential to cause widespread disruption and significant economic consequences.

Additionally, proactive governance frameworks, security protocols and regulatory guidance will be crucial as financial institutions continue exploring the potential of AI. 

How Cloudera helps Financial Institutions on their AI and Gen AI journey

Cloudera helps financial institutions harness the power of AI and GenAI while navigating the associated risks. Cloudera provides a secure, scalable and governed environment for managing and analyzing vast volumes of structured and unstructured data, essential for training accurate and unbiased AI models. Integrated ML and AI tools allow financial institutions to develop, deploy and monitor AI models efficiently, streamlining the implementation of the aforementioned use cases.   

Cloudera's advanced data management capabilities ensure the highest levels of data privacy and security while data lineage and governance features help institutions maintain transparency and compliance with regulatory requirements. 

With Cloudera, financial institutions can unlock the full potential of AI and GenAI while mitigating risks, ensuring responsible adoption, and driving innovation in the industry. 

https://www.cloudera.com/api/www/blog-feed?page=making-an-ai-investment-how-finance-institutions-are-harnessing-the-power-of-ai-and-generative-ai0
Fueling Enterprise Generative AI with Data: The Cornerstone of Differentiationhttps://www.cloudera.com/blog/business/fueling-enterprise-generative-ai-with-data-the-cornerstone-of-differentiationhttps://www.cloudera.com/blog/business/fueling-enterprise-generative-ai-with-data-the-cornerstone-of-differentiationTue, 11 Jun 2024 16:00:00 UTC

More than two-thirds of companies are currently using Generative AI (GenAI) models, such as large language models (LLMs), which can understand and generate human-like text, images, video, music, and even code. However, the true power of these models lies in their ability to adapt to an enterprise’s unique context. By leveraging an organization’s proprietary data, GenAI models can produce highly relevant and customized outputs that align with the business’s specific needs and objectives.

Structured and Unstructured Data: A Treasure Trove of Insights

Enterprise data encompasses a wide array of types, falling mainly into two categories: structured and unstructured. Structured data is highly organized and formatted in a way that makes it easily searchable in databases and data warehouses. This data often includes fields that are predefined, such as dates, credit card numbers, or customer names, which can be readily processed and queried by traditional database tools and algorithms.

On the other hand, unstructured data lacks a predefined format or structure, making it more complex to manage and utilize. This type of data includes a variety of content such as documents, emails, images, and videos. Thankfully, GenAI models can harness the insights hidden within both structured and unstructured data. As a result, these models enable organizations to unlock new opportunities and gain a 360-degree view of their entire business. 

For example, a financial institution can use GenAI to analyze customer interactions across various channels, including emails, chat logs, and call transcripts, to identify patterns and sentiments. By feeding this unstructured data into an LLM, the institution can generate personalized financial advice, improve customer service, and detect potentially fraudulent activities.
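
A minimal sketch of that pattern is shown below: unstructured interaction text is placed into a prompt and sent to a model for sentiment and fraud cues. The `generate` function is a hypothetical stand-in for whatever model endpoint an institution actually uses, and the chat log and prompt wording are illustrative.

```python
# Sketch of feeding unstructured interaction text to an LLM for sentiment and
# fraud-pattern cues. `generate` is a hypothetical placeholder, not a real API;
# the chat log and prompt are illustrative assumptions.
CHAT_LOG = """Customer: I never authorized this transfer of $2,400 yesterday.
Agent: I understand, let me open a dispute for you right away."""

PROMPT_TEMPLATE = (
    "You are an analyst at a bank. Read the interaction below and return:\n"
    "1) overall customer sentiment (positive/neutral/negative)\n"
    "2) any indicators of potential fraud\n\n"
    "Interaction:\n{interaction}"
)

def generate(prompt: str) -> str:
    """Hypothetical placeholder for a call to an LLM serving endpoint."""
    return "sentiment: negative; fraud indicators: unauthorized transfer reported"

if __name__ == "__main__":
    prompt = PROMPT_TEMPLATE.format(interaction=CHAT_LOG)
    print(generate(prompt))
```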

The Role of an Open Data Lakehouse in Seamless Data Access

To fully capitalize on the potential of GenAI, enterprises need seamless access to their data. This is proving to be a challenge for businesses: only four percent of business and technology leaders described their data as fully accessible. This is where an open data lakehouse comes into play. It is the building block of the strong data foundation needed to adopt GenAI, breaking down data silos and integrating data from various sources so that it is readily available for GenAI models.

Cloudera’s open data lakehouse provides a secure and governed environment for storing, processing, and analyzing massive amounts of structured and unstructured data. With built-in security and governance features, businesses can ensure that their data is protected and compliant with industry regulations while still being accessible for GenAI applications.
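
As a rough illustration of what "accessible for GenAI applications" can look like in practice, the PySpark sketch below queries an Apache Iceberg table through Spark SQL. The catalog name, warehouse path, and table name are assumptions, and the Iceberg Spark runtime jar must be available to the session; this is a sketch of the open-table-format pattern, not a Cloudera-specific API.

```python
# Minimal sketch of querying an Apache Iceberg table with Spark SQL.
# Catalog name, warehouse path, and table name are illustrative assumptions;
# the iceberg-spark-runtime jar must be on the Spark classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("lakehouse-query-sketch")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# Pull structured transaction records that could ground a GenAI application.
df = spark.sql("""
    SELECT customer_id, txn_amount, txn_ts
    FROM demo.finance.transactions
    WHERE txn_ts >= date_sub(current_date(), 30)
""")
df.show(5)
```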

By feeding enterprise data into GenAI models, businesses can create highly contextual and relevant outputs. For instance, a manufacturing company can use GenAI to analyze sensor data, maintenance logs, production records, and reference operational documentation to predict potential equipment failures and optimize maintenance schedules. By incorporating enterprise-specific data, the GenAI model can provide accurate and actionable insights tailored to the company’s unique operating environment, helping drive ROI for the business. 
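
The sketch below shows one simplified version of that predictive-maintenance idea: hypothetical sensor readings are joined with maintenance logs and an Isolation Forest flags anomalous equipment. The column names, values, and model choice are assumptions for demonstration only.

```python
# Illustrative predictive-maintenance sketch: join hypothetical sensor readings
# with maintenance logs and flag anomalous equipment with an Isolation Forest.
# Column names, values, and the model choice are assumptions.
import pandas as pd
from sklearn.ensemble import IsolationForest

sensors = pd.DataFrame({
    "machine_id":  [1, 2, 3, 4, 5],
    "vibration":   [0.21, 0.19, 0.95, 0.22, 0.20],   # 0.95 looks abnormal
    "temperature": [70.1, 69.8, 88.4, 70.5, 69.9],
})
maintenance = pd.DataFrame({
    "machine_id":         [1, 2, 3, 4, 5],
    "days_since_service": [12, 30, 210, 25, 18],
})

features = sensors.merge(maintenance, on="machine_id")
model = IsolationForest(contamination=0.2, random_state=0)
features["anomaly"] = model.fit_predict(
    features[["vibration", "temperature", "days_since_service"]]
)

# -1 marks machines whose readings deviate most; these would be prioritized for service.
print(features[features["anomaly"] == -1][["machine_id", "vibration", "days_since_service"]])
```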

Real-world Examples of Data-driven Generative AI Success

OCBC Bank, a leading financial institution in Singapore, has leveraged GenAI to enhance its customer service and internal operations. By feeding customer interaction data and financial transaction records into LLMs, OCBC Bank has developed AI-powered chatbots that provide personalized financial advice and support. The bank’s teams built Next Best Conversation, a centralized platform that uses machine learning to analyze real-time contextual data from customer conversations related to sales, service, and other variables to deliver unique insights and opportunities to improve operations. The bank has also used GenAI to automate document processing, reducing manual effort and improving efficiency. 

A global pharmaceutical company has utilized GenAI to accelerate drug discovery and development. By integrating structured and unstructured data from clinical trials, research papers, and patient records, the company has trained GenAI models to identify potential drug candidates and predict their efficacy and safety. This data-driven approach has significantly reduced the time and cost associated with bringing new drugs to market.

These real-world examples demonstrate the transformative power of combining enterprise data with GenAI. By leveraging their unique data assets, businesses across industries can unlock new opportunities, drive innovation, and gain a competitive edge. 

Learn more about how Cloudera can help accelerate your enterprise AI journey. 

https://www.cloudera.com/api/www/blog-feed?page=fueling-enterprise-generative-ai-with-data-the-cornerstone-of-differentiation0