
KeenData Releases KeenData Lakehouse2.0 Milestone Version, Building Data&AI Integrated Data Infrastructure

2025-08-14

KeenData Lakehouse2.0 is an AI-Native, integrated Data&AI platform. Built throughout on an "AI-Native" design philosophy, it pioneers the AI-in-Lakehouse intelligence-driven architecture and connects the full chain of data engineering → model training/inference → Agent factory → intelligent applications. With "trusted + intelligent + systematic" platform capabilities, it advances new "Data&AI" infrastructure and supports large organizations in moving from data-driven to intelligence-driven operations.

KeenData Lakehouse2.0

KeenData Lakehouse2.0 adopts an AI-Native, intelligence-driven architecture to integrate Data&AI engineering. The platform is designed for systematic Data&AI implementation in large organizations and provides infrastructure products that cover the full closed loop: data integration, offline and real-time development, multimodal computing, data governance, dataset management, AI model building, training-inference integration, and Agent development. It breaks with the traditional architecture in which data and AI are separated, pioneering AI-in-Lakehouse technology to unify the lakehouse engine, OLAP data governance, and AI technology into a streamlined and efficient All-in-One technical solution. The self-developed multimodal computing engine handles everything from data cleaning to results analysis in a single pipeline, multiplying GPU inference throughput, and combines with KMI inference acceleration, model quantization, and Unity Catalog to achieve cross-modal intelligent governance.

(Figure: KeenData Lakehouse2.0 Product Matrix)

AI-Native Oriented
Data&AI Integrated Platform Features

Data&AI Integration

The platform achieves deep integration of data and AI, seamlessly connecting data lifecycle processing with AI development workflows, forming a closed-loop capability of data processing-AI development-application implementation.
Its core features are reflected in three aspects:

  • Multimodal Data Processing: Supports text/image/audio-video fusion governance;
  • Agent Intelligent Architecture: Achieves perception-cognition-action-evolution closed loop;
  • Data&AI Integration: Data&AI native fusion provides All-in-One architectural capabilities, eliminating the separation between Data and AI architectures.

(Figure: Data&AI Integration)

AI-Native

Unlike traditional platforms, which attach AI externally in a loosely coupled way, KeenData's Data&AI integrated platform takes AI-Native as its core design philosophy. It embeds intelligent capabilities deep into the system and builds an intelligent data foundation capable of autonomous evolution. Its technical architecture and core capabilities are centered on a bidirectional drive, in which AI efficiently processes data and data intelligently supports AI, and cover three core capabilities: MaaS self-inference, Agent self-iteration, and intelligent data lifecycle management.

To address the pain points of traditional coupled storage-compute architectures, such as low resource utilization and high scaling costs, the platform separates storage from compute. Data is stored in high-performance unified storage, and computing resources scale elastically on demand. This reduces storage costs by more than 30% and lets AI training, inference, and other computing tasks call on resources flexibly, resolving the resource contention in which large tasks squeeze out small ones and laying a solid foundation for the platform's intelligent closed-loop capabilities.

(Figure: AI-Native Full-Chain Capability Closed Loop)

AI-Native Oriented
Data&AI Integrated Platform Key Capabilities

AI Integrated Implementation Capability

The platform covers the full lifecycle of AI models, from model building, deployment to evaluation, governance, release, and application, providing comprehensive services and support. Through a unified computing power scheduling engine that dynamically optimizes resource allocation, it provides strong support for enterprise-level large model development, deployment, and intelligent operations, ensuring stability and elasticity in production environments. The platform innovatively integrates hundreds of pre-trained models, supports zero-code model fine-tuning and transfer learning, combined with advanced algorithm matrices, helping enterprises quickly build and implement proprietary models adapted to their business scenarios. Meanwhile, the platform provides visual Agent application building capabilities, allowing developers to orchestrate multi-node workflows through low-code methods, efficiently developing production-grade generative AI applications, promoting AI democratization. At the data foundation level, the platform implements full lifecycle management of unstructured data based on lakehouse architecture, integrating dual modes of manual annotation and AI intelligent annotation, building high-quality training datasets, laying a solid foundation for model training and accumulating core assets.

Keen AI, as a key supporting component, deeply customizes the training-inference framework. It has multiple built-in model training and optimization strategies and supports model lightweighting, multi-mode parallel computing, and inference acceleration techniques such as sparse activation, operator fusion, and PagedAttention. This enables efficient collaboration between training and inference and breaks through the performance bottlenecks of traditional fragmented development.

Agent Ready-to-Use

Rakesh Gohel's well-known iceberg model captures a harsh reality of deploying AI agents in enterprises: building a truly usable enterprise-grade agent requires 90% software engineering and 10% AI.

KeenData's Data&AI integrated platform provides a one-stop Agent development factory that natively fuses the 90% engineering capabilities with the 10% AI capabilities, enabling developers to easily build diverse AI applications such as intelligent assistants, text writing, and automated workflows. The platform ships with a rich set of built-in AI Agents that can be reused directly, provides visual orchestration tools, and supports seamless access to mainstream large models, so developers can quickly build customized Agents and RAG (Retrieval Augmented Generation) applications with a significantly lower barrier to entry. Dynamic task decomposition algorithms split complex requirements into subtasks, multimodal intent understanding analyzes what the user is asking for, and cross-platform execution engines connect data, tools, and services, forming a complete closed loop from requirement understanding to task execution. The platform also provides full lifecycle management functions such as online debugging and preview, application release and update, and API access, and supports flexible access to and control of different models to meet the needs of different scenarios. Together these help enterprises build out the 90% that sits below the waterline of the iceberg.
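As a rough illustration of how a multi-node Agent workflow can be orchestrated, the sketch below runs a small dependency graph of nodes in topological order. It is not KeenData's implementation or API: the node functions, the `NODES` and `EDGES` tables, and `run_workflow` are hypothetical names chosen for this example, and the "model call" is a plain string format.

```python
from graphlib import TopologicalSorter

# Hypothetical node functions: each reads the shared context and returns an update.
def understand_intent(ctx):
    text = ctx["request"].lower()
    return {"intent": "summarize" if "summar" in text else "answer"}

def retrieve(ctx):
    # Stand-in for a retrieval step such as a knowledge-base lookup.
    return {"documents": ["doc-1", "doc-2"]}

def generate(ctx):
    # Stand-in for a model call; here it only formats a string.
    return {"reply": f"{ctx['intent']} using {len(ctx['documents'])} documents"}

NODES = {
    "understand_intent": understand_intent,
    "retrieve": retrieve,
    "generate": generate,
}

# Each node maps to the set of nodes it depends on.
EDGES = {
    "understand_intent": set(),
    "retrieve": {"understand_intent"},
    "generate": {"understand_intent", "retrieve"},
}

def run_workflow(request):
    """Execute every node after its dependencies and return the final context."""
    ctx = {"request": request}
    order = []
    for name in TopologicalSorter(EDGES).static_order():
        ctx.update(NODES[name](ctx))
        order.append(name)
    return ctx, order

if __name__ == "__main__":
    ctx, order = run_workflow("Please summarize the contract")
    print(order)
    print(ctx["reply"])
```

A visual orchestration tool would typically produce a graph like `EDGES` from a canvas; the execution loop stays the same regardless of how the graph is authored.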

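Multimodal Computing Engine

KeenData's Data&AI integrated platform provides a multimodal computing engine designed for multimodal AI workloads. It supports data cleaning, feature extraction, model inference, and results analysis within a single data processing pipeline, and it is deeply compatible with mainstream data and AI frameworks, supporting mixed scheduling within a task. The engine rethinks the data preprocessing paradigm: it is built to natively understand and process complex multimodal data, is designed for AI/ML workflows, and provides better DataFrame primitives for AI/ML. It offers low latency and high throughput, supports zero-copy data sharing, simplifies fault tolerance through an immutable data design, and reduces network overhead by 70%, making it particularly suitable for compute-intensive tasks. On top of this, an enhanced dynamic execution engine provides a unified high-level abstraction over tasks and actors, so that a single set of interfaces can express both task-parallel and actor-based parallel computation.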
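To make the distinction between task-parallel and actor-based computation concrete, here is a minimal sketch using only the Python standard library. It is an illustration of the two models behind one future-returning interface, not the engine's actual API: `submit_task`, `Actor`, `Counter`, and `clean` are hypothetical names. A task is a stateless function call scheduled on a shared pool; an actor is a stateful object whose method calls are processed one at a time.

```python
from concurrent.futures import ThreadPoolExecutor

# Shared pool for stateless tasks (hypothetical helper, not a platform API).
_pool = ThreadPoolExecutor(max_workers=4)

def submit_task(fn, *args):
    """Task-parallel: schedule a stateless function call and return a Future."""
    return _pool.submit(fn, *args)

class Actor:
    """Actor-based: wrap a stateful object; calls are serialized on one worker."""

    def __init__(self, obj):
        self._obj = obj
        # A single-worker executor acts as the actor's mailbox.
        self._mailbox = ThreadPoolExecutor(max_workers=1)

    def submit(self, method, *args):
        """Queue a method call on the wrapped object and return a Future."""
        return self._mailbox.submit(getattr(self._obj, method), *args)

class Counter:
    """Example stateful object held by an actor."""

    def __init__(self):
        self.total = 0

    def add(self, n):
        self.total += n
        return self.total

def clean(text):
    """Example stateless task: normalize a string."""
    return text.strip().lower()

if __name__ == "__main__":
    task_futures = [submit_task(clean, s) for s in ["  Alpha ", "BETA  "]]
    print([f.result() for f in task_futures])

    counter = Actor(Counter())
    actor_futures = [counter.submit("add", n) for n in (1, 2, 3)]
    print([f.result() for f in actor_futures])
```

Both paths return the same `Future` type, which is the point of a unified abstraction: callers compose task results and actor results in the same way.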

(Figure: Multimodal Computing Engine)

AI for Data Governance

The platform practices the "development-governance integration" concept and builds an AI-driven intelligent governance system. Intelligent metadata scanning enables dynamic encryption and desensitization of sensitive data. Self-developed unified metadata technology covers the entire product matrix and provides enterprise-level data governance support, including intelligent governance, permission management, centralized auditing, automatic tracking of data lineage, and data sharing across platforms, tenants, and regions, ensuring the security and compliance of data assets. Based on business theme classification, the platform builds standard data models, monitors data anomalies in real time, and generates quality assessment reports, upgrading data governance from passive control to active prevention.
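The idea of scanning metadata to find sensitive fields and then desensitizing them at read time can be sketched as follows. This is a simplified illustration under stated assumptions, not the platform's governance engine: the detection rules (`SENSITIVE_NAME`, `EMAIL_VALUE`) and the functions `scan_columns`, `mask`, and `desensitize` are hypothetical.

```python
import re

# Hypothetical detection rules: column-name hints and a value pattern.
SENSITIVE_NAME = re.compile(r"(phone|mobile|id_card|email)", re.IGNORECASE)
EMAIL_VALUE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def scan_columns(rows):
    """Return column names flagged as sensitive by name or by observed values."""
    flagged = set()
    for row in rows:
        for col, value in row.items():
            if SENSITIVE_NAME.search(col) or EMAIL_VALUE.match(str(value)):
                flagged.add(col)
    return flagged

def mask(value, keep=2):
    """Keep the first and last `keep` characters; replace the rest with '*'."""
    text = str(value)
    if len(text) <= 2 * keep:
        return "*" * len(text)
    return text[:keep] + "*" * (len(text) - 2 * keep) + text[-keep:]

def desensitize(rows):
    """Mask every flagged column and leave other columns unchanged."""
    flagged = scan_columns(rows)
    return [
        {col: mask(val) if col in flagged else val for col, val in row.items()}
        for row in rows
    ]

if __name__ == "__main__":
    rows = [{"name": "Li", "phone": "13800001234", "contact": "li@example.com"}]
    print(desensitize(rows))
```

A production system would usually attach the scan result to the catalog as metadata tags and apply masking per user permission rather than unconditionally.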

Full-Stack Intelligent Capabilities

The platform is designed to be simple and easy to use, and it significantly improves enterprise data development and application efficiency through built-in high-precision NL2SQL models and other intelligent capabilities. Its semantic development engine, built on NLP technology, lets business users query and develop data directly in natural language and gives data warehouse engineers SQL interpretation and optimization functions. The platform's multimodal data retrieval capability combines OCR, feature extraction, and deep natural language understanding to support rapid cross-modal content retrieval and precise location of text and images. The intelligent data query system automatically associates global data assets through a deep understanding of business semantics, so users can query structured and unstructured assets within the platform in natural language. The platform also supports building enterprise internal knowledge bases: it segments and embeds multi-format documents, converts them into searchable knowledge units, and provides in-depth intelligent Q&A by integrating advanced models such as DeepSeek. Together these capabilities improve efficiency in data development, retrieval, management, and application, as well as the user interaction experience.

Independent and Controllable Technical Support Capabilities

The platform relies on more than 170 core patents in big data and AI to build a solid, secure, and controllable technical foundation. The self-developed AI-in-Lakehouse intelligence-driven architecture, multimodal fusion engine, Data Fabric, Active Metadata Management, Data Mesh, and Data Virtualization provide integrated governance and development together with distributed data processing under centralized control. The self-developed unified catalog (Unify Catalog) provides cross-modal semantic alignment for large AI models, ensuring consistent and secure data understanding. KMI inference acceleration technology delivers a 2x performance improvement and optimizes the scheduling efficiency of heterogeneous chip resources. Model quantization technology uses low-precision tensor cores (INT8/INT4 Tensor Core) to achieve almost lossless compression, reducing storage overhead by 70%. The open architecture supports multiple computing engines and provides unified data and model monitoring. Through deep adaptation to domestic hardware and software (Huawei Ascend, Hygon, Kylin, UnionTech OS), the platform is compatible with the domestic stack from underlying hardware to upper-layer applications, providing independent, controllable, secure, and efficient core technology for government, central state-owned enterprises, and industries with high security requirements.
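For readers unfamiliar with low-precision quantization, the following sketch shows generic symmetric per-tensor INT8 quantization and the storage arithmetic behind it. It is a textbook illustration, not KeenData's quantization method, and it does not attempt to reproduce the 70% figure quoted above; the function names are hypothetical. Storing 8-bit values in place of 32-bit floats saves 75% of weight storage before the small overhead of the stored scale factors.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]

def storage_saving(src_bits=32, dst_bits=8):
    """Fraction of weight storage saved, ignoring the stored scale factors."""
    return 1 - dst_bits / src_bits

if __name__ == "__main__":
    weights = [0.82, -1.0, 0.25, 0.0, -0.4]
    quantized, scale = quantize_int8(weights)
    restored = dequantize(quantized, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(quantized, scale)
    print("max abs error:", worst)
    print("saving FP32 -> INT8:", storage_saving(32, 8))
    print("saving FP32 -> INT4:", storage_saving(32, 4))
```

The rounding error per weight is bounded by half the scale, which is why accuracy loss is small when the weight distribution has no extreme outliers; real systems often use per-channel or per-group scales to tighten that bound.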

Platform Diverse Application Scenarios

KeenData's Data&AI integrated platform, with the dual advantages of full-chain AI-Native architecture and low-code toolchain, combines technical universality with scenario adaptation flexibility. It can quickly respond to common needs in general scenarios such as data retrieval, intelligent assisted development, intelligent services, and knowledge management, while supporting deep customization based on vertical domain characteristics, truly achieving one platform covering multiple scenarios.

  • Multimodal Data Retrieval: Through intelligent data annotation + natural language understanding, supports rapid and precise retrieval of multimodal data, including text-to-text, text-to-image, image-to-image, image-to-text, etc.

    (Figure: Enhanced Vector Retrieval)

  • Intelligent Data Asset Q&A: Through natural language, quickly retrieves various Data&AI platform structured/unstructured data asset information. Users do not need to deeply understand different data asset storage structures or develop query statements. The system can automatically analyze query question semantics, convert analyzed semantics into query statements, and finally return structured results after query.

    (Figure: Data Asset Intelligent Q&A)

  • Intelligent Assisted Development: Provides natural-language programming assistance to development engineers across different development languages, including code generation and SQL generation, and offers logical interpretation and performance optimization suggestions for existing code, helping engineers understand it quickly and improve execution efficiency.

    (Figure: Intelligent Development)

    (Figure: SQL Interpretation and Optimization)

  • Agent Development Factory: The AI Agent development platform can quickly build scenario-oriented Agent applications such as intelligent customer service and process assistants. Through visual orchestration tools combining dialogue nodes and task execution nodes, enterprises can launch intelligent Q&A systems without coding to handle high-frequency needs such as customer inquiries and ticket routing; combined with the zero-code fine-tuning capability of the large model training platform, it can also optimize models based on enterprise's own dialogue data, improving answer accuracy and professionalism, and reducing manual customer service costs.

    (Figure: Visual Process Orchestration)

  • Intelligent Writing: Makes it easy to build AI applications such as automatic tender writing, automatic contract review, and PPT generation, helping enterprises improve their handling of key documents.

    (Figure: Application Square)

  • Knowledge Base Construction: The enterprise internal knowledge base supports intelligent segmentation and embedding of multi-format documents (PDF, Excel, TXT, etc.), converting fragmented knowledge into searchable structured units and using high-performance vector databases to build private enterprise knowledge bases. Employees can quickly obtain the knowledge they need through natural language questions, for example professional technical solutions, contract management, corporate bylaws, and historical project experience. It also supports rapid recall and standard search over knowledge bases built on various databases, and uses DeepSeek to analyze user questions and recalled knowledge in depth, providing professional and accurate answers.

    (Figure: Enterprise Knowledge Base)
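The segment, embed, and recall flow described for the knowledge base scenario can be sketched in a few lines. This is a toy illustration, not the platform's pipeline: `embed` uses a bag-of-words term-frequency vector as a stand-in for a real embedding model, the in-memory list stands in for a vector database, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

def chunk(text, max_words=40):
    """Split text into segments of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy embedding: term-frequency vector (stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(documents, max_words=40):
    """Segment every document and store (segment, vector) pairs in memory."""
    return [(seg, embed(seg)) for doc in documents for seg in chunk(doc, max_words)]

def search(index, question, top_k=1):
    """Return the `top_k` segments most similar to the question."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [seg for seg, _ in ranked[:top_k]]

if __name__ == "__main__":
    docs = [
        "Contracts above one million yuan require legal review before signing.",
        "Travel expenses are reimbursed within ten working days.",
    ]
    index = build_index(docs)
    print(search(index, "Who must review contracts before signing?"))
```

In a RAG application the recalled segments would then be passed to a large model as context for answer generation; that step is omitted here.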

Typical Case Applications

In a city data bureau's data infrastructure project built on KeenData's Data&AI integrated platform, non-algorithm teams prepare data through the corpus processing layer, complete model training, fine-tuning, and deployment with zero code through the intelligent support layer, and then call APIs or build agents to quickly turn large models into commercial products. The project connects the full chain from multimodal data to industry agents, covering the full "data → model → application" lifecycle and supporting rapid construction of data products for narrowly scoped scenarios. Standardized SDK and plugin interfaces open the platform to third-party corpus processing tools on a plug-and-play basis, accelerating the implementation of large models in urban business scenarios and helping AI technology truly serve urban governance and industrial upgrading.

In a city's Digital Government 2.0 project, KeenData's Data&AI integrated platform was used to build new smart-city big data infrastructure and a trusted data space, enabling comprehensive digitalization, intelligence, and refined management and services in government affairs, people's livelihood, industry, and other fields. The project ensures that data resources are deeply mined and quickly applied, driving the rapid development of the city and even its city cluster. It built the first government-side intensive common support platform for data infrastructure and explores the effective supply of government public data to enterprises.

In a central state-owned enterprise's data intelligence foundation project, a unified data center and governance system was built on KeenData's Data&AI integrated platform. It handles efficient storage and computation of newly added big data and, combined with business scenarios, provides hundreds of services supporting planning, engineering decision-making, and integrated engineering platforms. AI drives the comprehensive management and sharing of business and research data, accelerating the transformation of data into resources and assets, improving operational efficiency, and enabling integrated chain operations. The project is an important milestone as the group's digital intelligence operations enter a new stage of efficient collaboration.

KeenData Technology has long focused on R&D and innovation in Data&AI technology. With KeenData Lakehouse2.0, it concentrates on building and upgrading the data infrastructure of large organizations. Through its AI-Native architecture and Data&AI integration capabilities, it provides full-chain support for enterprise digital transformation, from data engineering to intelligent applications, and accelerates the conversion of data value into business momentum with an independent, controllable technical foundation and scenario-based implementation capabilities.


Contact Us (09:00-18:00)

+86-10-64703560
