Enterprise digital transformation and technology trends in big data + AI
The digital transformation of enterprises has gone through two main stages: informatization, followed by digital intelligence.
In the first stage, in the 1980s and 1990s, the information revolution that had been developing in the West for decades began to show its influence in China. The development of core technologies such as computing and network communications ushered in an information era that succeeded the agricultural and industrial eras.
Gradually, companies began to capture processes that had previously lived on paper and in the minds of experts in information systems. “Digital transformation” has been a recurring topic for enterprises over recent decades, and during this period a number of excellent software companies emerged, notably ERP vendors, to assist enterprises with it.
As informatization progressed, however, enterprises accumulated large volumes of data and needed to fully extract its business value. Big data and AI technologies were applied to many business scenarios, creating market demand for data-driven business innovation.
In the second stage, more and more enterprises are no longer satisfied with simple digital transformation and instead seek to upgrade to digital intelligence. Scenario-based intelligence, driven by an integrated data-and-AI platform, has become the necessary path for this transformation.
At this stage, the capabilities of the technology platform come to the fore. A platform that can integrate big data and AI capabilities and quickly generate rich scenario-based intelligence for specific business scenarios has become the route many large enterprises take to seek change.
However, the upgrade to enterprise digital intelligence faces new core challenges, mainly reflected in five points:
1. Enterprise data keeps accumulating, but the speed of mining its value has not grown at the same pace, so the value density per unit of data keeps falling.
2. Enterprises' needs for intelligent business upgrades and real-time decision-making outstrip what their own IT capabilities can support.
3. There is a shortage of talent in big data and AI.
4. The input-output ratio of business intelligence initiatives is low.
5. It is difficult to iterate quickly and scale intelligent business upgrades.
At the same time, big data technology itself has taken on new development trends:
1. The growth of data and demand for computing power has outpaced the growth of business and hardware.
2. Big data and AI applications are becoming more tightly integrated.
3. Big data platforms are supporting multi-modal computing.
4. Software and hardware are increasingly developed together.
5. Big data analysis is becoming real-time and intelligent.
6. Data sharing built on privacy and security is becoming important.
7. The integration of data lakes and data warehouses, in both technology and business, has become a new direction of evolution.
In other words, the market demand created by customers' challenges drives the development of technology, and technology in turn continuously feeds back to the market and customers, refining that demand. Alibaba Cloud has been actively exploring the big data + AI platform and the creation of scenario-based intelligence, and has upgraded five core capabilities that combine technology trends with customer needs.
Five upgraded core capabilities of Alibaba Cloud's big data + AI integrated platform
Capability 1: Agile data governance, both bottom-up and top-down
After decades of informatization, most enterprises have accumulated a great deal of business data. To build defensible intelligent applications, they must find ways to use this data. But when they try, they find it hard to use:
For example, data from different business departments forms islands on each department's own platform. Not only is the data disconnected, but even naming rules, representations, and constraints differ; professionals familiar with all of the data logic are scarce; and data storage is undifferentiated.
People are starting to realize that data governance for the enterprise is a required course.
Internet companies usually adopt a very effective “lean production” approach to data governance: data developers build models bottom-up, first integrating data from source systems into the data platform, then processing and storing it and providing data services to upper-layer applications, governing the data as problems arise.
This bottom-up approach can process large-scale data point-to-point very quickly and respond rapidly to business needs, while controlling storage and processing costs through the intermediate data governance steps. DataWorks, the big data development and governance platform, provides flexible support for enterprises in this regard.
Gradually, enterprises have raised new requirements for unified data standards and unified data management and governance, and the top-down model has come into wide demand: starting from the business, systematically plan the data warehouse, sort out existing data, define standards, model the warehouse, and define data metrics for business applications in advance.
It is worth noting that this demand has spread from the financial industry to all walks of life, even to Internet companies accustomed to bottom-up, incremental construction. DataWorks takes a two-pronged approach to meet enterprises' comprehensive platform requirements for data governance: it combines the traditional top-down modeling system with reverse-modeling capabilities suited to flexible, “lean production”, bottom-up construction of a data warehouse.
To make it easy for enterprises to see whether their data is healthy, DataWorks has also launched a five-dimension data governance health model covering R&D specifications, data quality, data security, computing resources, and storage resources. It evaluates an enterprise's data health and provides a solid quantitative basis for its data governance.
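A five-dimension health model like this can be thought of as a weighted aggregation of per-dimension scores. The sketch below is purely illustrative: the dimension names follow the article, but the weights, scoring scale, and function names are assumptions, not the actual DataWorks model.

```python
# Hypothetical sketch of a five-dimension data-governance health score.
# Weights and the 0-100 scoring scale are illustrative assumptions.

DIMENSIONS = {
    "rd_standards": 0.20,      # R&D specifications
    "data_quality": 0.30,
    "data_security": 0.20,
    "compute_resources": 0.15,
    "storage_resources": 0.15,
}

def health_score(metrics: dict) -> float:
    """Weighted average of per-dimension scores (each 0-100)."""
    return sum(DIMENSIONS[d] * metrics[d] for d in DIMENSIONS)

score = health_score({
    "rd_standards": 80,
    "data_quality": 90,
    "data_security": 70,
    "compute_resources": 60,
    "storage_resources": 75,
})
print(round(score, 1))
```

Weighting quality and security more heavily than resource usage is one plausible choice; a real model would calibrate these against governance outcomes.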
Capability 2: Lake-warehouse integration upgraded to 2.0, truly achieving one copy of data, unified management, and diverse analysis
Recently, the data lake has been adopted by many enterprises. Its technical form makes it easy for an enterprise to manage data and run various forms of computing on it with a rich set of open source engines. Meanwhile, driven by BI applications such as traditional reporting, the data warehouses enterprises have already built have become “data islands”: collaborative analysis across them is difficult, and most enterprises lack the ability to process all their data centrally.
Driven by application requirements, enterprises have a strong need for data interoperability across warehouses and lakes. This is the background of “lake-warehouse integration”. Last year, Alibaba Cloud's lake-warehouse integration connected the cloud data warehouse product MaxCompute with the data lake product EMR. After a year of customer validation and refinement, it has now been upgraded to 2.0.
In terms of purchase experience, users can connect the cloud serverless data warehouse (MaxCompute) and the cloud-native data lake (EMR + OSS) online in minutes, achieving secure interoperability of unified metadata and storage. It not only better supports standard HDFS data access, but also continuously optimizes high-speed access to OSS object storage and extends support to open source data lake formats such as Hudi and Delta Lake. Through upgraded intelligent caching, the MaxCompute computing service improves performance when accessing data in the EMR data lake by more than 10x.
In other words, lake-warehouse integration 2.0 can help enterprises eliminate data silos, manage data in different forms in a unified way through DataWorks, and accelerate analysis for specific applications. It also helps enterprises make full use of existing systems while building new data warehouses or data lakes, avoiding the decision risk of a large, one-shot data consolidation as application requirements grow more urgent.
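The core idea of “one copy of data, unified metadata” can be illustrated with a toy catalog: every engine resolves a logical table name through the same metadata entry, so no engine keeps a private copy. This is a conceptual sketch only; all names are hypothetical and it is not the MaxCompute/EMR implementation.

```python
# Toy unified-metadata catalog: one entry maps a logical table to a single
# physical location, so both a "warehouse" engine and a "lake" engine read
# the same data. Hypothetical names; not an actual Alibaba Cloud API.

catalog: dict = {}

def register_table(name: str, location: str, fmt: str) -> None:
    """Register one physical copy of a table in the shared catalog."""
    catalog[name] = {"location": location, "format": fmt}

def resolve(engine: str, name: str) -> str:
    """Any engine resolves through the same catalog entry."""
    entry = catalog[name]
    return f"{engine} reads {entry['format']} data at {entry['location']}"

register_table("orders", "oss://bucket/warehouse/orders", "parquet")
print(resolve("maxcompute", "orders"))
print(resolve("spark_on_emr", "orders"))
```

Both calls resolve to the same OSS location, which is what eliminates the silo: governance, access control, and caching can then be layered on the single catalog rather than on per-engine copies.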
Capability 3: Integration of real-time and offline cloud data warehouses to improve analysis performance
Real-time analysis and intelligence have become the direction of cloud data warehouse services. More and more enterprises can no longer tolerate the long T+1 cycle of offline processing before data can guide business decisions. Instead, they want to generate real-time insights from newly arriving data together with existing offline data, producing the policies the business needs instantly.
For example, while a player is in a game, a gift package tailored to the current game experience can be pushed based on the player's immediate needs, improving the experience while raising the payment conversion rate. Or, as securities transaction data is generated in real time, integrated real-time and offline analysis can support trading, meet regulators' requirements, and better help institutions control risk.
The integrated real-time and offline cloud data warehouse solution serves users' varied timeliness requirements in analytics on demand. The offline big data analysis service MaxCompute is deeply integrated with the real-time data warehouse Hologres, and real-time analysis over offline data achieves a 10x performance improvement.
Internally, an event-driven real-time data warehouse can be built on the real-time computing capability of the Flink edition. Externally, data in the data lake can be analyzed efficiently and written at high speed. With support for the standard, open SQL protocol and native support for 19 mainstream BI tools, customers can quickly build data warehouse applications spanning data integration through to the analysis interface.
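The real-time/offline integration described above can be pictured as serving a metric by merging a nightly batch aggregate with the increments streamed in since. The sketch below shows only the merge idea with plain dictionaries; it is an assumption-laden illustration, not the Hologres or Flink API.

```python
# Minimal sketch of real-time + offline integration: a query answers with
# yesterday's offline (T+1) aggregate plus today's streaming increments.
# Plain dicts stand in for the warehouse tables; purely illustrative.

offline_agg = {"user_42": 120}    # batch result as of last night's run
realtime_delta = {"user_42": 7}   # events streamed in since midnight

def current_value(key: str) -> int:
    """Up-to-the-moment metric: batch baseline + streaming delta."""
    return offline_agg.get(key, 0) + realtime_delta.get(key, 0)

print(current_value("user_42"))   # 127: fresh answer without re-running batch
```

The point is that neither side alone suffices: the batch table is complete but stale, the stream is fresh but partial, and the serving layer combines them at query time.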
Growing data volumes and ever-larger clusters inevitably challenge the operation and maintenance of a big data platform. With massive data kept manageable and controllable, query optimization and file storage optimization bring the advantages of large-scale clusters into full play, while automatic hot/cold storage tiering reduces the cost growth that comes with storage growth. The intelligent data warehouse solves the O&M difficulties most enterprises face and truly makes the enterprise big data platform intelligence-driven.
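Automatic hot/cold tiering usually comes down to a simple policy: data not accessed for some window is demoted to cheaper storage. The sketch below shows that policy in miniature; the 30-day threshold and partition names are made-up illustrations, not the platform's actual rule.

```python
# Illustrative hot/cold tiering policy: partitions unread for more than a
# threshold number of days move to cheaper "cold" storage. The threshold
# and partition names are hypothetical.

from datetime import date

COLD_AFTER_DAYS = 30  # assumed demotion threshold

def tier_for(last_access: date, today: date) -> str:
    """Return the storage tier a partition should live on."""
    return "cold" if (today - last_access).days > COLD_AFTER_DAYS else "hot"

today = date(2022, 6, 1)
partitions = {
    "logs/2022-05-30": date(2022, 5, 31),  # read yesterday -> stays hot
    "logs/2022-01-15": date(2022, 1, 16),  # untouched for months -> cold
}
for name, last_access in partitions.items():
    print(name, "->", tier_for(last_access, today))
```

Because the demotion is automatic, storage cost grows with the hot working set rather than with total history, which is the cost behavior the paragraph above describes.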
Capability 4: Big data + AI scenario-based intelligence to accelerate business
The integration of big data and AI can not only reduce costs and increase efficiency for enterprise IT operations, but also directly bring business value.
Here are a few examples of how Alibaba Cloud uses big data and AI to grow users, improve business operation efficiency, reduce operating costs, or strengthen risk control for enterprises.
The first is client-side super-resolution for audio and video media. Based on MNN, the on-device inference framework Alibaba has open-sourced for years, combined with long-accumulated experience in algorithm inference optimization, an on-device super-resolution application has been built that improves the user experience while saving computing, storage, and network resources as much as possible.
The user retention driven by viewing experience and playback fluency, and the computing and storage costs of CDN and GPU, are two indicators every Internet content provider must weigh in business operations.
Verification data from actual customers shows savings of 44% to 75% on CDN content distribution costs, together with a 1% increase in viewing time.
As user numbers grow, these two indicators continuously improve business operation efficiency. At the same time, moving much large-scale inference from the cloud to the device side greatly reduces enterprises' operating costs.
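The mechanism behind the CDN savings is simple to sketch: ship a lower-resolution stream and let the device super-resolve it, so CDN cost falls roughly with the bitrate ratio. The bitrates below are illustrative assumptions, not the customer figures cited above.

```python
# Back-of-envelope sketch: serving a low-resolution stream that the device
# super-resolves cuts CDN bytes delivered. Bitrates are assumed values.

def cdn_savings(full_res_mbps: float, low_res_mbps: float) -> float:
    """Fraction of CDN bandwidth saved by shipping the low-res stream."""
    return 1 - low_res_mbps / full_res_mbps

# e.g. assume 1080p at 4 Mbps vs 540p at 1.5 Mbps
print(f"{cdn_savings(4.0, 1.5):.0%}")
```

With these assumed bitrates the saving lands around 62%, inside the 44%-75% range the article reports; the exact figure depends on codec, content, and how aggressively resolution is lowered.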
The second is software-hardware co-optimization for big data and AI. PAI-Eflops addresses customers' pain points in deep learning training and inference acceleration, helping them convert AI computing power investment into more efficient productivity through co-optimized software and hardware.
Through patent-protected cluster network optimization, memory management for large-scale distributed model training, self-developed vGPU technology, a distributed training and optimization framework accumulated over years, and end-to-end model management, monitoring, and operations technology, PAI-Eflops helps many customers maximize the benefit of software-hardware synergy in specific scenarios. In AI-intensive applications such as financial quantitative models and Internet intelligent search, parameter-transfer performance for complex neural networks improves 3 to 7 times, GPU utilization rises, and overall performance improves by nearly 100x.
The training and inference experience is identical to that of PAI on the cloud, so users can enjoy a convenient cloud-device integrated AI experience, such as federated modeling across cloud and on-premises environments, while saving investment in their own IT and in ever-larger AI training clusters.
The core problem for most Internet companies is sustaining user growth. From advertising and marketing, to improving the LTV conversion of new users, to recalling lost users, these are the key indicators business management cares about most.
By combining the big data platform with AI modeling, intelligent algorithms help enterprises raise the RoI of advertising placement by more than 20%, improve the efficiency of SMS-based user recall by more than 5%, and increase refined-operations efficiency by nearly 30%. These directly bring Internet companies clear business benefits.
Capability 5: Data security and privacy computing, safeguarding data collaboration and sharing
Data collaboration has become the trend of technological development, and data security and privacy computing have become essential capabilities of big data platforms.
Alibaba Cloud has built a variety of secure computing methods into its big data computing and analysis engines, along with the federated learning algorithms commonly used across the group's businesses, making data access on the end-to-end data link manageable, controllable, and traceable. At the same time, its ultra-large-scale distributed cloud-native architecture is tightly combined with built-in MPC, TEE, FL, and other technologies, and the development and governance capabilities of DataWorks orchestrate and manage privacy computing tasks together with all other data tasks, enabling complete enterprise-level data applications.
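To give a feel for what secure multi-party computation buys, here is a toy version of additive secret sharing, one classic building block of MPC-style secure aggregation: each party splits its value into random shares so no single share reveals the input, yet the shares sum to the true total. This is a textbook illustration, not Alibaba Cloud's implementation.

```python
# Toy additive secret sharing: parties jointly compute a sum without any
# party (or aggregator) seeing another party's raw value. Illustrative
# only; real MPC protocols add authentication, malicious-security, etc.

import random

MOD = 2**61 - 1  # arithmetic over a large modulus

def share(value: int, n_parties: int) -> list:
    """Split a value into n random shares that sum to it mod MOD."""
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

def reconstruct(shares: list) -> int:
    return sum(shares) % MOD

secrets = [10, 25, 7]                         # each party's private value
all_shares = [share(s, 3) for s in secrets]   # each party shares with the others
# Each share-holder sums the shares it received; only these subtotals are revealed.
subtotals = [sum(col) % MOD for col in zip(*all_shares)]
print(reconstruct(subtotals))                 # 42, without exposing any input
```

Any single share (or subtotal) is uniformly random and so leaks nothing about an individual input; only the final reconstruction reveals the aggregate, which is exactly the property data-sharing scenarios need.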
The 4S standard and three characteristics of the big data + AI platform
The scenario-based intelligence produced by fusing big data and AI technology can bring customers business value, but replicating that value across an enterprise at low cost requires the power of cloud native and the platform layer.
Through platform capabilities and scenario-based intelligent services, enterprises can easily transform business workflows and ultimately realize business value in user growth, operating cost reduction, operating efficiency improvement, and risk control and security. The platform capability is built on three characteristics: “deep”, “connected”, and “transparent”.
“Deep” means that in many scenarios, extreme performance must be pursued through deep software-hardware co-optimization; disorganized data of unknown provenance must be governed in depth; and, when AI computing resources are tight, deep optimization of the algorithm framework can reduce the cost and raise the efficiency of large-model training and inference.
“Connected” means big data and AI analysis are linked together: big data increasingly serves AI applications, while AI increasingly depends on big data systems. Users' own lakes and warehouses can be connected at the storage, metadata, and computing levels, truly achieving one copy of data with diversified computing. Through federated learning and multi-party secure computing, secure interoperability is achieved while data ownership stays in place.
“Transparent” means abundant scenario-based intelligence that works out of the box. Based on customers' specific scenarios, scenario-based intelligence with industry attributes can be distilled at the level of data, industry models, industry analysis templates, and typical algorithm frameworks, closing the loop from data to business.
At the same time, we define the big data and AI platform by a “4S” standard. The four S's are: Scale, the platform must be able to carry big data, large applications, and large models; Speed, the platform must offer extreme efficiency in operation, development, and maintenance; Simplicity, the platform's program and service interfaces must be standard, simple, easy to understand, and callable like a function; and Scenario, the last and most important point, platform capabilities are born from scenarios.
The Alibaba Cloud intelligent computing platform helps customers build such a big data and AI platform, and provides convenient cloud services to thousands of customers through it.
To date, around the full life cycle of data, and honed by years of use within Alibaba Group and by hundreds of thousands of cloud customers, Alibaba Lingjie has formed a competitive product portfolio, including big data + AI platform products (the cloud-native big data computing service MaxCompute, the open source big data platform EMR, data lake formation DLF, the big data development and governance platform DataWorks, the real-time data warehouse Hologres, the Flink edition of real-time computing, the machine learning platform PAI, intelligent search OpenSearch, etc.) and rich ecosystem products (DDI, Elasticsearch, Cloudera, Confluent, Starburst, etc.).
In addition, Lingjie provides users with more out-of-the-box standardized intelligent service interfaces and scenario-based intelligent solutions built around scenario requirements, helping enterprises enhance business value. The platform has reached all walks of life, bringing intelligent transformation to customers in industries such as the Internet, finance, manufacturing, telecommunications, and education.