The data warehouse has long been an important part of enterprise IT architecture, especially in traditional industries that depend heavily on digital technologies. Industries such as banking have relied on it for some time.
The data warehouse plays an increasingly critical role both in traditional supervision and reporting and in business intelligence, which has been a hot field in recent years.
With the rapid development of mobile internet technologies over the last few years, the proportion of online, mobile, and scenario-based financial services is increasing, resulting in explosive growth in data volume and in the diversity of data types.
For example, the processing capacity of a traditional data warehouse platform ranges from hundreds of gigabytes (GB) to hundreds of terabytes (TB). However, a large modern bank generates an average of several or even dozens of TB of data every day, and the amount of new data generated each year reaches the petabyte (PB) level.
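As a rough back-of-the-envelope check of these figures, the sketch below converts an average daily data volume into a yearly total. The daily rates used are illustrative assumptions rather than figures from any specific bank, and decimal units (1 PB = 1,000 TB) are assumed.

```python
TB_PER_PB = 1000      # assumption: decimal units, 1 PB = 1,000 TB
DAYS_PER_YEAR = 365


def yearly_volume_pb(daily_tb: float) -> float:
    """Convert an average daily data volume in TB to a yearly volume in PB."""
    return daily_tb * DAYS_PER_YEAR / TB_PER_PB


# Hypothetical daily rates spanning "several" to "dozens" of TB per day.
for daily_tb in (5, 10, 30):
    print(f"{daily_tb} TB/day -> {yearly_volume_pb(daily_tb):.1f} PB/year")
```

Even at 10 TB per day, the yearly total is about 3.65 PB, well beyond the hundreds-of-TB range that a traditional platform is described as handling.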
In addition, as banks become deeply involved in customers’ scenario-based daily lives, a large amount of unstructured data is generated every day, such as transaction logs, images, audio, and video. This data poses severe challenges to the traditional data warehouse platform, which was designed to process a single structured data type and a limited amount of data. Therefore, reconstructing the existing data warehouse platform so that it can process massive and diversified data and support data-driven service innovation has become a major concern for IT managers.
Traditional data warehouse platforms face the following challenges:
- High costs: The initial investment in a traditional data warehouse platform and its subsequent expansion account for a large proportion of the IT department’s expenditure.
- Lack of real-time analysis capabilities: As the data volume and user scale increase, the traditional data warehouse cannot meet the service level agreement (SLA) for real-time analysis (such as real-time anti-fraud).
- Lack of diversified computing capabilities: Traditional data warehouses are mostly built on relational databases, which are weak at processing semi-structured and unstructured data.
- Lack of online capacity expansion capability: Traditional data warehouse platforms usually require existing service systems to be suspended during capacity expansion. As the data scale increases, each capacity expansion takes too long, which poses a great challenge to service continuity.
- Lack of platform decoupling: Traditional data warehouse platforms and their appliance architecture do not meet banks’ strategic requirements for IT architecture decoupling.
The future data platform will present the following five trends:
- Open and distributed architecture: An open platform combined with a massively parallel processing (MPP) architecture has become the first choice for larger financial institutions. The distributed open architecture can help financial institutions decouple software from hardware, provide massive data processing capabilities, and support linear expansion of the platform.
- More real-time service decision-making capabilities: As banks require real-time service processing and their customers expect a more real-time and personalised service experience, real-time analysis and processing have become basic requirements for banks building data warehouse platforms.
- Ability to process more types of data: Data platforms must be capable of storing, processing, and analysing all data effectively, including structured, semi-structured, and unstructured data. When processed with a range of suitable technologies, diversified data brings more value to the data mining and analysis of financial institutions.
- Always-on services: Financial institutions expect that services will not be interrupted by system expansion or upgrades. Continuous (24/7) online service has become a hard requirement of banks for mission-critical service systems.
- Integration with the Artificial Intelligence (AI) platform: Financial institutions are exploring the application of AI in more and more fields. The application of AI depends on data. Therefore, the integration with the AI platform needs to be considered at the initial stage of planning and building the data platform.
The converged data lake is becoming the main direction for financial institutions’ data platform construction.
By integrating the distributed data warehouse platform with the big data processing platform, the converged data lake can process structured and unstructured data at the same time, and can handle both real-time processing and offline batch processing. In addition, the converged data lake uses the linear expansion capability of its distributed architecture to meet the requirement of processing massive amounts of data. As mobile and online financial services develop rapidly and expectations of customer experience rise, the converged data lake has become an important platform for banks to build customer-centric, scenario-based finance and to implement fast service innovation.
Huawei Converged Data Lake Solution
Huawei provides the converged big data platform (FusionInsight Hadoop), distributed data warehouse platform (GaussDB A), AI development platform (FusionInsight AI), and converged data storage. In addition, the Huawei-developed data virtualisation platform and data enablement system (DAYU and Data ROMA) are integrated to provide end-to-end solutions for customers in the finance industry, covering front-end data access, data storage, data processing, data analysis, and data governance. With its full-stack hardware, Huawei can help industry customers optimise performance from chips to platforms, build high-performance data analysis and processing platforms, and accelerate service innovation.
Typical Cases of the Huawei Converged Data Lake Solution
With the rapid development of mobile internet technologies, especially the penetration of mobile payment technologies into all aspects of life, traditional financial institutions in China are facing fierce competition from fintech companies. For example, a major bank in China set out its data-driven strategy in 2015. To support this strategy, the bank began selecting an open-architecture distributed data platform to cope with the great challenges brought by the surge in service data and by rapid service innovation.
The bank faced the following challenges before it adopted a distributed data platform:
- Increasingly high investment cost of data platforms: The bank was under tremendous cost pressure both in early-stage platform construction and in follow-up capacity expansion. For example, in the 10 years from 2005 to 2015, this bank paid a data warehouse vendor CNY1 billion, with an average annual maintenance fee of nearly CNY10 million.
- The traditional closed architecture: The appliance architecture conflicted with the bank’s technology decoupling strategy. The finance industry depends heavily on innovation in digital technologies and therefore cannot accept vendor lock-in.
- Online upgrade of services: Because of increasingly high customer demands for service experience and strict requirements on the timeliness of service reporting, the traditional data warehouse platform could no longer satisfy the need for online capacity expansion.
- Lack of real-time data processing capability: The traditional data warehouse platform was mainly based on offline analysis and processing, and lacked real-time data processing capabilities for internet scenarios. In particular, it could not cope with real-time processing of anti-fraud data flows.
- Lack of processing capability for semi-structured and unstructured data: The traditional data warehouse platform mainly processed relational structured data. It lacked sufficient processing and analysis capabilities for the diversified data generated in mobile internet scenarios, such as log data, voice, and images. Under banks’ data-driven strategies, the ability to analyse, process, and explore diversified data has become one of the primary considerations for financial institutions in selecting data platforms.
The bank chose Huawei’s converged data lake platform after examining the platforms of multiple vendors, for two main reasons. First, Huawei has years of experience in data platform technologies, and the Huawei big data platform is widely used across Huawei’s global businesses. Second, Huawei has converged the data warehouse platform with the big data platform in an innovative manner, which conforms to the "lake-warehouse integration" development trend of data platforms.
In addition, during the construction of the data platform, Huawei provided a complete data migration solution to ensure smooth data migration from the existing platform to the new one, achieving zero data loss and zero service interruption.
After survey and analysis, migration solution design, solution verification, and solution implementation, Huawei helped the customer replace all of its traditional data warehouse platforms by June 2019. The project migrated and deployed nearly 1000 nodes and over 2 PB of data in the production environment, and the service migration was completed securely and smoothly.
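One common way to support a zero-data-loss claim during a verification step like this is to reconcile each migrated table between the old and new platforms. The sketch below is an illustration only and is not taken from Huawei’s migration tooling. It assumes the rows of each table have already been fetched from both sides as tuples, and the function names (`table_fingerprint`, `reconcile`) and the sample data are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Row = Tuple[object, ...]


def table_fingerprint(rows: Iterable[Row]) -> Tuple[int, str]:
    """Return (row count, order-independent digest) for a table's rows."""
    count = 0
    combined = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        # Summing per-row digests modulo 2**256 makes the result independent
        # of row order while still being sensitive to duplicated rows.
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, f"{combined:064x}"


def reconcile(source: Dict[str, List[Row]],
              target: Dict[str, List[Row]]) -> List[str]:
    """Return the names of source tables that are missing or differ in target."""
    mismatched = []
    for name, rows in source.items():
        if name not in target:
            mismatched.append(name)
        elif table_fingerprint(rows) != table_fingerprint(target[name]):
            mismatched.append(name)
    return mismatched


# Hypothetical sample data: "accounts" differs only in row order,
# while "txns" has one changed value.
source = {"accounts": [(1, "alice"), (2, "bob")], "txns": [(10, 1, 99.5)]}
target = {"accounts": [(2, "bob"), (1, "alice")], "txns": [(10, 1, 99.0)]}
print(reconcile(source, target))  # prints ['txns']
```

Comparing a row count together with an order-independent digest keeps the check cheap enough to run per table, while still flagging missing, duplicated, or altered rows. A real migration would also need to account for type and encoding differences between the two platforms.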
With the accelerated digital transformation of global banks and the implementation of data-driven strategies in mainstream banks, the converged data lake will gradually become an important platform for the banking industry to achieve service innovation and manage data as an asset.