30 Years' Worth of Data Processing Experience
Below are a few highlights from my long history in data processing, focusing on hands-on solutions I delivered to clients.
Investments / Insurance
I designed a Databricks solution architecture for internal transactional data covering the end-to-end (E2E) business scope of the segregated funds product (worth $95 billion to the company). More than 20 systems were involved in sharing the data, including numerous external vendors that relied on point-to-point data sharing and distribution.
The greatest singular achievement: the Databricks architecture was the company's first comprehensive enterprise data platform for a full product set of transactional data, and it broke the client's standard enterprise pattern of sharing data point-to-point via batch files between a producer and a consumer.
Wall Street
I managed (with a team of 7) a global reference data repository for the second-largest investment bank on Wall Street: 5 million equity products, 20+ million fixed-income products and 50 million corporate-action records in production databases (one per product type) with 50 global replicas. All of it was master data that had to be correct 24 hours a day, 7 days a week. We didn't use any COTS ETL system such as DataStage, Informatica or Ab Initio, as these were not "agile" enough for the volume and variety we handled. Instead, we developed dozens of custom programs supporting hundreds of data streams that carried hundreds of gigabytes of data across the enterprise, each working separately and independently but toward a single overall goal: "the right data in the right place at the right time". Some programs consisted of a few lines and ran for a few seconds; some were composed of up to 10 sub-modules and ran for 3 full days (options-expiry weekend).
The greatest singular achievement: I fixed, via a script, 10 million historical equity records (a regulatory failure) in a production database during North American production hours, with no issues, failures or downtime, in a system where a 1-second latency in data availability could cost millions of dollars in losses.
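A fix of that size during production hours is usually done in small committed batches, so no single transaction holds locks long enough to stall readers. The original script and schema are not shown here; the sketch below only illustrates the batch-and-commit pattern, using sqlite3 and invented table/column names and correction values purely for demonstration.

```python
import sqlite3

# Illustrative stand-in for a production table holding bad records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE equities (id INTEGER PRIMARY KEY, cfi_code TEXT)")
conn.executemany(
    "INSERT INTO equities (id, cfi_code) VALUES (?, ?)",
    [(i, "XXXXXX") for i in range(1, 10_001)],  # 'XXXXXX' marks a bad record
)
conn.commit()

BATCH = 1_000  # small batches keep each transaction (and its locks) short

def fix_in_batches(conn):
    """Correct bad records batch by batch, committing between batches."""
    fixed = 0
    while True:
        cur = conn.execute(
            "SELECT id FROM equities WHERE cfi_code = 'XXXXXX' LIMIT ?",
            (BATCH,),
        )
        ids = [row[0] for row in cur.fetchall()]
        if not ids:
            return fixed
        conn.executemany(
            "UPDATE equities SET cfi_code = 'ESVUFR' WHERE id = ?",
            [(i,) for i in ids],
        )
        conn.commit()  # release locks so live readers are never starved
        fixed += len(ids)

fixed = fix_in_batches(conn)
```

The key design choice is that correctness of the live system is never traded away: between any two commits the table is fully readable, and the script can be stopped and resumed at any batch boundary.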
Commercial Banking
I rearchitected an enterprise ETL system of registered businesses to double its throughput within its 10-hour runtime window, using several methods including multi-threading, SQL tuning, table cleansing and loading-script refactoring.
The greatest singular achievement: I fine-tuned a 2000-line multi-statement SQL script, cutting its runtime from 2 hours to 40-50 minutes.
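Of the methods listed above, the multi-threading piece typically amounts to partitioning the input and loading the partitions in parallel. A minimal sketch of that pattern follows; the partition size, worker count and `load_partition` body are all invented for illustration, not taken from the original system.

```python
from concurrent.futures import ThreadPoolExecutor

def load_partition(rows):
    # Stand-in for the real loading script: in production this would
    # bulk-insert one partition into the target table.
    return len(rows)

def parallel_load(records, workers=4, partition_size=250):
    # Split the input into fixed-size partitions...
    partitions = [records[i:i + partition_size]
                  for i in range(0, len(records), partition_size)]
    # ...and load them concurrently; throughput scales with workers
    # until the database, not the loader, becomes the bottleneck.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(load_partition, partitions))

loaded = parallel_load(list(range(1_000)))
```

In practice the partition key must be chosen so workers never contend on the same target rows; otherwise lock contention can erase the gain.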
eCommerce
I developed an ETL process that let my client migrate a national pizza chain's MSSQL customer/order database, with 10 years' worth of data, to my client's private cloud database in 3 days, versus the 8 months their existing migration program would have needed while moving the source data one business record at a time. The script was multithreaded to download the data but single-threaded to upload it.
The greatest singular achievement: one of the SQL statements inserting orders into the new Oracle table was 2000 lines long, because it converted all the field data into XML segments.
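The multithreaded-download / single-threaded-upload split is a producer-consumer arrangement: several reader threads fill a bounded queue, and one writer drains it, so the target database sees a single ordered session of inserts. A minimal sketch of the pattern; the read and upload bodies are stand-ins, not the original code.

```python
import queue
import threading

SENTINEL = object()

def migrate(record_ids, readers=4):
    buf = queue.Queue(maxsize=100)   # bounded: readers block if the uploader lags
    ids = queue.Queue()
    for rid in record_ids:
        ids.put(rid)
    uploaded = []

    def read_worker():
        # Stand-in for pulling one business record from the source MSSQL DB.
        while True:
            try:
                rid = ids.get_nowait()
            except queue.Empty:
                return
            buf.put({"id": rid})     # "downloaded" record goes into the buffer

    def upload_worker():
        # Single consumer: all inserts arrive on one connection, one at a time.
        while True:
            item = buf.get()
            if item is SENTINEL:
                return
            uploaded.append(item["id"])

    threads = [threading.Thread(target=read_worker) for _ in range(readers)]
    up = threading.Thread(target=upload_worker)
    for t in threads:
        t.start()
    up.start()
    for t in threads:
        t.join()
    buf.put(SENTINEL)                # all readers finished; stop the uploader
    up.join()
    return uploaded

result = migrate(range(500))
```

The bounded queue is the important detail: it applies backpressure, so fast downloads cannot pile up unboundedly in memory ahead of the slower single-threaded upload.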
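The per-field XML conversion driving that statement's length can be sketched in a few lines. The real work was done in Oracle SQL; the Python version below, with invented column names, only illustrates the idea of wrapping each field value in its own XML segment.

```python
import xml.etree.ElementTree as ET

def order_to_xml(order: dict) -> str:
    # Wrap each column value in its own XML segment -- the same idea the
    # 2000-line SQL statement performed field by field during the insert.
    root = ET.Element("order", id=str(order["order_id"]))
    for field in ("customer", "total", "placed_at"):
        seg = ET.SubElement(root, field)
        seg.text = str(order[field])
    return ET.tostring(root, encoding="unicode")

xml_doc = order_to_xml({"order_id": 42, "customer": "J. Doe",
                        "total": "19.99", "placed_at": "2014-07-01"})
```

Done in SQL, every field needs its own conversion expression, escaping and null handling, which is how a single logical insert grows to thousands of lines.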
