Dave Welch (@OraVBCA), CTO & Chief Evangelist
House of Brick has a growing set of collateral, webcasts, and blog posts on our experience of replatforming to AWS. On the occasion of re:Invent 2019, this blog post will focus on the experience of AWS customer Amazon.com. First we’ll review the replatform of the Amazon.com data warehouse off of Oracle. Then we’ll review Amazon.com’s successful unload of its ~7,500 non-bundled Oracle databases.
I suggest you organize three lunch and learns to play these sessions. It so happens that re:Invent has made what I consider to be a great gift: no fee, registration, or even login is needed to view these sessions.
Amazon.com’s Data Warehouse Replatform from Oracle to AWS Redshift
That’s what I think the session should have been titled to grab appropriate attention. Here’s the session’s actual title: ‘How Amazon leverages AWS to deliver analytics at enterprise scale.’ (Yawn.)
Amazon’s Oracle RDBMS data warehouse held several hundred petabytes of data, with ~900K jobs, ~38K tables, and ~80K active users.
As Amazon.com is an AWS customer, they used the same APIs and documentation to pull this off as other AWS customers.
Amazon.com’s data warehouse operation had invested heavily in specialized hardware, routers, and switches. Data and compute were coupled. It became unreliable. They spent hundreds of hours moving data around, including through sharding, just to keep the system running. They couldn’t get hardware on demand, and licensing was expensive. They needed to find a better solution and quickly. Coming from a traditional RDBMS, they needed to evaluate big data use cases like those their customers were beginning to use.
They chose to move to the AWS solutions DynamoDB (NoSQL), Aurora, and Kinesis.
Loads were modified so that the legacy and new data warehouses ran in parallel for an extended period during the transition.
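To make the parallel-run idea concrete, here is a minimal sketch of what a modified load plus a reconciliation check could look like. It is not how Amazon.com did it; the session did not go into that level of detail. The two in-memory SQLite databases are stand-ins for the legacy and new warehouses, and the `orders` table, the `dual_load` and `reconcile` helpers, and the sample rows are all hypothetical.

```python
import sqlite3


def make_warehouse():
    """Create an in-memory stand-in for a warehouse with one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount_cents INTEGER)"
    )
    return conn


def dual_load(rows, legacy, new):
    """Apply the same batch to both warehouses, as a modified load job might."""
    for conn in (legacy, new):
        conn.executemany(
            "INSERT INTO orders (order_id, amount_cents) VALUES (?, ?)", rows
        )
        conn.commit()


def reconcile(query, legacy, new):
    """Run one query against both systems and report whether the results match."""
    legacy_rows = sorted(legacy.execute(query).fetchall())
    new_rows = sorted(new.execute(query).fetchall())
    return legacy_rows == new_rows


# Hypothetical batch loaded into both systems during the parallel run.
legacy, new = make_warehouse(), make_warehouse()
dual_load([(1, 1999), (2, 500), (3, 12000)], legacy, new)

# A cheap aggregate check; real reconciliation would cover many more queries.
check = "SELECT COUNT(*), SUM(amount_cents) FROM orders"
print(reconcile(check, legacy, new))
```

The point of the sketch is only that a parallel run gives every consuming team a way to compare answers from both systems before cut-over, which is presumably part of what let teams validate their own migrations without escalating.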
While the project was daunting, it was also highly successful. They pulled it off in two years. No project-wide Gantt chart was used. I was particularly impressed by the organizational dynamics discussed in the session. They had 90% successful query conversion on the first pass. The remaining 10% was clearly the presenter’s favorite part of the talk: all the people who didn’t want to migrate, didn’t have time to migrate, or were looking for features not in the new system.
The coordinating team didn’t get escalation emails. Rather, they got brag emails. “Hey, VP! We’re done early! Come to our launch party.” One team said they couldn’t hit the deadline. When asked how much additional time they needed, they said a week. (Never mind that a quarter of slack had been built into the plan.) That’s not how tech is supposed to work. So, the central team asked, “How did you do it? You had said XYZ would break you.” The answer: “Yeah, well, we solved it. Don’t worry about it. And we shared the solution with other teams. It’s working great!”
The presenter maintained that central IT used to be the bottleneck. It was always a lot of blood, sweat, tears, and risk. This time, Amazon teams troubleshot their own way through without coming to the central team, because AWS technology makes that possible. That was a complete revelation to the central guidance team.
The bottom line message: be ready. “The cloud moves far more quickly than you can.”
This was a data warehouse ecosystem with 1,700 different teams publishing, 3,000 teams consuming, and 20,000 data sets in active use. Policies and controls were put in place six months before cut-over to prohibit the introduction of new workloads into the legacy system.
Amazon.com Unloads its ~7,500 Oracle Databases
The story of how Amazon transitioned from Oracle Database to AWS tools was split across two sessions:
- [DAT359] How Amazon.com migrated its applications from Oracle to AWS databases
- [AMZ301] Amazon.com: enterprise database migration at scale
Paramount in the case studies was Amazon.com’s experience replatforming its ~7,500 Oracle databases not required by third-party applications onto other AWS database technologies. In August 2018, Amazon.com announced that this project was underway with a goal of completing the move by early 2020. Oracle’s Larry Ellison wished them luck, noting that they had attempted to unload Oracle Database before and failed, and what hard work it was. Doug Booth of AWS Professional Services, Principal Business Development Manager for AWS, described Amazon.com as Oracle’s largest customer many times over, which is certainly credible (though I haven’t asked Doug how “Oracle customer” is defined). Amazon was experiencing substantial challenges with scalability and uptime. The reported results:
- Cost savings: 90%.
- Performance/throughput improvements: 40%.
- Substantial uptime improvements.
- Scale Up/Scale Down (as opposed to scale up and stay scaled): introduced for the first time. (Think Black Friday, and now even more importantly, Prime Day.)
Now all of the former Oracle Database administrators are happily employed in and out of Amazon.com as transitional mentor architects and in similar professionally invigorating roles.
In conversation after the third of these sessions, more information came to light. Upon co-presenter Thomas Park’s July 2016 arrival at Amazon.com, people sincerely told him he’d have to hire a thousand people to get it done. Rather, he went in search of hungry Amazon professionals to provide leadership in the effort. Oracle Database administrators expressed both viability concerns and concerns for their professional futures. Individual business units and workflows were given the architectural option to choose which of the six AWS database technologies (as well as other AWS technologies) would best suit their purposes. Keep in mind that this effort succeeded without any central technical triage or escalation team.
I asked co-presenter Thomas Park, Sr. Manager of Software Development for Amazon, if they were on an Unlimited License Agreement (ULA) with Oracle. “I can’t comment on Oracle licensing,” he said with a smile. I expected his answer, but I still had to ask. “Then let’s discuss something you can talk about,” I said. “What about refactoring to deal with Oracle’s supplied PL/SQL packages? For many enterprises, that’s the 800 lb. gorilla in the room of unloading Oracle Databases.” Thomas said that each of the workflows evaluated how to approach the business and functional need with available AWS database types and tools. Through that process, he said, the supplied packages issue became moot. This may look like a non-intuitive, even unintentional, way around porting PL/SQL directly into PostgreSQL. It could also be seen as a departure from the Minimum Viable Product approach that is so common and so successful in such refactoring.
With that, Amazon.com had moved away from Oracle’s multi-purpose RDBMS to multiple, largely single-purpose database technologies, the very AWS technology direction that Ellison bashed repeatedly in his Oracle OpenWorld 2019 keynote. I would think that if the massive Amazon.com were indeed driving square technology pegs into round holes, any attempt to spin the comparative merits of the solution would become obvious soon, if it weren’t already. The architectural liberty enjoyed by individual business units and workflows would only increase that transparency.
The last of the three Amazon.com get-off-Oracle sessions finished at 5:35 pm. The ten or so of us who lingered, including the presenters, were still discussing it 40 minutes later. Presenters Doug and Thomas were livelier and more animated with attendees off stage. They were clearly genuinely interested in conference goers as individuals, as well as in their organizations’ specific challenges and opportunities. I appreciated this gift.
Does Amazon.com have an IT secret sauce to pull this off given their scale? That’s not the way the story reads, given the departmental architectural independence. Rather, mix that independence with their scale, and one could imagine an exponential rise in project risk. But, I’m inclined to think that we are looking at one of the world’s most intriguing master classes in IT organizational behavior and transformation. There appears to be a wealth of approachable, detailed information on this remarkable accomplishment.
Don’t forget, we’re not talking about a carefully maintained reference customer relationship. This is Amazon.com.