Showing posts with label databases. Show all posts
Showing posts with label databases. Show all posts

Rise and fall of DBAs: The tyranny of the ORM

Friday, September 14, 2007

There was a time when DBAs dictate how developers should use Their databases. It was early and mid-nineties and Their Word was The Truth. Those poor guys building client-server applications had to bow down before Him/Her and implement the Business Logic inside The Database Manager. Database hardware was expensive, but His/Her Highness could size the system easily because the number of clients connected was predictable. And that's what the IT Manager wanted; predictable figures.

Late nineties came and Web Application started to rule. Suddenly, the number of clients were unpredictable, the behaviour of the applications was radically different and for the first time in years, the DBMS was the bottleneck and The Master of the Data did not realize on time. So He/She asked to the IT Manager for better and bigger hardware.

IT Manager: Will it fix the performance problems?
DBA: We don't know, it depends on the number of clients on peak, the number of users in the critical path, the TV ads campaigns with the URL of the company...
IT Manager: So the bottleneck is the database?
DBA: Yes, the Web servers are almost idle...
IT Manager: Idle? My god! Can they do anything to relief the database?
DBA: Well, we can try move some business logic out to the application layer. If those lazy and hairy web developers could write good SQL...

And then a new trend started: move out of the database as much business logic as possible. The web developers started to code the SQL inside the application to relief the database.

The DBAs were upset, because these web developers coded really poor SQL and they don't even know about triggers, functions or views that could reduce the size or complexity of some of the statements. And they had to get involved in development almost like a Quality Assurance Team, rejecting poor SQL or aggressive tasks against the RDBMS. That was really disgusting!

By nature Good Developers are lazy. Writing SQL was like a pain in the ass, and wrapping the results in object models took a lot of time. The laziest of them all started to write little applications to create automatically the SQL and the code to return the results as object models. These little tools became in Object-Relational Mapping libraries. The Technical Leads explained to the IT Managers how they could save time using them, and the IT Manager was happy because he could show to the CFO and CEO that they were doing something to make that lazy web developers productive.

Then the nightmare of DBA started.

The DBA realized that the number of queries to the databases were increasing, and the complexity of the query was lower. A lot of simple queries... What's going on here? Sounds like a new intern writing crappy code...

DBA: Hi, is there a new intern in the team writing crappy SQL?
Development Lead: No, we don't have new people. What is going on?
DBA: Then somebody of your team is writing really poor code. I can see tons of queries as simple as a line of SQL!
Development Lead: Ah yes! That's the new ORM library we are using!
DBA: Well, that library is rubbish. It generates tons of shitty SQL. Ask the reseller to give your money back ;-)
Development Lead: No, it's opensource and free. And I see a lot of advantages using it. I think we should discuss it with the IT Manager.

IT Manager: Why this tool writes crappy SQL?
Development Lead: It does not write poor SQL. It creates simple queries, that's all. We can configure it to create more complex SQL, but sometimes the number of objects explodes and the application run out of memory.
DBAs: Why don't you use the already created Views?
Dev Lead: We cannot map Views to objects.
DBAs: So you are not going to use views anymore?
Dev Lead: If we can avoid them, yes.
DBAs: No views!? How am I going to optimize your complex queries?
Dev Lead: Well, we are not going to write complex SQL anymore. The ORM will do.
DBAs: And what about triggers and stored procedures?
Dev Lead: Out. Triggers and ORM does not match very well because it's hard to keep under control the changes performed in the DBMS. And Stored Procedures sucks, we have Java.
IT Manager: So, if there is no views, triggers and stored procedures, the DBAs can set your focus on keeping the system healthy and optimized, but not coding processes. Right?
DBAs: Errr... that's not exactly right...
IT Manager: And we can also transfer all these development tasks to the development team. Right?
DBAs: Correct, but...
IT Manager: And if the DBMS has no heavy processes inside, we can save some bucks in hardware, isn't?
DBAs: Probably yes, but if I don't track what is going on in the database system the system can die!
IT Manager: Of course! And that's what you are suppossed to do from now on. Help the development team to optimize how the ORM tools and libraries performs before going live.

Then the DBA role changed and became a slave of the ORM tools.

She could not recommend any more how to code the SQL code because it was an automatic process. And she saw how there was no gain in the performance because of the use of these ORM tools, even worse, she saw how the tuning of the database now was radically different due to the different characteristics of the SQL code. He thought that may be these new problems in the performance could make IT Manager to roll back to the old way, but it never happened. Actually, the IT Manager asked if it was necessary to have that very expensive Enterprise licenses if triggers, views and stored procedures were declining. Moreover, the application developers were implementing some smart caching strategies that really reduce the overhead in the database, saving millions to the company in hardware and licenses.

IT Manager: Should we try an opensource database?
DBA: Damn Gaving King...

Data normalization is not for sissies, it's just common sense

Wednesday, September 05, 2007

And I must be the biggest sissy of all. There is a lot of buzz about designing your databases thinking in data normalized or denormalized. Now that everybody is building the next Google or YouTube the database design has to support trillions of transactions per second and millions of terabytes, of course running on a multidimensional computing grid of zillions of nodes of databases written in Erlang... <ironic/>
Data normalization is common sense because helps you to avoid the Data Inconsistency Hell when data is repeated across different tables. Only a small fraction of us will have the chance to work in an environment with more than a few thousands concurrent users. And in such an environment the running cost of a standard database architecture is lower than the cost of restricting how your Software Architects must design and your programmers must code the applications. That's what is all about, economics. It's cheaper to have a fully normalized design with no chance of repeating data across different tables and scale the hardware when you need it. If you are not lucky you will probably will have to denormalize part of your datamodels due to performance reasons. This is very common and does not mean to design your core database model denormalized. You can denormalize the database model and create it in another schema, running some batch process to populate it when the impact on the users is lower. This is a typical solution for reporting tools.
But, if you are the lucky (or unlucky, who knows...) enough to work in environments like YouTube or Google with zillions of users then it make sense to use denormalized database models. The reason is again economics: It's cheaper to restrict how your Software Architects must design and your programmers must code the applications than scaling a non-standard database architecture (being a non-standard database architecture a massive cluster of database servers keeping data consistent among them). And maybe it's not possible to scale databases without strong constraints in the application development.
I think this is a simple axiom coming from my experience, the more exigent demand of database resources, the more restrictions on the application development. As a server side developer, caching techniques and in-memory smart data hierarchies (Web Sessions or Stateful EJBs) are the first step to improve the performance of an application. Very soon the Architects start to design keeping in mind that someday they will have to include caching or cluster data replication, for example. I know this is not a very Agile approach, but depends on the Architects' experience to provide a good design thinking in performance or not.
Finally, design fully normalized and move to a denormalized model only when you need it (this is a Agile approach).