Can Postgres run on Hadoop?


CW: EnterpriseDB, a commercial provider of the open source database PostgreSQL, is courting dissatisfied Oracle customers in particular with its own distribution (Postgres Plus). As a rule, companies change their database even less often than their applications, mostly because critical data has to be moved. What makes companies take on this burden?

Linster: Cost. The Oracle database is a great product; it can do a lot, and I was an Oracle user myself for years. But the database is very, very expensive. That is particularly annoying when users do not make full use of its range of functions.

A PostgreSQL database is 75 percent cheaper than an Oracle system, and this difference often triggers a discussion about whether Oracle is mandatory. There are areas of application for which Oracle is wonderfully suited; in other areas PostgreSQL simply makes more sense. And the switch is not complicated: in a user survey, 50 percent of respondents said the effort was low or very low. Such a project usually takes a few weeks.

CW: Which fields of application are typical for PostgreSQL?

Linster: 60 percent or more of the applications that run on Oracle today can be migrated to PostgreSQL. Our customers consider our database reliable enough to run business-critical applications on it. The database has matured to the point where PostgreSQL can handle applications for which very low downtime is critical, even if 99.99 percent availability is required. PostgreSQL can manage databases of five or six terabytes and process tens of thousands of transactions per second.
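An availability target such as 99.99 percent is easier to judge when it is translated into a downtime budget. The following Python sketch converts an availability percentage into the maximum minutes of downtime per year. It assumes a 365-day year and that the target is measured over the whole year; the function name is purely illustrative.

```python
# Assumption: a 365-day year (leap years ignored) and a target measured per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(availability_percent: float) -> float:
    """Maximum downtime per year permitted by an availability target (hypothetical helper)."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


for target in (99.0, 99.9, 99.99):
    print(f"{target}% availability allows about {downtime_budget_minutes(target):.1f} min/year")
```

At 99.99 percent, the budget comes to roughly 52.6 minutes per year, which is the yardstick for any failover or maintenance strategy.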

Of course, there are also performance requirements for which we would no longer recommend our product.

Marc Linster is Vice President, Professional Services at EnterpriseDB. Before that, he worked for several years at the video conferencing provider Polycom in the areas of services supply chain, business intelligence, customer data management and cloud solutions. Prior to that, Linster spent many years advising international customers from the USA, Canada, Europe and Switzerland as a business and system integration consultant in the supply chain area. He started his career at the Avicon Group as Chief Technology Officer (CTO) and Vice President of Operations. Linster earned his doctorate (Dr. rer. nat.) in computer science at the University of Kaiserslautern. He was born in Luxembourg and has lived in the USA for many years.

CW: What are the limits?

Linster: That cannot be expressed in general numbers. But if users want to compress data, we have to say that we don't offer this feature yet. If customers need exactly the scalability they are used to from Oracle's "Real Application Clusters" (RAC), we can deliver that only to a limited extent. And if failover has to complete in less than ten seconds without losing any transactions when hardware fails, that is still a bit too fast for PostgreSQL. But how many applications really need less than ten seconds? Not even ERP systems need that. Most applications can cope with 30 seconds of failover time.
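To put the 10-second and 30-second failover figures side by side, the Python sketch below counts how many failovers of a given duration fit into the yearly downtime budget of a 99.99 percent target. It is a back-of-the-envelope illustration under loudly stated assumptions: a 365-day year, and failovers as the only source of downtime. The function name is hypothetical.

```python
# Assumptions: 365-day year; failovers are the ONLY downtime counted against the budget.
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds


def failovers_within_budget(availability_percent: float, failover_seconds: float) -> int:
    """Number of complete failovers of the given duration that fit into the yearly budget."""
    budget_seconds = (1 - availability_percent / 100) * SECONDS_PER_YEAR
    return int(budget_seconds // failover_seconds)


for seconds in (10, 30):
    count = failovers_within_budget(99.99, seconds)
    print(f"{seconds} s failover: up to {count} failovers per year at 99.99%")
```

Even at 30 seconds per failover, the budget of roughly 3,150 seconds leaves room for about a hundred failovers a year, which supports the point that sub-ten-second failover is rarely the deciding factor.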

PostgreSQL: Not yet a solution for ERP applications

CW: How do users find out where the limits are in their particular case?

Linster: Through analysis, which also reveals the potential savings. Four or five years ago, PostgreSQL was still a product used for non-critical departmental applications with modest availability demands, where the database could be down for half a day. Today, large companies are migrating their mission-critical applications to the open source database. Our solution enables, for example, switchover to a remote data center and offers failover and recovery for critical applications.

CW: What is the database mainly used for: transactional or analytical applications?

Linster: Mainly transactional, both for web services and for business applications such as billing systems. PostgreSQL is less well suited to analytics; there the database lacks some of the capabilities that Oracle databases, for example, have built in.

CW: Is PostgreSQL used in ERP installations?

Linster: Very little at the moment, but interest is growing. ERP providers have already approached us because they trust PostgreSQL to do the job of an ERP database. Now we have to work with them to put the cooperation on a commercially sound footing.

CW: Are you aiming for SAP certification?

Linster: To be honest, that is problematic. The major ERP providers SAP, Oracle and Microsoft all have their own database products and are not interested in certifying PostgreSQL. But a lot is happening among the smaller ERP vendors in the next tier, which often serve specialized market segments.

An open source database as a big data integrator

CW: If ERP is not the preferred area of application at the moment, where is PostgreSQL used today?

Linster: Besides the web services already mentioned, especially in environments with custom applications, for example in the insurance sector and the public sector. The market for databases that support custom applications is very large.

CW: The dominant database topic at the moment is in-memory. How are you positioning yourselves in that area?

Linster: For our commercial product "Postgres Plus" there is a solution called "Infinite Cache". It allows very large amounts of data to be loaded into and managed in cache memory. We ensure that the data is written back to disk so that the ACID properties (Atomicity, Consistency, Isolation, Durability) are preserved. And even though we already work a lot with solid state drives, PostgreSQL is currently not a pure in-memory database.
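The general idea of serving reads from memory while guaranteeing that writes reach disk can be illustrated with a write-through cache. The Python sketch below is a generic, hypothetical illustration, not the design of Infinite Cache: it covers only the durability part of ACID, by appending each write to a log file and calling `os.fsync` before updating the in-memory dictionary. The class name, the log format and the file path are all assumptions.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class WriteThroughCache:
    """Hypothetical sketch: in-memory reads, writes made durable on disk first.

    Illustrates only the 'D' in ACID; it is NOT EnterpriseDB's Infinite Cache.
    """

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.cache: Dict[str, str] = {}
        # After a restart, rebuild the in-memory state from the durable log.
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    self.cache[record["key"]] = record["value"]

    def put(self, key: str, value: str) -> None:
        # 1. Append the write to the log and force it to stable storage ...
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. ... and only then update the fast in-memory copy.
        self.cache[key] = value

    def get(self, key: str) -> Optional[str]:
        # Reads are served from memory and never touch the disk.
        return self.cache.get(key)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "cache.log")
    cache = WriteThroughCache(path)
    cache.put("invoice:1001", "paid")
    restarted = WriteThroughCache(path)  # simulate a process restart
    print(restarted.get("invoice:1001"))
```

The design choice is to pay the disk latency on every write in exchange for never acknowledging data that could be lost; a real database achieves this far more efficiently with a write-ahead log and batched flushes.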

CW: One of the promises of in-memory technology was the ability to process and analyze huge amounts of data, including unstructured data.

Linster: For this purpose, the database was extended with Foreign Data Wrappers (FDW). This technique allows other data sources to be treated as if they were PostgreSQL tables. This "pluggable architecture" supports both read and write operations. Unstructured data can therefore stay where it belongs, for example in MongoDB or Hadoop stores, and is integrated into PostgreSQL via the Foreign Data Wrapper. Our database thus becomes the integration hub for structured and unstructured data. Instead of building a monster, we integrate data sources.
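In PostgreSQL itself, an external source is registered with `CREATE SERVER` and exposed through `CREATE FOREIGN TABLE`, after which it can be queried and joined like any local table. The Python sketch below only mirrors that idea conceptually and is not the FDW API: a hypothetical `ForeignTable` wraps an arbitrary fetch function (here an in-memory list standing in for documents from a store such as MongoDB) behind the same `scan()` interface as a `LocalTable`, so a join cannot tell the two apart. All class names and sample data are invented for illustration, and the sketch covers reads only.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


class LocalTable:
    """Rows stored 'inside' the database."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def scan(self) -> Iterator[Row]:
        yield from self._rows


class ForeignTable:
    """Hypothetical stand-in for a foreign table: the wrapped fetch function
    knows how to talk to the external store and maps its records to rows."""

    def __init__(self, fetch: Callable[[], Iterable[Row]]) -> None:
        self._fetch = fetch

    def scan(self) -> Iterator[Row]:
        yield from self._fetch()


def join(left, right, key: str) -> List[Row]:
    """Nested-loop join that relies only on scan(), so local and foreign look alike."""
    right_rows = list(right.scan())
    return [{**l, **r} for l in left.scan() for r in right_rows if l[key] == r[key]]


# Invented sample data: documents as an external document store might return them.
external_documents = [
    {"cid": 1, "type": "login"},
    {"cid": 1, "type": "purchase"},
    {"cid": 3, "type": "login"},
]

customers = LocalTable([{"customer_id": 1, "name": "Alice"}, {"customer_id": 2, "name": "Bob"}])
events = ForeignTable(
    lambda: ({"customer_id": d["cid"], "event": d["type"]} for d in external_documents)
)

for row in join(customers, events, "customer_id"):
    print(row)
```

Keeping the external data behind a uniform scan interface is what lets the data stay in its own store while queries treat it as just another table.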

CW: But that certainly puts you in competition with in-memory solutions such as SAP Hana.

Linster: I do not know. We have not yet asked ourselves this question.

Amazon cloud instead of database appliance

CW: Performance comparisons with Hana are circulating on the web, so the question is apparently being asked elsewhere.

Linster: PostgreSQL is not an in-memory database. However, Postgres Plus in particular includes some features that solve problems typically addressed by in-memory databases.

CW: Ultimately, users want to analyze the data. Are there any applications that do this on the basis of your data integration platform?

Linster: There are a number of open source business intelligence (BI) solutions such as Pentaho.

CW: Another current development in the database market is appliances, i.e. preconfigured hardware and software packages. When will the first Postgres appliance arrive?

Linster: There is a cloud offering that comes practically preconfigured, like an appliance. Postgres Plus Cloud Database runs in an Amazon environment and can be up and running within minutes.

CW: Since the NSA scandal, it is hardly conceivable that users would entrust their critical data to a US provider's public cloud.

Linster: That's what I thought until recently. We are currently seeing enormous demand, which a current Gartner study also confirms. For cloud-based database infrastructure, Gartner predicts annual growth of 35 percent. To our own surprise, we can confirm that.
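A 35 percent annual growth rate compounds quickly. The Python sketch below shows the arithmetic, assuming, purely for illustration, that the rate stays constant year after year (which a forecast does not guarantee) and starting from a hypothetical index value of 100.

```python
import math


def compound(value: float, annual_rate: float, years: int) -> float:
    """Value after `years` of growth at a constant annual rate (0.35 means 35 percent)."""
    return value * (1 + annual_rate) ** years


def doubling_time_years(annual_rate: float) -> float:
    """Years until the value doubles at a constant annual rate."""
    return math.log(2) / math.log(1 + annual_rate)


# Hypothetical starting index of 100; the rate is assumed constant.
print(round(compound(100.0, 0.35, 5), 1))
print(round(doubling_time_years(0.35), 2))
```

Under these assumptions, the market would more than quadruple in five years and double roughly every 2.3 years.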

CW: Does that also apply to German customers?

Linster: Unfortunately, the Gartner report does not provide Germany-specific data.

EnterpriseDB, based in Bedford, Massachusetts, focuses all of its business activities on development, services and support for the object-relational open source database PostgreSQL. The core of its offering is "Postgres Plus", a more powerful and feature-rich version than the open source edition. As a rule, it carries the same version number as the version the community provides free of charge (currently version 9.3, see EnterpriseDB revised PostgreSQL).

EnterpriseDB's most important source of income, however, is likely to be migration services for enterprises. The provider concentrates exclusively on moving Oracle databases to PostgreSQL, the cheaper open source alternative. To this end, EnterpriseDB has developed various migration tools intended to standardize and accelerate such projects.

EnterpriseDB was founded in 2004 with the aim of breaking the database oligopoly of Oracle, IBM and Microsoft. That goal has since been scaled back: at the moment, the company is targeting only Oracle, and it even cooperates with IBM. Most recently, the company earned around $100 million with around 250 employees (source: Wikipedia). German customers include Gallinat-Service GmbH, a wholly-owned subsidiary of Gallinat-Bank AG.