Interview with John Merrells

by paul last modified 2006-01-20 19:06

A chat between Paul Everitt and John Merrells. John was a lead engineer on the Netscape Directory Server and later, the founder of the Berkeley DB XML project at Sleepycat Software. He is now an owner and director of Parthenon Computing, a software engineering services company in Oxford, England.

Q: Let's start with a little bit of background.

JM: I've been lots of places and done lots of things. I'll start with my involvement with LDAP.

I joined Octel around 1994 to work on a voice messaging system. It was pretty blue-sky stuff, as the voice messages were stored as emails. At that time, directory systems mostly existed as part of email systems. As such, they were hard to access and extend.

As part of this, I spent time at Microsoft with the Exchange team doing technical liaison work for Octel. One of my Octel colleagues joined the directory team that had recently formed at Netscape and I followed along, as Netscape really understood how directory could become central to the application infrastructure.

I spent several happy years there, but when the AOL/Sun/Netscape deal landed for the server products, I started thinking about the next layer of infrastructure, as directory was being sedimented and commoditized.

Customers were trying to make use of our directory server in lots of unintended ways. That’s OK, as that’s what innovation is all about, but only within the constraints that the LDAP protocol and information model allow. People had a tough time when they wanted different transactional semantics and richer information modeling.

Q: Is this how you got involved with Sleepycat and Berkeley DB?

JM: Yes. At Netscape, my team needed a database. The original LDAP server implementation from the University of Michigan was based on Berkeley DB. It worked well, but at that time it was neither fast enough nor reliable enough. We needed transactions, for example. Otherwise it suited us: we could map the LDAP information model down onto a data model of our own design with minimal overhead and thus great performance.

We talked with Keith and Margo (founders of Sleepycat), and did a deal that launched Sleepycat, around 1996.

Q: So, how did your move from LDAP to XML come about?

JM: We were watching XML as it gathered steam. From the perspective of the LDAP universe it didn’t appear to make much sense: ‘Yet another text-based file format when we’re quite happy with LDIF’ (RFC 2849).

At that time the LDAP working groups at the IETF were working on many RFCs, and getting bogged down in schema discussions. Even an attempt to define the schema for something as seemingly simple as a ‘Person’ proved to be an almost endless rat hole of complexity and politicking (RFC 2798).

By contrast, the XML community seemed to be succeeding in standardizing common schemas. At first I found this perplexing, as the technical problems were no different. But XML had solved a social problem. XML was important because people thought it was important. A wave of self-fulfilling hype had convinced people to form and participate in industry consortia to define schemas for their business objects.

In concert, we observed the codification of text-based protocols on the Internet. LDAP is a binary protocol, based on the ASN.1 syntax. It’s simple by protocol standards, but a binary protocol imposes certain requirements on the developer of any client software. They need an SDK with an API that maps down onto the protocol, or they have to write their own. Implementing and debugging binary protocols involves packet sniffing and quite a lot of experimentation. As network capacity kept increasing rapidly and software components became more distributed, the need grew for simpler protocols that were accessible to programmers who weren’t rocket scientists. The obvious syntax for text-based protocols was XML.

Our interpretation of this was that if XML was going to be on the wire then gateways would be needed to map from the XML protocols onto the old-world binary protocols. But gateways are inefficient, and the Internet always routes around any inefficiency, so why not go straight to a native solution? XML everywhere: XML on the wire, XML in memory, XML as the application information model and XML on the disk.
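[To make the binary-versus-text contrast in this answer concrete, here is a minimal Python sketch. It is not the real LDAP message format: the attribute/value pair, the `<attribute>` element name, and the helper functions are all hypothetical, chosen only for illustration. It encodes the same pair once as ASN.1 BER tag-length-value bytes (short-form lengths only) and once as XML text.]

```python
import xml.etree.ElementTree as ET

# Universal BER tags used in this sketch.
OCTET_STRING = 0x04
SEQUENCE = 0x30  # constructed SEQUENCE


def ber_tlv(tag: int, payload: bytes) -> bytes:
    """Encode one BER tag-length-value triple (short-form length only)."""
    if len(payload) >= 128:
        raise ValueError("this sketch only handles payloads under 128 bytes")
    return bytes([tag, len(payload)]) + payload


def encode_binary(attr: str, value: str) -> bytes:
    """Hypothetical binary message: a SEQUENCE of two OCTET STRINGs."""
    body = ber_tlv(OCTET_STRING, attr.encode("utf-8"))
    body += ber_tlv(OCTET_STRING, value.encode("utf-8"))
    return ber_tlv(SEQUENCE, body)


def encode_text(attr: str, value: str) -> bytes:
    """Hypothetical text message: the same pair as a single XML element."""
    elem = ET.Element("attribute", {"name": attr})
    elem.text = value
    return ET.tostring(elem)


binary_msg = encode_binary("cn", "John")
text_msg = encode_text("cn", "John")

print(binary_msg.hex())   # needs a decoder (or a packet sniffer) to read
print(text_msg.decode())  # readable as-is
```

[The binary form is a third of the size, but reading it requires knowing the tag and length rules; the XML form can be inspected and debugged with nothing more than a text editor.]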

Q: How did Sleepycat get involved with XML?

JM: A couple of us got together and, realizing that Netscape/Sun/AOL wasn't the right place for these ideas, put together some plans. We pitched the idea to Sleepycat. They were interested because they wanted to move up the product stack. The core Berkeley DB product provides a low-level API, which the developer must customize with an information model, data model, indexer, and query processor. That means there are many policy decisions to be made, and quite a lot of sophisticated code to write.

Thus was started the Berkeley DB XML project: a layer above Berkeley DB that provides a higher-level API. It uses XML for the information model and has a custom data model suited to storing and querying XML. The query language is XQuery and the query processor is sophisticated enough to include a query plan optimizer and cost-driven execution planning.

So I designed and implemented DB XML at Sleepycat and built an open-source community around it.

Q: How long were you with Sleepycat?

JM: Around two years. I decided it was time to move back to the UK and moved my involvement in DB XML outside of Sleepycat. I founded a company in Oxford called Parthenon Computing.

Q: How does Parthenon relate to Sleepycat?

JM: Parthenon provides software engineering services to a number of companies, including Sleepycat. For example, we were instrumental in designing and implementing the XQuery processor for version 2.0 of DB XML, and are continuing to provide Sleepycat with resources to improve it further.

Our staff have expertise in building XML processors and in working with XML protocols. My co-founder, Gareth Reakes, is a VP of the Apache organization and is the chair of the Xerces project. His team at Parthenon have implemented pretty much every recommendation that’s come out of the W3C, and are also working with implementations of the OASIS specifications for SAML and XACML.

Q: How does Sleepycat feel about Berkeley DB XML?

JM: They are still investing in it, as the project has been successful. We built an open-source community around the project and the mailing list is more active than those of most other XML database projects. Sleepycat has sales traction and Berkeley DB XML 2.0 offers enough functionality that people are willing to pay for it.

Review: http://www.infoworld.com/article/05/05/23/21TCxmldb_2.html

Q: How does Berkeley DB XML fit in with other engines such as Mark Logic?

JM: I see the primary difference as data-orientation versus document-orientation. [See Ron Bourret’s definitions here: http://www.rpbourret.com/xml/XMLDatabaseProds.htm] The DB XML data model and query processor were designed for data-oriented XML. It’s a sophisticated processor that works hard to map the users’ intent onto a declarative query plan that can be optimized. The Mark Logic query processor is more focused on its superb full text indexing and query support. It’s also a great rapid portal building tool, as the entire user interface can be described in XQuery.

Q: For CMS projects, what is attractive about the technical architecture of DB XML?

JM: DB XML is much better for content-oriented applications than a relational database or a directory of files on a filesystem.

Typically the documents in a CMS application have some metadata fields that most of the queries are directed against. DB XML supports most of the native data types you'd expect to find in that metadata. You can index the metadata for the appropriate type, prepare the queries, and then you get super-quick access to the documents.

With DB XML 2.0 you now get the node-level storage model that provides support for huge documents. It can be even smarter about its query processing and indexing of in-document updates.

In general, people should expect document collections of over a million items to work just fine. In a previous project we took the underlying system up to 100 million items.
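[The idea of indexing typed metadata so that queries never have to scan the documents can be sketched in a few lines of Python. This is a stand-in built on the standard library, not the DB XML API: the sample documents, the element names `meta/author` and `meta/modified`, and the helper functions are all hypothetical.]

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import date

# Hypothetical CMS documents, each with a small metadata header.
DOCS = {
    "doc1": "<article><meta><author>paul</author>"
            "<modified>2006-01-20</modified></meta><body>one</body></article>",
    "doc2": "<article><meta><author>john</author>"
            "<modified>2005-06-15</modified></meta><body>two</body></article>",
    "doc3": "<article><meta><author>paul</author>"
            "<modified>2005-06-01</modified></meta><body>three</body></article>",
}


def build_index(docs, path, convert):
    """Map each typed metadata value found at `path` to the documents holding it."""
    index = defaultdict(set)
    for name, text in docs.items():
        node = ET.fromstring(text).find(path)
        if node is not None and node.text:
            index[convert(node.text)].add(name)
    return index


# Index each field with the type that suits it: strings for authors,
# real dates for modification times so range comparisons are meaningful.
author_index = build_index(DOCS, "meta/author", str)
modified_index = build_index(DOCS, "meta/modified", date.fromisoformat)


def by_author(author):
    """Equality lookup: answered from the index without touching the documents."""
    return sorted(author_index.get(author, ()))


def modified_since(day):
    """Range lookup over the typed date index."""
    return sorted(n for d, names in modified_index.items() if d >= day for n in names)


print(by_author("paul"))
print(modified_since(date(2005, 6, 10)))
```

[The documents are parsed once, when the index is built. After that, each query is a dictionary lookup or a comparison over index keys, which is the property that makes metadata-driven access fast.]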

Q: Final question... in general, how will this change how apps are planned?

JM: First, the Internet just isn't finished yet. XML infrastructure is making distributed applications a reality at last... the democratizing power of the Internet is at it again, this time with applications instead of static information. People are now able to do really interesting things with semi-structured information, especially in the social networking and microcontent publishing fields. [Flickr, del.icio.us, etc.]

‘Digital Identity for the Internet’ is my current passion, but that’s a whole other essay.

Source

http://www.zeapartners.org/articles/200506/johnmerrells