One of the reasons for the success of the web is that HTTP, the protocol underneath it, is text-based. Of course there were other reasons (TCP/IP and HTML are also on the list), but I find it extraordinarily helpful to be able to look at the data and see whether it looks correct.
On my recent BPEL/web services project I made quite extensive use of the port forwarding feature of the Charles proxy. There are other proxy tools out there, but this one worked well for me. I recommend that you try the trial version first because a license is rather expensive, but I am certain it saved me at least a few days of work, so the license was worth the price.
This is how it works: rather than pointing my web service client at the server, I point it at a port that Charles is listening on, and Charles forwards the message to the correct port on the server. When the server responds, it responds to Charles, which then forwards the response to the client. Charles shows you the full content of the messages being sent back and forth. It’s a classic man-in-the-middle attack that you do to yourself! For example, here’s the SOAP request that was sent to my server:
    POST http://127.0.0.1:7501/webservices/2009/Integrity/ HTTP/1.1
    Connection: keep-alive
    Content-Type: text/xml; charset=UTF-8
    Host: 127.0.0.1
    SOAPAction: ""
    Transfer-Encoding: chunked

    298
    <env:Envelope xmlns:env='http://schemas.xmlsoap.org/soap/envelope/'><env:Header></env:Header><env:Body><getItemsByCustomQuery xmlns='http://webservice.mks.com/2009/Integrity'>
    <arg0 transactionId='unique string' xmlns=''>
    <int:Username xmlns:int='http://webservice.mks.com/2009/Integrity/schema'>administrator</int:Username><int:Password xmlns:int='http://webservice.mks.com/2009/Integrity/schema'>xxxxxxxx</int:Password><int:QueryDefinition xmlns:int='http://webservice.mks.com/2009/Integrity/schema'>((field["ID"] = 41) or (field["ID"] = 21))</int:QueryDefinition></arg0>
    </getItemsByCustomQuery></env:Body></env:Envelope>
    0
Whenever you encounter a mysterious problem and you’ve eliminated all the usual suspects, there is no better place to start than the raw data. A quick inspection of the above message tells me whether something is fishy.
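To make the forwarding arrangement described above concrete, here is a minimal sketch of a logging port forwarder in Python. It is not how Charles is implemented (I have no knowledge of its internals); it only illustrates the idea of sitting between client and server and recording what passes through. The names `pump`, `handle` and `start_forwarder`, and the idea of collecting traffic in a plain list, are my own assumptions for illustration.

```python
import socket
import threading


def pump(src, dst, label, log):
    """Copy bytes from src to dst, recording each chunk under the given label."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            log.append((label, data))  # record before forwarding
            dst.sendall(data)
    except OSError:
        pass  # the other side went away; treat it as end of stream
    finally:
        # Pass the end-of-stream on so the peer sees the connection finish.
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def handle(client, target, log):
    """Relay one client connection to the target server and back."""
    server = socket.create_connection(target)
    up = threading.Thread(
        target=pump, args=(client, server, "request", log), daemon=True
    )
    up.start()
    pump(server, client, "response", log)
    up.join()
    client.close()
    server.close()


def start_forwarder(target, log, listen_host="127.0.0.1", listen_port=0):
    """Listen locally, forward every connection to target, return the port."""
    listener = socket.create_server((listen_host, listen_port))

    def accept_loop():
        while True:
            client, _ = listener.accept()
            threading.Thread(
                target=handle, args=(client, target, log), daemon=True
            ).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener.getsockname()[1]
```

To use it, call `start_forwarder(("127.0.0.1", 7501), log)` with an empty list for `log`, point the client at the returned port instead of at the server, and inspect the `("request", ...)` and `("response", ...)` entries afterwards. A real proxy tool adds a great deal on top of this (HTTP parsing, chunk decoding, a user interface), which is what you pay the license for.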
Recently at work a question came up regarding an architectural trade-off: should we add some verbose information to the data model to support a particular feature, or can we support the feature by adding just a little information and calculating the rest? I can’t really get into the details, but the basic problem is straightforward enough: more data, or more computation? There are two ways that calculations can augment a data model:
- Translate raw data into a form more suitable for the consumer of the data (whether the consumer is a human, another database, or an automated agent); and
- Resolve a reference to data by filling in the referenced information.
The first scenario is helpful when pulling information out of a transactional database, especially if the computation can be pushed onto the client. Scalability is the primary consideration. As the computational requirements increase you will eventually need to either delegate the computations to a different computer or start storing the computations so that they are performed only once.
The second scenario arises when data from more than one repository are being presented in a single view or report, or in a database where a simple multi-table join will not work. Joins are normally quite efficient. I’ve seen queries that join 10 tables together that do not seem to suffer from performance issues, and I believe you can go up to 15 tables. As you go to higher numbers of tables, however, and as the nature of the data changes within those tables over time, you may encounter situations where the server chooses the wrong optimization plan – opting to do a full table scan, for instance, when an index lookup is more appropriate. When you know something about the database that the server doesn’t seem to realize, then you can often construct better queries or multiple queries that are more reliably efficient than a multi-table join.
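As a rough illustration of the second scenario, the sketch below resolves a reference two ways: once with a join, and once with two simple queries whose results are stitched together in application code. The `users` and `items` tables, their columns, and the sample rows are hypothetical, and SQLite stands in for whatever database you actually use. It shows only the shape of the technique, not a performance comparison.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with made-up tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE items (id INTEGER PRIMARY KEY, summary TEXT, owner_id INTEGER);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO items VALUES (21, 'Fix login', 1), (41, 'Update docs', 2);
    """)
    return conn


def resolve_with_join(conn):
    """Let the database resolve the owner reference with a join."""
    return conn.execute(
        "SELECT i.id, i.summary, u.name FROM items i "
        "JOIN users u ON u.id = i.owner_id ORDER BY i.id"
    ).fetchall()


def resolve_in_application(conn):
    """Fetch the items, then fill in the owner names with a second query."""
    items = conn.execute(
        "SELECT id, summary, owner_id FROM items ORDER BY id"
    ).fetchall()
    owner_ids = sorted({owner_id for _, _, owner_id in items})
    if not owner_ids:
        return []
    placeholders = ",".join("?" * len(owner_ids))
    names = dict(conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", owner_ids
    ).fetchall())
    # Assumes every owner_id exists in users, as the inner join also requires.
    return [(item_id, summary, names[owner_id])
            for item_id, summary, owner_id in items]
```

Both functions return the same rows for the demo data. The second form gives you control over which lookups happen and in what order, at the cost of extra round trips and more code to maintain.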
The decision of whether to add data to the database (in effect, denormalizing it to some extent) is partly informed by the need to reduce the overall complexity. As with the web and the Charles proxy, the ability to see the raw data is always useful. If the raw data are right there, in your face as it were, then you can solve problems more quickly and understand the big picture better.
But what does raw really mean? Data can seem to be “right there” even if it’s just a trick of the presentation. Think about it: even the notion that you can read a file from the file system is an illusion. What is a file? From the user’s perspective it is an object you can navigate to in a hierarchy of folders and which supports the “open” gesture. If you had a clever GUI that made rows in a database appear to be separate “files”, and the GUI supported the “open file” gesture, which would present the contents of the “file” in a “text editor”, then you would have duplicated the user experience of reading a file from the file system without involving the operating system’s file system at all.
Therefore, when your goal is to reduce complexity, all you need is a user interface that can present the important bits of your technology in some suitable format (text, image, Fourier transform, etc.) in your design, development and debugging environment. If there is no suitable user interface, then you must either build one or consider making your data model simpler.
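The “rows as files” thought experiment can be sketched in a few lines. The class below is purely hypothetical: the `notes` table, its `name` and `body` columns, and the `RowFiles` class are invented for illustration. It offers a folder listing and an “open” operation on top of database rows, which is all the user experience of a file really requires.

```python
import io
import sqlite3


class RowFiles:
    """Present the rows of one table as if they were text files in a folder."""

    def __init__(self, conn):
        self.conn = conn

    def listdir(self):
        """Return the "file names", one per row."""
        rows = self.conn.execute("SELECT name FROM notes ORDER BY name")
        return [name for (name,) in rows]

    def open(self, name):
        """Return a readable text stream backed by the row's contents."""
        row = self.conn.execute(
            "SELECT body FROM notes WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(name)
        return io.StringIO(row[0])


def make_notes_db():
    """Build a tiny in-memory database with made-up "files"."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (name TEXT PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO notes VALUES (?, ?)",
        [("readme.txt", "hello from a row"), ("todo.txt", "buy milk")],
    )
    return conn
```

A caller that only ever uses `listdir()` and `open(name).read()` cannot tell whether it is talking to a file system or a database, which is the point: “raw” is whatever your tools let you look at directly.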
The world of technology advances by making each new technology seem like a simple construction layered on top of supporting infrastructure. It doesn’t matter how abstract or complex the supporting layers are, as long as they are solid and reliable. When I look at the text flying between my client and server I can for the most part assume that any problems I encounter are in the text or the text processing layer, not in the transport layer. This allows me to focus on the task at hand.
