The RESTFulness of Single Page Applications

The frequent pairing of Single Page Applications and RESTful APIs goes well beyond the fact that both have become popular buzzwords. While the architectural styles of REST and Single Page Applications (SPAs) do not depend on each other in principle, it would be difficult to imagine either succeeding without the other in regard to the development of web applications

The reliance of SPAs on REST is perhaps the most immediately apparent. To retrieve and manipulate data, SPAs need to make a great many calls to a server using means other than the traditional pattern of page refreshes. SOAP web services have long been and remain an option. And, indeed, the JavaScript XMLHttpRequest object can return data in a wide variety of formats. However, crafting large, complex SOAP requests and parsing their XML-based results in JavaScript is both complex and error-prone. Building a useful JavaScript library for SOAP generally means making it specific to a single SOAP API. As a web developer, I tend to avoid making SOAP requests from JavaScript, instead making such calls form server-side code and sending the results to the browser through some other mechanism.

In contrast, making calls to RESTful APIs and parsing the JSON-formatted results is a pleasure. jQuery makes it all trivial. And while each SOAP API must be learned anew, RESTful APIs tend to be intuitive and predictable. As a practical matter, RESTful APIs have made the notion of an SPA feasible to a great many more development teams.

Interestingly, it is also true that SPAs make it feasible to build rich web-based applications within a RESTful architecture. While we tend to think of REST as a style of API, that is as an alternative to SOAP, REST is more than that. It is an architectural style defined by a set of six constraints, constraints intended to promote scalability and maintainability. If the architecture of a specific application adheres to these constraints, we may call it RESTful.

Let’s review these constraints:

1)    Client-Server. A REST architecture demands a separation of concerns between UI and data storage. This improves scalability by simplifying the server role and enables components to evolve independently.

2)    Stateless. Each request from client to server must contain all of the information necessary to process the request. The server may not store any session data on behalf of the client, rather the client must store all session data.

3)    Cache. Server responses must be labelled as cacheable or not cacheable.

4)    Uniform Interface. Components must communicate through uniform interfaces to decouple implementations from the services they provide.

5)    Layered System. Each layer in the system may not see beyond the layer to which it is immediately connected, providing for the possibility of many intermediate proxy servers without a dramatic rise in complexity.

6)    Code-on-Demand. Clients may download and execute scripts from a server.

In the article Principled Design of the Modern Web Architecture, the founder of REST, Roy Fielding, and coauthor, Richard Taylor, emphasize the importance of statelessness as one of the key constraints, explaining that it provide the following benefits:

  • Visibility improves because monitoring solutions do not need to look beyond a single request to understand its context.
  • Reliability improves as it is easier to recover from temporary server failures.
  • Scalability improves as servers may more quickly free resources and avoid the burden of managing session state.
  • Statelessness allows intermediary devices to understand requests in isolation, providing caching devices all of the information necessary to determine the reusability of a cached response.

This stateless constraint poses a challenge for application developers. Web applications typically require some form of session state—who is the user, where is he in the process, what are his preferences, what choices has she made. Because data stored within browsers is lost with each page refresh, server-side frameworks (ASP, ASP.Net, JSP, PHP, Ruby on Rails, Express.js for Node) manage session state on the server, generally by embedding a key with each request (via a query string parameter or cookie). The framework uses the key to retrieve session data (saved either in server memory, a database, or a distributed cache) and presents it for consumption by the developer’s code through the framework’s API. The stateless constraint of REST rules out this common pattern, which is used by a great many applications on the web today.

If REST rules out server-side session management and the browser loses session data with each page refresh, the options for RESTful web development narrow considerably. Yes, HTML5 provides for local storage capabilities, but these are not yet universally available in all browsers. The more reliable option is to avoid page refreshes, which means building SPAs. As long as a single page remains loaded in the browser, a developer may store session data in JavaScript objects. The question then becomes how do we manage session data within SPAs, which is an issue address by the emerging client-side frameworks, but that’s a topic for another day.

In short, REST and SPAs make each other feasible.

Posted in REST | Tagged | Leave a comment

The Many Ways of Escaping: Dust.js Filters

Dust.js is a powerful JavaScript templating framework, currently maintained by LinkedIn, that can function both in the browser and server-side. At the moment, I’m working on an introductory presentation on Dust.js for the Silicon Valley Code Camp. And as I was working my way through the documentation, I found the coverage of filters rather sparse, so I spent some time playing with them and decided to write up what I found.

Dust.js templates work by binding a JavaScript object to a template in order to render output. The most basic element within a template is a key, which is a tag such as {name}. When bound to a JavaScript object that has a property called name, Dust.js rendering will replace the tag within the template with the value of the name property.

But what if that value derives from untrusted user input, or contains HTML mark-up, or needs to be included within a URL? In those cases, Dust.js provides a technique known as filters to escape the value.

Filter syntax is simple, just add the pipe symbol and a letter representing the filter to the end of the key tag. For example, {value|h}.

Lets now walk through each filter assuming in each case that we are passing to the Dust.js render function the object specified below:

var obj = {value: “<p><script>alert(‘xss attack’)</script></p>”};

To keep things interesting, I’ve included in the string property called value both html markup and JavaScript.

Default

If you write a key tag without specifying a filter, Dust.js will apply the default filter. The default filter replaces the angle brackets of html tags with html entities for proper display in the browser. This has the effect of thwarting cross site scripting attacks, as the browswer will render rather than execute the JavaScript. The default filter also replaces quotation marks denoting strings with JavaScript with html entities, again reducing the risk of xss attacks.

Template: {value}

Text Output:

default output

Browser Display:

default browser display

In most cases, the default will be the best choice, but it is important to understand the other filtering options for the special cases in which they make sense.

HTML Escaping |h

The html escape filter renders content with html markup such that it appears in the browser as escaped html. It basically double escapes the html. As a result, the ampersands in the default output are replaced with their html entity equivalent (&amp;);

Template: {value|h}

Text Output:

h output

Browser Display:

h browser display

JavaScript Escaping |j

The JavaScript escaping filter renders content so that it can safely be used within a JavaScript string. The escaping prevents the output from breaking out of quoted strings and from breaking the closing </script> tag. This looks like the default filtering, except that the quotation marks are escaped with back slashes.

Template: {value|j}

Text Output:

j output

Browser Display:

j browser display

URL Escaping  |uc, |u

URL escaping renders content usable within a URL by replacing all characters not allowed by the URL specification with % and their ASCII hexadecimal equivalent code.

The |u filter passes the content through the JavaScript method encodeUR(), and the |uc filter passes content through encodeURIComponent(). The method encodeURI() expects a URL that begins with http:// and will not escape this section of the URL. In contrast, encodeURIComponent() will escape all of the content.

Template: {value|uc}

Text Output:

uc output

Suppress Escaping |s

The simplest filter to understand is suppression (|s), which turns off escaping. This allows even JavaScript code to execute as part of the page, a risky approach that should be used only with fully trusted content. Adding the output of the following tag to a page will result in an alert window.

Template: {value|s}

Text Output: <p><script>alert(‘xss attack’)</script></p>

Brower Display:

s browser display

Posted in Dust.js, JavaScript | 3 Comments

My Node.js Misperceptions

After recently studying and playing with Node.js, I realized that I had somehow picked up a couple of misperceptions that were wildly off base:

  • Node.js is solely a platform for web development similar to one of the other popular server-side web development platforms, perhaps PHP or Ruby on Rails.
  • Node.js is innovative only in that it brings JavaScript to the server.

To clarify misperception one, while Node.js provides powerful support for web development, Node.js can be used to build any manner of application. Node.js applications may access file systems, networks, databases, and child processes. Just as with Python and Ruby scripts, you can run Node.js scripts from a command line and can pipe streams back and forth to other applications in the Unix style.

My second misperception turned out to be doubly wrong.

First, Node.js does not break new ground in bringing JavaScript to the server. Netscape started on the Rhino project back in 1997, a JavaScript engine implemented in Java and capable of running outside of the browser. Even Microsoft’s ASP web framework supported server-side JavaScript, a fact that I had long ago forgotten. And to top it off, Node.js does not actually execute JavaScript, for that it relies on Google’s V8 JavaScript engine.

Second, Node.js is indeed very innovative, but its innovation lies not in its server-side implementation of JavaScript, but in how it handles concurrency—the situation wherein new requests arrive before earlier requests have completed.

Of course, this is an old problem, and all modern operating systems offer a built-in solution—multi-threading. A multi-threaded server makes a system call to generate a new thread for each request that it receives. That thread handles the request to completion while other threads handle other requests. The OS manages all of these threads, sharing out slices of CPU cycles to each as it is ready.

Obviously, multi-threading still works as it has for decades, but it does have its problems when it comes to massively scalable web applications. Managing threads is not free. Each has its own associated process, registry values, program counter, call stack. As the number of threads increases, the OS spends more and more CPU cycles managing rather than running the threads. And with web applications, threads spend the vast majority of their time waiting for IO, whether that be calls to files, databases, or network services.

Node.js takes a radically different approach, avoiding the need for OS threads by simply refusing to wait. Rather than making blocking IO calls, wherein the thread stalls waiting for the call to return, almost all IO calls in Node.js are asynchronous, wherein the thread continues without waiting for the call to return. In order to handle the returned data, code in Node.js passes callback functions to each asynchronous IO call. An event loop implemented within Node.js keeps track of these IO requests and calls the callback when the IO becomes available. Managing the event loop costs less than managing multiple threads, as it only requires tracking events and callbacks rather than entire call stacks.

Each Node.js process is single threaded, but it can be scaled to multiple processes and multiple machines just as traditional multi-threaded servers.

One might also argue that Node.js as a platform for server-side web development is innovative in its lack of abstraction; the Node.js programmer handles http requests by forming http responses (what the http protocol is all about), rather than creating pages (PHP, JSP, Asp.Net) or writing models, views, and controllers (Ruby on Rails). Personally, I have a preference for lighter frameworks or at least frameworks that do not force me into certain patterns so I certainly find Node.js appealing despite my initial misperceptions.

Posted in Node.js | 1 Comment

JavaScript Templating as a Cross-Stack Presentation/View Layer

At last night’s Front End Developers United meetup in Mountain View, Jeff Harrell of PayPal and Veena Basavaraj of LinkedIn spoke about the use of Dust.js within their organizations. LinkedIn and PayPal chose Dust.js from among the many JavaScript templating frameworks for reasons that Veena describes in her post Client-side Templating Throwdown. But the real interesting point that both speakers made was not so much about Dust.js, but how standardizing on a JavaScript templating framework across different stacks accelerated development cycles.

The presentation layer in web development generally relies on some sort of templating, which mixes html markup with coding instructions. Traditionally, this is done through a server-side technology such as PHP, JSP, or ASP.Net. More modern server-side MVC frameworks such as Ruby on Rails use templating in the view layer, either Erb or HAML.

With the rise of single-page applications and JavaScript MV* frameworks, JavaScript templating frameworks are growing in popularity. When using AJAX without an MV* framework, one would make an AJAX call to retrieve data from the server in JSON format, insert it into a template, and display the resulting HTML. When using an MV* framework (and the observer pattern), updating a model automatically causes associated views to render using the updated model and a template. One of the most popular MV* frameworks, Backbone.js, uses Underscore.js templating by default but could work just as well with others. (Veena’s post referenced above lists many of the available JavaScript templating frameworks.)

In large organizations that have been around for some time, there are bound to be multiple platforms in use, some representing a future ideal state, others representing legacy code. PayPal runs on C++ and Java and is moving increasingly toward Node.js. And, of course, most web applications use JavaScript in the browser to provide user interactivity. What this means is that an organization could end up using several templating frameworks. Similar content gets rendered using different templates, a situation that risks inconsistency in the user experience.

One possible answer to this dilemma would be to standardize on a single stack, but this is not possible for a number of reasons. Neither PayPal nor LinkedIn can run all of their code in the browser nor could they not run any code in the browser. For security reasons, not all code can run using JavaScript in the browser;  for user experience reasons, some code must run in the browser. Moreover, migrating a complex code base from one platform to another is a major undertaking.

Instead, PayPal and LinkedIn chose to use Dust.js as a common templating framework across all stacks. Directly using Dust.js in the browser or within Node.js poses no problem as it’s a JavaScript framework. From Java code, Dust.js can be accessed through Rhino, a Java-based implementation of JavaScript. And C++ code can access Dust.js through Google’s V8 JavaScript engine. Both Rhino and V8 can be run on the server. What this means is that the entire code base can make use of the same templates, assuring a more consistent user experience and more rapid development cycles.

Since only JavaScript runs in the browser, only a JavaScript-based templating framework could have achieved this unity.

Thanks to Veena and Jeff for the great presentations and to Seth McLaughlin of LinkedIn for organizing the event.

Posted in Dust.js, JavaScript | 4 Comments

Will NoSQL Equal No DBA in Enterprise IT?

During the recent CouchConf in San Francisco, Frank Weigel, a Couchbase product manager, touted the benefits of schemaless databases: without a schema, developers may add fields without going through a DBA. The audience of developers seemed pleased, but I wondered what a DBA might think.

Based on my experience, the roles of DBAs in enterprise IT vary. Some engage in database development. Others focus exclusively on operations. But DBAs generally own production databases; all production schema changes go through them. This control would go away under NoSQL. In the schemaless world of NoSQL, developers have more power and more responsibility. Through their code, developers control the schema, access rights, and data integrity.

What is left for DBAs? There is always operations: monitoring performance and adjusting resources as needed. But these are general system administration tasks, not necessarily requiring experts trained in relational databases. So will NoSQL put DBAs out of work? Will the DBAs, with their cubicle walls covered in Oracle certifications, stand up against this invader?

Let’s not get ahead of ourselves. I’ve not seen any evidence to suggest that NoSQL is taking hold in enterprise IT. With the exception of large, customer-facing systems, enterprises do not need the scalability promised by NoSQL. And many key enterprise systems require transactions that span records, a feature lacking in NoSQL systems. While document-oriented and graph databases might fit well for certain use cases, the value proposition for NoSQL in the enterprise has yet to become compelling.

And despite the disadvantages of fixed schema for developers, the fixed schema of relational systems has a larger value within enterprises. Ideally, these schema define in one location the rules of data integretity, which makes it easier to audit all changes.

While I could foresee conflicts between developers and DBAs over NoSQL in enterprises, that does not need to be the case. Most enterprise systems will continue to run on relational databases. The DBA role of managing these systems will not go away anytime soon. If enterprises do see any value in NoSQL for certain use cases, they should define policies around which sorts of applications might use this new technology. NoSQL, as many have commented, may well stand for Not Only SQL, in which case it can coexist with DBAs within the confines of corporate policies and procedures.

Posted in NoSQL | 1 Comment

Thoughts on CouchConf: Data Modeling for Flow

Last Friday, I had the pleasure of attending CouchConf in San Francisco to learn about the new and exciting features of Couchbase 2.0. With the new release, Couchbase becomes a full document oriented database . Meanwhile, Couchbase keeps its focus on performance, reporting some very impressive performance statistics comparing Couchbase to other NoSQL databases. Of all the exciting presentations at the conference, I most enjoyed the session on data modeling by Chris Anderson, Chief Architect for Mobile at Couchbase.

Document-oriented databases such as Couchbase are schemaless, making data modeling more flexible. While in the relational world there is generally one right answer for how to divide data across tables to achieve fifth normal form, in the document-oriented world there are fewer clear cut rules. For example, whether blog comments get stored with the post or in separate documents depends more on expected usage than any inherent constraints of the system or rigid design rules.

Without the schema constraints of the relational model, developers become free to think more about data flow. Chris Anderson uses phrases such as “emergent schema” and “runtime-driven schema” to describe this new emphasis. He concludes “thinking about throughput, latency, update and read patterns is the new data modeling.”

In my experience, emphasizing data flow, or data in motion as opposed to data in rest, makes developers better at understanding user needs. In my early career, when playing the role of business analyst, I went too far in allowing the relational model to shape my thinking about business requirements. With a relational schema always in the back of my mind, I focused my questions on whether data relations were one-to-one, one-to-many, or many-to-many. While I did write use cases, normalization rules shaped my thoughts.

Then several years ago, I worked with a colleague who focused more on process. He’d start conversations with “…now walk me through your process.” And that led us to a more fruitful discovery. At some point, I’d ask questions to better understand the data model, but we were in a better place when I started to ask those questions.

Of course, developers have always thought about use cases and update and read patterns. But if Chris Anderson is right and document databases make it easier to focus on data in motion rather than data at rest, then I think we’ll do a better job of thinking about user needs.

Posted in Couchbase, NoSQL | Leave a comment

Not NoSQL, NewSQL

This Monday at the Silicon Valley NewSQL meetup in Mountain View, Michael Stonebreaker took turns bashing both the established relational databases (Oracle, DB2, SQL Server, PostgreSQL)  and the NoSQL newcomers (MongoDB, Cassandra, Riak), proposing a third alternative, VoltDB, a NewSQL database.

Stonebreaker—a leading researcher in the field of database design, former UC Berkeley professor and now MIT professor, winner of the IEEE John von Neumann Medal, and a designer of PostgreSQL—argued that the established databases have become legacy bloatware incapable of scaling to modern requirements without complete redesign. According to Stonebreaker’s research, these systems, all of which follow a similar design, use only a small percentage of CPU cycles (about 4%) on useful work. The bulk of CPU cycles go to overhead, divided fairly evenly into four categories of about 24% each:

  • Managing the buffer pool of disk pages cached in memory
  • Multi-row locking for transactions
  • Latching memory objects such as b-trees to prevent corruption in a multi-threaded environment
  • Write-ahead logging to disk

The NoSQL databases, according to Stonebreaker, solve these problems, but they do so by jettisoning SQL and ACID. Giving up SQL, Stonebreaker argued, makes no sense. The SQL standard has proven itself as a time-saving high level language that successfully depends on compilation to generate low level commands. Going backwards to row-level commands and unique APIs for each database, Stonebreaker claimed, is comparable to giving up C for assembler.

Stonebreaker also argued against giving up ACID, a requirement (or potential requirement) for almost all applications. If a database does not provide ACID, application developers will need to write this complex code themselves.

Stonebreaker proposed instead his product, VoltDB, a relational database that supports ACID and most of the SQL standard. VoltDB avoids the overhead of buffer management by keeping all data in memory. It avoids the overhead of row locking and memory object latching by using a single thread per partition. Only one thread touches memory objects, and transactions run sequentially on the one thread. And instead of write-ahead logging of data, VoltDB takes periodic snapshots of the database and logs only commands, which is faster but still capable of rebuilding the database from disk in case of failure. (See the VoltDB Technical Overview for more details.)

Like most of the NoSQL databases, VoltDB supports scalability across commodity hardware by sharding data based on keys. According to Stonebreaker, the choice of key is critical to performance, as joins and transactions that cross partitions degrade performance, a problem that cannot be solved even by eliminating the overhead of traditional RDMS. VoltDB makes scaling possible, but application developers must still give careful thought to how to partition data so that most operations only touch a single partition.

One could argue that this latter admission proves the NoSQL case against relational databases, namely that a database supporting ACID cannot scale. VoltDB scales only as long as transactions do not cross partitions. In a sense, VoltDB can be thought of as many small, fast databases that support ACID or one large database that supports ACID but does not scale. In other words, VoltDB does not solve the CAP dilemma.

Certainly, VoltDB will make sense for certain use cases, where there is a need for lightning speed and transactional integrity, where data can be sharded into largely autonomous partitions, and where VoltDB’s only partial implementation of the SQL standard fulfills requirements. But VoltDB will not replace traditional RDMS anytime soon, as it lacks much of the functionality that enterprises expect, bloatware though that might be.

Nor will VoltDB eliminate the demand for NoSQL, because many organizations will find a NoSQL database out there that fits well with its specific requirements. If all you need is a key-value store, why not choose a database that specializes in this function. If your data takes the shape of a graph, why not choose a database tailor made to the purpose.

Moreover, Stonebreaker overstates the case against NoSQL APIs. Yes, SQL is a proven high level language for data access, but it does not always fit well with object models used by application developers, which is why organizations have turned to solutions for object-relational mapping.  In many ways, objects map more intuitively to document-oriented databases than to SQL. Besides, a central tenet of NoSQL is the embrace of variety, the rejection of the one size fits all mentality. With that in mind, the diversity of NoSQL APIs is a benefit; the diversity allows developers to choose the one or more APIs that best fit a specific requirement.

Whether or not we accept all of Stonebreaker’s claims, the VoltDB retake on the traditional RDMS makes clear that these are exciting times in the world of databases.

Posted in NewSQL, NoSQL | 6 Comments