Research

Broadly speaking, I study distributed systems—how to make them faster, more robust, and more secure. Much of my work focuses on large-scale web services, and on how to design principled system interfaces for those services. Here are some of the specific topics that currently interest me:

Storage architectures for large-scale web services: What is the best way to organize user data for services that must scale to millions of users? For example, how can we maximize I/O throughput and minimize I/O latency for block-based storage abstractions? How can datacenters take advantage of new storage technologies like SSDs and shingled magnetic drives? How does application design change when cloud storage is user-centric instead of application-centric, i.e., when a user's data is located in a single, user-controlled storage silo instead of scattered across multiple, application-controlled silos?

Client-side web security: How can developers isolate distrusted JavaScript code while still allowing rich interactions between third-party libraries and the enclosing web page? The Pivot system is one attempt at a solution. However, I believe that there are more fundamental solutions that involve the creation of a new scripting language for the web (a JavaScript++, if you like analogies to C++, which would be troubling because C++ is a nightmarish Pandora's box of emotional trauma, but I clearly digress). Creating this new scripting language will require contributions from systems research as well as programming language research.

Secure delegation of sensitive user data: On the server side, users have little influence over how their data is shared within different parts of an application, or across different applications that may belong to different companies. Access control mechanisms like OAuth provide users with a modicum of control, but those mechanisms are plagued with security vulnerabilities, and they do not provide strong, cryptographic limits on how third parties can manipulate user data. Thus, in practice, users cede control of their data to service providers. I'm interested in using techniques like attribute-based encryption, garbled circuits, and homomorphic encryption to provide users with cryptographically strong control over which third parties gain access to particular pieces of user data.

Web performance and analysis: To load a web page, a client-side browser must fetch a large number of objects (e.g., HTML files, images, and JavaScript files). Understanding how network conditions impact fetch performance is crucial for understanding the overall page load process. Once a page is loaded, that page generates a large number of JavaScript events; in turn, those events may trigger server-side events. By studying these asynchronous, wide-area event chains, we can identify which parts of the application pipeline are slow, and try to optimize them. Using data flow analysis of the dependencies between client-side HTML, CSS, and JavaScript files, we can present browsers with a fetch schedule for those files that minimizes page load time while still respecting the data flows.
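To make the scheduling idea concrete, here is a minimal Python sketch (the file names and dependency graph are invented for illustration, not taken from any real page). Given a map from each page object to the objects it depends on, a topological sort groups the objects into rounds: everything within a round has its dependencies satisfied and can be fetched in parallel, which is the essence of a fetch schedule that respects the data flows.

```python
from collections import defaultdict, deque

def fetch_schedule(deps):
    """Group page objects into parallel fetch rounds.

    `deps` maps each object to the list of objects it depends on.
    Objects in the same round have all dependencies satisfied and
    can be fetched concurrently; a cycle raises ValueError.
    """
    # Kahn's algorithm: count unmet dependencies, index dependents.
    indegree = {obj: len(d) for obj, d in deps.items()}
    dependents = defaultdict(list)
    for obj, d in deps.items():
        for parent in d:
            dependents[parent].append(obj)

    ready = deque(obj for obj, n in indegree.items() if n == 0)
    rounds, scheduled = [], 0
    while ready:
        current = sorted(ready)  # one parallel fetch round
        ready.clear()
        rounds.append(current)
        scheduled += len(current)
        for obj in current:
            for child in dependents[obj]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    if scheduled != len(deps):
        raise ValueError("cyclic dependency among page objects")
    return rounds

# Hypothetical page: the stylesheet and script depend on the HTML,
# and each of them in turn gates one more object.
deps = {
    "index.html": [],
    "style.css": ["index.html"],
    "app.js": ["index.html"],
    "lib.js": ["app.js"],
    "hero.png": ["style.css"],
}
print(fetch_schedule(deps))
```

A real scheduler would also weigh object sizes and network conditions when ordering fetches within a round; this sketch only captures the dependency constraint.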