R. Netravali and J. Mickens, “Reverb: Speculative Debugging for Web Applications,” in SOCC 2019, Forthcoming.Abstract
Bugs are common in web pages. Unfortunately, traditional debugging primitives like breakpoints are crude tools for understanding the asynchronous, wide-area data flows that bind client-side JavaScript code and server-side application logic. In this paper, we describe Reverb, a powerful new debugger that makes data flows explicit and queryable. Reverb provides three novel features. First, Reverb tracks precise value provenance, allowing a developer to quickly identify the reads and writes to JavaScript state that affected a particular variable’s value. Second, Reverb enables speculative bug fix analysis. A developer can replay a program to a certain point, change code or data in the program, and then resume the replay; Reverb uses the remaining log of nondeterministic events to influence the post-edit replay, allowing the developer to investigate whether the hypothesized bug fix would have helped the original execution run. Third, Reverb supports wide-area debugging for applications whose server-side components use event-driven architectures. By tracking the data flows between clients and servers, Reverb enables speculative replaying of the distributed application.
R. Netravali, A. Sivaraman, J. Mickens, and H. Balakrishnan, “WatchTower: Fast, Secure Mobile Page Loads Using Remote Dependency Resolution,” in MobiSys, 2019. PaperAbstract
Remote dependency resolution (RDR) is a proxy-driven scheme for reducing mobile page load times; a proxy loads a requested page using a local browser, fetching the page’s resources over fast proxy-origin links instead of a client’s slow last-mile links. In this paper, we describe two fundamental challenges to efficient RDR proxying: the increasing popularity of encrypted HTTPS content, and the fact that, due to time-dependent network conditions and page properties, RDR proxying can actually increase load times. We solve these problems by introducing a new, secure proxying scheme for HTTPS traffic, and by implementing WatchTower, a selective proxying system that uses dynamic models of network conditions and page structures to only enable RDR when it is predicted to help. WatchTower loads pages 21.2%–41.3% faster than state-of-the-art proxies and server push systems, while preserving end-to-end HTTPS security.
F. Wang, R. Ko, and J. Mickens, “Riverbed: Enforcing User-defined Privacy Constraints in Distributed Web Services,” NSDI. 2019. PaperAbstract
Riverbed is a new framework for building privacy-respecting web services. Using a simple policy language, users define restrictions on how a remote service can process and store sensitive data. A transparent  Riverbed proxy sits between a user's front-end client (e.g., a web browser) and the back-end server code. The back-end code remotely attests to the proxy, demonstrating that the code respects user policies; in particular, the server code attests that it executes within a  Riverbed-compatible managed runtime that uses IFC to enforce user policies. If attestation succeeds, the proxy releases the user's data, tagging it with the user-defined policies. On the server-side, the  Riverbed runtime places all data with compatible policies into the same universe (i.e., the same isolated instance of the full web service). The universe mechanism allows  Riverbed to work with unmodified, legacy software; unlike prior IFC systems,  Riverbed does not require developers to reason about security lattices, or manually annotate code with labels.  Riverbed imposes only modest performance overheads, with worst-case slowdowns of 10% for several real applications.
J. Larisch, J. Mickens, and E. Kohler, “Alto: Lightweight VMs using Virtualization-aware Managed Runtimes,” in International Conference on Managed Languages & Runtimes (ManLang), Linz, Austria, 2018. PaperAbstract
Virtualization enables datacenter operators to safely run computations that belong to untrusted tenants. An ideal virtual machine has three properties: a small memory footprint; strong isolation from other VMs and the host OS; and the ability to maintain in-memory state across client requests. Unfortunately, modern virtualization technologies cannot provide all three properties at once. In this paper, we explain why, and propose a new virtualization approach, called Alto, that virtualizes at the layer of a managed runtime interface. Through careful design of (1) the application-facing managed interface and (2) the internal runtime architecture, Alto provides VMs that are small, secure, and stateful. Conveniently, Alto also simplifies VM operations like suspension, migration, and resumption. We provide several details about the proposed design, and discuss the remaining challenges that must be solved to fully realize the Alto vision.
R. Ko and J. Mickens, “DeadBolt: Securing IoT Deployments,” in Applied Networking Research Workshop, Montreal, Quebec, Canada, 2018. PaperAbstract

In an IoT deployment, network access policies are a critical determinant of security. Devices that run unpatched or insecure code should not be allowed to communicate with the outside world; furthermore, unknown external hosts should not be able to contact sensitive devices that reside within an IoT deployment.

In this paper, we introduce DeadBolt, a new security framework for managing IoT network access. DeadBolt hides all of the devices in an IoT deployment behind an access point that implements deny-by-default policies for both incoming and outgoing traffic. The DeadBolt AP also forces high-end IoT devices to use remote attestation to gain network access; attestation allows the devices to prove that they run up-to-date, trusted software. For lightweight IoT devices which lack the ability to attest, the DeadBolt AP uses virtual drivers (essentially, security-focused virtual network functions) to protect lightweight device traffic. For example, a virtual driver might provide network intrusion detection, or encrypt device traffic that is natively cleartext. Using these techniques, and several others, DeadBolt can prevent realistic attacks while imposing only modest performance costs.

R. Netravali and J. Mickens, “Remote-Control Caching: Proxy-based URL Rewriting to Decrease Mobile Browsing Bandwidth,” in HotMobile, Tempe, Arizona, 2018. PaperAbstract

Mobile browsers suffer from unnecessary cache misses. The same binary object is often named by multiple URLs which correspond to different cache keys. Furthermore, servers frequently mark objects as uncacheable, even though the objects' content is stable over time.

In this paper, we quantify the excess network traffic that mobile devices generate due to inefficient caching logic. We demonstrate that mobile page loads suffer from more redundant transfers than reported by prior studies which focused on desktop page loads. We then propose a new scheme, called Remote-Control Caching (RC2), in which web proxies (owned by mobile carriers or device manufacturers) track the aliasing relationships between the objects that a client has fetched, and the URLs that were used to fetch those objects. Leveraging knowledge of those aliases, a proxy dynamically rewrites the URLs inside of pages, allowing the client's local browser cache to satisfy a larger fraction of requests. Using a concrete implementation of RC2, we show that, for two loads of a page separated by 8 hours, RC2 reduces bandwidth consumption by a median of 52%. As a result, mobile browsers can save a median of 469 KB per warm-cache page load.

R. Netravali, V. Nathan, J. Mickens, and H. Balakrishnan, “Vesper: Measuring Time-to-Interactivity for Web Pages,” in NSDI, Renton, WA, 2018. PaperAbstract
Everyone agrees that web pages should load more quickly. However, a good definition for ``page load time'' is elusive. We argue that, in a modern web page, load times should be defined with respect to interactivity: a page is ``loaded'' when above-the-fold content is visible and the associated JavaScript event handling state is functional. We define a new load time metric, called Ready Index, which explicitly captures our proposed notion of load time. Defining the metric is straightforward, but actually measuring it is not, since web developers do not explicitly annotate the JavaScript state and the DOM elements which support interactivity. To solve this problem, we introduce Vesper, a tool which rewrites a page's JavaScript and HTML to automatically discover the page's interactive state. Armed with Vesper, we compare Ready Index to prior load time metrics like Speed Index; we find that, across a variety of network conditions, prior metrics underestimate or overestimate the true load time for a page by 24%--64%. We also introduce a tool that optimizes a page for Ready Index, decreasing the median time to page interactivity by 29%--32%.
R. Netravali and J. Mickens, “Prophecy: Accelerating Mobile Page Loads Using Final-state Write Logs,” in NSDI, Renton, WA, 2018. PaperAbstract
Web browsing on mobile devices is expensive in terms of battery drainage and bandwidth consumption. Mobile pages also frequently suffer from long load times due to high-latency cellular connections. In this paper, we introduce Prophecy, a new acceleration technology for mobile pages. Prophecy simultaneously reduces energy costs, bandwidth consumption, and page load times. In Prophecy, web servers precompute the JavaScript heap and the DOM tree for a page; when a mobile browser requests the page, the server returns a write log that contains a single write per JavaScript variable or DOM node. The mobile browser replays the writes to quickly reconstruct the final page state, eliding unnecessary intermediate computations. Prophecy's server-side component generates write logs by tracking low-level data flows between the JavaScript heap and the DOM. Using knowledge of these flows, Prophecy enables optimizations that are impossible for prior web accelerators; for example, Prophecy can generate write logs that interleave DOM construction and JavaScript heap construction, allowing interactive page elements to become functional immediately after they become visible to the mobile user. Experiments with real pages and real phones show that Prophecy reduces median page load time by 53%, energy expenditure by 36%, and bandwidth costs by 21%.
F. Wang and J. Mickens, “Veil: Private Browsing Semantics Without Browser-side Assistance,” in NDSS, San Diego, CA, 2018. PaperAbstract

All popular web browsers offer a "private browsing mode." After a private session terminates, the browser is supposed to remove client-side evidence that the session occurred. Unfortunately, browsers still leak information through the file system, the browser cache, the DNS cache, and on-disk reflections of RAM such as the swap file.

Veil is a new deployment framework that allows web developers to prevent these information leaks, or at least reduce their likelihood. Veil leverages the fact that, even though developers do not control the client-side browser implementation, developers do control 1) the content that is sent to those browsers, and 2) the servers which deliver that content. Veil web sites collectively store their content on Veil's blinding servers instead of on individual, site-specific servers. To publish a new page, developers pass their HTML, CSS, and JavaScript files to Veil's compiler; the compiler transforms the URLs in the content so that, when the page loads on a user's browser, URLs are derived from a secret user key. The blinding service and the Veil page exchange encrypted data that is also protected by the user's key. The result is that Veil pages can safely store encrypted content in the browser cache; furthermore, the URLs exposed to system interfaces like the DNS cache are unintelligible to attackers who do not possess the user's key. To protect against post-session inspection of swap file artifacts, Veil uses heap walking (which minimizes the likelihood that secret data is paged out), content mutation (which garbles in-memory artifacts if they do get swapped out), and DOM hiding (which prevents the browser from learning site-specific HTML, CSS, and JavaScript content in the first place). Veil pages load on unmodified commodity browsers, allowing developers to provide stronger semantics for private browsing without forcing users to install or reconfigure their machines. Veil provides these guarantees even if the user does not visit a page using a browser's native privacy mode; indeed, Veil's protections are stronger than what the browser alone can provide.

F. Wang, Y. Joung, and J. Mickens, “Cobweb: Practical Remote Attestation Using Contextual Graphs,” in SysTEX, Shanghai, China, 2017. PaperAbstract

In theory, remote attestation is a powerful primitive for building distributed systems atop untrusting peers. Unfortunately, the canonical attestation framework defined by the Trusted Computing Group is insufficient to express rich contextual relationships between client-side software components. Thus, attestors and verifiers must rely on ad-hoc mechanisms to handle real-world attestation challenges like attestors that load executables in nondeterministic orders, or verifiers that require attestors to track dynamic information flows between attestor-side components.

In this paper, we survey these practical attestation challenges. We then describe a newattestation framework, named Cobweb, which handles these challenges. The key insight is that real-world attestation is a graph problem. An attestation message is a graph in which each vertex is a software component, and has one or more labels, e.g., the hash value of the component, or the raw file data, or a signature over that data. Each edge in an attestation graph is a contextual relationship, like the passage of time, or a parent/child fork() relationship, or a sender/receiver IPC relationship. Cobweb’s verifier-side policies are graph predicates which analyze contextual relationships. Experiments with real, complex software stacks demonstrate that Cobweb’s abstractions are generic and can support a variety of real-world policies.

R. Netravali, A. Goyal, J. Mickens, and H. Balakrishnan, “Polaris: Faster Page Loads Using Fine-grained Dependency Tracking,” in NSDI, Santa Clara, CA, 2016. PaperAbstract

To load a web page, a browser must fetch and evaluate objects like HTML files and JavaScript source code. Evaluating an object can result in additional objects being fetched and evaluated. Thus, loading a web page requires a browser to resolve a dependency graph; this partial ordering constrains the sequence in which a browser can process individual objects. Unfortunately, many edges in a page’s dependency graph are unobservable by today’s browsers. To avoid violating these hidden dependencies, browsers make conservative assumptions about which objects to process next, leaving the network and CPU underutilized.

We provide two contributions. First, using a new measurement platform called Scout that tracks fine-grained data flows across the JavaScript heap and the DOM, we show that prior, coarse-grained dependency analyzers miss crucial edges: across a test corpus of 200 pages, prior approaches miss 30% of edges at the median, and 118% at the 95th percentile. Second, we quantify the benefits of exposing these new edges to web browsers. We introduce Polaris, a dynamic client-side scheduler that is written in JavaScript and runs on unmodified browsers; using a fully automatic compiler, servers can translate normal pages into ones that load themselves with Polaris. Polaris uses fine-grained dependency graphs to dynamically determine which objects to load, and when. Since Polaris’ graphs have no missing edges, Polaris can aggressively fetch objects in a way that minimizes network round trips. Experiments in a variety of network conditions show that Polaris decreases page load times by 34% at the median, and 59% at the 95th percentile.

F. Wang, J. Mickens, N. Zeldovich, and V. Vaikuntanathan, “Sieve: Cryptographically Enforced Access Control for User Data in Untrusted Clouds,” in NSDI, Santa Clara, CA, 2016. PaperAbstract

Modern web services rob users of low-level control over cloud storage—a user’s single logical data set is scattered across multiple storage silos whose access controls are set by web services, not users. The consequence is that users lack the ultimate authority to determine how their data is shared with other web services.

In this paper, we introduce Sieve, a new platform which selectively (and securely) exposes user data to web services. Sieve has a user-centric storage model: each user uploads encrypted data to a single cloud store, and by default, only the user knows the decryption keys. Given this storage model, Sieve defines an infrastructure to support rich, legacy web applications. Using attribute-based encryption, Sieve allows users to define intuitively understandable access policies that are cryptographically enforceable. Using key homomorphism, Sieve can reencrypt user data on storage providers in situ, revoking decryption keys from web services without revealing new keys to the storage provider. Using secret sharing and two factor authentication, Sieve protects cryptographic secrets against the loss of user devices like smartphones and laptops. The result is that users can enjoy rich, legacy web applications, while benefiting from cryptographically strong controls over which data a web service can access.

D. Li, J. Mickens, S. Nath, and L. Ravindranath, “Domino: Understanding Wide-Area, Asynchronous Event Causality in Web Applications,” in SOCC (short paper), Kohala Coast, Hawai'i, 2015. PaperAbstract

In a modern web application, a single high-level action like a mouse click triggers a flurry of asynchronous events on the client browser and remote web servers. We introduce Domino, a new tool which automatically captures and analyzes end-to-end, asynchronous causal relationship of events that span clients and servers. Using Domino, we found uncharacteristically long event chains in Bing Maps, discovered data races in the WinJS implementation of promises, and developed a new server-side scheduling algorithm for reducing the tail latency of server responses.

T. Chajed, et al., “Amber: Decoupling User Data from Web Applications,” in HotOS, Kartause Ittingen, Switzerland, 2015. PaperAbstract

User-generated content is becoming increasingly common on the Web, but current web applications isolate their users’ data, enabling only restricted sharing and cross-service integration. We believe users should be able to share their data seamlessly between their applications and with other users. To that end, we propose Amber, an architecture that decouples users’ data from applications, while providing applications with powerful global queries to find user data. We demonstrate how multi-user applications, such as e-mail, can use these global queries to efficiently collect and monitor relevant data created by other users. Amber puts users in control of which applications they use with their data and with whom it is shared, and enables a new class of applications by removing the artificial partitioning of users’ data by application.

R. Netravali, et al., “Mahimahi: Accurate Record-and-Replay for HTTP,” in USENIX ATC, Santa Clara, CA, 2015. PaperAbstract

This paper presents Mahimahi, a framework to record traffic from HTTP-based applications, and later replay it under emulated network conditions. Mahimahi improves upon prior record-and-replay frameworks in three ways. First, it is more accurate because it carefully emulates the multi-server nature of Web applications, present in 98% of the Alexa US Top 500 Web pages. Second, it isolates its own network traffic, allowing multiple Mahimahi instances emulating different networks to run concurrently without mutual interference. And third, it is designed as a set of composable shells, providing ease-of-use and extensibility.

We evaluate Mahimahi by: (1) analyzing the performance of HTTP/1.1, SPDY, and QUIC on a corpus of 500 sites, (2) using Mahimahi to understand the reasons why these protocols are suboptimal, (3) developing Cumulus, a cloud-based browser designed to overcome these problems, using Mahimahi both to implement Cumulus by extending one of its shells, and to evaluate it, (4) using Mahimahi to evaluate HTTP multiplexing protocols on multiple performance metrics (page load time and speed index), and (5) describing how others have used Mahimahi.

J. van den Hooff, D. Lazar, and J. Mickens, “Mjölnir: The Magical Web Application Hammer,” in APSys, Tokyo, Japan, 2015. PaperAbstract

Conventional wisdom suggests that rich, large-scale web applications are difficult to build and maintain. An implicit assumption behind this intuition is that a large web application requires massive numbers of servers, and complicated, one-off back-end architectures. We provide empirical evidence to disprove this intuition. We then propose new programming abstractions and a new deployment model that reduce the overhead of building and running web services.

J. Mickens, et al., “Blizzard: Fast, Cloud-scale Block Storage for Cloud-oblivious Applications,” in NSDI, Seattle, WA, 2014. PaperAbstract

Blizzard is a high-performance block store that exposes cloud storage to cloud-oblivious POSIX and Win32 applications. Blizzard connects clients and servers using a network with full-bisection bandwidth, allowing clients to access any remote disk as fast as if it were local. Using a novel striping scheme, Blizzard exposes high disk parallelism to both sequential and random workloads; also, by decoupling the durability and ordering requirements expressed by flush requests, Blizzard can commit writes out-of-order, providing high performance and crash consistency to applications that issue many small, random IOs. Blizzard’s virtual disk drive, which clients mount like a normal physical one, provides maximum throughputs of 1200 MB/s, and can improve the performance of unmodified, cloud-oblivious applications by 2x–10x. Compared to EBS, a commercially available, state-of-the-art virtual drive for cloud applications, Blizzard can improve SQL server IOp rates by seven-fold while still providing crash consistency.

J. Mickens, “Pivot: Fast, Synchronous Mashup Isolation Using Generator Chains,” in IEEE Symposium on Security and Privacy, San Jose, CA, 2014. PaperAbstract

Pivot is a new JavaScript isolation framework for web applications. Pivot uses iframes as its low-level isolation containers, but it uses code rewriting to implement synchronous cross-domain interfaces atop the asynchronous cross-frame postMessage() primitive. Pivot layers a distributed scheduling abstraction across the frames, essentially treating each frame as a thread which can invoke RPCs that are serviced by external threads. By rewriting JavaScript call sites, Pivot can detect RPC invocations; Pivot exchanges RPC requests and responses via postMessage(), and it pauses and restarts frames using a novel rewriting technique that translates each frame’s JavaScript code into a restartable generator function. By leveraging both iframes and rewriting, Pivot does not need to rewrite all code, providing an order-of-magnitude performance improvement over rewriting-only solutions. Compared to iframe-only approaches, Pivot provides synchronous RPC semantics, which developers typically prefer over asynchronous RPCs. Pivot also allows developers to use the full, unrestricted JavaScript language, including powerful statements like eval().

J. R. Lorch, B. Parno, J. Mickens, M. Raykova, and J. Schiffman, “Shroud: Ensuring Private Access to Large-Scale Data in the Data Center,” in FAST, San Jose, CA, 2013. PaperAbstract

Recent events have shown online service providers the perils of possessing private information about users. Encrypting data mitigates but does not eliminate this threat: the pattern of data accesses still reveals information. Thus, we present Shroud, a general storage system that hides data access patterns from the servers running it, protecting user privacy. Shroud functions as a virtual disk with a new privacy guarantee: the user can look up a block without revealing the block’s address. Such a virtual disk can be used for many purposes, including map lookup, microblog search, and social networking.

Shroud aggressively targets hiding accesses among hundreds of terabytes of data. We achieve our goals by adapting oblivious RAM algorithms to enable large-scale parallelization. Specifically, we show, via new techniques such as oblivious aggregation, how to securely use many inexpensive secure coprocessors acting in parallel to improve request latency. Our evaluation combines large-scale emulation with an implementation on secure coprocessors and suggests that these adaptations bring private data access closer to practicality.

K. Lin, D. Chu, J. Mickens, L. Zhuang, F. Zhao, and J. Qiu, “Gibraltar: Exposing Hardware Devices to Web Pages Using AJAX,” in USENIX WebApps, Boston, MA, 2012. PaperAbstract

Gibraltar is a new framework for exposing hardware devices to web pages. Gibraltar’s fundamental insight is that JavaScript’s AJAX facility can be used as a hardware access protocol. Instead of relying on the browser to mediate device interactions, Gibraltar sandboxes the browser and uses a small device server to handle hardware requests. The server uses native code to interact with devices, and it exports a standard web server interface on the localhost. To access hardware, web pages send device commands to the server using HTTP requests; the server returns hardware data via HTTP responses.

Using a client-side JavaScript library, we build a simple yet powerful device API atop this HTTP transfer protocol. The API is particularly useful to developers of mobile web pages, since mobile platforms like cell phones have an increasingly wide array of sensors that, prior to Gibraltar, were only accessible via native code plugins or the limited, inconsistent APIs provided by HTML5. Our implementation of Gibraltar on Android shows that Gibraltar provides stronger security guarantees than HTML5; furthermore, it shows that HTTP is responsive enough to support interactive web pages that perform frequent hardware accesses. Gibraltar also supports an HTML5 compatibility layer that implements the HTML5 interface but provides Gibraltar’s stronger security.