Charlie Curtsinger

Research

Paper
Source

Riker: Always-Correct and Fast Incremental Builds from Simple Specifications USENIX ATC 2022 (Best Paper)
Charlie Curtsinger and Daniel W. Barowy
abstract

Build systems are responsible for building software correctly and quickly. Unfortunately, traditional build tools like make are correct and fast only when developers precisely enumerate dependencies for every incremental build step. Forward build systems improve correctness over traditional build tools by discovering dependencies automatically, but existing forward build tools have two fundamental flaws. First, they are incorrect; existing forward build tools miss dependencies because their models of system state are incomplete. Second, they rely on users to manually specify incremental build steps, increasing the programmer burden for fast builds.

This paper introduces Riker, a forward build system that guarantees fast, correct builds. Riker builds are easy to specify; in many cases a single command such as gcc *.c suffices. From these simple specifications, Riker automatically discovers fast incremental rebuild opportunities. Riker models the entire POSIX filesystem—not just files, but directories, pipes, and so on. This model guarantees that every dependency is checked on every build so every output is correct.

We use Riker to build 14 open source packages including LLVM and memcached. Riker incurs a median overhead of 8.8% on the initial full build. On average, Riker's incremental builds realize 94% of make's incremental speedup with no manual effort and no risk of errors.

Paper
Source

Paper
Source

Parallelism in Practice: Experiences Teaching Concurrency and Parallelism in an Undergraduate OS Course SPLASH-E 2019
Charlie Curtsinger
abstract

Parallelism and concurrency are important topics that, according to ACM guidelines, should be part of every undergraduate computer science education. At Grinnell College, these concepts are covered in CSC 213, the operating systems course. In this course, students complete a series of four labs on concurrency to learn the basics of threads and data-parallel computation, how to deal with concurrency errors, and how to write efficient, scalable parallel code. This paper describes these labs and our experience using them for three sections of CSC 213.

Paper
Source

CACM Article

Coz: Finding Code that Counts with Causal Profiling CACM Research Highlight, June 2018
Charlie Curtsinger and Emery D. Berger
abstract

Improving performance is a central concern for software developers. To locate optimization opportunities, developers rely on software profilers. However, these profilers only report where programs spend their time: optimizing that code may have no impact on performance. Past profilers thus both waste developer time and make it difficult for them to uncover significant optimization opportunities.

This paper introduces causal profiling. Unlike past profiling approaches, causal profiling indicates exactly where programmers should focus their optimization efforts, and quantifies their potential impact. Causal profiling works by running performance experiments during program execution. Each experiment calculates the impact of any potential optimization by virtually speeding up code: inserting pauses that slow down all other code running concurrently. The key insight is that this slowdown has the same relative effect as running that line faster, thus "virtually" speeding it up.

We present Coz, a causal profiler, which we evaluate on a range of highly-tuned applications such as Memcached, SQLite, and the PARSEC benchmark suite. Coz identifies previously unknown optimization opportunities that are both significant and targeted. Guided by Coz, we improve the performance of Memcached by 9%, SQLite by 25%, and accelerate six PARSEC applications by as much as 68%; in most cases, these optimizations involve modifying under 10 lines of code.

CACM Article

Paper

Scrambler: Dynamic Layout Adaptation SPLASH 2016 Student Research Competition
David Chang, Thu Nguyen, and Niko Takayesu – Supervised by Charlie Curtsinger
abstract

Developers and compiler writers spend significant effort on performance tuning and optimizations. However, the impact of these efforts must be taken with a grain of salt. Prior work has shown that changes to a program's layout—the placement of code and data in memory—can change performance by more than the effect of standard optimization techniques.

This paper presents Scrambler, a system that dynamically changes the layout of programs to improve their performance. Scrambler runs C and C++ programs with a randomized layout, and monitors these programs for evidence of layout-related performance issues. When an issue is detected, Scrambler relocates the offending code to eliminate the layout issue. We evaluate Scrambler on eight SPEC CPU2006 benchmarks, and find that Scrambler can provide speedups as large as 4.6% by fixing layouts that hurt branch predictor performance. These early results are encouraging, and we plan to extend this work to support additional layout-related performance issues in the future.

Paper

Paper

Energy Profiling with Alpaca SPLASH 2016 Student Research Competition
Logan Goldberg, Joel Katticaran, and Abraham Mhaidli – Supervised by Charlie Curtsinger
abstract

Rapid increases in demand for cloud computing, large-scale internet services, and mobile devices have led to a correspondingly large increase in the amount of energy required for computation. This additional energy use comes at a high cost, both financially and environmentally.

Energy profilers are useful tools which allow programmers to energy audit their programs and determine what functions are responsible for the most energy use. In this paper, we introduce a new technique for energy profiling, which takes energy measurements at regularly spaced intervals and uses linear regression to correlate executing functions with changes in power demand. We have implemented this technique in Alpaca, an energy profiler for Linux. Initial results suggest that this approach shows promise in energy profiling programs in an accurate and meaningful way.

Paper

;login: Article

Coz: This is the Profiler You've Been Looking For USENIX ;login: Summer 2016
Charlie Curtsinger and Emery D. Berger
abstract

Causal profiling is a new approach to software profiling that tells developers which code is important for performance. Conventional profilers report where programs spend their time running, but optimizing long-running code may not improve program performance. Instead of simply observing a program, a causal profiler conducts performance experiments to predict the effect of speeding up many different parts of a program. During each experiment, a causal profiler uses virtual speedup to create the effect of optimizing part of the program, and progress points to measure any change in program performance as a result of the virtual speedup. A causal profile summarizes the results of many performance experiments, telling developers exactly where performance tuning would be worthwhile. Using COZ, a prototype causal profiler for Linux, we improve the performance of Memcached by 9%, SQLite by 25%, and several PARSEC applications by as much as 68%.

;login: Article

Paper
Source

DoubleTake: Evidence-Based Dynamic Analysis ICSE 2016
Tongping Liu, Charlie Curtsinger, and Emery D. Berger
abstract

Dynamic analysis can be helpful for debugging, but is often too expensive to use in deployed applications. We introduce evidence-based dynamic analysis, an approach that enables extremely lightweight analyses for an important class of errors: those that can be forced to leave evidence of their existence. Evidence-based dynamic analysis lets execution proceed at full speed until the end of an epoch. It then examines program state to find evidence that an error occurred at some time during that epoch. If so, execution is rolled back and re-execution proceeds with instrumentation activated to pinpoint the error. We present DoubleTake, a prototype evidence-based dynamic analysis framework. We demonstrate its generality by building analyses to find buffer overflows, memory use-after-free errors, and memory leaks. DoubleTake is precise and efficient: its buffer overflow analysis runs with just 2% overhead on average, making it the fastest such system to date.

Paper
Source

Coz: Finding Code that Counts with Causal Profiling SOSP 2015 (Best Paper)
Charlie Curtsinger and Emery D. Berger
abstract

Improving performance is a central concern for software developers. To locate optimization opportunities, developers rely on software profilers. However, these profilers only report where programs spent their time: optimizing that code may have no impact on performance. Past profilers thus both waste developer time and make it difficult for them to uncover significant optimization opportunities.

This paper introduces causal profiling. Unlike past profiling approaches, causal profiling indicates exactly where programmers should focus their optimization efforts, and quantifies their potential impact. Causal profiling works by running performance experiments during program execution. Each experiment calculates the impact of any potential optimization by virtually speeding up code: inserting pauses that slow down all other code running concurrently. The key insight is that this slowdown has the same relative effect as running that line faster, thus “virtually” speeding it up.

We present Coz, a causal profiler, which we evaluate on a range of highly-tuned applications: Memcached, SQLite, and the PARSEC benchmark suite. Coz identifies previously unknown optimization opportunities that are both significant and targeted. Guided by Coz, we improve the performance of Memcached by 9%, SQLite by 25%, and accelerate six PARSEC applications by as much as 68%; in most cases, these optimizations involve modifying under 10 lines of code.

Stabilizer: Statistically Sound Performance Evaluation ASPLOS 2013
Charlie Curtsinger and Emery D. Berger
abstract

Researchers and software developers require effective performance evaluation. Researchers must evaluate optimizations or measure overhead. Software developers use automatic performance regression tests to discover when changes improve or degrade performance. The standard methodology is to compare execution times before and after applying changes.

Unfortunately, modern architectural features make this approach unsound. Statistically sound evaluation requires multiple samples to test whether one can or cannot (with high confidence) reject the null hypothesis that results are the same before and after. However, caches and branch predictors make performance dependent on machine-specific parameters and the exact layout of code, stack frames, and heap objects. A single binary constitutes just one sample from the space of program layouts, regardless of the number of runs. Since compiler optimizations and code changes also alter layout, it is currently impossible to distinguish the impact of an optimization from that of its layout effects.

This paper presents Stabilizer, a system that enables the use of the powerful statistical techniques required for sound performance evaluation on modern architectures. Stabilizer forces executions to sample the space of memory configurations by repeatedly re-randomizing layouts of code, stack, and heap objects at runtime. Stabilizer thus makes it possible to control for layout effects. Re-randomization also ensures that layout effects follow a Gaussian distribution, enabling the use of statistical tests like ANOVA. We demonstrate Stabilizer's efficiency (<7% median overhead) and its effectiveness by evaluating the impact of LLVM's optimizations on the SPEC CPU2006 benchmark suite. We find that, while -O2 has a significant impact relative to -O1, the performance impact of -O3 over -O2 optimizations is indistinguishable from random noise.

Paper

Probabilistic Timing Analysis on Conventional Cache Designs DATE 2013
Leonidas Kosmidis, Charlie Curtsinger, Eduardo Quiñones, Jaume Abella, Emery D. Berger, and Francisco J. Cazorla
abstract

Probabilistic timing analysis (PTA), a promising alternative to traditional worst-case execution time (WCET) analyses, enables pairing time bounds (named probabilistic WCET or pWCET) with an exceedance probability (e.g., 10^-16), resulting in far tighter bounds than conventional analyses. However, the applicability of PTA has been limited because of its dependence on relatively exotic hardware: fully-associative caches using random replacement. This paper extends the applicability of PTA to conventional cache designs via a software-only approach. We show that, by using a combination of compiler techniques and runtime system support to randomise the memory layout of both code and data, conventional caches behave as fully-associative ones with random replacement.

Paper

AutoMan: A Platform for Integrating Human-Based and Digital Computation OOPSLA 2012, CACM Research Highlight
Daniel W. Barowy, Charlie Curtsinger, Emery D. Berger, and Andrew McGregor
abstract

Humans can perform many tasks with ease that remain difficult or impossible for computers. Crowdsourcing platforms like Amazon's Mechanical Turk make it possible to harness human-based computational power at an unprecedented scale. However, their utility as a general-purpose computational platform remains limited. The lack of complete automation makes it difficult to orchestrate complex or interrelated tasks. Scheduling more human workers to reduce latency costs real money, and jobs must be monitored and rescheduled when workers fail to complete their tasks. Furthermore, it is often difficult to predict the length of time and payment that should be budgeted for a given task. Finally, the results of human-based computations are not necessarily reliable, both because human skills and accuracy vary widely, and because workers have a financial incentive to minimize their effort.

This paper introduces AutoMan, the first fully automatic crowdprogramming system. AutoMan integrates human-based computations into a standard programming language as ordinary function calls, which can be intermixed freely with traditional functions. This abstraction lets AutoMan programmers focus on their programming logic. An AutoMan program specifies a confidence level for the overall computation and a budget. The AutoMan runtime system then transparently manages all details necessary for scheduling, pricing, and quality control. AutoMan automatically schedules human tasks for each computation until it achieves the desired confidence level; monitors, reprices, and restarts human tasks as necessary; and maximizes parallelism across human workers while staying under budget.

DThreads: Efficient and Deterministic Multithreading SOSP 2011
Tongping Liu, Charlie Curtsinger, and Emery D. Berger
abstract

Multithreaded programming is notoriously difficult to get right. A key problem is non-determinism, which complicates debugging, testing, and reproducing errors. One way to simplify multithreaded programming is to enforce deterministic execution, but current deterministic systems for C/C++ are incomplete or impractical. These systems require program modification, do not ensure determinism in the presence of data races, do not work with general-purpose multithreaded programs, or run up to 8.4× slower than pthreads.

This paper presents DThreads, an efficient deterministic multithreading system for unmodified C/C++ applications that replaces the pthreads library. DThreads enforces determinism in the face of data races and deadlocks. DThreads works by exploding multithreaded applications into multiple processes, with private, copy-on-write mappings to shared memory. It uses standard virtual memory protection to track writes, and deterministically orders updates by each thread. By separating updates from different threads, DThreads has the additional benefit of eliminating false sharing. Experimental results show that DThreads substantially outperforms a state-of-the-art deterministic runtime system, and for a majority of the benchmarks evaluated here, matches and occasionally exceeds the performance of pthreads.

Tech Report

Nofus: "Automatically Detecting" + String.fromCharCode(32) + "ObFuSCateD ".toLowerCase() + "JavaScript Code" MSR Tech Report
Scott F. Kaplan, Benjamin Livshits, Benjamin G. Zorn, and Charlie Curtsinger
abstract

Obfuscation is applied to large quantities of benign and malicious JavaScript throughout the web. In situations where JavaScript source code is being submitted for widespread use, such as in a gallery of browser extensions (e.g., Firefox), it is valuable to require that the code submitted is not obfuscated and to check for that property. In this paper, we describe Nofus, a static, automatic classifier that distinguishes obfuscated and non-obfuscated JavaScript with high precision. Using a collection of examples of both obfuscated and non-obfuscated JavaScript, we train Nofus to distinguish between the two and show that the classifier has both a low false positive rate (about 1%) and low false negative rate (about 5%).

Applying Nofus to collections of deployed JavaScript, we show it correctly identifies obfuscated JavaScript files from Alexa top 50 websites. While prior work conflates obfuscation with maliciousness (assuming that detecting obfuscation implies maliciousness), we show that the correlation is weak. Yes, much malware is hidden using obfuscation, but so is benign JavaScript. Further, applying Nofus to known JavaScript malware, we show our classifier finds 15% of the files are un-obfuscated, showing that not all malware is obfuscated.

Tech Report

Zozzle: Low-Overhead Mostly Static JavaScript Malware Detection USENIX Security 2011
Charlie Curtsinger, Benjamin Livshits, Benjamin G. Zorn, and Christian Seifert
abstract

JavaScript malware-based attacks account for a large fraction of successful mass-scale exploitation happening today. Attackers like JavaScript-based attacks because they can be mounted against an unsuspecting user visiting a seemingly innocent web page. While several techniques for addressing these types of exploits have been proposed, in-browser adoption has been slow, in part because of the performance overhead these methods incur.

In this paper, we propose Zozzle, a low-overhead solution for detecting and preventing JavaScript malware that is fast enough to be deployed in the browser. Our approach uses Bayesian classification of hierarchical features of the JavaScript abstract syntax tree to identify syntax elements that are highly predictive of malware. Our experimental evaluation shows that Zozzle is able to detect JavaScript malware through mostly static code analysis effectively. Zozzle has an extremely low false positive rate of 0.0003%, which is less than one in a quarter million. Despite this high accuracy, the Zozzle classifier is fast, with a throughput of over one megabyte of JavaScript code per second.