nSIM Product Released
Synopsys has officially announced and released nSIM, an instruction-accurate simulator for the DesignWare ARC processor family.
"It uses state-of-the-art JIT compilation technology that reduces compilation overhead and leverages multi-core host capabilities to increase simulation speed. The JIT compilation engine is integrated with a powerful and very efficient micro-architectural performance model of the processor pipeline. This enables a near cycle-accuracy of up to 95% (compared to RTL) to be achieved at high simulation speeds of 25 MIPS, and allows high-speed architectural exploration and system profiling. "
Parallel Trace Based JIT in Action
Scalable Multi-Core Simulation Paper Accepted
Our paper "Scalable Multi-Core Simulation Using Parallel Dynamic Binary Translation", written by O. Almer, I. Böhm, T. Edler von Koch, B. Franke, S. Kyle, V. Seeker, C. Thompson and N. Topham, was accepted at the International Symposium on Systems, Architectures, Modeling, and Simulation (SAMOS'11) in Samos, Greece.
In this work we have made advances in the area of multi-core simulation, exploiting and extending our novel trace-based parallel JIT compilation system. For the first time we were able to demonstrate that architectural simulation of multi-core systems can be extremely fast, efficient, and scalable.
Paper Accepted at PLDI'11
Our paper "Generalized Just-In-Time Trace Compilation using a Parallel Task Farm in a Dynamic Binary Translator", written by I. Böhm, T. Edler von Koch, S. Kyle, B. Franke and N. Topham, was accepted at the ACM SIGPLAN 2011 Conference on Programming Language Design and Implementation (PLDI'11) in San Jose, CA.
In this work we advance trace-based compilation by proposing a novel tracing strategy, interval-based tracing. The second significant advance is the design and implementation of a truly parallel JIT compilation system based on the task farm design pattern. This system is the first of its kind to successfully exploit both the parallelism available on today's multi-core architectures and the parallelism exposed by our novel tracing strategy.
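For readers unfamiliar with the task farm pattern, here is a minimal sketch of the general idea in Python. It is not the implementation from the paper: the names `compile_trace` and `run_task_farm` are hypothetical, and the "compilation" step is a stand-in that only builds a string. A farmer puts traces on a shared queue, and a pool of workers pulls them off and compiles them independently.

```python
import queue
import threading


def compile_trace(trace):
    """Hypothetical stand-in for JIT compilation of one trace."""
    return "native(" + ",".join(trace) + ")"


def run_task_farm(traces, num_workers=4):
    """Farm out trace compilation to a pool of worker threads."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this worker
                break
            trace_id, trace = item
            code = compile_trace(trace)
            with lock:  # protect the shared result table
                results[trace_id] = code

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # The farmer: enqueue every trace, then one sentinel per worker.
    for trace_id, trace in enumerate(traces):
        tasks.put((trace_id, trace))
    for _ in threads:
        tasks.put(None)

    for t in threads:
        t.join()

    # Return compiled code in the order the traces were submitted.
    return [results[i] for i in range(len(traces))]
```

Calling `run_task_farm([["add", "sub"], ["ld"]], num_workers=2)` returns the two "compiled" traces in submission order. Note that Python threads do not run CPU-bound work in parallel because of the interpreter lock, so this only illustrates the structure of the pattern, not its performance.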
PvD in Edinburgh
This week Paul is in Edinburgh and I have VIP tickets!
EnCore Castle Processor Fully Functional!
The second silicon implementation of an extended EnCore processor is a test chip codenamed Castle, fabricated in a generic 90nm CMOS process. All of the EnCore test chips are named after hills in Edinburgh; Castle is named after the rock on which Edinburgh Castle is built.
The Castle chip contains an extended version of the EnCore processor, together with a 32KB 4-way set-associative instruction cache and a 32KB 4-way set-associative data cache. It is embedded within a system-on-chip (SoC) design that provides a generic 32-bit memory interface, as well as interrupt, clock and reset signals.
Paper accepted for SAMOS'10
Our paper "Cycle-Accurate Performance Modelling in an Ultra-Fast Just-In-Time Dynamic Binary Translation Instruction Set Simulator" was accepted at SAMOS 2010.
Here is the abstract:
“Instruction set simulators (ISS) are vital tools for compiler and processor architecture design space exploration and verification. State-of-the-art simulators using just-in-time (JIT) dynamic binary translation (DBT) techniques are able to simulate complex embedded processors at speeds above 500 MIPS. However, these functional ISS do not provide microarchitectural observability. In contrast, low-level cycle-accurate ISS are too slow to simulate full-scale applications, forcing developers to revert to FPGA-based simulations. In this paper we demonstrate that it is possible to run ultra-high speed cycle-accurate instruction set simulations surpassing FPGA-based simulation speeds. We extend the JIT DBT engine of our ISS and augment JIT generated code with a verified cycle-accurate processor model. Our approach can model any microarchitectural configuration, does not rely on prior profiling, instrumentation, or compilation, and works for all binaries targeting a state-of-the-art embedded processor implementing the ARCompact instruction set architecture (ISA). We achieve simulation speeds up to 63 MIPS on a standard x86 desktop computer, whilst the average cycle-count deviation is less than 1.5% for the industry standard EEMBC and CoreMark benchmark suites.”
Pre-Print of CGO'10 paper available
If you would like to read the joint paper with Tobias and Björn that I am going to present at CGO in Toronto next week, you can get it here: