Hyperscan is a regular expression engine from Intel® with a focus on high performance, simultaneous matching of large sets of patterns and streaming operation. This post is an introduction to Hyperscan’s tools for performance measurement, including a number of test cases that can be replicated on your own equipment.
Hyperscan versions 4.4 and later include a standard benchmarking tool, hsbench, designed to provide an easy way to measure Hyperscan’s performance for a particular set of patterns and a corpus of data to be scanned.
The hsbench tool is documented in the Hyperscan Developer Reference; this post is intended as a walk-through of how to use it to measure Hyperscan’s performance in a number of different scenarios.
To this end, the Hyperscan team has generated a number of sample test pattern sets and corpora as examples.
It is important to note that Hyperscan’s performance is variable and depends on a number of different factors:
- The number and composition of the patterns: this affects which implementation strategies Hyperscan selects when building a compiled database.
- The data being scanned: for example, performance may be affected by the rate of matches or near-matches in the data.
- Pattern flags: some pattern flags — such as SOM or UTF8 mode — impose performance costs.
- The scanning mode: there are optimizations available to Hyperscan in block mode, where it knows precisely how much data will be scanned, which are not available in streaming mode. Similarly, streaming mode requires bookkeeping work at stream scan boundaries that is not needed in block mode.
- The platform hardware: Hyperscan is able to take advantage of modern instruction set features, such as Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Bit Manipulation Instructions 2 (Intel® BMI2), where available.
With this in mind, it is important to note that there is no single “baseline” measure of Hyperscan’s performance or resource requirements (database size, stream state size, etc.). Meaningful measurements depend on your application, your pattern sets, and representative scanning data, so we encourage users to build their own benchmarks and test Hyperscan under their own conditions.
It is hoped that these examples serve as a useful guide.
Three sample pattern sets are examined in this post:
- snort_literals: This is a set of 3,316 literal patterns extracted from the sample ruleset included with the Snort* 3 network intrusion detection system, available at https://github.com/snortadmin/snort3. Some of these are marked with the `HS_FLAG_CASELESS` flag so that they match case-insensitively, and all of them use `HS_FLAG_SINGLEMATCH` to limit matching to once per scan for each pattern.
- snort_pcres: This is a set of 847 regular expressions also extracted from the sample ruleset included with Snort 3, taken from rules targeted at HTTP traffic. It is important to note that these are just the patterns extracted from the rules’ `pcre:` options, and that scanning for them in a single pattern set with Hyperscan is not semantically equivalent to scanning for these rules within Snort; this is a sample case intended to show Hyperscan’s capability for matching expressions simultaneously.
- teakettle_2500: This is a set of 2,500 synthetic patterns generated with a script that produces regular expressions of limited complexity. These are composed of dictionary words separated by character class repeats and alternations.
The format for these pattern sets is a text file, with one ID and expression per line. For example, the first few lines of the teakettle_2500 set are:
```
1:/loofas.+stuffer[^\n]*interparty[^\n]*godwit/is
2:/procurers.*arsons/s
3:/^authoress[^\r\n]*typewriter[^\r\n]*disservices/is
4:/praesidiadyeweedisonomic.*reactivating/is
5:/times/s
```
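When scripting around hsbench, it can be handy to parse this format programmatically. The following Python sketch is our own illustration (hsbench itself does not ship such a helper); it splits each line into its ID, expression, and flags:

```python
import re

# Each hsbench pattern line has the form "<id>:/<expression>/<flags>".
# The greedy (.*) works because the trailing flags never contain "/".
PATTERN_LINE = re.compile(r"^(\d+):/(.*)/([a-zA-Z]*)$")

def parse_pattern_line(line):
    """Split one pattern-file line into (id, expression, flags)."""
    m = PATTERN_LINE.match(line.strip())
    if m is None:
        raise ValueError("malformed pattern line: %r" % line)
    return int(m.group(1)), m.group(2), m.group(3)

print(parse_pattern_line("2:/procurers.*arsons/s"))
```

A loop over such tuples is enough to, for example, count how many patterns in a set use a given flag before benchmarking.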
Two hsbench corpora files are available for download as sample data for scanning:
- gutenberg.db: A collection of English-language texts from Project Gutenberg, broken up into 10,240-byte streams of 2,048-byte blocks.
- alexa200.db: A large traffic sample constructed from a PCAP capture of an automated Web browser browsing a subset of the top sites listed on Alexa. This file contains 130,957 blocks (originally corresponding to packets), and only traffic to or from port 80 is included.
These files are SQLite databases designed to allow convenient construction of corpora for hsbench from arbitrary input. Their format is described in the Hyperscan Developer Reference, and some sample scripts are included with Hyperscan to construct corpora from common inputs, such as text files and PCAP traffic samples.
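The sample scripts shipped with Hyperscan are the authoritative way to build these databases, and the SQLite schema is documented in the Developer Reference. Purely as an illustration of the stream-and-block layout described for gutenberg.db, the following Python sketch (the function name and structure are our own) splits a byte buffer into 10,240-byte streams of 2,048-byte blocks:

```python
def split_into_streams(data, stream_size=10240, block_size=2048):
    """Split a byte buffer into fixed-size streams, each a list of blocks.

    Mirrors the layout described for gutenberg.db: 10,240-byte streams
    of 2,048-byte blocks (the final stream and block may be shorter).
    """
    streams = []
    for s in range(0, len(data), stream_size):
        stream = data[s:s + stream_size]
        blocks = [stream[b:b + block_size]
                  for b in range(0, len(stream), block_size)]
        streams.append(blocks)
    return streams

streams = split_into_streams(b"x" * 25000)
print(len(streams), [len(b) for b in streams[0]])
```

Each resulting block would then be written as one corpus row; in streaming mode hsbench scans the blocks of a stream as one logical stream, while in block mode each block is scanned independently.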
Using hsbench to collect performance measurements
Given a pattern file and a corpus database, hsbench can be used to perform a single-threaded benchmark like this:
```
$ hsbench -e pcre/snort_literals -c corpora/alexa200.db -N
Signatures:        pcre/snort_literals
Hyperscan info:    Version: 4.4.1 Features: AVX2 Mode: BLOCK
Expression count:  3,116
Bytecode size:     1,111,416 bytes
Database CRC:      0x17dce83b
Scratch size:      4,289 bytes
Compile time:      0.313 seconds
Peak heap usage:   551,178,240 bytes

Time spent scanning:     11.767 seconds
Corpus size:             514,572,016 bytes (405,004 blocks)
Matches per iteration:   1,894,263 (3.770 matches/kilobyte)
Overall block rate:      688,357.34 blocks/sec
Overall throughput:      6,996.66 Mbit/sec
```
The `-N` argument above instructs hsbench to scan in block mode; by default, streaming mode is used. By default, the corpus is scanned twenty times, and the overall performance reported is computed from the total number of bytes scanned in the time it took to perform all twenty scans. The number of repeats can be changed with the `-n` argument, and the time taken for each scan will be displayed if the `--per-scan` argument is specified.
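The headline statistics at the end of the hsbench output are simple derived quantities: overall throughput is the total bits scanned across all iterations divided by total scan time, and the match density uses 1,024-byte kilobytes. A quick sketch reproducing the figures from the run above (values taken from that output; small differences from the reported numbers come from rounding in the displayed scan time):

```python
# Figures taken from the hsbench output above.
corpus_bytes = 514_572_016
iterations = 20
scan_seconds = 11.767
matches_per_iter = 1_894_263

# Overall throughput: total bits scanned / total time, in 10^6 bits/sec.
throughput_mbit = corpus_bytes * iterations * 8 / scan_seconds / 1e6

# Match density per 1,024-byte kilobyte of corpus.
matches_per_kb = matches_per_iter / (corpus_bytes / 1024)

print(f"{throughput_mbit:,.2f} Mbit/sec")        # reported: 6,996.66
print(f"{matches_per_kb:.3f} matches/kilobyte")  # reported: 3.770
```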
For readers who have not used the Hyperscan API before, here is a quick run-down of the meanings of the fields in the output above:
- Hyperscan info: the version of Hyperscan used to create the pattern database, the platform it was constructed for (“AVX2” in this case) and the scanning mode (“BLOCK”).
- Expression count: the number of patterns in the pattern file supplied.
- Bytecode size: the size of the Hyperscan database bytecode built from the patterns. This is a compiled data structure used to match the complete set of patterns during scanning. The bytecode is immutable once built and can be shared between scanning threads.
- Database CRC: a CRC32 of the database bytecode.
- Scratch size: the size of the mutable “scratch” space required to scan data against this bytecode. Each scanning context requires its own scratch space.
- Compile time: the time required to compile the Hyperscan database bytecode from the pattern set.
- Peak heap usage: the peak memory usage of the compilation process.
The remainder of the fields in the hsbench output are scan performance statistics and should be self-explanatory.
Example performance measurements
As an example, the following sample measurements were collected using hsbench and Hyperscan 4.4.1 on an Intel Core i7-6700K workstation running at 4.2 GHz.
In these commands, we use the Linux taskset utility to pin the process to the first core on the system.
1. Snort literals against HTTP traffic, block mode.
$ taskset 1 hsbench -e pcre/snort_literals -c corpora/alexa200.db -N
2. Snort PCREs against HTTP traffic, block mode.
$ taskset 1 hsbench -e pcre/snort_pcres -c corpora/alexa200.db -N
3. Teakettle synthetic patterns against Gutenberg text, streaming mode.
$ taskset 1 hsbench -e pcre/teakettle_2500 -c corpora/gutenberg.db
The results are summarised in the table below.
| Pattern Set | Scan Corpus | Number of Patterns | Matches/KB | Blocks/Sec | Megabits/Sec |
|---|---|---|---|---|---|
| Snort Literals | HTTP Traffic | 3,116 | 3.686 | 541,772 | 5,861 |
| Snort PCREs | HTTP Traffic | 847 | 8.804 | 140,116 | 1,516 |
| Teakettle 2500 | Gutenberg Text | 2,500 | 0.577 | 205,249 | 3,355 |
Note: These results show Hyperscan running on only a single core of the test machine. To run the test on multiple cores, you can use the `-T` argument to create multiple threads. For example, `-T0,1,2,3` will run four scanning threads, one on each of the four given cores.
With the pattern sets and corpora available for download, you can replicate these performance measurements yourself on your own platform using the hsbench invocations above.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Changes to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice Revision #2011080