Enum regex_automata::sparse::SparseDFA

pub enum SparseDFA<T: AsRef<[u8]>, S: StateID = usize> {
    Standard(Standard<T, S>),
    ByteClass(ByteClass<T, S>),
    // some variants omitted
}

A sparse table-based deterministic finite automaton (DFA).

In contrast to a dense DFA, a sparse DFA uses a more space-efficient representation for its transition table. Sparse DFAs can therefore use much less memory than dense DFAs, but this comes at a price: reading the more compact transitions takes more work, so searching with a sparse DFA is typically slower than searching with a dense DFA.
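The trade-off can be illustrated with a simplified model. The sketch below is not this crate's actual memory layout; the `DenseState` and `SparseState` types and the use of state 0 as the dead state are assumptions made purely for illustration. A dense row stores one target per possible byte, while a sparse row stores only the byte ranges that lead somewhere useful and must search them on every step.

```rust
/// One state's transitions in a dense layout: a slot for every possible byte.
/// (Hypothetical type for illustration; not the crate's actual layout.)
struct DenseState {
    next: [usize; 256],
}

/// One state's transitions in a sparse layout: only the byte ranges that
/// lead somewhere other than the dead state (ID 0 in this sketch).
struct SparseState {
    /// Non-overlapping inclusive ranges, each paired with a target state.
    ranges: Vec<(u8, u8, usize)>,
}

impl DenseState {
    /// A single array index: fast, but the row always holds 256 entries.
    fn next_state(&self, input: u8) -> usize {
        self.next[input as usize]
    }
}

impl SparseState {
    /// A scan over the stored ranges: compact, but more work per byte.
    fn next_state(&self, input: u8) -> usize {
        for &(start, end, target) in &self.ranges {
            if start <= input && input <= end {
                return target;
            }
        }
        0 // no range matched: dead state
    }

    /// Expand into the equivalent dense row.
    fn to_dense(&self) -> DenseState {
        let mut next = [0usize; 256];
        for &(start, end, target) in &self.ranges {
            for b in start..=end {
                next[b as usize] = target;
            }
        }
        DenseState { next }
    }
}
```

In this model a state with two outgoing ranges needs two entries in the sparse form and 256 in the dense form, at the cost of a scan instead of an index.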

A sparse DFA can be built using the default configuration via the SparseDFA::new constructor. Otherwise, one can configure various aspects of a dense DFA via dense::Builder, and then convert a dense DFA to a sparse DFA using DenseDFA::to_sparse.

In general, a sparse DFA supports all the same operations as a dense DFA.

The choice between a dense and a sparse DFA depends on your specific workload. If you can sacrifice a bit of search-time performance, then a sparse DFA might be the best choice. In particular, while sparse DFAs are probably always slower than dense DFAs, you may find that they are easily fast enough for your purposes!

State size

A SparseDFA has two type parameters, T and S. T corresponds to the type of the DFA’s transition table, while S corresponds to the representation used for the DFA’s state identifiers, as described by the StateID trait. S is typically usize, but other valid choices provided by this crate include u8, u16, u32 and u64. The primary reason for choosing a different state identifier representation than the default is to reduce the amount of memory used by a DFA. Note, though, that if the chosen representation cannot accommodate the size of your DFA, then building the DFA will fail and return an error.

While the reduction in heap memory used by a DFA is one reason for choosing a smaller state identifier representation, another possible reason is for decreasing the serialization size of a DFA, as returned by to_bytes_little_endian, to_bytes_big_endian or to_bytes_native_endian.
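As a rough illustration of why the identifier width matters, the sketch below estimates the bytes spent on state identifiers and checks whether a given identifier fits in a narrower integer type. The helpers `id_bytes` and `fits` are hypothetical and are not part of this crate; the estimate ignores any per-state headers, so treat it as a back-of-the-envelope model only.

```rust
use std::convert::TryFrom;
use std::mem::size_of;

/// Bytes needed to store `transitions` state identifiers of type `S`.
/// (A rough model: it ignores per-state headers and padding.)
fn id_bytes<S>(transitions: usize) -> usize {
    transitions * size_of::<S>()
}

/// Whether the largest state identifier `max_id` is representable in `S`.
/// When it is not, a conversion to that representation has to fail.
fn fits<S: TryFrom<usize>>(max_id: usize) -> bool {
    S::try_from(max_id).is_ok()
}
```

For example, 1000 stored identifiers take 2000 bytes as u16 but 8000 bytes as u64, while an identifier of 256 no longer fits in u8.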

The type of the transition table is typically either Vec<u8> or &[u8], depending on where the transition table is stored. Note that this is different than a dense DFA, whose transition table is typically Vec<S> or &[S]. The reason for this is that a sparse DFA always reads its transition table from raw bytes because the table is compactly packed.
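To make the "raw bytes" point concrete, here is a minimal sketch of decoding a state identifier from a packed byte buffer. The functions `pack_ids_u16` and `read_id_u16` are hypothetical, the fixed u16 width and the flat layout are assumptions, and the crate's real table format is more involved. The `AsRef<[u8]>` bound mirrors the bound on T, so the same reader works for both an owned Vec<u8> and a borrowed &[u8].

```rust
use std::convert::TryInto;

/// Pack state IDs into a byte buffer using native endianness.
fn pack_ids_u16(ids: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(ids.len() * 2);
    for id in ids {
        out.extend_from_slice(&id.to_ne_bytes());
    }
    out
}

/// Read the `index`-th u16 state ID back out of a packed buffer.
/// Returns `None` if the buffer is too short.
fn read_id_u16<T: AsRef<[u8]>>(table: T, index: usize) -> Option<u16> {
    let start = index.checked_mul(2)?;
    let end = start.checked_add(2)?;
    let bytes: [u8; 2] = table.as_ref().get(start..end)?.try_into().ok()?;
    Some(u16::from_ne_bytes(bytes))
}
```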

Variants

This DFA is defined as a non-exhaustive enumeration of different types of sparse DFAs. All of the variants use the same internal representation for the transition table, but they vary in how the transition table is read. A DFA’s specific variant depends on the configuration options set via dense::Builder. The default variant is ByteClass.

The `DFA` trait

This type implements the DFA trait, which means it can be used for searching. For example:

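use regex_automata::{DFA, SparseDFA};

fn main() -> Result<(), regex_automata::Error> {
    let dfa = SparseDFA::new("foo[0-9]+")?;
    // `find` reports the end offset of the match.
    assert_eq!(Some(8), dfa.find(b"foo12345"));
    Ok(())
}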

The DFA trait also provides an assortment of other lower level methods for DFAs, such as start_state and next_state. While these are correctly implemented, it is an anti-pattern to use them in performance sensitive code on the SparseDFA type directly. Namely, each implementation requires a branch to determine which type of sparse DFA is being used. Instead, this branch should be pushed up a layer in the code since walking the transitions of a DFA is usually a hot path. If you do need to use these lower level methods in performance critical code, then you should match on the variants of this DFA and use each variant’s implementation of the DFA trait directly.
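The pattern of pushing the branch up a layer can be sketched with a toy model. Everything below is hypothetical: `Step`, `Toy`, and the hard-coded automaton for `a+` (states 0 = dead, 1 = start, 2 = match) stand in for the DFA trait and the SparseDFA variants, and none of it is this crate's code. The point is that the match on the enum happens once, and the per-byte loop is compiled separately for each variant.

```rust
/// Stand-in for the low-level stepping method of the DFA trait.
trait Step {
    fn next_state(&self, current: usize, input: u8) -> usize;
}

/// Toy variant that inspects the byte directly.
struct Standard;

/// Toy variant that first maps the byte to an equivalence class.
struct ByteClass {
    classes: [u8; 256],
}

impl Step for Standard {
    fn next_state(&self, current: usize, input: u8) -> usize {
        if current != 0 && input == b'a' { 2 } else { 0 }
    }
}

impl Step for ByteClass {
    fn next_state(&self, current: usize, input: u8) -> usize {
        // Rows are states, columns are classes (0 = other, 1 = 'a').
        const TABLE: [[usize; 2]; 3] = [[0, 0], [0, 2], [0, 2]];
        TABLE[current][self.classes[input as usize] as usize]
    }
}

enum Toy {
    Standard(Standard),
    ByteClass(ByteClass),
}

/// Hot loop, generic over the concrete variant: no enum branch per byte.
fn run<D: Step>(dfa: &D, bytes: &[u8]) -> usize {
    let mut state = 1;
    for &b in bytes {
        state = dfa.next_state(state, b);
    }
    state
}

/// Branch on the variant once, outside the hot loop.
fn final_state(dfa: &Toy, bytes: &[u8]) -> usize {
    match dfa {
        Toy::Standard(d) => run(d, bytes),
        Toy::ByteClass(d) => run(d, bytes),
    }
}
```

Because SparseDFA is a non-exhaustive enumeration, a match on the real type would also need a wildcard arm.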

Variants

Standard(Standard<T, S>)

A standard DFA that does not use byte classes.

ByteClass(ByteClass<T, S>)

A DFA that shrinks its alphabet to a set of equivalence classes instead of using all possible byte values. Any two bytes belong to the same equivalence class if and only if they can be used interchangeably anywhere in the DFA while never discriminating between a match and a non-match.

Unlike dense DFAs, sparse DFAs do not tend to benefit nearly as much from using byte classes. In some cases, using byte classes can even marginally increase the size of a sparse DFA’s transition table. The reason for this is that a sparse DFA already compacts each state’s transitions regardless of whether byte classes are used.
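One way to picture byte classes is the boundary-based sketch below. It is an illustrative approximation rather than this crate's implementation, and the functions `byte_classes` and `alphabet_len` are hypothetical. Given the inclusive byte ranges a pattern distinguishes, it marks where one class must end and numbers the resulting contiguous runs. This is conservative: it can produce more classes than the strict definition above requires, because two separate runs of "uninteresting" bytes get different class numbers.

```rust
/// Assign each byte an equivalence class, given the inclusive byte ranges
/// that a pattern distinguishes. Bytes not separated by any range boundary
/// share a class.
fn byte_classes(ranges: &[(u8, u8)]) -> [u8; 256] {
    // boundary[b] is true when byte `b` is the last byte of its class.
    let mut boundary = [false; 256];
    for &(start, end) in ranges {
        if start > 0 {
            boundary[start as usize - 1] = true;
        }
        boundary[end as usize] = true;
    }
    let mut classes = [0u8; 256];
    let mut class = 0u8;
    for b in 0..256 {
        classes[b] = class;
        // At most 255 increments can happen, so `class` cannot overflow.
        if boundary[b] && b < 255 {
            class += 1;
        }
    }
    classes
}

/// Number of classes, i.e. the size of the shrunken alphabet.
fn alphabet_len(classes: &[u8; 256]) -> usize {
    classes[255] as usize + 1
}
```

For the ranges that appear in `foo[0-9]+` (`f`, `o` and `0-9`), this sketch shrinks the alphabet from 256 bytes to 7 classes.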

Implementations

impl SparseDFA<Vec<u8>, usize>

pub fn new(pattern: &str) -> Result<SparseDFA<Vec<u8>, usize>, Error>

impl<S: StateID> SparseDFA<Vec<u8>, S>

pub fn empty() -> SparseDFA<Vec<u8>, S>

impl<T: AsRef<[u8]>, S: StateID> SparseDFA<T, S>

pub fn as_ref<'a>(&'a self) -> SparseDFA<&'a [u8], S>

pub fn to_owned(&self) -> SparseDFA<Vec<u8>, S>

pub fn memory_usage(&self) -> usize

impl<T: AsRef<[u8]>, S: StateID> SparseDFA<T, S>

pub fn to_u8(&self) -> Result<SparseDFA<Vec<u8>, u8>, Error>

pub fn to_u16(&self) -> Result<SparseDFA<Vec<u8>, u16>, Error>

pub fn to_u32(&self) -> Result<SparseDFA<Vec<u8>, u32>, Error>

pub fn to_u64(&self) -> Result<SparseDFA<Vec<u8>, u64>, Error>

pub fn to_sized<A: StateID>(&self) -> Result<SparseDFA<Vec<u8>, A>, Error>

pub fn to_bytes_little_endian(&self) -> Result<Vec<u8>, Error>

pub fn to_bytes_big_endian(&self) -> Result<Vec<u8>, Error>

pub fn to_bytes_native_endian(&self) -> Result<Vec<u8>, Error>

impl<'a, S: StateID> SparseDFA<&'a [u8], S>

pub unsafe fn from_bytes(buf: &'a [u8]) -> SparseDFA<&'a [u8], S>

Trait Implementations

impl<T: Clone + AsRef<[u8]>, S: Clone + StateID> Clone for SparseDFA<T, S>

fn clone(&self) -> SparseDFA<T, S>

fn clone_from(&mut self, source: &Self)

impl<T: AsRef<[u8]>, S: StateID> DFA for SparseDFA<T, S>

type ID = S

fn start_state(&self) -> S

fn is_match_state(&self, id: S) -> bool

fn is_dead_state(&self, id: S) -> bool

fn is_match_or_dead_state(&self, id: S) -> bool

fn is_anchored(&self) -> bool

fn next_state(&self, current: S, input: u8) -> S

unsafe fn next_state_unchecked(&self, current: S, input: u8) -> S

fn is_match_at(&self, bytes: &[u8], start: usize) -> bool

fn shortest_match_at(&self, bytes: &[u8], start: usize) -> Option<usize>

fn find_at(&self, bytes: &[u8], start: usize) -> Option<usize>

fn rfind_at(&self, bytes: &[u8], start: usize) -> Option<usize>

fn is_match(&self, bytes: &[u8]) -> bool

fn shortest_match(&self, bytes: &[u8]) -> Option<usize>

fn find(&self, bytes: &[u8]) -> Option<usize>

fn rfind(&self, bytes: &[u8]) -> Option<usize>

impl<T: Debug + AsRef<[u8]>, S: Debug + StateID> Debug for SparseDFA<T, S>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Auto Trait Implementations

impl<T, S> RefUnwindSafe for SparseDFA<T, S> where S: RefUnwindSafe, T: RefUnwindSafe

impl<T, S> Send for SparseDFA<T, S> where S: Send, T: Send

impl<T, S> Sync for SparseDFA<T, S> where S: Sync, T: Sync

impl<T, S> Unpin for SparseDFA<T, S> where S: Unpin, T: Unpin

impl<T, S> UnwindSafe for SparseDFA<T, S> where S: UnwindSafe, T: UnwindSafe

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>

fn into(self) -> U

impl<T> ToOwned for T where T: Clone

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>