java.lang.Object
  com.fasterxml.jackson.core.sym.BytesToNameCanonicalizer
public final class BytesToNameCanonicalizer
A caching symbol table implementation used for canonicalizing JSON field
names (as Names which are constructed directly from a byte-based
input source).
Complications arise from trying to do efficient reuse and merging of
symbol tables, to be able to make use of usually shared vocabulary
of subsequent parsing runs.
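The reuse and merging mentioned above can be pictured with a simplified sketch. It is not the actual implementation: the class `SharedTableSketch`, its use of a plain `Map`, and the "merge only if the child grew" rule are assumptions made for illustration. It only shows the general pattern in which a root hands an immutable snapshot to each child and a child offers its state back when it is released.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a root/child symbol table pair.
final class SharedTableSketch {
    private final SharedTableSketch parent;                         // null for the root
    private final AtomicReference<Map<String, String>> rootState;   // used by the root only
    private Map<String, String> names;                              // child's working view
    private boolean dirty;                                          // true once the child added a name

    private SharedTableSketch() {                                   // root constructor
        this.parent = null;
        this.rootState = new AtomicReference<Map<String, String>>(
                Collections.<String, String>emptyMap());
    }

    private SharedTableSketch(SharedTableSketch parent, Map<String, String> snapshot) {
        this.parent = parent;
        this.rootState = null;
        this.names = snapshot;                                      // shared, read-only until first write
    }

    static SharedTableSketch createRoot() { return new SharedTableSketch(); }

    SharedTableSketch makeChild() { return new SharedTableSketch(this, rootState.get()); }

    // Returns the canonical instance for a name, adding it if it is new.
    String canonicalize(String name) {
        String existing = names.get(name);
        if (existing != null) return existing;
        if (!dirty) {                                               // copy the shared snapshot on first write
            names = new HashMap<String, String>(names);
            dirty = true;
        }
        names.put(name, name);
        return name;
    }

    boolean maybeDirty() { return dirty; }

    int size() { return parent == null ? rootState.get().size() : names.size(); }

    // Child is done: offer its state back to the root if it learned new names.
    void release() {
        if (parent != null && dirty) {
            parent.merge(Collections.unmodifiableMap(names));
            dirty = false;                                          // later writes copy again
        }
    }

    private void merge(Map<String, String> childState) {
        Map<String, String> current = rootState.get();
        if (childState.size() > current.size()) {
            rootState.compareAndSet(current, childState);           // losing a race is harmless here
        }
    }

    public static void main(String[] args) {
        SharedTableSketch root = createRoot();
        SharedTableSketch first = root.makeChild();
        String canonical = first.canonicalize(new String("id"));
        first.release();
        SharedTableSketch second = root.makeChild();
        System.out.println(second.canonicalize(new String("id")) == canonical);
    }
}
```

In this sketch a second child created after the first one was released sees the names the first one added, so it can return the same canonical instance without adding anything.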
| Field Summary | |
|---|---|
protected int |
_collCount
Total number of Names in collision buckets (included in _count along with primary entries) |
protected int |
_collEnd
Index of the first unused collision bucket entry (== size of the used portion of collision list): less than or equal to 0xFF (255), since max number of entries is 255 (8-bit, minus 0 used as 'empty' marker) |
protected com.fasterxml.jackson.core.sym.BytesToNameCanonicalizer.Bucket[] |
_collList
Array of heads of collision bucket chains; size dynamically |
protected int |
_count
Total number of Names in the symbol table; only used for child tables. |
protected boolean |
_intern
Whether canonical symbol Strings are to be intern()ed before being added to the table or not |
protected int |
_longestCollisionList
We need to keep track of the longest collision list; this is needed both to indicate problems with attacks and to allow flushing for other cases. |
protected int[] |
_mainHash
Array of 2^N size, which contains combination of 24-bits of hash (0 to indicate 'empty' slot), and 8-bit collision bucket index (0 to indicate empty collision bucket chain; otherwise subtract one from index) |
protected int |
_mainHashMask
Mask used to truncate 32-bit hash value to current hash array size; essentially, hash array size - 1 (since hash array sizes are 2^N). |
protected Name[] |
_mainNames
Array that contains Name instances matching
entries in _mainHash. |
protected BytesToNameCanonicalizer |
_parent
Reference to the root symbol table, for child tables, so that they can merge table information back as necessary. |
protected AtomicReference<com.fasterxml.jackson.core.sym.BytesToNameCanonicalizer.TableInfo> |
_tableInfo
Member that is only used by the root table instance: root passes immutable state into child instances, and children may return new state if they add entries to the table. |
protected static int |
DEFAULT_TABLE_SIZE
|
protected static int |
MAX_TABLE_SIZE
Let's not expand symbol tables past some maximum size; this should protect against OOMEs caused by large documents with unique (~= random) names. |
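The descriptions of `_mainHash` and `_mainHashMask` above can be made concrete with a small sketch. The exact bit layout is an assumption (hash bits in the upper 24 bits, collision bucket marker in the lower 8 bits), and the class and method names are hypothetical. Only the "subtract one from index" convention and the `size - 1` mask come from the field descriptions.

```java
// Hypothetical helper illustrating one possible packing of a main hash slot.
final class MainHashSlotSketch {
    // Assumed layout: upper 24 bits = hash bits, lower 8 bits = bucket index + 1
    // (0 in the lower 8 bits means "no collision chain").
    static int packSlot(int hash, int bucketIndex) {
        int bucketBits = (bucketIndex < 0) ? 0 : bucketIndex + 1;
        return (hash << 8) | (bucketBits & 0xFF);
    }

    // The 24 hash bits stored in a slot.
    static int hashBits(int slot) {
        return slot >>> 8;
    }

    // Collision bucket index stored in a slot, or -1 if the chain is empty.
    static int bucketIndex(int slot) {
        return (slot & 0xFF) - 1;
    }

    // Index into the main table; tableSize must be a power of two,
    // so (tableSize - 1) is the mask.
    static int slotIndex(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }

    public static void main(String[] args) {
        int hash = 0x12345678;
        int slot = packSlot(hash, 4);
        System.out.printf("slot=0x%08X hashBits=0x%06X bucket=%d index=%d%n",
                slot, hashBits(slot), bucketIndex(slot), slotIndex(hash, 64));
    }
}
```

With 8 bits for the marker and 0 reserved for "empty", at most 255 collision buckets can be addressed, which matches the limit stated for `_collEnd`.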
| Method Summary | |
|---|---|
Name |
addName(String symbolStr,
int[] quads,
int qlen)
|
Name |
addName(String symbolStr,
int q1,
int q2)
|
int |
bucketCount()
|
int |
calcHash(int firstQuad)
|
int |
calcHash(int[] quads,
int qlen)
|
int |
calcHash(int firstQuad,
int secondQuad)
|
protected static int[] |
calcQuads(byte[] wordBytes)
|
int |
collisionCount()
Method mostly needed by unit tests; calculates number of entries that are in collision list. |
static BytesToNameCanonicalizer |
createRoot()
Factory method to call to create a symbol table instance with a randomized seed value. |
protected static BytesToNameCanonicalizer |
createRoot(int hashSeed)
Factory method that should only be called from unit tests, where seed value should remain the same. |
Name |
findName(int firstQuad)
Finds and returns name matching the specified symbol, if such name already exists in the table. |
Name |
findName(int[] quads,
int qlen)
Finds and returns name matching the specified symbol, if such name already exists in the table; or if not, creates name object, adds to the table, and returns it. |
Name |
findName(int firstQuad,
int secondQuad)
Finds and returns name matching the specified symbol, if such name already exists in the table. |
static Name |
getEmptyName()
|
int |
hashSeed()
|
BytesToNameCanonicalizer |
makeChild(boolean canonicalize,
boolean intern)
Factory method used to create actual symbol table instance to use for parsing. |
int |
maxCollisionLength()
Method mostly needed by unit tests; calculates length of the longest collision chain. |
boolean |
maybeDirty()
Method called to quickly check whether a child symbol table may have gotten additional entries. |
void |
release()
Method called by the using code to indicate it is done with this instance. |
protected void |
reportTooManyCollisions(int maxLen)
|
int |
size()
|
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected static final int DEFAULT_TABLE_SIZE
protected static final int MAX_TABLE_SIZE
protected final BytesToNameCanonicalizer _parent
protected final AtomicReference<com.fasterxml.jackson.core.sym.BytesToNameCanonicalizer.TableInfo> _tableInfo
protected final boolean _intern
protected int _count
protected int _longestCollisionList
protected int _mainHashMask
protected int[] _mainHash
protected Name[] _mainNames
Array that contains Name instances matching
entries in _mainHash. Contains nulls for unused
entries.
protected com.fasterxml.jackson.core.sym.BytesToNameCanonicalizer.Bucket[] _collList
protected int _collCount
Total number of Names in collision buckets (included in _count along with primary entries)
protected int _collEnd
| Method Detail |
|---|
public static BytesToNameCanonicalizer createRoot()
protected static BytesToNameCanonicalizer createRoot(int hashSeed)
public BytesToNameCanonicalizer makeChild(boolean canonicalize,
boolean intern)
intern - Whether canonical symbol Strings should be interned or not
public void release()
public int size()
public int bucketCount()
public boolean maybeDirty()
public int hashSeed()
public int collisionCount()
Value can be at most (size() - 1), but should usually be much lower, ideally 0.
public int maxCollisionLength()
May be up to size() - 1 in the pathological case.
public static Name getEmptyName()
public Name findName(int firstQuad)
Note: separate methods to optimize common case of short element/attribute names (4 or fewer ASCII characters)
firstQuad - int32 containing first 4 bytes of the name;
if the whole name is less than 4 bytes, padded with zero bytes
in front (zero MSBs, i.e. right aligned)
public Name findName(int firstQuad,
int secondQuad)
Note: separate methods to optimize common case of relatively short element/attribute names (8 or fewer ASCII characters)
firstQuad - int32 containing first 4 bytes of the name.
secondQuad - int32 containing bytes 5 through 8 of the
name; if less than 8 bytes, padded with up to 3 zero bytes
in front (zero MSBs, i.e. right aligned)
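The right-aligned padding described for firstQuad and secondQuad can be illustrated with a short sketch. The class `ShortNameQuadsSketch` and the method `packQuad` are hypothetical helpers, not part of this class; they only show how 1 to 4 name bytes could be packed into an int so that a short chunk leaves its most significant bytes zero.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing how short names map to one or two quads.
final class ShortNameQuadsSketch {
    // Packs up to 4 bytes into one int; fewer than 4 bytes end up right aligned
    // (the unused most significant bytes stay zero).
    static int packQuad(byte[] bytes, int offset, int len) {
        int quad = 0;
        for (int i = 0; i < len; i++) {
            quad = (quad << 8) | (bytes[offset + i] & 0xFF);
        }
        return quad;
    }

    public static void main(String[] args) {
        // 2-byte name: a single, right-aligned quad
        byte[] id = "id".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("id     -> 0x%08X%n", packQuad(id, 0, id.length));

        // 6-byte name: a full first quad plus a right-aligned second quad
        byte[] status = "status".getBytes(StandardCharsets.US_ASCII);
        int firstQuad = packQuad(status, 0, 4);
        int secondQuad = packQuad(status, 4, status.length - 4);
        System.out.printf("status -> 0x%08X 0x%08X%n", firstQuad, secondQuad);
    }
}
```

Values produced this way have the shape that `findName(int firstQuad)` and `findName(int firstQuad, int secondQuad)` are documented to expect.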
public Name findName(int[] quads,
int qlen)
Note: this is the general purpose method that can be called for names of any length. However, if the name is less than 9 bytes long, it is preferable to call the version optimized for short names.
quads - Array of int32s, each of which contains 4 bytes of
encoded name
qlen - Number of int32s, starting from index 0, in quads
parameter
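For longer names, the quads and qlen arguments can be pictured as follows. This is a sketch under the assumption that the last, partial quad is right aligned in the same way as for the short-name variants; `NameQuadsSketch.toQuads` is a hypothetical helper and is not claimed to match `calcQuads(byte[])` exactly.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper converting encoded name bytes into an int[] of quads.
final class NameQuadsSketch {
    // Every 4 bytes fill one int; a trailing partial quad is right aligned.
    static int[] toQuads(byte[] nameBytes) {
        int[] quads = new int[(nameBytes.length + 3) / 4];
        for (int i = 0; i < nameBytes.length; i++) {
            int q = i >> 2;                                  // index of the quad this byte belongs to
            quads[q] = (quads[q] << 8) | (nameBytes[i] & 0xFF);
        }
        return quads;
    }

    public static void main(String[] args) {
        byte[] name = "description".getBytes(StandardCharsets.US_ASCII);
        int[] quads = toQuads(name);
        int qlen = quads.length;                             // number of quads in use
        for (int i = 0; i < qlen; i++) {
            System.out.printf("quads[%d] = 0x%08X%n", i, quads[i]);
        }
    }
}
```

An 11-byte name such as "description" yields three quads, so qlen would be 3 in this sketch.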
public Name addName(String symbolStr,
int q1,
int q2)
public Name addName(String symbolStr,
int[] quads,
int qlen)
public int calcHash(int firstQuad)
public int calcHash(int firstQuad,
int secondQuad)
public int calcHash(int[] quads,
int qlen)
protected static int[] calcQuads(byte[] wordBytes)
protected void reportTooManyCollisions(int maxLen)