We continue our discussion with Julian Bucknall, the Chief Technical Officer of DevExpress and the author of The Tomes of Delphi: Algorithms and Data Structures. We will be talking about the generic TDictionary that comes with Delphi 2009, and the latest news from DevExpress.

- Julian’s Blog: www.Boyet.com
- His EZDSL: Easy Data Structures Library for Delphi
- Tomes of Delphi: Algorithms and Data Structures
- DevExpress
- Turbo Power (now on SourceForge)

Be sure to read the great additional information from Barry Kelly in the comments!

## 5 replies on “4.2 – Julian Bucknall”

Good job with all these podcasts!

In general, I think you should publish the full podcast (whatever length it has) instead of fragmenting it into a number of sub-episodes.

Nice! Thanks for your effort.

Julian is wrong about linear probing versus rehashing, insofar as his comments apply to .NET and Delphi 2009, at least.

Delphi 2009’s TDictionary uses linear probing, *not* rehashing, for collision resolution. The dictionary size is a power of two for speed of the hash code -> table index calculation, to avoid a division.
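To make the two points above concrete, here is a minimal sketch in Python (not Delphi's actual code, and all names are mine): a power-of-two bucket count lets the hash -> index step be a cheap bitwise AND instead of a division, and collisions are resolved by linear probing, i.e. stepping to the next slot. Growth/rehashing on load is omitted for brevity.

```python
EMPTY = object()  # sentinel for an unused slot

class LinearProbeTable:
    """Toy open-addressing table: power-of-two size + linear probing."""

    def __init__(self, size_pow2=8):
        assert size_pow2 & (size_pow2 - 1) == 0, "size must be a power of two"
        self.mask = size_pow2 - 1          # hash & mask == hash % size
        self.slots = [EMPTY] * size_pow2   # each slot: EMPTY or (key, value)

    def _find(self, key):
        i = hash(key) & self.mask          # index step: AND, no division
        while self.slots[i] is not EMPTY:
            if self.slots[i][0] == key:
                return i, True             # found the existing key
            i = (i + 1) & self.mask        # linear probe: try the next slot
        return i, False                    # first empty slot for this key

    def put(self, key, value):
        i, _ = self._find(key)
        self.slots[i] = (key, value)

    def get(self, key):
        i, found = self._find(key)
        return self.slots[i][1] if found else None
```

A usage example: `t = LinearProbeTable(); t.put("a", 1); t.get("a")` returns `1`. A real implementation would also grow the table (to the next power of two) once the load factor passes a threshold, so probe chains stay short.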

.NET’s Dictionary uses chaining for collision resolution, and an array of a prime number size for buckets. However, it chains using indexes into a separate array of a value type, probably for cache locality, rather than reference chaining.
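A hypothetical sketch of that index-based chaining style (again in Python, with invented names): the bucket array holds head indices into flat, parallel entry arrays, and each entry's "next" field is another index rather than a heap reference, with -1 terminating a chain. Keeping entries in contiguous arrays is what gives the cache-locality benefit mentioned above.

```python
class ChainedTable:
    """Toy chained table: buckets of indices into flat entry arrays."""

    def __init__(self, n_buckets=7):                 # a prime, as noted above
        self.buckets = [-1] * n_buckets              # head index per bucket, -1 = empty
        self.keys, self.values, self.next = [], [], []

    def put(self, key, value):
        b = hash(key) % len(self.buckets)            # prime size: plain modulo
        i = self.buckets[b]
        while i != -1:                               # walk the index chain
            if self.keys[i] == key:
                self.values[i] = value               # overwrite existing key
                return
            i = self.next[i]
        self.keys.append(key)                        # append a new entry...
        self.values.append(value)
        self.next.append(self.buckets[b])            # ...linked in at the chain head
        self.buckets[b] = len(self.keys) - 1

    def get(self, key):
        i = self.buckets[hash(key) % len(self.buckets)]
        while i != -1:
            if self.keys[i] == key:
                return self.values[i]
            i = self.next[i]
        return None
```

Note the trade-off against the probing version: chaining never "fills in" foreign buckets, but every chain step is an extra array lookup, and the prime-sized bucket array costs a modulo per operation.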

FWIW, Java’s HashMap and friends use powers of two for the bucket array, again probably to save a division, but use reference chaining.

Dictionary’s implementation details may change in future releases, however; don’t take this comment as part of the specification. Some parts of the implementation were constrained by details of which particular generic features in the compiler had been implemented at that time.

Also, I should add that rehashing the way Julian described is problematic unless the stride is relatively prime to (shares no common factors with) the hash table size. If the hash table size were itself prime, that would be trivially true, but for a power-of-two size we’d have to eliminate 2 from all the factors of the calculated stride. The specific problem is that only relatively prime strides will eventually iterate through every bucket index. For a counter-example, consider a table of size 8 and a stride of 4; if it starts at 3, say, it will simply loop between 3 and 7, and won’t find empty slots elsewhere.

Another knock against rehashing is that you can’t simply delete from a table that uses rehashing, because you don’t know how far back to move an item to fill in any gaps in any possible rehash-probing sequences. You can use a tombstone for this, and periodically (e.g. when you calculate that you have too many tombstones) re-insert everything in the dictionary to clear them out.
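A minimal sketch of the tombstone idea (Python, invented names, not any real dictionary's code): deleted slots are marked rather than emptied, so a lookup's probe sequence continues past them, while inserts may reuse them. The example keys below rely on CPython's hash of small integers being the integer itself, so 1 and 9 land in the same bucket of a size-8 table.

```python
EMPTY, TOMB = object(), object()   # sentinels: never-used vs deleted slot

class ProbingTable:
    """Toy linear-probing table with tombstone-based deletion."""

    def __init__(self, size_pow2=8):
        self.mask = size_pow2 - 1
        self.slots = [EMPTY] * size_pow2   # EMPTY, TOMB, or (key, value)
        self.tombstones = 0

    def put(self, key, value):
        i, free = hash(key) & self.mask, None
        while self.slots[i] is not EMPTY:
            s = self.slots[i]
            if s is TOMB:
                if free is None:
                    free = i                       # remember first reusable slot
            elif s[0] == key:
                self.slots[i] = (key, value)       # overwrite existing key
                return
            i = (i + 1) & self.mask
        target = free if free is not None else i
        if self.slots[target] is TOMB:
            self.tombstones -= 1                   # a real table would re-insert
        self.slots[target] = (key, value)          # everything when this count grows

    def get(self, key):
        i = hash(key) & self.mask
        while self.slots[i] is not EMPTY:          # TOMB does NOT stop the probe
            s = self.slots[i]
            if s is not TOMB and s[0] == key:
                return s[1]
            i = (i + 1) & self.mask
        return None

    def delete(self, key):
        i = hash(key) & self.mask
        while self.slots[i] is not EMPTY:
            s = self.slots[i]
            if s is not TOMB and s[0] == key:
                self.slots[i] = TOMB               # leave a marker, not a gap
                self.tombstones += 1
                return
            i = (i + 1) & self.mask
```

If `delete` blanked the slot to EMPTY instead, `get(9)` would stop early at the gap and wrongly report the key missing, which is exactly the problem the comment describes.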

I should finally point out that I chose linear probing over heap reference chaining for cache effects in longer-running programs, where memory allocations will be scattered throughout memory, within the limitations of generics at the time of writing. Dynamic arrays were mandatory; static arrays of generic types couldn’t have their element sizes calculated in early versions of the compiler. Similarly, there were restrictions on using generic types inside themselves, and some other issues which have since been fixed, but not early enough to afford a redesign of the dictionary implementation.