There are mainly two separate uses: the block index, which is basically metadata, and the UTXO database.
The dbcache isn't really a cache. It is a write buffer, and it avoids having to sync the disk or make random writes. As a cache it would not do much and is not needed for that. It does also act as a cache, but if you nullify that benefit it is only roughly a 10% slowdown, even with a huge dbcache.
With a fast SSD (e.g. NVMe) I think the difference between a 400MB cache and a 5GB one is "only" a halving of IBD time (when syncing from LAN peers).
To prevent corruption, the database updates must involve synchronizing writes.
We changed how dbcache flushing worked with the specific intention of making background flushing possible, so that it could be flushing and processing blocks at the same time rather than inserting these bubbles where it waits on the disk. Even though the consistency requirements make that possible now (the UTXO database doesn't need to be consistent, except that there is a record stating that all blocks before it have absolutely, positively been applied to the database), actually making the change is still extremely complicated.
Every time you process a block, once the cache fills you could flush out to disk the remaining dirty entries for the oldest remaining blocks until it is back under the limit again. So essentially you are flushing every block in the background. Then every now and then, you do a disk sync and update the record that says where the fully consistent point is, without having to flush anything more out. So there are almost no latency spikes at runtime.
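The scheme described above can be sketched as a toy model. This is only an illustration under stated assumptions, not how Bitcoin Core is implemented: the `WriteBuffer` class, its `limit` (counted in entries rather than bytes), the plain dict standing in for the on-disk database, and the `sync()` method standing in for a real disk sync are all hypothetical.

```python
from collections import OrderedDict


class WriteBuffer:
    """Toy write buffer: dirty entries are grouped by the block that wrote them."""

    def __init__(self, limit):
        self.limit = limit             # max dirty entries kept in memory (hypothetical unit)
        self.dirty = OrderedDict()     # block height -> {key: value}, oldest block first
        self.disk = {}                 # stand-in for the on-disk database
        self.flushed_through = None    # newest block written out (not necessarily synced)
        self.consistent_height = None  # marker: all blocks up to here are durably applied

    def dirty_count(self):
        return sum(len(entries) for entries in self.dirty.values())

    def _write(self, entries):
        # Unsynced write. A value of None represents a spent (deleted) entry.
        for key, value in entries.items():
            if value is None:
                self.disk.pop(key, None)
            else:
                self.disk[key] = value

    def process_block(self, height, changes):
        self.dirty[height] = dict(changes)
        # Flush the oldest blocks until the buffer is back under the limit.
        while self.dirty and self.dirty_count() > self.limit:
            oldest, entries = self.dirty.popitem(last=False)
            self._write(entries)
            self.flushed_through = oldest

    def sync(self):
        # Stand-in for a disk sync followed by advancing the consistency marker.
        # Nothing extra is flushed here, so the call stays cheap.
        if self.flushed_through is not None:
            self.consistent_height = self.flushed_through


buf = WriteBuffer(limit=3)
buf.process_block(1, {"a": 1, "b": 2})
buf.process_block(2, {"c": 3, "d": 4})  # over the limit, so block 1 is written out
buf.sync()                               # marker now covers block 1
```

The point of the split is that `process_block` only ever does small incremental writes, while `sync` moves the marker without triggering a large flush.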
This would be good for mining, but doing it requires a bunch of machinery to track things efficiently, and it would probably make sense to replace the dbcache's map with an open hash table of some kind at the same time, to cut the malloc traffic down by 10x. I expect there is a factor of two IBD speedup waiting from these changes. I'm not sure if it will ever happen though. It is a big, complicated job and any mistake is a consensus bug.
You could even do neat stuff like resolving collisions in the open hash table by displacing the oldest UTXO and, once you have gone "far enough", flushing. The table could run extremely close to completely full all the time.
I am personally fond of cuckoo tables, where every item has a small number of possible locations, such as two random buckets that each hold four items. If those locations are full, you pick one, insert the item there, and bump what was in that slot into one of its alternate locations.
If the table has a little slack (no more than ~95% full for two buckets of four items, for example) then it always succeeds in finding an empty slot after a few kicks. Lookup is extremely fast since you only have to do two random memory accesses. There are a bunch of different designs, though most work best when you can keep the table <50% full, whereas the cuckoo design is easy to take up to arbitrarily high fullness.
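As a rough illustration of the insert-and-kick behaviour, here is a minimal cuckoo table sketch with two candidate buckets of four slots each. The class name, the salted use of Python's built-in `hash`, the `MAX_KICKS` cap, and the choice to hand back the item left without a home are all assumptions made for this example; a real implementation would use fixed-size slots and proper hash functions.

```python
import random

BUCKET_SLOTS = 4   # items per bucket, as in the example above
MAX_KICKS = 500    # hypothetical cap on displacements before giving up


class CuckooTable:
    def __init__(self, num_buckets, seed=0):
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)
        # Two salts give each key two pseudo-random candidate buckets.
        self.salts = (self.rng.getrandbits(64), self.rng.getrandbits(64))
        self.size = 0

    def _candidates(self, key):
        n = len(self.buckets)
        return [hash((salt, key)) % n for salt in self.salts]

    def get(self, key, default=None):
        # A lookup touches at most two buckets.
        for b in self._candidates(key):
            for k, v in self.buckets[b]:
                if k == key:
                    return v
        return default

    def insert(self, key, value):
        """Return None on success, or the (key, value) left homeless after MAX_KICKS."""
        for b in self._candidates(key):
            bucket = self.buckets[b]
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)  # overwrite an existing key
                    return None
        item = (key, value)
        for _ in range(MAX_KICKS):
            candidates = self._candidates(item[0])
            for b in candidates:
                if len(self.buckets[b]) < BUCKET_SLOTS:
                    self.buckets[b].append(item)
                    self.size += 1
                    return None
            # Both buckets are full: take a random slot and kick out its occupant.
            b = self.rng.choice(candidates)
            slot = self.rng.randrange(BUCKET_SLOTS)
            item, self.buckets[b][slot] = self.buckets[b][slot], item
        return item  # caller must deal with it (e.g. grow the table or write it out)


table = CuckooTable(num_buckets=64)
for height in range(128):
    table.insert(height, height * 10)
```

Returning the homeless item rather than raising is a design choice for this sketch: in a UTXO setting it marks the natural place where a displaced entry could be flushed, along the lines of the displacement idea mentioned earlier.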
The STL unordered containers that Bitcoin uses are hash tables, but collisions are resolved by having each entry in the hash table be a linked list, so every access involves several pointer chases and every insertion takes a malloc.
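To make that cost concrete, here is a toy separate-chaining table that counts node allocations and links followed. It is a simplified model for illustration only and does not reproduce any particular STL implementation; the `allocations` and `hops` counters are hypothetical instrumentation standing in for malloc calls and pointer chases.

```python
class Node:
    __slots__ = ("key", "value", "next")

    def __init__(self, key, value, next_node):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedTable:
    def __init__(self, num_buckets):
        self.heads = [None] * num_buckets
        self.allocations = 0  # one per new node, standing in for a malloc
        self.hops = 0         # nodes visited, standing in for pointer chases

    def _find(self, key):
        node = self.heads[hash(key) % len(self.heads)]
        while node is not None:
            self.hops += 1
            if node.key == key:
                return node
            node = node.next
        return None

    def insert(self, key, value):
        node = self._find(key)
        if node is not None:
            node.value = value
            return
        i = hash(key) % len(self.heads)
        self.heads[i] = Node(key, value, self.heads[i])  # new node at the chain head
        self.allocations += 1

    def get(self, key, default=None):
        node = self._find(key)
        return default if node is None else node.value


table = ChainedTable(num_buckets=4)
for k in range(8):
    table.insert(k, k)
```

Every new key costs one allocation here, whereas an open table with preallocated slots places items without allocating per insertion.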
(This question was answered by gmaxwell on IRC. It has been paraphrased and any errors are my own.)