TODO
-:[bs]ug +:done x:dropped ^:forwarded v:with-history #:note ?:question
Thu Dec 05 2013
- simplify hxlib: dlopen actually does all that LD_LIBRARY_PATH traversal itself (!)
Mon Nov 27 2013
- hxlox improvements
- hxlock(hp) that lets you bypass all lock calls --- released by hxrel.
Typically used by one-writer/many-readers, where the writer process signals
all readers to release their persistent locks, falling back to intra-call locks,
which are slower but permit readers to continue while writer does its updates
using intra-call locks.
Sun Oct 20 2013
- hxbuild should use mmap'd file.
Thu Oct 03 2013
# Rolled back hxlox upgrades until I have time to sort them out.
Currently, many_t.pass but it's always locking (0,0)! WTF
- hxopen *overrides* a previous setting of hxdebug, with getenv("HXDEBUG").
- use the leap-by-two idea to simplify finding loops in chains in hxfix().
/* Floyd's two-pointer cycle check: fast advances two links per pass,
   slow advances one; they can only meet if the chain has a loop. */
int hasloop(LIST *list)
{
    if (list == NULL) return 0;
    LIST *slow = list, *fast = list;
    while (1) {
        fast = fast->next;
        if (fast == NULL) return 0;
        if (fast == slow) return 1;
        fast = fast->next;
        if (fast == NULL) return 0;
        if (fast == slow) return 1;
        slow = slow->next;
    }
}
Tue Oct 01 2013
- Clean up the lock mechanism:
- add a sorted PAGENO vector that has the length in [0]
Thu Sep 26 2013
- hxfreeze(hp,onoff) takes a persistent read lock on the whole file, and stores npages in hp.
hxrel(hp) is independent of this ... perhaps hxnext can ignore locking issues
altogether and become a dirty scan?
- add npages to HXFILE{}. When it's not zero, the file is frozen.
- all ops except (hxget,hxnext) are BAD_REQUEST while frozen.
- Calculate the number of cache line hits for each hxget, assuming hxfreeze().
- page header; hindex; record
Wed Sep 25 2013
- Optimize unlock to _hxunlock(locp,0,0), if hp->lockv is empty.
Sun Sep 22 2013
- For HXFILE to compete with CDB, it needs a persistent file-wide readonly lock
("hxlock"? "hxfreeze"?) that is (only) released by "hxrel".
While the file is under hxlock, it does no fcntl or lseek(END) --- the file size is const, too.
no more lseeks, fwiw. Write a test that a file locked that way can be shared
but not updated.
Sat Sep 21 2013
- Make filetype mandatory leading string ("", 1) instead of (NULL, 0) unless HX_STATIC specified.
Make hxopen fail if file has no type? How do you know which is the error?
- no such file
- invalid file header
- unable to load typelib
- deps are wrong: many_t.pass ought to force many_t to rebuild, and many_t should rebuild
  if hxlox.c changes (hence libhx.a).
AHA: the build pattern (%.pass : %) does nothing, because the targets are
local names (no proj-specific prefix boo) while deps are not.
many_t.pass : many_t (but you only have rules for building $(PWD)/many_t). Gah.
+ locking has another problem. HIGH locks just the head: BOTH tries to lock the old split (5,6)
but then finds that the file size has changed (so SPLIT moved from 5 to 7),
so it unlocks all but the head ... then tries to lock pgno=(9,10)???
# solution is that when locking "just" the head, instead lock the set of
potential splits starting at head.
Thu Sep 19 2013
+ sed -i 's/sizeof(\(\**[a-z][a-z_0-9]*\))/sizeof \1/g; s/sizeof(\("[^"]*"\))/sizeof \1/g' *.[ch]
Fri Sep 13 2013
- make _hxfindHeads take an output argument instead of implicitly using locp->vprev.
Wed Sep 11 2013
+ deadlocks are possible. At first, it looks like D computed the head (15)
the split (13). When it reached hxlockset(BOTH), the file size had already
changed, and the split was now GREATER than 15. WTF? How did the split move
past (15) when D had 15 locked??
From many_t:
@PUT D00205 89
lock start=15 count=1 npages=33 < ---() _hxlock:52 D
lock > ---(15) _hxlock:101 D
load pgno=15 next=16 used=3940 recs=58 hsize=74 _hxload:271 D
lock start=16 count=1 npages=33 < ---(15) _hxlock:52 D
lock > ---(15 16) _hxlock:101 D
load pgno=16 next=0 used=3939 recs=51 hsize=74 _hxload:271 D
unlock start=16 count=1 npages=33 < ---(15 16) _hxunlock:203 D
unlock > ---(15) _hxunlock:228 D
lock start=13 count=2 npages=33 < ---(15) _hxlock:52 D
lock > ---(15 13 14) _hxlock:101 D
lock start=11 count=1 npages=33 < ---(15 13 14) _hxlock:52 D
lock > ---(15 13 14 11) _hxlock:101 D
npages changes: 33 >> 37 _hxsize:392 D
head=15 old=15 part=BOTH _hxlockset:190 D
unlock start=1 count=14 npages=37 < ---(15 13 14 11) _hxunlock:203 D
unlock > ---(15) _hxunlock:228 D
lock start=17 count=2 npages=37 < ---(15) _hxlock:52 D
Mon Jul 29 2013
- remaining tasks before release:
- hard proof that deadlocks are impossible
- incremental hxindexify
- 48bit hashes
Wed Jul 03 2013
- add hindex to "chx dump"
Sat Jun 22 2013
+ add a serious concurrency test, that concurrent update is (a) correct (b) efficient.
- fork ten processes (0..9)
- add key=n00000..n99999, val=0
- 10 times, for each key, val=val+1
- check that there are 000000..999999 with value 10.
- hh
- the hxgrow/hxsplit logic could be made a tiny bit smarter by noticing when
  its source is (head) and both its target pages had enough space for
  the record to be inserted.
Wed Jun 19 2013
- Use XMM code to do the offset-adjustment of hind[] AND behind[] really fast.
Suggest making behind[] have a multiple of 8 elements.
#ifdef __SSE2__
    static const short mask[16] = { 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1, -1, -1, -1, -1, -1 };
    __m128i *xp, xoff = _mm_set1_epi16(offset), xlen = _mm_set1_epi16(delta);
    pos = (bufp->used + 1)/2;
    if (pos & 7)
        xp = (__m128i*)(hind - pos),
        *xp = _mm_add_epi16(*xp, _mm_and_si128(_mm_lddqu_si128((__m128i const*)(mask + 8 - (pos & 7))),
                    _mm_and_si128(xlen, _mm_cmpgt_epi16(*xp, xoff))));
    for (; (char*)xp < bufp->data + DATASIZE(hp); ++xp)
        *xp = _mm_add_epi16(*xp, _mm_and_si128(xlen, _mm_cmpgt_epi16(*xp, xoff)));
    for (xp = (__m128i*)bufp->behind; (COUNT*)xp < bufp->behind + bufp->nbehind; ++xp)
        *xp = _mm_add_epi16(*xp, _mm_and_si128(xlen, _mm_cmpgt_epi16(*xp, xoff)));
#else
    for (i = HINDSIZE(bufp); --i >= 0;)
        if (bufp->hind[-i] > offset) bufp->hind[-i] += delta;
    for (i = 0; i < bufp->nbehind; ++i)
        if (bufp->behind[i] > offset) bufp->behind[i] += delta;
#endif
+ _hxmask >> MASK
- GNUmakefile install xx
- hx_.c scramble hash
- hxshape.c (gets _hxmove logic)
Sun May 26 2013
- next step: 64bit hashes
- add something like FOR_EACH_REC for traversing hind[].
- when you _hxremove a record, be sure to zap its hind[] entry,
else _redex will save that bogus offset.
Hey! its offset will be saved in buf.{offset,delta}
for the fixup.
Wed May 22 2013
- incremental change tracking in _hxshift is BRUTAL.
It is not worth it for srcp, since you create multiple holes.
dstp changes (where dstp != srcp) are doable:
if dstp->used + size > dstp->orig
scan [dstp->used ..< dstp->used+size] for nonzero hash entries
and push them into dstp->redex[] before memmove.
- cases:
remove(p1,old)
... save bufp(offset,oldsize)
shift(p1,p2) ---
split
... while shifting, when
<:
=:
>:
shift(p2,p3) [ < ]
append(p3,new) [ < = > ]
- remove(p1,old) shift(p1,p2)[ < = > ] shift(p2,p3) [ = > ] append(p3,new) [ > ]
- replace(p1,old,new)[ < = > ]; shift(p1,p2)
... and does hxgrow FORCE hxindexify, or does it become a fallthrough?
# after remove: set locp->(offset,delta) for later
# during shift, cache hind(+delta?) from data[max(used,orig)..used+len]
# after shift
Mon May 20 2013
- split hxupd etc into routines that SET Dirty and those that TEST Dirty.
# making the hash table smaller (e.g. power of 2 > (nrecs + 1) * 5 / 4)
  is a dead end ... it makes insertion AND search less efficient.
Better to work out the separate incremental steps carefully.
- basic plan is clear:
+ make _hxindexed() independent of _hxindexify()
- add redex[] list (limited size) to HXBUF
? what affects hind[]?
- (offset,delta) where delta < 0
- (offset,orig,used) where used < orig
Currently, hxput is careful to REPLACE an old rec at the same offset.
Suppose remove(p1,oldrec), shift(p1,p2), append(p2,newrec).
_hxsave (p1) must adjust offsets of all records after deletion:
COUNT *hind = (COUNT*)bufp->data + (bufp->used + 1)/2;
COUNT *hend = (COUNT*)(bufp->data + DATASIZE(hp));
for (; hind < hend; ++hind) if (*hind > pos) *hind += delta;
shifted records are not in (p1).hind yet.
Sun May 19 2013
- performance:
- hind fixed _hxfind. Now what fixes _hxindexify?
hxput (MMAP) spends 60% of its time in hxindexify.
_hxfind is still not cheap! it is 30% of hxget.
$ perf_x 1mx.tab 24
index % time self children called name
-----------------------------------------------
0.11 3.63 4000000/4000000 main [1]
[2] 53.9 0.11 3.63 4000000 hxput [2]
0.05 1.79 4639402/4639402 sync_save [4]
0.22 0.23 471267/481223 _hxshift [11]
0.32 0.04 2639402/8049393 _hxload [8]
0.29 0.03 2163281/7516554 _hxfind [7]
-----------------------------------------------
0.06 2.49 4000000/4000000 main [1]
[3] 36.7 0.06 2.49 4000000 hxget [3]
0.73 0.07 5353273/7516554 _hxfind [7]
0.65 0.08 5353273/8049393 _hxload [8]
-----------------------------------------------
0.05 1.79 4639402/4639402 hxput [2]
[4] 26.5 0.05 1.79 4639402 sync_save [4]
0.03 1.75 2215911/2284310 _hxsave [5]
-----------------------------------------------
0.03 1.75 2215911/2284310 sync_save [4]
[5] 26.5 0.04 1.80 2284310 _hxsave [5]
1.50 0.28 2262950/2262950 _hxindexify [6]
-----------------------------------------------
1.50 0.28 2262950/2262950 _hxsave [5]
[6] 25.7 1.50 0.28 2262950 _hxindexify [6]
-----------------------------------------------
0.29 0.03 2163281/7516554 hxput [2]
0.73 0.07 5353273/7516554 hxget [3]
[7] 16.1 1.02 0.10 7516554 _hxfind [7]
x _hxfindfree: single isolated optimization.
Using an 8-byte-stride ffz() makes no measurable diff.
- multithreading. flock ([pid,inode]) is the problem.
Tue May 14 2013
+ perf_x 1m.tab 24 crashes on a hxalloc:40 BAD_FILE:
the page being freed is marked (0) in the bitmap.
# Culprit was the logic to save scanning a tail page
for any recs that matched locp->head. SOMETIMES
you get this for free from _hxshift. Sometimes not.
- _hxindexify is now the top time-user in perf.
Each hxbuf has its own little cache of entries to re-insert.
(bufp->ncached == MAX_CACHED) is the way to say, "_hxindexify!"
_hxappend needs an easy test for whether it has overwritten
part of the index. That suggests that the index should always
be (say) the power of 2 >= (nrecs+1)*9/8.
- "+1" guarantees that there is at least one empty slot,
so the stopping condition is easy?
- this makes the masking calculation (hash -> hindex) simpler.
- make _hxindexify do a two-pass insertion to improve median lookup time.
- incremental updates to hind[]:
- use coverage to determine frequency
- everything happens inside hxput and _hxgrow(_hxshift)
- _hxshift needs to be smart enough to flag minimal cases.
Should _hxappend control the reindex cache?
- _hxremove: adjust OFFSET of every hin
This may take more analysis: perhaps hind[] needs to be limited in size.
- _hxload (on an MMAP'd file!) is in second place after _hxindexify.
Mon May 13 2013
- improve hxstat: add rechash and improve keyhash:
I need an order-independent hash across all records in a file.
- use low byte of hash as an index into a table of hashes.
- xor hashes into respective table slots.
- do a non-symmetric aggregate across the table(s).
+ remove the third... args from hxopen.
Convert hxfunc into hxbind(hp,diff,hash,load,save,test).
Thu May 09 2013
+ perftest hxget-only random read mode
- perftest 100 processes calling hxput,hxget.
- perhaps locking pages below the split might require something else
Mon Apr 29 2013
- BUG: somehow we sometimes get a hxind[] with no zero entries
and _hxfind spins forever.
- update _hxprbuf to print hind[] (!)
Sun Apr 28 2013
- latest perf_x shows _hxfindfree is the single biggest cost ---
about 45% of hxput. 96% of _hxfindfree's time is in its own code.
- use SSE2 to speed up search for a zero bit in a map page
Try using ULL+bsfl instead, first; code is less cryptic.
- cache the end point (page, offset) of the last search.
Cycle through the map pages in the file. This only helps
if there are few other processes freeing pages.
Mon Feb 11 2013
- make hxstat compute a record hash (fnv04) instead of a key hash.
Mon Nov 12 2012
- hxlib parsing of $LD_LIBRARY_PATH is wrong: it treats "::" as an empty
string, not as ".", before appending "/hx_xxx.so" to it.
Tue Oct 16 2012
- Perhaps keep a tiny change log in the HXBUF, and only do
(incremental/full) _hxindexify in hxsave. This means you can
do incremental changes even if delta<0 or next!=0
- failing test in corrupt_t ("bad free next") which forces used=0
  means we need to add more proper testing of hind[].
  Guaranteed: any nonzero bytes in a page with used=0 are a corruption,
  as is a chain that points at a page with used=0.
- HX has a deadlock-avoidance scheme: hashed pages are always locked in a
specific (descending) order for the duration of an API call.
This gets complicated for "hxput", which may have to split a bunch of
pages to create space for a new record.
This means that multiple threads can work together if they
can synchronize on the current state of (ino,pid).
It's tricky, because no thread can unlock part of the file that
some other thread might need.
Otherwise, they have to single-thread themselves on the (ino).
This also has its problems. How do you associate a pthread_mutex with
an inode? You have a race condition between threads creating that
mutex, or the mutex covering a static table that maps (inode <=> mutex).
Tue Sep 18 2012
- implement hxgetv and hxputv that do everything within one whole-file lock.
ret = hxgetv(hp, nrecs, char **recv, int *lenv)
ret = hxputv(hp, nrecs, char const**recv, int const *lenv)
On entry, recv[i] must point to a buffer of at least lenv[i].
On success, ret is 0.
If there is a partial failure partway through processing the list,
lenv[i] will be BAD_REQUEST (-1) for all unprocessed records.
Note that recv[] will not necessarily be processed in order.
No, this doesn't help, because locking is insignificant overhead
once disk IO becomes an issue.
Thu Sep 13 2012
^ Implement HX_EXCLUSIVE mode. File size is stored in (HXFILE*).
Perhaps keep a copy of the entire HXLOCAL struct in HXFILE.
No fcntl or lseek(END) needed.
Fri Sep 07 2012
- consider putting hind[] at the front of the page and data (recs) at the back.
This makes hind[] calculations much easier.
It makes _hxshift harder (i think), unless you store the record length
at the END of each HXREC. Hmmm ... ugly.
Tue Aug 28 2012
- UNDEX bit will indicate to _hxsave that, even though buffer has changed,
_hxindexify should NOT be called.
- Five cases for incremental hind[] update in hxput:
This is only done if delta >= 0 or currp->next == 0.
- make _hxfind set &hindpos.
- delete (newsize == 0, or found old record and new does not fit) when currp->recs > 1
AFTER _hxremove
- remove (zero) its hind entry, and move every following hind entry
up to the next zero into save[]
- move into save[] every entry that might move into the new delta space,
plus every following entry past the split up to the next zero.
if the last split-area entry was not zero.
... then the same as shrink:
- shrink i.e delta < 0
AFTER MEMMOVE
- increase hsize
- zero out the delta space
- reinsert save[].
- add (negative) delta to every hind entry that is > pos
- update i.e. delta == 0
- test that _hxsave does not call _hxindexify.
- grow i.e. delta > 0
BEFORE MEMMOVE
- add delta to all hind entries >= pos
- move hind[] entries in delta area to save[]
- if last hind[old_hsize-1] is nonzero, move every leading entry (starting at hind[0]) to save.
- add (may_find == 0) when recs > 0:
This is a pure append to [used]
- move hind[] entries in delta area to save[]
- if last hind[] entry is nonzero, move every leading entry (starting at hind[0]) to save.
- add entry to hind for new record
There are going to be some nasty edge cases here.
Mon Aug 27 2012
- need to fix corrupt_t
+ fix perf_x! it misloads records!
- now that perf_x works, _hxindexify is now the big bottleneck (1mq.tab)
Back we go to working out an incremental version of it?
- Add "UNDEX" bit to bufp->flag.
^ implement INCREMENTAL shrink/expand
Thu Aug 23 2012
+ speed up IS_MAP by memoizing map1 (setting it in hxopen!)
IS_MAP: pgno != 0 && (pgno < hp->map1 || ...)
Sun Aug 19 2012
+ use fastcall. It seems to cut hxput time in half, and push hxget time below 2 usec.
Wed Aug 15 2012
- update docs re hind.
+ review how two processes cannot deadlock on a shared tail.
- consider alternative locking schemes ... fcntl is the biggest ovhd for mmap.
Sat Aug 11 2012
x perftest the bsrl versus shift-and-or methods of _hxmask.
bsrl: 5 ops, >>|: 15 ops
Fri Aug 10 2012
- add easy coverage for:
- "Cancel a prior hxhold() for a different key"
- hxnext in update mode skips a record in a shared tail for some other head.
- hxcheck(/dev/null)
- hxcheck: bad: head_rec map_head
- map page with bits set beyond lastpage of file.
- _hxleave: something that sets tmpmap(!?)
- cover _hxlockset: when file size changed since the last _hxsize.
This is a nasty case of how to test something with two racing processes.
- cover _hxremap: when mmap returns a different pointer.
Wed Aug 08 2012
- biggest coverage hole in hxupd is hxgrow. Artefact of the tests??
Wed Aug 01 2012
- the way hxbuild saves and reloads the tail page is slightly wasteful.
+ make lockpart an ENUM
- add to hxfix: if a chain is >HX_MAX_CHAIN, break it.
Breaking a chain means following (and breaking) every link.
However, if the tail page is shared, breaking the link is
insufficient.
Tue Jul 31 2012
- put signature ("Hx") in the two unused bytes of HXROOT.
Thu Jul 19 2012
- chx del would be sped up by sorting on revbits(hash)
Wed Jul 18 2012
- strace test on doc suggests locking calls (and there are no conflicts)
are the single biggest system-call overhead.
$ strace -c ./perf_t labs.tab 24
analyze:0.119s nrecs:245809 nbytes:16773180 inpsize:472705029 hxbuild:120
parts:31 split:82997 middle:87382 vbuf:528K _split:320
split=10.827s store=108.158s hxbuild:184
hxputs: 0 0.000s hxbuild:251
build: 119.1084
mmap: 4D4F6000 697872384: Success hxopen:98
nrecs:6.42952e+06 succ-get:74.2578 fail-get:74.3304 put+11:0.5054 put-11:0.4738 (usec) put+100:0.4756 succ-get:75.4158 fail-get:76.0424
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
52.68 0.246146 0 61611688 fcntl
34.74 0.162310 0 52297771 lseek
5.73 0.026752 0 7223934 read
5.47 0.025550 0 172459 write
1.39 0.006496 171 38 munmap
_hxfind's loop is still the largest overhead by far.
- (taken from hxi/TODO, Wed 02 Mar 2011)
+ make pgrate a constant
+ add recs field to +HXPAGE, +HXROOT, +HXBUF
+ update recs
+ display it in chx hdrs et al.
+ verify it in hxfix. recs < used, recs > used: used trumps recs.
+ change FITS to (bytes,nrecs).
+ rewrite hxbuild to calc space reqs using INDEX_SIZE(nrecs)
+ change hxmaxrec to (pgsize-18):
18 = sizeof(HXPAGE including .recs) + sizeof(HXREC) + INDEX_SIZE(1)
+ any place that calculates max recs per page includes allowance for recs.
(hxbuild, hxfix, ... ?)
+ use revbits to compute pgno. (actually, just bswap32 is enough).
+ implement hxind[]. Populate it from scratch in _hxsave.
^ Add "UNDEX" bit to bufp->flag.
^ set UNDEX bit inside hxshift, hxsplit.
^ set UNDEX bit *outside* hxremove, hxappend.
x remove HXBUF.orig >> still needed for SHRUNK test in hxput and _hxshift
x reorder HXBUF fields to put (next,used,recs) at the front.
This will let Refs in hxshape use recs when folding atail/ztail.
+ make hxfix check hxind. For now, _hxsave will repair it anyway.
+ try to make hxmask faster, since it is now needed by hxind.
+ rationalize macros like INDEX_SIZE (count), INDEX_MIN (min-required)
+ calc start pos in hxind[] using linear hash formula, not just modulo hsize.
+ make _hxfind use hxind[]
- hxfix must not use _hxfind any more!
+ fix hxcheck tests for dup recs in adjacent pages of the chain.
This has always been lame, and now is skipped because we can no longer
trust hindex[] to be safe; hxfix repairs the index and saves the page.
^ implement INCREMENTAL shrink/expand
# Not sure how useful this is. Start by ONLY doing this in hxput,
when currp->next == 0 (no further page mods) and
currp->recs > 1 (page is not empty!)
# note that all hash entries have to be scanned anyway, to add "delta"
to their offset values if they are above the changed part of (used) bytes.
When scanning a memory range of locations to move (compact) when hxput needs
more space for a record, any entry followed by a zero is in the correct place,
and doesn't need any hash manipulation to calculate where it belongs
(subtract xwid/2 from it).
assert(xmax > 0)
for (i = n; i >= xmax; --i)
if (xtab[-i-1] == 0) insert(xtab[-i], i-xwid/2), --i
else j = xpos(*(HXHASH*)(page + xtab[-i]), xwid)
insert(xtab[-i], j)
- update %used in hxstat to indicate based on existing nbytes/nrecs.
- perftest this mofo
Sat Jun 30 2012
- fast batch update: incremental packing may be faster than expanding.
Given a non-empty hxfile, a block of RAM, and an input stream
of updates, where each line is hx_save'd record prefixed by
"+" or "-".
non-empty hxfile, repeatedly:
- fill RAM with [+-]records
- sort records by [revbits(hash), seq].
- eliminate useless updates (only the last one per key counts).
- scan updates and file (pages affected by updates) to estimate the increase/decrease in file size
- reshape to that size (factoring a chunk out of hxshape),
- insert records without changing size -- using fine-grain locks,
and multiple insertions per block.
- postfill with hxput's
The ideal would be to make the edge case (empty input file)
close to hxbuild in speed. hxbuild has the advantage of
partitioning the entire input stream first. It estimates
final file size at the first memory-full point, because
the number of partitions depends on the final file size.
How about a version of hxput that cannot grow the file?
Or even alter the free map? This has higher concurrency
than hxput, since you only lock the pages of the chain head.
Repeat:
- a pass that inserts/deletes wherever will fit.
- sort by [revbits(hash), seq]; for keys with dup hashes,
check for equal keys and keep the last entry.
- update multiple records in a single pass through
a chain. updates of the freemap are okay.
- resize that estimates how much is needed for the remaining
inserts (use _hxgrow and entire file).
- insert remaining recs, plus any more records that can
be put in the queue?
Sat Jun 23 2012
^ we're back at _hxfind being the biggest part of hxget/hxput.
I'm leery of tackling this, because it makes updates more
complex. The in-record hash index uses 2.5 bytes per record.
Thu Jun 21 2012
- chx stat output gives a silly picture of near-empty files,
showing huge zero-length chains, "single-head" shared pages.
Sun May 20 2012
^ Fast batch updates suggest trying to maximize cache effect,
by sorting updates on reverse_bits(hash(x)).
The trick is to not do TOO much at one point, creating
long overflow chains. The answer requires knowing things
that the public API cannot; hence hxbuild.
The need to maintain a file lock for the duration can go away.
"hxbatch(hp, fp, opts)" that can process a stream of deletes
or a stream of puts: preserve the order of updates
(i.e. discard all but the last update).
opt bits:
- DELETE (else PUT);
- FILE lock, else granular page locks. Locking is granular
for head below split, too, since the object is to allow
concurrent use of the file.
Partition the input on a smaller scale.
First pass is allowed to insert into existing pages
(plus one shared-tail at a time).
Second pass begins by growing file to anticipated size,
after discounting free overflow pages in the map.
Sun May 13 2012
x fix locking btw different threads (using different hp's).
The problem is that fcntl-locks are inode-based, not file-handle based.
fcntl(SETLK) is just plain BROKEN:
qv. http://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2.html
Get this: if the same file is opened twice (two fds),
closing one fd releases all locks held by the OTHER fd. Wow.
That's right. REALLY. So there's no pthread_mutex wrapper that
can save this turkey.
FreeBSD flock is better (fd-based) but has NO granularity.
!!! This means that you cannot hxclose(a) while another HXFILE is holding
a lock ... either because hxhold(b)/hxnext(b) is being used,
or some hx*(b) call is in progress in another thread.
Mon Apr 30 2012
x add hxput(replace) hxput(change) and hxput(new) and hxput(del) to perf_t
Tue Apr 10 PDT 2012
- hxupd is fundamental to everything else. Coverage has problems:
_hxputfreed never calls _hxflushfreed (i.e. double page-free)
135: 223: if (hp->locked & LOCKED_BEYOND && !(hp->locked & LOCKED_BODY))
#####: 224: _hxaddlock(hp, newpg);
Every case in _hxgrow except the simple one (1):
-: 243: switch (_hxshift(locp, oldpg, newpg, bufp, oldp, newp)) {
135: 286: if (oldp->next && !FILE_HELD(hp) && !IS_HEAD(hp, oldp->next))
#####: 288: _hxunlock(locp, oldp->next, 1);
Mon Apr 9 2012
^ now that the loop bug is fixed, the performance of hx vs hxi
pops to the top of the stack. Previous perf tests had been
with 40-50byte records, with about 10-25usec per hxget.
Perf test with 1m.tab (~700b records) shows ~1-2usec "hxget".
So the linear-search has a significant cost, and a hash lookup
WITHIN each page would pay off for small records and be
insignificant for large pages (space overhead probably not
significant, since it's <20 bits/record).
Sun Apr 8 2012
+ fixed bug in _hxsave that created loop in mass loads like
hxbuild. Do not set tail_pgno=bufp->pgno when bufp->next is not zero.
Sat Apr 7 2012
+ making things like RECSIZE into inline functions would make
some asserts more readable :-)
Mon Apr 2 2012
+ hxbuild bug: the logic for weeding out dup records in _store is crap.
What was I thinking? Dup recs create a file that fails hxcheck.
Ironically, hxcheck only catches the dups when they are in different
pages.
Mon Mar 26 2012
+ hxput in hxbuild should not be locking anything; the file
should be under a FILE+BEYOND lock. Unfortunately, it IS
trying to lock individual pages, eventually overrunning
lockv[] and filling crap into hp->buffer,
making it look like hp->buffer.page is set to a non-null pointer,
and hxput's "if (SCANNING(hp) && ...)" blows up.
The problem is with _hxaddlock calls in _hxgrow.
+ when build_t or perf_t fails because 88.tab hasn't been created,
the error is a bit cryptic.
Sun Mar 25 2012
+ hxbuild has a bug: the logic for populating the map pages
blows on an assert in hx_load for any map page past the root.
Correct is to _hxfresh each root page (past the root).
Fri Mar 23 2012
^ implement HX_EXCLUSIVE: no locking, no resize of file
outside the api (hence HXLOCAL can live in HXFILE).
This is the Berkeley dbm model.
Tue Mar 6 2012
- use mremap in _hxremap!
Thu Mar 1 2012
- change HXHASH to uint64_t. See what breaks.
Files so big they need 64bit hashes prolly are too big to
need HXI; HXI is for when the file is mmap'd and HX is
competing with in-memory hash tables for lookup speed.
Fri Jan 27 2012
- make hx threadsafe:
fcntl locking is based on (pid,ino).
- declare a static (ptr to a) map {dev,ino => mutex,refcount}.
- hxopen/hxclose modify the map in the obvious ways.
A pointer to the map entry is stored in the HXFILE.
Assume a linked list is good enough.
Use compare_and_set to ensure the list is okay.
typedef struct _HXMUTEX {
    __dev_t         dev;
    __ino_t         ino;
    pthread_mutex_t mux;
    int             refs;
    struct _HXMUTEX *next;
} _HXMUTEX;
static _HXMUTEX *_hxmutexes;

fstat(hp->file, &st);
_HXMUTEX *mp, *mxp = NULL;
while (1) {
    for (mp = _hxmutexes; mp && !(mp->dev == st.st_dev && mp->ino == st.st_ino); mp = mp->next);
    if (mp) {
        // This is wrong: assume *mp can be freed while doing this.
        while (mp && !compare_and_set(&mp->refs, mp->refs, mp->refs + 1));
        if (mxp) pthread_mutex_destroy(&mxp->mux), free(mxp);
        break;
    }
    if (!mxp) {
        mxp = malloc(sizeof *mxp);
        mxp->dev = st.st_dev;
        mxp->ino = st.st_ino;
        mxp->refs = 1;
        pthread_mutex_init(&mxp->mux, NULL);
    }
    mxp->next = _hxmutexes;
    if (compare_and_set(&_hxmutexes, mxp->next, mxp)) {
        mp = mxp;
        break;
    }
}
// Similarly, deletion is necessary.
- unless I can figure out better granularity,
_hxlock/_hxunlock lock the ENTIRE FILE (the inode)
against any other thread.
Fri Dec 30 2011
- hxshape is the least-covered function.
- _hxgrow() is the most important function for which to improve coverage!
Sun Dec 25 2011
# The whole HXI project is stopped by the hxcheck/hxfix issues.
# HXI will require hxfix to change; best to fix the problems first.
Sat Dec 24 2011
# hxcheck.c:xor_map is an obscure way to write getbit/setbit.
+ the only failing test now is corrupt_t
^ add to hxfix: if a chain is >HX_MAX_CHAIN, break it.
Breaking a chain means following (and breaking) every link.
However, if the tail page is shared, breaking the link is
insufficient.
^ problem #0 in hxfix: only detects dup recs in adj pages of a chain (previously known)
- problem #1 in hxfix: when a link is broken, every link
beyond that in the chain must also be broken,
else they will not get redone.
- problem #2 in hxfix: if there are dup recs in a chain,
hxfix doesn't fix it AT ALL. hxfix must break the link to the
page containing the dupkey of a rec earlier in the chain,
forcing the unreferenced pages to be redone.
- problem #3 in hxfix: if a dup rec is in a shared tail page,
breaking the link to that page is not enough to redo it.
You have to selectively edit the record out of the tail page
and into the spool file. Would running hxfix twice catch this?
Extend the scan for dup recs:
Allocate HX_MAX_CHAIN buffers in locp.
For each chain:
Create an in-memory hash hrec[ceillog2(hxmaxrec(hp)/HX_MIN_REC*HX_MAX_CHAIN)]
cells: (hash,pgno,offset).
Note that the loop-check cuts any loop in vnext[], even if !REPAIRING.
So this is safe to traverse.
For each page, for each rec:
if hash not in hrec[]
hrec += (hash,pgno,offset)
            elif the stored page# is the same,
For each rec in curr, check whether it is in prev.
If so, cut off immediately,
elif its hash is in duphash[] for duphash[x].pgno != prev.pgno
then break prev link to curr and exit
else duphash += (hash,pgno,offset)
If you didn't reach the end of chain, bad_dup_recs++.
If !REPAIRING, bail out of the entire loop?
This looks suspiciously like it will leave a tail page
with a dup record in it that does not belong to the remaining
chains pointing to the tail page. URRRRRRR.
Fri Dec 23 2011
x check_data_page:308 catches (next) pointing at a head page,
but does not catch (next) pointing at a map page.
This may explain Tim's bug.
- Tim's bug indicates need for a lot more thrashy testing
on files in the <2MB range.
Tue Dec 20 2011
+ looks like next_t has a similar HX_MAX_CHAIN problem.
Side issue is that hxnext doesn't notice that the
chain is too long... it could be trapped in an infinite loop
by a corrupt file.
^ check_t and corrupt_t have problems, not sure what.
May be just brittle tests.
- ensure that hxput/hxbuild/hxshape never creates a chain > HX_MAX_CHAIN.
Mon Dec 19 2011
+ no need to do expensive REVBITS since, for a 4K page,
it makes no difference for any file less than 64GB,
and even then, records have to be <8 bytes!
# okay basic_t:136 'bug' is that chain length exceeds maxloops.
^ HXPAGE.recs changes
- _hxappend? increments
- _hxshift decrements srcp
- chx hdrs/dump reports
- hxcheck validates: recs>used, recs<used
- unit test validates hxfix/hxdump
Fri Dec 16 2011
+ aha! a better way: make _hxhead do the revbits().
+ bug: basic_t:136: hxput('1---') corrupts tmp.hx.
Not actually a bug: bad test.
- add a diagnostic mode that does no locking,
allowing read-only operations (i.e. chx)
to inspect a file partway through an update.
Tue Dec 13 2011
+ make pgrate a const 4
+ update (pgrate >>> version) to 0x0100 ... i.e. invalid >MAXPGRATE
+ store revbit(hash) instead of (hash)
+ add nrecs to HXPAGE/HXBUF.
^ add reindex-flag, and reindex to _hxsave (and _hxbuild and _hxfix)
Set flag in _hxshift etc.
^ _hxfind uses revhash and index
^ hxput does incremental index update
x any other performance enhancements?
Thu Dec 8 2011
^ Major change for hxi: store each rec's REVBIT(HXHASH) in the page, instead of its HXHASH.
This means that resizing the in-page hash table doesn't have to do a REVBIT
for every entry relocated or reindexed.
It makes the _hxsplit call in _hxgrow slightly different,
but otherwise does not change anything.
- change _hxmask to use (msutil.h) fls.
    mask(n | n >= 1) :: min(x | x & (x+1) == 0 and x >= n-1) = (1 << fls(n-1)) - 1
^ when scanning a memory range of locations to move (compact) when hxput needs
more space for a record, any entry followed by a zero is in the correct place,
and doesn't need any hash manipulation to calculate where it belongs
(subtract xwid/2 from it).
assert(xmax > 0)
for (i = n; i >= xmax; --i)
if (xtab[-i-1] == 0) insert(xtab[-i], i-xwid/2), --i
else j = xpos(*(HXHASH*)(page + xtab[-i]), xwid)
insert(xtab[-i], j)
Tue Nov 29 2011
+ major bug on x86_64, starting with:
All std types (HXHASH, PAGENO) must be converted to fixed types (uint32_t etc).
+ _hxload:257: display (pgsize,pgrate) instead of "next=bigweirdnumber"
+ chx dump doesn't dump maps > root?
+ change hxcheck to take a single argument. The remainder are only relevant to hxfix.
- is there any point to locking bitmap pages other than the root?
Sat Nov 26 2011
+ use HXROOT.pgrate field as a version number; make pgrate = 2*2 permanently.
- add hxupgrade(hp) which converts type-4 to type-5 (hxi) files.
Use _hxtemp instead of taking an (fp) parameter.
Fri Nov 18 2011
x hxi: perhaps NOT use the entire space beyond (used) as an index, just 5*recs/4+1 shorts.
Does this make things simpler? Most inserts don't relocate hash entries.
OTOH growing an open-addressed hash table is not simple: every time you "split" a cell,
AND the occupant moves, you have to do the same for every cell beyond it
up to the next empty cell. This creates edge cases when (recs) is small.
Best to have an index system that is unit-testable.
Sat Nov 12 2011
x reduce the number of munmap/mmap calls due to filesize changes.
Pageno can be calculated knowing the file size, whether or not it is all mapped;
but the page only has to be mapped if you want to read/write it!
This suggests forcing _hxremap only after ftruncate, or when
_hxfresh/_hxload/_hxlink/_hxmove need to address a page beyond mlen.
This change should be effective, when the file is large,
and the probability of hitting the most recently-grown part of the file
is small.
Currently _hxremap uses munmap/mmap. Probably more efficient is to FIRST
try to force-map the grown part of the file to the memory beyond the
currently-mapped part of the file. If that fails, munmap (or multiple
pointers in HXFILE, with more complex MAP_BUF logic) is required.
# Although this seemed like a good idea, in practice it does nothing.
You can't safely just map the grown segment, because mmap actively
replaces anything mapped into the fixed address range you must specify.
And trying to map the file while the previous map exists will always
create a new address range. About all you can do is optimize
"shrink", which almost never happens.
Thu Nov 3 2011
^ hxi revisited:
- add (ushort recs) to HXPAGE, HXBUF
- compute it/update it in hxput and _hxappend.
(Only) hxshape uses _hxappend to copy multiple records at a time.
Workaround:
_hxload(locp, srcp, atail->pgno);
_hxload(locp, dstp, ztail->pgno);
_hxappend(dstp, srcp->data, srcp->used);
dstp->recs += srcp->recs - 1;
- make hxfix check recs<used, recs>used
  - change FITS to allow (5*sizeof(short)/4 * 2 + 2) bytes extra for hash index space.
- add HXBUF fields (mask, shift) computed from (recs)
- add REINDEX bit to STAIN; recompute index from scratch in _hxsave
- make hxfix check index (rather than checking for all-zero bytes > used)
- use index for lookups
_hxload computes HXBUF.(xmask,xbits)
- perftest use of index for hxget on MMAP/!MMAP
- incremental index update:
In the following cases, first phase would just set the REINDEX
If _hxshift is called (and this buffer is either source or destination)
the REINDEX bit is set, and no incremental work is done.
- SHRINK (increase recs and used)
This is the case when hxput expands or adds a record.
It also means that hxshift is not going to be called
with this page as a source or target.
It is complicated by the scan that adds (delta)
to every entry > pos in the index.
Before copying data on top of index entries....
store (pos,delta)
If entry at &data[used] is nonzero, move
entries starting at &data[DATASIZE]-2 to fixup[],
replacing them with zeroes, until you reach a zero entry.
Push nonzero index entries about to be overwritten.
- EXPAND (decrease recs and used)
This could be followed by _hxshift into this buffer from
its successor (if any). So perhaps not worth doing unless
bp->next is zero. So, assuming bp->next is zero
If entry at &data[used] is nonzero, move entries
at end to fixup[] (zeroing index entries)
Do the same for entries starting at split point.
** These ranges may overlap.
Tue Nov 1 2011
X no coverage at all for recursive split!
That may be the cause of the Oct30 coredump
Nope, that was just a make failure.
Sun Oct 30 2011
- coredump in mamelog.hx update due to CHECK assert in _hxsave:
two MODIFIED buffers with same pageno. Unfortunately, this
is too late to determine WHY it got to that state.
+ Added asserts in _hxload to catch when a page is dirty in one buffer
and also read in another.
- Should do some similar check every time (pgno,flag) changes in any buffer.
Wed 18 May 2011
- hxbuild coverage (aside from LEAVE(HXERR_<syscall>)
- entire input fits in mem (the single-_store case)
- file with >1 map page
- input with duplicate keys
- an error in hxbuild requiring hxleave to munmap(tmpmap)
- hx.c coverage
- hxhold("alpha:bet") then hxput("azbuk:va")
- ?? force an mmap (remap) to relocate to a different starting page?
thus forcing active buffer page ptrs to need adjusting.
^ hxupd.c coverage
- the whole of case 3 in _hxgrow, including recursion
^ hxlox coverage
- the race condition where the file changes size between
the first size calc and the second (after the lock).
Mon 28 Mar 2011
- drop DISK_PGSIZE being a constant in hxbuild.
This means manually aligning vb[] to st.st_blksize of the tmpfile.
Mon 28 Feb 2011
^ okay t.perf shows that _hxfind is the biggest cost by far in hxget;
about half. This was using 4K pages and ~16b records => 22b.
Having a binary search rather than a linear search would make
this faster with (relatively) minor changes to the rest of hx.
What else would work? Would it require grouping the hash codes
along with the offsets to make it RAM-efficient?
Or move the WHOLE header into 8-byte array elements?
This becomes a lot less recoverable, since scribbling on the array
effectively destroys the records.
^ THE IDEA: use the space beyond (used) as an open hash table
keyed by the top byte of record hashes. The hash table is like
a linear-hash that is more often in a state of contraction.
Each time (used) changes, a small set of table entries have
to be rehashed.
^ This IMPLIES: need for a record-count field, so the page head
changes. So make PGRATE a constant and use the pgrate header
field as a version number. The version-5 header includes a
record count.
- implement chx save so that it can dump version-4 files,
but do nothing else with them.
+ FITS gets a delta-recs flag (>=0)
+ DECIDE on a hash overflow algorithm.
Hopscotch is overkill. 5/4 space means happy linear overflow.
That keeps resizing simple.
+ use all available (hxmaxrec-used) bytes as the hash table
The advantage of (b) is that, most of the time, you only
look at one entry (no overflows) because the hash table
is pretty sparse.
Thu 24 Feb 2011
x bug in hxbuild? No: ought to pay attention to the return code
(BAD_RECORD) because input records were >4KB.
Wed 23 Feb 2011
x add an opportunistic record index?
NO, an offset array at the end of the page, sorted by hash, gives
a very modest increase in speed, and maintaining it becomes a major