Multi-head attention API sample code and numpy/autograd reference model.

BUILD INSTRUCTIONS
------------------

1. Make sure CUDA and cuDNN are installed.

2. Build the "multiHeadAttention" program in its sample folder, specifying
   the CUDA installation path:

   > make CUDA_PATH=<cuda installation path>
   > make DEBUG=1 CUDA_PATH=<cuda installation path> (for debug version)

EXECUTING API TEST
------------------

1. Print the help screen with program options (type "multiHeadAttention"
   with no command line options):

   > ./multiHeadAttention

2. Run a test of your choice, for example:

   > ./multiHeadAttention -attnTrain1 -attnDataType0 -attnNumHeads3 -attnBatchSize6 -attnBeamSize1 -attnQsize8 -attnKsize8 -attnVsize8 \
     -attnProjQsize2 -attnProjKsize2 -attnProjVsize2 -attnProjOsize8 -attnResLink0 -attnSeqLenQ3 -attnSeqLenK10

3. Task dimensions can be randomized by adding the "-attnRandGeom1" option.
   The randomized configuration is always smaller than or equal to the
   dimensions provided as command line arguments.

4. To generate different input data and test geometries, change the random
   number generator seed using the "-attnRandSeed" switch.
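Each option value above is appended directly to the option name, with no separating space.  When scripting many runs, a small Python helper can assemble the argument list.  This is a sketch only: the helper name attn_args() is hypothetical, and the "-attnRandSeed1234" form assumes the seed follows the same name-plus-value convention as the other options.

```python
import shlex


def attn_args(**opts):
    """Hypothetical helper: format options as -attn<Name><value>, e.g. -attnNumHeads3."""
    return ["-attn" + name + str(value) for name, value in opts.items()]


cmd = ["./multiHeadAttention"] + attn_args(
    Train=1, DataType=0, NumHeads=3, BatchSize=6, BeamSize=1,
    Qsize=8, Ksize=8, Vsize=8,
    ProjQsize=2, ProjKsize=2, ProjVsize=2, ProjOsize=8,
    ResLink=0, SeqLenQ=3, SeqLenK=10,
    RandGeom=1,
    RandSeed=1234,  # assumed value format; change it to get new data and geometries
)
print(shlex.join(cmd))
```

From the sample folder, the resulting list can be passed to subprocess.run(cmd, check=True); looping over several RandSeed values then exercises different random geometries.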

The API sample code demonstrates how to access multi-head attention weights
and SeqData.  See the saveAllParams() and saveData() functions.

RUNNING REFERENCE MODEL
-----------------------

Currently, the multi-head attention reference model (attn_ref.py) supports
only one sentence, so the .dat files must be generated with the
"-attnBatchSize1 -attnBeamSize1" options.  The model handles multiple
attention heads and multiple time-steps in Q and K/V.  It assumes that the
attention window includes all K/V vectors, so "-attnRandGeom0" should be
used.  The model can compute the forward output and all dgrad/wgrad derivatives.
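For orientation, the forward computation described above can be sketched in numpy.  This is a minimal sketch, not the code in attn_ref.py.  It assumes batch and beam sizes of 1, a full attention window, no residual link ("-attnResLink0"), no dropout or biases, and weight shapes matching those reported when the .dat files are loaded (WQ as [nHeads x projQsize x qSize], WO as [nHeads x projOsize x projVsize]).  The softmax scaler sm_scaler is an assumed scalar parameter; set it to whatever scaler the test actually uses.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum along `axis` for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def mha_forward(q, k, v, wq, wk, wv, wo, sm_scaler=1.0):
    """Multi-head attention forward pass for one sentence (batch=1, beam=1).

    q: [seqLenQ, qSize]   k: [seqLenK, kSize]   v: [seqLenK, vSize]
    wq: [nHeads, projQsize, qSize]    wk: [nHeads, projKsize, kSize]
    wv: [nHeads, projVsize, vSize]    wo: [nHeads, projOsize, projVsize]
    Assumes every query attends to all seqLenK keys (full window) and
    that sm_scaler is a plain scalar applied before the softmax.
    """
    qh = np.einsum("hpi,ti->htp", wq, q)  # [nHeads, seqLenQ, projQsize]
    kh = np.einsum("hpi,si->hsp", wk, k)  # [nHeads, seqLenK, projKsize]
    vh = np.einsum("hpi,si->hsp", wv, v)  # [nHeads, seqLenK, projVsize]
    scores = sm_scaler * np.einsum("htp,hsp->hts", qh, kh)
    attn = softmax(scores, axis=-1)  # normalize over the K/V time-steps
    ctx = np.einsum("hts,hsp->htp", attn, vh)  # [nHeads, seqLenQ, projVsize]
    # Project every head to projOsize and sum the heads together.
    return np.einsum("hop,htp->to", wo, ctx)  # [seqLenQ, projOsize]


# Geometry from the example command line, with batch and beam size 1.
rng = np.random.default_rng(0)
n_heads, seq_q, seq_k = 3, 3, 10
q = rng.standard_normal((seq_q, 8))
k = rng.standard_normal((seq_k, 8))
v = rng.standard_normal((seq_k, 8))
wq = rng.standard_normal((n_heads, 2, 8))
wk = rng.standard_normal((n_heads, 2, 8))
wv = rng.standard_normal((n_heads, 2, 8))
wo = rng.standard_normal((n_heads, 8, 2))
out = mha_forward(q, k, v, wq, wk, wv, wo)
print(out.shape)  # (3, 8)
```

Using einsum keeps all heads in one array; an explicit per-head loop is equivalent and may be easier to step through when comparing against the dumped tensors.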

1. Install autograd by following the instructions at: https://github.com/HIPS/autograd

2. Add the "-attnFileDump1" option to the "multiHeadAttention" command line.
   This option instructs the API test program to save all input/output data
   and weights to separate files, such as "q.dat" or "wk.dat".  The remaining
   configuration parameters are written to "meta.dat".

   The .dat files are in text format, so they can be easily inspected by the
   user.  Unfortunately, the numpy loadtxt() function handles only 1D and 2D
   arrays.  When 3D tensors are needed, the 2D matrices from the .dat files are
   reshaped internally by the "attn_ref.py" script.  The final shape is shown
   when the files are loaded by the reference model:

   Loaded [3x2x8] of WQ weights from 'wq.dat'
   Loaded [3x8x2] of WO weights from 'wo.dat'

3. After generating the .dat files, run the "attn_ref.py" script with no
   arguments.  The Python script reads the input data and weights, and computes
   the forward, dgrad, and wgrad results.  Dgrad/wgrad results are computed only
   when "-attnTrain1" was specified on the API test program command line.  When
   "-attnTrain0" was used, only the forward output is processed by the
   reference model.

   The cuDNN API output is automatically compared with the reference results
   generated by the numpy/autograd model.  Larger configurations and lower
   precision data types may require more generous tolerances in the
   "attn_ref.py" script.

4. To quickly test the multi-head attention reference model, run the
   "run_ref.sh" script on systems where the bash shell is available.
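Step 2 above notes that loadtxt() returns at most 2D arrays, so 3D tensors must be reshaped after loading.  A minimal sketch of that idea follows.  The helper load_dat() is hypothetical, and the on-disk layout it assumes (a [d0 x d1 x d2] tensor stored as a (d0*d1) x d2 text matrix in row-major order) is a guess; check attn_ref.py for the layout the sample really writes.

```python
import io

import numpy as np


def load_dat(src, shape):
    """Load a text matrix with np.loadtxt() and reshape it to `shape`.

    Assumption (hypothetical layout): a 3D tensor [d0 x d1 x d2] is stored
    as a (d0*d1) x d2 text matrix whose rows follow C (row-major) order.
    """
    data = np.loadtxt(src, ndmin=2)
    expected = int(np.prod(shape))
    if data.size != expected:
        raise ValueError(f"expected {expected} values, found {data.size}")
    return data.reshape(shape)


# Stand-in for "wq.dat": 6 rows x 8 columns holding the values 0..47.
text = "\n".join(" ".join(str(r * 8 + c) for c in range(8)) for r in range(6))
wq = load_dat(io.StringIO(text), (3, 2, 8))
print("Loaded [%s] of WQ weights" % "x".join(map(str, wq.shape)))
```

With a real dump, pass the file name (for example "wq.dat") instead of the io.StringIO stand-in; the size check catches a shape that does not match the configuration in "meta.dat".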
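Step 3 above mentions that the cuDNN output is compared against the reference using tolerances.  The following is a hedged sketch of such a check, assuming both results are available as numpy arrays.  The function compare() and its default rtol/atol values are illustrative placeholders, not the tolerances used by attn_ref.py.

```python
import numpy as np


def compare(name, gpu, ref, rtol=1e-5, atol=1e-6):
    """Print the largest errors between a cuDNN result and the reference.

    Returns True when np.allclose() accepts the result with the given tolerances.
    """
    gpu = np.asarray(gpu, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    abs_err = np.abs(gpu - ref)
    # Floor the denominator at atol so zeros in ref do not divide by zero.
    rel_err = abs_err / np.maximum(np.abs(ref), atol)
    ok = bool(np.allclose(gpu, ref, rtol=rtol, atol=atol))
    status = "PASS" if ok else "FAIL"
    print(f"{name}: max_abs={abs_err.max():.3e} max_rel={rel_err.max():.3e} {status}")
    return ok


ref = np.array([[0.5, -1.25], [2.0, 0.0]])
gpu_small_err = ref + 1e-7  # simulated small numerical error
gpu_large_err = ref + 1e-3  # simulated larger error, e.g. from lower precision
compare("out", gpu_small_err, ref)                        # PASS
compare("out", gpu_large_err, ref)                        # FAIL with default tolerances
compare("out", gpu_large_err, ref, rtol=1e-2, atol=1e-2)  # PASS with looser tolerances
```

Reporting the maximum absolute and relative errors, rather than only PASS/FAIL, makes it easier to judge how far the tolerances can be relaxed for larger configurations or lower precision data types.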

