Efficient setup of random123 streams on GPU #587

[copy/paste of internal issue created by @pramodk]

In the GPU implementation of the synapse model, we need to set up random123 streams on the GPU. This is currently done in `bbcore_read`, once for every synapse instance:
```c++
static void bbcore_read(double* x, int* d, int* xx, int* offset, _threadargsproto_) {
    assert(!_p_rng);
    uint32_t* di = ((uint32_t*)d) + *offset;
    if (di[0] != 0 || di[1] != 0) {
        nrnran123_State** pv = (nrnran123_State**)(&_p_rng);
        *pv = nrnran123_newstream(di[0], di[1]);
    }
    *offset += 2;
}
```
Each call launches a single-thread (serial) kernel whose only job is to initialize one stream:
```c++
/* nrn123 streams are created from cpu launcher routine */
nrnran123_State* nrnran123_newstream(uint32_t id1, uint32_t id2) {

    nrnran123_State* s;

    cudaMalloc( (void**)&s, sizeof(nrnran123_State) );
    cudaMemset( s, 0, sizeof(nrnran123_State) );

    nrnran123_setup_cuda_newstream<<<1,1>>> (s, id1, id2);
    cudaDeviceSynchronize();

    return s;
}
```
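This is terribly slow and inefficient: every synapse instance pays for its own `cudaMalloc`, a `<<<1,1>>>` kernel launch and a full `cudaDeviceSynchronize()`. See the sample profile for a small 5 ms simulation: ![nvvp_cuda_stream_random123](https://user-images.githubusercontent.com/6459623/125078828-38bbac00-e0c3-11eb-9cb3-cceb1a7ffaf2.png)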
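
One possible direction, sketched here under assumptions rather than taken from the code base: have `bbcore_read` only record the `(id1, id2)` pair and the destination pointer, then set up all streams in one batch after every instance has been read. The names `StreamState`, `PendingStream` and `StreamBatch` below are hypothetical, and the host-side loop stands in for what would be a single device allocation plus a single kernel launch with one thread per stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for nrnran123_State; the real struct layout may differ.
struct StreamState {
    uint32_t key[2];
    uint32_t counter[4];
};

// One deferred request: the seed pair plus the slot that receives the state pointer.
struct PendingStream {
    uint32_t id1;
    uint32_t id2;
    StreamState** slot;
};

class StreamBatch {
  public:
    // Would be called where bbcore_read currently calls nrnran123_newstream.
    // Only records the request: no allocation, no kernel launch, no sync.
    void request(uint32_t id1, uint32_t id2, StreamState** slot) {
        assert(slot != nullptr);
        pending_.push_back({id1, id2, slot});
    }

    // Intended to be called once, after all instances have been read.
    // Returns the number of streams that were set up.
    std::size_t finalize() {
        const std::size_t n = pending_.size();
        // GPU build (assumption): one cudaMalloc of n * sizeof(StreamState) here.
        pool_.assign(n, StreamState{});
        // GPU build (assumption): one kernel launch covering all n streams
        // instead of n separate <<<1,1>>> launches; this loop is its body.
        for (std::size_t i = 0; i < n; ++i) {
            pool_[i].key[0] = pending_[i].id1;
            pool_[i].key[1] = pending_[i].id2;
            *pending_[i].slot = &pool_[i];
        }
        pending_.clear();
        return n;
    }

  private:
    std::vector<PendingStream> pending_;
    std::vector<StreamState> pool_;
};
```

With this shape the per-instance cost in `bbcore_read` is a `push_back`, and the allocation, launch and synchronization costs are paid once per batch rather than once per synapse. Whether the states can live in one contiguous pool depends on how streams are freed elsewhere, which this sketch does not address.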
