This repository was archived by the owner on Mar 20, 2023. It is now read-only.

Efficient setup of random123 streams on GPU #587

@olupton

[copy/paste of internal issue created by @pramodk]

In the GPU implementation of the synapse model, we need to set up random123 streams on the GPU. This is currently done in bbcore_read, once for every synapse instance:

static void bbcore_read(double* x, int* d, int* xx, int* offset, _threadargsproto_) {
    assert(!_p_rng);
    uint32_t* di = ((uint32_t*)d) + *offset;
    if (di[0] != 0 || di[1] != 0) {
        nrnran123_State** pv = (nrnran123_State**)(&_p_rng);
        *pv = nrnran123_newstream(di[0], di[1]);
    }
    *offset += 2;
}

For each of these calls we launch a kernel that does purely serial work, i.e. it just initializes a single stream:

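/* nrn123 streams are created from cpu launcher routine */
nrnran123_State* nrnran123_newstream(uint32_t id1, uint32_t id2) {
    nrnran123_State* s;

    /* one device allocation per stream */
    cudaMalloc((void**)&s, sizeof(nrnran123_State));
    /* zero the device memory; cudaMemset takes the device pointer itself, not its address */
    cudaMemset(s, 0, sizeof(nrnran123_State));

    /* single-thread kernel that initializes this one stream */
    nrnran123_setup_cuda_newstream<<<1,1>>>(s, id1, id2);
    /* block the host until the kernel has finished */
    cudaDeviceSynchronize();

    return s;
}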

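This is terribly slow and inefficient: every synapse instance pays for its own cudaMalloc, a <<<1,1>>> kernel launch and a cudaDeviceSynchronize. See the sample profile for a small 5 ms simulation: nvvp_cuda_stream_random123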
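
One possible direction, given only as a rough sketch and not as what the project implements: collect the (id1, id2) pairs on the host while reading all instances, then allocate the states in one block and initialize them with a single kernel launch, one thread per stream. The names stream_state, setup_streams and new_streams_batched are hypothetical, and stream_state is a simplified stand-in for nrnran123_State, whose real layout and initialization are not reproduced here.

```cuda
#include <cstddef>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for nrnran123_State (assumption: the real struct differs).
struct stream_state {
    uint32_t id1;
    uint32_t id2;
    uint32_t counter;
};

// One thread initializes one stream, so n streams need a single launch.
__global__ void setup_streams(stream_state* states, const uint32_t* ids, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) {
        states[i].id1 = ids[2 * i];
        states[i].id2 = ids[2 * i + 1];
        states[i].counter = 0;
    }
}

// ids holds the (id1, id2) pairs of all instances, stored back to back.
// Returns a device array with one state per pair, or nullptr on failure.
stream_state* new_streams_batched(const std::vector<uint32_t>& ids) {
    const size_t n = ids.size() / 2;
    if (n == 0) {
        return nullptr;
    }

    stream_state* d_states = nullptr;
    uint32_t* d_ids = nullptr;
    const size_t id_bytes = 2 * n * sizeof(uint32_t);

    // One allocation for all states instead of one per stream.
    if (cudaMalloc((void**)&d_states, n * sizeof(stream_state)) != cudaSuccess) {
        return nullptr;
    }
    if (cudaMalloc((void**)&d_ids, id_bytes) != cudaSuccess) {
        cudaFree(d_states);
        return nullptr;
    }
    cudaMemcpy(d_ids, ids.data(), id_bytes, cudaMemcpyHostToDevice);

    // One launch and one synchronization for the whole batch.
    const unsigned int threads = 256;
    const unsigned int blocks = static_cast<unsigned int>((n + threads - 1) / threads);
    setup_streams<<<blocks, threads>>>(d_states, d_ids, n);
    const cudaError_t err = cudaDeviceSynchronize();

    cudaFree(d_ids);
    if (err != cudaSuccess) {
        cudaFree(d_states);
        return nullptr;
    }
    return d_states;
}
```

Under this scheme, instance i would keep d_states + i as its stream pointer instead of owning a separate allocation. That would mean bbcore_read only records the ids and the pointer is filled in after the batch call, which is an assumption about how the surrounding code could be restructured.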

Labels: gpu, improvement