Questions tagged [cuda]

CUDA is a parallel computing platform and programming model for Nvidia GPUs (Graphics Processing Units). CUDA provides an interface to Nvidia GPUs through a variety of programming languages, libraries, and APIs.

1 vote · 1 answer · 36 views

Code is running, but the GPU function is not executed

I have two functions: the add_cpu function works fine, but the add_gpu function does not. I tried checking some options in my GPU driver software and read my code over and over again. I tried the exact ...

-2 votes · 0 answers · 19 views

CUDA float value problem: the result is strange when operating on floats

I am using CUDA version 10.0 to calculate float values, but the computed result is not a valid value. My code is: __global__ void ict_kernel(int *imgData_0, int *imgData_1, int *imgData_2, int range, ...

0 votes · 0 answers · 10 views

Is Folly library compatible with CUDA?

When I try to use the Facebook Open-source Library (Folly) with CUDA, I get the following error: error: allowing all exceptions is incompatible with previous function "malloc". A simplified version of ...

-1 votes · 0 answers · 9 views

Removing sequential duplicates from an unsorted thrust::vector

Background summary: I have a vector of daily stock prices, and I want to remove any days where the price hasn't changed. Example: before [100, 100, 100, 95, 97, 100, 80, 80], after [100, 95, 97, 100, 80] ...
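
For reference, the before/after example matches what Thrust's thrust::unique does: it keeps only the first element of each run of consecutive equal values and returns an iterator to the new logical end, so on a thrust::device_vector the usual idiom would be v.erase(thrust::unique(v.begin(), v.end()), v.end()). The sketch below is a host-side stand-in that uses the standard-library counterpart std::unique so it runs without a GPU; drop_unchanged_days is a hypothetical name, and prices are assumed to be stored as double and compared exactly.

```cpp
#include <algorithm>
#include <vector>

// Host stand-in for the Thrust idiom
//   v.erase(thrust::unique(v.begin(), v.end()), v.end());
// drop_unchanged_days is a hypothetical helper name.
std::vector<double> drop_unchanged_days(std::vector<double> prices) {
    // std::unique keeps the first element of every run of consecutive equal
    // values, moves the kept elements to the front, and returns the new end.
    auto new_end = std::unique(prices.begin(), prices.end());
    // Trim the leftover tail so size() reflects the kept elements only.
    prices.erase(new_end, prices.end());
    return prices;
}
```

With the example input, drop_unchanged_days({100, 100, 100, 95, 97, 100, 80, 80}) yields {100, 95, 97, 100, 80}. The input does not need to be sorted, because only neighboring elements are compared.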

-2 votes · 1 answer · 44 views

Undefined symbol when trying to link with shared library built from CUDA objects

I'm experimenting with building a simple application from a couple of .cu source files and a very simple C++ main that calls a function from one of the .cu files. I'm making a shared library (.so file)...

0 votes · 0 answers · 31 views

CUDA compile problems on Windows, CMake error: No CUDA toolset found

So I've been successfully working on my CUDA program on Linux, but I would like to support the Windows platform as well. However, I've been struggling to compile it correctly. I use: Windows 10 ...

0 votes · 0 answers · 24 views

Earliest CUDA version with certain libraries

What was the earliest version of CUDA to have (integrated or separately) the following libraries? nVIDIA Tools Extension (a.k.a. nvtx, nvToolsExt)? nVIDIA OpenCL support (a.k.a. OpenCL)?

-2 votes · 0 answers · 17 views

How should I interpret these mean filter results for the GPU and serial CPU versions?

I implemented an image mean filter as a serial CPU version and a parallel NVIDIA GPU version. I got the running times (please see the results here). Why does case 2 have the highest speedup and case 3 have the ...

1 vote · 0 answers · 25 views

Processing and creation of array in numba.cuda device function

I want to pass a slice of an array to a device function, create a new array there, and return their combination. But this does not seem to be the usual way to solve such problems with CUDA, because numba ...

-2 votes · 0 answers · 13 views

Issue with calling PyCuda function (LogicError: cuFuncSetBlockShape failed: invalid resource handle)

First, I'll say in advance that I've gone through all the threads here, as well as the PyCuda forums, regarding this error message, and have tried all the suggested solutions, yet I continue ...

1 vote · 1 answer · 25 views

How to correctly cast IntPtr into an array when using Device memory in Hybridizer

I am writing some C# code for CUDA GPU processing using Hybridizer. My problem is I do not understand how to pass objects held in device memory into Hybridizer code and am getting a ...

-2 votes · 0 answers · 18 views

Can a MacBook run the CUDA library? [duplicate]

I have been using a 2011 Mac Pro, on which running deep learning models is practically impossible due to its limited resources. So my question is: recently I started learning deep ...

0 votes · 0 answers · 70 views

Global memory access coalescing in CUDA - Maxwell architecture

I have matrix multiplication code running on my GeForce 940M (Maxwell architecture, CUDA compute capability 5.0). I have used the NVIDIA Visual Profiler to measure the number of global load ...
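
A simplified way to reason about coalescing, before comparing against profiler counters, is to count how many distinct 32-byte segments the 32 threads of one warp touch in a single load: the fewer segments, the fewer memory transactions. The sketch below assumes 4-byte float elements, a segment-aligned array start, and lane i of the warp reading element base + i * stride; it ignores caching and any Maxwell-specific details, and segments_per_warp is a hypothetical helper, not a profiler metric.

```cpp
#include <cstddef>
#include <set>

// Simplified coalescing model (an assumption, not a profiler metric):
// count the distinct 32-byte segments touched when the 32 lanes of one
// warp each load a 4-byte float at element base_elem + lane * stride_elems.
std::size_t segments_per_warp(std::size_t stride_elems, std::size_t base_elem = 0) {
    const std::size_t warp_size = 32, elem_bytes = 4, segment_bytes = 32;
    std::set<std::size_t> segments;
    for (std::size_t lane = 0; lane < warp_size; ++lane) {
        // Byte address read by this lane, relative to a segment-aligned base.
        std::size_t byte_addr = (base_elem + lane * stride_elems) * elem_bytes;
        segments.insert(byte_addr / segment_bytes);
    }
    return segments.size();
}
```

Under this model a warp reading consecutive floats (stride 1) touches 4 segments, shifting the start by one element raises that to 5, and a warp walking down a column of a row-major matrix that is at least 8 floats wide touches 32 segments, one per lane.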

-3 votes · 0 answers · 28 views

Where is the bug?

I'm new to GPU programming and I'm trying to implement a simple matrix multiplication, but it fails: the program returns a matrix of 0s instead of 6s. Can someone point out where it fails? ...

0 votes · 1 answer · 28 views

Finding the nVIDIA Toolkit Extensions library with CMake

I'm using a recent version of CMake, with inherent support for CUDA as a language, to build a project. This project requires the nVIDIA Toolkit Extensions library. On a previous system, I had it under ...

0 votes · 2 answers · 36 views

Recipe to copy 1D strided data with cudaMemcpy2D

If one has two contiguous ranges of device memory, it is possible to copy memory from one to the other using cudaMemcpy: double* source = ... double* dest = ... cudaMemcpy(dest, source, ...
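
One way to express a 1D strided copy with cudaMemcpy2D(dst, dpitch, src, spitch, width, height, kind) is to treat every element as a one-element "row": width is the element size, height is the element count, and the pitches are the byte distances between consecutive elements on each side. The host-only sketch below models that copy pattern so it can run without a GPU; memcpy2d_host and gather_strided are hypothetical helpers, and the corresponding device call appears only in a comment.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Host-only model of the copy pattern performed by
//   cudaMemcpy2D(dst, dpitch, src, spitch, width, height, kind):
// `height` rows of `width` bytes each, where consecutive source rows start
// `spitch` bytes apart and consecutive destination rows start `dpitch`
// bytes apart. memcpy2d_host is a hypothetical helper, not a CUDA API.
void memcpy2d_host(void* dst, std::size_t dpitch, const void* src,
                   std::size_t spitch, std::size_t width, std::size_t height) {
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row)
        std::memcpy(d + row * dpitch, s + row * spitch, width);
}

// Gather every `stride`-th double into a contiguous array by treating each
// element as a one-element row. The analogous device call would be
//   cudaMemcpy2D(dest, sizeof(double), source, stride * sizeof(double),
//                sizeof(double), count, cudaMemcpyDeviceToDevice);
// Precondition: source.size() >= (count - 1) * stride + 1 when count > 0.
std::vector<double> gather_strided(const std::vector<double>& source,
                                   std::size_t stride, std::size_t count) {
    std::vector<double> dest(count);
    memcpy2d_host(dest.data(), sizeof(double), source.data(),
                  stride * sizeof(double), sizeof(double), count);
    return dest;
}
```

Swapping the two pitches (dpitch = stride * sizeof(double), spitch = sizeof(double)) scatters a contiguous array back into strided positions. cudaMemcpy2D requires both pitches to be at least width, which holds here for any stride of 1 or more.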

0 votes · 0 answers · 43 views

How do I avoid a race condition with this atomic operation? [duplicate]

Take the following code fragment example: __global__ void my_kernel(float *d_min, uint32_t *d_argmin, float *d_input, uint32_t N) { uint32_t ii = blockDim.x * blockIdx.x + threadIdx.x;...
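
A frequently used way around the race between separate updates of d_min and d_argmin (a swapped-in technique, not taken from the question) is to pack the value and its index into one 64-bit key, so that a single atomic updates both at once. In a kernel, each thread would then issue one atomicMin on an unsigned long long holding the packed key; that 64-bit overload exists in CUDA but needs a sufficiently recent compute capability, so check the programming guide for your device. The host-side sketch below shows only the packing: float_to_ordered, pack and argmin_key are hypothetical helpers, std::atomic with a compare-and-swap loop stands in for atomicMin, a sequential loop stands in for the GPU threads, and inputs are assumed to be non-NaN.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Map a float's bit pattern to an unsigned key whose integer order matches
// the float order for all non-NaN values: positive floats get the sign bit
// set, negative floats have all bits inverted.
uint32_t float_to_ordered(float value) {
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

// Exact inverse of float_to_ordered.
float ordered_to_float(uint32_t key) {
    uint32_t bits = (key & 0x80000000u) ? (key ^ 0x80000000u) : ~key;
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Value in the high 32 bits, index in the low 32 bits: the smallest packed
// key holds the minimum value, with ties going to the smallest index.
uint64_t pack(float value, uint32_t index) {
    return (static_cast<uint64_t>(float_to_ordered(value)) << 32) | index;
}

float key_value(uint64_t key) { return ordered_to_float(static_cast<uint32_t>(key >> 32)); }
uint32_t key_index(uint64_t key) { return static_cast<uint32_t>(key); }

// Host stand-in for a 64-bit atomicMin: retry a compare-and-swap until the
// stored key is no larger than the candidate.
void atomic_min(std::atomic<uint64_t>& target, uint64_t candidate) {
    uint64_t current = target.load();
    while (candidate < current &&
           !target.compare_exchange_weak(current, candidate)) {
        // On failure, compare_exchange_weak reloads `current`; loop again.
    }
}

// Each iteration plays the role of one GPU thread issuing one atomic
// update, so a value and its index are always written together.
uint64_t argmin_key(const std::vector<float>& input) {
    std::atomic<uint64_t> best{std::numeric_limits<uint64_t>::max()};
    for (std::size_t i = 0; i < input.size(); ++i)
        atomic_min(best, pack(input[i], static_cast<uint32_t>(i)));
    return best.load();
}
```

Because the index sits in the low bits, equal minima resolve to the first occurrence; for example, argmin_key({3.5f, -1.25f, 7.0f, -1.25f, 2.0f}) decodes to value -1.25 at index 1. For an empty input the key keeps its all-ones initial value, which callers should treat as "no result".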

-2 votes · 0 answers · 37 views

Persistent LNK1318 FORMAT(11) error after program crashed

I am writing a program in Visual Studio using C++ and CUDA. Things were working fine until I changed a piece of code, which started causing the program to crash after a while. I believe this ...

-1 votes · 0 answers · 20 views

cudaMemGetInfo from multiple processes behaves inconsistently on Windows 10

When running two (or more) programs utilizing CUDA (v10.1) at the same time, I am observing significant discrepancies in the behavior of cudaMemGetInfo. I have two GTX 2080 graphics cards (each with ...

1 vote · 0 answers · 28 views

How to set up MSVC++ 14.0 build tools for python copperhead? [duplicate]

I'd like to use python copperhead for CUDA C++ prototyping, which requires MSVC++14, and I want to make copperhead work first without CUDA. I've installed Microsoft Build Tools 2015 and tried to ...

1 vote · 2 answers · 70 views

CUDA per-thread arrays with different types

Each instance of my CUDA kernel (i.e. each thread) needs three private arrays, with different types. e.g. __global__ void mykernel() { type1 a[aLen]; type2 b[bLen]; type3 c[cLen]; .....

0 votes · 1 answer · 45 views

Is it possible to guarantee that kernels in different streams are not interleaved?

If kernels are launched in different streams, can we guarantee that the streams do not interleave? It seems that kernels from different streams are interleaved. What I want is that ...

-1 votes · 0 answers · 69 views

How can I improve the performance of this large CUDA kernel with a double nested loop?

I have a CUDA kernel for calculating symmetric matrices that happen to be very large (on the order of 16 million entries). Each entry in the matrix is independent, so the kernel uses each thread to ...

1 vote · 1 answer · 42 views

nvcc fatal : '--ptxas-options=-v': expected a number

I get the error nvcc fatal : '--ptxas-options=-v': expected a number when I try to build a Windows port of Faster-RCNN. The setup file (a Python script) can be reached directly from here. ...

-1 votes · 1 answer · 50 views

Passing GpuMat directly to cufftExecC2C function for doing fast fourier transform

I am trying to optimize my code using OpenCV with CUDA and the cuFFT library. Every time I have to do a fast Fourier transform, I have to download a cv::Mat from the GpuMat and then run cuFFT. (Please see the code ...

-2 votes · 0 answers · 36 views

Tensorflow, CUDA, VS versions

According to some online research, it seems like no version of Tensorflow is compatible with CUDA 10.1 yet. Is this true? If that's the case, and I must use CUDA 10.0, can I do so with Visual Studio ...

3 votes · 1 answer · 45 views

Understanding indexing and how many threads there are in a block

I'm studying CUDA programming and I've found that there is more than one way to index a grid. What I don't understand is how those indexing techniques differ from each other. These are my ...
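
As an illustration of how two common indexing schemes relate, the host-side sketch below evaluates them with a plain struct standing in for CUDA's blockIdx, blockDim, gridDim and threadIdx (the Dim type and the function names are hypothetical). A 1D grid of 1D blocks uses blockIdx.x * blockDim.x + threadIdx.x; a 2D grid of 1D blocks first flattens the block position into a block number and then applies the same rule. In both cases the number of threads in a block is the product of blockDim's components, here just blockDim.x.

```cpp
#include <vector>

// Plain stand-in for CUDA's built-in dim3 values (hypothetical helper type).
struct Dim { unsigned x, y; };

// 1D grid of 1D blocks: skip every thread in the earlier blocks,
// then add this thread's offset within its own block.
unsigned index_1d_grid_1d_block(Dim block, Dim block_dim, Dim thread) {
    return block.x * block_dim.x + thread.x;
}

// 2D grid of 1D blocks: first flatten the block's (x, y) position into a
// linear block number, then apply the same rule as above.
unsigned index_2d_grid_1d_block(Dim block, Dim grid_dim, Dim block_dim, Dim thread) {
    unsigned block_id = block.y * grid_dim.x + block.x;
    return block_id * block_dim.x + thread.x;
}

// Collect the index every thread of a 2D grid of 1D blocks would compute.
// The total thread count is grid_dim.x * grid_dim.y * block_dim.x.
std::vector<unsigned> indices_2d_grid(Dim grid_dim, Dim block_dim) {
    std::vector<unsigned> out;
    for (unsigned by = 0; by < grid_dim.y; ++by)
        for (unsigned bx = 0; bx < grid_dim.x; ++bx)
            for (unsigned t = 0; t < block_dim.x; ++t)
                out.push_back(index_2d_grid_1d_block({bx, by}, grid_dim, block_dim, {t, 0}));
    return out;
}
```

For gridDim = (3, 2) and blockDim.x = 4, the 24 threads produce each index from 0 to 23 exactly once, which is the property that lets every thread own a distinct array element.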

0 votes · 0 answers · 43 views

Which constructor is called for the defined class? [duplicate]

I am working on a matrix class that does all of its computation on the GPU using CUDA libraries. I have given a stripped-down version of the class to show the problem I am facing. The problem is ...

0 votes · 0 answers · 59 views

Losing data after successive CUDA kernel launches

I am trying to create an array of double elements where each element is a sum of elements. However, after a new value is added, this value is lost and the vector vet is filled again with zeros as if ...

1 vote · 1 answer · 52 views

Yocto for Nvidia Jetson fails because of GCC 7 - cannot compute suffix of object files

I am trying to use Yocto with meta-tegra (https://github.com/madisongh/meta-tegra) to build a minimal system for the Nvidia Jetson Nano. I need to use CUDA (currently version 10 for the Nano) with ...

-2 votes · 0 answers · 20 views

What is the meaning and use of ROWS_RESULT_STEPS, ROWS_HALO_STEPS, COLS_RESULT_STEPS, COLS_HALO_STEPS?

I'm studying the CUDA separable convolution sample and don't know why ROWS_RESULT_STEPS, ROWS_HALO_STEPS, COLS_RESULT_STEPS, and COLS_HALO_STEPS are used in the program. Is the kernel working for ...

0 votes · 0 answers · 31 views

How to switch CUDA versions after installing two different versions of CUDA?

I already had CUDA V9.0, but now I have installed CUDA V10.0 with cuDNN 7.3. I have upgraded my tensorflow-gpu version to 1.13.1. But when I imported TensorFlow, I got the following error. When I searched the ...

0 votes · 0 answers · 77 views

How to approach implementing a GPU device-side sprintf?

I'm considering implementing sprintf() (and snprintf(), vsprintf(), vsnprintf()) - for use in CUDA code. Compilers' standard C (and C++) library is not available to GPU-side CUDA code - and can't be ...

2 votes · 1 answer · 42 views

How to use only one GPU for tensorflow session?

I have two GPUs. My program uses TensorRT and TensorFlow. When I run only the TensorRT part, it is fine. When I run it together with the TensorFlow part, I get the error [TensorRT] ERROR: engine.cpp (370) - ...

1 vote · 0 answers · 32 views

Nvcc missing when installing cudatoolkit?

I have installed CUDA along with PyTorch using conda install pytorch torchvision cudatoolkit=10.0 -c pytorch. However, it seems that nvcc was not installed along with it. If I want to use, for example, nvcc -...

-1 votes · 0 answers · 69 views

A CUDA kernel for matrix multiplication of sparse gpuarray: sum only when the product exceeds some threshold

I would like to know if it is possible to define a custom version of Matlab's built-in matrix multiplication mtimes() (or a good equivalent in an open-source linear algebra library in C/C++). The goal is: adding ...
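
When writing such a kernel, a plain CPU reference is handy for checking its results. The sketch below assumes the semantics suggested by the title: entry C[i][j] sums A[i][k] * B[k][j] over k, but only for products strictly greater than the threshold. It uses dense row-major storage (an assumption; the question concerns sparse gpuarray data), and thresholded_matmul is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Dense, row-major CPU reference for a "thresholded" product (semantics
// assumed from the question title). A is n x m, B is m x p, C is n x p;
// C[i][j] accumulates A[i][k] * B[k][j] only when that product > threshold.
// thresholded_matmul is a hypothetical helper name.
std::vector<double> thresholded_matmul(const std::vector<double>& a,
                                       const std::vector<double>& b,
                                       std::size_t n, std::size_t m, std::size_t p,
                                       double threshold) {
    std::vector<double> c(n * p, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < p; ++j)
            for (std::size_t k = 0; k < m; ++k) {
                double prod = a[i * m + k] * b[k * p + j];
                if (prod > threshold) c[i * p + j] += prod;  // skip small terms
            }
    return c;
}
```

With A = [[1, 2], [3, 4]], B filled with ones and a threshold of 1.5, the term 1 * 1 is dropped from the first row, giving [[2, 2], [7, 7]] instead of the ordinary product [[3, 3], [7, 7]].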

0 votes · 0 answers · 39 views

Standard way to call compiled CUDA code from Python

I am trying to figure out if there is a standard way to import compiled CUDA code within Python. I've done a bit of searching and it looks like you can import compiled C++ code with cython and ...

0 votes · 0 answers · 68 views

Fastest way to read an image ROI with Thrust

I'm trying to calculate the mean of the ROI of an image using Thrust, but it is too slow (it's way faster on the CPU): { struct MeanTransform { thrust::device_vector<float4>::...

1 vote · 0 answers · 45 views

How to recover from CUDA errors when using cudaLaunchHostFunc instead of cudaStreamAddCallback

The documentation page for cudaStreamAddCallback says that it is "slated for eventual deprecation and removal" and to use cudaLaunchHostFunc instead. However, documentation for cudaLaunchHostFunc says ...

-1 votes · 0 answers · 47 views

Returning an output of variable size from the device to the host

I have a kernel operation that creates an output of unknown size and needs to "send" that result back to the CPU. I'm reluctant to pre-allocate a big enough buffer from the CPU because estimating the size ...
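
One common way to avoid guessing a buffer size (a swapped-in technique, not described in the question) is a two-pass scheme: a first pass only counts how many outputs each item produces, an exclusive scan of those counts gives every item its write offset plus the exact total, and a second pass writes into a buffer of exactly that size. On the GPU the scan could be done with thrust::exclusive_scan and the total copied back with one small cudaMemcpy. The host-only sketch below shows the bookkeeping, with count_outputs and write_outputs as hypothetical stand-ins for the real per-item work.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-item rule standing in for the kernel's real work:
// item v (if positive) emits the values v, v-1, ..., 1, i.e. v outputs.
std::size_t count_outputs(int v) { return v > 0 ? static_cast<std::size_t>(v) : 0; }

void write_outputs(int v, int* out) {
    for (int k = 0; k < v; ++k) out[k] = v - k;
}

// Two-pass pattern: pass 1 counts each item's outputs; offsets[i] is the
// exclusive prefix sum of the counts (item i's write position), and
// offsets.back() is the exact total; pass 2 writes into an exact buffer.
std::vector<int> two_pass_emit(const std::vector<int>& items) {
    std::vector<std::size_t> offsets(items.size() + 1, 0);
    for (std::size_t i = 0; i < items.size(); ++i)
        offsets[i + 1] = offsets[i] + count_outputs(items[i]);
    std::vector<int> out(offsets.back());  // exact size, no over-allocation
    for (std::size_t i = 0; i < items.size(); ++i)
        write_outputs(items[i], out.data() + offsets[i]);
    return out;
}
```

For items {2, 0, 3} the offsets are {0, 2, 2, 5}, so the buffer gets exactly 5 slots and is filled as {2, 1, 3, 2, 1}. The price of this exactness is running the per-item logic twice.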

0 votes · 0 answers · 30 views

Calling the cuBLAS library from device code is slower on CUDA 9.0 than on CUDA 8.0

I called the cuBLAS library from device code. The device is a GTX 1060, and with VS2015 and CUDA 8.0 it worked correctly and quickly. However, when I used VS2015 and CUDA 9.0 on the same computer, it ...

-2 votes · 0 answers · 65 views

Failed to install CUDA 10.0 on Ubuntu 18.04

I followed these instructions to install the CUDA 10.0 driver for use with tensorflow-gpu: `sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb` `sudo apt-key adv --fetch-keys https://developer....

-2 votes · 0 answers · 61 views

How to remove cuda completely from ubuntu?

I have Ubuntu 18.04 and accidentally installed CUDA 9.1 to run tensorflow-gpu, but it seems tensorflow-gpu requires CUDA 10.0, so I want to remove CUDA first by executing: ~$ sudo ...

-2 votes · 0 answers · 61 views

Enabling software preemption in CUDA 10 Nsight Eclipse (Linux) to allow single-GPU debugging

Exactly how do you enable software preemption in CUDA 10 Nsight Eclipse (Fedora Linux)? According to the CUDA-GDB manual, you need to either use the following command: set cuda software_preemption ...

1 vote · 1 answer · 41 views

Efficiency of CUDA program/device

I find some CUDA metrics totally confusing. According to the definition, sm_efficiency is: The percentage of time at least one warp is active on a multiprocessor averaged over all ...

0 votes · 0 answers · 56 views

Does using pointers in CUDA affect performance?

In my CUDA code I am using a class that contains pointers to other objects. How do I get the pointers in my classes to point to objects in the shared memory instead of the global memory? I tried to ...

-1 votes · 0 answers · 49 views

How do I parallelize this function with nested loops?

So I'm doing an assignment for a class where we have to write a CUDA version of a given algorithm, and with my current knowledge it won't work. I've tried, but the result buffer is filled with ...

0 votes · 1 answer · 27 views

nvprof is crashing as it writes a very large file to /tmp/ and runs out of disk space

How do I work around an nvprof crash that occurs when running on a disk with relatively little free space? Specifically, when profiling my CUDA kernel, I use the following two ...

-1 votes · 0 answers · 46 views

Using CUDA 6.5 with Visual Studio 2017 or 2019

Is it possible to use CUDA 6.5 with VS 2017 and/or 2019? I need CUDA 6.5 to code for NVIDIA TK1.

-2 votes · 0 answers · 25 views

What are the implications of installing CUDA locally to use with Tensorflow

I want to set up TensorFlow 2.0 GPU for Python on my computer (emphasis on 2.0). I read the tutorial and understand that I need CUDA-related software. I don't have admin privileges on my computer and ...