MatlabGPUDemo1

From FarmShare


Revision as of 22:16, 15 September 2013

Matlab GPU demos

GPU devices are supported in Matlab through the Parallel Computing Toolbox. No special setup is required; Matlab discovers and uses CUDA devices automatically.
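
A quick way to confirm the automatic discovery before running the demos is to query the device from the Matlab prompt. This is a minimal sketch, assuming the Parallel Computing Toolbox is licensed on the node; the property names are the ones shown in the example output further down (R2013a), and the error message text is only illustrative.

```matlab
% Minimal sketch: check that Matlab sees a CUDA device.
n = gpuDeviceCount;      % number of CUDA devices visible to Matlab
if n == 0
    error('No CUDA device found; check that you are logged in to a GPU node.');
end
d = gpuDevice;           % the currently selected device (the default one is selected if none is)
fprintf('%s: %.2f GB total, %.2f GB free\n', ...
        d.Name, d.TotalMemory/1e9, d.FreeMemory/1e9);
```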

Resources:

 Product information: http://www.mathworks.com/products/parallel-computing/index.html
 List of examples: http://www.mathworks.com/products/parallel-computing/examples.html?s_tid=brdcrb
 Matlab functions with GPU support: http://www.mathworks.com/help/distcomp/using-gpuarray.html#bsloua3-1
 Example scripts: http://www.mathworks.com/help/distcomp/examples/index.html#gpu
 GPUBench on the Matlab File Exchange: http://www.mathworks.com/matlabcentral/fileexchange/34080-gpubench

In this example we run one of the MathWorks example scripts, Benchmarking A\b on the GPU, found here: [Benchmarking A\b on the GPU]
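
To give an idea of what such a benchmark measures, here is a minimal sketch of a single CPU-versus-GPU timing of A\b. It is not the MathWorks demo code. The matrix size, the variable names, and the operation count 2/3*N^3 + 3/2*N^2 (a common estimate for an LU-based solve) are assumptions made for illustration.

```matlab
% Minimal sketch of one CPU-vs-GPU timing of A\b (illustrative, not the demo itself).
N = 2048;                                       % illustrative matrix size
A = rand(N, 'single') + 100*eye(N, 'single');   % diagonal shift keeps A well conditioned
b = rand(N, 1, 'single');
flop = 2/3*N^3 + 3/2*N^2;                       % assumed operation count for the solve

tic; x = A\b; tCPU = toc;                       % solve on the CPU

gd = gpuDevice;
gA = gpuArray(A);                               % copy the inputs to the GPU
gb = gpuArray(b);
tic; gx = gA\gb; wait(gd); tGPU = toc;          % wait() so the timer covers the whole GPU computation

fprintf('Gigaflops on CPU: %f\n', flop/tCPU/1e9);
fprintf('Gigaflops on GPU: %f\n', flop/tGPU/1e9);
fprintf('Max difference between results: %g\n', max(abs(x - gather(gx))));
```

Note that the timed region excludes the host-to-GPU copy; whether to include the transfer depends on what you want to compare.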

Matlab commands used below:

paralleldemo_gpu_devices
paralleldemo_gpu_backslash(.75);
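
The argument .75 and the matrix sizes in the output can be related with a little arithmetic. The sketch below assumes (this page does not state it) that the argument is an upper limit in GB on the memory used by one matrix and that the sizes increase in steps of 1024. Under those assumptions it reproduces the 13 single-precision and 9 double-precision sizes reported in the example output. All variable names are hypothetical.

```matlab
% Sketch: which matrix sizes fit under a 0.75 GB cap (assumed meaning of the argument).
maxMemoryGB = 0.75;
step = 1024;                      % assumed step between matrix sizes
bytesPerElement = [4 8];          % single, double
numSizes = zeros(1, 2);
largest  = zeros(1, 2);
for k = 1:2
    % Largest N such that an N-by-N matrix fits in maxMemoryGB
    maxSize = floor(sqrt(maxMemoryGB * 1024^3 / bytesPerElement(k)));
    sizes = step:step:maxSize;
    numSizes(k) = numel(sizes);
    largest(k)  = sizes(end);
end
fprintf('single: %d sizes up to %d\n', numSizes(1), largest(1));
fprintf('double: %d sizes up to %d\n', numSizes(2), largest(2));
```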

Example output

Here we launch Matlab and run paralleldemo_gpu_devices to print the CUDA device that Matlab discovered. Then we run the A\b demo, paralleldemo_gpu_backslash.

bishopj@scorn:~$ ssh rye01
rye01.stanford.edu - Ubuntu 13.04, amd64
8-core Xeon E5620 @ 2.40GHz (FT72-B7015, empty); 47.16GB RAM, 10GB swap
Puppet environment: rec_master; kernel 3.8.0-30-generic (x86_64)
 --*-*- Stanford University Research Computing -*-*--

  _____                    ____  _
 |  ___|_ _ _ __ _ __ ___ / ___|| |__   __ _ _ __ ___
 | |_ / _` | '__| '_ ` _ \\___ \| '_ \ / _` | '__/ _ \
 |  _| (_| | |  | | | | | |___) | | | | (_| | | |  __/
 |_|  \__,_|_|  |_| |_| |_|____/|_| |_|\__,_|_|  \___|


    http://farmshare.stanford.edu

###
##
# new to Ubuntu 13.04 Farmshare?
# follow this link to get started:
# https://www.stanford.edu/group/farmshare/cgi-bin/wiki/index.php/Ubuntu13TransitionGuide
##
###

Last login: Sun Sep 15 22:01:08 2013 from scorn.stanford.edu

your cuda device is:
CUDA_VISIBLE_DEVICES=0
device last used: Sun Sep 15 21:25:34 2013

bishopj@rye01:~$ module load matlab
bishopj@rye01:~$ matlab -nodesktop
Warning: No display specified.  You will not be able to display graphics on the screen.
Warning: No window system found.  Java option 'MWT' ignored.

                                                           < M A T L A B (R) >
                                                 Copyright 1984-2013 The MathWorks, Inc.
                                                   R2013a (8.1.0.604) 64-bit (glnxa64)
                                                            February 15, 2013

No window system found.  Java option 'MWT' ignored.
 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
>> paralleldemo_gpu_devices

numDevices =

     1


origDevice = 

  CUDADevice with properties:

                      Name: 'GeForce GTX 480'
                     Index: 1
         ComputeCapability: '2.0'
            SupportsDouble: 1
             DriverVersion: 5.5000
            ToolkitVersion: 5
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [65535 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 1.6103e+09
                FreeMemory: 1.5101e+09
       MultiprocessorCount: 15
              ClockRateKHz: 1401000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1


device = 

  CUDADevice with properties:

                      Name: 'GeForce GTX 480'
                     Index: 1
         ComputeCapability: '2.0'
            SupportsDouble: 1
             DriverVersion: 5.5000
            ToolkitVersion: 5
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [65535 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 1.6103e+09
                FreeMemory: 1.5101e+09
       MultiprocessorCount: 15
              ClockRateKHz: 1401000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1

>> paralleldemo_gpu_backslash(.75);
Starting benchmarks with 13 different single-precision matrices of sizes
ranging from 1024-by-1024 to 13312-by-13312.
Creating a matrix of size 1024-by-1024.
Gigaflops on CPU: 5.566165
Gigaflops on GPU: 37.670697
Creating a matrix of size 2048-by-2048.
Gigaflops on CPU: 33.638140
Gigaflops on GPU: 143.898457
Creating a matrix of size 3072-by-3072.
Gigaflops on CPU: 40.107724
Gigaflops on GPU: 223.183271
Creating a matrix of size 4096-by-4096.
Gigaflops on CPU: 55.753796
Gigaflops on GPU: 327.146632
Creating a matrix of size 5120-by-5120.
Gigaflops on CPU: 54.888358
Gigaflops on GPU: 292.626007
Creating a matrix of size 6144-by-6144.
Gigaflops on CPU: 72.191110
Gigaflops on GPU: 452.020228
Creating a matrix of size 7168-by-7168.
Gigaflops on CPU: 80.896917
Gigaflops on GPU: 498.172535
Creating a matrix of size 8192-by-8192.
Gigaflops on CPU: 84.840500
Gigaflops on GPU: 506.676184
Creating a matrix of size 9216-by-9216.
Gigaflops on CPU: 68.652257
Gigaflops on GPU: 533.858153
Creating a matrix of size 10240-by-10240.
Gigaflops on CPU: 73.660056
Gigaflops on GPU: 541.269779
Creating a matrix of size 11264-by-11264.
Gigaflops on CPU: 93.310377
Gigaflops on GPU: 560.362334
Creating a matrix of size 12288-by-12288.
Gigaflops on CPU: 89.056557
Gigaflops on GPU: 558.393444
Creating a matrix of size 13312-by-13312.
Gigaflops on CPU: 102.489253
Gigaflops on GPU: 574.326117
Starting benchmarks with 9 different double-precision matrices of sizes
ranging from 1024-by-1024 to 9216-by-9216.
Creating a matrix of size 1024-by-1024.
Gigaflops on CPU: 14.504665
Gigaflops on GPU: 24.855377
Creating a matrix of size 2048-by-2048.
Gigaflops on CPU: 19.376792
Gigaflops on GPU: 74.501813
Creating a matrix of size 3072-by-3072.
Gigaflops on CPU: 29.208044
Gigaflops on GPU: 106.253927
Creating a matrix of size 4096-by-4096.
Gigaflops on CPU: 35.060889
Gigaflops on GPU: 121.734819
Creating a matrix of size 5120-by-5120.
Gigaflops on CPU: 40.079125
Gigaflops on GPU: 133.176539
Creating a matrix of size 6144-by-6144.
Gigaflops on CPU: 43.513209
Gigaflops on GPU: 139.033109
Creating a matrix of size 7168-by-7168.
Gigaflops on CPU: 45.878316
Gigaflops on GPU: 146.538608
Creating a matrix of size 8192-by-8192.
Gigaflops on CPU: 48.424626
Gigaflops on GPU: 147.271608
Creating a matrix of size 9216-by-9216.
Gigaflops on CPU: 45.145666
Gigaflops on GPU: 151.482486
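
To summarize the run above, the GPU-to-CPU ratio at the largest matrix size of each precision can be computed from the printed numbers. This is a small sketch with the values copied by hand from the output; the variable names are hypothetical.

```matlab
% Speedup of the GPU over the CPU at the largest size of each precision,
% using the Gigaflops values printed above.
cpuSingle = 102.489253;  gpuSingle = 574.326117;   % 13312-by-13312, single precision
cpuDouble = 45.145666;   gpuDouble = 151.482486;   % 9216-by-9216, double precision
speedupSingle = gpuSingle / cpuSingle;
speedupDouble = gpuDouble / cpuDouble;
fprintf('single: %.1fx, double: %.1fx\n', speedupSingle, speedupDouble);
```

For this run the ratios come out at roughly 5.6x in single precision and 3.4x in double precision.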