Description
With the CUDA architecture and tools, developers are achieving dramatic speedups in fields such as medical imaging and natural resource exploration, and creating breakthrough applications in areas such as image recognition and real-time HD video playback and encoding.
CUDA enables this performance via standard APIs such as OpenCL and DirectCompute, and high-level programming languages such as C/C++, Fortran, Java, Python, and the Microsoft .NET Framework.
CUDA Toolkit:
* C/C++ compiler
* CUDA Visual Profiler
* OpenCL Visual Profiler
* GPU-accelerated BLAS library
* GPU-accelerated FFT library
* Additional tools and documentation
Release Highlights
* GPUDirect(tm) gives third-party devices direct access to CUDA memory
* Support for 16-way concurrency allows up to 16 different kernels to run at the same time on Fermi architecture GPUs
* Runtime / Driver interoperability enables applications to mix and match use of the CUDA Driver API with the CUDA C Runtime and math libraries via buffer sharing and context migration
* New language features added to CUDA C/C++ include:
  o Support for printf() in device code
  o Support for function pointers and recursion, making it easier to port many existing algorithms to Fermi GPUs
* Unified Visual Profiler now supports both CUDA C/C++ and OpenCL, and now includes support for CUDA Driver API tracing
* Math library performance improvements, including:
  o Improved performance of selected transcendental functions from the log, pow, erf, and gamma families
  o Significant improvements in double-precision FFT performance on Fermi-architecture GPUs for 2^n transform sizes
  o Streaming API now supported in CUBLAS for overlapping copy and compute operations
  o CUFFT real-to-complex (R2C) and complex-to-real (C2R) optimizations for 2^n data sizes
  o Improved performance for GEMV and SYMV subroutines in CUBLAS
  o Optimized double-precision implementations of divide and reciprocal routines for the Fermi architecture
* New and updated SDK code samples demonstrating how to use:
  o Function pointers in CUDA C/C++ kernels
  o OpenCL / Direct3D buffer sharing
  o Hidden Markov Model in OpenCL
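
The new CUDA C/C++ language features listed above (printf() in device code, function pointers, and recursion) can be illustrated with a minimal sketch. It is not taken from the SDK samples; the names unary_op, factorial, square, apply_ops, and run_demo are hypothetical and chosen only for illustration. It assumes a Fermi-class or newer GPU (compute capability 2.0 or higher) and a toolkit that accepts the build target you pass to nvcc.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical function-pointer type used only in this sketch.
typedef int (*unary_op)(int);

// Recursion in device code: computes n! for small n.
__device__ int factorial(int n) {
    return (n <= 1) ? 1 : n * factorial(n - 1);
}

__device__ int square(int n) {
    return n * n;
}

// Each of the first two threads picks a device function through a
// function pointer, calls it, stores the result, and prints it.
__global__ void apply_ops(int n, int *out) {
    // The table is built in device code, so it holds device addresses.
    unary_op ops[2] = { factorial, square };
    int i = threadIdx.x;
    if (i < 2) {
        out[i] = ops[i](n);
        printf("thread %d: op(%d) = %d\n", i, n, out[i]);  // device-side printf
    }
}

// Host helper (hypothetical): runs the kernel and copies back two results.
// Returns 0 on success, -1 if a CUDA runtime call fails.
int run_demo(int n, int result[2]) {
    int *d_out = NULL;
    if (cudaMalloc((void **)&d_out, 2 * sizeof(int)) != cudaSuccess) {
        return -1;
    }
    apply_ops<<<1, 2>>>(n, d_out);
    // This blocking copy waits for the kernel to finish before reading d_out.
    cudaError_t err = cudaMemcpy(result, d_out, 2 * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return (err == cudaSuccess) ? 0 : -1;
}

int main() {
    int result[2] = { 0, 0 };
    if (run_demo(5, result) != 0) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("factorial(5) = %d, square(5) = %d\n", result[0], result[1]);
    return 0;
}
```

On a supported GPU the host line should report factorial(5) = 120 and square(5) = 25. The function-pointer table is filled inside the kernel rather than on the host because the address of a __device__ function is only meaningful in device code.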