Interesting Talks from GTC 17

A month ago, I traveled to San Jose, CA, to visit the GPU Technology Conference, GTC, and learn the latest on all things GPU.

The schedule was packed, and more than once I missed an interesting talk because I was already sitting in another interesting one.¹

Here’s a list of sessions I found interesting or noteworthy, or want to (re)visit after the conference, sorted by topic.
Links to recordings and slides are provided. Bold indicates that I have not yet seen the talk and want to do so.

I am only posting this today because the materials were private until now.²

Volta, CUDA 9
- S7798: Inside Volta (link, recording, slides)
- S7132: CUDA 9 And Beyond (link, recording, slides)
- S7824: Developer Tools Update In CUDA 9 (link, recording, slides)
- S7622: A Robust And Scalable CUDA Parallel Programming Model [Cooperative Groups] (link, recording, slides)
- Additional read on Parallel Forall blog
  - Inside Volta: The World’s Most Advanced Data Center GPU
  - CUDA 9 Features Revealed: Volta, Cooperative Groups and More
General CUDA, GPU
- S7495: Optimizing Application Performance With CUDA Profiling Tools (link, recording, slides)
- S7122: CUDA Optimization Tips, Tricks And Techniques (link, recording, slides)
- S7445: What The Profiler Is Telling You: Optimizing Whole Application Performance (link, recording, slides)
- S7444: What The Profiler Is Telling You: Optimizing GPU Kernels (link, recording, slides)
GPU Data Management
- S7362: Benchmarking The New Unified Memory Of CUDA 8 (link, recording)
- S7628: The Future Of GPU Data Management (link, recording, slides)
- S7285: Unified Memory On The Latest GPU Architectures (Pascal, Volta) (link, recording, slides)
- S7764: GPUs: Using HMM To Blur The Lines Between CPU And GPU Programming (link, recording, slides)
- S7128: How To Enable NVIDIA CUDA Stream Synchronous Communications Using GPUDirect (link, recording, slides)
- S7700: An Introduction To The GPU Memory Model - Presented By Acceleware (session 2 Of 4) (link, recording)
Libraries, Packages, Tools
- S7150: Accelerating cuBLAS/cuDNN Using Input-aware Auto-tuning: The ISAAC Library (link, recording, slides)
- S7405: Bifrost: A Python/C++ Framework For Easy High-throughput Computing (link, recording, slides)
- S7438: Build Systems: Combining CUDA And Modern CMake (link, recording, slides)
Multi-GPU, MPI
- S7133: Multi-GPU Programming With MPI (link, recording, slides)
- S7142: Multi-GPU Programming Models (link, recording, slides)
- S7356: MVAPICH2-GDR: Pushing The Frontier Of HPC And Deep Learning (link, recording, slides)
- S7546: Multi-GPU Programming With OpenACC (link, recording, slides)
- S7155: Optimized Inter-GPU Collective Operations With NCCL (link, recording, slides)
Other Programming Models (OpenACC, OpenMP, OpenCL, Etc.)
- S7344: Kokkos - The C++ Performance Portability Programming Model (link, recording, slides)
- S7192: OmpSs+OpenACC: Multi-target Task-based Programming Model Exploiting OpenACC GPU Kernels (link, recording, slides)
- S7496: OpenCL At NVIDIA: Best Practices, Learnings, And Plans (link, recording, slides)
- S7626: A Simple Guideline For Code Optimizations On Modern Architectures With OpenACC And CUDA (link, recording, slides)
- S7636: Cache Directive Optimization In OpenACC Programming Model (link, recording, slides)
- Use-Cases
  - S7341: Using OpenACC For NGS Techniques To Create A Portable And Easy-to-use Code Base (link, recording, slides)
  - S7640: Porting C++ Applications To GPUs With OpenACC For Lattice Quantum Chromodynamics (link, recording, slides)
  - S7672: OpenACC Best Practices: Accelerating The C++ NUMECA FINE/Open CFD (link, recording, slides)
  - S7635: Comparison Of OpenACC And OpenMP4.5 Offloading: Speeding Up Simulations Of Stellar Explosions (link, recording, slides)
  - S7478: Using OpenACC To Parallelize Irregular Algorithms On GPUs (link, recording, slides)
  - S7193: Achieving Portable Performance For GTC-P With OpenACC On GPU, Multi-core CPU, And Sunway Many-core Processor (link, recording, slides)
  - S7735: GPU Acceleration Of The HIGRAD Computational Fluid Dynamics Code With Mixed OpenACC And CUDA Fortran (link, recording, slides)
  - S7382: GPUs Unleashed: Analysis Of Petascale Molecular Simulations With VMD (link, recording, slides)
  - S7535: Potential Field Solutions Of The Solar Corona: Converting A PCG Solver From MPI To MPI+OpenACC (link, recording)
AI, Machine Learning, Deep Learning, and Siblings
- S7457: Deep Learning Demystified (link, recording, slides)
- S7515: Eliminating The Regular Expression With Neural Networks (link, recording, slides)
- S7800: Leveraging The Power Of Google’s Cloud Machine Learning Service (presented By Google) (link, slides)
- S7860: Starting A Deep Learning Project (link, recording, slides)
- S7666: Learning Particle Physics By Example: Using Generative Adversarial Networks To Accelerate Physics (link, recording, slides)
- S7804: TensorFlow: Open Source Machine Learning (presented By Google) (link, recording)
Round Tables, Panels
- SE7142: CUDA Developer Tools Round Table (nothing on this :()
- S7564: Accelerator Programming Ecosystems (link, recording, slides)
Use-Cases, Applications
- S7332: Accelerated Astrophysics: Using NVIDIA DGX-1 To Simulate And Understand The Universe (link, recording, slides)
Others
- Python:
  - S7785: Harnessing The Power Of Anaconda For Scalable Data Science (link, recording, slides)
- S7609: Porting After Effects To The GPU (link, recording, slides)
- S7590: Passengers: Awakening VR, When Film Meets VR (link, nothing on this :()
- S7296: Cloudlighting: Merging GPU-based HPC With Cloud Services (link, recording, slides)
- S7329: Open-source Tools For GPU Programming Assignments In Large Classroom Settings (link, recording, slides)
- S7482: Advances In Real-time Graphics At Pixar (link, unfortunately nothing else, even though I thought they said so during the session)
- S7642: Preparing GPU-accelerated Applications For The Summit Supercomputer (link, recording, slides)
Keynote (link)

¹ The pinnacle was the Wednesday 4pm timeslot, when four what's-new-this-year talks happened at the same time. Talk about parallelism.
² The GTC team changed the URLs when they made the talks and slides publicly available, so I had to go through my nearly finished post and redo all the links. I was too lazy to do that manually, so… Python to the rescue! I did it with this Jupyter Notebook (.ipynb file).