<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:media="http://search.yahoo.com/mrss/" xmlns:podcast="https://podcastindex.org/namespace/1.0">
  <channel>
    <atom:link href="https://feeds.simplecast.com/OB5FkIl8" rel="self" title="MP3 Audio" type="application/atom+xml"/>
    <atom:link href="https://simplecast.superfeedr.com" rel="hub" xmlns="http://www.w3.org/2005/Atom"/>
    <generator>https://simplecast.com</generator>
    <title>PyTorch Developer Podcast</title>
    <description>The PyTorch Developer Podcast is a place for the PyTorch dev team to do bite-sized (10-20 min) episodes about all sorts of internal development topics in PyTorch.</description>
    <copyright>2021 - PyTorch Developer Podcast</copyright>
    <language>en</language>
    <pubDate>Sun, 4 Aug 2024 01:47:46 +0000</pubDate>
    <lastBuildDate>Sun, 4 Aug 2024 01:47:56 +0000</lastBuildDate>
    <image>
      <link>https://pytorch-dev-podcast.simplecast.com</link>
      <title>PyTorch Developer Podcast</title>
      <url>https://image.simplecastcdn.com/images/8cefde76-fb46-406a-8d87-ab0df67f3423/92f11400-2dad-49b4-8b14-cce35f5ab765/3000x3000/pytorch-symbol-02-orangeondark.jpg?aid=rss_feed</url>
    </image>
    <link>https://pytorch-dev-podcast.simplecast.com</link>
    <itunes:type>episodic</itunes:type>
    <itunes:summary>The PyTorch Developer Podcast is a place for the PyTorch dev team to do bite-sized (10-20 min) episodes about all sorts of internal development topics in PyTorch.</itunes:summary>
    <itunes:author>Edward Yang, Team PyTorch</itunes:author>
    <itunes:explicit>false</itunes:explicit>
    <itunes:image href="https://image.simplecastcdn.com/images/8cefde76-fb46-406a-8d87-ab0df67f3423/92f11400-2dad-49b4-8b14-cce35f5ab765/3000x3000/pytorch-symbol-02-orangeondark.jpg?aid=rss_feed"/>
    <itunes:new-feed-url>https://feeds.simplecast.com/OB5FkIl8</itunes:new-feed-url>
    <itunes:keywords>deep learning, machine learning, pytorch</itunes:keywords>
    <itunes:owner>
      <itunes:name>PyTorch</itunes:name>
      <itunes:email>wookim@fb.com</itunes:email>
    </itunes:owner>
    <itunes:category text="Technology"/>
    <item>
      <guid isPermaLink="false">6c91edbc-8b45-41f7-8ae4-11cb850f71ac</guid>
      <title>Compiler collectives</title>
      <description><![CDATA[<p>Compiler collectives are a PT2 feature whereby compiler instances across multiple ranks use NCCL collectives to communicate information to other instances. This is used to ensure we consistently decide if inputs are static or dynamic across all ranks. See also the PR at <a href="https://github.com/pytorch/pytorch/pull/130935">https://github.com/pytorch/pytorch/pull/130935</a></p>
]]></description>
      <pubDate>Sun, 4 Aug 2024 01:47:46 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/compiler-collectives-gPI5muII</link>
      <content:encoded><![CDATA[<p>Compiler collectives are a PT2 feature whereby compiler instances across multiple ranks use NCCL collectives to communicate information to other instances. This is used to ensure we consistently decide if inputs are static or dynamic across all ranks. See also the PR at <a href="https://github.com/pytorch/pytorch/pull/130935">https://github.com/pytorch/pytorch/pull/130935</a></p>
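<p>As a hedged sketch, enabling the feature looks something like the following; the config flag name is taken from the PR above and should be treated as an assumption:</p><pre><code>
# Minimal sketch: every rank runs the same torch.compile job; with the flag
# on, compiler instances use collectives to agree on static vs. dynamic inputs.
import torch
import torch._dynamo.config as dynamo_config

dynamo_config.enable_compiler_collectives = True  # assumed flag from PR #130935

@torch.compile
def f(x):
    return x * 2
</code></pre>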
]]></content:encoded>
      <enclosure length="15898472" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/8b926ae2-28cc-4424-8857-a9946fa2581b/audio/f973d03f-dfef-43be-a5af-2eff8592290c/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Compiler collectives</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:16:33</itunes:duration>
      <itunes:summary>Compiler collectives are a PT2 feature whereby compiler instances across multiple ranks use NCCL collectives to communicate information to other instances. This is used to ensure we consistently decide if inputs are static or dynamic across all ranks. See also the PR at https://github.com/pytorch/pytorch/pull/130935</itunes:summary>
      <itunes:subtitle>Compiler collectives are a PT2 feature whereby compiler instances across multiple ranks use NCCL collectives to communicate information to other instances. This is used to ensure we consistently decide if inputs are static or dynamic across all ranks. See also the PR at https://github.com/pytorch/pytorch/pull/130935</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>83</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">044797e7-4cd3-417a-8d37-0a8e5f4c1da5</guid>
      <title>TORCH_TRACE and tlparse</title>
      <description><![CDATA[TORCH_TRACE and tlparse are a structured log and log parser for PyTorch 2. They give useful information about what code was compiled and what the intermediate build products look like.
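A hedged sketch of the workflow (the tlparse CLI usage is an assumption based on its README):
<pre><code>
# Capture a structured trace, then render it with tlparse.
import os
os.environ["TORCH_TRACE"] = "/tmp/trace"  # set before torch is imported

import torch

@torch.compile
def f(x):
    return x + 1

f(torch.randn(4))
# Afterwards, from a shell: pip install tlparse; tlparse /tmp/trace
</code></pre>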
]]></description>
      <pubDate>Mon, 29 Apr 2024 00:01:28 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/torch-trace-and-tlparse-fH21m_7y</link>
      <enclosure length="14849430" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/3cf7caa8-d90d-4dd6-b519-06795ad301e4/audio/18ec52d7-7243-4ad2-b6cc-94c5a154eef3/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>TORCH_TRACE and tlparse</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:15:28</itunes:duration>
      <itunes:summary>TORCH_TRACE and tlparse are a structured log and log parser for PyTorch 2. They give useful information about what code was compiled and what the intermediate build products look like.</itunes:summary>
      <itunes:subtitle>TORCH_TRACE and tlparse are a structured log and log parser for PyTorch 2. They give useful information about what code was compiled and what the intermediate build products look like.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>82</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">e4ec9c7f-1667-4771-beb8-05e494948b63</guid>
      <title>Higher order operators</title>
      <description><![CDATA[Higher order operators are a special form of operators in torch.ops which have relaxed input argument requirements: in particular, they can accept any form of argument, including Python callables. Their name is based off of their most common use case, which is to represent higher order functions like control flow operators. However, they are also used to implement other variants of basic operators and can also be used to smuggle in Python data that is quite unusual. They are implemented using a Python dispatcher.
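For example, torch.cond is a higher order operator that accepts Python callables; a minimal sketch, assuming a recent PyTorch where torch.cond is exposed:
<pre><code>
import torch

def true_fn(x):
    return x.sin()

def false_fn(x):
    return x.cos()

@torch.compile
def f(pred, x):
    # Ordinary torch.ops operators cannot take callables; torch.cond can.
    return torch.cond(pred, true_fn, false_fn, (x,))

print(f(torch.tensor(True), torch.randn(4)))
</code></pre>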
]]></description>
      <pubDate>Sun, 21 Apr 2024 19:28:57 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/higher-order-operators-nMWmslFS</link>
      <enclosure length="16481978" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/066fb3b0-3207-43d0-a642-63c656dc7efd/audio/1b0adbee-2a95-47f9-b4fa-789c9bb92cca/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Higher order operators</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:17:10</itunes:duration>
      <itunes:summary>Higher order operators are a special form of operators in torch.ops which have relaxed input argument requirements: in particular, they can accept any form of argument, including Python callables. Their name is based off of their most common use case, which is to represent higher order functions like control flow operators. However, they are also used to implement other variants of basic operators and can also be used to smuggle in Python data that is quite unusual. They are implemented using a Python dispatcher.</itunes:summary>
      <itunes:subtitle>Higher order operators are a special form of operators in torch.ops which have relaxed input argument requirements: in particular, they can accept any form of argument, including Python callables. Their name is based off of their most common use case, which is to represent higher order functions like control flow operators. However, they are also used to implement other variants of basic operators and can also be used to smuggle in Python data that is quite unusual. They are implemented using a Python dispatcher.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>81</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">5f5f4bce-3612-44b6-812c-900495f146b2</guid>
      <title>Inductor - Post-grad FX passes</title>
      <description><![CDATA[The post-grad FX passes in Inductor run after AOTAutograd has functionalized and normalized the input program into separate forward/backward graphs. As such, they generally can assume that the graph in question is functionalized, except for some mutations to inputs at the end of the graph. At the end of post-grad passes, there are special passes that reintroduce mutation into the graph before going into the rest of Inductor lowering, which is generally aware of mutation. The post-grad FX passes are varied but are typically domain-specific passes making local changes to specific parts of the graph.
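A minimal sketch of hooking in a custom post-grad pass; the config hook name is an assumption:
<pre><code>
import torch
import torch._inductor.config as inductor_config

def my_post_grad_pass(graph: torch.fx.Graph) -> None:
    # A domain-specific pass would make local rewrites here; at this point
    # the graph contains functionalized ATen ops.
    for node in graph.nodes:
        pass  # inspect node.target, node.args, ...

inductor_config.post_grad_custom_post_pass = my_post_grad_pass  # assumed hook
</code></pre>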
]]></description>
      <pubDate>Fri, 12 Apr 2024 07:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/inductor-post-grad-fx-passes-9FGImlU5</link>
      <enclosure length="23160556" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/3a53b264-3648-4d33-9620-d013c0d14479/audio/8e191f04-9675-4809-bf1b-191f57d9a2fc/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Inductor - Post-grad FX passes</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:24:07</itunes:duration>
      <itunes:summary>The post-grad FX passes in Inductor run after AOTAutograd has functionalized and normalized the input program into separate forward/backward graphs. As such, they generally can assume that the graph in question is functionalized, except for some mutations to inputs at the end of the graph. At the end of post-grad passes, there are special passes that reintroduce mutation into the graph before going into the rest of Inductor lowering, which is generally aware of mutation. The post-grad FX passes are varied but are typically domain-specific passes making local changes to specific parts of the graph.</itunes:summary>
      <itunes:subtitle>The post-grad FX passes in Inductor run after AOTAutograd has functionalized and normalized the input program into separate forward/backward graphs. As such, they generally can assume that the graph in question is functionalized, except for some mutations to inputs at the end of the graph. At the end of post-grad passes, there are special passes that reintroduce mutation into the graph before going into the rest of Inductor lowering, which is generally aware of mutation. The post-grad FX passes are varied but are typically domain-specific passes making local changes to specific parts of the graph.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>80</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">5b01e402-e278-44c3-80bf-58f2472d7cbb</guid>
      <title>CUDA graph trees</title>
      <description><![CDATA[CUDA graph trees are the internal implementation of CUDA graphs used in PT2 when you say mode="reduce-overhead". Their primary innovation is that they allow the reuse of memory across multiple CUDA graphs, as long as they form a tree structure of potential paths you can go down with the CUDA graph. This greatly reduced the memory usage of CUDA graphs in PT2. There are some operational implications to using CUDA graphs which are described in the podcast.
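Using them is just a mode flag on torch.compile; one operational implication is sketched (hedged) in the comments:
<pre><code>
import torch

@torch.compile(mode="reduce-overhead")  # opts into CUDA graph trees
def step(x):
    return (x @ x).relu()

x = torch.randn(64, 64, device="cuda")
out = step(x)
# Outputs live in CUDA-graph-owned memory pools; if you need to hold a result
# across iterations, clone it out of the pool (hedged operational guidance).
saved = out.clone()
</code></pre>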
]]></description>
      <pubDate>Sun, 24 Mar 2024 07:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/cuda-graph-trees-R6rtIpa4</link>
      <enclosure length="20008712" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/0ae0917b-1da6-4845-9b8d-dc96abc4ba23/audio/ca808b2a-0deb-4748-b82a-91370018d0f9/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>CUDA graph trees</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:20:50</itunes:duration>
      <itunes:summary>CUDA graph trees are the internal implementation of CUDA graphs used in PT2 when you say mode=&quot;reduce-overhead&quot;. Their primary innovation is that they allow the reuse of memory across multiple CUDA graphs, as long as they form a tree structure of potential paths you can go down with the CUDA graph. This greatly reduced the memory usage of CUDA graphs in PT2. There are some operational implications to using CUDA graphs which are described in the podcast.</itunes:summary>
      <itunes:subtitle>CUDA graph trees are the internal implementation of CUDA graphs used in PT2 when you say mode=&quot;reduce-overhead&quot;. Their primary innovation is that they allow the reuse of memory across multiple CUDA graphs, as long as they form a tree structure of potential paths you can go down with the CUDA graph. This greatly reduced the memory usage of CUDA graphs in PT2. There are some operational implications to using CUDA graphs which are described in the podcast.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>79</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">977883d0-be71-4012-acc4-33adc805690c</guid>
      <title>Min-cut partitioner</title>
      <description><![CDATA[The min-cut partitioner makes decisions about what to save for backwards when splitting the forward and backwards graph from the joint graph traced by AOTAutograd. Crucially, it doesn't actually do a "split"; instead, it is deciding how much of the joint graph should be used for backwards. I also talk about the backward retracing problem.
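A minimal sketch of where the partitioner plugs in, using the functorch.compile API (paths assumed stable):
<pre><code>
import torch
from functorch.compile import aot_function, min_cut_rematerialization_partition, nop

def f(x):
    return x.sin().cos()

# The partition_fn decides, on the joint graph, what to save for backwards.
compiled = aot_function(
    f, fw_compiler=nop, bw_compiler=nop,
    partition_fn=min_cut_rematerialization_partition,
)
out = compiled(torch.randn(4, requires_grad=True))
out.sum().backward()
</code></pre>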
]]></description>
      <pubDate>Sun, 17 Mar 2024 07:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/min-cut-partitioner-MvcSsUpR</link>
      <enclosure length="15297061" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/d2733990-9661-4f01-b618-73278e4757fc/audio/48c3ded8-65dd-4ca3-b42a-844618409cc3/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Min-cut partitioner</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:15:56</itunes:duration>
      <itunes:summary>The min-cut partitioner makes decisions about what to save for backwards when splitting the forward and backwards graph from the joint graph traced by AOTAutograd. Crucially, it doesn&apos;t actually do a &quot;split&quot;; instead, it is deciding how much of the joint graph should be used for backwards. I also talk about the backward retracing problem.</itunes:summary>
      <itunes:subtitle>The min-cut partitioner makes decisions about what to save for backwards when splitting the forward and backwards graph from the joint graph traced by AOTAutograd. Crucially, it doesn&apos;t actually do a &quot;split&quot;; instead, it is deciding how much of the joint graph should be used for backwards. I also talk about the backward retracing problem.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>78</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">d36f4075-6f41-4ff2-b141-d9833da1682c</guid>
      <title>AOTInductor</title>
      <description><![CDATA[AOTInductor is a feature in PyTorch that lets you export an inference model into a self-contained dynamic library, which can subsequently be loaded and used to run optimized inference. It is aimed primarily at CUDA and CPU inference applications, for situations where your model only needs to be exported once while your runtime may still get continuous updates. One of the big underlying organizing principles is a limited ABI which does not include libtorch, which allows these libraries to stay stable over updates to the runtime. There are many export-like use cases you might be interested in using AOTInductor for, and some of the pieces should be useful, but AOTInductor does not necessarily solve them.
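A hedged sketch of the export-then-load flow; these entry points have moved across releases, so treat the API names as assumptions:
<pre><code>
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x.relu() + 1

with torch.no_grad():
    # Compile the model into a self-contained shared library (assumed API).
    so_path = torch._export.aot_compile(M(), (torch.randn(8),))

# Load and run the library without the original Python model (assumed API).
runner = torch._export.aot_load(so_path, device="cpu")
print(runner(torch.randn(8)))
</code></pre>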
]]></description>
      <pubDate>Sat, 2 Mar 2024 08:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/aotinductor-dpJGWW20</link>
      <enclosure length="16806721" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/af8dcbdd-28a4-473f-85a6-67baeebccd3a/audio/22ea17ef-f81c-4dc2-b8ca-4dde80b4214a/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>AOTInductor</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:17:30</itunes:duration>
      <itunes:summary>AOTInductor is a feature in PyTorch that lets you export an inference model into a self-contained dynamic library, which can subsequently be loaded and used to run optimized inference. It is aimed primarily at CUDA and CPU inference applications, for situations where your model only needs to be exported once while your runtime may still get continuous updates. One of the big underlying organizing principles is a limited ABI which does not include libtorch, which allows these libraries to stay stable over updates to the runtime. There are many export-like use cases you might be interested in using AOTInductor for, and some of the pieces should be useful, but AOTInductor does not necessarily solve them.</itunes:summary>
      <itunes:subtitle>AOTInductor is a feature in PyTorch that lets you export an inference model into a self-contained dynamic library, which can subsequently be loaded and used to run optimized inference. It is aimed primarily at CUDA and CPU inference applications, for situations where your model only needs to be exported once while your runtime may still get continuous updates. One of the big underlying organizing principles is a limited ABI which does not include libtorch, which allows these libraries to stay stable over updates to the runtime. There are many export-like use cases you might be interested in using AOTInductor for, and some of the pieces should be useful, but AOTInductor does not necessarily solve them.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>77</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">97b948a2-cb3b-4d25-bde5-539e3d6a3bce</guid>
      <title>Tensor subclasses and PT2</title>
      <description><![CDATA[Tensor subclasses allow you to extend PyTorch with new types of tensors without having to write any C++. They have been used to implement DTensor, FP8, Nested Jagged Tensor and Complex Tensor. Recent work by Brian Hirsh means that we can compile tensor subclasses in PT2, eliminating their overhead. The basic mechanism by which this compilation works is a desugaring process in AOTAutograd. There are some complications involving views, dynamic shapes and tangent metadata mismatch.
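A minimal sketch of a wrapper tensor subclass (a toy; real subclasses like DTensor also implement the flatten/unflatten protocol so AOTAutograd can desugar them):
<pre><code>
import torch
from torch.utils._pytree import tree_map

class LoggingTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, elem):
        r = torch.Tensor._make_wrapper_subclass(
            cls, elem.size(), strides=elem.stride(), dtype=elem.dtype,
            device=elem.device, requires_grad=elem.requires_grad)
        r.elem = elem
        return r

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        def unwrap(t):
            return t.elem if isinstance(t, LoggingTensor) else t
        def wrap(t):
            return LoggingTensor(t) if isinstance(t, torch.Tensor) else t
        print(f"op: {func}")  # every aten op flows through here
        out = func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))
        return tree_map(wrap, out)

x = LoggingTensor(torch.randn(3))
y = x + x  # prints op: aten.add.Tensor
</code></pre>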
]]></description>
      <pubDate>Sat, 24 Feb 2024 08:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/tensor-subclasses-and-pt2-kxKGk1jm</link>
      <enclosure length="12882934" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/78eea40f-85a6-4c09-a252-c455b1b9a724/audio/c4ec1323-dd41-412f-bbd6-c11d9ffc5c02/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Tensor subclasses and PT2</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:13:25</itunes:duration>
      <itunes:summary>Tensor subclasses allow you to extend PyTorch with new types of tensors without having to write any C++. They have been used to implement DTensor, FP8, Nested Jagged Tensor and Complex Tensor. Recent work by Brian Hirsh means that we can compile tensor subclasses in PT2, eliminating their overhead. The basic mechanism by which this compilation works is a desugaring process in AOTAutograd. There are some complications involving views, dynamic shapes and tangent metadata mismatch.</itunes:summary>
      <itunes:subtitle>Tensor subclasses allow you to extend PyTorch with new types of tensors without having to write any C++. They have been used to implement DTensor, FP8, Nested Jagged Tensor and Complex Tensor. Recent work by Brian Hirsh means that we can compile tensor subclasses in PT2, eliminating their overhead. The basic mechanism by which this compilation works is a desugaring process in AOTAutograd. There are some complications involving views, dynamic shapes and tangent metadata mismatch.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>76</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">c7782638-f46a-48d7-987c-d41474d6026d</guid>
      <title>Compiled autograd</title>
      <description><![CDATA[Compiled autograd is an extension to PT2 that permits compiling the entirety of a backward() call in PyTorch. This allows us to fuse accumulate grad nodes as well as trace through arbitrarily complicated Python backward hooks. Compiled autograd is an important part of our plans for compiled DDP/FSDP as well as for whole-graph compilation.
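A hedged sketch of turning it on; the location of the enable() context manager is an assumption:
<pre><code>
import torch
import torch._dynamo.compiled_autograd as compiled_autograd

def compiler_fn(gm):
    # Receives the FX graph of the entire backward; compile it with PT2.
    return torch.compile(gm)

x = torch.randn(4, requires_grad=True)
x.register_hook(lambda g: g * 2)  # Python backward hooks get traced too
loss = (x.sin() * 3).sum()
with compiled_autograd.enable(compiler_fn):
    loss.backward()
</code></pre>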
]]></description>
      <pubDate>Mon, 19 Feb 2024 08:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/compiled-autograd-TCcEyBRZ</link>
      <enclosure length="17396468" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/34210e6b-b4a1-4b02-aa71-d689d4859a60/audio/1cb1312b-5114-4998-982b-2ff178b2340e/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Compiled autograd</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:18:07</itunes:duration>
      <itunes:summary>Compiled autograd is an extension to PT2 that permits compiling the entirety of a backward() call in PyTorch. This allows us to fuse accumulate grad nodes as well as trace through arbitrarily complicated Python backward hooks. Compiled autograd is an important part of our plans for compiled DDP/FSDP as well as for whole-graph compilation.</itunes:summary>
      <itunes:subtitle>Compiled autograd is an extension to PT2 that permits compiling the entirety of a backward() call in PyTorch. This allows us to fuse accumulate grad nodes as well as trace through arbitrarily complicated Python backward hooks. Compiled autograd is an important part of our plans for compiled DDP/FSDP as well as for whole-graph compilation.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>75</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">58e7bdcb-875c-4eca-a452-579c90d253c4</guid>
      <title>PT2 extension points</title>
      <description><![CDATA[We discuss some extension points for customizing PT2 behavior across Dynamo, AOTAutograd and Inductor.
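One concrete extension point, as a minimal example: a custom Dynamo backend that receives the captured FX graph.
<pre><code>
import torch
from typing import List

def my_backend(gm: torch.fx.GraphModule, example_inputs: List[torch.Tensor]):
    gm.graph.print_tabular()  # inspect what Dynamo captured
    return gm.forward         # return any callable to execute the graph

@torch.compile(backend=my_backend)
def f(x):
    return x.sin() + x.cos()

f(torch.randn(4))
</code></pre>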
]]></description>
      <pubDate>Mon, 5 Feb 2024 09:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/pt2-extension-points-4nxVX5Yr</link>
      <enclosure length="15268511" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/ce84f21c-9199-47f7-a2ee-f78e440692f8/audio/60f894cb-d8f8-49ae-b8df-a5951a840c32/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>PT2 extension points</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:15:54</itunes:duration>
      <itunes:summary>We discuss some extension points for customizing PT2 behavior across Dynamo, AOTAutograd and Inductor.</itunes:summary>
      <itunes:subtitle>We discuss some extension points for customizing PT2 behavior across Dynamo, AOTAutograd and Inductor.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>74</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">3cd9ee37-2b7a-46db-a716-435bd1b4632b</guid>
      <title>Inductor - Define-by-run IR</title>
      <description><![CDATA[Define-by-run IR is how Inductor defines the internal compute of a pointwise/reduction operation. It is characterized by a function that calls a number of functions in the 'ops' namespace, where these ops can be overridden by different handlers depending on what kind of semantic analysis you need to do. The ops Inductor supports include regular arithmetic operators, but also memory load/store, indirect indexing, masking and collective operations like reductions.
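A toy sketch of the idea (illustrative names, not Inductor's actual internals): the inner function issues calls on an 'ops' handler, and swapping handlers changes the analysis performed.
<pre><code>
class StringHandler:
    # One possible handler: pretty-print the compute instead of running it.
    def __getattr__(self, name):
        def op(*args):
            return f"{name}({', '.join(map(str, args))})"
        return op

def inner_fn(ops, index):
    a = ops.load("arg0", index)
    b = ops.load("arg1", index)
    return ops.maximum(ops.add(a, b), ops.constant(0.0, "float32"))

print(inner_fn(StringHandler(), "i0"))
# A different handler could evaluate, vectorize, or codegen the same calls.
</code></pre>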
]]></description>
      <pubDate>Wed, 24 Jan 2024 08:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/inductor-define-by-run-ir-HvmBA7Yi</link>
      <enclosure length="11625578" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/cad40045-6886-4458-bd7d-5fd7e547ce4b/audio/c8cee170-d6a2-4653-b8e9-3066110c56ea/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Inductor - Define-by-run IR</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:12:06</itunes:duration>
      <itunes:summary>Define-by-run IR is how Inductor defines the internal compute of a pointwise/reduction operation. It is characterized by a function that calls a number of functions in the &apos;ops&apos; namespace, where these ops can be overridden by different handlers depending on what kind of semantic analysis you need to do. The ops Inductor supports include regular arithmetic operators, but also memory load/store, indirect indexing, masking and collective operations like reductions.</itunes:summary>
      <itunes:subtitle>Define-by-run IR is how Inductor defines the internal compute of a pointwise/reduction operation. It is characterized by a function that calls a number of functions in the &apos;ops&apos; namespace, where these ops can be overridden by different handlers depending on what kind of semantic analysis you need to do. The ops Inductor supports include regular arithmetic operators, but also memory load/store, indirect indexing, masking and collective operations like reductions.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>73</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">7983d4f7-0530-4cc4-9021-b8a32d1f15fb</guid>
      <title>Unsigned integers</title>
      <description><![CDATA[Traditionally, unsigned integer support in PyTorch was not great; we only supported uint8. Recently, we added support for uint16, uint32 and uint64. Bare-bones functionality works, but I'm entreating the community to help us build out the rest. In particular, for most operations, we plan to use PT2 to build anything else. But if you have an eager kernel you really need, send us a PR and we'll put it in. While most of the implementation was straightforward, there are some weirdnesses related to type promotion inconsistencies with numpy and dealing with the upper range of uint64. There is also upcoming support for sub-byte dtypes uint1-7, and these will exclusively be implemented via PT2.
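A quick taste; exact eager op coverage varies by version, so treat this as a hedged sketch:
<pre><code>
import torch

x = torch.tensor([1, 2, 3], dtype=torch.uint16)
print(x + x)  # bare-bones eager arithmetic works

@torch.compile
def f(t):
    return t * 2  # PT2 can generate kernels where eager coverage is missing

print(f(x))
</code></pre>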
]]></description>
      <pubDate>Wed, 17 Jan 2024 14:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/unsigned-integers-doHhgC7m</link>
      <enclosure length="12594408" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/101dbfe4-d0cf-4a2d-9db0-3d47d09be563/audio/20bc062d-cd58-4997-b067-d317153dcefd/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Unsigned integers</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:13:07</itunes:duration>
      <itunes:summary>Traditionally, unsigned integer support in PyTorch was not great; we only supported uint8. Recently, we added support for uint16, uint32 and uint64. Bare-bones functionality works, but I&apos;m entreating the community to help us build out the rest. In particular, for most operations, we plan to use PT2 to build anything else. But if you have an eager kernel you really need, send us a PR and we&apos;ll put it in. While most of the implementation was straightforward, there are some weirdnesses related to type promotion inconsistencies with numpy and dealing with the upper range of uint64. There is also upcoming support for sub-byte dtypes uint1-7, and these will exclusively be implemented via PT2.</itunes:summary>
      <itunes:subtitle>Traditionally, unsigned integer support in PyTorch was not great; we only supported uint8. Recently, we added support for uint16, uint32 and uint64. Bare-bones functionality works, but I&apos;m entreating the community to help us build out the rest. In particular, for most operations, we plan to use PT2 to build anything else. But if you have an eager kernel you really need, send us a PR and we&apos;ll put it in. While most of the implementation was straightforward, there are some weirdnesses related to type promotion inconsistencies with numpy and dealing with the upper range of uint64. There is also upcoming support for sub-byte dtypes uint1-7, and these will exclusively be implemented via PT2.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>72</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">1ceecf2e-adcf-406d-b041-a5549a43d04b</guid>
      <title>Inductor - IR</title>
      <description><![CDATA[<p>Inductor IR is an intermediate representation that lives between ATen FX graphs and the final Triton code generated by Inductor. It was designed to faithfully represent PyTorch semantics and accordingly models views, mutation and striding. When you write a lowering from ATen operators to Inductor IR, you get a TensorBox for each Tensor argument which contains a reference to the underlying IR (via StorageBox, and then a Buffer/ComputedBuffer) that says how the Tensor was computed. The inner computation is represented via define-by-run, which allows for a compact definition of the IR while still allowing you to extract an FX graph out if you desire. Scheduling then takes buffers of Inductor IR and decides what can be fused. Inductor IR may have too many nodes; this would be a good thing to refactor in the future.</p>
]]></description>
      <pubDate>Tue, 16 Jan 2024 09:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/inductor-ir-khd8zpaZ</link>
      <content:encoded><![CDATA[<p>Inductor IR is an intermediate representation that lives between ATen FX graphs and the final Triton code generated by Inductor. It was designed to faithfully represent PyTorch semantics and accordingly models views, mutation and striding. When you write a lowering from ATen operators to Inductor IR, you get a TensorBox for each Tensor argument which contains a reference to the underlying IR (via StorageBox, and then a Buffer/ComputedBuffer) that says how the Tensor was computed. The inner computation is represented via define-by-run, which allows for a compact definition of the IR while still allowing you to extract an FX graph out if you desire. Scheduling then takes buffers of Inductor IR and decides what can be fused. Inductor IR may have too many nodes; this would be a good thing to refactor in the future.</p>
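<p>One hedged way to inspect this IR yourself is Inductor's debug dump; the environment variable exists in recent releases, but treat the exact output file names as assumptions:</p><pre><code>
import os
os.environ["TORCH_COMPILE_DEBUG"] = "1"  # writes a torch_compile_debug/ dir

import torch

@torch.compile
def f(x):
    return (x + 1).relu()

f(torch.randn(8))
# Look for ir_pre_fusion.txt / ir_post_fusion.txt in the debug directory.
</code></pre>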
]]></content:encoded>
      <enclosure length="17290701" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/a9136018-0c52-4c8e-b0f5-9a50be3cad70/audio/2741bbc9-dfa8-4e33-ae20-33964b63a986/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Inductor - IR</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:18:00</itunes:duration>
      <itunes:summary>Inductor IR is an intermediate representation that lives between ATen FX graphs and the final Triton code generated by Inductor. It was designed to faithfully represent PyTorch semantics and accordingly models views, mutation and striding. When you write a lowering from ATen operators to Inductor IR, you get a TensorBox for each Tensor argument which contains a reference to the underlying IR (via StorageBox, and then a Buffer/ComputedBuffer) that says how the Tensor was computed. The inner computation is represented via define-by-run, which allows for a compact definition of the IR while still allowing you to extract an FX graph out if you desire. Scheduling then takes buffers of Inductor IR and decides what can be fused. Inductor IR may have too many nodes; this would be a good thing to refactor in the future.</itunes:summary>
      <itunes:subtitle>Inductor IR is an intermediate representation that lives between ATen FX graphs and the final Triton code generated by Inductor. It was designed to faithfully represent PyTorch semantics and accordingly models views, mutation and striding. When you write a lowering from ATen operators to Inductor IR, you get a TensorBox for each Tensor argument which contains a reference to the underlying IR (via StorageBox, and then a Buffer/ComputedBuffer) that says how the Tensor was computed. The inner computation is represented via define-by-run, which allows for a compact definition of the IR while still allowing you to extract an FX graph out if you desire. Scheduling then takes buffers of Inductor IR and decides what can be fused. Inductor IR may have too many nodes; this would be a good thing to refactor in the future.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>71</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">c5405c1b-952b-4e02-8500-9041593e849f</guid>
      <title>Dynamo - VariableTracker</title>
      <description><![CDATA[<p>I talk about VariableTracker in Dynamo. VariableTracker is Dynamo's symbolic representation of Python values. I talk about some recent changes, namely eager guards and mutable VT. I also tell you how to find the functionality you care about in VariableTracker (<a href="https://docs.google.com/document/d/1XDPNK3iNNShg07jRXDOrMk2V_i66u1hEbPltcsxE-3E/edit#heading=h.i6v7gqw5byv6">https://docs.google.com/document/d/1XDPNK3iNNShg07jRXDOrMk2V_i66u1hEbPltcsxE-3E/edit#heading=h.i6v7gqw5byv6</a>).</p>
]]></description>
      <pubDate>Fri, 12 Jan 2024 17:40:41 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/dynamo-variabletracker-Pl1oIjfj</link>
      <content:encoded><![CDATA[<p>I talk about VariableTracker in Dynamo. VariableTracker is Dynamo's symbolic representation of Python values. I talk about some recent changes, namely eager guards and mutable VT. I also tell you how to find the functionality you care about in VariableTracker (<a href="https://docs.google.com/document/d/1XDPNK3iNNShg07jRXDOrMk2V_i66u1hEbPltcsxE-3E/edit#heading=h.i6v7gqw5byv6">https://docs.google.com/document/d/1XDPNK3iNNShg07jRXDOrMk2V_i66u1hEbPltcsxE-3E/edit#heading=h.i6v7gqw5byv6</a>).</p>
]]></content:encoded>
      <enclosure length="15282019" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/0210853a-9259-4159-9eab-1efab3caadcf/audio/f9c73ef7-037d-44bb-89cf-7b56c6f4fced/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Dynamo - VariableTracker</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:15:55</itunes:duration>
      <itunes:summary>I talk about VariableTracker in Dynamo. VariableTracker is Dynamo&apos;s symbolic representation of Python values. I talk about some recent changes, namely eager guards and mutable VT. I also tell you how to find the functionality you care about in VariableTracker (https://docs.google.com/document/d/1XDPNK3iNNShg07jRXDOrMk2V_i66u1hEbPltcsxE-3E/edit#heading=h.i6v7gqw5byv6).</itunes:summary>
      <itunes:subtitle>I talk about VariableTracker in Dynamo. VariableTracker is Dynamo&apos;s symbolic representation of Python values. I talk about some recent changes, namely eager guards and mutable VT. I also tell you how to find the functionality you care about in VariableTracker (https://docs.google.com/document/d/1XDPNK3iNNShg07jRXDOrMk2V_i66u1hEbPltcsxE-3E/edit#heading=h.i6v7gqw5byv6).</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>70</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">a1e2b2b6-bb47-46c6-b3ed-743d246cb82b</guid>
      <title>Unbacked SymInts</title>
      <description><![CDATA[<p>This podcast goes over the basics of unbacked SymInts. You might want to listen to this one before listening to https://pytorch-dev-podcast.simplecast.com/episodes/zero-one-specialization Some questions we answer (h/t from Gregory Chanan):</p><p> </p><p>- Are unbacked symints only for export?  Because otherwise I could just break / wait for the actual size.  But maybe I can save some retracing / graph breaks perf if I have them too?  So the correct statement is "primarily" for export?</p><p>- Why am I looking into the broadcasting code at all?  Naively, I would expect the export graph to be just a list of ATen ops strung together.  Why do I recurse that far down?  Why can't I annotate DONT_TRACE_ME_BRO?</p><p>- How does 0/1 specialization fit into this?  I understand we may want to 0/1 specialize in a dynamic shape regime in "eager" mode (is there a better term?), but that doesn't seem to matter for export?</p><p>- So far we've mainly been talking about how to handle our own library code.  There is a worry about pushing complicated constraints downstream, similar to torchscript.  What constraints does this actually push?</p>
]]></description>
      <pubDate>Tue, 21 Feb 2023 08:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/unbacked-symints-oyqu0_P6</link>
      <content:encoded><![CDATA[<p>This podcast goes over the basics of unbacked SymInts. You might want to listen to this one before listening to https://pytorch-dev-podcast.simplecast.com/episodes/zero-one-specialization Some questions we answer (h/t from Gregory Chanan):</p><p> </p><p>- Are unbacked symints only for export?  Because otherwise I could just break / wait for the actual size.  But maybe I can save some retracing / graph breaks perf if I have them too?  So the correct statement is "primarily" for export?</p><p>- Why am I looking into the broadcasting code at all?  Naively, I would expect the export graph to be just a list of ATen ops strung together.  Why do I recurse that far down?  Why can't I annotate DONT_TRACE_ME_BRO?</p><p>- How does 0/1 specialization fit into this?  I understand we may want to 0/1 specialize in a dynamic shape regime in "eager" mode (is there a better term?), but that doesn't seem to matter for export?</p><p>- So far we've mainly been talking about how to handle our own library code.  There is a worry about pushing complicated constraints downstream, similar to torchscript.  What constraints does this actually push?</p>
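<p>A minimal sketch of where an unbacked SymInt shows up; capture_scalar_outputs and torch._check_is_size are real APIs, but treat the exact behavior as version-dependent:</p><pre><code>
import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(x):
    n = x.sum().to(torch.int64).item()  # data-dependent: no hint backs n
    torch._check_is_size(n)             # runtime assert instead of a guard
    return torch.zeros(n)               # n is an unbacked SymInt while tracing

print(f(torch.ones(5)))
</code></pre>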
]]></content:encoded>
      <enclosure length="20666161" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/48fa564f-76b4-4aa9-825f-ecf4a3611339/audio/9e2a7fda-1bfe-47e0-b845-7c423aa27b13/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Unbacked SymInts</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:21:31</itunes:duration>
      <itunes:summary>This podcast goes over the basics of unbacked SymInts. You might want to listen to this one before listening to https://pytorch-dev-podcast.simplecast.com/episodes/zero-one-specialization Some questions we answer (h/t from Gregory Chanan):

- Are unbacked symints only for export?  Because otherwise I could just break / wait for the actual size.  But maybe I can save some retracing / graph breaks perf if I have them too?  So the correct statement is &quot;primarily&quot; for export?
- Why am I looking into the broadcasting code at all?  Naively, I would expect the export graph to be just a list of ATen ops strung together.  Why do I recurse that far down?  Why can&apos;t I annotate DONT_TRACE_ME_BRO?
- How does 0/1 specialization fit into this?  I understand we may want to 0/1 specialize in a dynamic shape regime in &quot;eager&quot; mode (is there a better term?), but that doesn&apos;t seem to matter for export?
- So far we&apos;ve mainly been talking about how to handle our own library code.  There is a worry about pushing complicated constraints downstream, similar to torchscript.  What constraints does this actually push?</itunes:summary>
      <itunes:subtitle>This podcast goes over the basics of unbacked SymInts. You might want to listen to this one before listening to https://pytorch-dev-podcast.simplecast.com/episodes/zero-one-specialization Some questions we answer (h/t from Gregory Chanan):

- Are unbacked symints only for export?  Because otherwise I could just break / wait for the actual size.  But maybe I can save some retracing / graph breaks perf if I have them too?  So the correct statement is &quot;primarily&quot; for export?
- Why am I looking into the broadcasting code at all?  Naively, I would expect the export graph to be just a list of ATen ops strung together.  Why do I recurse that far down?  Why can&apos;t I annotate DONT_TRACE_ME_BRO?
- How does 0/1 specialization fit into this?  I understand we may want to 0/1 specialize in a dynamic shape regime in &quot;eager&quot; mode (is there a better term?), but that doesn&apos;t seem to matter for export?
- So far we&apos;ve mainly been talking about how to handle our own library code.  There is a worry about pushing complicated constraints downstream, similar to torchscript.  What constraints does this actually push?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>69</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">85fac632-606c-4478-adbc-5413cbfa67a6</guid>
      <title>Zero-one specialization</title>
      <description><![CDATA[<p>Mikey Dagistes joins me to ask some questions about the recent composability sync https://www.youtube.com/watch?v=NJV7YFbtoR4 where we discussed 0/1 specialization and its implications on export in PT2. What's the fuss all about? What do I need to understand about PT2 to understand why 0/1 specialization is a thing?</p>
]]></description>
      <pubDate>Mon, 20 Feb 2023 08:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/zero-one-specialization-blqQ5jde</link>
      <content:encoded><![CDATA[<p>Mikey Dagistes joins me to ask some questions about the recent composability sync https://www.youtube.com/watch?v=NJV7YFbtoR4 where we discussed 0/1 specialization and its implications on export in PT2. What's the fuss all about? What do I need to understand about PT2 to understand why 0/1 specialization is a thing?</p>
]]></content:encoded>
      <enclosure length="20272869" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/36f235df-3772-45d0-86f0-2a426dea2dc3/audio/7cb27d20-7fb6-492c-86bb-25dbf44c2f5e/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Zero-one specialization</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:21:07</itunes:duration>
      <itunes:summary>Mikey Dagistes joins me to ask some questions about the recent composability sync https://www.youtube.com/watch?v=NJV7YFbtoR4 where we discussed 0/1 specialization and its implications on export in PT2. What&apos;s the fuss all about? What do I need to understand about PT2 to understand why 0/1 specialization is a thing?</itunes:summary>
      <itunes:subtitle>Mikey Dagistes joins me to ask some questions about the recent composability sync https://www.youtube.com/watch?v=NJV7YFbtoR4 where we discussed 0/1 specialization and its implications on export in PT2. What&apos;s the fuss all about? What do I need to understand about PT2 to understand why 0/1 specialization is a thing?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>68</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">197fd798-8bee-4242-a31c-5795b3ec88db</guid>
      <title>torchdynamo</title>
      <description><![CDATA[<p>What is torchdynamo? From a bird's eye view, what exactly does it do? What are some important things to know about it? How does it differ from other graph capture mechanisms?</p><p>For more reading, check out https://docs.google.com/document/d/13K03JN4gkbr40UMiW4nbZYtsw8NngQwrTRnL3knetGM/edit#</p>
]]></description>
      <pubDate>Tue, 6 Dec 2022 08:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/torchdynamo-2nQoMU76</link>
      <content:encoded><![CDATA[<p>What is torchdynamo? From a bird's eye view, what exactly does it do? What are some important things to know about it? How does it differ from other graph capture mechanisms?</p><p>For more reading, check out https://docs.google.com/document/d/13K03JN4gkbr40UMiW4nbZYtsw8NngQwrTRnL3knetGM/edit#</p>
]]></content:encoded>
      <enclosure length="24573632" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/24135642-b919-448b-9b78-45d8e7a7999b/audio/057595ff-70aa-463a-a318-36362a28d7b9/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>torchdynamo</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:25:35</itunes:duration>
      <itunes:summary>What is torchdynamo? From a bird&apos;s eye view, what exactly does it do? What are some important things to know about it? How does it differ from other graph capture mechanisms?</itunes:summary>
      <itunes:subtitle>What is torchdynamo? From a bird&apos;s eye view, what exactly does it do? What are some important things to know about it? How does it differ from other graph capture mechanisms?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>67</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">36bcabbc-41de-4db6-b401-577a6d237679</guid>
      <title>PyTorch 2.0</title>
      <description><![CDATA[<ul><li>Soumith's keynote on PT2.0: https://youtu.be/vbtGZL7IrAw?t=1037</li><li>PT2 Manifesto: https://docs.google.com/document/d/1tlgPcR2YmC3PcQuYDPUORFmEaBPQEmo8dsh4eUjnlyI/edit# </li><li>PT2 Architecture: https://docs.google.com/document/d/1wpv8D2iwGkKjWyKof9gFdTf8ISszKbq1tsMVm-3hSuU/edit#</li></ul>
]]></description>
      <pubDate>Sun, 4 Dec 2022 08:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/pytorch-20-etEDEfts</link>
      <content:encoded><![CDATA[<ul><li>Soumith's keynote on PT2.0: https://youtu.be/vbtGZL7IrAw?t=1037</li><li>PT2 Manifesto: https://docs.google.com/document/d/1tlgPcR2YmC3PcQuYDPUORFmEaBPQEmo8dsh4eUjnlyI/edit# </li><li>PT2 Architecture: https://docs.google.com/document/d/1wpv8D2iwGkKjWyKof9gFdTf8ISszKbq1tsMVm-3hSuU/edit#</li></ul>
]]></content:encoded>
      <enclosure length="17141901" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/ec2e2978-e8e2-4eb2-b0e7-c9507a9194ac/audio/30e8068a-45ec-42dd-a229-37788014b891/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>PyTorch 2.0</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:17:51</itunes:duration>
      <itunes:summary>What is PyTorch 2.0? What makes it different from all the other compilation stories we&apos;ve had for PyTorch in the past? What is the bird&apos;s eye view of the episodes that will be coming soon?</itunes:summary>
      <itunes:subtitle>What is PyTorch 2.0? What makes it different from all the other compilation stories we&apos;ve had for PyTorch in the past? What is the bird&apos;s eye view of the episodes that will be coming soon?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>66</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">19adc381-02ce-4e1c-b324-9f95989fbd43</guid>
      <title>History of functorch</title>
      <description><![CDATA[<p>Join me with Richard Zou to talk about the history of functorch. What was the thought process behind the creation of functorch? How did it get started? JAX’s API and model is fairly different from PyTorch’s, how did we validate that it would work in PyTorch? Where did functorch go after the early user studies? Where is it going next?</p>
]]></description>
      <pubDate>Mon, 7 Nov 2022 23:14:31 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/history-of-functorch-vSVuVIbo</link>
      <content:encoded><![CDATA[<p>Join me with Richard Zou to talk about the history of functorch. What was the thought process behind the creation of functorch? How did it get started? JAX’s API and model is fairly different from PyTorch’s, how did we validate that it would work in PyTorch? Where did functorch go after the early user studies? Where is it going next?</p>
]]></content:encoded>
      <enclosure length="18406260" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/948f0bbe-c9c9-4a66-bb8a-8c82d480f687/audio/e3aa8eaf-da71-4f3e-8ccf-4261863d76eb/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>History of functorch</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:19:10</itunes:duration>
      <itunes:summary>Join me with Richard Zou to talk about the history of functorch. What was the thought process behind the creation of functorch? How did it get started? JAX’s API and model is fairly different from PyTorch’s, how did we validate that it would work in PyTorch? Where did functorch go after the early user studies? Where is it going next?</itunes:summary>
      <itunes:subtitle>Join me with Richard Zou to talk about the history of functorch. What was the thought process behind the creation of functorch? How did it get started? JAX’s API and model is fairly different from PyTorch’s, how did we validate that it would work in PyTorch? Where did functorch go after the early user studies? Where is it going next?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>65</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">3f89fc6d-a81e-4928-8cd6-4b71833934b4</guid>
      <title>Learning rate schedulers</title>
      <description><![CDATA[<p>What’s a learning rate? Why might you want to schedule it? How does the LR scheduler API in PyTorch work? What the heck is up with the formula implementation? Why is everything terrible?</p>
]]></description>
      <pubDate>Mon, 13 Jun 2022 16:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/learning-rate-schedulers-B9b9ZQ4p</link>
      <content:encoded><![CDATA[<p>What’s a learning rate? Why might you want to schedule it? How does the LR scheduler API in PyTorch work? What the heck is up with the formula implementation? Why is everything terrible?</p>
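<p>For reference, the scheduler API under discussion looks like this in ordinary use:</p><pre><code>
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=10, gamma=0.5)

for epoch in range(30):
    # ... run the epoch, calling opt.step() per batch ...
    opt.step()
    sched.step()  # scheduler steps after the optimizer, once per epoch

print(sched.get_last_lr())  # [0.1 * 0.5 ** 3] after 30 epochs
</code></pre>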
]]></content:encoded>
      <enclosure length="18810013" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/36c5cbcc-f888-4b7d-9a1b-0c3c9b6e2d36/audio/89e2c9e5-44ff-4087-91f3-b23883ed60e3/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Learning rate schedulers</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:19:35</itunes:duration>
      <itunes:summary>What’s a learning rate? Why might you want to schedule it? How does the LR scheduler API in PyTorch work? What the heck is up with the formula implementation? Why is everything terrible?</itunes:summary>
      <itunes:subtitle>What’s a learning rate? Why might you want to schedule it? How does the LR scheduler API in PyTorch work? What the heck is up with the formula implementation? Why is everything terrible?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>64</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">910ca93a-f2d1-4ca9-9007-08586625c010</guid>
      <title>Weak references</title>
      <description><![CDATA[<p>What are they good for? (Caches. Private fields.) C++ side support, how it’s implemented / release resources. Python side support, how it’s implemented. Weak ref tensor hazard due to resurrection. Downsides of weak references in C++. Scott Wolchok’s release resources optimization.</p><p>Other episodes to listen to first: <a href="https://pytorch-dev-podcast.simplecast.com/episodes/reference-counting">https://pytorch-dev-podcast.simplecast.com/episodes/reference-counting</a> <a href="https://pytorch-dev-podcast.simplecast.com/episodes/pyobject-preservation">https://pytorch-dev-podcast.simplecast.com/episodes/pyobject-preservation</a></p>
]]></description>
      <pubDate>Mon, 6 Jun 2022 16:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/weak-references-Eafg3LOB</link>
      <content:encoded><![CDATA[<p>What are they good for? (Caches. Private fields.) C++ side support, how it’s implemented / release resources. Python side support, how it’s implemented. Weak ref tensor hazard due to resurrection. Downsides of weak references in C++. Scott Wolchok’s release resources optimization.</p><p>Other episodes to listen to first: <a href="https://pytorch-dev-podcast.simplecast.com/episodes/reference-counting">https://pytorch-dev-podcast.simplecast.com/episodes/reference-counting</a> <a href="https://pytorch-dev-podcast.simplecast.com/episodes/pyobject-preservation">https://pytorch-dev-podcast.simplecast.com/episodes/pyobject-preservation</a></p>
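<p>The Python side is the standard weakref protocol, which Tensor supports:</p><pre><code>
import weakref

import torch

t = torch.randn(3)
r = weakref.ref(t)
print(r() is t)  # True while t is alive
del t
print(r())       # None once the tensor is collected; the resurrection hazard
                 # from the episode applies when C++ still holds a reference
</code></pre>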
]]></content:encoded>
      <enclosure length="16104136" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/471d0ea6-1282-429a-8538-9ac79234d3ab/audio/e6a519f6-0b5a-4029-9350-c3e0a9efd504/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Weak references</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:16:46</itunes:duration>
      <itunes:summary>What are they good for? (Caches. Private fields.) C++ side support, how it’s implemented / release resources. Python side support, how it’s implemented. Weak ref tensor hazard due to resurrection. Downsides of weak references in C++. Scott Wolchok’s release resources optimization.</itunes:summary>
      <itunes:subtitle>What are they good for? (Caches. Private fields.) C++ side support, how it’s implemented / release resources. Python side support, how it’s implemented. Weak ref tensor hazard due to resurrection. Downsides of weak references in C++. Scott Wolchok’s release resources optimization.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>63</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">f8b04ddc-5ae1-4c22-836e-4e3aac8e429f</guid>
      <title>Strides</title>
      <description><![CDATA[<p>Mike Ruberry has an RFC about stride-agnostic operator semantics (<a href="https://github.com/pytorch/pytorch/issues/78050">https://github.com/pytorch/pytorch/issues/78050</a>), so let's talk about strides. What are they? How are they used to implement views and memory format? How do you handle them properly when writing kernels? In what sense are strides overspecified, and therefore, not worth slavishly reimplementing in a system like PrimTorch? What does Edward think we should do about them?</p><p>My blog post that covers strides along with other topics can be found at <a href="http://blog.ezyang.com/2019/05/pytorch-internals/">http://blog.ezyang.com/2019/05/pytorch-internals/</a></p>
]]></description>
      <pubDate>Mon, 30 May 2022 16:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/strides-J8T_8Obe</link>
      <content:encoded><![CDATA[<p>Mike Ruberry has an RFC about stride-agnostic operator semantics (<a href="https://github.com/pytorch/pytorch/issues/78050">https://github.com/pytorch/pytorch/issues/78050</a>), so let's talk about strides. What are they? How are they used to implement views and memory format? How do you handle them properly when writing kernels? In what sense are strides overspecified, and therefore, not worth slavishly reimplementing in a system like PrimTorch? What does Edward think we should do about them?</p><p>My blog post that covers strides along with other topics can be found at <a href="http://blog.ezyang.com/2019/05/pytorch-internals/">http://blog.ezyang.com/2019/05/pytorch-internals/</a></p>
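<p>To see strides in action, a small sketch of my own (not from the episode):</p><pre>
import torch

x = torch.arange(6).reshape(2, 3)      # contiguous: strides are (3, 1)
print(x.stride())                      # (3, 1)
y = x.t()                              # a view: same storage, swapped strides
print(y.stride(), y.is_contiguous())   # (1, 3) False
# element (i, j) lives at storage offset i * stride[0] + j * stride[1]
print(x[1, 2].item(), x.flatten()[1 * 3 + 2 * 1].item())   # 5 5
</pre>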
]]></content:encoded>
      <enclosure length="19708190" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/882a8c9f-a7ed-482d-b5fa-2234a4d76859/audio/a32cb0cb-d2cf-4536-b338-f80186740c5b/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Strides</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:20:31</itunes:duration>
      <itunes:summary>Mike Ruberry has an RFC about stride-agnostic operator semantics (https://github.com/pytorch/pytorch/issues/78050), so let&apos;s talk about strides. What are they? How are they used to implement views and memory format? How do you handle them properly when writing kernels? In what sense are strides overspecified, and therefore, not worth slavishly reimplementing in a system like PrimTorch? What does Edward think we should do about them?

My blog post that covers strides along with other topics can be found at http://blog.ezyang.com/2019/05/pytorch-internals/</itunes:summary>
      <itunes:subtitle>Mike Ruberry has an RFC about stride-agnostic operator semantics (https://github.com/pytorch/pytorch/issues/78050), so let&apos;s talk about strides. What are they? How are they used to implement views and memory format? How do you handle them properly when writing kernels? In what sense are strides overspecified, and therefore, not worth slavishly reimplementing in a system like PrimTorch? What does Edward think we should do about them?

My blog post that covers strides along with other topics can be found at http://blog.ezyang.com/2019/05/pytorch-internals/</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>62</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">a6a7c41b-b6bc-4543-947a-e4fdb41f033f</guid>
      <title>AOTAutograd</title>
      <description><![CDATA[<p>AOTAutograd is a cool new feature in functorch for capturing both forward and backward traces of PyTorch operators, letting you run them through a compiler and then drop the compiled kernels back into a normal PyTorch eager program. Today, Horace joins me to tell me how it works, what it is good to use for, and what our future plans for it are.</p>
]]></description>
      <pubDate>Mon, 9 May 2022 16:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/aotautograd-qW76vMmk</link>
      <content:encoded><![CDATA[<p>AOTAutograd is a cool new feature in functorch for capturing both forward and backward traces of PyTorch operators, letting you run them through a compiler and then drop the compiled kernels back into a normal PyTorch eager program. Today, Horace joins me to tell me how it works, what it is good to use for, and what our future plans for it are.</p>
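<p>For flavor, a sketch of the sort of thing this lets you do. functorch's entry points have moved around over time, so treat aot_function and this exact signature as assumptions rather than gospel:</p><pre>
import torch
from functorch.compile import aot_function  # assumed import path

def f(x):
    return torch.sin(x).sum()

def print_compiler(fx_graph, example_inputs):
    print(fx_graph.code)   # inspect the captured trace
    return fx_graph        # returning it unchanged just runs it as-is

g = aot_function(f, fw_compiler=print_compiler, bw_compiler=print_compiler)
x = torch.randn(4, requires_grad=True)
g(x).backward()  # both the forward and backward traces pass through our "compiler"
</pre>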
]]></content:encoded>
      <enclosure length="18445121" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/25029844-d956-4a08-aa14-ea89d026d8d8/audio/8fea30a7-c403-45c6-82bd-da00510cec8e/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>AOTAutograd</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:19:12</itunes:duration>
      <itunes:summary>AOTAutograd is a cool new feature in functorch for capturing both forward and backward traces of PyTorch operators, letting you run them through a compiler and then drop the compiled kernels back into a normal PyTorch eager program. Today, Horace joins me to tell me how it works, what it is good to use for, and what our future plans for it are.</itunes:summary>
      <itunes:subtitle>AOTAutograd is a cool new feature in functorch for capturing both forward and backward traces of PyTorch operators, letting you run them through a compiler and then drop the compiled kernels back into a normal PyTorch eager program. Today, Horace joins me to tell me how it works, what it is good to use for, and what our future plans for it are.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>61</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">150b738d-8a29-49c5-938a-22d51717b756</guid>
      <title>Dispatcher questions with Sherlock</title>
      <description><![CDATA[<p>Sherlock recently joined the PyTorch team, having previously worked on ONNX Runtime at Microsoft, and Sherlock’s going to ask me some questions about the dispatcher, and I’m going to answer them. We talked about the history of the dispatcher, how to override dispatching order, multiple dispatch, how to organize various dispatch keys and torch function mode. The companion video is at <a href="https://youtu.be/_qB2Ho1O3u4">https://youtu.be/_qB2Ho1O3u4</a></p>
]]></description>
      <pubDate>Mon, 2 May 2022 16:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/dispatcher-questions-with-sherlock-YP90Pyzi</link>
      <content:encoded><![CDATA[<p>Sherlock recently joined the PyTorch team, having previously worked on ONNX Runtime at Microsoft, and Sherlock’s going to ask me some questions about the dispatcher, and I’m going to answer them. We talked about the history of the dispatcher, how to override dispatching order, multiple dispatch, how to organize various dispatch keys and torch function mode. The companion video is at <a href="https://youtu.be/_qB2Ho1O3u4">https://youtu.be/_qB2Ho1O3u4</a></p>
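<p>One of the topics, torch function mode, roughly looks like this from Python (my own sketch; TorchFunctionMode and its home in torch.overrides are assumptions based on recent PyTorch):</p><pre>
import torch
from torch.overrides import TorchFunctionMode  # assumed location

class LogMode(TorchFunctionMode):
    def __torch_function__(self, func, types, args=(), kwargs=None):
        print("intercepted:", func.__name__)
        return func(*args, **(kwargs or {}))

with LogMode():   # every torch API call routes through the mode
    torch.add(torch.ones(2), torch.ones(2))
</pre>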
]]></content:encoded>
      <enclosure length="17857912" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/a9c99164-cac9-49af-ab8a-3f54ed5fe713/audio/a00567ca-c6a1-45eb-9266-c39fe25a3640/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Dispatcher questions with Sherlock</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:18:36</itunes:duration>
      <itunes:summary>Sherlock recently joined the PyTorch team, having previously worked on ONNX Runtime at Microsoft, and Sherlock’s going to ask me some questions about the dispatcher, and I’m going to answer them. We talked about the history of the dispatcher, how to override dispatching order, multiple dispatch, how to organize various dispatch keys and torch function mode. The companion video is at https://youtu.be/_qB2Ho1O3u4</itunes:summary>
      <itunes:subtitle>Sherlock recently joined the PyTorch team, having previously worked on ONNX Runtime at Microsoft, and Sherlock’s going to ask me some questions about the dispatcher, and I’m going to answer them. We talked about the history of the dispatcher, how to override dispatching order, multiple dispatch, how to organize various dispatch keys and torch function mode. The companion video is at https://youtu.be/_qB2Ho1O3u4</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>60</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">2b341bb7-580f-461f-993d-5f488def8c51</guid>
      <title>New CI</title>
      <description><![CDATA[<p>PyTorch recently moved all of its CI from CircleCI to GitHub Actions. There were a lot of improvements in the process, making my old podcast about CI obsolete! Today, Eli Uriegas joins me to talk about why we moved to GitHub Actions, how the new CI system is put together, and some cool features of our new CI.</p>
]]></description>
      <pubDate>Mon, 25 Apr 2022 16:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/new-ci-uTFBJiq_</link>
      <content:encoded><![CDATA[<p>PyTorch recently moved all of its CI from CircleCI to GitHub Actions. There were a lot of improvements in the process, making my old podcast about CI obsolete! Today, Eli Uriegas joins me to talk about why we moved to GitHub Actions, how the new CI system is put together, and some cool features of our new CI.</p>
]]></content:encoded>
      <enclosure length="15579136" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/3d2a7b75-8a82-4a98-b2bc-facd46a4af2c/audio/685952d7-919b-4935-865b-e0eb4b2e89ae/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>New CI</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:16:12</itunes:duration>
      <itunes:summary>PyTorch recently moved all of its CI from CircleCI to GitHub Actions. There were a lot of improvements in the process, making my old podcast about CI obsolete! Today, Eli Uriegas joins me to talk about why we moved to GitHub Actions, how the new CI system is put together, and some cool features of our new CI.</itunes:summary>
      <itunes:subtitle>PyTorch recently moved all of its CI from CircleCI to GitHub Actions. There were a lot of improvements in the process, making my old podcast about CI obsolete! Today, Eli Uriegas joins me to talk about why we moved to GitHub Actions, how the new CI system is put together, and some cool features of our new CI.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>59</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">a95eb9bb-5fd7-432c-8daf-715cb476c156</guid>
      <title>Python exceptions</title>
      <description><![CDATA[<p>C++ has exceptions, Python has exceptions. But they’re not the same thing! How do exceptions work in CPython, how do we translate exceptions from C++ to Python (hint: it’s different for direct bindings versus pybind11), and what do warnings (which we also translate from C++ to Python) have in common with this infrastructure?</p>
]]></description>
      <pubDate>Sun, 17 Apr 2022 16:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/python-exceptions-W0Z3br8D</link>
      <content:encoded><![CDATA[<p>C++ has exceptions, Python has exceptions. But they’re not the same thing! How do exceptions work in CPython, how do we translate exceptions from C++ to Python (hint: it’s different for direct bindings versus pybind11), and what do warnings (which we also translate from C++ to Python) have in common with this infrastructure?</p>
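<p>What the translation looks like from the Python side (a sketch of my own, not from the episode):</p><pre>
import torch

# A c10::Error thrown in C++ arrives in Python as RuntimeError...
try:
    torch.empty(2, 3) @ torch.empty(4, 5)   # shape mismatch raised in C++
except RuntimeError as e:
    print("caught:", str(e)[:50])

# ...and some C++ error types map onto more specific Python exceptions
try:
    torch.zeros(3)[10]
except IndexError as e:
    print("caught:", e)
</pre>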
]]></content:encoded>
      <enclosure length="12754962" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/fe98d55f-593c-431e-bcbc-039b1344fd09/audio/07f8b94e-a5ea-41f6-a832-5a4f96edec6e/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Python exceptions</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:14:47</itunes:duration>
      <itunes:summary>C++ has exceptions, Python has exceptions. But they’re not the same thing! How do exceptions work in CPython, how do we translate exceptions from C++ to Python (hint: it’s different for direct bindings versus pybind11), and what do warnings (which we also translate from C++ to Python) have in common with this infrastructure?</itunes:summary>
      <itunes:subtitle>C++ has exceptions, Python has exceptions. But they’re not the same thing! How do exceptions work in CPython, how do we translate exceptions from C++ to Python (hint: it’s different for direct bindings versus pybind11), and what do warnings (which we also translate from C++ to Python) have in common with this infrastructure?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>58</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">fdcc5dc8-8d39-4b27-917b-5e6001d966f5</guid>
      <title>Torch vs ATen APIs</title>
      <description><![CDATA[<p>PyTorch’s torch API is the Python API everyone knows and loves, but there’s also another API, the ATen API, which most of PyTorch’s internal subsystems are built on. How to tell them apart? What implications do these have on our graph mode IR design? Also, a plug for PrimTorch, a new set of operators, not designed for eager mode, that is supposed to be even lower level than ATen.</p>
]]></description>
      <pubDate>Mon, 11 Apr 2022 16:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/torch-vs-aten-apis-wV6mkdfv</link>
      <content:encoded><![CDATA[<p>PyTorch’s torch API is the Python API everyone knows and loves, but there’s also another API, the ATen API, which most of PyTorch’s internal subsystems are built on. How to tell them apart? What implications do these have on our graph mode IR design? Also, a plug for PrimTorch, a new set of operators, not designed for eager mode, that is supposed to be even lower level than ATen.</p>
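<p>One quick way to see both layers from Python (my own sketch; the torch.ops overload syntax is a recent-PyTorch assumption):</p><pre>
import torch

a = torch.ones(2)
b = torch.full((2,), 2.0)
y1 = torch.add(a, b)                   # the torch API everyone knows
y2 = torch.ops.aten.add.Tensor(a, b)   # the ATen operator underneath it
print(torch.equal(y1, y2))             # True: same operator, different doorway
</pre>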
]]></content:encoded>
      <enclosure length="14442333" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/93276ff0-acbf-4ad5-adae-46c05846de44/audio/d6c9a493-7ae1-4c2c-bc46-854ab8a6e68a/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Torch vs ATen APIs</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:15:03</itunes:duration>
      <itunes:summary>PyTorch’s torch API is the Python API everyone knows and loves, but there’s also another API, the ATen API, which most of PyTorch’s internal subsystems are built on. How to tell them apart? What implications do these have on our graph mode IR design? Also, a plug for PrimTorch, a new set of operators, not designed for eager mode, that is supposed to be even lower level than ATen.</itunes:summary>
      <itunes:subtitle>PyTorch’s torch API is the Python API everyone knows and loves, but there’s also another API, the ATen API, which most of PyTorch’s internal subsystems are built on. How to tell them apart? What implications do these have on our graph mode IR design? Also, a plug for PrimTorch, a new set of operators, not designed for eager mode, that is supposed to be even lower level than ATen.</itunes:subtitle>
      <itunes:keywords>api, pytorch, frontend, torch, aten</itunes:keywords>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>57</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">acafe094-bf1a-4952-a7e1-6199599c668e</guid>
      <title>All about NVIDIA GPUs</title>
      <description><![CDATA[<p>PyTorch is in the business of shipping numerical software that can run fast on your CUDA-enabled NVIDIA GPU, but it turns out there is a lot of heterogeneity in NVIDIA’s physical GPU offering and when it comes to what is fast and what is slow, what specific GPU you have on hand matters quite a bit. Yet there are literally hundreds of distinct NVIDIA GPU models on the market, so how do you make sense of the madness? Today, Natalia Gimelshein joins me to talk about everything that’s going on in the NVIDIA GPU market, and what, as a framework developer, you have to care about to make sense of it all.</p><p><strong>Further reading.</strong></p><ul><li>NVIDIA microarchitectures on Wikipedia <a href="https://en.wikipedia.org/wiki/Category:Nvidia_microarchitectures">https://en.wikipedia.org/wiki/Category:Nvidia_microarchitectures</a></li><li>A slightly old post about matching SM to architecture <a href="https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/">https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/</a></li></ul>
]]></description>
      <pubDate>Fri, 24 Sep 2021 16:00:00 +0000</pubDate>
      <author>wookim@fb.com (Natalia Gimelshein)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/all-about-nvidia-gpus-kRjF4Tvt</link>
      <content:encoded><![CDATA[<p>PyTorch is in the business of shipping numerical software that can run fast on your CUDA-enabled NVIDIA GPU, but it turns out there is a lot of heterogeneity in NVIDIA’s physical GPU offering and when it comes to what is fast and what is slow, what specific GPU you have on hand matters quite a bit. Yet there are literally hundreds of distinct NVIDIA GPU models on the market, so how do you make sense of the madness? Today, Natalia Gimelshein joins me to talk about everything that’s going on in the NVIDIA GPU market, and what, as a framework developer, you have to care about to make sense of it all.</p><p><strong>Further reading.</strong></p><ul><li>NVIDIA microarchitectures on Wikipedia <a href="https://en.wikipedia.org/wiki/Category:Nvidia_microarchitectures">https://en.wikipedia.org/wiki/Category:Nvidia_microarchitectures</a></li><li>A slightly old post about matching SM to architecture <a href="https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/">https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/</a></li></ul>
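<p>To find out what card you're actually dealing with, a small sketch (mine, not from the episode):</p><pre>
import torch

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))              # e.g. an A100 vs a GTX 1080
    major, minor = torch.cuda.get_device_capability(0)
    print(f"compute capability sm_{major}{minor}")    # the "SM" in the post above
    print(torch.cuda.get_device_properties(0).total_memory // 2**20, "MiB")
</pre>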
]]></content:encoded>
      <enclosure length="18707717" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/e1ef0a85-bad6-46f9-acef-2e3ae30feb9e/audio/dac26d2e-e302-4dae-956d-b121a6387f89/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>All about NVIDIA GPUs</itunes:title>
      <itunes:author>Natalia Gimelshein</itunes:author>
      <itunes:duration>00:19:29</itunes:duration>
      <itunes:summary>PyTorch is in the business of shipping numerical software that can run fast on your CUDA-enabled NVIDIA GPU, but it turns out there is a lot of heterogeneity in NVIDIA’s physical GPU offering and when it comes to what is fast and what is slow, what specific GPU you have on hand matters quite a bit. Yet there are literally hundreds of distinct NVIDIA GPU models on the market, so how do you make sense of the madness? Today, Natalia Gimelshein joins me to talk about everything that’s going on in the NVIDIA GPU market, and what, as a framework developer, you have to care about to make sense of it all.</itunes:summary>
      <itunes:subtitle>PyTorch is in the business of shipping numerical software that can run fast on your CUDA-enabled NVIDIA GPU, but it turns out there is a lot of heterogeneity in NVIDIA’s physical GPU offering and when it comes to what is fast and what is slow, what specific GPU you have on hand matters quite a bit. Yet there are literally hundreds of distinct NVIDIA GPU models on the market, so how do you make sense of the madness? Today, Natalia Gimelshein joins me to talk about everything that’s going on in the NVIDIA GPU market, and what, as a framework developer, you have to care about to make sense of it all.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>56</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">8255d591-4b74-4051-b037-22a19e2785a5</guid>
      <title>Tensor subclasses and Liskov substitution principle</title>
      <description><![CDATA[<p>A lot of recent work going on in PyTorch is all about adding new and interesting Tensor subclasses, and this all leads up to the question of, what exactly is it OK to make a tensor subclass? One answer to this question comes from an old principle from Barbara Liskov called the Liskov substitution principle, which informally can be stated as S is a subtype of T if anywhere you have T, it can be replaced with S without altering "desirable" properties of this program. In this podcast I'll talk about LSP and how it relates to the design of Tensor subclasses and a hypothetical "abstract Tensor specification" which really doesn't exist but which sort of implicitly exists in the corpus of existing PyTorch programs.</p><p>Further reading:</p><ul><li>This is a cool interview with Barbara Liskov that I quote in the podcast <a href="https://www.youtube.com/watch?v=-Z-17h3jG0A">https://www.youtube.com/watch?v=-Z-17h3jG0A</a></li><li>Max Balandat talking about linear operators in PyTorch <a href="https://github.com/pytorch/pytorch/issues/28341">https://github.com/pytorch/pytorch/issues/28341</a></li><li>At the end I talk a little bit about multiple dispatch; an earlier discussion about this topic is in this podcast <a href="https://pytorch-dev-podcast.simplecast.com/episodes/multiple-dispatch-in-torch-function">https://pytorch-dev-podcast.simplecast.com/episodes/multiple-dispatch-in-torch-function</a></li></ul>
]]></description>
      <pubDate>Thu, 16 Sep 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/tensor-subclasses-and-liskov-substitution-principle-nxMatT7O</link>
      <content:encoded><![CDATA[<p>A lot of recent work going on in PyTorch is all about adding new and interesting Tensor subclasses, and this all leads up to the question of, what exactly is it OK to make a tensor subclass? One answer to this question comes from an old principle from Barbara Liskov called the Liskov substitution principle, which informally can be stated as S is a subtype of T if anywhere you have T, it can be replaced with S without altering "desirable" properties of this program. In this podcast I'll talk about LSP and how it relates to the design of Tensor subclasses and a hypothetical "abstract Tensor specification" which really doesn't exist but which sort of implicitly exists in the corpus of existing PyTorch programs.</p><p>Further reading:</p><ul><li>This is a cool interview with Barbara Liskov that I quote in the podcast <a href="https://www.youtube.com/watch?v=-Z-17h3jG0A">https://www.youtube.com/watch?v=-Z-17h3jG0A</a></li><li>Max Balandat talking about linear operators in PyTorch <a href="https://github.com/pytorch/pytorch/issues/28341">https://github.com/pytorch/pytorch/issues/28341</a></li><li>At the end I talk a little bit about multiple dispatch; an earlier discussion about this topic is in this podcast <a href="https://pytorch-dev-podcast.simplecast.com/episodes/multiple-dispatch-in-torch-function">https://pytorch-dev-podcast.simplecast.com/episodes/multiple-dispatch-in-torch-function</a></li></ul>
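<p>The LSP-friendly end of the design space is a subclass that behaves exactly like a Tensor everywhere; a minimal sketch of my own:</p><pre>
import torch

class TaggedTensor(torch.Tensor):
    pass  # adds nothing, so it can substitute for Tensor anywhere

t = torch.ones(3).as_subclass(TaggedTensor)
print(isinstance(t, torch.Tensor))  # True
out = t + t
print(type(out).__name__)           # TaggedTensor: the subclass survives torch ops
</pre>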
]]></content:encoded>
      <enclosure length="18445997" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/990c12c8-c13b-489b-b21f-37b6199bb4e0/audio/7c25e930-b3c1-4ad0-b1bb-fa01829d5c60/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Tensor subclasses and Liskov substitution principle</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:19:13</itunes:duration>
      <itunes:summary>A lot of recent work going on in PyTorch is all about adding new and interesting Tensor subclasses, and this all leads up to the question of, what exactly is it OK to make a tensor subclass? One answer to this question comes from an old principle from Barbara Liskov called the Liskov substitution principle, which informally can be stated as S is a subtype of T if anywhere you have T, it can be replaced with S without altering &quot;desirable&quot; properties of this program. In this podcast I&apos;ll talk about LSP and how it relates to the design of Tensor subclasses and a hypothetical &quot;abstract Tensor specification&quot; which really doesn&apos;t exist but which sort of implicitly exists in the corpus of existing PyTorch programs.</itunes:summary>
      <itunes:subtitle>A lot of recent work going on in PyTorch is all about adding new and interesting Tensor subclasses, and this all leads up to the question of, what exactly is it OK to make a tensor subclass? One answer to this question comes from an old principle from Barbara Liskov called the Liskov substitution principle, which informally can be stated as S is a subtype of T if anywhere you have T, it can be replaced with S without altering &quot;desirable&quot; properties of this program. In this podcast I&apos;ll talk about LSP and how it relates to the design of Tensor subclasses and a hypothetical &quot;abstract Tensor specification&quot; which really doesn&apos;t exist but which sort of implicitly exists in the corpus of existing PyTorch programs.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>55</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">79c00a6b-c83b-4605-b4e5-25473dc0f018</guid>
      <title>Half precision</title>
      <description><![CDATA[<p>In this episode I talk about reduced precision floating point formats float16 (aka half precision) and bfloat16. I'll discuss what floating point numbers are, how these two formats vary, and some of the practical considerations that arise when you are working with numeric code in PyTorch that also needs to work in reduced precision. Did you know that we do all CUDA computations in float32, even if the source tensors are stored as float16? Now you know!</p><p><strong>Further reading.</strong></p><ul><li>The Wikipedia article on IEEE floating point is pretty great <a href="https://en.wikipedia.org/wiki/IEEE_754">https://en.wikipedia.org/wiki/IEEE_754</a></li><li>How bfloat16 works out when doing training <a href="https://arxiv.org/abs/1905.12322">https://arxiv.org/abs/1905.12322</a></li><li>Definition of acc_type in PyTorch <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/AccumulateType.h">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/AccumulateType.h</a></li></ul>
]]></description>
      <pubDate>Fri, 10 Sep 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/half-precision-iKOTruc9</link>
      <content:encoded><![CDATA[<p>In this episode I talk about reduced precision floating point formats float16 (aka half precision) and bfloat16. I'll discuss what floating point numbers are, how these two formats vary, and some of the practical considerations that arise when you are working with numeric code in PyTorch that also needs to work in reduced precision. Did you know that we do all CUDA computations in float32, even if the source tensors are stored as float16? Now you know!</p><p><strong>Further reading.</strong></p><ul><li>The Wikipedia article on IEEE floating point is pretty great <a href="https://en.wikipedia.org/wiki/IEEE_754">https://en.wikipedia.org/wiki/IEEE_754</a></li><li>How bfloat16 works out when doing training <a href="https://arxiv.org/abs/1905.12322">https://arxiv.org/abs/1905.12322</a></li><li>Definition of acc_type in PyTorch <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/AccumulateType.h">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/AccumulateType.h</a></li></ul>
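<p>The mantissa/exponent trade-off in a few lines (my own sketch, not from the episode):</p><pre>
import torch

x = torch.tensor(1.0 + 2**-10)
print(x.half())       # 1.0010: float16 keeps 10 mantissa bits, so this survives
print(x.bfloat16())   # 1.0: bfloat16 only keeps 7 mantissa bits, so it rounds away

print(torch.finfo(torch.float16).max)    # 65504.0, easy to overflow
print(torch.finfo(torch.bfloat16).max)   # ~3.4e38, same exponent range as float32
</pre>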
]]></content:encoded>
      <enclosure length="17286541" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/857c195f-acd1-40aa-a686-54f24941cbc1/audio/ad2eb402-039e-4189-81d5-f937762a1c54/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Half precision</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:18:00</itunes:duration>
      <itunes:summary>In this episode I talk about reduced precision floating point formats float16 (aka half precision) and bfloat16. I&apos;ll discuss what floating point numbers are, how these two formats vary, and some of the practical considerations that arise when you are working with numeric code in PyTorch that also needs to work in reduced precision. Did you know that we do all CUDA computations in float32, even if the source tensors are stored as float16? Now you know!</itunes:summary>
      <itunes:subtitle>In this episode I talk about reduced precision floating point formats float16 (aka half precision) and bfloat16. I&apos;ll discuss what floating point numbers are, how these two formats vary, and some of the practical considerations that arise when you are working with numeric code in PyTorch that also needs to work in reduced precision. Did you know that we do all CUDA computations in float32, even if the source tensors are stored as float16? Now you know!</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>54</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">af21fae3-5a3c-4a61-bd75-2e46647c5f2d</guid>
      <title>DataLoader with multiple workers leaks memory</title>
      <description><![CDATA[<p>Today I'm going to talk about a famous issue in PyTorch, DataLoader with num_workers > 0 causes memory leak (<a href="https://github.com/pytorch/pytorch/issues/13246">https://github.com/pytorch/pytorch/issues/13246</a>). This bug is a good opportunity to talk about Dataset/DataLoader design in PyTorch, fork and copy-on-write memory in Linux and Python reference counting; you have to know about all of these things to understand why this bug occurs, but once you do, it also explains why the workarounds help.</p><p><strong>Further reading.</strong></p><ul><li>A nice summary of the full issue <a href="https://github.com/pytorch/pytorch/issues/13246#issuecomment-905703662">https://github.com/pytorch/pytorch/issues/13246#issuecomment-905703662</a></li><li>DataLoader architecture RFC <a href="https://github.com/pytorch/pytorch/issues/49440">https://github.com/pytorch/pytorch/issues/49440</a></li><li>Cinder Python <a href="https://github.com/facebookincubator/cinder">https://github.com/facebookincubator/cinder</a></li></ul>
]]></description>
      <pubDate>Wed, 1 Sep 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/dataloader-with-multiple-workers-leaks-memory-JSEQUm6e</link>
      <content:encoded><![CDATA[<p>Today I'm going to talk about a famous issue in PyTorch, DataLoader with num_workers > 0 causes memory leak (<a href="https://github.com/pytorch/pytorch/issues/13246">https://github.com/pytorch/pytorch/issues/13246</a>). This bug is a good opportunity to talk about Dataset/DataLoader design in PyTorch, fork and copy-on-write memory in Linux and Python reference counting; you have to know about all of these things to understand why this bug occurs, but once you do, it also explains why the workarounds help.</p><p><strong>Further reading.</strong></p><ul><li>A nice summary of the full issue <a href="https://github.com/pytorch/pytorch/issues/13246#issuecomment-905703662">https://github.com/pytorch/pytorch/issues/13246#issuecomment-905703662</a></li><li>DataLoader architecture RFC <a href="https://github.com/pytorch/pytorch/issues/49440">https://github.com/pytorch/pytorch/issues/49440</a></li><li>Cinder Python <a href="https://github.com/facebookincubator/cinder">https://github.com/facebookincubator/cinder</a></li></ul>
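<p>The flavor of the workaround discussed in the linked summary, sketched by me (names are illustrative): keep per-sample metadata in one big array object instead of a Python list, so forked workers don't dirty copy-on-write pages just by refcounting list elements:</p><pre>
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class MetaDataset(Dataset):
    def __init__(self):
        # A list of a million ints is a million PyObjects whose refcounts get
        # bumped in every forked worker; one numpy array is a single PyObject.
        self.labels = np.arange(1_000_000)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        return torch.tensor(self.labels[i])

if __name__ == "__main__":   # guard needed when workers start via spawn
    loader = DataLoader(MetaDataset(), num_workers=2)
    print(next(iter(loader)))
</pre>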
]]></content:encoded>
      <enclosure length="15974599" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/3a0b2060-3201-4047-ab16-fc2550b17deb/audio/371ee19c-a23a-453f-9c12-33b513630bfd/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>DataLoader with multiple workers leaks memory</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:16:38</itunes:duration>
      <itunes:summary>Today I&apos;m going to talk about a famous issue in PyTorch, DataLoader with num_workers &gt; 0 causes memory leak (https://github.com/pytorch/pytorch/issues/13246). This bug is a good opportunity to talk about Dataset/DataLoader design in PyTorch, fork and copy-on-write memory in Linux and Python reference counting; you have to know about all of these things to understand why this bug occurs, but once you do, it also explains why the workarounds help.</itunes:summary>
      <itunes:subtitle>Today I&apos;m going to talk about a famous issue in PyTorch, DataLoader with num_workers &gt; 0 causes memory leak (https://github.com/pytorch/pytorch/issues/13246). This bug is a good opportunity to talk about Dataset/DataLoader design in PyTorch, fork and copy-on-write memory in Linux and Python reference counting; you have to know about all of these things to understand why this bug occurs, but once you do, it also explains why the workarounds help.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>53</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">c6778c3a-f5c7-47ca-8299-35d65997097a</guid>
      <title>Batching</title>
      <description><![CDATA[<p>PyTorch operates on its input data in a batched manner, typically processing multiple batches of an input at once (rather than one at a time, as would be the case in typical programming). In this podcast, we talk a little about the implications of batching operations in this way, and then also about how PyTorch's API is structured for batching (hint: poorly) and how Numpy introduced a concept of ufunc/gufuncs to standardize over broadcasting and batching behavior. There is some overlap between this podcast and previous podcasts about TensorIterator and vmap; you may also be interested in those episodes.</p><p><strong>Further reading.</strong></p><ul><li>ufuncs and gufuncs <a href="https://numpy.org/doc/stable/reference/ufuncs.html">https://numpy.org/doc/stable/reference/ufuncs.html</a> and <a href="https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html">https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html</a></li><li>A brief taxonomy of PyTorch operators by shape behavior <a href="http://blog.ezyang.com/2020/05/a-brief-taxonomy-of-pytorch-operators-by-shape-behavior/">http://blog.ezyang.com/2020/05/a-brief-taxonomy-of-pytorch-operators-by-shape-behavior/</a></li><li>Related episodes on TensorIterator and vmap <a href="https://pytorch-dev-podcast.simplecast.com/episodes/tensoriterator">https://pytorch-dev-podcast.simplecast.com/episodes/tensoriterator</a> and <a href="https://pytorch-dev-podcast.simplecast.com/episodes/vmap">https://pytorch-dev-podcast.simplecast.com/episodes/vmap</a></li></ul>
]]></description>
      <pubDate>Wed, 18 Aug 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/batching-MAvJs_I_</link>
      <content:encoded><![CDATA[<p>PyTorch operates on its input data in a batched manner, typically processing multiple batches of an input at once (rather than one at a time, as would be the case in typical programming). In this podcast, we talk a little about the implications of batching operations in this way, and then also about how PyTorch's API is structured for batching (hint: poorly) and how Numpy introduced a concept of ufunc/gufuncs to standardize over broadcasting and batching behavior. There is some overlap between this podcast and previous podcasts about TensorIterator and vmap; you may also be interested in those episodes.</p><p><strong>Further reading.</strong></p><ul><li>ufuncs and gufuncs <a href="https://numpy.org/doc/stable/reference/ufuncs.html">https://numpy.org/doc/stable/reference/ufuncs.html</a> and <a href="https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html">https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html</a></li><li>A brief taxonomy of PyTorch operators by shape behavior <a href="http://blog.ezyang.com/2020/05/a-brief-taxonomy-of-pytorch-operators-by-shape-behavior/">http://blog.ezyang.com/2020/05/a-brief-taxonomy-of-pytorch-operators-by-shape-behavior/</a></li><li>Related episodes on TensorIterator and vmap <a href="https://pytorch-dev-podcast.simplecast.com/episodes/tensoriterator">https://pytorch-dev-podcast.simplecast.com/episodes/tensoriterator</a> and <a href="https://pytorch-dev-podcast.simplecast.com/episodes/vmap">https://pytorch-dev-podcast.simplecast.com/episodes/vmap</a></li></ul>
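<p>Broadcasting (ufunc-style) and batch dimensions (gufunc-style) in PyTorch, a quick sketch of my own:</p><pre>
import torch

# Elementwise ops broadcast like a NumPy ufunc
a = torch.randn(8, 1, 3)
b = torch.randn(4, 3)
print((a + b).shape)              # torch.Size([8, 4, 3])

# matmul acts like a gufunc with signature (n,k),(k,m)->(n,m):
# all leading dimensions are treated as batch dimensions
x = torch.randn(8, 5, 2, 3)
y = torch.randn(8, 5, 3, 4)
print(torch.matmul(x, y).shape)   # torch.Size([8, 5, 2, 4])
</pre>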
]]></content:encoded>
      <enclosure length="13076015" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/511409c9-2eca-42a9-96b1-b6db7cae21e3/audio/44522aaf-0af3-4c00-a5b9-bce9e97d4ae8/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Batching</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:13:37</itunes:duration>
      <itunes:summary>PyTorch operates on its input data in a batched manner, typically processing multiple batches of an input at once (rather than one at a time, as would be the case in typical programming). In this podcast, we talk a little about the implications of batching operations in this way, and then also about how PyTorch&apos;s API is structured for batching (hint: poorly) and how Numpy introduced a concept of ufunc/gufuncs to standardize over broadcasting and batching behavior. There is some overlap between this podcast and previous podcasts about TensorIterator and vmap; you may also be interested in those episodes.</itunes:summary>
      <itunes:subtitle>PyTorch operates on its input data in a batched manner, typically processing multiple batches of an input at once (rather than one at a time, as would be the case in typical programming). In this podcast, we talk a little about the implications of batching operations in this way, and then also about how PyTorch&apos;s API is structured for batching (hint: poorly) and how Numpy introduced a concept of ufunc/gufuncs to standardize over broadcasting and batching behavior. There is some overlap between this podcast and previous podcasts about TensorIterator and vmap; you may also be interested in those episodes.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>52</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">b3645cb8-c91f-48bf-8221-0b1601a78d0d</guid>
      <title>Multiple dispatch in __torch_function__</title>
      <description><![CDATA[<p>Python is a single dispatch OO language, but there are some operations such as binary magic methods which implement a simple form of multiple dispatch. __torch_function__ (through its Numpy predecessor __array_function__) generalizes this mechanism so that invocations of torch.add with different subclasses work properly. This podcast describes how this mechanism works and how it can be used (in an unconventional way) to build composable subclasses ala JAX in functorch.</p><p><strong>Further reading:</strong></p><ul><li>This podcast in written form <a href="https://dev-discuss.pytorch.org/t/functorch-levels-as-dynamically-allocated-classes/294">https://dev-discuss.pytorch.org/t/functorch-levels-as-dynamically-allocated-classes/294</a></li><li>Multiple dispatch resolution rules in the RFC <a href="https://github.com/pytorch/rfcs/blob/master/RFC-0001-torch-function-for-methods.md#process-followed-during-a-functionmethod-call">https://github.com/pytorch/rfcs/blob/master/RFC-0001-torch-function-for-methods.md#process-followed-during-a-functionmethod-call</a></li></ul>
]]></description>
      <pubDate>Tue, 10 Aug 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/multiple-dispatch-in-torch-function-Qokxu_fq</link>
      <content:encoded><![CDATA[<p>Python is a single dispatch OO language, but there are some operations such as binary magic methods which implement a simple form of multiple dispatch. __torch_function__ (through its Numpy predecessor __array_function__) generalizes this mechanism so that invocations of torch.add with different subclasses work properly. This podcast describes how this mechanism works and how it can be used (in an unconventional way) to build composable subclasses ala JAX in functorch.</p><p><strong>Further reading:</strong></p><ul><li>This podcast in written form <a href="https://dev-discuss.pytorch.org/t/functorch-levels-as-dynamically-allocated-classes/294">https://dev-discuss.pytorch.org/t/functorch-levels-as-dynamically-allocated-classes/294</a></li><li>Multiple dispatch resolution rules in the RFC <a href="https://github.com/pytorch/rfcs/blob/master/RFC-0001-torch-function-for-methods.md#process-followed-during-a-functionmethod-call">https://github.com/pytorch/rfcs/blob/master/RFC-0001-torch-function-for-methods.md#process-followed-during-a-functionmethod-call</a></li></ul>
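<p>The mechanism in miniature (my own sketch, not from the episode): every argument's type gets a say, which is what makes this multiple dispatch:</p><pre>
import torch

class Logging(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        print("dispatching", func.__name__, "with types", [t.__name__ for t in types])
        return super().__torch_function__(func, types, args, kwargs)

t = torch.ones(2).as_subclass(Logging)
torch.add(t, torch.ones(2))   # the subclass intercepts even though only one arg has it
</pre>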
]]></content:encoded>
      <enclosure length="13764006" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/5c74123f-d03a-4f8e-ba66-48e3fb554b3b/audio/e2f22efd-75d6-44a8-9b07-46f853309b75/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Multiple dispatch in __torch_function__</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:14:20</itunes:duration>
      <itunes:summary>Python is a single dispatch OO language, but there are some operations such as binary magic methods which implement a simple form of multiple dispatch. __torch_function__ (through its Numpy predecessor __array_function__) generalizes this mechanism so that invocations of torch.add with different subclasses work properly. This podcast describes how this mechanism works and how it can be used (in an unconventional way) to build composable subclasses ala JAX in functorch.</itunes:summary>
      <itunes:subtitle>Python is a single dispatch OO language, but there are some operations such as binary magic methods which implement a simple form of multiple dispatch. __torch_function__ (through its Numpy predecessor __array_function__) generalizes this mechanism so that invocations of torch.add with different subclasses work properly. This podcast describes how this mechanism works and how it can be used (in an unconventional way) to build composable subclasses ala JAX in functorch.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>51</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">94960f56-9695-42e4-a422-afad851162a9</guid>
      <title>Multithreading</title>
      <description><![CDATA[<p>Writing multithreading code has always been a pain, and in PyTorch there are buckets and buckets of multithreading-related issues you have to be aware of and deal with when writing code that makes use of it. We'll cover how you interface with multithreading in PyTorch, what goes into implementing those interfaces (thread pools!) and also some miscellaneous stuff like TLS, forks and data structure thread safety that is also relevant.</p><p>Further reading:</p><ul><li>TorchScript CPU inference threading documentation <a href="https://github.com/pytorch/pytorch/blob/master/docs/source/notes/cpu_threading_torchscript_inference.rst">https://github.com/pytorch/pytorch/blob/master/docs/source/notes/cpu_threading_torchscript_inference.rst</a></li><li>c10 thread pool <a href="https://github.com/pytorch/pytorch/blob/master/c10/core/thread_pool.h">https://github.com/pytorch/pytorch/blob/master/c10/core/thread_pool.h</a> and autograd thread pool <a href="https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/engine.cpp">https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/engine.cpp</a></li><li>Tracking issue for TLS propagation across threads <a href="https://github.com/pytorch/pytorch/issues/28520">https://github.com/pytorch/pytorch/issues/28520</a></li></ul>
]]></description>
      <pubDate>Tue, 3 Aug 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/multithreading-gtMLrEUO</link>
      <content:encoded><![CDATA[<p>Writing multithreading code has always been a pain, and in PyTorch there are buckets and buckets of multithreading-related issues you have to be aware of and deal with when writing code that makes use of it. We'll cover how you interface with multithreading in PyTorch, what goes into implementing those interfaces (thread pools!) and also some miscellaneous stuff like TLS, forks and data structure thread safety that is also relevant.</p><p>Further reading:</p><ul><li>TorchScript CPU inference threading documentation <a href="https://github.com/pytorch/pytorch/blob/master/docs/source/notes/cpu_threading_torchscript_inference.rst">https://github.com/pytorch/pytorch/blob/master/docs/source/notes/cpu_threading_torchscript_inference.rst</a></li><li>c10 thread pool <a href="https://github.com/pytorch/pytorch/blob/master/c10/core/thread_pool.h">https://github.com/pytorch/pytorch/blob/master/c10/core/thread_pool.h</a> and autograd thread pool <a href="https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/engine.cpp">https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/engine.cpp</a></li><li>Tracking issue for TLS propagation across threads <a href="https://github.com/pytorch/pytorch/issues/28520">https://github.com/pytorch/pytorch/issues/28520</a></li></ul>
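<p>The user-facing knobs for the thread pools (a sketch of my own):</p><pre>
import torch

# Intra-op parallelism: threads used inside a single operator
print(torch.get_num_threads())
torch.set_num_threads(2)

# Inter-op parallelism: threads used to run independent operators concurrently
print(torch.get_num_interop_threads())

x = torch.randn(10_000_000)
x.sum()   # a big reduction fans out over the intra-op pool
</pre>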
]]></content:encoded>
      <enclosure length="17831560" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/d61bf007-dc61-4301-b0a1-2b4c1424b3b4/audio/4c38bf3f-d74d-4c61-91a1-ac9f142bbb22/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Multithreading</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:18:34</itunes:duration>
      <itunes:summary>Writing multithreading code has always been a pain, and in PyTorch there are buckets and buckets of multithreading-related issues you have to be aware of and deal with when writing code that makes use of it. We&apos;ll cover how you interface with multithreading in PyTorch, what goes into implementing those interfaces (thread pools!) and also some miscellaneous stuff like TLS, forks and data structure thread safety that is also relevant.</itunes:summary>
      <itunes:subtitle>Writing multithreading code has always been a pain, and in PyTorch there are buckets and buckets of multithreading-related issues you have to be aware of and deal with when writing code that makes use of it. We&apos;ll cover how you interface with multithreading in PyTorch, what goes into implementing those interfaces (thread pools!) and also some miscellaneous stuff like TLS, forks and data structure thread safety that is also relevant.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>50</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">965b66b8-1931-4998-a26d-536ccd5aee63</guid>
      <title>Asynchronous versus synchronous execution</title>
      <description><![CDATA[<p>CUDA is asynchronous, CPU is synchronous. Making them play well together can be one of the more thorny and easy to get wrong aspects of the PyTorch API. I talk about why non_blocking is difficult to use correctly, a hypothetical "asynchronous CPU" device which would help smooth over some of the API problems and also why it used to be difficult to implement async CPU (but it's not hard anymore!) At the end, I also briefly talk about how async/sync impedance can also show up in unusual places, namely the CUDA caching allocator.</p><p><strong>Further reading.</strong></p><ul><li>CUDA semantics which discuss non_blocking somewhat <a href="https://pytorch.org/docs/stable/notes/cuda.html">https://pytorch.org/docs/stable/notes/cuda.html</a></li><li>Issue requesting async cpu <a href="https://github.com/pytorch/pytorch/issues/44343">https://github.com/pytorch/pytorch/issues/44343</a></li></ul>
]]></description>
      <pubDate>Tue, 27 Jul 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/asynchronous-versus-synchronous-execution-qVVFF_wN</link>
      <content:encoded><![CDATA[<p>CUDA is asynchronous, CPU is synchronous. Making them play well together can be one of the more thorny and easy to get wrong aspects of the PyTorch API. I talk about why non_blocking is difficult to use correctly, a hypothetical "asynchronous CPU" device which would help smooth over some of the API problems and also why it used to be difficult to implement async CPU (but it's not hard anymore!) At the end, I also briefly talk about how async/sync impedance can also show up in unusual places, namely the CUDA caching allocator.</p><p><strong>Further reading.</strong></p><ul><li>CUDA semantics which discuss non_blocking somewhat <a href="https://pytorch.org/docs/stable/notes/cuda.html">https://pytorch.org/docs/stable/notes/cuda.html</a></li><li>Issue requesting async cpu <a href="https://github.com/pytorch/pytorch/issues/44343">https://github.com/pytorch/pytorch/issues/44343</a></li></ul>
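<p>The non_blocking trap in its natural habitat (my own sketch, not from the episode):</p><pre>
import torch

if torch.cuda.is_available():
    # non_blocking only actually overlaps when the source is in pinned memory
    x = torch.randn(1024, 1024, pin_memory=True)
    y = x.to("cuda", non_blocking=True)   # may return before the copy finishes
    # ... the CPU is free to queue more work here ...
    torch.cuda.synchronize()              # only now is the copy guaranteed done
</pre>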
]]></content:encoded>
      <enclosure length="14445282" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/6e4ad5f5-cf9f-4345-9924-dc3d095d57d4/audio/bde55c18-6232-4844-940e-1798c885c82a/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Asynchronous versus synchronous execution</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:15:03</itunes:duration>
      <itunes:summary>CUDA is asynchronous, CPU is synchronous. Making them play well together can be one of the more thorny and easy to get wrong aspects of the PyTorch API. I talk about why non_blocking is difficult to use correctly, a hypothetical &quot;asynchronous CPU&quot; device which would help smooth over some of the API problems and also why it used to be difficult to implement async CPU (but it&apos;s not hard anymore!) At the end, I also briefly talk about how async/sync impedance can also show up in unusual places, namely the CUDA caching allocator.</itunes:summary>
      <itunes:subtitle>CUDA is asynchronous, CPU is synchronous. Making them play well together can be one of the more thorny and easy to get wrong aspects of the PyTorch API. I talk about why non_blocking is difficult to use correctly, a hypothetical &quot;asynchronous CPU&quot; device which would help smooth over some of the API problems and also why it used to be difficult to implement async CPU (but it&apos;s not hard anymore!) At the end, I also briefly talk about how async/sync impedance can also show up in unusual places, namely the CUDA caching allocator.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>49</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">11eb9d13-a866-4297-9f9e-7c16c3e33340</guid>
      <title>gradcheck</title>
      <description><![CDATA[<p>We talk about gradcheck, the property based testing mechanism that we use to verify the correctness of analytic gradient formulas in PyTorch. I'll talk a bit about testing in general, property based testing and why gradcheck is a particularly useful property based test. There will be some calculus, although I've tried to keep the math mostly to intuitions and pointers on what to read up on elsewhere.</p><p><strong>Further reading.</strong></p><ul><li>Gradcheck mechanics, a detailed mathematical explanation of how it works <a href="https://pytorch.org/docs/stable/notes/gradcheck.html">https://pytorch.org/docs/stable/notes/gradcheck.html</a> In particular, it also explains how gradcheck extends to complex numbers</li><li>JAX has a pretty good explanation about vjp and jvp at <a href="https://jax.readthedocs.io/en/latest/notebooks/autodiff_cookbook.html">https://jax.readthedocs.io/en/latest/notebooks/autodiff_cookbook.html</a></li><li>Fast gradcheck tracking issue <a href="https://github.com/pytorch/pytorch/issues/53876">https://github.com/pytorch/pytorch/issues/53876</a></li></ul>
]]></description>
      <pubDate>Fri, 23 Jul 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/gradcheck-ymqvjZ_4</link>
      <content:encoded><![CDATA[<p>We talk about gradcheck, the property based testing mechanism that we use to verify the correctness of analytic gradient formulas in PyTorch. I'll talk a bit about testing in general, property based testing and why gradcheck is a particularly useful property based test. There will be some calculus, although I've tried to keep the math mostly to intuitions and pointers on what to read up on elsewhere.</p><p><strong>Further reading.</strong></p><ul><li>Gradcheck mechanics, a detailed mathematical explanation of how it works <a href="https://pytorch.org/docs/stable/notes/gradcheck.html">https://pytorch.org/docs/stable/notes/gradcheck.html</a> In particular, it also explains how gradcheck extends to complex numbers</li><li>JAX has a pretty good explanation about vjp and jvp at <a href="https://jax.readthedocs.io/en/latest/notebooks/autodiff_cookbook.html">https://jax.readthedocs.io/en/latest/notebooks/autodiff_cookbook.html</a></li><li>Fast gradcheck tracking issue <a href="https://github.com/pytorch/pytorch/issues/53876">https://github.com/pytorch/pytorch/issues/53876</a></li></ul>
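<p>Using it is one line; a sketch of my own:</p><pre>
import torch
from torch.autograd import gradcheck

# Compares the analytic gradient against finite differences; double precision
# keeps the numerical Jacobian trustworthy
x = torch.randn(3, 4, dtype=torch.double, requires_grad=True)
print(gradcheck(torch.sin, (x,), eps=1e-6, atol=1e-4))   # True
</pre>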
]]></content:encoded>
      <enclosure length="16283016" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/eebca497-5861-4971-b752-12fbd23fd4ac/audio/f1990303-be34-4661-96c1-4a48b2a21ae0/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>gradcheck</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:16:58</itunes:duration>
      <itunes:summary>We talk about gradcheck, the property based testing mechanism that we use to verify the correctness of analytic gradient formulas in PyTorch. I&apos;ll talk a bit about testing in general, property based testing and why gradcheck is a particularly useful property based test. There will be some calculus, although I&apos;ve tried to keep the math mostly to intuitions and pointers on what to read up on elsewhere.</itunes:summary>
      <itunes:subtitle>We talk about gradcheck, the property based testing mechanism that we use to verify the correctness of analytic gradient formulas in PyTorch. I&apos;ll talk a bit about testing in general, property based testing and why gradcheck is a particularly useful property based test. There will be some calculus, although I&apos;ve tried to keep the math mostly to intuitions and pointers on what to read up on elsewhere.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>48</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">3935465d-aec9-4f31-bf00-0a7e3834ce5e</guid>
      <title>torch.use_deterministic_algorithms</title>
      <description><![CDATA[<p>torch.use_deterministic_algorithms lets you force PyTorch to use deterministic algorithms. It's very useful for debugging!</p><p>There are some errors in the recording: the feature is called torch.use_deterministic_algorithms, and there is not actually a capability to warn (this was in an old version of the PR but taken out); we just error if you hit nondeterministic code.</p><p>Docs: <a href="https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html#torch.use_deterministic_algorithms">https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html#torch.use_deterministic_algorithms</a></p>
]]></description>
      <pubDate>Wed, 21 Jul 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/torchuse-deterministic-algorithms-7OUBuJKe</link>
      <content:encoded><![CDATA[<p>torch.use_deterministic_algorithms lets you force PyTorch to use deterministic algorithms. It's very useful for debugging!</p><p>There are some errors in the recording: the feature is called torch.use_deterministic_algorithms, and there is not actually a capability to warn (this was in an old version of the PR but taken out); we just error if you hit nondeterministic code.</p><p>Docs: <a href="https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html#torch.use_deterministic_algorithms">https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html#torch.use_deterministic_algorithms</a></p>
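<p>Usage is a one-liner (a minimal sketch; exactly which ops raise depends on your build and device):</p><pre><code>import torch

# Force deterministic implementations everywhere.
torch.use_deterministic_algorithms(True)

x = torch.randn(3, 3)
y = x @ x  # deterministic ops run as usual

# Ops that only have nondeterministic implementations now raise a
# RuntimeError instead of silently varying from run to run.
</code></pre>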
]]></content:encoded>
      <enclosure length="10401505" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/bb461b62-57a9-42d2-9ad5-bb5f01d7d230/audio/7af39329-b773-4285-843c-c99f21a2ab60/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>torch.use_deterministic_algorithms</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:10:50</itunes:duration>
      <itunes:summary>torch.use_deterministic_algorithms lets you force PyTorch to use deterministic algorithms. It&apos;s very useful for debugging!</itunes:summary>
      <itunes:subtitle>torch.use_deterministic_algorithms lets you force PyTorch to use deterministic algorithms. It&apos;s very useful for debugging!</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>47</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">88d08dd1-f538-4f6f-be97-59bf38341240</guid>
      <title>Reference counting</title>
      <description><![CDATA[<p>Reference counting is a common memory management technique in C++ but PyTorch does its reference counting in a slightly idiosyncratic way using intrusive_ptr. We'll talk about why intrusive_ptr exists, the reason why refcount bumps are slow in C++ (but not in Python), what's up with const Tensor& everywhere, why the const is a lie and how TensorRef lets you create a const Tensor& from a TensorImpl* without needing to bump your reference count.</p><p><strong>Further reading.</strong></p><ul><li>Why you shouldn't feel bad about passing tensor by reference <a href="https://dev-discuss.pytorch.org/t/we-shouldnt-feel-bad-about-passing-tensor-by-reference/85">https://dev-discuss.pytorch.org/t/we-shouldnt-feel-bad-about-passing-tensor-by-reference/85</a></li><li>Const correctness in PyTorch <a href="https://github.com/zdevito/ATen/issues/27">https://github.com/zdevito/ATen/issues/27</a></li><li>TensorRef RFC <a href="https://github.com/pytorch/rfcs/pull/16">https://github.com/pytorch/rfcs/pull/16</a></li></ul>
]]></description>
      <pubDate>Tue, 20 Jul 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/reference-counting-a3yF3fEQ</link>
      <content:encoded><![CDATA[<p>Reference counting is a common memory management technique in C++ but PyTorch does its reference counting in a slightly idiosyncratic way using intrusive_ptr. We'll talk about why intrusive_ptr exists, the reason why refcount bumps are slow in C++ (but not in Python), what's up with const Tensor& everywhere, why the const is a lie and how TensorRef lets you create a const Tensor& from a TensorImpl* without needing to bump your reference count.</p><p><strong>Further reading.</strong></p><ul><li>Why you shouldn't feel bad about passing tensor by reference <a href="https://dev-discuss.pytorch.org/t/we-shouldnt-feel-bad-about-passing-tensor-by-reference/85">https://dev-discuss.pytorch.org/t/we-shouldnt-feel-bad-about-passing-tensor-by-reference/85</a></li><li>Const correctness in PyTorch <a href="https://github.com/zdevito/ATen/issues/27">https://github.com/zdevito/ATen/issues/27</a></li><li>TensorRef RFC <a href="https://github.com/pytorch/rfcs/pull/16">https://github.com/pytorch/rfcs/pull/16</a></li></ul>
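<p>To make the "intrusive" part concrete, here is a toy Python sketch of the idea (PyTorch's real implementation is in C++; this only shows where the count lives):</p><pre><code>class TensorImpl:
    """Toy: the refcount lives inside the object itself (intrusive)."""
    def __init__(self):
        self.refcount = 0

class IntrusivePtr:
    """Toy smart pointer. Because the count is stored in the target
    rather than in a separate control block (as with std::shared_ptr),
    an owning pointer can be reconstructed from a raw pointer later."""
    def __init__(self, target):
        self.target = target
        target.refcount += 1  # in C++ this is an atomic bump, which is
                              # why refcounting costs more there than in
                              # GIL-protected CPython

    def release(self):
        self.target.refcount -= 1
</code></pre>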
]]></content:encoded>
      <enclosure length="14626653" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/799ed542-8b1b-49d1-8080-5a0bcc8f7b0d/audio/d4ab173b-ec98-43b8-a5f4-ee45377bd776/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Reference counting</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:15:14</itunes:duration>
      <itunes:summary>Reference counting is a common memory management technique in C++ but PyTorch does its reference counting in a slightly idiosyncratic way using intrusive_ptr. We&apos;ll talk about why intrusive_ptr exists, the reason why refcount bumps are slow in C++ (but not in Python), what&apos;s up with const Tensor&amp; everywhere, why the const is a lie and how TensorRef lets you create a const Tensor&amp; from a TensorImpl* without needing to bump your reference count.</itunes:summary>
      <itunes:subtitle>Reference counting is a common memory management technique in C++ but PyTorch does its reference counting in a slightly idiosyncratic way using intrusive_ptr. We&apos;ll talk about why intrusive_ptr exists, the reason why refcount bumps are slow in C++ (but not in Python), what&apos;s up with const Tensor&amp; everywhere, why the const is a lie and how TensorRef lets you create a const Tensor&amp; from a TensorImpl* without needing to bump your reference count.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>46</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">2894e449-cd4c-43cb-a7b7-e627e51a03ec</guid>
      <title>Memory layout</title>
      <description><![CDATA[<p>Memory layout specifies how the logical multi-dimensional tensor maps its elements onto physical linear memory. Some layouts admit more efficient implementations, e.g., NCHW versus NHWC. Memory layout makes use of striding to allow users to conveniently represent their tensors with different physical layouts without having to explicitly tell every operator what to do.</p><p><strong>Further reading.</strong></p><ul><li>Tutorial <a href="https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html">https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html</a></li><li>Memory format RFC <a href="https://github.com/pytorch/pytorch/issues/19092">https://github.com/pytorch/pytorch/issues/19092</a></li><li>Layout permutation proposal (not implemented) <a href="https://github.com/pytorch/pytorch/issues/32078">https://github.com/pytorch/pytorch/issues/32078</a></li></ul>
]]></description>
      <pubDate>Tue, 13 Jul 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/memory-layout-aqAEiu35</link>
      <content:encoded><![CDATA[<p>Memory layout specifies how the logical multi-dimensional tensor maps its elements onto physical linear memory. Some layouts admit more efficient implementations, e.g., NCHW versus NHWC. Memory layout makes use of striding to allow users to conveniently represent their tensors with different physical layouts without having to explicitly tell every operator what to do.</p><p><strong>Further reading.</strong></p><ul><li>Tutorial <a href="https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html">https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html</a></li><li>Memory format RFC <a href="https://github.com/pytorch/pytorch/issues/19092">https://github.com/pytorch/pytorch/issues/19092</a></li><li>Layout permutation proposal (not implemented) <a href="https://github.com/pytorch/pytorch/issues/32078">https://github.com/pytorch/pytorch/issues/32078</a></li></ul>
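<p>A quick way to see striding in action (a minimal sketch; the shapes are arbitrary):</p><pre><code>import torch

x = torch.randn(2, 3, 4, 5)   # logical N, C, H, W
print(x.stride())             # (60, 20, 5, 1): contiguous "NCHW" strides

# Same logical tensor with a channels-last physical layout: C becomes
# the fastest-moving dimension, but x[n, c, h, w] indexing is unchanged.
y = x.to(memory_format=torch.channels_last)
print(y.stride())             # (60, 1, 15, 3)
print(torch.equal(x, y))      # True: only the physical layout differs
</code></pre>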
]]></content:encoded>
      <enclosure length="15779798" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/86a5f2dc-238e-49c7-8ba2-6346181c5b75/audio/c986188b-48bc-4352-895b-ab942a734115/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Memory layout</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:16:26</itunes:duration>
      <itunes:summary>Memory layout specifies how the logical multi-dimensional tensor maps its elements onto physical linear memory. Some layouts admit more efficient implementations, e.g., NCHW versus NHWC. Memory layout makes use of striding to allow users to conveniently represent their tensors with different physical layouts without having to explicitly tell every operator what to do.</itunes:summary>
      <itunes:subtitle>Memory layout specifies how the logical multi-dimensional tensor maps its elements onto physical linear memory. Some layouts admit more efficient implementations, e.g., NCHW versus NHWC. Memory layout makes use of striding to allow users to conveniently represent their tensors with different physical layouts without having to explicitly tell every operator what to do.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>45</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">e8fdfbcb-4d8a-442c-a69e-46cd84c07b3d</guid>
      <title>pytorch-probot</title>
      <description><![CDATA[<p>pytorch-probot is a GitHub application that we use to automate common tasks in GitHub. I talk about what it does and some design philosophy for it. Repo is at: <a href="https://github.com/pytorch/pytorch-probot">https://github.com/pytorch/pytorch-probot</a></p>
]]></description>
      <pubDate>Mon, 12 Jul 2021 14:28:17 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/pytorch-probot-1edj_jys</link>
      <content:encoded><![CDATA[<p>pytorch-probot is a GitHub application that we use to automate common tasks in GitHub. I talk about what it does and some design philosophy for it. Repo is at: <a href="https://github.com/pytorch/pytorch-probot">https://github.com/pytorch/pytorch-probot</a></p>
]]></content:encoded>
      <enclosure length="12579067" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/5ce6c78b-0d11-4e60-99b9-949a16aebca9/audio/a34fa6f5-0d5f-4876-8509-79c1dcc46bb2/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>pytorch-probot</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:13:06</itunes:duration>
      <itunes:summary>pytorch-probot is a GitHub application that we use to automate common tasks in GitHub. I talk about what it does and some design philosophy for it. Repo is at: https://github.com/pytorch/pytorch-probot</itunes:summary>
      <itunes:subtitle>pytorch-probot is a GitHub application that we use to automate common tasks in GitHub. I talk about what it does and some design philosophy for it. Repo is at: https://github.com/pytorch/pytorch-probot</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>44</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">0e5c8dfd-0f2b-493a-9010-23335c875901</guid>
      <title>API design via lexical and dynamic scoping</title>
      <description><![CDATA[<p>Lexical and dynamic scoping are useful tools to reason about various API design choices in PyTorch, related to context managers, global flags, dynamic dispatch, and how to deal with BC-breaking changes. I'll walk through three case studies, one from Python itself (changing the meaning of division to true division), and two from PyTorch (device context managers, and torch function for factory functions).</p><p><strong>Further reading.</strong></p><ul><li>Me unsuccessfully asking around if there was a way to simulate <code>__future__</code> in libraries <a href="https://stackoverflow.com/questions/66927362/way-to-opt-into-bc-breaking-changes-on-methods-within-a-single-module">https://stackoverflow.com/questions/66927362/way-to-opt-into-bc-breaking-changes-on-methods-within-a-single-module</a></li><li>A very old issue asking for a way to change the default GPU device <a href="https://github.com/pytorch/pytorch/issues/260">https://github.com/pytorch/pytorch/issues/260</a> and a global GPU flag <a href="https://github.com/pytorch/pytorch/issues/7535">https://github.com/pytorch/pytorch/issues/7535</a></li><li>A more modern issue based off the lexical module idea <a href="https://github.com/pytorch/pytorch/issues/27878">https://github.com/pytorch/pytorch/issues/27878</a></li><li>Array module NEP <a href="https://numpy.org/neps/nep-0037-array-module.html">https://numpy.org/neps/nep-0037-array-module.html</a></li></ul>
]]></description>
      <pubDate>Fri, 9 Jul 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/api-design-via-lexical-and-dynamic-scoping-BV7x17g4</link>
      <content:encoded><![CDATA[<p>Lexical and dynamic scoping are useful tools to reason about various API design choices in PyTorch, related to context managers, global flags, dynamic dispatch, and how to deal with BC-breaking changes. I'll walk through three case studies, one from Python itself (changing the meaning of division to true division), and two from PyTorch (device context managers, and torch function for factory functions).</p><p><strong>Further reading.</strong></p><ul><li>Me unsuccessfully asking around if there was a way to simulate <code>__future__</code> in libraries <a href="https://stackoverflow.com/questions/66927362/way-to-opt-into-bc-breaking-changes-on-methods-within-a-single-module">https://stackoverflow.com/questions/66927362/way-to-opt-into-bc-breaking-changes-on-methods-within-a-single-module</a></li><li>A very old issue asking for a way to change the default GPU device <a href="https://github.com/pytorch/pytorch/issues/260">https://github.com/pytorch/pytorch/issues/260</a> and a global GPU flag <a href="https://github.com/pytorch/pytorch/issues/7535">https://github.com/pytorch/pytorch/issues/7535</a></li><li>A more modern issue based off the lexical module idea <a href="https://github.com/pytorch/pytorch/issues/27878">https://github.com/pytorch/pytorch/issues/27878</a></li><li>Array module NEP <a href="https://numpy.org/neps/nep-0037-array-module.html">https://numpy.org/neps/nep-0037-array-module.html</a></li></ul>
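<p>A toy sketch of the dynamically scoped flavor of API (every name here is made up for illustration):</p><pre><code>import contextlib

_default_device = "cpu"  # hypothetical global state

@contextlib.contextmanager
def default_device(device):
    # Dynamic scoping: the setting applies to everything that *runs*
    # inside the block, including library code you call into.
    global _default_device
    old, _default_device = _default_device, device
    try:
        yield
    finally:
        _default_device = old

def factory():
    return f"tensor on {_default_device}"

with default_device("cuda"):
    print(factory())   # tensor on cuda
print(factory())       # tensor on cpu

# Lexical scoping, by contrast, is the "from __future__ import division"
# model: the behavior change applies only to the file that opts in.
</code></pre>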
]]></content:encoded>
      <enclosure length="20861374" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/d142109a-4af6-429f-ad47-f8b312dbb4f7/audio/cda0f892-6d8b-4486-9252-6085895a659c/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>API design via lexical and dynamic scoping</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:21:44</itunes:duration>
      <itunes:summary>Lexical and dynamic scoping are useful tools to reason about various API design choices in PyTorch, related to context managers, global flags, dynamic dispatch, and how to deal with BC-breaking changes. I&apos;ll walk through three case studies, one from Python itself (changing the meaning of division to true division), and two from PyTorch (device context managers, and torch function for factory functions).</itunes:summary>
      <itunes:subtitle>Lexical and dynamic scoping are useful tools to reason about various API design choices in PyTorch, related to context managers, global flags, dynamic dispatch, and how to deal with BC-breaking changes. I&apos;ll walk through three case studies, one from Python itself (changing the meaning of division to true division), and two from PyTorch (device context managers, and torch function for factory functions).</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>43</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">d9542988-d250-4fbb-9e85-c6a9f2b1087b</guid>
      <title>Intro to distributed</title>
      <description><![CDATA[<p>Today, Shen Li (mrshenli) joins me to talk about distributed computation in PyTorch. What is distributed? What kinds of things go into making distributed work in PyTorch? What's up with all of the optimizations people want to do here?</p><p><strong>Further reading.</strong></p><ul><li>PyTorch distributed overview <a href="https://pytorch.org/tutorials/beginner/dist_overview.html">https://pytorch.org/tutorials/beginner/dist_overview.html</a></li><li>Distributed data parallel <a href="https://pytorch.org/docs/stable/notes/ddp.html">https://pytorch.org/docs/stable/notes/ddp.html</a></li></ul>
]]></description>
      <pubDate>Thu, 8 Jul 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/intro-to-distributed-A_mUgr_p</link>
      <content:encoded><![CDATA[<p>Today, Shen Li (mrshenli) joins me to talk about distributed computation in PyTorch. What is distributed? What kinds of things go into making distributed work in PyTorch? What's up with all of the optimizations people want to do here?</p><p><strong>Further reading.</strong></p><ul><li>PyTorch distributed overview <a href="https://pytorch.org/tutorials/beginner/dist_overview.html">https://pytorch.org/tutorials/beginner/dist_overview.html</a></li><li>Distributed data parallel <a href="https://pytorch.org/docs/stable/notes/ddp.html">https://pytorch.org/docs/stable/notes/ddp.html</a></li></ul>
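<p>For orientation, the smallest distributed data parallel program looks roughly like this (a sketch assuming a torchrun-style launcher provides the rendezvous environment):</p><pre><code>import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("gloo")   # "nccl" for GPUs
    model = torch.nn.Linear(10, 10)
    ddp = DDP(model)                  # hooks gradient allreduce into backward
    out = ddp(torch.randn(4, 10))
    out.sum().backward()              # all ranks end with identical grads
    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # e.g., torchrun --nproc_per_node=2 this_script.py
</code></pre>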
]]></content:encoded>
      <enclosure length="15060915" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/4ee93a14-2fd4-4b4b-9902-c620c4830d91/audio/3bcaabee-082f-438b-96ef-38294b8a29a6/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Intro to distributed</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:15:41</itunes:duration>
      <itunes:summary>Today, Shen Li (mrshenli) joins me to talk about distributed computation in PyTorch. What is distributed? What kinds of things go into making distributed work in PyTorch? What&apos;s up with all of the optimizations people want to do here?</itunes:summary>
      <itunes:subtitle>Today, Shen Li (mrshenli) joins me to talk about distributed computation in PyTorch. What is distributed? What kinds of things go into making distributed work in PyTorch? What&apos;s up with all of the optimizations people want to do here?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>42</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">87ce9c08-c1eb-4236-bb4e-5e2186b020a1</guid>
      <title>Double backwards</title>
      <description><![CDATA[<p>Double backwards is PyTorch's way of implementing higher order differentiation. Why might you want it? How does it work? What are some of the weird things that happen when you do this?</p><p><strong>Further reading.</strong></p><ul><li>Epic PR that added double backwards support for convolution initially <a href="https://github.com/pytorch/pytorch/pull/1643">https://github.com/pytorch/pytorch/pull/1643</a></li></ul>
]]></description>
      <pubDate>Wed, 7 Jul 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/double-backwards-3aQOHjSV</link>
      <content:encoded><![CDATA[<p>Double backwards is PyTorch's way of implementing higher order differentiation. Why might you want it? How does it work? What are some of the weird things that happen when you do this?</p><p><strong>Further reading.</strong></p><ul><li>Epic PR that added double backwards support for convolution initially <a href="https://github.com/pytorch/pytorch/pull/1643">https://github.com/pytorch/pytorch/pull/1643</a></li></ul>
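<p>The core trick in one snippet (a minimal sketch):</p><pre><code>import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# create_graph=True records the backward computation itself,
# so we can differentiate through it a second time.
(g,) = torch.autograd.grad(y, x, create_graph=True)
print(g)    # dy/dx = 3 * x**2 = 12

(g2,) = torch.autograd.grad(g, x)   # "double backwards"
print(g2)   # d2y/dx2 = 6 * x = 12
</code></pre>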
]]></content:encoded>
      <enclosure length="15990674" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/bf7a3d88-eefb-4888-8d43-5c64d6650140/audio/fe5b6507-d43a-4579-961f-9f683e519c0a/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Double backwards</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:16:39</itunes:duration>
      <itunes:summary>Double backwards is PyTorch&apos;s way of implementing higher order differentiation. Why might you want it? How does it work? What are some of the weird things that happen when you do this?</itunes:summary>
      <itunes:subtitle>Double backwards is PyTorch&apos;s way of implementing higher order differentiation. Why might you want it? How does it work? What are some of the weird things that happen when you do this?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>41</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">a3604b27-7665-4096-b8e3-84525a95623d</guid>
      <title>Functional modules</title>
      <description><![CDATA[<p>Functional modules are a proposed mechanism to take PyTorch's existing NN module API and transform it into a functional form, where all the parameters are explicit arguments. Why would you want to do this? What does functorch have to do with it? How come PyTorch's existing APIs don't seem to need this? What are the design problems?</p><p><strong>Further reading.</strong></p><ul><li>Proposal in GitHub issues <a href="https://github.com/pytorch/pytorch/issues/49171">https://github.com/pytorch/pytorch/issues/49171</a></li><li>Linen design in flax <a href="https://flax.readthedocs.io/en/latest/design_notes/linen_design_principles.html">https://flax.readthedocs.io/en/latest/design_notes/linen_design_principles.html</a></li></ul>
]]></description>
      <pubDate>Tue, 6 Jul 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/functional-modules-BhdNz4l5</link>
      <content:encoded><![CDATA[<p>Functional modules are a proposed mechanism to take PyTorch's existing NN module API and transform it into a functional form, where all the parameters are explicit arguments. Why would you want to do this? What does functorch have to do with it? How come PyTorch's existing APIs don't seem to need this? What are the design problems?</p><p><strong>Further reading.</strong></p><ul><li>Proposal in GitHub issues <a href="https://github.com/pytorch/pytorch/issues/49171">https://github.com/pytorch/pytorch/issues/49171</a></li><li>Linen design in flax <a href="https://flax.readthedocs.io/en/latest/design_notes/linen_design_principles.html">https://flax.readthedocs.io/en/latest/design_notes/linen_design_principles.html</a></li></ul>
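<p>For a single layer, "functional form" boils down to this (a toy illustration; the proposal is about doing it systematically for whole module trees):</p><pre><code>import torch
import torch.nn.functional as F

lin = torch.nn.Linear(3, 3)   # stateful module: owns weight and bias
x = torch.randn(2, 3)

y_module = lin(x)             # parameters are implicit module state
y_func = F.linear(x, lin.weight, lin.bias)   # parameters are explicit arguments
print(torch.allclose(y_module, y_func))      # True
</code></pre>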
]]></content:encoded>
      <enclosure length="13989268" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/f7b3562b-a786-4635-b0dc-62058c824e39/audio/f5e6a28f-ad14-425f-b7d8-901f889af019/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Functional modules</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:14:34</itunes:duration>
      <itunes:summary>Functional modules are a proposed mechanism to take PyTorch&apos;s existing NN module API and transform it into a functional form, where all the parameters are explicit arguments. Why would you want to do this? What does functorch have to do with it? How come PyTorch&apos;s existing APIs don&apos;t seem to need this? What are the design problems?</itunes:summary>
      <itunes:subtitle>Functional modules are a proposed mechanism to take PyTorch&apos;s existing NN module API and transform it into a functional form, where all the parameters are explicit arguments. Why would you want to do this? What does functorch have to do with it? How come PyTorch&apos;s existing APIs don&apos;t seem to need this? What are the design problems?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>40</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">23fb5824-5863-4089-85cc-a5a8094d8c9e</guid>
      <title>CUDA graphs</title>
      <description><![CDATA[<p>What are CUDA graphs? How are they implemented? What does it take to actually use them in PyTorch?</p><p><strong>Further reading.</strong></p><ul><li>NVIDIA has docs on CUDA graphs <a href="https://developer.nvidia.com/blog/cuda-graphs/">https://developer.nvidia.com/blog/cuda-graphs/</a></li><li>Nuts and bolts implementation PRs from mcarilli: <a href="https://github.com/pytorch/pytorch/pull/51436">https://github.com/pytorch/pytorch/pull/51436</a> <a href="https://github.com/pytorch/pytorch/pull/46148">https://github.com/pytorch/pytorch/pull/46148</a></li></ul>
]]></description>
      <pubDate>Mon, 28 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/cuda-graphs-RKJummXr</link>
      <content:encoded><![CDATA[<p>What are CUDA graphs? How are they implemented? What does it take to actually use them in PyTorch?</p><p><strong>Further reading.</strong></p><ul><li>NVIDIA has docs on CUDA graphs <a href="https://developer.nvidia.com/blog/cuda-graphs/">https://developer.nvidia.com/blog/cuda-graphs/</a></li><li>Nuts and bolts implementation PRs from mcarilli: <a href="https://github.com/pytorch/pytorch/pull/51436">https://github.com/pytorch/pytorch/pull/51436</a> <a href="https://github.com/pytorch/pytorch/pull/46148">https://github.com/pytorch/pytorch/pull/46148</a></li></ul>
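<p>A minimal capture-and-replay sketch against the torch.cuda graph API (assumes a CUDA device and elides the warmup you'd want in real code):</p><pre><code>import torch

static_in = torch.randn(8, 8, device="cuda")

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    # Kernels launched here are recorded into the graph, not run.
    static_out = static_in.relu()

# Replay re-issues every captured kernel with one cheap CPU-side call;
# new inputs must be written into the same "static" buffers.
static_in.copy_(torch.randn(8, 8, device="cuda"))
g.replay()
print(static_out)  # reflects the new contents of static_in
</code></pre>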
]]></content:encoded>
      <enclosure length="13355277" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/48653774-b3a0-4163-8626-4844a2c3d696/audio/fee0a953-144d-4bce-b791-461e8c2a8a1b/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>CUDA graphs</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:13:55</itunes:duration>
      <itunes:summary>What are CUDA graphs? How are they implemented? What does it take to actually use them in PyTorch?</itunes:summary>
      <itunes:subtitle>What are CUDA graphs? How are they implemented? What does it take to actually use them in PyTorch?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>39</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">7e3f5ff1-1f11-42cf-88e4-44f5ca047be1</guid>
      <title>Default arguments</title>
      <description><![CDATA[<p>What do default arguments have to do with PyTorch design? Why are default arguments great for clients (call sites) but not for servers (implementation sites)? In what sense are default arguments a canonicalization to max arity? What problems does this canonicalization cause? Can you canonicalize to minimum arity? What are some lessons to take?</p><p><strong>Further reading.</strong> Stop serializing default arguments: <a href="https://github.com/pytorch/pytorch/issues/54613">https://github.com/pytorch/pytorch/issues/54613</a></p>
]]></description>
      <pubDate>Fri, 25 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/default-arguments-rTRvmi1T</link>
      <content:encoded><![CDATA[<p>What do default arguments have to do with PyTorch design? Why are default arguments great for clients (call sites) but not for servers (implementation sites)? In what sense are default arguments a canonicalization to max arity? What problems does this canonicalization cause? Can you canonicalize to minimum arity? What are some lessons to take?</p><p><strong>Further reading.</strong> Stop serializing default arguments: <a href="https://github.com/pytorch/pytorch/issues/54613">https://github.com/pytorch/pytorch/issues/54613</a></p>
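<p>What "canonicalize to max arity" means, in toy form (an inspect-based sketch, nothing PyTorch-specific):</p><pre><code>import inspect

def f(x, y=2, z=3):
    return x + y + z

def canonicalize_max_arity(func, *args, **kwargs):
    # Bake every default in at the call site: f(1) becomes f(x=1, y=2, z=3).
    # Serialize *this*, and later changes to f's defaults become invisible.
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)

print(canonicalize_max_arity(f, 1))        # {'x': 1, 'y': 2, 'z': 3}
print(canonicalize_max_arity(f, 1, z=10))  # {'x': 1, 'y': 2, 'z': 10}

# Min-arity canonicalization would instead drop arguments that still
# equal their defaults, keeping serialized calls forward compatible.
</code></pre>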
]]></content:encoded>
      <enclosure length="14351763" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/74024bdb-f576-4bc2-bf5a-d1ca3e5a2838/audio/1d6136d9-cf33-458d-93da-acec55fc945b/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Default arguments</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:14:57</itunes:duration>
      <itunes:summary>What do default arguments have to do with PyTorch design? Why are default arguments great for clients (call sites) but not for servers (implementation sites)? In what sense are default arguments a canonicalization to max arity? What problems does this canonicalization cause? Can you canonicalize to minimum arity? What are some lessons to take?</itunes:summary>
      <itunes:subtitle>What do default arguments have to do with PyTorch design? Why are default arguments great for clients (call sites) but not for servers (implementation sites)? In what sense are default arguments a canonicalization to max arity? What problems does this canonicalization cause? Can you canonicalize to minimum arity? What are some lessons to take?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>38</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">cc4d1aff-a5a4-4c4c-96cf-a70c1db453b0</guid>
      <title>Anatomy of a domain library</title>
      <description><![CDATA[<p>What's a domain library? Why do they exist? What do they do for you? What should you know about developing in PyTorch main library versus in a domain library? How coupled are they with PyTorch as a whole? What's cool about working on domain libraries?</p><p><strong>Further reading.</strong></p><ul><li>The classic trio of domain libraries is <a href="https://pytorch.org/audio/stable/index.html">https://pytorch.org/audio/stable/index.html</a> <a href="https://pytorch.org/text/stable/index.html">https://pytorch.org/text/stable/index.html</a> and <a href="https://pytorch.org/vision/stable/index.html">https://pytorch.org/vision/stable/index.html</a></li></ul><p><strong>Liner notes.</strong></p><ul><li>why do domain libraries exist?  lots of domain-specific gadgets,<br />inappropriate for PyTorch</li><li>what does a domain library do<ul><li>operator implementations (old days: pure python, not anymore)<ul><li>with autograd support and cuda acceleration</li><li>esp encoding/decoding, e.g., for domain file formats<ul><li>torchbind for custom objects</li><li>takes care of getting the dependencies for you</li></ul></li><li>esp transformations, e.g., for data augmentation</li></ul></li><li>models, esp pretrained weights</li><li>datasets</li><li>reference scripts</li><li>full wheel/conda packaging like pytorch</li><li>mobile compatibility</li></ul></li><li>separate repos: external contributors with direct access<ul><li>manual sync to fbcode; a lot easier to land code!  less<br />motion so lower risk</li></ul></li><li>coupling with pytorch? CI typically runs on nightlies<ul><li>pytorch itself tests against torchvision, canary against<br />extensibility mechanisms</li><li>mostly not using internal tools (e.g., TensorIterator),<br />too unstable (this would be good to fix)</li></ul></li><li>closer to research side of pytorch; francesco also part of papers</li></ul>
]]></description>
      <pubDate>Thu, 24 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/anatomy-of-a-domain-library-3hBHE7ZW</link>
      <content:encoded><![CDATA[<p>What's a domain library? Why do they exist? What do they do for you? What should you know about developing in PyTorch main library versus in a domain library? How coupled are they with PyTorch as a whole? What's cool about working on domain libraries?</p><p><strong>Further reading.</strong></p><ul><li>The classic trio of domain libraries is <a href="https://pytorch.org/audio/stable/index.html">https://pytorch.org/audio/stable/index.html</a> <a href="https://pytorch.org/text/stable/index.html">https://pytorch.org/text/stable/index.html</a> and <a href="https://pytorch.org/vision/stable/index.html">https://pytorch.org/vision/stable/index.html</a></li></ul><p><strong>Liner notes.</strong></p><ul><li>why do domain libraries exist?  lots of domain-specific gadgets,<br />inappropriate for PyTorch</li><li>what does a domain library do<ul><li>operator implementations (old days: pure python, not anymore)<ul><li>with autograd support and cuda acceleration</li><li>esp encoding/decoding, e.g., for domain file formats<ul><li>torchbind for custom objects</li><li>takes care of getting the dependencies for you</li></ul></li><li>esp transformations, e.g., for data augmentation</li></ul></li><li>models, esp pretrained weights</li><li>datasets</li><li>reference scripts</li><li>full wheel/conda packaging like pytorch</li><li>mobile compatibility</li></ul></li><li>separate repos: external contributors with direct access<ul><li>manual sync to fbcode; a lot easier to land code!  less<br />motion so lower risk</li></ul></li><li>coupling with pytorch? CI typically runs on nightlies<ul><li>pytorch itself tests against torchvision, canary against<br />extensibility mechanisms</li><li>mostly not using internal tools (e.g., TensorIterator),<br />too unstable (this would be good to fix)</li></ul></li><li>closer to research side of pytorch; francesco also part of papers</li></ul>
]]></content:encoded>
      <enclosure length="15542941" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/0aaa69c1-bbee-4e76-8a72-78dfbe63bcd4/audio/b868dddd-1f76-4c7d-a449-057a4f75ae17/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Anatomy of a domain library</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:16:11</itunes:duration>
      <itunes:summary>What&apos;s a domain library? Why do they exist? What do they do for you? What should you know about developing in PyTorch main library versus in a domain library? What&apos;s cool about working on domain libraries?</itunes:summary>
      <itunes:subtitle>What&apos;s a domain library? Why do they exist? What do they do for you? What should you know about developing in PyTorch main library versus in a domain library? What&apos;s cool about working on domain libraries?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>37</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">da20c251-cc90-41fc-aa58-a9dd815cabe7</guid>
      <title>TensorAccessor</title>
      <description><![CDATA[<p>What's TensorAccessor? Why not just use a raw pointer? What's PackedTensorAccessor? What are some future directions for mixing statically typed and type-erased code inside PyTorch proper?</p><p><strong>Further reading.</strong> </p><ul><li>TensorAccessor source code, short and sweet <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/TensorAccessor.h">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/TensorAccessor.h</a></li><li>Legacy THCDeviceTensor <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/THC/THCDeviceTensor.cuh">https://github.com/pytorch/pytorch/blob/master/aten/src/THC/THCDeviceTensor.cuh</a></li></ul>
]]></description>
      <pubDate>Wed, 23 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/tensoraccessor-Kdz8PkrO</link>
      <content:encoded><![CDATA[<p>What's TensorAccessor? Why not just use a raw pointer? What's PackedTensorAccessor? What are some future directions for mixing statically typed and type-erased code inside PyTorch proper?</p><p><strong>Further reading.</strong> </p><ul><li>TensorAccessor source code, short and sweet <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/TensorAccessor.h">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/TensorAccessor.h</a></li><li>Legacy THCDeviceTensor <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/THC/THCDeviceTensor.cuh">https://github.com/pytorch/pytorch/blob/master/aten/src/THC/THCDeviceTensor.cuh</a></li></ul>
]]></content:encoded>
      <enclosure length="11205648" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/4e0af47d-a556-4a84-99b2-47ee7f2a4cb4/audio/6cf8e34e-9de6-4886-abee-f1d0474e19f9/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>TensorAccessor</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:11:40</itunes:duration>
      <itunes:summary>What&apos;s TensorAccessor? Why not just use a raw pointer? What&apos;s PackedTensorAccessor? What are some future directions for mixing statically typed and type-erased code inside PyTorch proper?</itunes:summary>
      <itunes:subtitle>What&apos;s TensorAccessor? Why not just use a raw pointer? What&apos;s PackedTensorAccessor? What are some future directions for mixing statically typed and type-erased code inside PyTorch proper?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>36</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">cdffc33d-92eb-4e63-8dcf-86f92dde00f0</guid>
      <title>Random number generators</title>
      <description><![CDATA[<p>Why are RNGs important? What is the generator concept? How do PyTorch's CPU and CUDA RNGs differ? What are some of the reasons why Philox is a good RNG for CUDA? Why doesn't the generator class have virtual methods for getting random numbers? What's with the next normal double and what does it have to do with the Box-Muller transform? What's up with csprng?</p><p><strong>Further reading.</strong></p><ul><li>CUDAGeneratorImpl has good notes about CUDA graph interaction and pointers to all of the rest of the stuff <a href="https://github.com/pytorch/pytorch/blob/1dee99c973fda55e1e9cac3d50b4d4982b6c6c26/aten/src/ATen/CUDAGeneratorImpl.h">https://github.com/pytorch/pytorch/blob/1dee99c973fda55e1e9cac3d50b4d4982b6c6c26/aten/src/ATen/CUDAGeneratorImpl.h</a></li><li>Transform uniformly distributed random numbers to other distributions with <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/TransformationHelper.h">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/TransformationHelper.h</a></li><li>torchcsprng <a href="https://github.com/pytorch/csprng">https://github.com/pytorch/csprng</a></li></ul>
]]></description>
      <pubDate>Tue, 22 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/random-number-generators-6Kx3U_87</link>
      <content:encoded><![CDATA[<p>Why are RNGs important? What is the generator concept? How do PyTorch's CPU and CUDA RNGs differ? What are some of the reasons why Philox is a good RNG for CUDA? Why doesn't the generator class have virtual methods for getting random numbers? What's with the next normal double and what does it have to do with the Box-Muller transform? What's up with csprng?</p><p><strong>Further reading.</strong></p><ul><li>CUDAGeneratorImpl has good notes about CUDA graph interaction and pointers to all of the rest of the stuff <a href="https://github.com/pytorch/pytorch/blob/1dee99c973fda55e1e9cac3d50b4d4982b6c6c26/aten/src/ATen/CUDAGeneratorImpl.h">https://github.com/pytorch/pytorch/blob/1dee99c973fda55e1e9cac3d50b4d4982b6c6c26/aten/src/ATen/CUDAGeneratorImpl.h</a></li><li>Transform uniformly distributed random numbers to other distributions with <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/TransformationHelper.h">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/TransformationHelper.h</a></li><li>torchcsprng <a href="https://github.com/pytorch/csprng">https://github.com/pytorch/csprng</a></li></ul>
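<p>The generator concept in a few lines (a minimal sketch):</p><pre><code>import torch

# Each Generator carries its own RNG state, so you can get reproducible
# streams without touching the global seed.
gen = torch.Generator()   # CPU generator (Mersenne Twister under the hood)
gen.manual_seed(42)
a = torch.randn(3, generator=gen)

gen.manual_seed(42)
b = torch.randn(3, generator=gen)
print(torch.equal(a, b))  # True: same seed, same stream
</code></pre>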
]]></content:encoded>
      <enclosure length="13830298" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/4d976cc1-6358-447f-ad7e-b2cfc32314ea/audio/ac0a334d-0e7e-41eb-bef6-9fedff739596/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Random number generators</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:14:24</itunes:duration>
      <itunes:summary>Why are RNGs important? What is the generator concept? How do PyTorch&apos;s CPU and CUDA RNGs differ? What are some of the reasons why Philox is a good RNG for CUDA? Why doesn&apos;t the generator class have virtual methods for getting random numbers? What&apos;s with the next normal double and what does it have to do with the Box-Muller transform? What&apos;s up with csprng?</itunes:summary>
      <itunes:subtitle>Why are RNGs important? What is the generator concept? How do PyTorch&apos;s CPU and CUDA RNGs differ? What are some of the reasons why Philox is a good RNG for CUDA? Why doesn&apos;t the generator class have virtual methods for getting random numbers? What&apos;s with the next normal double and what does it have to do with the Box-Muller transform? What&apos;s up with csprng?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>35</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">90213080-7cdd-41cb-9c70-13ef945a2593</guid>
      <title>vmap</title>
      <description><![CDATA[<p>What is vmap? How is it implemented? How does our implementation compare to JAX's? What is a good way of understanding what vmap does? What's up with random numbers? Why are there some issues with the vmap that PyTorch currently ships?</p><p><strong>Further reading.</strong></p><ul><li>Tracking issue for vmap support <a href="https://github.com/pytorch/pytorch/issues/42368">https://github.com/pytorch/pytorch/issues/42368</a></li><li>BatchedTensor source code <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/BatchedTensorImpl.h">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/BatchedTensorImpl.h</a> , logical-physical transformation helper code <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/VmapTransforms.h">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/VmapTransforms.h</a> (well documented, worth a read)</li><li>functorch, the better, more JAX-y implementation of vmap <a href="https://github.com/facebookresearch/functorch">https://github.com/facebookresearch/functorch</a></li><li>Autodidax <a href="https://jax.readthedocs.io/en/latest/autodidax.html">https://jax.readthedocs.io/en/latest/autodidax.html</a> which contains a super simple vmap implementation that is a good model for the internal implementation that PyTorch has</li></ul>
]]></description>
      <pubDate>Mon, 21 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/vmap-T4Bi__t_</link>
      <content:encoded><![CDATA[<p>What is vmap? How is it implemented? How does our implementation compare to JAX's? What is a good way of understanding what vmap does? What's up with random numbers? Why are there some issues with the vmap that PyTorch currently ships?</p><p><strong>Further reading.</strong></p><ul><li>Tracking issue for vmap support <a href="https://github.com/pytorch/pytorch/issues/42368">https://github.com/pytorch/pytorch/issues/42368</a></li><li>BatchedTensor source code <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/BatchedTensorImpl.h">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/BatchedTensorImpl.h</a> , logical-physical transformation helper code <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/VmapTransforms.h">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/VmapTransforms.h</a> (well documented, worth a read)</li><li>functorch, the better, more JAX-y implementation of vmap <a href="https://github.com/facebookresearch/functorch">https://github.com/facebookresearch/functorch</a></li><li>Autodidax <a href="https://jax.readthedocs.io/en/latest/autodidax.html">https://jax.readthedocs.io/en/latest/autodidax.html</a> which contains a super simple vmap implementation that is a good model for the internal implementation that PyTorch has</li></ul>
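<p>The mental model in one snippet (a sketch; the prototype torch.vmap and functorch's implementation differ in coverage, as discussed in the episode):</p><pre><code>import torch

x = torch.randn(10, 3)
y = torch.randn(10, 3)

# torch.dot only accepts 1-D inputs; vmap lifts it over a batch
# dimension without a Python loop or manual reshaping.
batched_dot = torch.vmap(torch.dot)
print(batched_dot(x, y).shape)  # torch.Size([10])
</code></pre>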
]]></content:encoded>
      <enclosure length="17073542" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/10e94df4-6cbb-4129-a71f-f41fc4f6f3ea/audio/7a48b6c9-9194-4cbe-82a1-500685efc7e0/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>vmap</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:17:47</itunes:duration>
      <itunes:summary>What is vmap? How is it implemented? How does our implementation compare to JAX&apos;s? What is a good way of understanding what vmap does? What&apos;s up with random numbers? Why are there some issues with the vmap that PyTorch currently ships?</itunes:summary>
      <itunes:subtitle>What is vmap? How is it implemented? How does our implementation compare to JAX&apos;s? What is a good way of understanding what vmap does? What&apos;s up with random numbers? Why are there some issues with the vmap that PyTorch currently ships?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>34</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">45e79bc1-7337-4c5b-a14b-ae1a3c1d54bd</guid>
      <title>Expect tests</title>
      <description><![CDATA[<p>What's an expect test? Why should you use them? Why are inline expect tests better than out-of-line ones? How to write a good expect test?</p><p><strong>Further reading.</strong> expecttest source implementation <a href="https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/expecttest.py">https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/expecttest.py</a> (only 311 lines!)</p>
]]></description>
      <pubDate>Fri, 18 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/expect-tests-xJNCFKcn</link>
      <content:encoded><![CDATA[<p>What's an expect test? Why should you use them? Why are inline expect tests better than out-of-line ones? How to write a good expect test?</p><p><strong>Further reading.</strong> expecttest source implementation <a href="https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/expecttest.py">https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/expecttest.py</a> (only 311 lines!)</p>
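<p>An inline expect test looks like this (a sketch using the standalone expecttest package; rerun with EXPECTTEST_ACCEPT=1 and the framework rewrites the inline string to match reality):</p><pre><code>import unittest
import expecttest

class TestRepr(expecttest.TestCase):
    def test_list_repr(self):
        # The expected output lives inline, right next to the assertion.
        self.assertExpectedInline(repr([1, 2, 3]), """[1, 2, 3]""")

if __name__ == "__main__":
    unittest.main()
</code></pre>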
]]></content:encoded>
      <enclosure length="12902926" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/03e53f92-d285-481e-9ef8-40dc52cf65ad/audio/87a718ac-a23e-4140-bb68-5965f8f6d790/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Expect tests</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:13:26</itunes:duration>
      <itunes:summary>What&apos;s an expect test? Why should you use them? Why are inline expect tests better than out-of-line ones? How to write a good expect test?</itunes:summary>
      <itunes:subtitle>What&apos;s an expect test? Why should you use them? Why are inline expect tests better than out-of-line ones? How to write a good expect test?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>33</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">261f72f9-5e9f-4a6e-adc3-302814d9497b</guid>
      <title>XLA</title>
      <description><![CDATA[<p>What's PyTorch XLA? Why should you care? How is it implemented? How does PyTorch XLA trade off functionality versus ease of performance debugging? What are some new developments in this space?</p><p><strong>Further reading.</strong></p><ul><li>XLA's repo has lots of really good docs. Check out <a href="https://github.com/pytorch/xla/blob/master/OP_LOWERING_GUIDE.md">https://github.com/pytorch/xla/blob/master/OP_LOWERING_GUIDE.md</a> and also the main <a href="https://github.com/pytorch/xla/blob/master/README.md">https://github.com/pytorch/xla/blob/master/README.md</a></li><li>Alex Suhan's RFC about lazy core <a href="https://github.com/pytorch/rfcs/pull/18">https://github.com/pytorch/rfcs/pull/18</a></li></ul>
]]></description>
      <pubDate>Thu, 17 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/xla-PZxcTtCL</link>
      <content:encoded><![CDATA[<p>What's PyTorch XLA? Why should you care? How is it implemented? How does PyTorch XLA trade off functionality versus ease of performance debugging? What are some new developments in this space?</p><p><strong>Further reading.</strong></p><ul><li>XLA's repo has lots of really good docs. Check out <a href="https://github.com/pytorch/xla/blob/master/OP_LOWERING_GUIDE.md">https://github.com/pytorch/xla/blob/master/OP_LOWERING_GUIDE.md</a> and also the main <a href="https://github.com/pytorch/xla/blob/master/README.md">https://github.com/pytorch/xla/blob/master/README.md</a></li><li>Alex Suhan's RFC about lazy core <a href="https://github.com/pytorch/rfcs/pull/18">https://github.com/pytorch/rfcs/pull/18</a></li></ul>
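<p>Getting a tensor onto an XLA device (a sketch assuming torch_xla is installed, e.g., on a TPU VM):</p><pre><code>import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()   # an XLA device looks like any other device
x = torch.randn(2, 2, device=device)
y = (x @ x).sum()

# Tensors are lazy: operations build up a graph, and mark_step()
# (or reading a value back) forces XLA compilation and execution.
xm.mark_step()
print(y)
</code></pre>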
]]></content:encoded>
      <enclosure length="15352633" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/be0a98e4-1e70-4c70-aeb0-76989c22c616/audio/8b907845-450f-4f60-9114-3b3d76020b4d/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>XLA</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:15:59</itunes:duration>
      <itunes:summary>What&apos;s PyTorch XLA? Why should you care? How is it implemented? How does PyTorch XLA trade off functionality versus ease of performance debugging? What are some new developments in this space?</itunes:summary>
      <itunes:subtitle>What&apos;s PyTorch XLA? Why should you care? How is it implemented? How does PyTorch XLA trade off functionality versus ease of performance debugging? What are some new developments in this space?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>32</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">29cd1d27-bd04-473a-8267-d93c76d044bb</guid>
      <title>TH</title>
      <description><![CDATA[<p>What is TH? Why might you care? What is so horrible about it? What the heck is the generic/ folder? Why are we porting everything to C++? What are some downsides of having ported all our TH code to C++?</p><p><strong>Further reading.</strong> </p><ul><li>The TH to ATen porting guide has lots of explanations of old school TH idioms <a href="https://github.com/pytorch/pytorch/wiki/TH-to-ATen-porting-guide">https://github.com/pytorch/pytorch/wiki/TH-to-ATen-porting-guide</a></li><li>Old notes about refcounting in TH <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/README.md">https://github.com/pytorch/pytorch/blob/master/aten/src/README.md</a></li></ul>
]]></description>
      <pubDate>Wed, 16 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/th-0B4PmfMi</link>
      <content:encoded><![CDATA[<p>What is TH? Why might you care? What is so horrible about it? What the heck is the generic/ folder? Why are we porting everything to C++? What are some downsides of having ported all our TH code to C++?</p><p><strong>Further reading.</strong> </p><ul><li>The TH to ATen porting guide has lots of explanations of old school TH idioms <a href="https://github.com/pytorch/pytorch/wiki/TH-to-ATen-porting-guide">https://github.com/pytorch/pytorch/wiki/TH-to-ATen-porting-guide</a></li><li>Old notes about refcounting in TH <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/README.md">https://github.com/pytorch/pytorch/blob/master/aten/src/README.md</a></li></ul>
]]></content:encoded>
      <enclosure length="10709508" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/9cd3549e-4619-427f-8241-6439dc9ae398/audio/7949545d-3d25-4477-8c1c-888b4023ccd6/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>TH</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:11:09</itunes:duration>
      <itunes:summary>What is TH? Why might you care? What is so horrible about it? What the heck is the generic/ folder? Why are we porting everything to C++? What are some downsides of having ported all our TH code to C++?</itunes:summary>
      <itunes:subtitle>What is TH? Why might you care? What is so horrible about it? What the heck is the generic/ folder? Why are we porting everything to C++? What are some downsides of having ported all our TH code to C++?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>31</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">dbd75646-b28e-4c00-b4c8-b9d89432ee5c</guid>
      <title>TorchScript</title>
      <description><![CDATA[<p>There is a really good TorchScript overview at <a href="https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/OVERVIEW.md">https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/OVERVIEW.md</a> and in this 20min podcast, I want to give you some of the highlights from this document.</p>
]]></description>
      <pubDate>Tue, 15 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/torchscript-YX71Wx8D</link>
      <content:encoded><![CDATA[<p>There is a really good TorchScript overview at <a href="https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/OVERVIEW.md">https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/OVERVIEW.md</a> and in this 20min podcast, I want to give you some of the highlights from this document.</p>
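<p>For a taste, scripting a function and inspecting its IR (a minimal sketch):</p><pre><code>import torch

# torch.jit.script compiles the Python function into TorchScript IR,
# which can be inspected, serialized, and run without Python.
@torch.jit.script
def scaled_relu(x: torch.Tensor) -> torch.Tensor:
    return torch.relu(x) * 0.5

print(scaled_relu.graph)            # the IR the overview doc describes
scaled_relu.save("scaled_relu.pt")  # loadable without the Python source
</code></pre>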
]]></content:encoded>
      <enclosure length="19129485" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/1c680710-3875-4df4-a92f-cdfbe0b3ef7e/audio/adcfd3c6-87ea-4f25-a2d2-c36f614096d5/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>TorchScript</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:19:56</itunes:duration>
      <itunes:summary>There is a really good TorchScript overview at https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/OVERVIEW.md and in this 20min podcast, I want to give you some of the highlights from this document.</itunes:summary>
      <itunes:subtitle>There is a really good TorchScript overview at https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/OVERVIEW.md and in this 20min podcast, I want to give you some of the highlights from this document.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>30</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">701ae552-fbd7-4641-8ce5-7103db31a495</guid>
      <title>CMake</title>
      <description><![CDATA[<p>Why is PyTorch's build so g-dang complicated? How to avoid having to deal with cmake at all? And if you have to deal with cmake, what are the most important things to know? And if you were going to improve our cmake, how would you go about doing it...</p><p><strong>Further reading.</strong></p><ul><li>The official CMake documentation is a great help and well worth reading <a href="https://cmake.org/documentation">https://cmake.org/documentation</a></li><li>If you work in torch/csrc chances are you'll need to edit this file <a href="https://github.com/pytorch/pytorch/blob/master/tools/build_variables.bzl">https://github.com/pytorch/pytorch/blob/master/tools/build_variables.bzl</a></li></ul><p><strong>Liner notes.</strong></p><ul><li>multiple build systems: cmake, buck, xplat buck, ovrsource buck, bazel<ul><li>tools/build_variables.bzl is read from cmake!  append_filelist<ul><li>but not used uniformly for all components! (ouch!)</li></ul></li></ul></li><li>mashed together ATen and Caffe2 build systems (e.g., main library libtorch_cpu is defined in caffe2/CMakeLists.txt)</li><li>cmake: not very much syntax, "everything is a function".  This means you can look up constructs relatively easily; e.g., even if() is a command</li><li>the general cmake model: "set a bunch of variables, run a bunch of commands".  cmake is VERY GREPPABLE<ul><li>but not everything is in CMakeLists.txt; check *.cmake too</li><li>the directory structure makes no sense, you really need to grep.<br />(doing a lot of set PARENT_SCOPE to propagate stuff)</li><li>renaming a file? grep for it</li><li>primary hazard of refactoring: need to make sure all the variables<br />are set up at the new location</li></ul></li><li>many directories are not recursive glob, beware of adding new directories</li><li>old school cmake: literally everything is stuffed in variables (CMAKE_CXX_FLAGS).  new school cmake: attach things to targets, things propagate when you depend on targets (public/private dependencies)</li><li>add_library: the most important thing</li><li>don't randomly change things and pray: have hypotheses and test them</li></ul>
]]></description>
      <pubDate>Mon, 14 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/cmake-9yhhMOWz</link>
      <content:encoded><![CDATA[<p>Why is PyTorch's build so g-dang complicated? How to avoid having to deal with cmake at all? And if you have to deal with cmake, what are the most important things to know? And if you were going to improve our cmake, how would you go about doing it...</p><p><strong>Further reading.</strong></p><ul><li>The official CMake documentation is a great help and well worth reading <a href="https://cmake.org/documentation">https://cmake.org/documentation</a></li><li>If you work in torch/csrc chances are you'll need to edit this file <a href="https://github.com/pytorch/pytorch/blob/master/tools/build_variables.bzl">https://github.com/pytorch/pytorch/blob/master/tools/build_variables.bzl</a></li></ul><p><strong>Liner notes.</strong></p><ul><li>multiple build systems: cmake, buck, xplat buck, ovrsource buck, bazel<ul><li>tools/build_variables.bzl is read from cmake!  append_filelist<ul><li>but not used uniformly for all components! (ouch!)</li></ul></li></ul></li><li>mashed together ATen and Caffe2 build systems (e.g., main library libtorch_cpu is defined in caffe2/CMakeLists.txt)</li><li>cmake: not very much syntax, "everything is a function".  This means you can look up constructs relatively easily; e.g., even if() is a command</li><li>the general cmake model: "set a bunch of variables, run a bunch of commands".  cmake is VERY GREPPABLE<ul><li>but not everything is in CMakeLists.txt; check *.cmake too</li><li>the directory structure makes no sense, you really need to grep.<br />(doing a lot of set PARENT_SCOPE to propagate stuff)</li><li>renaming a file? grep for it</li><li>primary hazard of refactoring: need to make sure all the variables<br />are set up at the new location</li></ul></li><li>many directories are not recursive glob, beware of adding new directories</li><li>old school cmake: literally everything is stuffed in variables (CMAKE_CXX_FLAGS).  new school cmake: attach things to targets, things propagate when you depend on targets (public/private dependencies)</li><li>add_library: the most important thing</li><li>don't randomly change things and pray: have hypotheses and test them</li></ul>
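<p>To make the tools/build_variables.bzl point above concrete, here is a rough sketch of the shape of that file. It is Starlark (Python syntax); the list name and paths below are illustrative, not the real entries.</p><pre><code># Illustrative sketch only: the real lists live in tools/build_variables.bzl
# and are pulled into the cmake build via append_filelist.
libtorch_example_sources = [
    "torch/csrc/Example.cpp",
    "torch/csrc/api/src/AnotherExample.cpp",
]
</code></pre>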
]]></content:encoded>
      <enclosure length="17090092" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/aebcbd84-a53e-4101-b225-378d1a1666ab/audio/078a5aa1-01bb-4197-b79c-41fae0cb2b66/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>CMake</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:17:49</itunes:duration>
      <itunes:summary>Why is PyTorch&apos;s build so g-dang complicated? How to avoid having to deal with cmake at all? And if you have to deal with cmake, what are the most important things to know? And if you were going to improve our cmake, how would you go about doing it...</itunes:summary>
      <itunes:subtitle>Why is PyTorch&apos;s build so g-dang complicated? How to avoid having to deal with cmake at all? And if you have to deal with cmake, what are the most important things to know? And if you were going to improve our cmake, how would you go about doing it...</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>29</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">239e1d8d-f9ed-4dcd-b1ef-092ff2cf714f</guid>
      <title>torchdeploy</title>
      <description><![CDATA[<p>torchdeploy is a way of running multiple Python interpreters inside the same process. It can be used to deploy Python PyTorch programs in situations where the GIL, not the CPython interpreter itself, is the problem. How does it work, and what kind of challenges does it pose for people who want to write code that calls from C++ to Python?</p><p><strong>Further reading.</strong></p><ul><li>How the torchdeploy build system works <a href="https://dev-discuss.pytorch.org/t/torch-deploy-the-build/238">https://dev-discuss.pytorch.org/t/torch-deploy-the-build/238</a></li><li>Description of the single interpreter per Tensor invariant <a href="https://github.com/pytorch/pytorch/issues/57756">https://github.com/pytorch/pytorch/issues/57756</a></li><li>Recent work on making it possible to load C extensions into torchdeploy <a href="https://dev-discuss.pytorch.org/t/running-multiple-python-interpreters-via-custom-dynamic-loading/241">https://dev-discuss.pytorch.org/t/running-multiple-python-interpreters-via-custom-dynamic-loading/241</a></li></ul>
]]></description>
      <pubDate>Fri, 11 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/torchdeploy-Xi5_J1gp</link>
      <content:encoded><![CDATA[<p>torchdeploy is a way of running multiple Python interpreters inside the same process. It can be used to deploy Python PyTorch programs in situations where the GIL, not the CPython interpreter itself, is the problem. How does it work, and what kind of challenges does it pose for people who want to write code that calls from C++ to Python?</p><p><strong>Further reading.</strong></p><ul><li>How the torchdeploy build system works <a href="https://dev-discuss.pytorch.org/t/torch-deploy-the-build/238">https://dev-discuss.pytorch.org/t/torch-deploy-the-build/238</a></li><li>Description of the single interpreter per Tensor invariant <a href="https://github.com/pytorch/pytorch/issues/57756">https://github.com/pytorch/pytorch/issues/57756</a></li><li>Recent work on making it possible to load C extensions into torchdeploy <a href="https://dev-discuss.pytorch.org/t/running-multiple-python-interpreters-via-custom-dynamic-loading/241">https://dev-discuss.pytorch.org/t/running-multiple-python-interpreters-via-custom-dynamic-loading/241</a></li></ul>
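<p>As a minimal illustration of the GIL bottleneck torchdeploy targets (plain Python here, nothing torchdeploy-specific): two threads running Python-bound work in one process serialize on the single GIL, so the second thread buys you essentially nothing.</p><pre><code>import threading
import time

def python_heavy_work():
    # Stand-in for Python-bound model code; C-level kernels that
    # release the GIL would not show this effect.
    total = 0
    for i in range(10_000_000):
        total += i
    return total

start = time.time()
threads = [threading.Thread(target=python_heavy_work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Takes about as long as running both loops back to back: one GIL.
# torchdeploy's answer is one interpreter (and one GIL) per thread,
# all inside the same process.
print(f"elapsed: {time.time() - start:.2f}s")
</code></pre>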
]]></content:encoded>
      <enclosure length="13157901" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/1b91c57d-8d99-4434-b75f-cba6420cdd4d/audio/029b6ece-eed5-4151-8597-58f5f20fcd1e/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>torchdeploy</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:13:42</itunes:duration>
      <itunes:summary>torchdeploy is a way of running multiple Python interpreters inside the same process. It can be used to deploy Python PyTorch programs in situations where the GIL, not the CPython interpreter itself, is the problem. How does it work, and what kind of challenges does it pose for people who want to write code that calls from C++ to Python?</itunes:summary>
      <itunes:subtitle>torchdeploy is a way of running multiple Python interpreters inside the same process. It can be used to deploy Python PyTorch programs in situations where the GIL, not the CPython interpreter itself, is the problem. How does it work, and what kind of challenges does it pose for people who want to write code that calls from C++ to Python?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>28</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">5ffe0f08-2328-4561-bf74-5e5debede8e5</guid>
      <title>C++ frontend</title>
      <description><![CDATA[<p>What's the C++ frontend? Why is avoiding templates so important? Why is Tensor a reference type? How do we simulate keyword arguments in C++? Where did the nn Module support in the C++ API come from? Why did we reimplement all modules in C++? How are modules implemented in C++? What are some performance challenges of writing Python in C++, and how are we working around them?</p><p><strong>Further reading.</strong></p><ul><li>C++ frontend tutorial <a href="https://pytorch.org/tutorials/advanced/cpp_frontend.html">https://pytorch.org/tutorials/advanced/cpp_frontend.html</a></li><li>Writing Python in C++ (a manifesto) <a href="https://github.com/pytorch/pytorch/wiki/Writing-Python-in-cpp-(a-manifesto)">https://github.com/pytorch/pytorch/wiki/Writing-Python-in-cpp-(a-manifesto)</a></li><li>MaybeOwned PR <a href="https://github.com/pytorch/pytorch/pull/53317">https://github.com/pytorch/pytorch/pull/53317</a></li></ul>
]]></description>
      <pubDate>Thu, 10 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/c-frontend-nJ5qKIPs</link>
      <content:encoded><![CDATA[<p>What's the C++ frontend? Why is avoiding templates so important? Why is Tensor a reference type? How do we simulate keyword arguments in C++? Where did the nn Module support in the C++ API come from? Why did we reimplement all modules in C++? How are modules implemented in C++? What are some performance challenges of writing Python in C++, and how are we working around them?</p><p><strong>Further reading.</strong></p><ul><li>C++ frontend tutorial <a href="https://pytorch.org/tutorials/advanced/cpp_frontend.html">https://pytorch.org/tutorials/advanced/cpp_frontend.html</a></li><li>Writing Python in C++ (a manifesto) <a href="https://github.com/pytorch/pytorch/wiki/Writing-Python-in-cpp-(a-manifesto)">https://github.com/pytorch/pytorch/wiki/Writing-Python-in-cpp-(a-manifesto)</a></li><li>MaybeOwned PR <a href="https://github.com/pytorch/pytorch/pull/53317">https://github.com/pytorch/pytorch/pull/53317</a></li></ul>
]]></content:encoded>
      <enclosure length="16428046" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/2f4286ce-b025-490b-bbc5-f32b89d4acb2/audio/83ccd79f-75a6-42dc-b370-7cd821a9fda7/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>C++ frontend</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:17:07</itunes:duration>
      <itunes:summary>What&apos;s the C++ frontend? Why is avoiding templates so important? Why is Tensor a reference type? How do we simulate keyword arguments in C++? Where did the nn Module support in the C++ API come from? Why did we reimplement all modules in C++? How are modules implemented in C++? What are some performance challenges of writing Python in C++, and how are we working around them?</itunes:summary>
      <itunes:subtitle>What&apos;s the C++ frontend? Why is avoiding templates so important? Why is Tensor a reference type? How do we simulate keyword arguments in C++? Where did the nn Module support in the C++ API come from? Why did we reimplement all modules in C++? How are modules implemented in C++? What are some performance challenges of writing Python in C++, and how are we working around them?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>27</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">c7ff50f7-b908-43ad-bf47-5ce56f91703e</guid>
      <title>PyObject preservation</title>
      <description><![CDATA[<p>Given two separately refcounted objects, how can you arrange for each of them to stay live so long as the other is live? Why doesn't just having a strong-strong or strong-weak reference between the two objects work? What is object resurrection in CPython? What's a finalizer and why does it make things more complicated? How does Python GC work?</p><p><strong>Further reading.</strong></p><ul><li>PyObject preservation PR <a href="https://github.com/pytorch/pytorch/pull/56017">https://github.com/pytorch/pytorch/pull/56017</a></li><li>Sam Gross's original PoC, which works fine if the two objects in question are both PyObjects <a href="https://github.com/colesbury/refcount/">https://github.com/colesbury/refcount/</a></li><li>PEP 442 Safe object finalization <a href="https://www.python.org/dev/peps/pep-0442/">https://www.python.org/dev/peps/pep-0442/</a></li><li>Essential reading about Python GC <a href="https://devguide.python.org/garbage_collector/">https://devguide.python.org/garbage_collector/</a></li></ul>
]]></description>
      <pubDate>Wed, 9 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/pyobject-preservation-q6sE1n7z</link>
      <content:encoded><![CDATA[<p>Given two separately refcounted objects, how can you arrange for each of them to stay live so long as the other is live? Why doesn't just having a strong-strong or strong-weak reference between the two objects work? What is object resurrection in CPython? What's a finalizer and why does it make things more complicated? How does Python GC work?</p><p><strong>Further reading.</strong></p><ul><li>PyObject preservation PR <a href="https://github.com/pytorch/pytorch/pull/56017">https://github.com/pytorch/pytorch/pull/56017</a></li><li>Sam Gross's original PoC, which works fine if the two objects in question are both PyObjects <a href="https://github.com/colesbury/refcount/">https://github.com/colesbury/refcount/</a></li><li>PEP 442 Safe object finalization <a href="https://www.python.org/dev/peps/pep-0442/">https://www.python.org/dev/peps/pep-0442/</a></li><li>Essential reading about Python GC <a href="https://devguide.python.org/garbage_collector/">https://devguide.python.org/garbage_collector/</a></li></ul>
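<p>To make "object resurrection" concrete, a tiny CPython example: a <code>__del__</code> finalizer can make the dying object reachable again, so a refcount hitting zero does not guarantee the object actually goes away.</p><pre><code># Object resurrection in CPython: __del__ stashes self somewhere reachable.
graveyard = []

class Resurrects:
    def __del__(self):
        graveyard.append(self)  # self becomes reachable again: resurrected

obj = Resurrects()
del obj                # refcount drops to zero and __del__ runs...
print(len(graveyard))  # 1 -- ...but the object is still alive
# Per PEP 442, __del__ runs at most once: if the resurrected object dies
# again, it is deallocated without re-running the finalizer.
</code></pre>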
]]></content:encoded>
      <enclosure length="15658903" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/c9317b54-5507-4a91-98e6-a2fa5ba2ac62/audio/744b6e71-9abb-432c-9340-d62e03ac675c/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>PyObject preservation</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:16:19</itunes:duration>
      <itunes:summary>Given two separately refcounted objects, how can you arrange for each of them to stay live so long as the other is live? Why doesn&apos;t just having a strong-strong or strong-weak reference between the two objects work? What is object resurrection in CPython? What&apos;s a finalizer and why does it make things more complicated? How does Python GC work?</itunes:summary>
      <itunes:subtitle>Given two separately refcounted objects, how can you arrange for each of them to stay live so long as the other is live? Why doesn&apos;t just having a strong-strong or strong-weak reference between the two objects work? What is object resurrection in CPython? What&apos;s a finalizer and why does it make things more complicated? How does Python GC work?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>26</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">1e48418d-1407-4c94-9aec-4ec4f5c7d3c0</guid>
      <title>Mobile selective build</title>
      <description><![CDATA[<p>What is mobile selective build? Why are we so obsessed with reducing binary size? How does selective build work? Why doesn't static linking just work? Why can't you just read out the ops used in a TorchScript model to determine what operators you actually need? What are the tradeoffs of statically determining the operator dependency graph versus tracing? What's up with the SELECTIVE_NAME macro? How the heck does selective build work at all when you have multiple mobile apps in a single Buck build system? What takeaways should I have as a regular PyTorch developer?</p><p><strong>Further reading:</strong></p><ul><li>Official open source mobile documentation on custom selective builds <a href="https://pytorch.org/mobile/android/#custom-build">https://pytorch.org/mobile/android/#custom-build</a></li><li>How to rebuild the op dependency yaml <a href="https://github.com/pytorch/pytorch/blob/master/tools/code_analyzer/build.sh">https://github.com/pytorch/pytorch/blob/master/tools/code_analyzer/build.sh</a></li></ul><p><strong>Liner notes:</strong></p><ul><li><p> binary size is at a premium; ship only what you actually need</p></li><li><p>big idea:</p><ul><li> get the ops your model needs -> apply this to build of pytorch</li></ul></li><li><p>get the ops your model needs</p><ul><li> TorchScript ~> read it out directly from the model itself</li><li>but what if ops use other ops?<ul><li>need a dependency graph.   done with static analysis llvm (jiakai)  ~> with a (possibly inaccurate) yaml checked in for easy kickstart if you don't want to run the pass (updated by bot, not operational since Feb, recommend rebuilding from scratch if you run into trouble)</li></ul></li><li>other possibility: dynamic tracing<ul><li>pro: no need for dependency graph, just look at what was called; works for dtypes</li><li>con: need representative inputs; if there's control flow, you might not cover everything</li></ul></li></ul></li><li><p>apply this to build of pytorch</p><ul><li>ordinarily: static linking ensures stuff that isn't used gets pruned<ul><li>but this doesn't work with distributed operator registration based on static initializers</li></ul></li><li>how?<ul><li>codegen - just don't generate it</li><li>no codegen - SELECTIVE_NAME - C++ doesn't support string in macro</li></ul></li><li>build system integration<ul><li>buck constraint: only one library<ul><li>therefore: generate multiple copies of glue library</li></ul></li><li>alt: atomize library into each operator.  caffe2 used to do this; each library takes a long time to build (1m) and crashes xcode because there's too many</li></ul></li></ul></li><li><p>common hiccups</p><ul><li>modify implementation details, some op is/isn't called anymore ~> error!  usually just means some yaml needs regenerating.  PyTorch Edge developers are very friendly and can help</li></ul></li></ul>
]]></description>
      <pubDate>Tue, 8 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/mobile-selective-build-Q85zgX26</link>
      <content:encoded><![CDATA[<p>What is mobile selective build? Why are we so obsessed with reducing binary size? How does selective build work? Why doesn't static linking just work? Why can't you just read out the ops used in a TorchScript model to determine what operators you actually need? What are the tradeoffs of statically determining the operator dependency graph versus tracing? What's up with the SELECTIVE_NAME macro? How the heck does selective build work at all when you have multiple mobile apps in a single Buck build system? What takeaways should I have as a regular PyTorch developer?</p><p><strong>Further reading:</strong></p><ul><li>Official open source mobile documentation on custom selective builds <a href="https://pytorch.org/mobile/android/#custom-build">https://pytorch.org/mobile/android/#custom-build</a></li><li>How to rebuild the op dependency yaml <a href="https://github.com/pytorch/pytorch/blob/master/tools/code_analyzer/build.sh">https://github.com/pytorch/pytorch/blob/master/tools/code_analyzer/build.sh</a></li></ul><p><strong>Liner notes:</strong></p><ul><li><p> binary size is at a premium; ship only what you actually need</p></li><li><p>big idea:</p><ul><li> get the ops your model needs -> apply this to build of pytorch</li></ul></li><li><p>get the ops your model needs</p><ul><li> TorchScript ~> read it out directly from the model itself</li><li>but what if ops use other ops?<ul><li>need a dependency graph.   done with static analysis llvm (jiakai)  ~> with a (possibly inaccurate) yaml checked in for easy kickstart if you don't want to run the pass (updated by bot, not operational since Feb, recommend rebuilding from scratch if you run into trouble)</li></ul></li><li>other possibility: dynamic tracing<ul><li>pro: no need for dependency graph, just look at what was called; works for dtypes</li><li>con: need representative inputs; if there's control flow, you might not cover everything</li></ul></li></ul></li><li><p>apply this to build of pytorch</p><ul><li>ordinarily: static linking ensures stuff that isn't used gets pruned<ul><li>but this doesn't work with distributed operator registration based on static initializers</li></ul></li><li>how?<ul><li>codegen - just don't generate it</li><li>no codegen - SELECTIVE_NAME - C++ doesn't support string in macro</li></ul></li><li>build system integration<ul><li>buck constraint: only one library<ul><li>therefore: generate multiple copies of glue library</li></ul></li><li>alt: atomize library into each operator.  caffe2 used to do this; each library takes a long time to build (1m) and crashes xcode because there's too many</li></ul></li></ul></li><li><p>common hiccups</p><ul><li>modify implementation details, some op is/isn't called anymore ~> error!  usually just means some yaml needs regenerating.  PyTorch Edge developers are very friendly and can help</li></ul></li></ul>
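<p>For the "read the ops out of the model" starting point, here is roughly what that looks like from Python. <code>torch.jit.export_opnames</code> returns the root operators a TorchScript model references; ops those call transitively are not listed, which is exactly why the static dependency graph (or tracing) is still needed.</p><pre><code>import torch

class M(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x + 1)

m = torch.jit.script(M())
# Root ops only, e.g. ['aten::add.Tensor', 'aten::relu']; anything these
# call internally has to come from the dependency graph or tracing.
print(torch.jit.export_opnames(m))
</code></pre>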
]]></content:encoded>
      <enclosure length="15387416" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/af1e5c91-db8c-4b20-a724-250d7347c6ff/audio/23fb5bd7-f51f-4a5f-995a-5d40d14c2de7/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Mobile selective build</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:16:02</itunes:duration>
      <itunes:summary>What is mobile selective build? Why are we so obsessed with reducing binary size? How does selective build work? Why doesn&apos;t static linking just work? Why can&apos;t you just read out the ops used in a TorchScript model to determine what operators you actually need? What are the tradeoffs of statically determining the operator dependency graph versus tracing? What&apos;s up with the SELECTIVE_NAME macro? How the heck does selective build work at all when you have multiple mobile apps in a single Buck build system? What takeaways should I have as a regular PyTorch developer?</itunes:summary>
      <itunes:subtitle>What is mobile selective build? Why are we so obsessed with reducing binary size? How does selective build work? Why doesn&apos;t static linking just work? Why can&apos;t you just read out the ops used in a TorchScript model to determine what operators you actually need? What are the tradeoffs of statically determining the operator dependency graph versus tracing? What&apos;s up with the SELECTIVE_NAME macro? How the heck does selective build work at all when you have multiple mobile apps in a single Buck build system? What takeaways should I have as a regular PyTorch developer?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>25</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">7dbc228a-8d65-4a8c-812e-feb9d47ae791</guid>
      <title>torch.nn</title>
      <description><![CDATA[<p>What goes into the implementation of torch.nn? Why do NN modules exist in the first place? What's the function of Parameter? How do modules actually track all the parameters in question? What is all of the goop in the top level NN module class? What are some new developments in torch.nn modules? What are some open problems with our modules?</p><p><strong>Further reading:</strong></p><ul><li>Implementation of nn.Module <a href="https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py">https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py</a></li><li>nn.Module is complicated and that means it's sometimes a bit slow. Some analysis at <a href="https://dev-discuss.pytorch.org/t/overhead-in-nn-module-causing-massive-slowdowns-compared-to-raw-cublas-or-torchscript/110">https://dev-discuss.pytorch.org/t/overhead-in-nn-module-causing-massive-slowdowns-compared-to-raw-cublas-or-torchscript/110</a></li><li>Lazy modules PR <a href="https://github.com/pytorch/pytorch/pull/44538">https://github.com/pytorch/pytorch/pull/44538</a> and factory kwargs <a href="https://github.com/pytorch/pytorch/pull/54508">https://github.com/pytorch/pytorch/pull/54508</a></li></ul><p><strong>Liner notes:</strong></p><ul><li> python for hackability (c++ is reimplemented)</li><li>parameters<ul><li> parameter collection (for optimization)</li><li> buffers: not considered optimizable</li></ul></li><li>modules<ul><li> functorial operation (_apply)</li><li> jit script: staged computation (init is not scripted)</li><li> <code>__call__</code> to forward (extra instrumentation)</li><li> serialization / state_dict</li></ul></li><li> new stuff: device kwarg (joel schlosser)</li><li> new stuff: lazy modules (emcastillo)</li><li> open problems: parameter initialization</li></ul>
]]></description>
      <pubDate>Mon, 7 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/torchnn-tJbhUiad</link>
      <content:encoded><![CDATA[<p>What goes into the implementation of torch.nn? Why do NN modules exist in the first place? What's the function of Parameter? How do modules actually track all the parameters in question? What is all of the goop in the top level NN module class? What are some new developments in torch.nn modules? What are some open problems with our modules?</p><p><strong>Further reading:</strong></p><ul><li>Implementation of nn.Module <a href="https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py">https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py</a></li><li>nn.Module is complicated and that means it's sometimes a bit slow. Some analysis at <a href="https://dev-discuss.pytorch.org/t/overhead-in-nn-module-causing-massive-slowdowns-compared-to-raw-cublas-or-torchscript/110">https://dev-discuss.pytorch.org/t/overhead-in-nn-module-causing-massive-slowdowns-compared-to-raw-cublas-or-torchscript/110</a></li><li>Lazy modules PR <a href="https://github.com/pytorch/pytorch/pull/44538">https://github.com/pytorch/pytorch/pull/44538</a> and factory kwargs <a href="https://github.com/pytorch/pytorch/pull/54508">https://github.com/pytorch/pytorch/pull/54508</a></li></ul><p><strong>Liner notes:</strong></p><ul><li> python for hackability (c++ is reimplemented)</li><li>parameters<ul><li> parameter collection (for optimization)</li><li> buffers: not considered optimizable</li></ul></li><li>modules<ul><li> functorial operation (_apply)</li><li> jit script: staged computation (init is not scripted)</li><li> <code>__call__</code> to forward (extra instrumentation)</li><li> serialization / state_dict</li></ul></li><li> new stuff: device kwarg (joel schlosser)</li><li> new stuff: lazy modules (emcastillo)</li><li> open problems: parameter initialization</li></ul>
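<p>A small example of the parameter and buffer tracking mentioned in the notes above: assigning an <code>nn.Parameter</code> to a module attribute registers it for optimization, while <code>register_buffer</code> tracks state that lands in the state_dict but is not optimized.</p><pre><code>import torch
from torch import nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.randn(3))          # collected for optimizers
        self.register_buffer("shift", torch.zeros(3))  # saved, not optimized

    def forward(self, x):
        return x * self.w + self.shift

m = Tiny()
print([n for n, _ in m.named_parameters()])  # ['w']
print([n for n, _ in m.named_buffers()])     # ['shift']
y = m(torch.ones(3))  # __call__ runs hooks/instrumentation, then forward
</code></pre>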
]]></content:encoded>
      <enclosure length="13714238" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/b6f28bbf-4258-40a5-b30a-35bcae552ab1/audio/af07fabb-41dc-4956-a8c8-bfb1601aef7b/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>torch.nn</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:14:18</itunes:duration>
      <itunes:summary>What goes into the implementation of torch.nn? Why do NN modules exist in the first place? What&apos;s the function of Parameter? How do modules actually track all the parameters in question? What is all of the goop in the top level NN module class? What are some new developments in torch.nn modules? What are some open problems with our modules?</itunes:summary>
      <itunes:subtitle>What goes into the implementation of torch.nn? Why do NN modules exist in the first place? What&apos;s the function of Parameter? How do modules actually track all the parameters in question? What is all of the goop in the top level NN module class? What are some new developments in torch.nn modules? What are some open problems with our modules?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>24</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">49aad913-868b-4671-b5ec-bf248e58d1cd</guid>
      <title>Code generation</title>
      <description><![CDATA[<p>Why does PyTorch use code generation as part of its build process? Why doesn't it use C++ templates? What things is code generation used for? What are the pros/cons of using code generation? What are some other ways to do the same things we currently do with code generation?</p><p><strong>Further reading.</strong></p><ul><li>Top level file for the new code generation pipeline <a href="https://github.com/pytorch/pytorch/blob/master/tools/codegen/gen.py">https://github.com/pytorch/pytorch/blob/master/tools/codegen/gen.py</a></li><li>Out of tree external backend code generation from Brian Hirsh: <a href="https://github.com/pytorch/xla/issues/2871">https://github.com/pytorch/xla/issues/2871</a></li><li>Documentation for native_functions.yaml <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md</a> (have you seen this README before? Yes you've seen this README before. Imma post it again.)</li></ul><p><strong>Outline:</strong></p><ul><li>High level: reduce the amount of code in PyTorch, easier to develop</li><li>Strongly typed python</li><li>Stuff we're using codegen for<ul><li>Meta point: stuff c++ metaprogramming can't do</li><li>C++ apis (functions, methods on classes)<ul><li>Especially for forwarding (operator dot doko)</li><li>Prototypes for c++ to implement</li></ul></li><li>YAML files used by external frameworks for binding (accidental)</li><li>Python arg parsing</li><li>pyi generation</li><li>Autograd classes for saving saved data</li><li>Otherwise complicated constexpr computation (e.g., parsing JIT<br />schema)</li></ul></li><li>Pros<ul><li>Better surface syntax (native_functions.yaml, jit schema,<br />derivatives.yaml)</li><li>Better error messages (template messages famously bad)</li><li>Easier to organize complicated code; esp nontrivial input<br />data structure</li><li>Easier to debug by looking at generated code</li></ul></li><li>Cons<ul><li>Not as portable (template can be used by anyone)</li><li>Less good modeling for C++ type based metaprogramming (we've replicated a crappy version of C++ type system in our codegen)</li></ul></li><li>Counterpoints in the design space<ul><li>C++ templates: just as efficient</li><li>Boxed fallback: simpler, less efficient</li></ul></li><li>Open question: can you have best of both worlds, e.g., with partially evaluated interpreters?</li></ul>
]]></description>
      <pubDate>Fri, 4 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/code-generation-0vyr_o3t</link>
      <content:encoded><![CDATA[<p>Why does PyTorch use code generation as part of its build process? Why doesn't it use C++ templates? What things is code generation used for? What are the pros/cons of using code generation? What are some other ways to do the same things we currently do with code generation?</p><p><strong>Further reading.</strong></p><ul><li>Top level file for the new code generation pipeline <a href="https://github.com/pytorch/pytorch/blob/master/tools/codegen/gen.py">https://github.com/pytorch/pytorch/blob/master/tools/codegen/gen.py</a></li><li>Out of tree external backend code generation from Brian Hirsh: <a href="https://github.com/pytorch/xla/issues/2871">https://github.com/pytorch/xla/issues/2871</a></li><li>Documentation for native_functions.yaml <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md</a> (have you seen this README before? Yes you've seen this README before. Imma post it again.)</li></ul><p><strong>Outline:</strong></p><ul><li>High level: reduce the amount of code in PyTorch, easier to develop</li><li>Strongly typed python</li><li>Stuff we're using codegen for<ul><li>Meta point: stuff c++ metaprogramming can't do</li><li>C++ apis (functions, methods on classes)<ul><li>Especially for forwarding (operator dot doko)</li><li>Prototypes for c++ to implement</li></ul></li><li>YAML files used by external frameworks for binding (accidental)</li><li>Python arg parsing</li><li>pyi generation</li><li>Autograd classes for saving saved data</li><li>Otherwise complicated constexpr computation (e.g., parsing JIT<br />schema)</li></ul></li><li>Pros<ul><li>Better surface syntax (native_functions.yaml, jit schema,<br />derivatives.yaml)</li><li>Better error messages (template messages famously bad)</li><li>Easier to organize complicated code; esp nontrivial input<br />data structure</li><li>Easier to debug by looking at generated code</li></ul></li><li>Cons<ul><li>Not as portable (template can be used by anyone)</li><li>Less good modeling for C++ type based metaprogramming (we've replicated a crappy version of C++ type system in our codegen)</li></ul></li><li>Counterpoints in the design space<ul><li>C++ templates: just as efficient</li><li>Boxed fallback: simpler, less efficient</li></ul></li><li>Open question: can you have best of both worlds, e.g., with partially evaluated interpreters?</li></ul>
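<p>A toy sketch of the basic move (this is not PyTorch's actual tools/codegen pipeline, just an illustration of "parse a schema-like string, emit C++ source as text" in plain Python):</p><pre><code># Toy codegen: turn "add(Tensor self, Tensor other)" into a C++ prototype.
# Assumes every argument is a Tensor, which the real codegen of course doesn't.
def emit_cpp_prototype(schema: str) -> str:
    name, _, rest = schema.partition("(")
    arg_list = rest.rstrip(")")
    args = arg_list.split(", ") if arg_list else []
    cpp_args = ", ".join("const Tensor&amp; " + a.split(" ")[-1] for a in args)
    return f"Tensor {name}({cpp_args});"

print(emit_cpp_prototype("add(Tensor self, Tensor other)"))
# Tensor add(const Tensor&amp; self, const Tensor&amp; other);
</code></pre>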
]]></content:encoded>
      <enclosure length="16181137" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/44d7820d-218f-41eb-9539-b34fc735c8bf/audio/b01b1b34-a2a6-4a13-a3f4-8ef20170e246/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Code generation</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:16:51</itunes:duration>
      <itunes:summary>Why does PyTorch use code generation as part of its build process? Why doesn&apos;t it use C++ templates? What things is code generation used for? What are the pros/cons of using code generation? What are some other ways to do the same things we currently do with code generation?</itunes:summary>
      <itunes:subtitle>Why does PyTorch use code generation as part of its build process? Why doesn&apos;t it use C++ templates? What things is code generation used for? What are the pros/cons of using code generation? What are some other ways to do the same things we currently do with code generation?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>23</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">12c6a7a5-c076-4f43-ab3d-b04daef9c5bf</guid>
      <title>Why is autograd so complicated</title>
      <description><![CDATA[<p>Why is autograd so complicated? What are the constraints and features that go into making it complicated? What's up with it being written in C++? What's with derivatives.yaml and code generation? What's going on with views and mutation? What's up with hooks and anomaly mode? What's reentrant execution? Why is it relevant to checkpointing? What's the distributed autograd engine?</p><p><strong>Further reading.</strong></p><ul><li>Autograd notes in the docs <a href="https://pytorch.org/docs/stable/notes/autograd.html">https://pytorch.org/docs/stable/notes/autograd.html</a></li><li>derivatives.yaml <a href="https://github.com/pytorch/pytorch/blob/master/tools/autograd/derivatives.yaml">https://github.com/pytorch/pytorch/blob/master/tools/autograd/derivatives.yaml</a></li><li>Paper on autograd engine in PyTorch <a href="https://openreview.net/pdf/25b8eee6c373d48b84e5e9c6e10e7cbbbce4ac73.pdf">https://openreview.net/pdf/25b8eee6c373d48b84e5e9c6e10e7cbbbce4ac73.pdf</a></li></ul>
]]></description>
      <pubDate>Thu, 3 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/why-is-autograd-so-complicated-DJOOZoP0</link>
      <content:encoded><![CDATA[<p>Why is autograd so complicated? What are the constraints and features that go into making it complicated? What's up with it being written in C++? What's with derivatives.yaml and code generation? What's going on with views and mutation? What's up with hooks and anomaly mode? What's reentrant execution? Why is it relevant to checkpointing? What's the distributed autograd engine?</p><p><strong>Further reading.</strong></p><ul><li>Autograd notes in the docs <a href="https://pytorch.org/docs/stable/notes/autograd.html">https://pytorch.org/docs/stable/notes/autograd.html</a></li><li>derivatives.yaml <a href="https://github.com/pytorch/pytorch/blob/master/tools/autograd/derivatives.yaml">https://github.com/pytorch/pytorch/blob/master/tools/autograd/derivatives.yaml</a></li><li>Paper on autograd engine in PyTorch <a href="https://openreview.net/pdf/25b8eee6c373d48b84e5e9c6e10e7cbbbce4ac73.pdf">https://openreview.net/pdf/25b8eee6c373d48b84e5e9c6e10e7cbbbce4ac73.pdf</a></li></ul>
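<p>One concrete handle on the reentrancy question: <code>torch.utils.checkpoint</code> is the classic client, because its backward re-runs the forward under grad mode, i.e. it calls back into the autograd engine from inside a backward pass.</p><pre><code>import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    return torch.sin(x) * torch.cos(x)

x = torch.randn(4, requires_grad=True)
y = checkpoint(block, x)   # forward runs without saving activations...
y.sum().backward()         # ...and backward recomputes them, reentrantly
print(x.grad is not None)  # True
</code></pre>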
]]></content:encoded>
      <enclosure length="15123232" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/eb7de989-4f7a-46bf-9b39-28adbf394bd2/audio/29ff1bb0-9c13-420f-8c77-00ad99c0d24d/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Why is autograd so complicated</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:15:45</itunes:duration>
      <itunes:summary>Why is autograd so complicated? What are the constraints and features that go into making it complicated? What&apos;s up with it being written in C++? What&apos;s with derivatives.yaml and code generation? What&apos;s going on with views and mutation? What&apos;s up with hooks and anomaly mode? What&apos;s reentrant execution? Why is it relevant to checkpointing? What&apos;s the distributed autograd engine?</itunes:summary>
      <itunes:subtitle>Why is autograd so complicated? What are the constraints and features that go into making it complicated? What&apos;s up with it being written in C++? What&apos;s with derivatives.yaml and code generation? What&apos;s going on with views and mutation? What&apos;s up with hooks and anomaly mode? What&apos;s reentrant execution? Why is it relevant to checkpointing? What&apos;s the distributed autograd engine?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>22</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">57c13726-bbd1-4901-ac13-8ca2f941004e</guid>
      <title>__torch_function__</title>
      <description><![CDATA[<p>What is <code>__torch_function__</code>? Why would I want to use it? What does it have to do with keeping extra metadata on Tensors or torch.fx? How is it implemented? Why is <code>__torch_function__</code> a really popular way of extending functionality in PyTorch? What makes it different from the dispatcher extensibility mechanism? What are some downsides of it being written this way? What are we doing about it?</p><p><strong>Further reading.</strong></p><ul><li><code>__torch_function__</code> RFC: <a href="https://github.com/pytorch/rfcs/blob/master/RFC-0001-torch-function-for-methods.md">https://github.com/pytorch/rfcs/blob/master/RFC-0001-torch-function-for-methods.md</a></li><li>One of the original GitHub issues tracking the overall design discussion <a href="https://github.com/pytorch/pytorch/issues/24015">https://github.com/pytorch/pytorch/issues/24015</a></li><li>Documentation for using <code>__torch_function__</code> <a href="https://pytorch.org/docs/stable/notes/extending.html#extending-torch">https://pytorch.org/docs/stable/notes/extending.html#extending-torch</a></li></ul>
]]></description>
      <pubDate>Wed, 2 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/torch-function-Amez_iMz</link>
      <content:encoded><![CDATA[<p>What is <code>__torch_function__</code>? Why would I want to use it? What does it have to do with keeping extra metadata on Tensors or torch.fx? How is it implemented? Why is <code>__torch_function__</code> a really popular way of extending functionality in PyTorch? What makes it different from the dispatcher extensibility mechanism? What are some downsides of it being written this way? What are we doing about it?</p><p><strong>Further reading.</strong></p><ul><li><code>__torch_function__</code> RFC: <a href="https://github.com/pytorch/rfcs/blob/master/RFC-0001-torch-function-for-methods.md">https://github.com/pytorch/rfcs/blob/master/RFC-0001-torch-function-for-methods.md</a></li><li>One of the original GitHub issues tracking the overall design discussion <a href="https://github.com/pytorch/pytorch/issues/24015">https://github.com/pytorch/pytorch/issues/24015</a></li><li>Documentation for using <code>__torch_function__</code> <a href="https://pytorch.org/docs/stable/notes/extending.html#extending-torch">https://pytorch.org/docs/stable/notes/extending.html#extending-torch</a></li></ul>
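<p>A minimal example of the mechanism, following the pattern in the extending docs linked above: subclass Tensor, override <code>__torch_function__</code>, and every torch-level call involving the subclass routes through your hook.</p><pre><code>import torch

class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        print(f"intercepted: {func.__name__}")
        # Defer to the default implementation to actually run the op.
        return super().__torch_function__(func, types, args, kwargs)

t = torch.randn(3).as_subclass(LoggingTensor)
out = torch.add(t, 1)  # prints "intercepted: add"
</code></pre>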
]]></content:encoded>
      <enclosure length="16327060" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/a4771092-fca8-481c-90aa-5971104e95e9/audio/2e9c7b36-1846-45da-acc5-a157364b3348/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>__torch_function__</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:17:00</itunes:duration>
      <itunes:summary>What is `__torch_function__`? Why would I want to use it? What does it have to do with keeping extra metadata on Tensors or torch.fx? How is it implemented? Why is `__torch_function__` a really popular way of extending functionality in PyTorch? What makes it different from the dispatcher extensibility mechanism? What are some downsides of it being written this way? What are we doing about it?</itunes:summary>
      <itunes:subtitle>What is `__torch_function__`? Why would I want to use it? What does it have to do with keeping extra metadata on Tensors or torch.fx? How is it implemented? Why is `__torch_function__` a really popular way of extending functionality in PyTorch? What makes it different from the dispatcher extensibility mechanism? What are some downsides of it being written this way? What are we doing about it?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>21</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">4b624f03-f48d-4739-892d-cb8ff1b8ba86</guid>
      <title>TensorIterator</title>
      <description><![CDATA[<p>You walk into the whiteboard room to do a technical interview. The interviewer looks you straight in the eye and says, "OK, can you show me how to add the elements of two lists together?" Confused, you write down a simple for loop that iterates through each element and adds them together. Your interviewer rubs his hands together evilly and cackles, "OK, let's make it more complicated."</p><p>What does TensorIterator do? Why the heck is TensorIterator so complicated? What's going on with broadcasting? Type promotion? Overlap checks? Layout? Dimension coalescing? Parallelization? Vectorization?</p><p><strong>Further reading.</strong></p><ul><li>PyTorch TensorIterator internals <a href="https://labs.quansight.org/blog/2020/04/pytorch-tensoriterator-internals/">https://labs.quansight.org/blog/2020/04/pytorch-tensoriterator-internals/</a></li><li>Why is TensorIterator so slow <a href="https://dev-discuss.pytorch.org/t/comparing-the-performance-of-0-4-1-and-master/136">https://dev-discuss.pytorch.org/t/comparing-the-performance-of-0-4-1-and-master/136</a></li><li>Broadcasting <a href="https://pytorch.org/docs/stable/notes/broadcasting.html">https://pytorch.org/docs/stable/notes/broadcasting.html</a> and type promotion <a href="https://pytorch.org/docs/stable/tensor_attributes.html#type-promotion-doc">https://pytorch.org/docs/stable/tensor_attributes.html#type-promotion-doc</a></li></ul>
]]></description>
      <pubDate>Tue, 1 Jun 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/tensoriterator-X_wj9HOy</link>
      <content:encoded><![CDATA[<p>You walk into the whiteboard room to do a technical interview. The interviewer looks you straight in the eye and says, "OK, can you show me how to add the elements of two lists together?" Confused, you write down a simple for loop that iterates through each element and adds them together. Your interviewer rubs his hands together evilly and cackles, "OK, let's make it more complicated."</p><p>What does TensorIterator do? Why the heck is TensorIterator so complicated? What's going on with broadcasting? Type promotion? Overlap checks? Layout? Dimension coalescing? Parallelization? Vectorization?</p><p><strong>Further reading.</strong></p><ul><li>PyTorch TensorIterator internals <a href="https://labs.quansight.org/blog/2020/04/pytorch-tensoriterator-internals/">https://labs.quansight.org/blog/2020/04/pytorch-tensoriterator-internals/</a></li><li>Why is TensorIterator so slow <a href="https://dev-discuss.pytorch.org/t/comparing-the-performance-of-0-4-1-and-master/136">https://dev-discuss.pytorch.org/t/comparing-the-performance-of-0-4-1-and-master/136</a></li><li>Broadcasting <a href="https://pytorch.org/docs/stable/notes/broadcasting.html">https://pytorch.org/docs/stable/notes/broadcasting.html</a> and type promotion <a href="https://pytorch.org/docs/stable/tensor_attributes.html#type-promotion-doc">https://pytorch.org/docs/stable/tensor_attributes.html#type-promotion-doc</a></li></ul>
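<p>Two of those wrinkles are visible in a single line of user code, and TensorIterator has to sort out both before any loop runs:</p><pre><code>import torch

a = torch.ones(3, 1, dtype=torch.float32)
b = torch.arange(4, dtype=torch.int64)
c = a + b          # one "simple" elementwise add...
print(c.shape)     # torch.Size([3, 4]) -- (3,1) and (4,) broadcast
print(c.dtype)     # torch.float32      -- int64 promotes to float32
</code></pre>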
]]></content:encoded>
      <enclosure length="17124240" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/57b3344d-0453-40f7-aa52-64eb7043f291/audio/4e2f21f3-6b3b-4b42-a2ba-062c6aafe3c3/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>TensorIterator</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:17:50</itunes:duration>
      <itunes:summary>You walk into the whiteboard room to do a technical interview. The interviewer looks you straight in the eye and says, &quot;OK, can you show me how to add the elements of two lists together?&quot; Confused, you write down a simple for loop that iterates through each element and adds them together. Your interviewer rubs his hands together evilly and cackles, &quot;OK, let&apos;s make it more complicated.&quot; What does TensorIterator do? Why the heck is TensorIterator so complicated? What&apos;s going on with broadcasting? Type promotion? Overlap checks? Layout? Dimension coalescing? Parallelization? Vectorization?</itunes:summary>
      <itunes:subtitle>You walk into the whiteboard room to do a technical interview. The interviewer looks you straight in the eye and says, &quot;OK, can you show me how to add the elements of two lists together?&quot; Confused, you write down a simple for loop that iterates through each element and adds them together. Your interviewer rubs his hands together evilly and cackles, &quot;OK, let&apos;s make it more complicated.&quot; What does TensorIterator do? Why the heck is TensorIterator so complicated? What&apos;s going on with broadcasting? Type promotion? Overlap checks? Layout? Dimension coalescing? Parallelization? Vectorization?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>20</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">0566d657-78ae-477d-aa87-3bfcb12ce39e</guid>
      <title>native_functions.yaml</title>
      <description><![CDATA[<p>What does native_functions.yaml have to do with the TorchScript compiler? What multiple use cases is native_functions.yaml trying to serve? What's up with the JIT schema type system? Why isn't it just Python types? What the heck is the (a!) thingy inside the schema? Why is it important that I actually annotate all of my functions accurately with this information? Why is my seemingly BC change to native_functions.yaml actually breaking people's code? Do I have to understand the entire compiler to understand how to work with these systems?</p><p><strong>Further reading.</strong></p><ul><li>native_functions.yaml README <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md</a></li><li>Tracking issue for serializing default arguments <a href="https://github.com/pytorch/pytorch/issues/54613">https://github.com/pytorch/pytorch/issues/54613</a></li><li>Test for BC breaking changes in native_functions.yaml <a href="https://github.com/pytorch/pytorch/blob/master/test/backward_compatibility/check_backward_compatibility.py">https://github.com/pytorch/pytorch/blob/master/test/backward_compatibility/check_backward_compatibility.py</a></li></ul>
]]></description>
      <pubDate>Fri, 28 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/native-functions-yaml-6rTMkSE_</link>
      <content:encoded><![CDATA[<p>What does native_functions.yaml have to do with the TorchScript compiler? What multiple use cases is native_functions.yaml trying to serve? What's up with the JIT schema type system? Why isn't it just Python types? What the heck is the (a!) thingy inside the schema? Why is it important that I actually annotate all of my functions accurately with this information? Why is my seemingly BC change to native_functions.yaml actually breaking people's code? Do I have to understand the entire compiler to understand how to work with these systems?</p><p><strong>Further reading.</strong></p><ul><li>native_functions.yaml README <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md</a></li><li>Tracking issue for serializing default arguments <a href="https://github.com/pytorch/pytorch/issues/54613">https://github.com/pytorch/pytorch/issues/54613</a></li><li>Test for BC breaking changes in native_functions.yaml <a href="https://github.com/pytorch/pytorch/blob/master/test/backward_compatibility/check_backward_compatibility.py">https://github.com/pytorch/pytorch/blob/master/test/backward_compatibility/check_backward_compatibility.py</a></li></ul>
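<p>The <code>(a!)</code> annotation is observable from Python: it marks an argument that the op mutates and that the output aliases, which is why in-place methods hand you back the very same tensor. Roughly, the schema for <code>add_</code> reads <code>add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!)</code>.</p><pre><code>import torch

x = torch.zeros(3)
y = x.add_(1)  # self is annotated (a!): mutated, and aliased by the output
print(y is x)  # True -- the "returned" tensor is the input itself
</code></pre>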
]]></content:encoded>
      <enclosure length="14915863" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/a3e0135a-f170-430a-9493-445502e7ec4d/audio/ea78a67b-5ad4-4288-9c72-251465b3b798/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>native_functions.yaml</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:15:32</itunes:duration>
      <itunes:summary>What does native_functions.yaml have to do with the TorchScript compiler? What multiple use cases is native_functions.yaml trying to serve? What&apos;s up with the JIT schema type system? Why isn&apos;t it just Python types? What the heck is the (a!) thingy inside the schema? Why is it important that I actually annotate all of my functions accurately with this information? Why is my seemingly BC change to native_functions.yaml actually breaking people&apos;s code? Do I have to understand the entire compiler to understand how to work with these systems?</itunes:summary>
      <itunes:subtitle>What does native_functions.yaml have to do with the TorchScript compiler? What multiple use cases is native_functions.yaml trying to serve? What&apos;s up with the JIT schema type system? Why isn&apos;t it just Python types? What the heck is the (a!) thingy inside the schema? Why is it important that I actually annotate all of my functions accurately with this information? Why is my seemingly BC change to native_functions.yaml actually breaking people&apos;s code? Do I have to understand the entire compiler to understand how to work with these systems?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>19</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">44b9ab3b-80e0-429a-9117-508ae98f0ade</guid>
      <title>Serialization</title>
      <description><![CDATA[<p>What is serialization? Why do I care about it? How is serialization done in general in Python? How does pickling work? How does PyTorch implement pickling for its objects? What are some pitfalls of the pickling implementation? What do backwards compatibility and forwards compatibility mean in the context of serialization? What's the difference between directly pickling and using torch.save/load? So what the heck is up with JIT/TorchScript serialization? Why did we use zip files? What were some design principles for the serialization format? Why are there two implementations of serialization in PyTorch? Does the fact that PyTorch uses pickling for serialization mean that our serialization format is insecure?</p><p><strong>Further reading.</strong></p><ul><li>TorchScript serialization design doc <a href="https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/docs/serialization.md">https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/docs/serialization.md</a></li><li>Evolution of serialization formats over time <a href="https://github.com/pytorch/pytorch/issues/31877">https://github.com/pytorch/pytorch/issues/31877</a></li><li>Code pointers:<ul><li>Tensor <code>__reduce_ex__</code> <a href="https://github.com/pytorch/pytorch/blob/de845020a0da39e621db984515bc1cce03f526ea/torch/_tensor.py#L97-L178">https://github.com/pytorch/pytorch/blob/de845020a0da39e621db984515bc1cce03f526ea/torch/_tensor.py#L97-L178</a></li><li>Python side serialization <a href="https://github.com/pytorch/pytorch/blob/de845020a0da39e621db984515bc1cce03f526ea/torch/serialization.py#L384-L499">https://github.com/pytorch/pytorch/blob/de845020a0da39e621db984515bc1cce03f526ea/torch/serialization.py#L384-L499</a></li><li>C++ side serialization <a href="https://github.com/pytorch/pytorch/tree/master/torch/csrc/jit/serialization">https://github.com/pytorch/pytorch/tree/master/torch/csrc/jit/serialization</a></li></ul></li></ul>
]]></description>
      <pubDate>Thu, 27 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/serialization-IxNkpUJ2</link>
      <content:encoded><![CDATA[<p>What is serialization? Why do I care about it? How is serialization done in general in Python? How does pickling work? How does PyTorch implement pickling for its objects? What are some pitfalls of the pickling implementation? What do backwards compatibility and forwards compatibility mean in the context of serialization? What's the difference between directly pickling and using torch.save/load? So what the heck is up with JIT/TorchScript serialization? Why did we use zip files? What were some design principles for the serialization format? Why are there two implementations of serialization in PyTorch? Does the fact that PyTorch uses pickling for serialization mean that our serialization format is insecure?</p><p><strong>Further reading.</strong></p><ul><li>TorchScript serialization design doc <a href="https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/docs/serialization.md">https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/docs/serialization.md</a></li><li>Evolution of serialization formats over time <a href="https://github.com/pytorch/pytorch/issues/31877">https://github.com/pytorch/pytorch/issues/31877</a></li><li>Code pointers:<ul><li>Tensor <code>__reduce_ex__</code> <a href="https://github.com/pytorch/pytorch/blob/de845020a0da39e621db984515bc1cce03f526ea/torch/_tensor.py#L97-L178">https://github.com/pytorch/pytorch/blob/de845020a0da39e621db984515bc1cce03f526ea/torch/_tensor.py#L97-L178</a></li><li>Python side serialization <a href="https://github.com/pytorch/pytorch/blob/de845020a0da39e621db984515bc1cce03f526ea/torch/serialization.py#L384-L499">https://github.com/pytorch/pytorch/blob/de845020a0da39e621db984515bc1cce03f526ea/torch/serialization.py#L384-L499</a></li><li>C++ side serialization <a href="https://github.com/pytorch/pytorch/tree/master/torch/csrc/jit/serialization">https://github.com/pytorch/pytorch/tree/master/torch/csrc/jit/serialization</a></li></ul></li></ul>
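<p>The two entry points side by side: <code>torch.save</code> wraps everything in the zip-based format, while the plain pickle module path works because Tensor implements <code>__reduce_ex__</code>.</p><pre><code>import io
import pickle
import torch

t = torch.arange(4)

# torch.save: a zip archive of pickled metadata plus raw storage bytes.
buf = io.BytesIO()
torch.save(t, buf)
buf.seek(0)
print(torch.load(buf))  # tensor([0, 1, 2, 3])

# Direct pickling also works, via Tensor.__reduce_ex__.
print(pickle.loads(pickle.dumps(t)))
</code></pre>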
]]></content:encoded>
      <enclosure length="16412303" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/3edd82b3-2f66-4d98-a02f-b11e0f777d0e/audio/b6027d6e-d163-4b89-9646-4358331dd9f5/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Serialization</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:17:06</itunes:duration>
      <itunes:summary>What is serialization? Why do I care about it? How is serialization done in general in Python? How does pickling work? How does PyTorch implement pickling for its objects? What are some pitfalls of the pickling implementation? What do backwards compatibility and forwards compatibility mean in the context of serialization? What&apos;s the difference between directly pickling and using torch.save/load? So what the heck is up with JIT/TorchScript serialization? Why did we use zip files? What were some design principles for the serialization format? Why are there two implementations of serialization in PyTorch? Does the fact that PyTorch uses pickling for serialization mean that our serialization format is insecure?</itunes:summary>
      <itunes:subtitle>What is serialization? Why do I care about it? How is serialization done in general in Python? How does pickling work? How does PyTorch implement pickling for its objects? What are some pitfalls of the pickling implementation? What do backwards compatibility and forwards compatibility mean in the context of serialization? What&apos;s the difference between directly pickling and using torch.save/load? So what the heck is up with JIT/TorchScript serialization? Why did we use zip files? What were some design principles for the serialization format? Why are there two implementations of serialization in PyTorch? Does the fact that PyTorch uses pickling for serialization mean that our serialization format is insecure?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>18</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">ff161a36-bfc5-43dc-b1c3-65d1b201961b</guid>
      <title>Continuous integration</title>
      <description><![CDATA[<p>How is our CI put together? What is the history of the CI? What constraints is the CI under? Why does the CI use Docker? Why are build and test split into two phases? Why are some parts of the CI so convoluted? How does the HUD work? What kinds of configurations is PyTorch tested under? How did we decide what configurations to test? What are some of the weird CI configurations? What's up with the XLA CI? What's going on with the Facebook internal builds?</p><p><strong>Further reading.</strong></p><ul><li>The CI HUD for viewing the status of master <a href="https://ezyang.github.io/pytorch-ci-hud/build/pytorch-master">https://ezyang.github.io/pytorch-ci-hud/build/pytorch-master</a></li><li>Structure of CI <a href="https://github.com/pytorch/pytorch/blob/master/.circleci/README.md">https://github.com/pytorch/pytorch/blob/master/.circleci/README.md</a></li><li>How to debug Windows problems on CircleCI <a href="https://github.com/pytorch/pytorch/wiki/Debugging-Windows-with-Remote-Desktop-or-CDB-(CLI-windbg)-on-CircleCI">https://github.com/pytorch/pytorch/wiki/Debugging-Windows-with-Remote-Desktop-or-CDB-(CLI-windbg)-on-CircleCI</a></li></ul>
]]></description>
      <pubDate>Wed, 26 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/continuous-integration-ZdlBe0Pa</link>
      <content:encoded><![CDATA[<p>How is our CI put together? What is the history of the CI? What constraints is the CI under? Why does the CI use Docker? Why are build and test split into two phases? Why are some parts of the CI so convoluted? How does the HUD work? What kinds of configurations is PyTorch tested under? How did we decide what configurations to test? What are some of the weird CI configurations? What's up with the XLA CI? What's going on with the Facebook internal builds?</p><p><strong>Further reading.</strong></p><ul><li>The CI HUD for viewing the status of master <a href="https://ezyang.github.io/pytorch-ci-hud/build/pytorch-master">https://ezyang.github.io/pytorch-ci-hud/build/pytorch-master</a></li><li>Structure of CI <a href="https://github.com/pytorch/pytorch/blob/master/.circleci/README.md">https://github.com/pytorch/pytorch/blob/master/.circleci/README.md</a></li><li>How to debug Windows problems on CircleCI <a href="https://github.com/pytorch/pytorch/wiki/Debugging-Windows-with-Remote-Desktop-or-CDB-(CLI-windbg)-on-CircleCI">https://github.com/pytorch/pytorch/wiki/Debugging-Windows-with-Remote-Desktop-or-CDB-(CLI-windbg)-on-CircleCI</a></li></ul>
]]></content:encoded>
      <enclosure length="16201880" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/83949ab7-63bb-483d-ba96-f453996bca35/audio/8bdb6ae3-56f4-4c85-9dc8-8b2520c46092/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Continuous integration</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:16:53</itunes:duration>
      <itunes:summary>How is our CI put together? What is the history of the CI? What constraints is the CI under? Why does the CI use Docker? Why are build and test split into two phases? Why are some parts of the CI so convoluted? How does the HUD work? What kinds of configurations is PyTorch tested under? How did we decide what configurations to test? What are some of the weird CI configurations? What&apos;s up with the XLA CI? What&apos;s going on with the Facebook internal builds?</itunes:summary>
      <itunes:subtitle>How is our CI put together? What is the history of the CI? What constraints is the CI under? Why does the CI use Docker? Why are build and test split into two phases? Why are some parts of the CI so convoluted? How does the HUD work? What kinds of configurations is PyTorch tested under? How did we decide what configurations to test? What are some of the weird CI configurations? What&apos;s up with the XLA CI? What&apos;s going on with the Facebook internal builds?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>17</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">203169a3-8055-4b42-9c15-5038d76a04ad</guid>
      <title>Stacked diffs and ghstack</title>
      <description><![CDATA[<p>What's a stacked diff? Why might you want to do it? What does the workflow for stacked diffs with ghstack look like? How do I use interactive rebase to edit earlier diffs in my stack? How can you actually submit a stacked diff to PyTorch? What are some things to be aware of when using ghstack?</p><p><strong>Further reading.</strong></p><ul><li>The ghstack repository <a href="https://github.com/ezyang/ghstack/">https://github.com/ezyang/ghstack/</a></li><li>A decent explanation of how the stacked diff workflow works on Phabricator, including how to do rebases <a href="https://kurtisnusbaum.medium.com/stacked-diffs-keeping-phabricator-diffs-small-d9964f4dcfa6">https://kurtisnusbaum.medium.com/stacked-diffs-keeping-phabricator-diffs-small-d9964f4dcfa6</a></li></ul>
]]></description>
      <pubDate>Tue, 25 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/stacked-diffs-and-ghstack-Xhg_sN9W</link>
      <content:encoded><![CDATA[<p>What's a stacked diff? Why might you want to do it? What does the workflow for stacked diffs with ghstack look like? How do I use interactive rebase to edit earlier diffs in my stack? How can you actually submit a stacked diff to PyTorch? What are some things to be aware of when using ghstack?</p><p><strong>Further reading.</strong></p><ul><li>The ghstack repository <a href="https://github.com/ezyang/ghstack/">https://github.com/ezyang/ghstack/</a></li><li>A decent explanation of how the stacked diff workflow works on Phabricator, including how to do rebases <a href="https://kurtisnusbaum.medium.com/stacked-diffs-keeping-phabricator-diffs-small-d9964f4dcfa6">https://kurtisnusbaum.medium.com/stacked-diffs-keeping-phabricator-diffs-small-d9964f4dcfa6</a></li></ul>
]]></content:encoded>
      <enclosure length="11630747" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/495cd0bc-ca62-4d72-8d2b-152d3b3bd6e7/audio/ddf02f4f-3cdd-4c11-b5b3-6c35f0a5f3eb/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Stacked diffs and ghstack</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:12:07</itunes:duration>
      <itunes:summary>What&apos;s a stacked diff? Why might you want to do it? What does the workflow for stacked diffs with ghstack look like? How do I use interactive rebase to edit earlier diffs in my stack? How can you actually submit a stacked diff to PyTorch? What are some things to be aware of when using ghstack?</itunes:summary>
      <itunes:subtitle>What&apos;s a stacked diff? Why might you want to do it? What does the workflow for stacked diffs with ghstack look like? How do I use interactive rebase to edit earlier diffs in my stack? How can you actually submit a stacked diff to PyTorch? What are some things to be aware of when using ghstack?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>16</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">ceffdd90-4460-43d1-9c32-d82ba5d55a8d</guid>
      <title>Shared memory</title>
      <description><![CDATA[<p>What is shared memory? How is it used in your operating system? How is it used in PyTorch? What's shared memory good for in deep learning? Why use multiple processes rather than one process on a single node? What's the point of PyTorch's shared memory manager? How are allocators for shared memory implemented? How does CUDA shared memory work? What is the difference between CUDA shared memory and CPU shared memory? How did we implement safer CUDA shared memory?</p><p><strong>Further reading.</strong></p><ul><li>Implementations of vanilla shared memory allocator <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/TH/THAllocator.cpp">https://github.com/pytorch/pytorch/blob/master/aten/src/TH/THAllocator.cpp</a> and the fancy managed allocator <a href="https://github.com/pytorch/pytorch/blob/master/torch/lib/libshm/libshm.h">https://github.com/pytorch/pytorch/blob/master/torch/lib/libshm/libshm.h</a></li><li>Multiprocessing best practices describes some things one should be careful about when working with shared memory <a href="https://pytorch.org/docs/stable/notes/multiprocessing.html">https://pytorch.org/docs/stable/notes/multiprocessing.html</a></li><li>More details on how CUDA shared memory works <a href="https://pytorch.org/docs/stable/multiprocessing.html#multiprocessing-cuda-sharing-details">https://pytorch.org/docs/stable/multiprocessing.html#multiprocessing-cuda-sharing-details</a></li></ul>
]]></description>
      <pubDate>Mon, 24 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/shared-memory-6krDLWSL</link>
      <content:encoded><![CDATA[<p>What is shared memory? How is it used in your operating system? How is it used in PyTorch? What's shared memory good for in deep learning? Why use multiple processes rather than one process on a single node? What's the point of PyTorch's shared memory manager? How are allocators for shared memory implemented? How does CUDA shared memory work? What is the difference between CUDA shared memory and CPU shared memory? How did we implement safer CUDA shared memory?</p><p><strong>Further reading.</strong></p><ul><li>Implementations of vanilla shared memory allocator <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/TH/THAllocator.cpp">https://github.com/pytorch/pytorch/blob/master/aten/src/TH/THAllocator.cpp</a> and the fancy managed allocator <a href="https://github.com/pytorch/pytorch/blob/master/torch/lib/libshm/libshm.h">https://github.com/pytorch/pytorch/blob/master/torch/lib/libshm/libshm.h</a></li><li>Multiprocessing best practices describes some things one should be careful about when working with shared memory <a href="https://pytorch.org/docs/stable/notes/multiprocessing.html">https://pytorch.org/docs/stable/notes/multiprocessing.html</a></li><li>More details on how CUDA shared memory works <a href="https://pytorch.org/docs/stable/multiprocessing.html#multiprocessing-cuda-sharing-details">https://pytorch.org/docs/stable/multiprocessing.html#multiprocessing-cuda-sharing-details</a></li></ul>
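<p><strong>Not from the episode:</strong> a minimal sketch of CPU shared memory in action with <code>torch.multiprocessing</code>. Because <code>share_memory_()</code> moves the storage into a shared-memory segment, the child process's in-place write is visible to the parent without any copying.</p><pre><code>import torch
import torch.multiprocessing as mp

def worker(t):
    # The child maps the same shared-memory storage, so this in-place
    # write is visible to the parent.
    t.add_(1)

if __name__ == "__main__":
    t = torch.zeros(3)
    t.share_memory_()  # move the storage into shared memory
    p = mp.Process(target=worker, args=(t,))
    p.start()
    p.join()
    print(t)  # tensor([1., 1., 1.])
</code></pre>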
]]></content:encoded>
      <enclosure length="10325903" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/2c16b1b3-8abe-454b-a27a-90d1d19890ca/audio/8a85f20c-d341-480c-adaa-82c07a6498a7/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Shared memory</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:10:45</itunes:duration>
      <itunes:summary>What is shared memory? How is it used in your operating system? How is it used in PyTorch? What&apos;s shared memory good for in deep learning? Why use multiple processes rather than one process on a single node? What&apos;s the point of PyTorch&apos;s shared memory manager? How are allocators for shared memory implemented? How does CUDA shared memory work? What is the difference between CUDA shared memory and CPU shared memory? How did we implement safer CUDA shared memory?</itunes:summary>
      <itunes:subtitle>What is shared memory? How is it used in your operating system? How is it used in PyTorch? What&apos;s shared memory good for in deep learning? Why use multiple processes rather than one process on a single node? What&apos;s the point of PyTorch&apos;s shared memory manager? How are allocators for shared memory implemented? How does CUDA shared memory work? What is the difference between CUDA shared memory and CPU shared memory? How did we implement safer CUDA shared memory?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>15</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">ffd7bf6d-a716-441a-8860-c156403f4f4e</guid>
      <title>Automatic mixed precision</title>
      <description><![CDATA[<p>What is automatic mixed precision? How is it implemented? What does it have to do with mode dispatch keys and fallthrough kernels? What are AMP policies? How is its cast caching implemented? How does torchvision also support AMP? What's up with Intel's CPU autocast implementation?</p><p><strong>Further reading.</strong></p><ul><li>Autocast implementation lives at <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/autocast_mode.cpp">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/autocast_mode.cpp</a></li><li>How to add autocast implementations to custom operators that are out of tree <a href="https://pytorch.org/tutorials/advanced/dispatcher.html#autocast">https://pytorch.org/tutorials/advanced/dispatcher.html#autocast</a></li><li>CPU autocasting PR <a href="https://github.com/pytorch/pytorch/pull/57386">https://github.com/pytorch/pytorch/pull/57386</a></li></ul>
]]></description>
      <pubDate>Fri, 21 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/automatic-mixed-precision-L2wJSVA_</link>
      <content:encoded><![CDATA[<p>What is automatic mixed precision? How is it implemented? What does it have to do with mode dispatch keys and fallthrough kernels? What are AMP policies? How is its cast caching implemented? How does torchvision also support AMP? What's up with Intel's CPU autocast implementation?</p><p><strong>Further reading.</strong></p><ul><li>Autocast implementation lives at <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/autocast_mode.cpp">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/autocast_mode.cpp</a></li><li>How to add autocast implementations to custom operators that are out of tree <a href="https://pytorch.org/tutorials/advanced/dispatcher.html#autocast">https://pytorch.org/tutorials/advanced/dispatcher.html#autocast</a></li><li>CPU autocasting PR <a href="https://github.com/pytorch/pytorch/pull/57386">https://github.com/pytorch/pytorch/pull/57386</a></li></ul>
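<p><strong>Not from the episode:</strong> a minimal sketch of the user-facing autocast API (it assumes a CUDA device); the episode is about how this behavior is implemented underneath via the dispatcher.</p><pre><code>import torch

model = torch.nn.Linear(4, 4).cuda()
x = torch.randn(8, 4, device="cuda")

with torch.cuda.amp.autocast():
    # Ops that AMP's policies list as half-precision-safe (like the
    # matmul inside Linear) run in float16; numerically sensitive ops
    # stay in float32.
    y = model(x)

print(y.dtype)  # torch.float16
</code></pre>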
]]></content:encoded>
      <enclosure length="13551131" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/0e095e80-bb02-42d8-9728-f15d3422a1d6/audio/46eeb798-944c-45d3-8974-e04f0123473d/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Automatic mixed precision</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:14:07</itunes:duration>
      <itunes:summary>What is automatic mixed precision? How is it implemented? What does it have to do with mode dispatch keys and fallthrough kernels? What are AMP policies? How is its cast caching implemented? How does torchvision also support AMP? What&apos;s up with Intel&apos;s CPU autocast implementation?</itunes:summary>
      <itunes:subtitle>What is automatic mixed precision? How is it implemented? What does it have to do with mode dispatch keys and fallthrough kernels? What are AMP policies? How is its cast caching implemented? How does torchvision also support AMP? What&apos;s up with Intel&apos;s CPU autocast implementation?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>14</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">31742461-8280-49b0-aa66-7428fb134902</guid>
      <title>Conjugate views</title>
      <description><![CDATA[<p>What are complex numbers? What is conjugation? Why is conjugation so common in linear algebra? Why would we like conjugation to behave similarly to transposition (and why is matrix multiply with a transposed input so fast?) What is a conjugate view? How is it implemented? What's the relationship between views, laziness and call-by-name evaluation?</p><p><strong>Further reading.</strong></p><ul><li>Pull request that adds conjugate views <a href="https://github.com/pytorch/pytorch/pull/54987">https://github.com/pytorch/pytorch/pull/54987</a></li><li>The idea of conjugate views originally came up when we were deciding which complex autograd convention to use in <a href="https://github.com/pytorch/pytorch/issues/41857">https://github.com/pytorch/pytorch/issues/41857</a>. PyTorch uses the conjugate Wirtinger derivative which, true to its name, involves a lot of conjugations in its formulas.</li><li>Conjugate views are a form of bidirectional lens. This nice presentation explains what the concept is <a href="https://www.cis.upenn.edu/~bcpierce/papers/lenses-etapsslides.pdf">https://www.cis.upenn.edu/~bcpierce/papers/lenses-etapsslides.pdf</a></li></ul>
]]></description>
      <pubDate>Thu, 20 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/conjugate-views-LJq4XPii</link>
      <content:encoded><![CDATA[<p>What are complex numbers? What is conjugation? Why is conjugation so common in linear algebra? Why would we like conjugation to behave similarly to transposition (and why is matrix multiply with a transposed input so fast?) What is a conjugate view? How is it implemented? What's the relationship between views, laziness and call-by-name evaluation?</p><p><strong>Further reading.</strong></p><ul><li>Pull request that adds conjugate views <a href="https://github.com/pytorch/pytorch/pull/54987">https://github.com/pytorch/pytorch/pull/54987</a></li><li>The idea of conjugate views originally came up when we were deciding which complex autograd convention to use in <a href="https://github.com/pytorch/pytorch/issues/41857">https://github.com/pytorch/pytorch/issues/41857</a>. PyTorch uses the conjugate Wirtinger derivative which, true to its name, involves a lot of conjugations in its formulas.</li><li>Conjugate views are a form of bidirectional lens. This nice presentation explains what the concept is <a href="https://www.cis.upenn.edu/~bcpierce/papers/lenses-etapsslides.pdf">https://www.cis.upenn.edu/~bcpierce/papers/lenses-etapsslides.pdf</a></li></ul>
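<p><strong>Not from the episode:</strong> a minimal sketch of what a conjugate view looks like from Python, assuming a PyTorch recent enough to include the pull request above. <code>conj()</code> is O(1): it returns a view with a conjugate bit set rather than materializing new memory.</p><pre><code>import torch

z = torch.tensor([1 + 2j, 3 - 4j])
zc = z.conj()        # a view with the conjugate bit set, not a copy
print(zc.is_conj())  # True

# resolve_conj() materializes the conjugation into real memory.
expected = torch.tensor([1 - 2j, 3 + 4j])
print(torch.equal(zc.resolve_conj(), expected))  # True
</code></pre>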
]]></content:encoded>
      <enclosure length="14838289" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/0fc710d3-6c0e-4553-99e1-26baa5971f27/audio/16c81316-d515-447b-bb15-85ae6540afa8/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Conjugate views</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:15:27</itunes:duration>
      <itunes:summary>What are complex numbers? What is conjugation? Why is conjugation so common in linear algebra? Why would we like conjugation to behave similarly to transposition (and why is matrix multiply with a transposed input so fast?) What is a conjugate view? How is it implemented? What&apos;s the relationship between views, laziness and call-by-name evaluation?</itunes:summary>
      <itunes:subtitle>What are complex numbers? What is conjugation? Why is conjugation so common in linear algebra? Why would we like conjugation to behave similarly to transposition (and why is matrix multiply with a transposed input so fast?) What is a conjugate view? How is it implemented? What&apos;s the relationship between views, laziness and call-by-name evaluation?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>13</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">749c069f-4405-4dc3-a951-0fe391a7d509</guid>
      <title>History and constraints of Tensor</title>
      <description><![CDATA[<p>What historical constraints and design choices led to the design of Tensor/Storage (and their Impl variants) as they are today? Why do we use intrusive refcounting? Why are we trying to get rid of virtual methods on TensorImpl? Why are there so many frickin' bitfields?</p><p><strong>Further reading.</strong></p><ul><li>PyTorch internals blog post <a href="http://blog.ezyang.com/2019/05/pytorch-internals/">http://blog.ezyang.com/2019/05/pytorch-internals/</a></li><li>Writing Python in C++, a manifesto <a href="https://github.com/pytorch/pytorch/wiki/Writing-Python-in-cpp-(a-manifesto)">https://github.com/pytorch/pytorch/wiki/Writing-Python-in-cpp-(a-manifesto)</a></li><li>At time of writing, the breakdown of all fields on TensorImpl <a href="https://github.com/pytorch/pytorch/blob/71f4c5c1f436258adc303b710efb3f41b2d50c4e/c10/core/TensorImpl.h#L2155-L2177">https://github.com/pytorch/pytorch/blob/71f4c5c1f436258adc303b710efb3f41b2d50c4e/c10/core/TensorImpl.h#L2155-L2177</a></li></ul>
]]></description>
      <pubDate>Wed, 19 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/history-and-constraints-of-tensor-f9aVaL5z</link>
      <content:encoded><![CDATA[<p>What historical constraints and design choices led to the design of Tensor/Storage (and their Impl variants) as they are today? Why do we use intrusive refcounting? Why are we trying to get rid of virtual methods on TensorImpl? Why are there so many frickin' bitfields?</p><p><strong>Further reading.</strong></p><ul><li>PyTorch internals blog post <a href="http://blog.ezyang.com/2019/05/pytorch-internals/">http://blog.ezyang.com/2019/05/pytorch-internals/</a></li><li>Writing Python in C++, a manifesto <a href="https://github.com/pytorch/pytorch/wiki/Writing-Python-in-cpp-(a-manifesto)">https://github.com/pytorch/pytorch/wiki/Writing-Python-in-cpp-(a-manifesto)</a></li><li>At time of writing, the breakdown of all fields on TensorImpl <a href="https://github.com/pytorch/pytorch/blob/71f4c5c1f436258adc303b710efb3f41b2d50c4e/c10/core/TensorImpl.h#L2155-L2177">https://github.com/pytorch/pytorch/blob/71f4c5c1f436258adc303b710efb3f41b2d50c4e/c10/core/TensorImpl.h#L2155-L2177</a></li></ul>
]]></content:encoded>
      <enclosure length="14201251" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/0691b2e8-e91e-438d-8198-1a6879123bfd/audio/02fd7e9f-7bcb-4ed9-b61e-dab3ae8570bc/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>History and constraints of Tensor</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:14:48</itunes:duration>
      <itunes:summary>What historical constraints and design choices led to the design of Tensor/Storage (and their Impl variants) as they are today? Why do we use intrusive refcounting? Why are we trying to get rid of virtual methods on TensorImpl? Why are there so many frickin&apos; bitfields?</itunes:summary>
      <itunes:subtitle>What historical constraints and design choices led to the design of Tensor/Storage (and their Impl variants) as they are today? Why do we use intrusive refcounting? Why are we trying to get rid of virtual methods on TensorImpl? Why are there so many frickin&apos; bitfields?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>12</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">c0820562-3703-40c4-a71c-0edcf8b34a06</guid>
      <title>How new operators are authored</title>
      <description><![CDATA[<p>What's the general process by which a new operator is added to PyTorch? Why is this actually something of a rare occurrence? How do you integrate an operator with the rest of PyTorch's system so it can be run end-to-end? What should I expect if I'm writing a CPU and CUDA kernel? What tools are available to me to make the job easier? How can I debug my kernels? How do I test them?</p><p><strong>Further reading.</strong></p><ul><li>The README for the native/ directory, where all kernels get put <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md</a></li><li>A high level overview of how TensorIterator works <a href="https://labs.quansight.org/blog/2020/04/pytorch-tensoriterator-internals/">https://labs.quansight.org/blog/2020/04/pytorch-tensoriterator-internals/</a></li><li>Where OpInfos live <a href="https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_methods_invocations.py">https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_methods_invocations.py</a></li></ul>
]]></description>
      <pubDate>Tue, 18 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/how-new-operators-are-authored-jINCpF_S</link>
      <content:encoded><![CDATA[<p>What's the general process by which a new operator is added to PyTorch? Why is this actually something of a rare occurrence? How do you integrate an operator with the rest of PyTorch's system so it can be run end-to-end? What should I expect if I'm writing a CPU and CUDA kernel? What tools are available to me to make the job easier? How can I debug my kernels? How do I test them?</p><p><strong>Further reading.</strong></p><ul><li>The README for the native/ directory, where all kernels get put <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md</a></li><li>A high level overview of how TensorIterator works <a href="https://labs.quansight.org/blog/2020/04/pytorch-tensoriterator-internals/">https://labs.quansight.org/blog/2020/04/pytorch-tensoriterator-internals/</a></li><li>Where OpInfos live <a href="https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_methods_invocations.py">https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_methods_invocations.py</a></li></ul>
]]></content:encoded>
      <enclosure length="14925472" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/d1ce0783-001a-4dbf-bc4c-5c9dc79e6e48/audio/452181ab-5974-4c9b-b522-11e96c6c9f0c/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>How new operators are authored</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:15:33</itunes:duration>
      <itunes:summary>What&apos;s the general process by which a new operator is added to PyTorch? Why is this actually something of a rare occurrence? How do you integrate an operator with the rest of PyTorch&apos;s system so it can be run end-to-end? What should I expect if I&apos;m writing a CPU and CUDA kernel? What tools are available to me to make the job easier? How can I debug my kernels? How do I test them?</itunes:summary>
      <itunes:subtitle>What&apos;s the general process by which a new operator is added to PyTorch? Why is this actually something of a rare occurrence? How do you integrate an operator with the rest of PyTorch&apos;s system so it can be run end-to-end? What should I expect if I&apos;m writing a CPU and CUDA kernel? What tools are available to me to make the job easier? How can I debug my kernels? How do I test them?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>11</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">8dfb4187-e2c3-4aff-b23f-dbebcac7faf9</guid>
      <title>The life and death of Variable</title>
      <description><![CDATA[<p>What is a Variable? Why did it exist as a wrapper in the first place? Why did it get removed? How did we remove it? What are some of the lingering consequences of its removal?</p><p><strong>Further reading:</strong></p><ul><li>The release notes of PyTorch 0.4 do a good job explaining the user visible consequences of the removal, at the time, including how we "simulate" concepts on Variable that don't make sense anymore <a href="https://pytorch.org/blog/pytorch-0_4_0-migration-guide/">https://pytorch.org/blog/pytorch-0_4_0-migration-guide/</a></li><li>Part 1: Removal of Variable wrapper in C++ <a href="https://github.com/pytorch/pytorch/pull/17072">https://github.com/pytorch/pytorch/pull/17072</a></li><li>Part 2: Merge of Variable and Tensor types in C++ <a href="https://github.com/pytorch/pytorch/pull/28620">https://github.com/pytorch/pytorch/pull/28620</a></li></ul>
]]></description>
      <pubDate>Mon, 17 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/the-life-and-death-of-variable-hDN62BiG</link>
      <content:encoded><![CDATA[<p>What is a Variable? Why did it exist as a wrapper in the first place? Why did it get removed? How did we remove it? What are some of the lingering consequences of its removal?</p><p><strong>Further reading:</strong></p><ul><li>The release notes of PyTorch 0.4 do a good job explaining the user visible consequences of the removal, at the time, including how we "simulate" concepts on Variable that don't make sense anymore <a href="https://pytorch.org/blog/pytorch-0_4_0-migration-guide/">https://pytorch.org/blog/pytorch-0_4_0-migration-guide/</a></li><li>Part 1: Removal of Variable wrapper in C++ <a href="https://github.com/pytorch/pytorch/pull/17072">https://github.com/pytorch/pytorch/pull/17072</a></li><li>Part 2: Merge of Variable and Tensor types in C++ <a href="https://github.com/pytorch/pytorch/pull/28620">https://github.com/pytorch/pytorch/pull/28620</a></li></ul>
]]></content:encoded>
      <enclosure length="14864416" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/7763441e-17f4-4e85-97f8-2b061e4c2f60/audio/cc02b86b-9484-4a36-a0f2-ad6b825eea13/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>The life and death of Variable</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:15:29</itunes:duration>
      <itunes:summary>What is a Variable? Why did it exist as a wrapper in the first place? Why did it get removed? How did we remove it? What are some of the lingering consequences of its removal?</itunes:summary>
      <itunes:subtitle>What is a Variable? Why did it exist as a wrapper in the first place? Why did it get removed? How did we remove it? What are some of the lingering consequences of its removal?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>10</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">8b3ff37a-8feb-447d-a22b-37bd24e6be95</guid>
      <title>Backend extensibility</title>
      <description><![CDATA[<p>What's the current state of backend extensibility? How did PyTorch evolve from being a CPU- and CUDA-only framework to also support AMD ROCm and XLA? What are some problems with adding an out-of-tree backend, and what work is being done to make it better?</p><p><strong>Further reading:</strong></p><ul><li>Script for HIPifying PyTorch's source when enabling ROCm <a href="https://github.com/pytorch/pytorch/blob/master/tools/amd_build/build_amd.py">https://github.com/pytorch/pytorch/blob/master/tools/amd_build/build_amd.py</a></li><li>PyTorch/XLA <a href="https://github.com/pytorch/xla/">https://github.com/pytorch/xla/</a></li><li>Brian Hirsh's spec on what out-of-tree backend codegen looks like <a href="https://github.com/pytorch/xla/issues/2871">https://github.com/pytorch/xla/issues/2871</a></li></ul>
]]></description>
      <pubDate>Fri, 14 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/backend-extensibility-966RCMHR</link>
      <content:encoded><![CDATA[<p>What's the current state of backend extensibility? How did PyTorch evolve from being a CPU- and CUDA-only framework to also support AMD ROCm and XLA? What are some problems with adding an out-of-tree backend, and what work is being done to make it better?</p><p><strong>Further reading:</strong></p><ul><li>Script for HIPifying PyTorch's source when enabling ROCm <a href="https://github.com/pytorch/pytorch/blob/master/tools/amd_build/build_amd.py">https://github.com/pytorch/pytorch/blob/master/tools/amd_build/build_amd.py</a></li><li>PyTorch/XLA <a href="https://github.com/pytorch/xla/">https://github.com/pytorch/xla/</a></li><li>Brian Hirsh's spec on what out-of-tree backend codegen looks like <a href="https://github.com/pytorch/xla/issues/2871">https://github.com/pytorch/xla/issues/2871</a></li></ul>
]]></content:encoded>
      <enclosure length="14650135" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/56047b49-e69d-44de-8d7d-cf6a8d8bd510/audio/c3768513-db18-49e8-a28f-1155b5b4934a/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Backend extensibility</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:15:16</itunes:duration>
      <itunes:summary>What&apos;s the current state of backend extensibility? How did PyTorch evolve from being a CPU- and CUDA-only framework to also support AMD ROCm and XLA? What are some problems with adding an out-of-tree backend, and what work is being done to make it better?</itunes:summary>
      <itunes:subtitle>What&apos;s the current state of backend extensibility? How did PyTorch evolve from being a CPU- and CUDA-only framework to also support AMD ROCm and XLA? What are some problems with adding an out-of-tree backend, and what work is being done to make it better?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>9</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">ac1a9f7b-bf79-45be-a31d-aa80ae4b3324</guid>
      <title>The road to structured kernels</title>
      <description><![CDATA[<p>Structured kernels are a new way to write kernels in PyTorch. Why did they take so long? What finally convinced us that we should do them? Why did it end up taking me the better part of a year to only be half done with them?</p><p><strong>Further reading:</strong></p><ul><li>Structured kernels RFC <a href="https://github.com/pytorch/rfcs/blob/rfc-0005/RFC-0005-structured-kernel-definitions.md">https://github.com/pytorch/rfcs/blob/rfc-0005/RFC-0005-structured-kernel-definitions.md</a></li><li>Taxonomy of PyTorch operators by shape behavior <a href="http://blog.ezyang.com/2020/05/a-brief-taxonomy-of-pytorch-operators-by-shape-behavior/">http://blog.ezyang.com/2020/05/a-brief-taxonomy-of-pytorch-operators-by-shape-behavior/</a></li><li>Bram Wasti's lazy tensor prototype <a href="https://github.com/pytorch/pytorch/pull/25753">https://github.com/pytorch/pytorch/pull/25753</a></li></ul>
]]></description>
      <pubDate>Thu, 13 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/the-road-to-structured-kernels-L5l36Z5V</link>
      <content:encoded><![CDATA[<p>Structured kernels are a new way to write kernels in PyTorch. Why did they take so long? What finally convinced us that we should do them? Why did it end up taking me the better part of a year to only be half done with them?</p><p><strong>Further reading:</strong></p><ul><li>Structured kernels RFC <a href="https://github.com/pytorch/rfcs/blob/rfc-0005/RFC-0005-structured-kernel-definitions.md">https://github.com/pytorch/rfcs/blob/rfc-0005/RFC-0005-structured-kernel-definitions.md</a></li><li>Taxonomy of PyTorch operators by shape behavior <a href="http://blog.ezyang.com/2020/05/a-brief-taxonomy-of-pytorch-operators-by-shape-behavior/">http://blog.ezyang.com/2020/05/a-brief-taxonomy-of-pytorch-operators-by-shape-behavior/</a></li><li>Bram Wasti's lazy tensor prototype <a href="https://github.com/pytorch/pytorch/pull/25753">https://github.com/pytorch/pytorch/pull/25753</a></li></ul>
]]></content:encoded>
      <enclosure length="15940000" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/5836765c-d5b0-4b1d-9e43-490d5d46e110/audio/7cd64d4c-2e90-49a2-bc1d-5c37f2807a9c/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>The road to structured kernels</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:16:36</itunes:duration>
      <itunes:summary>Structured kernels are a new way to write kernels in PyTorch. Why did they take so long? What finally convinced us that we should do them? Why did it end up taking me the better part of a year to only be half done with them?</itunes:summary>
      <itunes:subtitle>Structured kernels are a new way to write kernels in PyTorch. Why did they take so long? What finally convinced us that we should do them? Why did it end up taking me the better part of a year to only be half done with them?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>8</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">b829d386-1ec5-4c7a-86f1-894df1a89449</guid>
      <title>Functionalization</title>
      <description><![CDATA[<p>Functionalization is the process by which we remove mutation from autograd graphs in PyTorch, leaving us with a purely functional graph that we can execute in the normal way. Why do we need to do functionalization? What makes it not so easy to do? How do we do it? And how does it compare to mutation removal that you might see in a compiler?</p><p><strong>Further reading:</strong></p><ul><li>Section 3.1 of this paper on PyTorch AD <a href="https://openreview.net/pdf/25b8eee6c373d48b84e5e9c6e10e7cbbbce4ac73.pdf">https://openreview.net/pdf/25b8eee6c373d48b84e5e9c6e10e7cbbbce4ac73.pdf</a> predates our implementation of inplace autograd but accurately reports the subtleties and correctly predicts the implementation strategy we ended up taking</li><li>RFC to generalize the functionalization mechanism to be available to arbitrary backends <a href="https://github.com/pytorch/rfcs/pull/19">https://github.com/pytorch/rfcs/pull/19</a></li><li>Code that handles lazily updating views when the base is updated <a href="https://github.com/pytorch/pytorch/blob/e5e095cbe4dbc5a601f98e6134dcbd59c6342d7d/torch/csrc/autograd/variable.cpp#L556-L603">https://github.com/pytorch/pytorch/blob/e5e095cbe4dbc5a601f98e6134dcbd59c6342d7d/torch/csrc/autograd/variable.cpp#L556-L603</a></li></ul>
]]></description>
      <pubDate>Wed, 12 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/functionalization-t_HiJlOu</link>
      <content:encoded><![CDATA[<p>Functionalization is the process by which we remove mutation from autograd graphs in PyTorch, leaving us with a purely functional graph that we can execute in the normal way. Why do we need to do functionalization? What makes it not so easy to do? How do we do it? And how does it compare to mutation removal that you might see in a compiler?</p><p><strong>Further reading:</strong></p><ul><li>Section 3.1 of this paper on PyTorch AD <a href="https://openreview.net/pdf/25b8eee6c373d48b84e5e9c6e10e7cbbbce4ac73.pdf">https://openreview.net/pdf/25b8eee6c373d48b84e5e9c6e10e7cbbbce4ac73.pdf</a> predates our implementation of inplace autograd but accurately reports the subtleties and correctly predicts the implementation strategy we ended up taking</li><li>RFC to generalize the functionalization mechanism to be available to arbitrary backends <a href="https://github.com/pytorch/rfcs/pull/19">https://github.com/pytorch/rfcs/pull/19</a></li><li>Code that handles lazily updating views when the base is updated <a href="https://github.com/pytorch/pytorch/blob/e5e095cbe4dbc5a601f98e6134dcbd59c6342d7d/torch/csrc/autograd/variable.cpp#L556-L603">https://github.com/pytorch/pytorch/blob/e5e095cbe4dbc5a601f98e6134dcbd59c6342d7d/torch/csrc/autograd/variable.cpp#L556-L603</a></li></ul>
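<p><strong>Not from the episode:</strong> a hand-written sketch of the transformation functionalization performs (not PyTorch's actual pass). Every in-place op is replaced with its out-of-place variant, and views of a mutated base are recomputed from the new value, so the two programs compute the same result.</p><pre><code>import torch

def f(x):
    # Program with mutation: y aliases x, so the in-place add_
    # is visible through y.
    y = x.view(-1)
    x.add_(1)
    return y.sum()

def f_functionalized(x):
    # Equivalent mutation-free program.
    x1 = x.add(1)     # out-of-place variant of add_
    y1 = x1.view(-1)  # the view is recomputed from the new base
    return y1.sum()

x = torch.zeros(2, 2)
assert f(x.clone()) == f_functionalized(x.clone())
</code></pre>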
]]></content:encoded>
      <enclosure length="13528673" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/d8586cf9-3bc7-4c6e-bfcc-412291547967/audio/caa5fa7e-e879-4e39-90cb-3494edb6b176/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Functionalization</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:14:05</itunes:duration>
      <itunes:summary>Functionalization is the process by which we remove mutation from autograd graphs in PyTorch, leaving us with a purely functional graph that we can execute in the normal way. Why do we need to do functionalization? What makes it not so easy to do? How do we do it? And how does it compare to mutation removal that you might see in a compiler?</itunes:summary>
      <itunes:subtitle>Functionalization is the process by which we remove mutation from autograd graphs in PyTorch, leaving us with a purely functional graph that we can execute in the normal way. Why do we need to do functionalization? What makes it not so easy to do? How do we do it? And how does it compare to mutation removal that you might see in a compiler?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>7</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">c71e04ad-c7bc-473a-9f07-fb7c103546f7</guid>
      <title>Just enough CUDA to be dangerous</title>
      <description><![CDATA[<p>Ever wanted to learn about CUDA but not sure where to start? In this sixteen-minute episode I try to jam in as much CUDA knowledge as could be reasonably expected in a podcast. You won't know how to write a kernel after this episode, but you'll know what a GPU is, what the general CUDA programming model is, why asynchronous execution makes everything complicated, and some general principles PyTorch abides by when designing CUDA kernels.</p><p><strong>Further reading:</strong></p><ul><li>PyTorch docs on CUDA semantics <a href="https://pytorch.org/docs/stable/notes/cuda.html">https://pytorch.org/docs/stable/notes/cuda.html</a></li><li>The book I was recommended for learning CUDA when I first showed up at PyTorch: Programming Massively Parallel Processors <a href="https://www.amazon.com/Programming-Massively-Parallel-Processors-Hands/dp/0128119861">https://www.amazon.com/Programming-Massively-Parallel-Processors-Hands/dp/0128119861</a></li><li>The environment variable that makes CUDA synchronous is CUDA_LAUNCH_BLOCKING=1. cuda-memcheck is also useful for debugging CUDA problems <a href="https://docs.nvidia.com/cuda/cuda-memcheck/index.html">https://docs.nvidia.com/cuda/cuda-memcheck/index.html</a></li></ul>
]]></description>
      <pubDate>Tue, 11 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/just-enough-cuda-to-be-dangerous-4J1DrjLZ</link>
      <content:encoded><![CDATA[<p>Ever wanted to learn about CUDA but not sure where to start? In this sixteen-minute episode I try to jam in as much CUDA knowledge as could be reasonably expected in a podcast. You won't know how to write a kernel after this episode, but you'll know what a GPU is, what the general CUDA programming model is, why asynchronous execution makes everything complicated, and some general principles PyTorch abides by when designing CUDA kernels.</p><p><strong>Further reading:</strong></p><ul><li>PyTorch docs on CUDA semantics <a href="https://pytorch.org/docs/stable/notes/cuda.html">https://pytorch.org/docs/stable/notes/cuda.html</a></li><li>The book I was recommended for learning CUDA when I first showed up at PyTorch: Programming Massively Parallel Processors <a href="https://www.amazon.com/Programming-Massively-Parallel-Processors-Hands/dp/0128119861">https://www.amazon.com/Programming-Massively-Parallel-Processors-Hands/dp/0128119861</a></li><li>The environment variable that makes CUDA synchronous is CUDA_LAUNCH_BLOCKING=1. cuda-memcheck is also useful for debugging CUDA problems <a href="https://docs.nvidia.com/cuda/cuda-memcheck/index.html">https://docs.nvidia.com/cuda/cuda-memcheck/index.html</a></li></ul>
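<p><strong>Not from the episode:</strong> a minimal illustration (it assumes a CUDA device) of the asynchronous execution discussed above. Kernel launches are queued on a stream and return immediately, so naive wall-clock timing mostly measures the launch; you need a synchronize to time the kernel itself.</p><pre><code>import time

import torch

x = torch.randn(4096, 4096, device="cuda")

start = time.time()
y = x @ x                 # queued on the CUDA stream, returns immediately
t_launch = time.time() - start

torch.cuda.synchronize()  # block until the kernel actually finishes
t_total = time.time() - start
print(f"launch: {t_launch:.6f}s, after sync: {t_total:.6f}s")
</code></pre>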
]]></content:encoded>
      <enclosure length="15868962" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/9ee6bfc2-07d2-48e4-89d1-16f80205a57f/audio/4424daf1-4a27-4d95-b1eb-5ddd4534755d/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Just enough CUDA to be dangerous</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:16:32</itunes:duration>
      <itunes:summary>Ever wanted to learn about CUDA but not sure where to start? In this sixteen-minute episode I try to jam in as much CUDA knowledge as could be reasonably expected in a podcast. You won&apos;t know how to write a kernel after this episode, but you&apos;ll know what a GPU is, what the general CUDA programming model is, why asynchronous execution makes everything complicated, and some general principles PyTorch abides by when designing CUDA kernels.</itunes:summary>
      <itunes:subtitle>Ever wanted to learn about CUDA but not sure where to start? In this sixteen-minute episode I try to jam in as much CUDA knowledge as could be reasonably expected in a podcast. You won&apos;t know how to write a kernel after this episode, but you&apos;ll know what a GPU is, what the general CUDA programming model is, why asynchronous execution makes everything complicated, and some general principles PyTorch abides by when designing CUDA kernels.</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>6</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">9ef6973d-6666-42f2-8814-ea706bb567dd</guid>
      <title>Inference mode</title>
      <description><![CDATA[<p>What's inference mode? Why doesn't my code run fast if I use no_grad or make sure requires_grad=False? How come inference mode is safe but AutoNonVariableTypeMode is not?</p><p><strong>Further reading:</strong></p><ul><li>Inference mode RFC <a href="https://github.com/ailzhang/rfcs/blob/rfc0011/RFC-0011-InferenceMode.md" target="_blank">https://github.com/ailzhang/rfcs/blob/rfc0011/RFC-0011-InferenceMode.md</a></li><li>Inference mode docs for C++ frontend users <a href="https://github.com/pytorch/pytorch/blob/master/docs/cpp/source/notes/inference_mode.rst" target="_blank">https://github.com/pytorch/pytorch/blob/master/docs/cpp/source/notes/inference_mode.rst</a></li><li>Tracking issue for Python frontend support <a href="https://github.com/pytorch/pytorch/issues/56608" target="_blank">https://github.com/pytorch/pytorch/issues/56608</a></li></ul>
]]></description>
      <pubDate>Mon, 10 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/inference-mode-yO3SBXRB</link>
      <content:encoded><![CDATA[<p>What's inference mode? Why doesn't my code run fast if I use no_grad or make sure requires_grad=False? How come inference mode is safe but AutoNonVariableTypeMode is not?</p><p><strong>Further reading:</strong></p><ul><li>Inference mode RFC <a href="https://github.com/ailzhang/rfcs/blob/rfc0011/RFC-0011-InferenceMode.md" target="_blank">https://github.com/ailzhang/rfcs/blob/rfc0011/RFC-0011-InferenceMode.md</a></li><li>Inference mode docs for C++ frontend users <a href="https://github.com/pytorch/pytorch/blob/master/docs/cpp/source/notes/inference_mode.rst" target="_blank">https://github.com/pytorch/pytorch/blob/master/docs/cpp/source/notes/inference_mode.rst</a></li><li>Tracking issue for Python frontend support <a href="https://github.com/pytorch/pytorch/issues/56608" target="_blank">https://github.com/pytorch/pytorch/issues/56608</a></li></ul>
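<p><strong>Not from the episode:</strong> the Python frontend tracked above has since landed as <code>torch.inference_mode</code>; here is a minimal sketch of it next to <code>no_grad</code>. Tensors made under inference mode skip the version counter and view bookkeeping that autograd normally records, which is what makes them cheaper, and also why autograd refuses to use them later.</p><pre><code>import torch

x = torch.ones(3, requires_grad=True)

with torch.no_grad():
    y = x * 2  # grad is disabled, but y is still a normal tensor

with torch.inference_mode():
    z = x * 2  # z is an inference tensor: no version counter or view tracking

print(y.requires_grad, z.requires_grad)  # False False
# Using z in an autograd-recorded computation later raises an error.
</code></pre>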
]]></content:encoded>
      <enclosure length="13864080" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/599d8108-78df-4d4d-a0d3-cd031e54424f/audio/c53ee8f4-b7c1-45ff-a9b2-d51649b17cec/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Inference mode</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:14:26</itunes:duration>
      <itunes:summary>What&apos;s inference mode? Why doesn&apos;t my code run fast if I use no_grad or make sure requires_grad=False? How come inference mode is safe but AutoNonVariableTypeMode is not?</itunes:summary>
      <itunes:subtitle>What&apos;s inference mode? Why doesn&apos;t my code run fast if I use no_grad or make sure requires_grad=False? How come inference mode is safe but AutoNonVariableTypeMode is not?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>5</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">cfbb017b-9fe9-4296-b8a1-e7c53ee71f8e</guid>
      <title>Vectorization</title>
      <description><![CDATA[<p>What is vectorization? How do you use it in PyTorch? What are some of the traps and pitfalls of writing vectorized code in PyTorch?</p><p><strong>Further reading:</strong></p><ul><li>native/cpu README <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/README.md">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/README.md</a></li><li>Vec256 classes <a href="https://github.com/pytorch/pytorch/tree/master/aten/src/ATen/cpu/vec256">https://github.com/pytorch/pytorch/tree/master/aten/src/ATen/cpu/vec256</a></li><li>AVX512 support tracking issue <a href="https://github.com/pytorch/pytorch/issues/56187">https://github.com/pytorch/pytorch/issues/56187</a></li></ul>
]]></description>
      <pubDate>Fri, 7 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/vectorization-PgOmFRTZ</link>
      <content:encoded><![CDATA[<p>What is vectorization? How do you use it in PyTorch? What are some of the traps and pitfalls of writing vectorized code in PyTorch?</p><p><strong>Further reading:</strong></p><ul><li>native/cpu README <a href="https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/README.md">https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/README.md</a></li><li>Vec256 classes <a href="https://github.com/pytorch/pytorch/tree/master/aten/src/ATen/cpu/vec256">https://github.com/pytorch/pytorch/tree/master/aten/src/ATen/cpu/vec256</a></li><li>AVX512 support tracking issue <a href="https://github.com/pytorch/pytorch/issues/56187">https://github.com/pytorch/pytorch/issues/56187</a></li></ul>
]]></content:encoded>
      <enclosure length="13999631" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/13b2ec56-3c90-49d3-adf7-d2a9c4d34536/audio/1347c312-8c80-4b07-8f01-3a07bbf0b6f5/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Vectorization</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:14:35</itunes:duration>
      <itunes:summary>What is vectorization? How do you use it in PyTorch? What are some of the traps and pitfalls of writing vectorized code in PyTorch?</itunes:summary>
      <itunes:subtitle>What is vectorization? How do you use it in PyTorch? What are some of the traps and pitfalls of writing vectorized code in PyTorch?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>4</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">d0634a3a-7f00-448e-a7ce-1a3d0c3c76af</guid>
      <title>Dynamic library structure</title>
      <description><![CDATA[<p>Why is PyTorch split into so many libraries? What's the point of these splits? What do Windows, mobile and CUDA have to do with the library splits?</p><p><strong>Further reading:</strong></p><ul><li>c10 folder architecture description <a href="https://github.com/pytorch/pytorch/wiki/Software-Architecture-for-c10">https://github.com/pytorch/pytorch/wiki/Software-Architecture-for-c10</a></li><li>Implementation of the TORCH_API visibility macros <a href="https://github.com/pytorch/pytorch/blob/master/c10/macros/Export.h">https://github.com/pytorch/pytorch/blob/master/c10/macros/Export.h</a></li><li>An example of a virtual-call-based hook to break library structure <a href="https://github.com/pytorch/pytorch/blob/master/c10/core/impl/DeviceGuardImplInterface.h">https://github.com/pytorch/pytorch/blob/master/c10/core/impl/DeviceGuardImplInterface.h</a></li></ul>
]]></description>
      <pubDate>Thu, 6 May 2021 13:00:00 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/dynamic-library-structure-kvyf_a_J</link>
      <content:encoded><![CDATA[<p>Why is PyTorch split into so many libraries? What's the point of these splits? What do Windows, mobile and CUDA have to do with the library splits?</p><p><strong>Further reading:</strong></p><ul><li>c10 folder architecture description <a href="https://github.com/pytorch/pytorch/wiki/Software-Architecture-for-c10">https://github.com/pytorch/pytorch/wiki/Software-Architecture-for-c10</a></li><li>Implementation of the TORCH_API visibility macros <a href="https://github.com/pytorch/pytorch/blob/master/c10/macros/Export.h">https://github.com/pytorch/pytorch/blob/master/c10/macros/Export.h</a></li><li>An example of a virtual-call-based hook to break library structure <a href="https://github.com/pytorch/pytorch/blob/master/c10/core/impl/DeviceGuardImplInterface.h">https://github.com/pytorch/pytorch/blob/master/c10/core/impl/DeviceGuardImplInterface.h</a></li></ul>
]]></content:encoded>
      <enclosure length="14220827" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/cd27c78d-cc06-48c3-96d3-08b7561e3b41/audio/e76841c2-2a1f-4b64-b686-99c5ce6d3030/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Dynamic library structure</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:14:49</itunes:duration>
      <itunes:summary>Why is PyTorch split into so many libraries? What&apos;s the point of these splits? What do Windows, mobile and CUDA have to do with the library splits?</itunes:summary>
      <itunes:subtitle>Why is PyTorch split into so many libraries? What&apos;s the point of these splits? What do Windows, mobile and CUDA have to do with the library splits?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>3</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">036ddb72-7c65-4913-999e-99c4747d0e37</guid>
      <title>History and constraints of the dispatcher</title>
      <description><![CDATA[<p>Why is the dispatcher the way it is today? How did it evolve over time, and what constraints got added so that it is the complicated piece it is today?</p><p><strong>Further reading:</strong></p><ul><li>How the dispatcher actually works <a href="http://blog.ezyang.com/2020/09/lets-talk-about-the-pytorch-dispatcher/">http://blog.ezyang.com/2020/09/lets-talk-about-the-pytorch-dispatcher/</a></li><li>Zachary DeVito's original version of ATen, before it got merged back into PyTorch mainline <a href="https://github.com/zdevito/ATen">https://github.com/zdevito/ATen</a></li><li>The multiple dispatch patch: <a href="https://github.com/pytorch/pytorch/pull/25653">https://github.com/pytorch/pytorch/pull/25653</a></li></ul>
]]></description>
      <pubDate>Wed, 5 May 2021 05:20:33 +0000</pubDate>
      <author>wookim@fb.com (PyTorch)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/history-and-constraints-of-the-dispatcher-pYBqf0Fr</link>
      <content:encoded><![CDATA[<p>Why is the dispatcher the way it is today? How did it evolve over time, and what constraints got added along the way to make it the complicated piece it is today?</p><p><strong>Further reading:</strong></p><ul><li>How the dispatcher actually works <a href="http://blog.ezyang.com/2020/09/lets-talk-about-the-pytorch-dispatcher/">http://blog.ezyang.com/.../lets-talk-about-the-pytorch.../</a></li><li>Zachary DeVito's original version of ATen, before it got merged back into PyTorch mainline <a href="https://github.com/zdevito/ATen">https://github.com/zdevito/ATen</a></li><li>The multiple dispatch patch: <a href="https://github.com/pytorch/pytorch/pull/25653">https://github.com/pytorch/pytorch/pull/25653</a></li></ul>
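<p>As a purely hypothetical sketch (the names are invented; the real machinery lives in c10 and is far more involved), the core idea behind dispatch-key-based multiple dispatch is to union a key set over all arguments and let the highest-priority key pick the kernel:</p><pre><code>// Invented illustration of multiple dispatch on dispatch keys.
enum DispatchKey { kCPU, kCUDA, kAutograd, kNumKeys };

// Every input tensor contributes its keys, so the chosen kernel
// depends on all arguments, not just the first one.
struct KeySet {
  bool present[kNumKeys] = {};
  void add(DispatchKey k) { present[k] = true; }
  DispatchKey highest() const {
    for (int k = kNumKeys - 1; k >= 0; --k)
      if (present[k]) return DispatchKey(k);
    return kCPU;  // default when no key is set
  }
};

// Each operator owns a kernel table indexed by DispatchKey.
using Kernel = void (*)();
Kernel select_kernel(Kernel table[kNumKeys], KeySet args) {
  return table[args.highest()];
}
</code></pre><p>Each new cross-cutting feature (autograd, tracing, and so on) became another key layered onto this scheme, which is a big part of how the dispatcher accumulated its complexity.</p>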
]]></content:encoded>
      <enclosure length="16966827" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/b681c6a9-c44b-4247-8cec-a5d44b1fcf3e/audio/0ff8995d-6700-4ea2-aba7-3b494190b28a/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>History and constraints of the dispatcher</itunes:title>
      <itunes:author>PyTorch</itunes:author>
      <itunes:duration>00:17:40</itunes:duration>
      <itunes:summary>Why is the dispatcher the way it is today? How did it evolve over time, and what constraints got added along the way to make it the complicated piece it is today?</itunes:summary>
      <itunes:subtitle>Why is the dispatcher the way it is today? How did it evolve over time, and what constraints got added along the way to make it the complicated piece it is today?</itunes:subtitle>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>2</itunes:episode>
    </item>
    <item>
      <guid isPermaLink="false">a93e365e-3a05-4be9-93d3-2e276f2efba2</guid>
      <title>Binding C++ objects to Python</title>
      <description><![CDATA[<p>In this episode, we will discuss how to bind a C++ object to Python. We'll try to answer the following questions: How does pybind11 do it? What's different about how we implement it for Tensor? What are some downsides of the approach?</p><p>Note from the future: I recorded and then decided I didn't like my follow-up episode about how to preserve PyObjects even when they go dead in Python. Maybe some day!</p><p><strong>Further reading:</strong></p><ul><li>Python bindings for Tensor in PyTorch <a href="https://github.com/pytorch/pytorch/blob/65968ab817db323a532f50a2f2ea131ae27dada5/torch/csrc/autograd/python_variable.cpp" target="_blank">https://github.com/.../csrc/autograd/python_variable.cpp</a></li><li>pybind11 hash map for maintaining object identity <a href="https://github.com/pybind/pybind11/blob/54430436fee2afc4f8443691075a6208f9ea8eba/include/pybind11/detail/internals.h#L99" target="_blank">https://github.com/.../inc.../pybind11/detail/internals.h...</a></li><li>Tensor subclasses don't save their properties <a href="https://github.com/pytorch/pytorch/issues/47117" target="_blank">https://github.com/pytorch/pytorch/issues/47117</a><br />(but the situation here is more complicated than I imply in the podcast)</li></ul>
]]></description>
      <pubDate>Tue, 4 May 2021 01:19:10 +0000</pubDate>
      <author>wookim@fb.com (Edward Yang - PyTorch Research Engineer at Facebook AI)</author>
      <link>https://pytorch-dev-podcast.simplecast.com/episodes/binding-c-objects-to-python-8q4MZZxp</link>
      <content:encoded><![CDATA[<p>In this episode, we will discuss how to bind a C++ object to Python. We'll try to answer the following questions: How does pybind11 do it? What's different about how we implement it for Tensor? What are some downsides of the approach?</p><p>Note from the future: I recorded and then decided I didn't like my follow-up episode about how to preserve PyObjects even when they go dead in Python. Maybe some day!</p><p><strong>Further reading:</strong></p><ul><li>Python bindings for Tensor in PyTorch <a href="https://github.com/pytorch/pytorch/blob/65968ab817db323a532f50a2f2ea131ae27dada5/torch/csrc/autograd/python_variable.cpp" target="_blank">https://github.com/.../csrc/autograd/python_variable.cpp</a></li><li>pybind11 hash map for maintaining object identity <a href="https://github.com/pybind/pybind11/blob/54430436fee2afc4f8443691075a6208f9ea8eba/include/pybind11/detail/internals.h#L99" target="_blank">https://github.com/.../inc.../pybind11/detail/internals.h...</a></li><li>Tensor subclasses don't save their properties <a href="https://github.com/pytorch/pytorch/issues/47117" target="_blank">https://github.com/pytorch/pytorch/issues/47117</a><br />(but the situation here is more complicated than I imply in the podcast)</li></ul>
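<p>For reference (the Counter class is a made-up example, not PyTorch code), a minimal pybind11 class binding looks roughly like this:</p><pre><code>// Minimal pybind11 binding of a C++ class.
#include &lt;pybind11/pybind11.h&gt;

namespace py = pybind11;

// A plain C++ class we want to expose to Python.
struct Counter {
  int value = 0;
  void increment() { ++value; }
};

// pybind11 generates a Python type wrapping Counter; an internal
// registry maps each C++ instance back to its PyObject so the same
// C++ object surfaces as the same Python object.
PYBIND11_MODULE(example, m) {
  py::class_&lt;Counter&gt;(m, "Counter")
      .def(py::init&lt;&gt;())
      .def("increment", &amp;Counter::increment)
      .def_readonly("value", &amp;Counter::value);
}
</code></pre><p>From Python this is just <code>import example; c = example.Counter(); c.increment()</code>. Tensor takes a different, hand-rolled path (see python_variable.cpp above), which is the contrast the episode explores.</p>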
]]></content:encoded>
      <enclosure length="12748953" type="audio/mpeg" url="https://cdn.simplecast.com/audio/d7e800ce-0909-4ad6-9f2d-94858c32b092/episodes/a3faa52e-2d48-4c1d-9d66-50f1c8727bdd/audio/8a241843-5cc8-478e-aa21-94ad0ee3287b/default_tc.mp3?aid=rss_feed&amp;feed=OB5FkIl8"/>
      <itunes:title>Binding C++ objects to Python</itunes:title>
      <itunes:author>Edward Yang - PyTorch Research Engineer at Facebook AI</itunes:author>
      <itunes:duration>00:13:17</itunes:duration>
      <itunes:summary>In this episode, we will discuss how to bind a C++ object to Python. We&apos;ll try to answer the following questions: How does pybind11 do it? What&apos;s different about how we implement it for Tensor? What are some downsides of the approach?</itunes:summary>
      <itunes:subtitle>In this episode, we will discuss how to bind a C++ object to Python. We&apos;ll try to answer the following questions: How does pybind11 do it? What&apos;s different about how we implement it for Tensor? What are some downsides of the approach?</itunes:subtitle>
      <itunes:keywords>pytorch, machine learning, python, deep learning</itunes:keywords>
      <itunes:explicit>false</itunes:explicit>
      <itunes:episodeType>full</itunes:episodeType>
      <itunes:episode>1</itunes:episode>
    </item>
  </channel>
</rss>