|
1 | 1 | ===================================== |
2 | | -The PDB TPI Stream |
| 2 | +The PDB TPI and IPI Streams |
3 | 3 | ===================================== |
| 4 | + |
| 5 | +.. contents:: |
| 6 | + :local: |
| 7 | + |
| 8 | +.. _tpi_intro: |
| 9 | + |
| 10 | +Introduction |
| 11 | +============ |
| 12 | + |
| 13 | +The PDB TPI Stream (Index 2) and IPI Stream (Index 3) contain information about |
| 14 | +all types used in the program. Each is organized as a :ref:`header <tpi_header>` |
| 15 | +followed by a list of :doc:`CodeView Type Records <CodeViewTypes>`. Types are |
| 16 | +referenced from various streams and records throughout the PDB by their |
| 17 | +:ref:`type index <type_indices>`. In general, the sequence of type records |
| 18 | +following the :ref:`header <tpi_header>` forms a topologically sorted DAG |
| 19 | +(directed acyclic graph), which means that a type record B can only refer to |
| 20 | +the type A if ``A.TypeIndex < B.TypeIndex``. While there are rare cases where |
| 21 | +this property will not hold (particularly when dealing with object files |
| 22 | +compiled with MASM), an implementation should try very hard to make this |
| 23 | +property hold, as it means the entire type graph can be constructed in a single |
| 24 | +pass. |
| 25 | + |
| 26 | +.. important:: |
| 27 | + Type records form a topologically sorted DAG (directed acyclic graph). |
| 28 | + |
| 29 | +.. _tpi_ipi: |
| 30 | + |
| 31 | +TPI vs IPI Stream |
| 32 | +================= |
| 33 | + |
| 34 | +Recent versions of the PDB format (that is, all versions covered by this document) |
| 35 | +have two streams with identical layout, henceforth referred to as the TPI stream |
| 36 | +and IPI stream. Subsequent contents of this document describing the on-disk |
| 37 | +format apply equally whether it is for the TPI Stream or the IPI Stream. The |
| 38 | +only difference between the two is in *which* CodeView records are allowed to |
| 39 | +appear in each one, summarized by the following table: |
| 40 | + |
| 41 | ++----------------------+---------------------+ |
| 42 | +| TPI Stream | IPI Stream | |
| 43 | ++======================+=====================+ |
| 44 | +| LF_POINTER | LF_FUNC_ID | |
| 45 | ++----------------------+---------------------+ |
| 46 | +| LF_MODIFIER | LF_MFUNC_ID | |
| 47 | ++----------------------+---------------------+ |
| 48 | +| LF_PROCEDURE | LF_BUILDINFO | |
| 49 | ++----------------------+---------------------+ |
| 50 | +| LF_MFUNCTION | LF_SUBSTR_LIST | |
| 51 | ++----------------------+---------------------+ |
| 52 | +| LF_LABEL | LF_STRING_ID | |
| 53 | ++----------------------+---------------------+ |
| 54 | +| LF_ARGLIST | LF_UDT_SRC_LINE | |
| 55 | ++----------------------+---------------------+ |
| 56 | +| LF_FIELDLIST | LF_UDT_MOD_SRC_LINE | |
| 57 | ++----------------------+---------------------+ |
| 58 | +| LF_ARRAY | | |
| 59 | ++----------------------+---------------------+ |
| 60 | +| LF_CLASS | | |
| 61 | ++----------------------+---------------------+ |
| 62 | +| LF_STRUCTURE | | |
| 63 | ++----------------------+---------------------+ |
| 64 | +| LF_INTERFACE | | |
| 65 | ++----------------------+---------------------+ |
| 66 | +| LF_UNION | | |
| 67 | ++----------------------+---------------------+ |
| 68 | +| LF_ENUM | | |
| 69 | ++----------------------+---------------------+ |
| 70 | +| LF_TYPESERVER2 | | |
| 71 | ++----------------------+---------------------+ |
| 72 | +| LF_VFTABLE | | |
| 73 | ++----------------------+---------------------+ |
| 74 | +| LF_VTSHAPE | | |
| 75 | ++----------------------+---------------------+ |
| 76 | +| LF_BITFIELD | | |
| 77 | ++----------------------+---------------------+ |
| 78 | +| LF_METHODLIST | | |
| 79 | ++----------------------+---------------------+ |
| 80 | +| LF_PRECOMP | | |
| 81 | ++----------------------+---------------------+ |
| 82 | +| LF_ENDPRECOMP | | |
| 83 | ++----------------------+---------------------+ |
| 84 | + |
| 85 | +The usage of these records is described in more detail in |
| 86 | +:doc:`CodeView Type Records <CodeViewTypes>`. |
| 87 | + |
| 88 | +.. _type_indices: |
| 89 | + |
| 90 | +Type Indices |
| 91 | +============ |
| 92 | + |
| 93 | +A type index is a 32-bit integer that uniquely identifies a type inside of an |
| 94 | +object file's ``.debug$T`` section or a PDB file's TPI or IPI stream. The |
| 95 | +value of the type index for the first type record from the TPI stream is given |
| 96 | +by the ``TypeIndexBegin`` member of the :ref:`TPI Stream Header <tpi_header>` |
| 97 | +although in practice this value is always equal to 0x1000 (4096). |
| 98 | + |
| 99 | +Any type index with a high bit set is considered to come from the IPI stream, |
| 100 | +although this appears to be more of a hack, and LLVM does not generate type |
| 101 | +indices of this nature. They can, however, be observed in Microsoft PDBs |
| 102 | +occasionally, so one should be prepared to handle them. Note that having the |
| 103 | +high bit set is a sufficient condition for a type index to come from the IPI |
| 104 | +stream, but not a necessary one. |
| 105 | + |
| 106 | +Once the high bit is cleared, any type index >= ``TypeIndexBegin`` is presumed |
| 107 | +to come from the appropriate stream, and any type index less than this is a |
| 108 | +bitmask which can be decomposed as follows: |
| 109 | + |
| 110 | +.. code-block:: none |
| 111 | + |
| 112 | + .---------------------------.------.----------. |
| 113 | + | Unused | Mode | Kind | |
| 114 | + '---------------------------'------'----------' |
| 115 | + |+32 |+12 |+8 |+0 |
| 116 | + |
| 117 | + |
| 118 | +- **Kind** - A value from the following enum: |
| 119 | + |
| 120 | +.. code-block:: c++ |
| 121 | + |
| 122 | + enum class SimpleTypeKind : uint32_t { |
| 123 | + None = 0x0000, // uncharacterized type (no type) |
| 124 | + Void = 0x0003, // void |
| 125 | + NotTranslated = 0x0007, // type not translated by cvpack |
| 126 | + HResult = 0x0008, // OLE/COM HRESULT |
| 127 | + |
| 128 | + SignedCharacter = 0x0010, // 8 bit signed |
| 129 | + UnsignedCharacter = 0x0020, // 8 bit unsigned |
| 130 | + NarrowCharacter = 0x0070, // really a char |
| 131 | + WideCharacter = 0x0071, // wide char |
| 132 | + Character16 = 0x007a, // char16_t |
| 133 | + Character32 = 0x007b, // char32_t |
| 134 | + |
| 135 | + SByte = 0x0068, // 8 bit signed int |
| 136 | + Byte = 0x0069, // 8 bit unsigned int |
| 137 | + Int16Short = 0x0011, // 16 bit signed |
| 138 | + UInt16Short = 0x0021, // 16 bit unsigned |
| 139 | + Int16 = 0x0072, // 16 bit signed int |
| 140 | + UInt16 = 0x0073, // 16 bit unsigned int |
| 141 | + Int32Long = 0x0012, // 32 bit signed |
| 142 | + UInt32Long = 0x0022, // 32 bit unsigned |
| 143 | + Int32 = 0x0074, // 32 bit signed int |
| 144 | + UInt32 = 0x0075, // 32 bit unsigned int |
| 145 | + Int64Quad = 0x0013, // 64 bit signed |
| 146 | + UInt64Quad = 0x0023, // 64 bit unsigned |
| 147 | + Int64 = 0x0076, // 64 bit signed int |
| 148 | + UInt64 = 0x0077, // 64 bit unsigned int |
| 149 | + Int128Oct = 0x0014, // 128 bit signed int |
| 150 | + UInt128Oct = 0x0024, // 128 bit unsigned int |
| 151 | + Int128 = 0x0078, // 128 bit signed int |
| 152 | + UInt128 = 0x0079, // 128 bit unsigned int |
| 153 | + |
| 154 | + Float16 = 0x0046, // 16 bit real |
| 155 | + Float32 = 0x0040, // 32 bit real |
| 156 | + Float32PartialPrecision = 0x0045, // 32 bit PP real |
| 157 | + Float48 = 0x0044, // 48 bit real |
| 158 | + Float64 = 0x0041, // 64 bit real |
| 159 | + Float80 = 0x0042, // 80 bit real |
| 160 | + Float128 = 0x0043, // 128 bit real |
| 161 | + |
| 162 | + Complex16 = 0x0056, // 16 bit complex |
| 163 | + Complex32 = 0x0050, // 32 bit complex |
| 164 | + Complex32PartialPrecision = 0x0055, // 32 bit PP complex |
| 165 | + Complex48 = 0x0054, // 48 bit complex |
| 166 | + Complex64 = 0x0051, // 64 bit complex |
| 167 | + Complex80 = 0x0052, // 80 bit complex |
| 168 | + Complex128 = 0x0053, // 128 bit complex |
| 169 | + |
| 170 | + Boolean8 = 0x0030, // 8 bit boolean |
| 171 | + Boolean16 = 0x0031, // 16 bit boolean |
| 172 | + Boolean32 = 0x0032, // 32 bit boolean |
| 173 | + Boolean64 = 0x0033, // 64 bit boolean |
| 174 | + Boolean128 = 0x0034, // 128 bit boolean |
| 175 | + }; |
| 176 | + |
| 177 | +- **Mode** - A value from the following enum: |
| 178 | + |
| 179 | +.. code-block:: c++ |
| 180 | + |
| 181 | + enum class SimpleTypeMode : uint32_t { |
| 182 | + Direct = 0, // Not a pointer |
| 183 | + NearPointer = 1, // Near pointer |
| 184 | + FarPointer = 2, // Far pointer |
| 185 | + HugePointer = 3, // Huge pointer |
| 186 | + NearPointer32 = 4, // 32 bit near pointer |
| 187 | + FarPointer32 = 5, // 32 bit far pointer |
| 188 | + NearPointer64 = 6, // 64 bit near pointer |
| 189 | + NearPointer128 = 7 // 128 bit near pointer |
| 190 | + }; |
| 191 | + |
| 192 | +Note that for pointers, the bitness is represented in the mode. So a ``void*`` |
| 193 | +would have a type index with ``Mode=NearPointer32, Kind=Void`` if built for 32 bits |
| 194 | +but a type index with ``Mode=NearPointer64, Kind=Void`` if built for 64 bits. |
| 195 | + |
| 196 | +By convention, the type index for ``std::nullptr_t`` is constructed the same way |
| 197 | +as the type index for ``void*``, but using the bitless enumeration value |
| 198 | +``NearPointer``. |
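
The decomposition described above can be sketched as follows. This is only an
illustration under stated assumptions: the names ``DecodedTypeIndex`` and
``decodeTypeIndex`` are hypothetical and not part of any real API, and only
the bit positions (``Kind`` in bits 0-7, ``Mode`` in bits 8-11, bit 31 as the
IPI marker) are taken from the text.

```cpp
#include <cstdint>

// Hypothetical helper type; the field names are illustrative only.
struct DecodedTypeIndex {
  bool HighBitSet; // bit 31 was set (index is considered to come from IPI)
  bool IsSimple;   // below TypeIndexBegin once bit 31 is cleared
  uint32_t Mode;   // bits 8..11, meaningful only when IsSimple is true
  uint32_t Kind;   // bits 0..7, meaningful only when IsSimple is true
};

// Splits a raw 32-bit type index according to the layout described above.
// TypeIndexBegin defaults to 0x1000, the value observed in practice.
inline DecodedTypeIndex decodeTypeIndex(uint32_t Raw,
                                        uint32_t TypeIndexBegin = 0x1000) {
  DecodedTypeIndex D{};
  D.HighBitSet = (Raw & 0x80000000u) != 0;
  const uint32_t Index = Raw & 0x7FFFFFFFu; // clear the high bit first
  D.IsSimple = Index < TypeIndexBegin;
  if (D.IsSimple) {
    D.Mode = (Index >> 8) & 0xFu;
    D.Kind = Index & 0xFFu;
  }
  return D;
}
```

For example, ``decodeTypeIndex(0x0603)`` yields ``Mode == 6``
(``NearPointer64``) and ``Kind == 0x03`` (``Void``), which is the 64-bit
``void*`` case mentioned above.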
| 199 | + |
| 200 | + |
| 201 | + |
| 202 | +.. _tpi_header: |
| 203 | + |
| 204 | +Stream Header |
| 205 | +============= |
| 206 | +At offset 0 of the TPI Stream is a header with the following layout: |
| 207 | + |
| 208 | + |
| 209 | +.. code-block:: c++ |
| 210 | + |
| 211 | + struct TpiStreamHeader { |
| 212 | + uint32_t Version; |
| 213 | + uint32_t HeaderSize; |
| 214 | + uint32_t TypeIndexBegin; |
| 215 | + uint32_t TypeIndexEnd; |
| 216 | + uint32_t TypeRecordBytes; |
| 217 | + |
| 218 | + uint16_t HashStreamIndex; |
| 219 | + uint16_t HashAuxStreamIndex; |
| 220 | + uint32_t HashKeySize; |
| 221 | + uint32_t NumHashBuckets; |
| 222 | + |
| 223 | + int32_t HashValueBufferOffset; |
| 224 | + uint32_t HashValueBufferLength; |
| 225 | + |
| 226 | + int32_t IndexOffsetBufferOffset; |
| 227 | + uint32_t IndexOffsetBufferLength; |
| 228 | + |
| 229 | + int32_t HashAdjBufferOffset; |
| 230 | + uint32_t HashAdjBufferLength; |
| 231 | + }; |
| 232 | + |
| 233 | +- **Version** - A value from the following enum. |
| 234 | + |
| 235 | +.. code-block:: c++ |
| 236 | + |
| 237 | + enum class TpiStreamVersion : uint32_t { |
| 238 | + V40 = 19950410, |
| 239 | + V41 = 19951122, |
| 240 | + V50 = 19961031, |
| 241 | + V70 = 19990903, |
| 242 | + V80 = 20040203, |
| 243 | + }; |
| 244 | + |
| 245 | +Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be |
| 246 | +``V80``, and no other values have been observed. It is assumed that should |
| 247 | +another value be observed, the layout described by this document may not be |
| 248 | +accurate. |
| 249 | + |
| 250 | +- **HeaderSize** - ``sizeof(TpiStreamHeader)`` |
| 251 | + |
| 252 | +- **TypeIndexBegin** - The numeric value of the type index representing the |
| 253 | + first type record in the TPI stream. This is usually the value 0x1000 as type |
| 254 | + indices lower than this are reserved (see :ref:`Type Indices <type_indices>` for |
| 255 | + a discussion of reserved type indices). |
| 256 | + |
| 257 | +- **TypeIndexEnd** - One greater than the numeric value of the type index |
| 258 | + representing the last type record in the TPI stream. The total number of type |
| 259 | + records in the TPI stream can be computed as ``TypeIndexEnd - TypeIndexBegin``. |
| 260 | + |
| 261 | +- **TypeRecordBytes** - The number of bytes of type record data following the header. |
| 262 | + |
| 263 | +- **HashStreamIndex** - The index of a stream which contains a list of hashes for |
| 264 | + every type record. This value may be -1, indicating that hash information is not |
| 265 | + present. In practice a valid stream index is always observed, so any producer |
| 266 | + implementation should be prepared to emit this stream to ensure compatibility with |
| 267 | + tools which may expect it to be present. |
| 268 | + |
| 269 | +- **HashAuxStreamIndex** - Presumably the index of a stream which contains a separate |
| 270 | + hash table, although this has not been observed in practice and it's unclear what it |
| 271 | + might be used for. |
| 272 | + |
| 273 | +- **HashKeySize** - The size of a hash value (usually 4 bytes). |
| 274 | + |
| 275 | +- **NumHashBuckets** - The number of buckets used to generate the hash values in the |
| 276 | + aforementioned hash streams. |
| 277 | + |
| 278 | +- **HashValueBufferOffset / HashValueBufferLength** - The offset and size within |
| 279 | + the TPI Hash Stream of the list of hash values. It should be assumed that there |
| 280 | + are either 0 hash values, or a number equal to the number of type records in the |
| 281 | + TPI stream (``TypeIndexEnd - TypeIndexBegin``). Thus, if ``HashValueBufferLength`` is |
| 282 | + not equal to ``(TypeIndexEnd - TypeIndexBegin) * HashKeySize`` we can consider the |
| 283 | + PDB malformed. |
| 284 | + |
| 285 | +- **IndexOffsetBufferOffset / IndexOffsetBufferLength** - The offset and size |
| 286 | + within the TPI Hash Stream of the Type Index Offsets Buffer. This is a list of |
| 287 | + pairs of uint32_t's where the first value is a :ref:`Type Index <type_indices>` |
| 288 | + and the second value is the offset in the type record data of the type with this |
| 289 | + index. This can be used to do a binary search followed by a linear search to |
| 290 | + get amortized O(log n) lookup by type index. |
| 291 | + |
| 292 | +- **HashAdjBufferOffset / HashAdjBufferLength** - The offset and size within |
| 293 | + the TPI hash stream of a serialized hash table whose keys are the hash values |
| 294 | + in the hash value buffer and whose values are type indices. This appears to |
| 295 | + be useful in incremental linking scenarios, so that if a type is modified an |
| 296 | + entry can be created mapping the old hash value to the new type index so that |
| 297 | + a PDB file consumer can always have the most up to date version of the type |
| 298 | + without forcing the incremental linker to garbage collect and update |
| 299 | + references that point to the old version to now point to the new version. |
| 300 | + The layout of this hash table is described in :doc:`HashTable`. |
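
The consistency rules stated in the field descriptions above can be checked
with a few lines of code. The following is a minimal sketch, not a complete
validator: the helper names ``typeRecordCount`` and
``hashValueBufferIsConsistent`` are hypothetical, and the ``static_assert``
assumes a typical ABI in which the struct has no padding.

```cpp
#include <cstdint>

// Same layout as the TpiStreamHeader shown above.
struct TpiStreamHeader {
  uint32_t Version;
  uint32_t HeaderSize;
  uint32_t TypeIndexBegin;
  uint32_t TypeIndexEnd;
  uint32_t TypeRecordBytes;

  uint16_t HashStreamIndex;
  uint16_t HashAuxStreamIndex;
  uint32_t HashKeySize;
  uint32_t NumHashBuckets;

  int32_t HashValueBufferOffset;
  uint32_t HashValueBufferLength;

  int32_t IndexOffsetBufferOffset;
  uint32_t IndexOffsetBufferLength;

  int32_t HashAdjBufferOffset;
  uint32_t HashAdjBufferLength;
};

// Assumption: no padding on common ABIs, so the fields sum to 56 bytes.
static_assert(sizeof(TpiStreamHeader) == 56, "unexpected header padding");

// Number of type records: TypeIndexEnd - TypeIndexBegin (0 if inverted).
inline uint32_t typeRecordCount(const TpiStreamHeader &H) {
  return H.TypeIndexEnd >= H.TypeIndexBegin
             ? H.TypeIndexEnd - H.TypeIndexBegin
             : 0;
}

// True if there are either no hash values or exactly one per type record.
inline bool hashValueBufferIsConsistent(const TpiStreamHeader &H) {
  if (H.HashValueBufferLength == 0)
    return true;
  const uint64_t Expected =
      static_cast<uint64_t>(typeRecordCount(H)) * H.HashKeySize;
  return H.HashValueBufferLength == Expected;
}
```

A reader might run a check like this right after loading the header and treat
a ``false`` result as a malformed PDB, as described above.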
| 301 | + |
| 302 | +.. _tpi_records: |
| 303 | + |
| 304 | +CodeView Type Record List |
| 305 | +========================= |
| 306 | +Following the header, there are ``TypeRecordBytes`` bytes of data that represent a |
| 307 | +variable length array of :doc:`CodeView type records <CodeViewTypes>`. The number |
| 308 | +of such records (i.e. the length of the array) can be determined by computing the |
| 309 | +value ``Header.TypeIndexEnd - Header.TypeIndexBegin``. |
| 310 | + |
| 311 | +O(log n) random access is provided by way of the Type Index Offsets array (if present) |
| 312 | +described previously. |
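
The lookup strategy mentioned above (binary search in the Type Index Offsets
array, then a short linear scan) might look roughly like the sketch below.
The names ``TypeIndexOffset`` and ``findRecordOffset`` are hypothetical. The
sketch also assumes that every type record begins with a little-endian
``uint16_t`` length that counts the bytes following the length field; that
detail belongs to the CodeView record format and is not defined in this
document.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry of the Type Index Offsets buffer: (type index, byte offset).
struct TypeIndexOffset {
  uint32_t Type;
  uint32_t Offset;
};

// Finds the byte offset of the record with type index Wanted inside Records
// (the TypeRecordBytes bytes that follow the header). Returns false if the
// index cannot be located. Offsets must be sorted by Type.
inline bool findRecordOffset(const std::vector<TypeIndexOffset> &Offsets,
                             const std::vector<uint8_t> &Records,
                             uint32_t Wanted, uint32_t &OffsetOut) {
  // Binary search: first entry with Type > Wanted, then step back one.
  auto It = std::upper_bound(
      Offsets.begin(), Offsets.end(), Wanted,
      [](uint32_t V, const TypeIndexOffset &E) { return V < E.Type; });
  if (It == Offsets.begin())
    return false;
  --It;

  // Linear scan: walk forward record by record using the length prefix.
  uint32_t Current = It->Type;
  std::size_t Off = It->Offset;
  while (Current < Wanted) {
    if (Off + 2 > Records.size())
      return false;
    const uint16_t Len =
        static_cast<uint16_t>(Records[Off] | (Records[Off + 1] << 8));
    Off += 2u + Len; // the length field itself is not included in Len
    ++Current;
  }
  if (Off + 2 > Records.size())
    return false;
  OffsetOut = static_cast<uint32_t>(Off);
  return true;
}
```

The cost of the linear scan is bounded by the spacing between consecutive
entries in the offsets array, so the overall lookup stays close to the
O(log n) behaviour described above.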