
The Invisible Link: Inside the PE Header's Connection to PDBs

  • Writer: Josh Stroschein
  • Nov 23
  • 5 min read

Have you ever loaded an executable into WinDbg or Visual Studio and watched as it instantly found the matching symbols? It lights up the call stack with function names and snaps right to the source code line.


It feels seamless, but underneath that convenience lies a rigid, decades-old structure embedded in every Windows EXE and DLL. The binary itself holds the map to its own debugging information.

Whether you are into reverse engineering, malware analysis, or just optimizing your build pipeline, you need to understand how the Portable Executable (PE) file format manages this connection. Let’s take a technical deep dive into the structures that link code to symbols.


1. The Treasure Map: The Optional Header


The journey begins near the very start of a PE file, inside the IMAGE_OPTIONAL_HEADER. Despite the name "Optional," this header is mandatory for executable files.


At the end of this header sits an array of 16 structures called the DataDirectory. Think of this as the table of contents for the executable. It tells the OS loader where to find imports, exports, resources, and debug information.


We are looking for Index 6 of this array: the IMAGE_DIRECTORY_ENTRY_DEBUG.

// Located at OptionalHeader.DataDirectory[6]
typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD   VirtualAddress; // The RVA of the Debug Directory table
    DWORD   Size;           // The size of the table in bytes
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
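
To make the lookup concrete, here is a minimal sketch of pulling DataDirectory[6] out of a file that has already been read into memory. The helper names (rd16, rd32, find_debug_directory) are made up for this example. It assumes the standard layout: e_lfanew at offset 0x3C, a 20-byte file header after the "PE\0\0" signature, and the directory array starting 96 bytes into a PE32 optional header or 112 bytes into a PE32+ one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian values from an unaligned byte buffer. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: fetch DataDirectory[6] from a PE file held in memory.
 * Returns 1 and fills rva/size on success, 0 if the file is malformed or
 * does not declare at least seven directory entries. */
static int find_debug_directory(const uint8_t *file, size_t len,
                                uint32_t *rva, uint32_t *size)
{
    assert(file != NULL && rva != NULL && size != NULL);

    if (len < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return 0;
    uint32_t pe = rd32(file + 0x3C);                 /* e_lfanew */
    if (pe > len || len - pe < 4 + 20 + 2)           /* sig + file header + Magic */
        return 0;
    if (memcmp(file + pe, "PE\0\0", 4) != 0)
        return 0;

    size_t   opt      = (size_t)pe + 24;             /* start of optional header */
    uint16_t opt_size = rd16(file + pe + 4 + 16);    /* SizeOfOptionalHeader */
    uint16_t magic    = rd16(file + opt);

    size_t dirs;                                     /* DataDirectory offset in header */
    if (magic == 0x10B)      dirs = 96;              /* PE32 */
    else if (magic == 0x20B) dirs = 112;             /* PE32+ */
    else return 0;

    size_t entry = dirs + 6 * 8;                     /* each entry is 8 bytes */
    if (entry + 8 > opt_size || len - opt < entry + 8)
        return 0;
    if (rd32(file + opt + dirs - 4) <= 6)            /* NumberOfRvaAndSizes */
        return 0;

    *rva  = rd32(file + opt + entry);
    *size = rd32(file + opt + entry + 4);
    return 1;
}
```

A result of rva == 0 and size == 0 means the slot exists but is empty, which matters later when we talk about stripping.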

A Note on Navigation: RVA vs. Raw Offset


There’s a catch. The VirtualAddress points to where data will reside after Windows loads the file into memory (a Relative Virtual Address or RVA).


If you are parsing the file statically on disk with a hex editor or a script, that RVA won't match the file offset. You must perform an address translation based on the PE's Section Headers to map that RVA into a "Raw Offset" on disk.



      [ Disk View ]                    [ Memory View (RVA) ]
 File Offset begins at 0             ImageBase begins 0x400000
+-----------------------+            +-----------------------+ 
| PE Headers            |            | PE Headers            | 
+-----------------------+            +-----------------------+
| .text Section         |            | .text Section         |
| (Code on Disk)        |            | (Code in RAM)         |
+-----------------------+            +-----------------------+ 
| .rdata Section        |  mapped to | .rdata Section        |
| [Debug Dir lives here]|  =======>  | [RVA 0x2050]          |
+-----------------------+            +-----------------------+
            ^
            |
The Data Directory gives you 0x2050.
You must calculate where that byte sits in the file on disk.

Translating an RVA to a File Offset 🤓


To perform the RVA-to-Offset translation, you need to iterate through the Section Header Table. This table immediately follows the optional header and defines the properties and locations of every section (like .text, .data, .rdata, etc.) in the file.

The relevant structure for each section is the IMAGE_SECTION_HEADER:


typedef struct _IMAGE_SECTION_HEADER {
    BYTE  Name[8];
    union {
        DWORD   PhysicalAddress;
        DWORD   VirtualSize;  // Size of the data in memory
    } Misc;
    DWORD   VirtualAddress;   // RVA where the section starts in memory
    DWORD   SizeOfRawData;    // Size of the data on disk
    DWORD   PointerToRawData; // <-- File Offset where the section starts on disk
    // ... other fields not needed for the calculation
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

The Formula


You must iterate through the section headers until you find the one that contains your target RVA.


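  1. Find the Containing Section: The target RVA must be greater than or equal to the section's VirtualAddress and less than the section's VirtualAddress plus its VirtualSize. If the RVA lands at or beyond the section's SizeOfRawData bytes, that byte is zero-filled at load time and has no counterpart on disk.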

  2. Calculate the Offset: Once found, the formula is simple:

File Offset = (Target RVA - Section.VirtualAddress) + Section.PointerToRawData
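
The two steps above can be sketched in a few lines of C. SectionInfo is a trimmed stand-in for IMAGE_SECTION_HEADER, and rva_to_offset is a name invented for this example. The sketch only covers RVAs that fall inside a section; RVAs that point into the headers themselves are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the fields the translation needs (a trimmed stand-in for
 * IMAGE_SECTION_HEADER, not the real winnt.h layout). */
typedef struct {
    uint32_t VirtualSize;
    uint32_t VirtualAddress;
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;
} SectionInfo;

/* Returns 1 and stores the file offset, or 0 if no section backs the RVA
 * with bytes on disk. */
static int rva_to_offset(const SectionInfo *secs, size_t count,
                         uint32_t rva, uint32_t *offset)
{
    assert(offset != NULL);
    for (size_t i = 0; i < count; i++) {
        const SectionInfo *s = &secs[i];
        if (rva < s->VirtualAddress)
            continue;                       /* RVA starts before this section */
        uint32_t delta = rva - s->VirtualAddress;
        if (delta >= s->VirtualSize)
            continue;                       /* RVA ends up past this section */
        if (delta >= s->SizeOfRawData)
            return 0;                       /* zero-filled at load, not on disk */
        *offset = delta + s->PointerToRawData;
        return 1;
    }
    return 0;
}
```

As a worked example with made-up numbers: if .rdata has VirtualAddress 0x2000 and PointerToRawData 0x1400, the RVA 0x2050 from the diagram maps to file offset (0x2050 - 0x2000) + 0x1400 = 0x1450.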

2. The Debug Directory Table


Once you translate the RVA and jump to that location in the file, you will find an array of IMAGE_DEBUG_DIRECTORY structures. A modern binary might have several entries here - one for POGO optimizations, one for VC++ feature usage, etc.


We have to iterate through this array looking for a specific type: the CodeView record.


Here is the structure defined in winnt.h:

typedef struct _IMAGE_DEBUG_DIRECTORY {
    DWORD   Characteristics;
    DWORD   TimeDateStamp;
    WORD    MajorVersion;
    WORD    MinorVersion;
    DWORD   Type;            // <-- The crucial field
    DWORD   SizeOfData;      // How big the actual debug data block is
    DWORD   AddressOfRawData;// RVA when loaded
    DWORD   PointerToRawData;// <-- The FILE OFFSET to the data block
} IMAGE_DEBUG_DIRECTORY, *PIMAGE_DEBUG_DIRECTORY;

We iterate until we find Type == 2 (specifically IMAGE_DEBUG_TYPE_CODEVIEW). Once found, we grab the PointerToRawData. This is a direct file offset, so we can jump straight there without RVA translation logic.
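
Here is one way that scan might look, as a sketch rather than a reference implementation. It assumes the caller has already translated the directory's RVA to a file offset (dir_off) and passes the Size from the Data Directory as dir_size. The function name and the two constants are invented for this example; the field offsets (Type at +12, SizeOfData at +16, PointerToRawData at +24) follow the 28-byte structure above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEBUG_ENTRY_SIZE    28u  /* sizeof(IMAGE_DEBUG_DIRECTORY) */
#define DEBUG_TYPE_CODEVIEW 2u   /* IMAGE_DEBUG_TYPE_CODEVIEW */

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk the debug directory and report the first CodeView entry.
 * Returns 1 and fills data_off/data_size, or 0 if none is found or the
 * entry points outside the file. */
static int find_codeview(const uint8_t *file, size_t len,
                         uint32_t dir_off, uint32_t dir_size,
                         uint32_t *data_off, uint32_t *data_size)
{
    assert(file != NULL && data_off != NULL && data_size != NULL);

    if (dir_off > len || dir_size > len - dir_off)
        return 0;                               /* table not inside the file */

    for (uint32_t pos = 0; dir_size - pos >= DEBUG_ENTRY_SIZE;
         pos += DEBUG_ENTRY_SIZE) {
        const uint8_t *e = file + dir_off + pos;
        if (rd32(e + 12) != DEBUG_TYPE_CODEVIEW)
            continue;                           /* POGO, VC feature, etc. */

        uint32_t size = rd32(e + 16);           /* SizeOfData */
        uint32_t off  = rd32(e + 24);           /* PointerToRawData */
        if (off > len || size > len - off)
            return 0;                           /* data block out of bounds */
        *data_off  = off;
        *data_size = size;
        return 1;
    }
    return 0;
}
```

The bounds checks are there because malware samples and truncated dumps routinely carry directory entries that point past the end of the file.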


3. "X" Marks the Spot: The CodeView Record (RSDS)


Landing at the PointerToRawData offset, we finally see the actual data used to match a PDB.


Historically, there were older formats (like "NB10"), but almost every modern Windows executable uses the RSDS format (PDB 7.0). You identify it by checking the first four bytes—the signature—which will literally be the ASCII characters "RSDS".


This structure is the nexus of debugging. It contains the unique fingerprint of the build.

struct CV_INFO_PDB70 {
    DWORD  CvSignature;     // The ASCII string "RSDS" (0x53445352)
    GUID   Signature;       // A 128-bit Globally Unique Identifier
    DWORD  Age;             // An incremental counter
    char   PdbFileName[1];  // A null-terminated string of variable length
};

The Fields That Matter


  1. The Signature (GUID): When the linker creates the PDB and EXE, it generates a brand new 128-bit GUID and stamps it into both files. This is how WinDbg knows that this specific app.exe matches this specific app.pdb, even if the timestamps are different.

  2. The Age: If you perform a linker update (linking without recompiling code), the GUID might stay the same, but the "Age" integer increments. The debugger needs both to match exactly.

  3. PdbFileName: This is the literal path to where the PDB was written on the build machine at link time.


If you open a binary in a hex editor and look near the end of the file, you can often see this structure with your naked eye:


Hex View:
52 53 44 53 [ 16 Byte GUID ] [4 Byte Age] 43 3A 5C 42 75 69 6C 64 
R  S  D  S  . . . . . . . .   . . . .     C  :  \  B  u  i  l  d 
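
Decoding that record by hand gets old fast, so here is a small sketch of a parser. RsdsInfo and parse_rsds are names invented for this example. It assumes rec points at the start of the CodeView data block and size is the SizeOfData from the matching debug directory entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t     guid[16];  /* raw GUID bytes, exactly as stored in the file */
    uint32_t    age;
    const char *pdb_name;  /* points into the caller's buffer, not a copy */
} RsdsInfo;

/* Returns 1 on success, 0 if the block is not a well-formed RSDS record. */
static int parse_rsds(const uint8_t *rec, size_t size, RsdsInfo *out)
{
    assert(rec != NULL && out != NULL);

    /* 4 (signature) + 16 (GUID) + 4 (Age) + at least a NUL terminator */
    if (size < 25 || memcmp(rec, "RSDS", 4) != 0)
        return 0;
    /* The path must be terminated inside the record. */
    if (memchr(rec + 24, '\0', size - 24) == NULL)
        return 0;

    memcpy(out->guid, rec + 4, 16);
    out->age = (uint32_t)rec[20] | ((uint32_t)rec[21] << 8) |
               ((uint32_t)rec[22] << 16) | ((uint32_t)rec[23] << 24);
    out->pdb_name = (const char *)(rec + 24);
    return 1;
}
```

Checking for the terminator before handing out the pointer keeps a later printf or strlen from running off the end of the buffer.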

4. Going Dark: The Concept of "Stripping"


The presence of this data is vital for debugging, but sometimes you don't want it there. Removing or obscuring this data is known as "stripping," and it happens at three distinct levels.


Level 1: The "Debug Stripped" Flag


In the main IMAGE_FILE_HEADER, there is a Characteristics field. If the flag IMAGE_FILE_DEBUG_STRIPPED (value 0x0200) is set, it indicates that debugging information has been removed from the image file. It effectively tells the debugger, "Don't look inside me for symbols; go find them in a separate file." Treat the bit as a hint rather than proof: many release builds produced by current MSVC linkers leave it clear even though their symbols live in a PDB, so check the Debug Directory as well.
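
Testing the bit takes only a few lines. This is a sketch with an invented function name; it relies on Characteristics being the last two bytes of the 20-byte file header, which puts it 22 bytes past the start of the "PE\0\0" signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILE_DEBUG_STRIPPED 0x0200u  /* IMAGE_FILE_DEBUG_STRIPPED */

/* Returns 1 if the flag is set, 0 if it is clear, -1 if this is not a PE. */
static int debug_stripped_flag(const uint8_t *file, size_t len)
{
    assert(file != NULL);

    if (len < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return -1;
    uint32_t pe = (uint32_t)file[0x3C] | ((uint32_t)file[0x3D] << 8) |
                  ((uint32_t)file[0x3E] << 16) | ((uint32_t)file[0x3F] << 24);
    if (pe > len || len - pe < 24)          /* signature + file header */
        return -1;
    if (memcmp(file + pe, "PE\0\0", 4) != 0)
        return -1;

    uint16_t characteristics = (uint16_t)(file[pe + 22] | (file[pe + 23] << 8));
    return (characteristics & FILE_DEBUG_STRIPPED) ? 1 : 0;
}
```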


Level 2: Privacy Stripping (Path Redaction)


Look back at the CV_INFO_PDB70 structure. The PdbFileName might contain sensitive information about your infrastructure: C:\Users\TheCyberYeti\SecretProject\Release\bin\theyeti.pdb


By using linker switches (like /PDBALTPATH), developers can instruct the linker to strip the path information from the PE file, leaving only the filename.

  • Before: C:\Builds\Secret\theyeti.pdb

  • After: theyeti.pdb


The GUID and Age remain perfect, so matching still works, but the origin is obscured.


Level 3: Total Stripping (No Debug Info)


If a developer links with /DEBUG:NONE, they are severing the link entirely. Mechanically, the linker simply writes zeros into Index 6 of the Data Directory. Without this entry point, the entire chain breaks. A debugger loading this file has absolutely no way to ask a symbol server for a matching PDB.


5. How Symbol Servers Actually Work


You might wonder: If my PE file says the PDB is at C:\Builds\myapp.pdb, why does WinDbg find it when I'm not on the build machine?


Debuggers rarely use the literal path stored in the PE file for retrieval. Instead, they use the path to discover the name of the PDB, and then combine the GUID and the Age to create a unique lookup hash.


The debugger takes the hex representation of the GUID and appends the Age in hex.

If your GUID is {00112233-4455-6677-8899-AABBCCDDEEFF} and your Age is 1, the hash string is: 00112233445566778899AABBCCDDEEFF1


It then queries the symbol server URL looking for a very specific file layout:

http://msdl.microsoft.com/download/symbols / [PDB_FILENAME] / [GUID_AGE_HASH] / [PDB_FILENAME]
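
Building that lookup string yourself is a short exercise. One detail to watch: in the file, the first three GUID fields (4, 2, and 2 bytes) are stored little-endian, so you cannot just hex-dump the 16 bytes in order. The sketch below assumes guid holds the raw bytes from the RSDS record and that pdb_name is already reduced to the bare filename. Both function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format GUID + Age as the symbol-server key. Returns 1 on success,
 * 0 if the output buffer is too small. */
static int symbol_key(const uint8_t guid[16], uint32_t age,
                      char *out, size_t cap)
{
    assert(guid != NULL && out != NULL);

    /* Data1, Data2, Data3 are little-endian on disk; Data4 is a byte array. */
    unsigned d1 = (unsigned)guid[0] | ((unsigned)guid[1] << 8) |
                  ((unsigned)guid[2] << 16) | ((unsigned)guid[3] << 24);
    unsigned d2 = (unsigned)guid[4] | ((unsigned)guid[5] << 8);
    unsigned d3 = (unsigned)guid[6] | ((unsigned)guid[7] << 8);

    int n = snprintf(out, cap,
                     "%08X%04X%04X%02X%02X%02X%02X%02X%02X%02X%02X%X",
                     d1, d2, d3,
                     (unsigned)guid[8],  (unsigned)guid[9],
                     (unsigned)guid[10], (unsigned)guid[11],
                     (unsigned)guid[12], (unsigned)guid[13],
                     (unsigned)guid[14], (unsigned)guid[15],
                     (unsigned)age);     /* Age: hex, no zero padding */
    return n > 0 && (size_t)n < cap;
}

/* Build "<name>/<key>/<name>", the path relative to the server root. */
static int symbol_path(const char *pdb_name, const uint8_t guid[16],
                       uint32_t age, char *out, size_t cap)
{
    char key[48];                        /* 32 GUID digits + up to 8 for Age */
    assert(pdb_name != NULL && out != NULL);
    if (strlen(pdb_name) == 0 || !symbol_key(guid, age, key, sizeof key))
        return 0;
    int n = snprintf(out, cap, "%s/%s/%s", pdb_name, key, pdb_name);
    return n > 0 && (size_t)n < cap;
}
```

Appending the result to a symbol server's base URL gives the address a debugger would request.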


By standardizing on this GUID+Age hash hidden deep in the PE header, the entire ecosystem of Windows debugging and crash dump analysis is able to function smoothly.

 
 
