Monday, January 30, 2017

From Telemetry to Open Source: an Overview of Windows 10 Source Tree



A surprising amount of internal information about Microsoft software is available even though it is closed-source. For example, libraries export functions by name, which reveals something about the interfaces they use. Debugging symbols used for troubleshooting operating system errors are also publicly available. Still, all we have at hand are compiled binary modules. In this article, we will try to determine what the source looked like prior to compilation, using only legal methods.

The question is not new: Mark Russinovich and Alex Ionescu have raised it before, but my research goes into more detail. What we need are the publicly available debugging symbol packages, in this case for the most recent release of Windows 10 (64-bit), in both free and checked builds.

Debugging symbols are a set of .pdb (program database) files that store information used for debugging Windows binary modules, including the names of globals, functions, and data structures, sometimes even with field names.

We can also use information from an almost-publicly-available checked build of Windows 10. This kind of build is full of debugging assertions that contain sensitive information about local variable names and even source line numbers.



The example above, while not providing an absolute path, does expose extremely helpful path information. 

If we feed the debugging symbols to the Sysinternals "strings" utility, we get around 13 GB of raw data. Doing the same with every file in a Windows installation is a bad idea, because most of the output would be useless. Therefore, we limit the target file types to the following list: exe (executable files), sys (drivers), dll (libraries), ocx (ActiveX components), cpl (control panel elements), and efi (EFI applications, in particular the bootloader). This yields an additional 5.3 GB of raw data. I was initially surprised at how few programs can open files that are gigabytes in size, and how many fewer can search for specific data inside them. I used 010 Editor for manual work on the raw and temporary data, and Python scripts for automated data filtering.
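The extraction step can be approximated in Python when the Sysinternals tool is not at hand. The following is a minimal sketch, not the script used for this research: the function names, the minimum run length of 4, and the choice to scan for both ASCII and UTF-16LE runs are assumptions made for illustration.

```python
import os
import re

# File extensions named in the text above.
TARGET_EXTENSIONS = {".exe", ".sys", ".dll", ".ocx", ".cpl", ".efi"}

# Hypothetical threshold: shorter runs are mostly noise.
MIN_LENGTH = 4

# Runs of printable ASCII bytes, and the same characters stored as UTF-16LE.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LENGTH)
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LENGTH)


def extract_strings(data):
    """Return printable ASCII and UTF-16LE runs found in a bytes object."""
    found = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)]
    return found


def iter_target_files(root):
    """Yield paths under root whose extension is in TARGET_EXTENSIONS."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in TARGET_EXTENSIONS:
                yield os.path.join(dirpath, name)
```

Each path from `iter_target_files` can be read in binary mode and passed to `extract_strings`, one file at a time. Individual binaries fit in memory; it is the combined output that grows to gigabytes, so results are best appended to a file as they are produced.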

Filtering Symbol Data

Each symbol file contains a list of the object files that were linked into the corresponding executable image. The object file paths are absolute.


  • Filtering clue No. 1: find strings using the mask ":".
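A filter of this kind can be sketched in a few lines of Python. The function name and the default mask are assumptions for illustration; the point is that the input is read lazily, one line at a time, so a multi-gigabyte dump never has to fit in memory.

```python
def filter_lines(lines, mask=":"):
    """Yield only the lines that contain the mask, without line endings."""
    for line in lines:
        if mask in line:
            yield line.rstrip("\r\n")


# Usage sketch (hypothetical file name): stream a large dump through the filter.
# with open("symbols_strings.txt", encoding="utf-8", errors="replace") as src:
#     for path_line in filter_lines(src):
#         print(path_line)
```

Because absolute Windows paths start with a drive letter, a tighter mask such as `":\\"` (a colon followed by a backslash) can be passed in to drop lines that merely contain a colon, for example timestamps.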