blep
@blep

DecayWTF
@DecayWTF

there's a bunch of dumb bullshit going on here. I started writing this last night and it's kinda long.


millenomi
@millenomi

which, very rare, I can talk about because it happened in the open.

Swift carries some amount of cross-platform compatibility. There is a reimplementation of Foundation called swift-corelibs-foundation, which reimplements substantial bits of the framework with the same name — so that your macOS code that uses import Foundation has at least a good subset of API it can rely on on Windows and Linux.

Poor compnerd, the Windows champion, and I (but vastly mostly him) spent an absurd number of days trying to figure out how the heck we could adapt the way you refer to a file on a Mac to Windows paths.


Mac uses UNIX paths, in general, but, for reasons, you use Foundation's URL type to refer to a file on disk. In general, save for a bunch of edge cases that only apply on Apple OSes, for a path /a/b/c/d/e.txt, you have a URL pointing to file:///a/b/c/d/e.txt (where the host portion is canonically empty and generally ignored).

However, sometimes you need to use POSIX API or old Foundation API. The old APIs, some of which don't have a 1:1 replacement, take a string that is the full path to the file to operate on. Here's the problem, though: strings in Foundation/ObjC/Swift are collections of Unicode code points, but paths are byte buffers. Now, my friends, guess: if the system wants or gives you a file path, what encoding is it in?

The answer is obviously 'whatever the OS/filesystem decides it is', which means it's basically kind of whatever? And of course if you use one of the autodetect-encoding methods and it messes up, your path will be mangled and everyone will be sad. So what the system says is that you are not allowed to just grab random bytes and shove them in a String constructor: you need to ask the OS to turn what it calls the file system representation (the byte buffer with the path) into a string in a special way, so it can pick a platform-dependent OS path encoding that can represent all possible path byte buffers (usually converting from and to UTF-8, which is the underlying storage of Swift strings and URLs).
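As an analogy from outside Foundation: Python faces the same problem and handles it with its `surrogateescape` error handler, which is what `os.fsdecode` and `os.fsencode` use on POSIX systems. The sketch below is not how Foundation does it. It only shows why a naive decode loses information while a purpose-built mapping does not, and the file name in it is made up.

```python
# Hypothetical file name containing a byte (0xFF) that is not valid UTF-8.
raw_path = b"/tmp/caf\xff.txt"

# A naive decode has to replace the bad byte with U+FFFD, so the original
# bytes can no longer be recovered from the resulting string.
lossy = raw_path.decode("utf-8", errors="replace")
assert lossy.encode("utf-8") != raw_path

# "surrogateescape" maps each undecodable byte to a lone surrogate
# (0xFF becomes U+DCFF), so every possible byte buffer has a string form.
as_string = raw_path.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the exact original bytes.
round_tripped = as_string.encode("utf-8", errors="surrogateescape")
assert round_tripped == raw_path
```

The point is the same as in the post: the string is only safe if it was built by the one conversion that knows how to get back to the bytes.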

So, long story short:

  • the system wants you to keep your file reference as a URL…
  • … which it can either convert to a path byte buffer (what it calls the 'file system representation')…
  • … or from/to a string built in a special way, which is guaranteed to always be convertible to a path byte buffer.

This means having three transforms:

  • From a path byte buffer to a string and vice versa (via the FileManager class, which is responsible for basic disk I/O: its fileSystemRepresentation(withPath:) and string(withFileSystemRepresentation:length:) methods do that.)
  • From a path byte buffer to a URL (URL(fileURLWithFileSystemRepresentation:isDirectory:relativeTo:)) and back (withUnsafeFileSystemRepresentation(_:).)
  • From a string path to a URL (via the URL(fileURLWithPath:) constructor) and vice versa.
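To make the three transforms concrete, here is a minimal sketch of the same triangle in Python for UNIX-style paths only. The helper names are hypothetical and this is not Foundation's implementation: it assumes the "special" string form is UTF-8 with `surrogateescape`, and that the URL is the percent-encoded byte buffer behind `file://`.

```python
from urllib.parse import quote, unquote_to_bytes, urlsplit

def bytes_to_string(fs_rep: bytes) -> str:
    # Path byte buffer -> string, losslessly (assumed encoding rule).
    return fs_rep.decode("utf-8", errors="surrogateescape")

def string_to_bytes(path: str) -> bytes:
    # String -> path byte buffer, the inverse of the above.
    return path.encode("utf-8", errors="surrogateescape")

def bytes_to_url(fs_rep: bytes) -> str:
    # Path byte buffer -> file URL: percent-encode anything unsafe.
    return "file://" + quote(fs_rep)

def url_to_bytes(url: str) -> bytes:
    # File URL -> path byte buffer: undo the percent-encoding.
    return unquote_to_bytes(urlsplit(url).path)

def string_to_url(path: str) -> str:
    # String path -> URL, by going through the byte buffer.
    return bytes_to_url(string_to_bytes(path))

def url_to_string(url: str) -> str:
    # URL -> string path, again through the byte buffer.
    return bytes_to_string(url_to_bytes(url))

print(string_to_url("/a/b/c/d/e.txt"))
print(url_to_string("file:///a/b/c/d/e.txt"))
```

Every pair of functions round-trips, which is the property the real API has to preserve on each platform.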

OK, so, what is that last one? Of course, people were Clever. You can see from the file:///a/b/c/d/e.txt example that the macOS path is just the URL's path portion. So, URL's path property (the one that would return the path portion of a non-file URL, for example the /millenomi in https://cohost.org/millenomi) was made to return a string for file URLs that is in the right format and encoding to map to the above. Easy, no?

Except when it came to Windows paths, that doesn't hold anymore :(

compnerd came up with this setup, which AFAICT works out nicely:

  • The 'file system representation', aka the path byte buffer, is just whatever the local API accepts: in this case, the wchar_t array Windows path management functions use. (This is already a departure: macOS paths IIRC use UTF-8 in a specific normal form here.) This is your usual C:\A\B\C\D\E.txt string, in the format WinAPI expects.
  • The string is an appropriately mapped version of the above, with encoding/decoding choices that ensure it can encode anything Windows throws at it.
  • The URL is of the form file:///c:/a/b/c/d/e.txt. If you ask for its path, it returns a string in a different format: c:/a/b/c/d/e.txt.

It turns out, as the original post shows, that a lot of Windows APIs are agnostic to \ vs /; the canonical one they'll return is \, but they also accept /, which is not a valid filename character anyway. compnerd made sure that all string-accepting APIs that were cross-platform could operate independently of \ or /-ness. This preserves almost every transform back-and-forth more or less transparently, and allows the framework to return the full, regular string if you go down direct translation paths (e.g., from URL to byte array).
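The separator-agnostic idea can be tried out with Python's `pathlib.PureWindowsPath`, which parses Windows paths on any OS. This is only an illustration of the shape of the mapping, not the swift-corelibs-foundation code: `file_url_for` and `url_path_for` are hypothetical helpers, and stripping the leading `/` before a drive letter is an assumption modelled on the URL and path forms described above.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote, unquote, urlsplit

native = PureWindowsPath(r"C:\A\B\C\D\E.txt")    # what WinAPI hands back
url_style = PureWindowsPath("C:/A/B/C/D/E.txt")  # what a URL path looks like

# Both spellings parse to the same path.
assert native == url_style

def file_url_for(path: PureWindowsPath) -> str:
    # Forward-slash spelling, percent-encoded, behind an empty host.
    return "file:///" + quote(path.as_posix(), safe="/:")

def url_path_for(url: str) -> str:
    # The raw URL path is "/C:/..."; drop the leading slash so the
    # result starts with the drive letter.
    path = unquote(urlsplit(url).path)
    if len(path) >= 3 and path[0] == "/" and path[1].isalpha() and path[2] == ":":
        path = path[1:]
    return path

url = file_url_for(native)
print(url)                                   # URL form
print(url_path_for(url))                     # string path, forward slashes
print(str(PureWindowsPath(url_path_for(url))))  # back to backslashes
```

Because either separator parses to the same path, code that receives the forward-slash string can still hand the backslash form to the OS.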

And of course, he had to make sure this setup also worked similarly, and correctly, with every variant of Windows path listed above. It was A Ton of Terrible Work.


† Ironically, URLs are used because that's what Core Foundation introduced to abstract paths between 'Classic' Mac OS's HFS-style paths, and modern Mac OS X's UNIX-style ones. There was a toolkit called Carbon which included Core Foundation that you could use to make apps that could run on both OSes.

‡ Mostly, these were APIs coming from NeXTSTEP's Foundation implementation. It took a long time, after the decision was made to use URLs to refer to documents with the introduction of CF, to propagate that decision to the higher-level API.

✶ Things I'm eliding here:

  • file reference URLs, that lie about their paths and are not supported outside Apple OSes;
  • how extended paths could work (basically, we would detect when they're needed and produce an appropriate file system rep, but we would be liberal wrt accepting strings with or without the marker and would still use file URLs of the form file:///c:/… even if they're long or use Unicode);
  • you may notice that getting a URL path to a file on the Mac yields a '/…' and on Windows yields a string without the '/' — there's a bunch of special casing IIRC.
  • I am not touching the new methods that use the new System-module FilePath parameters, which essentially are just fancy wrappers over file system representations (aka, path byte buffers), other than that they're probably better to work with, they should work on Windows similarly, and you may be using them inadvertently without knowing because there's a new constructor called URL(filePath:…) that takes them and they are expressible via string literals, so you may do:
    let root = URL(filePath: "/")
    and it works the way you'd expect even if you don't know you're taking a Module Detour ™.


in reply to @blep's post:

when you access \\localhost\c$, you access an SMB network share of your c drive that Windows sets up automatically. fun fact: this allows domain admins to access any domain machine’s c drive over the network. it can actually be removed, though i haven’t tried that.

just checked on my local machine (where I've disabled the default shares) and apparently \\localhost\[drive letter]$ works to access local drives even if you've disabled the shares, which is wild

what happens when you mount more than 26 disks?

you can mount a disk at any (empty) directory of any existing NTFS volume. so actually there’s no guarantee that C:\SomeDir is really on the “C” volume

If you mount more than 24 disks (A: and B: are reserved for floppy disks specifically), any further disks will be mounted but won't be easily accessible. Windows doesn't have "mount points": mounting a filesystem happens automatically when you first access a device that's marked as a volume, and things like drive letters or "mount points" are actually symbolic links.

The real path of a disk is something like \Device\HarddiskVolume3 (in kernel path syntax), and a drive letter is actually a symbolic link with a path like \GLOBAL??\C:. C: is not a file, but a symbolic link object, and \GLOBAL?? is not a directory, but an object directory: both are abstractions used to give a structure to the kernel object namespace. Symbolic links under \GLOBAL?? are known as "DOS devices", and include drive letters, proper DOS devices like COM1, AUX, PRN etc. and other kinds of symbolic links to devices.

A Win32 path like C:\dir\file is translated to a kernel path like \??\C:\dir\file; for various backwards compatibility reasons, the translation includes stripping trailing dots from path components, converting forward slashes to backslashes and collapsing special directory entries like .. and . (e.g. C:\dir\.\dir/../file... translates to \??\C:\dir\file as well). In the kernel namespace, there is actually no object directory named \??: it's shorthand for a somewhat complex per-process search path, with \GLOBAL?? as its fallback.
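The normalization steps listed above can be sketched in a few lines of Python. This is a deliberately simplified model, not the real Windows routine: `win32_to_nt` is a hypothetical name, it only handles drive-absolute paths, and it ignores relative paths, device names, trailing spaces and every other special case.

```python
def win32_to_nt(path: str) -> str:
    # Forward slashes are accepted as separators and converted.
    path = path.replace("/", "\\")
    drive, rest = path[:2], path[2:]
    if (len(drive) != 2 or not drive[0].isalpha() or drive[1] != ":"
            or not rest.startswith("\\")):
        raise ValueError("this sketch only handles paths like C:\\dir\\file")

    components = []
    for part in rest.split("\\"):
        if part in ("", "."):
            continue                  # empty segment or "current directory"
        if part == "..":
            if components:
                components.pop()      # parent directory, never above the root
            continue
        # Trailing dots are stripped ("file..." -> "file"); a component made
        # only of dots is left alone in this sketch.
        components.append(part.rstrip(".") or part)

    # Prefix with \??\ to get the kernel-style path.
    return "\\??\\" + drive + "\\" + "\\".join(components)

print(win32_to_nt(r"C:\dir\.\dir/../file..."))
```

Note that all of this happens purely on the text of the path, before the kernel or the filesystem ever sees it.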

Does this look familiar? Yes, the \\?\-prefixed syntax is a way to skip the normalization step, and feed barely-disguised kernel paths straight to the APIs. If you pass \\?\C:\dir\file, a trivial translation algorithm will be used that turns that into \??\C:\dir\file (what about .. and . directory entries and forward slashes? you are at the mercy of the filesystem implementation for how those are handled). An older escape syntax is the \\.\ prefix, which works almost exactly the same, but is generally used to access devices that aren't in the old DOS set of devices accessible from any path; e.g. COM1 through COM9 are accessible by simply putting the device name in any part of the path (COM1, \\.\COM1, C:\COM1, COM1.txt, C:\COM1\etc all work), but COM10 and above must be accessed as \\.\COM10.
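A rough way to express the "old DOS set" rule in code, following the description above: a path component names a legacy device if the part before its first dot is one of the classic names. The set below adds CON, NUL and LPT1 through LPT9 to the names mentioned in the comment, and the helper names are hypothetical. Actual behaviour varies between Windows versions, so treat this as a sketch of the rule rather than a faithful check.

```python
# Classic DOS device names; COM10 and above are deliberately absent.
DOS_DEVICES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{n}" for n in range(1, 10)}
    | {f"LPT{n}" for n in range(1, 10)}
)

def is_dos_device_component(component: str) -> bool:
    # Case, trailing spaces and anything after the first dot are ignored,
    # which is why "COM1.txt" still refers to the serial port.
    stem = component.split(".", 1)[0].rstrip(" ").upper()
    return stem in DOS_DEVICES

def device_path(name: str) -> str:
    # The \\.\ escape works for any device name, including COM10 and above.
    return "\\\\.\\" + name

for name in ("COM1", "com1.txt", "COM10", "notes.txt"):
    print(name, is_dos_device_component(name))
print(device_path("COM10"))
```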

Don't read too much into the \\ syntax, all this is unrelated to UNC (network) paths, which predate Windows NT and the \\.\ and \\?\ escapes. In fact, UNC paths like \\localhost\c$\Users\cat\Desktop\help.txt are internally translated as \??\UNC\localhost\c$\Users\cat\Desktop\help.txt, where UNC is the "DOS device" (symbolic link) for the Multiple UNC Provider (MUP), a router that sits in front of network filesystems, calling each in turn until one accepts the host/share pair in the path. In this case, the host is localhost and the share is c$, and the path will be accepted by the SMB redirector, which automatically creates a share named x$ for each x: drive letter (yes, they're remotely accessible). The reason the share name doesn't include the : part is that it's an invalid character in paths, except to separate the drive letter from the rest of the path; and the reason for the $ is that it's an ancient convention to create hidden entities: all shares (and, I believe, user names and group names as well) that end with $ won't be enumerated.
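The prefix handling from the last two paragraphs boils down to a small dispatch on how the path starts. A minimal sketch, with hypothetical function names: it covers only the three double-backslash forms discussed here, and for `\\.\` it skips the normalization that the real translation still applies to the rest of the path.

```python
def translate_escaped(path: str) -> str:
    if path.startswith("\\\\?\\"):
        # \\?\ escape: no normalization, just swap the prefix.
        return "\\??\\" + path[4:]
    if path.startswith("\\\\.\\"):
        # \\.\ escape: same target namespace (normalization omitted here).
        return "\\??\\" + path[4:]
    if path.startswith("\\\\"):
        # UNC path: routed through the "UNC" DOS device, i.e. the MUP.
        return "\\??\\UNC\\" + path[2:]
    raise ValueError("not an escaped or UNC path")

def admin_share_unc(host: str, win32_path: str) -> str:
    # x: becomes the hidden share x$, since ':' cannot appear in a share name.
    drive_letter, rest = win32_path[0], win32_path[2:]
    return "\\\\" + host + "\\" + drive_letter.lower() + "$" + rest

print(translate_escaped(r"\\?\C:\dir\file"))
print(translate_escaped(r"\\localhost\c$\Users\cat\Desktop\help.txt"))
print(admin_share_unc("localhost", r"C:\Users\cat\Desktop\help.txt"))
```

The order of the checks matters: the two escapes also start with `\\`, so they have to be tested before the generic UNC case.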

The full explanation of the Windows path syntax is much more complex. This is the best explanation I know of, although it might be outdated by now: https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html

How can you access the 25th disk? Drive letters, as I said, aren't the only kind of symbolic links created for disk volumes. Others include:

  • mount manager symbolic links; they look like \\?\Volume{GUID}. You can enumerate these with the command-line utility mountvol (which you can also use to assign and revoke drive letters or "mount points", which aren't really mount points). They can be enumerated programmatically with FindFirstVolume/FindNextVolume.
  • device manager (aka "Plug & Play") symbolic links; they look like \\?\<device instance path>#{device interface GUID} (the device instance path's backslashes are replaced with #). For example, for a volume enumerated as STORAGE\VOLUME\{AF0B22BB-8E21-4013-9764-211103064B98}#0000000011D00000 that implements interface {53f5630d-b6bf-11d0-94f2-00a0c91efb8b} (regular disk volume), the device manager will create symbolic link \\?\STORAGE#Volume#{af0b22bb-8e21-4013-9764-211103064b98}#0000000011D00000#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b}. These can be enumerated with device manager APIs, like CM_Get_Device_Interface_List or SetupDiEnumDeviceInterfaces.
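The naming rule for device manager links is simple enough to write down directly. A small sketch with a hypothetical helper name: it only does the string construction described above and keeps letter case as given, so it will not match the casing of a name reported by a real system.

```python
def device_interface_link(instance_path: str, interface_guid: str) -> str:
    # Backslashes in the instance path become '#', then the interface GUID
    # is appended after another '#', all behind the \\?\ prefix.
    return "\\\\?\\" + instance_path.replace("\\", "#") + "#" + interface_guid

link = device_interface_link(
    r"STORAGE\VOLUME\{AF0B22BB-8E21-4013-9764-211103064B98}#0000000011D00000",
    "{53f5630d-b6bf-11d0-94f2-00a0c91efb8b}",  # regular disk volume interface
)
print(link)
```

In practice you would get these names from the enumeration APIs rather than build them by hand. The construction is shown only to make the format readable.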

You can also create "mount points" on supported filesystems (like NTFS), as special "reparse point" directories. "Reparse points" are a multi-purpose superset of symbolic links, that come in many different kinds. One of the built-in kinds of reparse points is the "mount point", a symbolic link to a volume managed by the mount manager (not any volume but specifically those managed by the mount manager). These differ from proper UNIX mount points in a few ways:

  • they're actually persisted on disk, they aren't just in-memory abstractions (as a corollary, the target filesystem's on-disk format must allow for storing reparse points). This has its advantages, because you can reorder volumes on a disk, or move them to different physical disks altogether, and all the mount points will keep magically working (volume GUIDs are stored on-disk, in that hidden reserved space at the end of the disk. Does your partitioning software correctly support the - undocumented - format of that database?).
  • since they're just symbolic links, a volume can have many of them. Corollary: if C:\mount is a mount point for \Device\HarddiskVolume4, \\?\C:\mount\.. translates to \Device\HarddiskVolume4\.., which points to the root directory of the target volume, not the root directory of C:; in regular use, this is hidden by the fact that paths like C:\mount\.. are normalized in user mode before being passed to the kernel (C:\mount\.. -> C:\ -> \??\C:\ -> \Device\HarddiskVolume3\), but if you use the \\?\ escape you have to perform this kind of normalization yourself (otherwise, \\?\C:\mount\.. -> \??\C:\mount\.. -> \Device\HarddiskVolume4\..). On UNIX, this doesn't happen because you can only have one mount point per filesystem (... with exceptions/complications that I won't go into), and the kernel can pretend that the root of the mounted filesystem is actually a subdirectory of the directory that contains the mount point: /media/mount/.. actually "exists" (albeit only in memory) and it correctly points to /media/, instead of being user mode trickery like on Windows.
  • Windows applications generally have no idea what to do with mount points and they'll behave like you pranked them. Expect all sorts of things to stop working correctly, from file operations failing to incorrect calculations of disk space (to avoid this, just pass a full path to GetDiskFreeSpaceEx, instead of passing just the drive letter to GetDiskFreeSpace).
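The `..` corollary in the second bullet is easier to see as a toy simulation. The two link tables below are hypothetical and mirror the example (C: on HarddiskVolume3, C:\mount pointing at HarddiskVolume4). `ntpath.normpath` stands in for the user-mode normalization, and the link-following is plain prefix substitution rather than anything the kernel really does.

```python
import ntpath

# Hypothetical symbolic links matching the example above.
DRIVES = {"C:": r"\Device\HarddiskVolume3"}
MOUNT_POINTS = {r"\Device\HarddiskVolume3\mount": r"\Device\HarddiskVolume4"}

def to_device_path(win32_path: str, normalize: bool) -> str:
    if normalize:
        # Regular Win32 path: ".." is collapsed lexically, in user mode.
        win32_path = ntpath.normpath(win32_path)
    drive, rest = ntpath.splitdrive(win32_path)
    device_path = DRIVES[drive] + rest        # follow the drive-letter link
    for mount, target in MOUNT_POINTS.items():
        if device_path == mount or device_path.startswith(mount + "\\"):
            # Follow the mount point link; whatever follows it is kept as-is.
            device_path = target + device_path[len(mount):]
    return device_path

# Normal path: ".." is gone before any link is followed.
print(to_device_path(r"C:\mount\..", normalize=True))
# \\?\-style path: ".." survives and is applied on the mounted volume.
print(to_device_path(r"C:\mount\..", normalize=False))
```

The first call ends up at the root of the C: volume, the second at the root of the mounted one, which is exactly the discrepancy you have to normalize away yourself when using the \\?\ escape.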