postgarf

curious bobert cat

a passively nodal intravenously networked nervous-system fleabag with a smile :)



anarch-esperantisto who enjoys various weird things, like film photography, ham radio, writing systems, and ancient operating systems (win2000 to OS/2 to UNIX),

and big cats!



blanket CW: im weird sorry
there might be kinks here!

also @degarf



atomicthumbs
@atomicthumbs

this chassis has 4 nodes. each node has 1 chip. each chip has 64 Atom cores with AVX-512, 4-way hyperthreading, and 16GB of on-chip MCDRAM (a variant of Hybrid Memory Cube) along with support for 384GB of "far" DDR4.

You can boot Windows 10 on it, but it makes no sense to do so; the intention was that you boot Linux on it and run OpenMP workloads. This 2U server runs 1,024 threads of x86 code. Xeon Phi was supposed to kill GPUs for number crunching. It didn't.
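(A minimal sketch, in C, of the kind of OpenMP workload the post is talking about: an embarrassingly parallel dot product spread across every hardware thread. The array size, and the idea of binding memory to the MCDRAM NUMA node with numactl, are assumptions about a typical flat-mode setup, not anything from the post; the MCDRAM node number varies by machine.)

/* sketch: dot product reduced across all threads.
 * build with something like: gcc -O3 -fopenmp dot.c
 * on KNL you might run it as:  numactl --membind=<mcdram node> ./a.out */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void) {
    const size_t n = (size_t)1 << 26;      /* 64M doubles per array, ~512 MB each */
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    if (!a || !b) return 1;

    /* parallel first-touch initialization, so each page lands near the
     * thread that will use it */
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++) { a[i] = 1.0; b[i] = 2.0; }

    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];

    printf("threads=%d  sum=%.1f\n", omp_get_max_threads(), sum);
    free(a); free(b);
    return 0;
}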



in reply to @atomicthumbs's post:

Intel released Knights Landing years behind schedule, then used the sales figures for that outdated product in a highly competitive market to justify killing the product line entirely. Intel is too comfortable to keep working on product lines that aren't instantly successful. I'd argue the strength of a company that large is exactly being able to give projects like this a long runway. Though I suppose, if they're thinking of Itanium, they may have a point that they can't keep doing it. I'd just argue they can't afford not to.

Intel seems to do this often. Optane lived and died on the two-year delay that meant the Optane DIMMs launched into servers that already had faster regular RAM in them and couldn't afford to run it slower for any amount of extra capacity...

the fucked thing is that you could also use PMDIMMs as block storage, which seems like it would be the optimal way to get an absurdly, ridiculously fast SSD. we have a couple of hp z6 g4 workstations on the shelf here that'd work for that and i desperately wish i could try it
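(A rough C sketch of what trying that might look like, assuming the persistent-memory namespace has already been configured so it shows up as an ordinary block device. The /dev/pmem0 path, the 4 GiB read size, and the rest are placeholder assumptions, not anything tested on those Z6s; it just times O_DIRECT sequential reads off the device.)

/* sketch: time sequential O_DIRECT reads from a pmem block device */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    const char  *dev   = "/dev/pmem0";        /* assumed block-mode pmem namespace */
    const size_t block = 1 << 20;             /* 1 MiB per read */
    const size_t total = (size_t)4 << 30;     /* stop after 4 GiB */

    int fd = open(dev, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, 4096, block)) { close(fd); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    size_t done = 0;
    while (done < total) {
        ssize_t r = read(fd, buf, block);
        if (r <= 0) break;                    /* EOF or error: stop timing */
        done += (size_t)r;
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("read %.1f GiB in %.2f s (%.1f GiB/s)\n",
           done / (double)(1 << 30), secs, done / (double)(1 << 30) / secs);

    free(buf);
    close(fd);
    return 0;
}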

I have to wonder how something like this would perform on, e.g., highly parallelized software build workloads. like, run it up against a modern many-core Threadripper or something. I imagine the memory locality helps a bunch with compile times.