c-t-


I've been working for 4 days straight on a Rust exercise for college and I'm having a lot of fun. I made a big clumsy mockup of the solution on the first day and I've been iteratively improving the code ever since. It iterates over a Stack Exchange dataset, and I'm supposed to let the user decide how many threads to use. Right now I just take the data files, split them into chunks, and process those chunks in parallel, but I think there are smarter ways to parallelize this problem? I'm not sure.
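A minimal sketch of that chunk-per-thread setup (untested; the file names and `process_file` are made-up placeholders, and the real parsing goes where the comment says):

```rust
use std::thread;

// placeholder for the real per-file parsing work
fn process_file(path: &str) {
    println!("processing {path}");
}

fn main() {
    // hypothetical file list standing in for the real dataset
    let files: Vec<String> = (0..8).map(|i| format!("site_{i}.json")).collect();
    let n_threads = 4; // user-chosen thread count

    // ceiling division so every file lands in some chunk
    let chunk_size = (files.len() + n_threads - 1) / n_threads;

    // scoped threads can borrow `files` without cloning it
    thread::scope(|s| {
        for chunk in files.chunks(chunk_size) {
            s.spawn(move || {
                for path in chunk {
                    process_file(path);
                }
            });
        }
    });
}
```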



in reply to @c-t-'s post:

Neat. Depending on the problem and how big the data is, you might be able to load it all into memory at once and then divide the work by amount of data rather than by file. Or if it's too large, maybe stream it in and hand it out to n threads that way.
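Something like this for the streaming version, maybe (just a sketch with a dummy producer; real code would read records off disk instead of the loop at the bottom):

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<String>();
    // std's mpsc receiver is single-consumer, so share it behind a mutex
    let rx = Arc::new(Mutex::new(rx));
    let n_threads = 4;

    let mut handles = Vec::new();
    for _ in 0..n_threads {
        let rx = Arc::clone(&rx);
        handles.push(thread::spawn(move || {
            // each worker pulls records off the shared queue until the sender closes
            loop {
                let record = rx.lock().unwrap().recv();
                match record {
                    Ok(q) => println!("got question: {q}"),
                    Err(_) => break, // channel closed, no more work
                }
            }
        }));
    }

    // dummy producer standing in for streaming records from disk
    for i in 0..100 {
        tx.send(format!("question {i}")).unwrap();
    }
    drop(tx); // close the channel so the workers exit

    for h in handles {
        h.join().unwrap();
    }
}
```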

Good thinking. Right now I make a rayon worker per file and then one per datapoint (each file is a Stack Exchange site and each line is a question). I think I can load a whole file at once, but maybe not all the files? I shall check.
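The nested shape is roughly this (a sketch; the file paths are made up and the per-question work is a placeholder):

```rust
use rayon::prelude::*;
use std::fs;

fn main() {
    // hypothetical per-site data files
    let files = vec!["stackoverflow.txt", "superuser.txt"];

    // one parallel job per file...
    files.par_iter().for_each(|path| {
        // one whole file in memory at a time, not the whole dataset
        let contents = fs::read_to_string(path).expect("failed to read file");
        // ...and one parallel job per line/question within it
        contents.par_lines().for_each(|question| {
            // per-question work goes here (placeholder)
            let _len = question.len();
        });
    });
}
```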

From what I know, each rayon worker does not create a new OS thread, so it shouldn't cause as much overhead. I removed the separation into chunks entirely and it had no effect on performance. I just set the number of threads beforehand as a global rayon config.
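For reference, the global config is a one-liner; `build_global` can only be called once, before any rayon work runs:

```rust
use rayon::prelude::*;

fn main() {
    let n_threads = 4; // e.g. parsed from user input

    rayon::ThreadPoolBuilder::new()
        .num_threads(n_threads)
        .build_global()
        .expect("failed to build global rayon pool");

    // every rayon parallel iterator after this uses that pool
    let sum: u64 = (0..1_000u64).into_par_iter().sum();
    println!("{sum}");
}
```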