

Day 116 diving into ML. Read the NetHack Learning Environment (NLE) paper. Key takeaways for getting agents that generalize: train in environments that are procedurally generated and stochastic, build memory and structured encoders into the policy from the start, and evaluate on unseen seeds (keep a set of seeds reserved for eval only). Expanding on structured encoders: don't flatten the entire observation array into one giant vector. Encode each modality in the format that suits it, e.g. map data through a CNN, text-based data through a language encoder, etc., then feed those encodings into the policy. That lets it learn richer representations, closer to how a human parses the screen, rather than learning from a single flattened vector.
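A minimal sketch of the "reserve seeds for eval" bookkeeping. The function name and split sizes are my own, not anything from NLE; the point is just that train and eval seed sets stay disjoint:

```python
import random

def split_seeds(n_total=1000, n_eval=100, rng_seed=0):
    """Partition environment seeds into train/eval sets (hypothetical helper).

    The eval seeds are never used to generate training episodes, so eval
    measures generalization to unseen procedurally generated worlds.
    """
    rng = random.Random(rng_seed)
    seeds = list(range(n_total))
    rng.shuffle(seeds)
    eval_seeds = set(seeds[:n_eval])    # held out, eval only
    train_seeds = set(seeds[n_eval:])   # everything else is fair game for training
    return train_seeds, eval_seeds

train_seeds, eval_seeds = split_seeds()
assert train_seeds.isdisjoint(eval_seeds)
```

At eval time you'd reset the env only with seeds drawn from `eval_seeds`.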

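To make the per-modality encoding idea concrete, here's a toy sketch in plain Python. The encoders are deliberately simple stand-ins (tile counts instead of a CNN, a hashed bag-of-words instead of a language model); all names and the stat normalization constants are made up for illustration:

```python
from collections import Counter

def encode_map(grid):
    """Toy map encoder: per-tile-type counts (stand-in for a CNN over glyphs)."""
    counts = Counter(tile for row in grid for tile in row)
    vocab = [".", "#", "@", ">"]          # tiny hypothetical tile vocabulary
    return [counts.get(t, 0) for t in vocab]

def encode_text(msg, dim=8):
    """Toy message encoder: hashed bag-of-words (stand-in for a language encoder)."""
    vec = [0] * dim
    for word in msg.lower().split():
        vec[hash(word) % dim] += 1
    return vec

def encode_stats(stats):
    """Scale scalar stats (hp, dungeon depth, ...) to a comparable range."""
    return [stats["hp"] / 100.0, stats["depth"] / 50.0]

def build_obs(grid, msg, stats):
    """Each modality goes through its own encoder, then the pieces are joined."""
    return encode_map(grid) + encode_text(msg) + encode_stats(stats)

obs = build_obs(
    grid=[["@", "."], ["#", ">"]],
    msg="You see a staircase down",
    stats={"hp": 12, "depth": 3},
)
```

The contrast with "one giant vector" is that each encoder can exploit its modality's structure (spatial locality for the map, word identity for messages) before anything is concatenated for the policy.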












