
I am trying to implement an algorithm and I am facing an error in the following block of code:

def get_action(self, state):
    state = torch.from_numpy(state).float().unsqueeze(0)
    probs = self.policy_network.forward(Variable(state))
    highest_prob_action = np.random.choice(self.num_actions, p=np.squeeze(probs.numpy()))  # this line raises the error
    log_prob = torch.log(probs.squeeze(0)[highest_prob_action])
    return highest_prob_action, log_prob

To pass the probability distribution to np.random.choice, I used the probs tensor. I know I could call detach() to make it work, but I can't use that function because I have to submit this code to an autograder, and apparently it returns an error when detach() is used.

Is there a way I can do this without using the detach function?

Here is what the probs tensor looks like:

tensor([[0.2522, 0.2533, 0.2385, 0.2560]], grad_fn=<SoftmaxBackward>)

And I want the final output to be of this type (the values themselves don't matter):

array([0., 1., 2., 3.], dtype=float32)

To reproduce the error, you can use:

import torch
tensor1 = torch.tensor([1.0,2.0],requires_grad=True)
print(tensor1)
print(type(tensor1))
tensor1 = tensor1.numpy()
print(tensor1)
print(type(tensor1))
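For what it's worth, one way to get a NumPy copy without calling detach() by name is to read the tensor's .data attribute, which is a view that does not track gradients. This is only a sketch of the workaround, and whether the autograder accepts .data any more than detach() is a separate question:

```python
import torch

tensor1 = torch.tensor([1.0, 2.0], requires_grad=True)

# tensor1.numpy() raises RuntimeError because the tensor tracks gradients.
# tensor1.data is a non-grad-tracking view, so .numpy() succeeds on it.
arr = tensor1.data.numpy()
print(arr)        # [1. 2.]
print(type(arr))  # <class 'numpy.ndarray'>
```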

What I tried: As suggested by GoodDeeds in the comments, I tried to use torch.multinomial as follows:

states.append(state)
probs = self.policy.forward(Variable(torch.from_numpy(state).float().unsqueeze(0)))
highest_prob_action = np.random.choice(torch.multinomial(probs, self.num_actions).squeeze(0))
# highest_prob_action = np.random.choice(self.num_actions, p=np.squeeze(probs.numpy()))
log_prob = torch.log(probs.squeeze(0)[highest_prob_action])

but after some iterations it raises the error:

    130             states.append(state)
    131             probs = self.policy.forward(Variable(torch.from_numpy(state).float().unsqueeze(0)))
--> 132             highest_prob_action = np.random.choice(torch.multinomial(probs, self.num_actions).squeeze(0))
    133 #             highest_prob_action = np.random.choice(self.num_actions, p=np.squeeze(probs.numpy()))
    134             log_prob = torch.log(probs.squeeze(0)[highest_prob_action])
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
No, I don't. I set probs.requires_grad = False but it still gives the same error. Here's the error trace:

    132             probs.require_grad = False
--> 133             highest_prob_action = np.random.choice(self.num_actions, p=np.squeeze(probs.numpy()))
    134             log_prob = torch.log(probs.squeeze(0)[highest_prob_action])
    135             action_l.append(highest_prob_action)
RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
– Ravish Jha, May 6, 2021 at 21:46

Ah yes, sorry, I missed that that would not work. In your small example, tensor1 is a leaf, so it works.
– GoodDeeds, May 6, 2021 at 21:47

Does this help? discuss.pytorch.org/t/torch-equivalent-of-numpy-random-choice/… You could avoid converting to numpy altogether.
– GoodDeeds, May 6, 2021 at 21:50

It does help, but I am trying to figure out how to get the action that was returned by it.
– Ravish Jha, May 6, 2021 at 22:06
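Following up on that last comment, torch.multinomial can both sample from the distribution and hand back the action index directly, with no numpy conversion at all. A minimal sketch using the probs values from the question (the tensor here is a stand-in for the policy network's output):

```python
import torch

# Stand-in for the policy output shown in the question.
probs = torch.tensor([[0.2522, 0.2533, 0.2385, 0.2560]], requires_grad=True)

# Sample ONE action index according to the distribution.
# multinomial accepts a grad-tracking tensor; its output is an index
# tensor, so .item() extracts a plain Python int.
action = torch.multinomial(probs, num_samples=1).item()

# log_prob still participates in autograd, as needed for the policy update.
log_prob = torch.log(probs.squeeze(0)[action])
```

This replaces np.random.choice entirely, so the "can't call numpy() on a tensor that requires grad" error never arises.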
        
