I am trying to implement an algorithm, and I get an error in the following block of code:
def get_action(self, state):
    state = torch.from_numpy(state).float().unsqueeze(0)
    probs = self.policy_network.forward(Variable(state))
    highest_prob_action = np.random.choice(self.num_actions, p=np.squeeze(probs.numpy()))  # This line raises the error
    log_prob = torch.log(probs.squeeze(0)[highest_prob_action])
    return highest_prob_action, log_prob
To pass the probability distribution to np.random.choice, I used the probs tensor. I know I can call detach() on it to make this work, but I can't use that function: I have to submit this code to an autograder, which apparently returns an error when detach() is used.
Is there a way to do this without using detach()?
Here's what the probs tensor looks like:
tensor([[0.2522, 0.2533, 0.2385, 0.2560]], grad_fn=<SoftmaxBackward>)
And I want the final output to be a NumPy array like this (the values don't mean anything):
array([0., 1., 2., 3.], dtype=float32)
To reproduce the error, you can use:
import torch
tensor1 = torch.tensor([1.0,2.0],requires_grad=True)
print(tensor1)
print(type(tensor1))
tensor1 = tensor1.numpy()
print(tensor1)
print(type(tensor1))
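For reference, here is a minimal sketch of what the snippet above runs into, together with one conversion that does not call detach(). It goes through Tensor.tolist(), which returns plain Python floats. Whether an autograder that rejects detach() accepts this route is an assumption, and the names raised and array1 are only illustrative.

```python
import numpy as np
import torch

tensor1 = torch.tensor([1.0, 2.0], requires_grad=True)

# Calling .numpy() on a tensor that requires grad raises a RuntimeError.
try:
    tensor1.numpy()
    raised = False
except RuntimeError as err:
    raised = True
    print(err)

# Assumption: copying the values out through tolist() is acceptable.
# The resulting array is a plain copy with no link to the autograd graph.
array1 = np.array(tensor1.tolist(), dtype=np.float32)
print(array1)
print(type(array1))
```

The array is only used to pick an action, so losing the gradient link here is harmless as long as log_prob is still computed from the original tensor.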
What I tried: As suggested by GoodDeeds in the comments, I tried to use torch.multinomial as follows:
states.append(state)
probs = self.policy.forward(Variable(torch.from_numpy(state).float().unsqueeze(0)))
highest_prob_action = np.random.choice(torch.multinomial(probs, self.num_actions).squeeze(0))
# highest_prob_action = np.random.choice(self.num_actions, p=np.squeeze(probs.numpy()))
log_prob = torch.log(probs.squeeze(0)[highest_prob_action])
but after some iterations it raises the following error:
130 states.append(state)
131 probs = self.policy.forward(Variable(torch.from_numpy(state).float().unsqueeze(0)))
--> 132 highest_prob_action = np.random.choice(torch.multinomial(probs, self.num_actions).squeeze(0))
133 # highest_prob_action = np.random.choice(self.num_actions, p=np.squeeze(probs.numpy()))
134 log_prob = torch.log(probs.squeeze(0)[highest_prob_action])
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
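Two observations about this attempt, offered as a sketch rather than a confirmed diagnosis. First, the error message is about the contents of probs, so the network output may already contain nan or inf by the time sampling happens. Second, torch.multinomial samples without replacement by default, so asking for self.num_actions samples returns every action once, and picking uniformly from that result no longer follows probs. Drawing a single sample avoids both np.random.choice and detach(). The logits values below are a hypothetical stand-in for the policy network output.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the policy network: one state, four actions.
logits = torch.tensor([[0.1, 0.2, -0.1, 0.3]], requires_grad=True)
probs = torch.softmax(logits, dim=1)

# Fail early with a clearer message if the distribution is already invalid.
if not torch.isfinite(probs).all() or (probs < 0).any():
    raise ValueError("probs contains nan, inf, or negative entries")

# Draw exactly one action index according to probs.
# The sampled index is an integer tensor and carries no gradient.
action = torch.multinomial(probs, num_samples=1).item()

# The log-probability is still computed from probs, so it stays
# connected to the autograd graph for the policy-gradient update.
log_prob = torch.log(probs.squeeze(0)[action])

print(action, log_prob)
```

If the check fires during training, the likely place to look is the network output or the update step (for example, the learning rate or a log of zero), not the sampling line.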