A binary scale is the simplest option and makes grading quick and efficient. Because there are only two choices, your reviewers are also more likely to give similar scores.
A 3-point scale makes grading a little more flexible. Many teams use 1 for 'needs improvement', 2 for 'good' and 3 for 'excellent', which lets reviewers give the agent a little extra praise when they go above and beyond, something the binary scale can't do.
A 4-point scale allows you to differentiate between 'could be better' and 'needs real improvement', as well as between 'this was good' and 'wow, this was amazing'.
A 5-point scale gives you everything the 4-point scale does, plus 3 as a middle option. Sometimes a conversation wasn't quite good or quite bad, just OK. The middle option lets you give a score in that case, whilst the n/a option does not.
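The point that reviewers tend to agree more often when there is less choice can be illustrated with a rough sketch. It assumes, purely for illustration, that two reviewers each pick a score independently and at random, which real reviewers do not do; the function name `chance_agreement` is hypothetical and not part of any tool.

```python
from fractions import Fraction
from itertools import product


def chance_agreement(points: int) -> Fraction:
    """Share of score pairs that match on a scale with `points` options.

    Assumption: both reviewers pick uniformly at random and independently.
    This is only a baseline for comparing scales, not a model of real grading.
    """
    if points < 2:
        raise ValueError("a scale needs at least two options")
    scores = range(1, points + 1)
    # Every combination of (reviewer A's score, reviewer B's score).
    pairs = list(product(scores, repeat=2))
    # Count the combinations where both reviewers gave the same score.
    matching = sum(1 for a, b in pairs if a == b)
    return Fraction(matching, len(pairs))


if __name__ == "__main__":
    for points in (2, 3, 4, 5):
        share = chance_agreement(points)
        print(f"{points}-point scale: {float(share):.0%} agreement by chance")
```

Under this assumption the baseline falls from 1/2 on a binary scale to 1/5 on a 5-point scale, so the extra detail of a larger scale comes at the cost of more room for reviewers to disagree.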