Description

When using BRCPFField and BRCNPJField from django.contrib.localflavor.br.forms, I missed having a uniform format for the cleaned value. Both fields accept formatted and digits-only input, and the cleaned data keeps whatever format was entered. If you're not enforcing a particular format with JavaScript, you may end up with non-standardized values in the database.

Ex.: For BRCPFField, the cleaned value can be '999.999.999-99', '99999999999', '999999999-99', etc.

My suggestion is to keep accepting the same variety of input formats, but provide a way to make the format of the cleaned value uniform.

The implementation consists of adding two optional keyword arguments:

always_return_formatted: a boolean indicating whether the cleaned value must always be returned in a particular format. It defaults to False, so current users of these form fields see no change in behavior.

return_format: a string used to format the output; defaults to '%s.%s.%s-%s' in BRCPFField.
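
To make the two keyword arguments concrete, here is a minimal standalone sketch. It is not the attached patch and does not use Django: clean_cpf is a hypothetical helper that only mirrors the proposed argument names, and it skips check-digit validation entirely.

```python
import re


def clean_cpf(value, always_return_formatted=False,
              return_format='%s.%s.%s-%s'):
    """Hypothetical helper mirroring the proposed keyword arguments.

    Assumption: the input already passed validation; only the
    formatting step is illustrated here.
    """
    if not always_return_formatted:
        # Current behaviour: the cleaned value keeps the entered format.
        return value
    digits = re.sub(r'\D', '', value)  # drop dots, dashes, spaces
    if len(digits) != 11:
        raise ValueError('a CPF has exactly 11 digits')
    # Split into the 3-3-3-2 groups that the format string expects.
    return return_format % (digits[:3], digits[3:6], digits[6:9], digits[9:])
```

With always_return_formatted=True, both '99999999999' and '999999999-99' would come back as '999.999.999-99'; with the default, the input is returned untouched.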

In addition to these, I've changed the parent class of BRCNPJField from Field to CharField to take advantage of max_length and min_length validators.

Tests for these features are included, and I fixed some errors in the previous doctests, which were expecting a unicode string.

1) simplify the API and take only the return_format keyword arg, but make it default to None. If that argument is not given, then keep the current behaviour. No need for the other boolean keyword argument, and we are still backward compatible.

2) also reject 111.111.111-11, 222.222.222-22 and so on, because any CPF composed of one digit repeated eleven times passes the check-digit validation, but these are not real CPFs.
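
A short sketch of why the repeated-digit numbers slip through, assuming the usual mod-11 CPF check-digit rule. The function names and the reject_repeated flag are hypothetical and exist only for illustration; they are not part of the localflavor API.

```python
import re


def cpf_check_digit(digits):
    """Mod-11 check digit for the 9 or 10 leading digits of a CPF."""
    # Weights run from len(digits) + 1 down to 2.
    weights = range(len(digits) + 1, 1, -1)
    total = sum(int(d) * w for d, w in zip(digits, weights))
    remainder = (total * 10) % 11
    return 0 if remainder == 10 else remainder


def is_valid_cpf(value, reject_repeated=True):
    """Hypothetical validator; reject_repeated is an illustrative flag."""
    digits = re.sub(r'\D', '', value)
    if len(digits) != 11:
        return False
    if reject_repeated and len(set(digits)) == 1:
        # One digit repeated eleven times: arithmetically valid, not real.
        return False
    first = cpf_check_digit(digits[:9])
    second = cpf_check_digit(digits[:10])
    return digits[9:] == '%d%d' % (first, second)
```

With reject_repeated=False, '111.111.111-11' is accepted because both of its check digits work out to 1; the extra set() test is all that is needed to refuse it.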

1) I've thought about having only the return_format parameter, defaulting to None. The problem I see with this approach is that there is no default format, which forces the user to think about the desired format, and unaware users may pass a wrong format string, leading to exceptions. Maybe better error handling could solve this, as these exceptions could happen anyway. Any ideas?

I agree, but this usability issue can be addressed by providing a couple of ready to use formatting strings as constants inside the field class definition, called DIGITS_ONLY and DOTS_AND_SLASH or something similar.
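
One way this could look, as a rough sketch rather than the actual patch: CPFFormatter is a hypothetical stand-in for the form field, return_format defaults to None to keep the current behaviour, and the constant names are assumptions (DOTS_AND_DASH is used here because the CPF layout has no slash; the DOTS_AND_SLASH name mentioned above would fit the CNPJ layout better). A malformed format string is turned into a clearer error.

```python
import re


class CPFFormatter:
    """Hypothetical stand-in for the form field, not the Django class."""

    # Ready-to-use format strings (names are illustrative assumptions).
    DIGITS_ONLY = '%s%s%s%s'
    DOTS_AND_DASH = '%s.%s.%s-%s'

    def __init__(self, return_format=None):
        # None keeps the current behaviour: no reformatting at all.
        self.return_format = return_format

    def clean(self, value):
        if self.return_format is None:
            return value
        digits = re.sub(r'\D', '', value)
        groups = (digits[:3], digits[3:6], digits[6:9], digits[9:])
        try:
            return self.return_format % groups
        except TypeError:
            # Raised when the placeholder count does not match the groups.
            raise ValueError('return_format needs exactly four placeholders')
```

A caller would then write CPFFormatter(return_format=CPFFormatter.DIGITS_ONLY) instead of composing a format string by hand.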

In this case "them all" amounts to exactly 8 CPFs (in addition to those two that you already exclude). I just checked and none of these 8 CPFs belong to real persons: 3 were cancelled (probably because the poor citizens could not use them) and the other 5 do not exist in the CPF database. But they are often abused by people who don't want to provide a real CPF, which partially defeats the purpose of validating the field. So I think the benefit of rejecting those 8 CPFs far outweighs the remote possibility that a real person will ever be assigned one of those numbers again.

I agree with all your points. Attaching an updated patch!
Thanks for the advice!