Correctly parsing numbers
Problem
People from different cultures don't write numbers the same way. Another problem? People from a culture sometimes write numbers the way another culture does. There are two things causing these problems: the decimal separator and the digit group separator (commonly named the thousands separator because most cultures split the digits by groups of three).
I will not explore the cultures that split the digits in groups of 2 or 4 today (and probably never will). I will look at the ways to correctly parse numbers when we don't know how people write their numbers.
Separators
There are 3 separators that are commonly used. However, one of those is used both as a thousands separator and a decimal separator depending on the culture.
The point (.) is the most common decimal separator and a lot of people use it even if it's not in their culture.
The comma (,) is the second most common decimal separator. However, it also serves as the thousands separator in English-speaking cultures.
The third separator is the space ( ), which is used as the thousands separator in cultures where the comma is the decimal separator.
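To see which separators .NET associates with a given culture, the NumberFormatInfo of that culture can be inspected. The following is a minimal sketch: the class and method names (SeparatorDemo, GetSeparators) are made up for illustration, the three culture names are arbitrary examples, and it assumes the runtime has culture data available (it is not running in globalization-invariant mode).

```csharp
using System;
using System.Globalization;

public static class SeparatorDemo
{
    // Hypothetical helper: returns the decimal and group separators of a culture.
    public static (string DecimalSeparator, string GroupSeparator) GetSeparators(string cultureName)
    {
        NumberFormatInfo format = CultureInfo.GetCultureInfo(cultureName).NumberFormat;
        return (format.NumberDecimalSeparator, format.NumberGroupSeparator);
    }

    public static void Main()
    {
        foreach (string name in new[] { "en-US", "fr-FR", "de-DE" })
        {
            var separators = GetSeparators(name);
            // Some group separators are (non-breaking) spaces, so the code point
            // of the first character is printed instead of the character itself.
            Console.WriteLine(
                $"{name}: decimal='{separators.DecimalSeparator}' " +
                $"group=U+{(int)separators.GroupSeparator[0]:X4}");
        }
    }
}
```

The exact space character reported for a culture such as fr-FR can differ between runtime versions, which is one more reason to read it from the culture rather than hard-code it.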
Sample issues
Here are some examples of possible number representations from different cultures and the incorrect parsing situations that can occur.
1000 - 1,000 - 1 000 : The middle one could be parsed as 1 if the culture specifies the comma as the decimal separator.
100.100 - 100,100 : The second one could be parsed as either one hundred thousand one hundred or 100.1.
100,000.000 - 100 000,000 : Both of those should be a hundred thousand, but depending on the culture the second one could be a hundred million.
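The ambiguity in these examples can be reproduced by parsing the same text with two different cultures. This is only a sketch: AmbiguityDemo and ParseAs are illustrative names, en-US and fr-FR stand in for "a comma-as-thousands culture" and "a comma-as-decimal culture", and culture data is assumed to be available at runtime.

```csharp
using System;
using System.Globalization;

public static class AmbiguityDemo
{
    // Hypothetical helper: parses text with the number format of the named culture.
    public static decimal ParseAs(string text, string cultureName)
    {
        return decimal.Parse(text, NumberStyles.Number, CultureInfo.GetCultureInfo(cultureName));
    }

    public static void Main()
    {
        foreach (string text in new[] { "1,000", "100,100" })
        {
            // en-US reads the comma as a group separator,
            // fr-FR reads it as the decimal separator.
            decimal asEnglish = ParseAs(text, "en-US");
            decimal asFrench = ParseAs(text, "fr-FR");
            Console.WriteLine(
                $"{text}: en-US={asEnglish.ToString(CultureInfo.InvariantCulture)} " +
                $"fr-FR={asFrench.ToString(CultureInfo.InvariantCulture)}");
        }
    }
}
```

The same string yields one thousand under one culture and one under the other, which is the factor-of-a-thousand error described above.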
Solution
I'll present my solution in C#, but the approach works in any language. I use C# because it provides good support for cultures (and that's what I program with).
The idea is to first parse using the user's culture, then fall back to the invariant culture (which uses the point as the decimal separator and the comma as the thousands separator).
    using System.Globalization;

    decimal number;

    // Try parsing using the user's culture
    if (decimal.TryParse(NumberInput.Text, NumberStyles.Number, NumberFormatInfo.CurrentInfo, out number))
    {
        // A number
    }
    // Otherwise, parse using the invariant culture
    else if (decimal.TryParse(NumberInput.Text, NumberStyles.Number, NumberFormatInfo.InvariantInfo, out number))
    {
        // A number
    }
    else
    {
        // Not a number
    }
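With this solution, people from cultures that use the comma as the decimal separator can write with either the point or the comma. If they use the comma, the first parse succeeds. If they use the point, the first parse fails because the point is not recognized, and the second parse succeeds. This assumes the user's culture does not use the point as its thousands separator (French, for example, uses a space); where it does, as in German, the first parse may accept the point as a group separator instead.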
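The two-step parse can also be wrapped in a reusable method that takes the culture as a parameter, which makes it easy to try without a UI control. This is a sketch rather than a definitive implementation: FlexibleNumberParser and TryParseFlexible are hypothetical names, and fr-FR is used only as an example of a comma-as-decimal culture whose data is assumed to be available.

```csharp
using System;
using System.Globalization;

public static class FlexibleNumberParser
{
    // Hypothetical helper: tries the given culture first, then the invariant culture.
    public static bool TryParseFlexible(string input, CultureInfo userCulture, out decimal number)
    {
        // First attempt: the conventions of the user's culture.
        if (decimal.TryParse(input, NumberStyles.Number, userCulture, out number))
        {
            return true;
        }

        // Fallback: the invariant culture, where the point is the decimal separator.
        return decimal.TryParse(input, NumberStyles.Number, CultureInfo.InvariantCulture, out number);
    }

    public static void Main()
    {
        CultureInfo french = CultureInfo.GetCultureInfo("fr-FR");
        foreach (string input in new[] { "12,5", "12.5", "abc" })
        {
            bool ok = TryParseFlexible(input, french, out decimal value);
            Console.WriteLine(ok
                ? $"{input} -> {value.ToString(CultureInfo.InvariantCulture)}"
                : $"{input} -> not a number");
        }
    }
}
```

Passing the culture explicitly instead of reading the current one keeps the method independent of the machine's regional settings, so the same inputs can be checked against several cultures.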
Conclusion
This may seem like a small issue, but if you work with money in an application used by people from different cultures, correctly parsing their input is critical. Numbers can be a thousand times too big or too small if you don't parse correctly.