I have written this function to auto-correct gender values to M or F by matching them against keywords in string arrays. It works fine, but my manager told me to use a Dictionary, which he said is more efficient. I have no idea how to do that. Can anyone help me understand how this can be done? Thanks.

    Public Function AutoGender(ByVal dt As DataTable) As DataTable
        Dim Gender As String = ""
        Dim Mkeywords() As String = {"boy", "boys", "male", "man", "m", "men", "guy"}
        Dim Fkeywords() As String = {"girl", "girls", "female", "woman", "f", "women", "chick"}
        Dim row As DataRow
        For Each row In dt.Rows
            If Mkeywords.Contains(row("Gender").ToString.ToLower) Then
                Gender = "M"
                row("Gender") = Gender
            ElseIf Fkeywords.Contains(row("Gender").ToString.ToLower) Then
                Gender = "F"
                row("Gender") = Gender
            End If
        Next
        Return dt
    End Function

Here is an example of how you could use a Dictionary(Of String, String) to look up whether a synonym is known:

' Maps every known synonym (lower case) to its normalized code
Shared GenderSynonyms As Dictionary(Of String, String) = New Dictionary(Of String, String) From
    {{"boy", "M"}, {"boys", "M"}, {"male", "M"}, {"man", "M"}, {"m", "M"}, {"men", "M"}, {"guy", "M"},
     {"girl", "F"}, {"girls", "F"}, {"female", "F"}, {"woman", "F"}, {"f", "F"}, {"women", "F"}, {"chick", "F"}}
Public Function AutoGender(ByVal dt As DataTable) As DataTable
    If dt.Columns.Contains("Gender") Then
        For Each row As DataRow In dt.Rows
            ' Field (from System.Data.DataRowExtensions) returns Nothing for DBNull,
            ' so check for Nothing before calling ToLower
            Dim oldGender = row.Field(Of String)("Gender")
            Dim newGender As String = String.Empty
            If oldGender IsNot Nothing AndAlso GenderSynonyms.TryGetValue(oldGender.ToLower, newGender) Then
                row.SetField("Gender", newGender)
            End If
        Next
    End If
    Return dt
End Function

Note that I've used a collection initializer to fill the Dictionary, which is a convenient way to initialize collections from literals. You could also use the Add method.
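For comparison, filling the same lookup with the Add method might look like the sketch below. The module and the helper name BuildGenderSynonyms are hypothetical, and passing StringComparer.OrdinalIgnoreCase to the constructor is an extra assumption that is not part of the code above; it lets "Male" match "male" without calling ToLower first.

```vbnet
Imports System
Imports System.Collections.Generic

Module GenderLookupSketch
    ' Hypothetical helper: builds the lookup with Add instead of a collection initializer.
    Function BuildGenderSynonyms() As Dictionary(Of String, String)
        ' Assumption: a case-insensitive comparer, so callers need not lower-case the input.
        Dim synonyms As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        For Each word In {"boy", "boys", "male", "man", "m", "men", "guy"}
            synonyms.Add(word, "M")
        Next
        For Each word In {"girl", "girls", "female", "woman", "f", "women", "chick"}
            synonyms.Add(word, "F")
        Next
        Return synonyms
    End Function

    Sub Main()
        Dim synonyms = BuildGenderSynonyms()
        Dim code As String = Nothing
        If synonyms.TryGetValue("Male", code) Then
            Console.WriteLine(code) ' prints M
        End If
    End Sub
End Module
```

Add throws an ArgumentException if the key is already present, so every synonym may appear only once across both lists.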

Edit: Another approach that might be more concise is to use two HashSet(Of String) instances, one for the male synonyms and one for the female:

Shared maleSynonyms As New HashSet(Of String) From
    {"boy", "boys", "male", "man", "m", "men", "guy"}
Shared femaleSynonyms As New HashSet(Of String) From
    {"girl", "girls", "female", "woman", "f", "women", "chick"}
Public Function AutoGender(ByVal dt As DataTable) As DataTable
    If dt.Columns.Contains("Gender") Then
        For Each row As DataRow In dt.Rows
            ' Field returns Nothing for DBNull, so skip such rows instead of calling ToLower on Nothing
            Dim oldGender = row.Field(Of String)("Gender")
            If oldGender Is Nothing Then Continue For
            oldGender = oldGender.ToLower
            If maleSynonyms.Contains(oldGender) Then
                row.SetField("Gender", "M")
            ElseIf femaleSynonyms.Contains(oldGender) Then
                row.SetField("Gender", "F")
            End If
        Next
    End If
    Return dt
End Function

The items in a HashSet must also be unique, so it cannot contain duplicate strings (just like the keys of a Dictionary), but it stores only the items themselves rather than key-value pairs.
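The uniqueness rule shows up differently in the two collections. In this small sketch (the module name and the helper TryAddKey are made up for illustration), HashSet.Add simply returns False for a duplicate, while Dictionary.Add throws an ArgumentException:

```vbnet
Imports System
Imports System.Collections.Generic

Module UniquenessSketch
    ' Hypothetical helper: reports whether the key could be added to the dictionary.
    Function TryAddKey(lookup As Dictionary(Of String, String), key As String, value As String) As Boolean
        Try
            lookup.Add(key, value)
            Return True
        Catch ex As ArgumentException
            ' Dictionary.Add rejects a key that already exists
            Return False
        End Try
    End Function

    Sub Main()
        Dim words As New HashSet(Of String) From {"boy", "man"}
        Console.WriteLine(words.Add("boy")) ' False: already in the set
        Console.WriteLine(words.Count)      ' 2: the set is unchanged

        Dim lookup As New Dictionary(Of String, String) From {{"boy", "M"}}
        Console.WriteLine(TryAddKey(lookup, "boy", "M")) ' False: duplicate key
        Console.WriteLine(TryAddKey(lookup, "guy", "M")) ' True: new key
    End Sub
End Module
```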

Simply change both of your arrays to dictionaries, and do a ContainsKey instead of Contains.

Dim Mkeywords = New Dictionary(Of String, String) From
    {{"boy", ""}, {"boys", ""}, {"male", ""}, {"man", ""}, {"m", ""}, {"men", ""}, {"guy", ""}}

(and follow suit for the female)

However, as you might've noticed, I put in all those empty strings. This is because dictionaries have values as well as keys, and since we're not using the values, I made them empty strings. To get the same O(1) lookup without all the extraneous values, you can use a HashSet in a similar manner.

All you have to change now is, as I said, to use ContainsKey (or, if you go the HashSet route, it's still just Contains):

If Mkeywords.ContainsKey(row("Gender").ToString.ToLower) Then

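One final note: this will only be "more efficient" if the keyword lists grow considerably. Contains on an array is a linear scan over the keywords, while a dictionary lookup is O(1) but has to hash the string first. Right now, with only those few elements, it may even be slower to use a dictionary.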
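If you want to check this on your own machine, a rough timing sketch along these lines could help. The module, the helper TimeLookups, and the iteration count are all arbitrary choices for illustration, and a Stopwatch loop like this is only a crude measurement, so treat the numbers as a hint rather than a benchmark.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.Linq

Module LookupTimingSketch
    ' Hypothetical helper: runs the same lookup repeatedly, prints the elapsed time,
    ' and returns how many lookups found the keyword.
    Function TimeLookups(lookup As Func(Of String, Boolean), iterations As Integer, label As String) As Integer
        Dim hits As Integer = 0
        Dim sw = Stopwatch.StartNew()
        For i = 1 To iterations
            If lookup("guy") Then hits += 1
        Next
        sw.Stop()
        Console.WriteLine("{0}: {1} ms", label, sw.ElapsedMilliseconds)
        Return hits
    End Function

    Sub Main()
        Dim keywords() As String = {"boy", "boys", "male", "man", "m", "men", "guy"}
        Dim keywordSet As New HashSet(Of String)(keywords)

        ' "guy" is the last array element, which is the worst case for the linear scan
        TimeLookups(Function(s) keywords.Contains(s), 1000000, "Array")
        TimeLookups(Function(s) keywordSet.Contains(s), 1000000, "HashSet")
    End Sub
End Module
```

Both measurements include the same delegate call overhead, so only the difference between the two numbers is meaningful.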
