I need to write a program that goes through strings of various lengths and selects only those written using symbols from a set defined by me (in particular, Japanese characters). The strings will contain words written in different languages (German, French, Arabic, Russian, English, etc.), so there is a huge number of possible characters. I do not know which data structure to use for this. I am using Delphi 7 right now. Can anybody suggest how to write such a program?

You would be better off with Delphi 2010, since the VCL in Delphi 7 is not Unicode-aware. In Delphi 7 you can still use the WideString and WideChar types, and you can install a component set such as the TNT Unicode Components to help you build a user interface that can display your results.

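For a very large set type, consider using a bit array such as TBits. A bit array of length 65536 has one bit for every UTF-16 code unit, which covers every code point in the Basic Multilingual Plane (characters outside it are stored as surrogate pairs). Checking whether any character of a string is in set Y would basically be: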

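// Requires Classes (TBits), Forms (Application) and Windows (MB_OK) in the uses clause.

// Returns True if at least one character of wcstr is in wcset.
// wcset.Size must be 65536 so that every WideChar ordinal is a valid index.
function WideCharsInSet(const wcstr: WideString; wcset: TBits): Boolean;
var
  n: Integer;
begin
  Result := False;
  for n := 1 to Length(wcstr) do
    if wcset[Ord(wcstr[n])] then
    begin
      Result := True;
      Break;
    end;
end;

procedure Demo;
var
  wcset1: TBits;
  s: WideString;
begin
  wcset1 := TBits.Create;
  try
    // Reading a bit at an index >= Size raises an error, so size the set up front.
    wcset1.Size := 65536;
    // U+1157 - Hangul (Korean) code point I found with Char Map (the value is hexadecimal)
    wcset1[$1157] := True;
    // go get a string value s:
    s := WideChar($1157);
    // True if at least one element of wcset1 is found in string s:
    if WideCharsInSet(s, wcset1) then
      Application.MessageBox('Found it', 'found it', MB_OK);
  finally
    wcset1.Free;
  end;
end;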
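If the goal is to keep only strings made up entirely of characters from the set, rather than strings that contain at least one of them, the same TBits idea can be turned around. The following is a minimal sketch, not a definitive implementation. The names `AddRange` and `AllCharsInSet` are hypothetical helpers introduced here, and the choice of Unicode blocks (Hiragana, Katakana, CJK Unified Ideographs) is an assumption about what "Japanese letters" should cover; Japanese punctuation and half-width forms, for instance, are left out.

```pascal
program JapaneseFilter;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: marks every code unit in First..Last as allowed.
procedure AddRange(Allowed: TBits; First, Last: Integer);
var
  I: Integer;
begin
  for I := First to Last do
    Allowed[I] := True;
end;

// Hypothetical helper: True only if S is non-empty and every UTF-16
// code unit of S is marked in Allowed (Allowed.Size must be 65536).
function AllCharsInSet(const S: WideString; Allowed: TBits): Boolean;
var
  I: Integer;
begin
  Result := Length(S) > 0;
  for I := 1 to Length(S) do
    if not Allowed[Ord(S[I])] then
    begin
      Result := False;
      Exit;
    end;
end;

var
  Japanese: TBits;
  Kana, Mixed: WideString;
begin
  Japanese := TBits.Create;
  try
    Japanese.Size := 65536;            // one bit per UTF-16 code unit
    AddRange(Japanese, $3040, $309F);  // Hiragana block (assumed relevant)
    AddRange(Japanese, $30A0, $30FF);  // Katakana block (assumed relevant)
    AddRange(Japanese, $4E00, $9FFF);  // CJK Unified Ideographs block (assumed relevant)

    Kana := WideChar($3042);           // hiragana "a"
    Kana := Kana + WideChar($30AB);    // plus katakana "ka"
    Mixed := Kana + 'A';               // same string with a Latin letter appended

    WriteLn(AllCharsInSet(Kana, Japanese));   // kana only, so the check passes
    WriteLn(AllCharsInSet(Mixed, Japanese));  // Latin 'A' is not in the set, so it fails
  finally
    Japanese.Free;
  end;
end.
```

Filling the set by block range keeps the membership test a single bit lookup per character, whatever mix of languages the input contains.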
+1 all the good bits in the answer. bigsets, TNT and recommend not doing this in D7 at all. – Marco van de Voort Feb 17, 2010 at 14:25

One great feature of the Delphi 2010 TStringList class is the ability to load a file from disk, automatically determine UTF8 or UTF16 encoding from the byte-markers, and so on. That is another part of your task, Tofig, that will be made more tricky on versions of Delphi older than 2009/2010. – Warren P Feb 17, 2010 at 14:31

Delphi 2010 really makes very little difference here. The poster is looking to process strings at a very simple level. Invoking the help of an entire Unicode enabled framework simply to gain access to a handful of functions and a couple of classes that encapsulate the needed Unicode Windows API functions is overkill. I suspect that all the poster really needs is the Unicode support unit(s) provided by the JEDI JCL. – Deltics Feb 17, 2010 at 19:30

@Warren: TWideStringList in the JEDI JCL also loads Unicode from disk with proper respect for encoding. JclUnicode is free and works with pretty much all versions of Delphi. Unicode TStringList requires a purchase of new software. Poster: I need to get to the shops and my car is out of petrol. My answer: Get some more petrol. Your answer is presumably: "Buy a new car". :) If being "practical" is trolling then colour me green and call me Shrek (yeah, I know: he was an ogre, not a troll). Note: This accepted answer does not need 2010. The opposite of troll might be "Fanboy"? ;) – Deltics Feb 17, 2010 at 23:29

For the simple processing of strings in the manner you describe, do not be put off by suggestions that you should upgrade to the latest compiler and Unicode enabled framework. The Unicode support itself is provided by the underlying Windows API, which is directly accessible from "non-Unicode" versions of Delphi just as much as from "Unicode" versions.
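As a rough illustration of calling the Windows API directly from a pre-Unicode Delphi, here is a minimal sketch that converts UTF-8 bytes into a WideString with the Win32 function MultiByteToWideChar. The function name `Utf8ToWide` and the constant `Utf8CodePage` are hypothetical names chosen for this example, and the program is Windows-only.

```pascal
program Utf8Demo;

{$APPTYPE CONSOLE}

uses
  Windows;

const
  // Hypothetical local name; 65001 is the value of the Win32 CP_UTF8 code page.
  Utf8CodePage = 65001;

// Hypothetical helper: converts UTF-8 bytes to a UTF-16 WideString.
function Utf8ToWide(const S: AnsiString): WideString;
var
  Len: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // First call: ask how many UTF-16 code units the result needs.
  Len := MultiByteToWideChar(Utf8CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if Len <= 0 then
    Exit;
  SetLength(Result, Len);
  // Second call: convert into the buffer that was just allocated.
  MultiByteToWideChar(Utf8CodePage, 0, PAnsiChar(S), Length(S),
    PWideChar(Result), Len);
end;

var
  W: WideString;
begin
  // #$E3#$81#$82 is the UTF-8 encoding of U+3042 (hiragana "a").
  W := Utf8ToWide(#$E3#$81#$82);
  WriteLn(Length(W));                             // number of UTF-16 code units
  WriteLn((W <> '') and (Ord(W[1]) = $3042));     // whether U+3042 was decoded
end.
```

The resulting WideString can then be checked character by character against whatever set of allowed code units you have defined.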

I suspect that most, if not all, of the Unicode support you need for the purposes outlined in your question is available in the JEDI JCL.

For any visual component support you may require, the TNT control set has the appeal of being free.

+1, excellent argument. The code in the accepted answer compiles and works flawlessly in Delphi 4 even. – mghie Feb 17, 2010 at 19:53

I prefer to think of it as "getting the job done with the minimum of fuss, bother and expense" people and "change for change's sake without thinking about what is actually needed" people. :) – Deltics Feb 17, 2010 at 23:25

IMHO it is "I can do everything I want with the things I have" or "I can do that and more easier and help keeping my favorite development environment alive". – Uwe Raabe Feb 18, 2010 at 12:51
