Emerald Editor Discussion
Author Topic: National character support  (Read 7018 times)
KjeBja
Senior Miner
***
Posts: 76


« on: July 27, 2006, 06:48:45 am »

I have been wondering why CE has a peculiarity in its Find in Files method; namely, that it assumes that normal source files are binary and skips searching through them. However, I have never bothered to find out what makes this happen. Instead, I use a different tool when I need to do a search (at least if CE fails).

The other day I stumbled across this bug report in the CE forum: http://www.crimsoneditor.com/english/board/CrazyWWWBoard.cgi?db=forum2&mode=read&num=3422&page=10&ftype=6&fval=language&backdepth=1, which gives a simple explanation. Being Norwegian, I often work on programs with national characters in the comments, and this is of course one reason why the search fails. I am sure there are lots of other users with the same problem, so PLEASE put support for national characters/extended ASCII high on the priority list.
Logged
daemon
Developers
Gem Cutter
***
Posts: 107


« Reply #1 on: July 27, 2006, 09:33:30 am »

I do think that this is very important, but I do not have a whole lot of experience with Unicode and various character sets in languages other than PHP, Cocoa (where it is handled by Mac OS X), and Java, none of which we are using for EE. But, as I said, I do think this is important, and it probably will be done (I think it may even be on the roadmap).
Logged
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #2 on: July 27, 2006, 12:11:09 pm »

The only trouble with extended (and multibyte) support is that it is going to be much slower than a regular search.

Partly this is because it has to try more combinations (especially for multibyte issues), and if you're searching hundreds of files, EE would have to do this for each file.

There are ways this can be overcome, each of which we can debate:

1. Skip files by extension where the extension is a known binary type (e.g. .GIF, .PNG, .JPG, .ZIP).

2. Examine the first 1K of the file. If it isn't Unicode (based on the first couple of characters, with my limited knowledge), look for characters where the ASCII value is below 32, which would remove the file from consideration. If it is Unicode, run the search as you would normally against Unicode strings (e.g. using PCRE-style regexps with the appropriate character mappings applied, where applicable).

3. Maintain the current behaviour. (Least desirable, though fastest once the file is examined.)

Whichever route we choose (and I'm looking to the community first before making a decision here), the default behaviour could perhaps be overridden, but with a note to the user that overriding it and always running the full search will definitely be slower.

Thoughts anyone?
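For discussion's sake, heuristics 1 and 2 above could be combined roughly like this. This is only an illustrative Python sketch, not EE code; the function name, the extension list, and the set of allowed control bytes are all assumptions.

```python
import codecs

# Heuristic 1: extensions known to be binary types.
BINARY_EXTENSIONS = {".gif", ".png", ".jpg", ".zip", ".exe", ".dll"}
# Heuristic 2a: Unicode byte-order marks at the start of the file.
BOMS = (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
# Heuristic 2b: control bytes below 32 suggest binary data, except
# the ones that legitimately appear in text (tab, LF, FF, CR, ^Z).
ALLOWED_CONTROLS = {9, 10, 12, 13, 26}

def looks_searchable(filename, first_kb):
    """Return True if the file should be searched as text."""
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in BINARY_EXTENSIONS:
        return False                # heuristic 1: known binary type
    if any(first_kb.startswith(bom) for bom in BOMS):
        return True                 # heuristic 2a: Unicode BOM found
    # heuristic 2b: any other byte below 32 marks the file as binary
    return all(b >= 32 or b in ALLOWED_CONTROLS for b in first_kb)
```

The extension check is cheap and runs first; the 1K sniff only happens for files that pass it, which is roughly the speed argument for combining the two options.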
Logged

"Cleverly disguised as a responsible adult!"
KjeBja
Senior Miner
***
Posts: 76


« Reply #3 on: July 28, 2006, 06:52:35 am »

I forgot to mention one “small” point, which might be of help here. When I do a search, I normally enter a specific extension in the file type box. The file type is known to CE, as an extension.xxx file has been declared for it. So in my case, skipping known binary files should do the trick.
Logged
shammat
Prospector
*
Posts: 5


« Reply #4 on: September 19, 2006, 09:33:13 am »

I think this is an absolute must-have, not necessarily for "find in files" but to support editing and displaying Unicode/UTF-8 files.
Logged
rageboy
Jeweller
*****
Posts: 305

Ankit Singla


« Reply #5 on: March 27, 2007, 02:23:05 pm »

In response to reply #2: I think ideas 1 and 2 could be implemented together, which might make the search a bit faster than either alone, and I definitely agree with making the behaviour an option.
Logged
drazhar
Prospector
*
Posts: 1


« Reply #6 on: August 15, 2007, 02:59:06 pm »

Polish (my language) is encoded for the web with the ISO-8859-2 standard. However, Microsoft uses its own encoding (Windows-1250), which causes a lot of problems and a lack of compatibility, and from what I can see Crimson Editor / Emerald Editor uses the encoding given by the OS.

I'd like to see the ability to choose the output encoding of the file, so that when I write Polish special characters in Crimson/Emerald Editor they are encoded in ISO-8859-2 and I can upload the files directly to my web server.

(Right now I have to use an online converter, which is dramatically annoying.)
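To illustrate the mismatch being described (a hedged Python sketch, not Emerald Editor code), the same Polish text produces different bytes under the two encodings, which is why a save-time encoding choice matters:

```python
# The web server expects ISO-8859-2; the OS default is Windows-1250.
text = "zażółć gęślą jaźń"  # pangram-style fragment with Polish letters

latin2_bytes = text.encode("iso-8859-2")    # what the server wants
cp1250_bytes = text.encode("windows-1250")  # what the OS hands out

# Several letters (e.g. "ą", "ś") occupy different byte values, so a
# file saved with the OS codepage is garbled when served as ISO-8859-2.
assert latin2_bytes != cp1250_bytes
assert latin2_bytes.decode("iso-8859-2") == text
```

An editor that lets the user pick the output codec at save time removes the need for the external converter entirely.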

Thank you for your attention :)
Logged
Pvt_Ryan
Master Jeweller
******
Posts: 422



« Reply #7 on: August 15, 2007, 03:19:25 pm »

Forgive me for being stupid here, but would using UTF-8 work?
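For what it's worth, a small sketch of the trade-off (an assumed illustration, not from the thread): UTF-8 can hold any Polish character without picking a codepage, but its byte stream differs from single-byte ISO-8859-2 output, so a server expecting the latter would still see garbage without an explicit encoding choice:

```python
text = "żółw"  # "turtle": four characters, three of them non-ASCII

utf8_bytes = text.encode("utf-8")         # ż, ó, ł take two bytes each
latin2_bytes = text.encode("iso-8859-2")  # one byte per character

assert len(latin2_bytes) == 4
assert len(utf8_bytes) == 7
```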

Logged