Emerald Editor Discussion
Topic: Recognition of formats
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« on: January 12, 2007, 11:25:21 pm »

OK, so we're making headway on the whole "syntax format" debate.

Here's another interesting question: how should EE determine what type of highlighting to use? CE bases its initial decision on the file's extension, matched against the details it has in its repository of syntaxes, then switches to that highlighting.

Under a Windows environment, this is totally normal and acceptable behaviour, since file type there is very much dictated by extension. Other operating systems rely less heavily on extensions and more on file contents.

Do we want to consider leaving file-type detection at extensions, or try to add in something to help it determine file types (e.g. rulesets of some kind)? If we do have it as a form of central repository of rules, we could make it downloadable from within EE itself if we want to.
Logged

"Cleverly disguised as a responsible adult!"
John Yeung
Senior Miner
***
Posts: 85


« Reply #1 on: January 13, 2007, 02:00:59 am »

I think it would be silly not to at least check the beginning of the file to see if it tries to announce what it is.  I think filename extension should only be used if there is no such internal identification.  Yes, that means we would have to inspect the contents of the file, but isn't that a normal thing to do anyway, to detect Unicode and (perhaps) prevent opening of binary files?
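A minimal sketch of the kind of content check described here, assuming Python and hypothetical helper names (`sniff_encoding`, `looks_binary`). The "NUL byte means binary" rule is a common heuristic and an assumption on my part, not something Crimson or Emerald is known to do:

```python
import codecs

# Byte-order marks, longest first, so UTF-32 LE is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_encoding(head: bytes):
    """Return an encoding name if the data starts with a known BOM, else None."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None

def looks_binary(head: bytes) -> bool:
    """Heuristic (assumption): a NUL byte in BOM-less data suggests a binary file."""
    return sniff_encoding(head) is None and b"\x00" in head
```

Only the first few hundred bytes would need to be passed in, so the check stays cheap even for large files.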

As for whether these rules should be collected in one configuration file or dispersed among the appropriate language-specific configuration files, I don't know.

John
Logged
Feldon
Gem Cutter
****
Posts: 106


« Reply #2 on: January 13, 2007, 02:26:52 am »

Since CE only ever had to deal with one syntax per file, looking at the file extension made more sense.  If you intend to allow nested syntax highlighting (for example, php/javascript/java/whatever within html) then that demands that you look at the content.
Logged
John Yeung
Senior Miner
***
Posts: 85


« Reply #3 on: January 13, 2007, 07:07:42 am »

Quote from: Feldon
Since CE only ever had to deal with one syntax per file, looking at the file extension made more sense.  If you intend to allow nested syntax highlighting (for example, php/javascript/java/whatever within html) then that demands that you look at the content.

I was just referring to the first line, most common in scripts on Unix and Unix-like systems, that specifies which program should process the rest of the file.  Crimson apparently does check this (I have never tried it)--notice the firstline.* files in the link directory.

I wasn't suggesting that Emerald should take a look at much more than the first line or so.  If it can't figure it out at that point, it should give up and use other means, such as filename, to guess the syntax.  If it's still unclear, it should use some default, just as if a new, empty file had been created.
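The order described above (first line, then filename, then a default) could be sketched roughly like this. It is Python for illustration only; the lookup tables and the function names `syntax_from_first_line` and `guess_syntax` are hypothetical, and a real editor would presumably fill the tables from its syntax definitions:

```python
import os
import re

# Hypothetical lookup tables, invented for this sketch.
INTERPRETERS = {"sh": "shell", "bash": "shell", "perl": "perl", "python": "python"}
EXTENSIONS = {".py": "python", ".pl": "perl", ".sh": "shell", ".html": "html", ".xml": "xml"}
DEFAULT_SYNTAX = "plain"

def syntax_from_first_line(line: str):
    """Recognise a '#!' interpreter line or an XML declaration; return None otherwise."""
    line = line.strip()
    if line.startswith("<?xml"):
        return "xml"
    if line.startswith("#!"):
        parts = line[2:].split()
        if not parts:
            return None
        name = os.path.basename(parts[0])
        # '#!/usr/bin/env python' names the real interpreter as the next word.
        if name == "env" and len(parts) > 1:
            name = parts[1]
        # Drop a trailing version such as 'python3' or 'perl5.8'.
        name = re.sub(r"[\d.]+$", "", name)
        return INTERPRETERS.get(name)
    return None

def guess_syntax(filename: str, first_line: str) -> str:
    """First line wins, then the filename extension, then the default."""
    found = syntax_from_first_line(first_line)
    if found:
        return found
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSIONS.get(ext, DEFAULT_SYNTAX)
```

For example, `guess_syntax("script", "#!/usr/bin/env python3")` picks the Python syntax even though the file has no extension, while an empty file named `README` falls through to the default.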

John
Logged
Szandor
Senior Miner
***
Posts: 92



« Reply #4 on: January 13, 2007, 12:57:01 pm »

XML is a good example of a syntax that identifies itself at the beginning. I see a problem if the whole file is considered, apart from the obvious one about the time it would take: in some cases the same code might want different syntax. There are a good number of different implementations of SGML, for instance (HTML and XML are two examples), and in order to differentiate between them a file ending is used.

As for finding different code in the document, I think it would be more of a hassle to automatically find it than just defining in the SDD for HTML that JavaScript may occur.

Looking through the first line first and then checking the file-ending is not a bad idea though. I would also like to see a more advanced syntax chooser where I can define the base syntax of the file and then what other types may show up and, if so, where they can show up.

Let's say I'm writing a simple program that will display "Hello world!" in as many languages as possible and then wrap it all up in a nice XML document. In order to properly highlight everything I would tell EE that the base syntax is XML and that within "<code type="foo">" and "</code>" the highlighting should be that of "foo". This way I could use several syntax files for the same document, even though it is not defined in the SDD.
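As a rough illustration of that idea, here is a small Python sketch that splits a document into spans, each tagged with the syntax that should highlight it. The function name `split_regions` and the way the rule is hard-coded as one regular expression are assumptions made for the example; in the scheme described above the user would supply the delimiters:

```python
import re

# Matches <code type="foo"> ... </code>, following the example in the post.
CODE_BLOCK = re.compile(r'<code type="([^"]+)">(.*?)</code>', re.DOTALL)

def split_regions(text: str, base: str = "xml"):
    """Return (syntax, start, end) spans that cover the whole text in order."""
    regions = []
    pos = 0
    for m in CODE_BLOCK.finditer(text):
        body_start, body_end = m.span(2)
        if body_start > pos:
            # Base-syntax text up to and including the opening tag.
            regions.append((base, pos, body_start))
        if body_end > body_start:
            # The body is highlighted with the syntax named in the type attribute.
            regions.append((m.group(1), body_start, body_end))
        pos = body_end
    if pos < len(text):
        # Closing tag and everything after it go back to the base syntax.
        regions.append((base, pos, len(text)))
    return regions
```

The tags themselves stay in the base (XML) spans, so only the text between them is handed to the "foo" highlighter.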
Logged

"Cleverly disguised as an original signature..."
KjeBja
Senior Miner
***
Posts: 76


« Reply #5 on: January 15, 2007, 06:59:12 am »

If automatic code recognition is to work, I definitely think that EE would have to look at more than the first line. A lot of languages permit comment lines at the beginning of the file, so the editor has to allow for that as well. In some cases it might be enough to look at the first line even if it is a comment line, at least if there is only one syntax that has this particular identification for the comment lines. In other cases one might have to look further, but the search could then be limited to the languages where comments start with the given code. As for checking the file extension, could this be an option?
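A small sketch of narrowing the candidates by comment leader, in Python. The `COMMENT_LEADERS` table and the name `candidates_from_comment` are made up for the example; the real data would have to come from the syntax files:

```python
# Hypothetical table of line-comment leaders, invented for this sketch.
COMMENT_LEADERS = {
    "python": "#",
    "perl": "#",
    "sql": "--",
    "lua": "--",
    "cpp": "//",
    "tex": "%",
}

def candidates_from_comment(first_line: str):
    """Return the sorted syntaxes whose comment leader starts the line."""
    stripped = first_line.lstrip()
    return sorted(
        name for name, lead in COMMENT_LEADERS.items() if stripped.startswith(lead)
    )
```

A line starting with `%` leaves a single candidate in this table, so the first line is enough. A line starting with `#` still leaves two, which is the case where the search would have to continue, but only among those two.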
Logged
Matthew1344
Gem Cutter
****
Posts: 103


« Reply #6 on: January 15, 2007, 07:41:37 pm »

What Feldon said makes a lot of sense to me.  We could potentially have several nested languages, so it will have to inspect the whole file.

Here's an example:  Pro*C (which is Oracle's C/C++ precompiler) nests PL/SQL inside of C++ code--but not in comments.  So KjeBja's suggestion about locating the nested language by looking for comments in the top language wouldn't work.
Logged
John Yeung
Senior Miner
***
Posts: 85


« Reply #7 on: January 16, 2007, 12:33:16 am »

I strongly disagree with parsing the whole file just to determine... how to parse it.  My understanding is that one of the design goals of Emerald is for it to be lightweight and feel responsive.  (That is certainly something I look for in an editor.)  Emerald should check some quick things and take its best guess.  It will be right probably 90-95% of the time for most users, more like 95-98% of the time for Windows people who can usually be counted on to use well-established filename extensions.

When it guesses wrong, let the user pick the correct syntax.  Or even provide a facility that does parse the whole file, but let the user have to invoke it or configure it as a setting; do not make this mandatory behavior just to open a file.

By the way, if Emerald does manage to implement fairly powerful syntax definitions as discussed elsewhere on the forum, it will be a simple matter to just pick the HTML syntax definition (for example), which will already have almost any kind of embeddable language built into it, and any that the user cares to add.

John
Logged
KjeBja
Senior Miner
***
Posts: 76


« Reply #8 on: January 16, 2007, 07:58:20 am »

I have to agree with John here; having to parse the whole file would take way too much time, especially if there is more than one file to be opened when the editor is started.

Quote from: Matthew1344
So KjeBja's suggestion about locating the nested language by looking for comments in the top language wouldn't work.

This is not quite what I meant. After all, the possibility to have nested languages does not exist in all languages. So the need to search beyond the first proper line would be limited to those languages that allow for this.
Logged
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #9 on: January 16, 2007, 08:43:49 am »

This is where the first real problem comes in, then, since if you have a repository of more than a few hundred syntaxes (and hey, some people do), you'd have to load each and every one to see which ones allowed nested languages.

But then again, I guess that explains the need for something else which doesn't need to be parsed as thoroughly - like extension matching.
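One way to picture that lighter-weight layer is a small manifest kept apart from the full syntax definitions. This is a Python sketch only: the JSON format, the `allows_nested` flag and the function names are all hypothetical and not an existing EE or CE format:

```python
import json

# Hypothetical manifest: one small entry per syntax, separate from the full definitions.
MANIFEST_JSON = """
[
  {"name": "html", "extensions": [".html", ".htm"], "allows_nested": true},
  {"name": "python", "extensions": [".py"], "allows_nested": false},
  {"name": "xml", "extensions": [".xml"], "allows_nested": true}
]
"""

def build_index(manifest_text: str):
    """Map each extension to its manifest entry without loading any syntax file."""
    index = {}
    for entry in json.loads(manifest_text):
        for ext in entry["extensions"]:
            index[ext.lower()] = entry
    return index

INDEX = build_index(MANIFEST_JSON)

def needs_content_scan(extension: str) -> bool:
    """True only when the matched syntax is flagged as allowing nested languages."""
    entry = INDEX.get(extension.lower())
    return bool(entry and entry["allows_nested"])
```

With something like this, only the one matching syntax definition has to be loaded after the extension lookup, however many hundred are installed.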
Logged

"Cleverly disguised as a responsible adult!"
Phil
Administrator
Master Jeweller
*****
Posts: 427


« Reply #10 on: March 19, 2009, 11:55:15 pm »

Moderator note:

A reply on this thread was moved to a new topic on a CE forum. Here's the link: http://forum.emeraldeditor.com/index.php?topic=576.0
Logged