Emerald Editor Discussion

Emerald Editor => Syntax files => Topic started by: Derek Parnell on May 19, 2006, 02:43:29 am

Title: Functionality of Syntax Files
Post by: Derek Parnell on May 19, 2006, 02:43:29 am
Apart from how they are to be stored, what do we want from them? I ask this because there are some deficiencies in the current functionality that cause me some problems.

For example, the D language has three types of code comments...

//  Line comments
/*   multi-line comments */
/+  /+ nested multi-line comments +/ +/

and three types of documentation comments ...

///  Line comments
/**   multi-line comments */
/++  /+ nested multi-line comments +/ +/

It also has four types of quoted literals ...

 'a'     -- Single character
 "a"    -- character string with optional escaped contents
 `a`   -- raw string
 r"a"   -- alternate raw string

and further to that, each of these can have an optional suffix that denotes the UTF encoding to use.

  "abc"c   -- UTF 8
  "abc"w  -- UTF 16
  "abc"d  -- UTF 32

The current CE syntax files just can't deal with this.
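A minimal sketch of how a regex-capable syntax definition could recognise the literal forms listed above, written in Python purely for illustration. The pattern and the names D_LITERAL and find_literals are hypothetical, and the optional c/w/d suffix is attached to every form as the post describes, not checked against the D specification.

```python
import re

# Hypothetical pattern for the four literal forms, plus the optional suffix.
# Order matters: r"..." has to be tried before the plain "..." alternative.
D_LITERAL = re.compile(
    r"""
    (?:
        (?P<raw_r>r"[^"]*")              # r"a"  alternate raw string
      | (?P<string>"(?:\\.|[^"\\])*")    # "a"   string with escapes
      | (?P<raw_bt>`[^`]*`)              # `a`   raw string
      | (?P<char>'(?:\\.|[^'\\])')       # 'a'   single character
    )
    (?P<suffix>[cwd])?                   # optional UTF suffix
    """,
    re.VERBOSE | re.DOTALL,
)

KINDS = ("raw_r", "string", "raw_bt", "char")


def find_literals(source):
    """Return (kind, text, suffix) for each literal found in source."""
    found = []
    for m in D_LITERAL.finditer(source):
        # Exactly one of the named alternatives took part in the match.
        kind = next(k for k in KINDS if m.group(k) is not None)
        found.append((kind, m.group(kind), m.group("suffix")))
    return found
```

A syntax file would only need to carry the pattern itself; the surrounding function stands in for whatever the editor does with each match.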

Now some languages allow characters such as @#$%- to be a part of an identifier name. But these can cause CE issues too.

Then there's the 'bracket' matching problem. It is easy to match {[( but some languages need more complex matching. For example, Progress 4GL needs ":" matched with "end."

There is also a great diversity in designating the start and end of functions/routines/procedures. And this is needed for easy code collapsing.

Anyhow, these are just some notes to say that we might want to consider enhancing the syntax file capabilities too.

Title: Functionality of Syntax Files
Post by: Matthew1344 on May 19, 2006, 05:05:41 am
Amen.  I'd love to see plenty of thought go into the syntax files ahead of time.  Further, I think the ability to add stuff in the future is one reason why XML would be a great choice for the syntax files.  It's ... extensible.  :)  Anywho...

It seems that there are several sets of element pairs that are treated very similarly, yet not exactly the same:

A) program/procedure terminators --  like:  sub...end, main(args){...}
B) control structures --  like:  while(){...},  if(){...} elseif() {...} else {...}
C) multi-line comments --  like:  /* ... */

When it comes to code folding and autocomplete, it seems that it might be important for the syntax file to be smart enough to know not simply that "{" is the beginning of "{...}", but rather that "if(myConditions){" is the beginning of "if(myConditions){...}".
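To make the distinction concrete, here is a rough Python sketch that treats the whole header, not just the brace, as the start of a block. The keyword list and the name BLOCK_HEADER are assumptions for illustration, and the condition is not allowed to contain nested parentheses.

```python
import re

# Hypothetical header pattern: a keyword, an optional parenthesised condition
# (no nested parentheses), then the opening brace.
BLOCK_HEADER = re.compile(
    r"\b(if|elseif|else|while|for)\b\s*(\([^()]*\))?\s*\{"
)


def block_headers(source):
    """Return (keyword, full header text) for each block opener found."""
    return [(m.group(1), m.group(0)) for m in BLOCK_HEADER.finditer(source)]
```

With the header captured as a unit, a folded block could show "if(myConditions){...}" in place of a bare "{...}".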

Also, unlike A and B, C can issue its "begin" clause and any second begin clause is ignored unless the "end" clause is issued first.  For instance:

/* this comment is valid */
/* /* /* so is this one */
/* though this one is not */ */
/* /* and this one is not either */ */
However, according to your comments about the D language, even the last one could be valid.
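The difference between the two comment styles comes down to whether the scanner counts openers. A small Python sketch, assuming a hypothetical per-language "nested" flag in the syntax definition:

```python
def comment_end(text, start, opener, closer, nested):
    """Return the index just past the comment opening at start, or None.

    With nested=False a second opener inside the comment is ignored
    (C-style /* */). With nested=True every opener must be closed
    (D-style /+ +/).
    """
    if not text.startswith(opener, start):
        raise ValueError("no comment opener at start")
    depth = 1
    i = start + len(opener)
    while i < len(text):
        if nested and text.startswith(opener, i):
            depth += 1
            i += len(opener)
        elif text.startswith(closer, i):
            depth -= 1
            i += len(closer)
            if depth == 0:
                return i
        else:
            i += 1
    return None  # ran out of text before the comment was closed
```

Run without nesting, the scan stops at the first closer and anything after it is left over as stray text; run with nesting, the same text is consumed as one comment.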

It's getting late, so I may not be thinking so clearly.  I better get some sleep before rambling further.

Title: Functionality of Syntax Files
Post by: Arantor on May 19, 2006, 12:49:59 pm
I think there may be something there.

Perhaps we could include a syntax for regular expression parsing as part of syntactical analysis.

Eg. for regular use it could be told:


OK, so it's a terribly contrived example, syntactically ambiguous (the regexp isn't great) but it does define the point, and possibly the best way to define a pair in XML, for example. (I wouldn't claim it was the best way, but better than having empty tags everywhere)

I am coming round - slowly, mind you - to the argument of XML being involved.

By also adding regexp handling in syntax files, and by our quality checking, we can create some quite wonderful syntaxes and their handling.

As for handling of keywords, think about it this way: (extract from PHP4/5)

The benefit here should hopefully be obvious - instead of matching a phrase to 3,000+ keywords (in PHP 5), you're matching it to a few hundred prefixes first, then if you match it to a prefix, you can then try matching it to the whole thing.

The equivalent RE would be image(2wbmp|_type_to_mime_type|alphablending|etc....) - if it doesn't match against image, it will fail the entire expression as it is required.
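The two-stage lookup can be sketched in Python as follows. The bucket width of five characters is an assumption chosen to fit the "image" example, and build_buckets and is_keyword are hypothetical names.

```python
from collections import defaultdict

PREFIX_LEN = 5  # hypothetical bucket width, sized for the "image" prefix


def build_buckets(keywords, prefix_len=PREFIX_LEN):
    """Group keywords by their first prefix_len characters."""
    buckets = defaultdict(set)
    for word in keywords:
        buckets[word[:prefix_len]].add(word)
    return dict(buckets)


def is_keyword(word, buckets, prefix_len=PREFIX_LEN):
    """Stage 1: find the prefix bucket. Stage 2: match the whole word."""
    bucket = buckets.get(word[:prefix_len])
    return bucket is not None and word in bucket
```

In Python a single set would answer the same question directly; the buckets are here only to mirror the prefix-first idea described above.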

As for nesting comments, this is always going to be an issue for any editor, but if you have regexp handling, you could say:
To match /* /* /* comment */

Use: (/\*\s*)+(.*)\*/
I know it looks messy, but it essentially looks for any number of /* each followed by zero or more whitespace characters, treated as a single unit, so it would match /* and a tab, followed by /* and two spaces, and even /* followed by no space, and so on. The rest of the expression takes the comment text up to the closing */.
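A quick way to try this idea is Python's re module. The sketch below writes the whitespace class as \s and escapes the closing */, and it only covers the single-line, non-nesting case; comment_body is a hypothetical helper name.

```python
import re

# One or more "/*" openers, each followed by optional whitespace,
# then the comment text, then the closing "*/".
COMMENT = re.compile(r"(/\*\s*)+(.*)\*/")


def comment_body(line):
    """Return the text between the repeated openers and the closer, or None."""
    m = COMMENT.match(line)
    return m.group(2).strip() if m else None
```

Because "." does not match a newline by default, a multi-line comment would need the DOTALL flag or a different approach.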

We also know that wxWidgets includes a regexp parser (the same one CE uses, although it may be a different version), so we should hopefully be able to add this content in.

Title: Re: Functionality of Syntax Files
Post by: kesselhaus on June 05, 2006, 03:55:35 am
RegEx syntax highlighting is something I'd like to see in EE. It's a feature rarely seen in editors, and CE doesn't have it. It makes highlighting user-defined types much easier.

Say you want to highlight structure/union/enum names that follow a naming convention, e.g. the prefix [SEU]_. With just the normal highlighting features, that is impossible.

Considering the above, e.g.
typedef struct { ... } S_MYSTRUCT;
typedef union { ... } U_MYUNION;
typedef enum { ... } E_MYENUM;

S_MYSTRUCT s_s1; ...

This also makes it possible to highlight user errors in naming:

T_UBYTE MODULE_ul_Var2;   <-- Highlight naming error by regex searchphrase
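As a rough illustration of regex-driven highlighting, the Python sketch below finds every identifier that follows the [SEU]_ convention. The pattern assumes type names are upper case after the prefix, which is a guess based on the examples, and highlight_spans is a hypothetical name.

```python
import re

# Assumed convention: S_, E_ or U_ followed by upper-case letters,
# digits or underscores.
TYPE_NAME = re.compile(r"\b[SEU]_[A-Z0-9_]+\b")


def highlight_spans(source):
    """Return (start, end, text) for each identifier to highlight."""
    return [(m.start(), m.end(), m.group(0)) for m in TYPE_NAME.finditer(source)]
```

A second pattern could flag names that break a convention in the same way; the exact rule for that is not spelled out here, so it is left out of the sketch.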

Just my 0.02$

Title: Re: Functionality of Syntax Files
Post by: Feldon on June 05, 2006, 04:21:18 am
The searching-for-prefixes business sounds a lot like a tree search. While probably more complex, to use your image example, you could search for i, then m, then a, etc. I have no idea how this fits in; I'm just saying you may be able to apply some of the tree search theory here.
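The character-by-character search described here is usually called a trie. A minimal Python sketch, using nested dicts and a hypothetical END marker for "a keyword stops here":

```python
END = ""  # marker key; cannot collide with the single-character keys


def build_trie(words):
    """Build a trie of nested dicts, one level per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def contains(trie, word):
    """Walk the trie one character at a time: i, then m, then a, ..."""
    node = trie
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False  # no keyword continues with this character
    return END in node
```

The walk gives up at the first character that no keyword shares, which is the same early exit the prefix idea is after.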

Title: Re: Functionality of Syntax Files
Post by: Matthew1344 on June 15, 2006, 09:28:32 pm
I was just writing some Pro*C code and had a practical problem with CE's syntax definition.  This may be obvious to the team, but I thought I'd mention it just in case.

CE has no problem determining the startpoint and endpoint for delimiters like "{" and "}", so I'm sure that if CE could do code folding it would have no trouble folding this section.  But I was writing a section of embedded PL/SQL which begins "EXEC" and ends "END-EXEC".  CE has no way of seeing these as the start and end points of that section (so it couldn't fold it even if CE featured folding).

EE syntax should be designed flexibly enough to handle this kind of thing.
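One way to picture this is a syntax definition that lists arbitrary opener/closer strings and a matcher that pairs them with a stack. The Python sketch below is line-based and assumes delimiters appear at the start of a line; fold_regions and the pair list are hypothetical.

```python
def fold_regions(lines, pairs):
    """Return (first_line, last_line) for each matched opener/closer pair.

    pairs is a list of (opener, closer) strings, e.g. [("EXEC", "END-EXEC")].
    A delimiter counts only when it starts a line, ignoring indentation.
    """
    stack = []    # (pair index, line number) of regions still open
    regions = []
    for lineno, line in enumerate(lines):
        text = line.strip()
        for idx, (opener, closer) in enumerate(pairs):
            # Test the closer first so "END-EXEC" is never taken for an opener.
            if text.startswith(closer):
                if stack and stack[-1][0] == idx:
                    regions.append((stack.pop()[1], lineno))
                break
            if text.startswith(opener):
                stack.append((idx, lineno))
                break
    return regions
```

A known gap in this sketch: any statement that starts with the opener but never reaches a closer would leave a region open, so a real definition would need a more precise opener.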

Title: Re: Functionality of Syntax Files
Post by: Chancedo1 on February 02, 2015, 12:01:29 pm
Tbh I would keep each (major) revision of a language separate because, as stated, look at the differences between PHP4 and PHP5.

And top-level folders would be good, but we should be careful not to overuse them, or we could end up going deep. Better to keep it as simple as possible.