Emerald Editor Discussion
Topic: Functionality of Syntax Files
Derek Parnell
Lead Architect

« on: May 19, 2006, 02:43:29 am »

Apart from how they are to be stored, what do we want from them? I ask this because there are some deficiencies in the current functionality that cause me some problems.

For example, the D language has three types of code comments...

//  Line comments
/*   multi-line comments */
/+  /+ nested multi-line comments +/ +/

and three types of documentation comments ...

///  Line comments
/**   multi-line comments */
/++  /+ nested multi-line comments +/ +/
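To make the nesting requirement concrete, here is a minimal Python sketch (an illustration only, not CE or EE code) of how a highlighter could find the end of a D /+ ... +/ comment by counting depth. The function name skip_nested_comment is hypothetical. A plain begin/end pair with no depth counter would stop at the first +/ instead.

```python
def skip_nested_comment(text: str, start: int) -> int:
    """Return the index just past the /+ ... +/ comment starting at `start`.

    Each "/+" raises the nesting depth and each "+/" lowers it; the comment
    only ends when the depth is back to zero.
    """
    if not text.startswith("/+", start):
        raise ValueError("no /+ comment at this position")
    depth = 0
    i = start
    while i < len(text):
        if text.startswith("/+", i):
            depth += 1
            i += 2
        elif text.startswith("+/", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated /+ comment")


src = "/+ outer /+ inner +/ still outer +/ x = 1;"
end = skip_nested_comment(src, 0)
print(repr(src[end:]))  # ' x = 1;'
```

The same routine covers the /++ ... +/ documentation form, since it also opens with "/+".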

It also has four types of quoted literals ...

 'a'    -- single character
 "a"    -- character string with optional escape sequences
 `a`    -- raw string
 r"a"   -- alternate raw string

and further to that, each of the string forms can have an optional suffix that denotes the UTF encoding to use.

  "abc"c   -- UTF-8
  "abc"w   -- UTF-16
  "abc"d   -- UTF-32

The current CE syntax files just can't deal with this.
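For illustration, here is roughly what a single token pattern for those literal forms could look like in Python's re syntax. D_LITERAL is a hypothetical name and the pattern is simplified: character literals only allow a one-character escape, other D literal forms are ignored, and the c/w/d suffix is only accepted after the three string forms, since D character literals do not take it.

```python
import re

# Hypothetical token pattern for the D literal forms listed above (simplified).
D_LITERAL = re.compile(
    r"""
      '(?:\\.|[^'\\])'          # character literal: 'a' or a one-char escape
    | (?: "(?:\\.|[^"\\])*"     # "..." string, backslash escapes allowed
        | `[^`]*`               # `...` raw string
        | r"[^"]*"              # r"..." alternate raw string
      ) [cwd]?                  # optional suffix: c = UTF-8, w = UTF-16, d = UTF-32
    """,
    re.VERBOSE,
)

src = 'char a = \'a\'; string s = "abc"w; auto r = `C:\\dir`d; auto t = r"x\\y"c;'
print([m.group(0) for m in D_LITERAL.finditer(src)])
```

One pattern per literal family keeps the suffix handling in a single place instead of duplicating it per string form.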

Now, some languages allow characters such as @#$%- to be part of an identifier name, and these can cause CE problems too.
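One way a syntax file could express this, sketched in Python under the assumption of two hypothetical per-language settings, extra_start and extra_rest, listing the extra characters allowed at the start of and inside an identifier:

```python
import re


def identifier_pattern(extra_start: str = "", extra_rest: str = "") -> re.Pattern:
    """Build an identifier regex from hypothetical per-language settings.

    extra_start: extra characters allowed as the first character (e.g. "$").
    extra_rest:  extra characters allowed after the first one (e.g. "-").
    re.escape keeps characters such as "-" or "]" literal inside the class.
    """
    start = re.escape(extra_start)
    rest = re.escape(extra_start + extra_rest)
    return re.compile(rf"[A-Za-z_{start}][A-Za-z0-9_{rest}]*")


plain = identifier_pattern()
cobol_like = identifier_pattern(extra_rest="-")
print(plain.findall("WS-TOTAL"))       # ['WS', 'TOTAL']
print(cobol_like.findall("WS-TOTAL"))  # ['WS-TOTAL']
```

Splitting "start" and "rest" characters matters because a language may allow "-" inside a name but not at its start, where it would be an operator.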

Then there's the 'bracket' matching problem. It is easy to match {[( but some languages need more complex matching. For example, Progress 4GL needs ":" matched with "end."
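A sketch of a stack-based matcher that accepts arbitrary opener/closer strings rather than single characters. It is a simplification: it ignores comments and strings and does no word-boundary checks, so a real Progress definition would need more context. match_blocks and the pairs format are assumptions for illustration.

```python
import re


def match_blocks(text: str, pairs: dict[str, str]) -> list[tuple[int, int]]:
    """Pair up opener/closer tokens with a stack and return their offsets.

    `pairs` maps each opener to its closer, e.g. {"(": ")", ":": "end."}.
    Matching is case-insensitive; a closer that does not match the innermost
    open block, or a block left open at the end, raises ValueError.
    """
    closers = {close.lower(): opener.lower() for opener, close in pairs.items()}
    # Longest tokens first, so a multi-character token wins over a shorter one.
    tokens = sorted({*(o.lower() for o in pairs), *closers}, key=len, reverse=True)
    scanner = re.compile("|".join(re.escape(t) for t in tokens), re.IGNORECASE)
    stack: list[tuple[str, int]] = []
    blocks = []
    for m in scanner.finditer(text):
        tok = m.group(0).lower()
        if tok in closers:
            if not stack or stack[-1][0] != closers[tok]:
                raise ValueError(f"unexpected {m.group(0)!r} at offset {m.start()}")
            blocks.append((stack.pop()[1], m.start()))
        else:
            stack.append((tok, m.start()))
    if stack:
        raise ValueError(f"unclosed {stack[-1][0]!r} at offset {stack[-1][1]}")
    return blocks


progress = "FOR EACH customer:\n  IF (x > 1) THEN DO:\n    DISPLAY x.\n  END.\nEND."
for start, end in match_blocks(progress, {"(": ")", ":": "end."}):
    print(repr(progress[start]), "closed at offset", end)
```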

There is also great diversity in how languages designate the start and end of functions/routines/procedures, and that information is needed for easy code collapsing.

Anyhow, these are just some notes to say that we might want to consider enhancing the syntax file capabilities too.

--
Derek Parnell
"Down with Mediocrity!"

Matthew1344

« Reply #1 on: May 19, 2006, 05:05:41 am »

Amen.  I'd love to see plenty of thought go into the syntax files ahead of time.  Further, I think the ability to extend them later is one reason why XML would be a great choice for the syntax files.  It's ... extensible. :)  Anywho...

It seems that there are several sets of element pairs that are treated very similarly, yet not exactly the same:

A) program/procedure terminators --  like:  sub...end, main(args){...}
B) control structures --  like:  while(){...},  if(){...} elseif() {...} else {...}
C) multi-line comments --  like:  /* ... */

When it comes to code folding and autocomplete, it seems that it might be important for the syntax file to be smart enough to know not simply that "{" is the beginning of "{...}", but rather that "if(myConditions){" is the beginning of "if(myConditions){...}".

Also, unlike A and B, C does not nest: once its "begin" clause has been issued, any further "begin" clause is ignored until the "end" clause appears.  For instance:

Code:
/* this comment is valid */
/* /* /* so is this one */
/* though this one is not */ */
/* /* and this one is not either */ */
However, going by your comments about the D language, even the last one could be valid if it were written with D's nesting /+ ... +/ delimiters.
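The difference between C-style (non-nesting) and D-style (nesting) comments can be checked against the four example lines with a small Python sketch. comments_valid is a hypothetical helper that only looks at a single line and ignores strings.

```python
def comments_valid(line: str, open_: str, close: str, nesting: bool) -> bool:
    """Return True if every comment opened on `line` is closed on it.

    nesting=False models C's /* */: an opener inside a comment is plain text.
    nesting=True models D's /+ +/: every opener needs its own closer.
    A closer with no open comment counts as an error.
    """
    depth, i = 0, 0
    while i < len(line):
        if line.startswith(open_, i) and (nesting or depth == 0):
            depth += 1
            i += len(open_)
        elif line.startswith(close, i):
            if depth == 0:
                return False  # stray closer, e.g. the final */ in "... */ */"
            depth -= 1
            i += len(close)
        else:
            i += 1
    return depth == 0


examples = [
    "/* this comment is valid */",
    "/* /* /* so is this one */",
    "/* though this one is not */ */",
    "/* /* and this one is not either */ */",
]
for line in examples:
    d_line = line.replace("/*", "/+").replace("*/", "+/")
    print(comments_valid(line, "/*", "*/", nesting=False),
          comments_valid(d_line, "/+", "+/", nesting=True))
```

This prints the pairs True True, True False, False False and False True: under C rules the second line is fine and the last is broken, and with D's nesting delimiters it is the other way round.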

It's getting late, so I may not be thinking so clearly.  I better get some sleep before rambling further.

Arantor
Site Administrator

« Reply #2 on: May 19, 2006, 12:49:59 pm »

I think there may be something there.

Perhaps syntax files could let us specify regular expressions as part of the syntax analysis.

E.g., for ordinary use it could be told:
Code:

 sub
 end

 if\([A-Za-z0-9]+\){
 }
OK, so it's a terribly contrived example and syntactically ambiguous (the regexp isn't great), but it does illustrate the point, and possibly a good way to define a pair in XML. (I wouldn't claim it was the best way, but it's better than having empty tags everywhere.)
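A minimal sketch of how regex-defined begin/end pairs like the two above could drive code folding. The BlockRule structure and the rule set are hypothetical; the sketch works line by line and does not handle a line that both closes and opens a block (e.g. "} else {").

```python
import re
from dataclasses import dataclass


@dataclass
class BlockRule:
    """One begin/end pair as a syntax file might declare it (hypothetical format)."""
    name: str
    begin: re.Pattern
    end: re.Pattern


RULES = [
    BlockRule("sub", re.compile(r"^\s*sub\b", re.IGNORECASE),
              re.compile(r"^\s*end\b", re.IGNORECASE)),
    BlockRule("if", re.compile(r"\bif\s*\(.*\)\s*\{\s*$"),
              re.compile(r"^\s*\}")),
]


def fold_regions(lines: list[str], rules: list[BlockRule]) -> list[tuple[str, int, int]]:
    """Return (rule name, first line, last line) for each foldable block."""
    stack: list[tuple[BlockRule, int]] = []
    regions = []
    for n, line in enumerate(lines):
        # Only the innermost open block may close on this line.
        if stack and stack[-1][0].end.search(line):
            rule, start = stack.pop()
            regions.append((rule.name, start, n))
            continue
        for rule in rules:
            if rule.begin.search(line):
                stack.append((rule, n))
                break
    return regions


code = ["sub main", "  if (x) {", "    y()", "  }", "end"]
print(fold_regions(code, RULES))  # [('if', 1, 3), ('sub', 0, 4)]
```

Because the end pattern is only checked against the innermost open rule, a "}" can never close a "sub" block by accident.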

I am coming round - slowly, mind you - to the argument of XML being involved.

By adding regexp handling to the syntax files, and by quality-checking them ourselves, we could create some quite wonderful syntax definitions.

As for handling of keywords, think about it this way: (extract from PHP4/5)
Code:

 2wbmp
 _type_to_mime_type
 alphablending
 antialias
 arc
 char
 charup
The benefit here should hopefully be obvious: these entries are PHP's image* functions listed under their shared "image" prefix, so instead of matching a word against 3,000+ keywords (in PHP 5), you match it against a few hundred prefixes first, and only if a prefix matches do you try matching the whole keyword.

The equivalent RE would be image(2wbmp|_type_to_mime_type|alphablending|etc....): if the word doesn't start with "image", the entire expression fails, because that part is required.
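A small Python sketch of the two-stage lookup, plus a helper that builds the equivalent single regex. KEYWORD_GROUPS is a hypothetical in-memory form of the grouped keyword list and contains only the entries from the extract.

```python
import re

# Hypothetical in-memory form of the grouped keyword list: prefix -> suffixes.
KEYWORD_GROUPS = {
    "image": {"2wbmp", "_type_to_mime_type", "alphablending", "antialias",
              "arc", "char", "charup"},
}


def is_keyword(word: str, groups: dict[str, set[str]]) -> bool:
    """Two-stage lookup: check the prefix first, then the rest of the word."""
    for prefix, suffixes in groups.items():
        if word.startswith(prefix) and word[len(prefix):] in suffixes:
            return True
    return False


def group_regex(prefix: str, suffixes: set[str]) -> re.Pattern:
    """Build the single-regex equivalent: prefix(suffix1|suffix2|...)."""
    # Longest alternatives first, so e.g. "charup" is tried before "char".
    alts = "|".join(sorted((re.escape(s) for s in suffixes), key=len, reverse=True))
    return re.compile(rf"\b{re.escape(prefix)}(?:{alts})\b")


rx = group_regex("image", KEYWORD_GROUPS["image"])
print(rx.findall("imagecharup($im); imagechar($im); imagefoo();"))
# ['imagecharup', 'imagechar']
```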

As for nesting comments, this is always going to be an issue for any editor, but if you have regexp handling, you could say:
Code:
To match /* /* /* comment */

Use: (/\*\s*)+(.*?)\*/
I know it looks messy, but it essentially looks for one or more occurrences of /* followed by zero or more whitespace characters, treated as a single unit, so it would match /* and a tab, followed by /* and two spaces, and even /* followed by no space, and so on.
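A quick Python check of the pattern, written with \s* for the whitespace (\w* would match word characters instead) and with the closing */ escaped; the lazy (.*?) stops at the first closer. This uses Python's re module only as an illustration; the regex engine in wxWidgets may differ in details.

```python
import re

# \s* matches the whitespace after each "/*" (\w* would match word characters),
# the closing "*/" is escaped, and the lazy (.*?) stops at the first closer.
STACKED_OPENERS = re.compile(r"(/\*\s*)+(.*?)\*/")

m = STACKED_OPENERS.search("x = 1; /* /* /* comment */ y = 2;")
print(m.group(0))        # /* /* /* comment */
print(repr(m.group(2)))  # 'comment '
```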

We also know that wxWidgets includes a regexp engine (the same one CE uses, although possibly a different version), so we should hopefully be able to build on it.

"Cleverly disguised as a responsible adult!"

kesselhaus

« Reply #3 on: June 05, 2006, 03:55:35 am »

Regex-based syntax highlighting is something I'd like to see in EE. It's a rare feature in editors, and CE never had it. It would make highlighting user-defined types much easier.

Say you want to highlight struct/union/enum type names that follow a naming convention, e.g. an [SEU]_ prefix. With plain keyword highlighting that's impossible.

For example:
typedef struct { ... } S_MYSTRUCT;
typedef union { ... } U_MYUNION;
typedef enum { ... } E_MYENUM;

S_MYSTRUCT s_s1; ...
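With regex support, a user-defined rule for such a convention could be a single pattern. A sketch in Python, where USER_TYPE is a hypothetical user rule, not an existing syntax-file entry:

```python
import re

# Hypothetical user rule: S_, U_ or E_ followed by upper-case letters,
# digits or underscores marks a struct, union or enum type name.
USER_TYPE = re.compile(r"\b[SEU]_[A-Z0-9_]+\b")

code = """typedef struct { int a; } S_MYSTRUCT;
typedef union { int b; } U_MYUNION;
typedef enum { RED } E_MYENUM;
S_MYSTRUCT s_s1;"""
print(USER_TYPE.findall(code))
# ['S_MYSTRUCT', 'U_MYUNION', 'E_MYENUM', 'S_MYSTRUCT']
```

The lower-case variable name s_s1 is not matched, because the rule only accepts an upper-case S, U or E.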

This also makes it possible to highlight naming errors:

T_UBYTE MODULE_ub_Var1;
T_UBYTE MODULE_ul_Var2;   <-- naming error (ul tag on a T_UBYTE) that a regex rule could highlight
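Flagging that mismatch needs a little more than one pattern, because the expected tag depends on the declared type. A sketch, assuming a hypothetical table mapping each type to its tag; only the T_UBYTE/ub pairing comes from the example, the other entries are illustrative.

```python
import re

# Hypothetical convention table: type -> tag expected in MODULE_<tag>_<Name>.
# Only T_UBYTE/"ub" comes from the example; the other entries are made up.
TYPE_TAGS = {"T_UBYTE": "ub", "T_UWORD": "uw", "T_ULONG": "ul"}

# <type> <MODULE>_<tag>_<Name>;
DECL = re.compile(r"^\s*(T_[A-Z]+)\s+[A-Z]+_([a-z]+)_\w+\s*;")


def naming_errors(source: str) -> list[int]:
    """Return 1-based numbers of lines whose tag doesn't match the declared type."""
    bad = []
    for n, line in enumerate(source.splitlines(), start=1):
        m = DECL.match(line)
        if m:
            expected = TYPE_TAGS.get(m.group(1))
            if expected is not None and expected != m.group(2):
                bad.append(n)
    return bad


src = "T_UBYTE MODULE_ub_Var1;\nT_UBYTE MODULE_ul_Var2;"
print(naming_errors(src))  # [2]
```

Types missing from the table are skipped rather than flagged, so an incomplete table produces no false alarms.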


Just my $0.02.

Feldon

« Reply #4 on: June 05, 2006, 04:21:18 am »

The prefix-searching business sounds a lot like a tree search (a prefix tree, or trie).  While probably more complex, to use your image example, you could search for i, then m, then a, and so on.  I have no idea how this fits in; I'm just saying you may be able to apply some tree search theory here.
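For reference, a minimal prefix tree (trie) in Python; the names are illustrative. Lookup walks one character at a time and gives up as soon as no branch matches, which is the early rejection the prefix grouping is after.

```python
class TrieNode:
    """One node of a prefix tree: child nodes keyed by the next character."""

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.is_keyword = False


def build_trie(words: list[str]) -> TrieNode:
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_keyword = True  # mark the end of a complete keyword
    return root


def lookup(root: TrieNode, word: str) -> bool:
    """Walk one character at a time; stop as soon as no branch matches."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_keyword


trie = build_trie(["imagearc", "imagechar", "imagecharup", "include"])
print(lookup(trie, "imagechar"), lookup(trie, "imagecha"), lookup(trie, "imagex"))
# True False False
```

Shared prefixes such as "image" are stored once, so the cost of a lookup depends on the word's length rather than on the number of keywords.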

Matthew1344

« Reply #5 on: June 15, 2006, 09:28:32 pm »

I was just writing some Pro*C code and had a practical problem with CE's syntax definition.  This may be obvious to the team, but I thought I'd mention it just in case.

CE has no problem determining the start and end points for delimiters like "{" and "}", so I'm sure that if CE could do code folding it would have no trouble folding such a section.  But I was writing a section of embedded PL/SQL which begins with "EXEC" and ends with "END-EXEC".  CE has no way of recognizing these as the start and end of that section (so it couldn't fold it even if CE featured folding).
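A line-based Python sketch of finding EXEC ... END-EXEC regions. One pitfall worth noting: "END-EXEC" itself contains "EXEC", so a naive begin pattern would also fire on the end marker; the lookbehind below rejects an EXEC preceded by a hyphen or word character. The patterns and exec_regions are assumptions, not CE or EE syntax-file syntax.

```python
import re

# "END-EXEC" contains "EXEC", so the begin pattern must not fire on it:
# the lookbehind rejects an EXEC preceded by a hyphen or a word character.
BEGIN = re.compile(r"(?<![\w-])EXEC\b", re.IGNORECASE)
END = re.compile(r"\bEND-EXEC\b", re.IGNORECASE)


def exec_regions(lines: list[str]) -> list[tuple[int, int]]:
    """Return (first, last) 0-based line numbers of each EXEC ... END-EXEC block."""
    regions, start = [], None
    for n, line in enumerate(lines):
        if start is None and BEGIN.search(line):
            start = n
        if start is not None and END.search(line):
            regions.append((start, n))
            start = None
    return regions


pro_c = [
    "int rows;",
    "EXEC SQL EXECUTE",
    "  BEGIN",
    "    NULL;",
    "  END;",
    "END-EXEC;",
    "rows = 0;",
]
print(exec_regions(pro_c))  # [(1, 5)]
```

Note that the PL/SQL "END;" inside the block does not end the region; only END-EXEC does.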

EE syntax should be designed flexibly enough to handle this kind of thing.

Chancedo1

« Reply #6 on: February 02, 2015, 12:01:29 pm »

To be honest, I would keep each major revision of a language separate; as stated, look at the differences between PHP 4 and PHP 5.

Top-level folders would be good, but we should be careful not to overuse them, or we could end up nesting too deep. Better to keep it as simple as possible.