Emerald Editor Discussion
June 29, 2017, 01:26:19 am *
Author Topic: I like to syntax, syntax...  (Read 33504 times)
Szandor
Senior Miner
***
Posts: 92



« on: January 04, 2007, 11:50:00 pm »

I was browsing around looking for a nice text editor and found Crimson Editor. Long story short, I ended up here and began to read about the syntax issues. I found myself unable to define a syntax in CE for the LiteStep RC-files and wanted a more powerful and flexible way of defining syntax.
Such a definition must be easy to use and write, readable and easy to understand yet powerful. It must also take up as little space as possible in order to be quick to load.
After some thought I came up with this:
Code:
: options
case-sensitive | yes
another-option | value
These are a predefined set of options that EE wants defined in order to highlight and validate the code properly. All options should have default values. The ": options" begins a section of options. When a new ": whatever" begins or the document ends, the section is closed. This principle is the same for the entire document and every section is predefined and treated in any way needed.
The basic syntax of the syntax syntax (gee, this is fun!) is "name | value". The value can be in the form of a complex definition or just a simple statement, depending on what is needed for that particular element.
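A minimal Python sketch of the section rule described above, under the assumption that a line starting with ":" opens a section and every other non-empty line is a "name | value" entry. The function name `parse_sections` is hypothetical and not part of EE.

```python
def parse_sections(text):
    """Split a definition document into {section name: {entry name: value}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()  # leading tabs are optional indentation
        if not line:
            continue
        if line.startswith(":"):
            # ": options" opens a section and implicitly closes the previous one
            current = sections.setdefault(line[1:].strip(), {})
        elif current is not None and "|" in line:
            # split on the first "|" only, so the value may contain more of them
            name, value = line.split("|", 1)
            current[name.strip()] = value.strip()
    return sections


sample = """\
: options
case-sensitive | yes
another-option | value
"""
print(parse_sections(sample))
```

Default values for missing options are not handled here; a real loader would presumably merge the parsed options over a table of defaults.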
Code:
: elements
comment | {com}*/r : {cblockstart}*{cblockend}
This is a list of elements to highlight. The syntax here follows the same syntax as the "options" section, but as you can see I have now used a more complex code for the value. The contents of that value are defined in the next section. Note that this time I used a tab to indent the list. It's handy and nice and optional.
The color of the elements can only be changed in the program, not here. Adding code for color and fonts and such would not be hard, but I think it's not a good idea to have that in the code definition document (CDD). The list should probably be predefined to ensure that the highlighting will work properly in the program.
OK, let's look at some more advanced syntax.
Code:
: definition
a | "b" {c} : {d}{e}
"a" is the name of the defined element, "b" is a string of characters and the letters in brackets are other defined elements. The | separates the name from the definition, the "" enclose strings, brackets enclose other elements and the : denotes an "OR". To define how XML is written, for example, one could write:
Code:
line | {starttag}*{endtag} : {singletag}
starttag | "<" #tagname1 /s {attribute1} ">"
endtag | "</" #tagname1 ">"
singletag | "<" #tagname2 /s {attribute2} " />"
attribute2 | #attrib2 " = \"" * \"
This tells us that a line either has a starttag plus any text plus an endtag or just a single tag. It also tells us how those tags are written and that they contain different keywords (#name) and entities, defined deeper down in the document. Note that spaces can either be contained within strings or written as "/s". There are other such characters, like "/r" for return and "/t" for tab. We can also use "\" as an escape character.
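To make the pieces of a definition value concrete, here is a rough Python tokenizer sketch. It assumes the token kinds named in the post (quoted strings with "\" as escape, {element} references, #group keyword references, the "/s", "/r" and "/t" specials, "*" for any text and ":" for OR). The names `TOKEN` and `tokenize` are hypothetical.

```python
import re

TOKEN = re.compile(r'''
    (?P<string>"(?:\\.|[^"\\])*")   # quoted string, backslash escapes the next character
  | (?P<ref>\{\w+\})                # {element} reference
  | (?P<keyword>\#\w+)              # reference to a keyword group
  | (?P<special>/[srt])             # /s space, /r return, /t tab
  | (?P<any>\*)                     # any text
  | (?P<or>:)                       # alternative
  | (?P<ws>\s+)                     # separating whitespace, dropped below
''', re.VERBOSE)


def tokenize(value):
    """Turn a definition value into a list of (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(value):
        m = TOKEN.match(value, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {value[pos]!r}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize('"<" #tagname1 /s {attribute1} ">"'))
```

This only splits the value into tokens; matching those tokens against a document is a separate step that the post leaves open.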
Now go back and look at the elements. The value for "comment" is taken from entities defined in the "definition" section, making advanced rules possible.
More can be written and enhanced in this area, but this gives you a basic understanding of the syntax. Let's add some keywords now.
Code:
: key tagname1
heading1 heading2 paragraph

: key tagname2
footnote

: key attrib2
text
Every ": key foo" defines a group of keywords with the name "foo". There is no limit on the number of groups that can be created, but each group must have a unique name. The name can be the same as that of an entity, but it would be bad semantics.
Keywords are separated by spaces, and a group can hold as many of them as needed.
Code:
: meta
author | Staffan Lindsgård
language | XML for books
extentions | *.xml;*.xmb;*.xmlb
This is meta-info that can be interesting to know or used in other ways. The meta info should be optional and while a set list of basic entities is good to have, user defined entities should be allowed as well.

This just about concludes the syntax, but there is one more area that we could address: Automatic syntax formatting. By defining rules for how the code should be formatted, the program should be able to auto format the code by a simple press of a button. However, such a section would probably be big and complex and would make the CDD much bigger. It is a nice feature though and I think the option should be there. The loading time is not an issue, provided the syntax formatting info is kept at the end of the document. EE could simply be told to only read the CDD until it hits the ": format" line and then stop. When the auto format function is called, the document is re-read with all info.
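The "stop at the format section" idea can be sketched in a few lines of Python. This assumes the format section always comes last and is introduced by a line beginning with ": format"; `read_until_format` is a hypothetical helper name.

```python
import io


def read_until_format(stream):
    """Return the text of a definition document up to the first ': format' line."""
    lines = []
    for line in stream:
        if line.strip().startswith(": format"):
            break  # everything after this line is formatting info; skip it for now
        lines.append(line)
    return "".join(lines)


document = io.StringIO(": options\ncase-sensitive | yes\n: format\nlots of rules\n")
print(read_until_format(document))
```

When the auto format function is called, the same file would simply be read again in full.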

So, what do you guys think? Interesting? Worthless? Useable?
« Last Edit: January 06, 2007, 06:53:48 pm by Szandor » Logged

"Cleverly disguised as an original signature..."
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #1 on: January 05, 2007, 12:46:52 am »

Welcome to Emerald Editor!

I have to admit, I do like the general look of this approach, though if I'm honest it's just another way of looking at it from before, here.

I still think a custom text format is the way to go on this, and perhaps this is the solution, or some variant at least (this decision will have to go to Soulfish and Derek Parnell for the final say, though)

On a side note, one of the most requested features was the ability to nest languages, e.g. blocks of CSS or JavaScript within HTML documents. Is there any provision for this, or how could this be accomplished?
« Last Edit: January 06, 2007, 03:57:38 pm by Arantor » Logged

"Cleverly disguised as a responsible adult!"
Szandor
Senior Miner
***
Posts: 92



« Reply #2 on: January 05, 2007, 01:10:24 am »

Quote from: Arantor
Welcome to Emerald Editor!

Thank you.

Quote from: Arantor
I have to admit, I do like the general look of this approach, though if I'm honest it's just another way of looking at it from before, here (http://forum.emeraldeditor.com/index.php?topic=62.15).

I tried to read all of it, but for some reason my browser won't let me see the code in the code blocks. Kinda annoying really...

Quote from: Arantor
On a side note, one of the most requested features was the ability to nest languages, e.g. blocks of CSS or JavaScript within HTML documents. Is there any provision for this, or how could this be accomplished?

As long as the CSS and JavaScript are noted as such in the code, there shouldn't be a problem. How about this?

file:css.cdd | "<style type=\"text\/css\">"*"</style>"

If a CSS style block is encountered, an appropriate external CDD is used instead. The same principle can be used for any code and thanks to the powerful system of definitions, it is easy to single out code that should be highlighted differently.
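As a rough illustration of switching definitions inside a style block, the Python sketch below splits an HTML string into regions and labels each with the definition file that would apply. The regular expression stands in for the rule above, and "html.cdd" is a hypothetical name for the host definition; neither is an actual EE mechanism.

```python
import re

# Stand-in for the rule: "<style type=\"text\/css\">" * "</style>"
STYLE_BLOCK = re.compile(r'<style type="text/css">(.*?)</style>', re.DOTALL)


def split_regions(html):
    """Return (definition_file, text) pairs covering the whole input in order."""
    regions = []
    pos = 0
    for m in STYLE_BLOCK.finditer(html):
        if m.start(1) > pos:
            # host markup, including the opening <style ...> tag itself
            regions.append(("html.cdd", html[pos:m.start(1)]))
        regions.append(("css.cdd", m.group(1)))
        pos = m.end(1)
    if pos < len(html):
        regions.append(("html.cdd", html[pos:]))
    return regions


page = '<p>a</p><style type="text/css">p { color: red; }</style><p>b</p>'
for definition, text in split_regions(page):
    print(definition, repr(text))
```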
Logged

"Cleverly disguised as an original signature..."
Szandor
Senior Miner
***
Posts: 92



« Reply #3 on: January 06, 2007, 01:56:23 pm »

I'm currently writing a sample SDD (Syntax Definition Document) to see if it works and what things need changing. I've already made some minor adjustments to make it a bit smaller. I will post two SDDs later, one defining itself and one defining CSS.
Logged

"Cleverly disguised as an original signature..."
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #4 on: January 06, 2007, 04:01:41 pm »

Long story cut short (yay): BTW, I have now fixed the code fragments in the earlier post.

I look forward to seeing the SDDs.
Logged

"Cleverly disguised as a responsible adult!"
John Yeung
Senior Miner
***
Posts: 85


« Reply #5 on: January 06, 2007, 08:54:23 pm »

Reading Szandor's post reminded me of my formal languages course, which was one of my favorites as a computer science undergrad.  I was initially kind of resistant to the idea of creating a new metasyntax notation rather than using an existing one, such as Extended Backus-Naur Form, but I am warming up to it now.  As always, the disadvantage of using something new is that it is unfamiliar and doesn't have tools already built for it; but the advantage is that it is unfamiliar and doesn't have tools already built for it.

I just wanted to say that whatever we use, it would be desirable to be able to define keywords containing spaces.  One classic limitation of syntax highlighters is that you cannot highlight "foo bar" without also highlighting "foo" and "bar" independently.  It would also be nice to allow characters which would otherwise be considered delimiters, so that "foo.bar" is a keyword, but "foo" and "bar" are not.

I don't know how feasible these are.  I mean, it is very easy to make the syntax definition allow for such keywords, but it may or may not be difficult to make the editor actually do it.
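One possible way to get this behavior, sketched in Python with regular expressions: try the longest keywords first and require that a match is not glued to a neighbouring word character. This is only an illustration of the matching rule, not a statement about what Scintilla or any existing editor does; `keyword_pattern` is a hypothetical name.

```python
import re


def keyword_pattern(keywords):
    """Build a pattern that matches whole keywords, which may contain spaces or dots."""
    # Longest first, so a long keyword wins over a shorter one at the same position
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    # The lookarounds keep "foo bar" from matching inside "xfoo bar" or "foo bars"
    return re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)")


pattern = keyword_pattern(["foo bar", "foo.bar"])
print(pattern.findall("foo bar, foo.bar, foo, bar"))
```

Here "foo" and "bar" on their own are left alone because they are not in the keyword list.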

John
Logged
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #6 on: January 06, 2007, 09:01:05 pm »

Having worked on language parsers (of sorts: like this one!) before, I think that it doesn't matter too much about implementation at this stage. Let's just get the format nailed down, because we can work on refining it later.

At the end of it, if it can be defined as a series of rules, and the behaviour is constantly predictable based on those rules (i.e. no crazy rules like 'colour a random 50% in'), it should be programmable.
Logged

"Cleverly disguised as a responsible adult!"
John Yeung
Senior Miner
***
Posts: 85


« Reply #7 on: January 06, 2007, 10:02:26 pm »

Quote from: Arantor
At the end of it, if it can be defined as a series of rules, and the behaviour is constantly predictable based on those rules (i.e. no crazy rules like 'colour a random 50% in'), it should be programmable.

Very true, but I was just wondering how well it dovetails with Scintilla.  I still don't know what exactly is handled by Scintilla and what would be custom-programmed by the Emerald team.  I know from reading forums for other component-based editors (like PSPad) that sometimes certain functionality is part of the component or is constrained by it; to the point that you would have to rebuild the component or create very unwieldy work-arounds to accomplish the desired behavior.

(And I say there is no reason why we couldn't program in a "50% random highlight" mode. ;) )

John
Logged
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #8 on: January 06, 2007, 11:46:12 pm »

My understanding of Scintilla is that at least part of it will have to be rewritten because it, as standard, has the syntax highlighters built into it, so to extend the functionality to handle external files requires some work.

If we have the format nailed down, we can make that work easier - it doesn't matter quite so much how much work is involved, but there's no point in starting if you don't have the destination...

A random 50% highlight mode could be interesting - perhaps for a future April Fool's plugin? ;)
Logged

"Cleverly disguised as a responsible adult!"
John Yeung
Senior Miner
***
Posts: 85


« Reply #9 on: January 07, 2007, 03:20:26 am »

Quote from: Arantor
My understanding of Scintilla is that at least part of it will have to be rewritten because it, as standard, has the syntax highlighters built into it, so to extend the functionality to handle external files requires some work.

Well, sort of.  Scintilla already has a facility for "external lexers".  These are shared libraries which can be used directly by Scintilla as-is.  Further, the lexers already included with Scintilla provide reasonable coverage of the most popular languages (and a number of less popular ones!).

It's important to understand what is currently handled by a Scintilla lexer and what's not, because there is a significant amount you can do to modify behavior without changing or adding any lexers.  I am not an expert on this topic, but I do know that a single .properties file in SciTE does roughly what Crimson's .key, .spc, and extension.* files do (plus a bit of .cmd thrown in for good measure).  I don't think all Scintilla-based editors must use the same configuration files as SciTE, but it gives you at least some idea what can be achieved without resorting to new lexers.

At the other end of the spectrum, there may certainly be things that cannot be accomplished even with a custom lexer, in which case you are left with modifying Scintilla or doing without its lexical support (it seems it can be built without any included lexers--so if it can't actively help, perhaps it can at least get out of the way).

John
Logged
Szandor
Senior Miner
***
Posts: 92



« Reply #10 on: January 07, 2007, 03:42:21 am »

Well, my thought on the language is that it should be powerful enough to allow for just about anything, and it just so happens that my language is! In the case of defining keywords with spaces (or other normal delimiter characters) in them the solution could simply be to use an escape character, right?

Code:
foo bar
foo/ bar
foo// bar

The first line would define the keywords "foo" and "bar".
The second line would define the keyword "foo bar", using an escape character.
The third line would define the keywords "foo/" and "bar".
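The three cases above can be checked with a small Python sketch. It assumes "/" escapes exactly the next character, and that a "/" at the very end of a line is taken literally; `split_keywords` is a hypothetical name.

```python
def split_keywords(line, escape="/"):
    """Split a keyword line on whitespace, honouring a one-character escape."""
    keywords = []
    current = ""
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == escape and i + 1 < len(line):
            current += line[i + 1]  # take the escaped character literally
            i += 2
            continue
        if ch.isspace():
            if current:
                keywords.append(current)
            current = ""
        else:
            current += ch
        i += 1
    if current:
        keywords.append(current)
    return keywords


for line in ("foo bar", "foo/ bar", "foo// bar"):
    print(repr(line), "->", split_keywords(line))
```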
Logged

"Cleverly disguised as an original signature..."
Szandor
Senior Miner
***
Posts: 92



« Reply #11 on: January 07, 2007, 06:28:28 pm »

Ok, I've been writing on the SDD for CSS for a while now and the language has been altered. While the language concept is still based on Backus-Naur Form, some things have been changed, added and simplified. I wanted to construct a language that is powerful enough to

This is how my SDD language works so far:

Code:
: whatever

This will begin a new section. There are seven sections in total (so far...): options, elements, externals, definitions, meta, key and format. Within each section, different entities are written. The syntax of an entity is: "name | value".

Options are what they sound like, options for how EE would read the syntax. So far, this only includes a setting for case sensitivity, but this will change of course.

Elements tell EE what should be highlighted. Some common names should be defined (like "Comment") to ensure that all elements of that type have the same color, no matter the language. Still, custom elements should be definable.

Externals are a list of external SDDs and rules for when they apply. This will make it possible to have multiple syntax highlightings in one document. For example, an HTML SDD could contain links to JavaScript and CSS SDDs.

Definitions make up the rules for how the syntax is structured. This is roughly based on Backus-Naur Form and works like a breakdown of the code into smaller segments, in the end defining each small piece exactly the way it should be. The definitions made here are used in the element section.

Meta information is info that isn't needed by EE but can be interesting to have. Info about the code, author, version and such should be placed here. EE could be made to read a basic set of meta info and show it in the program, but anything can be written into this section.

Keywords are still used, even though they are not really needed to use the highlighting. I could tell EE to highlight everything between < and >, but the additional use of keywords makes it possible to proofread the document. There are other reasons to keep them as well. There can be multiple sections of keywords, creating different groups. The section ": key foo" would define a group of keywords referred to as "foo". Keywords are used in definitions and are written like this: ".group". The dot tells EE it's a keyword.

The format section is an attempt to make auto formatting of the code possible within EE. This section would define how the code can be automatically arranged. For instance, consider this piece of code:

Code:
h1, h2, h3 { margin: 0; margin-top: 1em; color: #107F55; }
h1 { font-size: 2.1em; }
h2 { font-size: 1.6em; }
a { color: #f07a03; margin: 0; text-decoration: none; font-weight: bold; }

While this code is nice and compact, it's not very easy to edit. By defining different formats for the code, EE could rewrite it to look better. Sure, this has its hazards (if certain elements are forgotten, the whole code could end up misaligned) but it's a feature I've been looking for for some time now and as long as you know what you're doing, it's not a problem.

Just like keywords, multiple formats can be defined. By using the sections ": format Compact" and ": format Nice" we define two different formats for the CSS in the example, one to make the CSS compact and faster to load and one to make it easier to edit.
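To give a feel for what two such formats might do, here is a Python sketch that rewrites flat CSS rules either one declaration per line or one rule per line. The layouts are hard-coded for illustration; in the proposal they would come from the ": format" sections, whose syntax is not specified yet. Nested blocks such as @media are not handled, and the names `parse_rules`, `format_nice` and `format_compact` are hypothetical.

```python
import re

RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")  # selector { declarations }


def parse_rules(css):
    """Return a list of (selector, [declaration, ...]) for flat CSS rules."""
    rules = []
    for selector, body in RULE.findall(css):
        declarations = [d.strip() for d in body.split(";") if d.strip()]
        rules.append((selector.strip(), declarations))
    return rules


def format_nice(css):
    """One declaration per line, indented, for easier editing."""
    lines = []
    for selector, declarations in parse_rules(css):
        lines.append(selector + " {")
        lines.extend("    " + d + ";" for d in declarations)
        lines.append("}")
    return "\n".join(lines)


def format_compact(css):
    """One rule per line, as in the example above."""
    return "\n".join(
        selector + " { " + " ".join(d + ";" for d in declarations) + " }"
        for selector, declarations in parse_rules(css)
    )


compact = "h1 { font-size: 2.1em; }\nh2 { font-size: 1.6em; }"
print(format_nice(compact))
```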


Okay, let's look at a very simple example.

Code:
: options
case-sensitive | no

: elements
Comment | ";"*/lb
Function | #name
Value | #value

: definitions
Code | /nl #name #value
name | * [/sp:/t]
value | * /lb

: meta
code | LiteStep RC
author | "Staffan Lindsgård"
version | 1.0
extentions | *.rc

This file defines three different colors for an RC-file, the type of file used to configure LiteStep. Its basic syntax is really simple and though it can be made much more complex - for example by adding different colors for bang commands and core functions - it will serve as a simple example.

First we have the options. RC-files are not case sensitive so we note this. After this come three elements: "Comment", "Function" and "Value". These three elements will show up in EE where you can change how they look. Since these three are common elements, they already have predefined settings and therefore you probably won't need to change anything. (If we add "foo | bar" to the elements, "foo" should be assigned a base formatting and a random color that is not already in use.)

A comment is defined as beginning with a ";", having any text after it and ending with a line break. The values "#name" and "#value" refer to definitions, as indicated by the # in front of them. A . would instead have indicated the use of a keyword, making it possible for definitions and keyword groups to share the same name without conflict.

In the definitions section we find three different definitions. The first defines the code in full, the second defines a "name" and the third defines a "value". The code begins on a new line (note the difference between /LB and /NL: the former is placed at the very end of a line and the latter at the very beginning), has a name and then a value.

A name is defined as first having any text and then either a space or a tab. The [] are used for grouping and : is equal to OR. Now, since LiteStep uses a multitude of modules and the names are taken from those modules, having a complete set of keywords would be impossible. Sure, we could define the core ones but to make life simple we'll just highlight any word. The value is defined as being anything, but always ending with a line break.
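Read line by line, these rules amount to something like the Python sketch below. It assumes a comment starts at the first ";" on a line (so values containing ";" are not supported), and the sample setting name is made up. `classify_line` is a hypothetical helper, not how EE would necessarily implement it.

```python
def classify_line(line):
    """Return (element, text) pairs for one line of an RC file."""
    code, sep, comment = line.partition(";")
    spans = []
    parts = code.split(None, 1)  # the name ends at the first space or tab
    if parts:
        spans.append(("Function", parts[0]))
    if len(parts) > 1 and parts[1].strip():
        spans.append(("Value", parts[1].rstrip()))
    if sep:
        spans.append(("Comment", sep + comment))  # from ";" to the line break
    return spans


print(classify_line("SomeSetting some value ; made-up example"))
print(classify_line("; just a comment"))
```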

Cool, eh? But wait, I haven't shown you everything yet. What about that one feature everybody wants? You know what I mean; nested syntax highlighting. Actually, that's very simple and the whole reason for the ": external" section.

Let's say we want to highlight HTML and internal CSS in a PHP document. This is quite common I believe and not a major problem, as long as you define the CSS and HTML properly. First we begin with the external section:

Code:
: external
<html.sdd> | #html
<css.sdd> | #css

OK, so we have now told EE that the external file "html.sdd" should be used for any code defined in "#html" and "css.sdd" for any code defined in "#css". All we have to do now is to define those two and EE should highlight them properly. In the case of HTML you probably want to define HTML as not being inside <? and ?>. Let's do that.

Code:
#html | * ( ! ["<?" /x "?>"] )

Hey, new thingies! The ! is a simple NOT, the stuff inside parentheses is a WHERE statement and the /x refers to itself. So this means that #html is defined as any text that is not between <? and ?>. Simple, huh? In fact, an even simpler version is possible.

Code:
#html | ! ["<?" * "?>"]

But I am opposed to negative defining so I vote that this should not be possible, or at least be considered bad practice.
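In effect, either definition says "everything outside <? ... ?> is HTML". A short Python sketch of that reading, using a regular expression as a stand-in for the rule (an assumption for illustration; `html_regions` is a hypothetical name):

```python
import re

PHP_BLOCK = re.compile(r"<\?.*?\?>", re.DOTALL)  # anything between <? and ?>


def html_regions(source):
    """Return the pieces of text that are not inside <? ... ?>."""
    regions = []
    pos = 0
    for m in PHP_BLOCK.finditer(source):
        if m.start() > pos:
            regions.append(source[pos:m.start()])
        pos = m.end()
    if pos < len(source):
        regions.append(source[pos:])
    return regions


print(html_regions("<p><?php echo 1; ?></p>"))
```

An unterminated "<?" is treated as plain HTML here; a real highlighter would need to decide what to do in that case.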

Okay, more rantings later. I have work to do. Give me feedback! I demand all your feedback, give me all your feedback or prepare to meet your horrible DOOM!
« Last Edit: January 07, 2007, 06:37:13 pm by Szandor » Logged

"Cleverly disguised as an original signature..."
Matthew1344
Gem Cutter
****
Posts: 103


« Reply #12 on: January 08, 2007, 03:50:09 pm »

I think I missed the discussion where XML got the boot.  From the beginning, I (in my limited knowledge of such things) haven't been able to imagine why anything other than XML would be used for syntax files, but I'm sure that discussion must have taken place.  Could somebody post a link for me?
Logged
Szandor
Senior Miner
***
Posts: 92



« Reply #13 on: January 08, 2007, 05:46:33 pm »

This is a long post. Please just read the summary at the bottom if you don't want to read all of it.

Quote from: Matthew1344
I think I missed the discussion where XML got the boot.  From the beginning, I (in my limited knowledge of such things) haven't been able to imagine why anything other than XML would be used for syntax files, but I'm sure that discussion must have taken place.  Could somebody post a link for me?

I originally posted here because I had an idea for a syntax. This is by no means the absolute answer to how the syntax should be handled, even though I hope it is since my syntax is great. I've only read parts of the original discussion, but I can answer why XML is a bad option anyway.

XML is a markup language, making it ideal for writing different kinds of documents where marking is needed. The problem with XML, however, is that it's not very powerful for doing anything else. In fact, it is downright cumbersome at times. Let's take a look at how an XML-file would look compared to the custom made SDD-metasyntax I propose we use.

Code:
<keywordgroup name="moving">
<keyword>up</keyword>
<keyword>down</keyword>
<keyword>left</keyword>
<keyword>right</keyword>
</keywordgroup>

Here we have a group of keywords defined, named "moving." In order to easily read it we need the same amount of lines as keywords, plus two for the opening and closing tags. The syntax has the benefit that we can easily write keywords containing spaces, and XML is an established format.

This is how the same code would look using my syntax:

Code:
: key moving
up down left right

A whole lot easier to read and a great deal smaller. Defining a keyword with a space in it would require the use of an escape character, but this isn't really an issue unless you happen to hate escape characters. The byte count of the first example is 148, compared to 33 in my example. This is (roughly) a ratio of 1:4.5. If we take the HTML KEY-file of CE and convert it into SDD-syntax we would end up with a size of 2269 bytes (provided we use all keywords, including duplicates, of the file and create only one group). If we assume that the ratio is the same, the XML syntax would take up about 10211 bytes. That's quite a difference.
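The arithmetic behind that estimate, written out in Python using only the byte counts quoted in the post (148, 33 and 2269). The function name `estimate_xml_bytes` is hypothetical.

```python
xml_sample, sdd_sample = 148, 33   # byte counts of the two keyword examples
ratio = xml_sample / sdd_sample    # about 4.48, quoted as roughly 4.5


def estimate_xml_bytes(sdd_bytes, ratio):
    """Scale an SDD file size by the XML-to-SDD size ratio."""
    return sdd_bytes * ratio


rounded = estimate_xml_bytes(2269, round(ratio, 1))  # 2269 * 4.5
unrounded = estimate_xml_bytes(2269, ratio)          # 2269 * 148 / 33
print(rounded, round(unrounded))
```

The figure of about 10211 bytes is 2269 * 4.5 = 10210.5 rounded up. Using the unrounded ratio gives a slightly smaller estimate, which does not change the conclusion.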

The same theory is naturally applied to all other sections, but I suspect the ratio would be much larger. When defining one single line of meta info the ratio went up to 1:7. I used the following pieces of code:

Code:
: meta
foo | bar

Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<syntax>
<syntaxinfo>
<foo>bar</foo>
</syntaxinfo>
</syntax>

Also since XML is a pure markup language it does not possess the power of SDD. SDD is not limited by the use of keywords, it's enhanced by it. In an XML based system, rules cannot be defined. XML could tell EE that this character is a delimiter and that this character begins a group, but actually defining rules would take an enormous amount of code. How would SDD and XML define a simple element named "tag" being either the word "foo" or "bar" within < and >? Let's take a look.

Code:
; SDD code example.

tag | "<" ["foo":"bar"] ">"

Code:
<!-- XML code example. -->

<element name="tag">
<part type="char">&lt;</part>
<part type="logic" logic="OR">
<choice>foo</choice>
<choice>bar</choice>
</part>
<part type="char">&gt;</part>
</element>

The SDD code is a single line of code, easy to read, fast to write and simple to parse. It takes up 27 bytes of space.
The XML code has eight lines, is more cumbersome to read, takes longer to write and will take longer for EE to parse. It takes up 187 bytes of space, giving us a ratio of almost 1:7 compared to SDD. Once again, XML proves to be bloated.
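As a sanity check that the one-line rule is easy to process, the Python sketch below translates it into a regular expression. It handles only the pieces used in this example (quoted strings, [] for grouping and : for OR) and ignores escapes; `rule_to_regex` is a hypothetical name and this is not a proposal for how EE should implement matching.

```python
import re


def rule_to_regex(value):
    """Translate a small subset of the definition syntax into a compiled regex."""
    out = []
    for token in re.findall(r'"[^"]*"|[\[\]:]', value):
        if token == "[":
            out.append("(?:")   # open a group
        elif token == "]":
            out.append(")")     # close the group
        elif token == ":":
            out.append("|")     # OR
        else:
            out.append(re.escape(token[1:-1]))  # literal string without its quotes
    return re.compile("".join(out))


tag = rule_to_regex('"<" ["foo":"bar"] ">"')
print(bool(tag.fullmatch("<foo>")), bool(tag.fullmatch("<baz>")))
```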

Now, when implementing these codes in EE, you will need EE to understand the logic behind them as well as what different characters and keywords mean. The same size ratios will apply here too, naturally. XML will therefore not only bloat itself, but EE as well.

Summary:
XML is cumbersome, bloated, hard to read and not very powerful. Also, even though XML is a widely used format and lots of people know how to write it, you still have to learn what tags to use, where to put them and how they interact.

SDD is small, easy to read and very powerful. It is based on simple logic and even though you have to learn a new language to use it, it is a simple language that is easy to grasp.

Another good way of summarizing would be this:
A table is a great piece of furniture with many uses, but I still prefer to sleep in a bed since it's custom built for the job.
« Last Edit: January 08, 2007, 05:48:54 pm by Szandor » Logged

"Cleverly disguised as an original signature..."
John Yeung
Senior Miner
***
Posts: 85


« Reply #14 on: January 09, 2007, 04:28:53 am »

Quote from: Matthew1344
I think I missed the discussion where XML got the boot.  From the beginning, I (in my limited knowledge of such things) haven't been able to imagine why anything other than XML would be used for syntax files, but I'm sure that discussion must have taken place.  Could somebody post a link for me?

I don't think XML was ever given the boot by the Emerald Team.  However, there was definitely considerable debate as to whether it would be best.  The two most pertinent threads (that I am aware of--it's a little hard to keep track) are "Syntax format" and "Which syntax format should we use?"

The debate seems to center around XML's bulkiness.  However, there is another issue, which is just how user-definable we want the syntax to be.  The early posts by Szandor and Arantor in this thread (the one you're reading now) suggested to me that we should be striving to allow the Emerald end-user to specify an almost arbitrary grammar, not just choose settings within a pre-established framework.  I'm not sure I'm adequately expressing how general, how powerful, how abstract, and how ambitious this would be, though daryquene seems to do a decent job in reply #7 of the second thread linked to above.  I have not heard of any editor that has anything like this, at least not one which could be considered "lightweight" even by today's bloated standards.

Most other posts I have seen involve specifying a framework of settings (I don't know XML but I am imagining this would be roughly what a schema is), which would be good enough for the vast majority of users if it were sufficiently well thought out.  Crimson's settings are already good enough for quite a large portion of users.  Adding keywords as block delimiters (like "begin" and "end"), arbitrary characters in keywords (like spaces, periods, and hyphens), and the choice of nesting or nonnesting comments would close most of the gap already; and these are all "setting"-type specifications, rather than "grammar"-type.

If we want to allow users to specify grammar, it would make the most sense to use a metasyntax designed for this, such as Extended BNF, Wirth Syntax Notation, or a parsing expression grammar; rather than XML.  It seems Szandor is trying to go in that direction, but I am still not convinced it is practical to implement.

John
« Last Edit: January 09, 2007, 05:04:46 am by John Yeung » Logged