Emerald Editor Discussion
Pages: [1] 2 3 4
Author Topic: Syntax format  (Read 37388 times)
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« on: May 12, 2006, 06:16:36 pm »

Before we get down to implementing syntax files, it might be worth setting up the syntaxes now - in the same way a protocol would be defined before implementation.

I have heard it requested that we look to either support, or add conversion systems, for existing CE syntaxes. Personally I think the latter is probably better, since the languages may/will have moved on themselves since CE's syntax files were created (e.g. PHP)

However, I am also reluctant to use XML. I know XML is supposed to be this wonderful standardised layout/data format thing, but I'm not convinced. By all means, let's have an easy to use format, but the issue with using XML is that we then need to add an XML parser, which usually means libxml. I think writing a simpler format means we can avoid bloating EE with an XML handler.

Then again, if the consensus is that we go with XML, then so be it, but IMO it isn't the be-all-and-end-all superformat it is cracked up to be.

I also reckon it should be one file per syntax (not CE's two), perhaps in an INI-style format:

Code:
[Syntax]
Name=SomeFileType
Author=E. Editor
LastModified=12 May 2006

[SpecialCharacters]
$EscapeCharacter = /
$QuoteMark1 = "
$QuoteMark2 = '

[SpecialWords]
SpecialWord
MagicWord
VerySpecialWord
KeyWord

[EndSyntax]
Just a thought...
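
As a rough illustration of how little code a format like this needs, here is a minimal Python sketch of a reader for the layout above. The function name `parse_syntax` and the `settings`/`words` split are assumptions made for the example, not anything EE defines. Lines containing `=` are treated as key/value pairs and everything else as a bare word, which would need revisiting for keywords that themselves contain `=`.

```python
SAMPLE = """
[Syntax]
Name=SomeFileType
Author=E. Editor
LastModified=12 May 2006

[SpecialCharacters]
$EscapeCharacter = /
$QuoteMark1 = "
$QuoteMark2 = '

[SpecialWords]
SpecialWord
MagicWord
VerySpecialWord
KeyWord

[EndSyntax]
"""


def parse_syntax(text):
    """Parse the INI-style layout into {section: {"settings": {}, "words": []}}.

    Hypothetical helper: [EndSyntax] is treated as a terminator, not a section.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no meaning
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            if name == "EndSyntax":
                break
            current = {"settings": {}, "words": []}
            sections[name] = current
            continue
        if current is None:
            raise ValueError(f"entry outside any section: {line!r}")
        if "=" in line:
            # key/value pair; whitespace around "=" is not significant
            key, value = line.split("=", 1)
            current["settings"][key.strip()] = value.strip()
        else:
            # bare word, e.g. a keyword
            current["words"].append(line)
    return sections


parsed = parse_syntax(SAMPLE)
print(parsed["Syntax"]["settings"]["Name"])
print(parsed["SpecialWords"]["words"])
```

Keeping one entry per line means the reader is a single loop with no external library, which is the trade-off being argued for here.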
Logged

"Cleverly disguised as a responsible adult!"
dsvick
Beta Testers
Senior Miner
***
Posts: 52



WWW
« Reply #1 on: May 12, 2006, 06:50:46 pm »

I think XML is great at what it is supposed to do, but I don't think that syntax files are what it is supposed to do. Since they are a proprietary sort of file, and not going to be overly complex, we won't need it. Not only would we have to add the parser, but the files themselves would end up bigger, adding bloat and slowing things down even more.

I like the ini format. It lets it be all in one file and should make changing/updating/creating them easier. We'll need to hash out the details of exactly what to put into them later but that is a good start.
Logged

Dave
Zhrakkan
Official Mascot!
Beta Testers
Gem Cutter
***
Posts: 177



WWW
« Reply #2 on: May 12, 2006, 06:55:04 pm »

I agree... I don't think we need to go the way of XML (just my uneducated thought)
Logged

News Manager and Unofficial Mascot
Join the Emerald Editor Project - Message Me!
Emerald Editor - "A Jewel of an Editor"
-----by the way, that name is pronounced "Za-Rack-In"
daemon
Developers
Gem Cutter
***
Posts: 107


WWW
« Reply #3 on: May 13, 2006, 11:56:43 pm »

I'd go with XML, personally, because it's a known format and then a parser wouldn't have to be written. What could be done is that the XML files could be parsed when a new language is installed and then the parser information stored as a serialized/binary class to speed things up.
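
The parse-once-then-cache idea does not depend on the source format, so it can be sketched independently of XML. This is a minimal Python sketch, assuming the cache sits next to the source file and is rebuilt whenever the source is newer; `load_syntax`, the `.cache` suffix and the `parse` callback are hypothetical names for the example, and `pickle` stands in for whatever binary serialisation EE would really use.

```python
import os
import pickle


def load_syntax(source_path, parse, cache_suffix=".cache"):
    """Return the parsed syntax, reusing a binary cache when it is fresh.

    `parse` is any callable that turns the file's text into a Python object.
    """
    cache_path = source_path + cache_suffix
    cache_is_fresh = (
        os.path.exists(cache_path)
        and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)
    )
    if cache_is_fresh:
        # Fast path: skip parsing and load the serialised result.
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)

    # Slow path: parse the text once, then store the result for next time.
    with open(source_path, "r", encoding="utf-8") as fh:
        parsed = parse(fh.read())
    with open(cache_path, "wb") as fh:
        pickle.dump(parsed, fh)
    return parsed
```

The first call for a given file pays the parsing cost; later calls only deserialise, until the source file is modified again.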
Logged
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #4 on: May 14, 2006, 10:39:56 am »

The issue with using XML is that, somewhere along the line, an XML parser has to be included, the obvious choice being libxml. Now, libxml is quite useful, and provides all the methods you can reasonably expect a parsing library to provide - but the downside is that it adds a megabyte or more to the code.

We can't expect Windows users to have libxml installed so we will end up bundling it either with EE, or statically linked into EE itself. Either way, it will bloat up the source, and to be honest, I'm not sure XML is the right base format for syntax files. (CE didn't use XML either, but to be fair, XML was invented after CE)

If we take the idea of parsing the XML and creating a serialised/binary class, we're already halfway there by creating an INI-style format, which predates XML by a number of years (Windows 3.0 was using it at the start of the 1990s and it may have been applicable to even earlier versions, let alone what Unix was doing with .rc files)

I do take the point that XML is a known format, especially in recent times, but I think that the format itself isn't quite what we're after - if you have a number of items that vary in depth (e.g. nesting levels), or a number of different elements that can share the same level, then it's fine. But if you have half a dozen elements total, with potentially hundreds of entries per element, you'll end up with something like: (rewriting my original example)

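Code:
<Syntax>
 <Name>SomeFileType</Name>
 <Author>E. Editor</Author>
 <LastModified>14 May 2006</LastModified>
</Syntax>

<SpecialCharacters>
 <EscapeCharacter>/</EscapeCharacter>
 <QuoteMark1>"</QuoteMark1>
 <QuoteMark2>'</QuoteMark2>
</SpecialCharacters>

<SpecialWords>
 <SpecialWord>SpecialWord</SpecialWord>
 <SpecialWord>MagicWord</SpecialWord>
 <SpecialWord>VerySpecialWord</SpecialWord>
 <SpecialWord>KeyWord</SpecialWord>
</SpecialWords>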
For the purposes of brevity, I have ignored the XML header or global-container style tags.

While more readable, it is significantly larger, and the difference will be clearer with larger syntaxes - e.g. PHP's syntax has at least a couple of thousand special words, each of which would need its matching tags plus any whitespace.

As a comparison, I have taken the above syntaxes and compared them - ignoring any whitespace such as non-essential spaces (the space in E. Editor, for example, remains), end-of-line characters and blank lines.

The original: 195 characters
The XML equivalent: 414 characters
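
Both figures can be reproduced with a short Python sketch, assuming "non-essential whitespace" means indentation, spaces around `=`, line endings and blank lines, while spaces inside values are kept. The XML tag names used here are an assumption that simply mirrors the INI section and key names; with that naming the count comes out at the quoted 414.

```python
INI_SAMPLE = """
[Syntax]
Name=SomeFileType
Author=E. Editor
LastModified=12 May 2006

[SpecialCharacters]
$EscapeCharacter = /
$QuoteMark1 = "
$QuoteMark2 = '

[SpecialWords]
SpecialWord
MagicWord
VerySpecialWord
KeyWord

[EndSyntax]
"""

# Tag names are assumed: they mirror the INI names above.
XML_SAMPLE = """
<Syntax>
 <Name>SomeFileType</Name>
 <Author>E. Editor</Author>
 <LastModified>14 May 2006</LastModified>
</Syntax>
<SpecialCharacters>
 <EscapeCharacter>/</EscapeCharacter>
 <QuoteMark1>"</QuoteMark1>
 <QuoteMark2>'</QuoteMark2>
</SpecialCharacters>
<SpecialWords>
 <SpecialWord>SpecialWord</SpecialWord>
 <SpecialWord>MagicWord</SpecialWord>
 <SpecialWord>VerySpecialWord</SpecialWord>
 <SpecialWord>KeyWord</SpecialWord>
</SpecialWords>
"""


def essential_length(text):
    """Count characters, ignoring blank lines, line endings,
    indentation and spaces around '='."""
    total = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            line = key.strip() + "=" + value.strip()
        total += len(line)
    return total


print(essential_length(INI_SAMPLE))  # 195
print(essential_length(XML_SAMPLE))  # 414
```

Per keyword, the INI form costs only the word itself, whereas the XML form adds a fixed tag overhead to every entry, which is where the gap comes from.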

You could probably shorten it down a little by being careful, and by extending some of them to be character groups (a la regular expressions), so you could in theory generate:

/#

instead of the above syntax:

/
#

But that's just a matter of semantics and coding.

To be honest, I think XML is one of those things that is great, but just not for the requirements we have.
Logged

"Cleverly disguised as a responsible adult!"
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #5 on: May 14, 2006, 03:08:46 pm »

There is what looks like an XML handling class in wxWidgets, which we are using to build EE, so there is no reason why we couldn't use XML in theory. But the above does kind of cause issues.

What does everyone else think?
Logged

"Cleverly disguised as a responsible adult!"
Guest
Guest
« Reply #6 on: May 14, 2006, 07:57:11 pm »

Go with XML, XML for syntax files, XML for settings/configs
Logged
hoeltgman
Miner
**
Posts: 17


« Reply #7 on: May 14, 2006, 08:33:57 pm »

Even though I'm not much experienced with these things... if I had to decide, I would go for the ini format...
Logged
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #8 on: May 14, 2006, 09:27:48 pm »

Well, I have no semantic problems with using an XML file for the config file, but I still reckon it's extra load to have an XML parser in EE, and I reckon it'll end up being extra load we don't need, but maybe that's just me.
Logged

"Cleverly disguised as a responsible adult!"
JoBe
Guest
« Reply #9 on: May 15, 2006, 09:45:46 am »

Is it possible to use the same syntax files that Crimson Editor uses? That way we would not need to do them all again.
Logged
Emerald Fox
Global Moderator
Miner
*****
Posts: 23



« Reply #10 on: May 15, 2006, 10:07:24 am »

Arantor, I agree with all of your points...

Quote from: Arantor
Well, I have no semantic problems with using an XML file for the config file, but I still reckon it's extra load to have an XML parser in EE, and I reckon it'll end up being extra load we don't need, but maybe that's just me.
I found that once I got my head around the CE syntax I was able to tweak one or two of the syntax files to my liking.

The only problem I had was trying to create a syntax file that could handle multiple languages in a single source file, (i.e. an ASP file containing HTML and JavaScript). Due to the languages having identical reserved words I couldn't get it to work perfectly - unless of course I was doing it wrong!

So, maybe multiple languages in one source file would be worth considering when drawing up the EE syntax.
Logged

No signature currently stored in profile.
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #11 on: May 15, 2006, 12:15:48 pm »

To be honest, I do not expect to use the same parsing methodologies in EE as CE did, and there are limitations in the CE format - specifically embedding of languages (as Emerald Fox describes).

I came to the EE table fully expecting to create a new description format for syntaxes, knowing the work required to translate existing syntaxes. It is a controversial topic - the entirety of the syntax format is, really - but whatever we do to make it scalable and more effective (and futureproof, to a degree), I think breaking away from the exact format of CE's syntaxes might be a good idea.

Although there are now thousands of syntax files, it should be made easy to write/rewrite them, or even write a conversion tool of some kind, but I guess that really depends on the exact format of the syntax language.

At this stage, it is largely irrelevant what we plan to do, because we can't do too much until part of the Lexer backend of Scintilla has been rewritten (Scintilla is what we plan to use as the main editing component and while very effective, all of the syntaxes are built in, and this will need to be rewritten before we can consider implementing a syntax system)

Looking through the comments, CE's format itself is probably obsolete now, but a descendant of it may not be.

I'll have a play over the next couple of days and see if I can't come up with something that effectively describes languages a bit better than CE's without adding masses of coding.
Logged

"Cleverly disguised as a responsible adult!"
awmyhr
Senior Miner
***
Posts: 95

Maintainer of Obscure and Unused Ports


WWW
« Reply #12 on: May 18, 2006, 07:42:48 pm »

What about the possibility of adopting an existing syntax format? (such as the one for nedit or vim or ? ?)  Preferably from a current, relatively widely used program, giving an instant library of syntax files.

Main question being, how necessary is it to reinvent the wheel?
Logged

-->>  This Space 4 Rent  <<--
Zhrakkan
Official Mascot!
Beta Testers
Gem Cutter
***
Posts: 177



WWW
« Reply #13 on: May 18, 2006, 07:54:20 pm »

Quote from: awmyhr
What about the possibility of adopting an existing syntax format? (such as the one for nedit or vim or ? ?)  Preferably from a current, relatively widely used program, giving an instant library of syntax files.

Main question being, how necessary is it to reinvent the wheel?
Good question...

Two issues...
1)  Are we able to use someone else's code... we obviously need approval here...

2)  Are we going to actually try to implement something not available elsewhere, thus needing a custom syntax parser? For instance, the possibility of loading a few syntax files at once...
Logged

News Manager and Unofficial Mascot
Join the Emerald Editor Project - Message Me!
Emerald Editor - "A Jewel of an Editor"
-----by the way, that name is pronounced "Za-Rack-In"
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #14 on: May 18, 2006, 08:36:09 pm »

To answer that:
1) It depends, mostly on the license of the original software. Vim itself has a license which does not appear to conflict with that of EE at the moment (i.e. GNU LGPL)

2) If we are sticking simply to one syntax per file (a la CE, etc) then we could reuse existing material from Vim or something similar.

I've had a look at Vim's syntax and do not believe it is particularly friendly to add new languages (which is something of a requirement). CE's syntax format is simple but as has been discussed can be a bit limited.

It may not be a case of reinventing the wheel, more redesigning the tyres that fit on it to make it run more like what we're after.
Logged

"Cleverly disguised as a responsible adult!"