Emerald Editor Discussion
March 27, 2017, 05:38:18 am *
Author Topic: Syntax format  (Read 37100 times)
Matthew1344
Gem Cutter
****
Posts: 103


« Reply #15 on: May 18, 2006, 09:21:38 pm »

In my opinion, it's only a matter of time before *somebody* creates an XML standard for language syntax.  I don't understand how XML isn't appropriate for syntax files... it seems to me that this is a perfect example of its intended usage.

Also, what we've been calling "Core" languages really aren't.  I mean, EE would work fine without any syntax files at all.  If someone is going to put it on a flash drive, they probably don't need the syntax file for COBOL.  The size of XML files shouldn't be a problem since a user only needs to install the ones he needs.  For that matter, one could argue that NO syntax files should be included in the program download because there's no reason to assume that all users will need any particular set of languages.

One more thing... I would think that adding a new language would be as simple as putting the syntax file in the correct directory?
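
To illustrate the "drop the file in a directory" idea, here is a minimal Python sketch of how an editor might discover syntax files at startup. The thread does not say how EE would do this: the `<syntax name="...">` root element, the `discover_syntaxes` helper and the temporary directory standing in for the syntax folder are all assumptions made for the example.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def discover_syntaxes(directory: Path) -> dict:
    """Map language name -> file path for every parseable *.xml in directory."""
    found = {}
    for path in sorted(directory.glob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # skip malformed files rather than failing at startup
        # Hypothetical convention: the root element carries the language name;
        # fall back to the file name when the attribute is missing.
        found[root.get("name", path.stem)] = path
    return found


# Demo with a throwaway directory standing in for the real syntax folder.
with tempfile.TemporaryDirectory() as tmp:
    syntax_dir = Path(tmp)
    (syntax_dir / "ee_php4.xml").write_text('<syntax name="PHP 4" />', encoding="utf-8")
    (syntax_dir / "broken.xml").write_text("<syntax", encoding="utf-8")
    languages = sorted(discover_syntaxes(syntax_dir))
    print(languages)
```

With a scheme like this, installing a language really is just copying a file, and a broken file costs the user only that one language.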
Logged
Zhrakkan
Official Mascot!
Beta Testers
Gem Cutter
***
Posts: 177



WWW
« Reply #16 on: May 18, 2006, 09:27:12 pm »

Matthew, you have great points...
That is something we need to weigh.  Do we include any syntaxes in the default install...

The Core languages or Primaries as I put it, are simply those we need to make sure are done, then the rest will be put together by supporters of the project after the fact.
Logged

News Manager and Unofficial Mascot
Join the Emerald Editor Project - Message Me!
Emerald Editor - "A Jewel of an Editor"
-----by the way, that name is pronounced "Za-Rack-In"
awmyhr
Senior Miner
***
Posts: 95

Maintainer of Obscure and Unused Ports


WWW
« Reply #17 on: May 18, 2006, 09:46:37 pm »

My thought on this:

The distinction of "Core"/"Primary"/"Secondary" syntax files should be purely for human consumption.  EE should not make any such distinction.  "Core" would refer to syntax files included with the install (i.e., the most common), "Primary" could be those created and maintained by the development team, and "Secondary" could refer to user submittals, but again, EE itself should not make any distinction between these.  (Of course, I think I'm just re-phrasing what others are saying here?)

XML would seem to be a "modern" choice, and I'd be very surprised if there really are not any schemas out there at least close to what we'd need.  After all, there are schemas to describe AD&D character sheets, for crying out loud.  As far as the 'bulk' this adds, is it really, relatively speaking, that much?  Does the 'target audience' include those for whom every byte/cpu cycle counts? (for example, are we looking to eventually include a port for cell phones?)  It is my opinion that in the modern context "light-weight" still includes room for an XML parser.
Logged

-->>  This Space 4 Rent  <<--
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #18 on: May 18, 2006, 11:15:25 pm »

Some very good points, everyone.

There are XML formats out there for language syntax. Well, there is one schema, the one used by Notepad++. Having looked through it, there are some issues that would arise if EE were to use that schema, notably the fact that the colour highlighting options (i.e. the colours that should be used) are specified in it, while I think that words that fall into the same category should be coloured consistently across the different languages, e.g. all commands should be one colour, all variables should be another - irrespective of syntax.

On the subject of install packages, I was thinking about doing like a minimalist install which covers the core requirements to run only, then a medium package to cover the basic extras (like our "core" functionality), then a power package which covers everything.

It is true that "Core", "Primary" etc are for human consumption - EE will not care about the different syntaxes, it's just sets of rules for EE to parse.

As for "bulk", my little example earlier suggested that an XML format would be triple the size of an equivalent more-native format, which for individual files won't be a problem, but if you have a hundred (like CE) or more it soon adds to the size of the package, although I do acknowledge that in the modern day and age, a few hundred extra K probably won't be a problem.

On the flip side, though, one of the major selling points for CE was that it was small - the installed package is only something like 1.7MB including 100 syntaxes. While the argument for getting it on a floppy disk is no longer as valid as it was at the end of the 1990s (as it was for CE), the argument for fitting it on a USB key is important. And before we get into the "they come in sizes of many megabytes" argument, taking up between 5 and 10% of a USB key with a text editor (albeit a very powerful one) is not really a compelling thought.

CPU counts aren't the be-all and end-all of things, but speed is another compelling argument in development. Looking at the scheme of things, I don't reckon adding an XML parser would add that much slowdown in things, but it might be a big chunk of code. The expat parser is actually available in the current builds we've done of EE since it features in wxWidgets, so it's not as if it'll be such a huge thing to implement.

To be honest, when I think of the modern context and apply the term "light-weight", I think of what would be appropriate for users from 5 years ago. What was acceptably fast then would be very light-weight today, hence all the references and comparisons with CE, which is the main model we have for basis.
Logged

"Cleverly disguised as a responsible adult!"
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #19 on: May 21, 2006, 11:15:14 pm »

I've had a few days to reflect on this, since this is probably one of the biggest not-absolutely-critical decisions to make with regards to EE (things like using wxWidgets are pretty mission-critical).

It seems to me that the best way to work out how to describe something in a language is to actually try it. E.g. actually try describing a format using XML, creating a valid structure and seeing what happens.

As has been discussed elsewhere I'm off on holiday tomorrow morning, but before I go, I'll write up my structure for PHP 4 (and I'll keep it to PHP 4 only) to describe syntax. I'm not going to consider the likes of autocomplete, or tooltips showing the function prototypes or anything.

I'll post it here when I'm done, and while I'm away, please have a look and make suggestions as appropriate. I expect it could be tidied up drastically!

That all said, I will also consider how we could go about adding in function declarations and prototypes into an XML schema.
Logged

"Cleverly disguised as a responsible adult!"
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #20 on: May 22, 2006, 12:30:17 am »

OK, here is my proposal for a format. It is XML, and hopefully should be fairly self-explanatory. Please note my XML is not great so some things may have been done in error, or are otherwise wrong. Please let me know what you think...

It is not a complete PHP syntax, it simply includes enough to cover off what I'm getting at with everything I've said so far. I also haven't trimmed it down very much, which I'm sure could be done. (I just don't want it to get as big as Bluefish's which I looked at for an idea of what they did, and which is over a meg in size, which I think is a bit big...)

http://www.emeraldeditor.com/ee_php4.xml
Logged

"Cleverly disguised as a responsible adult!"
daemon
Developers
Gem Cutter
***
Posts: 107


WWW
« Reply #21 on: May 22, 2006, 03:07:29 am »

I like it :). Can you check it into SVN as a draft so that we can have a history of the changes that people make to the draft? I can see a few things that I'd want to change (and it'd be easier to do it myself than to try to describe it, obviously).
Logged
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #22 on: May 22, 2006, 09:19:49 am »

OK, it's now in the SVN under trunk/syntaxes/ee_php4.xml

As I've said, it's not complete, it is little more than a test-drive of actually creating a syntax. It won't be ideal, probably far from it, but it is something to play with.
Logged

"Cleverly disguised as a responsible adult!"
Wraithan
Prospector
*
Posts: 9


« Reply #23 on: May 23, 2006, 11:15:49 pm »

I am new to these forums but not new to XML, and know a fair deal about modern programming languages. I see a lot of places that should be redone in the XML language specs. For example:

Code:
<keyword>
<name>die</name>
</keyword>
to:
Code:
<keyword name="die" />
That saves space, and is more intuitive. While it may not seem like a lot, consider that some of us keep MANY syntax files around for CE and would like to be able to convert them to EE (Perl would be your best friend for this), and some of these languages have hundreds of words that should be highlighted. In the long run things like this will make it more streamlined. If I have time in the next week or so, I will post a syntax file that has all the things needed and is smaller.
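
As a rough illustration of the saving, the following Python sketch rewrites element-style keywords into the attribute form and compares the character counts. It is only a sketch: the wrapping `<keywords>` element, the three sample words and the `collapse_keywords` helper are assumptions for the example and are not taken from ee_php4.xml.

```python
import xml.etree.ElementTree as ET

# Element-style keyword list in the shape quoted above.
# The <keywords> wrapper and the word list are illustrative assumptions.
VERBOSE = """<keywords>
<keyword>
<name>die</name>
</keyword>
<keyword>
<name>echo</name>
</keyword>
<keyword>
<name>include</name>
</keyword>
</keywords>"""


def collapse_keywords(xml_text: str) -> str:
    """Rewrite <keyword><name>x</name></keyword> as <keyword name="x" />."""
    root = ET.fromstring(xml_text)
    out = ET.Element(root.tag)
    for kw in root.findall("keyword"):
        name = kw.findtext("name", default="").strip()
        ET.SubElement(out, "keyword", {"name": name})
    return ET.tostring(out, encoding="unicode")


compact = collapse_keywords(VERBOSE)
print(compact)
print(len(VERBOSE), "->", len(compact), "characters")
```

Part of the difference here comes from dropping the line breaks, so the numbers are only indicative, but the per-keyword overhead of the closing tags is gone either way.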

ANT is a good example of how to use XML; though they need a bit more standardization, there is a lot going for it.

EDIT by Arantor: cleaning up the XML so it becomes viewable.
« Last Edit: January 06, 2007, 03:59:04 pm by Arantor » Logged
daemon
Developers
Gem Cutter
***
Posts: 107


WWW
« Reply #24 on: May 24, 2006, 01:01:03 am »

When I was looking at the file, I thought about the same XML collapse for keywords, but then if you want named parameters this falls apart (look at the end of the keywords list). So maybe they're just equal, and if you want extra features, you use the expanded version.
Logged
Wraithan
Prospector
*
Posts: 9


« Reply #25 on: May 24, 2006, 01:49:42 am »

Code:
<keyword>
<name>connect</name>
<type kind="function" return="resource" />
<description>Connects to a MySQL server</description>
<parameters>
<param required="no" type="string" name="server" />
<param required="no" type="string" name="username" />
<param required="no" type="string" name="password" />
<param required="no" type="bool" name="new_link" />
<param required="no" type="int" name="client_flags" />
</parameters>
</keyword>
To:
Code:
<keyword name="connect" type="function" returns="resource">
<description value="Connects to a MySQL server" />
<parameters>
<param required="no" type="string" name="server" />
<param required="no" type="string" name="username" />
<param required="no" type="string" name="password" />
<param required="no" type="bool" name="new_link" />
<param required="no" type="int" name="client_flags" />
</parameters>
</keyword>
Just because you have attributes doesn't mean you can't also have child tags inside the element. Think of HTML (which, while not strict XML, can teach you this).
Code:
<body bgcolor="#000000">
<table border="1">
<tr>
<td>hax</td>
</tr>
</table>
</body>

EDIT by Arantor: cleaning up the XML so it becomes viewable.
« Last Edit: January 06, 2007, 03:59:27 pm by Arantor » Logged
Derek Parnell
Lead Architect
Miner
**
Posts: 36



« Reply #26 on: May 24, 2006, 02:18:28 am »

Please excuse this rather large posting.

Here is an alternative idea to the wordy XML syntax. I think it satisfies the EE requirements to capture current CE functionality, allows expansion, is easy to read, is easy (quick) to parse, and is comprehensive enough to capture all the required data.

I've used an example to convey the concepts of the syntax as that might be easier to grasp than a description of it for now. But basically, anything outside of a pair of parentheses is a comment, and everything inside paired parentheses is parsable data. Each command or group begins with (cmd and they can contain nested commands to any depth. A command extends to its matching parenthesis.

Code:
D LANGUAGE (DigitalMars.com) SPECIFICATION FILE FOR EMERALD EDITOR
FIRST EDITED BY Qwerty Uiop 30/01/04

(CASESENSITIVE YES)
(DELIMITERS ~`!@#$%^&*()-+=|\{}[]:;"',.<>/?)  Notice it contains parenthesis but as they are paired it doesn't matter
(KEYWORDPREFIX #)
(HEXADECIMALMARK 0x)
(ESCAPECHAR \)
(QUOTATION
   (SET ")
   (SET ')
   (SET `)
)

Zero or more sets of ranges
(RANGE
  (SET
     (BEGIN /**)
     (END */)
  )
)

(COMMENT
  (LINE //)

  (BLOCK
    (ON /*)
    (OFF */)
  )
  (BLOCK
    (ON /+)
    (OFF +/)
    (NESTED YES)
  )
)

Zero or more indentation sets.
(INDENTATION
  (SET
    (ON {)
    (OFF })
  )
)

Zero or more sets of paired characters
(PAIRS
  (SET ())
  (SET [])
  (SET {})
)

(KEYWORDS
  (SET
    (ID S0)
    (SCOPE GLOBAL)
    (CATEGORY #Flow)
    (WORDS
       asm break case catch continue default do else finally
       for foreach goto if return switch synchronized throw
       try volatile while with
    )
  )

  (SET
    (ID S1)
    (SCOPE GLOBAL)
    (CATEGORY #Primary)
    (WORDS
       assert false import module null super this true typeid unittest
    )
  )

  (SET
    (ID S2)
    (SCOPE GLOBAL)
    (CATEGORY #Attrib)
    (WORDS
       body deprecated export extern instance lib msg package pragma private
       protected public
    )
  )

  (SET
    (ID S3)
    (SCOPE GLOBAL)
    (CATEGORY #Storage)
    (WORDS
      abstract auto const final in inout out override static align even
      naked offset seg ptr void main
    )
  )

  (SET
    (ID S4)
    (SCOPE GLOBAL)
    (CATEGORY #Basics)
    (WORDS
       bit bool byte cdouble cent cfloat char creal dchar double float idouble
       ifloat int ireal long ptrdiff_t real short size_t ubyte ucent uint ulong
       ushort void wchar
    )
  )

  (SET
    (ID S5)
    (SCOPE GLOBAL)
    (CATEGORY #User)
    (WORDS
       alias class delegate enum function interface invariant mixin struct template
       typedef typeof union
    )
  )

  (SET
    (ID S6)
    (SCOPE GLOBAL)
    (CATEGORY #Properties)
    (WORDS
       alignof dig dup epsilon infinity init keys length mant_dig max max_10_exp
       max_exp min min_10_exp min_exp nan offsetof rehash reverse sizeof sort
       values _argptr _arguments
    )
  )

  (SET
    (ID S7)
    (SCOPE GLOBAL)
    (CATEGORY #Operators)
    (WORDS
       call cast delete remove is new opAdd opAddAssign opAdd_r opAnd opAndAssign
       opAnd_r opApply opCall opCast opCat opCatAssign opCat_r opCmp opCmp opCom
       opDiv opDivAssign opDiv_r opEquals opIndex opIndexAssign opMod opModAssign
       opMod_r opMul opMulAssign opMul_r opNeg opOr opOrAssign opOr_r opPos
       opPostDec opPostInc opShl opShlAssign opShl_r opShr opShrAssign opShr_r
       opSlice opSub opSubAssign opSub_r opUShr opUShrAssign opUShr_r opXor
       opXorAssign opXor_r toHash type var
    )
  )

  (SET
    (ID S8)
    (SCOPE GLOBAL)
    (CATEGORY #Version)
    (WORDS
       all BigEndian debug DigitalMars D_InlineAsm linux LittleEndian none
       version Win32 Win64 Windows X86 X86_64
    )
  )

  (SET
    (ID S9)
    (SCOPE GLOBAL)
    (CATEGORY #Registers)
    (WORDS
       AH AL AX BH BL BP BX CH CL CR0 CR2 CR3 CR4 CS CX DH DI DL
       DR0 DR1 DR2 DR3 DR6 DR7 DS DX EAX EBP EBX ECX EDI EDX ES
       ESI ESP FS GS MM0 MM1 MM2 MM3 MM4 MM5 MM6 MM7 SI SP SS ST
       TR3 TR4 TR5 TR6 TR7
    )
  )
)
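
To give a feel for how quick such a format is to parse, here is a small recursive-descent sketch in Python. It is not Derek's parser (none is posted in the thread); the function names `parse_spec` and `keyword_sets` and the trimmed sample are made up for the example. One simplification is stated loudly: this sketch treats every parenthesis as structural, so literal parentheses inside a value, as in the DELIMITERS line or `(SET ())`, would need an extra rule that the post does not spell out.

```python
def parse_spec(text):
    """Parse '(CMD arg (SUB ...))' groups; text outside parentheses is a comment.

    Returns a list of groups. Each group is a list whose items are either
    words (str) or nested groups (list).
    """
    pos = 0

    def parse_group():
        nonlocal pos
        items, word = [], []

        def flush():
            # Finish the word being collected, if any.
            if word:
                items.append("".join(word))
                word.clear()

        while pos < len(text):
            ch = text[pos]
            pos += 1
            if ch == "(":
                flush()
                items.append(parse_group())
            elif ch == ")":
                flush()
                return items
            elif ch.isspace():
                flush()
            else:
                word.append(ch)
        raise ValueError("unbalanced parentheses in spec")

    groups = []
    while pos < len(text):
        ch = text[pos]
        pos += 1
        if ch == "(":
            groups.append(parse_group())
        # Anything else at the top level is commentary and is skipped.
    return groups


def keyword_sets(spec):
    """Collect {category: [words]} from every (KEYWORDS (SET ...)) group."""
    result = {}
    for group in spec:
        if group and group[0] == "KEYWORDS":
            for item in group[1:]:
                if isinstance(item, list) and item and item[0] == "SET":
                    fields = {sub[0]: sub[1:] for sub in item[1:]
                              if isinstance(sub, list) and sub}
                    category = fields.get("CATEGORY", ["?"])[0]
                    result[category] = fields.get("WORDS", [])
    return result


SAMPLE = """
Sample spec, trimmed from the post
(CASESENSITIVE YES)
(COMMENT
  (LINE //)
  (BLOCK (ON /+) (OFF +/) (NESTED YES))
)
(KEYWORDS
  (SET
    (ID S0)
    (CATEGORY #Flow)
    (WORDS break case catch)
  )
)
"""

spec = parse_spec(SAMPLE)
categories = keyword_sets(spec)
print(categories)
```

The whole reader is one loop and one recursive helper, which is the main attraction of the format; the price is having to define how literal parentheses and whitespace inside values are written.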
Logged

--
Derek Parnell
"Down with Mediocrity!"
dsvick
Beta Testers
Senior Miner
***
Posts: 52



WWW
« Reply #27 on: May 24, 2006, 01:28:38 pm »

I think Pete's file is a good start, I'm not a PHP expert but I can follow it and would make the following suggestions:

A lot of the elements could be renamed to make them shorter and more intuitive. In the general section, instead of 'allowedsublangs' it could be just 'sublangs' or 'embeddedSyntax', and then 'canexistfrom' and 'canexistuntil' could be 'beginSyntax' and 'endSyntax'. I like Pete's setup better than the CE one, since it allows more than just two syntax types to be applied to a file.

There is an escapes element in the quotes element, why not just have a single escape element up in the general section. I've not seen a language that has different escapes for different characters (I'm sure there is one out there, but we could probably handle those as special cases).

For the keywords, I'm not sure what the intent of the prefix attribute is in the group definition, will the prefix appear in the preferences dialog to set the colors by, sort of like the keywords 0 - 9 in CE? Maybe change 'prefix' to 'type' as Wraithan suggested.

What are the ranges for?

Some of the other names could also be changed but those are the minute details and we can tackle those when we get the general format defined.
Logged

Dave
SnakE
Miner
**
Posts: 13


« Reply #28 on: May 24, 2006, 03:49:45 pm »

Arantor, I would reduce some constructions significantly:
Code:
<syntax>
<sublang name="HTML" definition="ee_html4.xml" start="?>" end="&lt;?"/>
...
<quote char="'">
 <escape char="\"/>
</quote>
<!-- Perl quotes, all support escapes -->
<quote start="q/" end="/"><escape char="\"/></quote>
<quote start="q(" end=")"><escape char="\"/></quote>
<quote start="q[" end="]"><escape char="\"/></quote>
...
<!-- Extended Perl quotes, also support variables -->
<quote start="qq/" end="/" variables="yes"><escape char="\"/></quote>
<quote start="qq(" end=")" variables="yes"><escape char="\"/></quote>
<quote start="qq[" end="]" variables="yes"><escape char="\"/></quote>
...
<keywords prefix="">die echo for foreach include</keywords>
<keywords prefix="image">2wbmp _type_to_mime_type</keywords>
...
etc.

There is a "connect" keyword description, which is very language-specific.  This functionality is much closer to full syntax parsing than to a regexp highlighter.  I don't think that a basic highlighter should support such descriptions.

BTW what is a keyword prefix?
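
One possible reading of the prefix attribute, suggested by the `image` example (`image` + `2wbmp`), is that the prefix is simply prepended to every word in the group. The thread does not confirm this, so the Python sketch below is a guess at the semantics; the `expand_keywords` helper and the cut-down fragment are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the fragment above, closed so that it parses.
FRAGMENT = """<syntax>
<keywords prefix="">die echo for foreach include</keywords>
<keywords prefix="image">2wbmp _type_to_mime_type</keywords>
</syntax>"""


def expand_keywords(xml_text):
    """Return the full keyword set, assuming prefix + word concatenation."""
    root = ET.fromstring(xml_text)
    words = set()
    for group in root.iter("keywords"):
        prefix = group.get("prefix", "")
        # Words are whitespace-separated inside the element text.
        for word in (group.text or "").split():
            words.add(prefix + word)
    return words


keywords = expand_keywords(FRAGMENT)
print(sorted(keywords))
```

If the prefix is instead meant as a group label for colouring, the same loop would key a dictionary on the prefix rather than concatenating it.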

EDIT by Arantor: cleaning up the XML so it becomes viewable.
« Last Edit: January 06, 2007, 03:59:47 pm by Arantor » Logged
dsvick
Beta Testers
Senior Miner
***
Posts: 52



WWW
« Reply #29 on: May 24, 2006, 06:20:48 pm »

I was torn both ways on the keywords sections too. I like Arantor's way because each word is distinctly named and separate from all the others and can still be grouped. The way SnakE puts it, though, would make for significantly smaller files.

If we went that route then keywords would be simply individual words and the editor would not have to worry about doing anything else with them except using the correct color. For functions we could define them separately to enable code completion. Something, to borrow from Wraithan, like this maybe...

Code:
<functions>
<function name="connect" returns="resource">
    <description value="Connects to a MySQL server" />
    <parameters>
        <param required="no" type="string" name="server"    />
        <param required="no" type="string" name="username"    />
        <param required="no" type="string" name="password"    />
        <param required="no" type="bool" name="new_link"    />
        <param required="no" type="int" name="client_flags" />
    </parameters>
</function>
</functions>
I think that the keyword prefix is how he is defining groups of keywords, sort of like the CE keyword1, keyword2, etc. ...
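
As a sketch of what an editor could do with a separate functions section, the Python below turns such a definition into a call tip string. Nothing here is EE code: the `call_tips` helper, the square-bracket convention for optional parameters and the shortened parameter list are assumptions for the example, and the description is written as a self-closing tag so that the fragment is well-formed.

```python
import xml.etree.ElementTree as ET

# Shortened version of the function definition, for illustration only.
FUNCTIONS = """<functions>
<function name="connect" returns="resource">
    <description value="Connects to a MySQL server" />
    <parameters>
        <param required="no" type="string" name="server" />
        <param required="no" type="string" name="username" />
        <param required="no" type="bool" name="new_link" />
    </parameters>
</function>
</functions>"""


def call_tips(xml_text):
    """Map function name -> (signature, description)."""
    tips = {}
    root = ET.fromstring(xml_text)
    for fn in root.findall("function"):
        params = []
        for p in fn.findall("parameters/param"):
            text = f'{p.get("type")} {p.get("name")}'
            if p.get("required") == "no":
                text = f"[{text}]"  # assumed convention: brackets mark optional
            params.append(text)
        signature = f'{fn.get("returns")} {fn.get("name")}({", ".join(params)})'
        desc = fn.find("description")
        tips[fn.get("name")] = (signature, desc.get("value") if desc is not None else "")
    return tips


tips = call_tips(FUNCTIONS)
print(tips["connect"][0])
print(tips["connect"][1])
```

Keeping this data out of the keyword lists means a plain highlighter can ignore the functions section entirely, while code completion can load it on demand.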

EDIT by Arantor: cleaning up the XML so it becomes viewable.
« Last Edit: January 06, 2007, 04:00:03 pm by Arantor » Logged

Dave