Emerald Editor Discussion
Author Topic: A whack at something official...  (Read 13128 times)
Szandor
Senior Miner
***
Posts: 92



« on: February 05, 2007, 06:52:52 am »

Hi there!

I have been rummaging through my brain a bit and actually began to write things down in some sort of document. At first I wanted to release it as an HTML document, but the code generated by the program I used to write the document was all horrid and stuff so I started thinking about other formats to use. In the end I went for the PDF-option since the XML looked really bad and the CHM didn't want to exist.

Ok, that went rather bollocks. Turns out my sleek and nifty PDF is too big for this forum. I'll just have to attach an RTF instead. Hope you're all happy. (No, I'm not drunk, I've just watched Black Books.)

I'm off to work.

* syntax definition document_manual.rtf (57.77 KB - downloaded 821 times.)
Logged

"Cleverly disguised as an original signature..."
Szandor
Senior Miner
***
Posts: 92



« Reply #1 on: February 07, 2007, 06:49:14 pm »

Ok, I'm kinda anal when it comes to HTML, right? So when I publish a website I want it to be nice, crisp, semantically correct HTML without tables and other such horrid stuff.

Still, I'm thinking about putting my pride in a small box for the time being and just publishing the HTML for everyone to see. What do you guys think?

Oh, and if you can find the time to read the RTF, please post any comments here.
Logged

"Cleverly disguised as an original signature..."
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #2 on: February 07, 2007, 07:31:42 pm »

I've read it through and have a few comments. Make of them what you will.

As much as I like the idea of using : as an OR separator, unfortunately this is perhaps a little too unconventional; I wouldn't want to stray too far from convention on this sort of subject, as too many people will become confused. I like it because it seems visually more sensible to me. (But then again I'm a little strange)

In that respect, would the following not be a better example of the sort of thing in question:
Code:
: options
case-sensitive: no

: elements
Comment: ";"*/lb
Function: #name
Value: #value

: definitions
Code: /nl name value
name: *[/sp|/t]
value: * /lb

: meta
code: LiteStep RC
author: Staffan Lindsgåd
version: 1.0
extentions: *.rc

Or, alternatively have the syntax of param = value instead of param: value as this is more traditional. This also brings the "name" value under the definitions part more consistent with traditionally defined forms of regular expression, which is what this will end up being.

I like what I see though, I really do. It allows very powerful flexibility, because it is incorporating true syntax into a definition as well as just keywords with a smattering of contextualising.

Post in HTML if you want to, however the one thing I would say is this: whatever form we end up using on the EE site itself will not be in the same form of HTML that gets originally posted - it depends on how we manage the content as it gets added to the site (e.g. whether it is one single page or several pages - either way it'll probably end up as part of the current content management system).

The only thing that I am wary of (and this is just down to my natural cynicism) is the license. I have to mention this, because I know others may make an issue of this. As the SDD is a metaformat (and thus not just locked to EE), you are of course perfectly free to copyright and request accreditation.

The issues I see from this are:
  • Already there have been hints at license purism (when there was the whole GPL/LGPL/BSD debate first mentioned), and adding this in may seem contrary to this.
  • What happens if EE uses it (correctly) but the code forks and someone else uses it incorrectly (e.g. removes the accreditation)?

That aside, I still think fundamentally the SDD is the way to go, it's tighter to its purpose than anything else I could come up with, and indeed better than XML for this purpose. I never really thought XML was the way to go for this, but hey, this really does kind of prove it.

The format is readable and clean (see the CSS example). The only thing I think I would stipulate if we used this with EE is that we have either a plugin or a helper program to help people write this stuff. For programmers it shouldn't be too difficult, but I do know that non-programmers, or at least non-techie people, may struggle (some people are perfectly happy to mush HTML together but feel that anything more than that is 'techie' and out of their league, even when it isn't).

Addendum: The SDD specification sort of intimates that it could handle any file; in theory it would be able to parse and highlight anything, however for binary files the SDD does become sort of redundant. It might be worth adding a line or two to suggest that it applies to text-based (or 'human readable') formats, rather than any file.
« Last Edit: February 07, 2007, 09:14:10 pm by Arantor » Logged

"Cleverly disguised as a responsible adult!"
John Yeung
Senior Miner
***
Posts: 85


« Reply #3 on: February 08, 2007, 03:24:33 am »

I am still undecided about how unconventional to make the special characters used by SDD.  I keep thinking that if you want to use '=' to separate the left and right sides of a production rule and '|' to mean OR, then you are wandering very close to established notations like Extended BNF or Wirth Syntax Notation and probably many others.  And if you get close enough, you may as well use one of them, or start with one of them and make the minimal adaptations necessary for the purpose at hand.  With an EBNF-like SDD that isn't EBNF, there is the danger that people will think they can do something in the SDD exactly like they are used to in EBNF, and be quite confused in the cases where they differ.  Or you could argue that there would only be relatively little to learn if they could use their EBNF familiarity instead of learning a new notation from scratch.  But you could also argue that the whole of SDD will not be too much to learn anyway, especially for someone who is geeky enough to be into EBNF.

For purposes of discussion, I will assume the notation remains as Szandor has chosen.  I haven't looked too thoroughly at the document, but here are my comments so far:

What exactly are "logical" places for a space to mean AND, and why is it not logical for a space to mean AND when adjacent to grouping brackets?  The example of
Quote
cola | "bubbly" ["too sweet" : "too sour"]
> "The bubbly soap is too sour, and quite ugly too sweet Henry."

looks like it really should be written
Quote
cola | "bubbly" : "too sweet" : "too sour"

It would be nice to be able to say
Quote
cola | "bubbly but very " ["sweet" : "sour"]
> "The bubbly but very sweet soap may be bubbly but very sour, but soap may be only bubbly or only sweet or only sour."

The stuff inside brackets should appear to the neighboring syntax as a single object, just as if it had been written

Quote
cola | "bubbly but very " taste
taste | "sweet" : "sour"

I personally think AND should bind tighter than OR, whereas the spec indicates they are equal and bind left to right.  Regardless, the spec document should note the order of precedence explicitly.
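To make the precedence question concrete, here is a minimal Python sketch. It assumes (this is an illustration, not something taken from the SDD manual) that a space means AND (concatenation) and ':' means OR, and it translates the hypothetical rule body `"a" : "b" "c"` into a regular expression under each reading.

```python
import re

# Hypothetical SDD rule body:  "a" : "b" "c"
# Reading 1: AND binds tighter than OR       ->  "a" : ("b" "c")
and_tighter = re.compile(r"a|bc")
# Reading 2: equal precedence, left to right ->  ("a" : "b") "c"
left_to_right = re.compile(r"(?:a|b)c")

def accepted(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if pattern.fullmatch(s)]

candidates = ["a", "bc", "ac"]
tight = accepted(and_tighter, candidates)
ltr = accepted(left_to_right, candidates)
print(tight)  # ['a', 'bc']
print(ltr)    # ['bc', 'ac']
```

The two readings accept different strings, which is why the spec needs to state the rule explicitly whichever way it goes.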

Also, I think it would be helpful to mention that the '*' is nongreedy.  Many people who use regular expressions will expect it to be greedy by default.
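The greedy versus nongreedy difference can be shown with Python's `re` module, used here only as a stand-in for a typical regular expression engine. How SDD itself matches is defined by the manual, not by this sketch, and the sample text is made up.

```python
import re

text = '"first" and "second"'

# Greedy: .* grabs as much as possible, so the match runs to the LAST quote.
greedy = re.search(r'".*"', text).group()
# Non-greedy: .*? stops at the first closing quote it can reach.
lazy = re.search(r'".*?"', text).group()

print(greedy)  # "first" and "second"
print(lazy)    # "first"
```

Someone used to the greedy default would expect the first result, so a one-line note in the spec saves a surprise.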

The '~' operator, as shown, would be more accurately called "contains" rather than "part of" (the description of it even uses the word "contains").

I guess that's it for now.  Maybe I will have more to say later.

John
Logged
Szandor
Senior Miner
***
Posts: 92



« Reply #4 on: February 08, 2007, 06:46:04 pm »

Ahh, great feedback! Let's write up some answers, shall we?

As much as I like the idea of using : as an OR separator, unfortunately this is perhaps a little too unconventional; I wouldn't want to stray too far from convention on this sort of subject, as too many people will become confused. I like it because it seems visually more sensible to me. (But then again I'm a little strange)
[...]
Or, alternatively have the syntax of param = value instead of param: value as this is more traditional. This also brings the "name" value under the definitions part more consistent with traditionally defined forms of regular expression, which is what this will end up being.

I am not familiar with this "too unconventional" of which you speak...

Well, I do see your point actually. I don't like using = though since I think that belongs inside the logic and is ugly, even though it actually is the most correct symbol to use. I could settle for switching the : and the | if people want me to, but the taller glyph of the bar made more sense to be used as a barrier between name and logic to me.

The only thing that I am wary of (and this is just down to my natural cynicism) is the license. I have to mention this, because I know others may make an issue of this. As the SDD is a metaformat (and thus not just locked to EE), you are of course perfectly free to copyright and request accreditation.

There are three major reasons for me releasing this under my own license. First, this is not a program and neither the Creative Commons nor the GPL are very good for it.
Second, I want this syntax to be available for implementation in any program. This means that I don't want people to alter it in any way they want, since that would make different versions of SDDs incompatible with each other. My dream is that every program that supports SDD will be able to load any SDD. I want a standard format, not a proprietary one.
Third, I know I'll get credit in this project if the SDD is included, but I want to make sure that people who I'm not working actively with and still include my SDD will credit me as well.

Already there have been hints at license purism (when there was the whole GPL/LGPL/BSD debate first mentioned), and adding this in may seem contrary to this.
True.

What happens if EE uses it (correctly) but the code forks and someone else uses it incorrectly (e.g. removes the accreditation)?

Then the people responsible for the development of the fork will be license breakers. As simple as that.

The SDD specification sort of intimates that it could handle any file; in theory it would be able to parse and highlight anything, however for binary files the SDD does become sort of redundant. It might be worth adding a line or two to suggest that it applies to text-based (or 'human readable') formats, rather than any file.

Quite true.

What exactly are "logical" places for a space to mean AND, and why is it not logical for a space to mean AND when adjacent to grouping brackets?  The example of
Quote
cola | "bubbly" ["too sweet" : "too sour"]
> "The bubbly soap is too sour, and quite ugly too sweet Henry."

looks like it really should be written
Quote
cola | "bubbly" : "too sweet" : "too sour"

Sorry, I wrote that example wrong. The space indicates an AND whenever another operator isn't specified at that location. "[foo] [bar]" is indeed "foobar" and "[foo] : [bar]" is "foo" or "bar".

It would be nice to be able to say
Quote
cola | "bubbly but very " ["sweet" : "sour"]
> "The bubbly but very sweet soap may be bubbly but very sour, but soap may be only bubbly or only sweet or only sour."

The stuff inside brackets should appear to the neighboring syntax as a single object, just as if it had been written

Quote
cola | "bubbly but very " taste
taste | "sweet" : "sour"

True. This is what I intended, but the example was written wrong. I'll fix it.

I personally think AND should bind tighter than OR, whereas the spec indicates they are equal and bind left to right.  Regardless, the spec document should note the order of precedence explicitly.

I will think a bit about this. I think you may be right, but I'm not sure yet.

Also, I think it would be helpful to mention that the '*' is nongreedy.  Many people who use regular expressions will expect it to be greedy by default.

True.

The '~' operator, as shown, would be more accurately called "contains" rather than "part of" (the description of it even uses the word "contains").

 Also true.

I guess that's it for now.  Maybe I will have more to say later.

Yay!


I think we're getting places with the syntax highlighting now. What would be nice is if you check through the SDD definition and try to find a language that can't be written using an SDD. I have already made it compatible with RPG IV (now there's an ugly language) and Inform 7 (now there's a pretty language), but surely there must be something I've missed?
Logged

"Cleverly disguised as an original signature..."
KjeBja
Senior Miner
***
Posts: 76


« Reply #5 on: February 12, 2007, 09:08:31 am »

For the moment, I have just one comment, and that concerns just that - the comment option. In the much-used language COBOL, the asterisk (*) is used to identify comment lines. If I am not mistaken, this character is used in the SDD to indicate "any text", be it between delimiters, or until the end of the line. Is there a potential pitfall here?
Logged
Szandor
Senior Miner
***
Posts: 92



« Reply #6 on: February 12, 2007, 12:02:56 pm »

For the moment, I have just one comment, and that concerns just that - the comment option. In the much-used language COBOL, the asterisk (*) is used to identify comment lines. If I am not mistaken, this character is used in the SDD to indicate "any text", be it between delimiters, or until the end of the line. Is there a potential pitfall here?
Not really. In the SDD, * would indeed indicate any text, unless the asterisk is put inside quotes in which case it is taken literally. So, if a comment is started with * and lasts for the entire line it would look like this:

Code:
comment | "*" * /lf

The first * is put inside quotes and will indicate a literal asterisk. The second asterisk is not inside quotes and will indicate any text.
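As a rough illustration, the rule above can be approximated with a Python regular expression. The mapping is an assumption made for this sketch (quoted `"*"` as a literal asterisk, bare `*` as nongreedy "any text", `/lf` as a line feed, with end of input also accepted), and the COBOL-like sample lines are simplified and invented.

```python
import re

# Rough regex analogue of:  comment | "*" * /lf
#   "*"  -> a literal asterisk (escaped as \*)
#   *    -> any text, non-greedy (.*?)
#   /lf  -> a line feed; end of input is also accepted here (assumption)
comment = re.compile(r"\*.*?(?:\n|\Z)")

source = "* this whole line is a comment\nMOVE A TO B.\n* another comment"
comments = [m.group().rstrip("\n") for m in comment.finditer(source)]
print(comments)
```

Note that this sketch does not anchor the asterisk to the start of a line, so a real definition for COBOL would need an extra constraint on where the asterisk may appear.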
Logged

"Cleverly disguised as an original signature..."
John Yeung
Senior Miner
***
Posts: 85


« Reply #7 on: February 13, 2007, 01:49:47 pm »

OK, here is a comment about the SDD draft manual as well as a question for those who are good at and familiar with the use and history of Unix regular expressions and grep.

I found three different ways to match a newline character in the manual:  /lf, /lb, and /nl.  It says the first one matches any kind of newline, while the other two match the end of line only or the beginning of line only.  I can see how /lb and /nl behave somewhat like $ and ^ in grep, but I have always wondered why you need two different symbols for it.  Why can't you just use one symbol?  Won't it be obvious from its position in your search pattern whether you are trying to match the beginning or end of a line?  After all, if you are writing a program (in C, for example) which processes streams of characters, you will look for \n regardless of whether you are trying to find the beginning or end of a line.

Does anyone know why grep needs two symbols?  It is hard for me to imagine that its creator(s) didn't have a damned good reason, because Ken Thompson (who probably created it) and all those early Unix gurus seem to be coding demigods, and just ridiculously smart people in general.

John
Logged
Arantor
Site Administrator
Administrator
Master Jeweller
*****
Posts: 618



« Reply #8 on: February 13, 2007, 08:09:50 pm »

Well, the answer is actually suggested in the tome of Regular Expressions (yes, the one by Jeffrey Friedl)

^ and $ do not actually match characters, they match positions in the string.

Consider the following:
Code:
Some text line 1
Some text line 2

Using \n gives two problems we would encounter:
1. On Windows or Macs (except Mac OS X, I think) \n does not match true end of line, as Windows uses \r\n and Macs use \r.
2. What if we wanted to match a string to the end of line but there was no closing blank line?

Taking the above, let's say we wanted to add the word foo to the end of each line.

If we assume that the above is the entire file, and there is no blank line, replacing against \n would leave:
Code:
Some text line 1 foo
Some text line 2

This doesn't help us because we wanted to add it to the end of every line, not to the start of the whitespace between lines. Matching against $ should give:
Code:
Some text line 1 foo
Some text line 2 foo

I say 'should' because not all interpretations of regexp actually do this. The theory, however, is totally applicable (and indeed I have used $ in my own regexps to catch this exact scenario - just not in CE!)
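The same experiment can be reproduced in Python, whose `re` module is one engine where `$` behaves as described when the `re.MULTILINE` flag is set. This is only a sketch of the idea; other engines may differ, as noted above.

```python
import re

text = "Some text line 1\nSome text line 2"  # no trailing newline

# Replacing the newline character only touches lines that end in one.
via_newline = re.sub(r"\n", " foo\n", text)
# With re.MULTILINE, $ matches the position at the end of every line.
via_anchor = re.sub(r"$", " foo", text, flags=re.MULTILINE)

print(via_newline)
print(via_anchor)
```

The first result leaves the last line untouched, while the second appends "foo" to both lines, because `$` matches a position rather than a character.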
Logged

"Cleverly disguised as a responsible adult!"
Matthew1344
Gem Cutter
****
Posts: 103


« Reply #9 on: February 13, 2007, 11:18:51 pm »

I think John was just questioning why you need two different expressions.  Why can't a single expression (i.e., "±") be used in either case.  The expression "±foo" would match the first part of "foo bar", and "bar±" would match the last part.  "±foo bar±" would match the entire line.

But how does one match:

Code:
foo
bar

... with regular expressions in a grep?  "^foo$^bar$" doesn't seem to match it, though "^foo$bar$" does.

Does that seem right?  I'm not surprised that the first doesn't match, but I am surprised that the second does.
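One way to explore this outside grep is Python's `re` module in multiline mode. grep normally works on one line at a time, so its results need not agree with this sketch, which only shows how zero-width anchors behave when the whole two-line text is searched at once.

```python
import re

text = "foo\nbar"

# Anchors are zero-width: between $ and ^ the newline is still unmatched.
no_newline = re.search(r"^foo$^bar$", text, flags=re.MULTILINE)
# Consuming the newline explicitly lets the pattern span both lines.
with_newline = re.search(r"^foo$\n^bar$", text, flags=re.MULTILINE)

print(no_newline)            # None
print(with_newline.group())  # the full two-line text
```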
Logged
Szandor
Senior Miner
***
Posts: 92



« Reply #10 on: February 15, 2007, 07:19:18 pm »

I think John was just questioning why you need two different expressions.  Why can't a single expression (i.e., "±") be used in either case.  The expression "±foo" would match the first part of "foo bar", and "bar±" would match the last part.  "±foo bar±" would match the entire line.

Because if ± is equal to a line break, "bar±" would only match the last part of "foo bar" if there is a line break after it. After the last line in a document there seldom is. The ± would have to be equal to a line break, the end of a line and the beginning of a line. This might very well be a good way to go.

Actually, I first had two expressions - one for the beginning of a line and one for the end - but changed this to only one later on, apparently forgetting to change this everywhere making three different ones in the document. I kinda like the power that approach brings, but the question is if it is really necessary. The /lb would then have three different meanings, depending on where in the code it occurs.
Logged

"Cleverly disguised as an original signature..."
Matthew1344
Gem Cutter
****
Posts: 103


« Reply #11 on: February 15, 2007, 08:14:02 pm »

I'm saying that ± is not a line break, but rather an anchor to the end of the line (whichever end it's on).  "bar±" would match all these lines:
Code:
foobar
mybar
yourbar
bar

likewise, "±foo" would match
Code:
fooyou
foome
foothem
foobar

I guess ± in the middle of the line (i.e., "foo±bar") would match nothing.  (?)

And that pretty much exhausts my not-so-educated guesses on the subject.
Logged
Szandor
Senior Miner
***
Posts: 92



« Reply #12 on: February 16, 2007, 02:48:38 pm »

I'm saying that ± is not a line break, but rather an anchor to the end of the line (whichever end it's on).  "bar±" would match all these lines:
Code:
foobar
mybar
yourbar
bar

likewise, "±foo" would match
Code:
fooyou
foome
foothem
foobar

I guess ± in the middle of the line (i.e., "foo±bar") would match nothing.  (?)

And that pretty much exhausts my not-so-educated guesses on the subject.


So, your proposal would look something like this using proper SDD syntax (line stops marked using "/ls"):
Code:
fooline | /ls * "foo" /ls
barline | /ls "bar" * /ls

Every fooline and barline would have to begin on a new line, then contain stuff, then end when the line does. The above definitions would be true for:

infoo
barkeep
barfoo

In the following example, the middle one wouldn't be highlighted:

infoo
foobar
barkeep

The following syntax would be wrong since it semantically isn't able to apply to anything:

Code:
line | "foo" /ls "bar"

It would seem that:

foo
bar

would apply, but after the line ends a line feed is the only thing that can occur - apart from nothing - and therefore it is impossible to write anything that would apply. I can live with having one function for both ends of the line, but we must still have a separate one for line feeds.
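The behaviour described in this post can be checked with a small Python sketch. The regex equivalents below are assumptions made for illustration (a `/ls` at the start is read as `^`, a `/ls` at the end as `$`), not part of the SDD manual.

```python
import re

# Assumed regex equivalents:
#   fooline | /ls * "foo" /ls   ->  ^.*foo$
#   barline | /ls "bar" * /ls   ->  ^bar.*$
fooline = re.compile(r"^.*foo$")
barline = re.compile(r"^bar.*$")
# "foo" /ls "bar": nothing but a line feed can follow the end of a line.
impossible = re.compile(r"foo$bar", re.MULTILINE)

def highlighted(lines):
    """Return the lines matched by either rule."""
    return [ln for ln in lines if fooline.match(ln) or barline.match(ln)]

first = highlighted(["infoo", "barkeep", "barfoo"])
second = highlighted(["infoo", "foobar", "barkeep"])
print(first)   # ['infoo', 'barkeep', 'barfoo']
print(second)  # ['infoo', 'barkeep']
print(impossible.search("foo\nbar"))  # None
```

The last pattern never matches the two-line text, which is the reason a separate symbol for the line feed itself is still needed.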
Logged

"Cleverly disguised as an original signature..."
Szandor
Senior Miner
***
Posts: 92



« Reply #13 on: March 21, 2007, 01:15:31 pm »

Just wanted to say hi and assure people that I'm not dead. I do have a new computer however and will now play all important games released from the time where my former 450 Pentium III was hot stuff. Fortunately for me, I'm quite picky about my games so the list isn't actually that long. Thief 3 will demand some time though, as will Half-Life 2 (played through HL2:Episode One by mistake) and possibly Elder Scrolls: Oblivion.

Still, my girlfriend will poke at me if I play too much and I really want to see EE released, so I guess I'll put in a few hours here as well, perfecting the SDD-format. I have some feedback to ponder, though it seems that people generally like my proposal.

When I release the new version of the SD-syntax I will have a small competition. I will not appoint a winner, nor is there a prize to claim, so whether you want to call it a competition or not is actually up to you. The rules are simple: Take your favourite language and make an SDD out of it. Try to find problem areas, such as nested code, exceptions from rules and such. If you want to cheat and begin before I release the next version, that's ok. You can even submit it here before the next version is out. In essence, I guess the competition has already started. Or whatever you want to call it.
Logged

"Cleverly disguised as an original signature..."
Pvt_Ryan
Master Jeweller
******
Posts: 422



« Reply #14 on: March 21, 2007, 05:03:19 pm »

lol

i was gonna suggest you do one of the smaller CE syntax files as an e.g.  :)

Logged