Little things that matter in language design
The designers of a new programming language are probably most interested in the big features — the things that just couldn't be done with whichever language they are trying to escape from. So they are probably thinking of the type system, the data model, the concurrency support, the approach to polymorphism, or whatever it is that they feel will affect the expressiveness of the language in the way they want.
There is a good chance they will also have a pet peeve about syntax, whether it relates to the exact meaning of the humble semicolon, or some abhorrent feature such as the C conditional expression which (they feel) should never be allowed to see the light of day again. However, designing a language requires more than just addressing the things you care about. It requires making a wide range of decisions concerning various sorts of abstractions, and making sure the choices all fit together into a coherent, and hopefully consistent, whole.
One might hope that, with over half a century of language development behind us, there would be some established norms which can be simply taken as "best practice" without further concern. While this is true to an extent, there appears to be plenty of room for languages to diverge even on apparently simple concepts.
Having begun an exploration of the relatively new languages Rust and Go and, in particular, having two languages to provide illuminating contrasts, it seems apropos to examine some of those language features that we might think should be uncontroversial to see just how uniform they have, or have not, become.
Comments
Programmers first coming to C [PDF] from Pascal can find the usage of braces a bit of a surprise. While Pascal sees them as one option for enclosing comments, C sees them as a means of grouping statements. This harsh conflict between the languages is bound to cause confusion, or at least a little friction, when moving from one language to the next, but fortunately appears to be a thing of the past.
One last vestige of this sort of confusion can be seen in the configuration files for BIND, the Berkeley Internet Name Domain. In the BIND configuration files semicolons are used as statement terminators while in the database files they introduce comments.
When not hampered by standards conformance as these database files are, many languages have settled on C-style block comments:
/* This is a comment */
and C++-style one-line comments:
// This line has a comment
these having won out over the other Pascal option of:
(* similar but different block comments *)
and Ada's:
-- again a similar yet different single line comment.
The other popular alternative is to start comments with a "#" character, which is a style championed by the C-shell and Bourne shell, and consequently used by many scripting languages. Thankfully the idea of starting a comment with "COMMENT" and ending with "TNEMMOC" never really took off and may be entirely apocryphal.
Both Rust and Go have embraced these trends, though not as fully as BIND configuration files and other languages like Crack which allow all three (/* */, //, #). Rust and Go only support the C and C++ styles.
Go doesn't use the "#" character at all, allowing it only inside comments and string constants, so it is available as a comment character for a future revision, or maybe for something else.
Rust has another use for "#" which is slightly reminiscent of its use by the preprocessor in C. The construct:
#[attribute....]
attaches arbitrary metadata to nearby parts of the program which can enable or disable compiler warnings, guide conditional compilation, specify a license, or any of various other things.
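For example, here is a sketch using two real attributes from current Rust (whose attribute placement rules have evolved a little since 2013); the function names are invented for illustration:

#[allow(dead_code)] // silence the compiler's unused-function warning
fn unused_helper() { }

#[cfg(target_os = "linux")] // compile this function only when targeting Linux
fn linux_only() { }

fn main() { }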
Identifiers
Identifiers are even more standard than comments. Any combination of letters, digits, and the underscore that does not start with a digit is usually acceptable as an identifier provided it hasn't already been claimed as a reserved word (like if or while).
With the increasing awareness of languages and writing systems other than English, UTF-8 is more broadly supported in programming languages these days. This extends the range of characters that can go into an identifier, though different languages extend it differently.
Unicode defines a category for every character, and Go simply extends the definition given above to allow "Unicode letter" (which has 5 sub-categories: uppercase, lowercase, titlecase, modifier, and other) and "Unicode decimal digit" (which is one of 3 sub-categories of "Number", the others being "Number, letter" and "Number, other") to be combined with the underscore. The Go FAQ suggests this definition may be extended depending on how standardization efforts progress.
Rust gives a hint of what these efforts may look like by delegating the task of determining valid identifiers to the Unicode standard. The Unicode Standard Annex #31 defines two character classes, "ID_Start" and "ID_Continue", that can be used to form identifiers in a standard way. The Annex offers these as a resource, rather than imposing them as a standard, and acknowledges that particular use cases may extend them in various ways. It particularly highlights that some languages like to allow identifiers to start with an underscore, which ID_Start does not contain. The particular rule used by Rust is to allow an identifier to start with an ASCII letter, underscore, or any ID_Start character, and to be continued with ASCII letters, ASCII digits, underscores, or Unicode ID_Continue characters.
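To sketch the effect of these rules, here is a fragment accepted by current Rust; the identifier names are invented:

fn main() {
    let _count = 1; // a leading underscore is allowed
    let café = 2;   // 'c' is ID_Start and 'é' is ID_Continue, so this is valid
    // let 1st = 3; // rejected: an identifier may not start with a digit
    println!("{} {}", _count, café);
}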
Allowing Unicode can introduce interesting issues if case is significant, as Unicode supports three cases (upper, lower, and title) and also supports characters without case. Most programming languages very sensibly have no understanding of case and treat two characters of different case as different characters, with no attempt to fold case or have a canonical representation. Go however does pay some attention to case.
In Go, identifiers where the first character is an uppercase letter are treated differently in terms of visibility between packages. A name defined in one package is only exported to other packages if it starts with an uppercase letter. This suggests that writing systems without case, such as Chinese, cannot be used to name exported identifiers without some sort of non-Chinese uppercase prefix. The Go FAQ acknowledges this weakness but shows a strong reluctance to give up the significance of case in exports.
Numbers
Numbers don't face any new issues with Unicode, though possibly that is just due to continued English parochialism, as Unicode does contain a complete set of Roman numerals as well as those from more current numeral systems. So you might think that numbers would be fairly well standardized by now. To a large extent they are, but there still seems to be wiggle room.
Numbers can be integers or, with a decimal point or exponent suffix (e.g. "1.0e10"), floating point. Integers can be expressed in decimal, octal with a leading "0", or hexadecimal with a leading "0x".
In C99 and D [PDF], floating point numbers can also be hexadecimal. The exponent suffix must then have a "p" rather than "e" and gives a power of two expressed in decimal. This allows precise specification of floating point numbers without any risk of conversion errors. D also allows a "0b" prefix on integers to indicate a binary representation (e.g. "0b101010"); binary literals are not part of C11, though some C compilers accept them as an extension. D additionally allows underscores to be sprinkled through numbers to improve readability, so 1_000_000_000 is clearly the same value as 1e9.
Neither Rust nor Go has included hexadecimal floats. While Rust has included binary integers and the underscore spacing character, Go has left these out.
Another subtlety is that while C, D, Go, and many other languages allow a floating point number to start with a period (e.g. ".314159e1"), Rust does not. All numbers in Rust must start with a digit. There does not appear to be any syntactic ambiguity that would arise if a leading period were permitted, so this is presumably due to personal preference or accident.
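A quick sketch of the difference, with arbitrary values:

fn main() {
    let x = 0.314159e1;   // accepted: the literal starts with a digit
    // let y = .314159e1; // rejected by Rust: no leading digit
    println!("{}", x);
}

In C or Go the commented-out form would be a perfectly good floating point constant.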
In the language Virgil-III this choice is much clearer. Virgil has a fairly rich "tuple" concept [PDF] which provides a useful shorthand for a list of values. Members of a tuple can be accessed with a syntax similar to structure field references, only with a number rather than a name. So in:
var x:(int, int) = (3, 4);
var w:int = x.1;
The variable "w" is assigned the value "4" as it is element one of the tuple "x". Supporting this syntax while also allowing ".1" to be a floating point number would require the tokenizer to know when to report two tokens ("dot" and "int") and when it is just one ("float"). While possible, this would be clumsy.
Many fractional numbers (e.g. 0.75) will start with a zero even in languages which allow a leading period (.75). Unlike the case with integers, the leading zero does not mean these numbers are interpreted in base eight. For 0.75 this is unlikely to cause confusion. For 0777.0 it might. Best practice for programmers would be to avoid the unnecessary digit in these cases and it would be nice if the language required that.
As well as prefixes, many languages allow suffixes on numbers, with a couple of different meanings. Those few languages which have "complex" as a built-in type need a syntax for specifying "imaginary" constants. Go, like D, uses an "i" suffix. Python uses "j". Spreadsheets like LibreOffice Calc or Microsoft Excel allow either "i" or "j". It is a pity more languages don't take that approach. Rust doesn't support native complex numbers, so it doesn't need to choose.
The other meaning of a suffix is to indicate the "size" of the value - how many bytes are expected to be used to store it. C and D allow u, l, ll, or f for unsigned, long, long long, and float, with a few combinations permitted. Rust allows u, u8, u16, u32, u64, i8, i16, i32, i64, f32, and f64 which cover much the same set of sizes, but are more explicit. Perhaps fortunately, i is not a permitted suffix, so there is room to add imaginary numbers in the future if that turned out to be useful.
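In current Rust syntax the sized suffixes look like this (a sketch; the values are arbitrary):

fn main() {
    let a = 255u8;        // unsigned 8-bit
    let b = 1_000_000i64; // signed 64-bit, with underscore separators
    let c = 0b101010u32;  // a binary literal with an unsigned 32-bit suffix
    let d = 2.5f32;       // 32-bit float
    println!("{} {} {} {}", a, b, c, d);
}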
Go takes a completely different approach to the sizing of constants. The language specification talks about "untyped" constants though this seems to be some strange usage of the word "untyped" that I wasn't previously aware of. There are in fact "untyped integer" constants, "untyped floating point" constants, and even "untyped boolean" constants, which seem like they are untyped types. A more accurate term might be "unsized constants with unnamed types" though that is a little cumbersome.
These "untyped" constants have two particular properties. They are calculated using high precision with overflow forbidden, and they can be transparently converted to a different type provided that the exact value can be represented in the target type. So "1e15" is an untyped floating point constant which can be used where an int64 is expected, but not where an int32 is expected, as it requires 50 bits to store as an integer.
The specification states that "Constant expressions are always evaluated exactly"; however, some edge cases are to be expected:
print((1 + 1/1e130)-1, "\n")
print(1/1e130, "\n")
results in:
+9.016581e-131
+1.000000e-130
so there does seem to be some limit to precision. Maintaining high precision and forbidding overflow means that there really is no need for size suffixes.
Strings
Everyone knows that strings are enclosed in single or double quotes. Or maybe backquotes (`) or triple quotes ('''). And that while they used to contain ASCII characters, UTF-8 is preferred these days. Except when it isn't, and UTF-16 or UTF-32 are needed.
Both Rust and Go, like C and others, use single quotes for characters and double quotes for strings, both with the standard set of escape sequences (though Rust inexplicably excludes \b, \v, \a, and \f). This set includes \uXXXX and \UXXXXXXXX so that all Unicode code-points can be expressed using pure ASCII program text.
Go chooses to refer to character constants as "Runes" and provides the built-in type "rune" to store them. In C and related languages "char" is used both for ASCII characters and 8-bit values. It appears that the Go developers wanted a clean break with that and do not provide a char type at all. rune (presumably more aesthetic than wchar) stores (32-bit) Unicode characters while byte or uint8 stores 8-bit values.
Rust keeps the name char for 32-bit Unicode characters and introduces u8 for 8-bit values.
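A sketch of the distinction in current Rust syntax:

fn main() {
    let c: char = 'é'; // a 32-bit Unicode character
    let b: u8 = b'A';  // an 8-bit byte; note the b prefix on the literal
    println!("{} {}", c as u32, b); // prints the code point 233 and the byte 65
}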
The modern trend seems to be to disallow literal newlines inside quoted strings, so that missing quote characters can be quickly detected by the compiler or interpreter. Go follows this trend and, like D, uses the back quote (rather than the Python triple-quote) to surround "raw" strings in which escapes are not recognized and newlines are permitted. Rust bucks the trend by allowing literal newlines in strings and does not provide for uninterpreted strings at all.
Both Rust and Go assume UTF-8. They do not support the prefixes of C (U"this is a string of 32-bit characters") or the suffixes of D ("another string of 32-bit chars"d) to declare a string to be a multibyte string.
Semicolons and expressions
The phrase "missing semicolon" still brings back memories from first-year computer science and learning Pascal. It was a running joke that whenever the lecturer asked "What does this code fragment do?" someone would call out "missing semicolon", and they were right more often than you would think.
In Pascal, a semicolon separates statements while in C it terminates some statements — if, for, while, switch and compound statements do not require a semicolon. Neither rule is particularly difficult to get used to, but both often require semicolons at the end of lines that can look unnecessary.
Go follows Pascal in that semicolons separate statements — every pair of statements must be separated. A semicolon is not needed before the "}" at the end of a block, though it is permitted there. Go also follows the pattern seen in Python and JavaScript where the semicolon is sometimes assumed at the end of a line (when a newline character is seen). The details of this "sometimes" are quite different between languages.
In Go, the insertion of semicolons happens during "lexical analysis", which is the step of language processing that breaks the stream of characters into a stream of tokens (i.e. a tokenizer). If a newline is detected on a non-empty line and the last token on the line was one of:
- an identifier,
- one of the keywords break, continue, fallthrough, or return,
- a numeric, rune, or string literal, or
- one of ++, --, ), ], or },
then a semicolon is inserted at the location of the newline.
This imposes some style choices on the programmer such that:
if some_test
{
    some_statement
}
is not legal (the open brace must go on the same line as the condition), and:
a = c + d
    + e
is not legal — the operation (+) must go at the end of the first line, not the start of the second.
In contrast to this, JavaScript waits until the "parsing" step of language processing, when the sequence of tokens is gathered into syntactic units (statements, expressions, etc.) following a context-free grammar. JavaScript will insert a semicolon, provided that semicolon would serve to terminate a non-empty statement, if:
- it finds a newline in a location where the grammar forbids a newline, such as after the word "break" or before the postfix operator "++";
- it finds a "}" or end-of-file that is not expected by the grammar; or
- it finds any token that is not expected, which was separated from the previous token by at least one newline.
This often works well but brings its own share of style choices, including the interesting suggestion to sometimes use a semicolon to start a statement.
While both of these approaches are workable, neither really seems ideal. They both force style choices which are rather arbitrary and seem designed to make life easy for the compiler rather than for the programmer.
Rust takes a very different approach to semicolons than Go or JavaScript or many other languages. Rather than making them less important and often unnecessary, it makes them more important, with a significant semantic meaning.
One use involves the attributes mentioned earlier. When followed by a semicolon:
#[some_attribute];
the attribute applies to the entity (e.g. the function or module) that the attribute appears within. When not followed by a semicolon, the attribute applies to the entity that follows it. A missing semicolon could certainly make a big difference here.
The primary use of semicolons in Rust is much like that in C — they are used to terminate expressions by turning the expressions into statements, discarding any result. The effect is really quite different from C because of a related difference: many things that C considers to be statements, Rust considers to be expressions. A simple example is the if expression.
a = if b == c { 4 } else { 5 };
Here the if expression returns either "4" or "5", which is stored in "a".
A block, enclosed in braces ({ }), typically includes a sequence of expressions with semicolons separating them. If the last expression is also followed by a semicolon, then the block-expression as a whole does not have a value — that last semicolon discards the final value. If the last expression is not followed by a semicolon, then the value of the block is the value of the last expression.
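A sketch of both cases, in current Rust syntax with invented names:

fn main() {
    let x = {
        let a = 4;
        a + 1     // no trailing semicolon: the block's value is 5
    };            // this semicolon terminates the let statement

    let y: () = {
        let a = 4;
        a + 1;    // trailing semicolon: the value is discarded...
    };            // ...so the block's value is the unit value ()

    println!("{} {:?}", x, y);
}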
If this completely summed up the use of semicolons it would produce some undesirable requirements.
if condition {
    expression1;
} else {
    expression2;
}
expression3;
This would not be permitted, as there is no semicolon to discard the value of the if expression before expression3. Having a semicolon after the last closing brace would be ugly, and that if expression doesn't actually return a value anyway (both internal expressions are terminated with a semicolon), so the language does not require the ugly semicolon and the above is valid Rust code. If the internal expressions did return a value, for example if the internal semicolons were missing, then a semicolon would be required before expression3.
Following this line of reasoning leads to an interesting result.
if condition {
    function1()
} else {
    function2()
}
expression3;
Is this code correct or is there a missing semicolon? To know the answer you need to know the types of the functions. If they do not return a value, then the code is correct. If they do, a semicolon is needed, either one at the end of the whole "if" expression, or one after each function call. So in Rust, we need to evaluate the types of expressions before we can be sure of correct semicolon usage in every case.
Now the above is probably just a silly example, and no one would ever write code like that, at least not deliberately. But the rules do seem to add an unnecessary complexity to the language, and the task of programming is complex enough as it is — adding more complexity through subtle language rules is not likely to help.
Possibly a bigger problem is that any tool that wishes to accurately analyze the syntax of a program needs to perform a complete type analysis. It is a known problem that the correct parsing of C code requires you to know which identifiers are typedefs and which are not. Rust isn't quite that bad as missing type information wouldn't lead to an incorrect parse, but at the very least it is a potential source of confusion.
Return
A final example of divergence on the little issues, though perhaps not quite so little as the others, can be found in returning values from functions using a return statement. Both Rust and Go support the traditional return and both allow multiple values to be returned: Go by simply allowing a list of return types, Rust through the "tuple" type which allows easy anonymous structures. Each language has its own variation on this theme.
If we look at the half million return statements in the Linux kernel, nearly 35,000 of them return a variable called "ret", "retval", "retn", or similar, and a further 20,000 return "err", "error", or similar. Together those account for more than 10% of all uses of return in the kernel. This suggests that there is often a need to declare a variable to hold the intended result of a function, rather than to just return a result as soon as it is known.
Go acknowledges this need by allowing the signature of a function to give names to the return values as well as the parameter values:
func open(filename string, flags int) (fd int, err int)
Here the (hypothetical) open() function returns two integers named fd (the file descriptor) and err. This provides useful documentation of the meaning of the return values (assuming programmers can be more creative than "retval") and also declares variables with the given names. These can be set whenever convenient in the code of the function and a simple:
return
with no expressions listed will use the values in those variables. Go requires that this return be present, even if it lists no values and is at the end of the function, which seems a little unnecessary, but isn't too burdensome.
There is evidence [YouTube] that some Go developers are not completely comfortable with this feature, though it isn't clear whether the feature itself is a problem, or rather the interplay with other features of Go.
Rust's variation on this theme we have already glimpsed with the observation that Rust has "expressions" in preference to "statements". The whole body of a function can be viewed as an expression and, provided it doesn't end with a semicolon, the value produced by that expression is the value returned from the function. The word return is not needed at all, though it is available and an explicit return expression within the function body will cause an early return with the given value.
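A sketch of the two styles in current Rust syntax (the function names are invented):

// The body is a single expression; its final value is the return value.
fn max32(a: i32, b: i32) -> i32 {
    if a > b { a } else { b }
}

// An explicit return is still available for an early exit.
fn safe_div(a: i32, b: i32) -> i32 {
    if b == 0 {
        return 0;
    }
    a / b
}

fn main() {
    println!("{} {}", max32(2, 3), safe_div(10, 2));
}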
Conclusion
There are many other little details, but this survey provides a good sampling of the many decisions that a language designer needs to make even after they have made the important decisions that shape the general utility of the language. There certainly are standards that are appearing and broadly being adhered to, such as for comments and identifiers, but it is a little disappointing that there is still such variability concerning the available representations of numbers and strings.
The story of semicolons and statement separation is clearly not a story we've heard the end of yet. While it is good to see language designers exploring the options, none of the approaches explored above seem entirely satisfactory. Treating a line break as distinct from other kinds of white space amounts to a clear acknowledgment that the two-dimensional appearance of the code is relevant to parsing it. It is therefore a little surprising that we don't see the line indent playing a bigger role in the interpretation of code. The particular rules used by Python may not be to everyone's liking, but the principle of making use of this very obvious aspect of a program seems sound.
We cannot expect ever to converge on a single language that suits every programmer and every task, but the more uniformity we can find on the little details, the easier it will be for programmers to move from language to language and maximize their productivity.
Index entries for this article
GuestArticles: Brown, Neil
Posted Jun 8, 2013 1:22 UTC (Sat)
by dskoll (subscriber, #1630)
[Link] (7 responses)
I realize Perl is no longer cool and wasn't mentioned in the article, but it has a fairly nice extension for numeric constants. You can write big numbers like 5429874625 as 5_429_874_625 which makes them significantly more pleasant for humans to parse.
Posted Jun 8, 2013 8:21 UTC (Sat)
by rahulsundaram (subscriber, #21946)
[Link] (1 responses)
http://docs.oracle.com/javase/tutorial/java/nutsandbolts/...
Posted Jun 10, 2013 11:09 UTC (Mon)
by niner (subscriber, #26151)
[Link]
Posted Jun 8, 2013 8:41 UTC (Sat)
by eru (subscriber, #2753)
[Link] (2 responses)
Posted Jun 8, 2013 12:53 UTC (Sat)
by dark (guest, #8483)
[Link] (1 responses)
It was actually a pain to work with since I could never just grep for an identifier and be sure I got all uses. I considered implementing an --ignore-underscore flag for GNU grep but after the project was done I no longer felt the need :)
Posted Jun 8, 2013 15:02 UTC (Sat)
by dskoll (subscriber, #1630)
[Link]
Ignoring underscores in identifiers is a terrible idea for the reason you mentioned (non-greppableness). However, in large numbers I like it; you are unlikely to want to grep for a number, and typically you'd use it in only one place, like this:
use constant SOME_NUMBER => 1_234_345_837;
Posted Jun 8, 2013 10:38 UTC (Sat)
by andreasb (guest, #80258)
[Link] (1 responses)
Ada also allows underscores in numeric literals, while we're at it.
Posted Jun 8, 2013 18:11 UTC (Sat)
by dvdeug (guest, #10998)
[Link]
Posted Jun 8, 2013 2:24 UTC (Sat)
by Richard_J_Neill (subscriber, #23093)
[Link] (29 responses)
It would also be wonderful if compilers could track back with error messages. For example, a missing } somewhere in the middle of the program will usually only throw an error on the last line of the file. It would be far more helpful to report the line number of the opening { that was never closed.
Posted Jun 8, 2013 2:46 UTC (Sat)
by nlucas (guest, #33793)
[Link]
I learned octal much later than hexadecimal, and the only use I have for it is when creating files.
Even though I have coded in C since 1990, once or twice a year I still get bitten by "octal bugs". My brain forgets that C numbers and mathematical numbers are not the same.
OTOH, trigraph warnings occur with more or less the same frequency. If they haven't been disabled by now, this is a lost cause...
Posted Jun 8, 2013 3:15 UTC (Sat)
by geofft (subscriber, #59789)
[Link] (17 responses)
Of course, that's an easy change to make right now in a new language. Python 3 (which is effectively a new language, for this purpose, since it explicitly disclaims syntax compatibility) considers a leading 0 a parse error, and requires 0o, just as you describe.
Posted Jun 8, 2013 6:37 UTC (Sat)
by smurf (subscriber, #17840)
[Link] (16 responses)
IMHO the fact that a language cannot be parsed without semantic analysis automatically disqualifies it for anything I would want to do – again, didn't we learn from Perl or C (and no, I'm not just talking about the Obfuscated C|Perl Contests) why that is a bad idea?
Also, given the plethora of bugs C's 0-denotes-octal design mistake has caused, IMHO new languages should learn from that mistake and require 0o.
Posted Jun 8, 2013 7:19 UTC (Sat)
by jzbiciak (guest, #5246)
[Link]
Posted Jun 8, 2013 10:21 UTC (Sat)
by oever (guest, #987)
[Link] (11 responses)
If I want to list all functions in a collection of source code, the language tools should make this easy. Using grep for this is not precise and error prone.
Most language design centers on silly things like the serialization, the syntactic sugar. The way the instructions are shown on screen is not nearly as important as the conceptual cleanness of the language and the ability to automatically reason about the software.
Posted Jun 8, 2013 18:28 UTC (Sat)
by khim (subscriber, #9252)
[Link] (7 responses)
If this is true, then why is C++ so popular and why are LISP dialects so rarely used?
Posted Jun 8, 2013 22:37 UTC (Sat)
by oever (guest, #987)
[Link] (6 responses)
Ok, let's not be snarky. C++ is still popular because it is low level and can be used to create fast code. There is also a network effect: people will be trained in languages that are used a lot. The network effect is also apparent in details of programming languages, as the parent article points out; the way comments are written is often similar in programming languages to make it less hard for programmers to read code in the new language.
One of the names for the field in which programmers work is 'automation'. Automation of repetitive tasks such as performing a multitude of logical tests on source code is something that should come naturally to workers in the field of automation. And yet, the very thing we work on most, the source code, is not easy to automate at all. The syntax rules of most programming languages are quite intricate and take a while to learn and there is usually no algorithm library to help parse and automate tasks on the source code. C and C++ rank high in the list of offenders because combinations of macros and includes make it impossible to parse a source code file without knowing the include directories.
Posted Jun 9, 2013 1:18 UTC (Sun)
by khim (subscriber, #9252)
[Link] (3 responses)
Because C++ programs don't run in the browser, obviously. If JavaScript was indeed a better way to write programs, and not just a more convenient way to deliver them to the end user, then we would not see so many projects which try to somehow make a sane language from that abomination (starting from CoffeeScript/TypeScript and ending with Emscripten/asm.js).

And that is a good thing. There are some languages where such automation is possible (lisp dialects and C#/Java are among them). C#/Java tend to provide the tools you so crave, which leads to utter disaster: pointless churn quickly produces programs which no one understands (the original code was transformed so many times by "automated tasks" that the original meaning was mixed with so many changes that it's basically impossible to understand what goes on where in the code; your only hope is unit tests, which help to produce something sensible, but the idea that you can actually find and fix all the errors in the code is considered blasphemy). Lisp development does not favor such tricks at all: instead it transforms text written by the developer (which is considered sacred) at runtime. Works much better. And guess what? For that style of work it does not really matter if your language is easily parseable/automatable: lex, yacc, and other such tools are happy with any language.

Right - but why is that a bad thing? If you feel the need to organize some kind of automatic surgery on the "source code" then it just means that you've chosen badly, and some pieces which should be either kept in a database or generated at compile time or run time are stored in the sources instead. Early on, both C and C++ quickly evolved to make sure the tools you talk about would not ever be needed - and that was a good thing. But in the last decades they were essentially frozen, which made them inadequate for today's requirements. Well, perhaps it's time to do something about that and not try to add bandaids upon bandaids?
Posted Jun 9, 2013 2:47 UTC (Sun)
by rahulsundaram (subscriber, #21946)
[Link]
Posted Jun 9, 2013 9:52 UTC (Sun)
by oever (guest, #987)
[Link]
Analysis and generation of code are the main uses. If custom checks for a rule in code, e.g. "only use RAII, and no isolated new/malloc" or "always use 'const' on variables that are not changed", can be written easily, then keeping project code cleaner, more readable, and fault-free is simpler.
Generating code is a common use-case. Consider for example an application that accesses an SQL database. Instead of hand writing type unsafe SQL statements everywhere in the code, one could generate type safe classes from which the SQL statements are generated at compile time (C++11 templates cannot collate strings at compile time easily, so a separate code generation step is preferred). In such a scenario, having an API is much more comfortable and less error prone than writing strings.
Posted Jun 10, 2013 0:10 UTC (Mon)
by marcH (subscriber, #57642)
[Link]
> And that is good thing. There are some languages where such automation is possible (lisp dialects and C#/Java are among them). C#/Java tend to provide the tools you so crave which leads to utter disaster: pointless churn quickly produces programs which noone understands
What are you on?
Posted Jun 20, 2013 22:02 UTC (Thu)
by VITTUIX-MAN (guest, #82895)
[Link] (1 responses)
You know, in the industrial automation field we have a whole bunch of languages that are both unique and don't really shine in their design, or what do I know, surely all the features must be well justified, but uniformity is sadly not among those features.
First of all there's the venerable IEC 61131-3, which defines 5 languages (originally 3), the originals being "function block trees", "ladder diagrams", and a macro assembler for a two-accumulator machine that carefully avoids indirect access to variables. There's simply no means of getting the address of some variable, or getting a value by address, save defining an array (which the language supports) as big as the whole memory, though readability would suffer a bit...
Beyond those languages, we have a whole bunch of different BASIC dialects, especially in robotics, meaning there are the lovable control structures such as multi-label ON GOTO and ON GOSUB and the FOR TO STEP NEXT loop and so on, and if one is really lucky, variables can be defined with DIM. No safe subroutines; what would one even do with them when the program can only be 999 lines long? With the FOR loop there's an interesting restriction that using a GOTO statement to escape it is forbidden, for a reason unknown.
One idiom that seems to be common in that kind of environment is that the variables that can be used are quite limited, and one is expected to manually assign variables to correspond to memory addresses, though the IDE does most of it automatically these days. One gets a total of maybe a few tens of kilobytes of memory, even if the said system runs on XP Embedded!
In SLIM by Nachi-Fujikoshi the variables are arranged in a particularly cool way: V$1 is a global variable (V) of string type ($) number 1 (out of 50). However, one may also write V$[1], and the 1 may be another variable, thus allowing indexing through the string segment. Brilliant. L$1 would be a string from the local, per-process string segment (another 50 strings right there!), though one has to keep in mind that these don't work with all the commands. For example the socket-related commands only allow global variables as arguments. The command reference of course does not mention which commands require the use of the global segments, leaving the joy of discovery to the user.
There are also variables available (that are not power-failure safe) via the DIM statement, but there is a catch: they don't work with any statements at all! All "DIM variables" are good for is arithmetic.
If one wants to have named variables, they are usually supported by a find-and-replace operation performed by a preprocessor, meaning there's a separate variable include-file that assigns names to the corresponding SLIM-style variable literals and constants. As I said, it's a search-and-replace, so one has to be careful not to define a variable that is part of some longer identifier, like a command name. Say one has a constant definition "SOCK,9600" and there is a command SOCKBIND - that just got replaced and is now 9600BIND, making the compiler halt and die. Beautiful.
Posted Jun 25, 2013 20:18 UTC (Tue)
by nix (subscriber, #2304)
[Link]
Posted Jun 8, 2013 18:43 UTC (Sat)
by mpr22 (subscriber, #60784)
[Link] (2 responses)
Naturally. Programming languages are designed for the convenience of meatbags, not the convenience of boxes of thinking rock.
Posted Jun 9, 2013 6:22 UTC (Sun)
by smurf (subscriber, #17840)
[Link] (1 responses)
I beg to differ; so far, the rocks decline to actually think. Not as we understand that word.
Which is the crux of the problem, because if the language is not easily parseable by both human and silicon processing, the meatbags will too easily assume that the code means something other than what the rocks interpret it as.
Posted Jun 11, 2013 21:03 UTC (Tue)
by brouhaha (subscriber, #1698)
[Link]
The alternative would be to claim that if a computer solves a problem via a particular algorithm but is not thinking, then if I solve the same problem using the same algorithm I couldn't be said to be thinking either.
I certainly won't claim that everything the computer does is thinking, nor that the computer can do all the kinds of thinking that I can.
Posted Jun 9, 2013 23:56 UTC (Sun)
by hpa (guest, #48575)
[Link] (2 responses)
Posted Jun 10, 2013 5:04 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Posted Jun 10, 2013 11:26 UTC (Mon)
by cortana (subscriber, #24596)
[Link]
Posted Jun 8, 2013 7:30 UTC (Sat)
by renox (guest, #23785)
[Link]
Posted Jun 8, 2013 13:14 UTC (Sat)
by cmrx64 (guest, #89304)
[Link] (2 responses)
But Rust has macros and syntax extensions. Octal literals can be supported outside the core language, i.e. o!(566) or oct!(755). Not as pretty, I think.
Rust doesn't have octal literals right now because they're seen as only occasionally useful, and nobody has bothered adding them yet.
Posted Jun 8, 2013 21:44 UTC (Sat)
by Tobu (subscriber, #24111)
[Link] (1 responses)
Posted Jun 9, 2013 16:27 UTC (Sun)
by alankila (guest, #47141)
[Link]
Posted Jun 8, 2013 21:53 UTC (Sat)
by edeloget (subscriber, #88392)
[Link] (1 responses)
Finding the opening { that matches a missing } is not that simple. There is a good chance of pointing the programmer at the wrong opening {, meaning that the information is now wrong (as opposed to merely not useful).
Posted Jun 8, 2013 22:14 UTC (Sat)
by hummassa (guest, #307)
[Link]
Just a correction: finding the opening { that matches a missing } is not that simple IF THE PROGRAM IS NOT INDENTED CORRECTLY. It is about time some IDE could see exactly *where* the runaway block began and where it *should* end, because IDEs often have (at least rudimentary) parsers for the language.
Posted Jun 9, 2013 23:26 UTC (Sun)
by tjc (guest, #137)
[Link]
Top-down parsing (recursive descent parsing, for example) is better in this respect, which is probably one of the reasons the GCC C compiler got a new parser a few years ago.
Posted Jun 20, 2013 10:10 UTC (Thu)
by moltonel (guest, #45207)
[Link] (2 responses)
So you have 10 = 16#a = 8#12 = 2#1010 = 10#10 = 36#a = 5#20.
This works for any base between 2 and 36; it's clear and it's consistent. There is similar functionality when printing or parsing a string: any base can be used.
Posted Jun 20, 2013 11:07 UTC (Thu)
by renox (guest, #23785)
[Link] (1 responses)
Posted Jun 20, 2013 13:13 UTC (Thu)
by anselm (subscriber, #2796)
[Link]
Ada, at least, was hyped as the programming language to make all other programming languages obsolete. Such a programming language would naturally have to cater to the preferences of, e.g., six-fingered space aliens, too.
Posted Jun 8, 2013 2:34 UTC (Sat)
by nlucas (guest, #33793)
[Link] (19 responses)
Of course, that would be impossible to accomplish with simple text files as source (we would need meta-data that doesn't get deleted by mistake).
Posted Jun 8, 2013 2:58 UTC (Sat)
by neilbrown (subscriber, #359)
[Link] (2 responses)
#[decimal_comma];
or maybe
#pragma decimal_commas
I don't think I would recommend that though.
Interesting problem - thanks for mentioning it.
Posted Jun 8, 2013 16:46 UTC (Sat)
by nlucas (guest, #33793)
[Link] (1 responses)
For example, suppose we have a simple file with constants:
const double CONST1 = 123,456;
If someone deletes the meta-data indicating the locale, how do you parse it? There is no way to know what the original values were if you don't know the original locale.
You could fix this by making the thousand separator an invalid character (by only allowing '_' or space as the thousand separator), but with so many locales out there could this really be fixed on a global scale?
Localization is hard, and should never be taken lightly. For example, my country uses ',' as the decimal separator, but the keyboard "numpad" has '.', not ','. So it's usual for applications to accept both as decimal separators on input. No standard libraries I know of support this case, which means most applications have to implement (or filter) their own input functions instead of relying on the standard libraries.
Posted Jun 9, 2013 16:32 UTC (Sun)
by alankila (guest, #47141)
[Link]
I wrote a very similar horror for date parsing, trying to support yyyy-mm-dd, dd.mm.yyyy and dd/mm/yy OR mm/dd/yy based on whether user is expected to reside in UK or US.
I hate people and their myriad conventions.
Posted Jun 8, 2013 7:17 UTC (Sat)
by jzbiciak (guest, #5246)
[Link] (15 responses)
I don't know enough about Go or Rust to comment on them specifically, but more generally, allowing decimal commas seems more disruptive than allowing Unicode identifiers, unless the decimal comma has a different Unicode code point. (Does it? I honestly don't know.) That is, allowing αβγδ = 3; in a C or C++ program (or in most other languages) doesn't change the meaning of any program that doesn't use that flexibility. But, allowing the programmer to select the meaning of "1,23" is far more disruptive, because it changes the meaning of the ubiquitous comma.

This problem arises because just about every language I've programmed in my 30 years as a programmer uses a comma for an argument separator, if it uses a separator at all. Allowing a decimal comma gives the comma two very distinct roles in the same context. If "argument separator comma" and "decimal separator comma" are the same Unicode code point, then you need to use whitespace to disambiguate "1,23" from "1, 23". Ugly.

I suppose you could design a programming language that didn't use commas as C, C++, Perl, and 100s of other languages do. In that case, the decimal comma would never introduce surprises. In any case, my point is that it should be easy to see why supporting decimal comma is a harder problem than supporting Unicode identifiers.
Posted Jun 8, 2013 11:57 UTC (Sat)
by l0b0 (guest, #80670)
[Link] (14 responses)
Posted Jun 8, 2013 14:01 UTC (Sat)
by mathstuf (subscriber, #69389)
[Link] (11 responses)
Posted Jun 8, 2013 15:18 UTC (Sat)
by jzbiciak (guest, #5246)
[Link] (10 responses)
As I recall, LISP bullies its way out of that with an explosion of parentheses. Haskell looks like it largely avoids that, just glancing at some Haskell code on the net. But I don't know Haskell really at all, so I don't know how it addresses, say, sending the arguments 1, -2 to a function. Without a comma, does that look like the expression "1 - 2" or are there other rules you have to be aware of?
Posted Jun 8, 2013 16:00 UTC (Sat)
by SLi (subscriber, #53131)
[Link] (1 responses)
foo 1 (-2) is parsed as ((foo 1) (-2))
(in Haskell all functions really take exactly one argument; here foo would take an integer and return a function taking an integer, i.e. the type would be Integer -> Integer -> a)
foo 1 -2 would get parsed as (foo 1) - 2, so foo needs to be a function of the type Integer -> Integer.
Posted Jun 10, 2013 0:32 UTC (Mon)
by marcH (subscriber, #57642)
[Link]
This page, for instance, shows a good few interesting syntax examples in quite a short space:
http://www2.lib.uchicago.edu/keith/ocaml-class/functions....
Posted Jun 8, 2013 16:15 UTC (Sat)
by mathstuf (subscriber, #69389)
[Link] (6 responses)
Posted Jun 8, 2013 18:45 UTC (Sat)
by jzbiciak (guest, #5246)
[Link] (1 responses)
Well, what I was getting at with my LISP comment is that an expression such as a + b * c - d, which needs no parentheses and is completely unambiguous in a C-like language, ends up being (+ a (- (* b c) d)) in prefix notation. I went from 0 parentheses to 3 pairs of parentheses. Now, C has its own problems with its umpteen levels of precedence, problems that tend to lead to excessive parentheses, but that's really a different conversation.
Posted Jun 9, 2013 23:56 UTC (Sun)
by tjc (guest, #137)
[Link]
(Or it would be, if the "bitwise" AND/XOR/OR operators were at a higher precedence level, just below the bit shift operators.)
Posted Jun 17, 2013 10:44 UTC (Mon)
by erich (guest, #7127)
[Link] (3 responses)
http://readable.sourceforge.net/
Now if we could come up with a language that requires S-expression parentheses, C-style braces, *and* Python indentation (maybe also add in brainfuck/whitespace and some Visual Basic for Applications), then we could finally build the ultimate programming language of hell.
Posted Jun 17, 2013 20:03 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
[1] Arguments to functions use exactly one space between them (two spaces passes as the second argument).
Posted Jun 20, 2013 13:02 UTC (Thu)
by jzbiciak (guest, #5246)
[Link]
I would have thought COME FROM would be a better choice for such a language. That, and ABSTAIN to allow for conditional COME FROM. I forgot how evil MUMPS was, despite multiple articles on TheDailyWTF about it.
Posted Jun 20, 2013 6:12 UTC (Thu)
by dakas (guest, #88146)
[Link]
Posted Jun 9, 2013 14:37 UTC (Sun)
by joey (guest, #328)
[Link]
Haskell code often also avoids parens via other means. For example, the function
f x = foo (bar (baz x))
could be written as
f x = foo $ bar $ baz x
but is more likely to be written in point-free style as
f = foo . bar . baz
Incidentally, something very like the Virgil-III tuple access syntax mentioned in the article is also available in Haskell via the lens library. Haskell's syntax is well-suited to defining really interesting and useful operators. For example:
ghci> _1 .~ "hello" $ ("","world")
Posted Jun 8, 2013 15:18 UTC (Sat)
by jzbiciak (guest, #5246)
[Link]
Another, potentially much larger problem, isn't it? If you use white space as argument separators, then you need to use some other grouping construct to group together terms in expressions if you also want to allow whitespace in expressions. (more comment below, replying to mathstuf directly.)
Posted Jun 8, 2013 16:29 UTC (Sat)
by nlucas (guest, #33793)
[Link]
But just for the sake of discussion: in countries where the decimal separator is a comma, they just use ";" as the list separator (at least in my country; I don't know about others). E.g. instead of func(1.2,1.3), just write func(1,2;1,3). It's the same as what is done in mathematics (e.g. a range [-1.2,+1.3] would be [-1,2;+1,3]).
Posted Jun 8, 2013 7:39 UTC (Sat)
by renox (guest, #23785)
[Link] (2 responses)
Posted Jun 8, 2013 13:19 UTC (Sat)
by cmrx64 (guest, #89304)
[Link]
See https://github.com/cmr/terminfo-rs/blob/master/searcher.rs or any other rust code, for example.
The return keyword is *encouraged* to be used; it's not to be avoided at all. But it doesn't make sense to use it as the result of an expression, because then you can't return from the function!
Posted Jun 9, 2013 2:15 UTC (Sun)
by ofranja (subscriber, #11084)
[Link]
The semicolon idea is not new - ML-derived languages have had that syntax for decades. Don't think of it as implicit behaviour, but as uniform behaviour: everything is an expression. A semicolon is just a shorthand for grouping expressions when you don't care about the result. And if you forget one, your code simply does not compile anymore.
I find it very simple, and easier to follow - especially in large code bases.
Posted Jun 8, 2013 8:15 UTC (Sat)
by mgedmin (subscriber, #34497)
[Link] (35 responses)
Posted Jun 8, 2013 10:22 UTC (Sat)
by lsl (subscriber, #86508)
[Link]
Posted Jun 9, 2013 12:15 UTC (Sun)
by tialaramex (subscriber, #21167)
[Link] (33 responses)
Unicode is fairly insistent that, although (for compatibility reasons) it provides two separate ways to "spell" the e-acute in café, these two spellings are equivalent and an equality test on the two should pass. For this purpose it provides UAX #15, which specifies four distinct normalisation methods, each of which results in equivalent strings becoming codepoint-identical.
If you don't do this normalisation step you can end up with a confusing situation where when the programmer types a symbol (in their text editor which happens to emit pre-combined characters) the toolchain can't match it to a visually and lexicographically identical character mentioned in another file which happened to be written with separate combining characters. This would obviously be very frustrating.
On the other hand, to completely fulfil Unicode's intentions either your language runtime or any binary you compile that does a string comparison needs to embed many kilobytes (perhaps megabytes) of Unicode tables in order to perform the normalisation steps correctly.
Posted Jun 9, 2013 12:32 UTC (Sun)
by mpr22 (subscriber, #60784)
[Link] (20 responses)
Posted Jun 10, 2013 0:23 UTC (Mon)
by dvdeug (guest, #10998)
[Link] (19 responses)
Posted Jun 11, 2013 9:56 UTC (Tue)
by khim (subscriber, #9252)
[Link] (18 responses)
Since the offer was "pick two" and you've decided to throw Unicode out, the solution is obvious. Sure, but it is a way to achieve case-insensitivity and interoperation between Turks and non-Turks.
Posted Jun 11, 2013 20:04 UTC (Tue)
by dvdeug (guest, #10998)
[Link] (17 responses)
Creating a new character set only achieves interoperation in a theoretical way, since nobody is using it. You've not thrown out just Unicode; you've thrown out any character set that has seen actual use for Turkish.
Even if you do and get everyone to use it, how much bad data is going to get created? Imagine a keyboard with 3 i keys; we'd get a bunch of data with the wrong i or wrong I. You've also created a whole new set of spoofing characters; Microsoft had better race to get Microsoft.com (with a Turkish i) as should everyone else with an i in their name.
Posted Jun 11, 2013 22:04 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (16 responses)
It would be another story if the dotless 'i' were the only unique letter in Turkish, but it's not. There are also: Ç, Ğ, I, İ, Ö, Ş, and Ü.
Posted Jun 12, 2013 5:12 UTC (Wed)
by dvdeug (guest, #10998)
[Link] (15 responses)
Even then, it doesn't work just fine. There's rules against registering mixed script domain names, and webbrowsers will display microsoft.com differently from mіcrosoft.com, because they detect that mixed script. Other places without that special code will provide no hint that the two aren't the same.
Having different characters with the same glyphs in the same script is even more problematic, because that special code won't work; there's no way a program could tell that microsoft.com (with a Turkish i) was a spoofing attempt.
Posted Jun 12, 2013 13:57 UTC (Wed)
by khim (subscriber, #9252)
[Link] (14 responses)
Posted Jun 12, 2013 19:01 UTC (Wed)
by dvdeug (guest, #10998)
[Link] (13 responses)
Posted Jun 14, 2013 21:40 UTC (Fri)
by khim (subscriber, #9252)
[Link] (12 responses)
Posted Jun 14, 2013 23:33 UTC (Fri)
by dvdeug (guest, #10998)
[Link] (11 responses)
Russian is written in the Cyrillic alphabet, unlike Turkish, which is written in the Latin alphabet. It's not written in the Latin alphabet by accident; it was changed from the Arabic alphabet in 1928 in an attempt to modernize the country and attach it politically and culturally to the successful West. Separating the Turkish alphabet from the Latin is not a neutral act, particularly when you don't do the same to the French or Romanian.
Posted Jun 15, 2013 14:19 UTC (Sat)
by khim (subscriber, #9252)
[Link] (10 responses)
Sure. But this is what Unicode is all about. Unicode didn't happen in one step. Early character encodings were... strange (from today's POV). Not just Russian computers; US-based computers, too (think EBCDIC and all these strange symbols used by APL). Eventually some groups of symbols were put together and some other symbols were separated. Not just Cyrillic, but Greek (a charset which is as closely related to Cyrillic as Turkish is related to Romanian), etc. Why are Telugu and Kannada separated but Chinese and Japanese Han characters merged? If we want to make upcase/lowercase functions locale-independent we can do with Turkish (French, Romanian, etc.) what was done with Telugu and Kannada.
Posted Jun 15, 2013 14:52 UTC (Sat)
by mpr22 (subscriber, #60784)
[Link]
Posted Jun 15, 2013 22:52 UTC (Sat)
by dvdeug (guest, #10998)
[Link] (6 responses)
If you don't care whether the Turks are going to use your character set, go ahead and tell them to use ASCII. If you choose to separate their alphabet from the Latin, you're going to have a problem: they consider their alphabet part of the extended Latin alphabet, and they're not going to find that an acceptable solution. If you choose to separate out the alphabets of thousands of languages (even though the English alphabet is a superset of the French and Latin), you might mollify the Turks, but nobody is going to use your character set.
In reality, Turkish support requires locale-sensitive casing functions because every other solution has serious technical and often political problems, as well as not being compatible with existing systems, including keyboards.
Posted Jun 16, 2013 3:30 UTC (Sun)
by hummassa (guest, #307)
[Link] (2 responses)
Let's be plain: there are no "casing functions" that are not locale-sensitive. The Turkish dotted "i"s are one example, the German vs. Austrian "ß" is another, etc. And don't get me started on collation order. If one is going to try to facilitate computations by separating each locale into its own alphabet, I wish good luck with this newnicode. The real Unicode thankfully does not work that way. Usually, at least. :-D
Posted Jun 16, 2013 8:21 UTC (Sun)
by khim (subscriber, #9252)
[Link] (1 responses)
Well, that's certainly a pity: Unicode was developed to fit in 16 bits and thus merged many scripts (it assumed language would be indicated "on the side" and/or would be less important than the glyphs themselves). It has failed (today there are over 90,000 glyphs in Unicode), yet as a result we cannot properly work with English+Turkish (or even German+Austrian) texts, as you've correctly pointed out. Today we are stuck: yes, it's not perfect and this decision certainly made life harder, not easier, but it'll be hard to replace it with anything else at this point. A similar story to QWERTY: the numerous problems which stem from that old decision are considered minor enough that it'll be hard to switch. But note that the most popular OS does exactly that for CJK. It's slowly but surely being replaced by Unicode-based OSes (such as Android), thus in the end Unicode is probably inevitable, but that does not mean you cannot achieve interoperability with Turkish people and working upcase/lowercase simultaneously. You can; it's Unicode that prevents that, nothing else.
Posted Jun 16, 2013 10:33 UTC (Sun)
by dvdeug (guest, #10998)
[Link]
"this decision certainly made life harder, not easier"
There's no "certainly" about it. To type "mv Das_Boot_German.avi Boata_filmoj" in your system you'd have to change keyboards several times: from whatever language mv is in, to German, to English, possibly to whatever language you count avi as, then to Esperanto. Right now, you can type that from any keyboard that supports the ISO standard 26-letter alphabet. You can't search a document for Bremen without knowing whether someone considered that a German word or an English word, and e = mc², originally written by a German speaker but understood worldwide, would get an arbitrary language tag. While there are some Cyrillic and Greek look-alikes for Latin-script words, you would explode that; "go" could be encoded any number of ways, and any non-English speaker would have to switch their keyboard to go to lwn.net or google.com or any other English-named site.
"note that the most popular OS does exactly that for CJK."
Note that the article you link to does not say Tron is the most popular OS, and that it does not do exactly that for CJK, because Chinese is not one language; it's a rather messy collection of languages. Tron forces Cantonese to be written in the same script as Mandarin and Jinyu. Note also that Tron treats Turkish exactly the same way Unicode does, as it's a copy of Unicode everywhere but the Han characters.
"You can - Unicode prevents that, nothing else."
If by Unicode you mean every character set ever used for Turkish (including Tron). I've never seen a fully worked out draft of a character set that fits your specifications. That's never really impressive, is it, when someone claims that something would be clearly easier yet it's never been tried.
Posted Jun 16, 2013 6:35 UTC (Sun)
by micka (subscriber, #38720)
[Link] (2 responses)
I suppose you mean "subset"? As in, the English alphabet is strictly included inside the French (without é, è, à, ...) and Latin alphabets (I see no difference)?
Posted Jun 16, 2013 9:17 UTC (Sun)
by dvdeug (guest, #10998)
[Link] (1 responses)
(If we're strictly speaking of the alphabet, neither of them count accents, so both French and English have the same 26 letters for the alphabet.)
Posted Jun 16, 2013 10:49 UTC (Sun)
by micka (subscriber, #38720)
[Link]
The Spanish alphabet is more consistently considered as having 27 letters, even though ñ could be considered an n with a diacritic. And in the past, even some combinations of letters (from the point of view of the Latin alphabet) were considered separate letters.
And I don't even talk about http://en.wikipedia.org/wiki/Alphabet_%28computer_science%29 (each diacritic variant would be considered a different letter).
Posted Jun 16, 2013 5:36 UTC (Sun)
by viro (subscriber, #7872)
[Link] (1 responses)
You can easily have a text in English with quoted sentences in French or in Turkish, using the same font. Try the same with e.g. Russian and Greek and see if you will be able to read the result[1].
[1] lowercase glyphs aside, (И, Н) and (Η, Ν) alone are enough to render the result unreadable (shift circa 16th century, IIRC; at some point both Eta and Nu counterparts got the slant of the middle strokes changed in the same way, turning 'Ν' into 'Н' and 'Η' into 'И')
Posted Jun 16, 2013 7:58 UTC (Sun)
by khim (subscriber, #9252)
[Link]
Of course you could. What's the problem? You'll probably be forced to read the Greek letter-by-letter, but an English-speaking person will mangle French or Turkish, too. It's not as if mere resemblance between letters of the alphabet matters in this case: English and French may use similarly looking characters, but they use them to encode radically different consonants, vowels and words. If you don't know which language is used you cannot read the word, period. Identical-looking words in French and Turkish will have radically different pronunciations and will be, in fact, different words.
Posted Jun 9, 2013 20:09 UTC (Sun)
by khim (subscriber, #9252)
[Link] (11 responses)
Which nobody uses in programming languages, for performance reasons. It's not as frustrating as you think. They don't type ı followed by ˙, they just type i. And the same with other cases. Any other approach is crazy. Why? Well, because many programming languages will show ı combined with ˙ as "ı˙", not as "ı̇". You may say that ı˙ is not the canonical representation of "i". Ok. "и" plus " ̆" is the canonical representation of "й". Try this for size:
$ cat test.c
#include <stdio.h>
int main() {
printf("%c%c%c%c == %c%c\n", 0xD0, 0xB8, 0xCC, 0x86, 0xD0, 0xB9);
}
$ gcc test.c -o test
$ ./test | tee test.txt
й == й
Not sure about you, but on my system these two symbols only look similar when copy-pasted into a browser - and then only in the main window (if I copy-paste them to the "location" line they suddenly look different!). And of course these two symbols are different in GNOME terminal, gEdit, Emacs and other tools! Thus, in the end you have two choices: either identifiers are compared byte-by-byte (and the two versions of "й" above are different identifiers), or the compiler normalizes everything behind the programmer's back.
Frankly I don't see the second alternative as superior.
Posted Jun 9, 2013 20:48 UTC (Sun)
by hummassa (guest, #307)
[Link] (7 responses)
Normalization (UAX-15): I use it. Perl offers NFC, NFD, NFKC, NFKD without a huge perceivable (to me) performance penalty. AFAICT MySQL uses it, too.
> It's not as frustrating as you think. They don't type ı followed by ˙, they just type i. And the same with other cases. Any other approach is crazy. Why? Well, because many programming languages will show ı combined with ˙ as "ı˙", not as "ı̇".
This silly example tells me you don't have diacritics in your name, do you? Sometimes the "ã" in my last name is on one of the Alt-Gr keys. Sometimes I have to enter it via vi digraphs, either as "a" + "~" or "~" + "a". Sometimes I click "a", "combine", "~" or "~", "combine", "a". Or "~" (it's combining in my current keyboard by default, so that if I want to type a tilde, I have to follow it with a space or type it twice) followed by "a".
> й == й
It seems to me that your system is misconfigured. I could not see the difference between "й" and "й" on my computer, be it in Chrome's main window, the location bar, gvim, or in yakuake's konsole window.
> Frankly I don't see second alternative as superior.
UAX15 is important. People sometimes type their names with or without diacritics (André versus Andre). Some names are in different databases with variant -- and database/time/platform dependent -- spellings. In some keyboards, a "ç" c-cedilla is a single character; in others, you punch first the cedilla dead key and then "c"; and in others you type, for instance, the acute dead key followed by "c" (it's the case on the keyboard I'm typing on right now). Sometimes you have to say your name over the phone and the person on the other side of the call must be capable of searching the database by the perceived name. Someone could have entered "fi" and another person is searching by "fi".
So, sometimes your "second alternative" is the only viable alternative. Anyway, the programming language should support "compare bytes" and "compare runes/characters" as two different use cases.
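The byte/character distinction asked for here is easy to see in Rust, one of the article's two languages, where &str equality is a byte comparison; here is a minimal sketch (the normalization step appears only as a comment, since it needs the external unicode-normalization crate rather than the standard library):
fn main() {
    let precomposed = "\u{0439}";        // "й" as a single code point
    let decomposed = "\u{0438}\u{0306}"; // "и" plus a combining breve
    // "Compare bytes": str equality compares UTF-8 bytes, so these differ.
    assert_ne!(precomposed, decomposed);
    // "Compare runes/characters": counting code points also differs.
    assert_eq!(precomposed.chars().count(), 1);
    assert_eq!(decomposed.chars().count(), 2);
    // Comparing "as the user perceives it" needs normalization, e.g.
    // with the unicode-normalization crate:
    //   use unicode_normalization::UnicodeNormalization;
    //   assert!(precomposed.chars().eq(decomposed.nfc()));
}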
Posted Jun 9, 2013 21:14 UTC (Sun)
by khim (subscriber, #9252)
[Link] (2 responses)
I may be mistaken, but it looks like you are discussing a completely different problem. Both tialaramex and I are talking about programming languages themselves. Really? Let me check:
$ cat test.pl
use utf8;
$й="This is test";
print "Combined version works: \"$й\"\n";
print "Decomposed version does not work: \"$й\"\n";
$ perl test.pl
Combined version works: "This is test"
Decomposed version does not work: ""
Am I missing something? What should I add to my program to make sure I can refer to $й as $й? Of course not! You've replaced all occurrences of "й" with "й" - of course there will be no difference! Not sure why you did that (perhaps your browser did it for you?), but if you do a "view source" on my message then you'll see a difference; if you do the same with your message, both cases are byte-to-byte identical. It would be a little strange to see different symbols in such a case. Sure. In databases, search systems and so on (where fuzzy matching is better than no matching) it's important. In programming languages? Not so much. Most of the time when a language tries to save programmers from themselves it just makes them miserable in the long (and even medium) term.
Posted Jun 10, 2013 16:12 UTC (Mon)
by jzbiciak (guest, #5246)
[Link]
Posted Jun 10, 2013 17:27 UTC (Mon)
by hummassa (guest, #307)
[Link]
You are right about this and I apologize for any confusion.
Posted Jun 9, 2013 23:38 UTC (Sun)
by wahern (subscriber, #37304)
[Link] (3 responses)
Using NFG solves all the low-level problems, including identifiers in source code, by getting rid of combining sequences altogether. Frankly I don't understand why it hasn't become more common. Maybe because most people just don't care about Unicode. Every individual has come to terms with the little issues with their locale. It's only when you look at all of them from 10,000 feet that you can see the cluster f*ck of problems. But few people look at it from 10,000 feet.
Posted Jun 11, 2013 1:07 UTC (Tue)
by dvdeug (guest, #10998)
[Link] (2 responses)
Posted Jun 13, 2013 1:19 UTC (Thu)
by wahern (subscriber, #37304)
[Link] (1 responses)
And it's not like existing systems don't have their own issues. The nice thing about NFG is that all the complexity is placed at the edges, in the I/O layers. All the other code, including the rapidly developed code that is usually poorly scrutinized for errors, is provided a much safer and more convenient interface for manipulation of I18N text. NFG isn't more complex to implement than any other system that provides absolute grapheme indexing. It's grapheme indexing that is the most intuitive, because it's the model everybody has been using for generations.
But most languages merely aim for half measures, and are content leaving applications to deal with all the corner cases. This is why UTF-8 is so popular. And it is the best solution when your goal is pushing all the complexity onto the application.
Posted Jun 14, 2013 0:22 UTC (Fri)
by dvdeug (guest, #10998)
[Link]
Grapheme indexing is not what everybody has been using for generations. In the 60 years of computing history, people working with scripts more complex than ASCII or Chinese have handled them in a number of ways, including character sets that explicitly encoded combining characters (like ISO/IEC 6937) and the use of BS with ASCII to overstrike characters like ^ with the previous character.
UTF-8 is so popular because for many purposes it's 1/4th the size of UTF-32, and for normal data never worse than 3/4 the size. And as long as you're mostly dealing with ASCII, you can generally ignore the differences. If people want UTF-32, it's easy to find.
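The size argument can be checked directly in Rust, where a &str is always UTF-8 and len() counts bytes; a small illustration with strings of my choosing:
fn main() {
    let ascii = "hello";
    let russian = "привет";
    // UTF-8: one byte per ASCII character, versus four in UTF-32.
    assert_eq!(ascii.len(), 5);
    assert_eq!(ascii.chars().count() * 4, 20);
    // Cyrillic costs two bytes per character in UTF-8: still half
    // the size of the equivalent UTF-32.
    assert_eq!(russian.len(), 12);
    assert_eq!(russian.chars().count() * 4, 24);
}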
Posted Jun 10, 2013 16:58 UTC (Mon)
by tialaramex (subscriber, #21167)
[Link] (2 responses)
You seem to be suffering some quite serious display problems with non-ASCII text on your system. I don't know what to suggest, other than maybe you can find someone to help figure out what you did wrong, or upgrade to something a bit more modern. I've seen glitches like those you describe, but mostly quite some years ago. Your example program displays two visually identical characters on my system; I can believe your system doesn't do this, only I would point out that it's /a bug/.
Even allowing for that, your last paragraph is hard to understand. Are you claiming that, because on your system some symbols are rendered incorrectly depending on how they were encoded, those symbols are lexicographically _different_, and everybody else (who can't see these erroneous display differences) should accept that?
Posted Jun 11, 2013 9:07 UTC (Tue)
by etienne (guest, #25256)
[Link] (1 responses)
It seems (some) people want to use a fixed-width font to write programs, mostly because some Quality Enhancement Program declared the TAB character obsolete, and the SPACE character's width is not constant in variable-width-font editors. Most software languages need indentation...
With non-ASCII chars in a fixed-width font, if you even get the char shape in the font you are using, the only solution is probably to start drawing each char every N (constant) pixels and have the end of large chars superimpose on the beginning of the next char...
Posted Jun 11, 2013 10:13 UTC (Tue)
by mpr22 (subscriber, #60784)
[Link]
I use a fixed-width font to write code chiefly out of pure inertia: most of my coding is done in text editors running in character-cell terminals. Code written in Inform 7 is an exception (the Inform 7 IDE's editor uses a proportional font by default, and the IDE is so well-adapted to the needs of typical Inform 7 programming that not using it is silly), but Inform 7 statements look like (somewhat stilted) English prose so I don't mind so much.
Posted Jun 8, 2013 13:19 UTC (Sat)
by bokr (subscriber, #58369)
[Link] (11 responses)
I am working on a language of my own (isn't everyone? ;-) that I have been thinking about, and would be interested in your reactions to what I am planning for comment and string syntax.
My language's comments follow either # or ##. The latter is the traditional rest-of-line comment prefix, and the single # only comments out the immediately following expression. This means e.g. you can comment out a string of whatever length, multi-line or not, and #( expression ).another_part(more) foo comments out from # to foo.
Which brings me to string syntax. I have a sort of purist gag reflex against non-nestability, e.g. re xml's <![CDATA[ ... ]]>, but even a little purist angst re python's practical ''' and """ ;-)
My strings are always quoted with single double-quote delimiters, but to make nested quoting work, I took a hint from MIME boundaries and here-docs, and optionally delimit strings like <identifier><double quote><content><double quote><identifier> e.g., foo"Can quote bar"the bar content"bar without a problem"foo so long as <double quote><identifier> does not occur in the content that was started by <identifier><double quote>.
Incidentally, this makes an easy way of commenting out blocks of code, using the single expression comment prefix #
#unique_ignore_delimiter"
... arbitrary stuff here
"unique_ignore_delimiter
BTW, the preceding usage is not line oriented; strings begin exactly after the prefixed delimiter and end with the postfixed delimiter. foo""foo is a zero length string just like _""_ and "".
String content is always raw, and whether to convert at read time or run time according to C escaping or something else is specified by a postfix notation whose exact syntax and semantics is for another time ;-)
Regards,
Bengt Richter
Posted Jun 8, 2013 13:44 UTC (Sat)
by cmrx64 (guest, #89304)
[Link] (4 responses)
Posted Jun 8, 2013 16:47 UTC (Sat)
by bokr (subscriber, #58369)
[Link] (3 responses)
(Naturally, I want my language to have good "important characteristics" as opposed to "little things". Hm, are there bad "important characteristics"? ;-)
Posted Jun 8, 2013 19:04 UTC (Sat)
by ncm (guest, #165)
[Link] (2 responses)
Posted Jun 9, 2013 1:01 UTC (Sun)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
As one who implements code in languages with such characteristics, I'd *rather* focus on those. Those are the things that are going to have me ripping my hair out for a week tracking down some simple bug. One particularly nasty one I had to track down in C++ recently involved classes that change size based on preprocessor defines typically given on the command line (such as NDEBUG). Not much helps you in this case until you notice that the 'this' pointer in the constructor of one of the members of a derived class (the size-shifting class was a member of the base class) with an inlined constructor is not the same as &this->member.
Posted Jun 10, 2013 23:50 UTC (Mon)
by ncm (guest, #165)
[Link]
Posted Jun 10, 2013 0:13 UTC (Mon)
by tjc (guest, #137)
[Link] (1 responses)
> The latter is the traditional rest-of-line comment prefix,
> and the single # only comments out the immediately following expression.
I think I would probably flip those around, since # is already widely used for line comments. You will do yourself no favors by breaking common conventions.
Posted Jun 11, 2013 2:00 UTC (Tue)
by bokr (subscriber, #58369)
[Link]
I will try that and see how it works out.
Posted Jun 10, 2013 20:46 UTC (Mon)
by kleptog (subscriber, #1183)
[Link] (3 responses)
However, for your string expressions, I'd suggest looking at perl's generalised quoting. It's well thought out and works really well. You don't necessarily need to include qx(), but qw() and qr// are IMHO useful ideas.
Posted Jun 12, 2013 15:55 UTC (Wed)
by bokr (subscriber, #58369)
[Link] (1 responses)
That made me wonder how much subconscious plagiarizing I am doing vs reinventing ;-/ Haven't played with perl since python became my most fluent pl, about the time I decided for fun to create a chomsky.py from chomsky.pl[1] ;-)
The qq/qr/qx/qw functionalities are certainly useful. I can do all those in various ways, including bootstrapping by defining in terms of my language's more primitive ops. The question is how built-in to make them, what to make optional import, and/or what to fiddle with via startup configuration and invocation options.
At this point I am trying to get the primitives right ;-)
Posted Jun 12, 2013 16:46 UTC (Wed)
by bokr (subscriber, #58369)
[Link]
[1] ... and I wrote a perl chomsky.pl based on the lisp original, appending the latter to the perl script as DATA, and scraping the good stuff without editing the original ;-) I thought I did python as well, but can't find it on this box. Let's see if google can find a copy of the lisp .. yup ..
http://www-personal.umich.edu/~jlawler/foggy.lsp
Anyone know who originally wrote it?
[sorry for getting a bit off topic]
Posted Jun 14, 2013 11:04 UTC (Fri)
by bokr (subscriber, #58369)
[Link]
The token gets saved along with source line and char position, as a kind of source-anchor token, if the expression is a name or string, so in that case it can be available for output at both compile time and run time, e.g.
##"speed m/s"
I anticipate debugging use something like (switching as suggested to ## for this comment prefix)
speed ##meters $(dist) ##seconds time # this ##is all just #-comment to eol
which e.g. might fail because time instead of $time or $(time) does not produce a number, or at the syntax error if speed can handle a name instead of a number:
;{; # example syntax error if never closed with }
The inter-expression position allows error messages to use the anchors to locate errors more precisely even if full tracebacks are not available, and hopefully can also locate the last anchor passed for syntax errors, e.g. in case of a runaway bracket or quote.
This is preliminary musing ;-)
Posted Jun 8, 2013 14:07 UTC (Sat)
by jhhaller (guest, #56103)
[Link] (3 responses)
Posted Jun 8, 2013 14:27 UTC (Sat)
by hummassa (guest, #307)
[Link]
Posted Jun 13, 2013 5:17 UTC (Thu)
by tnoo (subscriber, #20427)
[Link] (1 responses)
Microsoft Excel excels at that. Which is a complete nightmare: opening a German spreadsheet in an English version of Excel.
Posted Jun 13, 2013 10:53 UTC (Thu)
by storner (subscriber, #119)
[Link]
COBOL had
ENVIRONMENT DIVISION.
DECIMAL-POINT IS COMMA.
:-D
Posted Jun 8, 2013 16:21 UTC (Sat)
by mpr22 (subscriber, #60784)
[Link] (2 responses)
Curiously, I find Python's indentation-based block structure quite annoying in Python, but not at all annoying in Inform 7. (Probably because Inform 7 doesn't have a REPL, which is where Python's block structure system manages to fail "no sharp edges".)
Posted Jun 8, 2013 21:56 UTC (Sat)
by Tobu (subscriber, #24111)
[Link] (1 responses)
IPython has a %paste function that deals with this intelligently. It would be even better if it just did it by default, by just looking at how indented the first line is. For more complicated editing, like combining several fragments, you can go with %edit. An inline editor would be great, but that might be considered feature creep, and the IPython notebook provides sort-of the same thing.
Posted Jun 8, 2013 22:01 UTC (Sat)
by Tobu (subscriber, #24111)
[Link]
Anyway, copy-pasting into the interpreter is annoying because you have to retrace the initialisations that make the block of code work. It's far more practical to insert
import IPython; IPython.embed()
, switch to prototyping, and paste back into your editor.
Posted Jun 8, 2013 18:51 UTC (Sat)
by ncm (guest, #165)
[Link] (16 responses)
A new industrial language catches on so infrequently that it's almost tragic when opportunities for real syntactic improvements are passed up, almost as much so as when features inherited from earlier languages are misunderstood and corrupted. One such opportunity was almost touched on in the article. In western languages, flyspecks such as commas and semicolons are put at the end of a sequence, but they really introduce what comes after. Programming practice mimics this usage, but the usage interacts poorly with revision management systems that present a text line as the unit of change. Python elegantly sidesteps the problem at some cost(*). Go institutionalizes it. In C++, we sometimes see
Ob::Ob
  ( int a
  , int b
  , int c
  )
  : _x( a + b - c)
  , _y( a - b + c)
  , _z(-a + b + c)
{}

enum T
  { T1
  , T2
  , T3
  };
which while practical can be jarring.
The missed opportunity is to prefer markers that do not look odd preceding each item, so that lines can be added at top, bottom, or the middle with no confusing diffs resulting. Regular punctuation does not offer many alternatives, but ":", "*", "+" and "|" have worked well in various contexts. Usually, though, such preceders have been chosen to be deliberately jarring, as in assembly languages and TROFF that use "." to mark meta-directives.
Another common missed opportunity is to eliminate the preceding "*" pointer dereference operator. Pascal's postfix "^" was extremely practical, perhaps the only real virtue in the language. It fell away along with Pascal. In C-influenced languages "^" is too useful for other roles, but "@" would serve admirably. Curiously "@" is rarely used in programming languages, and remains eminently available for such a use in C++1x. "@" as both a unary postfix operator and as a binary array or map indexing operator would free up "[]" brackets for much better uses.
(*) The cost to Python users is that mis-indented lines often cannot be recognized as such. When cutting and pasting code into different contexts, finding the right indentation for each fragment is tedious and (therefore) error-prone.
Posted Jun 8, 2013 21:18 UTC (Sat)
by eru (subscriber, #2753)
[Link]
I would say another feature that should be borrowed from Pascal and its relatives is the declaration grammar that allows unambiguous parsing using simple techniques and without requiring feedback from the symbol table. The way C treats typedef names and C++ classes complicates the compiler, and also makes diagnostics worse: C and C++ compilers really cannot tell bare syntax errors apart from missing or mis-spelled declarations.
Posted Jun 10, 2013 0:32 UTC (Mon)
by tjc (guest, #137)
[Link] (8 responses)
I agree. The Unix signal function declaration, for example, would look a lot nicer with a postfix pointer declarator. But having a corresponding postfix indirection operator causes other problems, most notably with type casts, since postfix operators have higher precedence than prefix operators. You end up with something that looks like this:
((T@)p)@ // where 'p' is a pointer and 'T' is a type
Unless you make type casts postfix as well, but that looks even more unfamiliar:
p(T@)@
It might be best to break the rules and have a postfix pointer declarator while retaining a prefix indirection operator, like this:
*(T^)p
That looks more "normal" to me.
Posted Jun 10, 2013 13:35 UTC (Mon)
by renox (guest, #23785)
[Link] (7 responses)
Casts have an AWFUL notation in C-like languages: let's make a very *dangerous* operation non-greppable, yeah, fun!
That said, your issue isn't too difficult to fix once you realize that a cast is in fact a two-parameter operation, the type name and the object, so this syntax fixes your issue, I think: cast(T@,p)
or better(?) in C++-like notation: cast<T@>(p)
IMHO this is a better way to solve the issue.
Posted Jun 10, 2013 16:06 UTC (Mon)
by tjc (guest, #137)
[Link] (6 responses)
Thanks for the suggestions. I don't like the "angle brackets" in the second form, since these lexemes are already commonly overloaded as operators: <, <=, <<, etc. And I'm not crazy about the comma in the first form. It makes it look like a function call, but a cast is not much like a function call. A function call has operands that are at the same lexical level, but a cast has one operand that acts on the other. I think the syntax should reflect this difference in semantics; ':' might be a better separator. One alternative is cast(p T@), since 'p' is an identifier and never contains spaces. It reads nicely too: "cast p to whatever." If the first operand is a complex expression, then things get messy again, and extra parentheses are required. But at least the common case is clean. There's always something. :)
Posted Jun 10, 2013 22:05 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link] (5 responses)
Not to mention that if T is something like std::map<K, T>, the comma is ambiguous. Related: GCC bug #35 (or so), where this is a parse error:
> void foo(std::map<std::string, int> const& map = std::map<std::string, int>())
because "int>()" is not a valid parameter declaration. You need parentheses (or a typedef) to get GCC to accept it.
Posted Jun 14, 2013 10:53 UTC (Fri)
by pjm (guest, #2080)
[Link] (4 responses)
Posted Jun 14, 2013 11:59 UTC (Fri)
by mpr22 (subscriber, #60784)
[Link] (3 responses)
Posted Jun 16, 2013 3:20 UTC (Sun)
by tjc (guest, #137)
[Link] (2 responses)
Can you expand on that? So far I'm not convinced.
Posted Jun 16, 2013 3:31 UTC (Sun)
by hummassa (guest, #307)
[Link] (1 responses)
Because you couldn't know by glancing if
a[3]
is an array dereference or a template instantiation.
Posted Jun 16, 2013 12:19 UTC (Sun)
by pjm (guest, #2080)
[Link]
Regardless of whether [] would be easier for the machine to parse, the decision to allow values (rather than just types) as template parameters means that [] would make life harder for humans trying to parse the code.
Posted Jun 10, 2013 8:11 UTC (Mon)
by jezuch (subscriber, #52988)
[Link] (1 responses)
I've seen it in some projects and I agree it's ugly as heck, even though I understand the intention behind it. In Java the parser allows "extra" commas after the last element in some places like array initializers and enum declarations. I'm not sure if this was an accidental omission or intentional, but it's quite useful, e.g.:
Object[] arr = new Object[] {
    obj1,
    obj2,
    obj3,
};

enum Test {
    TEST1,
    TEST2,
    TEST3,
    ;
}
But it's not allowed in parameter lists, alas.
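For comparison, Rust, one of the two languages the article examines, takes the permissive route and accepts trailing commas essentially everywhere, including argument lists; a quick sketch:
fn main() {
    // Trailing comma in a collection literal, as in Perl or Java arrays:
    let v = vec![
        "one",
        "two",
        "three",
    ];
    // ...and in an argument list, which Java does not allow:
    println!(
        "{} {} {}",
        v[0],
        v[1],
        v[2],
    );
}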
Posted Jun 10, 2013 11:03 UTC (Mon)
by sorpigal (guest, #36106)
[Link]
This is one of those little conveniences that I like about Perl: trailing commas are ignored (more or less), so you can say
my %map = (
    'one' => 1,
    'two' => 2,
    'three' => 3,
);
or in an argument list
sub bar {
    return join(',', @_);
}
print bar(
    1,
    2,
    3,
);
This is worth it even if only because the addition of a line to the map creates a one-line diff and not a two-line one.
Posted Jun 11, 2013 3:14 UTC (Tue)
by ceswiedler (guest, #24638)
[Link] (3 responses)
Declare a pointer: Foo^ (Foo plus a pointy thing)
Dereference an address: @Foo (what's at Foo?)
Take an address: &Foo (address-of Foo -- I guess I'm just used to this one)
Posted Jun 11, 2013 14:34 UTC (Tue)
by tjc (guest, #137)
[Link] (2 responses)
Yeah, I like that too. There's no requirement that a postfix declarator has to have a matching postfix operator.
Another thing that might work well is implicit indirection, similar to Algol 68, but with more familiar syntax. That would result in a lot of
if (&node == &head)
to suppress indirection in some cases, but the common case would be clean. The problem is, one would have to write a compiler and then write a lot of code to see how well this works in practice.
Posted Jun 11, 2013 15:17 UTC (Tue)
by viro (subscriber, #7872)
[Link] (1 responses)
The trouble with that approach is the shitload of hard-to-spot bugs that happen when the programmer's idea of how an expression will be interpreted differs from what the Revised Report says (not to mention the places where the compiler's idea of how it should be interpreted differs from either). And the rules are appallingly convoluted, exactly because the language tries hard to DWIM, with the usual nastiness following from that.
C is actually at a sweet spot between A68-level opaque attempts at DWIM (6 kinds of contexts, etc.) and things like BLISS, where you have to spell *all* dereferences out: i = j + 1 is spelled i = .j + 1 (and yes, they went and used . for the dereference operator, leading to no end of joy when trying to RTFS, especially when it's a lineprinter-produced listing).
Posted Jun 11, 2013 17:38 UTC (Tue)
by tjc (guest, #137)
[Link]
I'm not an expert on Algol 68 (Adriaan van Wijngaarden was probably the first, one of few, and the last), but I think implicit indirection only worked in the language because it restricted the things you could do with pointers. Something like
*p--
in C, for example: I don't know how that could be expressed without an explicit indirection operator.
Posted Jun 9, 2013 15:33 UTC (Sun)
by deepfire (guest, #26138)
[Link] (2 responses)
Really, we should read more of http://www.lambda-the-ultimate.org/
Posted Jun 10, 2013 0:40 UTC (Mon)
by dvdeug (guest, #10998)
[Link] (1 responses)
Posted Jun 10, 2013 9:46 UTC (Mon)
by eru (subscriber, #2753)
[Link]
Back around 1980, before I really knew anything about programming languages, I recall coming across an advert by IBM in some magazine (probably Scientific American) where it highlighted its research. It quoted one IBM researcher as saying something like "programming language design is like designing traffic signs: the meaning must be clear". For some reason that stuck in my mind. A pretty good insight for an advertisement.
Posted Jun 10, 2013 8:05 UTC (Mon)
by grahame (guest, #5823)
[Link]
Exactly. Language design is not only a computer-science problem, it is also an ergonomic problem.
Posted Jun 10, 2013 13:13 UTC (Mon)
by etienne (guest, #25256)
[Link] (13 responses)
With a preprocessor, you can:
- add/modify source lines to be executed, like printf()
- increase the size of an array to add testable cases
- create variables to help problem finding (and pass them to sub-functions)
- check some state for integrity (conditional commenting)
All that while keeping the same source code in your source control system. I know people who would love it in VHDL, because managing sub-branches or un-commenting a lot of non-consecutive lines just to debug is very complex.
Also, using digit separators is really needed when dealing with 64-bit numbers: 1,152,921,504,606,785,809 - 2,921,504,000,000,000 != 1,474,154,769 (obvious truncation to 32 bits, not enough commas); it should be 1,150,000,000,606,785,809. Digit separators are used every 4 digits in hexadecimal; I am not sure it should be the same "digit separator" as for decimal.
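Rust already does roughly this, using _ as the separator, which sidesteps the decimal-comma question entirely; a small sketch reusing the numbers from the comment:
fn main() {
    // Underscores may be sprinkled into any numeric literal:
    let a: u64 = 1_152_921_504_606_785_809;
    let b: u64 = 2_921_504_000_000_000;
    assert_eq!(a - b, 1_150_000_000_606_785_809);
    // Hexadecimal literals are conventionally grouped every four digits:
    let mask: u64 = 0xFFFF_0000_FFFF_0000;
    assert_eq!(mask.count_ones(), 32);
}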
Posted Jun 10, 2013 13:31 UTC (Mon)
by micka (subscriber, #38720)
[Link]
OK, I agree with that, but please, not the comma; it renders numbers unreadable for the part of the world's population that uses the comma as the decimal separator.
Posted Jun 10, 2013 14:08 UTC (Mon)
by oever (guest, #987)
[Link] (11 responses)
Using preprocessor macros in C++ is strongly discouraged by Stroustrup. He says macros should only be used for include guards, something for which C++ has no other mechanism. For other cases one can use constexpr, templates and so on.
Macros allow any word (int, void, static, etc.) in the code to be redefined, which makes parsing the code impossible without knowing the macro definitions. I'd hate to see preprocessor use become more common.
Posted Jun 10, 2013 22:00 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link]
I will grant that there are times and situations where using the preprocessor is ugly and unnecessary, but that does not mean that it is always a worse solution.
Posted Jun 11, 2013 17:22 UTC (Tue)
by cesarb (subscriber, #6266)
[Link] (3 responses)
The C++ standard does not have it, but every relevant implementation (even MSVC) has it: #pragma once (https://en.wikipedia.org/wiki/pragma_once).
Posted Jun 12, 2013 14:16 UTC (Wed)
by khim (subscriber, #9252)
[Link] (2 responses)
MSVC actually introduced it... and it does not work. It only works if you only ever have one project, never copy headers around, and thus never have two versions of the same header. In practice GCC will actually compare the files, which will generate many nice debugging hours if you use a VCS (which tends to mess with the dates of files). Now it works:
$ mkdir lib
$ echo $'#pragma once\nint a;' > lib/test.h
$ mkdir installed
$ cp -a lib/test.h installed/test.h
$ echo $'#include "lib/test.h"\n#include "installed/test.h"' > test.c
$ gcc -E test.c -I. -o-
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test.c"
# 1 "lib/test.h" 1
int a;
# 2 "test.c" 2
And now it does not:
$ touch installed/test.h
$ gcc -E test.c -I. -o-
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test.c"
# 1 "lib/test.h" 1
int a;
# 2 "test.c" 2
# 1 "installed/test.h" 1
int a;
# 2 "test.c" 2
Please, don't use #pragma once - it's not worth it.
Posted Jun 12, 2013 19:48 UTC (Wed)
by dvdeug (guest, #10998)
[Link] (1 responses)
Posted Jun 14, 2013 21:47 UTC (Fri)
by khim (subscriber, #9252)
[Link]
Why would you fail? If newer versions of components are backward-compatible (and they should be backward-compatible if they are separate components) then you just need to copy the headers in the proper order... which happens automatically: first the updated header is in the new component itself (and its headers are always included before headers from other components), then you update the next component in the dependency DAG, etc.
Posted Jun 12, 2013 11:39 UTC (Wed)
by etienne (guest, #25256)
[Link] (5 responses)
C++ (without CPP) has no way to self-reference names; I mean:
printf ("Entering %s\n", __FUNCTION__);
C++ (without CPP) has no way to print/read each of the fields of a struct, the only (dirty) way is:
#undef FIELD_DEF
#define FIELD_LIST() \
    FIELD_DEF(char, fieldname1, "0x%X", "%hhx") \
    FIELD_DEF(unsigned, fieldname2, "0x%X", "%i")

struct mystruct {
#define FIELD_DEF(type, name, howto_print, howto_scan) type name;
FIELD_LIST()
#undef FIELD_DEF
};

void printstruct(const struct mystruct *str)
{
#define FIELD_DEF(type, name, howto_print, howto_scan) \
    printf (#name howto_print "\n", str->name);
FIELD_LIST()
#undef FIELD_DEF
}

int scanstruct(struct mystruct *str, char *inputline)
{
static const char *scanf_format =
#define FIELD_DEF(type, name, howto_print, howto_scan) #name " " howto_scan " "
FIELD_LIST();
#undef FIELD_DEF
static const int scanf_nb = 0
#define FIELD_DEF(type, name, howto_print, howto_scan) + 1
FIELD_LIST();
#undef FIELD_DEF
return scanf_nb == sscanf(inputline, scanf_format,
#define FIELD_DEF(type, name, howto_print, howto_scan) &str->name,
FIELD_LIST()
#undef FIELD_DEF
NULL); /* the trailing NULL absorbs the trailing comma from the last field */
}
C++ (without CPP) has no way to conditionally comment part of the code at compilation time (make DEBUG=1 or gcc -DDEBUG=1) so that the exact same file is kept in your source management system (no special tree for debug) (obviously for methodologies which do allow bugs to enter the source management system; others don't need special stuff, as bugs are fully denied).
C++ (without CPP) cannot manage a simple special exception like a new field in a (memory mapped) structure only when generating for this special hardware.
C++ (without CPP) does not have automatic tools to remove a "conditional comment" from source code like "man unifdef".
Posted Jun 12, 2013 14:34 UTC (Wed)
by khim (subscriber, #9252)
[Link] (1 responses)
> C++ (without CPP) has no way to self-reference names
Works fine here:
$ cat test.cc
#include <stdio.h>
int main() {
    printf ("Entering %s\n", __func__);
}
$ gcc test.cc -o test
$ ./test
Entering main
The other examples look like a classic case for boost::serialization or other metaprogramming tricks, except for the requirement to use the preprocessor without the preprocessor. I mean: you can't use the -D directive, which is preprocessor-specific... well, duh, that's a directive for CPP, not for the compiler! With C++ you implement your special cases as template specializations and then just construct the correct version you actually need from the main program. make DEBUG=1 works while gcc -DDEBUG=1, of course, does not. If anything your examples support Stroustrup's position, not contradict it. The fact that most languages out there work just fine without a CPP (even low-level ones used to interact with hardware and write standalone OSes!) says something, after all.
Posted Jun 12, 2013 16:21 UTC (Wed)
by nybble41 (subscriber, #55106)
[Link]
In this case, however, the original sample would actually work, because __FUNCTION__ (and __func__) are handled by the compiler rather than CPP. The preprocessor doesn't parse the code, and consequently doesn't have any idea what the current function's name is. The __FILE__ and __LINE__ macros would be an entirely different matter.
Posted Jun 12, 2013 19:01 UTC (Wed)
by daglwn (guest, #65432)
[Link] (1 responses)
True. __LINE__ is one of the few reasons I use the preprocessor.
> C++ (without CPP) has no way to print/read each of the fields of a struct,
> the only (dirty) way is: ...
Wow, that's totally unreadable. Lots of people want introspection and I think we'll get it soon in C++.
> C++ (without CPP) has no way to conditionally comment part of the code at
> compilation time (make DEBUG=1 or gcc -DDEBUG=1) so that the exact same
> file is kept in your source management system (no special tree for debug).
Yes, but not quite in the way you think. I prefer:
#ifdef DEBUG
const int debugEnabled = true;
#else
const int debugEnabled = false;
#endif
...
if (debugEnabled) { ... }
Using the preprocessor to hide code has bitten me so many times (different results with DEBUG on/off, etc.) that I just don't want to do it anymore.
> C++ (without CPP) cannot manage a simple special exception like a new
> field in a (memory mapped) structure only when generating for this
> special hardware.
Not true. Template metaprogramming.
Ok, you might need one #define TARGET, but that's it.
> C++ (without CPP) does not have automatic tools to remove a "conditional
> comment" from source code like "man unifdef"
I don't have that tool and can't imagine what I'd need it for. Can you give an example?
Posted Jun 14, 2013 11:47 UTC (Fri)
by etienne (guest, #25256)
[Link]
In those few cases, what I wanted was more a kind of simple database (30 elements with 10 properties), not complex introspection.
> > C++ (without CPP) does not have automatic tools to remove a "conditional
> > comment" from source code like "man unifdef"
If you manage big and complex software with a lifespan of decades, you will have some code which is no longer valid because the hardware it supported is no longer in use.
Posted Jun 13, 2013 0:01 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
#define mystruct_members(call) \
    call(int, field1) call(char, field2) /* field names here are illustrative */
#define declare_member(type, name) type name;
struct mystruct { mystruct_members(declare_member) };
If you use this extensively enough, declare_member and such could be factored out into a separate header so that the same expansion for FIELD_DEF isn't used dozens of times.
Maybe it doesn't work on older compilers (passing macro names as arguments and all), but I don't see much of a reason to not use this pattern if it's available and I haven't run into a compiler that hasn't supported it where I've used it so far (granted, that includes MSVC, newer GCC, and LLVM for the projects which use this).
[1] Because C++ has a couple of different contexts in which things like this can be expanded, the actual meta-macro takes a "ctx" parameter as well, which is then used as: "BEG(ctx) call(...) SEP(ctx) call(...) SEP(ctx) call(...) END(ctx)" so that stray semicolons are avoided and the macro can be expanded as part of an initializer list or argument list if needed.
Posted Jun 12, 2013 7:58 UTC (Wed)
by baberuth (guest, #15655)
[Link]
It seems most new languages take a revolutionary, rather than an evolutionary, approach. This might indicate the maturity level of the ecosystem as a whole.
The C2 language tries to be an evolutionary step of C, instead of designing a completely new language. Many of its syntax decisions are still open and will be determined by online polls among programmers.
Posted Jun 13, 2013 13:28 UTC (Thu)
by NRArnot (subscriber, #3033)
[Link] (12 responses)
I find the thought of Unicode identifiers horrifying. It's hard enough trying to read code written by a programmer whose main (natural) language is not yours, and whose variable names therefore convey rather fewer hints of meaning to you than they might. But at least they are strings drawn from the 63 or so glyphs familiar to all of today's programmers. Trying to recognise strings of unfamiliar glyphs from an "alphabet" of 60,000 or more patterns, most of which one has never seen before, would be, to me, rather harder than parsing machine code dumped in hexadecimal.
Of course an Arabic-world programmer (say) might wish to write his variable names in Arabic, but in that case isn't the logical progression also to replace the language's reserved words with Arabic equivalents and (of course) to switch left and right on the page or screen? This would fragment programming the same way multiple natural languages fragment human discourse. Programming wasn't fragmented to start with, so shouldn't we keep it that way?
Posted Jun 13, 2013 14:47 UTC (Thu)
by renox (guest, #23785)
[Link]
Posted Jun 20, 2013 13:08 UTC (Thu)
by Otus (subscriber, #67685)
[Link] (10 responses)
I used to think Python's significant whitespace was awful, but since using it more I find it actually rather pleasant to work with. However, I now find I hate the colon. Since the indent already tells you where a block starts, why is it needed? Google tells me it's because "explicit is better than implicit", but in that case why is the *end* of a block implicit? Makes no sense to me...
Posted Jun 20, 2013 21:32 UTC (Thu)
by neilbrown (subscriber, #359)
[Link] (8 responses)
I don't think a newline-followed-by-indent clearly marks the start of a block so well: an indented line can just as easily be the continuation of a condition containing an unclosed bracket. So the requirement of a colon causes unbalanced brackets in the condition to be easily detected; without it the compiler might not notice until much later.
A "dedent" (I think that is what Python calls the opposite of an indent), on the other hand, always clearly marks the end of something. There is no uncertainty, so no need for extra syntax.
Posted Dec 28, 2014 8:29 UTC (Sun)
by maryjmcdermott57 (guest, #100380)
[Link] (7 responses)
First of all, I want to say thank you for your article. It answers a question of mine (in the section "semicolons & expressions"): whether one should add ; at the end of a whole if/else expression, since it is also an expression. I asked that question in many forums, but people either paid no attention to it or thought it was a silly question.
But your answer leads to more questions.
As before, if the inside of the if/else expression (i.e. a function call) returns the unit type (), then we don't need to add ; at the end of either each internal function call or the whole if/else expression.
So what about function calls outside of an if/else expression?
fn main() {
    println!("hello")
    println!("world")
}
When I compile this code, the compiler raises an error. Why does that happen, when println!() returns the unit type ()? By the assumption above, this code should work.
And can I ask you one more question? Is a function declaration an expression or not? I ask because I don't see a ; at the end of the } of a function declaration. I mean, if it's an expression, shouldn't it look like this:
fn foo() {
    // ...
};
Posted Dec 28, 2014 9:15 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Here you have a 'fn main()' returning the result of 'println!' invocation (a macro).
Posted Dec 28, 2014 13:16 UTC (Sun)
by maryjmcdermott57 (guest, #100380)
[Link]
But what I want to know is why, although both println!("hello") and println!("world") return the unit type (), we still need to separate them with ;.
Because, as you saw in the article, if the return value inside the if/else is the unit type (), it doesn't need a ; at the end of the whole if/else expression. Otherwise we have to add one.
Posted Dec 29, 2014 9:11 UTC (Mon)
by jem (subscriber, #24231)
[Link] (4 responses)
"As you said, I assume that in Rust, if the expression return the unit-() type then we wouldn't require add ; at the end of that expression." No, it's the other way around: if you add a semicolon at the end of an expression it turns the expression into a statement. Doing this throws away the value of the expression and returns () instead. Rust does not allow an expression to follow another expression, you will have to turn all but the last consecutive expression into statements by inserting semicolons after them. This means that you are only interested in the side effects of all but the last expressions, and return the value of the last expression. If you wish, you can put a semicolon after the last expression too, if the value of the last expression is not useful.
Posted Dec 29, 2014 10:26 UTC (Mon)
by maryjmcdermott57 (guest, #100380)
[Link] (3 responses)
But it also confuses me more, so can you explain this code a little bit further?
if condition {
    foo()
} else {
    bar()
}
With this code above, the if/else is not the last expression. And if both functions don't return a value, then we don't need to add ; at the end of the } of the whole if/else. That makes sense (as the author said, the language doesn't require that ; if the whole if/else doesn't have a value). With that knowledge, why do we have to add ; at the end of println!("hello") here, when println!("hello") also doesn't return any value?
fn main() {
    println!("hello");
    println!("world")
}
And if you can, please answer me one more question. What about other blocks like match, loop, struct, and function declarations? Are these also expressions as a whole or not? I ask because I have seen function declarations next to each other without ; between them, like this:
fn foo() {
}
fn bar() {
}
If they are expressions, we'd have to separate them with ; for the code to compile correctly, right?
Please bear with me if my questions are annoying. I'm very confused about all this, and when I asked in other places I either got no good answer or the question was just ignored and the topic voted closed.
Posted Dec 29, 2014 10:51 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
You need ';' because you can't just combine two expressions together. What operation should be used for that?
(never mind the fact that units can't be used in _any_ operation)
Posted Dec 29, 2014 11:21 UTC (Mon)
by maryjmcdermott57 (guest, #100380)
[Link]
Posted Dec 29, 2014 12:50 UTC (Mon)
by jem (subscriber, #24231)
[Link]
"With this code above, the if/else is not last expression. And if both function don't return value so we don't need add ; at the end of } of whole if/else. That makes sense (as author said that the language doesn't require that ; if whole if/else doesn't have value)." You don't put a semicolon at the end of } of the whole if/else. This has nothing to do with whether the if should return a value or not – you never put a semicolon there. The if/else returns a value if the last expressions in the if and else branches do not end with a semicolon. "What about other blocks like match, loop, struct, function declaration? These are also expression as whole or not." Just like in the if/else case, you don't put a semicolon there. As with the if case, this does not mean these constructs have a value. Rust (mostly) borrows this syntax from C. I think the best way of thinking about this is not to try to minimize the amount of semicolons. Do not focus all the time on whether you can leave out a semicolon, but instead think "do I want to return the value of the last expression in this block as the value of the block?". If that is the case, then you should leave out the semicolon.
Posted Jun 21, 2013 8:41 UTC (Fri)
by renox (guest, #23785)
[Link]
Bah, this is very selectively applied in Python; for example, other languages distinguish variable declaration from variable assignment, which Python leaves implicit.
Posted Jun 26, 2013 13:30 UTC (Wed)
by nye (subscriber, #51576)
[Link]
Posted Sep 30, 2014 7:47 UTC (Tue)
by stefanct (guest, #89200)
[Link] (1 responses)
Posted Sep 30, 2014 14:49 UTC (Tue)
by jwakely (subscriber, #60262)
[Link]
Perl numeric constants
Several old languages allowed such "noise characters" to be inserted arbitrarily for supposed readability. For example, in PL/M you can insert a $ into identifiers and numbers, and it is ignored (100$000 = 100000, FO$O = FOO). I have always wondered why the designers of the language picked $, which looks a lot like a letter.
Perl numeric constants
I've worked on a variant of Pascal that allowed underscores freely in identifiers, and the underscores were ignored. So, similar to what you describe except not usable in numbers.
Language UNparseability
I'm with you on the octal thing. Does modern GCC have a -Woctal that warns when you use 0octal? If so, I think I'll make it part of my standard compiler warning flag set.
Language UNparseability
The way the instructions are shown on screen is not nearly as important as the conceptual cleanness of the language and the ability to automatically reason about the software.
Language UNparseability
If C++ is so awesome, why is everybody writing apps in JavaScript? And yet, the very thing we work on most, the source code, is not easy to automate at all. C and C++ rank high in the list of offenders, because combinations of macros and includes make it impossible to parse a source code file without knowing the include directories.
Language UNparseability
Most language design centers on silly things like the serialization, the syntactic sugar.
Little things that matter in language design
As long as the umask-style functions take symbols/strings and refuse integers, there's not much point in bothering with octal. Less syntax, and one source of bugs removed.
Little things that matter in language design
It would also be wonderful if compilers could track back with error messages. For example, a missing } somewhere in the middle of the program will usually only throw an error on the last line of the file. It would be far more helpful to report the line number of the opening { that wasn't ever closed.
Little things that matter in language design
> It would also be wonderful if compilers could track back with
> error messages. For example, a missing } somewhere in the middle
> of the program will usually only throw an error on the last
> line of the file. It would be far more helpful to report the
> line number of the opening { that wasn't ever closed.
Error reporting is difficult with bottom-up shift-reduce parsing. Such parsers trace a rightmost derivation, which is why line numbers are sometimes misreported. LALR parsers generated by Bison have this problem.
Little things that matter in language design
I wonder if Lisp programmers can read their own code a month later... And finding a misplaced ) in Lisp code is a masochist's job.
Little things that matter in language design
... and INTERCAL's "PLEASE" keyword!
MUMPS's whitespace
Arguments to functions use exactly one space between them (two spaces passes as the second argument).
Little things that matter in language design
> And finding a misplaced ) in Lisp code is a masochist's job.
It's rather an editor's job. Lisp does not have a program syntax, it has a read syntax: programs are just "evaluated" standard data structures (lists, usually). The read syntax is simple enough to be amenable to a lot of automated processing. Emacs has something like M-x check-parens RET to find file-wide problems, but paren matching and indentation is also quite helpful. Even non-Lisp-aware editors like vi at least offer paren matching via %.
Now this is a language design choice: using macros is so much more dependable, powerful and coherent than with C/C++ that it is not funny. Evaluating a macro call means taking the unevaluated arguments, calling the macro on them, and evaluating the result. Evaluating a function call means evaluating the arguments and calling the function on them. Orthogonal, straightforward, powerful. There is no technical "parser" barrier between input and code. Instead, there is a cultural barrier between code and programmer, as humans are used to plenty of punctuation and semi-graphical representations (one of the reasons people prefer mathematical formulas in composed form rather than computer-language versions of them).
It is a tradeoff at a language-conceptual level. It's not actually giving in to the machine (programming in assembly or machine language is that) but rather finding a common expressive ground easily manipulated by programs themselves. Making it human-accessible involves optical patterning via programming styles and proper indentation. That's not the same as punctuation, but then punctuation without proper indentation does not really work all too well either, and the worst case for programs generated by programs is meaning-carrying whitespace like in Python: you can't just write the program elements without knowing the indentation context and then run an indenter on the result if you need nice human readability.
Lisp/Scheme is a superb environment for writing code that generates and/or analyzes code, because programs are not represented by a grammar but rather directly by their parse tree, which has a computer- and tolerably human-readable and -writeable representation.
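The macro/function split described here is not unique to Lisp; Rust's macros, in the article's context, receive unevaluated token trees in much the same spirit. A loose sketch:
// A macro sees its argument tokens before evaluation and can rewrite them:
macro_rules! show {
    ($e:expr) => {
        println!("{} = {}", stringify!($e), $e)
    };
}

fn main() {
    show!(2 * 3 + 4); // prints "2 * 3 + 4 = 10"
}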
Rust semicolon handling is risky
If they wanted so much to avoid the return keyword, they should have used the Smalltalk ^ operator.
Little things that matter in language design
Case-insensitivity, Unicode, interoperation between Turks and non-Turks. Pick two.
Little things that matter in language design
Short of making a special Turkish i and I, which comes with its own problems and which nobody does, that's going to be a problem.
Little things that matter in language design
I don't really see your point: you are still trying to explain why dropping Unicode for the sake of keeping case-insensitivity and interoperation between Turks and non-Turks is a dumb choice. Yes, it's dumb; people usually pick some other pair. But that does not change the fact that it may work just fine (for a certain definition of "just").
Little things that matter in language design
Yet this is what used to solve the problem for Russian. Early computers in the USSR only had Russian letters, which were different from the Latin ones. And they, too, had this upcase problem (upcase for Russian "у" was "У" and for Latin "y" was "Y"). It's not clear why Turks cannot adopt the same solution. Well, "for historical reasons" probably - but that's still a "Unicode" choice.
Little things that matter in language design
Separating the Turkish alphabet from the Latin is not a neutral act, particularly when you don't do the same to the French or Romanian. The relationship between the Turkish variant of the Latin alphabet and some other random European variant of the Latin alphabet more closely resembles the relationship between the Serbian and Russian variants of the Cyrillic alphabet than the relationship between the Cyrillic alphabet and the Greek alphabet.
Little things that matter in language design
> In reality, Turkish support requires locale-sensitive casing functions
> ...
If one is going to try to facilitate computations by separating each locale into its own alphabet, I wish good luck with its newnicode. The real Unicode thankfully does not work that way. Usually, at least. :-D
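The locale-sensitivity point is visible in, for example, Rust, whose standard-library case conversions deliberately apply Unicode's locale-independent default mappings; a small check:
fn main() {
    // Unicode default casing: "i" uppercases to "I", which is wrong
    // for Turkish (it should be İ, U+0130); fixing that requires
    // locale-aware tailoring outside the standard library.
    assert_eq!("i".to_uppercase(), "I");
    // Going the other way, İ lowercases to "i" plus a combining
    // dot above (U+0307) under the default mapping:
    assert_eq!("\u{0130}".to_lowercase(), "i\u{0307}");
}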
Little things that matter in language design
Little things that matter in language design
Little things that matter in language design
Little things that matter in language design
Little things that matter in language design
Little things that matter in language design
You can easily have a text in English with quoted sentences in French or in Turkish, using the same font. Try the same with e.g. Russian and Greek and see if you will be able to read the result[1].
[1] lowercase glyphs aside, (И, Н) and (Η, Ν) alone are enough to render the result unreadable (shift circa 16th century, IIRC; at some point both Eta and Nu conterparts got the slant of the middle strokes changed in the same way, turning 'Ν' into 'Н' and 'Η' into 'И')
Little things that matter in language design
For this purpose it provides UAX #15. If you don't do this normalisation step, you can end up with a confusing situation where, when the programmer types a symbol (in a text editor which happens to emit pre-combined characters), the toolchain can't match it to a visually and lexicographically identical character mentioned in another file which happened to be written with separate combining characters. That would obviously be very frustrating.
#include <stdio.h>
int main() {
printf("%c%c%c%c == %c%c\n", 0xD0, 0xB8, 0xCC, 0x86, 0xD0, 0xB9);
}
$ gcc test.c -o test
$ ./test | tee test.txt
й == й
Not sure about you but on my system these two symbols only look similar when copy-pasted in browser - and then only in the main window (if I copy-paste them to "location" line they suddenly looks differently!). And of course these two symbols are different in GNOME terminal, gEdit, Emacs and other tools!
Little things that matter in language design
> Not sure about you but on my system these two symbols only look similar when copy-pasted in browser - and then only in the main window (if I copy-paste them to "location" line they suddenly looks differently!). And of course these two symbols are different in GNOME terminal, gEdit, Emacs and other tools!
Little things that matter in language design
Anyway, the programming language should support "compare bytes" and "compare runes/characters" as two different use cases.
(UAX-15). I use it. Perl offers NFC, NFD, NFKC, NFKD without a huge perceivable (to me) performance penalty.
$ cat test.pl
use utf8;
$й="This is test";
print "Combined version works: \"$й\"\n";
print "Decomposed version does not work: \"$й\"\n";
$ perl test.pl
Combined version works: "This is test"
Decomposed version does not work: ""
it seems to me that your system is misconfigured. I could not see the difference between "й" and "й" in my computer, be it in Chrome's main window, location bar, gvim, or in yakuake's konsole window.
UAX15 is important.
Little things that matter in language design
Little things that matter in language design
Little things that matter in language design
Little things that matter in language design
Little things that matter in language design
Little things that matter in language design
Little things that matter in language design
Little things that matter in language design
> You seem to be suffering some quite serious display problems with non-ASCII text on your system
Most software language needs indentations...
With non-ASCII chars in fixed-width font, if you even get the char shape in the font you are using, the only solution is probably to start drawing each char every N (constant) pixels and have the end of large chars superimpose with the beginning of the next char...
I use a fixed-width font to write code chiefly out of pure inertia: most of my coding is done in text editors running in character-cell terminals. Code written in Inform 7 is an exception (the Inform 7 IDE's editor uses a proportional font by default, and the IDE is so well-adapted to the needs of typical Inform 7 programming that not using it is silly), but Inform 7 statements look like (somewhat stilted) English prose so I don't mind so much.
Little things that matter in language design
Little things that matter in language design
been thinking about!
and would be interested in your reactions to what I am
planning for comment and string syntax.
The latter is the traditional rest-of-line comment prefix,
and the single # only comments out the immediately following expression.
multi-line or not, and #( expression ).another_part(more) foo
comments out from # to foo.
against non-nestability, e.g. re xml's <![CDATA[ ... ]]>, but even
a little purist angst re python's practical ''' and """ ;-)
but to make nested quoting work, I took a hint from MIME boundaries
and here-docs, and optionally delimit strings like
<identifier><double quote><content><double quote><identifier>
e.g., foo"Can quote bar"the bar content"bar without a problem"foo
so long as <double quote><identifier> does not occur in the content
that was started by <identifier><double quote>.
using the single expression comment prefix #
#unique_ignore_delimiter"
... arbitrary stuff here
"unique_ignore_delimiter
after the prefixed delimiter and end with the postfixed delimiter.
foo""foo is a zero length string just like _""_ and "".
run time according to C escaping or something else is specified
by a postfix notation whose exact syntax and semantics is for
another time ;-)
Bengt Richter
Little things that matter in language design
Little things that matter in language design
as opposed to "little things"?
(hm, are there bad "important characteristics"?) ;-)
Little things that matter in language design
Little things that matter in language design
Little things that matter in language design
Little things that matter in language design
> The latter is the traditional rest-of-line comment prefix,
> and the single # only comments out the immediately following expression.
Little things that matter in language design
I will try that and see how it works out.
Little things that matter in language design
Little things that matter in language design
and made me wonder how much subconscious plagiarizing I am doing vs reinventing ;-/
about the time I decided for fun to create a chomsky.py from chomsky.pl[1] ;-)
I can do all those in various ways, including bootstrapping
by defining in terms of my language's more primitive ops.
and/or fiddle with startup configuration with invocation options.
Little things that matter in language design
and I wrote a perl chomsky.pl based on the lisp original, appending the latter
to the perl script as DATA, and scraping the good stuff without editing the original ;-)
can find a copy of the lisp .. yup ..
[sorry for getting a bit off topic]
Little things that matter in language design
if the expression is a name or string, so in that case it can be available for
output at both compile time and run time.
source-anchor token. I anticipate debugging use something like (switching as
suggested to ## for this comment prefix),
speed ##meters $(dist) ##seconds time # this ##is all just #-comment to eol
;{; # example syntax error if never closed with }
or at the syntax error if speed can handle a name instead of a number.
errors more precisely even if full tracebacks are not available, and hopefully
can also locate the last anchor passed for syntax errors, e.g. in case of a runaway
bracket or quote.
Decimals used by some of my European colleagues
COBOL had Decimals used by some of my European colleagues
ENVIRONMENT DIVISION.
DECIMAL-POINT IS COMMA.
:-D
Decimals used by some of my European colleagues
Decimals used by some of my European colleagues
Curiously, I find Python's indentation-based block structure quite annoying in Python, but not at all annoying in Inform 7. (Probably because Inform 7 doesn't have a REPL, which is where Python's block structure system manages to fail "no sharp edges".)
Little things that matter in language design
IPython has a %paste function that deals with this intelligently.
It would be even better if it just did it by default, by just looking at how indented the first line is. For more complicated editing, like combining several fragments, you can go with %edit. An inline editor would be great, but that might be considered feature creep, and the IPython notebook provides sort-of the same thing.
Little things that matter in language design
Anyway, copy-pasting into the interpreter is annoying because you have to retrace the initialisations that make the block of code work. It's far more practical to insert Little things that matter in language design
import IPython; IPython.embed()
, switch to prototyping, and paste back into your editor.
A new industrial language catches on so infrequently that it's almost tragic when opportunities for real syntactic improvements are passed up, almost as much so as when features inherited from earlier languages are misunderstood and corrupted.
Missed opportunities
Ob::Ob
( int a
, int b
, int c
)
: _x( a + b - c)
, _y( a - b + c)
, _z(-a + b + c)
{}
enum T
{ T1
, T2
, T3
};
which while practical can be jarring.
Pascal's postfix "^" was extremely practical, perhaps the only real virtue in the language
Missed opportunities
Missed opportunities
Missed opportunities
or better(?) in C++ like notation cast<T@>(p)
Missed opportunities
cast(T@,p)
:
<
, <=
, <<
, etc.cast(p T@)
, since 'p' is an identifier and never contains spaces. It reads nicely too: "cast p to whatever." If the first operand is a complex expression, then things get messy again, and extra parenthesis are required. But at least the common case is clean. There's always something. :)Missed opportunities
Missed opportunities
Regardless of whether [] would be easier for the machine to parse, the decision to allow values (rather than just types) as template parameters means that [] would make life harder for humans trying to parse the code.
Missed opportunities
Missed opportunities
because you couldn't know by glancing ifMissed opportunities
a[3]
is an array dereferencing or a template instantiation.
Can you expand on that? So far I'm not convinced:
Missed opportunities
Missed opportunities
>
> Ob::Ob
> ( int a
> , int b
> , int c
> )
> : _x( a + b - c)
> , _y( a - b + c)
> , _z(-a + b + c)
> {}
>
> enum T
> { T1
> , T2
> , T3
> };
obj1,
obj2,
obj3,
}
TEST1,
TEST2,
TEST3,
;
}
Missed opportunities
my %map = (
'one' => 1,
'two' => 2,
'three' => 3,
);
This is worth it even if only because the addition of a line to the map creates a one-line diff and not a two-line.
sub bar{
return join(',',@_);
}
print bar(
1,
2,
3,
);
Missed opportunities
Dereference an address: @Foo (what's at Foo?)
Take an address: &Foo (address-of Foo--I guess I'm just used to this one)
Yeah, I like that too. There's no requirement that a postfix declarator has to have a matching postfix operator.
Missed opportunities
if (&node == &head)
to suppress indirection in some cases, but the common case would be clean. The problem is, one would have to write a compiler and then write a lot of code to see how well this works in practice.Missed opportunities
I'm not an expert on Algol 68 (Adriaan van Wijngaarden was probably the first, one of few, and last), but I think implicit indirection only worked in the language because it restricted the things you could do with pointers. Something like Missed opportunities
*p--
in C, for example — I don't know how that could be expressed without an explicit indirection operator.
Little things that matter in language design
Little things that matter in language design
Exactly. Language design is not only a computer-science problem, it is also an ergonomic problem.
Little things that matter in language design
Little things that matter in language design
Little things that matter in language design: preprocessor support?
- adds/modify source lines to be executed like printf()
- increase size of array to add testable cases
- create variables to help problem finding (and pass them to sub-function)
- check some state for integrity (conditional commenting)
All that keeping the same source code in your source control system.
I know people who would love it in VHDL, because managing sub-branches or un-commenting a lot of non consecutive lines just to debug is very complex.
Digit separators are used every 4 digits in hexadecimal, I am not sure it shall be the same "digit separator" as for decimal.
Little things that matter in language design: preprocessor support?
Using preprocessor macros in C++ is strongly discouraged by Stroustrup. He says macros should only be used for include guards, something for which C++ has no other mechanism. For other cases can use constexpr, templates and
Little things that matter in language design: preprocessor support?
Here #pragma once does what you want: because cp -a preserves the timestamp, GCC considers the two byte-identical copies to be the same file and skips the second include:
$ mkdir lib
$ echo $'#pragma once\nint a;' > lib/test.h
$ mkdir installed
$ cp -a lib/test.h installed/test.h
$ echo $'#include "lib/test.h"\n#include "installed/test.h"' > test.c
$ gcc -E test.c -I. -o-
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test.c"
# 1 "lib/test.h" 1
int a;
# 2 "test.c" 2
And now it does not:
$ touch installed/test.h
$ gcc -E test.c -I. -o-
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test.c"
# 1 "lib/test.h" 1
int a;
# 2 "test.c" 2
# 1 "installed/test.h" 1
int a;
# 2 "test.c" 2
#pragma once - it's not worth it.
Little things that matter in language design: preprocessor support?
Why would you fail? If newer versions of components are backward-compatible (and they should be, if they are separate components), then you just need to copy the headers in the proper order... which happens automatically: the first updated header is in the new component itself (and its headers are always included before headers from other components), then you update the next component in the dependency DAG, and so on.
Little things that matter in language design: preprocessor support?
printf ("Entering %s\n", __FUNCTION__);
#undef FIELD_DEF
#define FIELD_LIST() \
FIELD_DEF(char, fieldname1, "0x%X", "%hhx") \
FIELD_DEF(unsigned, fieldname2, "0x%X", "%i")
#define FIELD_DEF(type, name, howto_print, howto_scan) type name;
FIELD_LIST()
#undef FIELD_DEF
}
{
#define FIELD_DEF(type, name, howto_print, howto_scan) \
printf (#name howto_print "\n", str->name);
FIELD_LIST()
#undef FIELD_DEF
}
{
static const char *scanf_format =
#define FIELD_DEF(type, name, howto_print, howto_scan) #name " " howto_scan " "
FIELD_LIST();
#undef FIELD_DEF
#define FIELD_DEF(type, name, howto_print, howto_scan) + 1
FIELD_LIST();
#undef FIELD_DEF
#define FIELD_DEF(type, name, howto_print, howto_scan) &str->name,
FIELD_LIST()
#undef FIELD_DEF
);
}
(obviously this applies to methodologies which do allow bugs to enter the source management system; others don't need special support, as bugs are simply denied entry).
Little things that matter in language design: preprocessor support?
> C++ (without CPP) has no way to self reference names, I mean:
> printf ("Entering %s\n", __FUNCTION__);
$ cat test.cc
#include <stdio.h>
int main() {
printf ("Entering %s\n", __func__);
}
$ gcc test.cc -o test
$ ./test
Entering main
Everything else can be done with boost::serialization or other metaprogramming tricks - except for the requirement to use the preprocessor without the preprocessor. I mean: you can't use the -D directive, which is preprocessor-specific... well, duh - that's a directive for CPP, not for the compiler! With C++ you implement your special cases as template specializations and then just construct the correct version you actually need from the main program. So
make DEBUG=1
works while
gcc -DDEBUG=1
, of course, does not.
Little things that matter in language design: preprocessor support?
Little things that matter in language design: preprocessor support?
> printf ("Entering %s\n", __FUNCTION__);
> the only (dirty) way is:
> compilation time (make DEBUG=1 or gcc -DDEBUG=1) so that the exact same
> file is kept in your source management system (no special tree for debug).
#ifdef DEBUG
const int debugEnabled = true;
#else
const int debugEnabled = false;
#endif
> field in a (memory mapped) structure only when generating for this
> special hardware.
> comment" from source code like "man unifdef"
Little things that matter in language design: preprocessor support?
> > the only (dirty) way is: ...
> Wow, that's totally unreadable. Lots of people want introspection
> and I think we'll get it soon in C++.
Some targets I have have very small memories (256 Kbytes of total internal memory on the processor before the DDR-RAM is initialised, or a soft-processor (written in VHDL inside an FPGA) with 96 Kbytes of RAM); I cannot afford indirections and data hiding.
These macros enabled me to reduce the number of lines of source code to maintain, while keeping total control of the structure for the tens of different exceptions where you cannot use the database.
> > comment" from source code like "man unifdef"
>
> I don't have that tool and can't imagine what I'd need it for.
> Can you give an example?
At some point nobody you know remembers why this #ifdef was added, and when you try to compile with the #ifdef enabled it turns out it has not compiled for the last 3 years.
That is the right time to run "unifdef" to remove that part of code automatically from all your sources.
Sometimes these parts of code are extremely dirty hacks, made to handle the bug of an external company (don't ask, you can't get it fixed), and you really do not want to alter your design to handle that possible bug (only when you sell box A to third party which has box B).
Lucky you if you are not forced to eat some other company's dog food for years at a time...
Little things that matter in language design: preprocessor support?
#define mystruct_members(call) \
	call(char, fieldname1, "0x%X", "%hhx") \
	call(unsigned, fieldname2, "0x%X", "%i")

struct mystruct {
#define declare_member(type, name, print, scan) type name;
	mystruct_members(declare_member)
#undef declare_member
};
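Since the thread asks what the article's languages do without CPP: the same define-the-list-once trick can be expressed in Rust by handing the field list to a "callback" macro. A sketch under that assumption; all names here are hypothetical:
// The field list is written once and passed to a callback macro.
macro_rules! mystruct_members {
    ($call:ident) => {
        $call! { fieldname1: u8, fieldname2: u32 }
    };
}

// One callback declares the struct...
macro_rules! declare_struct {
    ($($name:ident : $ty:ty),* $(,)?) => {
        struct MyStruct { $($name: $ty),* }
    };
}

// ...another generates the printing code.
macro_rules! define_print {
    ($($name:ident : $ty:ty),* $(,)?) => {
        fn print_fields(s: &MyStruct) {
            $( println!("{} = {:#x}", stringify!($name), s.$name); )*
        }
    };
}

mystruct_members!(declare_struct);
mystruct_members!(define_print);

fn main() {
    let s = MyStruct { fieldname1: 0x41, fieldname2: 0xDEAD_BEEF };
    print_fields(&s);
}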
Little things that matter in language design: make it do what it looks like it does
That said, gofmt (or similar tools) is another way to make sure that indentation is correct, without having the n-th discussion on whether you should use tabs or spaces to indent your code and how to configure your editor correctly.
Little things that matter in language design: make it do what it looks like it does
I actually like the colon. It clearly marks the end of the condition and the start of the statement block.
Little things that matter in language design: make it do what it looks like it does
if something and (otherthing
                  or whatever):
    statements
and the first "newline-followed-by-indent" doesn't mark the end of anything.
Little things that matter in language design: make it do what it looks like it does
I want to ask one question about Rust.
As you said, I assumed that in Rust, if an expression returns the unit type (), then we don't need to add a ; at the end of that expression. But the rule doesn't seem to hold. I mean, I have this code:
fn main() {
println!("hello") // I thought there was no need for a ; here, but that's wrong
println!("world") // same here
}
and also:
fn foo() {
// do something
}; // by that rule a ; should be added here, but in practice it's not there
Little things that matter in language design: make it do what it looks like it does
>
>fn main()
>{
> println!("Hello, world!")
>}
>
Little things that matter in language design: make it do what it looks like it does
- with the last expression in a block, adding ; is up to me (depending on my intention)
if condition {
function1()
} else {
function2()
}
expression3;
fn main() {
println!("hello") // why do we have to add ; here?
println!("world")
}
fn foo() {
// ...
} // no ; here
fn bar() {
// ...
}
Little things that matter in language design: make it do what it looks like it does
>println!("hello") // why we have to add ; at here
>println!("world")
>}
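The behaviour follows from how Rust defines statements rather than from the value's type; a minimal sketch of the rule as I understand it:
fn main() {
    // An ordinary expression used as a non-final statement always
    // needs ';', even when its type is (); println! is not special.
    println!("hello");

    // Block-like expressions (if/else, match, loops, plain blocks)
    // may stand as statements without ';' when their type is ().
    if true {
        println!("then");
    } else {
        println!("else");
    }

    // The final expression of a block may drop the ';'; its value,
    // here (), becomes the value of the block.
    println!("world")
}
// `fn` is an item, not a statement, so no ';' follows its brace.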
Little things that matter in language design: make it do what it looks like it does
var x = ... (declare x and assign a value to it)
x = ... (only assignment)
in Python you only have "x = ...": the first assignment implicitly declares x.
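Rust, for comparison, keeps the two forms distinct, so assigning to a name that was never declared is caught at compile time. A tiny sketch (standard Rust):
fn main() {
    let x = 1;      // declaration plus initialisation
    let mut y = 2;  // a binding must be 'mut' to be reassigned
    println!("{} {}", x, y);
    y = 3;          // plain assignment to the existing binding
    // z = 4;       // error: cannot find value `z` in this scope
    println!("{}", y);
}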
Little things that matter in language design
There is no 0b prefix in C11