DYOL: Design Your Own Language


Access Modifier

DwI:Angles, DB-PD:65, CD-AH:42, LD-WH:58

Annotate components with information about how others are allowed or not allowed to access them. Access can be limited by inheritance (protected in C++), modular structure (internal in C#), etc. The most popular modifiers are public (everyone welcome) and private (fully restricted). Similar modifiers can be used to manage scope, such as global and nonlocal in Python.
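
As a minimal sketch of the scope modifiers mentioned above, the following Python fragment contrasts `global` and `nonlocal`. The names `counter`, `bump_global` and `make_counter` are made up for illustration.

```python
counter = 0  # module-level (global) name


def bump_global():
    # Without `global`, the assignment below would create a new local name.
    global counter
    counter += 1


def make_counter():
    count = 0  # local to make_counter, enclosed by increment

    def increment():
        # `nonlocal` rebinds the variable of the enclosing function.
        nonlocal count
        count += 1
        return count

    return increment


bump_global()
tick = make_counter()
tick()
print(counter, tick())  # prints "1 2"
```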

Alphabet§

DwI:Perceived affordances, DB-GD:28, DB-RD:92, DB-PD:165, CC-DG:15, CC-NW:10, CD-AH:52, LI-BH:10, PT-AO:34, PT-HU:1, PT-GJ:6, LD-ED:5

The basic alphabet is often taken for granted, especially for textual languages, but it is an important design aspect. In some languages (APL being the extreme) the alphabet is extremely broad, with specific symbols being used for built-in operators, which shifts the visual feel of the language closer to mathematics. In other languages keywords are taken from English, which limits language appeal to some groups of users (and may lead to reimplementations with translated keywords).

Assignment

DB-GD:50, DB-RD:478, CC-WG:23, CD-AH:620, CD-SM:36, CD-GR:270, LI-RM:87, LI-PZ:201, PL-RS:293, PL-WC:82, PL-BM:105, PT-AO:206, LD-ED:3, LD-JW:28, LD-WH:54, SL-AS:190, SL-RL:9

Moving data from one place to another. Some 4GLs have separate statements for straightforward (byte-copying) and composite (pattern-matching) assignments, such as COBOL's MOVE CORRESPONDING, which requires unification. In modern languages the source data structure (and sometimes the target one) can often be created on the fly. Many languages combine assignment with trivial manipulation (such as +=).
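
A small Python sketch of these variations, assuming nothing beyond the core language: assignment combined with arithmetic, a source structure built on the fly, and a target written as a pattern of names. All variable names are illustrative.

```python
total = 10
total += 5                      # assignment combined with addition

point = (3, 4)                  # source structure created on the fly
x, y = point                    # target structure: a pattern of two names
x, y = y, x                     # swap without an explicit temporary

first, *rest = [1, 2, 3, 4]     # pattern with a "rest" part
print(total, x, y, first, rest)  # prints "15 4 3 1 [2, 3, 4]"
```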

Backtracking

DwI:Kairos, DB-GD:174, DB-RD:181, CC-DG:85, CD-GR:688, LI-PZ:378, PL-RS:629, PL-BM:516, PT-GJ:183, SL-RL:201

A computation strategy commonly found in declarative languages. Every choice in the evaluation path becomes a save point to which the computation returns in case of failure. All the changes made between the save point and the point of failure are undone. Backtracking is common in parsers and logic programming, and used for error recovery everywhere else.
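
The save-point-and-undo idea can be sketched with a small N-queens solver in Python. This is only one possible illustration: the function `solve_queens` and its helpers are hypothetical names, and real logic languages implement backtracking inside the runtime rather than in user code.

```python
def solve_queens(n):
    """Place n queens on an n x n board; return one solution or None."""
    columns = []  # columns[r] is the column of the queen in row r

    def safe(row, col):
        # No earlier queen shares the column or a diagonal.
        return all(
            col != c and abs(col - c) != row - r
            for r, c in enumerate(columns)
        )

    def place(row):
        if row == n:
            return True
        for col in range(n):          # every choice is a save point
            if safe(row, col):
                columns.append(col)   # make the change
                if place(row + 1):
                    return True
                columns.pop()         # failure further down: undo, try next
        return False

    return list(columns) if place(0) else None


print(solve_queens(4))  # prints "[1, 3, 0, 2]"
```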

Backward Compatibility

DwI:Worry resolution, SL-RL:38

In language evolution, introduce new features that should supersede older ones, but assure the users that their existing code will still run. Ideally, this code should eventually be rewritten and coevolved.

Block§

Wile:Behaviour, DB-GD:53, DB-RD:411, DB-PD:62, CC-DG:198, CD-AH:559, CD-SM:676, LI-BH:95, LI-RM:88, LI-PZ:356, PL-RS:305, PL-WC:85, PL-BM:106, LD-ED:13, LD-JW:1, SL-AS:34, SL-RL:178

Viewing a list of statements as a specific (compound) kind of statement is a conceptual eye-opener and allows composite constructs to be treated in a uniform and orthogonal way (if … begin … end and do … begin … end instead of if … endif and do … enddo). Languages either use delimiters (begin/end or curly brackets) or indentation. Blocks can be seen as degenerate subprograms and can be useful in optimisation.

Branching

DB-GD:282, DB-RD:497, DB-PD:508, CC-DG:277, CC-WG:31, CC-NW:61, CD-AH:474, CD-SM:38, CD-GR:580, LI-BH:133, LI-RM:87, LI-PZ:357, PL-RS:306, PL-WC:86, PL-BM:42, PT-AO:173, LD-ED:20, LD-JW:43, SL-AS:26, SL-RL:380

Forking the computation based on conditions known at runtime is a popular construct. Control flow can be transferred unconditionally (branch, jump, goto), or conditionally (based on true/false, zero/positive/negative, explicit condition, exhaustive patterns, etc). In some languages branching can be done by guarding statements with constraints.

Character Type

DB-GD:43, CC-DG:180, CC-WG:25, LI-RM:35, LI-PZ:215, PL-RS:219, PL-WC:67, PL-BM:185, LD-ED:81, LD-JW:13, LD-WH:27, SL-AS:130

A family of value types that can be used in a language: single characters, special characters, zero-terminated strings, fixed length strings, variable length strings, structured strings, etc.

Class

Wile:Relationships, CD-GR:544, PL-RS:110, PL-WC:107, PL-BM:464, LD-WH:40, SL-AS:189

A class or a trait represents a template that can be followed by objects: a particular collection of properties and methods that can always be relied on. A class can then be instantiated with appropriate parameters to form an object that conforms to the class definition. Classes are the ultimate form of encapsulation. They can be inherited from one another to form subclasses.

Client/Server

LI-PZ:526, LD-WH:12

A language may allow one conceptual model to be split into two intercommunicating components to be executed in parallel: the server side which has access to all the necessary system data and runs in a fully controlled environment, and the client side which runs closer to the system user's data and has to survive in a much less controllable environment. Client code and server code can be written in different languages or compiled to different languages before deployment.

Code Completion

DwI:Choice editing, DwI:Portions

Many IDEs monitor what the language user is typing and make suggestions based on their knowledge of the language keywords, constructs allowed in the context, variables visible from the current namespace, etc. The list of such suggestions must be short to be useful, otherwise it does nothing but annoy the users.

Code Generation

Wile:Code Generation, DB-GD:518, DB-RD:513, DB-PD:617, CC-DG:337, CC-WG:253, CD-AH:445, CD-SM:137, CD-GR:313, LI-BH:183, LI-PZ:111, PT-GJ:570, LD-WH:372, SL-AS:473, SL-RL:73

Generation of machine code, intermediate code, a model in a target language, an output model or a textual result is the last phase of a classic compiler (before or after optimisation). What is typical for code generation is the richness of the input (generously annotated intermediate graphs) and a deliberate limitedness of the output (which is often platform-specific and/or hardware-specific). In MDE, code generation is usually implemented by model-to-text transformations.

Code Mining

DwI:Peer feedback, SL-RL:446

Besides user surveys and expert opinions, there is a third way to uncover points to improve the language in its next versions: examining existing models created in this language. There are many modern techniques in mining software repositories that can be helpful here: clustering, vocabulary analysis, statistics (especially correlations), natural language processing, information retrieval, machine learning, etc.

Code Ownership

DwI:Watermarking

Signing the user's name under a piece of code has the same effect as signing a person's name on an item: caring about what happens to the item later. Comments explaining which dev made which code changes have existed since very early on. In modern ecosystems, ownership is tracked automatically by a version control system and can be checked at any time (git blame).

Collection

DB-GD:39, DB-RD:552, DB-PD:471, CC-DG:175, CC-WG:23, CC-NW:83, CD-AH:599, CD-SM:30, CD-GR:473, LI-BH:122, LI-RM:123, LI-PZ:238, PL-RS:71, PL-WC:156, PL-BM:70, PT-AO:142, LD-ED:37, LD-JW:55, LD-WH:34, SL-AS:92

Arrays, lists, tuples, sets and multisets are the most common composite user-defined parametrised types for collections of elements. It is up to the language designer to decide which ones are supported and how they are handled — can elements within one collection have different types, are they mutable, passed by name/value/reference, etc.

Comment

DwI:Contrast, DB-RD:54, CC-DG:64, CD-GR:71, LI-BH:10, LI-RM:33, LI-PZ:99, PL-WC:12, LD-ED:36, LD-JW:9, SL-AS:111, SL-CF:23, SL-RL:212

Comments are pieces of documentation built directly into the source of the system. Most IDEs support comments visually by presenting them in a completely different colour, usually dimmer than the rest of the model, to focus developers on executable constructs first. In some languages like BibTeX or INTERCAL everything uncompilable is a comment. Some comments like codetags, Javadoc or Documentation Comments are strictly or semi-structured.

Compilation Error

DwI:Conditional warnings, DB-GD:21, DB-RD:11, DB-PD:160, CC-DG:455, CC-WG:303, CD-AH:88, CD-GR:95, LI-BH:21, LI-RM:39, LI-PZ:109, PT-GJ:521, LD-JW:201, SL-RL:151

Modern languages have many means of assessing the validity of the model before it is actually used. Thus, compilers tend to have a sophisticated error handling facility and try to provide enough information for the language user to fix the problems. Some languages are notorious for providing bad error messages. There are many ways to recover from an error in order to analyse the rest of the program and report multiple problems at once. Can be provided as live feedback.

Compilation Warning

DwI:Did you mean?, DB-GD:382, DB-RD:164, DB-PD:161, CC-DG:315, CC-WG:304, CD-AH:34, CD-GR:121, LI-BH:83, LI-RM:79, PT-GJ:522

When a compiler detects a possibly dangerous situation with extremely limited applicability, it displays a warning message and proceeds with the build process anyway. In many cases there is a special option for disabling a particular warning for a particular piece of code. Warnings can be given when an anomaly or a smell is detected, and may involve some form of error correction. Can be provided as live feedback.

Comprehension§

CD-GR:621

List and set comprehensions are language constructs resembling the mathematical notation for creating a set by its characteristic function ("for all numbers from 1 to 10, give me their squared values"), and combine the map and filter operations that are classic in functional programming. Comprehensions as a language construct exist in Haskell, Python, Rascal, C# and some other languages.
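
The quoted example can be written directly in Python, one of the languages listed. The sketch below also shows a filtered set comprehension next to its map/filter equivalent; the variable names are illustrative.

```python
# "For all numbers from 1 to 10, give me their squared values."
squares = [n * n for n in range(1, 11)]

# A set comprehension with a filter: squares of the even numbers only.
even_squares = {n * n for n in range(1, 11) if n % 2 == 0}

# The same computation spelled out with map and filter.
via_map_filter = set(
    map(lambda n: n * n, filter(lambda n: n % 2 == 0, range(1, 11)))
)

print(squares[:3], even_squares == via_map_filter)  # prints "[1, 4, 9] True"
```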

Concrete Syntax

DwI:Transparency, Wile:Artificial Language, Mernik:Notation, DB-GD:28, DB-RD:25, DB-PD:78, CC-WG:17, CD-AH:166, CD-GR:115, LI-BH:3, LI-PZ:41, PL-RS:124, PL-BM:89, PT-AO:39, PT-HU:2, PT-GJ:7, SL-CF:21, SL-RL:65

The way to describe the concrete representation of the programs. The concrete syntax is used by humans to read, write, create and understand sentences of the language. Usually the only languages that do not have concrete syntax are those intended for internal intermediate representation. Some languages have more than one.

Concurrency

DB-PD:51, CC-WG:32, CD-SM:571, CD-GR:331, LI-PZ:483, PL-RS:503, PT-AO:254, LD-WH:419, SL-AS:254

Since modern computers and systems are good at multitasking, a language designer may decide to use that. An executable model can then be decomposed into components that are executable in parallel on different CPU cores or different devices. This can be completely undesirable (to avoid deadlocks, overhead, race conditions, etc), or performed automatically, or use the language user's guidance in synchronisation of threads, tasks and processes.

Constraint

Wile:Constraints, PL-BM:279, PT-GJ:567, LD-WH:474, SL-AS:393, SL-RL:400

Besides languages whose programs are expressed only in terms of constraints (OCL, CLP(R), Oz), there are many that have them in one form or another. The most popular form is assertions, a non-invasive form of exception handling that allows language users to explicitly state (assert) invariants, pre-conditions and post-conditions as logic expressions that must universally hold. Such assertions can be easily removed before deploying the system into production.
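
As a hedged illustration of assertions used as pre-conditions, invariants and post-conditions, here is a Python sketch. The function `integer_sqrt` is a made-up example; in CPython, running the interpreter with the `-O` option strips `assert` statements, which matches the "removed before deployment" idea.

```python
def integer_sqrt(n):
    """Return the largest integer r with r * r <= n."""
    assert n >= 0, "pre-condition: n must be non-negative"
    r = 0
    while (r + 1) * (r + 1) <= n:
        assert r * r <= n, "loop invariant"
        r += 1
    assert r * r <= n < (r + 1) * (r + 1), "post-condition"
    return r


print(integer_sqrt(17))  # prints "4"
```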

Cross-compilation

DB-GD:24, CD-GR:32

A cross-compiler works on one platform but ultimately targets another. Relying on a cross-compiler makes it possible to separate the development platform from the one where the programs get deployed: for instance, a mobile app developer can work with a proper keyboard and a big screen. The IDE for a cross-compiled language may include a virtual machine for execution, debugging, etc. A compiler capable of producing code for different targets is called retargetable.

Debugging

DwI:Are you sure?, Wile:Debugging, DB-RD:703, CC-DG:334, CC-WG:322, CD-AH:263, LI-BH:51, LI-RM:261, LI-PZ:55, LD-WH:116, DwI:Interlock

The activity of finding and fixing sources of incorrect behaviour is not enjoyed by many language users, but is used by all of them without exception anyway. Declarative and constraint languages are the hardest to debug due to their complex evaluation strategies (unification, backtracking, etc) and imperative ones are the easiest since they specify the algorithm most explicitly. Most modern languages are shipped with a dedicated debugger or have debugging functionality in the IDE.

Default

DwI:Defaults, SL-RL:57, DwI:Opt-outs

Unchanged configuration options, uninitialised variables and unspecified optional modifiers are examples of situations when a default value must be used by the compiler. These default values are decided by the language designer and typically represent the best option within the paradigm.

Deployment

DwI:Conveyor belts, Mernik:Deployment, LD-WH:195

Once the model written in a language has been checked, compiled, linked and otherwise prepared for use, it may need to be deployed. This happens directly by copying it to the machine of the end user, or by connecting it to the network, or by creating a special installer, etc. In many cases deployment is not viewed as a concern of a language designer, but among practitioners it is perceived as a part of language design.

Deprecated Construct

DwI:Feature deletion, SL-RL:15

In language evolution, sometimes a no longer desired construct cannot simply be removed without breaking backward compatibility. However, it can be marked explicitly as deprecated to discourage language users from relying on it.
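
At the library level the same idea can be sketched with Python's standard `warnings` module. The functions `old_area` and `new_area` are hypothetical; only `warnings.warn`, `DeprecationWarning`, `catch_warnings` and `simplefilter` come from the standard library.

```python
import warnings


def new_area(width, height):
    return width * height


def old_area(width, height):
    """Deprecated: kept only so that existing callers keep working."""
    warnings.warn(
        "old_area() is deprecated, use new_area() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_area(width, height)


# Record warnings instead of printing them, to show what the caller sees.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_area(3, 4)

print(result, caught[0].category.__name__)  # prints "12 DeprecationWarning"
```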

Design Chart/Diagram

DwI:Possibility trees, Wile:Usage Graphs, DB-GD:79, DB-RD:6, DB-PD:30, CC-DG:53, CC-WG:90, CD-AH:13, CD-SM:100, CD-GR:215, LI-BH:21, LI-RM:63, LI-PZ:363, PL-RS:139, PT-AO:43, PT-HU:2, PT-GJ:58, LD-JW:4, SL-CF:30, SL-RL:8

UML distinguishes between structural (class, package, object, component, composite structure, deployment) and behavioural diagrams (activity, sequence, use case, state, communication, interaction overview, timing). The former specify and visualise structure breakdown, the latter — events and interaction. Some languages (e.g., syntactic diagrams) are both.

Documentation

DwI:Serving suggestion, LI-PZ:532, LD-ED, LD-JW, LD-WH, SL-CF:264

There are two equally important kinds of language manuals: for people learning the language and for its active users — and sometimes these are two disjoint sets of documents. Documentation may contain executable examples and can/should be automatically checked for internal validity and consistency. Some documentation elements must be provided through an IDE, especially if the language is an API.

Encapsulation

DwI:Hiding things, LI-PZ:236, PL-RS:37, PL-WC:104, PL-BM:12, LD-WH:229, SL-AS:15, SL-RL:19

Most high-level languages abstract away from low-level details like video memory access, memory allocation, register values, caching, etc. Depending on the language design and philosophy, these features may be prohibited or just hard to find for beginners. Data structures can also be encapsulated by bundling them into records or classes, and code can be organised in hierarchical modules and subprograms.

Energy Saving

DwI:Assuaging guilt, CD-GR:443

Computationally heavy code requires more CPU or GPU cycles, which consumes more power, which in turn makes the applications spend more energy. Making a compiler of a language especially optimised towards power reduction may increase its appeal for users that intend to run their programs on devices with limited power (mobile phones and smaller). Power reduction and energy saving techniques also contribute towards global sustainability, and can be used/chosen for ethical reasons.

Enumeration Type

DB-RD:488, CC-WG:21, CC-NW:65, CD-AH:550, CD-SM:29, CD-GR:533, LI-RM:123, LI-PZ:213, PL-RS:224, PL-WC:66, PL-BM:188, LD-ED:65, LD-JW:50, LD-WH:30

An enumeration is a data type that defines a very limited set of possible values which are, nevertheless, more comfortably referred to by their names and not by encoded numbers. The most famous enumeration is the Boolean (logical) type, which contains only two values: true and false. If the domain permits, the language does not have to support user-defined enumerations.
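
A minimal sketch of a user-defined enumeration, using Python's standard `enum` module. The `Suit` type and the `is_red` helper are illustrative names only.

```python
from enum import Enum


class Suit(Enum):
    CLUBS = 1
    DIAMONDS = 2
    HEARTS = 3
    SPADES = 4


def is_red(suit):
    # Values are referred to by name rather than by their encoded numbers.
    return suit in (Suit.DIAMONDS, Suit.HEARTS)


print(Suit.HEARTS.name, Suit(3) is Suit.HEARTS, is_red(Suit.SPADES))
# prints "HEARTS True False"
```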

Esotericism

DwI:Challenges & targets, DwI:Make it a meme

INTERCAL, Unlambda, Befunge, Malbolge and other esoteric languages are based on paradigms so unconventional that writing even one program puts disproportionate strain on the users. This challenging nature makes people engage and compete in programming in such languages as a form of entertainment. LOLCODE, ArnoldC and others are languages developed based on the memes that are circulating among software engineers: their popularity piggybacks entirely on the viral nature of those memes.

Event

DwI:Feedback through form, Mernik:Interaction, CD-GR:473, PL-WC:265, LD-WH:14, SL-AS:415, SL-RL:10

The first implementations of user interfaces turned the entire program into a giant loop waiting for the user to activate its functionality by choosing the way to communicate (click, tap, edit, etc). Since direct implementations of such an event loop are not green (consume too much energy), event handling can be built natively into the language and implemented efficiently by the compiler and hardware. Events are used for interacting with end users, sensors, threads, etc.
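
Where a language has no native events, handler registration is often emulated in a library. The Python sketch below assumes a hypothetical `EventBus` class with `on` and `emit` methods; it only shows the dispatch idea, not an efficient implementation.

```python
class EventBus:
    def __init__(self):
        self._handlers = {}  # event name -> list of callables

    def on(self, event, handler):
        """Register a handler to be called when `event` is emitted."""
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event, payload):
        """Call every handler registered for `event`, in order."""
        for handler in self._handlers.get(event, []):
            handler(payload)


log = []
bus = EventBus()
bus.on("click", lambda payload: log.append(("first", payload)))
bus.on("click", lambda payload: log.append(("second", payload)))
bus.emit("click", "ok")
bus.emit("scroll", "ignored")  # no handlers registered, nothing happens
print(log)  # prints "[('first', 'ok'), ('second', 'ok')]"
```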

Exception Handling

CD-SM:637, CD-GR:600, LI-PZ:484, PL-RS:38, PL-WC:95, PL-BM:316, LD-WH:379, SL-RL:64

An emergency sibling of branching used for extraordinary situations: it can be slower than normal branching, but is usually more robust in handling situations like a critical failure during the handling of another failure. Assertions are a less invasive form of exception handling.

Execution Error

DwI:Conditional warnings, CC-DG:457, CD-GR:597, SL-AS:68

Errors can happen at compile time, but also at run time, due to hardware faults, communication problems, invalid user input or simply bugs that were left undetected at compile time by static analysis. Some languages (Erlang) have very well-designed strategies for handling execution errors, but all others also feature some form of partial recovery from them. The language user controls runtime error handling with exceptions.

Expressivity

PL-RS:37, SL-RL:247

There is ultimate expressivity of a software language, typically incorporated in answers to questions like "is it Turing complete?" (i.e., does it have enough constructs to emulate a Turing machine?), and there is a much more important and subtle issue of local expressivity in the sense of how small programs can get without sacrificing their readability. Many languages eventually develop shorthand constructs for writing commonly used combinations of constructs shorter and thus faster.

First Class Citizen

DB-RD:418, CC-DG:191, CD-GR:625, LI-PZ:431, PL-RS:373, PL-WC:151, PL-BM:397, LD-JW:117, LD-WH:410, SL-AS:59, SL-RL:12

It is an important design point to decide which entities within a program have the right to be saved, passed as arguments, transferred through other means, etc. Numbers? Collections? Objects? Functions? Unfinished computations? Data streams? Unfilled templates?
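
In Python, for instance, functions are first class: they can be stored in collections, passed as arguments and returned as results. The sketch below uses made-up names (`compose`, `operations`) to show this.

```python
def compose(f, g):
    """Return a new function that applies g first and then f."""
    return lambda x: f(g(x))


# Functions saved in a data structure like any other value.
operations = {
    "double": lambda n: n * 2,
    "negate": lambda n: -n,
}

double_then_negate = compose(operations["negate"], operations["double"])
print(double_then_negate(5))  # prints "-10"
```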

Garbage Collection

DwI:Bundling, DB-RD:441, DB-PD:568, CC-DG:209, CD-GR:476, LI-PZ:471, PL-RS:117, PL-BM:443, LD-WH:123, SL-AS:444

Simple automatic memory release schemes such as reference counting cannot reclaim cyclic data structures. Languages that want to support them have a garbage collector: a runtime component that occasionally marks data structures that are still accessible and then sweeps the rest away, freeing the memory. GC can compromise language responsiveness and performance.
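
A toy mark-and-sweep pass over an explicit object graph may make the two phases concrete. This Python sketch is a simulation under simplifying assumptions (a hypothetical `Node` class, a heap kept as a plain list); it is not how any particular runtime implements its collector.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references to other nodes
        self.marked = False


def mark(node):
    """Mark phase: flag everything reachable from a root."""
    if node.marked:
        return               # already visited, which also stops on cycles
    node.marked = True
    for child in node.refs:
        mark(child)


def collect(heap, roots):
    """Return the surviving heap after one mark-and-sweep cycle."""
    for node in heap:
        node.marked = False
    for root in roots:
        mark(root)
    return [node for node in heap if node.marked]  # sweep phase


a, b, c, d = (Node(n) for n in "abcd")
a.refs.append(b)   # a -> b is reachable from the root
c.refs.append(d)   # c <-> d is a cycle that nothing else refers to
d.refs.append(c)

survivors = collect([a, b, c, d], roots=[a])
print([node.name for node in survivors])  # prints "['a', 'b']"
```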

Generation

DwI:Mazes, Mernik:Task automation, DB-RD:12, CC-DG:6, CD-GR:8, PL-RS:126, PL-BM:5, SL-CF:119, SL-RL:25, DwI:Tunnelling & wizards

Tedious, repetitive and error-prone programming tasks can be automated by using templates, wizards, explicit staging/morphing constructs of generative programming, construction workbenches, etc. In many practical cases the language user is allowed to edit the result to fine-tune it. The final generation phase is called code generation.

Heterogeneous Data

CC-DG:186, PL-RS:246, LD-JW:69, LD-WH:309, SL-AS:99, SL-RL:332

Some languages allow considerable freedom in types that makes collections capable of carrying elements of varying structure. Examples: variant records in Modula and Ada, heterogeneous lists in Python, polytypic functions in Haskell, GADTs in Haskell. Allowing heterogeneity empowers the language user but makes the language harder to learn.

IDE

DwI:Conveyor belts, LI-PZ:53, PL-RS:53, SL-CF:264, SL-RL:19

Integrated Development Environments (IDEs) are used to support language users in their common tasks: code navigation, debugging, building, modularising, refactoring, etc. Can take the form of a dedicated standalone editor, a website or a plugin for a universal editor. Needs to have a well-designed UI.

IDE GUI

DwI:Positioning, LD-WH:224

Most IDEs divide the screen space among areas with different functionality: for navigating through adjacent models, for editing the code, for reviewing the architecture, for watching how values change at runtime, etc. Advanced IDEs like IntelliJ, Eclipse or VS.NET have so many subwindows that the user has to choose which ones to keep open at each given time.

Indentation & Whitespace

DwI:Positioning, DB-RD:54, CD-GR:619, LI-BH:33, LI-PZ:100, PL-BM:91, LD-JW:9, SL-CF:23, SL-RL:219

The two extremes for this aspect are: treat indentation as something crucial to the program structure (and thus process constructs differently based on columns where they start) and discard all possible indentation (even in the middle of names, as FORTRAN does). Most languages are somewhere in the middle. Normalisation of whitespace use is called pretty-printing.

Inheritance

CD-GR:545, LI-PZ:311, PL-RS:453, PL-WC:194, PL-BM:465, LD-WH:15

An "is-a" relation can be represented by a language construct when one class, object or function inherits all the properties of its parent and possibly adds others exclusive to itself. It is a design consideration which entities can be derived from which, what are the rules for inheriting from several parents, etc.

Input/Output

LI-BH:177, LI-RM:341, LI-PZ:223, PL-WC:187, PL-BM:13, LD-ED:73, LD-JW:86

Most executable models are not self-contained and require input data to run and produce results, which in turn need to be propagated somewhere. There are languages that are volatile with input and output, those that only work with files, those that wrap I/O as a side effect of a monad, etc.

Instruction Set

DB-RD:17, DB-PD:626, CC-DG:253, CC-WG:292, CD-SM:599, CD-GR:367, LI-BH:164, LI-RM:288, LD-WH:103, SL-AS:427, SL-RL:144

Instead of freely combinable statements and expressions, low level languages (microcodes, assemblers, virtual machine bytecodes, etc) have limited non-extendable instruction sets. Each of the instructions typically has a mnemonic (name) and a bit-level encoding. Realistic assemblers had to introduce macro expansions to make expressivity and programming experience tolerable.

Iteration

DB-GD:282, DB-RD:533, DB-PD:509, CC-DG:284, CC-WG:31, CC-NW:61, CD-AH:642, CD-SM:39, CD-GR:582, LI-BH:133, LI-RM:87, LI-PZ:358, PL-RS:318, PL-WC:89, PL-BM:47, PT-AO:171, LD-ED:26, LD-JW:35, SL-AS:40, SL-RL:288

There are many looping constructs, ranging from the imperative classics such as a for loop, to the functional classics such as map, filter and fold (or reduce). It is not uncommon for languages to support only some of these constructs. Some older GPLs and 4GLs also have one iterative construct which can be annotated with all kinds of conditions and steppers.
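
The two families can be compared side by side. The Python sketch below computes the sum of squares of the even numbers once with an imperative loop and once with filter, map and fold (`functools.reduce`); the variable names are illustrative.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# Imperative classic: an explicit loop with a mutable accumulator.
total = 0
for n in numbers:
    if n % 2 == 0:
        total += n * n

# Functional classics: filter, map and fold (reduce) chained together.
evens = filter(lambda n: n % 2 == 0, numbers)
squares = map(lambda n: n * n, evens)
folded = reduce(lambda acc, n: acc + n, squares, 0)

print(total, folded)  # prints "56 56"
```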

Keyword

DB-GD:33, DB-RD:56, DB-PD:121, CC-WG:140, CC-NW:33, CD-SM:40, LI-BH:10, LI-RM:34, LI-PZ:99, PL-RS:35, PL-WC:11, PL-BM:92, LD-JW:10

Special words in concrete syntax of the language that carry identical meaning across all possible models in the same language. Can be made reserved so that programmers may not redefine them. A language can get new keywords by evolution.

Labelling

DwI:Do as you’re told, DwI:Emotional engagement, DwI:Anchoring, DB-PD:43, PT-GJ:563, LD-JW:7, LD-WH:417, SL-AS:201, SL-RL:12, DwI:Forced dichotomy

Since most engineers know several languages, some language manuals directly assume initial familiarity of their users with other languages. Can refer to paradigms or families ("this language is strongly typed") or directly to other languages ("inheritance works like in Java"). Also, by explicitly stating which camp the language is siding with or which key community figures endorse it, the designer can invoke an emotional response directly mappable to the language's acceptance and popularity.

Lazy Evaluation

CD-GR:627, LI-PZ:345, PL-RS:74, PL-WC:239, SL-AS:292

A lazy compiler defers evaluation to the latest possible moment. Lazy languages allow infinite data structures (as long as they are processed one chunk at a time) and may have unpredictable outcomes if calculations are allowed to have side effects (like C's ++). Lazy evaluation has many applications from optimisation of generated code to stream data processing.
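
Python itself is evaluated eagerly, but its generators give a reasonable approximation of the idea: an infinite structure that is only computed one chunk at a time. The generator `naturals` below is an illustrative name.

```python
from itertools import islice


def naturals():
    """An infinite stream 0, 1, 2, ... produced on demand."""
    n = 0
    while True:
        yield n
        n += 1


squares = (n * n for n in naturals())   # nothing is computed yet
first_five = list(islice(squares, 5))   # forces exactly five elements
print(first_five)  # prints "[0, 1, 4, 9, 16]"
```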

Live Feedback

DwI:Real-time feedback

An advanced IDE running on modern hardware can utilise its idle cycles to attempt parsing, compilation, dependency analysis and other kinds of checks while the language user is still typing the model. Erroneous and suspicious pieces of code are commonly underlined with red or yellow squiggly lines familiar from natural word processors.

Lock-out/Opt-in

DwI:Opt-outs, DwI:Task lock-in/out

Certain combinations of language features may be disabled (erroneous) by default, with a possibility of enabling them explicitly. For example, redefining a method in a derived class is only allowed in C# when a specific override keyword is used, which leaves visual cues to future readers of the piece of code in question.

Macro

Mernik:Preprocessor, DB-GD:3, DB-RD:16, DB-PD:32, CC-DG:413, CC-WG:294, CD-AH:101, CD-GR:102, LI-PZ:74, SL-AS:314, SL-RL:24

A mechanism commonly found in low level languages that allows users to define a piece of syntactic sugar to be expanded into a longer sequence of instructions. Advanced parametrised macros resemble subprograms in expressivity but may behave less reliably due to their lexical nature. In bigger languages macros are typically handled by a preprocessor.

Metaphor

DwI:Metaphors

Give your language constructs names that need no explanation: atom, backtracking, binding, body, build, cloud, collision, compiler, dangling else, debugging, desugaring, dictionary, duck typing, environment, filter, floating point, forest, framework, garbage collection, go to, heap, inheritance, jump, library, linking, map, pointer, pruning, rendezvous, stack, turtle, weaving, window, …

Module

DwI:Segmentation, CC-DG:366, CC-NW:92, CD-AH:459, CD-GR:32, PL-RS:380, PL-WC:113, PL-BM:267, PT-AO:161, LD-WH:14, SL-AS:296, DwI:Proximity & grouping

Large models inevitably outgrow their creators' capabilities to understand them all at once. Comprehension can be aided greatly by the language providing modules, packages, classes, procedures, blocks and other elements to group related code fragments together. Modern IDEs can analyse code for cohesion and coupling to help improve modularisation. Modules are often [one of the possible] compilation units.

Natural Pattern

DwI:Simplicity

Design patterns, implementation patterns and architecture patterns are used across language boundaries, but many domain-specific languages incorporate well-known patterns as native language constructs: Model-View-Controller, Singleton, State, Visitor, etc.

Numeric Data Type

CC-WG:22, CC-NW:81, CD-GR:532, LI-BH:10, LI-RM:36, LI-PZ:205, PL-RS:216, PL-WC:63, PL-BM:65, LD-ED:16, LD-JW:12, LD-WH:26, SL-AS:80, SL-RL:268

Often gets overlooked at the early stages of language design, but could significantly shape the application area of the language. There are many integer types, distinguished by their byte sizes and therefore value ranges; also decimal types with fixed scale and precision; and floating point types good for scientific computations but not for handling finances.
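
The remark about finances can be checked in a few lines of Python, comparing binary floating point with the standard `decimal` module.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly.
float_ok = (0.1 + 0.2 == 0.3)

# A decimal type keeps the exact values written in the source.
decimal_ok = (Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))

print(float_ok, decimal_ok)  # prints "False True"
```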

Operator Overloading

DB-PD:481, PL-RS:284, LD-WH:336

A language designer may decide to reuse the same symbol for several different operators, usually conceptually related (such as + for arithmetic addition and string concatenation). Using it for totally unrelated operations is considered harmful for readability (such as & for pointer referencing and bit conjunction in C). In some languages (C++, Ada, Fortran) language users can also define their own operators that complement their own defined types.
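
Python is another language in which users can give `+` a meaning for their own types, through the `__add__` special method. The `Vector` class below is an illustrative sketch.

```python
class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # Called for `self + other`: component-wise addition.
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"


# One symbol, three related meanings: numbers, strings, user-defined vectors.
print(1 + 2, "ab" + "cd", Vector(1, 2) + Vector(3, 4))
# prints "3 abcd Vector(4, 6)"
```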

Operator Precedence

DB-GD:47, DB-RD:31, DB-PD:86, CC-DG:103, CC-WG:28, CD-AH:819, CD-GR:158, LI-RM:71, LI-PZ:332, PL-RS:133, PL-WC:79, PL-BM:94, PT-GJ:266, LD-ED:9, LD-JW:30

To avoid excessive use of parentheses, a language can provide a default convention of disambiguating constructs with 3+ entities bound by binary operators. In arithmetic expressions, the precedence usually follows mathematical laws.

Optimisation

DB-GD:406, DB-RD:585, DB-PD:39, CC-DG:377, CC-WG:326, CD-AH:657, CD-SM:6, CD-GR:70, LI-BH:204, LI-RM:382, LI-PZ:110, SL-AS:497, SL-RL:156

It is always easier and less error-prone to generate intermediate code or machine code with simple and straightforward patterns and subsequently optimise the result in a different phase. The effect on the language users is that they do not need to optimise their models to the fullest, since their own naïve code will be optimised together with the rest. Small efficiencies are only relevant 3% of the time; for the rest, premature optimisation is considered the root of all evil.

Order

DwI:Storytelling, DB-RD:285, CD-GR:388, LD-ED:45, LD-JW:3, DwI:Implied sequences, DwI:Nakedness

Many languages have ordering constraints: a variable must be declared before its use, a function signature known before its call, etc. Sometimes constructs are grouped and it is the groups that must follow the order: e.g., first all declarations, then all functions, then the rest of the code (COBOL's divisions are the extreme example of this).

Orthogonal Design

DwI:(A)symmetry, DwI:Similarity, PL-RS:31, PL-BM:8

Independent features should be controlled by independent mechanisms. Related constructs should look similar and different ones should look different. Regular rules without exceptions are easier to learn. The fewer surprises one has while learning the language, the higher the language quality.

Parameter Passing

DB-GD:60, DB-RD:424, DB-PD:68, CC-DG:187, CC-NW:73, CD-SM:116, CD-GR:559, LI-BH:173, LI-RM:161, PL-RS:348, LD-ED:48, LD-JW:106, LD-WH:122, SL-AS:24, SL-RL:315

There are several strategies for mapping the arguments passed to a procedure in a call to the parameters that the procedure expects to get: call by value (expose only the values, safe but inefficient for composite data), call by result (can return several values at once), call by value-result (the callee gets values, updates them, and they are passed back), call by reference (expose pointers to values, efficient but unsafe), call by name (re-evaluate the argument expressions each time they are used inside the callee), etc.
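
Python has a single built-in passing mechanism, so the sketch below only emulates some of these strategies: mutation through a shared object stands in for call by reference, an explicit deep copy for call by value, and a parameterless function (a thunk) for call by name. All names are illustrative.

```python
import copy


def append_zero(items):
    items.append(0)   # mutates the object the caller may also see


shared = [1, 2]
append_zero(shared)                    # reference-like: caller sees the change

protected = [1, 2]
append_zero(copy.deepcopy(protected))  # value-like: callee gets a private copy

evaluations = []


def argument():
    evaluations.append("evaluated")    # side effect makes each use visible
    return 21


def twice_by_name(thunk):
    return thunk() + thunk()           # the argument is re-evaluated per use


print(shared, protected, twice_by_name(argument), len(evaluations))
# prints "[1, 2, 0] [1, 2] 42 2"
```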

Parametrised Type

LI-PZ:291, PL-RS:446, PL-WC:180, PL-BM:279

Some types can be defined partially by the user and partially by the language designer. For example, the language designer knows what a list is, and the language user can select any other type for list elements — this will change handling of such elements, but the philosophy behind their collection will stay the same.
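
A sketch of the same split using Python's `typing` module: the container logic is written once, and the user picks the element type. The `Stack` class is a hypothetical example, and Python only checks these annotations with an external type checker, not at runtime.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class Stack(Generic[T]):
    """A stack whose element type is chosen by the user of the class."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()


numbers: Stack[int] = Stack()
numbers.push(1)
numbers.push(2)

words: Stack[str] = Stack()
words.push("hello")

print(numbers.pop(), words.pop())  # prints "2 hello"
```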

Performance

DwI:Pave the cowpaths, DB-RD:587, CD-GR:349, LI-RM:279

Performance testing and its variations like profiling and stress testing are commonly desired nice-to-have features in IDEs. Languages and their ecosystems greatly vary in the extent to which this aspect is recognised and supported.

Phased Process

DwI:Partial completion, Spinellis:Pipeline, DB-GD:6, DB-RD:5, DB-PD:31, CC-DG:4, CC-WG:4, CC-NW:7, CD-AH:2, CD-SM:2, CD-GR:3, LI-RM:7, LI-PZ:73, PL-RS:48, PL-BM:38, SL-RL:18

Breaking a process into phases is one of the most used divide-and-conquer principles applied in language processing. Most compilers are designed to work in phases, and different competences and skills are required to implement each phase.
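
A minimal sketch of a three-phase processor for a toy language of integers, `+` and `*`, assuming well-formed input; the phase names `tokenise`, `parse` and `evaluate` are illustrative.

```python
import re

def tokenise(text):
    """Phase 1: characters to tokens (integers, + and *)."""
    return re.findall(r"\d+|[+*]", text)

def parse(tokens):
    """Phase 2: tokens to a nested-tuple tree; * binds tighter than +."""
    def product(pos):
        node, pos = int(tokens[pos]), pos + 1
        while pos < len(tokens) and tokens[pos] == "*":
            node, pos = ("*", node, int(tokens[pos + 1])), pos + 2
        return node, pos

    node, pos = product(0)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = product(pos + 1)
        node = ("+", node, right)
    return node

def evaluate(tree):
    """Phase 3: tree to value."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    lhs, rhs = evaluate(left), evaluate(right)
    return lhs + rhs if op == "+" else lhs * rhs

tokens = tokenise("2 + 3 * 4")
tree = parse(tokens)
value = evaluate(tree)
```

Each phase only depends on the output format of the previous one, so any of them can be replaced or tested in isolation.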

Picture Clause

A data type that saves a specially formatted entity (usually a float or a date) that can be used directly in printing statements but also manipulated as data.

Platform Lock-in/out

DwI:Format lock-in/out, LD-WH

Supporting a great language only for one particular hardware platform, OS or IDE, implicitly forces people to use them. For example, malware practices of Java installers turned some users against the JVM, which also deprived them of Scala and Clojure. Another example is .NET Core, a redesign of the .NET Framework which allows typically Windows-specific code to run on Linux.

Pointer

Wile:Dereferencing, DB-GD:34, DB-RD:649, CC-DG:462, CC-WG:25, CC-NW:86, CD-AH:469, CD-SM:110, CD-GR:464, LI-PZ:220, PL-RS:255, PL-WC:69, PL-BM:206, LD-JW:94, LD-WH:49, SL-RL:111

A popular data type in low-level languages, representing the memory address where a data structure is stored, which is more efficient to pass between functions than the structure itself. The type of the structure needs to be known to decipher its contents, since the pointer itself is nothing more than a number.
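
A small sketch using Python's standard `ctypes` module, assuming a platform with IEEE 754 single-precision floats: the address is a plain integer, and what it yields depends entirely on the type it is read through. Variable names are illustrative.

```python
import ctypes

value = ctypes.c_float(1.0)
address = ctypes.addressof(value)  # the pointer as a bare number

# Reading the same address through two different pointer types.
as_float = ctypes.cast(address, ctypes.POINTER(ctypes.c_float)).contents.value
as_uint = ctypes.cast(address, ctypes.POINTER(ctypes.c_uint32)).contents.value
```

Here `as_float` recovers the stored number, while `as_uint` exposes the raw bit pattern of the same four bytes.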

Pretty-printing

DwI:Implied sequences, Wile:Unparsing, DB-RD:3, LI-RM:42, PL-BM:89, SL-AS:19, SL-CF:23, SL-RL:219

A language can have a default formatting convention that is not only accepted by the community to improve the representation quality of the models, but also automated and shipped in the form of a tool. Such a tool can be very configurable, have limited feature selection or none at all. A pretty-printer that scans the input and minimises the delimiters in it is sometimes called a program compactor. Pretty-printers are omnipresent in textual languages and may require layout strategies/policies for graphical ones.
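
As an illustration on a data language rather than a programming language, Python's standard `json` module can act as both a pretty-printer and a compactor for JSON; the sample text and variable names are assumptions.

```python
import json

source = '{ "a" : [1, 2],  "b": true }'
model = json.loads(source)  # parse once, then unparse in two styles

# Pretty-printer: one item per line, two-space indentation.
pretty = json.dumps(model, indent=2)

# Compactor: no whitespace around the delimiters.
compact = json.dumps(model, separators=(",", ":"))
```

Both outputs describe the same model; only the layout policy differs.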

Preview

DwI:Simulation & feedforward

Some features are very useful in general, but implemented in a way that sometimes fails. In this case, the impact of an application of a feature can be explicitly examined by the language user before agreeing to proceed. Common for database queries and object-oriented refactorings.

Product Line

Mernik:Product Line, CD-AH:275, CD-GR:107, SL-RL:24

Unified production of things that belong to the same family (similar cars, computers, furniture, drinks) is typical for any industry, including software engineering. For one codebase to specify the behaviour of a system that must be compiled and deployed on a variety of devices and hardware architectures, the model can be annotated with conditions that are checked at compile time and make the compiler produce different code for each target. To simplify compiler construction, conditional compilation is often handled by a preprocessor.
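
A minimal sketch of such a preprocessor in Python. The `#if NAME` and `#endif` directives are a hypothetical mini-syntax invented for this example (they only resemble the C preprocessor), and `preprocess` is an illustrative name.

```python
def preprocess(source, defined):
    """Keep only the lines whose enclosing #if conditions are all defined."""
    output = []
    keep = [True]  # stack of "are we currently emitting lines?" flags
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#if "):
            keep.append(keep[-1] and stripped[4:].strip() in defined)
        elif stripped == "#endif":
            keep.pop()
        elif keep[-1]:
            output.append(line)
    return "\n".join(output)

source = "\n".join([
    "common",
    "#if LINUX",
    "linux only",
    "#endif",
    "#if WINDOWS",
    "windows only",
    "#endif",
    "shared tail",
])

linux_build = preprocess(source, {"LINUX"})
windows_build = preprocess(source, {"WINDOWS"})
```

One annotated source yields a different product for each set of defined names, and the compiler proper never sees the directives.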

Readability

DwI:Storytelling, PL-RS:30

In many cases a model is written once but read many times for many purposes: to fix bugs, introduce new features, understand its behaviour, change it, etc. Hence, the ease of reading a model can become a cornerstone of the design of a language. Some existing languages, like COBOL and many modern DSLs, were specifically designed to empower domain experts to read and write in them. Others, like APL and Perl, while expressive, are known to produce unreadable models.

Record

DB-GD:41, DB-RD:477, DB-PD:466, CC-DG:182, CC-NW:41, CD-AH:543, CD-SM:33, CD-GR:538, LI-BH:125, LI-RM:124, LI-PZ:257, PL-RS:224, PL-WC:169, PL-BM:199, LD-JW:65, LD-WH:386, SL-RL:332

Many languages have some kinds of records or structures that bundle several related pieces of data without attaching methods to work with that data. A dynamic variation thereof is known as a dictionary or a map (e.g., hashmap) and it allows users to add and remove fields at runtime.
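
A short Python sketch of both variants, with an illustrative `Point` record; note that `dataclass` generates only boilerplate such as the constructor and equality, not domain methods.

```python
from dataclasses import dataclass

@dataclass
class Point:
    """A record: a fixed set of named fields bundled together."""
    x: float
    y: float

fixed = Point(1.0, 2.0)

# The dynamic variation: fields can be added and removed at runtime.
dynamic = {"x": 1.0, "y": 2.0}
dynamic["z"] = 3.0
del dynamic["x"]
```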

Redefine

DB-RD:361, CC-DG:304, CD-SM:293, CD-GR:520, PL-RS:39, PL-WC:138, PL-BM:85, LD-WH:304

Once something has been defined, it can be redefined in many languages: derived classes can override base classes' properties, local variables can shadow the global ones with the same name, the same memory fragment can be treated as belonging to two separate data types (requiring alias analysis), etc.

Refactoring

DwI:Rephrasing & renaming, SL-CF:154, SL-RL:24

Refactorings are code changes that do not impact the system's behaviour but change its internal structure to improve code quality, prepare for the subsequent change, etc. Some DSMLs mean their models to only change through refactorings and refinements. In other domains it is also not uncommon to eventually get refactoring support in the IDE (often with previews).

Runtime

Wile:Analyzers and Simulators, Mernik:Interpreter, DB-GD:350, DB-RD:389, DB-PD:525, CC-DG:170, CC-WG:319, CC-NW:42, CD-AH:477, CD-SM:105, CD-GR:25, LI-RM:199, PT-AO:203, LD-WH:8

A runtime environment is a system component that must accompany the result of the compilation in order for it to function correctly. It may be completely absent, or contain standard libraries or a virtual machine. The multi-stage paradigm can accommodate several intermediate stages of generation between compile-time and runtime.

Scope & Binding

DB-GD:60, DB-RD:394, DB-PD:58, CC-WG:20, CD-AH:559, CD-SM:43, CD-GR:515, LI-BH:97, LI-RM:150, LI-PZ:82, PL-RS:179, PL-WC:55, LD-ED:48, LD-JW:5, LD-WH:248, SL-AS:33, SL-RL:20

If a type or a variable is declared, how far from the declaration can you still use them? If an outside entity is used in a subprogram, will it be taken from the parent scope of the subprogram or from the scope of the call? Several equally viable paradigms are known for scoping, name-type binding and declaration-reference binding.
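
The second question can be sketched in Python. Python itself is lexically scoped, so the dynamic variant below is simulated with an explicit stack of environments; `dynamic_env`, `lookup` and the other names are assumptions made for the example.

```python
x = "global"

def show():
    return x  # lexical scoping: resolved where show is defined

def caller():
    x = "local to caller"  # not visible to show under lexical scoping
    return show()

lexical_result = caller()

# Dynamic scoping, simulated: names are resolved through the call chain.
dynamic_env = [{"x": "global"}]

def lookup(name):
    for frame in reversed(dynamic_env):
        if name in frame:
            return frame[name]
    raise NameError(name)

def show_dynamic():
    return lookup("x")

def caller_dynamic():
    dynamic_env.append({"x": "local to caller"})
    try:
        return show_dynamic()
    finally:
        dynamic_env.pop()

dynamic_result = caller_dynamic()
```

The same program shape gives two different answers, which is why the choice has to be made explicitly by the designer.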

Security

DwI:What you know, LD-WH:126

Security concerns were added to most mainstream programming and modelling languages only after their initial release, and fit there with varying degrees of comfort. When designing a new language, it is possible to build in constructs for policy management, encryption, authentication and others, right from the start.

Smell

DwI:Desire for order, PT-GJ:47, SL-RL:446

A smell is found when a code fragment has suspicious characteristics even without being wrong. Modern computer science identifies smells for code, models, architecture, spreadsheets, hardware, grammars, etc. The very use of the word "smell" strongly implies that any neat self-respecting language user should try to leave less of those when touching the code.

Standard Library

CC-NW:78, CD-GR:364, LI-RM:164, LD-ED:32, LD-WH:229

A library or a set of libraries that are built into the language. It can be a tough design decision for the language designer to decide which functionality needs to become native constructs and which can go into the standard library. Such library functions can be expanded into code directly or shipped together with the compiled code as a runtime.

Static Analysis

DwI:Roadblock, Wile:Parsing, Mernik:AVOPT, DB-RD:3, DB-PD:100, CD-SM:169, CD-GR:115, LI-RM:105, LI-PZ:169, PL-RS:196, PL-BM:7, PT-AO:57, PT-HU:45, LD-WH:372, SL-CF:264, SL-RL:25

If a language does not directly limit its users' ability to express things in a "bad" way, the compiler of the language can still do so. Parsing, type analysis, dependence analysis, formatting and convention checks are all examples of this. One of the language design principles states that if an error gets through one line of defence, it should be caught by the next one.

Sublanguage

DwI:Nakedness, Wile:Sub-languages, Spinellis:Piggyback, Mernik:Language Exploitation, SL-RL:16, Spinellis:Language Extension, Mernik:Language Invention, Mernik:Embedding

A language can consist of several smaller languages that model its separate aspects, or incorporate other languages for solving specific subtasks. For example, a programming language may embed a database query sublanguage, or a specification language may embed a constraint language. An embedded language can be reinvented to fit its context or reused by adopting an existing one that has already demonstrated its usefulness.

Subprogram

DB-GD:55, DB-RD:506, DB-PD:63, CC-DG:200, CC-WG:75, CC-NW:74, CD-AH:466, CD-SM:34, CD-GR:544, LI-BH:8, LI-RM:157, LI-PZ:276, PL-RS:346, PL-WC:106, PL-BM:35, PT-AO:167, LD-ED:42, LD-JW:27, LD-WH:32, SL-AS:22

Pieces of code executable from other places promote reuse, but can be designed differently. Some languages only allow them to be independent (procedures, functions), or attached to an object (methods) or a class (static methods), others provide special synchronisation among them (coroutines, delegates) or allow them to be passed as arguments. If they may have parameters, the designer must decide on the parameter passing strategy.

Substitution

DB-RD:370, DB-PD:461, CD-GR:530, LI-PZ:199, PL-WC:197, PL-BM:208, LD-JW:159, SL-AS:172, SL-RL:335

When a subprogram specifies the types of input it expects, these types do not need to be treated precisely: often one can use entities of subtypes of the specified types (e.g., put a circle in a function that draws a shape because a circle is a subtype of shape). Subtyping is nontrivial, and the designer must choose among covariance, contravariance, invariance, etc. Non-strict handling of values of different types involves designing rules for type casts and conversions.

Synchronisation

DwI:Converging & diverging, DB-PD:978, CD-GR:740, LI-PZ:321, PL-RS:511, PL-WC:267, PL-BM:328, PT-AO:259, LD-WH:419, SL-AS:164

Managing the use of resources by some predefined form of synchronisation between readers and writers. Can be synchronous or asynchronous, and take forms of resource locks, semaphores, pipes, rendezvous, handshakes, message passing channels, etc. Always needed for concurrent computing.
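
A minimal sketch of the resource lock variant, using Python's standard `threading` module; the shared `state` dictionary and the `worker` function are illustrative.

```python
import threading

state = {"counter": 0}  # the shared resource
lock = threading.Lock()

def worker(repetitions):
    for _ in range(repetitions):
        with lock:  # only one writer at a time may update the counter
            state["counter"] += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

total = state["counter"]
```

Without the lock the read-modify-write steps of different threads could interleave and lose updates.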

Syntactic Sugar

DwI:Fake affordances, CD-GR:633

Nice-to-have constructs that do not extend the expressive power of the language are sometimes not implemented directly, but expanded into longer sequences of more primitive and less user-friendly constructs.
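
For instance, Python's `for` loop can be read as sugar over the iterator protocol. The expansion below is a hand-written approximation for illustration, not what the interpreter literally emits.

```python
items = [1, 2, 3]

# Sugared form.
sugared = []
for item in items:
    sugared.append(item * 2)

# Roughly equivalent expansion into more primitive constructs.
desugared = []
iterator = iter(items)
while True:
    try:
        item = next(iterator)
    except StopIteration:
        break
    desugared.append(item * 2)
```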

Syntax Highlighting

DwI:Material properties, SL-CF:27, DwI:Colour associations

A development environment of the language can profit from visualisation even if the language is textual by colour-coding different categories of words (strings, numbers, standard libraries, reserved words, etc). Colours are also commonly used for non-textual languages, unless models are expected to be printed or viewed on grayscale devices.

Trade Off

DwI:Framing, DB-RD:127, DB-PD:242, CD-GR:445, PL-RS:44

When direct optimisation is impossible or not sufficiently effective, the language designer can identify trade-offs and leave them all inside the language for the users to choose. For example, many compilers have compilation options optimising code generation for speed, size or power, but not all three. Many languages have libraries and structures for both arrays (fast, fixed length) and lists (slower but flexible).

Type Analysis§

DwI:Matched affordances, Wile:Type Checking, DB-GD:49, DB-RD:343, DB-PD:56, CC-WG:26, CD-AH:489, CD-GR:521, LI-BH:110, LI-RM:91, LI-PZ:195, PL-RS:38, PL-BM:129, PT-AO:98, LD-WH:13, SL-RL:267

Components can be identified, explicitly or automatically, to belong to a particular type. Among other things, the type determines applicability and compatibility of components with one another. In complex scenarios (like a monadic bind) hard-to-understand components can only fit together in one possible way. Type equivalence rules can be based on names, structure, scopes, etc.

Type Definition

Wile:Structure, Spinellis:Data Structure Representation, Mernik:Data Structure Representation, DB-GD:314, DB-RD:345, DB-PD:404, CC-DG:461, CC-WG:234, CC-NW:26, CD-AH:489, CD-SM:25, CD-GR:522, LI-BH:6, LI-RM:111, LI-PZ:186, PL-RS:34, PL-WC:60, PL-BM:268, PT-AO:163, LD-ED:15, LD-JW:24, LD-WH:364

Several GPLs and many DSLs can exist perfectly without ever needing any user-defined types. However, in many cases it can prove useful to allow the language user to make their own data structures and algebraic data types to provide their input for type analysis.

Undefined Behaviour

DwI:Antifeatures & crippleware, CD-AH:598

When a particular combination of language constructs is not explicitly specified by the standard, its implementers can take different shortcuts in interpreting it. As a result, the same piece of code produces different results based on the compiler, the computer, time of day, etc. Common in legacy languages like C or COBOL.

Unification

Mernik:Data Structure Traversal, DB-RD:370, DB-PD:487, CD-GR:622, LI-PZ:369, PL-RS:621, PL-WC:248, PL-BM:507, PT-GJ:573, LD-WH:92, SL-AS:379, SL-RL:400

Given two hierarchical composite data structures, a compiler can be tasked to find their matching components and proceed with assignment, transformation, etc. Limited forms of unification may be called pattern matching. Widely used in logic programming, metaprogramming, model synchronisation, bidirectional transformation, 4GLs for banking, etc.
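
A minimal sketch of first-order unification in Python, under assumed conventions: tuples are composite terms, strings starting with an uppercase letter are variables, everything else is a constant. There is no occurs check, and `unify`, `walk` and `is_var` are illustrative names.

```python
def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def walk(term, subst):
    """Follow variable bindings until reaching a non-variable or a free variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def unify(a, b, subst=None):
    """Return a substitution making a and b equal, or None if there is none."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for left, right in zip(a, b):
            subst = unify(left, right, subst)
            if subst is None:
                return None
        return subst
    return None

match = unify(("point", "X", 2), ("point", 1, "Y"))
clash = unify(("pair", "X", "X"), ("pair", 1, 2))
```

When only one side may contain variables, the same procedure reduces to the pattern matching mentioned above.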

Variable

DB-RD:704, DB-PD:61, CC-DG:197, CC-WG:9, CC-NW:72, CD-AH:522, CD-SM:26, LI-BH:7, LI-PZ:183, PL-RS:35, PL-WC:52, PL-BM:22, PT-AO:46, LD-ED:5, LD-JW:10, LD-WH:240, SL-AS:20

Assigning names to memory areas or expressions is thought to be fundamental for the nature of computation, be it within the von Neumann paradigm or functional one. The designer can make their language more functional and force its users to think about data flow, or make it more imperative and let them worry about where the intermediate data is stored. In some languages naming or stropping of the variables can implicitly define their types.

Virtual Machine

CC-WG:271, CD-AH:446, CD-SM:767, CD-GR:450, LI-BH:145, LI-PZ:75, LD-WH:8, SL-AS:428

An emulator for a real or imaginary hardware architecture that has a low-level coding language which other components or tools can compile high-level languages to. VMs trade off performance for an extra layer of abstraction. Some virtual machines (e.g., Dis) compile their code into native machine code just before running it.
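
A toy stack machine in Python, as a sketch of the idea; the three-instruction set (`push`, `add`, `mul`) is an assumption made for this example and does not model any real virtual machine.

```python
def run(program):
    """Execute a list of instruction tuples on an operand stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "mul":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()

# What a compiler might emit for the expression 2 + 3 * 4.
program = [("push", 2), ("push", 3), ("push", 4), ("mul",), ("add",)]
result = run(program)
```

Any high-level language that can be compiled to this instruction list runs wherever `run` runs, at the price of an interpretation step per instruction.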


The collection of 96 cards created and maintained by Dr. Vadim Zaytsev a.k.a. @grammarware.
Sources colour coded and explained/linked around this notice. See also the separate page about the books.
Last updated: July 2017.