Reducing specific use cases in a language to improve overall usability

This last summer I spent a considerable amount of time refactoring i-lang.  Since I started implementing this programming language in early 2012, it had accumulated quite a bit of cruft, and it was difficult to continue moving forward.

Refactoring type system

One of the first internal improvements that was made was to overhaul the internal type system.  Before, the type system was simply passing around a boost::any.  However, this became trouble some as all parts of the code would have to know about each type so that it could cast it locally.  In many places the code began to look like:


if(a->type() == typeid(Object*)) {
 a = boost::any_cast<Object*>(a);
 // ...
} else if(a->type() == typeid(Array*)) {
 // ...
}

It became even worse when there were two different types involved, as can be seen in the case of performing arithmetic.

Now, the type system has been rewritten to better make use of C++ template and virtual function systems.  This means that one can write code like:


ValuePass a = valueMaker(1);
ValuePass b = valueMaker(2.4);

ValuePass c = a + b;

assert(c->cast<float>() == 3.4);
assert(c->cast<int>() == 3);
assert(c->cast<bool>() == true);
assert(c->cast<std::string>() == “3.4”);

The real beauty of this type system can be seen when using the foreign function interface, where the value of arguments can be “injected” into local variables.  This means that a function can be written as:


ValuePass func(Arguments &args) {
 int a;
 double b;
 std::string c;
 ilang::Function d;

 args.inject(a, b, c, d);

 // The arguments are casted to the type of the local variable
 // in the case that a cast fails, an exception will be raised
 // which is the logical equivalent to calling a function with
 // the wrong type signature

 // ... (body of function)

 return valueMaker("hello world");
}

Changes in the type system at the language level

Before this refactor, types in i-lang were defined in a global table of identifiers called variable modifiers.  A variable could have more than one modifier attached to it, and each modifier is used to check the value being set to a variable.  What this roughly translates to would be something like:


define_type("Unsigned", {|a|
 // check if the value being set, which is passed as argument a, is greater or equal to 0
 return a >= 0;
});

// use this type like
Unsigned Int b = 5; // both Int and Unsigned are called to validate the value of '5' is the correct type.

Looking at this implementation of a type system, it does not seem that bad when compared to other programming languages.  As displayed here it is missing the concept of a namespace or import scope, but otherwise it is fundamentally a type system where types are given names and then later used to reference that type.  However, this concept of a type having a name fundamentally goes against i-lang’s concept of names only being used as place holders for values, vs having an explicit places in the language. (eg: class required_name_of_class {} vs name_bound_for_use_later = class {}).  This lead me to question what does a type system fundamentally do.  In lower level languages such as C/C++ a type system provides information about the space required for an object, however in a higher level language such as python (which i-lang is more similar to on this point) values are just fixed sizes and then pointers to larger dynamically sized objects when required.  Type system also provided casting between primitive types, such as a 4 byte integer casted to a floating point.  This on its own isn’t that interesting as there are limited number of primitive types and similar operations can be accomplished with code like `1 * 1.0` or `Math.floor(1.2)` for casting.  Finally, type systems provide a way to identify the type of some value, which can be further used by a language to provided features such as pattern matching when calling a function.  Now, choosing to focus on this last issue of a type system lead to i-lang concept of a type system, which is that a type is simply a function which can identify if a value is a member of a given type.

The idea of using just a function to identify a type can sound a little strange at first, however, after playing with it some, the idea itself can be seen to be quite powerful.  Here is a quick example of using this type system to implement pattern matching on the value passed to a function.


// This is a function which returns a function that compares the value
// the returned function gets with the value the first function was called with.
GreaterThan = {|a|
 return {|b|
  return b > a;
 };
};
LessThan = {|a|
 return {|b|
  return b < a;
 };
};
EqualTo = {|a|
 return {|b|
  return b == a;
 };
};

Example_function = {|GreaterThan(5) a|
 return "The value you called with is greater than 5";
} + {|LessThan(5) a|
 return "Then value you called with is less than 5";
} + {|EqualTo(5) a|
 return "The value you called with is equal to 5";
} + {
 return "The value you called with didn't compare well with 5, must not have been a number";
};

Int GreaterThan(0) LessThan(10) int_between_zero_and_ten = 5;

In the Example_function, we are combining 4 different functions, each with different type signatures.  Additionally, we are creating types on the fly by calling the GreaterThan/LessThan/EqualTo functions which are using annoymious functions and closures.  This method also allows for classes to have a place in the type system.  We can now easily create special member of a class to check if a value passed is an instance or interface of the class type.


sortable = class {
 Function compare = {};
};

// check that the array members all implement a compare function and are an instance of a class
sort = {|Array(sortable.interface) items|
 // ...
};

Refactoring Objects and Class to appear like functions

Before, i-lang use syntax similar to Python or JavaScript dicts/objects when constructing a class or object.  This meant that these items looked like:


class {
 Function_to_check_type type_name: value_of_type,
 Another_type another_name: another_value,
 no_type_on_this: value
}

However, in ilang, except when prefixed with `object` or `class` the use of `{}` means that it is a function.  (eg: a = { Print("hello world"); };)  Additionally, colons are not used anywhere else in the language, which made me question why was this case such a special one.  This lead me to ask why not use equal signs and semicolons like everywhere else, meaning that defining a class would appear as:


class {
 Function_to_check_type type_name = value_of_type;
 Another_type another_name = another_value;
 no_type_on_this = value;
}

Furthermore, is there any reason to exclude loops and if statements when constructing a class?  Allowing control flow when the class definition is constructed makes this act identical to a function.


a = true;
class {
 if(a) {
  b = 5;
 } else {
  b = 6;
 }
}

Final thoughts

By starting by cleaning up the internals of i-lang, it allowed me to take another look at the language and determine why certain choices were made at first.  Bootstrapping a new programming language takes a considerable amount of effort, and could easily lead someone to defining something like print a function (as python recently changed away from in version 3).  In my case, I was creating special mechanisms for constructing class/objects and defining types for variables largely because the type system, scopes, and function internal interfaces were all poorly designed in the first iteration of the language.  Now that the internals have been cleaned up, it makes it easier to see that these changes are wins for the language.  Now, I doubt that I would have been able to come up with these changes right off the bat with the first implementation, as it was only through the pain of the first implementation for which the need for these changes became apparent.

Current state of ilang

This is the first week back at school, which means that the speed of development on ilang will begin to fluxuate again. Over this last break, and the final parts of last semester, I was able to clean up a bunch of features with the ilang programming language.

When I originally set out to create this new language, I was mostly thinking about how the syntax would be different and the specific features that I would want in the language to make it worthwhile of being created. However, what I failed to thinking about was what really make a programming language useful today is the fact that there are an absurd amount of libraries already programmed, debugged and available for download for any successful language. As a result of this realization, I have been working on trying to get useful libraries written for the language. However in trying to work with the language, I have a few beliefs that I am trying to stick with, but I am not so such how well it will work out.

The first belief, is that there should be no need to pause or have any sort of timer system, this is because I feel as if the language should attempt to run as fast as possible and focus on processing data. However when writing testing frameworks to automatically check if the language is working, it has become apparent that it would be useful to have some timer system. ** I still haven’t written in the timer system so this is still somewhat of an issue of internal debate.

Along with the timers, there is the problem of getting data efficiently in and out of the programming language. One of the concepts that I have for the future with this language is that the system will be able to distribute the computations between a large number of computers, this means that it is not particularly natural for the system to have access to features of the local computer, such as the file-system or the standard in/out. I am thinking that for the time being that the system could be designed to have access to the standard input and part of the file-system, however when the system becomes networked across a lot of computers, there could possibly be a way to specify where the standard input/output should go along with where the file-system has access to. The other alternate that I am working on, is using the concept of just having the http server be the way to input data, however I expect that it will become cumbersome quickly to input large data files. A possible compromise is to use the parameters to support some way to map specific files or folders to names that can be access from inside the program.

When considering the libraries that have already been included, there is still a lot of space for development. The modification library is still lacking too many features to be really usable. Along with the modification, the database library, still lacks the ability to save arrays into the database. The complication with arrays is trying to figure out an efficient way to store the data without causing a significant slow down. My plan for arrays in the database, was that they would be “smaller” then objects in the database, as objects when stored in the database do not have any limiting factors for the number of elements. However with arrays, I plan to try to have the system load all the data into memory when reading an array from the database. However the current way that the system is designed, it does not allow for the elements, to be easily accessed under the hood. I am thinking that the system might try and store all the elements in their own container, however the problem with this is that there would be a large number of database queries when trying to read data out. And inserting in the middle of the array would require a large number of read and writes into the database. However, on the flip, if the system was using one database object to store all of the array elements, there would be a few read and writes, but the object would likely be come very large very quickly.

The current plan for future features, is to keep working on adding more libraries into the core system to be useful. This will mostly focus on processing and interpreting data. (Web server, http client, as well as methods to parse string such as regex, also expecting some form of CSV or sqlite reader for when working with data that has been downloaded). Along with supporting reading the data, I plan to attempt to include a wide range of machine learning and artificial intelligence libraries and tools. Hopefully, it will be easy enough to integrate their database systems with the ilang database. Once these components are in somewhat of a usable state, I should have some framework for which experiments with the code modification library can be conducted.

Random last thought:
I currently plan for the ilang system to have a formatting tool, much like golang does. The reason for this, is when working with the modification system, I plan to have the system completely regenerate a file using the system “print from code tree” feature. This should greatly simplify writing the code back to disk, when compared to other possible ideas such as trying to find where the code has been changed with corresponding lines and then try to recreate those changes on disk.

ilang – a new programming language

I have been working on developing a new type of programming language over the last few months.  From a physiological perspective, it is interesting to try to create ones idea programming language and see what they create, what features does one add, change or remove.

ilang is still very pre-alpha-ish software, and I don’t expect anyone to jump and start using or even download and try it at this point, there are still too many things unimplemented at the moment to call it a complete language.

An overview of what is different:

  • The power of anonymous.  In may programming language, functions, classes and may other types are given names that can be looked up inside the type system.  However in ilang, attempts to have classes and function and other basic types anonymous, or without names.  The names are viewed as being useful to the programmers who are writing the programs.
  • Short access to making function: a function is anything between {}.  this means that to create function main, it looks like: main = {};
  • optional typing: This seems to be a new and growing trend in programming languages that are coming out now.  By default the types on variables are not checked at all.  This means that more than one check can be imposed on a variable.  Also additional types can be easily encoded with some additional C++ code, and soon ilang code from within the language itself.  The type checking can also do some other interesting things, more later.
  • Built in database: This has always been a feature that I think higher level languages should include, now web browsers include localStorage for example.  This feature can already take all primitive types and objects.  classes and functions can not yet be encoded into the database.  However the advantages having this built in as already noticeable in testing.
  • Python style imports, but only at the top and no ‘as’, *.  I originally made it this way, because I was annoyed when some code I was reading through would import something in the middle.  One you have to find where the import was performed to figured out what is being included into the source, also if you go back to modify above the point where the import was performed then you have move the import up so that it will be available.

To come/planned/supppppper pre-alpha features:

  • Access to the parse tree of files and the ability to modify the tree, changing the code that is running.  There will be the typical system where it is able to access the direct parse tree in a ‘raw’ format, however I plan to experiment some and try and find some natural way to access and modify the syntax tree.  In the aspect of natural modification, I have already noticed some of these properties being easily implemented as a function can easily be replaced by simply overwriting its value.
  • Network distribution.  I am hoping to turn this language into something that is useful when it comes to processing large amounts of data.  The trend at this point has been to utilities a large number of computer and attempt to distribute tasks in a sort of map reduce framework.  The plan at this point is to allowed for unstructured computation across the network where the system automatically determine if it is most effective to move a copy of the code for the computation or to move the data that the computation is working on.

Very incomplete documentation

Link to Github repo

This is only the first intro post.  I believe that there will be more to come as the language further advances and develops.