HLinq design

A bit of history

I remember the day when I started writing the HLinq query language. It was a weekend and I was experimenting with writing Single endpoints for Hamster Wheel. The Single endpoints were almost done and the next step was to write the List endpoint, which would allow getting a list of records of a specific type. After getting a single GET to return correct data for a chosen record id, a list should not be hard to implement. Then just create, update and delete, and the basic CRUD API in Hamster Wheel would be finished. I was thrilled that I was actually able to do such a thing.

But then I started writing List, and when I got the first response… it hit me.

How the hell am I going to filter the records?!

The need for some unified way of dealing with at least the filtering was very apparent. Actually, one of the reasons I started working on this was that while implementing a simple CRUD API for my smart home automation of a Nibe heat pump, I had to implement filtering of temperature history: by day, by week and by month.

So I need filtering.

Also I need paging. Obviously! No system presents data without some kind of paging, even if it is an infinite scroll.

Yes, there are already existing query languages – GraphQL, for example. But I did not have a good time working with it. It felt too complicated. Yes, it is very powerful in terms of what you are allowed to do as a user. From another point of view, on the server side it may be the reason you go bald when some users kill your API with very expensive DB queries. I did use it on one project and I did not enjoy writing the queries. The syntax felt very foreign to me. Maybe it was partially because I did not have good IDE support – it was just a very big string that I had to provide to the HTTP client. Either way, it did not feel like the right choice for the first query language in Hamster Wheel.

I wanted something much simpler that could be used via the browser address bar. Integration with a web UI grid for presenting data would be very nice. Having a way of saying “I want contacts that start with ‘Aleks’” and having a grid automatically filtered by that value in the “Contact Name” column was one of the goals.

On another project I used Dynamic Linq. That would be doable. But again, I did not have a good time working with that library either. The syntax seems to be too much C# crammed inside a string.

var example4 = list.AsQueryable()
   .Where("!np(City.Name.ToLower(), \"\").Contains(@0)", "london")
   .ToList();

What is ‘!np’? Why do I have to provide it? Why ‘@0’? OK, it probably has something to do with parameters being external user input – unsafe, needing to be escaped or sanitized. That is sensible, but why do I have to think about that kind of stuff during implementation? It is sanitized and passed as a parameter to DbCommand by Entity Framework anyway. Why does it have to be done by me passing parameters to the Where extension method?

Ah, it was probably because of parameter types. If everything were a string, Dynamic Linq would have to deal with conversions by itself.

OK, but then the implementation would look like this:

  • parse a query of some kind
    • if it were Dynamic Linq, it would be very C#-like, so very foreign to people who do not deal with C# or programming at all
    • if it were something else, then I would need to invent another language that would be translated to Dynamic Linq
  • extract the user-provided search parameters
  • extract the properties those parameters have to be applied to
  • check the types of both and apply conversions so you can actually call the Dynamic Linq query
  • then take all of that and create the query

If I have to invent a new query language syntax, parse types and properties, and write conversions of string values to appropriate types, that is like half of the work of writing a new query language.

The other part consists of actually applying all of it to the query and returning the response to the API user. And half of that would be dealing with Dynamic Linq strings. For example, if you want to do a case-insensitive search, you have to build a different query than for a case-sensitive search, so the Where parameter string would be different. But you also have to take types into account and search for a valid property name. I would not force the user to remember property name casing, for example, so you have to do a case-insensitive search for the property inside the type to be able to write correct Dynamic Linq. But then you would basically end up with code similar to the one below:

//case-insensitive property lookup, as discussed above (needs System.Reflection)
var property = type.GetProperty(propName,
    BindingFlags.Public | BindingFlags.Instance | BindingFlags.IgnoreCase);
if (property is null)
{
    throw new PropertyNotFoundException(propName);
}

if (property.PropertyType == typeof(string))
{
    if (caseSensitiveSearch)
    {
        return property.Name + ".Contains(\"" + value + "\")";
    }
    else
    {
        return property.Name + ".ToLower().Contains(\"" + value.ToLower() + "\")";
    }
}
else if (property.PropertyType == typeof(int))
{
    //handle int search operators ==,!=,>,<,>=,<=
}
else if (property.PropertyType == typeof(DateTime))
{
    //handle search operators ==,!=,>,<,>=,<=
}
//handle other types, combinations of string values and conversions to property types
//handle nested properties, methods etc.

This feels like having to deal with half of applying the query to IQueryable by yourself anyway. The only thing you get out of Dynamic Linq is that you do not have to write the Expression tree syntax on your own.

But I already did that in Delegates Factory. So it is not unknown to me.

What was a big unknown was writing language parsers, tokenizers and similar stuff that is very close to compilers. And it seemed like fun. I did a little research on how to do it and it felt like something I would want to try on my own.
Even if only to check: can I do it?

I actually had a browser page open with the article What is a programming language parser?. And I deliberately did not read it – just to check AFTERWARDS how I did. (It seems I wrote a top-down, leftmost parser – not the most complex solution, but HLinq is not the most complex query language out there either 🙂)

From another point of view, I had never used Span<> for anything serious, and parsing a string is exactly the place where you use Span<char> for inspecting string contents. So again: something new and fun to do!

So it felt like the decision was made: I was going to build myself a new query language for my Hamster Wheel platform.

Why the “HLinq” name?

The main reason for building this new query language was to use it in my new API. So obviously HTTP would be used.

And this query language would be used to integrate with Entity Framework and the IQueryable interface. So basically Linq.

I was using Linq to Db before Entity Framework was even a thing. So: Linq to something – and in this case that something is an HTTP API type of record, represented by an API path and your own JSON Schema that you provide to the Hamster Wheel API.

So Linq to HTTP.

HLinq.

At that point in time I was using the Abs(olute) Platform name for Hamster Wheel, but now, with Hamster being part of the name, the “H” in HLinq seems appropriate.

HLinq syntax

When I made the decision to make my query language Linq to HTTP, it became obvious that its syntax must be a translation of C# lambdas into the HTTP URL character set.

So I looked at the documentation of what characters are allowed in a URL. The idea was to have a set of characters that can be used in a query and at the same time do not need to be encoded. But it also needs to be fairly easy to use.

There is RFC 3986 for the URI scheme. But it is just a long text file, so let's refer to this SO question instead 🙂

Let's start with the general operations you can do in Linq. Usually you are searching through a collection with Where, transforming data with Select, and when this data is presented to the user via UI, you usually order it with OrderBy or limit the set of records with Skip(x).Take(y). So those operations need to be available in HLinq too.

var persons = queryable.Where(p => p.Age >= 18).Select(p => p.Name).ToArray();

In theory the whole Linq expression .Where(p => p.Age >= 18).Select(p => p.Name) should be possible to pass via URL without a problem; even though > and < are not valid unencoded characters per RFC 3986, you can still write them in the browser address bar and open the page.

But this seems too complicated. p=>p seems redundant. In C# you have to specify the set of parameters of the expression. This is understandable. But HLinq is not C# in a URL, so something like p.Name would be enough. If you are calling the persons endpoint like below

GET /api/persons

You are already in the context of the Person type, so specifying it in the query one more time is not helpful and only complicates things for the user. Let's remove it.

GET /api/persons.Where(p.Age >= 18).Select(p.Name)

In a URL the case of characters usually does not matter. Of course you can code your WWW server to recognize the difference, but usually it does not. So lower case characters will mostly be used.

GET /api/persons.where(p.age >= 18).select(p.name)

This syntax would be fine and really close to the C# syntax. With one slight problem. In Linq you are able to execute methods, and method parameter lists also use round brackets. In C# this does not present a problem: you are using an IDE with syntax highlighting, you can split your query into multiple lines, etc. It is still pretty easy to read. But an HLinq query will be compressed into one, very often long, line without much white space. That makes it much harder to understand.

Also, in HLinq methods will be used inside Where or Select, which are totally different operations from the perspective of HLinq – they are not really methods.

If you look at the RFC document, there are two kinds of delimiters (called delims in the document):

   reserved      = gen-delims / sub-delims
   gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

General delimiters include [ and ], and sub-delimiters include ( and ).

If HLinq borrowed that distinction, the query would look like this:

GET /api/persons.where[p.name.contains(Kowalski)].select[p.name]

That is pretty nice. At least for me, it is pretty easy to understand the intended behavior after just one glance.

Of course, things may still get a bit more complicated when you include condition groups, for example. Usually in programming languages you group conditional statements with round brackets ( and ). HLinq is no different in that regard.

GET /api/persons.where[(p.name.contains(Kowalski)&&p.name.startsWith(Jan))||p.dateOfBirth>2010-01-01]

This looks more complicated. But so does any code, with each conditional operation you need to add to query the data you need. There is no easy way around that. Grouping of conditional operators, whether in math or in programming languages, is usually done with round brackets. So I decided to do the same in HLinq.

Of course, if you build an HLinq query from some programming language or via some tool, you can add some white space for easier readability. For example, the above condition can be represented as:

/api/persons.where[
  (
     p.name.contains(Kowalski)
     && p.name.startsWith(Jan)
  )
  ||
  p.dateOfBirth > 2010-01-01
]

This is a bit easier to read and makes it easier to understand the query logic. But it won't work in the browser address bar.

In the current, first version of HLinq the query syntax looks like below.

query               = array-result-query *("." array-result-query ) *1( count )

array-result-query  = select | where | 1( ordering-query ) | skip | take 

ordering-query      = 1( order-by | order-by-descending ) *( "." 1( then-by | then-by-descending ) )

select              = "select[" prop-list "]"

prop-list           = prop *( "," prop )

prop                = "x." name *( "." name )

name                = ALPHA *( ALPHA | DIGIT | "_" )

where               = "where[" filter *( logical-op filter ) "]"

filter              = prop | group | method

group               = "(" filter *( logical-op filter ) ")"

method              = db-method | prop-method

prop-method         = prop "(" 1( constant ) *( "," constant ) ")"

db-method           = name "(" 1( prop ) *( "," constant ) ")"

constant            = string | quoted-string | int | float | date-time | quoted-date-time

string              = 1*( ALPHA | DIGIT | UNICODE )

quoted-string       = quote string quote

quote               = "\"" | "'"

int                 = 1*( DIGIT )

float               = 1*( DIGIT ) "." 1*( DIGIT )

date-time           = date *1( time )

date                = 4( DIGIT ) "-" 2( DIGIT ) "-" 2( DIGIT )

time                = "T" 2( DIGIT ) ":" 2(DIGIT) ":" 2( DIGIT ) *1( miliseconds ) *1( zone )

milliseconds        = "." 7( DIGIT )

zone                = "+" 2( DIGIT ) ":" 2( DIGIT )

quoted-date-time    = quote date-time quote

logical-op          = "&&" | "||"

order-by            = "orderBy[" prop "]"

order-by-descending = "orderByDescending[" prop "]"

then-by             = "thenBy[" prop "]"

then-by-descending  = "thenByDescending[" prop "]"

skip                = "skip[" int "]"

take                = "take[" int "]"

count               = "count[]" 

A few clarifications of the above grammar notation:

  • | means ‘or’
  • 1( ) means the group has to occur exactly one time – no more, no less
  • 1*( ) means the group has to occur one or more times
  • *1( ) means the group can occur at most one time (it is optional)
  • *( ) means the group is optional and can be repeated an unspecified number of times
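
To make the grammar more concrete, here is a query I constructed against the productions above (my own example, not one taken from the HLinq docs):

GET /api/persons.where[ilike(x.name,Jan)&&(x.name.contains(Kowalski)||x.name.startsWith(A))].orderByDescending[x.dateOfBirth].thenBy[x.name].skip[20].take[10]

Here ilike(x.name,Jan) matches the db-method rule, x.name.contains(Kowalski) and x.name.startsWith(A) match prop-method, the part in round brackets is a group, && and || are logical-op, orderByDescending[…].thenBy[…] is an ordering-query, and skip[20] and take[10] close the chain.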

HLinq architecture

Applying an HLinq query string to an IQueryable is done in a few steps:

  • IHLinqTokenizer produces an IToken[] array from the input string
  • IHLinqParser produces an IHLinqQuery, which is a tree-like structure of groups of tokens
  • IHLinqQueryApplier applies the IHLinqQuery to the source IQueryable, transforming it into the result of the equivalent Linq query

Each of those steps consists of multiple substeps, and each substep is configurable via appropriate interfaces.

Tokenizer step

IHLinqTokenizer, the HLinq implementation of the first query processing step, works based on ITokenPossibility interface implementations – one for each IToken implementation. Each ITokenPossibility implementation has its own token grammar rules, based on which it decides whether the currently considered substring of the HLinq query string may be its specific token type or not.

For example, if the query string is "select[x.name]", IHLinqTokenizer asks all ITokenPossibility implementations for the possibility of their tokens being in the current range. If the current range is 0 to 1, the substring is "s". The Where token possibility is at that point 0%, because the Where token's expected value does not start with "s". The Select token possibility at that point is:

50% + 50% * ("s".length / "select".length) = 50% + 50% * (1/6) ≈ 58.3%

IHLinqTokenizer goes through every ITokenPossibility and discards those that give 0%. Then it advances the range to the next character; in this case that would be "se". At this point all other tokens should be discarded (no other token can be a starting token and start with "s"), so IHLinqTokenizer advances the range through "sel", "sele", "selec" and finally "select", at which point the possibility of the Select token should be 100% – unless the next character is not "[". If that happens (i.e. a typo, and the user sent "selecte[x.Name]"), the possibility drops to 0% and the tokenizer throws an error – the query could not be tokenized and is invalid. If the next character is valid, the Select token is created and added to the result collection of tokens. At the same time the current range of the string is 6..7 with value "[". The tokenizer again goes through the entire ITokenPossibility collection and discards those that give a value of 0%.

query: "select[x.name]"
RangeSubstringNext CharacterSelect PossibilityWhere Possibility
0..1se~58%0%
0..2sel~67%0%
0..3sele75%0%
0..4selec~83%0%
0..5select~92%0%
0..6select[100%0%

For another query, which better shows how the next character value is taken into account, let us consider "orderByDescending[x.id]".

query: "orderByDescending[x.id]
RangeSubstringNext CharacterOrderBy PossibilityOrderByDescending Possibility
0..1or~57%~53%
0..2ord~64%~56%
0..3orde~71%~59%
0..4order~79%~62%
0..5orderB~86%~65%
0..6orderBy~93%~68%
0..7orderByD0%~71%
0..6orderByDe0%~74%
0..8orderByDes0%~77%
0..9orderByDesc0%~79%
0..10orderByDesce0%~82%
0..11orderByDesce0%~82%
0..12orderByDescen0%~85%
0..13orderByDescend0%~88%
0..14orderByDescendi0%~91%
0..15orderByDesceendin0%~94%
0..16orderByDescending0%~97%
0..17orderByDescending[0%~100%

As you can see, the possibility of the OrderBy token being at the beginning of the query is greater than that of OrderByDescending. This continues as the tokenizer advances through the query, up to the point when it encounters "D" after "orderBy" – then the OrderBy ITokenPossibility drops the possibility of its token to 0%, because the next character must be the square bracket "[".
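
The examples above can be condensed into a small sketch. This is not HLinq's actual ITokenPossibility interface – just a simplified model of the keyword calculation, using Span<char> as mentioned earlier:

// A simplified model of a keyword-token possibility (not the actual HLinq API).
public sealed class KeywordPossibility(string keyword)
{
    // Returns 0..1: 0 means "cannot be this token", 1 means "fully matched".
    public double Calculate(ReadOnlySpan<char> query, Range range, char nextCharacter)
    {
        ReadOnlySpan<char> candidate = query[range];

        // The substring considered so far must be a prefix of the keyword.
        if (!keyword.AsSpan().StartsWith(candidate))
        {
            return 0;
        }

        // A fully matched keyword must be followed by '[' (e.g. "select[").
        if (candidate.Length == keyword.Length)
        {
            return nextCharacter == '[' ? 1 : 0;
        }

        // Partial match: 50% base plus 50% scaled by how much of the keyword matched.
        return 0.5 + 0.5 * ((double)candidate.Length / keyword.Length);
    }
}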

Those are examples of keyword-based tokens, which are tokens that only allow one specific string in the query. There is also another type of token that allows more possibilities: delimiter-based tokens. Those are Property, Method and NameOrValue. At the tokenizer step of query processing, HLinq does not really know which identifiers are allowed for properties or methods, which is why it looks for the ending characters of such tokens.

For example, the Method token delimiter is only '(' because a method call can only be used with a list of parameters enclosed in round brackets. The Property token is usually used in conditions with operators like !=, ==, > and <, so its delimiters are, respectively: !, =, > and <.
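
Again as a simplified sketch (not the real interface), a delimiter-based possibility could look like the code below – for the select example that follows to work, I assume the Property delimiters also include closing characters such as ']' and ',':

// A simplified model of a delimiter-based token possibility (not the actual HLinq API).
public sealed class DelimitedPossibility(char[] delimiters)
{
    public double Calculate(ReadOnlySpan<char> query, Range range, char nextCharacter)
    {
        // The content scanned so far must look like an identifier.
        foreach (var c in query[range])
        {
            if (!char.IsLetterOrDigit(c) && c != '_')
            {
                return 0;
            }
        }

        // While identifier characters keep coming, the token is still possible (50%).
        if (char.IsLetterOrDigit(nextCharacter) || nextCharacter == '_')
        {
            return 0.5;
        }

        // A non-identifier character decides: either it is one of our delimiters
        // (token confirmed) or it is not (token ruled out).
        return Array.IndexOf(delimiters, nextCharacter) >= 0 ? 1 : 0;
    }
}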

Let us consider the query from the first example: "select[x.name]". After the tokenizer has done its work with the first four tokens – Select, LeftSquareBracket, Entity and Dot for the "select[x." substring – it starts to consider the rest, but the only possible tokens at this range are Property and Method.

Range | Substring | Next Character | Property Possibility | Method Possibility
9..10 | n         | a              | 50%                  | 50%
9..11 | na        | m              | 50%                  | 50%
9..12 | nam       | e              | 50%                  | 50%
9..13 | name      | ]              | 100%                 | 0%

As you can see, both tokens are considered by the tokenizer up to the point of the first character, ']', that does not match the delimiters of one of them. Then the only possible token is Property, and it is added to the resulting list of tokens.

Parser step

The second step in the HLinq query processing pipeline is parsing, done mainly by the IHLinqParser implementation. The parser analyzes the collection of tokens and creates a tree structure of ITreeElements, which are then transformed into Entity Framework method expression parameters.

Parser tree elements are similar in concept to tokens, but instead of an allowed collection of previous tokens and allowed next characters, a tree element has:

  • an allowed collection of starting tokens
  • a list of allowed parents
  • a list of allowed ending tokens

For example, the SelectRoot tree element requires as starting tokens:

  • Select and LeftSquareBracket
  • or Dot, Select and LeftSquareBracket

This is, of course, because each root can be placed at the start of the query ("select[x.name]") or after another root tree element ("where[x.dateOfBirth>2005-01-01].select[x.name]").

SelectRoot, and any other root element, must be finished with a RightSquareBracket (the ']' character). Also, root elements do not have a collection of valid parents, because they are roots themselves and do not grow from other branches.

Other elements do have a set of valid parents. For example, SelectRoot can be a parent of only the PropertyAssignment element. This is because the query syntax only allows x.name or newName=x.name statements.

Other tree elements have a more complicated structure. For example, Property can be used from inside SelectRoot (via PropertyAssignment), from WhereRoot (inside the Condition element), or be placed inside one of the ordering roots. CountRoot, on the other hand, is very simple and does not have any children.

The full structure of the tree elements is represented in the graph below.

The parser knows the structure of this graph and the requirements of the starting and ending tokens for each of those elements. Based on that, the parser walks through the collection of tokens and builds the tree.

For example, the string "select[x.name]" will be tokenized as [Select, LeftSquareBracket, Entity, Dot, PropertyName, RightSquareBracket]. Then the parser inspects the tokens and builds a tree:

  • The only viable root element is SelectRoot, because its starting tokens are [Select, LeftSquareBracket]
  • The parser creates a SelectRoot and removes 2 tokens from the start
  • The only viable child of this root is PropertyAssignment, and its valid starting tokens are [Entity, Dot]
  • The parser creates a PropertyAssignment inside the root; it removes 0 tokens, leaving them for its children
  • Viable children of PropertyAssignment are:
    • Property, which requires [Entity, Dot] tokens
    • InitializerPropertyName, which requires [NameOrValue, Assignment] tokens
    • InitializerConstValue, which requires [Assignment, NameOrValue] tokens
  • Only Property is possible, so the parser creates a child of that type in the PropertyAssignment parent and removes 3 tokens from the start
  • There are no possible children of the Property element; the parser finishes the element and goes up in the tree
  • With 1 token left (RightSquareBracket), none of the children of PropertyAssignment are possible; the parser goes up in the tree
  • RightSquareBracket is a valid finish of the SelectRoot element. The parser removes the last token and finishes the root element.
  • With no tokens left, the parser ends its work and returns a valid tree structure to the applier.

The resulting tree will have the following structure (tokens assigned to each tree element are shown in [] brackets):
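
(The original diagram is an image; reconstructed from the walkthrough above, it is roughly:)

SelectRoot [Select, LeftSquareBracket, RightSquareBracket]
└─ PropertyAssignment []
   └─ Property [Entity, Dot, PropertyName]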

Applier step

When the parser finishes its tree, it is passed to the IHLinqQueryApplier. This class in turn:

  • Iterates the root elements
    • For each root element it creates a Linq expression tree
    • Applies the expression by calling the appropriate IQueryable method
    • Passes the resulting IQueryable instance to the next root

This is quite simple. The only complicated things here are converting strings to the appropriate CLR types and finding out which property and method names are correct.

The first is done by value converters, the second by expression builders.

For the SelectRoot element the appropriate applier instance does one of two things:

  • selects a property of an object
  • maps the object to another type

Both are basically the same from a logical perspective, but not from the .NET Expression Tree perspective.

Mapping an object by selecting a property is the equivalent of the following Linq code:

query = query.Select(x => x.Name);

On the other hand, mapping to another complex type by selecting a set of properties (or renaming some of them) is equivalent to:

query = query.Select(x => new{
   NewName = x.Name,
   x.Address 
});

Of course both are very different expressions, even if the C# Linq syntax is very similar. The first is just a single MemberExpression that selects a single property of the type. The second is a quite complicated MemberInitExpression that must be created with a NewExpression as its first parameter and a set of MemberBindings as the second one. All of those expressions are created by IElementToExpressionConverter implementations. Inside those classes, if some constant value is referenced (i.e. in where[x.age>18], 18 is a constant), before binding it inside the expression it is first converted to the same type as the property (for an age property it would most probably be an integral type).
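
To illustrate the difference, here is a sketch of both shapes built manually with the System.Linq.Expressions API (Person and PersonDto are hypothetical example types; anonymous types are awkward to create by hand, so a DTO stands in for the second case):

using System.Linq.Expressions;

var param = Expression.Parameter(typeof(Person), "x");

// 1) Single property select: x => x.Name – the body is a lone MemberExpression.
var selectName = Expression.Lambda<Func<Person, string>>(
    Expression.Property(param, nameof(Person.Name)),
    param);

// 2) Projection: x => new PersonDto { NewName = x.Name, Address = x.Address } –
// a MemberInitExpression built from a NewExpression and a set of MemberBindings.
var selectDto = Expression.Lambda<Func<Person, PersonDto>>(
    Expression.MemberInit(
        Expression.New(typeof(PersonDto)),
        Expression.Bind(
            typeof(PersonDto).GetProperty(nameof(PersonDto.NewName))!,
            Expression.Property(param, nameof(Person.Name))),
        Expression.Bind(
            typeof(PersonDto).GetProperty(nameof(PersonDto.Address))!,
            Expression.Property(param, nameof(Person.Address)))),
    param);

public class Person
{
    public string Name { get; set; } = "";
    public string Address { get; set; } = "";
}

public class PersonDto
{
    public string NewName { get; set; } = "";
    public string Address { get; set; } = "";
}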

The applier creates an expression tree for the user's HLinq query using expression converters. Expression converters use value converters to convert string constants into other types, if needed. After that, the applier just calls the correct IQueryable method. For SelectRoot it is the IQueryable extension method Select:

public static IQueryable<TResult> Select<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector)

For WhereRoot it is Queryable.Where, for CountRoot it is Queryable.Count, and so on.

Let's consider a simple example: the "select[x.name,active=true]" HLinq query. The applier for SelectRoot will call ElementToExpressionConverter<SelectRoot>, which implements the interface IElementToExpressionConverter<SelectRoot>. This class will call expression converters for all the children of the main root tree element. The result will be similar to the graph below.

After calling the Queryable.Select method with the source IQueryable instance and this expression, the result will be used as the source IQueryable for the next tree element. If there are no other roots, like in our example, the applier calls the Enumerable.ToArray method, which forces the IQueryable to be enumerated. If the source is a database, EF will translate the expressions to SQL and fetch the data. If the source is an in-memory collection, the expressions will be compiled to delegates and applied to the collection.

In the above set of transformations applied to the IQueryable, value converters are used to convert "true" and "10" to the appropriate types. The first is not really converted (string to string), but this no-op still goes through a converter. The second is converted to an integral value before being passed to the Queryable.Take method.
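
For reference, that pipeline is roughly equivalent to the following Linq chain (assuming a take[10] root after the select, which is where the "10" above comes from; the projected shape is illustrative):

var result = source
    .Select(x => new { Name = x.Name, Active = "true" })
    .Take(10)
    .ToArray();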

Customization and Extensibility

Almost all parts of HLinq are services resolved from an IServiceProvider. In ASP.NET Core it is the same service provider that is used by the API. That means that if you need to modify some part of HLinq or a specific behavior of the package, you can just override the specific service.

For example, all ITokenPossibility implementations are resolved from the service provider, and some of them contain the string value of the token they expect to find in the query string.

public sealed class Possibility(IGrammar grammar) : TokenPossibility<Select>(grammar, "select")
{
    protected override bool PreviousTokensMatch(List<IToken> previousTokens) =>
        Rule.PreviousTokensMatch(previousTokens);
}

The second parameter of the base class TokenPossibility is the value of the token. So to create a translation of the HLinq query syntax, you can create your own implementation that has a different value in this place.

For example, in my language, Polish, it would be something like:

public class SelectPossibility(IGrammar grammar) : TokenPossibility<Select>(grammar, "wybierz")
{
    protected override bool PreviousTokensMatch(List<IToken> previousTokens) =>
        Rule.PreviousTokensMatch(previousTokens);
}

Exactly the same class, but with a different token string. This way, instead of querying with "select[x.name]" you query with "wybierz[x.name]".

In a similar way you can add DB-specific functions or custom filters, by adding/overwriting another service. For DB functions it is IStaticMethodSource; for custom filters, IStaticMethodToExpressionConverter.

For example, the HLinq.PgSql package implementation of IStaticMethodSource is the following:

public class PgSqlEntityFrameworkStaticMethodProvider : IStaticMethodSource
{
    public Type[] Types { get; } = [typeof(NpgsqlDbFunctionsExtensions)];
}

This means that when HLinq sees a method used inside a where filter, it will try to resolve it from the NpgsqlDbFunctionsExtensions class. This will cause the ILike method and others to be recognized and translated correctly to the SQL query.

public static bool ILike(this DbFunctions _, string matchExpression, string pattern)
    => throw new InvalidOperationException(CoreStrings.FunctionOnClient(nameof(ILike)));

On the other hand, if you overwrite IStaticMethodToExpressionConverter with your own implementation, you can add your own custom filtering methods. For example, right now you can't use the following syntax in filtering conditions:

where[(x.firstName + " " + x.lastName)==John Doe]

At least not directly – but it is possible to accomplish this via a custom filter.

public class CustomFilterConverter(IPropertiesCache propertiesCache) : IStaticMethodToExpressionConverter
{
    private readonly StaticMethodToExpressionConverter _converter = new();

    public Expression BuildStatic(IBuilderContext context, IMethod method, IParametersConverter parametersConverter)
    {
        if (method.GetName(context.HLinqQuery) != "hasFullName")
        {
            return _converter.BuildStatic(context, method, parametersConverter);
        }

        var fullNameSearchConstant = method.Children[1].Tokens[0].GetValue(context.HLinqQuery);

        //the below expression is equivalent to:
        //LambdaExpression condition = (Person p) => p.FirstName + " " + p.LastName == fullNameSearchConstant;

        var stringConcatMethod = typeof(string).GetMethod("Concat", [typeof(string), typeof(string)]);

        var firstNamePlusSpace = Expression.Add(
            Expression.Property(context.Param, propertiesCache.Single(context.Type, "FirstName")!),
            Expression.Constant(" "),
            stringConcatMethod);
        var firstNameSpaceAndLastName = Expression.Add(
            firstNamePlusSpace,
            Expression.Property(context.Param, propertiesCache.Single(context.Type, "LastName")!),
            stringConcatMethod);
        return Expression.Equal(firstNameSpaceAndLastName, Expression.Constant(fullNameSearchConstant));
    }
}

The above custom implementation of IStaticMethodToExpressionConverter will cause HLinq to replace the "hasFullName(x, John Doe)" HLinq method with the equivalent of the expression p.FirstName + " " + p.LastName == "John Doe".
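
With that converter registered, the filter can be used in a query like this (hypothetical persons endpoint):

GET /api/persons.where[hasFullName(x,John Doe)]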

Those are just two examples. You can overwrite much more to make HLinq behave the way you want.

Interface | Purpose for overwrite
ITokenPossibility | Translation
IConfigurableValueConverter | Add new conversions for custom types for constant expressions
INullKeyword | Translation
IHLinqQueryApplier, IElementToExpressionConverter | Custom tree to Expression conversions
IStaticMethodSource | Custom DB functions
IHLinqParser | Custom parser rules, change syntax
IHLinqTokenizer | Change tokenizer rules
IGrammar | Custom grammar/syntax rules
IMethodsCache | Custom method-related Reflection operations / methods cache
IPropertiesCache | Custom property-related Reflection operations / property cache

Conclusion

The HLinq library is easy to use and easy to integrate with existing solutions written in ASP.NET and .NET in general. It is also easy to customize and extend. Its syntax is based on Linq, so it should be very easy to understand for every .NET developer. It is also compatible with the HTTP URI standard, so it is possible to use it inside the web browser address bar. The syntax is also not that complicated to understand for non-technical people.

Disable Xunit Test on specific OS

Lately I was having a problem with some of my tests causing trouble for other developers on the project I am working on. They are using Windows for their development, while I am using Debian. The application being tested was not designed for non-Unix systems – nowadays hardly anybody hosts their production code on Windows Server; everybody is using Unix for .NET.

Still, tests must not disrupt whatever you are currently working on. This needed to be fixed.

Some of those tests were actually only valid on Linux. There was no way to make them work on a different OS. The quickest way to make this work was to just disable them on Windows.

This would work, but it is not pretty:

[Fact]
public void LinuxOnly()
{
   if (!OperatingSystem.IsWindows())
   {
       //actual test code
   }
}

A more elegant way would be to disable them with an attribute. And it is possible: we can write our own Fact attribute that disables a test based on a condition. It is pretty easy to do and looks nice.

public class LinuxFact : FactAttribute
{
    public LinuxFact()
    {
        if (OperatingSystem.IsWindows())
        {
            Skip = "It is not possible to run this test on Windows OS";
        }
    }
}

Then you apply it to a test method just like the Fact attribute:

[LinuxFact]
public void LinuxOnly()
{
   //actual test code
}

and the test will run on Linux just fine. On Windows, on the other hand, it will be shown as skipped.

Happy coding!

Writing a simple C# source code generator with Fluent API

Introduction

Usually in the .NET world, when you want to write generic code, you use generic methods or classes. But sometimes that is still not enough when you want to automate some things, e.g. you want to dynamically register in DI all the classes that implement some interface. In this case you would probably do something like this:

var implementations = typeof(MyClass).Assembly.GetTypes()
    .Where(t => t.GetInterfaces().Any(i => i == typeof(IMyInterface)));
foreach (var implementation in implementations)
{
    services.AddScoped(typeof(IMyInterface), implementation);
}

And it would be completely fine. Even though Reflection is pretty slow, your application would not get significantly slower because of it. You would certainly not notice the slowdown.

For other, similar requirements some people use Fody or a similar tool that alters your assembly based on specified weavers. It works, though I always found it very confusing during debugging (because the source code is no longer valid).

But we are not here to do an ‘ok’ job. We want better, more efficient code that is generic but still close to native code. We strive for greatness!

OK, maybe that is a bit too much, but following the new .NET optimizations, we want to use the new tools at our disposal to write simple, yet efficient code.

We can do that with metaprogramming in C# and .NET, using Roslyn Source Generators.

Source code generator

Writing your own source code generator is not as easy as it could be. It is certainly a more advanced topic, and Microsoft is not putting as much money (and attention) into it as into the new and groovy features that are supposed to get you hooked on C#. But that should not stop you, because it is really cool to see your source code generator producing new code on the fly while you are making changes in your IDE. Seeing an error become valid code, because code was generated in the background based on logic you wrote, is a really great feeling!

In the above video you can see my IDE, Rider, generating code: a class Hi with a method Log, created on the fly based on a new text file added to the project. At first the new Hi().Log() method call in Program.cs is invalid, but when the new txt file is added, the source code generator generates a new class, Program.cs becomes valid, and when the application runs, a new line is written to the console. Pretty cool!

This may sound a bit complicated, and to some extent it is, but it probably won't be the most complicated thing you have ever done or will do.

Creating new project

As with many C# examples, in this one too you have to create a new project.

For this new project to be a valid source code generator, it has to target .NET Standard 2.0. This is pretty simple to do with a dropdown selection in your IDE, or you can just change the .csproj file directly:

<TargetFramework>netstandard2.0</TargetFramework>

Then mark your project as a Roslyn component. You need to add the following properties to the <PropertyGroup> section of your project file:

<IsRoslynComponent>true</IsRoslynComponent>
<EnforceExtendedAnalyzerRules>true</EnforceExtendedAnalyzerRules>

After that, install one package that helps you a bit with the Roslyn APIs. For me personally, they were a bit hard to understand when I first started working with them:

<PackageReference Include="HamsterWheel.FluentCodeGenerators" Version="0.4.0" PrivateAssets="all" />

You also need to copy all of the code generator's NuGet packages to the output directory of your generator. Without that, it will throw errors like the one below:

CSC : warning CS8784: Generator 'DemoIncrementalGenerator' failed to initialize. It will not contribute to the output and compilation errors may occur as a result. Exception was of type 'FileNotFoundException' with message 'Could not load file or assembly 'HamsterWheel.FluentCodeGenerators, Version=0.4.1.0, Culture=neutral, PublicKeyToken=null'. The system cannot find the file specified. [/builds/hamster-wheel/fluentcodegenerators/demo/HamsterWheel.FluentCodeGenerators.Demo.Use/HamsterWheel.FluentCodeGenerators.Demo.Use.csproj]

To fix that, add the following tags to your project file:

<Content Include="$(PKGHamsterWheel_FluentCodeGenerators)\lib\netstandard2.0\*.dll" Pack="true" PackagePath="analyzers/dotnet/cs" Visible="False" />
<Content Include="$(PKGHamsterWheel_FluentCodeGenerators_Abstractions)\lib\netstandard2.0\*.dll" Pack="true" PackagePath="analyzers/dotnet/cs" Visible="False" />

and also change the package reference tags:

<ItemGroup>
  <PackageReference Include="HamsterWheel.FluentCodeGenerators" Version="0.4.1" GeneratePathProperty="true" />
  <PackageReference Include="HamsterWheel.FluentCodeGenerators.Abstractions" Version="0.4.1" GeneratePathProperty="true" />
</ItemGroup>

This is enough to work on the NuGet package, but if you want to develop your source code generator, it is much easier to have it in the same solution you want to generate code for. To do this you need one additional change:

<PropertyGroup>
  <GetTargetPathDependsOn>$(GetTargetPathDependsOn);GetDependencyTargetPaths</GetTargetPathDependsOn>
</PropertyGroup>

<Target Name="GetDependencyTargetPaths">
  <ItemGroup>
     <TargetPathWithTargetPlatformMoniker Include="$(PKGHamsterWheel_FluentCodeGenerators_Abstractions)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false"/>
     <TargetPathWithTargetPlatformMoniker Include="$(PKGHamsterWheel_FluentCodeGenerators)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false"/>
  </ItemGroup>
</Target>

The above directives make sure that all the necessary packages referenced by your source code generator project are copied to the appropriate directory, from where your IDE and dotnet build load your code generator .dll file. Without this, there would be another FileNotFoundException during initialization.

After that, your project should look similar to the one below:

<Project Sdk="Microsoft.NET.Sdk">

    <PropertyGroup>
        <TargetFramework>netstandard2.0</TargetFramework>
        <LangVersion>latestmajor</LangVersion>
        <ImplicitUsings>enable</ImplicitUsings>
        <Nullable>enable</Nullable>

        <EnforceExtendedAnalyzerRules>true</EnforceExtendedAnalyzerRules>
        <IsRoslynComponent>true</IsRoslynComponent>
        <CopyLocalLockFileAssemblies>true</CopyLocalLockFileAssemblies>
        <IsPackable>false</IsPackable>
    </PropertyGroup>
    
    <ItemGroup>
        <PackageReference Include="HamsterWheel.FluentCodeGenerators" Version="0.4.1" GeneratePathProperty="true" PrivateAssets="all" />
        <PackageReference Include="HamsterWheel.FluentCodeGenerators.Abstractions" Version="0.4.1" GeneratePathProperty="true" PrivateAssets="all" />
    </ItemGroup>

    <PropertyGroup>
        <GetTargetPathDependsOn>$(GetTargetPathDependsOn);GetDependencyTargetPaths</GetTargetPathDependsOn>
    </PropertyGroup>

    <Target Name="GetDependencyTargetPaths">
        <ItemGroup>
            <TargetPathWithTargetPlatformMoniker Include="$(PKGHamsterWheel_FluentCodeGenerators_Abstractions)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false"/>
            <TargetPathWithTargetPlatformMoniker Include="$(PKGHamsterWheel_FluentCodeGenerators)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false"/>
        </ItemGroup>
    </Target>

</Project>

Writing the generator

With the project fully prepared, we can jump to writing the actual generator. First we need to add an incremental generator class.

Let's start with the following:

[Generator(LanguageNames.CSharp)]
public class DemoSolutionIncrementalGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
    }
}

This is the bare minimum code for an incremental Roslyn source code generator. But as a minimal starting point it does not do anything, so it is not quite useful. Yet.

Let's add the following lines to the Initialize method:

context.RegisterPostInitializationOutput(c =>
{
    c.AddSource("DummyFile.g.cs", """
                                  public class MyDummyGeneratedClassFile
                                  {
                                  }
                                  """);
});

This will instruct the generator to emit a single file called DummyFile.g.cs with the following content:

public class MyDummyGeneratedClassFile
{
}

This is just a single, empty public class. Not very helpful, but fine as a starting demo.

This is good for very simple types that need to be added to the project before the actual incremental generator is used. For example, if you coded your generator to enhance definitions of user classes decorated with a specific attribute, the flow would be:

  1. Your generator package is installed
  2. The incremental generator runs and RegisterPostInitializationOutput is executed
  3. This adds the generated attribute to the user's project
  4. The user codes a new class using this attribute
  5. Your incremental generator looks through the source code for this specific attribute
  6. If any usage is found, incremental source code generation is executed
This is how the Regex source generator works with the GeneratedRegex attribute.
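
For reference, this is what using the built-in Regex generator looks like (the generator spots the attribute and emits the body of the partial method at compile time):

using System.Text.RegularExpressions;

public static partial class Patterns
{
    // The implementation of this partial method is generated at compile time.
    [GeneratedRegex(@"^\d{4}-\d{2}-\d{2}$")]
    public static partial Regex IsoDate();
}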

Steps 5 and 6 require using the Roslyn API to obtain an IncrementalValueProvider. For example, if we want to add a new class for each of the text files added to the project, like in the demo above, we should add a few lines of code:

var additionalFilesProvider = context.AdditionalTextsProvider
            .Where(AdditionalTextPredicates.FileNameExtensionIs(".txt"))
            .Select(AdditionalTextSelectors.FileNamePathAndContent)
            .Collect();

This instructs Roslyn to look at all the additional text files. An additional text is another type of file, linked to the C# project via the <AdditionalFiles Include="someFile.txt"/> project directive. Roslyn then looks in that collection for files that end with the txt extension, selects each file's name, path and content, and collects everything into a single collection that is fed to your source code generator.

But this is just a provider. The actual implementation of the class that logs the additional file's content to the console will follow a template like the one below:

public class {{ File Name }}
{
   public void Log()
   {
       Console.WriteLine("{{ file content }}");
   }
}

The above template can be generated within the code generator:

context.RegisterSourceOutput(additionalFilesProvider,
    (pc, files) =>
    {
        foreach (var file in files)
        {
            pc.AddSource(Path.GetFileNameWithoutExtension(file.FileName), $$"""
                                                                          public class {{Path.GetFileNameWithoutExtension(file.FileName)}}
                                                                          {
                                                                              public void Log()
                                                                              {
                                                                                  Console.WriteLine({{file.Content.TripleQuote()}});
                                                                              }
                                                                          }
                                                                          """);
        }
    });

Let's break it down into steps:

  • It iterates through every additional file
    • for each one it registers a new generator output
    • each output is named the same as the additional file
    • each output contains a class that is named the same as the additional file
    • each class contains a Log() method that pushes the additional file's content to the console

This is totally fine for simple classes and it will work for them. Things start to break down a bit:

  • when you have to share logic between different classes
  • when you need to make sure all used types are imported with using statements
  • when you need to balance multiple levels of different parentheses
  • when you need to make sure everything is properly indented
  • when you need to make sure parameter names match, all semicolons are added and all returns are in the proper places

When that is the case, you end up with a giant spaghetti of strings, interpolated strings, imports and method calls.

It gets even more complicated when you have a few dozen files connected to one generator's logic (for example, the Hamster Wheel Source Generator project contains over 300 files!). Then you have to take into account that different logic can run in different places in different generated classes. Taking care of just the idea of the type you are generating, without worrying about the specific C# syntax it will be using, is much easier to wrap your head around.

This is where the HamsterWheel.FluentCodeGenerators library comes in. It helps with all of that and more.

The generator code from the example above can be rewritten as:

context.WithClass(Path.GetFileNameWithoutExtension(file.FileName),
    c => c.WithMethod("Log",
            m => m.WithBody(b => b.Append($"Console.WriteLine({file.Content.TripleQuote()});")))
);

This generator will emit very similar code:

[GeneratedCode("HamsterWheel.FluentCodeGenerators", "Version=0.4.2.0")]
public class Hi
{
    public void Log()
    {
        Console.WriteLine("""
        Hi!
        """);
    }
}

It adds the GeneratedCode attribute to each type it generates, so other parts of the system know it is automatically generated code. It also has better indentation in triple-quoted strings.

Maybe it does not seem like much, but this package has many more features. My favorite is the automated usings management. I can't count how many times during development of my code generators I had an error because some type was not imported. This is something you rarely keep track of when writing production code yourself. Usually it is handled automatically by your IDE – you just write a few letters and choose the correct type from the drop-down of IDE hints, and your IDE takes over the rest. When you do something similar inside the FluentCodeGenerators fluent API, it behaves in a very similar way. For example, for the following code:

var ipAddressType = typeof(IPAddress);
classContext.WithProp<Type>("MyType",
    p => p.MakeComputed().WithExpressionBody(b => b.Append($"typeof({ipAddressType})")));

the generator will automatically add the appropriate namespace using statement:

using System.CodeDom.Compiler;
using System.Net; //<-- this namespace was added automatically

[GeneratedCode("HamsterWheel.FluentCodeGenerators", "Version=0.4.1.0")]
public class MyClass
{
    public Type MyType => typeof(IPAddress);
}

This is done by using an InterpolatedStringHandler instead of a regular string. This way the library can handle specific chunks of the string and act differently when a chunk is an actual type.
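
As a rough sketch of the idea (a toy, not the library's actual handler): a custom interpolated string handler receives each literal and each hole separately, so it can record namespaces whenever a hole turns out to be a Type:

using System.Runtime.CompilerServices;
using System.Text;

// A toy interpolated-string handler that collects namespaces of Type values.
[InterpolatedStringHandler]
public readonly struct TypeAwareHandler
{
    private readonly StringBuilder _builder;

    public HashSet<string> Namespaces { get; }

    public TypeAwareHandler(int literalLength, int formattedCount)
    {
        _builder = new StringBuilder(literalLength);
        Namespaces = new HashSet<string>();
    }

    public void AppendLiteral(string s) => _builder.Append(s);

    public void AppendFormatted<T>(T value)
    {
        if (value is Type type)
        {
            // Remember the namespace for a later 'using' and emit the short type name.
            if (type.Namespace is not null)
            {
                Namespaces.Add(type.Namespace);
            }
            _builder.Append(type.Name);
        }
        else
        {
            _builder.Append(value);
        }
    }

    public override string ToString() => _builder.ToString();
}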

Summary

Writing your own C# source code generator may look quite complicated at the beginning, when you are trying to figure it out. The APIs are not very friendly (at least to me), and when you try to find relevant docs (code comments in those APIs are basically non-existent), you find out that the MSDN docs about Roslyn code generators lack a bit of their usual quality.

But when you actually get the hang of it and are not scared off by bugs in the IDE or compiler pipeline, the end result feels great! Especially with a Fluent API.

HamsterWheel.FluentCodeGenerators is out

Today I released the first NuGet package connected to a new project that I have been working on for some time: a dynamically configurable, zero-downtime API with low-code flows for writing small pieces of logic – Hamster Wheel.

The system heavily relies on dynamic source code generation. For that I wrote
https://www.nuget.org/packages/HamsterWheel.FluentCodeGenerators and https://www.nuget.org/packages/HamsterWheel.FluentCodeGenerators.Abstractions –
small packages that help with writing C# incremental source code generators for the Roslyn compiler.

The original Roslyn APIs are a bit hard to use, so this package helps with that. It also helps with:

– automatically emitting using statements
– formatting generated code
– importing types during code generation
– indentation and parenthesis balancing
– adding and using method parameters in generated code
– generating `async` methods
– nicer-to-use wrappers for Roslyn `IncrementalValueProvider` and others
– sharing pieces of code between files/classes (i.e. interface implementations)

More details are available at
https://github.com/npodbielski/HamsterWheel.FluentCodeGenerators, including a readme with instructions on how to use it and a lot of examples of the features!

Convert LVM logical volume RAID 10 to RAID 5

Two weeks ago I was running out of space on the main RAID array on my server. The server was configured and assembled a few years back, and at that time I had about 700GB of data there, so a RAID 10 of 4 1TB disks (which accounts for about 1.7TB of actual space for data) was enough. Especially when you factor in the price of a 2TB (or 4TB) NVMe M2 disk back then.

But you hardly ever remove any data, so over time it takes up more and more of your hard disk space. In this case adding another disk was not the quickest solution to the problem; the quickest was changing the RAID type to get 1TB more space.

It is possible (in theory, according to the LVM documentation). I had just never done it, and the server is constantly in use, so I did not want to go offline or log in in single-user mode to take the volume or group offline first. Anyway, here is how I did it:

  • first convert RAID 10 to striped
  • convert striped to raid5_n
  • convert raid5_n to raid5_ls
  • add a 4th disk to the RAID

Convert RAID 10 to striped

This is a simple and immediate step. Just run:

lvconvert --type striped {{vg}}/{{lv}}

Of course, replace {{vg}} and {{lv}} with the actual names of your Volume Group and Logical Volume.

Convert striped volume to raid5_n

This is not immediate; LVM needs time to convert the volume. The duration of this operation depends heavily on the size of your disks and the performance of your machine.

You need to run following command:

lvconvert --type raid5_n {{vg}}/{{lv}}

Since this needs time to complete in the background, you need to check the progress of the conversion from time to time. This can be done by running:

 lvs -a -o name,copy_percent,devices {{vg}}

Look for the value in the Cpy%Sync column. When the conversion is done, it will show 100.00.

Convert raid5_n to raid5_ls

raid5_n in LVM is just an intermediate RAID type, meant to be followed up with a conversion to another type – it is not meant to be used in production scenarios. It is fine just for the conversion, but if you want RAID 5, raid5_ls is more appropriate. If you are interested why, you can read about it in the docs.

lvconvert --type raid5 {{vg}}/{{lv}}

This will by default convert the logical volume to raid5_ls, which is more production-ready. Again, this will take time, so relax and check the status from time to time by running:

 lvs -a -o name,copy_percent,devices {{vg}}

Add a 4th disk to RAID 5

By default, converting a striped logical volume to RAID 5 creates an array of 3 disks. To add another one and get more space for your data, you need to instruct LVM to add another disk to the array. This is possible by running the following command:

lvconvert --stripes 3 {{vg}}/{{lv}} {{pv}}

This time LVM won’t choose the physical device for you. You need to point it at the correct one, so change {{pv}} to the name of your actual device. This should be something like /dev/sda (if you are using HDD or SSD disks) or /dev/nvmeXnY if you are using NVMe disks.

This will take some time (again! but this is the last one!), but you can use your new space right away; LVM will synchronize all disks in the background, so you do not need to keep tabs on it.

Word of caution

One thing you need to be wary of: sometimes LVM complains about a lack of space between conversions. Specifically, converting striped to raid5 may be problematic, since it requires a few extents for metadata (about 4MB on each disk, if I remember correctly). If you have some other (unused) physical device, LVM may use it for the conversion. There is also the possibility of using memory for the conversion – but that is not persistent – and I am not sure what the outcome of such a RAID conversion would be.

Generally, for RAID conversions in LVM you usually need 4-6 free extents on each disk, so make sure you leave some free space on each disk when you configure your logical volumes for the first time – for example, leave out 10 extents.