I have a dream, I called it ERL

Discussion of anything and everything relating to chess playing software and machines.


Poll: Do you think this type of competition is useful?

Yes: 26 votes (79%)
No: 7 votes (21%)

Total votes: 33

Rebel
Posts: 4138
Joined: Thu Aug 18, 2011 10:04 am

I have a dream, I called it ERL

Post by Rebel » Mon Aug 28, 2017 7:13 pm

ERL stands for Evaluation Rating List, a new type of competition with a wink.

For years I have been toying with the idea, and now and then I floated a trial balloon, but I can't remember anyone responding to it. But sometimes I am just stubborn, as in this case, so I have taken the first step.

----------------

The goal is to improve the evaluation in a new way, that is, without the obstacle of search. Imagine a reasonably strong (open-source) engine with a reasonably good search and readable source code, in which we replace the evaluation function with our own. What are the advantages and disadvantages?
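A minimal sketch of the idea: one shared fixed-depth search that takes the evaluation as a parameter, so entrants swap only that function. A real ERL base engine would be an open-source chess engine as described above; here a toy take-away game (remove 1 to 3 stones, taking the last stone wins) stands in for chess so the example runs on its own. All names (`shared_search`, `best_move`, `eval_blind`, `eval_mod4`) are hypothetical, and the search is plain negamax without alpha-beta.

```python
from typing import Callable

# Toy stand-in for a chess position: the number of stones left in a pile.
# A move removes 1-3 stones; whoever takes the last stone wins.
Position = int
EvalFn = Callable[[Position], int]  # the one function an entrant replaces

LOSS = -1000  # score for the side to move when the opponent took the last stone


def shared_search(stones: Position, depth: int, evaluate: EvalFn) -> int:
    """Fixed-depth negamax shared by every entrant; only `evaluate` varies.

    Scores are from the point of view of the side to move.
    """
    if stones == 0:
        return LOSS
    if depth == 0:
        return evaluate(stones)
    return max(-shared_search(stones - take, depth - 1, evaluate)
               for take in range(1, min(3, stones) + 1))


def best_move(stones: Position, depth: int, evaluate: EvalFn) -> int:
    """Stones to take (needs stones >= 1, depth >= 1); ties go to the smallest move."""
    return max(range(1, min(3, stones) + 1),
               key=lambda take: -shared_search(stones - take, depth - 1, evaluate))


def eval_blind(stones: Position) -> int:
    """Entrant A: knows nothing about the game."""
    return 0


def eval_mod4(stones: Position) -> int:
    """Entrant B: knows that a multiple of 4 is lost for the side to move."""
    return -100 if stones % 4 == 0 else 100
```

With six stones at depth 1, `best_move(6, 1, eval_mod4)` takes 2 stones (leaving a multiple of 4), while `best_move(6, 1, eval_blind)` sees no difference and returns the first move, 1. The search code is identical in both calls; only the evaluation differs.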

Advantages

1. It's much easier to discover the weaknesses of your evaluation, since search no longer plays its dominant role (or hardly does). You don't lose (or win) a game because you are outsearched. You lose (or win) a game because of your evaluation.

2. When playing X versus Y, since the two searches are identical, you are measuring evaluation strength.

3. If we can determine strength, we can create a competition based on fixed-depth games. This removes the last issue that may influence the result: engines X and Y have different time cycles, since engine X might have a slow evaluation while engine Y has a fast one. That way we eliminate the last obstacle to a reasonably fair estimate of who has the strongest eval within the scope of this project.

4. The learning effect. How large it is will depend on the number of participants, given that everything is open source and GPL.
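Point 3 assumes strength can be determined from match results between two evaluations. The post does not say which rating model ERL would use; one common choice, assumed here, is the logistic Elo model, where a match score S (wins plus half the draws, divided by games) corresponds to an Elo difference of -400 * log10(1/S - 1). A small sketch, with a hypothetical function name:

```python
import math


def elo_difference(wins: int, draws: int, losses: int) -> float:
    """Elo difference of X over Y implied by X's match score (logistic model)."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if not 0.0 < score < 1.0:
        # 0% or 100% would put log10 at infinity.
        raise ValueError("a 0% or 100% score gives an unbounded Elo difference")
    return -400.0 * math.log10(1.0 / score - 1.0)
```

For example, 75 wins, 0 draws and 25 losses give roughly +191 Elo, and an even score gives 0. A real rating list would also report error bars, which this sketch omits.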

.....

Read more at: http://rebel13.nl/misc/erl.html

And vote :wink:

hgm
Posts: 22210
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam

Re: I have a dream, I called it ERL

Post by hgm » Mon Aug 28, 2017 7:25 pm

How would you avoid evaluations that try to do things that are normally done by search, in a horribly inefficient static way (like super-soma)? Running SEE on all squares to detect threats against the side to move, and discounting the eval if there are multiple such threats, must be worth a lot.
Last edited by hgm on Mon Aug 28, 2017 8:02 pm, edited 1 time in total.
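hgm's suggestion can be sketched roughly as follows. This is an illustration under stated assumptions, not hgm's code: the board is abstracted away, and for each piece of the side to move the caller supplies its value plus the values of the enemy attackers and friendly defenders of that square, least valuable first (x-rays and pins are ignored). `see` uses the usual swap-list method; `threat_discount` is a hypothetical rule that, when two or more pieces are en prise, subtracts the second-largest gain, on the reasoning that the side to move can usually rescue only one of them.

```python
from typing import Sequence, Tuple


def see(target: int, attackers: Sequence[int], defenders: Sequence[int]) -> int:
    """Static exchange evaluation on one square (swap-list method).

    `target` is the value of the attacked piece; `attackers` and `defenders`
    are piece values, least valuable first. The first capture is forced;
    after that, either side may decline to continue. Returns the net gain
    for the attacking side (negative if the first capture loses material).
    """
    sides = (attackers, defenders)
    used = [0, 0]
    gain = []
    on_square = target  # value of the piece the next capture removes
    side = 0
    while used[side] < len(sides[side]):
        capturer = sides[side][used[side]]
        used[side] += 1
        gain.append(on_square - (gain[-1] if gain else 0))
        on_square = capturer
        side ^= 1
    # Back up the swap list: the side that would make capture d may decline it.
    for d in range(len(gain) - 1, 0, -1):
        gain[d - 1] = -max(-gain[d - 1], gain[d])
    return gain[0] if gain else 0


def threat_discount(pieces: Sequence[Tuple[int, Sequence[int], Sequence[int]]]) -> int:
    """Hypothetical penalty for the side to move when 2+ pieces are en prise."""
    gains = sorted((g for g in (see(v, a, d) for v, a, d in pieces) if g > 0),
                   reverse=True)
    return gains[1] if len(gains) >= 2 else 0
```

For example, an undefended rook attacked by a knight (SEE +500) together with a pawn-defended knight attacked by a pawn (SEE +200) yields a discount of 200, while a single threat yields none.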

Dann Corbit
Posts: 8600
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: I have a dream, I called it ERL

Post by Dann Corbit » Mon Aug 28, 2017 7:53 pm

hgm wrote:How would you avoid evaluations that try to do things that normally are done by search in a horribly efficient static way (like super-soma)? Running SEE on all squares to detect threats against the side-to-move, and discount the eval if there are multiple such threats, must be worth a lot.
I guess that the main idea is that all implementations will differ only in eval, so if there are missing features, then they are missing for all.

And if some special search feature were to be implemented, then it would be implemented for all.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

Rebel
Posts: 4138
Joined: Thu Aug 18, 2011 10:04 am

Re: I have a dream, I called it ERL

Post by Rebel » Mon Aug 28, 2017 8:03 pm

hgm wrote:How would you avoid evaluations that try to do things that normally are done by search in a horribly efficient static way (like super-soma)? Running SEE on all squares to detect threats against the side-to-move, and discount the eval if there are multiple such threats, must be worth a lot.
You raise a good point and I will give you my opinion. My first engine (1980/81) used SOMA (it could even statically evaluate a checkmate), because at the time I wasn't aware of QS. Actually, the current (2017) evaluation has a SEE on every square, and I can tell you from experience that SOMA is a waste of energy; QS is far superior. One exception would be giving a bonus for a double attack (such as a knight fork attacking two pieces), worth an estimated 5-10 Elo.
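A double-attack bonus of the kind described above might look like the following for the knight-fork case. This is a sketch with assumed conventions (bitboards with a1 = 0 ... h8 = 63, Python ints masked to 64 bits); which enemy pieces count as fork targets and the size of `FORK_BONUS` are hypothetical tuning choices, and the 5-10 Elo figure is Rebel's estimate, not something this code reproduces.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF
FILE_A = 0x0101_0101_0101_0101
FILE_B = FILE_A << 1
FILE_G = FILE_A << 6
FILE_H = FILE_A << 7

# Masks that stop knight jumps from wrapping around the board edge.
NOT_A = ~FILE_A & MASK64
NOT_AB = ~(FILE_A | FILE_B) & MASK64
NOT_H = ~FILE_H & MASK64
NOT_GH = ~(FILE_G | FILE_H) & MASK64

FORK_BONUS = 25  # hypothetical bonus in centipawns


def knight_attacks(sq: int) -> int:
    """Bitboard of squares a knight on `sq` attacks (a1 = 0, h8 = 63)."""
    b = 1 << sq
    return (((b << 17) & NOT_A) | ((b << 15) & NOT_H)      # two up, one right/left
            | ((b << 10) & NOT_AB) | ((b << 6) & NOT_GH)   # one up, two right/left
            | ((b >> 15) & NOT_A) | ((b >> 17) & NOT_H)    # two down, one right/left
            | ((b >> 6) & NOT_AB) | ((b >> 10) & NOT_GH))  # one down, two right/left


def fork_bonus(knight_sq: int, targets: int) -> int:
    """Bonus if the knight attacks two or more squares in `targets`.

    `targets` is a bitboard of enemy pieces worth forking; which pieces to
    include is left to the caller.
    """
    hits = bin(knight_attacks(knight_sq) & targets).count("1")
    return FORK_BONUS if hits >= 2 else 0
```

For example, a knight on d5 (square 35) attacking c7 and e7 earns the bonus when both squares are in `targets`.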

PK
Posts: 755
Joined: Mon Jan 15, 2007 10:23 am
Location: Warsza

Re: I have a dream, I called it ERL

Post by PK » Mon Aug 28, 2017 8:07 pm

I think evaluation speed should somehow be taken into account. So we might create a base engine that uses only a subset of search enhancements that do not depend on exact evaluation scores (so null move is OK, late move reduction and late move pruning are OK, but razoring and futility pruning are not) and start from there. We may throttle the speed of such an engine to no more than 500,000 nodes per second, so that we do not have to care about implementing a super-fast eval, but still dissuade people from creating one that is slower than that. I could provide an improved Sungorus/crippled Rodent with such features if you wish.
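The speed cap could be implemented by letting the search count nodes and pause whenever it is ahead of the allowed rate. A minimal sketch, assuming the engine calls a hook once per node; `NodeThrottle`, `throttle_delay` and the check interval of 4096 nodes are hypothetical, and only the 500,000 nodes-per-second figure comes from the post. Sleeping can only slow a fast engine down, never speed up a slow eval, which fits the intent of dissuading evaluations slower than the cap.

```python
import time

MAX_NPS = 500_000  # cap suggested in the post


def throttle_delay(nodes: int, elapsed: float, max_nps: int = MAX_NPS) -> float:
    """Seconds to pause so that nodes / total time stays <= max_nps."""
    return max(0.0, nodes / max_nps - elapsed)


class NodeThrottle:
    """Hypothetical hook the shared search would call once per node."""

    def __init__(self, max_nps: int = MAX_NPS, check_every: int = 4096) -> None:
        self.max_nps = max_nps
        self.check_every = check_every  # avoid reading the clock at every node
        self.nodes = 0
        self.start = time.monotonic()

    def count_node(self) -> None:
        self.nodes += 1
        if self.nodes % self.check_every == 0:
            elapsed = time.monotonic() - self.start
            delay = throttle_delay(self.nodes, elapsed, self.max_nps)
            if delay > 0.0:
                time.sleep(delay)
```

For example, an engine that has searched 1,000,000 nodes after 1 second would pause for another second, bringing its average back to 500,000 nodes per second.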

hgm
Posts: 22210
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam

Re: I have a dream, I called it ERL

Post by hgm » Mon Aug 28, 2017 8:13 pm

But 'eval' is an ill-defined concept. Super-soma does the equivalent of QS as static evaluation. Come to think of it, for truly large variants this might be a winner: if you have several independent complex tactical exchanges in different places on the board, QS would try to permute the moves of these in many different ways. A static judgement would just add the statically determined outcome of all of these.

Uri Blass
Posts: 8015
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: I have a dream, I called it ERL

Post by Uri Blass » Mon Aug 28, 2017 8:27 pm

There is always a problem of doing search inside the evaluation.

I think it may be more interesting to have a competition in predicting the final result of the game by evaluation alone, because humans can often do this better than engines.

Here are some examples where humans do better than engines:

[D]5rk1/5ppp/8/8/8/8/5RPP/5RK1 b - - 1 1

Every strong human knows that white is winning, even without proving a mate score, and can say that he is 100% sure about it without engine analysis, but engines do not show a known_win evaluation for white.


Another example:

[D]4r1k1/6pp/8/8/8/8/6PP/5RK1 w - - 1 1

Humans know it is a draw without calculations.

Third example:

[D]5bk1/2pp2pp/8/8/8/8/2PPPPPP/5BK1 w - - 1 1

Humans know that white does not lose without calculations.

I suggest the following competition:

Everybody should write a function that returns one of the following options:
1) white wins
2) black wins
3) draw
4) white wins or draw
5) black wins or draw
6) do not know

Later, the function is tested on a large number of positions from games.

For every correct result of type 1, 2 or 3 you get 2 points.
For every correct result of type 4 or 5 you get 1 point.
For every wrong result you lose 1,000,000 points.

Whether a result is correct is checked simply by the result of a game between top engines.

To prevent people from doing long searches inside the evaluation, the function basically needs to evaluate all positions in a file of a million positions in no more than 10 seconds on some agreed hardware.

The advantage of this competition may be that people can later use the winner's function for pruning decisions in the search (for example, if white is worse, you can prune a line when the evaluation says "black wins" or "black wins or draw").
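The scoring rules of this proposal can be written down directly. A sketch under some assumptions: game results are given as PGN-style strings, and "do not know" scores 0 points, which the post implies but does not state. All names are hypothetical. Note that the time limit works out to an average of 10 microseconds per position (10 s / 1,000,000 positions).

```python
from enum import Enum
from typing import Iterable, Tuple


class Verdict(Enum):
    """The six answers from the post, numbered as there."""
    WHITE_WINS = 1
    BLACK_WINS = 2
    DRAW = 3
    WHITE_WINS_OR_DRAW = 4
    BLACK_WINS_OR_DRAW = 5
    UNKNOWN = 6


# Game results each verdict is consistent with (PGN result strings).
CONSISTENT = {
    Verdict.WHITE_WINS: {"1-0"},
    Verdict.BLACK_WINS: {"0-1"},
    Verdict.DRAW: {"1/2-1/2"},
    Verdict.WHITE_WINS_OR_DRAW: {"1-0", "1/2-1/2"},
    Verdict.BLACK_WINS_OR_DRAW: {"0-1", "1/2-1/2"},
}

WRONG_PENALTY = -1_000_000


def score_verdict(verdict: Verdict, result: str) -> int:
    if verdict is Verdict.UNKNOWN:
        return 0  # assumed: "do not know" neither gains nor loses points
    if result not in CONSISTENT[verdict]:
        return WRONG_PENALTY
    return 2 if verdict.value <= 3 else 1  # exact verdicts (types 1-3) score double


def total_score(answers: Iterable[Tuple[Verdict, str]]) -> int:
    return sum(score_verdict(v, r) for v, r in answers)
```

The huge penalty means a single wrong verdict outweighs any realistic number of correct ones, so an entrant's best strategy is to answer "do not know" unless certain.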

Rebel
Posts: 4138
Joined: Thu Aug 18, 2011 10:04 am

Re: I have a dream, I called it ERL

Post by Rebel » Tue Aug 29, 2017 5:38 am

PK wrote: I think evaluation speed should be somehow taken into account.
I have been thinking about it, and the easiest solution would be to run two rating lists, one at fixed depth and one at a time control. I am willing to do that since I don't expect to receive 10 new entries each week :wink:
PK wrote:So we might create a base engine that uses only a subset of search enhancements that do not depend on exact evaluation scores (so null move is OK, late move reduction and late move pruning are OK, but razoring and futility pruning are not) and start from there. We may throttle speed of such an engine to no more than 500.000 nodes per second, so that we do not care for implementing super-fast eval, but still dissuade people from creating one that is slower than that. I may provide improved Sungorus/crippled Rodent with such features if You wish.

Interesting. The benefit of TOGA is that it was very simple to replace its eval with my own; is that the case with yours too?

Rebel
Posts: 4138
Joined: Thu Aug 18, 2011 10:04 am

Re: I have a dream, I called it ERL

Post by Rebel » Tue Aug 29, 2017 5:57 am

hgm wrote: But 'eval' is an ill-defined concept. Super-soma does the equivalent of QS as static evaluation.
I know, and it's hopelessly inaccurate. Over the years I have given SOMA several tries: pins and hanging pieces are doable to some extent, but overloading is mission impossible.
hgm wrote:Come to think of it, for truly large variants this might be a winner: if you have several independent complex tactical exchanges in different places on the board, QS would try to permute the moves of these in many different ways. A static judgement would just add the statically determined outcome of all of these.
I hear you. As I said to Pawel: two lists. On the fixed-depth list, programmers can go wild and, instead of a quick and dirty evaluation of the bishop pair, knight outposts etc., add more accurate knowledge; the other list is at a time control.

Lyudmil Tsvetkov
Posts: 6033
Joined: Tue Jun 12, 2012 10:41 am

Re: I have a dream, I called it ERL

Post by Lyudmil Tsvetkov » Tue Aug 29, 2017 6:05 am

No one says that going the pure-evaluation way will be easier.

To get a pure-evaluation engine to the same level (say 3000 Elo) as a standard evaluation+search engine, you will need to tune a couple of times more parameters to compensate for the obvious depth deficiency, and tuning those is basically as difficult as tuning a smaller number of parameters together with the search.
