Discussion:
service state transition to failure
Naveen Singh
2015-10-08 04:21:12 UTC
Hi,
I have a few questions about the service state transition to failure:

1. Can the service state transition to Failure without generating an error
event? I am seeing that the WiFi connection does not go through even though
the AP is found in the scan. I dumped the service properties and I see that
the state is Failure but Event is a NULL string.

2. We are handling the error property event by clearing the error property.
This clears the error property and sets the service state to Idle.

3. Even after the error event is cleared, we still see a service property
change event for State (transitioning to failure).

4. Is it advisable to handle the service property change event for State
(transitioning to failure) the same way as we handle the service property
change event for Error?

Regards
Naveen
Patrik Flykt
2015-10-08 13:20:15 UTC
Hi,
Post by Naveen Singh
1. Can service state transition to Failure w/o generating an error event? I
am seeing that WiFi connection is not going through even if the AP is found
in scan. I dumped the service properties and I see that state is Failure
but Event is a NULL string.
When a service goes to state 'failure', it's an indication that none of
the other properties for that service should be considered valid. It may
happen that the current implementation does not send updates for any of
its properties after a transition to 'failure', not even the Error
property. Report to the mailing list if this happens and the level of
frustration associated so we'll take a look.
Post by Naveen Singh
2. We are handling the error property event by clearing the error property.
This clears the error property and service state to Idle
The Error property is actually specified read only in
doc/service-api.txt. The current code in ConnMan allows the service
property to be cleared, which is a bug. But the state is still 'failure'
for the service after clearing the property, so no state transitions
will take place.
Post by Naveen Singh
3. Even after the error event is cleared we do see a service property
change event for State (transitioning to failure).
Yes. Only the 'Error' property got cleared, not the state transition.
Post by Naveen Singh
4. Is it advisable to handle the service property change event for state
(transitioning to failure) same way as we handle the service property
change event for error.
State transition to failure causes all properties to become invalid and
should be the one to follow. The 'Error' property may try to explain
what happened, but usually the information is not very well suited for
UIs. ConnMan might not announce the 'Error' property explicitly in all
cases of failure; see above.
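A minimal sketch of the handling described above: react to the State property and keep Error only as an optional diagnostic. The class and callback names are hypothetical. Only the PropertyChanged(name, value) signal shape and the 'failure' state value come from ConnMan's service API; wiring the method to a real D-Bus signal is left out.

```python
from typing import Callable, Optional


class ServiceWatcher:
    """Tracks one service and reacts to State, not Error (hypothetical helper)."""

    def __init__(self, on_failure: Callable[[Optional[str]], None]) -> None:
        self.state = "idle"
        # Diagnostic text only; ConnMan may never announce it.
        self.error: Optional[str] = None
        self._on_failure = on_failure

    def property_changed(self, name: str, value: object) -> None:
        """Feed this from the service's PropertyChanged(name, value) signal."""
        if name == "Error":
            # Remember the text for logging; do not act on it.
            self.error = str(value) or None
        elif name == "State":
            previous, self.state = self.state, str(value)
            # Fire once per transition into 'failure'.
            if self.state == "failure" and previous != "failure":
                self._on_failure(self.error)
```

With this shape, a failure is still noticed when no Error update arrives, and a later Error update does not trigger a second round of failure handling.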

HTH,

Patrik
Naveen Singh
2015-10-08 20:25:55 UTC
Hi Patrik
Thanks for your response.
Post by Patrik Flykt
Hi,
Post by Naveen Singh
1. Can service state transition to Failure w/o generating an error event? I
am seeing that WiFi connection is not going through even if the AP is found
in scan. I dumped the service properties and I see that state is Failure
but Event is a NULL string.
When a service goes to state 'failure', it's an indication that none of
the other properties for that service should be considered valid. It may
happen that the current implementation does not send updates for any of
its properties after a transition to 'failure', not even the Error
property. Report to the mailing list if this happens and the level of
frustration associated so we'll take a look.
I looked into the code and the documentation and I do not see any way for a
service to go back to idle (not even by clearing the error property). How do
devices that are supposed to auto-connect handle this?
Post by Naveen Singh
2. We are handling the error property event by clearing the error property.
This clears the error property and service state to Idle
The Error property is actually specified read only in
doc/service-api.txt. The current code in ConnMan allows the service
property to be cleared, which is a bug. But the state is still 'failure'
for the service after clearing the property, so no state transitions
will take place.
Only the Error property can be cleared, and clearing it does not change the
state. The State property is also specified as read-only in the documentation.
Post by Naveen Singh
3. Even after the error event is cleared we do see a service property
change event for State (transitioning to failure).
Yes. Only the 'Error' property got cleared, not the state transition.
Post by Naveen Singh
4. Is it advisable to handle the service property change event for state
(transitioning to failure) same way as we handle the service property
change event for error.
State transition to failure causes all properties to become invalid and
should be the one to follow. The 'Error' property may try to explain
what happened, but usually the information is not very well suited for
UIs. ConnMan might not announce the 'Error' property explicitly in all
cases of failure; see above.
I am planning to handle the State property change event. How do you expect
it to be handled? By clearing the Error property?
Post by Naveen Singh
HTH,
Patrik
Regards
Naveen
Post by Naveen Singh
_______________________________________________
connman mailing list
https://lists.connman.net/mailman/listinfo/connman
Patrik Flykt
2015-10-09 08:09:38 UTC
Hi,
Post by Naveen Singh
I looked into the code and the documentation and I do not see any way for
service to go back to idle (even clearing the error property). I am not
sure how devices which are supposed to auto-connect handle this?
This all works as intended. ConnMan will run autoconnect internally, and
if a service fails to connect it is excluded the next time autoconnect
is run. Internally from ConnMan's perspective, what would be the use of
retrying a service if it is already known to fail?

That the service is being excluded from autoconnect is a temporary
setting and will be cleared once the service is connected successfully
or the service disappears and reappears again. As autoconnect is not
reconnecting a failed service, it means that an outside action needs to
call Connect() via D-Bus. The rationale here is that at this point the
user may have corrected something or knows otherwise that a connection
will now succeed. ConnMan will retry to connect the service via its
autoconnect mechanism if the service has been removed by wpa_supplicant
and re-detected later on; as said the failure state has thereby been
forgotten. So yes, ConnMan will retry every now and then even when only
relying on its autoconnect mechanism.
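The rules in this explanation can be sketched as a toy model. This is not ConnMan code: the class, method names and the reduced set of states are assumptions made for illustration. It only encodes what is stated here: a failed service is skipped by autoconnect, a successful connection or a disappear/reappear cycle clears that, and an explicit Connect() is the other way out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Service:
    name: str
    state: str = "idle"
    autoconnect: bool = True


class ToyAutoconnect:
    """Toy model of the behaviour described in the thread, not ConnMan code."""

    def __init__(self) -> None:
        self.services: Dict[str, Service] = {}

    def network_seen(self, name: str) -> None:
        # A newly detected network gets a fresh service in state 'idle';
        # a network that is merely refreshed keeps its existing state.
        self.services.setdefault(name, Service(name))

    def network_expired(self, name: str) -> None:
        # Removing the service also forgets its failure state.
        self.services.pop(name, None)

    def candidates(self) -> List[str]:
        # Services in 'failure' are skipped by autoconnect.
        return [s.name for s in self.services.values()
                if s.autoconnect and s.state == "idle"]

    def connect(self, name: str, succeeds: bool) -> None:
        # Stands in for an autoconnect attempt or an explicit Connect().
        self.services[name].state = "ready" if succeeds else "failure"
```

In this model, a network that keeps being refreshed by scans never passes through network_expired(), so a failed service stays out of candidates() until something calls connect() explicitly.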
Post by Naveen Singh
Only the error property is allowed to clear which does not change the
state. Even the state is specified read only in the documentation.
Correct.

Patrik
Naveen Singh
2015-10-08 20:30:53 UTC
Post by Patrik Flykt
Hi,
Post by Naveen Singh
1. Can service state transition to Failure w/o generating an error event? I
am seeing that WiFi connection is not going through even if the AP is found
in scan. I dumped the service properties and I see that state is Failure
but Event is a NULL string.
When a service goes to state 'failure', it's an indication that none of
the other properties for that service should be considered valid. It may
happen that the current implementation does not send updates for any of
its properties after a transition to 'failure', not even the Error
property. Report to the mailing list if this happens and the level of
frustration associated so we'll take a look.
Ways to repro this:
1. Have an AP in bridge mode.
2. Provision the device to connect to this AP.
3. Remove the WAN cable from the AP, reboot the device, and let it
auto-connect to this AP.
4. The service gets stuck in the configuration state and eventually goes to
the failure state.
5. The service is now in the failure state and will never be able to connect.
Patrik Flykt
2015-10-09 08:13:40 UTC
Post by Naveen Singh
5. Now it is in failure state and will never be able to connect.
It will be autoconnected once its state is no longer failure. The
failure state is cleared once the service is removed. The service is
removed once wpa_supplicant times out the wifi network. The next time
the wifi network is discovered via autoconnect wifi scan, it is created
with state 'idle' and is therefore a candidate for autoconnection.

Cheers,

Patrik
Naveen Singh
2015-10-09 17:52:01 UTC
Hi Patrik,
Post by Patrik Flykt
Post by Naveen Singh
5. Now it is in failure state and will never be able to connect.
It will be autoconnected once its state is no longer failure. The
failure state is cleared once the service is removed. The service is
removed once wpa_supplicant times out the wifi network. The next time
the wifi network is discovered via autoconnect wifi scan, it is created
with state 'idle' and is therefore a candidate for autoconnection.
This may not happen at all. My understanding is that wpa_supplicant would
time out only if the AP is not seen in subsequent scans. But the AP is
always found in the scan, as there is nothing wrong at the 802.11 level. The
user found that the WAN cable was not connected and fixed it. Now the
connection to the DHCP server is established, but the connection will still
not happen. Is the user supposed to power off the AP so that it disappears
from the scan list?

Can the application software remove the service, so that the next scan
re-creates it and autoconnect can happen?
Patrik Flykt
2015-10-12 13:31:42 UTC
Hi,
Post by Naveen Singh
This may not happen at all. My understanding is that wpa_supplicant would
time out only if the AP is not seen in subsequent scans. But the AP is
always found in scan as there is nothing wrong at 802.11 level. The user
found that WAN cable was not connected so he went ahead and fixed it. And
now the connection to DHCP server is established but connection will still
not happen. Is user supposed to power off the AP so that it disappears from
scan list.
wpa_s will time out wifi networks in 2 minutes if no scans have been
done to refresh them. For ConnMan it will take ~6 min 20 sec to get
fresh results with a new scan after the 2 minute expiry time.

If the user is connected to the system, nothing prevents the user from
connecting manually and immediately after said WAN cable is reattached.

Cheers,

Patrik
Naveen Singh
2015-10-12 17:38:48 UTC
Post by Patrik Flykt
Hi,
Post by Naveen Singh
This may not happen at all. My understanding is that wpa_supplicant would
time out only if the AP is not seen in subsequent scans. But the AP is
always found in scan as there is nothing wrong at 802.11 level. The user
found that WAN cable was not connected so he went ahead and fixed it. And
now the connection to DHCP server is established but connection will still
not happen. Is user supposed to power off the AP so that it disappears from
scan list.
wpa_s will time out wifi networks in 2 minutes if no scans have been
done to refresh them. For ConnMan it will take ~6 min 20 sec to get
fresh results with a new scan after the 2 minute expiry time.
If the user is connected to the system, nothing prevents the user from
connecting manually and immediately after said WAN cable is reattached.
This certainly is an issue, for the following reasons:
1. The user could be initiating scans to get the device connected back to the
network, so there will never be a 2 minute expiry.
2. WAN detachment was just an example; it could very well be an
intermittent problem which does not require any intervention from the user
and corrects itself.
3. Expecting the user to connect manually is not always an option. Users
rely on their devices to connect on their own so that they can
control these devices from anywhere.
4. I tested multiple devices in the same environment and every one of them
auto-connected on its own once the problem vanished. This seems to me to be
a use case that we have not considered, and it surely requires some fix in
ConnMan.
Naveen Singh
2015-10-13 04:46:12 UTC
In my previous email, where I wrote "User could be initiating scan", I
actually meant "Application could be initiating scan".
Patrik Flykt
2015-10-13 08:00:03 UTC
Hi,
Post by Naveen Singh
In my previous email when I meant "User could be initiating scan" I
actually meant "Application could be initiating scan".
Well, don't. Now the responsibility of correct behavior is taken away
from ConnMan and placed on the application. What does the application
know about network connectivity that ConnMan doesn't?

If some other entity is requesting sudden irregular scans, it is a sign
for ConnMan that fresh information is needed; for sure the networks
available are not the ones desired and therefore there is even less
point in trying to connect to a service that has already failed...


Cheers,

Patrik
Naveen Singh
2015-10-13 21:42:26 UTC
Post by Patrik Flykt
Hi,
Post by Naveen Singh
In my previous email when I meant "User could be initiating scan" I
actually meant "Application could be initiating scan".
Well, don't. Now the responsibility of correct behavior is taken away
from ConnMan and placed on the application. What does the application
know about network connectivity that ConnMan doesn't?
There is nothing that the application knows that ConnMan does not know. In
fact, the application learns through ConnMan that the connection did not go
through. The way the application gets connected again is to initiate a scan,
hoping that one of these scans will find the AP (or services) so that
autoconnect triggers and gets the device connected. But in this case
autoconnect does not attempt a connection, because the service state was
left at failure.
Post by Patrik Flykt
If some other entity is requesting sudden irregular scans, it is a sign
for ConnMan that fresh information is needed; for sure the networks
available are not the ones desired and therefore there is even less
point in trying to connect to a service that has already failed...
It is not an irregular scan. It is a scan made in an attempt to get
connected without any user intervention.
Patrik Flykt
2015-10-14 06:01:05 UTC
Post by Naveen Singh
There is nothing that application knows that connman does not know.
In fact application gets to know through connman that connection did
not go through. The way application gets connected back is to initiate
a scan and hoping that one of these scan would find the AP (or
services) and then run autoconnect would trigger and get device
connected. But in this case run autoconnect would not attempt
connection because service state was left to failure.
This approach won't work at all. When ConnMan receives a Scan() over
D-Bus, it will scan immediately and reset its own autoscan timer. Which
means that if the application issues a scan after less than ~6 minutes,
it causes ConnMan to scan so often that wpa_s never times out its WiFi
networks. So if the service has failed, it has failed for all eternity.

The solution here is very simple. Get rid of the application assisted
scan and let ConnMan handle it instead.

Cheers,

Patrik
Grant Erickson
2015-10-14 06:24:10 UTC
Post by Patrik Flykt
Post by Naveen Singh
There is nothing that application knows that connman does not know.
In fact application gets to know through connman that connection did
not go through. The way application gets connected back is to initiate
a scan and hoping that one of these scan would find the AP (or
services) and then run autoconnect would trigger and get device
connected. But in this case run autoconnect would not attempt
connection because service state was left to failure.
This approach won't work at all. When ConnMan receives a Scan() over
D-Bus, it will scan immediately and reset its own autoscan timer. Which
means that if the application issues a scan after less than ~6 minutes,
it causes ConnMan to scan so often that wpa_s never times out its WiFi
networks. So if the service has failed, it has failed for all eternity.
The solution here is very simple. Get rid of the application assisted
scan and let ConnMan handle it instead.
Patrik,

The solution, it turns out, is not that simple, at least for the application at hand.

This particular application is embedded, largely user-unattended, and sleepy (from a CPU perspective). It must work day-in, day-out, week-in, week-out, month-in, and month-out without user intervention after the user has interactively established the first initial connection.

Due to these constraints, the application knows best when to scan and does so at its discretion. Neither the connman nor the wpa_s autoscan infrastructure scans at the application-desired times. When they do scan automatically, it typically results in one or both of an undesirable expenditure of power or a momentary and inopportune move off-channel.

Absent the proposed change, per doc/overview-api.txt, with a WiFi network continually refreshed, there is no exit path from the Failure service state and, consequently, no unattended means by which the application can recover WiFi connectivity under these circumstances in a deterministic and application-controlled way.

The proposed change offers, for unattended applications that wish to avail themselves of it, an exit gate to effectively acknowledge a service error by clearing it. There is no downside or consequence to interactive applications that wish to continue to operate as they do today.

Best,

Grant
Patrik Flykt
2015-10-14 07:37:47 UTC
Hi,
Post by Grant Erickson
Due to these constraints, the application knows best when to scan and
does so at its discretion. Neither connman nor wpa_s autoscan
infrastructure scan at the application-desired times. When they do
scan automatically, it typically results in one or both of an
undesirable expenditure of power or a momentary and inopportune move
off-channel.
Fair enough. There is always the BackgroundScanning option in main.conf
that you then should set to false in order not to have ConnMan
restarting autoscans for no apparent benefit and an undesired increase
in at least power consumption.
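For reference, BackgroundScanning is a key in the [General] section of main.conf. Below is a small sketch of how an application could check the setting. The sample text and the function name are illustrative, and treating a missing key as "enabled" is an assumption about ConnMan's default that should be checked against the installed version.

```python
import configparser

# Illustrative fragment of main.conf with background scanning turned off.
SAMPLE_MAIN_CONF = """\
[General]
BackgroundScanning = false
"""


def background_scanning_enabled(conf_text: str) -> bool:
    """Return the BackgroundScanning setting found in main.conf text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Assumption: a missing section or key means the option is on.
    return parser.getboolean("General", "BackgroundScanning", fallback=True)
```

Calling background_scanning_enabled(SAMPLE_MAIN_CONF) reports the option as off, which is the configuration suggested above for an application that schedules its own scans.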

Notice that support for wpa_supplicant's internal autoscan mechanism was
removed a while back in version 1.28, as it interfered with hidden SSID
detection. Nowadays only ConnMan's internal autoscan implementation is
being used.
Post by Grant Erickson
Absent the proposed change, per doc/overview-api.txt, with a WiFi
network continually refreshed, there is no exit path from the Failure
service state and, consequently, no unattended means by which the
application can recover WiFi connectivity under these circumstances in
a deterministic and application-controlled way.
Since the application has full knowledge of what the suitable scan
interval is for this device, there is really no reason why the
application cannot issue a Connect() as well after a finished scan. This
way the application can gracefully get away from situations where
previous service connect attempts failed. This also works should the
application have had a need to disconnect the service in between.
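A sketch of this suggestion, under stated assumptions: the function name and the `connect` callback are hypothetical, and the D-Bus wiring is omitted. In a real application `services` would be the list of (object path, properties) pairs returned by net.connman.Manager.GetServices(), and `connect` would call net.connman.Service.Connect() on the given path. Restricting the attempt to services marked Favorite, and to the first match only, is a design choice of this sketch and not something the thread prescribes.

```python
from typing import Callable, Dict, Iterable, List, Tuple

ServiceEntry = Tuple[str, Dict[str, object]]  # (object path, properties)


def reconnect_after_scan(
    services: Iterable[ServiceEntry],
    connect: Callable[[str], None],
) -> List[str]:
    """After a finished scan, retry the first known idle or failed service."""
    attempted: List[str] = []
    for path, props in services:
        known = bool(props.get("Favorite"))
        retryable = props.get("State") in ("idle", "failure")
        if known and retryable:
            # An explicit Connect() is the way out of the 'failure' state.
            connect(path)
            attempted.append(path)
            break  # one attempt per scan keeps the behaviour predictable
    return attempted
```

The application would run this once each time its own scan completes, so the retry rate follows the scan schedule it already controls.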
Post by Grant Erickson
The proposed change offers, for unattended applications that wish to
avail themselves of it, an exit gate to effectively acknowledge a
service error by clearing it. There is no downside or consequence to
interactive applications that wish to continue to operate as they do
today.
If the application already knows better when to scan, it also knows
better when to issue a reconnect. Opening up service state transitions
to user or UI interference is a can of worms I cannot maintain in any
sensible manner. Autoscan backoff strategy and wifi network expiry are
up for tweaking to better values, but here one needs to come with an
explanation how everything functions smoother afterwards.

Cheers,

Patrik
