Android Watchdog Analysis

2019-11-06 09:59:16

Source: reposted

Contributed by: a reader

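Android Watchdog is a mechanism that monitors whether other system services are working normally. If an important system service ends up in an abnormal state such as a deadlock, the system as a whole is no longer working properly, and restarting it is a necessary step to recover Android.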

1. Starting Watchdog

Watchdog is started in SystemServer:

    Slog.i(TAG, "Init Watchdog");
    final Watchdog watchdog = Watchdog.getInstance();
    watchdog.init(context, mActivityManagerService);

Watchdog uses the singleton pattern:

    public static Watchdog getInstance() {
        if (sWatchdog == null) {
            sWatchdog = new Watchdog();
        }

        return sWatchdog;
    }

Next, look at the init function:

    public void init(Context context, ActivityManagerService activity) {
        mResolver = context.getContentResolver();
        mActivity = activity;

        context.registerReceiver(new RebootRequestReceiver(),
                new IntentFilter(Intent.ACTION_REBOOT),
                android.Manifest.permission.REBOOT, null);
    }

It registers a broadcast receiver for ACTION_REBOOT:

    final class RebootRequestReceiver extends BroadcastReceiver {
        @Override
        public void onReceive(Context c, Intent intent) {
            if (intent.getIntExtra("nowait", 0) != 0) {
                rebootSystem("Received ACTION_REBOOT broadcast");
                return;
            }
            Slog.w(TAG, "Unsupported ACTION_REBOOT broadcast: " + intent);
        }
    }

As its name suggests, rebootSystem should be what restarts the system:

    void rebootSystem(String reason) {
        Slog.i(TAG, "Rebooting system because: " + reason);
        IPowerManager pms = (IPowerManager) ServiceManager.getService(Context.POWER_SERVICE);
        try {
            pms.reboot(false, reason, false);
        } catch (RemoteException ex) {
        }
    }

That is indeed the case, so one of Watchdog's important jobs is to receive this broadcast and reboot the system.

2. How Watchdog works

To use Watchdog, first pass it the Handler of the monitored object's thread:

    Watchdog.getInstance().addThread(mHandler);

Then pass it the monitored object itself:

    Watchdog.getInstance().addMonitor(this);

addThread takes the Handler of the current thread and creates a new HandlerChecker object for it:

    public void addThread(Handler thread) {
        addThread(thread, DEFAULT_TIMEOUT);
    }

    public void addThread(Handler thread, long timeoutMillis) {
        synchronized (this) {
            if (isAlive()) {
                throw new RuntimeException("Threads can't be added once the Watchdog is running");
            }
            final String name = thread.getLooper().getThread().getName();
            mHandlerCheckers.add(new HandlerChecker(thread, name, timeoutMillis));
        }
    }

HandlerChecker is a Runnable:

    public final class HandlerChecker implements Runnable

Because it is posted through the Handler passed to addThread, a HandlerChecker runs on the monitored object's thread.

The addMonitor function:

    public void addMonitor(Monitor monitor) {
        mMonitors.add(monitor);
    }

It only stores the monitor for later use.

Next, let's see how the running Watchdog uses these two objects to do its monitoring.

Watchdog's run() function:

    @Override
    public void run() {
        boolean waitedHalf = false;
        boolean mSFHang = false;
        while (true) { // infinite loop
            final ArrayList<HandlerChecker> blockedCheckers;
            String subject;
            mSFHang = false;
            if (exceptionHWT != null && waitedHalf == false) {
                exceptionHWT.WDTMatterjava(300);
            }
            final boolean allowRestart;
            int debuggerWasConnected = 0;
            Slog.w(TAG, "SWT Watchdog before synchronized:" + SystemClock.uptimeMillis());
            synchronized (this) {
                Slog.w(TAG, "SWT Watchdog after synchronized:" + SystemClock.uptimeMillis());
                long timeout = CHECK_INTERVAL;
                long SFHangTime;
                // Make sure we (re)spin the checkers that have become idle within
                // this wait-and-check interval
                for (int i = 0; i < mHandlerCheckers.size(); i++) {
                    HandlerChecker hc = mHandlerCheckers.get(i);
                    hc.scheduleCheckLocked(); // schedule a check on each checker in turn
                }
                if (debuggerWasConnected > 0) {
                    debuggerWasConnected--;
                }
                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
                // wait while asleep. If the device is asleep then the thing that we are waiting
                // to timeout on is asleep as well and won't have a chance to run, causing a false
                // positive on when to kill things.
                long start = SystemClock.uptimeMillis();
                while (timeout > 0) {
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    try {
                        wait(timeout); // wait 30 seconds, or until woken by notify
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    // check whether the full wait time has really elapsed
                    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
                }
                final int waitState = evaluateCheckerCompletionLocked(); // check the state
                if (waitState == COMPLETED) {
                    // The monitors have returned; reset
                    waitedHalf = false;
                    //CputimeEnable(new String("0"));
                    continue;
                } else if (waitState == WAITING) {
                    // still waiting but within their configured intervals; back off and recheck
                    // CputimeEnable(new String("0"));
                    continue;
                } else if (waitState == WAITED_HALF) {
                    if (!waitedHalf) {
                        ...
                        waitedHalf = true;
                    }
                    continue;
                }
                // something is overdue!
                blockedCheckers = getBlockedCheckersLocked();
                subject = describeCheckersLocked(blockedCheckers);
                allowRestart = mAllowRestart;
            }
            ...
                Process.killProcess(Process.myPid());
                System.exit(10);
            }
            waitedHalf = false;
        }
    }
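The article does not show evaluateCheckerCompletionLocked(), so the following is only a sketch of how the four states used in run() could be derived. It assumes each checker's state depends on how long its check has been outstanding relative to its timeout, and that the overall state is the worst state among all checkers. The class CheckerState, its fields, and the numeric values of the constants are hypothetical.

```java
import java.util.List;

// Hypothetical stand-in for the per-thread checker state; not the real HandlerChecker.
final class CheckerState {
    // Assumed ordering: a larger value means a worse state.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    final boolean completed; // has the posted check finished?
    final long startTime;    // when the check was posted, in ms
    final long waitMax;      // timeout for this checker, in ms

    CheckerState(boolean completed, long startTime, long waitMax) {
        this.completed = completed;
        this.startTime = startTime;
        this.waitMax = waitMax;
    }

    // State of one checker at time `now` (ms).
    int completionState(long now) {
        if (completed) {
            return COMPLETED;
        }
        long latency = now - startTime;
        if (latency < waitMax / 2) {
            return WAITING;
        } else if (latency < waitMax) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    // Overall state: the worst state reported by any checker.
    static int evaluate(List<CheckerState> checkers, long now) {
        int state = COMPLETED;
        for (CheckerState c : checkers) {
            state = Math.max(state, c.completionState(now));
        }
        return state;
    }
}
```

Under these assumptions, with a 60-second timeout a stuck checker reports WAITING for the first 30 seconds, WAITED_HALF for the next 30, and OVERDUE after that, which matches the two-pass behaviour controlled by waitedHalf in run().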

Now look at the important functions this code uses:

    public void scheduleCheckLocked() {
        if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
            mCompleted = true;
            return;
        }

        if (!mCompleted) {
            // we already have a check in flight, so no need
            return;
        }
        mCompleted = false;
        mCurrentMonitor = null;
        mStartTime = SystemClock.uptimeMillis();
        mHandler.postAtFrontOfQueue(this);
    }

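As noted earlier, HandlerChecker uses the Handler of the monitored object's thread, so mHandler.postAtFrontOfQueue here actually posts a message to the monitored object. Look at the postAtFrontOfQueue function: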

    public final boolean postAtFrontOfQueue(Runnable r) {
        return this.sendMessageAtFrontOfQueue(getPostMessage(r));
    }

    private static Message getPostMessage(Runnable r) {
        Message m = Message.obtain();
        m.callback = r;
        return m;
    }

    public void dispatchMessage(Message msg) {
        if (msg.callback != null) {
            handleCallback(msg);
        } else {
            if (this.mCallback != null && this.mCallback.handleMessage(msg)) {
                return;
            }
            this.handleMessage(msg);
        }
    }

    private static void handleCallback(Message message) {
        message.callback.run();
    }

So postAtFrontOfQueue(r) eventually calls r.run(). If the monitored object's message processing is blocked, the message posted by postAtFrontOfQueue can never be handled. The monitored object's monitor() function is then never called, and the check eventually times out without a response. Watchdog therefore also monitors whether a message queue is blocked.
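The effect of a blocked message queue can be reproduced in plain Java. The sketch below is not Android code: it assumes a single-threaded ExecutorService is an acceptable stand-in for a Looper thread (it is FIFO, so it does not model posting at the front of the queue), and the class and method names are hypothetical. It only shows that a check posted to a thread that is stuck in an earlier task never gets to run.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo; a single-thread executor stands in for the monitored Looper thread.
final class BlockedQueueDemo {
    // Returns true if the posted check ran within waitMillis.
    static boolean checkCompletes(boolean workerStuck, long waitMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);
        try {
            if (workerStuck) {
                // Simulate a message handler that never returns.
                worker.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The "check": it can only run once the worker thread is free.
            worker.execute(done::countDown);
            return done.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();  // let the stuck task finish
            worker.shutdownNow(); // stop the worker thread
        }
    }
}
```

Calling checkCompletes(false, 2000) lets the check run on the idle worker, while checkCompletes(true, 200) times out because the worker never reaches the check.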

In short, every 30 seconds Watchdog checks the state of each monitored object by posting a message to it.
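The 30-second cadence depends on the inner wait loop of run(), which recomputes the remaining time after every wake-up so that an early notify does not shorten the interval. A minimal plain-Java sketch of that pattern follows. It assumes System.nanoTime() as a stand-in for SystemClock.uptimeMillis(), and unlike the original it stops waiting when interrupted; the class and method names are hypothetical.

```java
// Hypothetical helper illustrating the "wait until the full interval has elapsed" loop.
final class DeadlineWait {
    // Waits on `lock` until intervalMillis has elapsed, re-waiting after early wake-ups.
    // Returns the elapsed time in milliseconds.
    static long waitFullInterval(Object lock, long intervalMillis) {
        long start = System.nanoTime();
        long remaining = intervalMillis;
        synchronized (lock) {
            while (remaining > 0) {
                try {
                    lock.wait(remaining); // may return early on notify or a spurious wake-up
                } catch (InterruptedException e) {
                    // Assumption: give up on interrupt instead of logging and continuing.
                    Thread.currentThread().interrupt();
                    break;
                }
                long elapsed = (System.nanoTime() - start) / 1_000_000;
                remaining = intervalMillis - elapsed; // zero or negative ends the loop
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Even if another thread calls notifyAll() on the lock after 50 ms, waitFullInterval(lock, 300) keeps waiting until at least 300 ms have passed.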

Next, look at where the message posted by postAtFrontOfQueue is finally handled, namely the run function of HandlerChecker:

    public void run() {
        final int size = mMonitors.size();
        for (int i = 0; i < size; i++) {
            synchronized (Watchdog.this) {
                mCurrentMonitor = mMonitors.get(i);
            }
            mCurrentMonitor.monitor();
        }

        synchronized (Watchdog.this) {
            mCompleted = true;
            mCurrentMonitor = null;
        }
    }

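The key call is mCurrentMonitor.monitor(). Here is how one real caller implements this interface:

    public void monitor() {
        synchronized (this) { }
    }

This is the implementation in ActivityManagerService. It does nothing except enter synchronized (this). If a deadlock has occurred, monitor() stays blocked waiting for the lock, mCompleted = true is never executed, and mCompleted remains false.

To be continued.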
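The empty synchronized block works as a probe: it returns only if the service's lock can be acquired. The sketch below reproduces this in plain Java. The Service class is a hypothetical stand-in for a monitored service such as ActivityManagerService, and a thread that holds the lock indefinitely stands in for a deadlock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo of detecting a held lock with an empty synchronized block.
final class LockMonitorDemo {
    // Stand-in for a monitored service; not a real Android class.
    static final class Service {
        void monitor() {
            synchronized (this) { } // returns only once the service lock is free
        }
    }

    // Returns true if monitor() finished within waitMillis.
    static boolean monitorCompletes(boolean lockHeld, long waitMillis) {
        Service service = new Service();
        CountDownLatch holding = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch completed = new CountDownLatch(1);
        try {
            if (lockHeld) {
                // Simulate a deadlocked service: a thread that never gives up the lock.
                Thread holder = new Thread(() -> {
                    synchronized (service) {
                        holding.countDown();
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                });
                holder.setDaemon(true);
                holder.start();
                holding.await(); // make sure the lock is really held
            }
            // The checker: sets "completed" only after monitor() returns.
            Thread checker = new Thread(() -> {
                service.monitor();
                completed.countDown();
            });
            checker.setDaemon(true);
            checker.start();
            return completed.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // let the lock holder exit
        }
    }
}
```

With the lock free, monitorCompletes(false, 2000) returns true almost immediately. With the lock held, monitorCompletes(true, 200) returns false, which corresponds to mCompleted staying false in the checker.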
