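Android's Watchdog is a mechanism that monitors whether other system services are working normally. If a critical system service gets stuck in an abnormal state such as a deadlock, the system as a whole is no longer working properly, and restarting it is the necessary step to recover Android.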
1. Starting the Watchdog

The Watchdog is started in SystemServer:

    Slog.i(TAG, "Init Watchdog");
    final Watchdog watchdog = Watchdog.getInstance();
    watchdog.init(context, mActivityManagerService);

Watchdog uses the singleton pattern:

    public static Watchdog getInstance() {
        if (sWatchdog == null) {
            sWatchdog = new Watchdog();
        }
        return sWatchdog;
    }

Next, the init function:

    public void init(Context context, ActivityManagerService activity) {
        mResolver = context.getContentResolver();
        mActivity = activity;
        context.registerReceiver(new RebootRequestReceiver(),
                new IntentFilter(Intent.ACTION_REBOOT),
                android.Manifest.permission.REBOOT, null);
    }

It registers a broadcast receiver for ACTION_REBOOT:

    final class RebootRequestReceiver extends BroadcastReceiver {
        @Override
        public void onReceive(Context c, Intent intent) {
            if (intent.getIntExtra("nowait", 0) != 0) {
                rebootSystem("Received ACTION_REBOOT broadcast");
                return;
            }
            Slog.w(TAG, "Unsupported ACTION_REBOOT broadcast: " + intent);
        }
    }

The receiver reboots only when the intent carries a non-zero "nowait" extra; any other ACTION_REBOOT broadcast is logged as unsupported. As its name suggests, rebootSystem reboots the system:

    void rebootSystem(String reason) {
        Slog.i(TAG, "Rebooting system because: " + reason);
        IPowerManager pms = (IPowerManager) ServiceManager.getService(Context.POWER_SERVICE);
        try {
            pms.reboot(false, reason, false);
        } catch (RemoteException ex) {
        }
    }

That is indeed what it does, so one important job of the Watchdog is to receive this broadcast and reboot the system.
2. How the Watchdog works

To use the Watchdog, first pass it the handler of the monitored object's thread:

    Watchdog.getInstance().addThread(mHandler);

Then pass it the monitored object itself:

    Watchdog.getInstance().addMonitor(this);

addThread takes the handler of the current thread and creates a new HandlerChecker for it:

    public void addThread(Handler thread) {
        addThread(thread, DEFAULT_TIMEOUT);
    }

    public void addThread(Handler thread, long timeoutMillis) {
        synchronized (this) {
            if (isAlive()) {
                throw new RuntimeException("Threads can't be added once the Watchdog is running");
            }
            final String name = thread.getLooper().getThread().getName();
            mHandlerCheckers.add(new HandlerChecker(thread, name, timeoutMillis));
        }
    }

HandlerChecker is a Runnable:

    public final class HandlerChecker implements Runnable

Because the HandlerChecker is built around the monitored thread's handler, it runs on the monitored object's thread, not on the Watchdog's own thread.

The addMonitor function:

    public void addMonitor(Monitor monitor) {
        mMonitors.add(monitor);
    }

It only stores the monitor; the stored monitors are used later.
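The two registration calls above can be sketched in plain Java. This is a minimal illustration, not the AOSP class: `MiniWatchdog`, `Checker`, `checkerCount()` and `monitorCount()` are hypothetical names, the monitored thread is passed as a `Thread` instead of an Android `Handler`, and the value of `DEFAULT_TIMEOUT` is an assumption because the article does not show it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Watchdog's registration side; not the AOSP class.
class MiniWatchdog extends Thread {

    /** Same shape as the Monitor interface that addMonitor() accepts. */
    interface Monitor {
        void monitor();
    }

    static final long DEFAULT_TIMEOUT = 60_000; // assumed value, in milliseconds

    /** One entry per monitored thread; this sketch keeps only the name and timeout. */
    static final class Checker {
        final String name;
        final long timeoutMillis;

        Checker(String name, long timeoutMillis) {
            this.name = name;
            this.timeoutMillis = timeoutMillis;
        }
    }

    private final List<Checker> checkers = new ArrayList<>();
    private final List<Monitor> monitors = new ArrayList<>();

    public void addThread(Thread monitored) {
        addThread(monitored, DEFAULT_TIMEOUT);
    }

    public synchronized void addThread(Thread monitored, long timeoutMillis) {
        // As in the original, registration is refused once the watchdog thread runs.
        if (isAlive()) {
            throw new IllegalStateException("Threads can't be added once the Watchdog is running");
        }
        checkers.add(new Checker(monitored.getName(), timeoutMillis));
    }

    public synchronized void addMonitor(Monitor monitor) {
        monitors.add(monitor); // stored only; consulted later by the checking code
    }

    synchronized int checkerCount() {
        return checkers.size();
    }

    synchronized int monitorCount() {
        return monitors.size();
    }

    @Override
    public void run() {
        // The monitoring loop is omitted; this sketch just stays alive until interrupted.
        try {
            Thread.sleep(Long.MAX_VALUE);
        } catch (InterruptedException e) {
            // interrupted: let the thread end
        }
    }
}
```

Registering before `start()` succeeds, while a call to `addThread` after `start()` is rejected, which matches the `isAlive()` guard in the original code.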
Next, let's see how the running Watchdog uses these two kinds of objects to do its monitoring.

The Watchdog's run() function:

    @Override
    public void run() {
        boolean waitedHalf = false;
        boolean mSFHang = false;
        while (true) { // infinite loop
            final ArrayList<HandlerChecker> blockedCheckers;
            String subject;
            mSFHang = false;
            if (exceptionHWT != null && waitedHalf == false) {
                exceptionHWT.WDTMatterjava(300);
            }
            final boolean allowRestart;
            int debuggerWasConnected = 0;
            Slog.w(TAG, "SWT Watchdog before synchronized:" + SystemClock.uptimeMillis());
            synchronized (this) {
                Slog.w(TAG, "SWT Watchdog after synchronized:" + SystemClock.uptimeMillis());
                long timeout = CHECK_INTERVAL;
                long SFHangTime;
                // Make sure we (re)spin the checkers that have become idle within
                // this wait-and-check interval
                for (int i = 0; i < mHandlerCheckers.size(); i++) {
                    HandlerChecker hc = mHandlerCheckers.get(i);
                    hc.scheduleCheckLocked(); // schedule a check on each checker in turn
                }
                if (debuggerWasConnected > 0) {
                    debuggerWasConnected--;
                }
                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
                // wait while asleep. If the device is asleep then the thing that we are waiting
                // to timeout on is asleep as well and won't have a chance to run, causing a false
                // positive on when to kill things.
                long start = SystemClock.uptimeMillis();
                while (timeout > 0) {
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    try {
                        wait(timeout); // wait 30 seconds, or until woken by notify
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    // recompute the remaining time to confirm the full interval has elapsed
                    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
                }
                final int waitState = evaluateCheckerCompletionLocked(); // evaluate the checkers' state
                if (waitState == COMPLETED) {
                    // The monitors have returned; reset
                    waitedHalf = false;
                    //CputimeEnable(new String("0"));
                    continue;
                } else if (waitState == WAITING) {
                    // still waiting but within their configured intervals; back off and recheck
                    // CputimeEnable(new String("0"));
                    continue;
                } else if (waitState == WAITED_HALF) {
                    if (!waitedHalf) {
                        ...
                        waitedHalf = true;
                    }
                    continue;
                }
                // something is overdue!
                blockedCheckers = getBlockedCheckersLocked();
                subject = describeCheckersLocked(blockedCheckers);
                allowRestart = mAllowRestart;
            }
            ...
                Process.killProcess(Process.myPid());
                System.exit(10);
            } // closes a block opened in the elided code
            waitedHalf = false;
        }
    }

Now for the important functions this code relies on, starting with scheduleCheckLocked in HandlerChecker:

    public void scheduleCheckLocked() {
        if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
            mCompleted = true;
            return;
        }
        if (!mCompleted) {
            // we already have a check in flight, so no need
            return;
        }
        mCompleted = false;
        mCurrentMonitor = null;
        mStartTime = SystemClock.uptimeMillis();
        mHandler.postAtFrontOfQueue(this);
    }

As noted earlier, a HandlerChecker uses the handler of the monitored object's thread, so mHandler.postAtFrontOfQueue here in effect sends a message to the monitored object. Let's look at postAtFrontOfQueue:
    public final boolean postAtFrontOfQueue(Runnable r) {
        return this.sendMessageAtFrontOfQueue(getPostMessage(r));
    }

    private static Message getPostMessage(Runnable r) {
        Message m = Message.obtain();
        m.callback = r;
        return m;
    }

    public void dispatchMessage(Message msg) {
        if (msg.callback != null) {
            handleCallback(msg);
        } else {
            if (this.mCallback != null && this.mCallback.handleMessage(msg)) {
                return;
            }
            this.handleMessage(msg);
        }
    }

    private static void handleCallback(Message message) {
        message.callback.run();
    }

So postAtFrontOfQueue(r) ultimately calls r.run(). It follows that if the monitored object's message queue is blocked, the message posted by postAtFrontOfQueue is never handled. The monitored object's monitor() function is then never called, and the check eventually times out without a response. The Watchdog therefore also detects a blocked message queue. In short, every 30 seconds the Watchdog checks the state of each monitored object by sending it a message.
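The effect described above, a check that cannot complete while the message queue is stuck, can be reproduced without Android. The sketch below is a plain-Java analogue under stated assumptions: `MiniLooper` (a deque drained by one worker thread) stands in for Looper and Handler, `Heartbeat` stands in for HandlerChecker, all names are hypothetical, and the waits are shortened to milliseconds instead of the 30-second interval.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// Hypothetical plain-Java analogue of a Looper thread plus a HandlerChecker.
class QueueHeartbeatDemo {

    /** One worker thread draining a deque of tasks, like a message loop. */
    static final class MiniLooper {
        private final BlockingDeque<Runnable> queue = new LinkedBlockingDeque<>();
        private final Thread worker = new Thread(() -> {
            try {
                while (true) {
                    queue.takeFirst().run();
                }
            } catch (InterruptedException e) {
                // interrupted: stop looping
            }
        }, "mini-looper");

        MiniLooper() {
            worker.setDaemon(true);
            worker.start();
        }

        void post(Runnable r) { queue.addLast(r); }

        void postAtFrontOfQueue(Runnable r) { queue.addFirst(r); }

        void quit() { worker.interrupt(); }
    }

    /** Posts itself to the looper; completed flips back to true only if the looper runs it. */
    static final class Heartbeat implements Runnable {
        private final MiniLooper looper;
        private volatile boolean completed = true;
        private volatile CountDownLatch done = new CountDownLatch(0);

        Heartbeat(MiniLooper looper) { this.looper = looper; }

        void scheduleCheck() {
            if (!completed) {
                return; // a check is already in flight
            }
            completed = false;
            done = new CountDownLatch(1);
            looper.postAtFrontOfQueue(this);
        }

        @Override
        public void run() {
            completed = true;
            done.countDown();
        }

        boolean awaitCompleted(long timeoutMillis) throws InterruptedException {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        }
    }

    /** Returns {completed while the looper was blocked, completed after it was released}. */
    static boolean[] demo() {
        MiniLooper looper = new MiniLooper();
        Heartbeat heartbeat = new Heartbeat(looper);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        // A task that occupies the looper thread until it is released.
        looper.post(() -> {
            started.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            started.await();
            heartbeat.scheduleCheck();
            boolean whileBlocked = heartbeat.awaitCompleted(200);
            release.countDown();
            boolean afterRelease = heartbeat.awaitCompleted(5_000);
            looper.quit();
            return new boolean[] { whileBlocked, afterRelease };
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Posting at the front of the queue does not help while a task is already running on the worker thread, which is why a stuck handler shows up as a heartbeat that never completes.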
Next, let's look at where that message is finally handled after postAtFrontOfQueue, namely the run function of HandlerChecker:

    public void run() {
        final int size = mMonitors.size();
        for (int i = 0 ; i < size ; i++) {
            synchronized (Watchdog.this) {
                mCurrentMonitor = mMonitors.get(i);
            }
            mCurrentMonitor.monitor();
        }
        synchronized (Watchdog.this) {
            mCompleted = true;
            mCurrentMonitor = null;
        }
    }

The key call here is mCurrentMonitor.monitor(). Here is how one real caller implements this interface:

    public void monitor() {
        synchronized (this) { }
    }

This is ActivityManagerService's implementation. It does nothing except enter synchronized (this). If a deadlock has occurred, monitor() blocks indefinitely waiting for the lock, the statement mCompleted = true is never reached, and mCompleted stays false.

To be continued.
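The empty synchronized block works as a liveness probe because it can return only once the object's lock is free. The sketch below illustrates this in plain Java. It is not AOSP code: `Service`, `probe` and `demo` are hypothetical names, and a thread that simply holds the lock until released stands in for a real deadlock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: an empty synchronized block used as a liveness probe.
class MonitorLockDemo {

    /** Stand-in for a monitored service such as ActivityManagerService. */
    static final class Service {
        /** Same shape as the monitor() above: returns only once the lock is free. */
        void monitor() {
            synchronized (this) { }
        }
    }

    /** Calls monitor() on a checker thread; true if it returned within timeoutMillis. */
    static boolean probe(Service service, long timeoutMillis) {
        CountDownLatch completed = new CountDownLatch(1);
        Thread checker = new Thread(() -> {
            service.monitor();
            completed.countDown(); // plays the role of mCompleted = true
        }, "checker");
        checker.setDaemon(true);
        checker.start();
        try {
            return completed.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns {probe result while the lock is held, probe result after release}. */
    static boolean[] demo() {
        Service service = new Service();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        // This thread grabs the service lock and keeps it, simulating a stuck service.
        Thread holder = new Thread(() -> {
            synchronized (service) {
                held.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "stuck-holder");
        holder.setDaemon(true);
        holder.start();
        try {
            held.await();
            boolean whileHeld = probe(service, 200);
            release.countDown();
            holder.join();
            boolean afterRelease = probe(service, 5_000);
            return new boolean[] { whileHeld, afterRelease };
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

While the lock is held the probe times out, which corresponds to mCompleted staying false; once the lock is released the same probe returns promptly.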